revision of manuscript

0badf884 · Kevin Kunzmann · 21167658 · 0badf884
Commit 0badf884 authored Mar 22, 2019 by Kevin Kunzmann

Showing with 311 additions and 302 deletions

manuscript/manuscript.Rmd +311 -302

--- a/manuscript/manuscript.Rmd
+++ b/manuscript/manuscript.Rmd
@@ -463,9 +463,7 @@ for (i in 1:config$folds) {
  )
 }
-```
-```{r misc, include=FALSE}
 modelnames <- lapply(
    list.dirs(
      sprintf("%s/validation/posteriors", params$data_dir), 
@@ -540,7 +538,7 @@ All models were fit on the entire available data after removing the
 180 +/- 14 days post-injury observation from the respective test fold
 thus mimicking a missing completely at random missing data mechanism.
 The distribution of GOSe values in the respective three test sets is well 
-balanced, cf. Figure ??? (Appendix).
+balanced, cf. Figure ??? (cf. Appendix).
 Performance is assessed using the absolute-count and the normalized 
 (proportions) confusion matrices as well as
 bias, mean absolute error (MAE), and root mean squared error (RMSE).
@@ -551,7 +549,7 @@ RMSE puts more weight on large deviations as compared to MAE.
 Comparisons in terms of bias, MAE, and RMSE tacitly assume that 
 GOSe values can be sensibly interpreted on an interval scale.
 We therefore also consider $Pr[est > true] - Pr[est < true]$ as an
-alternative measure of bias.
+alternative measure of bias which does not require this tacit assumption.
 Note that the scale is not directly comparable to the one of the
 other three quantities!
 All measures are considered both conditional on the ground-truth
@@ -584,42 +582,27 @@ is depicted in Figure ??? both conditional on LOCF being applicable and,
 excluding LOCF, on the entire test set.
 ```{r overall-comparison-all-methods, echo=FALSE, fig.height=7}
-p1 <- df_predictions %>%
+plot_summary_measures <- function(df, label) {
+  df_predictions %>%
    filter(!(gupi %in% idx)) %>% 
    group_by(model, fold) %>% 
    summarize(
      RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
      MAE  = mean(abs(GOSE - prediction), na.rm = TRUE),
      Bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE),
-    `Pr[est > true] - Pr[est < true]` = mean(prediction > GOSE, na.rm = TRUE) - mean(prediction < GOSE, na.rm = TRUE)
+      `Pr[est. > true] - Pr[est. < true]` = mean(prediction > GOSE, na.rm = TRUE) - mean(prediction < GOSE, na.rm = TRUE)
    ) %>% 
    ungroup %>% 
    gather(error, value, -model, -fold) %>% 
-  ggplot(aes(model, value)) + 
+    mutate(
-    geom_hline(yintercept = 0, color = "black") +
+      error = factor(error, c(
-    geom_boxplot() + 
+        "Bias",
-    facet_wrap(~error, nrow = 1) +
+        "Pr[est. > true] - Pr[est. < true]",
-    scale_y_continuous(name = "", breaks = seq(-2, 8, .25), limits = c(-.5, 1.5)) +
+        "MAE",
-    scale_x_discrete("") + 
+        "RMSE"
-    theme_bw() +
+      ))
-    theme(
-      panel.grid.minor   = element_blank(),
-      panel.grid.major.x = element_blank(),
-      axis.text.x        = element_text(angle = 66, hjust = 1)
-    ) +
-    ggtitle("Overall summary measures, LOCF subset")
-p2 <- df_predictions %>%
-  filter(model != "LOCF") %>% 
-  group_by(model, fold) %>% 
-  summarize(
-    RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
-    MAE  = mean(abs(GOSE - prediction), na.rm = TRUE),
-    Bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE),
-    `Pr[est > true] - Pr[est < true]` = mean(prediction > GOSE, na.rm = TRUE) - mean(prediction < GOSE, na.rm = TRUE)
    ) %>% 
-  ungroup %>% 
-  gather(error, value, -model, -fold) %>% 
    ggplot(aes(model, value)) + 
      geom_hline(yintercept = 0, color = "black") +
      geom_boxplot() + 
@@ -632,35 +615,46 @@ p2 <- df_predictions %>%
        panel.grid.major.x = element_blank(),
        axis.text.x        = element_text(angle = 66, hjust = 1)
      ) +
-    ggtitle("Overall summary measures, full test set")
+      ggtitle(label)
+}
 cowplot::plot_grid(
-  p1, 
+    plot_summary_measures(
-  p2, 
+      df_predictions %>% filter(gupi %in% idx), 
+      "Summary measures, LOCF subset"
+    ), 
+    plot_summary_measures(
+      df_predictions, 
+      "Summary measures, full test set"
+    ), 
    ncol = 1, align = "v", axis = "lr"
-)
+  )
 ```
-Firstly, as could be expected, LOCF is overall negatively biased, i.e., 
+Firstly, LOCF is overall negatively biased, i.e., 
 on average it imputes lower-than-observed GOSe values.
-This indicates that there is a population average trend towards continued
+This reflects a population average trend towards continued
-recovery within the first 6 months post-injury.
+recovery within the first 6 months post injury.
-In terms of accuracy, the LOCF does perform worst but differences between
+The fact that both ways of quantifying bias qualitatively agree
+indicates that the interpretation of GOSe as an interval measure, which is
+tacitly underlying Bias, MAE, and RMSE comparisons, is not too restrictive.
+In terms of accuracy, LOCF does perform worst but differences between
 methods are less pronounced than in terms of bias.
-Notably, the difference in terms of RMSE is more substantial than in terms of
+Notably, the RMSE difference between LOCF and the other methods is slightly
-MAE which indicates that LOCF tends to incur more large deviations across
+larger than the MAE difference which indicates that LOCF tends to produce more 
-several GOSe categories than the other methods considered here.
+large deviations, i.e., across several GOSe categories.
 Secondly, including baseline covariates only affects the GP regression model
 notably. 
 Both the MM and MSM model perform more or less the same irrespective of adjustment
 for baseline covariates.
-This indicates that the marginal additional predictive value of baseline 
+This indicates that the additional predictive value of baseline 
 covariates over the information contained in at least one observed GOSe value
 is limited.
-Furthermore, note that both variants of the mixed effects model fail to remove
+Furthermore, note that both variants of the mixed effects model fail to correct
 the overall bias of the imputed values.
-We proceed with an in-depth analyses of a subset of models both in direct 
+We proceed with a detailed analysis of a subset of models both in direct 
 comparison with LOCF and in the entire data set including those cases where
 LOCF is not applicable.
 In the following we only consider the baseline-adjusted Gaussian process model 
@@ -668,18 +662,23 @@ In the following we only consider the baseline-adjusted Gaussian process model
 and the multi-state model without baseline covariates ('MSM').
 The rationale behind dropping baseline adjustment for MM and MSM is that
 the additional complexity does not substantially alter overall performance.
+On the other hand, the GP model benefits from the inclusion of the IMPACT
+baseline covariates.
 ## Detailed comparison conditional on LOCF subset
 We first consider results for the set of test cases which allow LOCF imputation 
 (n = `r df_predictions %>% filter(model == "LOCF") %>% nrow - length(idx)`).
-Both the raw count as well as the relative (by true GOSe) confusion matrices
+Both the raw count as well as the relative (by left-out true GOSe) confusion matrices
 are presented in Figure ???.
 ```{r confusion-matrix-locf, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices on LOCF subset."}
-df_average_confusion_matrices <- df_predictions %>% 
+plot_confusion_matrices <- function(df_predictions, models) {
-  filter(!(gupi %in% idx)) %>% 
+  df_average_confusion_matrices <- df_predictions %>% 
+    filter(model %in% models) %>% 
    group_by(fold, model) %>% 
    do(
      confusion_matrix = caret::confusionMatrix(
@@ -693,16 +692,21 @@ df_average_confusion_matrices <- df_predictions %>%
    unnest %>% 
    group_by(model, `Predicted GOSE`, `True GOSE`) %>% 
    summarize(n = mean(n)) %>% 
-  ungroup
+    ungroup %>% 
+    mutate(model = factor(model, models))
-p_cnf_mtrx_raw <- df_average_confusion_matrices %>%
+  p_cnf_mtrx_raw <- df_average_confusion_matrices %>%
-  filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>% 
-  group_by(model, `True GOSE`) %>%
    ggplot(aes(`True GOSE`, `Predicted GOSE`, fill = n)) +
      geom_raster() +
+      geom_text(aes(
+          label = sprintf("%.1f", n) %>% 
+            ifelse(. == "0.0", "", .)
+        ), 
+        size = 1.5
+      ) +
      geom_hline(yintercept = c(2, 4, 6) + .5, color = "black") +
      geom_vline(xintercept = c(2, 4, 6) + .5, color = "black") +
-    scale_fill_gradient(low = "white", high = "black") +
+      scale_fill_gradient(low = "white", high = "#555555") +
      coord_fixed(expand = FALSE) + 
      labs(x = "true GOSe", y = "imputed GOSe", fill = "") +
      theme_bw() +
@@ -712,8 +716,7 @@ p_cnf_mtrx_raw <- df_average_confusion_matrices %>%
      facet_wrap(~model, nrow = 1) +
     ggtitle("Average confusion matrix across folds (absolute counts)")
-p_cnf_mtrx_colnrm <- df_average_confusion_matrices %>%
+  p_cnf_mtrx_colnrm <- df_average_confusion_matrices %>%
-  filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>% 
    group_by(model, `True GOSE`) %>%
    mutate(
      `fraction (column)` = n / sum(n),
@@ -733,31 +736,63 @@ p_cnf_mtrx_colnrm <- df_average_confusion_matrices %>%
      facet_wrap(~model, nrow = 1) +
     ggtitle("Average confusion matrix across folds (column fraction)")
-cowplot::plot_grid(p_cnf_mtrx_raw, p_cnf_mtrx_colnrm, ncol = 1, align = "v")
+  cowplot::plot_grid(p_cnf_mtrx_raw, p_cnf_mtrx_colnrm, ncol = 1, align = "v")        
+}
+plot_confusion_matrices(
+  df_predictions %>% filter(!(gupi %in% idx)), 
+  c("MSM", "GP + cov", "MM", "LOCF")
+)
-ggsave(filename = "confusion_matrices_locf.pdf", width = 7, height = 7)
+ggsave(filename = "confusion_matrices_locf.pdf", width = 7, height = 6)
-ggsave(filename = "confusion_matrices_locf.png", width = 7, height = 7)
+ggsave(filename = "confusion_matrices_locf.png", width = 7, height = 6)
 ```
 The absolute-count confusion matrices show that most imputed values are 
-within +/- one GOSE categories of the true value. 
+within +/- one GOSe category of the observed ones. 
-However, they also reflect the category imbalance (cf. Figures ??? and ???, 
+However, they also reflect the category imbalance (cf. Figures ??? and ??? 
-Appendix) in the study ppopulation.
+Appendix) in the study population.
-The performance conditional on the (in practice unknown) true GOSe value
+The performance conditional on the (in practice unknown) observed GOSe value
 clearly shows that imputation performance for the most infrequent category 4
-(GOSe of 1 and 2 are not observed in the test set!) is most problematic.
+is most problematic.
-This is, however, true across the board of methods considered.
+This is, however, true across the range of methods considered.
-Note that both the MSM as well as the MM models account for this fact by 
+Both the MSM and the MM models account for this by almost never imputing a 
-almost never imputing a GOSe of 4.
+GOSe of 4.
 Instead, the respective cases tend to be imputed to GOSe 3 or 5.
-Additionally, the following table ??? lists some confusion probabilities
+**TODO:**
-of particular interest.
-[Comment: 1 -> > 1 is not relevant since our imputation is conditional
+* this table is the one David requested in our last meeting,
-on not being 1 at 6 months; the comparison seems to favor LOCF since only
+not entirely convinced though ...
+* ... 1 -> > 1 is not relevant since our imputation is conditional
+on not being 1 at 6 months
+* ... the comparison seems to favor LOCF since only
 upward confusions are considered (which LOCF by design tends to do less)]
+* Is there a clinical interpretation along the way that '4' might constitute
+a short-term transition state or is it just defined in a way that makes it
+highly unlikely to be observed in practice?
 ```{r crossing-table, echo=FALSE, warning=FALSE, results='asis'}
+models <- c("MSM", "GP + cov", "MM", "LOCF")
+df_average_confusion_matrices <- df_predictions %>% 
+    filter(model %in% models) %>% 
+    filter(!(gupi %in% idx)) %>% 
+    group_by(fold, model) %>% 
+    do(
+      confusion_matrix = caret::confusionMatrix(
+          data = factor(.$prediction, levels = 1:8), 
+          reference = factor(.$GOSE, levels = 1:8)
+        ) %>% 
+        as.matrix %>% as_tibble %>% 
+        mutate(`Predicted GOSE` = row_number() %>% as.character) %>% 
+        gather(`True GOSE`, n, 1:8)
+    ) %>% 
+    unnest %>% 
+    group_by(model, `Predicted GOSE`, `True GOSE`) %>% 
+    summarize(n = mean(n)) %>% 
+    ungroup %>% 
+    mutate(model = factor(model, models))
 rbind(
 df_average_confusion_matrices %>% 
  filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>% 
@@ -791,37 +826,52 @@ df_average_confusion_matrices %>%
  pander::pandoc.table("Some specific confusion percentages, LOCF subset.", digits = 3)
 ```
-[TODO: Interpretation? GOSe 4 potentially relatively brief transient state, 
-clinical relevance of distinction between 3/4?]
 To better understand the overall performance assessment in Figure ???,
 we also consider the performance conditional on the respective ground-truth
-(i.e. the true GOSe values in the respective test sets).
+(i.e. the observed GOSe values in the test sets).
-The results are shown in Figure ???.
+The results are shown in Figure ??? (vertical bars are +/- one standard error of the mean).
-```{r error-scores-locf, echo=FALSE, fig.height=3}
+```{r error-scores-locf, echo=FALSE, fig.height=3, fig.width=9}
-df_predictions %>%
+plot_summary_measures_cond <- function(df_predictions, models, label) {
-  filter(!(gupi %in% idx)) %>% 
-  filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>% 
+  df_predictions %>%
+    filter(model %in% models) %>% 
    group_by(model, fold, GOSE) %>% 
    summarize(
      RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
      MAE  = mean(abs(GOSE - prediction), na.rm = TRUE),
-    bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE) 
+      Bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE),
+      `Pr[est. > true] - Pr[est. < true]` = mean(prediction > GOSE, na.rm = TRUE) - mean(prediction < GOSE, na.rm = TRUE)
+    ) %>% 
+    gather(error, value, -model, -GOSE, -fold) %>% 
+    group_by(GOSE, model, error, fold) %>% 
+    summarize(
+        mean_per_fold = mean(value) # mean within each fold
    ) %>%
-  gather(error, value, -model, -fold, -GOSE) %>% 
    group_by(GOSE, model, error) %>% 
    summarize(
-      mean = mean(value),
+        mean = mean(mean_per_fold), # overall mean
-      sd   = sd(value)
+        se   = sd(mean_per_fold) / sqrt(n())
    ) %>%
    ungroup() %>% 
+    mutate(
+      model = factor(model, models),
+      error = factor(error, c(
+        "Bias",
+        "Pr[est. > true] - Pr[est. < true]",
+        "MAE",
+        "RMSE"
+      ))   
+    ) %>% 
  ggplot(aes(GOSE, color = model)) + 
      geom_hline(yintercept = 0, color = "black") +
-    geom_line(aes(y = mean)) + 
+      geom_errorbar(aes(ymin = mean - se, ymax = mean + se), 
-    geom_point(aes(y = mean)) + 
+        width = .2,
+        position = position_dodge(.33)
+      ) +
+      geom_line(aes(y = mean), alpha = .5) + 
      xlab("GOSe") + 
-    facet_wrap(~error) +
+      facet_wrap(~error, nrow = 1) +
      scale_y_continuous(name = "", breaks = seq(-2, 8, .25)) +
      theme_bw() +
      theme(
@@ -829,10 +879,18 @@ df_predictions %>%
        panel.grid.major.x = element_blank(),
        legend.title       = element_blank()
      ) +
-    ggtitle("Bias and accuracy by true GOSe value, LOCF subset")
+      ggtitle(label)
-ggsave(filename = "errors_stratified_locf.pdf", width = 7, height = 7)
+}
-ggsave(filename = "errors_stratified_locf.png", width = 7, height = 7)
+plot_summary_measures_cond(
+  df_predictions %>% filter(!(gupi %in% idx)), 
+  c("MSM", "GP + cov", "MM", "LOCF"), 
+  "Summary measures by observed GOSe, LOCF subset"
+)
+ggsave(filename = "errors_stratified_locf.pdf", width = 9, height = 3)
+ggsave(filename = "errors_stratified_locf.png", width = 9, height = 3)
 ```
 Just as with the overall performance, differences are most pronounced in terms
@@ -871,11 +929,25 @@ where only GOSe values after 180 days post-injury are available.
 The relative characteristics of the three considered approaches are comparable
 to the LOCF subset.
+**TODO**
+* decide whether figures go in the appendix - David and I agree on them actually being the 
+primary analysis. We just need to convince people of the fact that LOCF should be dropped *first*. As always, I am open to debating this but we should just make a decision, figurexit or figuremain?
 ```{r confusion-matrix, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices, full training set without LOCF."}
+plot_confusion_matrices(
+  df_predictions, 
+  c("MSM", "GP + cov", "MM")
+)
+ggsave(filename = "confusion_matrices_all.pdf", width = 7, height = 6)
+ggsave(filename = "confusion_matrices_all.png", width = 7, height = 6)
+```
+```{r crossing-table-full, echo=FALSE, warning=FALSE, results='asis'}
+models <- c("MSM", "GP + cov", "MM")
 df_average_confusion_matrices <- df_predictions %>% 
-  filter(
+    filter(model %in% models) %>% 
-    model %in% c("GP + cov", "MM", "MSM")
-  ) %>% 
    group_by(fold, model) %>% 
    do(
      confusion_matrix = caret::confusionMatrix(
@@ -889,51 +961,8 @@ df_average_confusion_matrices <- df_predictions %>%
    unnest %>% 
    group_by(model, `Predicted GOSE`, `True GOSE`) %>% 
    summarize(n = mean(n)) %>% 
-  ungroup
+    ungroup %>% 
+    mutate(model = factor(model, models))
-p_cnf_mtrx_raw <- df_average_confusion_matrices %>% 
-  group_by(model, `True GOSE`) %>%
-  ggplot(aes(`True GOSE`, `Predicted GOSE`, fill = n)) +
-    geom_raster() +
-    geom_hline(yintercept = c(2, 4, 6) + .5, color = "black") +
-    geom_vline(xintercept = c(2, 4, 6) + .5, color = "black") +
-    scale_fill_gradient(low = "white", high = "black") +
-    coord_fixed(expand = FALSE) + 
-    labs(x = "true GOSe", y = "imputed GOSe", fill = "") +
-    theme_bw() +
-    theme(
-      panel.grid = element_blank()
-    ) + 
-    facet_wrap(~model, nrow = 1) +
-    ggtitle("Average confusion matrix accross folds (absolute counts)")
-p_cnf_mtrx_colnrm <- df_average_confusion_matrices %>%
-  group_by(model, `True GOSE`) %>%
-  mutate(
-    `fraction (column)` = n / sum(n),
-    `fraction (column)` = ifelse(is.nan(`fraction (column)`), 0, `fraction (column)`)
-  ) %>% 
-  ggplot(aes(`True GOSE`, `Predicted GOSE`, fill = `fraction (column)`)) +
-    geom_raster() +
-    geom_hline(yintercept = c(2, 4, 6) + .5, color = "black") +
-    geom_vline(xintercept = c(2, 4, 6) + .5, color = "black") +
-    scale_fill_gradient("", low = "white", high = "black", limits = c(0, 1)) +
-    coord_fixed(expand = FALSE) + 
-    labs(x = "true GOSe", y = "imputed GOSe", fill = "") +
-    theme_bw() +
-    theme(
-      panel.grid = element_blank()
-    ) + 
-    facet_wrap(~model, nrow = 1) +
-    ggtitle("Average confusion matrix accross folds (column fraction)")
-cowplot::plot_grid(p_cnf_mtrx_raw, p_cnf_mtrx_colnrm, ncol = 1, align = "v")
-ggsave(filename = "confusion_matrices_all.pdf", width = 7, height = 7)
-ggsave(filename = "confusion_matrices_all.png", width = 7, height = 7)
-```
-```{r crossing-table-full, echo=FALSE, warning=FALSE, results='asis'}
 rbind(
 df_average_confusion_matrices %>% 
  group_by(model) %>% 
@@ -964,39 +993,15 @@ df_average_confusion_matrices %>%
  pander::pandoc.table("Some specific confusion percentages, full data set.", digits = 3)
 ```
-```{r error-scores-all, echo=FALSE, fig.height=3}
+```{r error-scores-all, echo=FALSE, fig.height=3, fig.width=9}
-df_predictions %>%
+plot_summary_measures_cond(
-  filter(model %in% c("GP + cov", "MM", "MSM")) %>% 
+  df_predictions, 
-  group_by(model, fold, GOSE) %>% 
+  c("MSM", "GP + cov", "MM"), 
-  summarize(
+  "Summary measures by observed GOSe, full test set"
-    RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
+)
-    MAE  = mean(abs(GOSE - prediction), na.rm = TRUE),
-    bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE) 
-  ) %>% 
-  gather(error, value, -model, -fold, -GOSE) %>% 
-  group_by(GOSE, model, error) %>% 
-  summarize(
-      mean = mean(value),
-      sd   = sd(value)
-  ) %>% 
-  ungroup() %>% 
-  ggplot(aes(GOSE, color = model)) + 
-    geom_hline(yintercept = 0, color = "black") +
-    geom_point(aes(y = mean)) + 
-    geom_line(aes(y = mean)) + 
-    facet_wrap(~error) +
-    scale_y_continuous(name = "", breaks = seq(-2, 8, .25)) +
-    xlab("GOSe") +
-    theme_bw() +
-    theme(
-      panel.grid.minor   = element_blank(),
-      panel.grid.major.x = element_blank(),
-      legend.title       = element_blank()
-    ) +
-    ggtitle("Accuracy by true GOSe value, full test set")
-ggsave(filename = "imputation_error.pdf", width = 7, height = 7)
+ggsave(filename = "imputation_error.pdf", width = 9, height = 3)
-ggsave(filename = "imputation_error.png", width = 7, height = 7)
+ggsave(filename = "imputation_error.png", width = 9, height = 3)
 ```
@@ -1005,40 +1010,49 @@ ggsave(filename = "imputation_error.png", width = 7, height = 7)
 # Discussion
-The most important measure to prevent missing data bias 
+Handling missing data *post-hoc* to prevent biased analyses often requires
-is to avoid missing values in the first place!
+great effort.
-Study protocols and procedures should be checked preemptively for their 
+It is thus of the utmost importance to implement measures for avoiding missing
-applicability to clinical practice and adequate effort should be put in
+data in the first place [comment: I strongly feel we should lead with this 
-ensuring good compliance with follow-up protocols.
+sentence or something in the same spirit to make it absolutely clear that 
+statistics cannot be used to impute data out of nowhere. 
+Raising awareness of the complexity of missing data problems should rather be seen
+as an incentive to invest more effort upfront in preventing missingness in the first place ;)]
 Nevertheless, in practice, missing values due to loss-to-follow-up will always
-occur and should be addressed efficiently.
+occur and should be addressed effectively.
-There is a wide consensus that non-imputation of missing values cannot be 
+There is a wide consensus that statistically sound imputation of missing values 
-considered best practice and may introduce bias or lead to an unnecessary loss 
+is beneficial for both the reduction of bias and for increasing statistical power.
-of power.
+The current gold-standard for imputing missing values is multiple
-The current gold-standard for imputing missing values is certainly multiple
 imputation on a per-analysis basis including analysis-specific covariates to
 further reduce bias and to preserve the imputation uncertainty in the 
 downstream analysis.
-However, in practice, there are good reasons to provide a set of single-imputed
+In practice, however, there are good reasons for providing a set of single-imputed
-default values in larger projects such as CENTER-TBI.
+default values in large observational studies such as CENTER-TBI.
-E.g., consortia are increasingly committed to making their databases 
+Consortia are increasingly committed to making their databases 
 available to a wider range of researchers. 
-In fact, more liberal data-sharing policies are becoming an core requirement 
+In fact, more liberal data-sharing policies are becoming a core requirement 
-for many funding bodies [REFERENCES].
+for funding bodies (cf. https://www.openaire.eu/).
 In this context, it might not be possible to ensure that every analysis team
 has the necessary statistical expertise to properly conduct a per-analysis
-multiple imputation.
+multiple imputation in the future.
 Furthermore, the imputed values of a multiple-imputation procedure are 
 inherently random and it is thus difficult to ensure consistency across 
-different analysis teams.
+different analysis teams if the values themselves cannot be stored 
-For this reason, we suggest to provide a default single-imputation with 
+directly in a database.
-appropriate measures of uncertainty for key outcomes in the published 
+For this reason, as a practical way forward, we suggest providing a default 
-data base itself.
+single-imputation with appropriate measures of uncertainty for key outcomes 
-This mitigates problems with complete cases analyses and provide a more
+in the published data base itself.
+This mitigates problems with complete-case analyses and provides a
 principled and consistent default approach to handling missing values.
+Since we strongly suggest employing a model-based approach to imputation,
+the fitted class probabilities can be provided in the core database alongside the
+imputed values.
+Based on these probabilities, it is easy to draw samples for a multiple imputation
+analysis.
 Wherever necessary and practical, a custom, analysis-specific multiple 
-imputation approach might still be used and the model used to provide the 
+imputation approach might still be employed.
-single-imputed values may be used as a starting point.
+In these cases, the model providing the single-imputed values may 
+be used as a starting point.
 Several reasons disqualify LOCF as method of choice.
 Not only is it inherently biased, but LOCF is also inefficient in that it 
@@ -1053,34 +1067,29 @@ necessary in some cases to further reduce bias.
 Finally, LOCF cannot produce an adequate measure of imputation uncertainty 
 since it is not model based.
-We draw two main conclusion from our in-depth comparison of three 
+We draw two main conclusions from our comparison of three 
 alternative, model-based approaches.
 Firstly, despite its theoretical drawbacks, LOCF is hard to beat in terms of
-raw accuracy. 
+accuracy.  
 Still, small improvements are possible (cf. Section ???).
 The main advantages of a model-based approach are thus a reduction of bias,
 the ability to provide a measure of uncertainty together with the imputed 
 values
 (or to use the very same model to draw multiple imputations), 
-as well as the possibility to include further analysis-specific covariates.
+as well as the possibility of including further analysis-specific covariates.
 Secondly, we found that the inclusion of established baseline predictors for
-GOSe at 6 months post-injury had next to no effect on the imputation quality.
+GOSe at 6 months post-injury had little effect on the imputation quality.
 Note that this does not refute their predictive value but only indicates that
 there is little marginal benefit over knowing at least one other GOSe value. 
-Finally, we conclude that the differences between the considered model-bases
+Differences between the model-based approaches tend to be rather nuanced.
-approaches are rather small.
+We nevertheless favor the multi-state model (MSM).
+It is well-interpretable in terms of transition intensities.
-We nevertheless favor the multi-state model.
+An efficient implementation [@msm2011] in standard statistical software [@R2016] is 
-The multi state model is well-interpretable in terms of modeling transition 
+available.
-intensities.
+It succeeds in eliminating the bias introduced by LOCF, is
-Efficient implementations in standard statistical software (msm, R) are 
-available, it succeeds in eliminating the bias introduced by LOCF, is
 able to provide imputed values for the entire population and is able to
 provide a probabilistic output.
-I.e., besides the single predicted GOSe value, the fitted probabilities
-for each GOSe value may be stored in the database.
-Based on these values, it is easy to draw samples for a multiple imputation
-analysis.
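
Note on the multiple-imputation remark in the Discussion: the manuscript states that stored class probabilities make it easy to draw samples for a multiple imputation analysis. The following is a minimal sketch of that idea in base R, not the manuscript's implementation. The probability matrix `probs`, the helper `draw_imputations`, and the choice of the most probable category as the single-imputed default are all hypothetical and used for illustration only.

```r
# Hypothetical fitted class probabilities for three subjects (rows) over
# GOSe categories 1..8 (columns). Illustrative values, not study results.
gose_levels <- 1:8
probs <- rbind(
  c(0, 0, 0.10, 0.05, 0.40, 0.30, 0.10, 0.05),
  c(0, 0, 0.00, 0.00, 0.10, 0.20, 0.40, 0.30),
  c(0, 0, 0.60, 0.10, 0.20, 0.10, 0.00, 0.00)
)
stopifnot(all(abs(rowSums(probs) - 1) < 1e-8))  # each row is a distribution

# Draw m imputed values per subject from its categorical distribution.
# Returns a matrix with one row per subject and one column per imputation.
draw_imputations <- function(probs, m, levels = seq_len(ncol(probs))) {
  draws <- vapply(
    seq_len(nrow(probs)),
    function(i) sample(levels, size = m, replace = TRUE, prob = probs[i, ]),
    FUN.VALUE = numeric(m)
  )
  matrix(draws, nrow = nrow(probs), ncol = m, byrow = TRUE)
}

set.seed(42)  # fixed seed so the stored draws are reproducible
imputations <- draw_imputations(probs, m = 5, levels = gose_levels)

# One possible single-imputed default (an assumption made here):
# the most probable category per subject.
single_imputed <- gose_levels[apply(probs, 1, which.max)]
```

Each column of `imputations` can then serve as one completed data set in a downstream multiple-imputation analysis. Fixing the seed, or storing the draws themselves, is one way to keep results consistent across analysis teams.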