Commit 32996c4e authored by Kevin Kunzmann

...

parent d5175d55
@@ -38,30 +38,49 @@ single (@clifton2011; @silver2006; @skolnick2014) and
multiple imputation (@bulger2010; @kirkness2006; @wright2014; @robertson2014).
Last observation carried forward (LOCF) is a widely applied single-imputation
method for dealing with missing data in TBI research.
The 3-month outcome is recognized as one approach to substituting for missing
6-month data (@steyerberg2008),
and has been used in recent trials (@skolnick2014).
Although LOCF is easy to understand and implement, the technique is
suboptimal in several respects.
Firstly, it is biased in that it neglects potential trends in
GOS(e) trajectories.
Secondly, a naive application of LOCF is also inefficient since it neglects
data observed shortly after the target time window.
For example, a GOSe value recorded at 200 days post-injury is likely to be more
informative about the status at 180 days post-injury than a value observed 90
days post-injury.
Finally, the *ad-hoc* nature of the LOCF method implies that there is no
probabilistic model, and thus no measure of uncertainty about the imputed values.
This also implies that it is impossible to include additional covariates
to further reduce bias introduced by the imputation method and that LOCF
cannot be used to obtain multiply imputed data sets by design.
The timing of outcome assessments for patients with TBI
varies between studies.
Some studies define very stringent time windows (e.g. + 2 weeks [TRACK-TBI]),
but this can lead to a substantial amount of missing data (@richter2019).
Consequently, other studies have defined more pragmatic protocol windows
(e.g. -1 month to +2 months, @maas2014).
While the wider windows enable more complete data collection,
they suffer from the problem that outcomes may still be evolving over this period,
and an outcome assessment obtained at five months
(the beginning of this window) in one subject may not be strictly comparable
with outcomes obtained just before eight months (the end of the window) in
another subject.
Consequently, even where outcomes are available within pragmatic protocol
windows,
there may be a benefit in being able to impute an outcome more
precisely at the 180-day (6-month) time point.
In this manuscript, three model-based imputation strategies for GOSe at
6 months (=180 days) post-injury in the longitudinal CENTER-TBI study
(@maas2014) are compared with LOCF.
While we acknowledge that multiple imputation is in principle superior to
single imputation for propagating imputation uncertainty,
we focus on single-imputation performance primarily for practical reasons.
Unlike randomized trials, which involve a single principal analysis,
CENTER-TBI is committed to providing a curated database to facilitate
multiple subsequent analyses.
Since the primary endpoint in CENTER-TBI is functional outcome at 6 months,
a single default imputed value for as many study participants as possible is
desirable.
@@ -143,7 +162,7 @@ n_pat <- nrow(df_baseline)
```
The CENTER-TBI project methods and design are described in detail
elsewhere [@maas2014].
Participants with TBI were recruited into three strata:
(a) patients attending the emergency room,
(b) patients admitted to hospital but not intensive care,
@@ -151,6 +170,8 @@ and (c) patients admitted to intensive care.
Follow-up of participants was scheduled per protocol at 2 weeks, 3 months,
and 6 months in group (a) and at 3 months, 6 months, and 12 months in groups
(b) and (c).
The protocol time window for the 6-month GOSe was between 5 and 8 months
post-injury.
Outcome assessments at all timepoints included the GOSe
(@jennett1981disability, @mcmillan2016glasgow).
The GOSe is an eight-point scale with the following categories:
@@ -159,15 +180,17 @@ The GOSe is an eight-point scale with the following categories:
(6) upper moderate disability, (7) lower good recovery,
(8) upper good recovery.
The study population for this empirical methods comparison comprises all
individuals from the CENTER-TBI database (total of n = 4509) whose GOSe
status was queried at least once within the first 18 months and who were
still alive 180 days post-injury (n = `r n_pat`).
The rationale for conducting the comparison conditional on 6-month survival
is simply that the GOSe can only be missing at 6 months if the individual
is still alive, since the GOSe would otherwise be (1), i.e. dead.
All data were accessed from the CENTER-TBI Neurobot database
(release 1.1, cf. Appendix for details).
Basic summary statistics for population characteristics are listed in
Table 1.
```{r baseline-table-continuous, echo=FALSE, results='asis'}
summarizer <- function(x) {
@@ -230,15 +253,16 @@ n_gose_pp <- df_gose %>%
n_groups()
```
We decided to use only those GOSe observations obtained between injury
and 18 months post injury
since extremely late follow-ups were considered to be irrelevant to the
index follow-up time point of 6 months post injury.
This led to a total of `r nrow(df_gose)` GOSe observations of the study
population being available for the analyses in this manuscript.
For `r n_gose_180` (`r round(n_gose_180 / n_pat * 100, 1)`%) individuals,
GOSe observations at 180 +/- 14 days post injury were available and
`r n_gose_pp` (`r round(n_gose_pp / n_pat * 100, 1)`%) individuals had
GOSe observations within the per-protocol window of 5-8 months post injury.
The distribution of GOSe sampling times and both absolute and
relative frequencies of the respective GOSe categories are shown in
Figure (???).
@@ -324,28 +348,60 @@ gridExtra::grid.arrange(
# Imputation methods
We compared last observation carried forward (LOCF) to a mixed effect
model (MM),
a Gaussian process regression (GP), and a multi-state model (MSM).
For all model-based approaches we additionally explored variants
including the key IMPACT [@steyerberg2008] predictors as covariates.
These are [????].
## Last-observation-carried-forward
Since LOCF is widely used to impute missing outcomes in TBI studies,
it served as the baseline method.
Here, LOCF was defined as the last GOSe observation before the
imputation time point of 180 days post-injury.
LOCF is not a model-based method and, by definition,
only permits the imputation of a GOSe value for subjects where at least one
value is available within the first 180 days post injury.
We accounted for this lack of complete coverage under LOCF by performing all
performance comparisons including LOCF only on the subset of individuals
for which a LOCF-imputed value could be obtained.
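As an illustration, this rule can be sketched in a few lines of dplyr code;
the name of the GOSe value column (`GOSE`) is an assumption made here for
illustration only.

```{r locf-sketch, eval=FALSE}
# Minimal sketch of the LOCF rule (illustration only): for each patient,
# take the last GOSe observed no later than 180 days post-injury.
# df_gose is assumed to hold one row per assessment with columns gupi
# (patient ID), GOSE (observed value, assumed name), and
# Outcomes.DerivedCompositeGOSEDaysPostInjury (days post-injury).
library(dplyr)

df_locf <- df_gose %>%
  filter(Outcomes.DerivedCompositeGOSEDaysPostInjury <= 180) %>%
  group_by(gupi) %>%
  summarize(
    GOSE_locf = GOSE[which.max(Outcomes.DerivedCompositeGOSEDaysPostInjury)]
  )
```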
## Model-based methods
Model-based imputation approaches offer richer output (probabilistic imputation,
multiple imputation) and may reduce the LOCF-inherent bias.
We compared LOCF with three model-based approaches.
Mixed effects models (MM) are a widely used approach in
longitudinal data analysis and model individual deviations from a population
mean trajectory (@verbeke2009linear).
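For illustration, a model of this kind could be specified roughly as follows,
e.g. using the brms package; this is a sketch under assumed column names
(`GOSE`, `t_days`) and is not necessarily the exact specification used here,
which is documented in the Appendix.

```{r mm-sketch, eval=FALSE}
# Sketch only: ordinal (cumulative logit) mixed effects model with a smooth
# population-level time trend and patient-level random intercepts.
# Column names GOSE and t_days are assumptions for illustration.
library(brms)

fit_mm <- brm(
  GOSE ~ s(t_days) + (1 | gupi),
  family = cumulative(link = "logit"),
  data   = df_gose,
  chains = 2, iter = 2000
)

# Fitted category probabilities at 180 days post-injury; the imputed GOSe
# can then be taken as, e.g., the most probable category.
newdata <- data.frame(gupi = unique(df_gose$gupi), t_days = 180)
p_180   <- predict(fit_mm, newdata = newdata)
```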
An alternative non-linear regression model for longitudinal data is
Gaussian process regression (GP), which allows flexible modelling of
both the individual GOSe trajectories and the population mean
in a Bayesian non-parametric way [@rasmussen2006].
Both the mixed effects model and the Gaussian process regression
model are non-linear regression techniques for longitudinal
data.
While they are both powerful tools to model longitudinal trajectories,
they do not explicitly model the probability of transitions between
GOSe states.
Since the number of observations per individual is very limited in our
data set (1 to 4 GOSe observations per individual),
an approach explicitly modelling transition probabilities might be
more suitable to capture the dynamics of the GOSe trajectories.
To explore this further, a Markov multi-state model (MSM)
was considered (@meira2009).
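A rough sketch of such a model, e.g. using the msm package with transitions
restricted to neighbouring GOSe states and death (GOSe 1) as an absorbing
state, is given below; the transition structure and the column names
`t_days` and `age` are assumptions for illustration, and the actual
specification is described in the Appendix.

```{r msm-sketch, eval=FALSE}
# Sketch of a Markov multi-state model for the GOSe (illustration only).
library(msm)

# initial transition intensity matrix: non-zero entries mark allowed moves;
# state 1 (death) has no outgoing transitions and is therefore absorbing
Q <- matrix(0, nrow = 8, ncol = 8)
for (i in 2:8) {
  Q[i, i - 1] <- 0.1              # deterioration by one state
  if (i < 8) Q[i, i + 1] <- 0.1   # improvement by one state
}

fit_msm <- msm(
  GOSE ~ t_days,
  subject    = gupi,
  data       = df_gose,           # must be sorted by subject and time
  qmatrix    = Q,
  covariates = ~ age              # the MSM variant only used age
)

# transition probability matrix over 180 days
P_180 <- pmatrix.msm(fit_msm, t = 180)
```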
All models were fitted using either none or all of the IMPACT predictors,
except for the MSM, which only used age due to issues with numerical stability.
Further details on the respective implementations are given in the Appendix.
@@ -435,48 +491,54 @@ df_predictions <- df_model_posteriors %>%
)
```
Model performance was assessed via three-fold cross validation on the subset
of individuals with a valid GOSe value within 180 +/- 14 days post-injury
(n = `r nrow(df_ground_truth)`).
All models were fit on the entire available data after removing the
180 +/- 14 days post-injury observation from the respective test fold,
thus mimicking a missing-completely-at-random missing data mechanism.
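For illustration, this scheme can be sketched as follows (assuming that
`df_ground_truth` and `df_gose` are keyed by the patient identifier `gupi`;
the actual fold-assignment code may differ).

```{r cv-scheme-sketch, eval=FALSE}
# Sketch of the cross-validation scheme (illustration only): patients with an
# observed GOSe at 180 +/- 14 days are split into three folds; for a given
# test fold, the models are refit on all GOSe observations except the target
# (180 +/- 14 day) observations of that fold's patients.
library(dplyr)

set.seed(42)
folds <- df_ground_truth %>%
  distinct(gupi) %>%
  mutate(fold = sample(rep(1:3, length.out = n())))

test_fold <- 1
df_train <- df_gose %>%
  left_join(folds, by = "gupi") %>%
  filter(
    is.na(fold) |                        # no target observation, always train
    fold != test_fold |                  # patient belongs to a training fold
    abs(Outcomes.DerivedCompositeGOSEDaysPostInjury - 180) > 14
  )
```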
The distribution of GOSe values in the respective three test sets is well
balanced (cf. Appendix, Figure ???).
Performance was assessed using the absolute-count and the normalized
(proportions) confusion matrices as well as
bias, mean absolute error (MAE), and root mean squared error (RMSE).
All confusion matrices are reported as averages over the three-fold cross
validation test sets.
The normalized confusion matrices are scaled within each stratum of observed
GOSe value and are thus estimates of the confusion probability conditional on
the observed GOSe.
Bias indicates whether, on average, predicted values are systematically
lower (negative) or higher (positive) than observed values.
MAE and RMSE are both measures of average precision, where
RMSE puts more weight on large deviations as compared to MAE.
Comparisons in terms of bias, MAE, and RMSE tacitly assume that
GOSe values can be sensibly interpreted on an interval scale.
We therefore also considered the directional bias (bias'),
the difference between the model-fitted
probability of exceeding the true value and the model-fitted probability of
undershooting the true GOSe ($Pr[imp. > true] - Pr[imp. < true]$), as an
alternative measure of bias which does not require this assumption.
Note that the scale of the directional bias is not directly comparable to
that of the other three quantities.
All measures are considered both conditional on the ground truth
(the held-out observed GOSe) and averaged over the entire test set.
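For concreteness, these quantities can be computed from the cross-validated
predictions roughly as sketched below; the exact layout of `df_predictions`
(in particular, per-category probability columns named 1 to 8) is an
assumption made for illustration.

```{r metrics-sketch, eval=FALSE}
# Sketch of the evaluation metrics (illustration only); df_predictions is
# assumed to hold one row per test case with the observed GOSE, the point
# prediction, and the model-fitted category probabilities in columns 1-8.
library(dplyr)
library(tidyr)

# bias, MAE, and RMSE treat the GOSe as an interval scale
df_predictions %>%
  group_by(model) %>%
  summarize(
    bias = mean(prediction - GOSE),
    MAE  = mean(abs(prediction - GOSE)),
    RMSE = sqrt(mean((prediction - GOSE)^2))
  )

# directional bias: Pr[imputed > observed] - Pr[imputed < observed],
# averaged over test cases, based on the fitted category probabilities
df_predictions %>%
  gather(category, p, `1`:`8`) %>%
  mutate(category = as.integer(category)) %>%
  group_by(model, gupi) %>%
  summarize(bias_dir = sum(p[category > GOSE]) - sum(p[category < GOSE])) %>%
  group_by(model) %>%
  summarize(bias_dir = mean(bias_dir))
```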
LOCF, by design, cannot provide imputed values when there are no
observations before 180 days post injury.
A valid comparison of LOCF with the other methods must therefore be
based on the set of individuals for whom an LOCF imputation is possible.
```{r non-locf-ids, include=FALSE}
idx <- df_predictions %>%
filter(model == "LOCF", !complete.cases(.)) %>%
.[["gupi"]]
```
Overall, `r length(idx)` out of
`r df_predictions %>% filter(model == "LOCF") %>% nrow` test cases
(`r round(100 * length(idx) / (df_predictions %>% filter(model == "LOCF") %>% nrow), 1)`%) could not be imputed with the LOCF approach.
In the entire study population, `r df_gose %>% group_by(gupi) %>% summarize(LOCF = any(Outcomes.DerivedCompositeGOSEDaysPostInjury <= 180)) %>% ungroup %>% summarize(n_LOCF = sum(!LOCF)) %>% .[["n_LOCF"]]`
individuals (`r round(100 * (df_gose %>% group_by(gupi) %>% summarize(LOCF = any(Outcomes.DerivedCompositeGOSEDaysPostInjury <= 180)) %>% ungroup %>% summarize(n_LOCF = sum(!LOCF)) %>% .[["n_LOCF"]]) / (df_gose$gupi %>% unique %>% length), 1)`%) did not have data that would permit an LOCF imputation.
The subset used for comparison of the imputation approaches with the LOCF
approach was similar to the overall dataset (cf. Appendix, Table ???).
@@ -555,8 +617,8 @@ on average it imputes lower-than-observed GOSe values.
This reflects a population average trend towards continued
recovery within the first 6 months post injury.
The fact that both ways of quantifying bias qualitatively agree
indicates that the interpretation of GOSe as an interval measure, which
tacitly underlies the bias, MAE, and RMSE comparisons, is not too restrictive.
In terms of accuracy, LOCF performs worst, but differences between
methods are less pronounced than in terms of bias.
Notably, the RMSE difference between LOCF and the other methods is slightly
@@ -593,19 +655,21 @@ Both the raw count as well as the relative (by left-out observed GOSe) confusion
are presented in Figure ???.
```{r confusion-matrix-locf, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices on LOCF subset.", fig.height=6, fig.width=6}
plot_confusion_matrices <- function(df_predictions, models, nrow = 2, legendpos, scriptsize) {
df_average_confusion_matrices <- df_predictions %>%
select(-`1`, -`2`) %>%
filter(model %in% models) %>%
group_by(fold, model) %>%
do(
confusion_matrix = caret::confusionMatrix(
data = factor(.$prediction, levels = 3:8),
reference = factor(.$GOSE, levels = 3:8)
) %>%
as.matrix %>% as_tibble %>%
mutate(`Predicted GOSE` = {row_number() + 2} %>% as.character) %>%
gather(`Observed GOSE`, n, 1:6)
) %>%
unnest %>%
group_by(model, `Predicted GOSE`, `Observed GOSE`) %>%
@@ -635,7 +699,7 @@ plot_confusion_matrices <- function(df_predictions, models, nrow = 2, legendpos,
legend.position = "none"
) +
facet_wrap(~model, nrow = nrow) +
ggtitle("Average confusion matrix accross folds (absolute counts)")
ggtitle("Average confusion matrix across folds (absolute counts)")
p_cnf_mtrx_colnrm <- df_average_confusion_matrices %>%
group_by(model, `Observed GOSE`) %>%
@@ -656,10 +720,9 @@ plot_confusion_matrices <- function(df_predictions, models, nrow = 2, legendpos,
legend.position = legendpos
) +
facet_wrap(~model, nrow = nrow) +
ggtitle("Average confusion matrix accross folds (column fraction)")
ggtitle("Average confusion matrix across folds (column fraction)")
cowplot::plot_grid(p_cnf_mtrx_raw, p_cnf_mtrx_colnrm, ncol = 1, align = "h")
}
plot_confusion_matrices(
@@ -668,11 +731,11 @@ plot_confusion_matrices(
c("MSM", "GP + cov", "MM", "LOCF"),
nrow = 1,
legendpos = "none",
scriptsize = 2.5
)
ggsave(filename = "confusion_matrices_locf.pdf", width = 6, height = 9)
ggsave(filename = "confusion_matrices_locf.png", width = 6, height = 9)
ggsave(filename = "confusion_matrices_locf.pdf", width = 6, height = 6)
ggsave(filename = "confusion_matrices_locf.png", width = 6, height = 6)
```
The absolute-count confusion matrices show that most imputed values are
@@ -793,8 +856,9 @@ In the following, LOCF is not considered since a meaningful comparison
including LOCF is not possible on the entire set of test candidates,
because LOCF is not applicable in cases
where only GOSe values after 180 days post-injury are available.
The qualitative performance of the three imputation approaches in the complete
dataset was similar to their performance in the subset of data used for
comparison with LOCF.
@@ -804,7 +868,7 @@ plot_confusion_matrices(
c("MSM", "GP + cov", "MM"),
nrow = 1,
legendpos = "none",
scriptsize = 3
)
ggsave(filename = "confusion_matrices_all.pdf", width = 6, height = 6)
@@ -833,11 +897,7 @@ ggsave(filename = "imputation_error.png", width = 6, height = 3.5)
Handling missing data *post-hoc* to prevent biased analyses often requires
great effort.
It is thus of the utmost importance to implement measures for avoiding missing
data in the first place.
Nevertheless, in practice, missing values due to loss-to-follow-up will always
occur and should be addressed effectively.
There is a wide consensus that statistically sound imputation of missing values
@@ -937,7 +997,7 @@ ggsave(filename = "gose_marignal_per_fold.png", width = 6, height = 3)
## Comparison of LOCF and non-LOCF subgroups
```{r baseline-table-continuous2, echo=FALSE, results='asis'}
summarizer <- function(x) {
......
@@ -104,15 +104,15 @@
publisher={American Medical Association}
}
@article{maas2014,
title={Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury (CENTER-TBI): A Prospective Longitudinal Observational Study},
author={Maas, Andrew IR and Menon, David K and Steyerberg, Ewout W and Citerio, Giuseppe and Lecky, Fiona and Manley, Geoffrey T and Hill, Sean and Legrand, Valerie and Sorgner, Annina},
journal={Neurosurgery},
volume={76},
number={1},
pages={67--80},
year={2015},
publisher={Lippincott Williams and Wilkins}
}
@article{kurtzer2017singularity,
@@ -149,16 +149,6 @@
publisher={BMJ Publishing Group Ltd}
}
@book{verbeke2009linear,
title={Linear mixed models for longitudinal data},
@@ -248,7 +238,7 @@ year =
@article{steyerberg2008,
title={Predicting outcome after traumatic brain injury: development and international validation of prognostic scores based on admission characteristics},
author={Steyerberg, Ewout W and Mushkudiani, Nino and Perel, Pablo and Butcher, Isabella and Lu, Juan and McHugh, Gillian S and Murray, Gordon D and Marmarou, Anthony and Roberts, Ian and Habbema, J Dik F and others},
journal={PLoS medicine},
volume={5},
number={8},
@@ -288,17 +278,3 @@ title = {{brms: An R Package for Bayesian Multilevel Models Using Stan}},
volume = {80},
year = {2017}
}