added manuscript draft

bd57f29f · Kevin · 78ff72b7 · bd57f29f · bd57f29f · bd57f29f
Commit bd57f29f authored Mar 15, 2019 by Kevin
4 changed files
--- a/docker/dockerfile
+++ b/docker/dockerfile
@@ -16,11 +16,9 @@ RUN R -e "install.packages('ggalluvial')"
 RUN R -e "install.packages('caret')"
 RUN R -e "install.packages('msm')"
 RUN R -e "install.packages('cowplot')"
+RUN R -e "install.packages('pander')"
+RUN R -e "install.packages('DiagrammeR')"
 RUN R -e "install.packages('opal', repos=c('https://cran.rstudio.com/', 'https://cran.obiba.org'), dependencies=TRUE)"
 RUN R -e "devtools::install_github('kkmann/centertbi')"
-RUN R -e "devtools::install_github('kkmann/reportr')" 
+RUN R -e "devtools::install_github('kkmann/reportr')"
 RUN R -e "devtools::install_github('kkmann/describr')"
-
-# install missing tex packages for fancy report
-RUN tlmgr update --self
-RUN tlmgr install koma-script psnfss enumitem xcolor lastpage float placeins beamer translator
--- a/manuscript/manuscript.Rmd
+++ b/manuscript/manuscript.Rmd
+---
+title: "Model-based longitudinal imputation of cross-sectional GOSe in CENTER-TBI"
+output: 
+  word_document: default
+  pdf_document: default
+  html_document:
+    keep_md: yes
+bibliography: "references.bib"
+params:
+  data_dir:    "../output/v1.1/data"
+  config_file: "../config.yml"
+---
+
+```{r setup-chunk, include=FALSE}
+library(tidyverse)
+knitr::opts_chunk$set(dpi = 300, fig.width = 8, fig.height = 5)
+config <- yaml::read_yaml(params$config_file)
+```
+
+*Abstract:* TODO
+
+# Introduction
+
+Assessments of global functional outcome such as the Glasgow Outcome Scale (GOS)
+and the Glasgow Outcome Scale extended (GOSe) capture 
+meaningful differences across the full spectrum of recovery,
+and have popularity as endpoints in TBI studies (Horton et al 2018).
+However, missing outcome data is a common problem in TBI research, 
+and for longitudinal studies completion rates at six months can be 
+lower than 70% (Richter et al).
+Imputation of patient outcomes is gradually gaining acceptance in the TBI
+field as a method of dealing with missing data in studies.
+This is important since it is well known that complete-case analyses may
+introduce bias or reduce power [REFERENCE].
+Recent longitudinal studies have successfully employed techniques for both
+single (Clifton et al., 2011; Silver et al 2006; Skolnick et al 2014) and
+multiple imputation (Bulger et al, 2010, Kirkness et al 2006;
+Wright et al, 2014; Robertson et al, 2014).
+The advantage of multiple imputation over single imputation clearly lies in the
+fact that the uncertainty about imputed values can be properly accounted for 
+in a subsequent statistical analysis.
+Despite these considerations, last observation carried forward (LOCF) is still
+a widely applied method for dealing with missing data in TBI research,
+e.g. simply substituting the respective 3-months outcome for a missing 6-months data point [REFERENCE]. 
+Although LOCF is simple to understand and implement, the technique clearly 
+lacks in several respects.
+Firstly, it is biased in that it neglects potential trends in the GOS(e) trajectories.
+Secondly, a naive application of LOCF is also inefficient since it neglects
+data observed briefly after the target time window.
+E.g., a GOS(e) value recorded at 200 days post-injury is likely to be more
+informative about the status at 180 days post-injury than a value observed 90 
+days post-injury.
+Finally, the *ad-hoc* nature of the LOCF method implies that there is no 
+probabilistic model an thus no measure of uncertainty about the imputed values.
+This also implies that it is impossible to include additional covariates
+to further reduce bias introduced by the imputation method and that LOCF
+cannot be used to obtain multiply imputed data sets by design.
+
+In this manuscript, we three model-based imputation strategies for longitudinal
+GOSe data at 6 months (180 days) post-injury in TBI studies by example of the 
+CENTER-TBI study and compare the results with standard LOCF.
+While we acknowledge the principle superiority of multiple imputation over
+single imputation to propagating imputation uncertainty,
+we focus on single-imputation performance for primarily practical reasons:
+CENTER-TBI is committed to providing a curated data base to facilitate subsequent
+analyses.
+Since the primary endpoint is functional outcome at 6 months, an default imputed
+value for as many study participants as possible is desirable. 
+To this end, a single imputed value is to be included in the published database
+together with a measure of uncertainty about the imputed value.
+Subsequent analyses may then either resort to using the provided imputed values
+or - based on the findings presented in this manuscript - fit a custom
+imputation model to adjust for different covariates or obtain multiply
+imputed values.
+
+We propose three different model-based approaches - a mixed-effects model, a Gaussian process regression, and a multi-state model - for imputing GOSe 
+longitudinally each of which we fit in a version including baseline covariates
+and without.
+
+
+
+# Study population
+
+The study population for our empirical methods comparison are all individuals
+from the CENTER-TBI database (n = 4509) whose GOSe status was queried at least
+once and who were still alive 180 days post-injury.
+The rationale for conducting the comparison conditional on 6-months survival
+is simply that the GOSe can only be missing at 6-months if the individuals
+are still alive since GOSe would be 1 (dead) otherwise.
+
+```{r read-population-data, include=FALSE}
+df_baseline <- read_rds(paste0(params$data_dir, "/df_baseline.rds"))
+```
+
+Overall, `r nrow(df_baseline)` individuals are eligible for a model-based 
+imputation.
+
+```{r baseline-table-continuous, echo=FALSE, results='asis'}
+summarizer <- function(x) {
+  return(tibble(
+    statistic = c("# NA", "min", "mean", "median", "max", "std") %>% 
+      factor(., levels = ., ordered = TRUE),
+    value     = list(sum(is.na(x)), mean(x, na.rm = TRUE), median(x, na.rm = TRUE),
+                 min(x, na.rm = TRUE), max(x, na.rm = TRUE), sd(x, na.rm = TRUE))
+  ))
+}
+
+df_baseline %>%
+  select_if(is.numeric) %>% 
+  gather(variable, value) %>% 
+  group_by(variable) %>% 
+  do(
+    smry = summarizer(.$value)
+  ) %>%
+  unnest() %>%
+  spread(statistic, value) %>% 
+  unnest() %>% 
+  pander::pandoc.table("Discrete baseline variables", digits = 3)
+```
+
+```{r baseline-table-discrete, echo=FALSE, results='asis'}
+summarizer <- function(x) {
+  return(tibble(
+    statistic = c("# NA", "table") %>% 
+      factor(., levels = ., ordered = TRUE),
+    value     = list(
+      sum(is.na(x)), 
+      {
+        rbind(table(x), prop.table(table(x))) %>% 
+          t %>% 
+          {tmp <- .; colnames(tmp) <- c("n", "%"); tmp} %>%  
+          as_tibble(rownames = "level") %>% 
+          mutate(`%` = 100 * `%`)
+      }
+    )
+  ))
+}
+
+df_baseline %>%
+  select_if(~!is.numeric(.)) %>% 
+  select(-gupi) %>% 
+  mutate_all(as.character) %>% 
+  gather(variable, value) %>% 
+  group_by(variable) %>% 
+  do(
+    smry = summarizer(.$value)
+  ) %>%
+  unnest() %>%
+  spread(statistic, value) %>% 
+  unnest(`# NA`) %>%
+  unnest(table) %>% 
+  pander::pandoc.table("Continuous baseline variables", digits = 3)
+```
+
+```{r read-gose-data, include=FALSE}
+df_gose <- read_rds(paste0(params$data_dir, "/df_gose.rds"))
+```
+
+A total of `r nrow(df_gose)` GOSe observations are available for the study 
+population.
+The marginal distribution of their sampling times as well as absolute and
+relative frequencies of the respective GOSe categories are depicted in 
+Figure (???).
+Note that the small fraction of GOSe 1 (dead) observations at 6 months is due
+to the fact that the true observation times where mapped to the respective
+categories by rounding to the respective closest.
+I.e., the '6 months' category actually contains observations of up to 9 months 
+post-injury and the figures are thus not contradicting the study population
+inclusion criteria.
+
+```{r gose-plots, echo=FALSE, fig.cap="GOSe time point distribution and marginal distribution at per-protocol time points.", fig.height=5, fig.width=7}
+p_gose_times <- df_gose %>% 
+  ggplot(aes(Outcomes.DerivedCompositeGOSEDaysPostInjury / 30)) +
+    geom_histogram(binwidth = 1) +
+    theme_bw() +
+    scale_x_continuous("Months post-injury",
+      breaks = seq(0, 50)
+    ) + 
+    scale_y_continuous("", breaks = seq(0, 2000, 200)) +
+    theme(
+      panel.grid.minor = element_blank(),
+      panel.grid.major.x = element_blank()       
+    )
+
+timepoint <- function(x) {
+  abs(
+    matrix(rep(x, 4), ncol = 4)
+    - matrix(rep(c(14, 90, 180, 360), length(x)), nrow = length(x), byrow = TRUE)
+  ) %>% 
+  apply(., 1, which.min) %>% 
+  factor(
+    levels = 1:4, 
+    labels = c("2 weeks", "3 months", "6 months", "12 months")
+  )
+}
+
+tmp <- df_gose %>% 
+  transmute(
+    gupi,
+    t    = timepoint(Outcomes.DerivedCompositeGOSEDaysPostInjury),
+    GOSe = Outcomes.DerivedCompositeGOSE
+  )
+
+p1 <- tmp %>% 
+  ggplot(aes(GOSe)) +
+    geom_bar(aes(fill = GOSe)) +
+    scale_fill_viridis_d(guide = guide_legend(nrow = 1)) +
+    facet_wrap(~t, nrow = 1) + 
+    ylab("absolute count") +
+    theme_bw() +
+    theme(
+      panel.grid.major.x = element_blank(),
+      legend.position = "bottom"
+    ) 
+
+p2 <- tmp %>% 
+  group_by(t, GOSe) %>% 
+  summarize(n = n()) %>% 
+  group_by(t) %>% 
+  mutate(freq = n / sum(n)) %>% 
+  ggplot(aes(x = GOSe, y = freq)) +
+    geom_bar(aes(fill = GOSe), stat = "identity") +
+    scale_fill_viridis_d() + 
+    facet_wrap(~t, nrow = 1) + 
+    ylab("relative frequency") +
+    theme_bw() +
+    theme(
+      panel.grid.major.x  = element_blank(),
+      legend.position = "none"
+    )
+
+g_legend <- function(a.gplot){
+  tmp <- ggplot_gtable(ggplot_build(a.gplot))
+  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
+  legend <- tmp$grobs[[leg]]
+  return(legend)
+}
+
+gridExtra::grid.arrange(
+  p_gose_times,
+  gridExtra::arrangeGrob(
+    p1 + theme(legend.position = "none"),
+    p2 + theme(legend.position = "none"),
+    nrow = 1
+  ),
+  g_legend(p1), 
+  nrow    = 3,
+  heights = c(10, 10, 1)
+)
+```
+
+
+
+
+
+# Methods
+
+We compare the current default method of last observation carried forward 
+(LOCF) to 
+a mixed effect model (MM), a Gaussian process regression (GP)
+as well as a multi-state model (MSM).
+For all model-based approaches we additionally explore variants including
+the key IMPACT [REFERENCE] predictors as covariates.
+For the MSM approach the inclusion of all predictors proved to be numerically 
+unstable due to the resulting model complexity and we only considered 
+age at injury.
+
+
+## Last-observation-carried-forward
+
+Since LOCF is still widely used to impute missing outcomes in TBI studies, 
+it serves as baseline methods for the comparison at hand.
+Here, LOCF is defined as the last GOSe observation before the
+imputation time point of exactly 180 days post-injury.
+Note that LOCF is not model-based and may not permit the imputation
+of a GOSe value for all subjects in the study population.
+A value cannot be imputed, if only valid GOSe observations after 180 days 
+post-injury are available for a subject but none before.
+We account for this lack of complete coverage under LOCF by performing all
+performance comparisons including LOCF on the subset of individuals
+for which a LOCF-imputed value can be obtained (cf. Section ???). 
+
+
+## Mixed-effects model
+
+Mixed effects models are a standard approach to longitudinal data analysis  
+which model individual deviations from a population mean trajectory 
+(REFERENCE).
+To account for the fact that the GOSe outcome is an ordered factor,
+we employ a cumulative link function model with flexible intercepts @Agresti2003.
+The population mean is modeled as spline function with knots at ???
+to allow a non-linear population mean trajectory.
+Patient-individual deviations from this population mean are modeled 
+as quadratic polynomials to allow sufficient flexibility (random effects).
+Baseline covariates are added as linear fixed effects to to the 
+population mean.
+The model was fitted using Bayesian statistics via the BRMS 
+package @brms2017, @brms2018 for the R environment for statistical 
+computing @r2016 and the Stan modelling language @stan2017. 
+Non-informative priors were used for the model parameters.
+
+
+## Gaussian process model
+
+A potential drawback of the proposed longitudinal mixed effects model 
+is the fact that the individual deviations from the population mean 
+are modeled globally using polynomials.
+Since linear and quadratic terms are not identifiable for patients with 
+only one observed GOSE value, this implies large uncertainty over the patient-specific effects for individuals with one or two observations 
+only by falling back on the non-informative priors on these model 
+parameters. 
+Thus, the overall uncertainty associated with model-based imputations 
+at exactly 180 days may become relatively large.
+
+Gaussian process regression allows to flexibly model 
+both the individual GOSE trajectories as well as the population mean
+in a Bayesian non-parametric way @rasmussen2006.
+This non-parametric paradigm leads to a very flexible model and
+low model-uncertainty in the vicinity of actually observed GOSe outcomes.
+To account for the discreteness of the GOSe outcome, 
+the continuous output of the Gaussian process model is rounded to the
+nearest integer in  1 to 8 (GOSe categories).
+The commonly used squared exponential covariance function with shared 
+length scale for all individuals is used to model intra-individual
+dependency of GOSe outcomes.
+The population mean trajectory of the Gaussian process is modeled as
+mean of an independent Gaussian process with pseudo observations at 
+`r config$gp$t_knots %>% as.numeric()` days post-injury (also using a squared 
+exponential covariance function).
+This approach maintains flexibility of the population mean function 
+similar to the spline-based approach for the mixed effects model
+while avoiding the the computational complexity of a fully
+hierarchical Gaussian process model.
+Again, the impact of baseline covariates is modeled via linear effects
+on the population mean of the Gaussian process.
+All parameters are estimated in a fully Bayesian fashion using the Stan 
+modelling language @stan2017 and non-informative priors except for the 
+length scale of the squared exponential kernel.
+Due to the sparseness of the data, the estimated length scale will naturally 
+tend towards extremely large values implying unrealistically 
+long-range dependency between observations.
+We therefore limit the length scale to a maximum of 120 days (4 months) and 
+impose a Gaussian prior $\mathcal{N}(60, 14)$.
+
+
+## Multi-state model
+
+Both the mixed effects model as well as the Gaussian process regression 
+model are essentially non-linear regression techniques for longitudinal 
+data.
+While they are both powerful tools to model longitudinal trajectories,
+they do not explicitly model the probability of transitions between 
+GOSe states.
+Since the number of observations per individual is very limited in our
+data set (1 to 4 GOSe observations per individual),
+an approach explicitly modelling transition probabilities might be 
+more suitable to capture the dynamics of the GOSe trajectories.
+To explore this further, a Markov multi-state model is considered [REFERENCE].
+This model class assumes that the transitions between adjacent GOSe 
+states can be modeled as a Markov process and the transition 
+intensities between adjacent states are fitted to the observed data.
+
+```{r diagram, echo=FALSE, fig.cap="Structural diagram of allowed transitions between GOSe states for the proposed multi-state model.", fig.width=3.5, fig.height=2}
+DiagrammeR::grViz("
+digraph boxes_and_circles {
+
+  # a 'graph' statement
+  graph [layout = dot, rankdir = LR, overlap = false, fontsize = 10]
+
+  # several 'node' statements
+  node [shape = circle,
+        fixedsize = true,
+        width = 0.9] // sets as circles
+  1; 2; 3; 4; 5; 6; 7; 8
+
+  # several 'edge' statements
+  2->1 2->3
+  3->1 3->2 3->4
+  4->1 4->3 4->5
+  5->1 5->4 5->6
+  6->1 6->5 6->7
+  7->1 7->6 7->8
+  8->1 8->7
+}
+")
+```
+
+To account for the fact that state-transitions might be more frequent 
+in the early post-injury phase, 
+piece wise constant transition intensities were fitted to the intervals 
+[0, 90), [90, 270), and 270+ days post-injury.
+The model was fit using the msm @msm2011 package for the R environment
+for statistical computing @R2016.
+Due to the large number of 19 transition intensities in the proposed model
+(cf. arrows in Figure ???, structure of transition graph),
+inclusion of all baseline covariates turned out to be numerically unstable.
+For the MSM model, instead of including all covariates, 
+only a model adjusting for age at injury via a proportional hazard approach  
+was fit.
+
+
+## Performance assessment
+
+```{r load-test-data, include=FALSE}
+df_ground_truth <- c()
+
+for (i in 1:config$folds) {
+  
+  df_ground_truth <- rbind(
+    df_ground_truth,
+    # note that GOSE values in test data are not subject to mi - loading the 
+    # first data set is sufficient!
+    sprintf("%s/validation/df_test_mi_1_fold_%i.rds", params$data_dir, i) %>%
+      readRDS() %>% 
+      mutate(fold = i) %>% 
+      select(gupi, fold, Outcomes.DerivedCompositeGOSEDaysPostInjury, Outcomes.DerivedCompositeGOSE)
+  )
+  
+}
+```
+
+Model performance was assessed via three fold cross validation on the subset 
+of individuals with a valid GOSe value within 180 +/- 14 days post-injury 
+(n = `r nrow(df_ground_truth)`.
+All models were fit on the entire available data after removing the 
+180 +/- 14 days post-injury observation from the respective test fold
+thus mimicking a missing completely at random missing data mechanism.
+The distribution of GOSe values in the respective three test sets is well 
+balanced, cf. Figure ??? (Appendix).
+
+```{r misc, include=FALSE}
+modelnames <- lapply(
+    list.dirs(
+      sprintf("%s/validation/posteriors", params$data_dir), 
+      recursive = FALSE
+    ),
+    basename
+  ) %>% 
+  as.character()
+
+df_model_posteriors <- c()
+
+for (modelname in modelnames) {
+  for (i in 1:config$mi_m) {
+    for (j in 1:config$folds) {
+      tryCatch(
+        df_model_posteriors <- rbind(
+          df_model_posteriors,
+          sprintf("%s/validation/posteriors/%s/df_posterior_mi_%i_fold_%i.rds", 
+                  params$data_dir, modelname, i, j) %>%
+            readRDS() %>% 
+            mutate(imp = i, fold = j, model = modelname) %>% 
+            select(gupi, fold, imp, model, everything()) %>% 
+            filter(
+              gupi %in% (df_ground_truth %>% filter(fold == j) %>% .[["gupi"]])
+            )
+        ), 
+        error = function(e) print(e)
+      )
+    }
+  }
+}
+
+df_model_posteriors <- df_model_posteriors %>% 
+  mutate(
+    model = factor(model,
+      levels = c("locf", "mm_nb", "mm", "gp_nb", "gp", "msm", "msm_age"),
+      labels = c("LOCF", "MM", "MM + cov", "GP", "GP + cov", "MSM", "MSM + age"),
+    ) %>% as.character
+  )
+
+df_predictions <- df_model_posteriors %>%
+  group_by(fold, model, gupi, t, GOSE) %>%
+  summarise(probability = mean(probability)) %>%
+  arrange(model, fold, gupi, t, GOSE) %>% 
+  ungroup() %>% 
+  filter(t == 180) %>% 
+  select(-t) %>% 
+  group_by(fold, model, gupi) %>% 
+  do(
+    GOSE = .$GOSE,
+    prediction = rep(which.max(.$probability) %>% 
+      ifelse(length(.) == 0, NA_integer_, .) %>% 
+      ifelse(length(.) > 1, round(mean(.)), .), 8),
+    probability = .$probability
+  ) %>% 
+  unnest %>% 
+  spread(GOSE, probability) %>% 
+  right_join(
+    df_ground_truth %>% 
+      mutate(
+        gupi = as.character(gupi),
+        GOSE = as.integer(Outcomes.DerivedCompositeGOSE)
+    ), 
+    by = c("gupi", "fold")
+  )
+```
+
+Performance is assessed using the absolute-count and the normalized 
+(proportions) confusion matrices as well as
+bias, mean absolute error (MAE) and root mean squared error (RMSE).
+Bias, MAE, and RMSE are considered both conditional on the ground-truth
+(unobserved true GOSe) as well as averaged over the entire test set.
+Note that comparisons in terms of bias, MAE, and RMSE tacitly assume that 
+GOSe values can be sensibly interpreted on an interval scale.
+
+LOCF, by design, may not be able to provide imputed values whenever 
+no observation before 180 days post-injury is observed.
+Since there is reason to assume that these individuals 6-months GOSE will be 
+harder to predict, 
+a comparison of LOCF with the other methods must be conditional on the 
+set of individuals for which an LOCF imputation is possible.
+```{r non-locf-ids, include=FALSE}
+idx <- df_predictions %>% 
+  filter(model == "LOCF", !complete.cases(.)) %>% 
+  .[["gupi"]]
+```
+Overall, `r length(idx)` out of 
+`r df_predictions %>% filter(model == "LOCF") %>% nrow` test cases 
+(`r round(100 * length(idx) / (df_predictions %>% filter(model == "LOCF") %>% nrow), 1)`%) cannot be imputed with the LOCF approach.
+
+In the entire study population, `r df_gose %>% group_by(gupi) %>% summarize(LOCF = any(Outcomes.DerivedCompositeGOSEDaysPostInjury <= 180)) %>% ungroup %>% summarize(n_LOCF = sum(!LOCF)) %>% .[["n_LOCF"]]`
+individuals do not permit an LOCF imputation (`r round(100 * (df_gose %>% group_by(gupi) %>% summarize(LOCF = any(Outcomes.DerivedCompositeGOSEDaysPostInjury <= 180)) %>% ungroup %>% summarize(n_LOCF = sum(!LOCF)) %>% .[["n_LOCF"]]) / (df_gose$gupi %>% unique %>% length), 1)`%).
+
+
+
+
+
+# Results
+
+The overall performance of all fitted models in terms of bias, MAE, and RMSE
+is depicted in Figure ??? both conditional on LOCF being applicable and,
+excluding LOCF, on the entire test set.
+
+```{r overall-comparison-all-methods, echo=FALSE, fig.height=7}
+p1 <- df_predictions %>%
+  filter(!(gupi %in% idx)) %>% 
+  group_by(model, fold) %>% 
+  summarize(
+    RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
+    MAE  = mean(abs(GOSE - prediction), na.rm = TRUE),
+    bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE) 
+  ) %>% 
+  ungroup %>% 
+  gather(error, value, -model, -fold) %>% 
+  ggplot(aes(model, value)) + 
+    geom_hline(yintercept = 0, color = "black") +
+    geom_boxplot() + 
+    facet_wrap(~error) +
+    scale_y_continuous(name = "", breaks = seq(-2, 8, .25), limits = c(-.5, 1.5)) +
+    scale_x_discrete("") + 
+    theme_bw() +
+    theme(
+      panel.grid.minor   = element_blank(),
+      panel.grid.major.x = element_blank(),
+      axis.text.x        = element_text(angle = 66, hjust = 1)
+    ) +
+    ggtitle("Overall summary measures, LOCF subset")
+
+p2 <- df_predictions %>%
+  filter(model != "LOCF") %>% 
+  group_by(model, fold) %>% 
+  summarize(
+    RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
+    MAE  = mean(abs(GOSE - prediction), na.rm = TRUE),
+    bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE) 
+  ) %>% 
+  ungroup %>% 
+  gather(error, value, -model, -fold) %>% 
+  ggplot(aes(model, value)) + 
+    geom_hline(yintercept = 0, color = "black") +
+    geom_boxplot() + 
+    facet_wrap(~error) +
+    scale_y_continuous(name = "", breaks = seq(-2, 8, .25), limits = c(-.5, 1.5)) +
+    scale_x_discrete("") + 
+    theme_bw() +
+    theme(
+      panel.grid.minor   = element_blank(),
+      panel.grid.major.x = element_blank(),
+      axis.text.x        = element_text(angle = 66, hjust = 1)
+    ) +
+    ggtitle("Overall summary measures, full test set")
+
+cowplot::plot_grid(
+  p1, 
+  p2, 
+  ncol = 1, align = "v", axis = "lr"
+)
+```
+
+Firstly, as could be expected, LOCF is overall negatively biased, i.e., 
+on average it imputes lower-than-observed GOSe values.
+This indicates that there is a population average trend towards continued
+recovery within the first 6 months post-injury.
+In terms of accuracy, the LOCF does perform worst but differences between
+methods are less pronounced than in terms of bias.
+Notably, the difference in terms of RMSE is more substantial than in terms of
+MAE which indicates that LOCF tends to incur more large deviations across
+several GOSe categories than the other methods considered here.
+Secondly, including baseline covariates only affects the GP regression model
+notably. 
+Both the MM and MSM model perform more or less the same irrespective of adjustment
+for baseline covariates.
+This indicates that the marginal additional predictive value of baseline 
+covariates over the information contained in at least one observed GOSe value
+is limited.
+Furthermore, note that both variants of the mixed effects model fail to remove
+the overall bias of the imputed values.
+
+We proceed with an in-depth analyses of a subset of models both in direct 
+comparison with LOCF and in the entire data set including those cases where
+LOCF is not applicable.
+In the following we only consider the baseline-adjusted Gaussian process model 
+('GP + baseline'), the mixed effect model without baseline covariates ('MM')
+and the multi-state model without baseline covariates ('MSM').
+The rationale behind dropping baseline adjustment for MM and MSM being that
+the additional complexity does not substantially alter overall performance.
+
+
+## Detailed comparison conditional on LOCF subset
+
+We first consider results for the set of test cases which allow LOCF imputation 
+(n = `r df_predictions %>% filter(model == "LOCF") %>% nrow - length(idx)`).
+Both the raw count as well as the relative (by true GOSe) confusion matrices
+are presented in Figure ???.
+
+```{r confusion-matrix-locf, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices on LOCF subset."}
+df_average_confusion_matrices <- df_predictions %>% 
+  filter(!(gupi %in% idx)) %>% 
+  group_by(fold, model) %>% 
+  do(
+    confusion_matrix = caret::confusionMatrix(
+        data = factor(.$prediction, levels = 1:8), 
+        reference = factor(.$GOSE, levels = 1:8)
+      ) %>% 
+      as.matrix %>% as_tibble %>% 
+      mutate(`Predicted GOSE` = row_number() %>% as.character) %>% 
+      gather(`True GOSE`, n, 1:8)
+  ) %>% 
+  unnest %>% 
+  group_by(model, `Predicted GOSE`, `True GOSE`) %>% 
+  summarize(n = mean(n)) %>% 
+  ungroup
+
+p_cnf_mtrx_raw <- df_average_confusion_matrices %>%
+  filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>% 
+  group_by(model, `True GOSE`) %>%
+  ggplot(aes(`True GOSE`, `Predicted GOSE`, fill = n)) +
+    geom_raster() +
+    geom_hline(yintercept = c(2, 4, 6) + .5, color = "black") +
+    geom_vline(xintercept = c(2, 4, 6) + .5, color = "black") +
+    scale_fill_gradient(low = "white", high = "black") +
+    coord_fixed(expand = FALSE) + 
+    labs(x = "true GOSe", y = "imputed GOSe", fill = "") +
+    theme_bw() +
+    theme(
+      panel.grid = element_blank()
+    ) + 
+    facet_wrap(~model, nrow = 1) +
+    ggtitle("Average confusion matrix accross folds (absolute counts)")
+
+p_cnf_mtrx_colnrm <- df_average_confusion_matrices %>%
+  filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>% 
+  group_by(model, `True GOSE`) %>%
+  mutate(
+    `fraction (column)` = n / sum(n),
+    `fraction (column)` = ifelse(is.nan(`fraction (column)`), 0, `fraction (column)`)
+  ) %>% 
+  ggplot(aes(`True GOSE`, `Predicted GOSE`, fill = `fraction (column)`)) +
+    geom_raster() +
+    geom_hline(yintercept = c(2, 4, 6) + .5, color = "black") +
+    geom_vline(xintercept = c(2, 4, 6) + .5, color = "black") +
+    scale_fill_gradient("", low = "white", high = "black", limits = c(0, 1)) +
+    coord_fixed(expand = FALSE) + 
+    labs(x = "true GOSe", y = "imputed GOSe", fill = "") +
+    theme_bw() +
+    theme(
+      panel.grid = element_blank()
+    ) + 
+    facet_wrap(~model, nrow = 1) +
+    ggtitle("Average confusion matrix accross folds (column fraction)")
+
+cowplot::plot_grid(p_cnf_mtrx_raw, p_cnf_mtrx_colnrm, ncol = 1, align = "v")
+
+ggsave(filename = "confusion_matrices_locf.pdf", width = 7, height = 7)
+ggsave(filename = "confusion_matrices_locf.png", width = 7, height = 7)
+```
+
+The absolute-count confusion matrices show that most imputed values are 
+within +/- one GOSE categories of the true value. 
+However, they also reflect the category imbalance (cf. Figures ??? and ???, 
+Appendix) in the study ppopulation.
+The performance conditional on the (in practice unknown) true GOSe value
+clearly shows that imputation performance for the most infrequent category 4
+(GOSe of 1 and 2 are not observed in the test set!) is most problematic.
+This is, however, true across the board of methods considered.
+Note that both the MSM as well as the MM models account for this fact by 
+almost never imputing a GOSe of 4.
+Instead, the respective cases tend to be imputed to GOSe 3 or 5.
+
+Additionally, the following table ??? lists some confusion probabilities
+of particular interest.
+[Comment: 1 -> > 1 is not relevant since our imputation is conditional
+on not being 1 at 6 months; the comparison seems to favor LOCF since only
+upward confusions are considered (which LOCF by design tends to do less)]
+
+```{r crossing-table, echo=FALSE, warning=FALSE, results='asis'}
+rbind(
+df_average_confusion_matrices %>% 
+  filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>% 
+  group_by(model) %>% 
+  filter(`True GOSE` <= 3) %>% 
+  mutate(n_total = sum(n)) %>% 
+  filter(`Predicted GOSE` > 3) %>% 
+  summarize(fraction = sum(n / n_total)) %>% 
+  mutate(`Event` = "<=3 -> >3"),
+
+df_average_confusion_matrices %>% 
+  filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>% 
+  group_by(model) %>% 
+  filter(`True GOSE` == 4) %>% 
+  mutate(n_total = sum(n)) %>% 
+  filter(`Predicted GOSE` > 4) %>% 
+  summarize(fraction = sum(n / n_total)) %>% 
+  mutate(`Event` = "4 -> >4"),
+
+df_average_confusion_matrices %>% 
+  filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>% 
+  group_by(model) %>% 
+  filter(`True GOSE` < 8) %>% 
+  mutate(n_total = sum(n)) %>% 
+  filter(`Predicted GOSE` == 8) %>% 
+  summarize(fraction = sum(n / n_total)) %>% 
+  mutate(`Event` = "<8 -> 8")
+) %>% 
+  transmute(Model = model, Percent = 100*fraction, Event) %>% 
+  spread(Event, Percent) %>% 
+  pander::pandoc.table("Some specific confusion percentages, LOCF subset.", digits = 3)
+```
+
+[TODO: Interpretation? GOSe 4 potentially relatively brief transient state, 
+clinical relevance of distinction between 3/4?]
+
+To better understand the overall performance assessment in Figure ???,
+we also consider the performance conditional on the respective ground-truth
+(i.e. the true GOSe values in the respective test sets).
+The results are shown in Figure ???.
+
+```{r error-scores-locf, echo=FALSE, fig.height=3}
+df_predictions %>%
+  filter(!(gupi %in% idx)) %>% 
+  filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>% 
+  group_by(model, fold, GOSE) %>% 
+  summarize(
+    RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
+    MAE  = mean(abs(GOSE - prediction), na.rm = TRUE),
+    bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE) 
+  ) %>% 
+  gather(error, value, -model, -fold, -GOSE) %>% 
+  group_by(GOSE, model, error) %>% 
+  summarize(
+      mean = mean(value),
+      sd   = sd(value)
+  ) %>% 
+  ungroup() %>% 
+  ggplot(aes(GOSE, color = model)) + 
+    geom_hline(yintercept = 0, color = "black") +
+    geom_line(aes(y = mean)) + 
+    geom_point(aes(y = mean)) + 
+    xlab("GOSe") + 
+    facet_wrap(~error) +
+    scale_y_continuous(name = "", breaks = seq(-2, 8, .25)) +
+    theme_bw() +
+    theme(
+      panel.grid.minor   = element_blank(),
+      panel.grid.major.x = element_blank(),
+      legend.title       = element_blank()
+    ) +
+    ggtitle("Bias and accuracy by true GOSe value, LOCF subset")
+
+ggsave(filename = "errors_stratified_locf.pdf", width = 7, height = 7)
+ggsave(filename = "errors_stratified_locf.png", width = 7, height = 7)
+```
+
+Just as with the overall performance, differences are most pronounced in terms
+of bias.
+Again, the category imbalance in the GOSe distribution explains the fact that
+all model-based approaches tend to perform better for the most frequent
+categories 6, 7, and 8 while sacrificing performance for the less frequent
+categories 4 and 5 as compared to LOCF.
+Bias-wise all methods exhibit a certain regression to the mean effect since low
+categories tend to be confused with better (higher) GOSe on average while
+high true GOSe values are dampened (negative bias at 7, 8).
+Since LOCF does not take the category imbalance into account and since it exhibits
+a relatively large negative bias at the most frequent GOSe values, it is
+overall negatively biased.
+Interestingly, the conditional assessment of the GP regressions bias profile
+reveals that the overall unbiasedness can be explained by the relatively high
+positive and negative biases conditional on low/high GOSe values canceling out
+in the overall population.
+Since the accuracy results mirror this effect, the GP regression model seems
+to suffer from an overly string regression to the mean effect. 
+
+The MSM and MM models are fairly similar with respect to accuracy with a slight
+advantage for MSM.
+Interestingly, though, the MSM approach is consistently less positively biased
+across the rarer low GOSe categories while being only insignificantly more 
+negatively biased for categories 7 and 8.
+
+
+
+## Detailed comparison on full test set
+
+In the following, LOCF is not considered since a meaningful comparison 
+including LOCF is not possible on the entire set of test candidates due to the 
+fact that LOCF is not applicable in cases
+where only GOSe values after 180 days post-injury are available.
+The relative characteristics of the three considered approaches are comparable
+to the LOCF subset.
+
+```{r confusion-matrix, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices, full training set without LOCF."}
+df_average_confusion_matrices <- df_predictions %>% 
+  filter(
+    model %in% c("GP + cov", "MM", "MSM")
+  ) %>% 
+  group_by(fold, model) %>% 
+  do(
+    confusion_matrix = caret::confusionMatrix(
+        data = factor(.$prediction, levels = 1:8), 
+        reference = factor(.$GOSE, levels = 1:8)
+      ) %>% 
+      as.matrix %>% as_tibble %>% 
+      mutate(`Predicted GOSE` = row_number() %>% as.character) %>% 
+      gather(`True GOSE`, n, 1:8)
+  ) %>% 
+  unnest %>% 
+  group_by(model, `Predicted GOSE`, `True GOSE`) %>% 
+  summarize(n = mean(n)) %>% 
+  ungroup
+
+p_cnf_mtrx_raw <- df_average_confusion_matrices %>% 
+  group_by(model, `True GOSE`) %>%
+  ggplot(aes(`True GOSE`, `Predicted GOSE`, fill = n)) +
+    geom_raster() +
+    geom_hline(yintercept = c(2, 4, 6) + .5, color = "black") +
+    geom_vline(xintercept = c(2, 4, 6) + .5, color = "black") +
+    scale_fill_gradient(low = "white", high = "black") +
+    coord_fixed(expand = FALSE) + 
+    labs(x = "true GOSe", y = "imputed GOSe", fill = "") +
+    theme_bw() +
+    theme(
+      panel.grid = element_blank()
+    ) + 
+    facet_wrap(~model, nrow = 1) +
+    ggtitle("Average confusion matrix accross folds (absolute counts)")
+
+p_cnf_mtrx_colnrm <- df_average_confusion_matrices %>%
+  group_by(model, `True GOSE`) %>%
+  mutate(
+    `fraction (column)` = n / sum(n),
+    `fraction (column)` = ifelse(is.nan(`fraction (column)`), 0, `fraction (column)`)
+  ) %>% 
+  ggplot(aes(`True GOSE`, `Predicted GOSE`, fill = `fraction (column)`)) +
+    geom_raster() +
+    geom_hline(yintercept = c(2, 4, 6) + .5, color = "black") +
+    geom_vline(xintercept = c(2, 4, 6) + .5, color = "black") +
+    scale_fill_gradient("", low = "white", high = "black", limits = c(0, 1)) +
+    coord_fixed(expand = FALSE) + 
+    labs(x = "true GOSe", y = "imputed GOSe", fill = "") +
+    theme_bw() +
+    theme(
+      panel.grid = element_blank()
+    ) + 
+    facet_wrap(~model, nrow = 1) +
+    ggtitle("Average confusion matrix accross folds (column fraction)")
+
+cowplot::plot_grid(p_cnf_mtrx_raw, p_cnf_mtrx_colnrm, ncol = 1, align = "v")
+
+ggsave(filename = "confusion_matrices_all.pdf", width = 7, height = 7)
+ggsave(filename = "confusion_matrices_all.png", width = 7, height = 7)
+```
+
+```{r crossing-table-full, echo=FALSE, warning=FALSE, results='asis'}
+rbind(
+df_average_confusion_matrices %>% 
+  group_by(model) %>% 
+  filter(`True GOSE` <= 3) %>% 
+  mutate(n_total = sum(n)) %>% 
+  filter(`Predicted GOSE` > 3) %>% 
+  summarize(fraction = sum(n / n_total)) %>% 
+  mutate(`Event` = "<=3 -> >3"),
+
+df_average_confusion_matrices %>% 
+  group_by(model) %>% 
+  filter(`True GOSE` == 4) %>% 
+  mutate(n_total = sum(n)) %>% 
+  filter(`Predicted GOSE` > 4) %>% 
+  summarize(fraction = sum(n / n_total)) %>% 
+  mutate(`Event` = "4 -> >4"),
+
+df_average_confusion_matrices %>% 
+  group_by(model) %>% 
+  filter(`True GOSE` < 8) %>% 
+  mutate(n_total = sum(n)) %>% 
+  filter(`Predicted GOSE` == 8) %>% 
+  summarize(fraction = sum(n / n_total)) %>% 
+  mutate(`Event` = "<8 -> 8")
+) %>% 
+  transmute(Model = model, Percent = 100*fraction, Event) %>% 
+  spread(Event, Percent) %>% 
+  pander::pandoc.table("Some specific confusion percentages, full data set.", digits = 3)
+```
+
+```{r error-scores-all, echo=FALSE, fig.height=3}
+df_predictions %>%
+  filter(model %in% c("GP + cov", "MM", "MSM")) %>% 
+  group_by(model, fold, GOSE) %>% 
+  summarize(
+    RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
+    MAE  = mean(abs(GOSE - prediction), na.rm = TRUE),
+    bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE) 
+  ) %>% 
+  gather(error, value, -model, -fold, -GOSE) %>% 
+  group_by(GOSE, model, error) %>% 
+  summarize(
+      mean = mean(value),
+      sd   = sd(value)
+  ) %>% 
+  ungroup() %>% 
+  ggplot(aes(GOSE, color = model)) + 
+    geom_hline(yintercept = 0, color = "black") +
+    geom_point(aes(y = mean)) + 
+    geom_line(aes(y = mean)) + 
+    facet_wrap(~error) +
+    scale_y_continuous(name = "", breaks = seq(-2, 8, .25)) +
+    xlab("GOSe") +
+    theme_bw() +
+    theme(
+      panel.grid.minor   = element_blank(),
+      panel.grid.major.x = element_blank(),
+      legend.title       = element_blank()
+    ) +
+    ggtitle("Accuracy by true GOSe value, full test set")
+
+ggsave(filename = "imputation_error.pdf", width = 7, height = 7)
+ggsave(filename = "imputation_error.png", width = 7, height = 7)
+```
+
+
+
+
+
+# Discussion
+
+The most important measure to prevent missing data bias 
+is to avoid missing values in the first place!
+Study protocols and procedures should be checked preemptively for their 
+applicability to clinical practice and adequate effort should be put in
+ensuring good compliance with follow-up protocols.
+Nevertheless, in practice, missing values due to loss-to-follow-up will always
+occur and should be addressed efficiently.
+There is a wide consensus that non-imputation of missing values cannot be 
+considered best practice and may introduce bias or lead to an unnecessary loss 
+of power.
+The current gold-standard for imputing missing values is certainly multiple
+imputation on a per-analysis basis including analysis-specific covariates to
+further reduce bias and to preserve the imputation uncertainty in the 
+downstream analysis.
+However, in practice, there are good reasons to provide a set of single-imputed
+default values in larger projects such as CENTER-TBI.
+E.g., consortia are increasingly committed to making their databases 
+available to a wider range of researchers. 
+In fact, more liberal data-sharing policies are becoming an core requirement 
+for many funding bodies [REFERENCES].
+In this context, it might not be possible to ensure that every analysis team
+has the necessary statistical expertise to properly conduct a per-analysis
+multiple imputation.
+Furthermore, the imputed values of a multiple-imputation procedure are 
+inherently random and it is thus difficult to ensure consistency across 
+different analysis teams.
+For this reason, we suggest to provide a default single-imputation with 
+appropriate measures of uncertainty for key outcomes in the published 
+data base itself.
+This mitigates problems with complete cases analyses and provide a more
+principled and consistent default approach to handling missing values.
+Wherever necessary and practical, a custom, analysis-specific multiple 
+imputation approach might still be used and the model used to provide the 
+single-imputed values may be used as a starting point.
+
+Several reasons disqualify LOCF as method of choice.
+Not only is it inherently biased, but LOCF is also inefficient in that it 
+does not properly take into account the category imbalance of GOSe in 
+the respective target population. 
+Albeit simple to implement, LOCF - by definition - is not capable of 
+exploiting longitudinal information after the target time point. 
+This results in a smaller subset of individuals for which imputed values can
+be provided in the first place (cf. Section ???).
+LOCF also lacks flexibility to adjust for further covariates which might be
+necessary in some cases to further reduce bias.
+Finally, LOCF cannot produce an adequate measure of imputation uncertainty 
+since it is not model based.
+
+We draw two main conclusion from our in-depth comparison of three 
+alternative, model-based approaches.
+Firstly, despite its theoretical drawbacks, LOCF is hard to beat in terms of
+raw accuracy. 
+Still, small improvements are possible (cf. Section ???).
+The main advantages of a model-based approach is thus a reduction of bias,
+the ability to provide a measure of uncertainty together with the imputed 
+values
+(or to use the same very same model to draw multiple imputations), 
+as well as the possibility to include further analysis-specific covariates.
+Secondly, we found that the inclusion of established baseline predictors for
+GOSe at 6 months post-injury had next to no effect on the imputation quality.
+Note that this does not refute their predictive value but only indicates that
+there is little marginal benefit over knowing at least one other GOSe value. 
+Finally, we conclude that the differences between the considered model-bases
+approaches are rather small.
+
+We nevertheless favor the multi-state model.
+The multi state model is well-interpretable in terms of modeling transition 
+intensities.
+Efficient implementations in standard statistical software (msm, R) are 
+available, it succeeds in eliminating the bias introduced by LOCF, is
+able to provide imputed values for the entire population and is able to
+provide a probabilistic output.
+I.e., besides the single predicted GOSe value, the fitted probabilities
+for each GOSe value may be stored in the database.
+Based on these values, it is easy to draw samples for a multiple imputation
+analysis.
+
+
+
+
+# Appendix / Supplemental Material
+
+```{r marginal-GOSe-folds, echo=FALSE, fig.cap="Marginal GOSe distribution per validation fold."}
+df_ground_truth %>% 
+    ggplot(aes(Outcomes.DerivedCompositeGOSE)) +
+        geom_bar() +
+        facet_wrap(~fold) +
+        theme_bw() +
+        ylab("") +
+        xlab("GOSe") +
+        theme(
+            panel.grid.minor   = element_blank(), 
+            panel.grid.major.x = element_blank()
+        )
+```
+
+
+
+
+```{r zip-figures, include=FALSE}
+system("zip figures.zip *.png *.pdf")
+system("rm *.png *.pdf")
+```
+
+# References
--- a/manuscript/references.bib
+++ b/manuscript/references.bib
+@article{msm2011,
+author = {Jackson, Christopher H.},
+doi = {10.18637/jss.v038.i08},
+journal = {Journal of Statistical Software},
+month = {jan},
+number = {8},
+pages = {1--28},
+title = {{Multi-State Models for Panel Data: The msm Package for R}},
+volume = {38},
+year = {2011}
+}
+@article{r2016,
+archivePrefix = {arXiv},
+arxivId = {arXiv:1011.1669v3},
+author = {Team, R Development Core and {R Development Core Team}, R},
+doi = {10.1007/978-3-540-74686-7},
+eprint = {arXiv:1011.1669v3},
+isbn = {3{\_}900051{\_}00{\_}3},
+issn = {3-900051-07-0},
+journal = {R Foundation for Statistical Computing},
+pmid = {16106260},
+title = {{R: A Language and Environment for Statistical Computing}},
+year = {2016}
+}
+@book{rasmussen2006,
+abstract = {Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible non-parametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations.13,78,31 The mathematical literature on GPs is large and often uses deep concepts which are not required to fully understand most machine learning applications. In this tutorial paper, we aim to present characteristics of GPs relevant to machine learning and to show up precise connections to other "kernel machines" popular in the community. Our focus is on a simple presentation, but references to more detailed sources are provided.},
+archivePrefix = {arXiv},
+arxivId = {026218253X},
+author = {Rasmussen, Carl Edward and Williams, Christopher K. I.},
+booktitle = {International Journal of Neural Systems},
+doi = {10.1142/S0129065704001899},
+eprint = {026218253X},
+isbn = {026218253X},
+issn = {0129-0657},
+pmid = {15112367},
+title = {{Gaussian Processes for Machine Learning}},
+year = {2006}
+}
+@article{stan2017,
+author = {Carpenter, B and Gelman, A and Hoffman, MD and Lee, D},
+journal = {Journal of statistical software},
+number = {1},
+title = {{Stan: A Probabilistic Programming Language}},
+volume = {76},
+year = {2017}
+}
+@book{Agresti2003,
+author = {Agresti, Alan},
+editor = {{John Wiley {\&} Sons}},
+title = {{Categorical data analysis}},
+year = {2003}
+}
+@article{brms2018,
+author = {B{\"{u}}rkner, Paul},
+journal = {The R Journal},
+number = {1},
+pages = {395--411},
+title = {{Advanced Bayesian Multilevel Modeling with the R Package brms}},
+volume = {10},
+year = {2018}
+}
+@article{brms2017,
+author = {B{\"{u}}rkner, Paul},
+journal = {Journal Of Statistical Software},
+number = {1},
+pages = {1--28},
+title = {{brms: An R Package for Bayesian Multilevel Models Using Stan}},
+volume = {80},
+year = {2017}
+}
+@article{Steyerberg2008,
+author = {Steyerberg, Ewout W and Mushkudiani, Nino and Perel, Pablo and Butcher, Isabella and Lu, Juan and McHugh, Gillian S and Murray, Gordon D and Marmarou, Anthony and Roberts, Ian and Habbema, J. Dik F and Maas, Andrew I. R},
+doi = {10.1371/journal.pmed.0050165},
+editor = {Singer, Mervyn},
+journal = {PLoS Medicine},
+month = {aug},
+number = {8},
+pages = {e165},
+title = {{Predicting Outcome after Traumatic Brain Injury: Development and International Validation of Prognostic Scores Based on Admission Characteristics}},
+url = {https://dx.plos.org/10.1371/journal.pmed.0050165},
+volume = {5},
+year = {2008}
+}
--- a/reports/prepare_data.Rmd
+++ b/reports/prepare_data.Rmd
@@ -135,13 +135,16 @@ Overall, `r nrow(df_baseline)` individuals have recorded baseline data.

 ## GOSE data

+No GOSE data after $18*30=540$ days post-injury is used.
+
 ```{r extract-gose}
 df_gose <- df_gose %>% 
  distinct %>% 
  filter(complete.cases(.)) %>% 
  arrange(gupi, Outcomes.DerivedCompositeGOSEDaysPostInjury) %>% 
  group_by(gupi, Outcomes.DerivedCompositeGOSEDaysPostInjury) %>% 
-  mutate(Outcomes.DerivedCompositeGOSE = factor(Outcomes.DerivedCompositeGOSE, levels = 1:8))
+  mutate(Outcomes.DerivedCompositeGOSE = factor(Outcomes.DerivedCompositeGOSE, levels = 1:8)) %>% 
+  filter(Outcomes.DerivedCompositeGOSEDaysPostInjury <= 18*30)
 ``` 

 Overall, `r nrow(df_gose)` GOSE measurements of 
@@ -155,7 +158,7 @@ The target population is thus the subset of individuals with

 ```{r exclude-early-deaths}
 early_deaths_gupi <- df_deaths %>% 
-  filter(days <= 180 - 14) %>% 
+  filter(days <= 180) %>% 
  .[["gupi"]]

 df_gose <- df_gose %>% 
@@ -179,7 +182,8 @@ df_gose <- df_gose %>%
    {
      first_death <- which(Outcomes.DerivedCompositeGOSE == 1)[1] 
      (is.na(first_death) | n() <= first_death)
-    }
+    },
+    Outcomes.DerivedCompositeGOSEDaysPostInjury <= 18*30
  ) %>% 
  ungroup()