Commit 32996c4e authored by Kevin Kunzmann

...

parent d5175d55
@@ -38,30 +38,49 @@ single (@clifton2011; @silver2006; @skolnick2014) and
multiple imputation (@bulger2010; @kirkness2006; @wright2014; @robertson2014).
Last observation carried forward (LOCF) is a widely applied single-imputation
method for dealing with missing data in TBI research.
The 3-month outcome is recognized as one approach to substituting for missing
6-month data (@steyerberg2008),
and has been used in recent trials (@skolnick2014).
Although LOCF is easy to understand and implement, the technique is
suboptimal in several respects.
Firstly, it is biased in that it neglects potential trends in
GOS(e) trajectories.
Secondly, a naive application of LOCF is also inefficient since it neglects
data observed shortly after the target time window.
For example, a GOSe value recorded at 200 days post-injury is likely to be more
informative about the status at 180 days post-injury than a value observed 90
days post-injury.
Finally, the *ad-hoc* nature of the LOCF method implies that there is no
probabilistic model, and thus no measure of uncertainty about the imputed values.
This also implies that it is impossible to include additional covariates
to further reduce bias introduced by the imputation method and that LOCF
cannot be used to obtain multiply imputed data sets by design.
The timing of outcome assessments for patients with TBI
varies between studies.
Some studies define very stringent time windows (e.g. + 2 weeks [TRACK-TBI]),
but this can lead to a substantial amount of missing data (@richter2019).
Consequently, other studies have defined more pragmatic protocol windows
(e.g. -1 month to +2 months, @maas2014).
While the wider windows enable more complete data collection,
they suffer from the problem that outcomes may still be evolving over this period,
and an outcome assessment obtained at five months
(the beginning of this window) in one subject may not be strictly comparable
with outcomes obtained just before eight months (the end of the window) in
another subject.
Consequently, even where outcomes are available within pragmatic protocol
windows,
there may be a benefit in being able to impute an outcome more
precisely at the 180-day (6-month) time point.
In this manuscript, three model-based imputation strategies for GOSe at
6 months (=180 days) post-injury in the longitudinal CENTER-TBI study
(@maas2014) are compared with LOCF.
While we acknowledge that multiple imputation is in principle superior to
single imputation for propagating imputation uncertainty,
we focus on single-imputation performance primarily for practical reasons.
Unlike randomized trials, which involve a single principal analysis,
CENTER-TBI is committed to providing a curated database to facilitate
multiple subsequent analyses.
Since the primary endpoint in CENTER-TBI is functional outcome at 6 months,
a single default imputed value for as many study participants as possible is
desirable.
@@ -143,7 +162,7 @@ n_pat <- nrow(df_baseline)
```
The CENTER-TBI project methods and design are described in detail
elsewhere [@maas2014].
Participants with TBI were recruited into three strata:
(a) patients attending the emergency room,
(b) patients admitted to hospital but not intensive care,
@@ -151,6 +170,8 @@ and (c) patients admitted to intensive care.
Follow-up of participants was scheduled per protocol at 2 weeks, 3 months,
and 6 months in group (a) and at 3 months, 6 months, and 12 months in groups
(b) and (c).
The protocol time window for the 6-month GOSe was between 5 and 8 months
post-injury.
Outcome assessments at all timepoints included the GOSe
(@jennett1981disability, @mcmillan2016glasgow).
The GOSe is an eight-point scale with the following categories:
@@ -159,15 +180,17 @@ The GOSe is an eight-point scale with the following categories:
(6) upper moderate disability, (7) lower good recovery,
(8) upper good recovery.
The study population for this empirical methods comparison comprises all
individuals from the CENTER-TBI database (total of n = 4509) whose GOSe
status was queried at least once within the first 18 months and who were
still alive 180 days post-injury (n = `r n_pat`).
The rationale for conducting the comparison conditional on 6-month survival
is simply that the GOSe can only be missing at 6 months if the individual
is still alive, since the GOSe would otherwise be (1), i.e. dead.
All data were accessed from the CENTER-TBI Neurobot database
(release 1.1, cf. Appendix for details).
Basic summary statistics for population characteristics are listed in
Table 1.
```{r baseline-table-continuous, echo=FALSE, results='asis'}
summarizer <- function(x) {
@@ -230,15 +253,16 @@ n_gose_pp <- df_gose %>%
n_groups()
```
We decided to use only those GOSe observations obtained between injury
and 18 months post injury
since extremely late follow-ups were considered to be irrelevant to the
index follow-up time point of 6 months post injury.
This led to a total of `r nrow(df_gose)` GOSe observations of the study
population being available for the analyses in this manuscript.
For `r n_gose_180` (`r round(n_gose_180 / n_pat * 100, 1)`%) individuals,
GOSe observations at 180 +/- 14 days post injury were available and
`r n_gose_pp` (`r round(n_gose_pp / n_pat * 100, 1)`%) individuals had
GOSe observations within the per-protocol window of 5-8 months post injury.
The distribution of GOSe sampling times and both absolute and
relative frequencies of the respective GOSe categories are shown in
Figure (???).
@@ -324,28 +348,60 @@ gridExtra::grid.arrange(
# Imputation methods
We compared last observation carried forward (LOCF) to a mixed effect
model (MM),
a Gaussian process regression (GP), and a multi-state model (MSM).
For all model-based approaches we additionally explored variants
including the key IMPACT [@steyerberg2008] predictors as covariates.
These are [????].
## Last-observation-carried-forward
Since LOCF is widely used to impute missing outcomes in TBI studies,
it served as the baseline method.
Here, LOCF was defined as the last GOSe observation before the
imputation time point of 180 days post-injury.
LOCF is not a model-based method and, by definition,
only permits the imputation of a GOSe value for subjects where at least one
value is available within the first 180 days post injury.
We accounted for this lack of complete coverage under LOCF by performing all
performance comparisons including LOCF only on the subset of individuals
for which a LOCF-imputed value could be obtained.
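As an illustration, this rule can be sketched in a few lines of dplyr code;
the name of the GOSe value column (`GOSE`) is an assumption made here for
illustration only.

```{r locf-sketch, eval=FALSE}
# Minimal sketch of the LOCF rule (illustration only): for each patient,
# take the last GOSe observed no later than 180 days post-injury.
# df_gose is assumed to hold one row per assessment with columns gupi
# (patient ID), GOSE (observed value, assumed name), and
# Outcomes.DerivedCompositeGOSEDaysPostInjury (days post-injury).
library(dplyr)

df_locf <- df_gose %>%
  filter(Outcomes.DerivedCompositeGOSEDaysPostInjury <= 180) %>%
  group_by(gupi) %>%
  summarize(
    GOSE_locf = GOSE[which.max(Outcomes.DerivedCompositeGOSEDaysPostInjury)]
  )
```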
## Model-based methods
Model-based imputation approaches offer richer output (probabilistic imputation,
multiple imputation) and may reduce the LOCF-inherent bias.
We compared LOCF with three model-based approaches.
Mixed effects models (MM) are a widely used approach in
longitudinal data analysis and model individual deviations from a population
mean trajectory (@verbeke2009linear).
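For illustration, a model of this kind could be specified roughly as follows,
e.g. using the brms package; this is a sketch under assumed column names
(`GOSE`, `t_days`) and is not necessarily the exact specification used here,
which is documented in the Appendix.

```{r mm-sketch, eval=FALSE}
# Sketch only: ordinal (cumulative logit) mixed effects model with a smooth
# population-level time trend and patient-level random intercepts.
# Column names GOSE and t_days are assumptions for illustration.
library(brms)

fit_mm <- brm(
  GOSE ~ s(t_days) + (1 | gupi),
  family = cumulative(link = "logit"),
  data   = df_gose,
  chains = 2, iter = 2000
)

# Fitted category probabilities at 180 days post-injury; the imputed GOSe
# can then be taken as, e.g., the most probable category.
newdata <- data.frame(gupi = unique(df_gose$gupi), t_days = 180)
p_180   <- predict(fit_mm, newdata = newdata)
```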
An alternative non-linear regression model for longitudinal data is
Gaussian process regression (GP), which allows flexible modelling of
both the individual GOSe trajectories and the population mean
in a Bayesian non-parametric way [@rasmussen2006].
Both the mixed effects model and the Gaussian process regression
model are non-linear regression techniques for longitudinal
data.
While they are both powerful tools to model longitudinal trajectories,
they do not explicitly model the probability of transitions between
GOSe states.
Since the number of observations per individual is very limited in our
data set (1 to 4 GOSe observations per individual),
an approach explicitly modelling transition probabilities might be
more suitable to capture the dynamics of the GOSe trajectories.
To explore this further, a Markov multi-state model (MSM)
was considered (@meira2009).
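A rough sketch of such a model, e.g. using the msm package with transitions
restricted to neighbouring GOSe states and death (GOSe 1) as an absorbing
state, is given below; the transition structure and the column names
`t_days` and `age` are assumptions for illustration, and the actual
specification is described in the Appendix.

```{r msm-sketch, eval=FALSE}
# Sketch of a Markov multi-state model for the GOSe (illustration only).
library(msm)

# initial transition intensity matrix: non-zero entries mark allowed moves;
# state 1 (death) has no outgoing transitions and is therefore absorbing
Q <- matrix(0, nrow = 8, ncol = 8)
for (i in 2:8) {
  Q[i, i - 1] <- 0.1              # deterioration by one state
  if (i < 8) Q[i, i + 1] <- 0.1   # improvement by one state
}

fit_msm <- msm(
  GOSE ~ t_days,
  subject    = gupi,
  data       = df_gose,           # must be sorted by subject and time
  qmatrix    = Q,
  covariates = ~ age              # the MSM variant only used age
)

# transition probability matrix over 180 days
P_180 <- pmatrix.msm(fit_msm, t = 180)
```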
All models were fitted using either none or all of the IMPACT predictors,
except for the MSM, which only used age due to issues with numerical stability.
Further details on the respective implementations are given in the Appendix.
@@ -435,48 +491,54 @@ df_predictions <- df_model_posteriors %>%
)
```
Model performance was assessed via three-fold cross validation on the subset
of individuals with a valid GOSe value within 180 +/- 14 days post-injury
(n = `r nrow(df_ground_truth)`).
All models were fit on the entire available data after removing the
180 +/- 14 days post-injury observation from the respective test fold,
thus mimicking a missing-completely-at-random missing data mechanism.
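For illustration, this scheme can be sketched as follows (assuming that
`df_ground_truth` and `df_gose` are keyed by the patient identifier `gupi`;
the actual fold-assignment code may differ).

```{r cv-scheme-sketch, eval=FALSE}
# Sketch of the cross-validation scheme (illustration only): patients with an
# observed GOSe at 180 +/- 14 days are split into three folds; for a given
# test fold, the models are refit on all GOSe observations except the target
# (180 +/- 14 day) observations of that fold's patients.
library(dplyr)

set.seed(42)
folds <- df_ground_truth %>%
  distinct(gupi) %>%
  mutate(fold = sample(rep(1:3, length.out = n())))

test_fold <- 1
df_train <- df_gose %>%
  left_join(folds, by = "gupi") %>%
  filter(
    is.na(fold) |                        # no target observation, always train
    fold != test_fold |                  # patient belongs to a training fold
    abs(Outcomes.DerivedCompositeGOSEDaysPostInjury - 180) > 14
  )
```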
The distribution of GOSe values in the respective three test sets is well
balanced (cf. Appendix, Figure ???).
Performance was assessed using the absolute-count and the normalized
(proportions) confusion matrices as well as
bias, mean absolute error (MAE), and root mean squared error (RMSE).
All confusion matrices are reported as averages over the three-fold cross
validation test sets.
The normalized confusion matrices are scaled within each stratum of observed
GOSe value and are thus estimates of the confusion probability conditional on
the observed GOSe.
Bias indicates whether, on average, predicted values are systematically
lower (negative) or higher (positive) than observed values.
MAE and RMSE are both measures of average precision, where
RMSE puts more weight on large deviations as compared to MAE.
Comparisons in terms of bias, MAE, and RMSE tacitly assume that
GOSe values can be sensibly interpreted on an interval scale.
We therefore also considered the directional bias (bias'),
the difference between the model-fitted
probability of exceeding the true value and the model-fitted probability of
undershooting the true GOSe ($Pr[imp. > true] - Pr[imp. < true]$), as an
alternative measure of bias which does not require this assumption.
Note that the scale of the directional bias is not directly comparable to
that of the other three quantities.
All measures are considered both conditional on the ground truth
(the held-out observed GOSe) and averaged over the entire test set.
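For concreteness, these quantities can be computed from the cross-validated
predictions roughly as sketched below; the exact layout of `df_predictions`
(in particular, per-category probability columns named 1 to 8) is an
assumption made for illustration.

```{r metrics-sketch, eval=FALSE}
# Sketch of the evaluation metrics (illustration only); df_predictions is
# assumed to hold one row per test case with the observed GOSE, the point
# prediction, and the model-fitted category probabilities in columns 1-8.
library(dplyr)
library(tidyr)

# bias, MAE, and RMSE treat the GOSe as an interval scale
df_predictions %>%
  group_by(model) %>%
  summarize(
    bias = mean(prediction - GOSE),
    MAE  = mean(abs(prediction - GOSE)),
    RMSE = sqrt(mean((prediction - GOSE)^2))
  )

# directional bias: Pr[imputed > observed] - Pr[imputed < observed],
# averaged over test cases, based on the fitted category probabilities
df_predictions %>%
  gather(category, p, `1`:`8`) %>%
  mutate(category = as.integer(category)) %>%
  group_by(model, gupi) %>%
  summarize(bias_dir = sum(p[category > GOSE]) - sum(p[category < GOSE])) %>%
  group_by(model) %>%
  summarize(bias_dir = mean(bias_dir))
```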
LOCF, by design, cannot provide imputed values when there are no
observations before 180 days post injury.
A valid comparison of LOCF with the other methods must therefore be
based on the set of individuals for whom an LOCF imputation is possible.
```{r non-locf-ids, include=FALSE}
idx <- df_predictions %>%
filter(model == "LOCF", !complete.cases(.)) %>%
.[["gupi"]]
```
Overall, `r length(idx)` out of
`r df_predictions %>% filter(model == "LOCF") %>% nrow` test cases
(`r round(100 * length(idx) / (df_predictions %>% filter(model == "LOCF") %>% nrow), 1)`%) could not be imputed with the LOCF approach.
In the entire study population, `r df_gose %>% group_by(gupi) %>% summarize(LOCF = any(Outcomes.DerivedCompositeGOSEDaysPostInjury <= 180)) %>% ungroup %>% summarize(n_LOCF = sum(!LOCF)) %>% .[["n_LOCF"]]`
individuals (`r round(100 * (df_gose %>% group_by(gupi) %>% summarize(LOCF = any(Outcomes.DerivedCompositeGOSEDaysPostInjury <= 180)) %>% ungroup %>% summarize(n_LOCF = sum(!LOCF)) %>% .[["n_LOCF"]]) / (df_gose$gupi %>% unique %>% length), 1)`%) did not have data that would permit an LOCF imputation.
The subset used for comparison of the imputation approaches with the LOCF
approach was similar to the overall dataset (cf. Appendix, Table ???).
@@ -555,8 +617,8 @@ on average it imputes lower-than-observed GOSe values.
This reflects a population average trend towards continued
recovery within the first 6 months post injury.
The fact that both ways of quantifying bias qualitatively agree
indicates that the interpretation of GOSe as an interval measure, which
tacitly underlies the bias, MAE, and RMSE comparisons, is not too restrictive.
In terms of accuracy, LOCF performs worst, but differences between
methods are less pronounced than in terms of bias.
Notably, the RMSE difference between LOCF and the other methods is slightly
@@ -593,19 +655,21 @@ Both the raw count as well as the relative (by left-out observed GOSe) confusion
are presented in Figure ???.
```{r confusion-matrix-locf, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices on LOCF subset.", fig.height=6, fig.width=6}
plot_confusion_matrices <- function(df_predictions, models, nrow = 2, legendpos, scriptsize) {
df_average_confusion_matrices <- df_predictions %>%
select(-`1`, -`2`) %>%
filter(model %in% models) %>%
group_by(fold, model) %>%
do(
confusion_matrix = caret::confusionMatrix(
data = factor(.$prediction, levels = 3:8),
reference = factor(.$GOSE, levels = 3:8)
) %>%
as.matrix %>% as_tibble %>%
mutate(`Predicted GOSE` = {row_number() + 2} %>% as.character) %>%
gather(`Observed GOSE`, n, 1:6)
) %>%
unnest %>%
group_by(model, `Predicted GOSE`, `Observed GOSE`) %>%
@@ -635,7 +699,7 @@ plot_confusion_matrices <- function(df_predictions, models, nrow = 2, legendpos,
legend.position = "none"
) +
facet_wrap(~model, nrow = nrow) +
ggtitle("Average confusion matrix accross folds (absolute counts)")
ggtitle("Average confusion matrix across folds (absolute counts)")
p_cnf_mtrx_colnrm <- df_average_confusion_matrices %>%
group_by(model, `Observed GOSE`) %>%
@@ -656,10 +720,9 @@ plot_confusion_matrices <- function(df_predictions, models, nrow = 2, legendpos,
legend.position = legendpos
) +
facet_wrap(~model, nrow = nrow) +
ggtitle("Average confusion matrix accross folds (column fraction)")
ggtitle("Average confusion matrix across folds (column fraction)")
cowplot::plot_grid(p_cnf_mtrx_raw, p_cnf_mtrx_colnrm, ncol = 1, align = "h")
}
plot_confusion_matrices(
@@ -668,11 +731,11 @@ plot_confusion_matrices(
c("MSM", "GP + cov", "MM", "LOCF"),
nrow = 1,
legendpos = "none",
scriptsize = 2.5
)
ggsave(filename = "confusion_matrices_locf.pdf", width = 6, height = 9)
ggsave(filename = "confusion_matrices_locf.png", width = 6, height = 9)
ggsave(filename = "confusion_matrices_locf.pdf", width = 6, height = 6)
ggsave(filename = "confusion_matrices_locf.png", width = 6, height = 6)
```
The absolute-count confusion matrices show that most imputed values are
@@ -793,8 +856,9 @@ In the following, LOCF is not considered since a meaningful comparison
including LOCF is not possible on the entire set of test candidates,
because LOCF is not applicable in cases
where only GOSe values after 180 days post-injury are available.
The qualitative performance of the three imputation approaches in the complete
dataset was similar to their performance in the subset of data used for
comparison with LOCF.
@@ -804,7 +868,7 @@ plot_confusion_matrices(
c("MSM", "GP + cov", "MM"),
nrow = 1,
legendpos = "none",
scriptsize = 3
)
ggsave(filename = "confusion_matrices_all.pdf", width = 6, height = 6)
@@ -833,11 +897,7 @@ ggsave(filename = "imputation_error.png", width = 6, height = 3.5)
Handling missing data *post-hoc* to prevent biased analyses often requires
great effort.
It is thus of the utmost importance to implement measures for avoiding missing
data in the first place.
Nevertheless, in practice, missing values due to loss-to-follow-up will always
occur and should be addressed effectively.
There is a wide consensus that statistically sound imputation of missing values
@@ -937,7 +997,7 @@ ggsave(filename = "gose_marignal_per_fold.png", width = 6, height = 3)
## Comparison of LOCF and non-LOCF subgroups
```{r baseline-table-continuous2, echo=FALSE, results='asis'}
summarizer <- function(x) {
......
@@ -104,15 +104,15 @@
publisher={American Medical Association}
}
@article{maas2014,
title={Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury (CENTER-TBI): A Prospective Longitudinal Observational Study},
author={Maas, Andrew IR and Menon, David K and Steyerberg, Ewout W and Citerio, Giuseppe and Lecky, Fiona and Manley, Geoffrey T and Hill, Sean and Legrand, Valerie and Sorgner, Annina},
journal={Neurosurgery},
volume={76},
number={1},
pages={67--80},
year={2015},
publisher={Lippincott Williams and Wilkins}
}
@article{kurtzer2017singularity,
@@ -149,16 +149,6 @@
publisher={BMJ Publishing Group Ltd}
}
@book{verbeke2009linear,
title={Linear mixed models for longitudinal data},
@@ -248,7 +238,7 @@ year =
@article{steyerberg2008,
title={Predicting outcome after traumatic brain injury: development and international validation of prognostic scores based on admission characteristics},
author={Steyerberg, Ewout W and Mushkudiani, Nino and Perel, Pablo and Butcher, Isabella and Lu, Juan and McHugh, Gillian S and Murray, Gordon D and Marmarou, Anthony and Roberts, Ian and Habbema, J Dik F and others},
journal={PLoS medicine},
volume={5},
number={8},
@@ -288,17 +278,3 @@ title = {{brms: An R Package for Bayesian Multilevel Models Using Stan}},
volume = {80},
year = {2017}
}