Commit b0dd68da authored by Kevin Kunzmann

manuscript details, more compact figures

parent 32996c4e
The 3-month outcome is recognized as one approach to substituting for missing
and has been used in recent trials (@skolnick2014).
Although LOCF is easy to understand and implement, the technique is
suboptimal in several respects.
Firstly, it is biased in that it neglects potential time trends in
GOS(e) trajectories.
Secondly, a naive application of LOCF is also inefficient since it neglects
data observed shortly after the target time window.
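For concreteness, the LOCF rule can be sketched in a few lines (illustrative Python, not part of the analysis code; the helper name and the 180-day target are assumptions for the example):

```python
def locf_impute(times, values, target_day=180):
    """Last observation carried forward: return the GOSe value of the
    last assessment at or before target_day, or None if no such
    assessment exists (the case where LOCF is not applicable)."""
    last = None
    for t, v in zip(times, values):  # times assumed sorted ascending
        if t > target_day:
            break
        last = v
    return last

# Assessed at 14 and 95 days, then again at 370 days: LOCF uses the
# 95-day value and ignores the later observation entirely.
locf_impute([14, 95, 370], [3, 4, 6])  # -> 4
locf_impute([370], [6])                # -> None (no prior observation)
```

This also makes the two drawbacks concrete: the 370-day observation is discarded, and a subject with only post-target assessments receives no imputation at all.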
to further reduce bias introduced by the imputation method and that LOCF
cannot be used to obtain multiply imputed data sets by design.
The timing of outcome assessments for patients with TBI varies between
studies.
Some studies define very stringent time windows (e.g. +/- 2 weeks [TRACK-TBI]),
but this can lead to a substantial amount of missing data (@richter2019).
Consequently, other studies have defined more pragmatic protocol windows
(e.g. -1 month to +2 months, cf. @maas2014).
While the wider windows enable more complete data collection,
they suffer from the problem that outcomes may still be evolving over this period,
and an outcome assessment obtained at five months
(the beginning of this window) in one subject may not be strictly comparable
with outcomes obtained just before eight months (the end of the window) in
another subject.
Consequently, even where outcomes are available within pragmatic protocol
windows,
there may be a benefit from being able to impute an outcome more
n_pat <- nrow(df_baseline)
```
The CENTER-TBI project methods and design are described in detail
elsewhere (@maas2014).
Participants with TBI were recruited into three strata:
(a) patients attending the emergency room,
(b) patients admitted to hospital but not intensive care,
and (c) patients admitted to intensive care.
Follow-up of participants was scheduled per protocol at 2 weeks, 3 months,
and 6 months in group (a) and at 3 months, 6 months, and 12 months in groups
(b) and (c).
The protocol time window for the 6-month GOSe was between 5 and 8 months
post injury.
Outcome assessments at all timepoints included the GOSe
The GOSe is an eight-point scale with the following categories:
The study population for this empirical methods comparison are all
individuals from the CENTER-TBI database (total of n = 4509) whose GOSe
status was queried at least once within the first 18 months and who were
still alive 180 days post-injury (n = `r n_pat`).
The rationale for conducting the comparison conditional on 6-month survival
is simply that GOSe can only be missing at 6 months if the individual is
still alive, since GOSe would be 1 (dead) otherwise.
Data for the CENTER-TBI study have been collected through the Quesgen e-CRF
(Quesgen Systems Inc, USA),
hosted on the INCF platform and extracted via the INCF Neurobot tool (https://neurobot.incf.org/).
Release 1.1 of the database was used (cf. Appendix for details).
Basic summary statistics for population characteristics are listed in
Table 1.
GOSe observations at 180 +/- 14 days post injury were available and
GOSe observations within the per-protocol window of 5-8 months post injury.
The distribution of GOSe sampling times and both absolute and
relative frequencies of the respective GOSe categories are shown in
Figure 1.
True observation times were mapped to categories by rounding to the
closest time point, i.e.,
the 6 months category contains observations up to 9 months post-injury.
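This mapping is a simple nearest-neighbour rounding on the protocol visit grid; a sketch (illustrative Python; the day values for the scheduled visits are assumptions based on the protocol described above):

```python
# Scheduled follow-up visits in days post-injury (assumed values:
# 2 weeks, 3 months, 6 months, 12 months).
SCHEDULED_DAYS = [14, 90, 180, 360]

def nearest_visit(day):
    """Assign an observation time to the closest scheduled visit."""
    return min(SCHEDULED_DAYS, key=lambda visit: abs(visit - day))

nearest_visit(265)  # -> 180: ~9 months still maps to the 6-month category
```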
model (MM),
a Gaussian process regression (GP), and a multi-state model (MSM).
For all model-based approaches we additionally explored variants
including the key IMPACT [@steyerberg2008] predictors as covariates.
These are age, GCS motor score, pupil reactivity (0, 1, 2), hypoxia,
hypotension, Marshall CT classification, traumatic subarachnoid hemorrhage,
epidural hematoma, glucose, and Hb.
in a Bayesian non-parametric way [@rasmussen2006].
Both the mixed effects model as well as the Gaussian process regression
model are non-linear regression techniques for longitudinal
data.
While these are powerful tools to model longitudinal trajectories,
they do not explicitly model the probability of transitions between
GOSe states.
Since the number of observations per individual is limited in our
data set (1 to 4 GOSe observations per individual),
an approach explicitly modelling transition probabilities might be
more suitable to capture the dynamics of the GOSe trajectories.
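To illustrate why explicit transition modelling is attractive, consider a toy discrete-time Markov chain (the MSM used here is a continuous-time model with transition intensities; the states and probabilities below are purely illustrative, not fitted values):

```python
# Toy 3-state, discrete-time analogue of the multi-state idea.
# Rows: current state, columns: next state, per 3-month step.
P = [
    [0.7, 0.2, 0.1],   # from state 0
    [0.1, 0.6, 0.3],   # from state 1
    [0.0, 0.1, 0.9],   # from state 2
]

def step(dist, P):
    """Propagate a state distribution one transition step forward."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

dist_3m = [1.0, 0.0, 0.0]   # observed in state 0 at 3 months
dist_6m = step(dist_3m, P)  # implied probabilistic imputation at 6 months
```

Even a single earlier observation thus induces a full predictive distribution over later states, which is exactly the kind of probabilistic output the model-based imputation needs.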
was considered (@meira2009).
All models were fitted using either none or all IMPACT predictors, except for the
MSM model which only used age due to issues with numerical stability.
Further details on the respective implementations are given in the Appendix.
All models were fit on the entire available data after removing the
180 +/- 14 days post-injury observation from the respective test fold
thus mimicking a missing-completely-at-random (MCAR) missing-data mechanism.
The distribution of GOSe values in the respective three test sets is well
balanced (cf. Appendix, Figure A.1).
Performance was assessed using the absolute-count and the normalized
(proportions) confusion matrices as well as
bias, mean absolute error (MAE), and root mean squared error (RMSE).
MAE and RMSE are both measures of average precision, where
RMSE puts more weight on large deviations as compared to MAE.
Comparisons in terms of bias, MAE, and RMSE tacitly assume that
GOSe values can be sensibly interpreted on an interval scale.
We therefore also considered directional bias (bias'),
defined as the model-fitted
probability of exceeding the true value minus the model-fitted probability of
undershooting the true GOSe ($Pr[imputed > observed] - Pr[imputed < observed]$) as an
alternative measure of bias which does not require this assumption.
Note that the scale of the directional bias is not directly comparable to
that of the other three quantities.
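As a sketch of how the four measures relate (plain Python with hypothetical helper names; bias' operates on fitted category probabilities rather than point imputations):

```python
import math

def bias(imputed, observed):
    """Mean signed difference (negative: imputations too low)."""
    return sum(i - o for i, o in zip(imputed, observed)) / len(observed)

def mae(imputed, observed):
    return sum(abs(i - o) for i, o in zip(imputed, observed)) / len(observed)

def rmse(imputed, observed):
    return math.sqrt(
        sum((i - o) ** 2 for i, o in zip(imputed, observed)) / len(observed)
    )

def directional_bias(probs, observed):
    """bias': Pr[imputed > observed] - Pr[imputed < observed], averaged
    over subjects; probs[k] maps GOSe category -> fitted probability."""
    over = sum(p for pk, o in zip(probs, observed)
               for g, p in pk.items() if g > o)
    under = sum(p for pk, o in zip(probs, observed)
                for g, p in pk.items() if g < o)
    return (over - under) / len(observed)

# Errors of -1 and +1 cancel in the bias but not in MAE/RMSE:
bias([3, 5, 8], [4, 5, 7])  # -> 0.0
mae([3, 5, 8], [4, 5, 7])   # -> 2/3
```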
idx <- df_predictions %>%
filter(model == "LOCF", !complete.cases(.)) %>%
.[["gupi"]]
```
LOCF, by design, cannot provide imputed values when there are no
observations before 180 days post injury.
A valid comparison of LOCF with the other methods must therefore be
approach was similar to the overall dataset (cf. Appendix, Table ???).
# Results
The overall performance of all fitted models in terms of bias, bias', MAE, and RMSE
is depicted in Figure 2 both conditional on LOCF being applicable (gray) and,
excluding LOCF, on the entire test set (black).
Values are reported as mean over the three cross-validation folds and
error bars indicate +/- 1.96 standard errors.
```{r overall-comparison-all-methods, echo=FALSE, fig.height=3.5, fig.width=6, warning=FALSE}
compute_summary_measures <- function(df) {
```

Firstly, LOCF is overall negatively biased, i.e.,
on average it imputes lower-than-observed GOSe values.
This reflects a population average trend towards continued
recovery within the first 6 months post injury.
The fact that both ways of measuring bias qualitatively agree
indicates that the interpretation of GOSe as an interval measure which
tacitly underlies Bias, MAE, and RMSE comparisons is not too restrictive.
In terms of accuracy, LOCF does perform worst but differences between
baseline covariates.
We first consider results for the set of test cases which allow LOCF imputation
(n = `r df_predictions %>% filter(model == "LOCF") %>% nrow - length(idx)`).
Both the raw count as well as the relative (by left-out observed GOSe) confusion matrices
are presented in Figure 3.
The GOSe scale is restricted to 3+ since imputation is conditional on
an observed GOSe larger than 1 (deaths are known, so no imputation is necessary)
and no GOSe of 2 was observed.
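The relative (row-normalized) matrices are obtained by dividing each row of the raw counts by its row sum; a generic sketch (illustrative Python, not the manuscript's R plotting code):

```python
def normalize_rows(counts):
    """Turn an absolute-count confusion matrix (rows: observed GOSe,
    columns: imputed GOSe) into row-wise proportions."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized

normalize_rows([[8, 2], [1, 3]])  # -> [[0.8, 0.2], [0.25, 0.75]]
```

Row-normalization is what makes performance comparable across GOSe categories despite the category imbalance in the study population.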
```{r confusion-matrix-locf, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices on LOCF subset.", fig.height=6, fig.width=6}
ggsave(filename = "confusion_matrices_locf.png", width = 6, height = 6)
```
The absolute-count confusion matrices show that most imputed values are
within +/- one GOSe category of the observed ones.
However, they also reflect the category imbalance (cf. Figure 1) in the study population.
The performance conditional on the (in practice unknown) observed GOSe value
clearly shows that imputation for the most infrequent category 4
is hardest.
This is, however, true across the range of methods considered.
Both the MSM and the MM models account for this by almost never imputing a
GOSe of 4.
Instead, the respective cases tend to be imputed to GOSe 3 or 5.
To better understand the overall performance assessment in Figure 2,
we also consider the performance conditional on the respective ground-truth
(i.e. the observed GOSe values in the test sets).
The results are shown in Figure 4 (vertical bars are +/- 1.96 standard errors of the mean).
```{r error-scores-locf, echo=FALSE, fig.height=3.5, fig.width=6}
plot_summary_measures_cond <- function(df_predictions, models, label) {
ggplot(aes(GOSE, color = model)) +
geom_hline(yintercept = 0, color = "black") +
geom_line(aes(y = mean)) +
geom_errorbar(aes(ymin = mean - 1.96*se, ymax = mean + 1.96*se),
width = .2,
position = position_dodge(.33),
size = 1
) +
xlab("observed GOSe") +
facet_wrap(~error, nrow = 1) +
scale_y_continuous(name = "", breaks = seq(-2, 8, .5)) +
theme_bw() +
theme(
panel.grid.minor = element_blank(),
clearly dominates with respect to bias.
Note that irrespective of the exact definition of bias used, MSM dominates the other
model-based approaches.
Comparing LOCF and MSM, there is a slight advantage of MSM in terms of accuracy for
the majority classes 3, 7, 8, which explains the overall difference shown in Figure 2.
With respect to bias, MSM also performs better than LOCF for the most frequently
observed categories, but the extent of this improvement depends on the performance measure.
great effort.
It is thus of the utmost importance to implement measures for avoiding missing
data in the first place.
Nevertheless, in practice, missing values due to loss-to-follow-up will always
occur and should be addressed effectively.
There is a wide consensus that statistically sound imputation of missing values
is beneficial for both the reduction of bias and for increasing statistical power.
The current gold-standard for imputing missing values is multiple
......@@ -941,17 +949,17 @@ the respective target population.
Albeit simple to implement, LOCF - by definition - is not capable of
exploiting longitudinal information after the target time point.
This results in a smaller subset of individuals for which imputed values can
be provided in the first place.
LOCF also lacks flexibility to adjust for further covariates which might be
necessary in some cases to further reduce bias under a missing at random assumption.
Finally, LOCF cannot produce an adequate measure of imputation uncertainty
since it is not model based.
We draw two main conclusions from our comparison of three
alternative, model-based approaches.
Firstly, despite its theoretical drawbacks, LOCF is hard to beat in terms of
accuracy.
Still, small improvements are possible.
The main advantages of a model-based approach are thus a reduction of bias,
the ability to provide a measure of uncertainty together with the imputed
values
as well as the possibility of including further analysis-specific covariates.
Secondly, we found that the inclusion of established baseline predictors for
GOSe at 6 months post-injury had little effect on the imputation quality.
Note that this does not refute their predictive value but only indicates that
there is little marginal benefit over knowing at least one other value.
Differences between the model-based approaches tend to be rather nuanced.
We nevertheless favor the multi-state model (MSM).
It is well-interpretable in terms of transition intensities.
able to provide imputed values for the entire population and is able to
provide a probabilistic output.
## Funding sources statement
Data used in preparation of this manuscript were obtained in the context of
CENTER-TBI, a large collaborative project with the support of the European
Union 7th Framework program (EC grant 602150).
Additional funding was obtained from the Hannelore Kohl Stiftung (Germany),
from OneMind (USA) and from Integra LifeSciences Corporation (USA).
# Appendix / Supplemental Material
## Ethical approval statement
The CENTER-TBI study (EC grant 602150) has been conducted in accordance with all relevant laws of the EU if directly applicable or of direct effect and all relevant laws of the country where the Recruiting sites were located, including but not limited to, the relevant privacy and data protection laws and regulations (the “Privacy Law”), the relevant laws and regulations on the use of human materials, and all relevant guidance relating to clinical studies from time to time in force including, but not limited to, the ICH Harmonised Tripartite Guideline for Good Clinical Practice (CPMP/ICH/135/95) (“ICH GCP”) and the World Medical Association Declaration of Helsinki entitled “Ethical Principles for Medical Research Involving Human Subjects”.
Informed Consent by the patients and/or the legal representative/next of kin was obtained, according to the local legislation, for all patients recruited in the Core Dataset of CENTER-TBI and documented in the e-CRF.
Ethical approval was obtained for each recruiting site.
The list of sites, Ethical Committees, approval numbers and approval dates can be found on the website: https://www.center-tbi.eu/project/ethical-approval.
## Distribution of GOSe in validation folds
https://center-tbi.incf.org/ are provided.
The analysis is completely automated using the workflow management tool 'snakemake'
[@koster2012snakemake], and a singularity [@kurtzer2017singularity] container image
containing all required dependencies is publicly available from zenodo.org
(DOI: 10.5281/zenodo.2600385).
Detailed step-by-step instructions on how to reproduce the analysis are provided
in the README.md file of the GitLab repository.
year = {2011}
@article{R2016,
author = {R Core Team},
doi = {10.1007/978-3-540-74686-7},
isbn = {3{\_}900051{\_}00{\_}3},
title = {{R: A Language and Environment for Statistical Computing}},
year = {2016}
}
@book{rasmussen2006,
abstract = {Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible non-parametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations.13,78,31 The mathematical literature on GPs is large and often uses deep concepts which are not required to fully understand most machine learning applications. In this tutorial paper, we aim to present characteristics of GPs relevant to machine learning and to show up precise connections to other "kernel machines" popular in the community. Our focus is on a simple presentation, but references to more detailed sources are provided.},