@@ -844,7 +852,7 @@ clearly dominates with respect to bias.
Note that irrespective of the exact definition of bias used, MSM dominates the other
model-based approaches.
Comparing LOCF and MSM, there is a slight advantage of MSM in terms of accuracy for
the majority classes 3, 7, 8 which explain the overall difference shown in Figure ???.
the majority classes 3, 7, and 8, which explains the overall difference shown in Figure 2.
With respect to bias, MSM also performs better than LOCF for the most frequently
observed categories, but the extent of this improvement depends on the performance measure.
...
...
@@ -899,7 +907,7 @@ great effort.
It is thus of the utmost importance to implement measures for avoiding missing
data in the first place.
Nevertheless, in practice, missing values due to loss-to-follow-up will always
occur and should be addressed effectively
occur and should be addressed effectively.
There is a wide consensus that statistically sound imputation of missing values
is beneficial for both the reduction of bias and for increasing statistical power.
The current gold-standard for imputing missing values is multiple
...
...
@@ -941,17 +949,17 @@ the respective target population.
Albeit simple to implement, LOCF - by definition - is not capable of
exploiting longitudinal information after the target time point.
This results in a smaller subset of individuals for which imputed values can
be provided in the first place (cf. Section ???).
be provided in the first place.
LOCF also lacks flexibility to adjust for further covariates which might be
necessary in some cases to further reduce bias.
necessary in some cases to further reduce bias under a missing at random assumption.
Finally, LOCF cannot produce an adequate measure of imputation uncertainty
since it is not model based.
We draw two main conclusion from our comparison of three
We draw two main conclusions from our comparison of three
alternative, model-based approaches.
Firstly, despite its theoretical drawbacks, LOCF is hard to beat in terms of
accuracy.
Still, small improvements are possible (cf. Section ???).
Still, small improvements are possible.
The main advantages of a model-based approach are thus a reduction of bias,
the ability to provide a measure of uncertainty together with the imputed
values
...
...
@@ -960,7 +968,7 @@ as well as the possibility of including further analysis-specific covariates.
Secondly, we found that the inclusion of established baseline predictors for
GOSe at 6 months post-injury had little effect on the imputation quality.
Note that this does not refute their predictive value but only indicates that
there is little marginal benefit over knowing at least one other GOSe value.
there is little marginal benefit over knowing at least one other value.
Differences between the model-based approaches tend to be rather nuanced.
We nevertheless favor the multi-state model (MSM).
It is well-interpretable in terms of transition intensities.
...
...
@@ -971,11 +979,25 @@ able to provide imputed values for the entire population and is able to
provide a probabilistic output.
## Funding sources statement:
Data used in preparation of this manuscript were obtained in the context of
CENTER-TBI, a large collaborative project with the support of the European
Union 7th Framework program (EC grant 602150).
Additional funding was obtained from the Hannelore Kohl Stiftung (Germany),
from OneMind (USA) and from Integra LifeSciences Corporation (USA).
# Appendix / Supplemental Material
## Ethical approval statement
The CENTER-TBI study (EC grant 602150) has been conducted in accordance with all relevant laws of the EU if directly applicable or of direct effect and all relevant laws of the country where the Recruiting sites were located, including but not limited to, the relevant privacy and data protection laws and regulations (the “Privacy Law”), the relevant laws and regulations on the use of human materials, and all relevant guidance relating to clinical studies from time to time in force including, but not limited to, the ICH Harmonised Tripartite Guideline for Good Clinical Practice (CPMP/ICH/135/95) (“ICH GCP”) and the World Medical Association Declaration of Helsinki entitled “Ethical Principles for Medical Research Involving Human Subjects”.
Informed Consent by the patients and/or the legal representative/next of kin was obtained, according to the local legislation, for all patients recruited in the Core Dataset of CENTER-TBI and documented in the e-CRF.
Ethical approval was obtained for each recruiting site.
The list of sites, Ethical Committees, approval numbers and approval dates can be found on the website: https://www.center-tbi.eu/project/ethical-approval.
## Distribution of GOSe in validation folds
...
...
@@ -1235,7 +1257,7 @@ https://center-tbi.incf.org/ are provided.
The analysis is completely automated using the workflow management tool 'snakemake'
[@koster2012snakemake] and a singularity [@kurtzer2017singularity] container image
containing all required dependencies is publicly available from zenodo.org
at https://zenodo.org/record/2600385#.XJzZwEOnw5k (DOI: 10.5281/zenodo.2600385).
(DOI: 10.5281/zenodo.2600385).
Detailed step-by-step instructions on how to reproduce the analysis are provided
author = {Team, R Development Core and {R Development Core Team}, R},
author = {R Core Team},
doi = {10.1007/978-3-540-74686-7},
eprint = {arXiv:1011.1669v3},
isbn = {3{\_}900051{\_}00{\_}3},
...
...
@@ -206,6 +206,7 @@ pmid = {16106260},
title = {{R: A Language and Environment for Statistical Computing}},
year = {2016}
}
@book{rasmussen2006,
abstract = {Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible non-parametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations. The mathematical literature on GPs is large and often uses deep concepts which are not required to fully understand most machine learning applications. In this tutorial paper, we aim to present characteristics of GPs relevant to machine learning and to show up precise connections to other "kernel machines" popular in the community. Our focus is on a simple presentation, but references to more detailed sources are provided.},