Commit 38745375 authored by Kevin

added references + additional descriptive covariates

parent 87998fa0
......@@ -2,10 +2,10 @@
title: "Model-based longitudinal imputation of cross-sectional outcomes in traumatic brain injury"
output:
word_document: default
pdf_document: default
pdf_document: default
html_document:
keep_md: yes
bibliography: "references.bib"
bibliography: "references.bib"
params:
data_dir: "../output/v1.1/data"
config_file: "../config.yml"
......@@ -24,46 +24,25 @@ config <- yaml::read_yaml(params$config_file)
Assessments of global functional outcome such as the Glasgow Outcome Scale (GOS)
and the Glasgow Outcome Scale extended (GOSe) capture
meaningful differences across the full spectrum of recovery,
<<<<<<< HEAD
and have popularity as endpoints in TBI studies @horton2018randomized.
and have popularity as endpoints in traumatic brain injury (@horton2018).
However, missing outcome data is a common problem in TBI research,
and for longitudinal studies completion rates at six months can be
lower than 70% @richter2019handling.
=======
and have popularity as endpoints in traumatic brain injury (TBI0 studies
(Horton et al 2018).
However, missing outcome data is a common problem in TBI research,
and for longitudinal studies completion rates at six months can be
lower than 70% (Richter et al).
lower than 70% (@richter2019).
This is important since it is well known that complete-case analyses may
introduce bias or reduce power [@white2010bias].
introduce bias or reduce power (@white2010).
>>>>>>> 943d2c27a63fd82c265933c46a0d6ab674191f03
Imputation of patient outcomes is gradually gaining acceptance in the TBI
field as a method of dealing with missing data.
Recent longitudinal studies have successfully employed techniques for both
<<<<<<< HEAD
single @clifton2011 @silver2006 @skolnick2014 and
multiple imputation @bulger2010 @kirkness2006 @wright2014 @robertson2014.
The advantage of multiple imputation over single imputation clearly lies in the
fact that the uncertainty about imputed values can be properly accounted for
in a subsequent statistical analysis.
Despite these considerations, last observation carried forward (LOCF) is still
a widely applied method for dealing with missing data in TBI research,
e.g. simply substituting the respective 3-months outcome for a missing 6-months data point [REFERENCE].
Although LOCF is simple to understand and implement, the technique clearly
lacks in several respects.
=======
single (Clifton et al., 2011; Silver et al 2006; Skolnick et al 2014) and
multiple imputation (Bulger et al, 2010, Kirkness et al 2006;
Wright et al, 2014; Robertson et al, 2014).
single (@clifton2011; @silver2006; @skolnick2014) and
multiple imputation (@bulger2010; @kirkness2006; @wright2014; @robertson2014).
Last observation carried forward (LOCF) is a widely applied single-imputation
method for dealing with missing data in TBI research.
Typically, 3-month outcome is substituted for missing 6-month data [REFERENCE].
Although LOCF is easz to understand and implement, the technique is clearly
Typically, 3-month outcome is substituted for missing 6-month data (@steyerberg2008).
Although LOCF is easy to understand and implement, the technique is clearly
lacking in several respects.
>>>>>>> 943d2c27a63fd82c265933c46a0d6ab674191f03
Firstly, it is biased in that it neglects potential trends in the GOS(e) trajectories.
Firstly, it is biased in that it neglects potential trends in the GOS(e)
trajectories.
Secondly, a naive application of LOCF is also inefficient since it neglects
data observed briefly after the target time window.
E.g., a GOS(e) value recorded at 200 days post-injury is likely to be more
......@@ -77,7 +56,7 @@ cannot be used to obtain multiply imputed data sets by design.
In this manuscript, three model-based imputation strategies for GOSe at
6 months (= 180 days) post-injury in the longitudinal CENTER-TBI study
[@center2015collaborative] are compared with LOCF.
@center2015collaborative are compared with LOCF.
While we acknowledge the principal superiority of multiple imputation over
single imputation in propagating imputation uncertainty,
we focus on single-imputation performance for primarily practical reasons.
......@@ -98,13 +77,6 @@ for imputing cross-sectional GOSe at 6 months exploiting the longitudinal
GOSe measurements.
Each model is fit in a version with and without baseline covariates.
<<<<<<< HEAD
We propose three different model-based approaches - a mixed-effects model,
a Gaussian process regression, and a multi-state model - for imputing GOSe
longitudinally each of which we fit in a version including baseline covariates
and without.
=======
>>>>>>> 943d2c27a63fd82c265933c46a0d6ab674191f03
# Methods
......@@ -112,7 +84,49 @@ and without.
## Study population
```{r read-population-data, include=FALSE}
df_baseline <- read_rds(paste0(params$data_dir, "/df_baseline.rds"))
df_baseline <- read_rds(paste0(params$data_dir, "/df_baseline.rds")) %>%
mutate(
InjuryHx.PupilsBaselineDerived = factor(InjuryHx.PupilsBaselineDerived,
levels = 0:2),
InjuryHx.GCSScoreBaselineDerived_dscr = case_when(
InjuryHx.GCSScoreBaselineDerived <= 8 ~ "Severe",
InjuryHx.GCSScoreBaselineDerived <= 12 ~ "Moderate",
InjuryHx.GCSScoreBaselineDerived > 12 ~ "Mild"
) %>% as.factor()
) %>%
left_join(
read_rds(
"../data/v1.1/df_baseline_descriptive.rds"
) %>%
transmute(
gupi,
Subject.PatientType = factor(
Subject.PatientType,
level = 1:3,
labels = c("Emergency Room", "Admission to Hospital", "Intensive Care Unit")
),
InjuryHx.InjCause = factor(InjuryHx.InjCause,
level = c(1:6, 88, 99),
labels = c(
"Road traffic incident",
"Incidental fall",
"Other non-intentional injury",
"Violence/assault",
"Act of mass violence",
"Suicide attempt",
"Unknown",
"Other"
) %>%
fct_recode(
Other = "Other non-intentional injury",
Other = "Suicide attempt",
`Violence/assault` = "Act of mass violence"
)
),
InjuryHx.TotalISS
),
by = "gupi"
)
n_pat <- nrow(df_baseline)
```
......@@ -127,7 +141,7 @@ Follow-up of participants was scheduled per protocol at 2 weeks, 3 months,
and 6 months in group (a) and at 3 months, 6 months, and 12 months in groups
(b) and (c).
Outcome assessments at all timepoints included the GOSe
[@jennett1981disability, @mcmillan2016glasgow].
(@jennett1981disability, @mcmillan2016glasgow).
The GOSe is an eight-point scale with the following categories:
(1) dead, (2) vegetative state, (3) lower severe disability,
(4) upper severe disability, (5) lower moderate disability,
......@@ -142,13 +156,6 @@ The rationale for conducting the comparison conditional on 6-months survival
is simply that the GOSe can only be missing at 6-months if the individuals
are still alive since GOSe would be (1) otherwise.
**TODO:**
* ->LW: I explicitly asked for additional covariates that are needed from the DB,
happy to include them if you pass me the Neurobot codes!
* yes, the entire document is automatically generated to ensure reproducibility and
up-to-date data; once we agree on the manuscript we can pimp the table formatting and
add cross-references to the respective figures (that is only supported for pdf output which you could not edit directly)
```{r baseline-table-continuous, echo=FALSE, results='asis'}
......@@ -171,7 +178,7 @@ df_baseline %>%
unnest() %>%
spread(statistic, value) %>%
unnest() %>%
pander::pandoc.table("Discrete baseline variables", digits = 3)
pander::pandoc.table("Discrete baseline variables", digits = 3, split.tables = 120)
```
```{r baseline-table-discrete, echo=FALSE, results='asis'}
......@@ -195,6 +202,8 @@ summarizer <- function(x) {
df_baseline %>%
select_if(~!is.numeric(.)) %>%
select(-gupi) %>%
mutate_all(as.factor) %>%
mutate_all(fct_explicit_na) %>%
mutate_all(as.character) %>%
gather(variable, value) %>%
group_by(variable) %>%
......@@ -204,8 +213,9 @@ df_baseline %>%
unnest() %>%
spread(statistic, value) %>%
unnest(`# NA`) %>%
select(-`# NA`) %>%
unnest(table) %>%
pander::pandoc.table("Continuous baseline variables", digits = 3)
pander::pandoc.table("Continuous baseline variables", digits = 3, split.tables = 120)
```
```{r read-gose-data, include=FALSE}
......@@ -225,7 +235,7 @@ n_gose_pp <- df_gose %>%
Only GOSe observations between injury and 18 months post injury are used
since extremely late follow-ups are not providing enough information.
This leads to a total of `r nrow(df_gose)` GOSe observations of the study
population being availabe for the analyses in this manuscript.
population being available for the analyses in this manuscript.
Only for `r n_gose_180` (`r round(n_gose_180 / n_pat * 100, 1)`%) individuals,
GOSe observations at 180 +/- 14 days post injury are available,
for `r n_gose_pp` (`r round(n_gose_pp / n_pat * 100, 1)`%) individuals
......@@ -344,25 +354,26 @@ GOSe value for subjects where at least one value is available within the
first 180 days post injury.
We account for this lack of complete coverage under LOCF by performing all
performance comparisons including LOCF only on the subset of individuals
for which a LOCF-imputed value can be obtained (cf. Section ???).
for which a LOCF-imputed value can be obtained.
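
The LOCF rule described above can be sketched in a few lines of base R.
This is only an illustration under stated assumptions: the helper name `locf_180`, its arguments, and the choice to include day 180 itself in the eligible window are hypothetical and not taken from the analysis code.

```r
# Hypothetical helper: LOCF imputation of GOSe at 180 days post-injury.
# `days` and `gose` are parallel vectors holding one patient's observations.
locf_180 <- function(days, gose, target = 180) {
  eligible <- days <= target  # assumption: day 180 itself counts as eligible
  if (!any(eligible)) {
    return(NA_integer_)  # no observation up to the target: LOCF is undefined
  }
  # index of the latest eligible observation
  last <- which(eligible)[which.max(days[eligible])]
  as.integer(gose[last])
}

locf_180(days = c(14, 95), gose = c(3, 5))    # carries the 95-day value forward
locf_180(days = c(200, 370), gose = c(6, 7))  # NA: only late observations exist
```

The second call shows why LOCF lacks complete coverage: a patient with observations only after the target day receives no imputed value.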
## Mixed-effects model
Mixed effects models are a a widely used approach in longitudinal
data analysis andd model individual deviations from a population mean trajectory
[@verbeke2009linear].
Mixed effects models are a widely used approach in longitudinal
data analysis and model individual deviations from a population mean trajectory
(@verbeke2009linear).
To account for the fact that the GOSe outcome is an ordered factor,
we employ a cumulative link function model with flexible intercepts [@Agresti2003].
The population mean is modeled as spline function with knots at ???
to allow a non-linear population mean trajectory.
we employ a cumulative link function model with flexible intercepts
(@Agresti2003).
The population mean is modeled as cubic spline function to allow a
non-linear population mean trajectory.
Patient-individual deviations from this population mean are modeled
as quadratic polynomials to allow sufficient flexibility (random effects).
Baseline covariates are added as linear fixed effects to the
population mean.
The model was fitted using Bayesian statistics via the BRMS
package [@brms2017, @brms2018] for the R environment for statistical
computing [@R2016] and the Stan modelling language [@stan2017].
package (@brms2017; @brms2018) for the R environment for statistical
computing (@R2016) and the Stan modelling language (@stan2017).
Non-informative priors were used for the model parameters.
A potential drawback of the proposed longitudinal mixed effects model
is the fact that the individual deviations from the population mean
......@@ -423,7 +434,7 @@ Since the number of observations per individual is very limited in our
data set (1 to 4 GOSe observations per individual),
an approach explicitly modelling transition probabilities might be
more suitable to capture the dynamics of the GOSe trajectories.
To explore this further, a Markov multi-state model is considered [REFERENCE].
To explore this further, a Markov multi-state model is considered (@meira2009).
This model class assumes that the transitions between adjacent GOSe
states can be modeled as a Markov process and the transition
intensities between adjacent states are fitted to the observed data.
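
As a sketch of what this assumption implies (assuming, for illustration only, a time-homogeneous process, which the text does not state explicitly), let $q_{rs}$ denote the transition intensity from GOSe state $r$ to state $s$.
Restricting transitions to adjacent states gives a tridiagonal intensity matrix, and the transition probabilities over a time span $t$ follow from the matrix exponential:

$$
Q = (q_{rs})_{r,s = 1}^{8}, \qquad
q_{rs} = 0 \ \text{for} \ |r - s| > 1, \qquad
q_{rr} = -\sum_{s \neq r} q_{rs}, \qquad
P(t) = \exp(t\,Q)
$$

Here the entry $P(t)_{rs}$ is the probability of being in state $s$ after a time span $t$ given state $r$ at its start, so non-adjacent states can still be reached over a finite interval through a sequence of adjacent transitions.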
......@@ -586,7 +597,7 @@ All measures are considered both conditional on the ground-truth
LOCF, by design, cannot provide imputed values when there are no
observations before 180 days post injury.
A valid comparison of LOCF with the other methods must therefore be
baseed on the set of individuals for whom an LOCF imputation is possible.
based on the set of individuals for whom an LOCF imputation is possible.
```{r non-locf-ids, include=FALSE}
idx <- df_predictions %>%
filter(model == "LOCF", !complete.cases(.)) %>%
......@@ -710,7 +721,7 @@ Both the raw count as well as the relative (by left-out observed GOSe) confusion
are presented in Figure ???.
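
To make the two views concrete, the following base R sketch builds a raw and a column-normalized confusion matrix.
The data are made up for illustration, and the object names `cm_raw` and `cm_col` are hypothetical; the actual analysis uses `caret::confusionMatrix` averaged across folds.

```r
# Toy data (made up for illustration), GOSe restricted to levels 1:8.
observed  <- factor(c(3, 3, 5, 7, 7, 8), levels = 1:8)
predicted <- factor(c(3, 5, 5, 7, 8, 8), levels = 1:8)

# Raw counts: rows are predicted, columns are observed GOSe.
cm_raw <- table(Predicted = predicted, Observed = observed)

# Column fractions: each column is divided by the number of cases with that
# observed GOSe, so entries estimate P(predicted | observed).
# Columns without any observed cases become NaN (0 / 0).
cm_col <- prop.table(cm_raw, margin = 2)

cm_col["5", "3"]  # share of observed GOSe 3 that was imputed as GOSe 5
```

The column-normalized view is the one that allows comparisons between rare and frequent GOSe categories, since raw counts are dominated by the majority classes.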
```{r confusion-matrix-locf, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices on LOCF subset.", fig.height=9, fig.width=6}
plot_confusion_matrices <- function(df_predictions, models) {
plot_confusion_matrices <- function(df_predictions, models, nrow = 2) {
df_average_confusion_matrices <- df_predictions %>%
filter(model %in% models) %>%
......@@ -748,7 +759,7 @@ plot_confusion_matrices <- function(df_predictions, models) {
theme(
panel.grid = element_blank()
) +
facet_wrap(~model, nrow = 2) +
facet_wrap(~model, nrow = nrow) +
ggtitle("Average confusion matrix across folds (absolute counts)")
p_cnf_mtrx_colnrm <- df_average_confusion_matrices %>%
......@@ -768,7 +779,7 @@ plot_confusion_matrices <- function(df_predictions, models) {
theme(
panel.grid = element_blank()
) +
facet_wrap(~model, nrow = 2) +
facet_wrap(~model, nrow = nrow) +
ggtitle("Average confusion matrix across folds (column fraction)")
cowplot::plot_grid(p_cnf_mtrx_raw, p_cnf_mtrx_colnrm, ncol = 1, align = "v")
......@@ -777,7 +788,8 @@ plot_confusion_matrices <- function(df_predictions, models) {
plot_confusion_matrices(
df_predictions %>% filter(!(gupi %in% idx)),
c("MSM", "GP + cov", "MM", "LOCF")
c("MSM", "GP + cov", "MM", "LOCF"),
nrow = 2
)
ggsave(filename = "confusion_matrices_locf.pdf", width = 6, height = 9)
......@@ -796,77 +808,12 @@ Both the MSM and the MM models account for this by almost never imputing a
GOSe of 4.
Instead, the respective cases tend to be imputed to GOSe 3 or 5.
**TODO:**
* this section table is the one we David requested in our last meeting,
not entirely convinced though ...
* ... 1 -> > 1 is not relevant since our imputation is conditional
on not being 1 at 6 months
* ... the comparison seems to favor LOCF since only
upward confusions are considered (which LOCF by design tends to do less)]
* Is there a clinical interpretation along the way that '4' might constitue
a short-term transition state or is it just defined in a way that makes it
highly unlikely to be observed in practice?
```{r crossing-table, echo=FALSE, warning=FALSE, results='asis'}
models <- c("MSM", "GP + cov", "MM")
df_average_confusion_matrices <- df_predictions %>%
filter(model %in% models) %>%
filter(!(gupi %in% idx)) %>%
group_by(fold, model) %>%
do(
confusion_matrix = caret::confusionMatrix(
data = factor(.$prediction, levels = 1:8),
reference = factor(.$GOSE, levels = 1:8)
) %>%
as.matrix %>% as_tibble %>%
mutate(`Predicted GOSE` = row_number() %>% as.character) %>%
gather(`Observed GOSE`, n, 1:8)
) %>%
unnest %>%
group_by(model, `Predicted GOSE`, `Observed GOSE`) %>%
summarize(n = mean(n)) %>%
ungroup %>%
mutate(model = factor(model, models))
rbind(
df_average_confusion_matrices %>%
filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>%
group_by(model) %>%
filter(`Observed GOSE` <= 3) %>%
mutate(n_total = sum(n)) %>%
filter(`Predicted GOSE` > 3) %>%
summarize(fraction = sum(n / n_total)) %>%
mutate(`Event` = "<=3 -> >3"),
df_average_confusion_matrices %>%
filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>%
group_by(model) %>%
filter(`Observed GOSE` == 4) %>%
mutate(n_total = sum(n)) %>%
filter(`Predicted GOSE` > 4) %>%
summarize(fraction = sum(n / n_total)) %>%
mutate(`Event` = "4 -> >4"),
df_average_confusion_matrices %>%
filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>%
group_by(model) %>%
filter(`Observed GOSE` < 8) %>%
mutate(n_total = sum(n)) %>%
filter(`Predicted GOSE` == 8) %>%
summarize(fraction = sum(n / n_total)) %>%
mutate(`Event` = "<8 -> 8")
) %>%
transmute(Model = model, Percent = 100*fraction, Event) %>%
spread(Event, Percent) %>%
pander::pandoc.table("Some specific confusion percentages, LOCF subset.", digits = 3)
```
To better understand the overall performance assessment in Figure ???,
we also consider the performance conditional on the respective ground-truth
(i.e. the observed GOSe values in the test sets).
The results are shown in Figure ??? (vertical bars are =/- one standard error of the mean).
The results are shown in Figure ??? (vertical bars are +/- one standard error of the mean).
```{r error-scores-locf, echo=FALSE, fig.height=5, fig.width=9}
```{r error-scores-locf, echo=FALSE, fig.height=4, fig.width=9}
plot_summary_measures_cond <- function(df_predictions, models, label) {
df_predictions %>%
......@@ -950,10 +897,10 @@ positive and negative biases conditional on low/high GOSe values canceling out
in the overall population.
The MSM and MM models are fairly similar with respect to accuracy but MSM
clearly dominates with respect to bias.
Note that irrespective of the exact definition of bias used, MSM ominates the other
Note that irrespective of the exact definition of bias used, MSM dominates the other
model-based approaches.
Comparing LOCF and MSM, there is a slight advantage of MSM in terms of accuracy for
the majority classes 3, 7, 8 which explain the overall difference shwon in Figure ???.
the majority classes 3, 7, 8 which explain the overall difference shown in Figure ???.
With respect to bias, MSM also performs better than LOCF for the most frequently
observed categories, but the extent of this improvement depends on the performance measure.
......@@ -968,79 +915,30 @@ where only GOSe values after 180 days post-injury are available.
The relative characteristics of the three considered approaches are comparable
to the LOCF subset.
**TODO**
* decide whether figures go in appendix - David and I agree on them being actually the
primary analysis. we just needto convince people of the fact that LOCF should be dropped *first*. As always, I am open to debate this but we should just make a decision, figurexit or figuremain?
```{r confusion-matrix, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices, full training set without LOCF.", fig.height=9, fig.width=6}
```{r confusion-matrix, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices, full test set without LOCF.", fig.height=6, fig.width=6}
plot_confusion_matrices(
df_predictions,
c("MSM", "GP + cov", "MM")
c("MSM", "GP + cov", "MM"),
nrow = 1
)
ggsave(filename = "confusion_matrices_all.pdf", width = 6, height = 9)
ggsave(filename = "confusion_matrices_all.png", width = 6, height = 9)
ggsave(filename = "confusion_matrices_all.pdf", width = 6, height = 6)
ggsave(filename = "confusion_matrices_all.png", width = 6, height = 6)
```
```{r crossing-table-full, echo=FALSE, warning=FALSE, results='asis'}
models <- c("MSM", "GP + cov", "MM")
df_average_confusion_matrices <- df_predictions %>%
filter(model %in% models) %>%
group_by(fold, model) %>%
do(
confusion_matrix = caret::confusionMatrix(
data = factor(.$prediction, levels = 1:8),
reference = factor(.$GOSE, levels = 1:8)
) %>%
as.matrix %>% as_tibble %>%
mutate(`Predicted GOSE` = row_number() %>% as.character) %>%
gather(`Observed GOSE`, n, 1:8)
) %>%
unnest %>%
group_by(model, `Predicted GOSE`, `Observed GOSE`) %>%
summarize(n = mean(n)) %>%
ungroup %>%
mutate(model = factor(model, models))
rbind(
df_average_confusion_matrices %>%
group_by(model) %>%
filter(`Observed GOSE` <= 3) %>%
mutate(n_total = sum(n)) %>%
filter(`Predicted GOSE` > 3) %>%
summarize(fraction = sum(n / n_total)) %>%
mutate(`Event` = "<=3 -> >3"),
df_average_confusion_matrices %>%
group_by(model) %>%
filter(`Observed GOSE` == 4) %>%
mutate(n_total = sum(n)) %>%
filter(`Predicted GOSE` > 4) %>%
summarize(fraction = sum(n / n_total)) %>%
mutate(`Event` = "4 -> >4"),
df_average_confusion_matrices %>%
group_by(model) %>%
filter(`Observed GOSE` < 8) %>%
mutate(n_total = sum(n)) %>%
filter(`Predicted GOSE` == 8) %>%
summarize(fraction = sum(n / n_total)) %>%
mutate(`Event` = "<8 -> 8")
) %>%
transmute(Model = model, Percent = 100*fraction, Event) %>%
spread(Event, Percent) %>%
pander::pandoc.table("Some specific confusion percentages, full data set.", digits = 3)
```
```{r error-scores-all, echo=FALSE, fig.height=5, fig.width=99}
```{r error-scores-all, echo=FALSE, fig.height=4, fig.width=9}
plot_summary_measures_cond(
df_predictions %>% filter(!(gupi %in% idx)),
c("MSM", "GP + cov", "MM"),
"Summary measures by observed GOSe, full test set"
)
ggsave(filename = "imputation_error.pdf", width = 9, height = 5)
ggsave(filename = "imputation_error.png", width = 9, height = 5)
ggsave(filename = "imputation_error.pdf", width = 9, height = 4)
ggsave(filename = "imputation_error.png", width = 9, height = 4)
```
......@@ -1056,7 +954,7 @@ data in the first place [comment: I strongly feel we should lead with this
sentence or something in the same spirit to make it absolutely clear that
statistics cannot be used to impute data out of nowhere.
Raising awareness for the complexity of missing data problems and should rather be seen
as an incetive to invest more effort upfront in preventing missingness in the first place ;)]
as an incentive to invest more effort upfront in preventing missingness in the first place ;)]
Nevertheless, in practice, missing values due to loss-to-follow-up will always
occur and should be addressed effectively.
There is a wide consensus that statistically sound imputation of missing values
......@@ -1066,7 +964,7 @@ imputation on a per-analysis basis including analysis-specific covariates to
further reduce bias and to preserve the imputation uncertainty in the
downstream analysis.
In practice, however, there are good reasons for providing a set of single-imputed
default values in large bservational studies such as CENTER-TBI.
default values in large observational studies such as CENTER-TBI.
Consortia are increasingly committed to making their databases
available to a wider range of researchers.
In fact, more liberal data-sharing policies are becoming a core requirement
......@@ -1078,13 +976,13 @@ Furthermore, the imputed values of a multiple-imputation procedure are
inherently random and it is thus difficult to ensure consistency across
different analysis teams if the values themselves cannot be stored
directly in a database.
For this reason, as a pratical way forward, we suggest providing a default
For this reason, as a practical way forward, we suggest providing a default
single-imputation with appropriate measures of uncertainty for key outcomes
in the published data base itself.
This mitigates problems with complete-case analyses and provides a
principled and consistent default approach to handling missing values.
Since we strongly suggest to employ a model-based approach to imputation,
the fitted class probabilities can be provided in the core databse along the
the fitted class probabilities can be provided in the core database along the
imputed values.
Based on these probabilities, it is easy to draw samples for a multiple imputation
analysis.
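
A minimal sketch of this idea in base R follows.
The probability vector is invented for illustration, and taking the most probable category as the single imputed value is an assumption made here, not necessarily the rule used in the analysis.

```r
set.seed(42)  # only to make the illustration reproducible

# Hypothetical fitted class probabilities for one patient, GOSe 2..8
# (GOSe 1 is excluded because imputation is conditional on 6-month survival).
probs <- c(`2` = 0.00, `3` = 0.05, `4` = 0.05, `5` = 0.20,
           `6` = 0.40, `7` = 0.20, `8` = 0.10)
stopifnot(isTRUE(all.equal(sum(probs), 1)))

# Single imputation (assumed rule): the most probable category.
single <- as.integer(names(probs)[which.max(probs)])

# Multiple imputation: m independent draws from the fitted distribution.
m <- 5
draws <- as.integer(
  sample(names(probs), size = m, replace = TRUE, prob = probs)
)
```

Storing `probs` next to `single` in the database keeps the default value deterministic while still allowing each analysis team to generate its own set of draws.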
......@@ -1155,17 +1053,17 @@ df_ground_truth %>%
## Reproducible Research Strategy
CENTER-TBI is commited to reproducible research.
CENTER-TBI is committed to reproducible research.
To this end, the entire source code to run the analyses is publicly available
at https://git.center-tbi.eu/kunzmann/gose-6mo-imputation.
Scripts for automatically downloading the required data from the central
access restricted 'Neurobot' (https://neurobot.incf.org/) database at
https://center-tbi.incf.org/ are provided.
The analysis is completely automated using the workflow managment tool 'snakemake'
The analysis is completely automated using the workflow management tool 'snakemake'
[@koster2012snakemake] and a singularity [@kurtzer2017singularity] container image
containing all required dependencies is publicly available from zenodo.org
at https://zenodo.org/record/2600385#.XJzZwEOnw5k (DOI: 10.5281/zenodo.2600385).
Detailed step-bz-step instructions on how to reproduce the analysis are provided
Detailed step-by-step instructions on how to reproduce the analysis are provided
in the README.md file of the GitLab repository.
......
@article{horton2018randomized,
@article{horton2018,
title={Randomized controlled trials in adult traumatic brain injury: A systematic review on the use and reporting of clinical outcome assessments},
author={Horton, Lindsay and Rhodes, Jonathan and Wilson, Lindsay},
journal={Journal of neurotrauma},
......@@ -9,14 +9,14 @@
publisher={Mary Ann Liebert, Inc. 140 Huguenot Street, 3rd Floor New Rochelle, NY 10801 USA}
}
@article{richter2019handling,
@article{richter2019,
title={Handling missing outcome data in traumatic brain injury research-a systematic review},
author={Richter, Sophie and Stevenson, Susan and Newman, Tom and Wilson, Lindsay and Menon, David and Maas, Andrew and Nieboer, Daan and Lingsma, Hester and Steyerberg, Ewout and Newcombe, Virginia},
year={2019},
publisher={Mary Ann Liebert Inc.}
}
@article{white2010bias,
@article{white2010,
title={Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values},
author={White, Ian R and Carlin, John B},
journal={Statistics in medicine},
......@@ -102,6 +102,7 @@
pages={36--47},
year={2014},
publisher={American Medical Association}
}
@article{center2015collaborative,
title={Collaborative European neurotrauma effectiveness research in traumatic brain injury (CENTER-TBI): A prospective longitudinal observational study},
......@@ -175,7 +176,6 @@
pages={2520--2522},
year={2012},
publisher={Oxford University Press}
>>>>>>> 943d2c27a63fd82c265933c46a0d6ab674191f03
}
......@@ -242,8 +242,33 @@ year = {2017}
author = {Agresti, Alan},
editor = {{John Wiley {\&} Sons}},
title = {{Categorical data analysis}},
year = {2003}
year =
{2003}
}
@article{steyerberg2008,
title={Predicting outcome after traumatic brain injury: development and international validation of prognostic scores based on admission characteristics},
author={Steyerberg, Ewout W and Mushkudiani, Nino and Perel, Pablo and Butcher, Isabella and Lu, Juan and McHugh, Gillian S and Murray, Gordon D and Marmarou, Anthony and Roberts, Ian and Habbema, J Dik F and others},
journal={PLoS medicine},
volume={5},
number={8},
pages={e165},
year={2008},
publisher={Public Library of Science}
}
@article{meira2009,
title={Multi-state models for the analysis of time-to-event data},
author={Meira-Machado, Lu{\'\i}s and de U{\~n}a-{\'A}lvarez, Jacobo and Cadarso-Suarez, Carmen and Andersen, Per K},
journal={Statistical methods in medical research},
volume={18},
number={2},
pages={195--222},
year={2009},
publisher={SAGE Publications Sage UK: London, England}
}
@article{brms2018,
author = {B{\"{u}}rkner, Paul},
journal = {The R Journal},
......@@ -253,6 +278,7 @@ title = {{Advanced Bayesian Multilevel Modeling with the R Package brms}},
volume = {10},
year = {2018}
}
@article{brms2017,
author = {B{\"{u}}rkner, Paul},
journal = {Journal Of Statistical Software},
......@@ -262,6 +288,7 @@ title = {{brms: An R Package for Bayesian Multilevel Models Using Stan}},
volume = {80},
year = {2017}
}
@article{Steyerberg2008,
author = {Steyerberg, Ewout W and Mushkudiani, Nino and Perel, Pablo and Butcher, Isabella and Lu, Juan and McHugh, Gillian S and Murray, Gordon D and Marmarou, Anthony and Roberts, Ian and Habbema, J. Dik F and Maas, Andrew I. R},
doi = {10.1371/journal.pmed.0050165},
......
......@@ -39,3 +39,10 @@ curl \
--digest https://center-tbi.incf.org/api/data/_5c548a5b6b3f2f22e14d20a2.csv > \
$OUT/df_baseline.csv
Rscript -e "library(tidyverse); saveRDS(as_tibble(read_csv('$OUT/df_baseline.csv')), file = '$OUT/df_baseline.rds')"
# baseline descriptive
curl \
--user $NEUROBOT_USR:$NEUROBOT_API \
--digest https://center-tbi.incf.org/api/data/_5cc703eb3a4c5139c387f8b5.csv > \
$OUT/df_baseline_descriptive.csv
Rscript -e "library(tidyverse); saveRDS(as_tibble(read_csv('$OUT/df_baseline_descriptive.csv')), file = '$OUT/df_baseline_descriptive.rds')"