Commit 0badf884 authored by Kevin Kunzmann's avatar Kevin Kunzmann

revision of manuscript

parent 21167658
...@@ -463,9 +463,7 @@ for (i in 1:config$folds) { ...@@ -463,9 +463,7 @@ for (i in 1:config$folds) {
) )
} }
```
```{r misc, include=FALSE}
modelnames <- lapply( modelnames <- lapply(
list.dirs( list.dirs(
sprintf("%s/validation/posteriors", params$data_dir), sprintf("%s/validation/posteriors", params$data_dir),
...@@ -540,7 +538,7 @@ All models were fit on the entire available data after removing the ...@@ -540,7 +538,7 @@ All models were fit on the entire available data after removing the
180 +/- 14 days post-injury observation from the respective test fold 180 +/- 14 days post-injury observation from the respective test fold
thus mimicking a missing completely at random missing data mechanism. thus mimicking a missing completely at random missing data mechanism.
The distribution of GOSe values in the respective three test sets is well The distribution of GOSe values in the respective three test sets is well
balanced, cf. Figure ??? (Appendix). balanced, cf. Figure ??? (cf. Appendix).
Performance is assessed using the absolute-count and the normalized Performance is assessed using the absolute-count and the normalized
(proportions) confusion matrices as well as (proportions) confusion matrices as well as
bias, mean absolute error (MAE), and root mean squared error (RMSE). bias, mean absolute error (MAE), and root mean squared error (RMSE).
...@@ -551,7 +549,7 @@ RMSE puts more weight on large deviations as compared to RMSE. ...@@ -551,7 +549,7 @@ RMSE puts more weight on large deviations as compared to RMSE.
Comparisons in terms of bias, MAE, and RMSE tacitly assume that Comparisons in terms of bias, MAE, and RMSE tacitly assume that
GOSe values can be sensibly interpreted on an interval scale. GOSe values can be sensibly interpreted on an interval scale.
We therefore also consider $Pr[est > true] - Pr[est < true]$ as an We therefore also consider $Pr[est > true] - Pr[est < true]$ as an
alternative measure of bias. alternative measure of bias which does not require this tacit assumption.
Note that the scale is not directlz comparable to the one of the Note that the scale is not directlz comparable to the one of the
other three quantities! other three quantities!
All measures are considered both conditional on the ground-truth All measures are considered both conditional on the ground-truth
...@@ -584,42 +582,27 @@ is depicted in Figure ??? both conditional on LOCF being applicable and, ...@@ -584,42 +582,27 @@ is depicted in Figure ??? both conditional on LOCF being applicable and,
excluding LOCF, on the entire test set. excluding LOCF, on the entire test set.
```{r overall-comparison-all-methods, echo=FALSE, fig.height=7} ```{r overall-comparison-all-methods, echo=FALSE, fig.height=7}
p1 <- df_predictions %>% plot_summary_measures <- function(df, label) {
df_predictions %>%
filter(!(gupi %in% idx)) %>% filter(!(gupi %in% idx)) %>%
group_by(model, fold) %>% group_by(model, fold) %>%
summarize( summarize(
RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt, RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
MAE = mean(abs(GOSE - prediction), na.rm = TRUE), MAE = mean(abs(GOSE - prediction), na.rm = TRUE),
Bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE), Bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE),
`Pr[est > true] - Pr[est < true]` = mean(prediction > GOSE, na.rm = TRUE) - mean(prediction < GOSE, na.rm = TRUE) `Pr[est. > true] - Pr[est. < true]` = mean(prediction > GOSE, na.rm = TRUE) - mean(prediction < GOSE, na.rm = TRUE)
) %>% ) %>%
ungroup %>% ungroup %>%
gather(error, value, -model, -fold) %>% gather(error, value, -model, -fold) %>%
ggplot(aes(model, value)) + mutate(
geom_hline(yintercept = 0, color = "black") + error = factor(error, c(
geom_boxplot() + "Bias",
facet_wrap(~error, nrow = 1) + "Pr[est. > true] - Pr[est. < true]",
scale_y_continuous(name = "", breaks = seq(-2, 8, .25), limits = c(-.5, 1.5)) + "MAE",
scale_x_discrete("") + "RMSE"
theme_bw() + ))
theme(
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
axis.text.x = element_text(angle = 66, hjust = 1)
) +
ggtitle("Overall summary measures, LOCF subset")
p2 <- df_predictions %>%
filter(model != "LOCF") %>%
group_by(model, fold) %>%
summarize(
RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
MAE = mean(abs(GOSE - prediction), na.rm = TRUE),
Bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE),
`Pr[est > true] - Pr[est < true]` = mean(prediction > GOSE, na.rm = TRUE) - mean(prediction < GOSE, na.rm = TRUE)
) %>% ) %>%
ungroup %>%
gather(error, value, -model, -fold) %>%
ggplot(aes(model, value)) + ggplot(aes(model, value)) +
geom_hline(yintercept = 0, color = "black") + geom_hline(yintercept = 0, color = "black") +
geom_boxplot() + geom_boxplot() +
...@@ -632,35 +615,46 @@ p2 <- df_predictions %>% ...@@ -632,35 +615,46 @@ p2 <- df_predictions %>%
panel.grid.major.x = element_blank(), panel.grid.major.x = element_blank(),
axis.text.x = element_text(angle = 66, hjust = 1) axis.text.x = element_text(angle = 66, hjust = 1)
) + ) +
ggtitle("Overall summary measures, full test set") ggtitle(label)
}
cowplot::plot_grid( cowplot::plot_grid(
p1, plot_summary_measures(
p2, df_predictions %>% filter(gupi %in% idx),
"Summary measures, LOCF subset"
),
plot_summary_measures(
df_predictions,
"Summary measures, full test set"
),
ncol = 1, align = "v", axis = "lr" ncol = 1, align = "v", axis = "lr"
) )
``` ```
Firstly, as could be expected, LOCF is overall negatively biased, i.e., Firstly, LOCF is overall negatively biased, i.e.,
on average it imputes lower-than-observed GOSe values. on average it imputes lower-than-observed GOSe values.
This indicates that there is a population average trend towards continued This reflects a population average trend towards continued
recovery within the first 6 months post-injury. recovery within the first 6 months post injury.
In terms of accuracy, the LOCF does perform worst but differences between The fact that both ways of quantifying bias qualitatively agree,
indicates that the interpretation of GOSe as an interval measure which is
tacitly underlying Bias, MAE, and RMSE comparisons is not too restrictive.
In terms of accuracy, LOCF does perform worst but differences between
methods are less pronounced than in terms of bias. methods are less pronounced than in terms of bias.
Notably, the difference in terms of RMSE is more substantial than in terms of Notably, the RMSE difference between LOCF and the other methods is slightly
MAE which indicates that LOCF tends to incur more large deviations across larger than the MAE difference which indicates that LOCF tends to produce more
several GOSe categories than the other methods considered here. large deviations, i.e., across several GOSe categories.
Secondly, including baseline covariates only affects the GP regression model Secondly, including baseline covariates only affects the GP regression model
notably. notably.
Both the MM and MSM model perform more or less the same irrespective of adjustment Both the MM and MSM model perform more or less the same irrespective of adjustment
for baseline covariates. for baseline covariates.
This indicates that the marginal additional predictive value of baseline This indicates that the additional predictive value of baseline
covariates over the information contained in at least one observed GOSe value covariates over the information contained in at least one observed GOSe value
is limited. is limited.
Furthermore, note that both variants of the mixed effects model fail to remove Furthermore, note that both variants of the mixed effects model fail to correct
the overall bias of the imputed values. the overall bias of the imputed values.
We proceed with an in-depth analyses of a subset of models both in direct We proceed with a detailed analyses of a subset of models both in direct
comparison with LOCF and in the entire data set including those cases where comparison with LOCF and in the entire data set including those cases where
LOCF is not applicable. LOCF is not applicable.
In the following we only consider the baseline-adjusted Gaussian process model In the following we only consider the baseline-adjusted Gaussian process model
...@@ -668,18 +662,23 @@ In the following we only consider the baseline-adjusted Gaussian process model ...@@ -668,18 +662,23 @@ In the following we only consider the baseline-adjusted Gaussian process model
and the multi-state model without baseline covariates ('MSM'). and the multi-state model without baseline covariates ('MSM').
The rationale behind dropping baseline adjustment for MM and MSM being that The rationale behind dropping baseline adjustment for MM and MSM being that
the additional complexity does not substantially alter overall performance. the additional complexity does not substantially alter overall performance.
On the other hand, the GP model benefits from the inclusion of the IMPACT
baseline covariates.
## Detailed comparison conditional on LOCF subset ## Detailed comparison conditional on LOCF subset
We first consider results for the set of test cases which allow LOCF imputation We first consider results for the set of test cases which allow LOCF imputation
(n = `r df_predictions %>% filter(model == "LOCF") %>% nrow - length(idx)`). (n = `r df_predictions %>% filter(model == "LOCF") %>% nrow - length(idx)`).
Both the raw count as well as the relative (by true GOSe) confusion matrices Both the raw count as well as the relative (by left-out true GOSe) confusion matrices
are presented in Figure ???. are presented in Figure ???.
```{r confusion-matrix-locf, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices on LOCF subset."} ```{r confusion-matrix-locf, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices on LOCF subset."}
df_average_confusion_matrices <- df_predictions %>% plot_confusion_matrices <- function(df_predictions, models) {
filter(!(gupi %in% idx)) %>%
df_average_confusion_matrices <- df_predictions %>%
filter(model %in% models) %>%
group_by(fold, model) %>% group_by(fold, model) %>%
do( do(
confusion_matrix = caret::confusionMatrix( confusion_matrix = caret::confusionMatrix(
...@@ -693,16 +692,21 @@ df_average_confusion_matrices <- df_predictions %>% ...@@ -693,16 +692,21 @@ df_average_confusion_matrices <- df_predictions %>%
unnest %>% unnest %>%
group_by(model, `Predicted GOSE`, `True GOSE`) %>% group_by(model, `Predicted GOSE`, `True GOSE`) %>%
summarize(n = mean(n)) %>% summarize(n = mean(n)) %>%
ungroup ungroup %>%
mutate(model = factor(model, models))
p_cnf_mtrx_raw <- df_average_confusion_matrices %>% p_cnf_mtrx_raw <- df_average_confusion_matrices %>%
filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>%
group_by(model, `True GOSE`) %>%
ggplot(aes(`True GOSE`, `Predicted GOSE`, fill = n)) + ggplot(aes(`True GOSE`, `Predicted GOSE`, fill = n)) +
geom_raster() + geom_raster() +
geom_text(aes(
label = sprintf("%.1f", n) %>%
ifelse(. == "0.0", "", .)
),
size = 1.5
) +
geom_hline(yintercept = c(2, 4, 6) + .5, color = "black") + geom_hline(yintercept = c(2, 4, 6) + .5, color = "black") +
geom_vline(xintercept = c(2, 4, 6) + .5, color = "black") + geom_vline(xintercept = c(2, 4, 6) + .5, color = "black") +
scale_fill_gradient(low = "white", high = "black") + scale_fill_gradient(low = "white", high = "#555555") +
coord_fixed(expand = FALSE) + coord_fixed(expand = FALSE) +
labs(x = "true GOSe", y = "imputed GOSe", fill = "") + labs(x = "true GOSe", y = "imputed GOSe", fill = "") +
theme_bw() + theme_bw() +
...@@ -712,8 +716,7 @@ p_cnf_mtrx_raw <- df_average_confusion_matrices %>% ...@@ -712,8 +716,7 @@ p_cnf_mtrx_raw <- df_average_confusion_matrices %>%
facet_wrap(~model, nrow = 1) + facet_wrap(~model, nrow = 1) +
ggtitle("Average confusion matrix accross folds (absolute counts)") ggtitle("Average confusion matrix accross folds (absolute counts)")
p_cnf_mtrx_colnrm <- df_average_confusion_matrices %>% p_cnf_mtrx_colnrm <- df_average_confusion_matrices %>%
filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>%
group_by(model, `True GOSE`) %>% group_by(model, `True GOSE`) %>%
mutate( mutate(
`fraction (column)` = n / sum(n), `fraction (column)` = n / sum(n),
...@@ -733,31 +736,63 @@ p_cnf_mtrx_colnrm <- df_average_confusion_matrices %>% ...@@ -733,31 +736,63 @@ p_cnf_mtrx_colnrm <- df_average_confusion_matrices %>%
facet_wrap(~model, nrow = 1) + facet_wrap(~model, nrow = 1) +
ggtitle("Average confusion matrix accross folds (column fraction)") ggtitle("Average confusion matrix accross folds (column fraction)")
cowplot::plot_grid(p_cnf_mtrx_raw, p_cnf_mtrx_colnrm, ncol = 1, align = "v") cowplot::plot_grid(p_cnf_mtrx_raw, p_cnf_mtrx_colnrm, ncol = 1, align = "v")
}
plot_confusion_matrices(
df_predictions %>% filter(!(gupi %in% idx)),
c("MSM", "GP + cov", "MM", "LOCF")
)
ggsave(filename = "confusion_matrices_locf.pdf", width = 7, height = 7) ggsave(filename = "confusion_matrices_locf.pdf", width = 7, height = 6)
ggsave(filename = "confusion_matrices_locf.png", width = 7, height = 7) ggsave(filename = "confusion_matrices_locf.png", width = 7, height = 6)
``` ```
The absolute-count confusion matrices show that most imputed values are The absolute-count confusion matrices show that most imputed values are
within +/- one GOSE categories of the true value. within +/- one GOSE categories of the observed ones.
However, they also reflect the category imbalance (cf. Figures ??? and ???, However, they also reflect the category imbalance (cf. Figures ??? and ???
Appendix) in the study ppopulation. Appendix) in the study population.
The performance conditional on the (in practice unknown) true GOSe value The performance conditional on the (in practice unknown) observed GOSe value
clearly shows that imputation performance for the most infrequent category 4 clearly shows that imputation performance for the most infrequent category 4
(GOSe of 1 and 2 are not observed in the test set!) is most problematic. is most problematic.
This is, however, true across the board of methods considered. This is, however, true across the range of methods considered.
Note that both the MSM as well as the MM models account for this fact by Both the MSM and the MM models account for this by almost never imputing a
almost never imputing a GOSe of 4. GOSe of 4.
Instead, the respective cases tend to be imputed to GOSe 3 or 5. Instead, the respective cases tend to be imputed to GOSe 3 or 5.
Additionally, the following table ??? lists some confusion probabilities **TODO:**
of particular interest.
[Comment: 1 -> > 1 is not relevant since our imputation is conditional * this section table is the one we David requested in our last meeting,
on not being 1 at 6 months; the comparison seems to favor LOCF since only not entirely convinced though ...
* ... 1 -> > 1 is not relevant since our imputation is conditional
on not being 1 at 6 months
* ... the comparison seems to favor LOCF since only
upward confusions are considered (which LOCF by design tends to do less)] upward confusions are considered (which LOCF by design tends to do less)]
* Is there a clinical interpretation along the way that '4' might constitue
a short-term transition state or is it just defined in a way that makes it
highly unlikely to be observed in practice?
```{r crossing-table, echo=FALSE, warning=FALSE, results='asis'} ```{r crossing-table, echo=FALSE, warning=FALSE, results='asis'}
models <- c("MSM", "GP + cov", "MM")
df_average_confusion_matrices <- df_predictions %>%
filter(model %in% models) %>%
filter(!(gupi %in% idx)) %>%
group_by(fold, model) %>%
do(
confusion_matrix = caret::confusionMatrix(
data = factor(.$prediction, levels = 1:8),
reference = factor(.$GOSE, levels = 1:8)
) %>%
as.matrix %>% as_tibble %>%
mutate(`Predicted GOSE` = row_number() %>% as.character) %>%
gather(`True GOSE`, n, 1:8)
) %>%
unnest %>%
group_by(model, `Predicted GOSE`, `True GOSE`) %>%
summarize(n = mean(n)) %>%
ungroup %>%
mutate(model = factor(model, models))
rbind( rbind(
df_average_confusion_matrices %>% df_average_confusion_matrices %>%
filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>% filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>%
...@@ -791,37 +826,52 @@ df_average_confusion_matrices %>% ...@@ -791,37 +826,52 @@ df_average_confusion_matrices %>%
pander::pandoc.table("Some specific confusion percentages, LOCF subset.", digits = 3) pander::pandoc.table("Some specific confusion percentages, LOCF subset.", digits = 3)
``` ```
[TODO: Interpretation? GOSe 4 potentially relatively brief transient state,
clinical relevance of distinction between 3/4?]
To better understand the overall performance assessment in Figure ???, To better understand the overall performance assessment in Figure ???,
we also consider the performance conditional on the respective ground-truth we also consider the performance conditional on the respective ground-truth
(i.e. the true GOSe values in the respective test sets). (i.e. the observed GOSe values in the test sets).
The results are shown in Figure ???. The results are shown in Figure ??? (vertical bars are =/- one standard error of the mean).
```{r error-scores-locf, echo=FALSE, fig.height=3} ```{r error-scores-locf, echo=FALSE, fig.height=3, fig.width=9}
df_predictions %>% plot_summary_measures_cond <- function(df_predictions, models, label) {
filter(!(gupi %in% idx)) %>%
filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>% df_predictions %>%
filter(model %in% models) %>%
group_by(model, fold, GOSE) %>% group_by(model, fold, GOSE) %>%
summarize( summarize(
RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt, RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
MAE = mean(abs(GOSE - prediction), na.rm = TRUE), MAE = mean(abs(GOSE - prediction), na.rm = TRUE),
bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE) Bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE),
`Pr[est. > true] - Pr[est. < true]` = mean(prediction > GOSE, na.rm = TRUE) - mean(prediction < GOSE, na.rm = TRUE)
) %>%
gather(error, value, -model, -GOSE, -fold) %>%
group_by(GOSE, model, error, fold) %>%
summarize(
mean_per_fold = mean(value) # overall mean
) %>% ) %>%
gather(error, value, -model, -fold, -GOSE) %>%
group_by(GOSE, model, error) %>% group_by(GOSE, model, error) %>%
summarize( summarize(
mean = mean(value), mean = mean(mean_per_fold), # overall mean
sd = sd(value) se = sd(mean_per_fold) / sqrt(n())
) %>% ) %>%
ungroup() %>% ungroup() %>%
mutate(
model = factor(model, models),
error = factor(error, c(
"Bias",
"Pr[est. > true] - Pr[est. < true]",
"MAE",
"RMSE"
))
) %>%
ggplot(aes(GOSE, color = model)) + ggplot(aes(GOSE, color = model)) +
geom_hline(yintercept = 0, color = "black") + geom_hline(yintercept = 0, color = "black") +
geom_line(aes(y = mean)) + geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
geom_point(aes(y = mean)) + width = .2,
position = position_dodge(.33)
) +
geom_line(aes(y = mean), alpha = .5) +
xlab("GOSe") + xlab("GOSe") +
facet_wrap(~error) + facet_wrap(~error, nrow = 1) +
scale_y_continuous(name = "", breaks = seq(-2, 8, .25)) + scale_y_continuous(name = "", breaks = seq(-2, 8, .25)) +
theme_bw() + theme_bw() +
theme( theme(
...@@ -829,10 +879,18 @@ df_predictions %>% ...@@ -829,10 +879,18 @@ df_predictions %>%
panel.grid.major.x = element_blank(), panel.grid.major.x = element_blank(),
legend.title = element_blank() legend.title = element_blank()
) + ) +
ggtitle("Bias and accuracy by true GOSe value, LOCF subset") ggtitle(label)
ggsave(filename = "errors_stratified_locf.pdf", width = 7, height = 7) }
ggsave(filename = "errors_stratified_locf.png", width = 7, height = 7)
plot_summary_measures_cond(
df_predictions %>% filter(!(gupi %in% idx)),
c("MSM", "GP + cov", "MM", "LOCF"),
"Summary measures by observed GOSe, LOCF subset"
)
ggsave(filename = "errors_stratified_locf.pdf", width = 9, height = 3)
ggsave(filename = "errors_stratified_locf.png", width = 9, height = 3)
``` ```
Just as with the overall performance, differences are most pronounced in terms Just as with the overall performance, differences are most pronounced in terms
...@@ -871,11 +929,25 @@ where only GOSe values after 180 days post-injury are available. ...@@ -871,11 +929,25 @@ where only GOSe values after 180 days post-injury are available.
The relative characteristics of the three considered approaches are comparable The relative characteristics of the three considered approaches are comparable
to the LOCF subset. to the LOCF subset.
**TODO**
* decide whether figures go in appendix - David and I agree on them being actually the
primary analysis. we just needto convince people of the fact that LOCF should be dropped *first*. As always, I am open to debate this but we should just make a decision, figurexit or figuremain?
```{r confusion-matrix, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices, full training set without LOCF."} ```{r confusion-matrix, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices, full training set without LOCF."}
plot_confusion_matrices(
df_predictions,
c("MSM", "GP + cov", "MM")
)
ggsave(filename = "confusion_matrices_all.pdf", width = 7, height = 6)
ggsave(filename = "confusion_matrices_all.png", width = 7, height = 6)
```
```{r crossing-table-full, echo=FALSE, warning=FALSE, results='asis'}
models <- c("MSM", "GP + cov", "MM")
df_average_confusion_matrices <- df_predictions %>% df_average_confusion_matrices <- df_predictions %>%
filter( filter(model %in% models) %>%
model %in% c("GP + cov", "MM", "MSM")
) %>%
group_by(fold, model) %>% group_by(fold, model) %>%
do( do(
confusion_matrix = caret::confusionMatrix( confusion_matrix = caret::confusionMatrix(
...@@ -889,51 +961,8 @@ df_average_confusion_matrices <- df_predictions %>% ...@@ -889,51 +961,8 @@ df_average_confusion_matrices <- df_predictions %>%
unnest %>% unnest %>%
group_by(model, `Predicted GOSE`, `True GOSE`) %>% group_by(model, `Predicted GOSE`, `True GOSE`) %>%
summarize(n = mean(n)) %>% summarize(n = mean(n)) %>%
ungroup ungroup %>%
mutate(model = factor(model, models))
p_cnf_mtrx_raw <- df_average_confusion_matrices %>%
group_by(model, `True GOSE`) %>%
ggplot(aes(`True GOSE`, `Predicted GOSE`, fill = n)) +
geom_raster() +
geom_hline(yintercept = c(2, 4, 6) + .5, color = "black") +
geom_vline(xintercept = c(2, 4, 6) + .5, color = "black") +
scale_fill_gradient(low = "white", high = "black") +
coord_fixed(expand = FALSE) +
labs(x = "true GOSe", y = "imputed GOSe", fill = "") +
theme_bw() +
theme(
panel.grid = element_blank()
) +
facet_wrap(~model, nrow = 1) +
ggtitle("Average confusion matrix accross folds (absolute counts)")
p_cnf_mtrx_colnrm <- df_average_confusion_matrices %>%
group_by(model, `True GOSE`) %>%
mutate(
`fraction (column)` = n / sum(n),
`fraction (column)` = ifelse(is.nan(`fraction (column)`), 0, `fraction (column)`)
) %>%
ggplot(aes(`True GOSE`, `Predicted GOSE`, fill = `fraction (column)`)) +
geom_raster() +
geom_hline(yintercept = c(2, 4, 6) + .5, color = "black") +
geom_vline(xintercept = c(2, 4, 6) + .5, color = "black") +
scale_fill_gradient("", low = "white", high = "black", limits = c(0, 1)) +
coord_fixed(expand = FALSE) +
labs(x = "true GOSe", y = "imputed GOSe", fill = "") +
theme_bw() +
theme(
panel.grid = element_blank()
) +
facet_wrap(~model, nrow = 1) +
ggtitle("Average confusion matrix accross folds (column fraction)")
cowplot::plot_grid(p_cnf_mtrx_raw, p_cnf_mtrx_colnrm, ncol = 1, align = "v")
ggsave(filename = "confusion_matrices_all.pdf", width = 7, height = 7)
ggsave(filename = "confusion_matrices_all.png", width = 7, height = 7)
```
```{r crossing-table-full, echo=FALSE, warning=FALSE, results='asis'}
rbind( rbind(
df_average_confusion_matrices %>% df_average_confusion_matrices %>%
group_by(model) %>% group_by(model) %>%
...@@ -964,39 +993,15 @@ df_average_confusion_matrices %>% ...@@ -964,39 +993,15 @@ df_average_confusion_matrices %>%
pander::pandoc.table("Some specific confusion percentages, full data set.", digits = 3) pander::pandoc.table("Some specific confusion percentages, full data set.", digits = 3)
``` ```
```{r error-scores-all, echo=FALSE, fig.height=3} ```{r error-scores-all, echo=FALSE, fig.height=3, fig.width=9}
df_predictions %>% plot_summary_measures_cond(
filter(model %in% c("GP + cov", "MM", "MSM")) %>% df_predictions %>% filter(!(gupi %in% idx)),
group_by(model, fold, GOSE) %>% c("MSM", "GP + cov", "MM"),
summarize( "Summary measures by observed GOSe, full test set"
RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt, )
MAE = mean(abs(GOSE - prediction), na.rm = TRUE),
bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE)
) %>%
gather(error, value, -model, -fold, -GOSE) %>%
group_by(GOSE, model, error) %>%
summarize(
mean = mean(value),
sd = sd(value)
) %>%
ungroup() %>%
ggplot(aes(GOSE, color = model)) +
geom_hline(yintercept = 0, color = "black") +
geom_point(aes(y = mean)) +
geom_line(aes(y = mean)) +
facet_wrap(~error) +
scale_y_continuous(name = "", breaks = seq(-2, 8, .25)) +
xlab("GOSe") +
theme_bw() +
theme(
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
legend.title = element_blank()
) +
ggtitle("Accuracy by true GOSe value, full test set")
ggsave(filename = "imputation_error.pdf", width = 7, height = 7) ggsave(filename = "imputation_error.pdf", width = 9, height = 3)
ggsave(filename = "imputation_error.png", width = 7, height = 7) ggsave(filename = "imputation_error.png", width = 9, height = 3)
``` ```
...@@ -1005,40 +1010,49 @@ ggsave(filename = "imputation_error.png", width = 7, height = 7) ...@@ -1005,40 +1010,49 @@ ggsave(filename = "imputation_error.png", width = 7, height = 7)
# Discussion # Discussion
The most important measure to prevent missing data bias Handling missing data *post-hoc* to prevent biased analyses often requires
is to avoid missing values in the first place! great effort.
Study protocols and procedures should be checked preemptively for their It is thus of the utmost importance to implement measures for avoiding missing
applicability to clinical practice and adequate effort should be put in data in the first place [comment: I strongly feel we should lead with this
ensuring good compliance with follow-up protocols. sentence or something in the same spirit to make it absolutely clear that
statistics cannot be used to impute data out of nowhere.
Raising awareness for the complexity of missing data problems and should rather be seen
as an incetive to invest more effort upfront in preventing missingness in the first place ;)]
Nevertheless, in practice, missing values due to loss-to-follow-up will always Nevertheless, in practice, missing values due to loss-to-follow-up will always
occur and should be addressed efficiently. occur and should be addressed effectively
There is a wide consensus that non-imputation of missing values cannot be There is a wide consensus that statistically sound imputation of missing values
considered best practice and may introduce bias or lead to an unnecessary loss is beneficial for both the reduction of bias and for increasing statistical power.
of power. The current gold-standard for imputing missing values is multiple
The current gold-standard for imputing missing values is certainly multiple
imputation on a per-analysis basis including analysis-specific covariates to imputation on a per-analysis basis including analysis-specific covariates to
further reduce bias and to preserve the imputation uncertainty in the further reduce bias and to preserve the imputation uncertainty in the
downstream analysis. downstream analysis.
However, in practice, there are good reasons to provide a set of single-imputed In practice, however, there are good reasons for providing a set of single-imputed
default values in larger projects such as CENTER-TBI. default values in large bservational studies such as CENTER-TBI.
E.g., consortia are increasingly committed to making their databases Consortia are increasingly committed to making their databases
available to a wider range of researchers. available to a wider range of researchers.
In fact, more liberal data-sharing policies are becoming an core requirement In fact, more liberal data-sharing policies are becoming a core requirement
for many funding bodies [REFERENCES]. for funding bodies (cf. https://www.openaire.eu/).
In this context, it might not be possible to ensure that every analysis team In this context, it might not be possible to ensure that every analysis team
has the necessary statistical expertise to properly conduct a per-analysis has the necessary statistical expertise to properly conduct a per-analysis
multiple imputation. multiple imputation in the future.
Furthermore, the imputed values of a multiple-imputation procedure are Furthermore, the imputed values of a multiple-imputation procedure are
inherently random and it is thus difficult to ensure consistency across inherently random and it is thus difficult to ensure consistency across
different analysis teams. different analysis teams if the values themselves cannot be stored
For this reason, we suggest to provide a default single-imputation with directly in a database.
appropriate measures of uncertainty for key outcomes in the published For this reason, as a pratical way forward, we suggest providing a default
data base itself. single-imputation with appropriate measures of uncertainty for key outcomes
This mitigates problems with complete cases analyses and provide a more in the published data base itself.
This mitigates problems with complete-case analyses and provides a
principled and consistent default approach to handling missing values. principled and consistent default approach to handling missing values.
Since we strongly suggest to employ a model-based approach to imputation,
the fitted class probabilities can be provided in the core databse along the
imputed values.
Based on these probabilities, it is easy to draw samples for a multiple imputation
analysis.
Wherever necessary and practical, a custom, analysis-specific multiple Wherever necessary and practical, a custom, analysis-specific multiple
imputation approach might still be used and the model used to provide the imputation approach might still be employed.
single-imputed values may be used as a starting point. In these cases, the model providing the single-imputed values may
be used as a starting point.
Several reasons disqualify LOCF as method of choice. Several reasons disqualify LOCF as method of choice.
Not only is it inherently biased, but LOCF is also inefficient in that it Not only is it inherently biased, but LOCF is also inefficient in that it
...@@ -1053,34 +1067,29 @@ necessary in some cases to further reduce bias. ...@@ -1053,34 +1067,29 @@ necessary in some cases to further reduce bias.
Finally, LOCF cannot produce an adequate measure of imputation uncertainty Finally, LOCF cannot produce an adequate measure of imputation uncertainty
since it is not model based. since it is not model based.
We draw two main conclusion from our in-depth comparison of three We draw two main conclusion from our comparison of three
alternative, model-based approaches. alternative, model-based approaches.
Firstly, despite its theoretical drawbacks, LOCF is hard to beat in terms of Firstly, despite its theoretical drawbacks, LOCF is hard to beat in terms of
raw accuracy. accuracy.
Still, small improvements are possible (cf. Section ???). Still, small improvements are possible (cf. Section ???).
The main advantages of a model-based approach is thus a reduction of bias, The main advantages of a model-based approach is thus a reduction of bias,
the ability to provide a measure of uncertainty together with the imputed the ability to provide a measure of uncertainty together with the imputed
values values
(or to use the same very same model to draw multiple imputations), (or to use the same very same model to draw multiple imputations),
as well as the possibility to include further analysis-specific covariates. as well as the possibility of including further analysis-specific covariates.
Secondly, we found that the inclusion of established baseline predictors for Secondly, we found that the inclusion of established baseline predictors for
GOSe at 6 months post-injury had next to no effect on the imputation quality. GOSe at 6 months post-injury had little effect on the imputation quality.
Note that this does not refute their predictive value but only indicates that Note that this does not refute their predictive value but only indicates that
there is little marginal benefit over knowing at least one other GOSe value. there is little marginal benefit over knowing at least one other GOSe value.
Finally, we conclude that the differences between the considered model-bases Differences between the model-based approaches tend to be rather nuanced.
approaches are rather small. We nevertheless favor the multi-state model (MSM).
It is well-interpretable in terms of transition intensities.
We nevertheless favor the multi-state model. An efficient implementation [@msm2011] in standard statistical software [@R2016] is
The multi state model is well-interpretable in terms of modeling transition available.
intensities. It succeeds in eliminating the bias introduced by LOCF, is
Efficient implementations in standard statistical software (msm, R) are
available, it succeeds in eliminating the bias introduced by LOCF, is
able to provide imputed values for the entire population and is able to able to provide imputed values for the entire population and is able to
provide a probabilistic output. provide a probabilistic output.
I.e., besides the single predicted GOSe value, the fitted probabilities
for each GOSe value may be stored in the database.
Based on these values, it is easy to draw samples for a multiple imputation
analysis.
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment