Commit 0badf884 authored by Kevin Kunzmann

revision of manuscript

parent 21167658
......@@ -463,9 +463,7 @@ for (i in 1:config$folds) {
)
}
```
```{r misc, include=FALSE}
modelnames <- lapply(
list.dirs(
sprintf("%s/validation/posteriors", params$data_dir),
......@@ -540,7 +538,7 @@ All models were fit on the entire available data after removing the
180 +/- 14 days post-injury observations from the respective test fold,
thus mimicking a missing-completely-at-random (MCAR) missingness mechanism.
The distribution of GOSe values in the respective three test sets is well
balanced, cf. Figure ??? (Appendix).
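To make the fold-wise masking concrete, a minimal sketch is given below; it is
illustrative only, and the column name `days` (time post-injury) as well as the
precomputed fold assignment `fold` are assumptions, not identifiers taken from
the manuscript code.
```r
library(dplyr)

# Sketch: remove the 180 +/- 14 days post-injury observations of the current
# test fold before refitting; the held-out rows later serve as ground truth.
# 'df_gose' (long format: gupi, fold, days, GOSE) is a hypothetical object.
mask_test_fold <- function(df_gose, test_fold) {
  df_gose %>%
    filter(!(fold == test_fold & abs(days - 180) <= 14))
}
```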
Performance is assessed using the absolute-count and the normalized
(proportions) confusion matrices as well as
bias, mean absolute error (MAE), and root mean squared error (RMSE).
......@@ -551,7 +549,7 @@ RMSE puts more weight on large deviations as compared to MAE.
Comparisons in terms of bias, MAE, and RMSE tacitly assume that
GOSe values can be sensibly interpreted on an interval scale.
We therefore also consider $Pr[est > true] - Pr[est < true]$ as an
alternative measure of bias which does not require this tacit assumption.
Note that its scale is not directly comparable to that of the
other three quantities.
All measures are considered both conditional on the ground-truth
......@@ -584,83 +582,79 @@ is depicted in Figure ??? both conditional on LOCF being applicable and,
excluding LOCF, on the entire test set.
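Before turning to the results, the error measures can be made explicit with a
short, self-contained sketch on toy vectors (not CENTER-TBI data); `observed`
and `imputed` correspond to `GOSE` and `prediction` in the chunks below.
```r
# Toy illustration of the summary measures used throughout this section
observed <- c(3, 5, 6, 7, 4, 8)
imputed  <- c(3, 4, 6, 6, 5, 8)

bias      <- mean(imputed) - mean(observed)       # interval-scale bias
mae       <- mean(abs(imputed - observed))        # mean absolute error
rmse      <- sqrt(mean((imputed - observed)^2))   # penalizes large deviations
sign_bias <- mean(imputed > observed) - mean(imputed < observed)  # Pr[est > true] - Pr[est < true]
```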
```{r overall-comparison-all-methods, echo=FALSE, fig.height=7}
plot_summary_measures <- function(df, label) {
df %>%
group_by(model, fold) %>%
summarize(
RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
MAE = mean(abs(GOSE - prediction), na.rm = TRUE),
Bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE),
`Pr[est. > true] - Pr[est. < true]` = mean(prediction > GOSE, na.rm = TRUE) - mean(prediction < GOSE, na.rm = TRUE)
) %>%
ungroup %>%
gather(error, value, -model, -fold) %>%
mutate(
error = factor(error, c(
"Bias",
"Pr[est. > true] - Pr[est. < true]",
"MAE",
"RMSE"
))
) %>%
ggplot(aes(model, value)) +
geom_hline(yintercept = 0, color = "black") +
geom_boxplot() +
facet_wrap(~error, nrow = 1) +
scale_y_continuous(name = "", breaks = seq(-2, 8, .25), limits = c(-.5, 1.5)) +
scale_x_discrete("") +
theme_bw() +
theme(
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
axis.text.x = element_text(angle = 66, hjust = 1)
) +
ggtitle(label)
}
cowplot::plot_grid(
plot_summary_measures(
df_predictions %>% filter(!(gupi %in% idx)),
"Summary measures, LOCF subset"
),
plot_summary_measures(
df_predictions,
"Summary measures, full test set"
),
ncol = 1, align = "v", axis = "lr"
)
```
Firstly, LOCF is overall negatively biased, i.e.,
on average it imputes lower-than-observed GOSe values.
This reflects a population-average trend towards continued
recovery within the first 6 months post-injury.
The fact that both ways of quantifying bias agree qualitatively
indicates that the interpretation of GOSe as an interval measure, which
tacitly underlies the Bias, MAE, and RMSE comparisons, is not too restrictive.
In terms of accuracy, LOCF does perform worst, but differences between
methods are less pronounced than in terms of bias.
Notably, the RMSE difference between LOCF and the other methods is slightly
larger than the MAE difference, indicating that LOCF tends to produce more
large deviations, i.e., deviations across several GOSe categories.
Secondly, the inclusion of baseline covariates only notably affects the GP
regression model.
Both the MM and MSM model perform more or less the same irrespective of adjustment
for baseline covariates.
This indicates that the additional predictive value of baseline
covariates over the information contained in at least one observed GOSe value
is limited.
Furthermore, note that both variants of the mixed effects model fail to correct
the overall bias of the imputed values.
We proceed with a detailed analysis of a subset of models, both in direct
comparison with LOCF and in the entire data set including those cases where
LOCF is not applicable.
In the following we only consider the baseline-adjusted Gaussian process model
......@@ -668,96 +662,137 @@ In the following we only consider the baseline-adjusted Gaussian process model
and the multi-state model without baseline covariates ('MSM').
The rationale for dropping the baseline adjustment for MM and MSM is that
the additional complexity does not substantially alter overall performance.
On the other hand, the GP model benefits from the inclusion of the IMPACT
baseline covariates.
## Detailed comparison conditional on LOCF subset
We first consider results for the set of test cases which allow LOCF imputation
(n = `r df_predictions %>% filter(model == "LOCF") %>% nrow - length(idx)`).
Both the raw-count and the relative (by left-out true GOSe) confusion matrices
are presented in Figure ???.
```{r confusion-matrix-locf, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices on LOCF subset."}
plot_confusion_matrices <- function(df_predictions, models) {
df_average_confusion_matrices <- df_predictions %>%
filter(model %in% models) %>%
group_by(fold, model) %>%
do(
confusion_matrix = caret::confusionMatrix(
data = factor(.$prediction, levels = 1:8),
reference = factor(.$GOSE, levels = 1:8)
) %>%
as.matrix %>% as_tibble %>%
mutate(`Predicted GOSE` = row_number() %>% as.character) %>%
gather(`True GOSE`, n, 1:8)
) %>%
unnest %>%
group_by(model, `Predicted GOSE`, `True GOSE`) %>%
summarize(n = mean(n)) %>%
ungroup %>%
mutate(model = factor(model, models))
p_cnf_mtrx_raw <- df_average_confusion_matrices %>%
ggplot(aes(`True GOSE`, `Predicted GOSE`, fill = n)) +
geom_raster() +
geom_text(aes(
label = sprintf("%.1f", n) %>%
ifelse(. == "0.0", "", .)
),
size = 1.5
) +
geom_hline(yintercept = c(2, 4, 6) + .5, color = "black") +
geom_vline(xintercept = c(2, 4, 6) + .5, color = "black") +
scale_fill_gradient(low = "white", high = "#555555") +
coord_fixed(expand = FALSE) +
labs(x = "true GOSe", y = "imputed GOSe", fill = "") +
theme_bw() +
theme(
panel.grid = element_blank()
) +
facet_wrap(~model, nrow = 1) +
ggtitle("Average confusion matrix accross folds (absolute counts)")
p_cnf_mtrx_colnrm <- df_average_confusion_matrices %>%
group_by(model, `True GOSE`) %>%
mutate(
`fraction (column)` = n / sum(n),
`fraction (column)` = ifelse(is.nan(`fraction (column)`), 0, `fraction (column)`)
) %>%
ggplot(aes(`True GOSE`, `Predicted GOSE`, fill = `fraction (column)`)) +
geom_raster() +
geom_hline(yintercept = c(2, 4, 6) + .5, color = "black") +
geom_vline(xintercept = c(2, 4, 6) + .5, color = "black") +
scale_fill_gradient("", low = "white", high = "black", limits = c(0, 1)) +
coord_fixed(expand = FALSE) +
labs(x = "true GOSe", y = "imputed GOSe", fill = "") +
theme_bw() +
theme(
panel.grid = element_blank()
) +
facet_wrap(~model, nrow = 1) +
ggtitle("Average confusion matrix accross folds (column fraction)")
cowplot::plot_grid(p_cnf_mtrx_raw, p_cnf_mtrx_colnrm, ncol = 1, align = "v")
}
plot_confusion_matrices(
df_predictions %>% filter(!(gupi %in% idx)),
c("MSM", "GP + cov", "MM", "LOCF")
)
ggsave(filename = "confusion_matrices_locf.pdf", width = 7, height = 7)
ggsave(filename = "confusion_matrices_locf.png", width = 7, height = 7)
ggsave(filename = "confusion_matrices_locf.pdf", width = 7, height = 6)
ggsave(filename = "confusion_matrices_locf.png", width = 7, height = 6)
```
The absolute-count confusion matrices show that most imputed values are
within +/- one GOSe category of the observed ones.
However, they also reflect the category imbalance (cf. Figures ??? and ???,
Appendix) in the study population.
The performance conditional on the (in practice unknown) observed GOSe value
clearly shows that imputation performance for the most infrequent category 4
is most problematic.
This is, however, true across the range of methods considered.
Both the MSM and the MM models account for this by almost never imputing a
GOSe of 4.
Instead, the respective cases tend to be imputed to GOSe 3 or 5.
Additionally, the following table ??? lists some confusion probabilities
of particular interest.
**TODO:**
* this table is the one David requested in our last meeting,
not entirely convinced though ...
* ... 1 -> > 1 is not relevant since our imputation is conditional
on not being 1 at 6 months
* ... the comparison seems to favor LOCF since only
upward confusions are considered (which LOCF by design tends to do less)
* Is there a clinical interpretation along the lines that '4' might constitute
a short-term transition state, or is it just defined in a way that makes it
highly unlikely to be observed in practice?
```{r crossing-table, echo=FALSE, warning=FALSE, results='asis'}
models <- c("MSM", "GP + cov", "MM")
df_average_confusion_matrices <- df_predictions %>%
filter(model %in% models) %>%
filter(!(gupi %in% idx)) %>%
group_by(fold, model) %>%
do(
confusion_matrix = caret::confusionMatrix(
data = factor(.$prediction, levels = 1:8),
reference = factor(.$GOSE, levels = 1:8)
) %>%
as.matrix %>% as_tibble %>%
mutate(`Predicted GOSE` = row_number() %>% as.character) %>%
gather(`True GOSE`, n, 1:8)
) %>%
unnest %>%
group_by(model, `Predicted GOSE`, `True GOSE`) %>%
summarize(n = mean(n)) %>%
ungroup %>%
mutate(model = factor(model, models))
rbind(
df_average_confusion_matrices %>%
filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>%
......@@ -791,48 +826,71 @@ df_average_confusion_matrices %>%
pander::pandoc.table("Some specific confusion percentages, LOCF subset.", digits = 3)
```
To better understand the overall performance assessment in Figure ???,
we also consider the performance conditional on the respective ground-truth
(i.e., the observed GOSe values in the test sets).
The results are shown in Figure ??? (vertical bars are +/- one standard error of the mean).
```{r error-scores-locf, echo=FALSE, fig.height=3, fig.width=9}
plot_summary_measures_cond <- function(df_predictions, models, label) {
df_predictions %>%
filter(model %in% models) %>%
group_by(model, fold, GOSE) %>%
summarize(
RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
MAE = mean(abs(GOSE - prediction), na.rm = TRUE),
Bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE),
`Pr[est. > true] - Pr[est. < true]` = mean(prediction > GOSE, na.rm = TRUE) - mean(prediction < GOSE, na.rm = TRUE)
) %>%
gather(error, value, -model, -GOSE, -fold) %>%
group_by(GOSE, model, error, fold) %>%
summarize(
mean_per_fold = mean(value) # overall mean
) %>%
group_by(GOSE, model, error) %>%
summarize(
mean = mean(mean_per_fold), # overall mean
se = sd(mean_per_fold) / sqrt(n())
) %>%
ungroup() %>%
mutate(
model = factor(model, models),
error = factor(error, c(
"Bias",
"Pr[est. > true] - Pr[est. < true]",
"MAE",
"RMSE"
))
) %>%
ggplot(aes(GOSE, color = model)) +
geom_hline(yintercept = 0, color = "black") +
geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
width = .2,
position = position_dodge(.33)
) +
geom_line(aes(y = mean), alpha = .5) +
xlab("GOSe") +
facet_wrap(~error, nrow = 1) +
scale_y_continuous(name = "", breaks = seq(-2, 8, .25)) +
theme_bw() +
theme(
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
legend.title = element_blank()
) +
ggtitle(label)
}
plot_summary_measures_cond(
df_predictions %>% filter(!(gupi %in% idx)),
c("MSM", "GP + cov", "MM", "LOCF"),
"Summary measures by observed GOSe, LOCF subset"
)
ggsave(filename = "errors_stratified_locf.pdf", width = 7, height = 7)
ggsave(filename = "errors_stratified_locf.png", width = 7, height = 7)
ggsave(filename = "errors_stratified_locf.pdf", width = 9, height = 3)
ggsave(filename = "errors_stratified_locf.png", width = 9, height = 3)
```
Just as with the overall performance, differences are most pronounced in terms
......@@ -871,69 +929,40 @@ where only GOSe values after 180 days post-injury are available.
The relative characteristics of the three considered approaches are comparable
to the LOCF subset.
**TODO**
* decide whether these figures go in the appendix - David and I agree on them actually being the
primary analysis. We just need to convince people that LOCF should be dropped *first*. As always, I am open to debate this, but we should just make a decision: figurexit or figuremain?
```{r confusion-matrix, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices, full training set without LOCF."}
plot_confusion_matrices(
df_predictions,
c("MSM", "GP + cov", "MM")
)
ggsave(filename = "confusion_matrices_all.pdf", width = 7, height = 7)
ggsave(filename = "confusion_matrices_all.png", width = 7, height = 7)
ggsave(filename = "confusion_matrices_all.pdf", width = 7, height = 6)
ggsave(filename = "confusion_matrices_all.png", width = 7, height = 6)
```
```{r crossing-table-full, echo=FALSE, warning=FALSE, results='asis'}
models <- c("MSM", "GP + cov", "MM")
df_average_confusion_matrices <- df_predictions %>%
filter(model %in% models) %>%
group_by(fold, model) %>%
do(
confusion_matrix = caret::confusionMatrix(
data = factor(.$prediction, levels = 1:8),
reference = factor(.$GOSE, levels = 1:8)
) %>%
as.matrix %>% as_tibble %>%
mutate(`Predicted GOSE` = row_number() %>% as.character) %>%
gather(`True GOSE`, n, 1:8)
) %>%
unnest %>%
group_by(model, `Predicted GOSE`, `True GOSE`) %>%
summarize(n = mean(n)) %>%
ungroup %>%
mutate(model = factor(model, models))
rbind(
df_average_confusion_matrices %>%
group_by(model) %>%
......@@ -964,39 +993,15 @@ df_average_confusion_matrices %>%
pander::pandoc.table("Some specific confusion percentages, full data set.", digits = 3)
```
```{r error-scores-all, echo=FALSE, fig.height=3, fig.width=9}
plot_summary_measures_cond(
df_predictions,
c("MSM", "GP + cov", "MM"),
"Summary measures by observed GOSe, full test set"
)
ggsave(filename = "imputation_error.pdf", width = 7, height = 7)
ggsave(filename = "imputation_error.png", width = 7, height = 7)
ggsave(filename = "imputation_error.pdf", width = 9, height = 3)
ggsave(filename = "imputation_error.png", width = 9, height = 3)
```
......@@ -1005,40 +1010,49 @@ ggsave(filename = "imputation_error.png", width = 7, height = 7)
# Discussion
The most important measure to prevent missing data bias
is to avoid missing values in the first place!
Study protocols and procedures should be checked preemptively for their
applicability to clinical practice, and adequate effort should be put into
ensuring good compliance with follow-up protocols.
Nevertheless, in practice, missing values due to loss-to-follow-up will always
occur and should be addressed effectively.
There is a wide consensus that statistically sound imputation of missing values
is beneficial both for reducing bias and for increasing statistical power.
The current gold standard for imputing missing values is multiple
imputation on a per-analysis basis including analysis-specific covariates to
further reduce bias and to preserve the imputation uncertainty in the
downstream analysis.
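As an illustration of this per-analysis workflow, a minimal sketch using the
mice package is given below; the data set `df_analysis` and its variables are
hypothetical placeholders, and GOSe is treated as numeric purely for brevity.
```r
library(mice)

# Sketch: per-analysis multiple imputation with analysis-specific covariates,
# followed by pooling of the analysis results via Rubin's rules.
imp  <- mice(df_analysis, m = 20, seed = 42)          # 20 imputed data sets
fits <- with(imp, lm(gose_6mo ~ age + gcs_baseline))  # analysis refitted per imputation
pool(fits)                                            # pooled estimates and variances
```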
In practice, however, there are good reasons for providing a set of single-imputed
default values in large observational studies such as CENTER-TBI.
Consortia are increasingly committed to making their databases
available to a wider range of researchers.
In fact, more liberal data-sharing policies are becoming a core requirement
for funding bodies (cf. https://www.openaire.eu/).
In this context, it might not be possible to ensure that every analysis team
has the necessary statistical expertise to properly conduct a per-analysis
multiple imputation in the future.
Furthermore, the imputed values of a multiple-imputation procedure are
inherently random and it is thus difficult to ensure consistency across
different analysis teams if the values themselves cannot be stored
directly in a database.
For this reason, as a practical way forward, we suggest providing a default
single imputation with appropriate measures of uncertainty for key outcomes
in the published database itself.
This mitigates problems with complete-case analyses and provides a
principled and consistent default approach to handling missing values.
Since we strongly suggest employing a model-based approach to imputation,
the fitted class probabilities can be provided in the core database alongside the
imputed values.
Based on these probabilities, it is easy to draw samples for a multiple imputation
analysis.
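A minimal sketch of such a sampling step is given below; the object `df_probs`
(one row per patient, columns `p1`, ..., `p8` holding the fitted GOSe class
probabilities) is a hypothetical placeholder for the stored model output.
```r
# Sketch: draw m imputed GOSe values per patient from stored class probabilities;
# each column of the returned matrix is one complete imputation.
draw_imputations <- function(df_probs, m = 10) {
  prob_matrix <- as.matrix(df_probs[, paste0("p", 1:8)])
  replicate(m, apply(prob_matrix, 1, function(p) sample(1:8, size = 1, prob = p)))
}
```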
Wherever necessary and practical, a custom, analysis-specific multiple
imputation approach might still be employed.
In these cases, the model providing the single-imputed values may
be used as a starting point.
Several reasons disqualify LOCF as the method of choice.
Not only is it inherently biased, but LOCF is also inefficient in that it
......@@ -1053,34 +1067,29 @@ necessary in some cases to further reduce bias.
Finally, LOCF cannot produce an adequate measure of imputation uncertainty
since it is not model based.
We draw two main conclusions from our comparison of three
alternative, model-based approaches.
Firstly, despite its theoretical drawbacks, LOCF is hard to beat in terms of
accuracy.
Still, small improvements are possible (cf. Section ???).
The main advantages of a model-based approach are thus a reduction of bias,
the ability to provide a measure of uncertainty together with the imputed
values (or to use the very same model to draw multiple imputations),
as well as the possibility of including further analysis-specific covariates.
Secondly, we found that the inclusion of established baseline predictors for
GOSe at 6 months post-injury had little effect on the imputation quality.
Note that this does not refute their predictive value but only indicates that
there is little marginal benefit over knowing at least one other GOSe value.
Differences between the model-based approaches tend to be rather nuanced.
We nevertheless favor the multi-state model (MSM).
It is well-interpretable in terms of transition intensities.
An efficient implementation [@msm2011] in standard statistical software [@R2016] is
available.
It succeeds in eliminating the bias introduced by LOCF, is
able to provide imputed values for the entire population and is able to
provide a probabilistic output.
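For reference, a minimal sketch of fitting such a multi-state model with the
msm package is given below; the long-format data set `df_long` (columns `gupi`,
`days`, `gose`) and the simple adjacent-state transition structure are
illustrative assumptions and need not match the model specification used in
this manuscript.
```r
library(msm)

# Sketch: GOSe categories 1-8 as states of a continuous-time multi-state model;
# only transitions between adjacent states are allowed, state 1 (death) is absorbing.
Q <- matrix(0, 8, 8)
for (s in 2:7) Q[s, s + 1] <- 0.1  # improvement to the next-higher state
for (s in 2:8) Q[s, s - 1] <- 0.1  # deterioration to the next-lower state
fit <- msm(gose ~ days, subject = gupi, data = df_long, qmatrix = Q)

# probabilistic output: transition probabilities over a 180-day horizon,
# from which per-patient GOSe class probabilities can be derived
pmatrix.msm(fit, t = 180)
```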