Commit d5175d55 authored by Kevin Kunzmann's avatar Kevin Kunzmann

reworked figures etc.

parent 8460bae6
......@@ -88,12 +88,14 @@ df_baseline <- read_rds(paste0(params$data_dir, "/df_baseline.rds")) %>%
mutate(
InjuryHx.PupilsBaselineDerived = factor(InjuryHx.PupilsBaselineDerived,
levels = 0:2),
InjuryHx.GCSScoreBaselineDerived_dscr = case_when(
InjuryHx.GCSScoreBaselineDerived = case_when(
InjuryHx.GCSScoreBaselineDerived <= 8 ~ "Severe",
InjuryHx.GCSScoreBaselineDerived <= 12 ~ "Moderate",
InjuryHx.GCSScoreBaselineDerived > 12 ~ "Mild"
) %>% as.factor()
) %>% as.factor(),
Labs.DLGlucosemmolL = exp(Labs.DLGlucosemmolL_log)
) %>%
select(-Labs.DLGlucosemmolL_log) %>%
left_join(
read_rds(
"../data/v1.1/df_baseline_descriptive.rds"
......@@ -128,6 +130,15 @@ df_baseline <- read_rds(paste0(params$data_dir, "/df_baseline.rds")) %>%
by = "gupi"
)
var_labels <- readr::read_csv("varlabels.csv")
get_label <- function(varname) var_labels$label[var_labels$varname == varname]
get_label <- Vectorize(get_label)
get_label(names(df_baseline))
names(df_baseline) <- get_label(names(df_baseline))
n_pat <- nrow(df_baseline)
```
......@@ -163,8 +174,8 @@ summarizer <- function(x) {
return(tibble(
statistic = c("# NA", "min", "mean", "median", "max", "std") %>%
factor(., levels = ., ordered = TRUE),
value = list(sum(is.na(x)), mean(x, na.rm = TRUE), median(x, na.rm = TRUE),
min(x, na.rm = TRUE), max(x, na.rm = TRUE), sd(x, na.rm = TRUE))
value = list(sum(is.na(x)), min(x, na.rm = TRUE), mean(x, na.rm = TRUE), median(x, na.rm = TRUE),
max(x, na.rm = TRUE), sd(x, na.rm = TRUE))
))
}
......@@ -183,20 +194,11 @@ df_baseline %>%
```{r baseline-table-discrete, echo=FALSE, results='asis'}
summarizer <- function(x) {
return(tibble(
statistic = c("# NA", "table") %>%
factor(., levels = ., ordered = TRUE),
value = list(
sum(is.na(x)),
{
rbind(table(x), prop.table(table(x))) %>%
t %>%
{tmp <- .; colnames(tmp) <- c("n", "%"); tmp} %>%
as_tibble(rownames = "level") %>%
mutate(`%` = 100 * `%`)
}
)
))
rbind(table(x), prop.table(table(x))) %>%
t %>%
{tmp <- .; colnames(tmp) <- c("n", "%"); tmp} %>%
as_tibble(rownames = "level") %>%
mutate(`%` = 100 * `%`)
}
df_baseline %>%
......@@ -209,12 +211,8 @@ df_baseline %>%
group_by(variable) %>%
do(
smry = summarizer(.$value)
) %>%
unnest() %>%
spread(statistic, value) %>%
unnest(`# NA`) %>%
select(-`# NA`) %>%
unnest(table) %>%
) %>%
unnest() %>%
pander::pandoc.table("Continuous baseline variables", digits = 3, split.tables = 120)
```
......@@ -346,131 +344,10 @@ performance comparisons including LOCF only on the subset of individuals
for which a LOCF-imputed value can be obtained.
## Mixed-effects model
Mixed effects models are a widely used approach in longitudinal
data analysis and model individual deviations from a population mean trajectory
(@verbeke2009linear).
To account for the fact that the GOSe outcome is an ordered factor,
we employ a cumulative link function model with flexible intercepts
(@Agresti2003).
The population mean is modeled as cubic spline function to allow a
non-linear population mean trajectory.
Patient-individual deviations from this population mean are modeled
as quadratic polynomials to allow sufficient flexibility (random effects).
Baseline covariates are added as linear fixed effects to to the
population mean.
The model was fitted using Bayesian statistics via the BRMS
package (@brms2017; @brms2018) for the R environment for statistical
computing (@R2016) and the Stan modelling language (@stan2017).
Non-informative priors were used for the model parameters.
A potential drawback of the proposed longitudinal mixed effects model
is the fact that the individual deviations from the population mean
are modeled globally using polynomials.
Since linear and quadratic terms are not identifiable for patients with
only one observed GOSE value, this implies large uncertainty over the
patient-specific effects for individuals with one or two observations
only by falling back on the non-informative priors on these model
parameters.
Thus, the overall uncertainty associated with model-based imputations
at exactly 180 days may become relatively large.
## Gaussian process model
## Model-based methods
Gaussian process regression allows flexible modelling of
both the individual GOSE trajectories as well as the population mean
in a Bayesian non-parametric way [@rasmussen2006].
This non-parametric paradigm leads to low model-uncertainty in the
vicinity of actually observed GOSe outcomes.
To account for the discreteness of the GOSe outcome,
the continuous output of the Gaussian process model is rounded to the
nearest integer in 1 to 8 (GOSe categories).
The squared exponential covariance function with shared
length scale for all individuals is used to model intra-individual
dependency of GOSe outcomes.
The population mean trajectory of the Gaussian process is modeled as
mean of an independent Gaussian process with pseudo observations at
`r config$gp$t_knots %>% as.numeric()` days post-injury
(also using a squared exponential covariance function).
This approach maintains flexibility of the population mean function
similar to the spline-based approach for the mixed effects model
while avoiding the the computational complexity of a fully
hierarchical Gaussian process model.
Again, the impact of baseline covariates is modeled via linear effects
on the population mean of the Gaussian process.
All parameters are estimated in a fully Bayesian fashion using the Stan
modelling language @stan2017 and non-informative priors except for the
length scale of the squared exponential kernel.
Due to the sparseness of the data, the estimated length scale will naturally
tend towards extremely large values implying unrealistically
long-range dependency between observations.
We therefore limit the length scale to a maximum of 120 days (4 months) and
impose a Gaussian prior with a mean of 60 days post injury and and a
standard deviation of 14.
## Multi-state model
Both the mixed effects model as well as the Gaussian process regression
model are essentially non-linear regression techniques for longitudinal
data.
While they are both powerful tools to model longitudinal trajectories,
they do not explicitly model the probability of transitions between
GOSe states.
Since the number of observations per individual is very limited in our
data set (1 to 4 GOSe observations per individual),
an approach explicitly modelling transition probabilities might be
more suitable to capture the dynamics of the GOSe trajectories.
To explore this further, a Markov multi-state model is considered (@meira2009).
This model class assumes that the transitions between adjacent GOSe
states can be modeled as a Markov process and the transition
intensities between adjacent states are fitted to the observed data.
```{r diagram, echo=FALSE, fig.cap="Structural diagram of allowed transitions between GOSe states for the proposed multi-state model.", fig.width=8, fig.height=4}
# define transition matrix
Q <- matrix(0, nrow = 8, ncol = 8)
for (i in 1:8) {
for (j in 1:8) {
if (i == j - 1 & i != 1) {
Q[i, j] <- 1
}
if (i == j + 1) {
Q[i, j] <- 1
}
}
}
Q[2:8, 1] <- 1 # allow instantaneous deaths
Q <- t(Q)
curvature <- Q / 5
curvature[1, ] <- 0
diagram::plotmat(
Q,
relsize = 1,
shadow.size = 0,
pos = c(1, 7),
box.size = .033,
arr.pos = .55,
arr.length = .33,
arr.type = "triangle",
cex.txt = 0,
curve = curvature
)
```
To account for the fact that state-transitions might be more frequent
in the early post-injury phase,
piece wise constant transition intensities were fitted to the intervals
[0, 90), [90, 270), and 270+ days post-injury.
The model was fit using the msm @msm2011 package for the R environment
for statistical computing @R2016.
Due to the relatively large number of 19 transition intensities in the proposed model
(cf. arrows in Figure ???, structure of transition graph),
inclusion of all baseline covariates turned out to be numerically unstable.
For the MSM model, instead of including all covariates,
only a model adjusting for age at injury via a proportional hazard approach was fit.
## Performance assessment
......@@ -560,12 +437,12 @@ df_predictions <- df_model_posteriors %>%
Model performance was assessed via three fold cross validation on the subset
of individuals with a valid GOSe value within 180 +/- 14 days post-injury
(n = `r nrow(df_ground_truth)`.
(n = `r nrow(df_ground_truth)`).
All models were fit on the entire available data after removing the
180 +/- 14 days post-injury observation from the respective test fold
thus mimicking a missing completely at random missing data mechanism.
The distribution of GOSe values in the respective three test sets is well
balanced, cf. Figure ??? (cf. Appendix).
balanced, (cf. Appendix).
Performance is assessed using the absolute-count and the normalized
(proportions) confusion matrices as well as
bias, mean absolute error (MAE), and root mean squared error (RMSE).
......@@ -575,8 +452,11 @@ MAE and RMSE are both a measures of average precision where
RMSE puts more weight on large deviations as compared to RMSE.
Comparisons in terms of bias, MAE, and RMSE tacitly assume that
GOSe values can be sensibly interpreted on an interval scale.
We therefore also consider $Pr[imp. > true] - Pr[imp. < true]$ as an
alternative measure of bias which does not require this tacit assumption.
We therefore also consider the directional bias (bias'),
the difference between the model-fitted
probability of exceeding the true value minus the model-fitted probability of
undershooting the true GOSe ($Pr[imp. > true] - Pr[imp. < true]$) as an
alternative measure of bias which does not require this assumption.
Note that the scale is not directly comparable to the one of the
other three quantities!
All measures are considered both conditional on the ground-truth
......@@ -604,27 +484,27 @@ individuals do not permit an LOCF imputation (`r round(100 * (df_gose %>% group_
# Results
The overall performance of all fitted models in terms of bias, MAE, and RMSE
The overall performance of all fitted models in terms of bias, bias', MAE, and RMSE
is depicted in Figure ??? both conditional on LOCF being applicable and,
excluding LOCF, on the entire test set.
```{r overall-comparison-all-methods, echo=FALSE, fig.height=9, fig.width=6}
plot_summary_measures <- function(df, label) {
```{r overall-comparison-all-methods, echo=FALSE, fig.height=3.5, fig.width=6, warning=FALSE}
compute_summary_measures <- function(df) {
df %>%
group_by(model, fold) %>%
summarize(
RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
MAE = mean(abs(GOSE - prediction), na.rm = TRUE),
Bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE),
`Pr[imp. > true] - Pr[imp. < true]` = mean(prediction > GOSE, na.rm = TRUE) - mean(prediction < GOSE, na.rm = TRUE)
RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
MAE = mean(abs(GOSE - prediction), na.rm = TRUE),
Bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE),
`Bias'` = mean(prediction > GOSE, na.rm = TRUE) - mean(prediction < GOSE, na.rm = TRUE)
) %>%
ungroup %>%
gather(error, value, -model, -fold) %>%
mutate(
error = factor(error, c(
"Bias",
"Pr[imp. > true] - Pr[imp. < true]",
"Bias'",
"MAE",
"RMSE"
))
......@@ -632,38 +512,41 @@ plot_summary_measures <- function(df, label) {
group_by(model, error) %>%
summarize(
mean_error = mean(value),
se_error = sd(value) / sqrt(n())
) %>%
ggplot(aes(x = model, y = mean_error)) +
geom_hline(yintercept = 0, color = "black") +
geom_point(size = .8) +
geom_errorbar(
aes(ymin = mean_error - 1.96*se_error, ymax = mean_error + 1.96*se_error),
width = .25
) +
facet_wrap(~error, nrow = 2) +
scale_y_continuous(name = "", breaks = seq(-2, 8, .25), limits = c(-.5, 1.25)) +
scale_x_discrete("") +
theme_bw() +
theme(
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
axis.text.x = element_text(angle = 66, hjust = 1)
) +
ggtitle(label)
se_error = sd(value) / sqrt(n()),
)
}
cowplot::plot_grid(
plot_summary_measures(
df_predictions %>% filter(!(gupi %in% idx)),
"Summary measures, LOCF subset"
),
plot_summary_measures(
df_predictions,
"Summary measures, full test set"
),
ncol = 1, align = "v", axis = "lr"
df_plot <- rbind(
compute_summary_measures(
df_predictions %>%
mutate(prediction = ifelse(model == "LOCF", NA, prediction))
) %>%
mutate(label = "full test set"),
compute_summary_measures(df_predictions %>% filter(!(gupi %in% idx))) %>%
mutate(label = "LOCF subset")
) %>%
ungroup
ggplot(df_plot, aes(x = model, y = mean_error)) +
geom_hline(yintercept = 0, color = "black") +
geom_point(aes(color = label), size = .8, position = position_dodge(.3)) +
geom_errorbar(
aes(ymin = mean_error - 1.96*se_error, ymax = mean_error + 1.96*se_error, color = label),
width = .25, position = position_dodge(.3)
) +
scale_color_grey("", start = .0, end = .5) +
facet_wrap(~error, nrow = 1) +
scale_y_continuous(name = "", breaks = seq(-2, 8, .25), limits = c(-.5, 1.25)) +
scale_x_discrete("") +
theme_bw() +
theme(
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
axis.text.x = element_text(angle = 66, hjust = 1),
legend.position = "bottom"
)
```
......@@ -709,8 +592,8 @@ We first consider results for the set of test cases which allow LOCF imputation
Both the raw count as well as the relative (by left-out observed GOSe) confusion matrices
are presented in Figure ???.
```{r confusion-matrix-locf, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices on LOCF subset.", fig.height=9, fig.width=6}
plot_confusion_matrices <- function(df_predictions, models, nrow = 2) {
```{r confusion-matrix-locf, warning=FALSE, message=FALSE, echo=FALSE, fig.cap="Confusion matrices on LOCF subset.", fig.height=6, fig.width=6}
plot_confusion_matrices <- function(df_predictions, models, nrow = 2, legendpos, scriptsize) {
df_average_confusion_matrices <- df_predictions %>%
filter(model %in% models) %>%
......@@ -732,21 +615,24 @@ plot_confusion_matrices <- function(df_predictions, models, nrow = 2) {
p_cnf_mtrx_raw <- df_average_confusion_matrices %>%
ggplot(aes(`Observed GOSE`, `Predicted GOSE`, fill = n)) +
geom_raster() +
geom_raster(fill = "white") +
geom_text(aes(
label = sprintf("%.1f", n) %>%
ifelse(. == "0.0", "", .)
label = sprintf("%3.f", n) %>%
ifelse(. == "0.0", "", .),
color = n
),
size = 2
size = scriptsize
) +
geom_hline(yintercept = c(2, 4, 6) + .5, color = "black") +
geom_vline(xintercept = c(2, 4, 6) + .5, color = "black") +
scale_fill_gradient(low = "white", high = "#555555") +
scale_color_gradient(low = "#999999", high = "black") +
#scale_fill_gradient(low = "white", high = "#555555") +
coord_fixed(expand = FALSE) +
labs(x = "observed GOSe", y = "imputed GOSe", fill = "") +
theme_bw() +
theme(
panel.grid = element_blank()
panel.grid = element_blank(),
legend.position = "none"
) +
facet_wrap(~model, nrow = nrow) +
ggtitle("Average confusion matrix accross folds (absolute counts)")
......@@ -766,19 +652,23 @@ plot_confusion_matrices <- function(df_predictions, models, nrow = 2) {
labs(x = "observed GOSe", y = "imputed GOSe", fill = "") +
theme_bw() +
theme(
panel.grid = element_blank()
panel.grid = element_blank(),
legend.position = legendpos
) +
facet_wrap(~model, nrow = nrow) +
ggtitle("Average confusion matrix accross folds (column fraction)")
cowplot::plot_grid(p_cnf_mtrx_raw, p_cnf_mtrx_colnrm, ncol = 1, align = "v")
cowplot::plot_grid(p_cnf_mtrx_raw, p_cnf_mtrx_colnrm, ncol = 1, align = "h")
}
plot_confusion_matrices(
df_predictions %>% filter(!(gupi %in% idx)),
df_predictions %>%
filter(!(gupi %in% idx)),
c("MSM", "GP + cov", "MM", "LOCF"),
nrow = 2
nrow = 1,
legendpos = "none",
scriptsize = 2
)
ggsave(filename = "confusion_matrices_locf.pdf", width = 6, height = 9)
......@@ -802,7 +692,7 @@ we also consider the performance conditional on the respective ground-truth
(i.e. the observed GOSe values in the test sets).
The results are shown in Figure ??? (vertical bars are +/- one standard error of the mean).
```{r error-scores-locf, echo=FALSE, fig.height=4, fig.width=9}
```{r error-scores-locf, echo=FALSE, fig.height=3.5, fig.width=6}
plot_summary_measures_cond <- function(df_predictions, models, label) {
df_predictions %>%
......@@ -812,7 +702,7 @@ plot_summary_measures_cond <- function(df_predictions, models, label) {
RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
MAE = mean(abs(GOSE - prediction), na.rm = TRUE),
Bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE),
`Pr[imp. > true] - Pr[imp. < true]` = mean(prediction > GOSE, na.rm = TRUE) - mean(prediction < GOSE, na.rm = TRUE)
`Bias'` = mean(prediction > GOSE, na.rm = TRUE) - mean(prediction < GOSE, na.rm = TRUE)
) %>%
gather(error, value, -model, -GOSE, -fold) %>%
group_by(GOSE, model, error, fold) %>%
......@@ -829,26 +719,28 @@ plot_summary_measures_cond <- function(df_predictions, models, label) {
model = factor(model, models),
error = factor(error, c(
"Bias",
"Pr[imp. > true] - Pr[imp. < true]",
"Bias'",
"MAE",
"RMSE"
))
) %>%
ggplot(aes(GOSE, color = model)) +
geom_hline(yintercept = 0, color = "black") +
geom_line(aes(y = mean)) +
geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
width = .2,
position = position_dodge(.33)
position = position_dodge(.2),
size = 1
) +
geom_line(aes(y = mean), alpha = .66) +
xlab("GOSe") +
xlab("observed GOSe") +
facet_wrap(~error, nrow = 1) +
scale_y_continuous(name = "", breaks = seq(-2, 8, .25)) +
theme_bw() +
theme(
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
legend.title = element_blank()
legend.title = element_blank(),
legend.position = "bottom"
) +
ggtitle(label)
......@@ -860,8 +752,8 @@ plot_summary_measures_cond(
"Summary measures by observed GOSe, LOCF subset"
)
ggsave(filename = "errors_stratified_locf.pdf", width = 9, height = 5)
ggsave(filename = "errors_stratified_locf.png", width = 9, height = 5)
ggsave(filename = "errors_stratified_locf.pdf", width = 6, height = 3.5)
ggsave(filename = "errors_stratified_locf.png", width = 6, height = 3.5)
```
Just as with the overall performance, differences are most pronounced in terms
......@@ -910,7 +802,9 @@ to the LOCF subset.
plot_confusion_matrices(
df_predictions,
c("MSM", "GP + cov", "MM"),
nrow = 1
nrow = 1,
legendpos = "none",
scriptsize = 2.5
)
ggsave(filename = "confusion_matrices_all.pdf", width = 6, height = 6)
......@@ -919,15 +813,15 @@ ggsave(filename = "confusion_matrices_all.png", width = 6, height = 6)
```{r error-scores-all, echo=FALSE, fig.height=4, fig.width=9}
```{r error-scores-all, echo=FALSE, fig.height=3.5, fig.width=6}
plot_summary_measures_cond(
df_predictions %>% filter(!(gupi %in% idx)),
c("MSM", "GP + cov", "MM"),
"Summary measures by observed GOSe, full test set"
)
ggsave(filename = "imputation_error.pdf", width = 9, height = 4)
ggsave(filename = "imputation_error.png", width = 9, height = 4)
ggsave(filename = "imputation_error.pdf", width = 6, height = 3.5)
ggsave(filename = "imputation_error.png", width = 6, height = 3.5)
```
......@@ -1025,7 +919,7 @@ provide a probabilistic output.
## Distribution of GOSe in validation folds
```{r marginal-GOSe-folds, echo=FALSE, fig.cap="Marginal GOSe distribution per validation fold."}
```{r marginal-GOSe-folds, echo=FALSE, fig.cap="Marginal GOSe distribution per validation fold.", fig.width=6, fig.height=3}
df_ground_truth %>%
ggplot(aes(Outcomes.DerivedCompositeGOSE)) +
geom_bar() +
......@@ -1037,9 +931,239 @@ df_ground_truth %>%
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank()
)
ggsave(filename = "gose_marignal_per_fold.pdf", width = 6, height = 3)
ggsave(filename = "gose_marignal_per_fold.png", width = 6, height = 3)
```
## hm
```{r baseline-table-continuous2, echo=FALSE, results='asis'}
summarizer <- function(x) {
return(tibble(
statistic = c("n", "# NA", "min", "mean", "median", "max", "std") %>%
factor(., levels = ., ordered = TRUE),
value = list(length(x), sum(is.na(x)), min(x, na.rm = TRUE), mean(x, na.rm = TRUE), median(x, na.rm = TRUE),
max(x, na.rm = TRUE), sd(x, na.rm = TRUE))
))
}
get_cont_smry <- function(dfs) {
res <- list()
for (i in 1:length(dfs)) {
res[[i]] <- dfs[[i]] %>%
select_if(is.numeric) %>%
gather(variable, value) %>%
group_by(variable) %>%
do(
smry = summarizer(.$value)
) %>%
unnest %>%
spread(statistic, value) %>%
unnest()
}
return(res)
}
df_baseline %>%
mutate(locf = ifelse(!(gupi %in% idx), "LOCF possible", "LOCF impossible")) %>%
group_by(locf) %>%
nest %>%
do(
locf = .$locf,
smry = get_cont_smry(.$data)
) %>%
unnest %>% unnest %>%
arrange(variable, locf) %>%
gather(key = "Statistic", value = "value", 3:9) %>%
spread(Statistic, value) %>%
arrange(variable, locf) %>%
pander::pandoc.table("Discrete baseline variables by LOCF status", digits = 3, split.tables = 120)
```
```{r baseline-table-discrete2, echo=FALSE, results='asis'}
summarizer <- function(x) {
rbind(table(x), prop.table(table(x))) %>%
t %>%
{tmp <- .; colnames(tmp) <- c("n", "%"); tmp} %>%
as_tibble(rownames = "level") %>%
mutate(`%` = 100 * `%`)
}
get_cont_smry <- function(dfs) {
res <- list()
for (i in 1:length(dfs)) {
res[[i]] <- dfs[[i]] %>%
select_if(~!is.numeric(.)) %>%
select(-gupi) %>%
mutate_all(as.factor) %>%
mutate_all(fct_explicit_na) %>%
mutate_all(as.character) %>%
gather(variable, value) %>%
group_by(variable) %>%
do(
smry = summarizer(.$value)
) %>%
unnest()
}
return(res)
}
df_baseline %>%
mutate(locf = ifelse(!(gupi %in% idx), "LOCF possible", "LOCF impossible")) %>%
group_by(locf) %>%
nest %>%
do(
locf = .$locf,
smry = get_cont_smry(.$data)
) %>%
unnest %>% unnest %>%
gather("stat", "value", n, `%`) %>%
unite(tmp, c(stat, locf), sep = ", ") %>%
spread(tmp, value) %>%
mutate_at(3:6, ~ifelse(is.na(.x), 0, .x)) %>%
pander::pandoc.table("Continuous baseline variables by LOCF status", digits = 3, split.tables = 140)
```
## Models
### Mixed-effects model
Mixed effects models are a widely used approach in longitudinal
data analysis and model individual deviations from a population mean trajectory
(@verbeke2009linear).
To account for the fact that the GOSe outcome is an ordered factor,
we employ a cumulative link function model with flexible intercepts
(@Agresti2003).
The population mean is modeled as cubic spline function to allow a
non-linear population mean trajectory.
Patient-individual deviations from this population mean are modeled
as quadratic polynomials to allow sufficient flexibility (random effects).
Baseline covariates are added as linear fixed effects to to the
population mean.
The model was fitted using Bayesian statistics via the BRMS
package (@brms2017; @brms2018) for the R environment for statistical
computing (@R2016) and the Stan modelling language (@stan2017).
Non-informative priors were used for the model parameters.
A potential drawback of the proposed longitudinal mixed effects model
is the fact that the individual deviations from the population mean
are modeled globally using polynomials.
Since linear and quadratic terms are not identifiable for patients with
only one observed GOSE value, this implies large uncertainty over the
patient-specific effects for individuals with one or two observations
only by falling back on the non-informative priors on these model
parameters.
Thus, the overall uncertainty associated with model-based imputations
at exactly 180 days may become relatively large.
### Gaussian process model
Gaussian process regression allows flexible modelling of
both the individual GOSE trajectories as well as the population mean
in a Bayesian non-parametric way [@rasmussen2006].
This non-parametric paradigm leads to low model-uncertainty in the
vicinity of actually observed GOSe outcomes.
To account for the discreteness of the GOSe outcome,
the continuous output of the Gaussian process model is rounded to the
nearest integer in 1 to 8 (GOSe categories).
The squared exponential covariance function with shared
length scale for all individuals is used to model intra-individual
dependency of GOSe outcomes.
The population mean trajectory of the Gaussian process is modeled as
mean of an independent Gaussian process with pseudo observations at
`r config$gp$t_knots %>% as.numeric()` days post-injury
(also using a squared exponential covariance function).
This approach maintains flexibility of the population mean function
similar to the spline-based approach for the mixed effects model
while avoiding the the computational complexity of a fully
hierarchical Gaussian process model.
Again, the impact of baseline covariates is modeled via linear effects
on the population mean of the Gaussian process.
All parameters are estimated in a fully Bayesian fashion using the Stan
modelling language @stan2017 and non-informative priors except for the
length scale of the squared exponential kernel.
Due to the sparseness of the data, the estimated length scale will naturally
tend towards extremely large values implying unrealistically
long-range dependency between observations.
We therefore limit the length scale to a maximum of 120 days (4 months) and
impose a Gaussian prior with a mean of 60 days post injury and and a
standard deviation of 14.
### Multi-state model
Both the mixed effects model as well as the Gaussian process regression
model are essentially non-linear regression techniques for longitudinal
data.
While they are both powerful tools to model longitudinal trajectories,
they do not explicitly model the probability of transitions between
GOSe states.
Since the number of observations per individual is very limited in our
data set (1 to 4 GOSe observations per individual),
an approach explicitly modelling transition probabilities might be
more suitable to capture the dynamics of the GOSe trajectories.
To explore this further, a Markov multi-state model is considered (@meira2009).
This model class assumes that the transitions between adjacent GOSe
states can be modeled as a Markov process and the transition
intensities between adjacent states are fitted to the observed data.
```{r diagram, echo=FALSE, fig.cap="Structural diagram of allowed transitions between GOSe states for the proposed multi-state model.", fig.width=8, fig.height=4}
# define transition matrix
Q <- matrix(0, nrow = 8, ncol = 8)
for (i in 1:8) {
for (j in 1:8) {
if (i == j - 1 & i != 1) {
Q[i, j] <- 1
}
if (i == j + 1) {
Q[i, j] <- 1
}
}
}
Q[2:8, 1] <- 1 # allow instantaneous deaths
Q <- t(Q)
curvature <- Q / 5
curvature[1, ] <- 0
diagram::plotmat(
Q,
relsize = 1,
shadow.size = 0,
pos = c(1, 7),
box.size = .033,
arr.pos = .55,
arr.length = .33,
arr.type = "triangle",
cex.txt = 0,
curve = curvature
)
```
To account for the fact that state-transitions might be more frequent
in the early post-injury phase,
piece wise constant transition intensities were fitted to the intervals
[0, 90), [90, 270), and 270+ days post-injury.
The model was fit using the msm @msm2011 package for the R environment
for statistical computing @R2016.
Due to the relatively large number of 19 transition intensities in the proposed model
(cf. arrows in Figure ???, structure of transition graph),
inclusion of all baseline covariates turned out to be numerically unstable.
For the MSM model, instead of including all covariates,
only a model adjusting for age at injury via a proportional hazard approach was fit.
## Reproducible Research Strategy
CENTER-TBI is committed to reproducible research.
......
varname,label
gupi,gupi
Subject.Age,Age
Subject.Sex,Sex
InjuryHx.PupilsBaselineDerived,Pupils
InjuryHx.GCSScoreBaselineDerived,GCS
InjuryHx.EDComplEventHypoxia,Hypoxia
InjuryHx.EDComplEventHypotension,Hypotension
Labs.DLHemoglobingdL,Hemoglobin [g/dL]
Imaging.MarshallCTClassification,Marshall CT
CTMRI.CTSubarachnoidHem,Subarachnoid Hematoma
CTMRI.CTExtraduralHema,Extradural Hematoma
Labs.DLGlucosemmolL,Glucose [mmol/L]
Subject.PatientType,Stratum
InjuryHx.InjCause,Cause of Injury
InjuryHx.TotalISS,"ISS, total"
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment