Commit 210b1842 authored by Kevin

expanded model assessment.rmd

parent f4679a85
...@@ -2,6 +2,7 @@ output ...@@ -2,6 +2,7 @@ output
.snakemake .snakemake
.Rproj.user .Rproj.user
.cache .cache
*.zip
*.pdf *.pdf
*.Rproj *.Rproj
*.rds *.rds
......
...@@ -32,7 +32,7 @@ df_gose <- readRDS(inputfile) %>% ...@@ -32,7 +32,7 @@ df_gose <- readRDS(inputfile) %>%
tmp <- df_gose %>% tmp <- df_gose %>%
rbind(tibble( # only predict GOSE at 180 days - takes too long otherwise rbind(tibble( # only predict GOSE at 180 days - takes too long otherwise
gupi = df_gose$gupi %>% unique, gupi = df_gose$gupi %>% unique,
Outcomes.DerivedCompositeGOSEDaysPostInjury = config$t_out_msm, Outcomes.DerivedCompositeGOSEDaysPostInjury = config$t_out_msm + .5, # needed to offset
Outcomes.DerivedCompositeGOSE = 99 Outcomes.DerivedCompositeGOSE = 99
)) %>% )) %>%
arrange(gupi, Outcomes.DerivedCompositeGOSEDaysPostInjury) %>% arrange(gupi, Outcomes.DerivedCompositeGOSEDaysPostInjury) %>%
...@@ -64,7 +64,6 @@ for (i in 1:7) { ...@@ -64,7 +64,6 @@ for (i in 1:7) {
} }
Q[2:7, 1] <- 1 # allow instantaneous deaths Q[2:7, 1] <- 1 # allow instantaneous deaths
# someone comes back from the dead - make sure death is last observation
fit <- msm( fit <- msm(
Outcomes.DerivedCompositeGOSE ~ Outcomes.DerivedCompositeGOSEDaysPostInjury, Outcomes.DerivedCompositeGOSE ~ Outcomes.DerivedCompositeGOSEDaysPostInjury,
subject = tmp$gupi, subject = tmp$gupi,
......
---
title: "Imputing GOSE scores in CENTER-TBI"
date: "`r Sys.time()`"
statistician: "Kevin Kunzmann (kevin.kunzmann@mrc-bsu.cam.ac.uk)"
collaborator: "David Menon (dkm13@cam.ac.uk)"
output: reportr::report
git-commit-hash: "`r system('git rev-parse --verify HEAD', intern=TRUE)`"
git-wd-clean: "`r ifelse(system('git diff-index --quiet HEAD') == 0, 'clean', 'file changes, working directory not clean!')`"
params:
data_dir: "../output/v1.1/data"
config_file: "../config.yml"
---
```{r setup-chunk, include=FALSE}
options(tidyverse.quiet = TRUE) # suppresses filter/lag conflicts
require(tidyverse, quietly = TRUE)
config <- yaml::read_yaml(params$config_file)
set.seed(config$seed)
```
# Model descriptions
```{r}
df_imputations <- read_csv(sprintf("%s/imputation/msm/df_gose_imputed.csv", params$data_dir))
df_gose <- readRDS(sprintf("%s/df_gose.rds", params$data_dir))
```
# Session Info
```{r zip-figures}
system("zip figures.zip *.png *.pdf")
system("rm *.png *.pdf")
```
```{r session-info}
sessionInfo()
```
...@@ -451,6 +451,7 @@ We therefore drop it from the comparison and compare the remaining ...@@ -451,6 +451,7 @@ We therefore drop it from the comparison and compare the remaining
methods on the entire test sets. methods on the entire test sets.
\newpage
## Comparison on full test set ## Comparison on full test set
...@@ -588,14 +589,47 @@ ggsave(filename = "imputation_error.png", width = 7, height = 7) ...@@ -588,14 +589,47 @@ ggsave(filename = "imputation_error.png", width = 7, height = 7)
# Summary
\begin{itemize}
\item locf is working OK (better than with older data)
\item locf is overall negatively biased
\item this is driven by underestimating GOSE for high-GOSE individuals,
which are the most frequent cases in the test set.
\item locf is slightly less accurate than the other methods (RMSE indicates a few large deviations)
\item locf cannot compute GOSE in all cases
\item locf does not provide uncertainty estimates
\item msm and gp are both unbiased overall and have comparable accuracy
\item msm has lower absolute bias conditional on the true GOSE, while gp
only balances underestimating high GOSE against overestimating low GOSE (cf. confusion matrix), i.e. it regresses too strongly to the mean
\item covariates helped in the case of Gaussian processes but were completely irrelevant for the mm models.
\item msm cannot incorporate covariates due to combinatorial explosion of
parameters; this would be possible for a model reduced to 4 categories.
\item msm is the only model with interpretable structure
\item msm is fitted by maximum likelihood, a deterministic algorithm, whereas gp and mm
require sampling $\leadsto$ msm does not depend on the random seed and is faster
\end{itemize}
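The parameter explosion mentioned for msm can be made concrete with a rough count. The sketch below assumes that every allowed transition gets one baseline intensity plus one log-linear effect per covariate column, and it uses an assumed transition structure (state 1 is death, neighbouring transitions among the remaining states, death reachable from every state). The helpers `build_q` and `count_msm_parameters` are hypothetical and the structure is not taken from the fitted model.

```r
# Hypothetical helper: build an indicator matrix of allowed transitions.
# Assumption: state 1 = death (absorbing), states 2..n_states are ordered
# and may only move to a neighbouring state or directly to death.
build_q <- function(n_states) {
  Q <- matrix(0, n_states, n_states)
  for (i in 2:n_states) {
    if (i > 2) Q[i, i - 1] <- 1          # one step down
    if (i < n_states) Q[i, i + 1] <- 1   # one step up
  }
  Q[2:n_states, 1] <- 1                  # transitions into death
  Q
}

# Hypothetical helper: one baseline intensity per allowed transition,
# plus one effect per covariate column on each of those transitions.
count_msm_parameters <- function(Q, n_covariate_columns = 0) {
  n_transitions <- sum(Q[row(Q) != col(Q)] != 0)
  n_transitions * (1 + n_covariate_columns)
}

Q7 <- build_q(7)  # assumed 7-state structure
Q4 <- build_q(4)  # assumed reduced 4-state structure

count_msm_parameters(Q7, n_covariate_columns = 0)  # 16 baseline intensities
count_msm_parameters(Q7, n_covariate_columns = 5)  # 96 parameters
count_msm_parameters(Q4, n_covariate_columns = 5)  # 42 parameters
```

Under these assumptions, five covariate columns already push the 7-state model to 96 parameters, while the reduced 4-state model stays at 42.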
Overall, locf is surprisingly good but biased, especially for the majority
class of individuals with high GOSE.
It is also not universally applicable to all cases.
MSM is unbiased (with a better bias profile than gp),
does not rely on covariates, is relatively fast to fit, can be computed for all
individuals, and has accuracy similar to gp and better than locf.
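To illustrate why locf cannot be computed for every individual and how bias and RMSE are read, here is a minimal base-R sketch. The toy data, the hypothetical `locf` helper and the "true" values are invented for illustration. They are not CENTER-TBI data and not the code used in this report.

```r
# Toy data, invented for illustration only (not CENTER-TBI values).
toy <- data.frame(
  id   = c("a", "a", "b", "c", "c"),
  day  = c(30, 90, 200, 14, 170),
  gose = c(5, 6, 4, 7, 8)
)

# Last observation carried forward: take the latest GOSE observed at or
# before the target day; NA when the subject has no such observation.
locf <- function(df, target_day) {
  sapply(split(df, df$id), function(d) {
    d <- d[d$day <= target_day, ]
    if (nrow(d) == 0) return(NA_real_)
    d$gose[which.max(d$day)]
  })
}

imputed <- locf(toy, target_day = 180)

# Hypothetical true GOSE at 180 days for the same subjects.
truth <- c(a = 7, b = 4, c = 8)

error <- imputed[names(truth)] - truth
bias  <- mean(error, na.rm = TRUE)          # negative values mean underestimation
rmse  <- sqrt(mean(error^2, na.rm = TRUE))  # penalises large deviations more strongly
```

Subject `b` has no observation before day 180, so locf returns `NA` for it. For subject `a`, whose GOSE improves after the last visit, carrying the earlier value forward underestimates the outcome, which gives a negative bias in this toy example.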
# Session Info
```{r zip-figures} ```{r zip-figures}
system("zip figures.zip *.png *.pdf") system("zip figures.zip *.png *.pdf")
system("rm *.png *.pdf") system("rm *.png *.pdf")
``` ```
# Session Info
```{r session-info} ```{r session-info}
sessionInfo() sessionInfo()
``` ```
......