GOSe-6mo-imputation-paper

Commit 0badf884 authored Mar 22, 2019 by Kevin Kunzmann

revision of manuscript

parent 21167658

Showing 1 changed file with 311 additions and 302 deletions

manuscript/manuscript.Rmd  +311 −302
@@ -463,9 +463,7 @@ for (i in 1:config$folds) {
)
}
```
```{r misc, include = FALSE}
modelnames <- lapply(
  list.dirs(
    sprintf("%s/validation/posteriors", params$data_dir),
@@ -540,7 +538,7 @@ All models were fit on the entire available data after removing the
180 +/- 14 days post-injury observation from the respective test fold, thus mimicking a missing-completely-at-random missing-data mechanism.
The distribution of GOSe values in the respective three test sets is well balanced, cf. Figure ??? (Appendix).
Performance is assessed using the absolute-count and the normalized (proportions) confusion matrices as well as bias, mean absolute error (MAE), and root mean squared error (RMSE).
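As a minimal sketch of this masking step (hypothetical column and variable names; the actual fold construction is defined elsewhere in this manuscript), the held-out observation can be removed as follows:

```{r masking-sketch, eval=FALSE}
# Hypothetical sketch: remove the 180 +/- 14 days post-injury observation of
# all patients in the current test fold before refitting; the removed GOSe
# values then serve as ground truth for the imputation methods.
df_train <- df_gose %>%
  filter(!(gupi %in% test_fold_ids & abs(days - 180) <= 14))
```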
@@ -551,7 +549,7 @@ RMSE puts more weight on large deviations as compared to MAE.
Comparisons in terms of bias, MAE, and RMSE tacitly assume that GOSe values can be sensibly interpreted on an interval scale.
We therefore also consider $Pr[est. > true] - Pr[est. < true]$ as an alternative measure of bias which does not require this tacit assumption.
Note that its scale is not directly comparable to that of the other three quantities!
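To make the four summary measures concrete, a toy example with hypothetical vectors of imputed (`est`) and observed (`true`) GOSe values:

```{r summary-measures-sketch, eval=FALSE}
# Toy illustration of the four summary measures; est and true are hypothetical.
est  <- c(3, 5, 5, 7, 2, 6)
true <- c(4, 5, 6, 7, 3, 5)

bias <- mean(est) - mean(true)              # interval-scale bias
mae  <- mean(abs(est - true))               # mean absolute error
rmse <- sqrt(mean((est - true)^2))          # penalizes large deviations more
pr   <- mean(est > true) - mean(est < true) # bias measure on the ordinal scale
```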
All measures are considered both conditional on the ground-truth
@@ -584,42 +582,27 @@ is depicted in Figure ??? both conditional on LOCF being applicable and,
excluding LOCF, on the entire test set.
```{r overall-comparison-all-methods, echo = FALSE, fig.height = 7}
plot_summary_measures <- function(df, label) {
  df %>%
    group_by(model, fold) %>%
    summarize(
      RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
      MAE = mean(abs(GOSE - prediction), na.rm = TRUE),
      Bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE),
      `Pr[est. > true] - Pr[est. < true]` =
        mean(prediction > GOSE, na.rm = TRUE) - mean(prediction < GOSE, na.rm = TRUE)
    ) %>%
    ungroup %>%
    gather(error, value, -model, -fold) %>%
    mutate(
      error = factor(error, c("Bias", "Pr[est. > true] - Pr[est. < true]", "MAE", "RMSE"))
    ) %>%
    ggplot(aes(model, value)) +
    geom_hline(yintercept = 0, color = "black") +
    geom_boxplot() +
@@ -632,35 +615,46 @@ p2 <- df_predictions %>%
      panel.grid.major.x = element_blank(),
      axis.text.x = element_text(angle = 66, hjust = 1)
    ) +
    ggtitle(label)
}

cowplot::plot_grid(
  plot_summary_measures(
    df_predictions %>% filter(!(gupi %in% idx)),
    "Summary measures, LOCF subset"
  ),
  plot_summary_measures(
    df_predictions,
    "Summary measures, full test set"
  ),
  ncol = 1, align = "v", axis = "lr"
)
```
Firstly, LOCF is overall negatively biased, i.e., on average it imputes lower-than-observed GOSe values.
This reflects a population-average trend towards continued recovery within the first 6 months post-injury.
The fact that both ways of quantifying bias qualitatively agree indicates that the interpretation of GOSe as an interval measure, which tacitly underlies the bias, MAE, and RMSE comparisons, is not too restrictive.
In terms of accuracy, LOCF performs worst, but differences between methods are less pronounced than in terms of bias.
Notably, the RMSE difference between LOCF and the other methods is slightly larger than the MAE difference, which indicates that LOCF tends to produce more large deviations, i.e., deviations across several GOSe categories.
Secondly, the inclusion of baseline covariates only notably affects the GP regression model.
Both the MM and the MSM model perform more or less the same irrespective of adjustment for baseline covariates.
This indicates that the additional predictive value of baseline covariates over the information contained in at least one observed GOSe value is limited.
Furthermore, note that both variants of the mixed effects model fail to correct the overall bias of the imputed values.
We proceed with a detailed analysis of a subset of models, both in direct comparison with LOCF and on the entire data set including those cases where LOCF is not applicable.
In the following we only consider the baseline-adjusted Gaussian process model
@@ -668,18 +662,23 @@ In the following we only consider the baseline-adjusted Gaussian process model
and the multi-state model without baseline covariates ('MSM').
The rationale for dropping the baseline adjustment for MM and MSM is that the additional complexity does not substantially alter overall performance.
The GP model, on the other hand, benefits from the inclusion of the IMPACT baseline covariates.
## Detailed comparison conditional on LOCF subset
We first consider results for the set of test cases which allow LOCF imputation (n = `r df_predictions %>% filter(model == "LOCF") %>% nrow - length(idx)`).
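For reference, LOCF simply carries the last GOSe value observed before the imputation time point forward; a minimal dplyr sketch under hypothetical column names (`gupi`, `days`, `GOSE`):

```{r locf-sketch, eval=FALSE}
# Hypothetical LOCF sketch: carry the last GOSe observed before the 180-day
# target window forward; patients without any earlier observation are exactly
# the cases where LOCF is not applicable.
df_locf <- df_gose %>%
  filter(days < 180 - 14) %>%
  group_by(gupi) %>%
  arrange(days) %>%
  summarize(prediction = last(GOSE))
```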
Both the raw-count as well as the relative (by left-out true GOSe) confusion matrices are presented in Figure ???.
```{r confusion-matrix-locf, warning = FALSE, message = FALSE, echo = FALSE, fig.cap = "Confusion matrices on LOCF subset."}
plot_confusion_matrices <- function(df_predictions, models) {
  df_average_confusion_matrices <- df_predictions %>%
    filter(model %in% models) %>%
    group_by(fold, model) %>%
    do(
      confusion_matrix = caret::confusionMatrix(
@@ -693,16 +692,21 @@ df_average_confusion_matrices <- df_predictions %>%
    unnest %>%
    group_by(model, `Predicted GOSE`, `True GOSE`) %>%
    summarize(n = mean(n)) %>%
    ungroup %>%
    mutate(model = factor(model, models))
  p_cnf_mtrx_raw <- df_average_confusion_matrices %>%
    ggplot(aes(`True GOSE`, `Predicted GOSE`, fill = n)) +
    geom_raster() +
    geom_text(
      aes(label = sprintf("%.1f", n) %>% ifelse(. == "0.0", "", .)),
      size = 1.5
    ) +
    geom_hline(yintercept = c(2, 4, 6) + .5, color = "black") +
    geom_vline(xintercept = c(2, 4, 6) + .5, color = "black") +
    scale_fill_gradient(low = "white", high = "#555555") +
    coord_fixed(expand = FALSE) +
    labs(x = "true GOSe", y = "imputed GOSe", fill = "") +
    theme_bw() +
@@ -712,8 +716,7 @@ p_cnf_mtrx_raw <- df_average_confusion_matrices %>%
    facet_wrap(~model, nrow = 1) +
    ggtitle("Average confusion matrix across folds (absolute counts)")

  p_cnf_mtrx_colnrm <- df_average_confusion_matrices %>%
    group_by(model, `True GOSE`) %>%
    mutate(
      `fraction (column)` = n / sum(n),
@@ -733,31 +736,63 @@ p_cnf_mtrx_colnrm <- df_average_confusion_matrices %>%
    facet_wrap(~model, nrow = 1) +
    ggtitle("Average confusion matrix across folds (column fraction)")
  cowplot::plot_grid(p_cnf_mtrx_raw, p_cnf_mtrx_colnrm, ncol = 1, align = "v")
}

plot_confusion_matrices(
  df_predictions %>% filter(!(gupi %in% idx)),
  c("MSM", "GP + cov", "MM", "LOCF")
)

ggsave(filename = "confusion_matrices_locf.pdf", width = 7, height = 6)
ggsave(filename = "confusion_matrices_locf.png", width = 7, height = 6)
```
The absolute-count confusion matrices show that most imputed values are within +/- one GOSe category of the observed ones.
However, they also reflect the category imbalance in the study population (cf. Figures ??? and ???, Appendix).
The performance conditional on the (in practice unknown) observed GOSe value clearly shows that imputation performance is most problematic for the most infrequent category 4 (GOSe values of 1 and 2 are not observed in the test set!).
This is, however, true across the range of methods considered.
Both the MSM and the MM models account for this by almost never imputing a GOSe of 4; instead, the respective cases tend to be imputed to GOSe 3 or 5.
Additionally, Table ??? lists some confusion probabilities of particular interest.
**TODO:**

* this table is the one David requested in our last meeting, not entirely convinced though ...
* ... 1 ->> 1 is not relevant since our imputation is conditional on not being 1 at 6 months
* ... the comparison seems to favor LOCF since only upward confusions are considered (which LOCF by design tends to do less)
* Is there a clinical interpretation along the way that '4' might constitute a short-term transition state, or is it just defined in a way that makes it highly unlikely to be observed in practice?
```{r crossing-table, echo = FALSE, warning = FALSE, results = 'asis'}
models <- c("MSM", "GP + cov", "MM")

df_average_confusion_matrices <- df_predictions %>%
  filter(model %in% models) %>%
  filter(!(gupi %in% idx)) %>%
  group_by(fold, model) %>%
  do(
    confusion_matrix = caret::confusionMatrix(
      data = factor(.$prediction, levels = 1:8),
      reference = factor(.$GOSE, levels = 1:8)
    ) %>%
      as.matrix %>%
      as_tibble %>%
      mutate(`Predicted GOSE` = row_number() %>% as.character) %>%
      gather(`True GOSE`, n, 1:8)
  ) %>%
  unnest %>%
  group_by(model, `Predicted GOSE`, `True GOSE`) %>%
  summarize(n = mean(n)) %>%
  ungroup %>%
  mutate(model = factor(model, models))

rbind(
  df_average_confusion_matrices %>%
    filter(model %in% c("LOCF", "MM", "GP + cov", "MSM")) %>%
@@ -791,37 +826,52 @@ df_average_confusion_matrices %>%
  pander::pandoc.table("Some specific confusion percentages, LOCF subset.", digits = 3)
```
[TODO: Interpretation? GOSe 4 potentially a relatively brief transient state; clinical relevance of the distinction between 3/4?]
To better understand the overall performance assessment in Figure ???, we also consider the performance conditional on the respective ground truth (i.e., the observed GOSe values in the test sets).
The results are shown in Figure ??? (vertical bars are +/- one standard error of the mean).
```{r error-scores-locf, echo = FALSE, fig.height = 3, fig.width = 9}
plot_summary_measures_cond <- function(df_predictions, models, label) {
  df_predictions %>%
    filter(model %in% models) %>%
    group_by(model, fold, GOSE) %>%
    summarize(
      RMSE = mean((GOSE - prediction)^2, na.rm = TRUE) %>% sqrt,
      MAE = mean(abs(GOSE - prediction), na.rm = TRUE),
      Bias = mean(prediction, na.rm = TRUE) - mean(GOSE, na.rm = TRUE),
      `Pr[est. > true] - Pr[est. < true]` =
        mean(prediction > GOSE, na.rm = TRUE) - mean(prediction < GOSE, na.rm = TRUE)
    ) %>%
    gather(error, value, -model, -fold, -GOSE) %>%
    group_by(GOSE, model, error, fold) %>%
    summarize(mean_per_fold = mean(value)) %>% # mean within fold
    group_by(GOSE, model, error) %>%
    summarize(
      mean = mean(mean_per_fold), # overall mean
      se = sd(mean_per_fold) / sqrt(n())
    ) %>%
    ungroup() %>%
    mutate(
      model = factor(model, models),
      error = factor(error, c("Bias", "Pr[est. > true] - Pr[est. < true]", "MAE", "RMSE"))
    ) %>%
    ggplot(aes(GOSE, color = model)) +
    geom_hline(yintercept = 0, color = "black") +
    geom_point(aes(y = mean)) +
    geom_errorbar(
      aes(ymin = mean - se, ymax = mean + se),
      width = .2, position = position_dodge(.33)
    ) +
    geom_line(aes(y = mean), alpha = .5) +
    xlab("GOSe") +
    facet_wrap(~error, nrow = 1) +
    scale_y_continuous(name = "", breaks = seq(-2, 8, .25)) +
    theme_bw() +
    theme(
@@ -829,10 +879,18 @@ df_predictions %>%
      panel.grid.major.x = element_blank(),
      legend.title = element_blank()
    ) +
    ggtitle(label)
}

plot_summary_measures_cond(
  df_predictions %>% filter(!(gupi %in% idx)),
  c("MSM", "GP + cov", "MM", "LOCF"),
  "Summary measures by observed GOSe, LOCF subset"
)

ggsave(filename = "errors_stratified_locf.pdf", width = 9, height = 3)
ggsave(filename = "errors_stratified_locf.png", width = 9, height = 3)
```
Just as with the overall performance, differences are most pronounced in terms
@@ -871,11 +929,25 @@ where only GOSe values after 180 days post-injury are available.
The relative characteristics of the three considered approaches are comparable to the LOCF subset.
**TODO** *decide whether figures go in appendix - David and I agree on them being actually the primary analysis, we just need to convince people of the fact that LOCF should be dropped **first**. As always, I am open to debate this but we should just make a decision, figurexit or figuremain?*
```{r confusion-matrix, warning = FALSE, message = FALSE, echo = FALSE, fig.cap = "Confusion matrices, full training set without LOCF."}
plot_confusion_matrices(df_predictions, c("MSM", "GP + cov", "MM"))

ggsave(filename = "confusion_matrices_all.pdf", width = 7, height = 6)
ggsave(filename = "confusion_matrices_all.png", width = 7, height = 6)
```
```{r crossing-table-full, echo = FALSE, warning = FALSE, results = 'asis'}
models <- c("MSM", "GP + cov", "MM")

df_average_confusion_matrices <- df_predictions %>%
  filter(model %in% models) %>%
  group_by(fold, model) %>%
  do(
    confusion_matrix = caret::confusionMatrix(
@@ -889,51 +961,8 @@ df_average_confusion_matrices <- df_predictions %>%
  unnest %>%
  group_by(model, `Predicted GOSE`, `True GOSE`) %>%
  summarize(n = mean(n)) %>%
  ungroup %>%
  mutate(model = factor(model, models))

rbind(
  df_average_confusion_matrices %>%
    group_by(model) %>%
@@ -964,39 +993,15 @@ df_average_confusion_matrices %>%
  pander::pandoc.table("Some specific confusion percentages, full data set.", digits = 3)
```
```{r error-scores-all, echo = FALSE, fig.height = 3, fig.width = 9}
plot_summary_measures_cond(
  df_predictions,
  c("MSM", "GP + cov", "MM"),
  "Summary measures by observed GOSe, full test set"
)
ggsave(filename = "imputation_error.pdf", width = 9, height = 3)
ggsave(filename = "imputation_error.png", width = 9, height = 3)
```
@@ -1005,40 +1010,49 @@ ggsave(filename = "imputation_error.png", width = 7, height = 7)
# Discussion
Handling missing data *post-hoc* to prevent biased analyses often requires great effort.
It is thus of the utmost importance to implement measures for avoiding missing data in the first place.
[comment: I strongly feel we should lead with this sentence or something in the same spirit to make it absolutely clear that statistics cannot be used to impute data out of nowhere. Raising awareness of the complexity of missing data problems should rather be seen as an incentive to invest more effort upfront in preventing missingness in the first place ;)]
Nevertheless, in practice, missing values due to loss to follow-up will always occur and should be addressed effectively.
There is a wide consensus that statistically sound imputation of missing values is beneficial both for reducing bias and for increasing statistical power.
The current gold standard for imputing missing values is multiple imputation on a per-analysis basis, including analysis-specific covariates to further reduce bias and to preserve the imputation uncertainty in the downstream analysis.
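For illustration, such a per-analysis multiple imputation could be carried out, e.g., with the mice package (the data frame and analysis model below are hypothetical):

```{r mice-sketch, eval=FALSE}
library(mice)

# Hypothetical sketch: m = 10 imputations including analysis-specific
# covariates; the analysis model is fit per imputed data set and the results
# are pooled via Rubin's rules.
imp  <- mice(df_analysis, m = 10, seed = 42)
fits <- with(imp, lm(outcome ~ gose_6mo + age + gcs))
pool(fits)
```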
In practice, however, there are good reasons for providing a set of single-imputed default values in large observational studies such as CENTER-TBI.
Consortia are increasingly committed to making their databases available to a wider range of researchers.
In fact, more liberal data-sharing policies are becoming a core requirement for funding bodies (cf. https://www.openaire.eu/).
In this context, it might not be possible to ensure that every analysis team has the necessary statistical expertise to properly conduct a per-analysis multiple imputation.
Furthermore, the imputed values of a multiple-imputation procedure are inherently random, and it is thus difficult to ensure consistency across different analysis teams if the values themselves cannot be stored directly in a database.
For this reason, as a practical way forward, we suggest providing a default single imputation with appropriate measures of uncertainty for key outcomes in the published database itself.
This mitigates problems with complete-case analyses and provides a principled and consistent default approach to handling missing values.
Since we strongly suggest employing a model-based approach to imputation, the fitted class probabilities can be provided in the core database alongside the imputed values.
Based on these probabilities, it is easy to draw samples for a multiple imputation analysis.
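A minimal sketch of such sampling (assuming a hypothetical matrix `probs` with one row per patient and one column per GOSe category holding the stored class probabilities):

```{r mi-sampling-sketch, eval=FALSE}
# Draw m complete single imputations from stored per-patient GOSe class
# probabilities; the result is an n x m matrix of imputed GOSe values.
draw_imputations <- function(probs, m = 10) {
  replicate(
    m,
    apply(probs, 1, function(p) sample(seq_len(ncol(probs)), 1, prob = p))
  )
}
```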
Wherever necessary and practical, a custom, analysis-specific multiple imputation approach might still be employed.
In these cases, the model providing the single-imputed values may be used as a starting point.
Several reasons disqualify LOCF as the method of choice.
Not only is it inherently biased, but LOCF is also inefficient in that it
@@ -1053,34 +1067,29 @@ necessary in some cases to further reduce bias.
Finally, LOCF cannot produce an adequate measure of imputation uncertainty since it is not model-based.
We draw two main conclusions from our comparison of three alternative, model-based approaches.
Firstly, despite its theoretical drawbacks, LOCF is hard to beat in terms of raw accuracy.
Still, small improvements are possible (cf. Section ???).
The main advantages of a model-based approach are thus a reduction of bias, the ability to provide a measure of uncertainty together with the imputed values (or to use the very same model to draw multiple imputations), as well as the possibility of including further analysis-specific covariates.
Secondly, we found that the inclusion of established baseline predictors for GOSe at 6 months post-injury had little effect on the imputation quality.
Note that this does not refute their predictive value, but only indicates that there is little marginal benefit over knowing at least one other GOSe value.
Finally, differences between the model-based approaches tend to be rather nuanced.
We nevertheless favor the multi-state model (MSM).
It is well-interpretable in terms of transition intensities, and an efficient implementation [@msm2011] in standard statistical software [@R2016] is available.
It succeeds in eliminating the bias introduced by LOCF, is able to provide imputed values for the entire population, and provides a probabilistic output.
That is, besides the single predicted GOSe value, the fitted probabilities for each GOSe value may be stored in the database.
Based on these values, it is easy to draw samples for a multiple imputation analysis.
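As a rough sketch of how such a model can be fit with the msm package (toy generator matrix and hypothetical column names; the actual model specification is described in the methods section):

```{r msm-sketch, eval=FALSE}
library(msm)

# Toy generator matrix: transitions between neighbouring GOSe states only,
# state 1 (death) absorbing; msm uses the non-zero pattern as initial values.
Q <- matrix(0, 8, 8)
for (i in 2:8) {
  Q[i, i - 1] <- 0.1            # deterioration
  if (i < 8) Q[i, i + 1] <- 0.1 # recovery
}

fit <- msm(GOSE ~ days, subject = gupi, data = df_gose, qmatrix = Q)

# Transition probability matrix over a 90-day horizon: row = current state,
# column = probability of each GOSe state 90 days later.
pmatrix.msm(fit, t = 90)
```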