GOSe-6mo-imputation-paper
Kevin Kunzmann
Commit f1a20e8d authored Feb 04, 2019 by Kevin
tidier data report, imposed age restriction to 12-80 (configurable)
parent 36aab205
Showing 3 changed files with 62 additions and 28 deletions
Snakefile +2 -2
config.yml +4 -0
reports/prepare_data.Rmd +56 -26
Snakefile
...
...
@@ -34,7 +34,7 @@ rule prepare_data:
shell:
"""
mkdir -p output/{wildcards.version}/data
Rscript -e "rmarkdown::render(\\"{input.markdown}\\", output_dir = \\"output/{wildcards.version}\\", params = list(datapath = \\"../data/{wildcards.version}\\", max_lab_days = {config[max_lab_days]}, seed = {config[seed]}))"
Rscript -e "rmarkdown::render(\\"{input.markdown}\\", output_dir = \\"output/{wildcards.version}\\", params = list(datapath = \\"../data/{wildcards.version}\\", max_lab_days = {config[max_lab_days]}, seed = {config[seed]}, age_min = {config[age_min]}, age_max = {config[age_max]}))"
mv reports/*.rds output/{wildcards.version}/data
mv reports/figures.zip {output.figures}
"""
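For readers unfamiliar with how the `{config[...]}` placeholders in the `shell:` string are filled, the following minimal Python sketch mimics the substitution with plain `str.format`. Snakemake's own formatter is richer than this, so treat it as an illustration only. The template is a shortened stand-in for the real command, and the values are the ones set in `config.yml` in this commit.

```python
# Minimal sketch: Python's str.format resolves "{config[age_min]}" by
# looking up the unquoted key "age_min" in the object passed as `config`.
# This is NOT Snakemake's implementation, and the template below is a
# shortened stand-in for the real Rscript command.
config = {"max_lab_days": 3, "seed": 42, "age_min": 12, "age_max": 80}

template = (
    "params = list(max_lab_days = {config[max_lab_days]}, "
    "seed = {config[seed]}, "
    "age_min = {config[age_min]}, age_max = {config[age_max]})"
)

rendered = template.format(config=config)
print(rendered)
```

In this sketch, a config dictionary without `age_min` or `age_max` raises a `KeyError` at formatting time, which is one reason the two keys are added to `config.yml` in the same commit.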
...
...
@@ -88,7 +88,7 @@ rule generate_validation_data_v1_1:
for j in range(1, config["folds"] + 1)
]
...
...
config.yml
...
...
@@ -2,6 +2,10 @@ seed:
42
max_lab_days:
3
age_min:
12
age_max:
80
mi_m:
5
mi_maxiter:
...
...
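Since the age bounds are now configurable, a small sanity check on the parsed configuration may be worth considering. The helper below is hypothetical (it is not part of the repository) and assumes the YAML file has already been loaded into a plain dictionary.

```python
# Hypothetical helper (not part of the repository): sanity-check the age
# bounds after config.yml has been parsed into a plain dict.
def check_age_bounds(config):
    """Return (age_min, age_max) or raise ValueError if they are unusable."""
    for key in ("age_min", "age_max"):
        if key not in config:
            raise ValueError(f"missing config entry: {key}")
    age_min, age_max = config["age_min"], config["age_max"]
    if age_min < 0 or age_min > age_max:
        raise ValueError(f"invalid age range: {age_min}..{age_max}")
    return age_min, age_max


# Values as set in this commit's config.yml.
bounds = check_age_bounds(
    {"seed": 42, "max_lab_days": 3, "age_min": 12, "age_max": 80}
)
print(bounds)
```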
reports/prepare_data.Rmd
...
...
@@ -13,6 +13,8 @@ params:
datapath: "../data/v1.1"
max_lab_days: 3
seed: 42
age_min: 12
age_max: 80
---
...
...
@@ -38,7 +40,9 @@ df_gose <- readRDS(sprintf('%s/df_gose.rds', params$datapath))
# Extract data
```{r}
## Baseline and death times
```{r extract-deathtimes}
df_deaths <- df_baseline %>%
transmute(
gupi,
...
...
@@ -50,7 +54,11 @@ df_deaths <- df_baseline %>%
) %>%
filter(complete.cases(.))
```
```{r}
We use exact death times (`Subject.DeathDate`).
Death dates are recorded for `r df_deaths %>% nrow` individuals.
```{r extract-baseline-covariates}
df_baseline <- df_baseline %>%
select(-Subject.DeathDate) %>%
mutate(
...
...
@@ -121,13 +129,13 @@ df_baseline <- df_baseline %>%
)
```
Overall, `r nrow(df_baseline)` individuals have recorded baseline data.
## GOSE data
```{r gose-outcomes-ambiguity}
```{r extract-gose}
df_gose <- df_gose %>%
distinct %>%
filter(complete.cases(.)) %>%
...
...
@@ -136,17 +144,16 @@ df_gose <- df_gose %>%
mutate(Outcomes.DerivedCompositeGOSE = factor(Outcomes.DerivedCompositeGOSE, levels = 1:8))
```
This results in `r nrow(df_gose)` GOSE measurements of
`r df_gose %>% group_by(gupi) %>% n_groups()` individuals.
# Compile final datasets
* exclude all patients who do not survive the first 6 months (no need to impute)
* exclude all patients with no GOSE measurement (no imputation)
Overall, `r nrow(df_gose)` GOSE measurements of
`r df_gose %>% group_by(gupi) %>% n_groups()` individuals are available.
To these observations, we add a GOSE of 1 at the recorded death times.
We then exclude all patients with a recorded death time of less than 6 months,
since their 6-month GOSE is known exactly (1, dead).
The target population is thus the subset of individuals with
1. at least one valid GOSE observation
2. no confirmed death within 6 months
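The exclusion rule can be sketched as follows. The sketch is in Python rather than R and uses made-up patient identifiers. The only detail taken from the report is the cutoff `days <= 180 - 14` visible in the `exclude-early-deaths` chunk, which treats a death up to day 166 as falling within 6 months. How the report combines the two criteria after that chunk is not shown in this diff, so the combination below is an assumption.

```python
# Illustrative only: toy data, hypothetical patient ids.
EARLY_DEATH_CUTOFF_DAYS = 180 - 14  # 166, as in the exclude-early-deaths chunk

# Days from injury to death, for patients with a recorded death date.
death_days = {"P1": 30, "P2": 166, "P3": 200}
# Patients with at least one valid GOSE observation.
has_gose = {"P1", "P2", "P3", "P4"}

# A death on or before the cutoff day counts as an early death.
early_deaths = {
    pid for pid, days in death_days.items()
    if days <= EARLY_DEATH_CUTOFF_DAYS
}

# Target population: at least one GOSE observation and no early death.
target = sorted(has_gose - early_deaths)
print(target)
```

Patient `P3` dies after the cutoff and therefore stays in the target population, while `P2` sits exactly on the cutoff and is excluded.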
```{r}
```{r exclude-early-deaths}
early_deaths_gupi <- df_deaths %>%
filter(days <= 180 - 14) %>%
.[["gupi"]]
...
...
@@ -185,14 +192,14 @@ df_baseline <- df_baseline %>%
This results in `r nrow(df_gose)` GOSE measurements of
`r df_gose %>% group_by(gupi) %>% n_groups()` individuals.
## Plausibility check
The only genuinely numerical variables are Age, Glucose_mmolL, and Hb_dL.
All other variables are factors and therefore cannot contain outliers.
```{r}
```{r baseline-histograms-raw}
df_baseline %>%
select(Subject.Age, Labs.DLGlucosemmolL, Labs.DLHemoglobingdL) %>%
gather(Variable, value) %>%
...
...
@@ -203,19 +210,40 @@ df_baseline %>%
theme(panel.grid = element_blank())
```
### Age range
The observed age range is quite wide; we therefore restrict the study population
to individuals aged between `r params$age_min` and `r params$age_max` (inclusive).
```{r restrict-age-range}
df_baseline <- df_baseline %>%
filter(
Subject.Age >= params$age_min,
Subject.Age <= params$age_max
)
df_gose <- df_gose %>%
filter(gupi %in% df_baseline$gupi)
```
This reduces the number of GOSE observations to `r nrow(df_gose)` from
`r df_gose %>% group_by(gupi) %>% n_groups()` individuals.
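The two filtering steps in the `restrict-age-range` chunk can be illustrated with a small language-neutral sketch, written here in Python with toy rows. The column names `gupi` and `Subject.Age` mirror the report; the identifiers and values are made up.

```python
# Illustrative sketch with toy rows; only the column names "gupi" and
# "Subject.Age" come from the report, all values are made up.
age_min, age_max = 12, 80  # defaults of the report parameters in this commit

baseline = [
    {"gupi": "A", "Subject.Age": 11},
    {"gupi": "B", "Subject.Age": 12},
    {"gupi": "C", "Subject.Age": 80},
    {"gupi": "D", "Subject.Age": 81},
]
gose = [
    {"gupi": g, "gose": v}
    for g, v in [("A", 5), ("B", 3), ("B", 4), ("D", 7)]
]

# Both bounds are inclusive, matching `>=` and `<=` in the R filter.
baseline = [
    row for row in baseline
    if age_min <= row["Subject.Age"] <= age_max
]

# Keep only GOSE rows of patients that passed the baseline filter
# (the R code does this with `gupi %in% df_baseline$gupi`).
kept_ids = {row["gupi"] for row in baseline}
gose = [row for row in gose if row["gupi"] in kept_ids]

print(sorted(kept_ids))
print(len(gose))
```

Patients aged exactly 12 or 80 are retained, and GOSE rows follow the baseline table rather than being filtered on age directly.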
### Glucose and Hemoglobin
Glucose is obviously left-skewed and a log transform might improve fits in linear
models.
All values above 50 are considered implausible and set to missing
(probably meant as missing).
```{r, echo=TRUE}
```{r glucose-outliers}
df_baseline %>%
select(gupi, Labs.DLHemoglobingdL) %>%
filter(Labs.DLHemoglobingdL > 50)
```
All values above 50 are considered implausible and set to missing
(probably meant as missing).
```{r}
```{r log-trans-hemoglobin}
df_baseline <- df_baseline %>%
mutate(
Labs.DLHemoglobingdL = ifelse(Labs.DLHemoglobingdL > 50, NA_real_, Labs.DLHemoglobingdL),
...
...
@@ -334,18 +362,20 @@ ggsave("gose_alluvial_differential_coloring.pdf", height = 5, width = 8)
ggsave("gose_alluvial_differential_coloring.png", height = 5, width = 8)
```
Out of the `r df_gose %>% group_by(gupi) %>% n_groups` individuals in the
final dataset, `r df_gose %>% group_by(gupi) %>% filter(!any(Outcomes.DerivedCompositeGOSEDaysPostInjury >= 5*30 & Outcomes.DerivedCompositeGOSEDaysPostInjury <= 8*30)) %>% n_groups` do not
have a per-protocol 6-month GOSE observation and are eligible for model-based
imputation.
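The inline expression above counts individuals without any GOSE observation between `5*30` and `8*30` days post injury. A minimal sketch of that window check, written in Python with made-up patients and observation days, could look like this:

```python
# Illustrative sketch; patient ids and observation days are made up.
WINDOW_START, WINDOW_END = 5 * 30, 8 * 30  # 150 to 240 days, inclusive

# Days post injury at which each patient has a GOSE observation.
obs_days = {
    "P1": [90, 185],  # has an observation inside the window
    "P2": [85, 360],  # observations only outside the window
    "P3": [150],      # the lower boundary counts as inside
    "P4": [241],      # just past the upper bound
}


def needs_imputation(days):
    """True if no observation falls inside the per-protocol window."""
    return not any(WINDOW_START <= d <= WINDOW_END for d in days)


to_impute = sorted(
    pid for pid, days in obs_days.items() if needs_imputation(days)
)
print(to_impute)
```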
# Session Info
```{r zip-figures}
system("zip figures.zip *.png *.pdf")
system("rm *.png *.pdf")
```
# Session Info
```{r session-info}
sessionInfo()
```