Commit 37cd8615 authored by Kevin

...

parent a5ac5123
data
output
.snakemake
.Rproj.user
...
*.rds
*.Rhistory
*.out
fetch
# CENTER-TBI six-months GOSe-Outcome Imputation
# Prerequisites
We assume a Unix command line workflow. The following software is required to take advantage of the pre-defined workflow:
* curl for downloading the data (in case you do not have curl installed, it is also available from within the container)
* [python](https://www.python.org/download/releases/3.5.1/) 3.5.1 (higher versions might work as well)
* [snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) version 5.2.1 (higher versions will work as well)
* [singularity](https://www.sylabs.io/guides/2.6/user-guide/index.html) 2.6.0 (higher versions might work as well)
* a CENTER-TBI account and API key, stored in the `NEUROBOT_USR` and `NEUROBOT_API` environment variables
The entire analysis is containerized using a [docker container](https://cloud.docker.com/u/kkmann/repository/docker/kkmann/gose-6mo-imputation).
The container can either be used to execute individual scripts, or to run the entire
pre-defined snakemake workflow via singularity (recommended).
A [script](https://github.com/kkmann/center-6mo-gose-imputation/blob/master/snakemake_slurm) for running the entire analysis
on a slurm cluster is provided.
Make sure to adjust the parameters in the [cluster configuration file](https://github.com/kkmann/center-6mo-gose-imputation/blob/master/cluster.json)
accordingly.
A [script](https://github.com/kkmann/center-6mo-gose-imputation/blob/master/snakemake) for execution on a single desktop
machine is provided as well. Depending on the number of cores and available RAM, the cross-validated model comparison may take several
days (3+) to complete.
The script can be invoked via
```
./snakemake_slurm [target]
```
where `[target]` is the build target (e.g. `data_report_v1_1`).
This repository contains the entire source code to reproduce the imputation for
the six-months GOSe in CENTER-TBI.
# Executing the workflow
The available rules can be listed by invoking
```
snakemake -lt
```
To reproduce the data extraction and population description on version v1.1 of the neurobot CENTER-TBI data, invoke
```
./snakemake_slurm data_report_v1_1
```
To reproduce the cross-validated model comparison on version v1.1 of the neurobot CENTER-TBI data, invoke
```
./snakemake_slurm cv_model_comparison_report_v1_1
```
## Prerequisites
### Data Access
To reproduce the analysis, access to the CENTER-TBI 'Neurobot' database at
https://center-tbi.incf.org and a personal access token for the curl API
are required.
For information on how to get data access, see https://www.center-tbi.eu/data.
### Software dependencies
The workflow assumes a linux command line.
To facilitate reproducibility, a
[docker container](https://cloud.docker.com/u/kkmann/repository/docker/kkmann/gose-6mo-imputation)
with all software dependencies
(R packages etc.) is provided.
The workflow itself is automated using [snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) 5.2.1.
To fully leverage the container and snakemake workflow, the following software
dependencies must be available:
* [python](https://www.python.org/download/releases/3.5.1/) 3.5.1 (higher versions might work as well)
* [snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) version 5.2.1 (higher versions will work as well)
* [singularity](https://www.sylabs.io/guides/2.6/user-guide/index.html) 2.6.0 (higher versions might work as well)
## How-To
The download script requires the neurobot user name and the personal API key
to be stored in the environment variables `NEUROBOT_USR` and `NEUROBOT_API`,
respectively, i.e.
```bash
export NEUROBOT_USR=[my-neurobot-username]
export NEUROBOT_API=[my-neurobot-api-key]
```
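A missing credential otherwise only surfaces deep inside the download step. A minimal guard sketch (not part of the repository; the values below are dummies for demonstration):

```shell
#!/usr/bin/env bash
# sketch (not in the repo): fail fast with a clear message if credentials are missing
require_credentials() {
    : "${NEUROBOT_USR:?set NEUROBOT_USR to your neurobot user name}"
    : "${NEUROBOT_API:?set NEUROBOT_API to your personal API key}"
}

export NEUROBOT_USR=jane-doe    # dummy value for demonstration only
export NEUROBOT_API=0123abcd    # dummy value for demonstration only
require_credentials && echo "credentials ok"
```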
### Execute Workflow on Desktop
The workflow can be executed on a potent desktop machine although cluster
execution is recommended (cf. below).
To reproduce the manuscript and the MSM model-based imputation for v1.1 of the neurobot CENTER-TBI data, invoke
```bash
./singularity manuscript_v1_1
./singularity impute_msm_v1_1
```
All output is written to `output/`.
Depending on the number of cores and available RAM,
the cross-validated model comparison may take several days (3+) to complete.
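Before committing a desktop machine to a multi-day run, it is worth checking the available resources; these are generic Linux commands, nothing repository-specific:

```shell
# report the number of usable CPU cores and the available memory
nproc
free -h
```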
### Execute Workflow on Cluster
Cluster execution requires slightly more
[configuration](https://github.com/kkmann/center-6mo-gose-imputation/blob/master/cluster.json)
and assumes existence of a slurm cluster.
Simply modify the `cluster.json` accordingly and execute
```bash
./singularity_slurm manuscript_v1_1
./singularity_slurm impute_msm_v1_1
```
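The repository's actual `cluster.json` is not reproduced here. As an illustration only, a snakemake cluster configuration maps rule names to submission parameters that the slurm submit command interpolates; all keys and values below are assumptions, not the repository's settings (only the rule name `fit_model_validation_set` appears in the Snakefile):

```json
{
    "__default__": {
        "partition": "normal",
        "time": "12:00:00",
        "mem": "16G"
    },
    "fit_model_validation_set": {
        "time": "48:00:00",
        "mem": "32G"
    }
}
```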
singularity: "docker://kkmann/gose-6mo-imputation@sha256:42e72ea0ccaa938b50aea85b7ac3b5d5f8efada79f0b7e2411b1a70a2e037801"
singularity: "docker://kkmann/gose-6mo-imputation@sha256:62540d4bc41b228639bce7e4fe764acfaaeef76e467b92d9c55b26f8ea4f4c5f"
configfile: "config.yml"
rule download_data:
output:
"data/{version}/df_baseline.rds",
...
rule prepare_data:
input:
rules.download_data.output,
...
output:
"output/{version}/data/df_gose.rds",
"output/{version}/data/df_baseline.rds",
"output/{version}/prepare_data.pdf",
"output/{version}/prepare_data.html",
figures = "output/{version}/prepare_data_figures.zip"
shell:
"""
...
rule impute_baseline:
input:
rules.prepare_data.output
...
rule generate_validation_data:
input:
rules.prepare_data.output,
...
# adjust threads by model type
def get_rule_threads(wildcards):
if wildcards.model in ("locf", "msm"):
...
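The truncated `get_rule_threads` helper above uses snakemake's support for callable `threads` values, so cheap sequential models and heavy ones get different core counts. A self-contained sketch of how such a function behaves; the thread counts here are assumptions, not the repository's actual values:

```python
from types import SimpleNamespace

def get_rule_threads(wildcards):
    """Give cheap sequential models a single core, heavier models more (counts assumed)."""
    if wildcards.model in ("locf", "msm"):
        return 1
    return 4

# mimic snakemake's wildcards object with a simple namespace
print(get_rule_threads(SimpleNamespace(model="msm")))  # 1
print(get_rule_threads(SimpleNamespace(model="mm")))   # 4
```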
# helper rule to just build all posterior datasets
rule model_posteriors:
input:
...
# rules for imputing on entire dataset
rule generate_imputation_data:
input:
...
rule model_impute:
input:
"config.yml",
...
rules.post_process_imputations.output,
markdown = "reports/imputations.Rmd"
output:
pdf = "output/{version}/gose_imputations_{model}.pdf",
html = "output/{version}/gose_imputations_{model}.html",
figures = "output/{version}/gose_imputations_{model}_figures.zip"
shell:
"""
mkdir -p output/{wildcards.version}
Rscript -e "rmarkdown::render(\\"{input.markdown}\\", params = list(data_dir = \\"../output/{wildcards.version}/data\\", imputations = \\"../output/{wildcards.version}/data/imputation/{wildcards.model}/df_gose_imputed.csv\\"))"
mv reports/imputations.html {output.html}
mv reports/figures.zip {output.figures}
"""
# define corresponding target rule for ease of use
rule impute_msm_v1_1:
input:
pdf = "output/v1.1/gose_imputations_msm.pdf",
html = "output/v1.1/gose_imputations_msm.html",
figures = "output/v1.1/gose_imputations_msm_figures.zip"
......
MAINTAINER Kevin Kunzmann kevin.kunzmann@mrc-bsu.cam.ac.uk
RUN sudo apt-get update
# install prerequisites
RUN sudo apt-get -y install libcurl4-openssl-dev curl
# install required R packages
RUN R -e "install.packages('rstan')"
...
RUN R -e "install.packages('msm')"
RUN R -e "install.packages('cowplot')"
RUN R -e "install.packages('pander')"
RUN R -e "install.packages('DiagrammeR')"
RUN R -e "devtools::install_github('kkmann/reportr')"
RUN R -e "devtools::install_github('kkmann/describr')"
#!/usr/bin/env bash
curl \
    --user "$NEUROBOT_USR:$NEUROBOT_API" \
--digest https://neurobot-stage.incf.org/api/data/_5c8a757252dc3879e3b7cc35.csv
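The endpoint above suggests each neurobot dataset is addressed by an id embedded in the URL. A hypothetical helper (the function name and wrapper are mine; only the URL pattern comes from the script above) to build such URLs:

```shell
#!/usr/bin/env bash
# hypothetical helper: build the neurobot CSV endpoint for a given dataset id
neurobot_csv_url() {
    printf 'https://neurobot-stage.incf.org/api/data/%s.csv\n' "$1"
}

neurobot_csv_url _5c8a757252dc3879e3b7cc35
```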
---
title: "Imputing GOSE scores in CENTER-TBI"
subtitle: "assessing final imputations"
title: "Imputing GOSE scores in CENTER-TBI, assessing final imputations"
date: "`r Sys.time()`"
statistician: "Kevin Kunzmann (kevin.kunzmann@mrc-bsu.cam.ac.uk)"
collaborator: "David Menon (dkm13@cam.ac.uk)"
output: reportr::report
git-commit-hash: "`r system('git rev-parse --verify HEAD', intern=TRUE)`"
author: "Kevin Kunzmann (kevin.kunzmann@mrc-bsu.cam.ac.uk)"
git-wd-clean: "`r ifelse(system('git diff-index --quiet HEAD') == 0, 'clean', 'file changes, working directory not clean!')`"
output: html_document
params:
data_dir: "../output/v1.1/data"
......
---
title: "Imputing GOSE scores in CENTER-TBI"
date: "`r Sys.time()`"
author: "Kevin Kunzmann (kevin.kunzmann@mrc-bsu.cam.ac.uk)"
output: html_document
bibliography: "references.bib"
......
---
title: "Extract and prepare data"
date: "`r Sys.time()`"
author: "Kevin Kunzmann (kevin.kunzmann@mrc-bsu.cam.ac.uk)"
output: html_document
params:
  datapath: "../data/v1.1"
......