Commit 37cd8615 authored by Kevin

...

parent a5ac5123
data
output
.snakemake
.Rproj.user
...
*.rds
*.Rhistory
*.out
fetch
# CENTER-TBI six-months GOSe-Outcome Imputation
# Prerequisites
We assume a Unix command line workflow. The following software is required to take advantage of the pre-defined workflow:
* curl for downloading the data (in case you do not have curl installed, it is also available from within the container)
* [python](https://www.python.org/download/releases/3.5.1/) 3.5.1 (higher versions might work as well)
* [snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) version 5.2.1 (higher versions will work as well)
* [singularity](https://www.sylabs.io/guides/2.6/user-guide/index.html) 2.6.0 (higher versions might work as well)
* a CENTER-TBI account and API key, stored in the `NEUROBOT_USR` and `NEUROBOT_API` environment variables
The entire analysis is containerized using a [docker container](https://cloud.docker.com/u/kkmann/repository/docker/kkmann/gose-6mo-imputation).
The container can either be used to execute individual scripts, or to run the entire
pre-defined snakemake workflow via singularity (recommended).
A [script](https://github.com/kkmann/center-6mo-gose-imputation/blob/master/snakemake_slurm) for running the entire analysis
on a slurm cluster is provided.
Make sure to adjust the parameters in the [cluster configuration file](https://github.com/kkmann/center-6mo-gose-imputation/blob/master/cluster.json)
accordingly.
A [script](https://github.com/kkmann/center-6mo-gose-imputation/blob/master/snakemake) for execution on a single desktop
machine is provided as well. Depending on the number of cores and available RAM, the cross-validated model comparison may take several
days (3+) to complete.
The script can be invoked via
```
./snakemake_slurm [target]
```
where `[target]` is the build target (e.g. `data_report_v1_1`).
This repository contains the entire source code to reproduce the imputation for
the six-months GOSe in CENTER-TBI.
# Executing the workflow
The available rules can be listed by invoking
```
snakemake -lt
```
To reproduce the data extraction and population description on version v1.1 of the neurobot CENTER-TBI data, invoke
```
./snakemake_slurm data_report_v1_1
```
To reproduce the cross-validated model comparison on version v1.1 of the neurobot CENTER-TBI data, invoke
```
./snakemake_slurm cv_model_comparison_report_v1_1
```
## Prerequisites
### Data Access
To reproduce the analysis, access to the CENTER-TBI 'Neurobot' database at
https://center-tbi.incf.org and a personal access token for the curl API
are required.
For information on how to get data access, see https://www.center-tbi.eu/data.
### Software dependencies
The workflow assumes a linux command line.
To facilitate reproducibility, a
[docker container](https://cloud.docker.com/u/kkmann/repository/docker/kkmann/gose-6mo-imputation)
with all software dependencies
(R packages etc.) is provided.
The workflow itself is automated using [snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) 5.2.1.
To fully leverage the container and snakemake workflow, the following software
dependencies must be available:
* [python](https://www.python.org/download/releases/3.5.1/) 3.5.1 (higher versions might work as well)
* [snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) version 5.2.1 (higher versions will work as well)
* [singularity](https://www.sylabs.io/guides/2.6/user-guide/index.html) 2.6.0 (higher versions might work as well)
## How-To
The download script requires the neurobot user name and the personal API key
to be stored in the environment variables `NEUROBOT_USR` and `NEUROBOT_API`,
respectively, i.e.
```bash
export NEUROBOT_USR=[my-neurobot-username]
export NEUROBOT_API=[my-neurobot-api-key]
```
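A missing credential otherwise only surfaces deep inside the download step. A minimal guard sketch (not part of the repository; the values below are dummies for demonstration):

```shell
#!/usr/bin/env bash
# sketch (not in the repo): fail fast with a clear message if credentials are missing
require_credentials() {
    : "${NEUROBOT_USR:?set NEUROBOT_USR to your neurobot user name}"
    : "${NEUROBOT_API:?set NEUROBOT_API to your personal API key}"
}

export NEUROBOT_USR=jane-doe    # dummy value for demonstration only
export NEUROBOT_API=0123abcd    # dummy value for demonstration only
require_credentials && echo "credentials ok"
```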
### Execute Workflow on Desktop
The workflow can be executed on a potent desktop machine although cluster
execution is recommended (cf. below).
To reproduce the manuscript and the MSM model-based imputation for v1.1 of the neurobot CENTER-TBI data, invoke
```bash
./singularity manuscript_v1_1
./singularity impute_msm_v1_1
```
All output is written to `output/`.
Depending on the number of cores and available RAM,
the cross-validated model comparison may take several days (3+) to complete.
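Before committing a desktop machine to a multi-day run, it is worth checking the available resources; these are generic Linux commands, nothing repository-specific:

```shell
# report the number of usable CPU cores and the available memory
nproc
free -h
```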
### Execute Workflow on Cluster
Cluster execution requires slightly more
[configuration](https://github.com/kkmann/center-6mo-gose-imputation/blob/master/cluster.json)
and assumes existence of a slurm cluster.
Simply modify the `cluster.json` accordingly and execute
```bash
./singularity_slurm manuscript_v1_1
./singularity_slurm impute_msm_v1_1
```
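The repository's actual `cluster.json` is not reproduced here. As an illustration only, a snakemake cluster configuration maps rule names to submission parameters that the slurm submit command interpolates; all keys and values below are assumptions, not the repository's settings (only the rule name `fit_model_validation_set` appears in the Snakefile):

```json
{
    "__default__": {
        "partition": "normal",
        "time": "12:00:00",
        "mem": "16G"
    },
    "fit_model_validation_set": {
        "time": "48:00:00",
        "mem": "32G"
    }
}
```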
singularity: "docker://kkmann/gose-6mo-imputation@sha256:42e72ea0ccaa938b50aea85b7ac3b5d5f8efada79f0b7e2411b1a70a2e037801"
singularity: "docker://kkmann/gose-6mo-imputation@sha256:62540d4bc41b228639bce7e4fe764acfaaeef76e467b92d9c55b26f8ea4f4c5f"
configfile: "config.yml"
rule download_data:
output:
"data/{version}/df_baseline.rds",
...
rule prepare_data:
input:
rules.download_data.output,
...
output:
"output/{version}/data/df_gose.rds",
"output/{version}/data/df_baseline.rds",
"output/{version}/prepare_data.pdf",
"output/{version}/prepare_data.html",
figures = "output/{version}/prepare_data_figures.zip"
shell:
"""
...
rule impute_baseline:
input:
rules.prepare_data.output
...
rule generate_validation_data:
input:
rules.prepare_data.output,
...
# adjust threads by model type
def get_rule_threads(wildcards):
if wildcards.model in ("locf", "msm"):
...
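The truncated `get_rule_threads` helper above uses snakemake's support for callable `threads` values, so cheap sequential models and heavy ones get different core counts. A self-contained sketch of how such a function behaves; the thread counts here are assumptions, not the repository's actual values:

```python
from types import SimpleNamespace

def get_rule_threads(wildcards):
    """Give cheap sequential models a single core, heavier models more (counts assumed)."""
    if wildcards.model in ("locf", "msm"):
        return 1
    return 4

# mimic snakemake's wildcards object with a simple namespace
print(get_rule_threads(SimpleNamespace(model="msm")))  # 1
print(get_rule_threads(SimpleNamespace(model="mm")))   # 4
```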
# helper rule to just build all posterior datasets
rule model_posteriors:
input:
...
# rules for imputing on entire dataset
rule generate_imputation_data:
input:
...
rule model_impute:
input:
"config.yml",
...
rules.post_process_imputations.output,
markdown = "reports/imputations.Rmd"
output:
pdf = "output/{version}/gose_imputations_{model}.pdf",
html = "output/{version}/gose_imputations_{model}.html",
figures = "output/{version}/gose_imputations_{model}_figures.zip"
shell:
"""
mkdir -p output/{wildcards.version}
Rscript -e "rmarkdown::render(\\"{input.markdown}\\", params = list(data_dir = \\"../output/{wildcards.version}/data\\", imputations = \\"../output/{wildcards.version}/data/imputation/{wildcards.model}/df_gose_imputed.csv\\"))"
mv reports/imputations.html {output.html}
mv reports/figures.zip {output.figures}
"""
# define corresponding target rule for ease of use
rule impute_msm_v1_1:
input:
pdf = "output/v1.1/gose_imputations_msm.pdf",
html = "output/v1.1/gose_imputations_msm.html",
figures = "output/v1.1/gose_imputations_msm_figures.zip"
......
MAINTAINER Kevin Kunzmann kevin.kunzmann@mrc-bsu.cam.ac.uk
RUN sudo apt-get update
# install prerequisites
RUN sudo apt-get -y install libcurl4-openssl-dev curl
# install required R packages
RUN R -e "install.packages('rstan')"
...
RUN R -e "install.packages('msm')"
RUN R -e "install.packages('cowplot')"
RUN R -e "install.packages('pander')"
RUN R -e "install.packages('DiagrammeR')"
RUN R -e "devtools::install_github('kkmann/reportr')"
RUN R -e "devtools::install_github('kkmann/describr')"
#!/usr/bin/env bash
curl \
    --user "$NEUROBOT_USR:$NEUROBOT_API" \
--digest https://neurobot-stage.incf.org/api/data/_5c8a757252dc3879e3b7cc35.csv
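The endpoint above suggests each neurobot dataset is addressed by an id embedded in the URL. A hypothetical helper (the function name and wrapper are mine; only the URL pattern comes from the script above) to build such URLs:

```shell
#!/usr/bin/env bash
# hypothetical helper: build the neurobot CSV endpoint for a given dataset id
neurobot_csv_url() {
    printf 'https://neurobot-stage.incf.org/api/data/%s.csv\n' "$1"
}

neurobot_csv_url _5c8a757252dc3879e3b7cc35
```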
---
title: "Imputing GOSE scores in CENTER-TBI"
subtitle: "assessing final imputations"
title: "Imputing GOSE scores in CENTER-TBI, assessing final imputations"
date: "`r Sys.time()`"
statistician: "Kevin Kunzmann (kevin.kunzmann@mrc-bsu.cam.ac.uk)"
collaborator: "David Menon (dkm13@cam.ac.uk)"
output: reportr::report
git-commit-hash: "`r system('git rev-parse --verify HEAD', intern=TRUE)`"
author: "Kevin Kunzmann (kevin.kunzmann@mrc-bsu.cam.ac.uk)"
git-wd-clean: "`r ifelse(system('git diff-index --quiet HEAD') == 0, 'clean', 'file changes, working directory not clean!')`"
output: html_document
params:
data_dir: "../output/v1.1/data"
......
---
title: "Imputing GOSE scores in CENTER-TBI"
date: "`r Sys.time()`"
author: "Kevin Kunzmann (kevin.kunzmann@mrc-bsu.cam.ac.uk)"
output: html_document
bibliography: "references.bib"
......
---
title: "Extract and prepare data"
date: "`r Sys.time()`"
author: "Kevin Kunzmann (kevin.kunzmann@mrc-bsu.cam.ac.uk)"
output: html_document
params:
  datapath: "../data/v1.1"
......