Commit 87998fa0 authored by Kevin

mrge

Merge branch 'master' of git.center-tbi.eu:kunzmann/gose-6mo-imputation

# Conflicts:
#	.gitignore
#	manuscript/manuscript.Rmd
#	manuscript/references.bib
parents 8e8409f1 943d2c27
data
output
output*
.snakemake
.Rproj.user
.cache
*.sif
*.zip
*.pdf
*.Rproj
......@@ -10,5 +11,4 @@ output
*.Rhistory
*.out
*.docx
fetch
*.sif
......@@ -16,56 +16,85 @@ is required.
For information on how to get data access, see https://www.center-tbi.eu/data.
### Software dependencies
The workflow assumes a linux command line.
To facilitate reproducibility, a
[docker container](https://cloud.docker.com/u/kkmann/repository/docker/kkmann/gose-6mo-imputation)
container with all software dependencies (R packages etc.) is provided.
The workflow itself is automated using [snakemake]() 5.2.1.
[singularity image](https://zenodo.org/record/2600384) with all software
dependencies (R packages etc.) is provided ![DOI:10.5281/zenodo.2600384](https://zenodo.org/badge/DOI/10.5281/zenodo.2600384.svg).
The workflow itself is automated using
[snakemake](https://snakemake.readthedocs.io/en/stable/index.html).
To fully leverage the container and snakemake workflow, the following software
dependencies must be available:
* [snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) version 5.2.1+; requires [python](https://www.python.org/download/releases/3.5.1/) 3.5.1+
* [singularity](https://www.sylabs.io/guides/2.6/user-guide/index.html) 2.6.0+
* [wget](https://www.gnu.org/software/wget/) [optional], only for automatic
download of container image file
* [snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) 5.2.1+
[optional], only required for cluster execution; requires [python](https://www.python.org/download/releases/3.5.1/) 3.5.1+
## How-To
## How-To ...
The download script requires the neurobot user name and the personal API key
to be stored in the environment variables `NEUROBOT_USR` `NEUROBOT_API`,
respectively, i.e.
The singularity container image can be downloaded manually using the
digital object identifier.
Note that in this case the downloaded version of the container must match the
version specified in the URL given in the wget command of
`scripts/download_container.sh`.
It is strongly recommended to download the container via
```bash
./scripts/download_container.sh
```
to ensure the correct version.
The downloaded container image file's md5 sum is checked automatically and
an error is thrown in case of a mismatch.
Note that the image cannot be stored in this repository due to file-size
limitations.
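For a manually downloaded image, the same check can be reproduced by hand. The following is a minimal sketch, assuming the image is saved as `container.sif` in the repository root and that the expected md5 sum is the one hard-coded in `scripts/download_container.sh` in this commit; the helper name `verify_md5` is hypothetical and not part of the repository.

```bash
#!/bin/bash
# Hypothetical helper (not part of the repository):
# succeeds if the md5 sum of file $1 equals the expected hash $2.
verify_md5() {
    local file="$1" expected="$2" actual
    # md5sum prints "<hash>  <file>"; keep only the hash
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}

# Expected hash copied from scripts/download_container.sh in this commit
if [ -f container.sif ]; then
    if verify_md5 container.sif 7db125c9c83621d78981e558546b1e88; then
        echo "container.sif: md5 OK"
    else
        echo "container.sif: md5 mismatch!" >&2
    fi
fi
```

If the container version is updated, the expected hash has to be updated along with it.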
Furthermore, the data-download script requires the neurobot user name and the
personal API key (see above) to be stored in the environment variables
`NEUROBOT_USR` and `NEUROBOT_API`, respectively, i.e.
```bash
export NEUROBOT_USR=[my-neurobot-username]
export NEUROBOT_API=[my-neurobot-api-key]
```
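Before starting a long download it can help to confirm that both variables are actually set in the current shell. This is a small sketch under that assumption; the function `check_neurobot_env` is a hypothetical helper and not part of the repository.

```bash
#!/bin/bash
# Hypothetical helper (not part of the repository): report which of the two
# variables required by the data-download script are unset or empty.
check_neurobot_env() {
    local var missing=0
    for var in NEUROBOT_USR NEUROBOT_API; do
        # ${!var:-} is bash indirect expansion: the value of the variable
        # whose name is stored in $var, or the empty string if it is unset
        if [ -z "${!var:-}" ]; then
            echo "error: environment variable $var is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_neurobot_env; then
    echo "neurobot credentials found"
fi
```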
### Execute Workflow on Desktop
### ... execute workflow on a single machine
The workflow can be executed on a potent desktop machine although a cluster
execution is recommended (cf. blow).
execution is recommended for speed (cf. below).
Given that singularity is installed and the container.sif file is present in
the repository root, simply invoke
```bash
./snakemake manuscript_v1_1
./snakemake impute_msm_v1_1
singularity exec container.sif snakemake create_manuscript_v1_1
singularity exec container.sif snakemake impute_population_wide_msm_v1_1
```
All output is written to `output/`.
The first command creates all files necessary to compile the cross-validated
analysis of imputation performance (output/v1.1/manuscript.docx);
the second one only computes the final imputations for the entire study
population (output/v1.1/data/imputation/msm).
Depending on the number of cores and available RAM,
the cross-validated model comparison may take several days (3+) to complete.
### Execute Workflow on Cluster
### ... execute workflow on cluster
Cluster execution requires a cluster-specific [configuration](https://github.com/kkmann/center-6mo-gose-imputation/blob/master/cluster.json).
The `singularity_slurm` script assumes existence of a slurm cluster.
Data should be downloaded on the login node:
Cluster execution requires a cluster-specific snakemake
[configuration](https://github.com/kkmann/center-6mo-gose-imputation/blob/master/cluster.json).
The `snakemake_on_slurm_cluster` script assumes that a slurm cluster is available.
It is recommended to execute only the actual model fitting on the cluster and
to do all preprocessing on the login node to avoid unnecessary queueing time:
```bash
./snakemake download_data_v1_1
singularity exec container.sif snakemake generate_folds_v1_1
```
Then, simply modify the `cluster.json` accordingly and execute
Then, simply modify the `cluster.json` accordingly, make sure that
snakemake is installed and execute
```bash
./snakemake_slurm manuscript_v1_1
./snakemake_slurm impute_msm_v1_1
./snakemake_on_slurm_cluster create_manuscript_v1_1
./snakemake_on_slurm_cluster impute_population_wide_msm_v1_1
```
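As a rough illustration of what gets submitted per job: the `snakemake_on_slurm_cluster` script in this commit passes an `sbatch` template to snakemake, and the `{cluster.*}` placeholders are filled from `cluster.json`. The sketch below mimics that substitution by hand with the `__default__` values from this commit. The `threads` value is a placeholder assumption, since snakemake takes it from each rule.

```bash
#!/bin/bash
# Values of the "__default__" entry in cluster.json (as of this commit)
account="MRC-BSU-SL2-CPU"
partition="bsu-cpu"
n=1
time_limit="03:00:00"
# Assumption for illustration only: snakemake fills {threads} per rule
threads=1

# Same flag layout as the --cluster template in snakemake_on_slurm_cluster
sbatch_cmd="sbatch -A $account -p $partition -n $n -c $threads -t $time_limit"
echo "$sbatch_cmd"
```

Adapting `cluster.json` to another cluster therefore mainly means replacing the account and partition names and choosing a suitable time limit.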
singularity: "docker://kkmann/gose-6mo-imputation@sha256:85724229d8f4243aaebd6228e5cc7833474577ac107f9719b00016765f9ee342"
singularity: "container.sif"
configfile: "config.yml"
......
......@@ -2,7 +2,7 @@
"__default__" :
{
"account" : "MRC-BSU-SL2-CPU",
"time" : "06:00:00",
"time" : "03:00:00",
"n" : 1,
"partition" : "bsu-cpu"
}
......
#!/bin/bash
USERNAME="kkmann"
IMAGE="gose-6mo-imputation"
BUILDNAME=$USERNAME/$IMAGE
docker build --no-cache -t $BUILDNAME .
FROM rocker/verse:latest
MAINTAINER Kevin Kunzmann kevin.kunzmann@mrc-bsu.cam.ac.uk
# update apt
RUN sudo apt-get update
# install prerequisites
RUN sudo apt-get -y install libcurl4-openssl-dev curl bzip2
# install required R packages
RUN R -e "install.packages('diagram')"
RUN R -e "install.packages('rstan')"
RUN R -e "install.packages('brms')"
RUN R -e "install.packages('mice')"
RUN R -e "install.packages('ggalluvial')"
RUN R -e "install.packages('caret')"
RUN R -e "install.packages('e1071')"
RUN R -e "install.packages('msm')"
RUN R -e "install.packages('cowplot')"
RUN R -e "install.packages('pander')"
RUN R -e "devtools::install_github('kkmann/describr')"
......@@ -102,6 +102,80 @@
pages={36--47},
year={2014},
publisher={American Medical Association}
@article{center2015collaborative,
title={Collaborative European neurotrauma effectiveness research in traumatic brain injury (CENTER-TBI): A prospective longitudinal observational study},
author={CENTER-TBI Participants and Investigators and others},
journal={Neurosurgery},
volume={76},
number={1},
pages={67--80},
year={2015},
publisher={Lippincott Williams and Wilkins}
}
@article{kurtzer2017singularity,
title={Singularity: Scientific containers for mobility of compute},
author={Kurtzer, Gregory M and Sochat, Vanessa and Bauer, Michael W},
journal={PloS one},
volume={12},
number={5},
pages={e0177459},
year={2017},
publisher={Public Library of Science}
}
@article{mcmillan2016glasgow,
title={The Glasgow Outcome Scale—40 years of application and refinement},
author={McMillan, Tom and Wilson, Lindsay and Ponsford, Jennie and Levin, Harvey and Teasdale, Graham and Bond, Michael},
journal={Nature Reviews Neurology},
volume={12},
number={8},
pages={477},
year={2016},
publisher={Nature Publishing Group}
}
@article{jennett1981disability,
title={Disability after severe head injury: observations on the use of the Glasgow Outcome Scale.},
author={Jennett, B and Snoek, J and Bond, MR and Brooks, N},
journal={Journal of Neurology, Neurosurgery \& Psychiatry},
volume={44},
number={4},
pages={285--293},
year={1981},
publisher={BMJ Publishing Group Ltd}
}
@article{steyerberg2008predicting,
title={Predicting outcome after traumatic brain injury: development and international validation of prognostic scores based on admission characteristics},
author={Steyerberg, Ewout W and Mushkudiani, Nino and Perel, Pablo and Butcher, Isabella and Lu, Juan and McHugh, Gillian S and Murray, Gordon D and Marmarou, Anthony and Roberts, Ian and Habbema, J Dik F and others},
journal={PLoS medicine},
volume={5},
number={8},
pages={e165},
year={2008},
publisher={Public Library of Science}
}
@book{verbeke2009linear,
title={Linear mixed models for longitudinal data},
author={Verbeke, Geert and Molenberghs, Geert},
year={2009},
publisher={Springer Science \& Business Media}
}
@article{koster2012snakemake,
title={Snakemake—a scalable bioinformatics workflow engine},
author={K{\"o}ster, Johannes and Rahmann, Sven},
journal={Bioinformatics},
volume={28},
number={19},
pages={2520--2522},
year={2012},
publisher={Oxford University Press}
}
......@@ -116,6 +190,19 @@ title = {{Multi-State Models for Panel Data: The msm Package for R}},
volume = {38},
year = {2011}
}
@article{white2010bias,
title={Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values},
author={White, Ian R and Carlin, John B},
journal={Statistics in medicine},
volume={29},
number={28},
pages={2920--2931},
year={2010},
publisher={Wiley Online Library}
}
@article{R2016,
archivePrefix = {arXiv},
arxivId = {arXiv:1011.1669v3},
......
......@@ -9,3 +9,11 @@ rule download_data:
"""
bash scripts/download_{wildcards.version}.sh
"""
rule download_data_v1_1:
input:
"data/v1.1/df_baseline.rds",
"data/v1.1/df_ctmri.rds",
"data/v1.1/df_imaging.rds",
"data/v1.1/df_labs.rds",
"data/v1.1/df_gose.rds"
#!/bin/bash
set -e
wget https://zenodo.org/record/2600385/files/container.sif
# md5sum prints "<hash>  <file>"; keep only the hash
checksum=$(md5sum container.sif | cut -d ' ' -f 1)
if [ "$checksum" != "7db125c9c83621d78981e558546b1e88" ]; then
    echo "md5 mismatch!" >&2
    exit 1
fi
Bootstrap: docker
From: rocker/verse:latest
%labels
Maintainer Kevin Kunzmann kevin.kunzmann@mrc-bsu.cam.ac.uk
%help
CENTER-TBI 6 months GOSe outcome imputation,
cf. https://git.center-tbi.eu/kunzmann/gose-6mo-imputation for details.
%post
apt-get update
apt-get -y install curl python3-pip
pip3 install snakemake
R -e "install.packages('diagram')"
R -e "install.packages('rstan')"
R -e "install.packages('brms')"
R -e "install.packages('mice')"
R -e "install.packages('ggalluvial')"
R -e "install.packages('caret')"
R -e "install.packages('e1071')"
R -e "install.packages('msm')"
R -e "install.packages('cowplot')"
R -e "install.packages('pander')"
R -e "devtools::install_github('kkmann/describr')"
#!/bin/bash
ncpus=$(getconf _NPROCESSORS_ONLN)
snakemake $1 --use-singularity -j $ncpus
#!/bin/bash
# single input: snakemake target
nohup snakemake $1 --use-singularity -j 99 --cluster-config cluster.json --cluster "sbatch -A {cluster.account} -p {cluster.partition} -n {cluster.n} -c {threads} -t {cluster.time}" &