Commit 855ef4fe authored by Kevin Kunzmann

Update README.md

parent 18dcc320
...@@ -16,56 +16,85 @@ is required.
For information on how to get data access, see https://www.center-tbi.eu/data.
### Software dependencies
The workflow assumes a Linux command line.
To facilitate reproducibility, a
[singularity image](https://zenodo.org/record/2600384) with all software
dependencies (R packages etc.) is provided
![DOI:10.5281/zenodo.2600384](https://zenodo.org/badge/DOI/10.5281/zenodo.2600384.svg).
The workflow itself is automated using
[snakemake](https://snakemake.readthedocs.io/en/stable/index.html).
To fully leverage the container and snakemake workflow, the following software
dependencies must be available:
* [singularity](https://www.sylabs.io/guides/2.6/user-guide/index.html) 2.6.0+
* [wget](https://www.gnu.org/software/wget/) [optional], only required for
  automatic download of the container image file
* [snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) 5.2.1+
  [optional], only required for cluster execution; requires [python](https://www.python.org/download/releases/3.5.1/) 3.5.1+
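As a quick sanity check, the presence and versions of these tools can be queried from the shell. This is a sketch, not part of the repository; `check_tool` is a hypothetical helper name:

```shell
# check_tool: print the first line of a tool's --version output,
# or "NOT FOUND" if the tool is not on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s: %s\n' "$1" "$("$1" --version 2>&1 | head -n 1)"
    else
        printf '%s: NOT FOUND\n' "$1"
    fi
}

# wget and snakemake are optional (see the list above):
for tool in singularity wget snakemake; do
    check_tool "$tool"
done
```

A `NOT FOUND` for wget or snakemake is fine unless the corresponding optional feature is needed.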
## How-To ...
The singularity container image can be downloaded manually using the
digital object identifier.
Note that in this case the downloaded version of the container must match the
version specified in the URL given in the wget command of
`scripts/download_container.sh`.
It is strongly recommended to download the container via
```bash
./scripts/download_container.sh
```
to ensure the correct version.
The downloaded container image file's md5 sum is checked automatically and
an error is raised in case of a mismatch.
Note that the image cannot be stored in this repository due to file-size
limitations.
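For a manually downloaded image, the same checksum comparison can be done by hand. A minimal sketch; `verify_md5` is a hypothetical helper, and the expected hash is deliberately not reproduced here — take it from `scripts/download_container.sh`:

```shell
# verify_md5: compare a file's md5 sum against an expected value.
# The expected hash is NOT given here -- take it from
# scripts/download_container.sh.
verify_md5() {
    actual=$(md5sum "$1" | cut -d ' ' -f 1)
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH for $1: expected $2, got $actual" >&2
        return 1
    fi
}

# usage: verify_md5 container.sif <expected-md5-from-download-script>
```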
Furthermore, the data-download script requires the neurobot user name and the
personal API key (see above) to be stored in the environment variables
`NEUROBOT_USR` and `NEUROBOT_API`, respectively, i.e.
```bash
export NEUROBOT_USR=[my-neurobot-username]
export NEUROBOT_API=[my-neurobot-api-key]
```
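A small fail-fast check can confirm that both variables are actually set before starting a long-running download. This is a sketch; `check_neurobot_env` is a hypothetical helper, not part of the repository:

```shell
# check_neurobot_env: report which required variables are missing;
# returns non-zero if either is unset or empty.
check_neurobot_env() {
    missing=0
    for var in NEUROBOT_USR NEUROBOT_API; do
        if [ -z "$(eval "echo \"\$$var\"")" ]; then
            echo "ERROR: $var is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}
```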
### ... execute workflow on a single machine
The workflow can be executed on a potent desktop machine although cluster
execution is recommended for speed (cf. below).
Given that singularity is installed and the `container.sif` file is present in
the repository root, simply invoke
```bash
singularity exec container.sif snakemake create_manuscript_v1_1
singularity exec container.sif snakemake impute_population_wide_msm_v1_1
```
The first command creates all files necessary to compile the cross-validated
analysis of imputation performance (`output/v1.1/manuscript.docx`);
the second one only computes the final imputations for the entire study
population (`output/v1.1/data/imputation/msm`).
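A simple way to confirm that a run completed is to check that the key outputs listed above exist. A sketch; `check_outputs` is a hypothetical helper:

```shell
# check_outputs: report whether each expected workflow output exists.
check_outputs() {
    for f in "$@"; do
        if [ -e "$f" ]; then
            echo "present: $f"
        else
            echo "missing: $f"
        fi
    done
}

check_outputs output/v1.1/manuscript.docx output/v1.1/data/imputation/msm
```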
Depending on the number of cores and available RAM,
the cross-validated model comparison may take several days (3+) to complete.
### ... execute workflow on cluster
Cluster execution requires a cluster-specific snakemake
[configuration](https://github.com/kkmann/center-6mo-gose-imputation/blob/master/cluster.json).
The `singularity_on_slurm_cluster` script assumes existence of a slurm cluster.
It is recommended to execute only the actual model fitting on the cluster and
to do all preprocessing on the login node to avoid unnecessary queueing time:
```bash
singularity exec container.sif snakemake generate_folds_v1_1
```
Then, simply modify the `cluster.json` accordingly, make sure that
snakemake is installed, and execute
```bash
./singularity_on_slurm_cluster create_manuscript_v1_1
./singularity_on_slurm_cluster impute_population_wide_msm_v1_1
```
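When adapting `cluster.json` to a local scheduler, note that snakemake's cluster-config format uses a `__default__` entry plus optional per-rule overrides. A hypothetical sketch, assuming the `__default__`/per-rule layout of snakemake 5.x; the rule name `fit_model`, the key names, and all resource values are placeholders, not taken from the repository:

```json
{
    "__default__": {
        "time": "24:00:00",
        "mem": "8G",
        "n": 1
    },
    "fit_model": {
        "time": "72:00:00",
        "mem": "32G",
        "n": 16
    }
}
```

The keys defined here are what the submission command template (e.g. an `sbatch` call) substitutes per job.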