For information on how to get data access, see https://www.center-tbi.eu/data.
### Software dependencies
The workflow assumes a Linux command line.
To facilitate reproducibility, a
[singularity image](https://zenodo.org/record/2600384) with all software
dependencies (R packages etc.) is provided ![DOI:10.5281/zenodo.2600384](https://zenodo.org/badge/DOI/10.5281/zenodo.2600384.svg).
The workflow itself is automated using
[snakemake](https://snakemake.readthedocs.io/en/stable/index.html).
To fully leverage the container and snakemake workflow, the following software
dependencies must be available:
* [singularity](https://www.sylabs.io/guides/2.6/user-guide/index.html) 2.6.0+
* [wget](https://www.gnu.org/software/wget/) [optional], only needed for the
automatic download of the container image file
* [snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) 5.2.1+
[optional], only required for cluster execution; requires [python](https://www.python.org/download/releases/3.5.1/) 3.5.1+
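To quickly check that the tools on the `PATH` meet these requirements,
something along the following lines can be used (a minimal sketch; the exact
version-flag output differs slightly between tools):

```bash
# Print the versions of the workflow dependencies found on the PATH.
python3 --version            # 3.5.1+, needed by snakemake
snakemake --version          # 5.2.1+, optional, cluster execution only
singularity --version        # 2.6.0+
wget --version | head -n 1   # optional, automatic container download only
```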
## How-To ...
The singularity container image can be downloaded manually using the
digital object identifier.
Note that in this case the downloaded version of the container must match the
version specified in the URL given in the wget command of
`scripts/download_container.sh`.
It is strongly recommended to download the container via
```bash
./scripts/download_container.sh
```
to ensure the correct version.
The downloaded container image file's md5 sum is checked automatically and
an error is thrown in case of a mismatch.
Note that the image cannot be stored in this repository due to file-size
limitations.
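For a fully manual download as mentioned above, the following sketch
illustrates the idea; the download URL and the expected checksum are
assumptions here and must be taken from `scripts/download_container.sh` so
that the pinned container version matches:

```bash
# Hypothetical manual download; the real URL and expected md5 sum are pinned
# in scripts/download_container.sh and must match the container version.
wget -O container.sif "https://zenodo.org/record/2600384/files/container.sif"
# Compare against the expected md5 sum (placeholder value shown).
echo "<expected-md5-sum>  container.sif" | md5sum --check -
```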
Furthermore, the data-download script requires the neurobot user name and the
personal API key (see above) to be stored in the environment variables
`NEUROBOT_USR` and `NEUROBOT_API`, respectively, i.e.
```bash
export NEUROBOT_USR=[my-neurobot-username]
export NEUROBOT_API=[my-neurobot-api-key]
```
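On a fresh shell, a complete data download could then look like this
(the credential values are placeholders; `download_data_v1_1` is the same
target used in the cluster section below):

```bash
# Placeholder credentials; use your own neurobot account details.
export NEUROBOT_USR=jdoe
export NEUROBOT_API=0123456789abcdef
# Fetch the study data.
./snakemake download_data_v1_1
```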
### ... execute workflow on a single machine
The workflow can be executed on a powerful desktop machine, although cluster
execution is recommended for speed (cf. below).
Given that singularity is installed and the `container.sif` file is present in
the repository root, simply invoke
```bash
singularity exec container.sif snakemake create_manuscript_v1_1
singularity exec container.sif snakemake impute_population_wide_msm_v1_1
```
All output is written to `output/`.
The first command creates all files necessary to compile the cross-validated
analysis of imputation performance (`output/v1.1/manuscript.docx`);
the second one only computes the final imputations for the entire study
population (`output/v1.1/data/imputation/msm`).
Depending on the number of cores and available RAM,
the cross-validated model comparison may take several days (3+) to complete.
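To keep the machine responsive during such a long run, the number of parallel
jobs can be capped via snakemake's standard `--cores` flag (a sketch, using
the same target as above):

```bash
# Restrict snakemake to 4 parallel cores inside the container.
singularity exec container.sif snakemake --cores 4 create_manuscript_v1_1
```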
### ... execute workflow on cluster
Cluster execution requires a cluster-specific snakemake
[configuration](https://github.com/kkmann/center-6mo-gose-imputation/blob/master/cluster.json).
The `singularity_on_slurm_cluster` script assumes the existence of a slurm cluster.
It is recommended to execute only the actual model fitting on the cluster and
to do all preprocessing on the login node to avoid unnecessary queueing time:
```bash
./snakemake download_data_v1_1
singularity exec container.sif snakemake generate_folds_v1_1
```
Then, simply modify the `cluster.json` accordingly, make sure that
snakemake is installed, and execute
```bash
./singularity_on_slurm_cluster create_manuscript_v1_1
./singularity_on_slurm_cluster impute_population_wide_msm_v1_1
```
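For orientation, a snakemake slurm submission of this kind typically boils
down to a call of the following shape; this is a sketch only, the real flags
(including how jobs are placed inside the container) are defined in
`singularity_on_slurm_cluster`, and the `cluster.json` resource keys shown
are assumptions:

```bash
# Hypothetical core of the wrapper script: submit each rule as a slurm job,
# drawing per-rule resources from cluster.json (key names are assumptions).
snakemake create_manuscript_v1_1 \
    --cluster-config cluster.json \
    --cluster "sbatch --time={cluster.time} --mem={cluster.mem} --cpus-per-task={cluster.n}" \
    --jobs 50
```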