For information on how to get data access, see https://www.center-tbi.eu/data.
### Software dependencies
The workflow assumes a Linux command line.
To facilitate reproducibility, a
[singularity image](https://zenodo.org/record/2600384) with all software
dependencies (R packages etc.) is provided ![DOI:10.5281/zenodo.2600384](https://zenodo.org/badge/DOI/10.5281/zenodo.2600384.svg).
The workflow itself is automated using
[snakemake](https://snakemake.readthedocs.io/en/stable/index.html).
To fully leverage the container and snakemake workflow, the following software
dependencies must be available:
* [singularity](https://www.sylabs.io/guides/2.6/user-guide/index.html) 2.6.0+
* [wget](https://www.gnu.org/software/wget/) [optional], only needed for the
automatic download of the container image file
* [snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) 5.2.1+
[optional], only required for cluster execution; requires [python](https://www.python.org/download/releases/3.5.1/) 3.5.1+
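To quickly check that the tools on the `PATH` meet these requirements,
something along the following lines can be used (a minimal sketch; the exact
version-flag output differs slightly between tools):

```bash
# Print the versions of the workflow dependencies found on the PATH.
python3 --version            # 3.5.1+, needed by snakemake
snakemake --version          # 5.2.1+, optional, cluster execution only
singularity --version        # 2.6.0+
wget --version | head -n 1   # optional, automatic container download only
```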
## How-To ...
The singularity container image can be downloaded manually using the
digital object identifier.
Note that in this case the downloaded version of the container must match the
version specified in the URL given in the wget command of
`scripts/download_container.sh`.
It is strongly recommended to download the container via
```bash
./scripts/download_container.sh
```
to ensure the correct version.
The downloaded container image file's md5 sum is checked automatically and
an error is thrown in case of a mismatch.
Note that the image cannot be stored in this repository due to file-size
limitations.
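For a fully manual download as mentioned above, the following sketch
illustrates the idea; the download URL and the expected checksum are
assumptions here and must be taken from `scripts/download_container.sh` so
that the pinned container version matches:

```bash
# Hypothetical manual download; the real URL and expected md5 sum are pinned
# in scripts/download_container.sh and must match the container version.
wget -O container.sif "https://zenodo.org/record/2600384/files/container.sif"
# Compare against the expected md5 sum (placeholder value shown).
echo "<expected-md5-sum>  container.sif" | md5sum --check -
```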
Furthermore, the data-download script requires the neurobot user name and the
personal API key (see above) to be stored in the environment variables
`NEUROBOT_USR` and `NEUROBOT_API`, respectively, i.e.
```bash
export NEUROBOT_USR=[my-neurobot-username]
export NEUROBOT_API=[my-neurobot-api-key]
```
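On a fresh shell, a complete data download could then look like this
(the credential values are placeholders; `download_data_v1_1` is the same
target used in the cluster section below):

```bash
# Placeholder credentials; use your own neurobot account details.
export NEUROBOT_USR=jdoe
export NEUROBOT_API=0123456789abcdef
# Fetch the study data.
./snakemake download_data_v1_1
```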
### ... execute workflow on a single machine
The workflow can be executed on a powerful desktop machine, although cluster
execution is recommended for speed (cf. below).
Given that singularity is installed and the `container.sif` file is present in
the repository root, simply invoke
```bash
singularity exec container.sif snakemake create_manuscript_v1_1
singularity exec container.sif snakemake impute_population_wide_msm_v1_1
```
All output is written to `output/`.
The first command creates all files necessary to compile the cross-validated
analysis of imputation performance (`output/v1.1/manuscript.docx`);
the second one only computes the final imputations for the entire study
population (`output/v1.1/data/imputation/msm`).
Depending on the number of cores and available RAM,
the cross-validated model comparison may take several days (3+) to complete.
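To keep the machine responsive during such a long run, the number of parallel
jobs can be capped via snakemake's standard `--cores` flag (a sketch, using
the same target as above):

```bash
# Restrict snakemake to 4 parallel cores inside the container.
singularity exec container.sif snakemake --cores 4 create_manuscript_v1_1
```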
### ... execute workflow on cluster
Cluster execution requires a cluster-specific snakemake
[configuration](https://github.com/kkmann/center-6mo-gose-imputation/blob/master/cluster.json).
The `singularity_on_slurm_cluster` script assumes the existence of a slurm cluster.
It is recommended to execute only the actual model fitting on the cluster and
to do all preprocessing on the login node to avoid unnecessary queueing time:
```bash
./snakemake download_data_v1_1
singularity exec container.sif snakemake generate_folds_v1_1
```
Then, simply modify the `cluster.json` accordingly, make sure that
snakemake is installed, and execute
```bash
./singularity_on_slurm_cluster create_manuscript_v1_1
./singularity_on_slurm_cluster impute_population_wide_msm_v1_1
```
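For orientation, a snakemake slurm submission of this kind typically boils
down to a call of the following shape; this is a sketch only, the real flags
(including how jobs are placed inside the container) are defined in
`singularity_on_slurm_cluster`, and the `cluster.json` resource keys shown
are assumptions:

```bash
# Hypothetical core of the wrapper script: submit each rule as a slurm job,
# drawing per-rule resources from cluster.json (key names are assumptions).
snakemake create_manuscript_v1_1 \
    --cluster-config cluster.json \
    --cluster "sbatch --time={cluster.time} --mem={cluster.mem} --cpus-per-task={cluster.n}" \
    --jobs 50
```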