Commit 599605f5 authored by Kevin Kunzmann
parents cb431c22 87dfc01d
# Impute gene expression data for CENTER-TBI using PrediXcan
# Impute gene expression for CENTER-TBI with PrediXcan
The singularity container is available for download under https://doi.org/10.5281/zenodo.3376504.
Data currently needs to be accessed manually due to access restrictions; this workflow should work for essentially any
vcf.gz file with dosage (DS) information.
More information on PrediXcan can be found at https://github.com/hakyimlab/PrediXcan and in the publication:
The singularity container with most software dependencies is available at
https://doi.org/10.5281/zenodo.3376504.
Data currently needs to be accessed manually due to access restrictions.
This workflow is designed for `*.vcf.gz` files with dosage (DS) information.
More information on PrediXcan can be found at https://github.com/hakyimlab/PrediXcan and in:
> Gamazon ER†, Wheeler HE†, Shah KP†, Mozaffari SV, Aquino-Michaels K, Carroll RJ, Eyler AE, Denny JC,
Nicolae DL, Cox NJ, Im HK. (2015) A gene-based association method for mapping traits using reference transcriptome data.
Nat Genet. doi:10.1038/ng.3367.
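
Since the workflow expects dosage (DS) information in the input, it can help to check a file before running anything. The following is a minimal sketch, not part of the repository: `has_ds_field` is a hypothetical helper name, and it only tests whether the VCF header declares a `DS` FORMAT field, which does not guarantee that every record actually carries a dosage value.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): succeeds if the header
# of a gzip-compressed VCF declares a DS FORMAT field.
has_ds_field() {
    # Decompress to stdout and scan only the "##" meta-information lines;
    # awk stops at the first non-"##" line, so large files are not read fully.
    gzip -dc "$1" | awk '
        /^##FORMAT=<ID=DS,/ { found = 1 }
        !/^##/              { exit }
        END                 { exit !found }
    '
}
```

It could be used as `has_ds_field genotypes.vcf.gz && echo "DS declared"`, where `genotypes.vcf.gz` is a placeholder file name.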
We use snakemake to organize the workflow (also pre-installed in the container) and support cluster execution.
> Johannes Köster, Sven Rahmann, Snakemake—a scalable bioinformatics workflow engine, Bioinformatics,
Volume 28, Issue 19, 1 October 2012, Pages 2520–2522, https://doi.org/10.1093/bioinformatics/bts480
## Dependencies
1. Linux shell (`bash`), possibly via a virtual machine on Windows/Mac
2. `wget` (pre-installed or via distribution package manager)
3. `singularity` container software (tested on 3.3.0) https://sylabs.io/guides/3.3/user-guide
4. `git`
3. `singularity` container software (tested on 3.3.0, https://sylabs.io/guides/3.3/user-guide)
4. `git` (https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
### Optional
5. python 3.7+ and snakemake
6. slurm cluster
We use snakemake to organize the workflow (also pre-installed in the container) and support cluster execution.
Snakemake can be installed as a `pip` package for Python 3.7.
> Johannes Köster, Sven Rahmann, Snakemake—a scalable bioinformatics workflow engine, Bioinformatics,
Volume 28, Issue 19, 1 October 2012, Pages 2520–2522, https://doi.org/10.1093/bioinformatics/bts480
## Execution
Download and extract the contents of this repository (might be access restricted)
@@ -46,4 +48,14 @@ Optionally, if snakemake is installed, the workflow can be run in parallel via
snakemake --use-singularity -j 8 impute
where '8' can be replaced by the number of available cores.
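
As a small sketch of picking the core count automatically, assuming GNU coreutils' `nproc` is available on the machine (it is on most Linux distributions), the number of cores can be looked up and passed to `-j`. The variable name `cores` is only illustrative, and the command is printed rather than executed here.

```shell
#!/usr/bin/env bash
# nproc (GNU coreutils) reports the processing units available to this process.
cores="$(nproc)"
# Printed rather than executed; remove "echo" to actually launch the workflow.
echo snakemake --use-singularity -j "$cores" impute
```

On a shared machine it may be preferable to pass a smaller number than `nproc` reports.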
Cluster execution is enabled via the `scripts/slurm_snakemake.sh` script as
bash scripts/slurm_snakemake.sh impute
## Results
Intermediate files (PrediXcan dosage files and raw outputs) are stored in the `outputs/`
subfolder of the working directory.
The file `output/gene_expressions_combined.rds` combines imputed gene expression levels across
all available brain regions in a compressed .rds file (R data set).