Commit 36e4a0ef: bugfix

Project: impute-gene-expression
Authored Sep 02, 2019 by Kevin Kunzmann
Parent: ba722fac

Showing 2 changed files with 29 additions and 19 deletions:

README.md (+27 -17)
Snakefile (+2 -2)

README.md

# Impute gene expression for CENTER-TBI with PrediXcan

The singularity container with most software dependencies is available at
https://doi.org/10.5281/zenodo.3376504.
Data currently needs to be accessed manually due to access restrictions.
This workflow is designed for *.vcf.gz files with dosage (DS) information.

More information on PrediXcan can be found at https://github.com/hakyimlab/PrediXcan and in:

> Gamazon ER†, Wheeler HE†, Shah KP†, Mozaffari SV, Aquino-Michaels K, Carroll RJ, Eyler AE, Denny JC,
Nicolae DL, Cox NJ, Im HK. (2015) A gene-based association method for mapping traits using reference transcriptome data.
Nat Genet. doi:10.1038/ng.3367.

## Dependencies

1. linux shell (`bash`), possibly via virtual machine on Windows/Mac
2. `wget` (pre-installed or via distribution package manager)
3. `singularity` container software (tested on 3.3.0, https://sylabs.io/guides/3.3/user-guide)
4. `git` (https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
5. for data download: fimm GCP bucket access to `fimm-horizon-outgoing-data/CENTER_TBI_data_freeze_190829/Imputed_data`

### Optional

6. python 3.7+ and snakemake
7. slurm cluster

We use snakemake to organize the workflow (also pre-installed in the container) and to support cluster execution.
Snakemake is available as a `pip` package for python 3.7.

> Johannes Köster, Sven Rahmann, Snakemake—a scalable bioinformatics workflow engine, Bioinformatics,
Volume 28, Issue 19, 1 October 2012, Pages 2520–2522, https://doi.org/10.1093/bioinformatics/bts480

## Execution

Download and extract the contents of this repository (might be access restricted)

    git clone https://github.com/kkmann/impute-gene-expression

...
@@ -38,24 +39,33 @@ Download and extract the contents of this repository (might be access restricted

Download the container image

    bash scripts/download_container.sh

Obtain the imputed genomes (GCP bucket: fimm-horizon-outgoing-data/CENTER_TBI_data_freeze_190829/Imputed_data).
`gsutil` is pre-installed in the container image. To authenticate with your
GCP account, run the following and follow the interactive instructions

    singularity shell container.sif
    gcloud auth login
    snakemake download_imputed_genotypes
    exit

Execute the workflow inside the container on a single core (takes a while!)

    singularity exec container.sif snakemake impute

Optionally, if snakemake is installed, the workflow can be run in parallel via

    snakemake --use-singularity -j 8 impute

where '8' can be replaced by the number of available cores.
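
As a rough illustration of picking the core count automatically, the following sketch builds the same command line from the number of cores Python reports. The helper name `snakemake_command` is hypothetical and not part of this repository; only the flags shown in the README are used.

```python
import os


def snakemake_command(target="impute", cores=None):
    """Build the parallel snakemake call from the README as an argument list.

    Hypothetical helper for illustration; `cores` defaults to the number
    of cores reported by the operating system.
    """
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() may return None
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return ["snakemake", "--use-singularity", "-j", str(cores), target]


# Reproduces the README example for 8 cores.
print(" ".join(snakemake_command(cores=8)))
```

The resulting list could be passed to `subprocess.run` from the working directory of the repository; that step is omitted here because it requires snakemake and singularity to be installed.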
Cluster execution is enabled via the `scripts/slurm_snakemake.sh` script as

    bash scripts/slurm_snakemake.sh impute

## Results

Intermediate files (PrediXcan dosage files and raw outputs) are stored in the `outputs/`
subfolder of the working directory.
The file `output/gene_expressions_combined.rds` combines imputed gene expression levels across
all available brain regions in a compressed .rds file (R data set).

Snakefile

...
@@ -70,7 +70,7 @@ rule vcf_to_dosages:
 export prefix={wildcards.output_dir}/dosages
 mkdir -p $prefix
 echo "extracting and computing MAFs ..."
-bcftools +fill-tags {inputs.vcf_gz_file} > $prefix/chr{wildcards.i}.vcf
+bcftools +fill-tags {input.vcf_gz_file} > $prefix/chr{wildcards.i}.vcf
 echo 'querying dosages ...'
 bcftools query -e 'MAF[0]>{config[min_MAF]} | INFO>{config[min_INFO]} | TYPE!="snp" | N_ALT!=1' -f '%CHROM %ID %POS %REF %ALT %INFO/MAF [%DS ]\n' $prefix/chr{wildcards.i}.vcf > $prefix/chr{wildcards.i}.dosage.txt
 echo 'compressing ...'
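
The change in this hunk is the placeholder name: Snakemake exposes a rule's input files to shell commands as `input`, so `{inputs.vcf_gz_file}` refers to a name that does not exist. The sketch below mimics this with plain `str.format` as an analogy. Snakemake's own formatter and error message differ in detail, and the file path and output name are made up for illustration.

```python
from types import SimpleNamespace

# Stand-in for the object Snakemake provides as `input` inside a rule.
# The path is a hypothetical example, not taken from the repository.
rule_input = SimpleNamespace(vcf_gz_file="data/chr1.vcf.gz")

fixed = "bcftools +fill-tags {input.vcf_gz_file} > out.vcf"
broken = "bcftools +fill-tags {inputs.vcf_gz_file} > out.vcf"

# The corrected placeholder resolves against the `input` object.
print(fixed.format(input=rule_input))

# The old placeholder names an object that was never supplied.
try:
    broken.format(input=rule_input)
except KeyError as err:
    print("unknown placeholder:", err)
```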
...
@@ -94,7 +94,7 @@ rule generate_samples_file:
 """
 export prefix={wildcards.output_dir}/dosages
 mkdir -p $prefix
-bcftools query -l {inputs.vcf_gz_file} >> $prefix/samples_.txt
+bcftools query -l {input.vcf_gz_file} >> $prefix/samples_.txt
 # family ID = individual ID
 awk {params.format} < $prefix/samples_.txt > $prefix/samples.txt
 rm $prefix/samples_.txt
...
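
The `awk` step turns the sample list from `bcftools query -l` into the samples file, using each individual ID as its own family ID. The value of `params.format` is not shown in this diff, so the following is only a sketch of that idea. It assumes a two-column, space-separated layout, and the function name `to_samples_lines` is hypothetical.

```python
def to_samples_lines(sample_ids):
    """Return one 'FID IID' line per sample, with family ID = individual ID.

    Assumption: the samples file has two space-separated columns.
    Blank lines and surrounding whitespace are dropped.
    """
    cleaned = (sample_id.strip() for sample_id in sample_ids)
    return [f"{sample_id} {sample_id}" for sample_id in cleaned if sample_id]


# Example with made-up sample IDs, as they might appear one per line.
for line in to_samples_lines(["A1\n", "", "B2\n"]):
    print(line)
```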