sv-callers

Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases. sv-callers is a Snakemake-based workflow that combines several state-of-the-art tools for detecting SVs in whole genome sequencing (WGS) data. The workflow is easy to use and to deploy on any Linux-based machine. In particular, it supports automated software deployment, easy configuration and the addition of new analysis tools, and it scales from a single computer to different HPC clusters with minimal effort.

Dependencies

- Python
- Conda - package/environment management system
- Snakemake - workflow management system
- Xenon CLI - command-line interface to compute and storage resources
- jq - command-line JSON processor (optional)
- YAtiML - library for YAML type inference and schema validation
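Before running the workflow, it can help to confirm that the dependencies listed above are reachable. The following is a minimal sketch, not part of sv-callers: the function names `check_tools` and `check_python_module` are hypothetical, and it assumes the executables are named `conda`, `snakemake`, `xenon` and `jq`, and that YAtiML is importable as the Python module `yatiml`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of sv-callers): report which of the given
# executables are found on PATH and return the number of missing ones.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Hypothetical helper: succeed if the given Python module can be imported
# (assumes YAtiML is importable as 'yatiml').
check_python_module() {
  python -c "import $1" >/dev/null 2>&1
}
```

For example, `check_tools conda snakemake xenon jq` lists any missing executables (jq is optional), and `check_python_module yatiml || echo "YAtiML not found"` checks the Python library from within the activated environment.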

The workflow includes the following bioinformatics tools:

SV callers
- Manta
- DELLY
- LUMPY
- GRIDSS
Post-processing

The software dependencies can be found in the conda environment files: [1],[2],[3].

1. Clone this repo.

git clone https://github.com/GooglingTheCancerGenome/sv-callers.git
cd sv-callers

2. Install dependencies.

# download Miniconda3 installer
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
# install Conda (answer 'yes' to the prompts)
bash miniconda.sh
# update Conda
conda update -y conda
# install Mamba
conda install -n base -c conda-forge -y mamba
# create a new environment with dependencies & activate it
mamba env create -n wf -f environment.yaml
conda activate wf

3. Configure the workflow.

config files:
- analysis.yaml - analysis-specific settings (e.g., workflow mode, I/O files, SV callers, post-processing and compute resources)
- samples.csv - list of (paired) samples
input files:
- example data in workflow/data directory
- reference genome in .fasta (incl. index files)
- excluded regions in .bed (optional)
- WGS samples in .bam (incl. index files)
output files:
- (filtered) SVs per caller and merged calls in .vcf (incl. index files)
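The input files above are expected to come with their index files. A minimal pre-flight check is sketched below, assuming samtools-style index naming (`ref.fasta.fai` for the reference, and either `sample.bam.bai` or `sample.bai` for a BAM file). The function name is hypothetical, and it does not look for any other index files the individual callers may need.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of sv-callers).
# Usage: check_indexes REF.fasta SAMPLE1.bam [SAMPLE2.bam ...]
# Assumes samtools-style index names: REF.fasta.fai, SAMPLE.bam.bai or SAMPLE.bai.
check_indexes() {
  local ref=$1 bam status=0
  shift
  if [ ! -f "${ref}.fai" ]; then
    echo "missing FASTA index: ${ref}.fai" >&2
    status=1
  fi
  for bam in "$@"; do
    # accept either sample.bam.bai or sample.bai next to the BAM file
    if [ ! -f "${bam}.bai" ] && [ ! -f "${bam%.bam}.bai" ]; then
      echo "missing BAM index for: ${bam}" >&2
      status=1
    fi
  done
  return "$status"
}
```

The function returns a non-zero status if any index is missing, so it can guard a run, e.g. `check_indexes ref.fasta tumor.bam normal.bam && snakemake -np` (file names are placeholders).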

4. Execute the workflow.

cd workflow

Locally

# 'dry' run only checks I/O files
snakemake -np

# 'vanilla' run: if echo_run is set to 1 (default) in analysis.yaml,
# the workflow merely mimics the execution of the SV callers by writing
# (dummy) VCF files; set echo_run to 0 to perform actual SV calling
snakemake --use-conda --jobs
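Switching between the 'vanilla' run and actual SV calling only requires changing echo_run. A minimal sketch with sed is shown below, assuming the key appears in analysis.yaml on its own line as `echo_run: 0` or `echo_run: 1` (possibly indented). The function name is hypothetical; editing the file by hand works just as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of sv-callers): set echo_run to 0 or 1 in a
# YAML config file, assuming a line of the form 'echo_run: <0|1>'.
set_echo_run() {
  local file=$1 value=$2
  case "$value" in
    0|1) ;;
    *) echo "echo_run must be 0 or 1" >&2; return 1 ;;
  esac
  if ! grep -q '^[[:space:]]*echo_run:' "$file"; then
    echo "no echo_run key found in $file" >&2
    return 1
  fi
  # Keep the indentation and key (group \1), replace only the digit.
  # Note: sed back-references are single digits, so '\10' means '\1' + '0'.
  # Writing via a temp file avoids GNU/BSD differences in 'sed -i'.
  sed "s/^\([[:space:]]*echo_run:[[:space:]]*\)[01]/\1${value}/" "$file" > "${file}.tmp" &&
    mv "${file}.tmp" "$file"
}
```

For example, `set_echo_run analysis.yaml 0` before `snakemake --use-conda --jobs` switches from writing dummy VCF files to actual SV calling (adjust the path to wherever your analysis.yaml lives).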

Submit jobs to a Slurm or GridEngine cluster

SCH=slurm   # or gridengine
snakemake --use-conda --latency-wait 30 --jobs \
--cluster "xenon scheduler $SCH --location local:// submit --name smk.{rule} --inherit-env --cores-per-task {threads} --max-run-time 1 --max-memory {resources.mem_mb} --working-directory . --stderr stderr-%j.log --stdout stdout-%j.log" &>smk.log&

Note: a single sample or a tumor/normal pair generates 18 SV calling and post-processing jobs in total. See the workflow instances for single-sample (germline) and paired-sample (somatic) analysis.

To perform SV calling:

edit the (default) parameters in analysis.yaml:
- set echo_run to 0
- choose between the two workflow modes: single-sample (s) or paired-sample (p, default)
- select one or more callers using enable_callers (default: all)
use the xenon CLI to set:
- --max-run-time of the workflow jobs (in minutes)
- --temp-space (optional, in MB)
adjust the compute requirements per SV caller according to the system used:
- the number of threads
- the amount of memory (in MB)
- the amount of temporary disk space (tmpspace; path set in the TMPDIR environment variable), which only LUMPY and GRIDSS use for intermediate files
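The xenon settings above are passed through the --cluster string of the Slurm/GridEngine submission command. A minimal sketch that builds this string from a scheduler name, a maximum run time in minutes and an optional temporary space in MB is shown below. It reuses the xenon options from the submission example plus --temp-space; the function name is hypothetical and the values are placeholders to adjust for your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of sv-callers): print the value for
# snakemake's --cluster option. {rule}, {threads} and {resources.mem_mb}
# are left for Snakemake to fill in per job.
# Usage: xenon_cluster_cmd SCHEDULER MAX_RUN_TIME_MIN [TEMP_SPACE_MB]
xenon_cluster_cmd() {
  local sch=$1 runtime=$2 tmp_mb=${3:-}
  local cmd="xenon scheduler ${sch} --location local:// submit"
  cmd+=" --name smk.{rule} --inherit-env --cores-per-task {threads}"
  cmd+=" --max-run-time ${runtime} --max-memory {resources.mem_mb}"
  if [ -n "$tmp_mb" ]; then
    cmd+=" --temp-space ${tmp_mb}"  # optional, in MB
  fi
  cmd+=" --working-directory . --stderr stderr-%j.log --stdout stdout-%j.log"
  printf '%s\n' "$cmd"
}
```

For example, `snakemake --use-conda --latency-wait 30 --jobs --cluster "$(xenon_cluster_cmd slurm 60)"` submits each job with a one-hour limit; keeping the string in one function avoids editing the long command line by hand for each system.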

Query job accounting information

SCH=slurm   # or gridengine
xenon --json scheduler $SCH --location local:// list --identifier [jobID] | jq ...
