Finishing touches to the workflow by Davi (#1)
* Add a task file

* [Suggestion] Add stubs to process (#2)

* adding stubs to all modules

* creating a bash script to generate mock data

* adding stub_params and renaming mock data

* changes following PR review

* fixing typo on stub_params

* adding last touches as requested on PR comments

* [Suggestion] README with DAG file (#3)

* adding stubs to all modules

* creating a bash script to generate mock data

* adding stub_params and renaming mock data

* changes following PR review

* fixing typo on stub_params

* adding last touches as requested on PR comments

* adding dag.dot and dag.png as resource to use later

* Updating README

* fixing typo on pipeline name

* removing dag.dot

* accepting modifications from pull request #3

* [Suggestion] Updating config files for Compute environments (#4)

* adding stubs to all modules

* creating a bash script to generate mock data

* adding stub_params and renaming mock data

* changes following PR review

* fixing typo on stub_params

* adding last touches as requested on PR comments

* adding dag.dot and dag.png as resource to use later

* Adding process configuration to nextflow.config profile

* creating a standard profile and removing directives from modules

* scratching profiles

* renoving uncicler from google life science profile

* removing configs from main and adding to folder

* adding aws, gcp and azure configurations

* adding aws access and secret key

* changing nextflow.config

* Changing azure profile name

* using includeConfig to create profiles

* removing dag.dot

* adding local and standard profiles with process settings

* removing local profile and fixing typo

* [Suggestion] README with DAG file (#3)

* adding stubs to all modules

* creating a bash script to generate mock data

* adding stub_params and renaming mock data

* changes following PR review

* fixing typo on stub_params

* adding last touches as requested on PR comments

* adding dag.dot and dag.png as resource to use later

* Updating README

* fixing typo on pipeline name

* removing dag.dot

* accepting modifications from pull request #3

* adding dag.dot and dag.png as resource to use later

* Revert commits

This reverts commit 14c4a96.

* Tweak readme

* Further tweaks

* Update the version

Co-authored-by: Mxrcon <[email protected]>
abhi18av and Mxrcon authored Apr 30, 2021
1 parent 262e3c5 commit 7470ec5
Showing 19 changed files with 272 additions and 66 deletions.
61 changes: 60 additions & 1 deletion README.md
@@ -1,4 +1,63 @@
# camila_sao_paolo
# camila_sao_paulo nextflow pipeline
A pipeline for genome assembly, genome annotation, and variant calling with quality evaluation, using .fastq files and a reference genome as input.

## Minimal requirements (for local execution)

* Nextflow VERSION > 20.11
* Java 8
* Docker

## Pipeline workflow

![dag file](./resources/dag.png)

This is the complete workflow of the pipeline. The tools are integrated to provide a quality evaluation of every process.

## Quick start

### Local execution
1. Install Nextflow

Please refer to the [Nextflow page on GitHub](https://github.com/nextflow-io/nextflow/) for more information.

2. Run it!

```
nextflow run https://github.com/bioinformatics-lab/camila_sao_paulo_nf.git --reads $READS_PATTERN --gbkFile $GBK_FILE --outdir $OUTDIR
```

`$READS_PATTERN` = STR, replace with the location of your reads. You can write it as `READ_{1,2}.fastq.gz` or as `READ_1.fastq.gz READ_2.fastq.gz`.

`$GBK_FILE` = STR, replace with the location of your reference gbk file.

`$OUTDIR` = STR, replace with the name of your desired output directory.
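
As a sketch of how the placeholders above might be filled in, the snippet below assigns each one to a shell variable and assembles the command. The paths `data/reads/` and `data/ref/ref01.gbk`, the `results` directory, and the `RUN` switch are assumptions made for illustration, not part of the pipeline. The read pattern is kept in quotes so that the shell passes it to Nextflow unexpanded.

```shell
set -eu

# Hypothetical locations; replace them with your own paths.
READS_PATTERN='data/reads/*_R{1,2}.fastq.gz'
GBK_FILE='data/ref/ref01.gbk'
OUTDIR='results'

# Hypothetical switch: change to 1 to actually launch the pipeline.
RUN=0

# Store the command in the positional parameters so each argument stays one word.
set -- nextflow run https://github.com/bioinformatics-lab/camila_sao_paulo_nf.git \
  --reads "$READS_PATTERN" \
  --gbkFile "$GBK_FILE" \
  --outdir "$OUTDIR"

# Show the command for review before running it.
echo "Would run: $*"
if [ "$RUN" = "1" ]; then
  "$@"
fi
```

With `RUN=0` the snippet only prints the assembled command, which makes it easy to check the quoting before anything is executed.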

## Configuration profiles

You can use different profiles for this pipeline, depending on the computing environment at your disposal. The available profiles are:

* aws

* gls

* azureBatch

* awsBatch

`Note: Update conf/profile with your own credentials`
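
A profile is normally selected with Nextflow's `-profile` option. The sketch below checks a chosen name against the list above and then assembles the command. It assumes that the profile names defined in `nextflow.config` match the names in this README and that a `params.yml` file is present; the `PROFILE` and `RUN` variables are hypothetical helpers.

```shell
set -eu

# Pick one of the profiles listed in this README (assumed to match nextflow.config).
PROFILE='gls'

# Hypothetical switch: change to 1 to actually launch the pipeline.
RUN=0

# Reject names that are not in the README list.
case "$PROFILE" in
  aws|gls|azureBatch|awsBatch) ;;
  *) echo "Unknown profile: $PROFILE" >&2; exit 1 ;;
esac

set -- nextflow run main.nf -profile "$PROFILE" -params-file params.yml

echo "Would run: $*"
if [ "$RUN" = "1" ]; then
  "$@"
fi
```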

## Tower execution
This pipeline can be launched on `Tower`. Please refer to the [Tower launch documentation](https://help.tower.nf/docs/launch/overview/) for a step-by-step tutorial.

When launching from `Tower`, please update and use the `params.yml` file contents.

## Mock execution using stub-run
This project supports the `-stub-run` feature, which can be used for testing purposes. It can be used on `Tower` through the Advanced settings at launch. You can also test it locally, using the following command:

```
bash data/mock_data/generate_mock_data.sh
nextflow run main.nf \
-params-file stub_params.yaml \
-stub-run
```
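
The mock data script added in this commit creates twelve empty files (three read pairs and three `.gbk`/`.fasta` reference pairs) in the directory it is run from. The sketch below reproduces the same file names with a loop and writes them to a temporary directory instead. The `MOCK_DIR` variable and the use of `mktemp` are assumptions for illustration; `stub_params.yaml` would presumably need to point at whichever directory is used.

```shell
set -eu

# Hypothetical scratch location; the shipped script writes to the current directory.
MOCK_DIR="$(mktemp -d)"

for n in 1 2 3; do
  # Empty paired-end read files, e.g. 001_R1.fastq.gz and 001_R2.fastq.gz
  touch "$MOCK_DIR/00${n}_R1.fastq.gz" "$MOCK_DIR/00${n}_R2.fastq.gz"
  # Empty reference files, e.g. ref01.gbk and ref01.fasta
  touch "$MOCK_DIR/ref0${n}.gbk" "$MOCK_DIR/ref0${n}.fasta"
done

echo "Mock data written to $MOCK_DIR"
```

Because the files are empty, they are only suitable for `-stub-run`, where each process runs its `stub:` block instead of the real tool.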
12 changes: 12 additions & 0 deletions Taskfile.yml
@@ -0,0 +1,12 @@
# https://taskfile.dev/

version: '3'

tasks:
  default:
    cmds:
      - NXF_VER=21.04.0-EDGE nextflow run main.nf -params-file params/local.yml -resume -with-tower

  with_stubs:
    cmds:
      - NXF_VER=21.04.0-EDGE nextflow run main.nf -params-file params/local.yml -resume -stub -with-tower
13 changes: 13 additions & 0 deletions conf/aws.config
@@ -0,0 +1,13 @@
workDir = 's3://YOUR WORK DIR' // <- replace with your own bucket!

process {
    errorStrategy = 'retry'
    executor = 'awsbatch'
}

aws {
    accessKey = '<YOUR S3 ACCESS KEY>'
    secretKey = '<YOUR S3 SECRET KEY>'
    region = 'YOUR REGION'
    client.uploadMaxThreads = 4
}
13 changes: 13 additions & 0 deletions conf/azure.config
@@ -0,0 +1,13 @@
process {executor = 'azurebatch'}
azure {
    storage {
        accountName = "<YOUR STORAGE ACCOUNT NAME>"
        accountKey = "<YOUR STORAGE ACCOUNT KEY>"
    }
    batch {
        location = '<YOUR LOCATION>'
        accountName = '<YOUR BATCH ACCOUNT NAME>'
        accountKey = '<YOUR BATCH ACCOUNT KEY>'
        autoPoolMode = true
    }
}
6 changes: 6 additions & 0 deletions conf/gcp.config
@@ -0,0 +1,6 @@
process.executor = 'google-lifesciences'
workDir = 'YOUR GOOGLE WORK DIRECTORY' // <- Will be created
google.region = 'YOUR GOOGLE REGION NAME'
google.project = 'YOUR GOOGLE PROJECT NAME'
process.errorStrategy = 'retry' // process directives belong in the process scope
process.maxRetries = 2
65 changes: 65 additions & 0 deletions conf/standard.config
@@ -0,0 +1,65 @@
process {

withName:
FASTQC {
container 'quay.io/biocontainers/fastqc:0.11.9--0'
cpus 4
memory "8 GB"
}

withName:
MULTIQC {
container 'quay.io/biocontainers/multiqc:1.9--pyh9f0ad1d_0'
cpus 4
memory "8 GB"
}

withName:
PROKKA {
container 'quay.io/biocontainers/prokka:1.14.6--pl526_0'
cpus 8
memory "15 GB"
}

withName:
QUAST {
container 'quay.io/biocontainers/quast:5.0.2--py37pl526hb5aa323_2'
cpus 8
memory "15 GB"
}

withName:
SNIPPY {
container 'quay.io/biocontainers/snippy:4.6.0--0'
cpus 4
memory "8 GB"
}

withName:
SPADES {
container 'quay.io/biocontainers/spades:3.14.0--h2d02072_0'
cpus 8
memory "15 GB"
}

withName:
TRIMMOMATIC {
container 'quay.io/biocontainers/trimmomatic:0.35--6'
cpus 4
memory "8 GB"
}

withName:
UNICYCLER {
container 'quay.io/biocontainers/unicycler:0.4.8--py38h8162308_3'
cpus 8
memory "15 GB"
}

withName:
UTILS_FILTER_CONTIGS {
container 'quay.io/biocontainers/perl-bioperl:1.7.2--pl526_11'
cpus 4
memory "8 GB"
}

} // closes the enclosing process scope
13 changes: 13 additions & 0 deletions data/mock_data/generate_mock_data.sh
@@ -0,0 +1,13 @@
set -uex
touch 001_R1.fastq.gz
touch 001_R2.fastq.gz
touch 002_R1.fastq.gz
touch 002_R2.fastq.gz
touch 003_R1.fastq.gz
touch 003_R2.fastq.gz
touch ref01.gbk
touch ref01.fasta
touch ref02.gbk
touch ref02.fasta
touch ref03.gbk
touch ref03.fasta
9 changes: 6 additions & 3 deletions modules/fastqc/fastqc.nf
@@ -7,9 +7,6 @@ params.shouldPublish = true
process FASTQC {
tag "${genomeName}"
publishDir params.resultsDir, mode: params.saveMode, enabled: params.shouldPublish
container 'quay.io/biocontainers/fastqc:0.11.9--0'
cpus 4
memory "8 GB"

input:
tuple val(genomeName), path(genomeReads)
@@ -24,6 +21,12 @@ process FASTQC {
fastqc *fastq*
"""

stub:
"""
touch ${genomeName}.html
touch ${genomeName}.zip
"""
}


9 changes: 6 additions & 3 deletions modules/multiqc/multiqc.nf
@@ -8,9 +8,6 @@ params.shouldPublish = true

process MULTIQC {
publishDir params.resultsDir, mode: params.saveMode, enabled: params.shouldPublish
container 'quay.io/biocontainers/multiqc:1.9--pyh9f0ad1d_0'
cpus 4
memory "8 GB"

input:
path("*")
@@ -26,6 +23,12 @@ process MULTIQC {
multiqc .
"""

stub:
"""
mkdir multiqc_data
touch multiqc_report.html
"""
}


11 changes: 8 additions & 3 deletions modules/prokka/prokka.nf
@@ -7,9 +7,6 @@ params.shouldPublish = true
process PROKKA {
tag "${genomeName}"
publishDir params.resultsDir, mode: params.saveMode, enabled: params.shouldPublish
container 'quay.io/biocontainers/prokka:1.14.6--pl526_0'
cpus 8
memory "15 GB"

input:
tuple val(genomeName), path(bestContig)
@@ -24,6 +21,14 @@ process PROKKA {
prokka --outdir ${genomeName} --prefix $genomeName --cpus ${task.cpus} --proteins {reference} ${bestContig}
"""

stub:
"""
echo "prokka --outdir ${genomeName} --prefix $genomeName --cpus ${task.cpus} --proteins {reference} ${bestContig}"
mkdir ${genomeName}
"""
}


11 changes: 7 additions & 4 deletions modules/quast/quast.nf
@@ -8,10 +8,6 @@ params.shouldPublish = true

process QUAST {
publishDir params.resultsDir, mode: params.saveMode, enabled: params.shouldPublish
container 'quay.io/biocontainers/quast:5.0.2--py37pl526hb5aa323_2'
cpus 8
memory "15 GB"


input:
path(scaffoldFiles)
@@ -28,6 +24,13 @@ process QUAST {
"""

stub:
"""
echo "quast -r ${reference} -t ${task.cpus} ${scaffoldFiles}"
mkdir quast_results
"""
}

workflow test {
9 changes: 8 additions & 1 deletion modules/snippy/snippy.nf
@@ -7,7 +7,6 @@ params.shouldPublish = true
process SNIPPY {
tag "${genomeName}"
publishDir params.resultsDir, mode: params.saveMode, enabled: params.shouldPublish
container 'quay.io/biocontainers/snippy:4.6.0--0'

input:
tuple val(genomeName), path(genomeReads)
@@ -25,4 +24,12 @@ process SNIPPY {
"""

stub:
ram = "${task.memory}".split(" ")[0]
"""
echo "snippy --cpus ${task.cpus} --ram ${ram} --outdir $genomeName --ref $refGbk --R1 ${genomeReads[0]} --R2 ${genomeReads[1]}"
mkdir ${genomeName}
"""
}
14 changes: 10 additions & 4 deletions modules/spades/spades.nf
@@ -7,10 +7,7 @@ params.saveMode = 'copy'

process SPADES {
tag "${genomeName}"
publishDir params.resultsDir, mode: params.saveMode
container 'quay.io/biocontainers/spades:3.14.0--h2d02072_0'
cpus 8
memory "15 GB"
publishDir params.resultsDir, mode: params.saveMode

input:
tuple val(genomeName), path(genomeReads)
@@ -25,6 +22,15 @@ process SPADES {
spades.py -k 21,33,55,77 --careful --only-assembler --pe1-1 ${genomeReads[0]} --pe1-2 ${genomeReads[1]} -o ${genomeName} -t ${task.cpus}
cp ${genomeName}/scaffolds.fasta ${genomeName}_scaffolds.fasta
"""

stub:
"""
echo "spades.py -k 21,33,55,77 --careful --only-assembler --pe1-1 ${genomeReads[0]} --pe1-2 ${genomeReads[1]} -o ${genomeName} -t ${task.cpus}"
echo "cp ${genomeName}/scaffolds.fasta ${genomeName}_scaffolds.fasta"
touch ${genomeName}_scaffolds.fasta
"""
}


26 changes: 20 additions & 6 deletions modules/trimmomatic/trimmomatic.nf
@@ -9,9 +9,6 @@ params.shouldPublish = true
process TRIMMOMATIC {
tag "${genomeName}"
publishDir params.resultsDir, mode: params.saveMode, enabled: params.shouldPublish
container 'quay.io/biocontainers/trimmomatic:0.35--6'
cpus 4
memory "8 GB"

input:
tuple val(genomeName), path(genomeReads)
@@ -44,13 +41,30 @@ process TRIMMOMATIC {
"""

stub:
fq_1_paired = genomeName + '_R1.p.fastq.gz'
fq_1_unpaired = genomeName + '_R1.s.fastq.gz'
fq_2_paired = genomeName + '_R2.p.fastq.gz'
fq_2_unpaired = genomeName + '_R2.s.fastq.gz'

def adapter_file = "/usr/local/share/trimmomatic-0.35-6/adapters/NexteraPE-PE.fa"

"""
echo "trimmomatic \
PE \
-threads ${task.cpus} \
-phred33 \
${genomeReads[0]} \
${genomeReads[1]} \
$fq_1_paired \
$fq_1_unpaired \
$fq_2_paired \
$fq_2_unpaired \
ILLUMINACLIP:${adapter_file}:2:40:15 \
LEADING:3 TRAILING:3 SLIDINGWINDOW:3:28 HEADCROP:20 MINLEN:40"
touch ${genomeName}_R1.p.fastq.gz
touch ${genomeName}_R2.p.fastq.gz
touch ${genomeName}_R1.s.fastq.gz
touch ${genomeName}_R2.s.fastq.gz
"""
}
