Finishing touches to the workflow by Davi (#1)

* Add a task file
* [Suggestion] Add stubs to process (#2)
* adding stubs to all modules
* creating a bash script to generate mock data
* adding stub_params and renaming mock data
* changes following PR review
* fixing typo on stub_params
* adding last touches as requested on PR comments
* [Suggestion] README with DAG file (#3)
* adding stubs to all modules
* creating a bash script to generate mock data
* adding stub_params and renaming mock data
* changes following PR review
* fixing typo on stub_params
* adding last touches as requested on PR comments
* adding dag.dot and dag.png as resource to use later
* Updating README
* fixing typo on pipeline name
* removing dag.dot
* accepting modifications from pull request #3
* [Suggestion] Updating config files for Compute environments (#4)
* adding stubs to all modules
* creating a bash script to generate mock data
* adding stub_params and renaming mock data
* changes following PR review
* fixing typo on stub_params
* adding last touches as requested on PR comments
* adding dag.dot and dag.png as resource to use later
* Adding process configuration to nextflow.config profile
* creating a standard profile and removing directives from modules
* scratching profiles
* renoving uncicler from google life science profile
* removing configs from main and adding to folder
* adding aws, gcp and azure configurations
* adding aws access and secret key
* changing nextflow.config
* Changing azure profile name
* using includeConfig to create profiles
* removing dag.dot
* adding local and standard profiles with process settings
* removing local profile and fixing typo
* [Suggestion] README with DAG file (#3)
* adding stubs to all modules
* creating a bash script to generate mock data
* adding stub_params and renaming mock data
* changes following PR review
* fixing typo on stub_params
* adding last touches as requested on PR comments
* adding dag.dot and dag.png as resource to use later
* Updating README
* fixing typo on pipeline name
* removing dag.dot
* accepting modifications from pull request #3
* adding dag.dot and dag.png as resource to use later
* Revert commits
  This reverts commit 14c4a96.
* Tweak readme
* Further tweaks
* Update the version

Co-authored-by: Mxrcon <[email protected]>
emilyncosta · Apr 30, 2021 · 7470ec5 · 7470ec5
1 parent 262e3c5
commit 7470ec5
Showing 19 changed files with 272 additions and 66 deletions.
diff --git a/README.md b/README.md
@@ -1,4 +1,63 @@
-# camila_sao_paolo
+# camila_sao_paulo nextflow pipeline
+A pipeline for genome assembly, genome annotation, and variant calling with quality evaluation, using .fastq files and a reference genome as input.
+
+## Minimal requirements (for local execution)
+
+* Nextflow VERSION > 20.11
+* Java 8
+* Docker
+
+## Pipeline workflow
+
+![dag file](./resources/dag.png)
+
+This is the complete workflow of the pipeline. The tools are integrated to provide a quality evaluation of every process.
+
+## Quick start
+
+### Local execution
+1. Install Nextflow
+
+	Please refer to the [Nextflow page on GitHub](https://github.com/nextflow-io/nextflow/) for more info.
+
+2. Run it!
+
+```
+	nextflow run https://github.com/bioinformatics-lab/camila_sao_paulo_nf.git --reads $READS_PATTERN --gbkFile $GBK_FILE --outdir $OUTDIR
+
+```
+
+$READS_PATTERN = STR, replace with the location of your reads. You can write it as READ_{1,2}.fastq.gz or as READ_1.fastq.gz READ_2.fastq.gz
+
+$GBK_FILE = STR, replace with the location of your reference gbk file.
+
+$OUTDIR = STR, replace with the name of your desired output directory.
+
+## Configuration profiles
+
+You can use different profiles for this pipeline, depending on the computing environment at your disposal. The available profiles are:
+
+* aws 
+
+* gls
+
+* azureBatch
+
+* awsBatch
+
+`Note: Update conf/profile with your own credentials`
+
+## Tower execution
+This pipeline can be launched on `Tower`. Please refer to the [Tower launch documentation](https://help.tower.nf/docs/launch/overview/) for a step-by-step execution tutorial.
 
 When launching from `Tower`, please update and use the `params.yml` file contents.
 
+## Mock execution using stub-run
+This project supports the `-stub-run` feature, which can be used for testing purposes. It can be used on `Tower` through the Advanced settings at launch. You can also test it locally with the following commands:
+
+```
+bash data/mock_data/generate_mock_data.sh
+nextflow run main.nf \
+		 -params-file stub_params.yaml \
+		 -stub-run
+``` 
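A note on the README's local run command: when a reads pattern such as `READ_{1,2}.fastq.gz` is typed unquoted, a shell that performs brace or glob expansion (bash, for example) may expand it before Nextflow receives it. The sketch below is a dry run that only prints the command. The paths under `data/` and the `results` directory are hypothetical placeholders, not values taken from this repository.

```shell
#!/bin/sh
# Hypothetical input locations; replace them with your own.
READS_PATTERN='data/reads/sample_{1,2}.fastq.gz'
GBK_FILE='data/reference/ref.gbk'
OUTDIR='results'

# Single quotes keep the pattern literal, so Nextflow (not the shell) resolves it.
run_cmd="nextflow run https://github.com/bioinformatics-lab/camila_sao_paulo_nf.git --reads '$READS_PATTERN' --gbkFile '$GBK_FILE' --outdir '$OUTDIR'"

# Dry run: print the command instead of executing it.
echo "$run_cmd"
```

Copying the printed line into a terminal launches the pipeline with the pattern passed through unchanged.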
diff --git a/Taskfile.yml b/Taskfile.yml
@@ -0,0 +1,12 @@
+# https://taskfile.dev/
+
+version: '3'
+
+tasks:
+  default:
+    cmds:
+      - NXF_VER=21.04.0-EDGE nextflow run main.nf -params-file params/local.yml -resume -with-tower
+
+  with_stubs:
+    cmds:
+      - NXF_VER=21.04.0-EDGE nextflow run main.nf -params-file params/local.yml -resume -stub -with-tower
diff --git a/conf/aws.config b/conf/aws.config
@@ -0,0 +1,13 @@
+workDir = 's3://YOUR WORK DIR' // <- replace with your own bucket!
+
+process {
+        errorStrategy = 'retry'
+        executor = 'awsbatch'
+        }
+
+aws {
+        accessKey = '<YOUR S3 ACCESS KEY>'
+        secretKey = '<YOUR S3 SECRET KEY>'
+        region = 'YOUR REGION'
+        client.uploadMaxThreads = 4
+        }
diff --git a/conf/azure.config b/conf/azure.config
@@ -0,0 +1,13 @@
+    process {executor = 'azurebatch'}
+    azure {
+            storage {
+            accountName = "<YOUR STORAGE ACCOUNT NAME>"
+            accountKey = "<YOUR STORAGE ACCOUNT KEY>"
+                    }
+            batch {
+                location = '<YOUR LOCATION>'
+                accountName = '<YOUR BATCH ACCOUNT NAME>'
+                accountKey = '<YOUR BATCH ACCOUNT KEY>'
+                autoPoolMode = true
+                    }
+        }           
diff --git a/conf/gcp.config b/conf/gcp.config
@@ -0,0 +1,6 @@
+process.executor = 'google-lifesciences'
+workDir = 'YOUR GOOGLE WORK DIRECTORY' // <- Will be created
+google.region  = 'YOUR GOOGLE REGION NAME'
+google.project = 'YOUR GOOGLE PROJECT NAME'
+process.errorStrategy = 'retry' // process directives belong to the process scope
+process.maxRetries = 2
diff --git a/conf/standard.config b/conf/standard.config
@@ -0,0 +1,65 @@
+process {
+
+           withName:
+           FASTQC {
+                container 'quay.io/biocontainers/fastqc:0.11.9--0'
+                cpus 4
+                memory "8 GB"
+           }
+
+           withName:
+           MULTIQC {
+                container 'quay.io/biocontainers/multiqc:1.9--pyh9f0ad1d_0'
+                cpus 4
+                memory "8 GB"
+           }
+
+           withName:
+           PROKKA {
+                container 'quay.io/biocontainers/prokka:1.14.6--pl526_0'
+                cpus 8
+                memory "15 GB"
+           }
+
+           withName:
+           QUAST {
+                container 'quay.io/biocontainers/quast:5.0.2--py37pl526hb5aa323_2'
+                cpus 8
+                memory "15 GB"
+           }
+
+           withName:
+           SNIPPY {
+                container 'quay.io/biocontainers/snippy:4.6.0--0'
+                cpus 4
+                memory "8 GB"
+           }
+
+           withName:
+           SPADES {
+                container 'quay.io/biocontainers/spades:3.14.0--h2d02072_0'
+                cpus 8
+                memory "15 GB"
+           }
+
+           withName:
+           TRIMMOMATIC {
+                container 'quay.io/biocontainers/trimmomatic:0.35--6'
+                cpus 4
+                memory "8 GB"
+           }
+
+           withName:
+           UNICYCLER {
+                container 'quay.io/biocontainers/unicycler:0.4.8--py38h8162308_3'
+                cpus 8
+                memory "15 GB"
+           } 
+
+           withName:
+           UTILS_FILTER_CONTIGS {
+                container 'quay.io/biocontainers/perl-bioperl:1.7.2--pl526_11'
+                cpus 4
+                memory "8 GB"
+           }
+}
diff --git a/data/mock_data/generate_mock_data.sh b/data/mock_data/generate_mock_data.sh
@@ -0,0 +1,13 @@
+set -uex
+touch 001_R1.fastq.gz
+touch 001_R2.fastq.gz
+touch 002_R1.fastq.gz
+touch 002_R2.fastq.gz
+touch 003_R1.fastq.gz
+touch 003_R2.fastq.gz
+touch ref01.gbk
+touch ref01.fasta
+touch ref02.gbk
+touch ref02.fasta
+touch ref03.gbk
+touch ref03.fasta
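The script above creates its placeholder files in whatever directory it is run from. A loop-based variant that writes into an explicit target directory could look like the sketch below. The function name `generate_mock_data`, the `MOCK_DIR` variable, and the `mock_data` default are hypothetical and not part of this commit.

```shell
#!/bin/sh
set -eu

# generate_mock_data DIR: create empty placeholder inputs inside DIR.
generate_mock_data() {
    out_dir="$1"
    mkdir -p "$out_dir"

    # Paired-end read placeholders for three samples.
    for sample in 001 002 003; do
        for mate in R1 R2; do
            touch "$out_dir/${sample}_${mate}.fastq.gz"
        done
    done

    # Reference placeholders in GenBank and FASTA form.
    for ref in ref01 ref02 ref03; do
        touch "$out_dir/$ref.gbk" "$out_dir/$ref.fasta"
    done
}

# Hypothetical default target; override with MOCK_DIR=/some/path.
generate_mock_data "${MOCK_DIR:-mock_data}"
```

The file names match the ones touched by the original script, so stub parameters that point at those names would only need the directory adjusted.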
diff --git a/modules/fastqc/fastqc.nf b/modules/fastqc/fastqc.nf
@@ -7,9 +7,6 @@ params.shouldPublish = true
 process FASTQC {
     tag "${genomeName}"
     publishDir params.resultsDir, mode: params.saveMode, enabled: params.shouldPublish
-    container 'quay.io/biocontainers/fastqc:0.11.9--0'
-    cpus 4
-    memory "8 GB"
 
     input:
     tuple val(genomeName), path(genomeReads)
@@ -24,6 +21,12 @@ process FASTQC {
     fastqc *fastq*
     """
 
+    stub:
+    """
+    touch ${genomeName}.html
+
+    touch ${genomeName}.zip
+    """
 }
 
 

diff --git a/modules/multiqc/multiqc.nf b/modules/multiqc/multiqc.nf
@@ -8,9 +8,6 @@ params.shouldPublish = true
 
 process MULTIQC {
     publishDir params.resultsDir, mode: params.saveMode, enabled: params.shouldPublish
-    container 'quay.io/biocontainers/multiqc:1.9--pyh9f0ad1d_0'
-    cpus 4
-    memory "8 GB"
 
     input:
     path("*")
@@ -26,6 +23,12 @@ process MULTIQC {
     multiqc .
     """
 
+    stub:
+    """
+    mkdir multiqc_data
+
+    touch multiqc_report.html
+    """
 }
 
 

diff --git a/modules/prokka/prokka.nf b/modules/prokka/prokka.nf
@@ -7,9 +7,6 @@ params.shouldPublish = true
 process PROKKA {
     tag "${genomeName}"
     publishDir params.resultsDir, mode: params.saveMode, enabled: params.shouldPublish
-    container 'quay.io/biocontainers/prokka:1.14.6--pl526_0'
-    cpus 8
-    memory "15 GB"
 
     input:
     tuple val(genomeName),  path(bestContig)
@@ -24,6 +21,14 @@ process PROKKA {
     prokka --outdir ${genomeName} --prefix $genomeName --cpus ${task.cpus} --proteins ${reference} ${bestContig}
     """
 
+    stub:
+    """
+    echo "prokka --outdir ${genomeName} --prefix $genomeName --cpus ${task.cpus} --proteins ${reference} ${bestContig}"
+
+
+    mkdir ${genomeName}
+    
+    """
 }
 
 

diff --git a/modules/quast/quast.nf b/modules/quast/quast.nf
@@ -8,10 +8,6 @@ params.shouldPublish = true
 
 process QUAST {
     publishDir params.resultsDir, mode: params.saveMode, enabled: params.shouldPublish
-    container 'quay.io/biocontainers/quast:5.0.2--py37pl526hb5aa323_2'
-    cpus 8
-    memory "15 GB"
-
 
     input:
     path(scaffoldFiles)
@@ -28,6 +24,13 @@ process QUAST {
 
     """
 
+    stub:
+    """
+    echo "quast -r ${reference} -t ${task.cpus} ${scaffoldFiles}"
+
+    mkdir quast_results
+
+    """
 }
 
 workflow test {

diff --git a/modules/snippy/snippy.nf b/modules/snippy/snippy.nf
@@ -7,7 +7,6 @@ params.shouldPublish = true
 process SNIPPY {
     tag "${genomeName}"
     publishDir params.resultsDir, mode: params.saveMode, enabled: params.shouldPublish
-    container 'quay.io/biocontainers/snippy:4.6.0--0'
 
     input:
     tuple val(genomeName),  path(genomeReads)
@@ -25,4 +24,12 @@ process SNIPPY {
 
     """
 
+    stub:
+    ram = "${task.memory}".split(" ")[0]
+    """
+    echo "snippy --cpus ${task.cpus} --ram ${ram} --outdir $genomeName --ref $refGbk --R1 ${genomeReads[0]} --R2 ${genomeReads[1]}"
+
+    mkdir ${genomeName}
+
+    """
 }
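The SNIPPY stub derives its `--ram` value by taking the first space-separated token of `task.memory`. The same string step can be sketched in shell. This assumes the memory setting renders as `8 GB`, which is an assumption about its string form rather than something shown in this diff.

```shell
#!/bin/sh
# Assumed string form of the memory setting, e.g. from `memory "8 GB"`.
task_memory="8 GB"

# Remove everything from the first space onward, keeping only the number.
ram="${task_memory%% *}"

echo "snippy --ram $ram"
```

This approach discards the unit, so a setting such as `512 MB` would yield `512`. It is only safe while every memory value is expressed in gigabytes.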
diff --git a/modules/spades/spades.nf b/modules/spades/spades.nf
@@ -7,10 +7,7 @@ params.saveMode = 'copy'
 
 process SPADES {
     tag "${genomeName}"
-    publishDir params.resultsDir, mode: params.saveMode
-    container 'quay.io/biocontainers/spades:3.14.0--h2d02072_0'
-    cpus 8
-    memory "15 GB"
+    publishDir params.resultsDir, mode: params.saveMode
 
     input:
     tuple val(genomeName), path(genomeReads)
@@ -25,6 +22,15 @@ process SPADES {
     spades.py -k 21,33,55,77 --careful --only-assembler --pe1-1 ${genomeReads[0]} --pe1-2 ${genomeReads[1]} -o ${genomeName} -t ${task.cpus}
     cp ${genomeName}/scaffolds.fasta ${genomeName}_scaffolds.fasta 
     """
+
+    stub:
+    """
+    echo "spades.py -k 21,33,55,77 --careful --only-assembler --pe1-1 ${genomeReads[0]} --pe1-2 ${genomeReads[1]} -o ${genomeName} -t ${task.cpus}"
+    echo "cp ${genomeName}/scaffolds.fasta ${genomeName}_scaffolds.fasta"
+
+    touch ${genomeName}_scaffolds.fasta
+
+    """
 }
 
 

diff --git a/modules/trimmomatic/trimmomatic.nf b/modules/trimmomatic/trimmomatic.nf
@@ -9,9 +9,6 @@ params.shouldPublish = true
 process TRIMMOMATIC {
     tag "${genomeName}"
     publishDir params.resultsDir, mode: params.saveMode, enabled: params.shouldPublish
-    container 'quay.io/biocontainers/trimmomatic:0.35--6'
-    cpus 4
-    memory "8 GB"
 
     input:
     tuple val(genomeName), path(genomeReads)
@@ -44,13 +41,30 @@ process TRIMMOMATIC {
     """
 
     stub:
+    fq_1_paired = genomeName + '_R1.p.fastq.gz'
+    fq_1_unpaired = genomeName + '_R1.s.fastq.gz'
+    fq_2_paired = genomeName + '_R2.p.fastq.gz'
+    fq_2_unpaired = genomeName + '_R2.s.fastq.gz'
+
+    def adapter_file = "/usr/local/share/trimmomatic-0.35-6/adapters/NexteraPE-PE.fa"
+
     """
+    echo "trimmomatic \
+    PE \
+    -threads ${task.cpus} \
+    -phred33 \
+    ${genomeReads[0]} \
+    ${genomeReads[1]} \
+    $fq_1_paired \
+    $fq_1_unpaired \
+    $fq_2_paired \
+    $fq_2_unpaired \
+    ILLUMINACLIP:${adapter_file}:2:40:15  \
+    LEADING:3 TRAILING:3 SLIDINGWINDOW:3:28 HEADCROP:20 MINLEN:40"
+
     touch ${genomeName}_R1.p.fastq.gz
     touch ${genomeName}_R2.p.fastq.gz
 
-    touch ${genomeName}_R1.s.fastq.gz
-    touch ${genomeName}_R2.s.fastq.gz
-
     """
 }