Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor channel identifiers and process names #45

Merged

Conversation

abhi18av
Copy link
Contributor

This PR builds upon the discussion done in #38 .

I'm not sure how to test locally, so I've initiated this PR to track all changes.

Unless I have ignored something crucial, as of now the PR implements the refactor of identifiers used in the modules/workflows.

Happy to accomodate any change request :)

@fmalmeida fmalmeida self-assigned this Dec 19, 2021
@fmalmeida
Copy link
Owner

Hi! Thanks for the the efforts.

I have checked the changes, and it seems that everything will run ok because it just changes the cases and nothing else, however a quick test is required to check if no typo is present.

When skimming through it, I found two typos:

  1. In line 15, the file "prokka.nf" is uppercase, but the filename has not changed
  2. In line 57 there is a typo in the module name

The only way I can think of testing it is running an example with a small bacterial genome and with a nanopore subset with fast5 to try the methylation calling process.

As the second one using raw reads and fast5 can be a little demanding I can make these tests myself directly using the branch from your fork after the typos have been fixed and before merging it.

😄

Copy link
Contributor Author

@abhi18av abhi18av left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @fmalmeida - good catches! I've accomodated the change requests.

workflows/bacannot.nf Outdated Show resolved Hide resolved
workflows/bacannot.nf Outdated Show resolved Hide resolved
@fmalmeida
Copy link
Owner

Hi @abhi18av, thank you for accomodating the changes and all the help.

I will perform a rapid test to check if nextflow doesn't find another typo, but everything should run smoothly. As soon as the test finishes, I'll merge it in the develop branch.

Talking about the develop. Which I believe it is comming very near to trigger a new release. We could later discuss about when (which issues to be in it) to create a new one.

@fmalmeida
Copy link
Owner

When running the test it complained about a flye_ch channel:

$ nextflow run \
       abhi18av/bacannot -r abhinav/refactor-channel-identifiers \
       -profile test --threads 10 --output ABHINAV

Launching `abhi18av/bacannot` [astonishing_albattani] - revision: 0f58558832 [abhinav/refactor-channel-identifiers]
=================================================================
 Container-based, fmalmeida/bacannot, Genome Annotation Pipeline 
=================================================================
Input genomes  : https://github.com/fmalmeida/bacannot/raw/master/test_data/samplesheet.yaml
Output dir     : ABHINAV
Threads        : 10
Blast % ID - Virulence Genes: 90
Blast query coverage - Virulence Genes: 80
Blast % ID - AMR Genes: 90
Blast query coverage - AMR Genes: 80
Blast % ID - ICEs or Phages: 65
Blast query coverage - ICEs or Phages: 65
Blast % ID - Plasmids: 90
Blast query coverage - Plasmids: 60
Pipeline Release: abhinav/refactor-channel-identifiers
Current home   : /home/falmeida
Current user   : falmeida
Current path   : /work1/falmeida/bacannot_tests
Configuration file: /home/falmeida/.nextflow/assets/abhi18av/bacannot/nextflow.config
==============================================================
No such variable: flye_ch

 -- Check script '/home/falmeida/.nextflow/assets/abhi18av/bacannot/./workflows/bacannot.nf' at line: 132 or see '.nextflow.log' file for more details

The problem stated is in line 132 where you've changed it to flye_ch and unicycler_ch:

PROKKA(parsed_inputs.annotation_ch.mix(flye_ch.out[1], unicycler_ch.out[1]))

However, in this step, prokka module expects the outputs from assemblers modules, so it should be something like:

PROKKA(parsed_inputs.annotation_ch.mix(FLYE.out[1], UNICYCLER.out[1]))

workflows/bacannot.nf Outdated Show resolved Hide resolved
@abhi18av
Copy link
Contributor Author

Thanks for the catch again @fmalmeida #45 (comment) , I've addressed it in 03cde47

@fmalmeida
Copy link
Owner

Hi @abhi18av, thanks for the change. I've tried it again and it now complained the following:

Missing process or function with name 'iceberg'

 -- Check script '/home/falmeida/.nextflow/assets/abhi18av/bacannot/main.nf' at line: 135 or see '.nextflow.log' file for more details

The problem seems in line 209:

// ICEs search
      if (params.skip_iceberg_search == false) {
        // ICEberg db
        iceberg(PROKKA.out[4], PROKKA.out[3])
        iceberg_output_ch = ICEBERG.out[1]
        iceberg_output_2_ch = ICEBERG.out[2]
      } else {
        iceberg_output_ch = Channel.empty()
        iceberg_output_2_ch = Channel.empty()
      }

workflows/bacannot.nf Outdated Show resolved Hide resolved
@abhi18av
Copy link
Contributor Author

Ah, okay. I've now fixed it with 6b77030

(Feliz Natal! 🎄🎅 )

@fmalmeida
Copy link
Owner

Feliz Natal! 🌲

@fmalmeida
Copy link
Owner

It almost worked, but it know complained about this line:

      /*
          Eighth step -- Merge all annotations with the same Prefix value in a single Channel
      */
      annotations_files_ch = PROKKA.out[3].join(PROKKA.out[1])
                                       .join(MLST.out[0])
                                       .join(BARRNAP.out[0])
                                       .join(COMPUTE_GC.out[0])
                                       .join(kofamscan_output_ch, remainder: true)
                                       .join(vfdb_output_ch,      remainder: true)
                                       .join(victors_output_ch,   remainder: true)
                                       .join(amrfinder_output_ch, remainder: true)
                                       .join(resfinder_gff_ch,    remainder: true)
                                       .join(rgi_output_ch,       remainder: true)
                                       .join(iceberg_output_ch,   remainder: true)
                                       .join(phast_output_ch,     remainder: true)
                                       .join(phigaro_output_2_ch, remainder: true)
                                       .join(FIND_GIS.out[0],  remainder: true)

      // Contatenation of annotations in a single GFF file
      MERGE_ANNOTATIONS(ANNOTATIONS_FILES.join(DIGIS.out[1],     remainder: true))

The annotation files channel is not ANNOTATIONS_FILES, but annotations_files_ch. I believe it could even be changed to annotation_files_ch.

After that I believe it would work.

workflows/bacannot.nf Outdated Show resolved Hide resolved
@abhi18av
Copy link
Contributor Author

Hey @fmalmeida , fixed that one now and keeping my 🤞 😄

As an aside, to speed up such code reviews Github has a couple features

  1. The first is called Suggestions - which helps in adding the exact code change which you feel is the cause for these problems.

  2. The other one is gh the Github CLI, which allows you to contribute directly to this branch (even if it's from a fork, I think) with command like gh pr checkout 45

Apart from this, happy to address any other issues.

@abhi18av
Copy link
Contributor Author

Ah, there's still another one you found. Apologies for this Felipe! 😅

I'd really appreciate if there's a light-weight testing data I could use on my end - hopefully this would avoid these issues in future

@fmalmeida
Copy link
Owner

fmalmeida commented Dec 31, 2021

Hi @abhi18av,

No worries. I still not checked if it finished properly. I will check it again after 06/Jan.

Then, I can also think about a small dataset to share with you. The dataset just can't be too small in order to properly test the majority of processes.

For now, there are two 30X bacterial seq reads available with -profile test. I know it is not too small so it may take ~40 min to finish. But at least it enables the check-up of the NF code itself.

After coming back from vacations I will put more efforts on find a more appropriate testing dataset.

Furthermore,

Happy new year, and happy celebrations. All the best and thank you for helping with the code 🥳😁

@fmalmeida
Copy link
Owner

The tests here have finished successfully, so I will now merge this PR.

I have created a new issue #46 to handle this "problem" of a smaller dataset for quicker tests.

😄

@abhi18av abhi18av deleted the abhinav/refactor-channel-identifiers branch January 10, 2022 09:09
fmalmeida added a commit that referenced this pull request Jan 24, 2022
* Split out the config profiles (#43)

* Split out the config profiles

* Resolve the conflict using the upstream config

* Refactor channel identifiers and process names (#45)

**Main change**

Processes, channels and workflow names have had its case names changed in order to meet nextflow's community standards.

**Commits**
* Refactor names
* refactor names in the workflows
* Update workflows/bacannot.nf
* Update workflows/bacannot.nf
* Apply suggestions from code review
* Update workflows/bacannot.nf wrt iceberg
* Accomodate code review and fix the channel name
* fix MERGE_ANNOTATIONS process case in line 331

Co-authored-by: Felipe Marques de Almeida <[email protected]>

* added small dataset test profile

* fixing name of quicktest profile

* fixing urls of testing samplesheets

* removing unnecessary files

* added new action to test the pipeline from PR and updated docs about quicktest

* fixed branch name in github action

* fixing input typo in quicktest profile

* adding more space to workflow env

* fixed action as only 2 threads are available

Co-authored-by: Abhinav Sharma <[email protected]>
fmalmeida added a commit that referenced this pull request Mar 28, 2022
* Split out the config profiles (#43)

* Split out the config profiles

* Resolve the conflict using the upstream config

* trying to have a create dbs process

* found way to download dbs

* adding resfinder and plasmidfinder download rules

* added phigaro, vfdb and amrfinder download rules

* added last database download rules

* added prokka HMM download rule

* added label to use docker for tools

* fixed argminer download

* fixed victors download

* fixed iceberg download

* prokka using given database and bioconda image

* docker specific for downloading databases

* update packages

* added mlst database download rule

* add bacannot db info

* also downloads PGAP db

* added prokka and mlst

* fix conditional

* fixed pgap conditional

* added barrnap

* added 'compute_gc' module -- think about modules with two tools

* adding identation

* fixing label

* first tries on conda envs

* removing label

* adding plasmidfinder

* added platon

* working until islandpath

* added: VFDB

* added: Victors

* changing name and organization of MISC image

* added: PHAST

* added: phigaro

* added: phispy

* added: iceberg

* added: kofamscan db download

* removing named outputs as it will be solved in another PR

* added: kofamscan from downloaded database

* added: kegg_decoder

* Refactor channel identifiers and process names (#45)

**Main change**

Processes, channels and workflow names have had its case names changed in order to meet nextflow's community standards.

**Commits**
* Refactor names
* refactor names in the workflows
* Update workflows/bacannot.nf
* Update workflows/bacannot.nf
* Apply suggestions from code review
* Update workflows/bacannot.nf wrt iceberg
* Accomodate code review and fix the channel name
* fix MERGE_ANNOTATIONS process case in line 331

Co-authored-by: Felipe Marques de Almeida <[email protected]>

* trying to brind amrfinder, card_rgi and argminer

* trying to fix how the scale is calculated for amrfinder

* changing scale to perl

* typo fix

* removing unnecessary comma

* Update amrfinder.nf

passing calculation of thresold into two decimals scale to groovy

* properly working until amrfinder and card_rgi

* added small dataset test profile

* fixing name of quicktest profile

* fixing urls of testing samplesheets

* removing unnecessary files

* fix missing label

* adding resfinder to miscellaneous image

* fixing gitignore

* change manifest to upcoming version

* updating db download workflow behaviour

* removing .git dirs

* fixing argminder download

* properly added resfinder

* adding script that parses nanopolish methyl call

* removing unnecessary labels

* starting to change how images are done

* added perl tools

Added:

* prokka
* barrnap
* mlst

* added misc module

added:

* gc_compute module

* adding labels

* added kofam analysis

* added main pyenv

added tools:

* platon
* plasmidfinder

* fixed perlenv image

Added tool:

* islandpath

* included virulence modules

they use the miscellaneous image

* pyenv image updated

added prophage tools:

* phast scan
* phigaro
*phispy

* added iceberg db module

it uses the miscellaneous image

* added py36env image

added tool:

* rgi

* added resistance tools

added:

* argminer db module
* resfinder
* amrfinderplus

* added nanopolish

* added refseq_masher

* added digIS

Docker image and modules were updated to run until digIS

* added antismash

* added sequence server

* added merge_annotation module

* added draw_gis modules

* added gff2gbk module

* added create_sql module

* added first resource management labels

* fixing resouce label for phigaro

* little finx in env

* fixing how tuples should be passed

* arrived at jbrowse step

* fixing draw_gis module input tuple

* jbrowse added

* Create test_pr.yml

This is how new versions will be organised after the remodeling branch update, and, therefore, the test_pr must be updated.

* fixed phast db incorporation

* fixing inputs on report

* Update test_pr.yml

Will only act in ready for review PRs

* adding ENV VAR for current version

* adding scripts to automatically build images

* fixed iceberg db incorporation

* Update digIS.nf

DIGis gff usage fixed

* fixed vfdb incorporation

* changing file to path input resolution

* fixed victors db incorporation

* changed channel names in main script

* fixed argminer and prokka tables

* fixed custom db annotations incorporation

* fixed bacannot server loading

* fixed custom db reports

* begin incorporation of nf-core framework

* nf-core libs have been added to the pipeline

* custom database annotation added to JBrowse

* fixed custom database gff generation

* Update Dockerfile

diminished image size by removing antismash db from inside it

* Update docker.config

uncommenting assemblers images

* creating singularity profile

* begin documentation update

* added singularity profile

* made image compatible with earlier versions

* given 777 permissions to workdir

* adjusted default values

* fixed prokka to work with singularity

* fixed rgi for singularity usage

* updating PR test action

* update targeted branches

* limiting process resources

* updated image for singularity

* removing unused dbs in quicktest

* not using big dbs in quicktest

* Update resfinder.nf

* adding gitpod config

* fixed yml

* fixing custom db report getattributefileld snippet

* begin change to mkdocs

* add requirements

* Update .readthedocs.yml

* Update .readthedocs.yml

* Update .readthedocs.yml

* Update .readthedocs.yml

* update

* update requirements

* added index

* fixed tags

* added installation information

* now on quickstart

* changed some admonitions

* added quickstart

* added samplesheet page to mkdocs

* included dir with images

* Update samplesheet.md

fixed header levels

* outputs page added to mkdocs

* Update standard.config

removing "paralallel_jobs" definition

* Update defaults.config

fixed blast default values

* updated and tested quickstart

* updating gitpod.yml

* creates a testing dir with more space

* added profile selection information

* Update manual.md

manual updated for mkdocs and with new --ncbi_proteins parameter

* Update nextflow_schema.json

updated --custom_db parameter description

* Update nextflow.config

removed parameters that are in default and should not be in the boilerplate

* added config file page

* information about custom databases added

* Update nextflow_schema.json

* defaults need to be loaded before boilerplate

* changing label of unicycler and flye

* fixed antismash installation

* fixed keggdecoder requires py36

* fixed resfinder module

fixed db setup

* pipeline fixed for docker

Co-authored-by: Abhinav Sharma <[email protected]>
fmalmeida added a commit that referenced this pull request Mar 30, 2022
* Pipeline remodelling. Issues 24, 31 and 36 (#44)

* Split out the config profiles (#43)

* Split out the config profiles

* Resolve the conflict using the upstream config

* trying to have a create dbs process

* found way to download dbs

* adding resfinder and plasmidfinder download rules

* added phigaro, vfdb and amrfinder download rules

* added last database download rules

* added prokka HMM download rule

* added label to use docker for tools

* fixed argminer download

* fixed victors download

* fixed iceberg download

* prokka using given database and bioconda image

* docker specific for downloading databases

* update packages

* added mlst database download rule

* add bacannot db info

* also downloads PGAP db

* added prokka and mlst

* fix conditional

* fixed pgap conditional

* added barrnap

* added 'compute_gc' module -- think about modules with two tools

* adding identation

* fixing label

* first tries on conda envs

* removing label

* adding plasmidfinder

* added platon

* working until islandpath

* added: VFDB

* added: Victors

* changing name and organization of MISC image

* added: PHAST

* added: phigaro

* added: phispy

* added: iceberg

* added: kofamscan db download

* removing named outputs as it will be solved in another PR

* added: kofamscan from downloaded database

* added: kegg_decoder

* Refactor channel identifiers and process names (#45)

**Main change**

Processes, channels and workflow names have had its case names changed in order to meet nextflow's community standards.

**Commits**
* Refactor names
* refactor names in the workflows
* Update workflows/bacannot.nf
* Update workflows/bacannot.nf
* Apply suggestions from code review
* Update workflows/bacannot.nf wrt iceberg
* Accomodate code review and fix the channel name
* fix MERGE_ANNOTATIONS process case in line 331

Co-authored-by: Felipe Marques de Almeida <[email protected]>

* trying to brind amrfinder, card_rgi and argminer

* trying to fix how the scale is calculated for amrfinder

* changing scale to perl

* typo fix

* removing unnecessary comma

* Update amrfinder.nf

passing calculation of thresold into two decimals scale to groovy

* properly working until amrfinder and card_rgi

* added small dataset test profile

* fixing name of quicktest profile

* fixing urls of testing samplesheets

* removing unnecessary files

* fix missing label

* adding resfinder to miscellaneous image

* fixing gitignore

* change manifest to upcoming version

* updating db download workflow behaviour

* removing .git dirs

* fixing argminder download

* properly added resfinder

* adding script that parses nanopolish methyl call

* removing unnecessary labels

* starting to change how images are done

* added perl tools

Added:

* prokka
* barrnap
* mlst

* added misc module

added:

* gc_compute module

* adding labels

* added kofam analysis

* added main pyenv

added tools:

* platon
* plasmidfinder

* fixed perlenv image

Added tool:

* islandpath

* included virulence modules

they use the miscellaneous image

* pyenv image updated

added prophage tools:

* phast scan
* phigaro
*phispy

* added iceberg db module

it uses the miscellaneous image

* added py36env image

added tool:

* rgi

* added resistance tools

added:

* argminer db module
* resfinder
* amrfinderplus

* added nanopolish

* added refseq_masher

* added digIS

Docker image and modules were updated to run until digIS

* added antismash

* added sequence server

* added merge_annotation module

* added draw_gis modules

* added gff2gbk module

* added create_sql module

* added first resource management labels

* fixing resouce label for phigaro

* little finx in env

* fixing how tuples should be passed

* arrived at jbrowse step

* fixing draw_gis module input tuple

* jbrowse added

* Create test_pr.yml

This is how new versions will be organised after the remodeling branch update, and, therefore, the test_pr must be updated.

* fixed phast db incorporation

* fixing inputs on report

* Update test_pr.yml

Will only act in ready for review PRs

* adding ENV VAR for current version

* adding scripts to automatically build images

* fixed iceberg db incorporation

* Update digIS.nf

DIGis gff usage fixed

* fixed vfdb incorporation

* changing file to path input resolution

* fixed victors db incorporation

* changed channel names in main script

* fixed argminer and prokka tables

* fixed custom db annotations incorporation

* fixed bacannot server loading

* fixed custom db reports

* begin incorporation of nf-core framework

* nf-core libs have been added to the pipeline

* custom database annotation added to JBrowse

* fixed custom database gff generation

* Update Dockerfile

diminished image size by removing antismash db from inside it

* Update docker.config

uncommenting assemblers images

* creating singularity profile

* begin documentation update

* added singularity profile

* made image compatible with earlier versions

* given 777 permissions to workdir

* adjusted default values

* fixed prokka to work with singularity

* fixed rgi for singularity usage

* updating PR test action

* update targeted branches

* limiting process resources

* updated image for singularity

* removing unused dbs in quicktest

* not using big dbs in quicktest

* Update resfinder.nf

* adding gitpod config

* fixed yml

* fixing custom db report getattributefileld snippet

* begin change to mkdocs

* add requirements

* Update .readthedocs.yml

* Update .readthedocs.yml

* Update .readthedocs.yml

* Update .readthedocs.yml

* update

* update requirements

* added index

* fixed tags

* added installation information

* now on quickstart

* changed some admonitions

* added quickstart

* added samplesheet page to mkdocs

* included dir with images

* Update samplesheet.md

fixed header levels

* outputs page added to mkdocs

* Update standard.config

removing "paralallel_jobs" definition

* Update defaults.config

fixed blast default values

* updated and tested quickstart

* updating gitpod.yml

* creates a testing dir with more space

* added profile selection information

* Update manual.md

manual updated for mkdocs and with new --ncbi_proteins parameter

* Update nextflow_schema.json

updated --custom_db parameter description

* Update nextflow.config

removed parameters that are in default and should not be in the boilerplate

* added config file page

* information about custom databases added

* Update nextflow_schema.json

* defaults need to be loaded before boilerplate

* changing label of unicycler and flye

* fixed antismash installation

* fixed keggdecoder requires py36

* fixed resfinder module

fixed db setup

* pipeline fixed for docker

Co-authored-by: Abhinav Sharma <[email protected]>

* removing travis CI file

* Update CHANGELOG.md

* wget tries 10 times before hanging

* fixed indentation

* database download modules retry 2 times before raising errors

* Delete update_database_image.sh

script not useful anymore

* Update manual.md

* Update README.md

* Update manual.md

* update bacannot_db information

* Update manual.md

* to avoid errors, load either pgap or tigrfam in prokka

Instead of loading both, as PGAP contains TIGRFAM, either load one or another. User selects. Default is loading TIGRFAM which is smaller.

Also updated information about it.

* Update manual.md

* updated prokka db information

* Update manual.md

* Update README.md

* Update manual.md

* Update manual.md

* Update manual.md

Co-authored-by: Abhinav Sharma <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change case of modules/workflows to distinguish them from parameters
2 participants