Refactor channel identifiers and process names #45

abhi18av · 2021-12-19T10:54:31Z

This PR builds upon the discussion done in #38 .

I'm not sure how to test locally, so I've initiated this PR to track all changes.

Unless I have ignored something crucial, as of now the PR implements the refactor of identifiers used in the modules/workflows.

Happy to accomodate any change request :)

fmalmeida · 2021-12-20T10:22:43Z

Hi! Thanks for the the efforts.

I have checked the changes, and it seems that everything will run ok because it just changes the cases and nothing else, however a quick test is required to check if no typo is present.

When skimming through it, I found two typos:

In line 15, the file "prokka.nf" is uppercase, but the filename has not changed
In line 57 there is a typo in the module name

The only way I can think of testing it is running an example with a small bacterial genome and with a nanopore subset with fast5 to try the methylation calling process.

As the second one using raw reads and fast5 can be a little demanding I can make these tests myself directly using the branch from your fork after the typos have been fixed and before merging it.

😄

abhi18av

Thanks for the review @fmalmeida - good catches! I've accomodated the change requests.

workflows/bacannot.nf

fmalmeida · 2021-12-21T16:14:51Z

Hi @abhi18av, thank you for accomodating the changes and all the help.

I will perform a rapid test to check if nextflow doesn't find another typo, but everything should run smoothly. As soon as the test finishes, I'll merge it in the develop branch.

Talking about the develop. Which I believe it is comming very near to trigger a new release. We could later discuss about when (which issues to be in it) to create a new one.

fmalmeida · 2021-12-21T18:23:44Z

When running the test it complained about a flye_ch channel:

$ nextflow run \
       abhi18av/bacannot -r abhinav/refactor-channel-identifiers \
       -profile test --threads 10 --output ABHINAV

Launching `abhi18av/bacannot` [astonishing_albattani] - revision: 0f58558832 [abhinav/refactor-channel-identifiers]
=================================================================
 Container-based, fmalmeida/bacannot, Genome Annotation Pipeline 
=================================================================
Input genomes  : https://github.com/fmalmeida/bacannot/raw/master/test_data/samplesheet.yaml
Output dir     : ABHINAV
Threads        : 10
Blast % ID - Virulence Genes: 90
Blast query coverage - Virulence Genes: 80
Blast % ID - AMR Genes: 90
Blast query coverage - AMR Genes: 80
Blast % ID - ICEs or Phages: 65
Blast query coverage - ICEs or Phages: 65
Blast % ID - Plasmids: 90
Blast query coverage - Plasmids: 60
Pipeline Release: abhinav/refactor-channel-identifiers
Current home   : /home/falmeida
Current user   : falmeida
Current path   : /work1/falmeida/bacannot_tests
Configuration file: /home/falmeida/.nextflow/assets/abhi18av/bacannot/nextflow.config
==============================================================
No such variable: flye_ch

 -- Check script '/home/falmeida/.nextflow/assets/abhi18av/bacannot/./workflows/bacannot.nf' at line: 132 or see '.nextflow.log' file for more details

The problem stated is in line 132 where you've changed it to flye_ch and unicycler_ch:

PROKKA(parsed_inputs.annotation_ch.mix(flye_ch.out[1], unicycler_ch.out[1]))

However, in this step, prokka module expects the outputs from assemblers modules, so it should be something like:

PROKKA(parsed_inputs.annotation_ch.mix(FLYE.out[1], UNICYCLER.out[1]))

workflows/bacannot.nf

abhi18av · 2021-12-23T18:53:16Z

Thanks for the catch again @fmalmeida #45 (comment) , I've addressed it in 03cde47

fmalmeida · 2021-12-26T09:57:00Z

Hi @abhi18av, thanks for the change. I've tried it again and it now complained the following:

Missing process or function with name 'iceberg'

 -- Check script '/home/falmeida/.nextflow/assets/abhi18av/bacannot/main.nf' at line: 135 or see '.nextflow.log' file for more details

The problem seems in line 209:

// ICEs search
      if (params.skip_iceberg_search == false) {
        // ICEberg db
        iceberg(PROKKA.out[4], PROKKA.out[3])
        iceberg_output_ch = ICEBERG.out[1]
        iceberg_output_2_ch = ICEBERG.out[2]
      } else {
        iceberg_output_ch = Channel.empty()
        iceberg_output_2_ch = Channel.empty()
      }

workflows/bacannot.nf

abhi18av · 2021-12-27T06:11:14Z

Ah, okay. I've now fixed it with 6b77030

(Feliz Natal! 🎄🎅 )

fmalmeida · 2021-12-27T09:40:13Z

Feliz Natal! 🌲

fmalmeida · 2021-12-27T09:53:09Z

It almost worked, but it know complained about this line:

      /*
          Eighth step -- Merge all annotations with the same Prefix value in a single Channel
      */
      annotations_files_ch = PROKKA.out[3].join(PROKKA.out[1])
                                       .join(MLST.out[0])
                                       .join(BARRNAP.out[0])
                                       .join(COMPUTE_GC.out[0])
                                       .join(kofamscan_output_ch, remainder: true)
                                       .join(vfdb_output_ch,      remainder: true)
                                       .join(victors_output_ch,   remainder: true)
                                       .join(amrfinder_output_ch, remainder: true)
                                       .join(resfinder_gff_ch,    remainder: true)
                                       .join(rgi_output_ch,       remainder: true)
                                       .join(iceberg_output_ch,   remainder: true)
                                       .join(phast_output_ch,     remainder: true)
                                       .join(phigaro_output_2_ch, remainder: true)
                                       .join(FIND_GIS.out[0],  remainder: true)

      // Contatenation of annotations in a single GFF file
      MERGE_ANNOTATIONS(ANNOTATIONS_FILES.join(DIGIS.out[1],     remainder: true))

The annotation files channel is not ANNOTATIONS_FILES, but annotations_files_ch. I believe it could even be changed to annotation_files_ch.

After that I believe it would work.

workflows/bacannot.nf

abhi18av · 2021-12-28T08:33:09Z

Hey @fmalmeida , fixed that one now and keeping my 🤞 😄

As an aside, to speed up such code reviews Github has a couple features

The first is called Suggestions - which helps in adding the exact code change which you feel is the cause for these problems.
The other one is gh the Github CLI, which allows you to contribute directly to this branch (even if it's from a fork, I think) with command like gh pr checkout 45

Apart from this, happy to address any other issues.

abhi18av · 2021-12-30T16:54:43Z

Ah, there's still another one you found. Apologies for this Felipe! 😅

I'd really appreciate if there's a light-weight testing data I could use on my end - hopefully this would avoid these issues in future

fmalmeida · 2021-12-31T17:01:42Z

Hi @abhi18av,

No worries. I still not checked if it finished properly. I will check it again after 06/Jan.

Then, I can also think about a small dataset to share with you. The dataset just can't be too small in order to properly test the majority of processes.

For now, there are two 30X bacterial seq reads available with -profile test. I know it is not too small so it may take ~40 min to finish. But at least it enables the check-up of the NF code itself.

After coming back from vacations I will put more efforts on find a more appropriate testing dataset.

Furthermore,

Happy new year, and happy celebrations. All the best and thank you for helping with the code 🥳😁

fmalmeida · 2022-01-06T09:45:34Z

The tests here have finished successfully, so I will now merge this PR.

I have created a new issue #46 to handle this "problem" of a smaller dataset for quicker tests.

😄

* Split out the config profiles (#43) * Split out the config profiles * Resolve the conflict using the upstream config * Refactor channel identifiers and process names (#45) **Main change** Processes, channels and workflow names have had its case names changed in order to meet nextflow's community standards. **Commits** * Refactor names * refactor names in the workflows * Update workflows/bacannot.nf * Update workflows/bacannot.nf * Apply suggestions from code review * Update workflows/bacannot.nf wrt iceberg * Accomodate code review and fix the channel name * fix MERGE_ANNOTATIONS process case in line 331 Co-authored-by: Felipe Marques de Almeida <[email protected]> * added small dataset test profile * fixing name of quicktest profile * fixing urls of testing samplesheets * removing unnecessary files * added new action to test the pipeline from PR and updated docs about quicktest * fixed branch name in github action * fixing input typo in quicktest profile * adding more space to workflow env * fixed action as only 2 threads are available Co-authored-by: Abhinav Sharma <[email protected]>

* Split out the config profiles (#43) * Split out the config profiles * Resolve the conflict using the upstream config * trying to have a create dbs process * found way to download dbs * adding resfinder and plasmidfinder download rules * added phigaro, vfdb and amrfinder download rules * added last database download rules * added prokka HMM download rule * added label to use docker for tools * fixed argminer download * fixed victors download * fixed iceberg download * prokka using given database and bioconda image * docker specific for downloading databases * update packages * added mlst database download rule * add bacannot db info * also downloads PGAP db * added prokka and mlst * fix conditional * fixed pgap conditional * added barrnap * added 'compute_gc' module -- think about modules with two tools * adding identation * fixing label * first tries on conda envs * removing label * adding plasmidfinder * added platon * working until islandpath * added: VFDB * added: Victors * changing name and organization of MISC image * added: PHAST * added: phigaro * added: phispy * added: iceberg * added: kofamscan db download * removing named outputs as it will be solved in another PR * added: kofamscan from downloaded database * added: kegg_decoder * Refactor channel identifiers and process names (#45) **Main change** Processes, channels and workflow names have had its case names changed in order to meet nextflow's community standards. **Commits** * Refactor names * refactor names in the workflows * Update workflows/bacannot.nf * Update workflows/bacannot.nf * Apply suggestions from code review * Update workflows/bacannot.nf wrt iceberg * Accomodate code review and fix the channel name * fix MERGE_ANNOTATIONS process case in line 331 Co-authored-by: Felipe Marques de Almeida <[email protected]> * trying to brind amrfinder, card_rgi and argminer * trying to fix how the scale is calculated for amrfinder * changing scale to perl * typo fix * removing unnecessary comma * Update amrfinder.nf passing calculation of thresold into two decimals scale to groovy * properly working until amrfinder and card_rgi * added small dataset test profile * fixing name of quicktest profile * fixing urls of testing samplesheets * removing unnecessary files * fix missing label * adding resfinder to miscellaneous image * fixing gitignore * change manifest to upcoming version * updating db download workflow behaviour * removing .git dirs * fixing argminder download * properly added resfinder * adding script that parses nanopolish methyl call * removing unnecessary labels * starting to change how images are done * added perl tools Added: * prokka * barrnap * mlst * added misc module added: * gc_compute module * adding labels * added kofam analysis * added main pyenv added tools: * platon * plasmidfinder * fixed perlenv image Added tool: * islandpath * included virulence modules they use the miscellaneous image * pyenv image updated added prophage tools: * phast scan * phigaro *phispy * added iceberg db module it uses the miscellaneous image * added py36env image added tool: * rgi * added resistance tools added: * argminer db module * resfinder * amrfinderplus * added nanopolish * added refseq_masher * added digIS Docker image and modules were updated to run until digIS * added antismash * added sequence server * added merge_annotation module * added draw_gis modules * added gff2gbk module * added create_sql module * added first resource management labels * fixing resouce label for phigaro * little finx in env * fixing how tuples should be passed * arrived at jbrowse step * fixing draw_gis module input tuple * jbrowse added * Create test_pr.yml This is how new versions will be organised after the remodeling branch update, and, therefore, the test_pr must be updated. * fixed phast db incorporation * fixing inputs on report * Update test_pr.yml Will only act in ready for review PRs * adding ENV VAR for current version * adding scripts to automatically build images * fixed iceberg db incorporation * Update digIS.nf DIGis gff usage fixed * fixed vfdb incorporation * changing file to path input resolution * fixed victors db incorporation * changed channel names in main script * fixed argminer and prokka tables * fixed custom db annotations incorporation * fixed bacannot server loading * fixed custom db reports * begin incorporation of nf-core framework * nf-core libs have been added to the pipeline * custom database annotation added to JBrowse * fixed custom database gff generation * Update Dockerfile diminished image size by removing antismash db from inside it * Update docker.config uncommenting assemblers images * creating singularity profile * begin documentation update * added singularity profile * made image compatible with earlier versions * given 777 permissions to workdir * adjusted default values * fixed prokka to work with singularity * fixed rgi for singularity usage * updating PR test action * update targeted branches * limiting process resources * updated image for singularity * removing unused dbs in quicktest * not using big dbs in quicktest * Update resfinder.nf * adding gitpod config * fixed yml * fixing custom db report getattributefileld snippet * begin change to mkdocs * add requirements * Update .readthedocs.yml * Update .readthedocs.yml * Update .readthedocs.yml * Update .readthedocs.yml * update * update requirements * added index * fixed tags * added installation information * now on quickstart * changed some admonitions * added quickstart * added samplesheet page to mkdocs * included dir with images * Update samplesheet.md fixed header levels * outputs page added to mkdocs * Update standard.config removing "paralallel_jobs" definition * Update defaults.config fixed blast default values * updated and tested quickstart * updating gitpod.yml * creates a testing dir with more space * added profile selection information * Update manual.md manual updated for mkdocs and with new --ncbi_proteins parameter * Update nextflow_schema.json updated --custom_db parameter description * Update nextflow.config removed parameters that are in default and should not be in the boilerplate * added config file page * information about custom databases added * Update nextflow_schema.json * defaults need to be loaded before boilerplate * changing label of unicycler and flye * fixed antismash installation * fixed keggdecoder requires py36 * fixed resfinder module fixed db setup * pipeline fixed for docker Co-authored-by: Abhinav Sharma <[email protected]>

* Pipeline remodelling. Issues 24, 31 and 36 (#44) * Split out the config profiles (#43) * Split out the config profiles * Resolve the conflict using the upstream config * trying to have a create dbs process * found way to download dbs * adding resfinder and plasmidfinder download rules * added phigaro, vfdb and amrfinder download rules * added last database download rules * added prokka HMM download rule * added label to use docker for tools * fixed argminer download * fixed victors download * fixed iceberg download * prokka using given database and bioconda image * docker specific for downloading databases * update packages * added mlst database download rule * add bacannot db info * also downloads PGAP db * added prokka and mlst * fix conditional * fixed pgap conditional * added barrnap * added 'compute_gc' module -- think about modules with two tools * adding identation * fixing label * first tries on conda envs * removing label * adding plasmidfinder * added platon * working until islandpath * added: VFDB * added: Victors * changing name and organization of MISC image * added: PHAST * added: phigaro * added: phispy * added: iceberg * added: kofamscan db download * removing named outputs as it will be solved in another PR * added: kofamscan from downloaded database * added: kegg_decoder * Refactor channel identifiers and process names (#45) **Main change** Processes, channels and workflow names have had its case names changed in order to meet nextflow's community standards. **Commits** * Refactor names * refactor names in the workflows * Update workflows/bacannot.nf * Update workflows/bacannot.nf * Apply suggestions from code review * Update workflows/bacannot.nf wrt iceberg * Accomodate code review and fix the channel name * fix MERGE_ANNOTATIONS process case in line 331 Co-authored-by: Felipe Marques de Almeida <[email protected]> * trying to brind amrfinder, card_rgi and argminer * trying to fix how the scale is calculated for amrfinder * changing scale to perl * typo fix * removing unnecessary comma * Update amrfinder.nf passing calculation of thresold into two decimals scale to groovy * properly working until amrfinder and card_rgi * added small dataset test profile * fixing name of quicktest profile * fixing urls of testing samplesheets * removing unnecessary files * fix missing label * adding resfinder to miscellaneous image * fixing gitignore * change manifest to upcoming version * updating db download workflow behaviour * removing .git dirs * fixing argminder download * properly added resfinder * adding script that parses nanopolish methyl call * removing unnecessary labels * starting to change how images are done * added perl tools Added: * prokka * barrnap * mlst * added misc module added: * gc_compute module * adding labels * added kofam analysis * added main pyenv added tools: * platon * plasmidfinder * fixed perlenv image Added tool: * islandpath * included virulence modules they use the miscellaneous image * pyenv image updated added prophage tools: * phast scan * phigaro *phispy * added iceberg db module it uses the miscellaneous image * added py36env image added tool: * rgi * added resistance tools added: * argminer db module * resfinder * amrfinderplus * added nanopolish * added refseq_masher * added digIS Docker image and modules were updated to run until digIS * added antismash * added sequence server * added merge_annotation module * added draw_gis modules * added gff2gbk module * added create_sql module * added first resource management labels * fixing resouce label for phigaro * little finx in env * fixing how tuples should be passed * arrived at jbrowse step * fixing draw_gis module input tuple * jbrowse added * Create test_pr.yml This is how new versions will be organised after the remodeling branch update, and, therefore, the test_pr must be updated. * fixed phast db incorporation * fixing inputs on report * Update test_pr.yml Will only act in ready for review PRs * adding ENV VAR for current version * adding scripts to automatically build images * fixed iceberg db incorporation * Update digIS.nf DIGis gff usage fixed * fixed vfdb incorporation * changing file to path input resolution * fixed victors db incorporation * changed channel names in main script * fixed argminer and prokka tables * fixed custom db annotations incorporation * fixed bacannot server loading * fixed custom db reports * begin incorporation of nf-core framework * nf-core libs have been added to the pipeline * custom database annotation added to JBrowse * fixed custom database gff generation * Update Dockerfile diminished image size by removing antismash db from inside it * Update docker.config uncommenting assemblers images * creating singularity profile * begin documentation update * added singularity profile * made image compatible with earlier versions * given 777 permissions to workdir * adjusted default values * fixed prokka to work with singularity * fixed rgi for singularity usage * updating PR test action * update targeted branches * limiting process resources * updated image for singularity * removing unused dbs in quicktest * not using big dbs in quicktest * Update resfinder.nf * adding gitpod config * fixed yml * fixing custom db report getattributefileld snippet * begin change to mkdocs * add requirements * Update .readthedocs.yml * Update .readthedocs.yml * Update .readthedocs.yml * Update .readthedocs.yml * update * update requirements * added index * fixed tags * added installation information * now on quickstart * changed some admonitions * added quickstart * added samplesheet page to mkdocs * included dir with images * Update samplesheet.md fixed header levels * outputs page added to mkdocs * Update standard.config removing "paralallel_jobs" definition * Update defaults.config fixed blast default values * updated and tested quickstart * updating gitpod.yml * creates a testing dir with more space * added profile selection information * Update manual.md manual updated for mkdocs and with new --ncbi_proteins parameter * Update nextflow_schema.json updated --custom_db parameter description * Update nextflow.config removed parameters that are in default and should not be in the boilerplate * added config file page * information about custom databases added * Update nextflow_schema.json * defaults need to be loaded before boilerplate * changing label of unicycler and flye * fixed antismash installation * fixed keggdecoder requires py36 * fixed resfinder module fixed db setup * pipeline fixed for docker Co-authored-by: Abhinav Sharma <[email protected]> * removing travis CI file * Update CHANGELOG.md * wget tries 10 times before hanging * fixed indentation * database download modules retry 2 times before raising errors * Delete update_database_image.sh script not useful anymore * Update manual.md * Update README.md * Update manual.md * update bacannot_db information * Update manual.md * to avoid errors, load either pgap or tigrfam in prokka Instead of loading both, as PGAP contains TIGRFAM, either load one or another. User selects. Default is loading TIGRFAM which is smaller. Also updated information about it. * Update manual.md * updated prokka db information * Update manual.md * Update README.md * Update manual.md * Update manual.md * Update manual.md Co-authored-by: Abhinav Sharma <[email protected]>

abhi18av added 2 commits December 18, 2021 20:36

Refactor names

3aa6057

refactor names in the workflows

311a007

fmalmeida self-assigned this Dec 19, 2021

fmalmeida added cleanup code organisation labels Dec 21, 2021

abhi18av commented Dec 21, 2021

View reviewed changes

workflows/bacannot.nf Outdated Show resolved Hide resolved

workflows/bacannot.nf Outdated Show resolved Hide resolved

abhi18av added 2 commits December 21, 2021 17:07

Update workflows/bacannot.nf

650c03c

Update workflows/bacannot.nf

0f58558

abhi18av commented Dec 23, 2021

View reviewed changes

workflows/bacannot.nf Outdated Show resolved Hide resolved

Apply suggestions from code review

03cde47

abhi18av commented Dec 27, 2021

View reviewed changes

workflows/bacannot.nf Outdated Show resolved Hide resolved

Update workflows/bacannot.nf wrt iceberg

6b77030

abhi18av commented Dec 28, 2021

View reviewed changes

workflows/bacannot.nf Outdated Show resolved Hide resolved

Accomodate code review and fix the channel name

8555a43

fix MERGE_ANNOTATIONS process case in line 331

2184eef

fmalmeida merged commit a50b4e7 into fmalmeida:develop Jan 6, 2022

fmalmeida mentioned this pull request Jan 6, 2022

Change case of modules/workflows to distinguish them from parameters #38

Closed

fmalmeida linked an issue Jan 6, 2022 that may be closed by this pull request

Change case of modules/workflows to distinguish them from parameters #38

Closed

abhi18av deleted the abhinav/refactor-channel-identifiers branch January 10, 2022 09:09

fmalmeida mentioned this pull request Mar 28, 2022

Pipeline remodelling. Issues 24, 31 and 36 (#44) #51

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor channel identifiers and process names #45

Refactor channel identifiers and process names #45

abhi18av commented Dec 19, 2021

fmalmeida commented Dec 20, 2021

abhi18av left a comment •

edited

Loading

fmalmeida commented Dec 21, 2021

fmalmeida commented Dec 21, 2021

abhi18av commented Dec 23, 2021

fmalmeida commented Dec 26, 2021

abhi18av commented Dec 27, 2021

fmalmeida commented Dec 27, 2021

fmalmeida commented Dec 27, 2021

abhi18av commented Dec 28, 2021

abhi18av commented Dec 30, 2021

fmalmeida commented Dec 31, 2021 •

edited

Loading

fmalmeida commented Jan 6, 2022

Refactor channel identifiers and process names #45

Refactor channel identifiers and process names #45

Conversation

abhi18av commented Dec 19, 2021

fmalmeida commented Dec 20, 2021

abhi18av left a comment • edited Loading

Choose a reason for hiding this comment

fmalmeida commented Dec 21, 2021

fmalmeida commented Dec 21, 2021

abhi18av commented Dec 23, 2021

fmalmeida commented Dec 26, 2021

abhi18av commented Dec 27, 2021

fmalmeida commented Dec 27, 2021

fmalmeida commented Dec 27, 2021

abhi18av commented Dec 28, 2021

abhi18av commented Dec 30, 2021

fmalmeida commented Dec 31, 2021 • edited Loading

fmalmeida commented Jan 6, 2022

abhi18av left a comment •

edited

Loading

fmalmeida commented Dec 31, 2021 •

edited

Loading