Trio #55

Merged
merged 27 commits into from
Mar 20, 2025

Conversation

yumisims
Contributor

@yumisims yumisims commented Sep 23, 2024

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

github-actions bot commented Sep 23, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit a6519c6

  • ✅ 130 tests passed
  • ❔ 19 tests were ignored
  • ❗ 3 tests had warnings

❗ Test warnings:

  • files_exist - File not found: conf/igenomes.config
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.

❔ Tests ignored:

  • files_exist - File is ignored: assets/nf-core-genomeassembly_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-genomeassembly_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-genomeassembly_logo_dark.png
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
  • files_unchanged - File does not exist: assets/nf-core-genomeassembly_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-genomeassembly_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-genomeassembly_logo_dark.png
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/genomeassembly/genomeassembly/.github/workflows/awstest.yml

✅ Tests passed:

Run details

  • nf-core/tools version 2.8
  • Run at 2025-03-17 13:16:36

@ksenia-krasheninnikova
Contributor

Hi @yumisims
Looks good to me in general, thanks for your work! One thing has to change: we don't have to run purge_dups when hifiasm is run in trio mode. The logic is similar to the hap1/hap2 assembly: each of the two hifiasm trio-phased files is to be scaffolded directly:

if ( hifiasm_hic_on ) {
    //
    // SUBWORKFLOW: MAP HIC DATA TO THE HAP1 CONTIGS
    //
    HIC_MAPPING_HAP1 ( RAW_ASSEMBLY.out.hap1_hic_contigs, crams_ch, hic_aligner_ch, 'hap1' )
    ch_versions = ch_versions.mix(HIC_MAPPING_HAP1.out.versions)
    //
    // SUBWORKFLOW: SCAFFOLD HAP1
    //
    SCAFFOLDING_HAP1( HIC_MAPPING_HAP1.out.bed, RAW_ASSEMBLY.out.hap1_hic_contigs, cool_bin, 'hap1' )
    ch_versions = ch_versions.mix(SCAFFOLDING_HAP1.out.versions)
    //
    // SUBWORKFLOW: MAP HIC DATA TO THE HAP2 CONTIGS
    //
    HIC_MAPPING_HAP2 ( RAW_ASSEMBLY.out.hap2_hic_contigs, crams_ch, hic_aligner_ch, 'hap2' )
    ch_versions = ch_versions.mix(HIC_MAPPING_HAP2.out.versions)
    //
    // SUBWORKFLOW: SCAFFOLD HAP2
    //
    SCAFFOLDING_HAP2( HIC_MAPPING_HAP2.out.bed, RAW_ASSEMBLY.out.hap2_hic_contigs, cool_bin, 'hap2' )
    ch_versions = ch_versions.mix(SCAFFOLDING_HAP2.out.versions)
    //
    // LOGIC: CREATE A CHANNEL FOR THE FULL HAP1/HAP2 ASSEMBLY
    //
    SCAFFOLDING_HAP1.out.fasta.combine(SCAFFOLDING_HAP2.out.fasta)
        .map{meta_s, fasta_s, meta_h, fasta_h -> [ [id:meta_h.id], fasta_s, fasta_h ]}
        .set{ stats_haps_input_ch }
    //
    // SUBWORKFLOW: CALCULATE ASSEMBLY STATISTICS FOR HAP1/HAP2 ASSEMBLY
    //
    GENOME_STATISTICS_SCAFFOLDS_HAPS( stats_haps_input_ch,
        PREPARE_INPUT.out.busco,
        GENOMESCOPE_MODEL.out.hist,
        GENOMESCOPE_MODEL.out.ktab,
        [],
        [],
        set_busco_alts
    )
}
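
For readers less familiar with Nextflow channels, the combine/map step in the snippet above can be modelled roughly as follows. This is only an illustrative Python sketch, not pipeline code: the function name, the sample id and the FASTA file names are hypothetical, and it assumes each scaffolding channel emits one (meta, fasta) pair for a single sample.

```python
from itertools import product


def combine_haplotypes(hap1_items, hap2_items):
    """Pair every hap1 (meta, fasta) with every hap2 (meta, fasta).

    Models `combine` followed by `map` in the snippet: the result keeps only
    the sample id (taken from the hap2 meta, as in the snippet) and both
    FASTA paths.
    """
    return [
        ({"id": meta2["id"]}, fasta1, fasta2)
        for (meta1, fasta1), (meta2, fasta2) in product(hap1_items, hap2_items)
    ]


# Hypothetical inputs: one scaffolded FASTA per haplotype for a single sample.
hap1 = [({"id": "sample", "hap": "hap1"}, "hap1_scaffolds.fa")]
hap2 = [({"id": "sample", "hap": "hap2"}, "hap2_scaffolds.fa")]
stats_haps_input = combine_haplotypes(hap1, hap2)
print(stats_haps_input)
```

Because `combine` without a `by` key produces a Cartesian product, this pairing only makes sense when each channel carries a single sample, which is what the sketch assumes.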

@yumisims
Contributor Author

yumisims commented Nov 5, 2024

Hi Ksenia, could you please take another look? I have added scaffolding for the trio case. Thank you.

@yumisims
Contributor Author

yumisims commented Nov 5, 2024

@gq1 the same error appears again. Could you please take a look at the editorconfig? Thanks.

@gq1
Member

gq1 commented Nov 5, 2024

https://github.com/sanger-tol/genomeassembly/actions/runs/11686036439/workflow?pr=55#L22
You need to use the old version, as you did in the other pipelines, not the latest version.

@ksenia-krasheninnikova
Contributor

Hi Yumi

Thank you for the updates! There are some considerations:

  • hifiasm_trio_on should be set to false in base.conf, similar to how it's implemented for hifiasm_hic_on

  • the test based on test.yaml does not run to completion. The output files of hifiasm in trio mode are not picked up because the hifiasm module refers to the wrong names of the gfa files: they have to be ".asm.dip.hap1.p_ctg.gfa", not ".asm.hic.hap1.p_ctg.gfa", etc. (you can find them in the hifiasm run in the nextflow workdir). Hence the jobs gfa_to_fasta, assembly stats/busco/merqury and scaffolding are not picked up by nextflow.

  • purge_dups should not be run on an assembly with trio data, but these jobs are being scheduled

  • If hifiasm was run in trio mode, it has to be visible from its name: baUndUnlc1.hifiasm-trio.20241106

  • I suggest moving the trio mode into a separate test rather than keeping it in test.conf
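
The mode-dependent points in the list above (GFA file names, skipping purge_dups, and the run label) can be summarised in a small sketch. This is illustrative Python rather than the pipeline's Nextflow code. The "dip" and "hic" infixes and the label baUndUnlc1.hifiasm-trio.20241106 come from this review, while the function names, the mode keys and the same label pattern for Hi-C mode are assumptions.

```python
# Infix that appears in the phased GFA names for each mode. The "dip" and
# "hic" infixes come from the review comment; the mode keys are hypothetical.
GFA_INFIX = {"trio": "dip", "hic": "hic"}


def phased_gfa_names(prefix: str, mode: str) -> list[str]:
    """Return the hap1/hap2 primary-contig GFA names expected for a mode."""
    infix = GFA_INFIX[mode]  # raises KeyError for an unknown mode
    return [f"{prefix}.{infix}.hap{n}.p_ctg.gfa" for n in (1, 2)]


def assembly_label(specimen: str, mode: str, date: str) -> str:
    """Build a run label such as baUndUnlc1.hifiasm-trio.20241106.

    Using the same pattern for the Hi-C mode is an assumption.
    """
    return f"{specimen}.hifiasm-{mode}.{date}"


def should_run_purge_dups(trio_on: bool) -> bool:
    """Trio-phased haplotypes go straight to scaffolding, so skip purge_dups."""
    return not trio_on


print(phased_gfa_names("baUndUnlc1.asm", "trio"))
print(assembly_label("baUndUnlc1", "trio", "20241106"))
print(should_run_purge_dups(trio_on=True))
```

Keeping the infix and the label in one place means a module that looks for ".asm.hic.*" files while hifiasm wrote ".asm.dip.*" files would show up as a single mismatch rather than as silently unscheduled downstream jobs.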

@yumisims
Contributor Author

@ksenia-krasheninnikova Could you please have a look at the PR again? Thanks.
I have added a separate config for trio mode and made sure there is no trio mode in test.config. Please let me know.
Thanks

@ksenia-krasheninnikova
Contributor

Thanks @yumisims
I can see that in the latest commit trio mode is switched off by default.

The primary assembly is still incomplete in the test mode.
I suggest moving the trio mode to a separate test case, which means a specific test file has to be created for the trio mode. There are paths to trio data in test.yaml, which are no longer needed if this functionality is not tested there. I can see that multiple test configuration files were updated by adding trio files for sarscov, but that wouldn't work for testing because the HiFi datasets in parents/offspring won't match.
Among the new changes, the naming of the output folder has to be changed as suggested in the previous post, because it currently refers to the hifiasm-hic output folder. For this reason it's not possible to run the pipeline with --hifiasm_trio_on True, as it fails with an error.

@yumisims yumisims requested a review from prototaxites March 3, 2025 20:59
Contributor

@prototaxites prototaxites left a comment

Hi Yumi,

Have had a look over - I think this all looks fine! There are things in the code that I think could be tidied, but on the whole I think this can happen in a bigger refactoring of the codebase. I've left a few comments though, which probably aren't very arduous.

That said, I'm still not too familiar with the codebase, so I'll wait for @ksenia-krasheninnikova to give an OK. In the meantime can you share either a path to the output of a test run, or a command I could run to run a trio test?


when:
task.ext.when == null || task.ext.when

script:
// Exit if running this module with -profile conda / -profile mamba
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
Contributor

Not something to fix in this PR but there is a bioconda package for MerquryFK now

Member

Oh nice! In fact, it's not just that there's a bioconda package; importantly, there are now releases: https://github.com/thegenemyers/MERQURY.FK/releases

@yumisims
Contributor Author

yumisims commented Mar 8, 2025

@prototaxites, Hi Jim, thanks for the commit, I will do some cleaning up based on your comment. My next step is to refactor three nf-core modules and write up the hapmaker module to submit it to nf-core (see issues created).

matreads:
- https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/illumina/fastq/test_1.fastq.gz
patreads:
- https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/illumina/fastq/test_2.fastq.gz
Contributor

As discussed, there is a separate trio test implemented. The 'trio:' section has to be removed here.

Contributor Author

OK, thanks both @ksenia-krasheninnikova @prototaxites, I will incorporate all your comments.

@prototaxites prototaxites merged commit 0c2aae8 into dev Mar 20, 2025
6 checks passed
@prototaxites prototaxites deleted the trio branch March 27, 2025 10:29
5 participants