New release 2.0.0 including dda workflow for quantmsdiann #36

Draft
ypriverol wants to merge 32 commits into main from dev
Conversation

@ypriverol
Member

@ypriverol ypriverol commented Apr 6, 2026

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the bigbio/quantmsdiann branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Summary by CodeRabbit

  • Documentation

    • Updated README Zenodo badge/DOI and removed legacy .d→mzML converter from docs; added usage notes for DDA mode, new DIA‑NN options, and a new --convert_dotd flag.
  • New Features

    • Added DDA analysis mode support and several DIA‑NN configuration flags/parameters (including channel normalization options) plus version-selection guidance.
  • Bug Fixes

    • Improved pipeline failure propagation to surface earlier errors in several steps.
  • Chores

    • Removed the legacy converter module and its registration; bumped pipeline manifest version.

ypriverol and others added 18 commits April 3, 2026 11:26
13-task plan covering robustness fixes, DDA support, new DIA-NN params,
InfinDIA groundwork, comprehensive documentation, and issue cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without pipefail, if the command before tee fails, tee returns 0 and
the Nextflow task appears to succeed. This masked failures in
generate_cfg, diann_msstats, samplesheet_check, and sdrf_parsing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
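
The masking described in this commit can be reproduced with a minimal shell sketch (`tee` writing to `/dev/null` stands in for the task's log capture):

```shell
# Without pipefail: the pipeline's exit status is tee's (0), so the failure
# of the command before tee is masked and the task appears to succeed.
masked=0
bash -c 'false | tee /dev/null' || masked=$?
echo "without pipefail: exit=$masked"    # exit=0

# With pipefail: the pipeline propagates the first failing command's status.
surfaced=0
bash -c 'set -o pipefail; false | tee /dev/null' || surfaced=$?
echo "with pipefail: exit=$surfaced"     # exit=1
```

With `set -o pipefail` in the process script, Nextflow sees a non-zero exit and fails the task instead of silently continuing.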
These are the longest-running tasks and most susceptible to transient
failures (OOM, I/O timeouts). The error_retry label enables automatic
retry on signal exits (130-145, 104, 175).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
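
Why that signal range matters can be sketched in shell: a process killed by signal N exits with code 128 + N, so e.g. a SIGKILL (9, the usual out-of-memory kill) yields 137, which falls inside the 130-145 retry window quoted above.

```shell
# Kill a subshell with SIGKILL and observe the 128 + 9 = 137 exit code.
code=0
bash -c 'kill -9 $$' || code=$?
echo "exit code after SIGKILL: $code"    # 137

# A retry policy like error_retry can key off the signal-exit range.
if [ "$code" -ge 130 ] && [ "$code" -le 145 ]; then
  echo "signal-driven exit: eligible for retry"
fi
```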
Guard ch_searchdb and ch_experiment_meta with ifEmpty to fail fast
with clear error messages instead of hanging indefinitely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds conf/diann_versions/v2_3_2.config with ghcr.io/bigbio/diann:2.3.2
container. Use -profile diann_v2_3_2 to opt in. Default stays 1.8.1.
Enables DDA support and InfinDIA features.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New param diann_dda (boolean, default: false)
- Version guard: requires DIA-NN >= 2.3.2
- Passes --dda to all 5 DIA-NN modules when enabled
- Accepts DDA acquisition method in SDRF when diann_dda=true
- Added --dda to blocked lists in all modules

Closes #5

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- test_dda: BSA dataset with diann_dda=true on DIA-NN 2.3.2
- test_dia_skip_preanalysis: tests previously untested skip path
Both added to extended_ci.yml stage 2a.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- diann_light_models: 10x faster in-silico library generation
- diann_export_quant: fragment-level parquet export
- diann_site_ms1_quant: MS1 apex intensities for PTM quantification
All require DIA-NN >= 2.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Experimental support for InfinDIA (DIA-NN 2.3.0+). Passes --infin-dia
to library generation when enabled. Version guard enforces >= 2.3.0.
No test config — InfinDIA requires large databases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete reference for all ~70 pipeline parameters grouped by category
with types, defaults, descriptions, and version requirements.

Closes #1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- DDA mode documentation with limitations
- Missing param sections (preprocessing, extra_args scope, verbose output)
- DIA-NN version selection guide
- Parquet vs TSV output explanation
- MSstats format section
- pmultiqc citation added
- README updated with version table and parameter reference link

Closes #3, #9, #15

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add version guard for DIA-NN 2.0+ params (--light-models,
  --export-quant, --site-ms1-quant) to prevent crashes with 1.8.1
- Add *.site_report.parquet as optional output in FINAL_QUANTIFICATION
  for site-level PTM quantification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. test_dda.config: Add diann_version = '2.3.2' so the version guard
   doesn't reject DDA mode (default is 1.8.1, guard requires >= 2.3.2)

2. quantmsdiann.nf: Update branch condition to also match "dda"
   acquisition method. Previously "dda".contains("dia") was false,
   causing all DDA files to be silently dropped from processing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These flags exist in DIA-NN 1.8.x but were removed in 2.3.x, causing
'unrecognised option' warnings. Only pass them for versions < 2.3.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Apr 6, 2026

📝 Walkthrough

Walkthrough

Added DDA support and related DIA-NN options across the pipeline, removed the tdf2mzml module and its manifest, updated documentation and citation (Zenodo DOI), introduced version guards and new test/config profiles, and added a VersionUtils helper for semantic version checks.

Changes

| Cohort / File(s) | Summary |
| ---------------- | ------- |
| **Docs & citations**<br>`README.md`, `AGENTS.md`, `CITATIONS.md`, `docs/parameters.md`, `docs/usage.md`, `docs/*` | Updated Zenodo DOI and related badge/link; removed tdf2mzml mentions from docs; added/documented new DIA-NN and preliminary-analysis parameters and usage notes. |
| **Module removals**<br>`modules/local/utils/tdf2mzml/main.nf`, `modules/local/utils/tdf2mzml/meta.yml` | Removed the tdf2mzml process and its manifest (conversion logic, outputs, and metadata). |
| **File-preparation subworkflow**<br>`subworkflows/local/file_preparation/meta.yml`, `workflows/quantmsdiann.nf` | Removed tdf2mzml from components; classification now treats samples with acquisition_method containing "dia" or "dda" as dia. |
| **DIA-NN feature surface**<br>`nextflow.config`, `nextflow_schema.json`, `conf/diann_versions/v2_3_2.config`, `conf/tests/test_dda.config`, `conf/tests/test_dia_skip_preanalysis.config` | Added DIA-NN params (e.g., diann_dda, diann_light_models, diann_export_quant, diann_site_ms1_quant, enable_infin_dia, diann_pre_select, channel norm flags), new profiles/configs, and a version override file for v2.3.2. |
| **Version utility**<br>`lib/VersionUtils.groovy` | Added semantic version comparison helpers (compare, versionLessThan, versionAtLeast) used for runtime guards. |
| **Workflow control & guards**<br>`workflows/dia.nf`, `subworkflows/local/create_input_channel/main.nf` | Added runtime guards that error on incompatible DIA-NN versions for DDA/InfinDIA features; created ch_is_dda channel and refactored input-channel grouping/metadata detection to infer DDA/DIA and aggregate per-file SDRF rows. |
| **DIA-NN processes**<br>`modules/local/diann/*` (assemble_empirical_library, insilico_library_generation, individual_analysis, preliminary_analysis, final_quantification, generate_cfg, diann_msstats, diann_msstats/meta.yml, insilico_library_generation/meta.yml) | Propagated --dda handling: strip user-supplied conflicting flags, conditionally append --dda where appropriate, add error_retry labels to several processes, add new inputs/flags (e.g., is_dda, light models, InfinDIA, pre-select), make several outputs optional, adjust version checks to use VersionUtils, and enable pipefail in several scripts. |
| **Small metadata & schema tweaks**<br>`modules/local/sdrf_parsing/meta.yml`, `modules/local/utils/decompress_dotd/meta.yml`, `modules/local/pmultiqc/meta.yml`, `subworkflows/local/input_check/meta.yml`, `subworkflows/local/utils_nfcore_quantms_pipeline/meta.yml`, `assets/schema_input.json`, `.gitignore`, `CITATIONS.md` | Minor metadata renames/description tweaks, input-pattern adjustment, schema error-message wording changes, added docs/plans/ to .gitignore. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    rect rgba(96,165,250,0.5)
        participant SDRF as SDRF parsing
    end
    rect rgba(34,197,94,0.5)
        participant Create as CREATE_INPUT_CHANNEL
    end
    rect rgba(250,130,49,0.5)
        participant DIAwf as workflows/dia.nf
    end
    rect rgba(190,24,93,0.5)
        participant DIANN as DIA-NN processes
    end

    SDRF->>Create: parsed rows per file
    Create->>Create: group rows → determine meta.acquisition_method (dia/dda) and meta.is_dda
    Create->>DIAwf: emit ch_meta + ch_is_dda
    DIAwf->>DIANN: pass ch_is_dda to INSILICO/INDIVIDUAL/FINAL steps
    DIANN->>DIANN: strip blocked flags; append --dda when is_dda true
    DIANN-->>DIAwf: emit results (optional site reports, matrices, stats)
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

  • Phase 5: DDA data analysis support via DIA-NN (v2.3.2+) #5 — Implements DDA support, matching the DDA mode params, flag handling, and version guard work in this PR.
  • bigbio/quantms#668 — Adds DDA-related DIA‑NN features and version constraints consistent with the objectives described in this issue.

Possibly related PRs

Suggested reviewers

  • daichengxin
  • jpfeuffer
  • yueqixuan

Poem

🐰 I sniffed the SDRF, grouped each file with care,
A new --dda flag hopped into the air,
The tdf2mzml burrow closed its door,
Docs polished bright, versions checked for more—
🥕🌿

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| ---------- | ------ | ----------- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'New release 2.0.0 including dda workflow for quantmsdiann' clearly summarizes the main changes: a major version release with DDA workflow support. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@github-actions

github-actions bot commented Apr 6, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 207b6b6

```diff
+| ✅ 106 tests passed       |+
#| ❔  19 tests were ignored |#
#| ❔   1 tests had warnings |#
!| ❗   4 tests had warnings |!
```
Details

❗ Test warnings:

  • files_exist - File not found: conf/igenomes.config
  • files_exist - File not found: conf/igenomes_ignored.config
  • files_exist - File not found: .github/workflows/awstest.yml
  • files_exist - File not found: .github/workflows/awsfulltest.yml

❔ Tests ignored:

❔ Tests fixed:

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.2
  • Run at 2026-04-08 06:09:44

ypriverol and others added 2 commits April 6, 2026 21:10
Bruker .d to mzML conversion via tdf2mzml is no longer needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ypriverol ypriverol changed the title from "update zenodo link" to "New release 2.0.0 including dda workflow for quantmsdiann" Apr 7, 2026
@ypriverol ypriverol marked this pull request as draft April 7, 2026 05:47
ypriverol and others added 6 commits April 7, 2026 07:20
- Merge dev branch (version bump, tdf2mzml removal, lint fixes, DOI update)
- Update test_dda.config to use PXD022287 HeLa DDA dataset with subset FASTA
- Add test_dda profile to CI matrix in ci.yml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test_dda profile uses ghcr.io/bigbio/diann:2.3.2 which is a
private container requiring authentication. Add Docker login step
(matching merge_ci.yml) conditioned on test_dda profile.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove implementation plan from repo, add docs/plans/ to .gitignore
- Add lib/VersionUtils.groovy for semantic version comparison
  (prevents string comparison bugs like '2.10.0' < '2.3')
- Update all version guards in dia.nf and module scripts to use
  VersionUtils.versionLessThan/versionAtLeast

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
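
The bug class that VersionUtils.groovy guards against can be shown with a quick shell comparison (`sort -V` stands in for the Groovy helper, which is not reproduced here):

```shell
# Plain string comparison gets versions wrong: lexicographically,
# "2.10.0" sorts before "2.3" because '1' < '3'.
if [ "2.10.0" \< "2.3" ]; then
  echo "string compare says 2.10.0 < 2.3 (wrong)"
fi

# Version-aware comparison gets it right: 2.3 is the older release.
older=$(printf '2.3\n2.10.0\n' | sort -V | head -n1)
echo "older version: $older"    # 2.3
```

This is why the guards use semantic comparison helpers rather than Groovy's string `<` operator.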
DDA analysis support is a major feature warranting a major version bump.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename output version→versions in sdrf_parsing/meta.yml
- Add ch_ prefix to input_file→ch_input_file in input_check/meta.yml
- Fix grammar in pmultiqc and diann_msstats meta.yml descriptions
- Fix glob pattern in decompress_dotd/meta.yml (double-dot expansion)
- Update CITATIONS.md to link published Nature Methods article
- Fix schema_input.json error messages (source name, whitespace)
- Standardize quantmsdiann keyword in utils meta.yml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ypriverol and others added 6 commits April 7, 2026 13:10
DDA mode is now automatically detected from the SDRF
`comment[proteomics data acquisition method]` column when it contains
`data-dependent acquisition`. The `--diann_dda` flag is kept as a
fallback for SDRFs that lack this column.

- Modules now read acquisition method from meta instead of params
- INSILICO_LIBRARY_GENERATION receives is_dda as input from workflow
- Version guard triggers for both param and SDRF-detected DDA
- Updated docs, parameters, and schema to reflect auto-detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
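
A minimal sketch of the detection rule described above (hypothetical `detect_dda` helper; the real logic lives in the Groovy subworkflow and reads the SDRF column per file):

```shell
# Derive the DIA-NN --dda flag from an SDRF
# 'comment[proteomics data acquisition method]' value, case-insensitively.
detect_dda() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *data-dependent*) echo "--dda" ;;
    *) echo "" ;;
  esac
}

echo "DDA row: $(detect_dda 'Data-Dependent Acquisition')"    # --dda
echo "DIA row: $(detect_dda 'data-independent acquisition')"  # (empty)
```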
- Revert meta.yml input_file naming to match workflow interface
- Gate --pre-select behind enable_infin_dia (InfinDIA-only flag)
- Tighten --direct-quant version check from 1.9 to 1.9.2
- Improve GHCR login: guard both token and username, skip test_dda
  gracefully on fork PRs where credentials are unavailable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: v1.0.0 — robustness, DDA support, new params, InfinDIA, docs
@ypriverol ypriverol marked this pull request as ready for review April 8, 2026 15:41
@ypriverol ypriverol requested a review from yueqixuan April 8, 2026 15:41
@ypriverol ypriverol marked this pull request as draft April 8, 2026 15:42

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/ci.yml (1)

38-74: ⚠️ Potential issue | 🟠 Major

Keep test_dda out of the lightweight ci.yml workflow.

Line 38 adds a private-container profile to the fast/public workflow, and Lines 69-72 immediately have to skip it when GHCR credentials are absent. That means fork PRs can go green without ever exercising the DDA path. This profile belongs in extended_ci.yml/merge_ci.yml, while ci.yml should stay on the public 1.8.1 matrix.

Based on learnings: use a 3-tier CI/CD strategy (every PR/push to dev tests latest DIA-NN (2.2.0) plus 1.8.1; merge dev→master runs the full version × feature matrix; ci.yml uses only the 1.8.1 public container), and tests using ghcr.io/bigbio/diann:* containers require GHCR authentication due to DIA-NN's academic-only license.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci.yml around lines 38 - 74, The CI workflow is including
a private-container profile (test_dda) in the lightweight ci.yml and
conditionally skipping GHCR login for forks; remove the test_dda profile and the
"Log in to GitHub Container Registry" step from ci.yml (or move both into
extended_ci.yml/merge_ci.yml) so ci.yml's matrix.test_profile and
matrix.exec_profile only include the public/testable entries (e.g., keep
test_dia, test_dia_dotd and exec_profile docker/public 1.8.1), and ensure
references to GHCR_TOKEN, GHCR_USERNAME, and the "Log in to GitHub Container
Registry" step exist only in the extended/merge workflows that run with GHCR
credentials.
🧹 Nitpick comments (3)
workflows/quantmsdiann.nf (1)

61-65: Correct routing logic for DIA and DDA samples.

The branch condition correctly routes both DIA and DDA acquisition methods to the DIA subworkflow. Since create_input_channel normalizes acquisition_method to exactly "dia" or "dda" (lowercase), the logic is sound.

Consider renaming the branch from dia to something like dia_dda or quantifiable for clarity, since it now handles both acquisition methods. This is optional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@workflows/quantmsdiann.nf` around lines 61 - 65, The branch condition
currently sends both DIA and DDA samples to the same path by checking
item[0].acquisition_method (normalized by create_input_channel), which is
correct; update the branch label/key from "dia" to a clearer name like "dia_dda"
or "quantifiable" wherever used (the .branch closure producing the map key and
downstream references e.g., FILE_PREPARATION.out.results .branch { ... } .set {
ch_fileprep_result } and any consumers that read the branch map) so the intent
is explicit while keeping the existing boolean condition unchanged.
modules/local/diann/insilico_library_generation/main.nf (1)

51-54: Add module-local guards for the new DIA-NN flags.

Lines 51-54 and 75-79 now emit --dda, --light-models, --infin-dia, and --pre-select directly from this process, but the minimum-version checks live only in workflows/dia.nf. Please keep the guard next to the flags here as well, so the module fails fast if it is reused outside this workflow.

Based on learnings: when adding a new DIA-NN feature, identify the minimum DIA-NN version required and add appropriate version checks in the module.

Also applies to: 75-79

docs/parameters.md (1)

61-64: Keep --diann_dda documented in one canonical spot.

It is described in both the general DIA-NN table and again in the dedicated DDA section. That duplication is easy to let drift; I’d keep the full parameter row once and let the other section be narrative guidance only.

Also applies to: 124-133

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/parameters.md` around lines 61 - 64, The parameter `--diann_dda` is
duplicated: keep the full canonical table row for `--diann_dda` in the general
DIA-NN parameters table and remove the duplicate table row in the dedicated DDA
section; in that DDA section replace the duplicate row with a short narrative
note referencing the `--diann_dda` flag (e.g., "use `--diann_dda` to explicitly
enable DDA mode when SDRF lacks acquisition method") so the DDA section remains
guidance-only and the parameter definition is maintained only once.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@conf/tests/test_dda.config`:
- Around line 3-10: The new DDA test profile in conf/tests/test_dda.config
breaks the repo convention that conf/tests/*.config are DIA-only; rename or
relocate the profile so it conforms: either rename the profile from test_dda to
test_dia_dda (and update any references to that profile) or move the file out of
conf/tests into the appropriate feature/profile directory for non-DIA tests;
update any references to the profile name (e.g., places that call --profile
test_dda) and apply the same change for the other occurrences mentioned (lines
~22-34).

In `@docs/usage.md`:
- Around line 98-134: The docs contain a duplicate "### Preprocessing Options"
section and document a parameter `--convert_dotd` that is not present in
nextflow_schema.json; remove the duplicated/preliminary block so only the
canonical Preprocessing Options remain (locate the repeated heading and its list
in docs/usage.md), and either add a matching parameter entry for `convert_dotd`
to nextflow_schema.json (using the exact parameter key convert_dotd) or remove
the `--convert_dotd` line from the docs; after updating parameters run `nf-core
pipelines schema build` to regenerate nextflow_schema.json and ensure the docs
and schema stay in sync.

In `@subworkflows/local/create_input_channel/main.nf`:
- Around line 63-64: The code currently uses rows[0] (base_row) after groupTuple
to pick per-file metadata (dissociation method, tolerances, variableMods, m/z
ranges, and FixedModifications via fixedMods[0]) which makes output
order-dependent and can drop row-specific values; update the logic in the block
that references base_row and the other similar blocks (the grouped handling
around groupTuple and the comparable logic in the 88-105 and 129-182 regions) to
collect all unique values across rows for each metadata field and emit them in
the scalar format (e.g., join / dedupe into a single string or array as required
by downstream workflows), and if multiple differing values are incompatible with
aggregation, fail fast with a clear error referencing the file/group;
specifically replace uses of rows[0], fixedMods[0], and any single-row lookups
with aggregation functions (unique/dedupe) or explicit validation that throws
when disagreements exist.
- Around line 33-40: The current logic that rewrites filestr when
params.local_input_type is set naively replaces everything after the last '.'
which breaks compressed filenames like sample.raw.gz (becoming sample.raw.mzML).
Update the filestr rewriting to detect common compression suffixes (e.g., .gz,
.bz2, .zip, .xz): if a compression suffix is present, split filestr into base
and compressionSuffix, replace the base's extension with params.local_input_type
and then reattach the compressionSuffix; otherwise perform the existing
single-extension replacement. Keep references to the same variables (filestr,
params.local_input_type, row.Filename) so the change integrates into the
existing block that builds filestr.

In `@workflows/dia.nf`:
- Around line 71-78: The ch_is_dda calculation mixes params.diann_dda with
SDRF-derived meta.acquisition_method causing downstream inconsistency: ENSURE
the final boolean is used consistently by either making params.diann_dda a
strict override or passing ch_is_dda into all DIA-NN modules; specifically,
update the logic around ch_is_dda (function/variable ch_is_dda) to treat
params.diann_dda as a definitive override when present, and propagate that same
boolean into the INSILICO_LIBRARY_GENERATION, ASSEMBLE_EMPIRICAL_LIBRARY, and
INDIVIDUAL_ANALYSIS module inputs so they all use the same DDA flag instead of
re-reading meta.acquisition_method.

---

Outside diff comments:
In @.github/workflows/ci.yml:
- Around line 38-74: The CI workflow is including a private-container profile
(test_dda) in the lightweight ci.yml and conditionally skipping GHCR login for
forks; remove the test_dda profile and the "Log in to GitHub Container Registry"
step from ci.yml (or move both into extended_ci.yml/merge_ci.yml) so ci.yml's
matrix.test_profile and matrix.exec_profile only include the public/testable
entries (e.g., keep test_dia, test_dia_dotd and exec_profile docker/public
1.8.1), and ensure references to GHCR_TOKEN, GHCR_USERNAME, and the "Log in to
GitHub Container Registry" step exist only in the extended/merge workflows that
run with GHCR credentials.

---

Nitpick comments:
In `@docs/parameters.md`:
- Around line 61-64: The parameter `--diann_dda` is duplicated: keep the full
canonical table row for `--diann_dda` in the general DIA-NN parameters table and
remove the duplicate table row in the dedicated DDA section; in that DDA section
replace the duplicate row with a short narrative note referencing the
`--diann_dda` flag (e.g., "use `--diann_dda` to explicitly enable DDA mode when
SDRF lacks acquisition method") so the DDA section remains guidance-only and the
parameter definition is maintained only once.

In `@workflows/quantmsdiann.nf`:
- Around line 61-65: The branch condition currently sends both DIA and DDA
samples to the same path by checking item[0].acquisition_method (normalized by
create_input_channel), which is correct; update the branch label/key from "dia"
to a clearer name like "dia_dda" or "quantifiable" wherever used (the .branch
closure producing the map key and downstream references e.g.,
FILE_PREPARATION.out.results .branch { ... } .set { ch_fileprep_result } and any
consumers that read the branch map) so the intent is explicit while keeping the
existing boolean condition unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ed739da3-8eb4-4f5a-a26b-5b9a92709834

📥 Commits

Reviewing files that changed from the base of the PR and between 66ece26 and 207b6b6.

📒 Files selected for processing (32)
  • .github/workflows/ci.yml
  • .github/workflows/extended_ci.yml
  • .gitignore
  • CITATIONS.md
  • assets/schema_input.json
  • conf/diann_versions/v2_3_2.config
  • conf/tests/test_dda.config
  • conf/tests/test_dia_skip_preanalysis.config
  • docs/parameters.md
  • docs/usage.md
  • lib/VersionUtils.groovy
  • modules/local/diann/assemble_empirical_library/main.nf
  • modules/local/diann/diann_msstats/main.nf
  • modules/local/diann/diann_msstats/meta.yml
  • modules/local/diann/final_quantification/main.nf
  • modules/local/diann/generate_cfg/main.nf
  • modules/local/diann/individual_analysis/main.nf
  • modules/local/diann/insilico_library_generation/main.nf
  • modules/local/diann/insilico_library_generation/meta.yml
  • modules/local/diann/preliminary_analysis/main.nf
  • modules/local/pmultiqc/meta.yml
  • modules/local/samplesheet_check/main.nf
  • modules/local/sdrf_parsing/main.nf
  • modules/local/sdrf_parsing/meta.yml
  • modules/local/utils/decompress_dotd/meta.yml
  • nextflow.config
  • nextflow_schema.json
  • subworkflows/local/create_input_channel/main.nf
  • subworkflows/local/input_check/meta.yml
  • subworkflows/local/utils_nfcore_quantms_pipeline/meta.yml
  • workflows/dia.nf
  • workflows/quantmsdiann.nf
✅ Files skipped from review due to trivial changes (12)
  • .gitignore
  • modules/local/utils/decompress_dotd/meta.yml
  • modules/local/diann/diann_msstats/meta.yml
  • subworkflows/local/input_check/meta.yml
  • CITATIONS.md
  • subworkflows/local/utils_nfcore_quantms_pipeline/meta.yml
  • modules/local/sdrf_parsing/meta.yml
  • modules/local/diann/insilico_library_generation/meta.yml
  • modules/local/pmultiqc/meta.yml
  • assets/schema_input.json
  • conf/diann_versions/v2_3_2.config
  • conf/tests/test_dia_skip_preanalysis.config
🚧 Files skipped from review as they are similar to previous changes (1)
  • nextflow.config

Comment on lines +3 to +10

```
Nextflow config file for testing DDA analysis (requires DIA-NN >= 2.3.2)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tests DDA mode using the PXD022287 HeLa dataset with --diann_dda flag.
Uses ghcr.io/bigbio/diann:2.3.2.

Use as follows:
nextflow run bigbio/quantmsdiann -profile test_dda,docker [--outdir <OUTDIR>]
```


⚠️ Potential issue | 🟠 Major

conf/tests/test_dda.config breaks the current test-profile contract.

This introduces a DDA profile under conf/tests and exposes it as test_dda, but this directory/naming scheme is currently reserved for DIA-only profiles. If DDA is now meant to be first-class here, the repository convention should be updated in the same PR; otherwise this profile should move or be renamed.

As per coding guidelines, conf/tests/*.config test profiles are DIA-only, and, based on learnings, new feature profiles should follow the test_dia_<feature_name> naming convention.

Also applies to: 22-34

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@conf/tests/test_dda.config` around lines 3 - 10, The new DDA test profile in
conf/tests/test_dda.config breaks the repo convention that conf/tests/*.config
are DIA-only; rename or relocate the profile so it conforms: either rename the
profile from test_dda to test_dia_dda (and update any references to that
profile) or move the file out of conf/tests into the appropriate feature/profile
directory for non-DIA tests; update any references to the profile name (e.g.,
places that call --profile test_dda) and apply the same change for the other
occurrences mentioned (lines ~22-34).

Comment on lines +98 to +134
### Preprocessing Options

- `--reindex_mzml` (default: true) — Re-index mzML files before processing. Disable with `--reindex_mzml false` if files are already indexed.
- `--mzml_statistics` (default: false) — Generate mzML statistics (parquet format) for QC.
- `--mzml_features` (default: false) — Enable feature detection in mzML statistics.
- `--convert_dotd` (default: false) — Convert Bruker .d files to mzML via tdf2mzml instead of passing natively to DIA-NN.

### Passing Extra Arguments to DIA-NN

Use `--diann_extra_args` to pass additional flags to all DIA-NN steps. The pipeline validates and strips flags it manages internally to prevent conflicts.

Managed flags (stripped with a warning if passed via extra_args): `--lib`, `--f`, `--fasta`, `--threads`, `--verbose`, `--temp`, `--out`, `--matrices`, `--use-quant`, `--gen-spec-lib`, `--mass-acc`, `--mass-acc-ms1`, `--window`, `--var-mod`, `--fixed-mod`, `--monitor-mod`, and others.

To enable this, add `includeConfig 'conf/modules/dia.config'` to your configuration (already included by default).

### DIA-NN Version Selection

The default DIA-NN version is 1.8.1. To use a different version:

| Version | Profile | Features |
| ------- | ----------------------- | ----------------------------------- |
| 1.8.1 | (default) | Core DIA analysis |
| 2.1.0 | `-profile diann_v2_1_0` | Native .raw support, reduced memory |
| 2.2.0 | `-profile diann_v2_2_0` | Speed optimizations |
| 2.3.2 | `-profile diann_v2_3_2` | DDA support, InfinDIA |

Example: `nextflow run bigbio/quantmsdiann -profile test_dia,docker,diann_v2_2_0`

### Verbose Module Output

Use `-profile verbose_modules` to publish intermediate files from all pipeline steps:

```bash
nextflow run bigbio/quantmsdiann -profile test_dia,docker,verbose_modules --outdir results
```

This publishes ThermoRawFileParser conversions, mzML indexing results, per-file DIA-NN logs, and spectral library intermediates.
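
The flag-stripping behaviour described under "Passing Extra Arguments to DIA-NN" can be sketched as follows (hypothetical `strip_managed` helper with a truncated managed-flag list; the pipeline's actual stripping lives in the Groovy modules):

```shell
# Drop pipeline-managed flags (and their values) from user-supplied extra args,
# keeping everything else. Real managed-flag list is longer than shown here.
strip_managed() {
  out=""; skip=0
  for tok in $1; do
    if [ "$skip" -eq 1 ]; then skip=0; continue; fi
    case "$tok" in
      --threads|--out|--temp|--lib|--fasta|--verbose)
        skip=1 ;;   # managed: drop the flag and its value
      *) out="$out $tok" ;;
    esac
  done
  printf '%s\n' "${out# }"
}

strip_managed "--threads 8 --mass-acc-quant 10"   # → --mass-acc-quant 10
```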

⚠️ Potential issue | 🟡 Minor

Consolidate this duplicate section and fix the schema drift.

Line 98 repeats the existing ### Preprocessing Options heading, and the extra-args/version/verbose blocks here are second copies of the fuller sections later in the same file. Line 103 also documents --convert_dotd, but there is no matching property anywhere in the provided nextflow_schema.json, so schema-driven help and docs will disagree. Please keep a single section and either add the schema entry or remove the parameter mention.

Based on learnings: update `nextflow_schema.json` with `nf-core pipelines schema build` whenever parameters are added or modified.

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 98-98: Multiple headings with the same content

(MD024, no-duplicate-heading)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/usage.md` around lines 98 - 134, The docs contain a duplicate "###
Preprocessing Options" section and document a parameter `--convert_dotd` that is
not present in nextflow_schema.json; remove the duplicated/preliminary block so
only the canonical Preprocessing Options remain (locate the repeated heading and
its list in docs/usage.md), and either add a matching parameter entry for
`convert_dotd` to nextflow_schema.json (using the exact parameter key
convert_dotd) or remove the `--convert_dotd` line from the docs; after updating
parameters run `nf-core pipelines schema build` to regenerate
nextflow_schema.json and ensure the docs and schema stay in sync.

Comment on lines +33 to +40
```groovy
            if (!params.root_folder) {
                filestr = row.URI?.toString()?.trim() ? row.URI.toString() : row.Filename.toString()
            } else {
                filestr = row.Filename.toString()
                filestr = params.root_folder + File.separator + filestr
                filestr = (params.local_input_type
                    ? filestr.take(filestr.lastIndexOf('.')) + '.' + params.local_input_type
                    : filestr)
```

⚠️ Potential issue | 🟠 Major

local_input_type rewrites break compressed local copies.

Replacing only the suffix after the last . turns paths like sample.raw.gz into sample.raw.mzML, so the existence check fails for the compressed variants that --local_input_type is supposed to support.

Suggested fix

```diff
-                filestr = row.Filename.toString()
+                filestr = row.Filename.toString().trim()
                 filestr = params.root_folder + File.separator + filestr
-                filestr = (params.local_input_type
-                    ? filestr.take(filestr.lastIndexOf('.')) + '.' + params.local_input_type
-                    : filestr)
+                if (params.local_input_type) {
+                    def compressionSuffix = ['.tar.gz', '.gz', '.tar', '.zip']
+                        .find { suffix -> filestr.toLowerCase().endsWith(suffix) } ?: ''
+                    def uncompressedPath = compressionSuffix
+                        ? filestr[0..<(filestr.length() - compressionSuffix.length())]
+                        : filestr
+                    def extIndex = uncompressedPath.lastIndexOf('.')
+                    def basePath = extIndex >= 0 ? uncompressedPath.take(extIndex) : uncompressedPath
+                    filestr = basePath + '.' + params.local_input_type + compressionSuffix
+                }
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```groovy
            if (!params.root_folder) {
                filestr = row.URI?.toString()?.trim() ? row.URI.toString() : row.Filename.toString()
            } else {
                filestr = row.Filename.toString().trim()
                filestr = params.root_folder + File.separator + filestr
                if (params.local_input_type) {
                    def compressionSuffix = ['.tar.gz', '.gz', '.tar', '.zip']
                        .find { suffix -> filestr.toLowerCase().endsWith(suffix) } ?: ''
                    def uncompressedPath = compressionSuffix
                        ? filestr[0..<(filestr.length() - compressionSuffix.length())]
                        : filestr
                    def extIndex = uncompressedPath.lastIndexOf('.')
                    def basePath = extIndex >= 0 ? uncompressedPath.take(extIndex) : uncompressedPath
                    filestr = basePath + '.' + params.local_input_type + compressionSuffix
                }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@subworkflows/local/create_input_channel/main.nf` around lines 33 - 40, The
current logic that rewrites filestr when params.local_input_type is set naively
replaces everything after the last '.' which breaks compressed filenames like
sample.raw.gz (becoming sample.raw.mzML). Update the filestr rewriting to detect
common compression suffixes (e.g., .gz, .bz2, .zip, .xz): if a compression
suffix is present, split filestr into base and compressionSuffix, replace the
base's extension with params.local_input_type and then reattach the
compressionSuffix; otherwise perform the existing single-extension replacement.
Keep references to the same variables (filestr, params.local_input_type,
row.Filename) so the change integrates into the existing block that builds
filestr.
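The compression-aware rewrite can be checked outside the pipeline. A minimal Python sketch of the same logic (hypothetical helper mirroring the suggested Groovy fix, not pipeline code):

```python
# Compression suffixes to preserve, longest first so ".tar.gz" wins over ".gz".
COMPRESSION_SUFFIXES = (".tar.gz", ".gz", ".tar", ".zip")

def rewrite_extension(path: str, new_ext: str) -> str:
    """Replace the file extension while preserving a trailing compression suffix.

    "sample.raw.gz" + "mzML" -> "sample.mzML.gz", not "sample.raw.mzML".
    """
    lower = path.lower()
    comp = next((s for s in COMPRESSION_SUFFIXES if lower.endswith(s)), "")
    stem = path[: len(path) - len(comp)] if comp else path
    dot = stem.rfind(".")
    base = stem[:dot] if dot >= 0 else stem
    return base + "." + new_ext + comp
```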

Comment on lines +63 to 64
```groovy
        def base_row = rows[0]
```


⚠️ Potential issue | 🟠 Major

Grouped SDRF rows still make per-file metadata order-dependent.

After `groupTuple`, `rows[0]` becomes the source of truth for dissociation method, tolerances, variable mods, and m/z ranges, and FixedModifications keeps only `fixedMods[0]`. For multiplexed files, that can silently drop row-specific values before `workflows/dia.nf` forwards them downstream. Please aggregate every unique value into the emitted scalar format, or fail fast when grouped rows disagree.

Also applies to: 88-105, 129-182

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@subworkflows/local/create_input_channel/main.nf` around lines 63 - 64, The
code currently uses rows[0] (base_row) after groupTuple to pick per-file
metadata (dissociation method, tolerances, variableMods, m/z ranges, and
FixedModifications via fixedMods[0]) which makes output order-dependent and can
drop row-specific values; update the logic in the block that references base_row
and the other similar blocks (the grouped handling around groupTuple and the
comparable logic in the 88-105 and 129-182 regions) to collect all unique values
across rows for each metadata field and emit them in the scalar format (e.g.,
join / dedupe into a single string or array as required by downstream
workflows), and if multiple differing values are incompatible with aggregation,
fail fast with a clear error referencing the file/group; specifically replace
uses of rows[0], fixedMods[0], and any single-row lookups with aggregation
functions (unique/dedupe) or explicit validation that throws when disagreements
exist.
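The aggregate-or-fail pattern requested above is small enough to sketch. A hypothetical Python illustration (not pipeline code; field and row names are assumptions):

```python
def merge_metadata(rows: list[dict], field: str) -> str:
    """Collect `field` across grouped SDRF rows.

    Returns the single unique value, or fails fast when grouped rows
    disagree, instead of silently taking rows[0].
    """
    values = sorted({row[field] for row in rows})
    if len(values) > 1:
        raise ValueError(f"Grouped rows disagree on {field!r}: {values}")
    return values[0]
```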

Comment on lines +71 to +78
```groovy
    // Determine DDA mode: true if explicitly set via param OR auto-detected from SDRF
    ch_is_dda = ch_experiment_meta.map { meta ->
        def dda = params.diann_dda || meta.acquisition_method == 'dda'
        if (dda && VersionUtils.versionLessThan(params.diann_version, '2.3.2')) {
            error("DDA mode (detected from SDRF) requires DIA-NN >= 2.3.2. Current version: ${params.diann_version}. Use -profile diann_v2_3_2")
        }
        return dda
    }
```

⚠️ Potential issue | 🟠 Major

ch_is_dda can disagree with the downstream modules.

Line 73 re-applies params.diann_dda on top of the SDRF-derived acquisition method, but only INSILICO_LIBRARY_GENERATION receives that boolean at Line 90. ASSEMBLE_EMPIRICAL_LIBRARY and INDIVIDUAL_ANALYSIS still derive DDA from meta.acquisition_method, so --diann_dda true against an SDRF that explicitly says DIA will make the first step run with --dda while later steps stay in DIA mode. Either keep --diann_dda as a strict fallback here too, or thread the same boolean through every DIA-NN module.

Possible fix

```diff
-    ch_is_dda = ch_experiment_meta.map { meta ->
-        def dda = params.diann_dda || meta.acquisition_method == 'dda'
+    ch_is_dda = ch_experiment_meta.map { meta ->
+        if (params.diann_dda && meta.acquisition_method == 'dia') {
+            error('--diann_dda is only a fallback when the SDRF lacks an acquisition method; remove it or fix the SDRF value.')
+        }
+        def dda = meta.acquisition_method == 'dda'
         if (dda && VersionUtils.versionLessThan(params.diann_version, '2.3.2')) {
             error("DDA mode (detected from SDRF) requires DIA-NN >= 2.3.2. Current version: ${params.diann_version}. Use -profile diann_v2_3_2")
         }
         return dda
     }
```

Also applies to: 90-90

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@workflows/dia.nf` around lines 71 - 78, The ch_is_dda calculation mixes
params.diann_dda with SDRF-derived meta.acquisition_method causing downstream
inconsistency: ENSURE the final boolean is used consistently by either making
params.diann_dda a strict override or passing ch_is_dda into all DIA-NN modules;
specifically, update the logic around ch_is_dda (function/variable ch_is_dda) to
treat params.diann_dda as a definitive override when present, and propagate that
same boolean into the INSILICO_LIBRARY_GENERATION, ASSEMBLE_EMPIRICAL_LIBRARY,
and INDIVIDUAL_ANALYSIS module inputs so they all use the same DDA flag instead
of re-reading meta.acquisition_method.
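The "resolve once, thread everywhere" semantics can be sketched outside Nextflow. A hypothetical Python illustration of treating the parameter as a strict fallback (names are assumptions, not pipeline code):

```python
from typing import Optional

def resolve_dda(param_dda: bool, sdrf_method: Optional[str]) -> bool:
    """Resolve the DDA flag once, so every downstream module sees the same value.

    The explicit parameter acts only as a fallback when the SDRF lacks an
    acquisition method; a conflict with an explicit 'dia' fails fast.
    """
    if sdrf_method is not None:
        if param_dda and sdrf_method == "dia":
            raise ValueError("--diann_dda conflicts with SDRF acquisition method 'dia'")
        return sdrf_method == "dda"
    return param_dda
```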
