Conversation
13-task plan covering robustness fixes, DDA support, new DIA-NN params, InfinDIA groundwork, comprehensive documentation, and issue cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without pipefail, if the command before tee fails, tee returns 0 and the Nextflow task appears to succeed. This masked failures in generate_cfg, diann_msstats, samplesheet_check, and sdrf_parsing.
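The masking behavior is standard shell semantics: a pipeline's exit status is that of its last command, so `cmd | tee log` reports `tee`'s status unless `pipefail` is enabled. A minimal bash illustration (writing to /dev/null in place of the real log file):

```shell
#!/usr/bin/env bash
# A pipe into `tee` masks the producer's failure unless pipefail is enabled.

set +o pipefail
false | tee /dev/null
echo "without pipefail: exit=$?"      # 0 — tee succeeded, the failure is hidden

set -o pipefail
false | tee /dev/null || status=$?
echo "with pipefail: exit=${status}"  # 1 — the failing producer is reported
```

With `set -o pipefail` in the task script, Nextflow sees a non-zero exit and marks the task failed instead of silently succeeding.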
These are the longest-running tasks and most susceptible to transient failures (OOM, I/O timeouts). The error_retry label enables automatic retry on signal exits (130-145, 104, 175).
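As a hedged sketch (the label name comes from the commit message; the exact exit-code list and retry count here are assumptions), an `error_retry` label in a Nextflow config might look like:

```groovy
// Hypothetical sketch of an error_retry process label; the pipeline's
// actual configuration may differ.
process {
    withLabel: 'error_retry' {
        // Retry on signal-derived exit codes (e.g. SIGKILL from the OOM
        // killer surfaces as 137, within 130-145), plus 104 and 175.
        errorStrategy = { task.exitStatus in ((130..145) + [104, 175]) ? 'retry' : 'finish' }
        maxRetries    = 2
    }
}
```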
Guard ch_searchdb and ch_experiment_meta with ifEmpty to fail fast with clear error messages instead of hanging indefinitely.
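A sketch of the guard pattern (channel construction and message wording are assumptions, not copied from the pipeline):

```groovy
// Hypothetical sketch: fail fast instead of hanging on an empty channel.
ch_searchdb = Channel
    .fromPath(params.database)
    .ifEmpty { error("No search database found for --database '${params.database}'") }
```

Without the `ifEmpty` guard, a downstream process waiting on an empty channel never starts and the run appears to hang.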
Adds conf/diann_versions/v2_3_2.config with ghcr.io/bigbio/diann:2.3.2 container. Use -profile diann_v2_3_2 to opt in. Default stays 1.8.1. Enables DDA support and InfinDIA features.
- New param `diann_dda` (boolean, default: false)
- Version guard: requires DIA-NN >= 2.3.2
- Passes `--dda` to all 5 DIA-NN modules when enabled
- Accepts DDA acquisition method in SDRF when `diann_dda=true`
- Added `--dda` to blocked lists in all modules

Closes #5
- test_dda: BSA dataset with `diann_dda=true` on DIA-NN 2.3.2
- test_dia_skip_preanalysis: tests previously untested skip path

Both added to extended_ci.yml stage 2a.
- `diann_light_models`: 10x faster in-silico library generation
- `diann_export_quant`: fragment-level parquet export
- `diann_site_ms1_quant`: MS1 apex intensities for PTM quantification

All require DIA-NN >= 2.0.
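Assuming the usual nf-core convention of exposing these as boolean CLI parameters, enabling all three might look like (hypothetical invocation, not taken from the repo docs):

```bash
nextflow run bigbio/quantmsdiann \
    -profile test_dia,docker,diann_v2_2_0 \
    --diann_light_models true \
    --diann_export_quant true \
    --diann_site_ms1_quant true \
    --outdir results
```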
Experimental support for InfinDIA (DIA-NN 2.3.0+). Passes --infin-dia to library generation when enabled. Version guard enforces >= 2.3.0. No test config — InfinDIA requires large databases.
Complete reference for all ~70 pipeline parameters grouped by category with types, defaults, descriptions, and version requirements. Closes #1
- DDA mode documentation with limitations
- Missing param sections (preprocessing, extra_args scope, verbose output)
- DIA-NN version selection guide
- Parquet vs TSV output explanation
- MSstats format section
- pmultiqc citation added
- README updated with version table and parameter reference link

Closes #3, #9, #15
- Add version guard for DIA-NN 2.0+ params (`--light-models`, `--export-quant`, `--site-ms1-quant`) to prevent crashes with 1.8.1
- Add `*.site_report.parquet` as optional output in FINAL_QUANTIFICATION for site-level PTM quantification
1. test_dda.config: Add `diann_version = '2.3.2'` so the version guard does not reject DDA mode (the default is 1.8.1; the guard requires >= 2.3.2).
2. quantmsdiann.nf: Update the branch condition to also match the "dda" acquisition method. Previously `"dda".contains("dia")` was false, causing all DDA files to be silently dropped from processing.
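The fix in point 2 amounts to matching both normalized acquisition methods. A hedged sketch of the branch condition (channel and branch names are assumptions based on the workflow description):

```groovy
// Sketch: route both normalized acquisition methods to the DIA-NN
// subworkflow instead of only values containing "dia".
ch_fileprep_result = FILE_PREPARATION.out.results.branch { item ->
    dia: item[0].acquisition_method in ['dia', 'dda']
    other: true
}
```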
These flags exist in DIA-NN 1.8.x but were removed in 2.3.x, causing 'unrecognised option' warnings. Only pass them for versions < 2.3.
📝 Walkthrough: Added DDA support and related DIA-NN options across the pipeline, removed the tdf2mzml module and its manifest, updated documentation and citation (Zenodo DOI), introduced version guards and new test/config profiles, and added a VersionUtils helper for semantic version checks.
Bruker .d to mzML conversion via tdf2mzml is no longer needed.
- Merge dev branch (version bump, tdf2mzml removal, lint fixes, DOI update)
- Update test_dda.config to use PXD022287 HeLa DDA dataset with subset FASTA
- Add test_dda profile to CI matrix in ci.yml
The test_dda profile uses ghcr.io/bigbio/diann:2.3.2, a private container requiring authentication. Add a Docker login step (matching merge_ci.yml) conditioned on the test_dda profile.
- Remove implementation plan from repo, add docs/plans/ to .gitignore
- Add lib/VersionUtils.groovy for semantic version comparison (prevents string-comparison bugs like '2.10.0' < '2.3')
- Update all version guards in dia.nf and module scripts to use VersionUtils.versionLessThan/versionAtLeast
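The bug class is easy to demonstrate: lexicographically, '2.10.0' sorts before '2.3'. A shell analogue of the helper (the pipeline's real implementation lives in lib/VersionUtils.groovy; this sketch assumes GNU `sort -V` is available):

```shell
#!/usr/bin/env bash
# version_at_least VERSION MINIMUM — succeeds if VERSION >= MINIMUM numerically.
# Assumes GNU coreutils `sort -V` (version sort).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Plain string comparison gets 2.10.0 vs 2.3 wrong ('1' < '3' lexicographically):
[ "2.10.0" \> "2.3" ] && echo "string: 2.10.0 is newer" || echo "string: 2.10.0 is older (wrong)"

# Version-aware comparison is correct:
version_at_least "2.10.0" "2.3" && echo "semver: 2.10.0 is newer"
```

This is why a dedicated helper matters for guards like "requires DIA-NN >= 2.3.2": once a 2.10.x release exists, naive string comparison would wrongly reject it.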
DDA analysis support is a major feature warranting a major version bump.
- Rename output version→versions in sdrf_parsing/meta.yml
- Add ch_ prefix to input_file→ch_input_file in input_check/meta.yml
- Fix grammar in pmultiqc and diann_msstats meta.yml descriptions
- Fix glob pattern in decompress_dotd/meta.yml (double-dot expansion)
- Update CITATIONS.md to link published Nature Methods article
- Fix schema_input.json error messages (source name, whitespace)
- Standardize quantmsdiann keyword in utils meta.yml
DDA mode is now automatically detected from the SDRF `comment[proteomics data acquisition method]` column when it contains `data-dependent acquisition`. The `--diann_dda` flag is kept as a fallback for SDRFs that lack this column.

- Modules now read acquisition method from meta instead of params
- INSILICO_LIBRARY_GENERATION receives is_dda as input from workflow
- Version guard triggers for both param and SDRF-detected DDA
- Updated docs, parameters, and schema to reflect auto-detection
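A hedged sketch of the detection step (the column name comes from the commit message; variable names and exact matching are assumptions):

```groovy
// Sketch: derive acquisition_method from the SDRF column when present,
// falling back to the --diann_dda parameter when the column is absent.
def methodCol = row['comment[proteomics data acquisition method]']?.toString()?.toLowerCase() ?: ''
meta.acquisition_method = methodCol.contains('data-dependent acquisition')
    ? 'dda'
    : (methodCol ? 'dia' : (params.diann_dda ? 'dda' : 'dia'))
```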
- Revert meta.yml input_file naming to match workflow interface
- Gate `--pre-select` behind enable_infin_dia (InfinDIA-only flag)
- Tighten the `--direct-quant` version check from 1.9 to 1.9.2
- Improve GHCR login: guard both token and username; skip test_dda gracefully on fork PRs where credentials are unavailable
feat: v1.0.0 — robustness, DDA support, new params, InfinDIA, docs
Support multiplexing
Actionable comments posted: 5
Caution: some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/ci.yml (1)
38-74: ⚠️ Potential issue | 🟠 Major

Keep `test_dda` out of the lightweight `ci.yml` workflow. Line 38 adds a private-container profile to the fast/public workflow, and lines 69-72 immediately have to skip it when GHCR credentials are absent. That means fork PRs can go green without ever exercising the DDA path. This profile belongs in `extended_ci.yml`/`merge_ci.yml`, while `ci.yml` should stay on the public 1.8.1 matrix.

Based on learnings: use a 3-tier CI/CD strategy — every PR/push to dev tests latest DIA-NN (2.2.0) + 1.8.1; merge dev→master runs the full version × feature matrix; ci.yml uses only the 1.8.1 public container. Tests using ghcr.io/bigbio/diann:* containers require GHCR authentication due to DIA-NN's academic-only license.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/ci.yml around lines 38 - 74, The CI workflow is including a private-container profile (test_dda) in the lightweight ci.yml and conditionally skipping GHCR login for forks; remove the test_dda profile and the "Log in to GitHub Container Registry" step from ci.yml (or move both into extended_ci.yml/merge_ci.yml) so ci.yml's matrix.test_profile and matrix.exec_profile only include the public/testable entries (e.g., keep test_dia, test_dia_dotd and exec_profile docker/public 1.8.1), and ensure references to GHCR_TOKEN, GHCR_USERNAME, and the "Log in to GitHub Container Registry" step exist only in the extended/merge workflows that run with GHCR credentials.
🧹 Nitpick comments (3)
workflows/quantmsdiann.nf (1)
61-65: Correct routing logic for DIA and DDA samples.

The branch condition correctly routes both DIA and DDA acquisition methods to the DIA subworkflow. Since `create_input_channel` normalizes `acquisition_method` to exactly "dia" or "dda" (lowercase), the logic is sound.

Consider renaming the branch from `dia` to something like `dia_dda` or `quantifiable` for clarity, since it now handles both acquisition methods. This is optional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@workflows/quantmsdiann.nf` around lines 61 - 65, The branch condition currently sends both DIA and DDA samples to the same path by checking item[0].acquisition_method (normalized by create_input_channel), which is correct; update the branch label/key from "dia" to a clearer name like "dia_dda" or "quantifiable" wherever used (the .branch closure producing the map key and downstream references e.g., FILE_PREPARATION.out.results .branch { ... } .set { ch_fileprep_result } and any consumers that read the branch map) so the intent is explicit while keeping the existing boolean condition unchanged.

modules/local/diann/insilico_library_generation/main.nf (1)
51-54: Add module-local guards for the new DIA-NN flags.

Lines 51-54 and 75-79 now emit `--dda`, `--light-models`, `--infin-dia`, and `--pre-select` directly from this process, but the minimum-version checks live only in `workflows/dia.nf`. Please keep the guard next to the flags here as well, so the module fails fast if it is reused outside this workflow.

Based on learnings: when adding a new DIA-NN feature, identify the minimum DIA-NN version required and add appropriate version checks in the module.
Also applies to: 75-79
docs/parameters.md (1)
61-64: Keep `--diann_dda` documented in one canonical spot.

It is described in both the general DIA-NN table and again in the dedicated DDA section. That duplication is easy to let drift; keep the full parameter row once and let the other section be narrative guidance only.
Also applies to: 124-133
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/parameters.md` around lines 61 - 64, The parameter `--diann_dda` is duplicated: keep the full canonical table row for `--diann_dda` in the general DIA-NN parameters table and remove the duplicate table row in the dedicated DDA section; in that DDA section replace the duplicate row with a short narrative note referencing the `--diann_dda` flag (e.g., "use `--diann_dda` to explicitly enable DDA mode when SDRF lacks acquisition method") so the DDA section remains guidance-only and the parameter definition is maintained only once.
📒 Files selected for processing (32)
- .github/workflows/ci.yml
- .github/workflows/extended_ci.yml
- .gitignore
- CITATIONS.md
- assets/schema_input.json
- conf/diann_versions/v2_3_2.config
- conf/tests/test_dda.config
- conf/tests/test_dia_skip_preanalysis.config
- docs/parameters.md
- docs/usage.md
- lib/VersionUtils.groovy
- modules/local/diann/assemble_empirical_library/main.nf
- modules/local/diann/diann_msstats/main.nf
- modules/local/diann/diann_msstats/meta.yml
- modules/local/diann/final_quantification/main.nf
- modules/local/diann/generate_cfg/main.nf
- modules/local/diann/individual_analysis/main.nf
- modules/local/diann/insilico_library_generation/main.nf
- modules/local/diann/insilico_library_generation/meta.yml
- modules/local/diann/preliminary_analysis/main.nf
- modules/local/pmultiqc/meta.yml
- modules/local/samplesheet_check/main.nf
- modules/local/sdrf_parsing/main.nf
- modules/local/sdrf_parsing/meta.yml
- modules/local/utils/decompress_dotd/meta.yml
- nextflow.config
- nextflow_schema.json
- subworkflows/local/create_input_channel/main.nf
- subworkflows/local/input_check/meta.yml
- subworkflows/local/utils_nfcore_quantms_pipeline/meta.yml
- workflows/dia.nf
- workflows/quantmsdiann.nf
✅ Files skipped from review due to trivial changes (12)
- .gitignore
- modules/local/utils/decompress_dotd/meta.yml
- modules/local/diann/diann_msstats/meta.yml
- subworkflows/local/input_check/meta.yml
- CITATIONS.md
- subworkflows/local/utils_nfcore_quantms_pipeline/meta.yml
- modules/local/sdrf_parsing/meta.yml
- modules/local/diann/insilico_library_generation/meta.yml
- modules/local/pmultiqc/meta.yml
- assets/schema_input.json
- conf/diann_versions/v2_3_2.config
- conf/tests/test_dia_skip_preanalysis.config
🚧 Files skipped from review as they are similar to previous changes (1)
- nextflow.config
```
Nextflow config file for testing DDA analysis (requires DIA-NN >= 2.3.2)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tests DDA mode using the PXD022287 HeLa dataset with --diann_dda flag.
Uses ghcr.io/bigbio/diann:2.3.2.

Use as follows:
    nextflow run bigbio/quantmsdiann -profile test_dda,docker [--outdir <OUTDIR>]
```
conf/tests/test_dda.config breaks the current test-profile contract.
This introduces a DDA profile under conf/tests and exposes it as test_dda, but this directory/naming scheme is currently reserved for DIA-only profiles. If DDA is now meant to be first-class here, the repository convention should be updated in the same PR; otherwise this profile should move or be renamed.
As per coding guidelines, conf/tests/*.config test profiles are DIA-only, and based on learnings new feature profiles should follow test_dia_<feature_name>.
Also applies to: 22-34
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@conf/tests/test_dda.config` around lines 3 - 10, The new DDA test profile in
conf/tests/test_dda.config breaks the repo convention that conf/tests/*.config
are DIA-only; rename or relocate the profile so it conforms: either rename the
profile from test_dda to test_dia_dda (and update any references to that
profile) or move the file out of conf/tests into the appropriate feature/profile
directory for non-DIA tests; update any references to the profile name (e.g.,
places that call --profile test_dda) and apply the same change for the other
occurrences mentioned (lines ~22-34).
````markdown
### Preprocessing Options

- `--reindex_mzml` (default: true) — Re-index mzML files before processing. Disable with `--reindex_mzml false` if files are already indexed.
- `--mzml_statistics` (default: false) — Generate mzML statistics (parquet format) for QC.
- `--mzml_features` (default: false) — Enable feature detection in mzML statistics.
- `--convert_dotd` (default: false) — Convert Bruker .d files to mzML via tdf2mzml instead of passing natively to DIA-NN.

### Passing Extra Arguments to DIA-NN

Use `--diann_extra_args` to pass additional flags to all DIA-NN steps. The pipeline validates and strips flags it manages internally to prevent conflicts.

Managed flags (stripped with a warning if passed via extra_args): `--lib`, `--f`, `--fasta`, `--threads`, `--verbose`, `--temp`, `--out`, `--matrices`, `--use-quant`, `--gen-spec-lib`, `--mass-acc`, `--mass-acc-ms1`, `--window`, `--var-mod`, `--fixed-mod`, `--monitor-mod`, and others.

To enable this, add `includeConfig 'conf/modules/dia.config'` to your configuration (already included by default).

### DIA-NN Version Selection

The default DIA-NN version is 1.8.1. To use a different version:

| Version | Profile                 | Features                            |
| ------- | ----------------------- | ----------------------------------- |
| 1.8.1   | (default)               | Core DIA analysis                   |
| 2.1.0   | `-profile diann_v2_1_0` | Native .raw support, reduced memory |
| 2.2.0   | `-profile diann_v2_2_0` | Speed optimizations                 |
| 2.3.2   | `-profile diann_v2_3_2` | DDA support, InfinDIA               |

Example: `nextflow run bigbio/quantmsdiann -profile test_dia,docker,diann_v2_2_0`

### Verbose Module Output

Use `-profile verbose_modules` to publish intermediate files from all pipeline steps:

```bash
nextflow run bigbio/quantmsdiann -profile test_dia,docker,verbose_modules --outdir results
```

This publishes ThermoRawFileParser conversions, mzML indexing results, per-file DIA-NN logs, and spectral library intermediates.
````
Consolidate this duplicate section and fix the schema drift.
Line 98 repeats the existing ### Preprocessing Options heading, and the extra-args/version/verbose blocks here are second copies of the fuller sections later in the same file. Line 103 also documents --convert_dotd, but there is no matching property anywhere in the provided nextflow_schema.json, so schema-driven help and docs will disagree. Please keep a single section and either add the schema entry or remove the parameter mention.
Based on learnings: update nextflow_schema.json using `nf-core pipelines schema build` whenever parameters are added or modified.
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 98-98: Multiple headings with the same content
(MD024, no-duplicate-heading)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/usage.md` around lines 98 - 134, The docs contain a duplicate "###
Preprocessing Options" section and document a parameter `--convert_dotd` that is
not present in nextflow_schema.json; remove the duplicated/preliminary block so
only the canonical Preprocessing Options remain (locate the repeated heading and
its list in docs/usage.md), and either add a matching parameter entry for
`convert_dotd` to nextflow_schema.json (using the exact parameter key
convert_dotd) or remove the `--convert_dotd` line from the docs; after updating
parameters run `nf-core pipelines schema build` to regenerate
nextflow_schema.json and ensure the docs and schema stay in sync.
```groovy
if (!params.root_folder) {
    filestr = row.URI?.toString()?.trim() ? row.URI.toString() : row.Filename.toString()
} else {
    filestr = row.Filename.toString()
    filestr = params.root_folder + File.separator + filestr
    filestr = (params.local_input_type
        ? filestr.take(filestr.lastIndexOf('.')) + '.' + params.local_input_type
        : filestr)
```
`local_input_type` rewrites break compressed local copies.

Replacing only the suffix after the last `.` turns paths like `sample.raw.gz` into `sample.raw.mzML`, so the existence check fails for the compressed variants that `--local_input_type` is supposed to support.
Suggested fix

```diff
-        filestr = row.Filename.toString()
+        filestr = row.Filename.toString().trim()
         filestr = params.root_folder + File.separator + filestr
-        filestr = (params.local_input_type
-            ? filestr.take(filestr.lastIndexOf('.')) + '.' + params.local_input_type
-            : filestr)
+        if (params.local_input_type) {
+            def compressionSuffix = ['.tar.gz', '.gz', '.tar', '.zip']
+                .find { suffix -> filestr.toLowerCase().endsWith(suffix) } ?: ''
+            def uncompressedPath = compressionSuffix
+                ? filestr[0..<(filestr.length() - compressionSuffix.length())]
+                : filestr
+            def extIndex = uncompressedPath.lastIndexOf('.')
+            def basePath = extIndex >= 0 ? uncompressedPath.take(extIndex) : uncompressedPath
+            filestr = basePath + '.' + params.local_input_type + compressionSuffix
+        }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@subworkflows/local/create_input_channel/main.nf` around lines 33 - 40, The
current logic that rewrites filestr when params.local_input_type is set naively
replaces everything after the last '.' which breaks compressed filenames like
sample.raw.gz (becoming sample.raw.mzML). Update the filestr rewriting to detect
common compression suffixes (e.g., .gz, .bz2, .zip, .xz): if a compression
suffix is present, split filestr into base and compressionSuffix, replace the
base's extension with params.local_input_type and then reattach the
compressionSuffix; otherwise perform the existing single-extension replacement.
Keep references to the same variables (filestr, params.local_input_type,
row.Filename) so the change integrates into the existing block that builds
filestr.
```groovy
def base_row = rows[0]
```
Grouped SDRF rows still make per-file metadata order-dependent.
After groupTuple, rows[0] becomes the source of truth for dissociation method, tolerances, variable mods, and m/z ranges, and FixedModifications keeps only fixedMods[0]. For multiplexed files that can silently drop row-specific values before workflows/dia.nf forwards them downstream. Please aggregate every unique value into the emitted scalar format, or fail fast when grouped rows disagree.
Also applies to: 88-105, 129-182
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@subworkflows/local/create_input_channel/main.nf` around lines 63 - 64, The
code currently uses rows[0] (base_row) after groupTuple to pick per-file
metadata (dissociation method, tolerances, variableMods, m/z ranges, and
FixedModifications via fixedMods[0]) which makes output order-dependent and can
drop row-specific values; update the logic in the block that references base_row
and the other similar blocks (the grouped handling around groupTuple and the
comparable logic in the 88-105 and 129-182 regions) to collect all unique values
across rows for each metadata field and emit them in the scalar format (e.g.,
join / dedupe into a single string or array as required by downstream
workflows), and if multiple differing values are incompatible with aggregation,
fail fast with a clear error referencing the file/group; specifically replace
uses of rows[0], fixedMods[0], and any single-row lookups with aggregation
functions (unique/dedupe) or explicit validation that throws when disagreements
exist.
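The aggregation strategy the comment asks for could be sketched like this (field names and the helper are illustrative assumptions, not the subworkflow's actual keys):

```groovy
// Sketch: collect unique values across grouped rows instead of trusting
// rows[0]; fail fast when grouped rows disagree on a field that must be scalar.
def uniqueOrFail = { List rows, String field ->
    def values = rows.collect { it[field] }.findAll { it != null }.unique()
    if (values.size() > 1) {
        error("Conflicting '${field}' values for ${rows[0].Filename}: ${values}")
    }
    values ? values[0] : null
}
def dissociation = uniqueOrFail(rows, 'DissociationMethod')
```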
```groovy
// Determine DDA mode: true if explicitly set via param OR auto-detected from SDRF
ch_is_dda = ch_experiment_meta.map { meta ->
    def dda = params.diann_dda || meta.acquisition_method == 'dda'
    if (dda && VersionUtils.versionLessThan(params.diann_version, '2.3.2')) {
        error("DDA mode (detected from SDRF) requires DIA-NN >= 2.3.2. Current version: ${params.diann_version}. Use -profile diann_v2_3_2")
    }
    return dda
}
```
ch_is_dda can disagree with the downstream modules.
Line 73 re-applies params.diann_dda on top of the SDRF-derived acquisition method, but only INSILICO_LIBRARY_GENERATION receives that boolean at Line 90. ASSEMBLE_EMPIRICAL_LIBRARY and INDIVIDUAL_ANALYSIS still derive DDA from meta.acquisition_method, so --diann_dda true against an SDRF that explicitly says DIA will make the first step run with --dda while later steps stay in DIA mode. Either keep --diann_dda as a strict fallback here too, or thread the same boolean through every DIA-NN module.
Possible fix

```diff
-    ch_is_dda = ch_experiment_meta.map { meta ->
-        def dda = params.diann_dda || meta.acquisition_method == 'dda'
+    ch_is_dda = ch_experiment_meta.map { meta ->
+        if (params.diann_dda && meta.acquisition_method == 'dia') {
+            error('--diann_dda is only a fallback when the SDRF lacks an acquisition method; remove it or fix the SDRF value.')
+        }
+        def dda = meta.acquisition_method == 'dda'
         if (dda && VersionUtils.versionLessThan(params.diann_version, '2.3.2')) {
             error("DDA mode (detected from SDRF) requires DIA-NN >= 2.3.2. Current version: ${params.diann_version}. Use -profile diann_v2_3_2")
         }
         return dda
     }
```

Also applies to: 90-90
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@workflows/dia.nf` around lines 71 - 78, The ch_is_dda calculation mixes
params.diann_dda with SDRF-derived meta.acquisition_method causing downstream
inconsistency: ENSURE the final boolean is used consistently by either making
params.diann_dda a strict override or passing ch_is_dda into all DIA-NN modules;
specifically, update the logic around ch_is_dda (function/variable ch_is_dda) to
treat params.diann_dda as a definitive override when present, and propagate that
same boolean into the INSILICO_LIBRARY_GENERATION, ASSEMBLE_EMPIRICAL_LIBRARY,
and INDIVIDUAL_ANALYSIS module inputs so they all use the same DDA flag instead
of re-reading meta.acquisition_method.
PR checklist

- Code lints (`nf-core pipelines lint`).
- Pipeline runs with the test profile (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- Pipeline runs in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).

Summary by CodeRabbit: Documentation, New Features, Bug Fixes, Chores.