From e2b2ac732f3692164d08ac32f88442d3e7a829ed Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 3 Apr 2026 11:26:45 +0100 Subject: [PATCH 01/28] docs: add v1.0.0 release implementation plan 13-task plan covering robustness fixes, DDA support, new DIA-NN params, InfinDIA groundwork, comprehensive documentation, and issue cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) --- .../2026-04-03-v1-release-implementation.md | 789 ++++++++++++++++++ 1 file changed, 789 insertions(+) create mode 100644 docs/plans/2026-04-03-v1-release-implementation.md diff --git a/docs/plans/2026-04-03-v1-release-implementation.md b/docs/plans/2026-04-03-v1-release-implementation.md new file mode 100644 index 0000000..4b3d0ed --- /dev/null +++ b/docs/plans/2026-04-03-v1-release-implementation.md @@ -0,0 +1,789 @@ +# quantmsdiann v1.0.0 Release — Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Prepare quantmsdiann for a robust v1.0.0 release with DDA support, new DIA-NN parameters, and comprehensive documentation. + +**Architecture:** No new workflows or modules. All changes are additions to existing files — new params, flags, guards, test configs, and docs. DDA uses the same pipeline as DIA with `--dda` appended to all DIA-NN invocations. Default container stays 1.8.1; 2.3.2 is opt-in via profile. 
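
In practice the opt-in path combines the version profile with the DDA flag. A hypothetical invocation (the `diann_v2_3_2` profile and `--diann_dda` param are the ones introduced by Tasks 4–5 below; the SDRF, FASTA, and output paths are placeholders):

```bash
nextflow run bigbio/quantmsdiann \
    -profile diann_v2_3_2,docker \
    --diann_dda true \
    --input experiment.sdrf.tsv \
    --database proteome.fasta \
    --outdir results
```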
+ +**Tech Stack:** Nextflow DSL2, nf-core, DIA-NN, Groovy, Bash + +--- + +## Task 1: Fix tee pipes masking failures + +**Files:** +- Modify: `modules/local/diann/generate_cfg/main.nf:26` +- Modify: `modules/local/diann/diann_msstats/main.nf:21-26` +- Modify: `modules/local/samplesheet_check/main.nf:38-43` +- Modify: `modules/local/sdrf_parsing/main.nf:24-30` + +- [ ] **Step 1: Add pipefail to generate_cfg** + +In `modules/local/diann/generate_cfg/main.nf`, find the `"""` opening the script block (line 20) and add `set -o pipefail` as the first line: + +```groovy + """ + set -o pipefail + parse_sdrf generate-diann-cfg \\ + ... + ``` + +- [ ] **Step 2: Add pipefail to diann_msstats** + +In `modules/local/diann/diann_msstats/main.nf`, find the `"""` opening the script block (line 20) and add `set -o pipefail`: + +```groovy + """ + set -o pipefail + quantmsutilsc diann2msstats \\ + ... + ``` + +- [ ] **Step 3: Add pipefail to samplesheet_check** + +In `modules/local/samplesheet_check/main.nf`, find the `"""` opening the script block and add `set -o pipefail`: + +```groovy + """ + set -o pipefail + ... + ``` + +- [ ] **Step 4: Add pipefail to sdrf_parsing** + +In `modules/local/sdrf_parsing/main.nf`, find the `"""` opening the script block (line 22) and add `set -o pipefail`: + +```groovy + """ + set -o pipefail + parse_sdrf convert-diann \\ + ... + ``` + +- [ ] **Step 5: Validate and commit** + +```bash +conda run -n nfcore pre-commit run --all-files +git add modules/local/diann/generate_cfg/main.nf modules/local/diann/diann_msstats/main.nf modules/local/samplesheet_check/main.nf modules/local/sdrf_parsing/main.nf +git commit -m "fix: add pipefail to all modules with tee pipes + +Without pipefail, if the command before tee fails, tee returns 0 and +the Nextflow task appears to succeed. This masked failures in +generate_cfg, diann_msstats, samplesheet_check, and sdrf_parsing." 
+``` + +--- + +## Task 2: Add error retry to long-running DIA-NN tasks + +**Files:** +- Modify: `modules/local/diann/preliminary_analysis/main.nf:3-4` +- Modify: `modules/local/diann/individual_analysis/main.nf:3-4` +- Modify: `modules/local/diann/final_quantification/main.nf:3-4` +- Modify: `modules/local/diann/insilico_library_generation/main.nf:3-4` +- Modify: `modules/local/diann/assemble_empirical_library/main.nf:3-4` + +- [ ] **Step 1: Add error_retry label to all 5 DIA-NN modules** + +In each file, add `label 'error_retry'` after the existing labels. For example, `preliminary_analysis/main.nf` currently has: + +```groovy + label 'process_high' + label 'diann' +``` + +Change to: + +```groovy + label 'process_high' + label 'diann' + label 'error_retry' +``` + +Do the same for: +- `individual_analysis/main.nf` (after `label 'diann'`) +- `final_quantification/main.nf` (after `label 'diann'`) +- `insilico_library_generation/main.nf` (after `label 'diann'`) +- `assemble_empirical_library/main.nf` (after `label 'diann'`) + +- [ ] **Step 2: Validate and commit** + +```bash +conda run -n nfcore pre-commit run --all-files +git add modules/local/diann/preliminary_analysis/main.nf modules/local/diann/individual_analysis/main.nf modules/local/diann/final_quantification/main.nf modules/local/diann/insilico_library_generation/main.nf modules/local/diann/assemble_empirical_library/main.nf +git commit -m "fix: add error_retry label to all DIA-NN analysis modules + +These are the longest-running tasks and most susceptible to transient +failures (OOM, I/O timeouts). The error_retry label enables automatic +retry on signal exits (130-145, 104, 175)." 
+``` + +--- + +## Task 3: Add empty input guards + +**Files:** +- Modify: `workflows/dia.nf:38,46` + +- [ ] **Step 1: Guard ch_searchdb with ifEmpty** + +In `workflows/dia.nf`, line 38, change: + +```groovy + ch_searchdb = channel.fromPath(params.database, checkIfExists: true).first() +``` + +To: + +```groovy + ch_searchdb = channel.fromPath(params.database, checkIfExists: true) + .ifEmpty { error("No protein database found at '${params.database}'. Provide --database ") } + .first() +``` + +- [ ] **Step 2: Guard ch_experiment_meta with ifEmpty** + +In `workflows/dia.nf`, line 46, change: + +```groovy + ch_experiment_meta = ch_result.meta.unique { m -> m.experiment_id }.first() +``` + +To: + +```groovy + ch_experiment_meta = ch_result.meta.unique { m -> m.experiment_id } + .ifEmpty { error("No valid input files found after SDRF parsing. Check your SDRF file and input paths.") } + .first() +``` + +- [ ] **Step 3: Validate and commit** + +```bash +conda run -n nfcore pre-commit run --all-files +git add workflows/dia.nf +git commit -m "fix: add empty input guards to prevent silent pipeline hangs + +Guard ch_searchdb and ch_experiment_meta with ifEmpty to fail fast +with clear error messages instead of hanging indefinitely." +``` + +--- + +## Task 4: Add DIA-NN 2.3.2 version config and profile + +**Files:** +- Create: `conf/diann_versions/v2_3_2.config` +- Modify: `nextflow.config:245-247` (profiles section) + +- [ ] **Step 1: Create v2_3_2.config** + +Create `conf/diann_versions/v2_3_2.config`: + +```groovy +/* + * DIA-NN 2.3.2 container override (private ghcr.io) + * Latest release with DDA support and InfinDIA. 
+ */ +params.diann_version = '2.3.2' + +process { + withLabel: diann { + container = 'ghcr.io/bigbio/diann:2.3.2' + } +} + +singularity.enabled = false +docker.enabled = true +``` + +- [ ] **Step 2: Add profile to nextflow.config** + +In `nextflow.config`, after the `diann_v2_2_0` profile line (around line 247), add: + +```groovy + diann_v2_3_2 { includeConfig 'conf/diann_versions/v2_3_2.config' } +``` + +- [ ] **Step 3: Validate and commit** + +```bash +conda run -n nfcore pre-commit run --all-files +git add conf/diann_versions/v2_3_2.config nextflow.config +git commit -m "feat: add DIA-NN 2.3.2 version config and profile + +Adds conf/diann_versions/v2_3_2.config with ghcr.io/bigbio/diann:2.3.2 +container. Use -profile diann_v2_3_2 to opt in. Default stays 1.8.1. +Enables DDA support and InfinDIA features." +``` + +--- + +## Task 5: Implement DDA support — params, version guard, flag passthrough + +**Files:** +- Modify: `nextflow.config:53-57` (DIA-NN general params) +- Modify: `nextflow_schema.json` (DIA-NN section) +- Modify: `workflows/dia.nf:35-38` (version guard) +- Modify: `subworkflows/local/create_input_channel/main.nf:75-88` (acquisition method) +- Modify: `modules/local/diann/insilico_library_generation/main.nf` (blocked list + flag) +- Modify: `modules/local/diann/preliminary_analysis/main.nf` (blocked list + flag) +- Modify: `modules/local/diann/assemble_empirical_library/main.nf` (blocked list + flag) +- Modify: `modules/local/diann/individual_analysis/main.nf` (blocked list + flag) +- Modify: `modules/local/diann/final_quantification/main.nf` (blocked list + flag) + +- [ ] **Step 1: Add diann_dda param to nextflow.config** + +In `nextflow.config`, after `diann_extra_args = null` (line 57), add: + +```groovy + diann_dda = false // Enable DDA analysis mode (requires DIA-NN >= 2.3.2) +``` + +- [ ] **Step 2: Add diann_dda to nextflow_schema.json** + +In `nextflow_schema.json`, in the DIA-NN section (inside `"$defs"` > appropriate group), add: + +```json 
+"diann_dda": { + "type": "boolean", + "description": "Enable DDA (Data-Dependent Acquisition) analysis mode. Passes --dda to all DIA-NN steps. Requires DIA-NN >= 2.3.2 (use -profile diann_v2_3_2). Beta feature — only trust q-values, PEP, RT/IM, Ms1.Apex.Area. PTM localization unreliable with DDA.", + "fa_icon": "fas fa-flask", + "default": false +} +``` + +Add `"diann_dda"` to the corresponding `"required"` or `"properties"` list in the appropriate group. + +- [ ] **Step 3: Add version guard in workflows/dia.nf** + +In `workflows/dia.nf`, at the start of the `main:` block (after line 37), add: + +```groovy + // Version guard for DDA mode + if (params.diann_dda && params.diann_version < '2.3.2') { + error("DDA mode (--diann_dda) requires DIA-NN >= 2.3.2. Current version: ${params.diann_version}. Use -profile diann_v2_3_2") + } +``` + +- [ ] **Step 4: Accept DDA acquisition method in create_input_channel** + +In `subworkflows/local/create_input_channel/main.nf`, replace lines 75-88 (the acquisition method validation block): + +```groovy + // Validate acquisition method + def acqMethod = row.AcquisitionMethod?.toString()?.trim() ?: "" + if (acqMethod.toLowerCase().contains("data-independent acquisition") || acqMethod.toLowerCase().contains("dia")) { + meta.acquisition_method = "dia" + } else if (params.diann_dda && (acqMethod.toLowerCase().contains("data-dependent acquisition") || acqMethod.toLowerCase().contains("dda"))) { + meta.acquisition_method = "dda" + } else if (acqMethod.isEmpty()) { + meta.acquisition_method = params.diann_dda ? "dda" : "dia" + } else { + log.error("Unsupported acquisition method: '${acqMethod}'. This pipeline supports DIA" + (params.diann_dda ? " and DDA (--diann_dda)" : "") + ". 
Found in file: ${filestr}") + exit(1) + } +``` + +- [ ] **Step 5: Add --dda flag to all 5 DIA-NN modules** + +For each of the 5 DIA-NN modules, make two changes: + +**a) Add `'--dda'` to the blocked list.** In each module's `def blocked = [...]`, add `'--dda'` to the array. + +**b) Add the flag variable and append it to the command.** In each module's script block, after the existing flag variables (before the `"""` shell block), add: + +```groovy + diann_dda_flag = params.diann_dda ? "--dda" : "" +``` + +Then append `${diann_dda_flag} \\` to the DIA-NN command, before `\${mod_flags}` (or before `$args` if no mod_flags). + +Apply to: +- `modules/local/diann/insilico_library_generation/main.nf` +- `modules/local/diann/preliminary_analysis/main.nf` +- `modules/local/diann/assemble_empirical_library/main.nf` +- `modules/local/diann/individual_analysis/main.nf` +- `modules/local/diann/final_quantification/main.nf` + +- [ ] **Step 6: Validate and commit** + +```bash +conda run -n nfcore pre-commit run --all-files +conda run -n nfcore nf-core pipelines lint --dir . 
+git add nextflow.config nextflow_schema.json workflows/dia.nf subworkflows/local/create_input_channel/main.nf modules/local/diann/*/main.nf +git commit -m "feat: add DDA support via --diann_dda flag (#5) + +- New param diann_dda (boolean, default: false) +- Version guard: requires DIA-NN >= 2.3.2 +- Passes --dda to all 5 DIA-NN modules when enabled +- Accepts DDA acquisition method in SDRF when diann_dda=true +- Added --dda to blocked lists in all modules + +Closes #5" +``` + +--- + +## Task 6: Add DDA test config + +**Files:** +- Create: `conf/tests/test_dda.config` +- Modify: `.github/workflows/extended_ci.yml:110-191` (stage 2a) + +- [ ] **Step 1: Create test_dda.config** + +Create `conf/tests/test_dda.config`: + +```groovy +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for testing DDA analysis (requires DIA-NN >= 2.3.2) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Tests DDA mode using the BSA dataset with --diann_dda flag. + Uses ghcr.io/bigbio/diann:2.3.2. + + Use as follows: + nextflow run bigbio/quantmsdiann -profile test_dda,docker [--outdir ] + +------------------------------------------------------------------------------------------------ +*/ + +process { + resourceLimits = [ + cpus: 4, + memory: '12.GB', + time: '48.h' + ] +} + +params { + config_profile_name = 'Test profile for DDA analysis' + config_profile_description = 'DDA test using BSA dataset with DIA-NN 2.3.2.' 
+ + outdir = './results_dda' + + // Input data — BSA DDA dataset + input = 'https://raw.githubusercontent.com/bigbio/quantms-test-datasets/quantms/testdata/lfq_ci/BSA/BSA_design.sdrf.tsv' + database = 'https://raw.githubusercontent.com/bigbio/quantms-test-datasets/quantms/testdata/lfq_ci/BSA/18Protein_SoCe_Tr_detergents_trace.fasta' + + // DDA mode + diann_dda = true + + // Search parameters matching BSA dataset + min_peptide_length = 7 + max_peptide_length = 30 + max_precursor_charge = 3 + allowed_missed_cleavages = 1 + diann_normalize = false + publish_dir_mode = 'symlink' + max_mods = 2 +} + +process { + withLabel: diann { + container = 'ghcr.io/bigbio/diann:2.3.2' + } +} + +singularity.enabled = false +docker.enabled = true +``` + +- [ ] **Step 2: Add test_dda profile to nextflow.config** + +In `nextflow.config`, after the `test_dia_2_2_0` profile line (around line 241), add: + +```groovy + test_dda { includeConfig 'conf/tests/test_dda.config' } +``` + +- [ ] **Step 3: Add test_dda to extended_ci.yml stage 2a** + +In `.github/workflows/extended_ci.yml`, in the `test-latest` job matrix (around line 120), add `"test_dda"` to the `test_profile` array: + +```yaml + test_profile: ["test_latest_dia", "test_dia_quantums", "test_dia_parquet", "test_dda"] +``` + +- [ ] **Step 4: Validate and commit** + +```bash +conda run -n nfcore pre-commit run --all-files +git add conf/tests/test_dda.config nextflow.config .github/workflows/extended_ci.yml +git commit -m "test: add DDA test config using BSA dataset with DIA-NN 2.3.2 + +Uses bigbio/quantms-test-datasets BSA LFQ dataset (~34 MB) with +diann_dda=true pinned to ghcr.io/bigbio/diann:2.3.2. Added to +extended_ci.yml stage 2a (private containers)." 
+``` + +--- + +## Task 7: Add test configs for skip_preliminary_analysis and speclib input + +**Files:** +- Create: `conf/tests/test_dia_skip_preanalysis.config` +- Modify: `nextflow.config` (profiles section) +- Modify: `.github/workflows/extended_ci.yml` (stage 2a) + +- [ ] **Step 1: Create test_dia_skip_preanalysis.config** + +Create `conf/tests/test_dia_skip_preanalysis.config`: + +```groovy +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for testing skip_preliminary_analysis path +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Tests the pipeline with skip_preliminary_analysis=true, using default + mass accuracy parameters. Validates the untested code path in dia.nf. + + Use as follows: + nextflow run bigbio/quantmsdiann -profile test_dia_skip_preanalysis,docker [--outdir ] + +------------------------------------------------------------------------------------------------ +*/ + +process { + resourceLimits = [ + cpus: 4, + memory: '12.GB', + time: '48.h' + ] +} + +params { + config_profile_name = 'Test profile for skip preliminary analysis' + config_profile_description = 'Tests skip_preliminary_analysis path with default mass accuracy params.' 
+ + outdir = './results_skip_preanalysis' + + // Input data — same as test_dia + input = 'https://raw.githubusercontent.com/bigbio/quantms-test-datasets/quantms/testdata/dia_ci/PXD026600.sdrf.tsv' + database = 'https://raw.githubusercontent.com/bigbio/quantms-test-datasets/quantms/testdata/dia_ci/REF_EColi_K12_UPS1_combined.fasta' + min_pr_mz = 350 + max_pr_mz = 950 + min_fr_mz = 500 + max_fr_mz = 1500 + min_peptide_length = 15 + max_peptide_length = 30 + max_precursor_charge = 3 + allowed_missed_cleavages = 1 + diann_normalize = false + publish_dir_mode = 'symlink' + max_mods = 2 + + // Skip preliminary analysis — use default mass accuracy params + skip_preliminary_analysis = true + mass_acc_ms2 = 15 + mass_acc_ms1 = 15 + scan_window = 8 +} +``` + +- [ ] **Step 2: Add profile to nextflow.config** + +After existing test profiles (around line 242), add: + +```groovy + test_dia_skip_preanalysis { includeConfig 'conf/tests/test_dia_skip_preanalysis.config' } +``` + +- [ ] **Step 3: Add to extended_ci.yml stage 2a** + +In the `test-latest` job matrix, add `"test_dia_skip_preanalysis"` to the `test_profile` array. + +- [ ] **Step 4: Validate and commit** + +```bash +conda run -n nfcore pre-commit run --all-files +git add conf/tests/test_dia_skip_preanalysis.config nextflow.config .github/workflows/extended_ci.yml +git commit -m "test: add test config for skip_preliminary_analysis path + +Tests the previously untested code path where preliminary analysis is +skipped and default mass accuracy parameters are used directly." 
+``` + +--- + +## Task 8: Add new DIA-NN feature parameters (light-models, export-quant, site-ms1-quant) + +**Files:** +- Modify: `nextflow.config` (params section) +- Modify: `nextflow_schema.json` +- Modify: `modules/local/diann/insilico_library_generation/main.nf` (light-models) +- Modify: `modules/local/diann/final_quantification/main.nf` (export-quant, site-ms1-quant) + +- [ ] **Step 1: Add params to nextflow.config** + +In `nextflow.config`, in the DIA-NN general section (after `diann_dda`, around line 58), add: + +```groovy + diann_light_models = false // add '--light-models' for 10x faster library generation (DIA-NN >= 2.0) + diann_export_quant = false // add '--export-quant' for fragment-level parquet export (DIA-NN >= 2.0) + diann_site_ms1_quant = false // add '--site-ms1-quant' for MS1 apex PTM quantification (DIA-NN >= 2.0) +``` + +- [ ] **Step 2: Add params to nextflow_schema.json** + +Add each param to the DIA-NN section in the schema with type, description, default, and fa_icon. + +- [ ] **Step 3: Wire --light-models in insilico_library_generation** + +In `modules/local/diann/insilico_library_generation/main.nf`: + +a) Add `'--light-models'` to the blocked list (line 26-32). + +b) After `diann_no_peptidoforms` variable (line 47), add: + +```groovy + diann_light_models = params.diann_light_models ? "--light-models" : "" +``` + +c) Append `${diann_light_models} \\` to the DIA-NN command before `${met_excision}`. + +- [ ] **Step 4: Wire --export-quant and --site-ms1-quant in final_quantification** + +In `modules/local/diann/final_quantification/main.nf`: + +a) Add `'--export-quant'` and `'--site-ms1-quant'` to the blocked list (line 45-50). + +b) After `diann_dda_flag` variable, add: + +```groovy + diann_export_quant = params.diann_export_quant ? "--export-quant" : "" + diann_site_ms1_quant = params.diann_site_ms1_quant ? "--site-ms1-quant" : "" +``` + +c) Append both to the DIA-NN command before `\${mod_flags}`. 
+ +- [ ] **Step 5: Validate and commit** + +```bash +conda run -n nfcore pre-commit run --all-files +conda run -n nfcore nf-core pipelines lint --dir . +git add nextflow.config nextflow_schema.json modules/local/diann/insilico_library_generation/main.nf modules/local/diann/final_quantification/main.nf +git commit -m "feat: add --light-models, --export-quant, --site-ms1-quant params (#7) + +- diann_light_models: 10x faster in-silico library generation +- diann_export_quant: fragment-level parquet export +- diann_site_ms1_quant: MS1 apex intensities for PTM quantification +All require DIA-NN >= 2.0." +``` + +--- + +## Task 9: Add InfinDIA groundwork + +**Files:** +- Modify: `nextflow.config` (params section) +- Modify: `nextflow_schema.json` +- Modify: `workflows/dia.nf` (version guard) +- Modify: `modules/local/diann/insilico_library_generation/main.nf` (flag) + +- [ ] **Step 1: Add InfinDIA params to nextflow.config** + +After the DDA param, add: + +```groovy + // DIA-NN: InfinDIA (experimental, v2.3.0+) + enable_infin_dia = false // Enable InfinDIA for ultra-large search spaces + diann_pre_select = null // --pre-select N precursor limit for InfinDIA +``` + +- [ ] **Step 2: Add to nextflow_schema.json** + +Add `enable_infin_dia` (boolean) and `diann_pre_select` (integer, optional) to the schema. + +- [ ] **Step 3: Add version guard in workflows/dia.nf** + +After the DDA version guard, add: + +```groovy + if (params.enable_infin_dia && params.diann_version < '2.3.0') { + error("InfinDIA requires DIA-NN >= 2.3.0. Current version: ${params.diann_version}. Use -profile diann_v2_3_2") + } +``` + +- [ ] **Step 4: Wire flags in insilico_library_generation** + +In `modules/local/diann/insilico_library_generation/main.nf`: + +a) Add `'--infin-dia'` and `'--pre-select'` to the blocked list. + +b) Add flag variables: + +```groovy + infin_dia_flag = params.enable_infin_dia ? "--infin-dia" : "" + pre_select_flag = params.diann_pre_select ? 
"--pre-select $params.diann_pre_select" : "" +``` + +c) Append both to the DIA-NN command. + +- [ ] **Step 5: Validate and commit** + +```bash +conda run -n nfcore pre-commit run --all-files +conda run -n nfcore nf-core pipelines lint --dir . +git add nextflow.config nextflow_schema.json workflows/dia.nf modules/local/diann/insilico_library_generation/main.nf +git commit -m "feat: add InfinDIA groundwork — enable_infin_dia param (#10) + +Experimental support for InfinDIA (DIA-NN 2.3.0+). Passes --infin-dia +to library generation when enabled. Version guard enforces >= 2.3.0. +No test config — InfinDIA requires large databases." +``` + +--- + +## Task 10: Documentation — parameters.md + +**Files:** +- Create: `docs/parameters.md` + +- [ ] **Step 1: Create comprehensive parameter reference** + +Create `docs/parameters.md` with all params from `nextflow_schema.json` grouped by category. Read `nextflow.config` and `nextflow_schema.json` to get every param, its type, default, and description. Group into: + +1. Input/output options +2. File preparation +3. DIA-NN general +4. Mass accuracy and calibration +5. Library generation +6. Quantification and output +7. DDA mode +8. InfinDIA (experimental) +9. Quality control +10. MultiQC options +11. Boilerplate + +Each param entry: `| name | type | default | description |` + +- [ ] **Step 2: Commit** + +```bash +git add docs/parameters.md +git commit -m "docs: add comprehensive parameter reference (#1) + +Complete reference for all ~70 pipeline parameters grouped by category +with types, defaults, descriptions, and version requirements. 
+ +Closes #1" +``` + +--- + +## Task 11: Documentation — complete usage.md and output.md + +**Files:** +- Modify: `docs/usage.md` +- Modify: `docs/output.md` +- Modify: `CITATIONS.md` +- Modify: `README.md` + +- [ ] **Step 1: Add DDA section to usage.md** + +Add a "DDA Analysis Mode" section after the Bruker/timsTOF section with: +- How to enable (`--diann_dda true -profile diann_v2_3_2`) +- Limitations (beta, trusted columns only, PTM unreliable, MBR limited) +- Example command +- Link to DIA-NN DDA documentation + +- [ ] **Step 2: Add missing param sections to usage.md** + +Add sections for: +- Preprocessing params (`reindex_mzml`, `mzml_statistics`, `convert_dotd`) +- QC params (`enable_pmultiqc`, `skip_table_plots`, `contaminant_string`) +- `diann_extra_args` scope per module +- `--verbose_modules` profile +- Container version override guide (DIA-NN version profiles) +- Singularity usage +- SLURM example + +- [ ] **Step 3: Update output.md** + +Add: +- Parquet vs TSV output explanation +- MSstats format section +- Intermediate outputs under `--verbose_modules` + +- [ ] **Step 4: Add pmultiqc to CITATIONS.md** + +Add pmultiqc citation after the MultiQC entry. + +- [ ] **Step 5: Update README.md** + +Add DIA-NN version support table and link to `docs/parameters.md`. 
+ +- [ ] **Step 6: Validate and commit** + +```bash +conda run -n nfcore pre-commit run --all-files +git add docs/usage.md docs/output.md CITATIONS.md README.md +git commit -m "docs: complete usage.md, output.md, citations, README (#1, #3, #9, #15) + +- DDA mode documentation with limitations +- Missing param sections (preprocessing, QC, extra_args scope) +- Container version override and Singularity guides +- Parquet vs TSV output explanation +- pmultiqc citation added +- README updated with version table + +Closes #3, #9, #15" +``` + +--- + +## Task 12: Close resolved issues + +- [ ] **Step 1: Close issues via GitHub CLI** + +```bash +gh issue close 17 --repo bigbio/quantmsdiann --comment "Already implemented — --monitor-mod is extracted from diann_config.cfg (generated by sdrf-pipelines convert-diann) and passed to all DIA-NN steps via mod_flags." +gh issue close 2 --repo bigbio/quantmsdiann --comment "Superseded by #4 (Phase 6: consolidate param generation to sdrf-pipelines)." +gh issue close 1 --repo bigbio/quantmsdiann --comment "Resolved — docs/parameters.md created with comprehensive parameter reference." +gh issue close 3 --repo bigbio/quantmsdiann --comment "Resolved — diann_extra_args scope documented in docs/usage.md." +gh issue close 9 --repo bigbio/quantmsdiann --comment "Resolved — container version override guide and Singularity usage added to docs/usage.md." +gh issue close 15 --repo bigbio/quantmsdiann --comment "Resolved — docs/usage.md input documentation updated." +``` + +--- + +## Task 13: Final validation and push + +- [ ] **Step 1: Run full validation suite** + +```bash +conda run -n nfcore pre-commit run --all-files +conda run -n nfcore nf-core pipelines lint --release --dir . +``` + +Expected: 0 failures on both. 
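
Before pushing, the rationale behind Task 1 can also be re-verified in isolation; a minimal bash demonstration (independent of the pipeline) of the exit-status masking that `set -o pipefail` closes:

```shell
# Without pipefail the pipeline's status is tee's (0), masking the failure.
bash -c 'false | tee /dev/null'
echo "without pipefail: exit $?"   # prints: without pipefail: exit 0

# With pipefail the first failing command's status wins.
bash -c 'set -o pipefail; false | tee /dev/null'
echo "with pipefail: exit $?"      # prints: with pipefail: exit 1
```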
+ +- [ ] **Step 2: Push dda branch and create PR** + +```bash +git push -u origin dda +gh pr create --title "feat: v1.0.0 release — robustness, DDA support, features, docs" --body "$(cat <<'PREOF' +## Summary +- Robustness fixes: pipefail, error_retry, empty input guards +- DDA support via --diann_dda flag (DIA-NN >= 2.3.2) +- New params: --light-models, --export-quant, --site-ms1-quant +- InfinDIA groundwork (experimental) +- DIA-NN 2.3.2 version config +- New test configs: test_dda, test_dia_skip_preanalysis +- Comprehensive docs: parameters.md, complete usage.md, output.md + +## Issues +Closes #1, #3, #5, #7, #9, #10, #15, #17 + +## Test plan +- [ ] Existing CI tests pass (test_dia, test_dia_dotd) +- [ ] New test_dda passes with BSA dataset on DIA-NN 2.3.2 +- [ ] test_dia_skip_preanalysis passes +- [ ] nf-core lint --release: 0 failures +- [ ] pre-commit: all passing +PREOF +)" --base dev +``` From 7d4a441209de064827ea3c5d2148e2f00c5114e9 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 3 Apr 2026 16:31:05 +0100 Subject: [PATCH 02/28] fix: add pipefail to all modules with tee pipes Without pipefail, if the command before tee fails, tee returns 0 and the Nextflow task appears to succeed. This masked failures in generate_cfg, diann_msstats, samplesheet_check, and sdrf_parsing. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- .../2026-04-03-v1-release-implementation.md | 22 +++++++- docs/v1-release-roadmap.md | 56 ++++++++++++------- modules/local/diann/diann_msstats/main.nf | 1 + modules/local/diann/generate_cfg/main.nf | 1 + modules/local/samplesheet_check/main.nf | 1 + modules/local/sdrf_parsing/main.nf | 1 + 6 files changed, 59 insertions(+), 23 deletions(-) diff --git a/docs/plans/2026-04-03-v1-release-implementation.md b/docs/plans/2026-04-03-v1-release-implementation.md index 4b3d0ed..8439513 100644 --- a/docs/plans/2026-04-03-v1-release-implementation.md +++ b/docs/plans/2026-04-03-v1-release-implementation.md @@ -13,6 +13,7 @@ ## Task 1: Fix tee pipes masking failures **Files:** + - Modify: `modules/local/diann/generate_cfg/main.nf:26` - Modify: `modules/local/diann/diann_msstats/main.nf:21-26` - Modify: `modules/local/samplesheet_check/main.nf:38-43` @@ -22,7 +23,7 @@ In `modules/local/diann/generate_cfg/main.nf`, find the `"""` opening the script block (line 20) and add `set -o pipefail` as the first line: -```groovy +````groovy """ set -o pipefail parse_sdrf generate-diann-cfg \\ @@ -71,13 +72,14 @@ git commit -m "fix: add pipefail to all modules with tee pipes Without pipefail, if the command before tee fails, tee returns 0 and the Nextflow task appears to succeed. This masked failures in generate_cfg, diann_msstats, samplesheet_check, and sdrf_parsing." -``` +```` --- ## Task 2: Add error retry to long-running DIA-NN tasks **Files:** + - Modify: `modules/local/diann/preliminary_analysis/main.nf:3-4` - Modify: `modules/local/diann/individual_analysis/main.nf:3-4` - Modify: `modules/local/diann/final_quantification/main.nf:3-4` @@ -102,6 +104,7 @@ Change to: ``` Do the same for: + - `individual_analysis/main.nf` (after `label 'diann'`) - `final_quantification/main.nf` (after `label 'diann'`) - `insilico_library_generation/main.nf` (after `label 'diann'`) @@ -124,6 +127,7 @@ retry on signal exits (130-145, 104, 175)." 
## Task 3: Add empty input guards **Files:** + - Modify: `workflows/dia.nf:38,46` - [ ] **Step 1: Guard ch_searchdb with ifEmpty** @@ -174,6 +178,7 @@ with clear error messages instead of hanging indefinitely." ## Task 4: Add DIA-NN 2.3.2 version config and profile **Files:** + - Create: `conf/diann_versions/v2_3_2.config` - Modify: `nextflow.config:245-247` (profiles section) @@ -223,6 +228,7 @@ Enables DDA support and InfinDIA features." ## Task 5: Implement DDA support — params, version guard, flag passthrough **Files:** + - Modify: `nextflow.config:53-57` (DIA-NN general params) - Modify: `nextflow_schema.json` (DIA-NN section) - Modify: `workflows/dia.nf:35-38` (version guard) @@ -301,6 +307,7 @@ For each of the 5 DIA-NN modules, make two changes: Then append `${diann_dda_flag} \\` to the DIA-NN command, before `\${mod_flags}` (or before `$args` if no mod_flags). Apply to: + - `modules/local/diann/insilico_library_generation/main.nf` - `modules/local/diann/preliminary_analysis/main.nf` - `modules/local/diann/assemble_empirical_library/main.nf` @@ -329,6 +336,7 @@ Closes #5" ## Task 6: Add DDA test config **Files:** + - Create: `conf/tests/test_dda.config` - Modify: `.github/workflows/extended_ci.yml:110-191` (stage 2a) @@ -404,7 +412,7 @@ In `nextflow.config`, after the `test_dia_2_2_0` profile line (around line 241), In `.github/workflows/extended_ci.yml`, in the `test-latest` job matrix (around line 120), add `"test_dda"` to the `test_profile` array: ```yaml - test_profile: ["test_latest_dia", "test_dia_quantums", "test_dia_parquet", "test_dda"] +test_profile: ["test_latest_dia", "test_dia_quantums", "test_dia_parquet", "test_dda"] ``` - [ ] **Step 4: Validate and commit** @@ -424,6 +432,7 @@ extended_ci.yml stage 2a (private containers)." 
## Task 7: Add test configs for skip_preliminary_analysis and speclib input **Files:** + - Create: `conf/tests/test_dia_skip_preanalysis.config` - Modify: `nextflow.config` (profiles section) - Modify: `.github/workflows/extended_ci.yml` (stage 2a) @@ -511,6 +520,7 @@ skipped and default mass accuracy parameters are used directly." ## Task 8: Add new DIA-NN feature parameters (light-models, export-quant, site-ms1-quant) **Files:** + - Modify: `nextflow.config` (params section) - Modify: `nextflow_schema.json` - Modify: `modules/local/diann/insilico_library_generation/main.nf` (light-models) @@ -578,6 +588,7 @@ All require DIA-NN >= 2.0." ## Task 9: Add InfinDIA groundwork **Files:** + - Modify: `nextflow.config` (params section) - Modify: `nextflow_schema.json` - Modify: `workflows/dia.nf` (version guard) @@ -640,6 +651,7 @@ No test config — InfinDIA requires large databases." ## Task 10: Documentation — parameters.md **Files:** + - Create: `docs/parameters.md` - [ ] **Step 1: Create comprehensive parameter reference** @@ -677,6 +689,7 @@ Closes #1" ## Task 11: Documentation — complete usage.md and output.md **Files:** + - Modify: `docs/usage.md` - Modify: `docs/output.md` - Modify: `CITATIONS.md` @@ -685,6 +698,7 @@ Closes #1" - [ ] **Step 1: Add DDA section to usage.md** Add a "DDA Analysis Mode" section after the Bruker/timsTOF section with: + - How to enable (`--diann_dda true -profile diann_v2_3_2`) - Limitations (beta, trusted columns only, PTM unreliable, MBR limited) - Example command @@ -693,6 +707,7 @@ Add a "DDA Analysis Mode" section after the Bruker/timsTOF section with: - [ ] **Step 2: Add missing param sections to usage.md** Add sections for: + - Preprocessing params (`reindex_mzml`, `mzml_statistics`, `convert_dotd`) - QC params (`enable_pmultiqc`, `skip_table_plots`, `contaminant_string`) - `diann_extra_args` scope per module @@ -704,6 +719,7 @@ Add sections for: - [ ] **Step 3: Update output.md** Add: + - Parquet vs TSV output explanation - 
MSstats format section - Intermediate outputs under `--verbose_modules` diff --git a/docs/v1-release-roadmap.md b/docs/v1-release-roadmap.md index 421648f..1961f20 100644 --- a/docs/v1-release-roadmap.md +++ b/docs/v1-release-roadmap.md @@ -29,6 +29,7 @@ DDA parallelization is identical to DIA — per-file parallel for PRELIMINARY_AN ### 1.1 Fix tee pipes masking failures Add `set -o pipefail` or `exit ${PIPESTATUS[0]}` to script blocks in: + - `modules/local/diann/generate_cfg/main.nf` - `modules/local/diann/diann_msstats/main.nf` - `modules/local/samplesheet_check/main.nf` @@ -39,6 +40,7 @@ Add `set -o pipefail` or `exit ${PIPESTATUS[0]}` to script blocks in: ### 1.2 Add error retry to long-running DIA-NN tasks Add `label 'error_retry'` to: + - PRELIMINARY_ANALYSIS (process_high) - INDIVIDUAL_ANALYSIS (process_high) - FINAL_QUANTIFICATION (process_high) @@ -55,12 +57,14 @@ These are the longest-running tasks and most susceptible to transient failures ( ### 1.4 New test configs **`conf/tests/test_dia_skip_preanalysis.config`:** + - Sets `skip_preliminary_analysis = true` - Uses default `mass_acc_ms1`, `mass_acc_ms2`, `scan_window` params - Same PXD026600 test data as test_dia - Validates the skip path that is currently untested in CI **`conf/tests/test_dia_speclib.config`:** + - Sets `diann_speclib` to a pre-built spectral library - Skips INSILICO_LIBRARY_GENERATION (the `if` branch in dia.nf line 55-56) - Requires a small test spectral library in quantms-test-datasets (or generated from existing test data) @@ -78,6 +82,7 @@ Build and push `ghcr.io/bigbio/diann:2.3.2` from existing Dockerfile at `quantms ### 2.2 Version config Add `conf/diann_versions/v2_3_2.config`: + ```groovy params.diann_version = '2.3.2' process { @@ -90,6 +95,7 @@ docker.enabled = true ``` Add profile in `nextflow.config`: + ```groovy diann_v2_3_2 { includeConfig 'conf/diann_versions/v2_3_2.config' } ``` @@ -97,11 +103,13 @@ diann_v2_3_2 { includeConfig 'conf/diann_versions/v2_3_2.config' } 
### 2.3 DDA implementation **New param** in `nextflow.config`: + ```groovy diann_dda = false // Enable DDA analysis mode (requires DIA-NN >= 2.3.2) ``` **Version guard** in `workflows/dia.nf` at workflow start: + ```groovy if (params.diann_dda && params.diann_version < '2.3.2') { error("DDA mode requires DIA-NN >= 2.3.2. Current version: ${params.diann_version}. Use -profile diann_v2_3_2") @@ -109,12 +117,15 @@ if (params.diann_dda && params.diann_version < '2.3.2') { ``` **Pass `--dda` to all DIA-NN modules** — In each module's script block, add: + ```groovy diann_dda_flag = params.diann_dda ? "--dda" : "" ``` + And append `${diann_dda_flag}` to the DIA-NN command. Add `'--dda'` to the `blocked` list in all 5 modules. **Accept DDA in create_input_channel** — Modify `create_input_channel/main.nf` lines 78-88: + ```groovy if (acqMethod.toLowerCase().contains("data-independent acquisition") || acqMethod.toLowerCase().contains("dia")) { meta.acquisition_method = "dia" @@ -135,6 +146,7 @@ Add `comment[proteomics data acquisition method]` column with value `NT=Data-Dep ### 2.5 Test config **`conf/tests/test_dda.config`:** + - Points to BSA dataset from `bigbio/quantms-test-datasets/testdata/lfq_ci/BSA/` - Sets `diann_dda = true` - Pins to `ghcr.io/bigbio/diann:2.3.2` @@ -151,22 +163,24 @@ Add `comment[proteomics data acquisition method]` column with value `NT=Data-Dep ### 3.1 New DIA-NN parameters -| Parameter | Flag | Min Version | Module | Default | -|---|---|---|---|---| -| `diann_light_models` | `--light-models` | 2.0 | INSILICO_LIBRARY_GENERATION | false | -| `diann_export_quant` | `--export-quant` | 2.0 | FINAL_QUANTIFICATION | false | -| `diann_read_threads` | `--read-threads N` | 2.0 | All DIA-NN steps | null (disabled) | -| `diann_site_ms1_quant` | `--site-ms1-quant` | 2.0 | FINAL_QUANTIFICATION | false | +| Parameter | Flag | Min Version | Module | Default | +| ---------------------- | ------------------ | ----------- | --------------------------- | 
--------------- | +| `diann_light_models` | `--light-models` | 2.0 | INSILICO_LIBRARY_GENERATION | false | +| `diann_export_quant` | `--export-quant` | 2.0 | FINAL_QUANTIFICATION | false | +| `diann_read_threads` | `--read-threads N` | 2.0 | All DIA-NN steps | null (disabled) | +| `diann_site_ms1_quant` | `--site-ms1-quant` | 2.0 | FINAL_QUANTIFICATION | false | Each parameter: add to `nextflow.config`, `nextflow_schema.json`, module script block (with version guard where needed), and module blocked list. ### 3.2 InfinDIA groundwork (issue #10) New params: + - `enable_infin_dia` (boolean, default: false) — requires >= 2.3.0 - `diann_pre_select` (integer, optional) — `--pre-select N` precursor limit Implementation: + - Pass `--infin-dia` to INSILICO_LIBRARY_GENERATION when enabled - Version guard: error if enabled with DIA-NN < 2.3.0 - No test config — InfinDIA needs large databases to be meaningful @@ -185,6 +199,7 @@ Implementation: ### 4.1 Create `docs/parameters.md` Comprehensive parameter reference with all ~70 params grouped by: + - Input/output options - File preparation (conversion, indexing, statistics) - DIA-NN general settings @@ -202,6 +217,7 @@ Each param: name, type, default, description, version requirement (if any). 
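One caveat worth recording against the version guards in sections 2.3 and 3.2: they compare version strings lexicographically (e.g. `params.diann_version < '2.3.2'`), which is correct for every released version to date (1.8.1, 2.1.0, 2.2.0, 2.3.2) but would mis-order a hypothetical 2.10.x release. A component-wise numeric comparison avoids that; the sketch below is illustrative Python (the `version_tuple` helper is not pipeline code):

```python
def version_tuple(v):
    """Parse '2.3.2' into (2, 3, 2); tuples compare component-wise."""
    return tuple(int(part) for part in v.split('.'))

# String order breaks once a component reaches two digits:
lexicographic = '2.10.0' >= '2.3.2'                              # False
numeric = version_tuple('2.10.0') >= version_tuple('2.3.2')      # True
```

Until a multi-digit DIA-NN component exists, the string guards behave correctly, so this is a future-proofing note rather than a release blocker.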
### 4.2 Complete `docs/usage.md` Add missing sections: + - Preprocessing params (`reindex_mzml`, `mzml_statistics`, `convert_dotd`) - QC params (`enable_pmultiqc`, `skip_table_plots`, `contaminant_string`) - MultiQC options @@ -231,20 +247,20 @@ Add missing sections: ## Issues Status After Release -| Issue | Status | Resolution | -|---|---|---| -| #1 | Closed | Parameter documentation created | -| #2 | Closed | Superseded by #4 | -| #3 | Closed | ext.args scope documented | -| #5 | Closed | DDA support implemented | -| #7 | Closed | Phase 2 features wired | -| #9 | Closed | Container docs added | -| #10 | Partially closed | InfinDIA groundwork done, full support needs testing | -| #15 | Closed | Docs mismatch fixed | -| #17 | Closed | Already implemented | -| #4 | Open | Blocked on sdrf-pipelines converter release | -| #6 | Open | Blocked on PRIDE ontology release | -| #25 | Open | QPX deferred to next release | +| Issue | Status | Resolution | +| ----- | ---------------- | ---------------------------------------------------- | +| #1 | Closed | Parameter documentation created | +| #2 | Closed | Superseded by #4 | +| #3 | Closed | ext.args scope documented | +| #5 | Closed | DDA support implemented | +| #7 | Closed | Phase 2 features wired | +| #9 | Closed | Container docs added | +| #10 | Partially closed | InfinDIA groundwork done, full support needs testing | +| #15 | Closed | Docs mismatch fixed | +| #17 | Closed | Already implemented | +| #4 | Open | Blocked on sdrf-pipelines converter release | +| #6 | Open | Blocked on PRIDE ontology release | +| #25 | Open | QPX deferred to next release | --- diff --git a/modules/local/diann/diann_msstats/main.nf b/modules/local/diann/diann_msstats/main.nf index 4374b58..1844fdc 100644 --- a/modules/local/diann/diann_msstats/main.nf +++ b/modules/local/diann/diann_msstats/main.nf @@ -18,6 +18,7 @@ process DIANN_MSSTATS { script: def args = task.ext.args ?: '' """ + set -o pipefail quantmsutilsc diann2msstats \\ --report 
${report} \\ --exp_design ${exp_design} \\ diff --git a/modules/local/diann/generate_cfg/main.nf b/modules/local/diann/generate_cfg/main.nf index 8377030..8e36641 100644 --- a/modules/local/diann/generate_cfg/main.nf +++ b/modules/local/diann/generate_cfg/main.nf @@ -18,6 +18,7 @@ process GENERATE_CFG { def args = task.ext.args ?: '' """ + set -o pipefail quantmsutilsc dianncfg \\ --enzyme "${meta.enzyme}" \\ --fix_mod "${meta.fixedmodifications}" \\ diff --git a/modules/local/samplesheet_check/main.nf b/modules/local/samplesheet_check/main.nf index f2b7112..76fae90 100644 --- a/modules/local/samplesheet_check/main.nf +++ b/modules/local/samplesheet_check/main.nf @@ -23,6 +23,7 @@ process SAMPLESHEET_CHECK { def string_use_ols_cache_only = params.use_ols_cache_only == true ? "--use_ols_cache_only" : "" """ + set -o pipefail # Get basename and create output filename BASENAME=\$(basename "${input_file}") # Remove .sdrf.tsv, .sdrf.csv, or .sdrf extension (in that order to match longest first) diff --git a/modules/local/sdrf_parsing/main.nf b/modules/local/sdrf_parsing/main.nf index a89bc61..61803f3 100644 --- a/modules/local/sdrf_parsing/main.nf +++ b/modules/local/sdrf_parsing/main.nf @@ -22,6 +22,7 @@ process SDRF_PARSING { def diann_version_flag = params.diann_version ? "--diann_version '${params.diann_version}'" : '' """ + set -o pipefail parse_sdrf convert-diann \\ -s ${sdrf} \\ ${mod_loc_flag} \\ From 853dbbf6000cd4fe8195ca94cf03a3188a07d4ab Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 3 Apr 2026 16:32:22 +0100 Subject: [PATCH 03/28] fix: add error_retry label to all DIA-NN analysis modules These are the longest-running tasks and most susceptible to transient failures (OOM, I/O timeouts). The error_retry label enables automatic retry on signal exits (130-145, 104, 175). 
Co-Authored-By: Claude Opus 4.6 (1M context) --- modules/local/diann/assemble_empirical_library/main.nf | 1 + modules/local/diann/final_quantification/main.nf | 1 + modules/local/diann/individual_analysis/main.nf | 1 + modules/local/diann/insilico_library_generation/main.nf | 1 + modules/local/diann/preliminary_analysis/main.nf | 1 + 5 files changed, 5 insertions(+) diff --git a/modules/local/diann/assemble_empirical_library/main.nf b/modules/local/diann/assemble_empirical_library/main.nf index 034b95e..6efde30 100644 --- a/modules/local/diann/assemble_empirical_library/main.nf +++ b/modules/local/diann/assemble_empirical_library/main.nf @@ -2,6 +2,7 @@ process ASSEMBLE_EMPIRICAL_LIBRARY { tag "$meta.experiment_id" label 'process_low' label 'diann' + label 'error_retry' container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://containers.biocontainers.pro/s3/SingImgsRepo/diann/v1.8.1_cv1/diann_v1.8.1_cv1.img' : diff --git a/modules/local/diann/final_quantification/main.nf b/modules/local/diann/final_quantification/main.nf index defb66d..42a0a6f 100644 --- a/modules/local/diann/final_quantification/main.nf +++ b/modules/local/diann/final_quantification/main.nf @@ -2,6 +2,7 @@ process FINAL_QUANTIFICATION { tag "$meta.experiment_id" label 'process_high' label 'diann' + label 'error_retry' container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
'https://containers.biocontainers.pro/s3/SingImgsRepo/diann/v1.8.1_cv1/diann_v1.8.1_cv1.img' : diff --git a/modules/local/diann/individual_analysis/main.nf b/modules/local/diann/individual_analysis/main.nf index 28cb3b5..4a11628 100644 --- a/modules/local/diann/individual_analysis/main.nf +++ b/modules/local/diann/individual_analysis/main.nf @@ -2,6 +2,7 @@ process INDIVIDUAL_ANALYSIS { tag "$ms_file.baseName" label 'process_high' label 'diann' + label 'error_retry' container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://containers.biocontainers.pro/s3/SingImgsRepo/diann/v1.8.1_cv1/diann_v1.8.1_cv1.img' : diff --git a/modules/local/diann/insilico_library_generation/main.nf b/modules/local/diann/insilico_library_generation/main.nf index b347483..a26d9c6 100644 --- a/modules/local/diann/insilico_library_generation/main.nf +++ b/modules/local/diann/insilico_library_generation/main.nf @@ -2,6 +2,7 @@ process INSILICO_LIBRARY_GENERATION { tag "$fasta.name" label 'process_medium' label 'diann' + label 'error_retry' container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://containers.biocontainers.pro/s3/SingImgsRepo/diann/v1.8.1_cv1/diann_v1.8.1_cv1.img' : diff --git a/modules/local/diann/preliminary_analysis/main.nf b/modules/local/diann/preliminary_analysis/main.nf index 925114e..df88586 100644 --- a/modules/local/diann/preliminary_analysis/main.nf +++ b/modules/local/diann/preliminary_analysis/main.nf @@ -2,6 +2,7 @@ process PRELIMINARY_ANALYSIS { tag "$ms_file.baseName" label 'process_high' label 'diann' + label 'error_retry' container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
'https://containers.biocontainers.pro/s3/SingImgsRepo/diann/v1.8.1_cv1/diann_v1.8.1_cv1.img' : From d85847ecee9f86905396f6cc48deea4a2bf596e7 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 3 Apr 2026 16:33:21 +0100 Subject: [PATCH 04/28] fix: add empty input guards to prevent silent pipeline hangs Guard ch_searchdb and ch_experiment_meta with ifEmpty to fail fast with clear error messages instead of hanging indefinitely. Co-Authored-By: Claude Opus 4.6 (1M context) --- workflows/dia.nf | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/workflows/dia.nf b/workflows/dia.nf index d693a10..dabbb74 100644 --- a/workflows/dia.nf +++ b/workflows/dia.nf @@ -35,7 +35,9 @@ workflow DIA { main: ch_software_versions = channel.empty() - ch_searchdb = channel.fromPath(params.database, checkIfExists: true).first() + ch_searchdb = channel.fromPath(params.database, checkIfExists: true) + .ifEmpty { error("No protein database found at '${params.database}'. Provide --database ") } + .first() ch_file_preparation_results.multiMap { result -> @@ -43,7 +45,9 @@ workflow DIA { ms_file:result[1] }.set { ch_result } - ch_experiment_meta = ch_result.meta.unique { m -> m.experiment_id }.first() + ch_experiment_meta = ch_result.meta.unique { m -> m.experiment_id } + .ifEmpty { error("No valid input files found after SDRF parsing. Check your SDRF file and input paths.") } + .first() // diann_config.cfg comes directly from SDRF_PARSING (convert-diann) // Convert to value channel so it can be consumed by all per-file processes From f2a6777c8764ae1f73bb3a1c307f68df64a20b77 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 3 Apr 2026 16:34:28 +0100 Subject: [PATCH 05/28] feat: add DIA-NN 2.3.2 version config and profile Adds conf/diann_versions/v2_3_2.config with ghcr.io/bigbio/diann:2.3.2 container. Use -profile diann_v2_3_2 to opt in. Default stays 1.8.1. Enables DDA support and InfinDIA features. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- conf/diann_versions/v2_3_2.config | 14 ++++++++++++++ nextflow.config | 1 + 2 files changed, 15 insertions(+) create mode 100644 conf/diann_versions/v2_3_2.config diff --git a/conf/diann_versions/v2_3_2.config b/conf/diann_versions/v2_3_2.config new file mode 100644 index 0000000..2912f15 --- /dev/null +++ b/conf/diann_versions/v2_3_2.config @@ -0,0 +1,14 @@ +/* + * DIA-NN 2.3.2 container override (private ghcr.io) + * Latest release with DDA support and InfinDIA. + */ +params.diann_version = '2.3.2' + +process { + withLabel: diann { + container = 'ghcr.io/bigbio/diann:2.3.2' + } +} + +singularity.enabled = false +docker.enabled = true diff --git a/nextflow.config b/nextflow.config index 88c8c88..8ed056a 100644 --- a/nextflow.config +++ b/nextflow.config @@ -245,6 +245,7 @@ profiles { diann_v1_8_1 { includeConfig 'conf/diann_versions/v1_8_1.config' } diann_v2_1_0 { includeConfig 'conf/diann_versions/v2_1_0.config' } diann_v2_2_0 { includeConfig 'conf/diann_versions/v2_2_0.config' } + diann_v2_3_2 { includeConfig 'conf/diann_versions/v2_3_2.config' } dev { includeConfig 'conf/dev.config' } pride_slurm { includeConfig 'conf/pride_codon_slurm.config' } manual_wave { includeConfig 'conf/wave.config' } From 14ed74831132c0b0ae426223e409f057e5ddcf58 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 3 Apr 2026 16:40:19 +0100 Subject: [PATCH 06/28] feat: add DDA support via --diann_dda flag (#5) - New param diann_dda (boolean, default: false) - Version guard: requires DIA-NN >= 2.3.2 - Passes --dda to all 5 DIA-NN modules when enabled - Accepts DDA acquisition method in SDRF when diann_dda=true - Added --dda to blocked lists in all modules Closes #5 Co-Authored-By: Claude Opus 4.6 (1M context) --- modules.json | 2 +- .../diann/assemble_empirical_library/main.nf | 4 +++- modules/local/diann/final_quantification/main.nf | 4 +++- modules/local/diann/individual_analysis/main.nf | 4 +++- 
.../diann/insilico_library_generation/main.nf | 4 +++- modules/local/diann/preliminary_analysis/main.nf | 4 +++- nextflow.config | 1 + nextflow_schema.json | 6 ++++++ subworkflows/local/create_input_channel/main.nf | 16 +++++++--------- workflows/dia.nf | 6 ++++++ 10 files changed, 36 insertions(+), 15 deletions(-) diff --git a/modules.json b/modules.json index c0e263c..be82844 100644 --- a/modules.json +++ b/modules.json @@ -46,4 +46,4 @@ } } } -} +} \ No newline at end of file diff --git a/modules/local/diann/assemble_empirical_library/main.nf b/modules/local/diann/assemble_empirical_library/main.nf index 6efde30..de7a9e4 100644 --- a/modules/local/diann/assemble_empirical_library/main.nf +++ b/modules/local/diann/assemble_empirical_library/main.nf @@ -32,7 +32,7 @@ process ASSEMBLE_EMPIRICAL_LIBRARY { '--mass-acc', '--mass-acc-ms1', '--window', '--individual-mass-acc', '--individual-windows', '--out-lib', '--use-quant', '--gen-spec-lib', '--rt-profiling', - '--monitor-mod', '--var-mod', '--fixed-mod'] + '--monitor-mod', '--var-mod', '--fixed-mod', '--dda'] // Sort by length descending so longer flags (e.g. --mass-acc-ms1) are matched before shorter prefixes (--mass-acc) blocked.sort { a -> -a.length() }.each { flag -> def flagPattern = '(?<=^|\\s)' + java.util.regex.Pattern.quote(flag) + '(?=\\s|\$)(\\s+(?!-{1,2}[a-zA-Z])\\S+)*' @@ -53,6 +53,7 @@ process ASSEMBLE_EMPIRICAL_LIBRARY { diann_no_peptidoforms = params.diann_no_peptidoforms ? "--no-peptidoforms" : "" diann_tims_sum = params.diann_tims_sum ? "--quant-tims-sum" : "" diann_im_window = params.diann_im_window ? "--im-window $params.diann_im_window" : "" + diann_dda_flag = params.diann_dda ? 
"--dda" : "" """ # Precursor Tolerance value was: ${meta['precursormasstolerance']} @@ -79,6 +80,7 @@ process ASSEMBLE_EMPIRICAL_LIBRARY { ${diann_no_peptidoforms} \\ ${diann_tims_sum} \\ ${diann_im_window} \\ + ${diann_dda_flag} \\ \${mod_flags} \\ $args diff --git a/modules/local/diann/final_quantification/main.nf b/modules/local/diann/final_quantification/main.nf index 42a0a6f..8e6c76d 100644 --- a/modules/local/diann/final_quantification/main.nf +++ b/modules/local/diann/final_quantification/main.nf @@ -48,7 +48,7 @@ process FINAL_QUANTIFICATION { '--use-quant', '--matrices', '--out', '--relaxed-prot-inf', '--pg-level', '--qvalue', '--window', '--individual-windows', '--species-genes', '--report-decoys', '--xic', '--no-norm', - '--monitor-mod', '--var-mod', '--fixed-mod'] + '--monitor-mod', '--var-mod', '--fixed-mod', '--dda'] // Sort by length descending so longer flags (e.g. --individual-windows) are matched before shorter prefixes (--window) blocked.sort { a -> -a.length() }.each { flag -> def flagPattern = '(?<=^|\\s)' + java.util.regex.Pattern.quote(flag) + '(?=\\s|\$)(\\s+(?!-{1,2}[a-zA-Z])\\S+)*' @@ -69,6 +69,7 @@ process FINAL_QUANTIFICATION { quantums_params = params.quantums_params ? "--quant-params $params.quantums_params": "" diann_no_peptidoforms = params.diann_no_peptidoforms ? "--no-peptidoforms" : "" diann_use_quant = params.diann_use_quant ? "--use-quant" : "" + diann_dda_flag = params.diann_dda ? 
"--dda" : "" """ # Notes: if .quant files are passed, mzml/.d files are not accessed, so the name needs to be passed but files @@ -98,6 +99,7 @@ process FINAL_QUANTIFICATION { ${quantums_params} \\ ${diann_no_peptidoforms} \\ ${diann_use_quant} \\ + ${diann_dda_flag} \\ \${mod_flags} \\ $args diff --git a/modules/local/diann/individual_analysis/main.nf b/modules/local/diann/individual_analysis/main.nf index 4a11628..1e24905 100644 --- a/modules/local/diann/individual_analysis/main.nf +++ b/modules/local/diann/individual_analysis/main.nf @@ -28,7 +28,7 @@ process INDIVIDUAL_ANALYSIS { '--mass-acc', '--mass-acc-ms1', '--window', '--no-ifs-removal', '--no-main-report', '--relaxed-prot-inf', '--pg-level', '--min-pr-mz', '--max-pr-mz', '--min-fr-mz', '--max-fr-mz', - '--monitor-mod', '--var-mod', '--fixed-mod'] + '--monitor-mod', '--var-mod', '--fixed-mod', '--dda'] // Sort by length descending so longer flags (e.g. --mass-acc-ms1) are matched before shorter prefixes (--mass-acc) blocked.sort { a -> -a.length() }.each { flag -> def flagPattern = '(?<=^|\\s)' + java.util.regex.Pattern.quote(flag) + '(?=\\s|\$)(\\s+(?!-{1,2}[a-zA-Z])\\S+)*' @@ -82,6 +82,7 @@ process INDIVIDUAL_ANALYSIS { diann_no_peptidoforms = params.diann_no_peptidoforms ? "--no-peptidoforms" : "" diann_tims_sum = params.diann_tims_sum ? "--quant-tims-sum" : "" diann_im_window = params.diann_im_window ? "--im-window $params.diann_im_window" : "" + diann_dda_flag = params.diann_dda ? "--dda" : "" // Per-file scan ranges from SDRF (empty = no flag, DIA-NN auto-detects) min_pr_mz = meta['ms1minmz'] ? 
"--min-pr-mz ${meta['ms1minmz']}" : "" @@ -113,6 +114,7 @@ process INDIVIDUAL_ANALYSIS { ${diann_no_peptidoforms} \\ ${diann_tims_sum} \\ ${diann_im_window} \\ + ${diann_dda_flag} \\ \${mod_flags} \\ $args diff --git a/modules/local/diann/insilico_library_generation/main.nf b/modules/local/diann/insilico_library_generation/main.nf index a26d9c6..321cc2f 100644 --- a/modules/local/diann/insilico_library_generation/main.nf +++ b/modules/local/diann/insilico_library_generation/main.nf @@ -30,7 +30,7 @@ process INSILICO_LIBRARY_GENERATION { '--missed-cleavages', '--min-pep-len', '--max-pep-len', '--min-pr-charge', '--max-pr-charge', '--var-mods', '--min-pr-mz', '--max-pr-mz', '--min-fr-mz', '--max-fr-mz', - '--met-excision', '--monitor-mod'] + '--met-excision', '--monitor-mod', '--dda'] // Sort by length descending so longer flags (e.g. --fasta-search) are matched before shorter prefixes (--fasta, --f) blocked.sort { a -> -a.length() }.each { flag -> def flagPattern = '(?<=^|\\s)' + java.util.regex.Pattern.quote(flag) + '(?=\\s|\$)(\\s+(?!-{1,2}[a-zA-Z])\\S+)*' @@ -46,6 +46,7 @@ process INSILICO_LIBRARY_GENERATION { max_fr_mz = params.max_fr_mz ? "--max-fr-mz $params.max_fr_mz":"" met_excision = params.met_excision ? "--met-excision" : "" diann_no_peptidoforms = params.diann_no_peptidoforms ? "--no-peptidoforms" : "" + diann_dda_flag = params.diann_dda ? 
"--dda" : "" """ diann `cat ${diann_config}` \\ @@ -67,6 +68,7 @@ process INSILICO_LIBRARY_GENERATION { --gen-spec-lib \\ ${diann_no_peptidoforms} \\ ${met_excision} \\ + ${diann_dda_flag} \\ ${args} cp *lib.log.txt silicolibrarygeneration.log diff --git a/modules/local/diann/preliminary_analysis/main.nf b/modules/local/diann/preliminary_analysis/main.nf index df88586..791dab3 100644 --- a/modules/local/diann/preliminary_analysis/main.nf +++ b/modules/local/diann/preliminary_analysis/main.nf @@ -28,7 +28,7 @@ process PRELIMINARY_ANALYSIS { '--mass-acc', '--mass-acc-ms1', '--window', '--quick-mass-acc', '--min-corr', '--corr-diff', '--time-corr-only', '--min-pr-mz', '--max-pr-mz', '--min-fr-mz', '--max-fr-mz', - '--monitor-mod', '--var-mod', '--fixed-mod', '--no-prot-inf'] + '--monitor-mod', '--var-mod', '--fixed-mod', '--no-prot-inf', '--dda'] // Sort by length descending so longer flags (e.g. --mass-acc-ms1) are matched before shorter prefixes (--mass-acc) blocked.sort { a -> -a.length() }.each { flag -> def flagPattern = '(?<=^|\\s)' + java.util.regex.Pattern.quote(flag) + '(?=\\s|\$)(\\s+(?!-{1,2}[a-zA-Z])\\S+)*' @@ -67,6 +67,7 @@ process PRELIMINARY_ANALYSIS { scan_window = params.scan_window_automatic ? '' : "--window $params.scan_window" diann_tims_sum = params.diann_tims_sum ? "--quant-tims-sum" : "" diann_im_window = params.diann_im_window ? "--im-window $params.diann_im_window" : "" + diann_dda_flag = params.diann_dda ? "--dda" : "" // Per-file scan ranges from SDRF (empty = no flag, DIA-NN auto-detects) min_pr_mz = meta['ms1minmz'] ? 
"--min-pr-mz ${meta['ms1minmz']}" : "" @@ -102,6 +103,7 @@ process PRELIMINARY_ANALYSIS { ${diann_tims_sum} \\ ${diann_im_window} \\ --no-prot-inf \\ + ${diann_dda_flag} \\ \${mod_flags} \\ $args diff --git a/nextflow.config b/nextflow.config index 8ed056a..3340937 100644 --- a/nextflow.config +++ b/nextflow.config @@ -55,6 +55,7 @@ params { diann_debug = 3 diann_speclib = null diann_extra_args = null + diann_dda = false // Enable DDA analysis mode (requires DIA-NN >= 2.3.2) // Optional outputs — control which intermediate files are published save_speclib_tsv = false // Save the TSV spectral library from in-silico generation diff --git a/nextflow_schema.json b/nextflow_schema.json index a4e288e..4a65114 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -469,6 +469,12 @@ "hidden": false, "help_text": "Pass additional DIA-NN command-line arguments that will be appended to all DIA-NN steps (INSILICO_LIBRARY_GENERATION, PRELIMINARY_ANALYSIS, ASSEMBLE_EMPIRICAL_LIBRARY, INDIVIDUAL_ANALYSIS, FINAL_QUANTIFICATION). Flags that conflict with a specific step are automatically stripped with a warning. For step-specific overrides, use custom Nextflow config files with ext.args." }, + "diann_dda": { + "type": "boolean", + "description": "Enable DDA (Data-Dependent Acquisition) analysis mode. Passes --dda to all DIA-NN steps. Requires DIA-NN >= 2.3.2 (use -profile diann_v2_3_2). 
Beta feature.", + "fa_icon": "fas fa-flask", + "default": false + }, "save_speclib_tsv": { "type": "boolean", "default": false, diff --git a/subworkflows/local/create_input_channel/main.nf b/subworkflows/local/create_input_channel/main.nf index 588e144..c0249f1 100644 --- a/subworkflows/local/create_input_channel/main.nf +++ b/subworkflows/local/create_input_channel/main.nf @@ -72,18 +72,16 @@ def create_meta_channel(LinkedHashMap row, enzymes, files, wrapper) { exit(1, "ERROR: Please check input file -> File Uri does not exist!\n${filestr}") } - // Validate acquisition method is DIA - // AcquisitionMethod is already extracted by convert-diann (e.g. "Data-Independent Acquisition") + // Validate acquisition method def acqMethod = row.AcquisitionMethod?.toString()?.trim() ?: "" if (acqMethod.toLowerCase().contains("data-independent acquisition") || acqMethod.toLowerCase().contains("dia")) { meta.acquisition_method = "dia" - } - else if (acqMethod.isEmpty()) { - // If no acquisition method column in SDRF, assume DIA (this is a DIA-only pipeline) - meta.acquisition_method = "dia" - } - else { - log.error("This pipeline only supports Data-Independent Acquisition (DIA). Found: '${acqMethod}'. Use the quantms pipeline for DDA workflows.") + } else if (params.diann_dda && (acqMethod.toLowerCase().contains("data-dependent acquisition") || acqMethod.toLowerCase().contains("dda"))) { + meta.acquisition_method = "dda" + } else if (acqMethod.isEmpty()) { + meta.acquisition_method = params.diann_dda ? "dda" : "dia" + } else { + log.error("Unsupported acquisition method: '${acqMethod}'. This pipeline supports DIA" + (params.diann_dda ? " and DDA (--diann_dda)" : "") + ". 
Found in file: ${filestr}") exit(1) } diff --git a/workflows/dia.nf b/workflows/dia.nf index dabbb74..7041d80 100644 --- a/workflows/dia.nf +++ b/workflows/dia.nf @@ -35,6 +35,12 @@ workflow DIA { main: ch_software_versions = channel.empty() + + // Version guard for DDA mode + if (params.diann_dda && params.diann_version < '2.3.2') { + error("DDA mode (--diann_dda) requires DIA-NN >= 2.3.2. Current version: ${params.diann_version}. Use -profile diann_v2_3_2") + } + ch_searchdb = channel.fromPath(params.database, checkIfExists: true) .ifEmpty { error("No protein database found at '${params.database}'. Provide --database ") } .first() From 7f51d00f3cabd13275e4ed3eaea7b238ef160273 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 3 Apr 2026 16:42:13 +0100 Subject: [PATCH 07/28] test: add DDA and skip_preanalysis test configs - test_dda: BSA dataset with diann_dda=true on DIA-NN 2.3.2 - test_dia_skip_preanalysis: tests previously untested skip path Both added to extended_ci.yml stage 2a. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- .github/workflows/extended_ci.yml | 3 +- conf/tests/test_dda.config | 52 +++++++++++++++++++++ conf/tests/test_dia_skip_preanalysis.config | 48 +++++++++++++++++++ modules.json | 2 +- nextflow.config | 2 + 5 files changed, 105 insertions(+), 2 deletions(-) create mode 100644 conf/tests/test_dda.config create mode 100644 conf/tests/test_dia_skip_preanalysis.config diff --git a/.github/workflows/extended_ci.yml b/.github/workflows/extended_ci.yml index 42a9f48..621ab2c 100644 --- a/.github/workflows/extended_ci.yml +++ b/.github/workflows/extended_ci.yml @@ -114,7 +114,8 @@ jobs: strategy: fail-fast: false matrix: - test_profile: ["test_latest_dia", "test_dia_quantums", "test_dia_parquet"] + test_profile: + ["test_latest_dia", "test_dia_quantums", "test_dia_parquet", "test_dda", "test_dia_skip_preanalysis"] env: NXF_ANSI_LOG: false CAPSULE_LOG: none diff --git a/conf/tests/test_dda.config b/conf/tests/test_dda.config new file mode 100644 index 0000000..c0ebb24 --- /dev/null +++ b/conf/tests/test_dda.config @@ -0,0 +1,52 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for testing DDA analysis (requires DIA-NN >= 2.3.2) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Tests DDA mode using the BSA dataset with --diann_dda flag. + Uses ghcr.io/bigbio/diann:2.3.2. + + Use as follows: + nextflow run bigbio/quantmsdiann -profile test_dda,docker [--outdir ] + +------------------------------------------------------------------------------------------------ +*/ + +process { + resourceLimits = [ + cpus: 4, + memory: '12.GB', + time: '48.h' + ] +} + +params { + config_profile_name = 'Test profile for DDA analysis' + config_profile_description = 'DDA test using BSA dataset with DIA-NN 2.3.2.' 
+ + outdir = './results_dda' + + // Input data - BSA DDA dataset + input = 'https://raw.githubusercontent.com/bigbio/quantms-test-datasets/quantms/testdata/lfq_ci/BSA/BSA_design.sdrf.tsv' + database = 'https://raw.githubusercontent.com/bigbio/quantms-test-datasets/quantms/testdata/lfq_ci/BSA/18Protein_SoCe_Tr_detergents_trace.fasta' + + // DDA mode + diann_dda = true + + // Search parameters matching BSA dataset + min_peptide_length = 7 + max_peptide_length = 30 + max_precursor_charge = 3 + allowed_missed_cleavages = 1 + diann_normalize = false + publish_dir_mode = 'symlink' + max_mods = 2 +} + +process { + withLabel: diann { + container = 'ghcr.io/bigbio/diann:2.3.2' + } +} + +singularity.enabled = false +docker.enabled = true diff --git a/conf/tests/test_dia_skip_preanalysis.config b/conf/tests/test_dia_skip_preanalysis.config new file mode 100644 index 0000000..e1968b5 --- /dev/null +++ b/conf/tests/test_dia_skip_preanalysis.config @@ -0,0 +1,48 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for testing skip_preliminary_analysis path +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Tests the pipeline with skip_preliminary_analysis=true, using default + mass accuracy parameters. Validates the untested code path in dia.nf. + + Use as follows: + nextflow run bigbio/quantmsdiann -profile test_dia_skip_preanalysis,docker [--outdir ] + +------------------------------------------------------------------------------------------------ +*/ + +process { + resourceLimits = [ + cpus: 4, + memory: '12.GB', + time: '48.h' + ] +} + +params { + config_profile_name = 'Test profile for skip preliminary analysis' + config_profile_description = 'Tests skip_preliminary_analysis path with default mass accuracy params.' 
+ + outdir = './results_skip_preanalysis' + + // Input data - same as test_dia + input = 'https://raw.githubusercontent.com/bigbio/quantms-test-datasets/quantms/testdata/dia_ci/PXD026600.sdrf.tsv' + database = 'https://raw.githubusercontent.com/bigbio/quantms-test-datasets/quantms/testdata/dia_ci/REF_EColi_K12_UPS1_combined.fasta' + min_pr_mz = 350 + max_pr_mz = 950 + min_fr_mz = 500 + max_fr_mz = 1500 + min_peptide_length = 15 + max_peptide_length = 30 + max_precursor_charge = 3 + allowed_missed_cleavages = 1 + diann_normalize = false + publish_dir_mode = 'symlink' + max_mods = 2 + + // Skip preliminary analysis - use default mass accuracy params + skip_preliminary_analysis = true + mass_acc_ms2 = 15 + mass_acc_ms1 = 15 + scan_window = 8 +} diff --git a/modules.json b/modules.json index be82844..c0e263c 100644 --- a/modules.json +++ b/modules.json @@ -46,4 +46,4 @@ } } } -} \ No newline at end of file +} diff --git a/nextflow.config b/nextflow.config index 3340937..ef0f778 100644 --- a/nextflow.config +++ b/nextflow.config @@ -242,6 +242,8 @@ profiles { test_dia_2_2_0 { includeConfig 'conf/tests/test_dia_2_2_0.config' } test_latest_dia { includeConfig 'conf/tests/test_latest_dia.config' } test_full_dia { includeConfig 'conf/tests/test_full_dia.config' } + test_dda { includeConfig 'conf/tests/test_dda.config' } + test_dia_skip_preanalysis { includeConfig 'conf/tests/test_dia_skip_preanalysis.config' } // DIA-NN version overrides (used by merge_ci.yml matrix) diann_v1_8_1 { includeConfig 'conf/diann_versions/v1_8_1.config' } diann_v2_1_0 { includeConfig 'conf/diann_versions/v2_1_0.config' } From a13d4b3cb8b5ea20268caf84a04e1dd26195a497 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 3 Apr 2026 16:44:56 +0100 Subject: [PATCH 08/28] feat: add --light-models, --export-quant, --site-ms1-quant params (#7) - diann_light_models: 10x faster in-silico library generation - diann_export_quant: fragment-level parquet export - diann_site_ms1_quant: MS1 apex 
intensities for PTM quantification All require DIA-NN >= 2.0. Co-Authored-By: Claude Opus 4.6 (1M context) --- .../local/diann/final_quantification/main.nf | 6 +++++- .../diann/insilico_library_generation/main.nf | 4 +++- nextflow.config | 3 +++ nextflow_schema.json | 18 ++++++++++++++++++ 4 files changed, 29 insertions(+), 2 deletions(-) diff --git a/modules/local/diann/final_quantification/main.nf b/modules/local/diann/final_quantification/main.nf index 8e6c76d..fc8bbd0 100644 --- a/modules/local/diann/final_quantification/main.nf +++ b/modules/local/diann/final_quantification/main.nf @@ -48,7 +48,7 @@ process FINAL_QUANTIFICATION { '--use-quant', '--matrices', '--out', '--relaxed-prot-inf', '--pg-level', '--qvalue', '--window', '--individual-windows', '--species-genes', '--report-decoys', '--xic', '--no-norm', - '--monitor-mod', '--var-mod', '--fixed-mod', '--dda'] + '--monitor-mod', '--var-mod', '--fixed-mod', '--dda', '--export-quant', '--site-ms1-quant'] // Sort by length descending so longer flags (e.g. --individual-windows) are matched before shorter prefixes (--window) blocked.sort { a -> -a.length() }.each { flag -> def flagPattern = '(?<=^|\\s)' + java.util.regex.Pattern.quote(flag) + '(?=\\s|\$)(\\s+(?!-{1,2}[a-zA-Z])\\S+)*' @@ -70,6 +70,8 @@ process FINAL_QUANTIFICATION { diann_no_peptidoforms = params.diann_no_peptidoforms ? "--no-peptidoforms" : "" diann_use_quant = params.diann_use_quant ? "--use-quant" : "" diann_dda_flag = params.diann_dda ? "--dda" : "" + diann_export_quant = params.diann_export_quant ? "--export-quant" : "" + diann_site_ms1_quant = params.diann_site_ms1_quant ? 
"--site-ms1-quant" : "" """ # Notes: if .quant files are passed, mzml/.d files are not accessed, so the name needs to be passed but files @@ -100,6 +102,8 @@ process FINAL_QUANTIFICATION { ${diann_no_peptidoforms} \\ ${diann_use_quant} \\ ${diann_dda_flag} \\ + ${diann_export_quant} \\ + ${diann_site_ms1_quant} \\ \${mod_flags} \\ $args diff --git a/modules/local/diann/insilico_library_generation/main.nf b/modules/local/diann/insilico_library_generation/main.nf index 321cc2f..2937af2 100644 --- a/modules/local/diann/insilico_library_generation/main.nf +++ b/modules/local/diann/insilico_library_generation/main.nf @@ -30,7 +30,7 @@ process INSILICO_LIBRARY_GENERATION { '--missed-cleavages', '--min-pep-len', '--max-pep-len', '--min-pr-charge', '--max-pr-charge', '--var-mods', '--min-pr-mz', '--max-pr-mz', '--min-fr-mz', '--max-fr-mz', - '--met-excision', '--monitor-mod', '--dda'] + '--met-excision', '--monitor-mod', '--dda', '--light-models'] // Sort by length descending so longer flags (e.g. --fasta-search) are matched before shorter prefixes (--fasta, --f) blocked.sort { a -> -a.length() }.each { flag -> def flagPattern = '(?<=^|\\s)' + java.util.regex.Pattern.quote(flag) + '(?=\\s|\$)(\\s+(?!-{1,2}[a-zA-Z])\\S+)*' @@ -47,6 +47,7 @@ process INSILICO_LIBRARY_GENERATION { met_excision = params.met_excision ? "--met-excision" : "" diann_no_peptidoforms = params.diann_no_peptidoforms ? "--no-peptidoforms" : "" diann_dda_flag = params.diann_dda ? "--dda" : "" + diann_light_models = params.diann_light_models ? 
"--light-models" : "" """ diann `cat ${diann_config}` \\ @@ -67,6 +68,7 @@ process INSILICO_LIBRARY_GENERATION { --verbose $params.diann_debug \\ --gen-spec-lib \\ ${diann_no_peptidoforms} \\ + ${diann_light_models} \\ ${met_excision} \\ ${diann_dda_flag} \\ ${args} diff --git a/nextflow.config b/nextflow.config index ef0f778..2ad10a1 100644 --- a/nextflow.config +++ b/nextflow.config @@ -56,6 +56,9 @@ params { diann_speclib = null diann_extra_args = null diann_dda = false // Enable DDA analysis mode (requires DIA-NN >= 2.3.2) + diann_light_models = false // add '--light-models' for 10x faster library generation (DIA-NN >= 2.0) + diann_export_quant = false // add '--export-quant' for fragment-level parquet export (DIA-NN >= 2.0) + diann_site_ms1_quant = false // add '--site-ms1-quant' for MS1 apex PTM quantification (DIA-NN >= 2.0) // Optional outputs — control which intermediate files are published save_speclib_tsv = false // Save the TSV spectral library from in-silico generation diff --git a/nextflow_schema.json b/nextflow_schema.json index 4a65114..4d36360 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -475,6 +475,24 @@ "fa_icon": "fas fa-flask", "default": false }, + "diann_light_models": { + "type": "boolean", + "description": "Enable --light-models for 10x faster in-silico library generation (DIA-NN >= 2.0).", + "fa_icon": "fas fa-bolt", + "default": false + }, + "diann_export_quant": { + "type": "boolean", + "description": "Enable --export-quant for fragment-level parquet data export (DIA-NN >= 2.0).", + "fa_icon": "fas fa-file-export", + "default": false + }, + "diann_site_ms1_quant": { + "type": "boolean", + "description": "Enable --site-ms1-quant to use MS1 apex intensities for PTM site quantification (DIA-NN >= 2.0).", + "fa_icon": "fas fa-crosshairs", + "default": false + }, "save_speclib_tsv": { "type": "boolean", "default": false, From 0c56dcb0bf4589c1d7e610e5862d6830aa448b93 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: 
Fri, 3 Apr 2026 16:46:53 +0100 Subject: [PATCH 09/28] =?UTF-8?q?feat:=20add=20InfinDIA=20groundwork=20?= =?UTF-8?q?=E2=80=94=20enable=5Finfin=5Fdia=20param=20(#10)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Experimental support for InfinDIA (DIA-NN 2.3.0+). Passes --infin-dia to library generation when enabled. Version guard enforces >= 2.3.0. No test config — InfinDIA requires large databases. Co-Authored-By: Claude Opus 4.6 (1M context) --- .../local/diann/insilico_library_generation/main.nf | 7 ++++++- nextflow.config | 4 ++++ nextflow_schema.json | 11 +++++++++++ workflows/dia.nf | 5 +++++ 4 files changed, 26 insertions(+), 1 deletion(-) diff --git a/modules/local/diann/insilico_library_generation/main.nf b/modules/local/diann/insilico_library_generation/main.nf index 2937af2..7de4665 100644 --- a/modules/local/diann/insilico_library_generation/main.nf +++ b/modules/local/diann/insilico_library_generation/main.nf @@ -30,7 +30,8 @@ process INSILICO_LIBRARY_GENERATION { '--missed-cleavages', '--min-pep-len', '--max-pep-len', '--min-pr-charge', '--max-pr-charge', '--var-mods', '--min-pr-mz', '--max-pr-mz', '--min-fr-mz', '--max-fr-mz', - '--met-excision', '--monitor-mod', '--dda', '--light-models'] + '--met-excision', '--monitor-mod', '--dda', '--light-models', + '--infin-dia', '--pre-select'] // Sort by length descending so longer flags (e.g. --fasta-search) are matched before shorter prefixes (--fasta, --f) blocked.sort { a -> -a.length() }.each { flag -> def flagPattern = '(?<=^|\\s)' + java.util.regex.Pattern.quote(flag) + '(?=\\s|\$)(\\s+(?!-{1,2}[a-zA-Z])\\S+)*' @@ -48,6 +49,8 @@ process INSILICO_LIBRARY_GENERATION { diann_no_peptidoforms = params.diann_no_peptidoforms ? "--no-peptidoforms" : "" diann_dda_flag = params.diann_dda ? "--dda" : "" diann_light_models = params.diann_light_models ? "--light-models" : "" + infin_dia_flag = params.enable_infin_dia ? 
"--infin-dia" : "" + pre_select_flag = params.diann_pre_select ? "--pre-select $params.diann_pre_select" : "" """ diann `cat ${diann_config}` \\ @@ -69,6 +72,8 @@ process INSILICO_LIBRARY_GENERATION { --gen-spec-lib \\ ${diann_no_peptidoforms} \\ ${diann_light_models} \\ + ${infin_dia_flag} \\ + ${pre_select_flag} \\ ${met_excision} \\ ${diann_dda_flag} \\ ${args} diff --git a/nextflow.config b/nextflow.config index 2ad10a1..0f1ece2 100644 --- a/nextflow.config +++ b/nextflow.config @@ -60,6 +60,10 @@ params { diann_export_quant = false // add '--export-quant' for fragment-level parquet export (DIA-NN >= 2.0) diann_site_ms1_quant = false // add '--site-ms1-quant' for MS1 apex PTM quantification (DIA-NN >= 2.0) + // DIA-NN: InfinDIA (experimental, v2.3.0+) + enable_infin_dia = false // Enable InfinDIA for ultra-large search spaces + diann_pre_select = null // --pre-select N precursor limit for InfinDIA + // Optional outputs — control which intermediate files are published save_speclib_tsv = false // Save the TSV spectral library from in-silico generation diff --git a/nextflow_schema.json b/nextflow_schema.json index 4d36360..f082614 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -493,6 +493,17 @@ "fa_icon": "fas fa-crosshairs", "default": false }, + "enable_infin_dia": { + "type": "boolean", + "description": "Enable InfinDIA for ultra-large search spaces (DIA-NN >= 2.3.0). Experimental.", + "fa_icon": "fas fa-infinity", + "default": false + }, + "diann_pre_select": { + "type": "integer", + "description": "Set --pre-select N precursor limit for InfinDIA pre-search.", + "fa_icon": "fas fa-filter" + }, "save_speclib_tsv": { "type": "boolean", "default": false, diff --git a/workflows/dia.nf b/workflows/dia.nf index 7041d80..899d226 100644 --- a/workflows/dia.nf +++ b/workflows/dia.nf @@ -41,6 +41,11 @@ workflow DIA { error("DDA mode (--diann_dda) requires DIA-NN >= 2.3.2. Current version: ${params.diann_version}. 
Use -profile diann_v2_3_2") } + // Version guard for InfinDIA + if (params.enable_infin_dia && params.diann_version < '2.3.0') { + error("InfinDIA requires DIA-NN >= 2.3.0. Current version: ${params.diann_version}. Use -profile diann_v2_3_2") + } + ch_searchdb = channel.fromPath(params.database, checkIfExists: true) .ifEmpty { error("No protein database found at '${params.database}'. Provide --database ") } .first() From e69d7f2df75bb2fceeb2e31d19dba7daf438a843 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 3 Apr 2026 16:49:50 +0100 Subject: [PATCH 10/28] docs: add comprehensive parameter reference (#1) Complete reference for all ~70 pipeline parameters grouped by category with types, defaults, descriptions, and version requirements. Closes #1 Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/parameters.md | 162 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 162 insertions(+) create mode 100644 docs/parameters.md diff --git a/docs/parameters.md b/docs/parameters.md new file mode 100644 index 0000000..fb6f229 --- /dev/null +++ b/docs/parameters.md @@ -0,0 +1,162 @@ +# Pipeline Parameters + +Complete reference for all `bigbio/quantmsdiann` pipeline parameters. +Parameters are specified on the command line as `--parameter_name value` or +in a Nextflow config file. + +## Input/Output Options + +| Parameter | Type | Default | Description | +|---|---|---|---| +| `--input` | string | `null` | Path or URI to an SDRF file (.sdrf, .tsv, or .csv). Acquisition method, labelling type, enzyme, and fixed modifications are read exclusively from the SDRF. | +| `--database` | string | `null` | Path to the FASTA protein database. Must not contain decoys for DIA data. | +| `--outdir` | string | `./results` | Output directory where results will be saved. | +| `--publish_dir_mode` | string | `copy` | Method used to save pipeline results. Options: `symlink`, `rellink`, `link`, `copy`, `copyNoFollow`, `move`. 
| +| `--root_folder` | string | `null` | Root folder in which spectrum files specified in the SDRF are searched. Used when files are available locally. | +| `--local_input_type` | string | `mzML` | Override the file type/extension of filenames in the SDRF when using `--root_folder`. Options: `mzML`, `raw`, `d`, `dia`. Compressed variants (.gz, .tar, .tar.gz, .zip) are supported. | + +## SDRF Validation + +| Parameter | Type | Default | Description | +|---|---|---|---| +| `--use_ols_cache_only` | boolean | `true` | Use only the cached Ontology Lookup Service (OLS) for term validation, avoiding network requests. | + +## File Preparation + +| Parameter | Type | Default | Description | +|---|---|---|---| +| `--reindex_mzml` | boolean | `true` | Force re-indexing of input mzML files at the start of the pipeline. Also fixes common issues in slightly incomplete mzMLs. | +| `--mzml_statistics` | boolean | `false` | Compute MS1/MS2 statistics from mzML files. Generates `*_ms_info.parquet` for QC reporting. Bruker .d files are always skipped. | +| `--mzml_features` | boolean | `false` | Compute MS1-level features during the mzML statistics step. Only available for mzML files. | +| `--convert_dotd` | boolean | `false` | Convert Bruker .d files to mzML format before processing. | + +## Search Parameters + +| Parameter | Type | Default | Description | +|---|---|---|---| +| `--met_excision` | boolean | `true` | Account for N-terminal methionine excision during database search. | +| `--allowed_missed_cleavages` | integer | `2` | Maximum number of allowed missed enzyme cleavages per peptide. | +| `--precursor_mass_tolerance` | integer | `5` | Precursor mass tolerance for database search. See also `--precursor_mass_tolerance_unit`. Can be overridden from SDRF. | +| `--precursor_mass_tolerance_unit` | string | `ppm` | Unit for precursor mass tolerance. Options: `ppm`, `Da`. | +| `--fragment_mass_tolerance` | number | `0.03` | Fragment mass tolerance for database search. 
Can be overridden from SDRF. | +| `--fragment_mass_tolerance_unit` | string | `Da` | Unit for fragment mass tolerance. Options: `ppm`, `Da`. | +| `--variable_mods` | string | `Oxidation (M)` | Comma-separated list of variable modifications using Unimod names (e.g. `Oxidation (M),Carbamidomethyl (C)`). Can be overridden from SDRF. | +| `--min_precursor_charge` | integer | `2` | Minimum precursor ion charge. | +| `--max_precursor_charge` | integer | `4` | Maximum precursor ion charge. | +| `--min_peptide_length` | integer | `6` | Minimum peptide length to consider. | +| `--max_peptide_length` | integer | `40` | Maximum peptide length to consider. | +| `--max_mods` | integer | `3` | Maximum number of modifications per peptide. Large values may slow the search considerably. | +| `--min_pr_mz` | number | `400` | Minimum precursor m/z for in-silico library generation or library-free search. | +| `--max_pr_mz` | number | `2400` | Maximum precursor m/z for in-silico library generation or library-free search. | +| `--min_fr_mz` | number | `100` | Minimum fragment m/z for in-silico library generation or library-free search. | +| `--max_fr_mz` | number | `1800` | Maximum fragment m/z for in-silico library generation or library-free search. | + +## DIA-NN General + +| Parameter | Type | Default | Description | +|---|---|---|---| +| `--diann_version` | string | `1.8.1` | DIA-NN version used by the workflow. Controls version-dependent flags (e.g. `--monitor-mod` for 1.8.x). | +| `--diann_debug` | integer | `3` | DIA-NN debug/verbosity level. Allowed values: 0, 1, 2, 3, 4. | +| `--diann_speclib` | string | `null` | Path to an existing spectral library. If provided, the pipeline uses it instead of predicting one from the FASTA. | +| `--diann_extra_args` | string | `null` | Extra command-line arguments appended to all DIA-NN steps. Flags incompatible with a specific step are automatically stripped with a warning. 
| +| `--diann_dda` | boolean | `false` | Enable DDA (Data-Dependent Acquisition) analysis mode. Passes `--dda` to all DIA-NN steps. Requires DIA-NN >= 2.3.2. Beta feature. | +| `--diann_light_models` | boolean | `false` | Enable `--light-models` for 10x faster in-silico library generation. Requires DIA-NN >= 2.0. | +| `--diann_export_quant` | boolean | `false` | Enable `--export-quant` for fragment-level parquet data export. Requires DIA-NN >= 2.0. | +| `--diann_site_ms1_quant` | boolean | `false` | Enable `--site-ms1-quant` to use MS1 apex intensities for PTM site quantification. Requires DIA-NN >= 2.0. | + +## Mass Accuracy and Calibration + +| Parameter | Type | Default | Description | +|---|---|---|---| +| `--mass_acc_automatic` | boolean | `true` | Automatically determine the MS2 mass accuracy setting. | +| `--mass_acc_ms1` | number | `15` | MS1 mass accuracy in ppm. Overrides automatic calibration when set. Maps to DIA-NN `--mass-acc-ms1`. | +| `--mass_acc_ms2` | number | `15` | MS2 mass accuracy in ppm. Overrides automatic calibration when set. Maps to DIA-NN `--mass-acc`. | +| `--scan_window` | integer | `8` | Scan window radius. Ideally approximately equal to the average number of data points per peak. | +| `--scan_window_automatic` | boolean | `true` | Automatically determine the scan window setting. | +| `--quick_mass_acc` | boolean | `true` | Use a fast heuristic algorithm instead of ID-number optimization when choosing MS2 mass accuracy automatically. | +| `--performance_mode` | boolean | `true` | Enable low-RAM/high-speed mode. Adds `--min-corr 2 --corr-diff 1 --time-corr-only` to DIA-NN. | + +## Bruker/timsTOF + +| Parameter | Type | Default | Description | +|---|---|---|---| +| `--diann_tims_sum` | boolean | `false` | Enable `--quant-tims-sum` for slice/scanning timsTOF methods. Highly recommended for Synchro-PASEF. 
| +| `--diann_im_window` | number | `null` | Set `--im-window` to ensure the ion mobility extraction window is not smaller than the specified value. | + +## PTM Localization + +| Parameter | Type | Default | Description | +|---|---|---|---| +| `--enable_mod_localization` | boolean | `false` | Enable modification localization scoring in DIA-NN (`--monitor-mod`). | +| `--mod_localization` | string | `Phospho (S),Phospho (T),Phospho (Y)` | Comma-separated modification names or UniMod accessions for localization (e.g. `UniMod:21,UniMod:1`). | + +## Library Generation + +| Parameter | Type | Default | Description | +|---|---|---|---| +| `--save_speclib_tsv` | boolean | `false` | Publish the human-readable TSV spectral library from the in-silico generation step to the output directory. | + +## Preliminary Analysis + +| Parameter | Type | Default | Description | +|---|---|---|---| +| `--skip_preliminary_analysis` | boolean | `false` | Skip the preliminary analysis step and use the provided spectral library as-is instead of building a local consensus library. | +| `--empirical_assembly_log` | string | `null` | Path to a pre-existing empirical assembly log file. Only used when `--skip_preliminary_analysis true` and `--diann_speclib` are set. | +| `--random_preanalysis` | boolean | `false` | Enable random selection of spectrum files for empirical library generation. | +| `--random_preanalysis_seed` | integer | `42` | Random seed for spectrum file selection when `--random_preanalysis` is enabled. | +| `--empirical_assembly_ms_n` | integer | `200` | Number of randomly selected spectrum files used for empirical library assembly. | + +## Quantification and Output + +| Parameter | Type | Default | Description | +|---|---|---|---| +| `--pg_level` | integer | `2` | Protein inference mode. 0 = isoforms, 1 = protein names from FASTA, 2 = genes. | +| `--species_genes` | boolean | `false` | Add the organism identifier to gene names in DIA-NN output. 
| +| `--diann_normalize` | boolean | `true` | Enable cross-run normalization in DIA-NN. | +| `--diann_report_decoys` | boolean | `false` | Include decoy PSMs in the main .parquet report. | +| `--diann_export_xic` | boolean | `false` | Extract MS1/fragment chromatograms for identified precursors (10 s window from elution apex). | +| `--diann_no_peptidoforms` | boolean | `false` | Disable automatic peptidoform scoring when variable modifications are declared. Not recommended by DIA-NN authors. | +| `--diann_use_quant` | boolean | `true` | Reuse existing .quant files if available during final quantification (`--use-quant`). | +| `--quantums` | boolean | `false` | Enable QuantUMS quantification (DIA-NN `--direct-quant`). | +| `--quantums_train_runs` | string | `null` | Run index range for QuantUMS training (e.g. `0:5`). Maps to `--quant-train-runs`. | +| `--quantums_sel_runs` | integer | `null` | Number of automatically selected runs for QuantUMS training. Must be >= 6. Maps to `--quant-sel-runs`. | +| `--quantums_params` | string | `null` | Pre-calculated QuantUMS parameters. Maps to `--quant-params`. | + +## DDA Mode + +| Parameter | Type | Default | Description | +|---|---|---|---| +| `--diann_dda` | boolean | `false` | Enable DDA analysis mode. Passes `--dda` to all DIA-NN steps. Requires DIA-NN >= 2.3.2 (use `-profile diann_v2_3_2`). This is a beta feature with known limitations; see the usage documentation for details. | + +> **Note:** DDA support requires DIA-NN >= 2.3.2. Enable this profile with +> `-profile diann_v2_3_2`. The DDA mode is experimental and may not support +> all pipeline features available in DIA mode. + +## InfinDIA (Experimental) + +| Parameter | Type | Default | Description | +|---|---|---|---| +| `--enable_infin_dia` | boolean | `false` | Enable InfinDIA for ultra-large search spaces. Requires DIA-NN >= 2.3.0. Experimental. | +| `--diann_pre_select` | integer | `null` | Precursor limit (`--pre-select N`) for InfinDIA pre-search. 
| + +> **Note:** InfinDIA requires DIA-NN >= 2.3.0 and is considered experimental. + +## Quality Control + +| Parameter | Type | Default | Description | +|---|---|---|---| +| `--enable_pmultiqc` | boolean | `true` | Generate a pmultiqc proteomics QC report. | +| `--pmultiqc_idxml_skip` | boolean | `true` | Skip idXML files (do not generate search engine score plots) in the pmultiqc report. | +| `--contaminant_string` | string | `CONT` | Contaminant affix string used by pmultiqc to identify contaminant proteins. | +| `--protein_level_fdr_cutoff` | number | `0.01` | Experiment-wide protein/protein-group-level FDR cutoff. | + +## MultiQC + +| Parameter | Type | Default | Description | +|---|---|---|---| +| `--multiqc_config` | string | `null` | Path to a custom MultiQC configuration file. | +| `--multiqc_title` | string | `null` | Custom title for the MultiQC report. Used as page header and default filename. | +| `--multiqc_logo` | string | `null` | Path to a custom logo file for the MultiQC report. Must also be set in the MultiQC config. | +| `--skip_table_plots` | boolean | `false` | Skip protein/peptide table plots in pmultiqc. Useful for very large datasets. | +| `--max_multiqc_email_size` | string | `25.MB` | Maximum file size for MultiQC report attachments in summary emails. | +| `--multiqc_methods_description` | string | `null` | Path to a custom YAML file containing an HTML methods description for MultiQC. 
| From 52fa59f239434f604a3de3213073f0ef4bc054f7 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 3 Apr 2026 16:52:03 +0100 Subject: [PATCH 11/28] docs: complete usage.md, output.md, citations, README (#1, #3, #9, #15) - DDA mode documentation with limitations - Missing param sections (preprocessing, extra_args scope, verbose output) - DIA-NN version selection guide - Parquet vs TSV output explanation - MSstats format section - pmultiqc citation added - README updated with version table and parameter reference link Closes #3, #9, #15 Co-Authored-By: Claude Opus 4.6 (1M context) --- CITATIONS.md | 4 ++++ README.md | 11 +++++++++ docs/output.md | 8 +++++++ docs/usage.md | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 83 insertions(+) diff --git a/CITATIONS.md b/CITATIONS.md index be290a2..84be394 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -34,6 +34,10 @@ > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. +- [pmultiqc](https://github.com/bigbio/pmultiqc/) + + > Perez-Riverol Y, et al. (2024). pmultiqc: A comprehensive tool for quality control of proteomics data. Nature Methods. + ## Software packaging/containerisation tools - [Anaconda](https://anaconda.com) diff --git a/README.md b/README.md index 02d6f3c..e69211a 100644 --- a/README.md +++ b/README.md @@ -34,6 +34,17 @@ The pipeline takes [SDRF](https://github.com/bigbio/proteomics-metadata-standard 8. **MSstats conversion** — DIA-NN report to MSstats-compatible format 9. 
**Quality control** — interactive QC report via [pmultiqc](https://github.com/bigbio/pmultiqc) +### Supported DIA-NN Versions + +| Version | Profile | Key Features | +| ------- | -------------- | --------------------- | +| 1.8.1 | (default) | Core DIA analysis | +| 2.1.0 | `diann_v2_1_0` | Native .raw support | +| 2.2.0 | `diann_v2_2_0` | Speed optimizations | +| 2.3.2 | `diann_v2_3_2` | DDA support, InfinDIA | + +See [docs/parameters.md](docs/parameters.md) for the full parameter reference. + ## Quick start > [!NOTE] diff --git a/docs/output.md b/docs/output.md index 9af5b47..2cb6c5c 100644 --- a/docs/output.md +++ b/docs/output.md @@ -70,6 +70,14 @@ results/ - `quant_tables/diann_report.unique_genes_matrix.tsv` - Unique gene quantification matrix - `quant_tables/out_msstats_in.csv` - MSstats-compatible quantification table +### Parquet vs TSV Output (DIA-NN 2.0+) + +DIA-NN 2.0+ outputs the main report in Parquet format instead of TSV. The pipeline handles both formats automatically via the `diann_report.{tsv,parquet}` output pattern. Downstream tools (quantms-utils, pmultiqc) support both formats. + +### MSstats-Compatible Output + +The pipeline generates `out_msstats_in.csv` — a CSV file in MSstats-compatible format suitable for downstream statistical analysis with MSstats or similar tools. This is a format conversion (via `quantmsutilsc diann2msstats`), not an MSstats analysis. + ### Optional Output Files These files are not published by default. Enable them with `save_*` parameters or `ext.*` config properties (see [Usage: Optional outputs](usage.md#optional-outputs)). 
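The dual-format handling described under "Parquet vs TSV Output" can be sketched in Bash. This is an illustrative aside, not part of the patch: the temp directory and the touched file only simulate a results folder, and the `diann_report.{tsv,parquet}` names come from the output pattern documented above.

```shell
# Sketch: resolve whichever main report DIA-NN produced (TSV for 1.8.x,
# Parquet for 2.0+). The temp directory and touched file simulate a
# pipeline results folder purely for illustration.
results_dir=$(mktemp -d)
touch "${results_dir}/diann_report.parquet"   # pretend a DIA-NN 2.x run
report=""
for ext in parquet tsv; do
    candidate="${results_dir}/diann_report.${ext}"
    if [ -e "$candidate" ]; then
        report="$candidate"
        break
    fi
done
echo "main report: ${report##*/}"
rm -rf "$results_dir"
```

Checking Parquet before TSV means a 2.x run that happens to leave both files behind still resolves to the newer format.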
diff --git a/docs/usage.md b/docs/usage.md index 181bfd3..ae7c68c 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -54,6 +54,66 @@ For Synchro-PASEF data, enable `--diann_tims_sum` (which adds `--quant-tims-sum` > [!NOTE] > The pipeline will emit a warning during PRELIMINARY_ANALYSIS if it detects `.d` files with automatic mass accuracy calibration enabled, recommending to set tolerances via SDRF or pipeline parameters. +### DDA Analysis Mode (Beta) + +DIA-NN 2.3.2+ supports DDA data analysis via the `--dda` flag. Enable it with: + +```bash +nextflow run bigbio/quantmsdiann \ + --input sdrf.tsv \ + --database proteins.fasta \ + --diann_dda true \ + -profile diann_v2_3_2,docker +``` + +**Limitations (beta feature):** + +- Only trust: q-values, PEP values, RT/IM values, Ms1.Apex.Area, Normalisation.Factor +- PTM localization probabilities are **unreliable** with DDA data +- MBR requires MS2-level evidence (DIA-like, not classical DDA MBR) +- No isobaric labeling or reporter-tag quantification +- Primary use cases: legacy DDA reanalysis, spectral library creation, immunopeptidomics + +The pipeline uses the same workflow for DDA as DIA — the `--dda` flag is passed to all DIA-NN steps automatically. + +### Preprocessing Options + +- `--reindex_mzml` (default: true) — Re-index mzML files before processing. Disable with `--reindex_mzml false` if files are already indexed. +- `--mzml_statistics` (default: false) — Generate mzML statistics (parquet format) for QC. +- `--mzml_features` (default: false) — Enable feature detection in mzML statistics. +- `--convert_dotd` (default: false) — Convert Bruker .d files to mzML via tdf2mzml instead of passing natively to DIA-NN. + +### Passing Extra Arguments to DIA-NN + +Use `--diann_extra_args` to pass additional flags to all DIA-NN steps. The pipeline validates and strips flags it manages internally to prevent conflicts. 
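As a rough illustration of that stripping (the real implementation is a per-flag Groovy regex inside each DIA-NN module; the flag chosen and the argument string below are invented for the example):

```shell
# Sketch: remove one managed flag ('--threads') and its value tokens from
# a user-supplied extra-args string, mirroring what the pipeline's Groovy
# regex does for each blocked flag. Input string is made up.
extra_args="--light-models --threads 8 --min-corr 2"
stripped=$(printf '%s' "$extra_args" | sed -E 's/(^| )--threads( +[^-][^ ]*)*//')
echo "after stripping: $stripped"
```

The `( +[^-][^ ]*)*` part consumes value tokens following the flag but stops at the next `-`-prefixed token, so neighbouring flags survive intact.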
+
+Managed flags (stripped with a warning if passed via extra_args): `--lib`, `--f`, `--fasta`, `--threads`, `--verbose`, `--temp`, `--out`, `--matrices`, `--use-quant`, `--gen-spec-lib`, `--mass-acc`, `--mass-acc-ms1`, `--window`, `--var-mod`, `--fixed-mod`, `--monitor-mod`, and others.
+
+This flag management is configured in `conf/modules/dia.config`, which the pipeline includes by default, so no extra configuration is needed.
+
+### DIA-NN Version Selection
+
+The default DIA-NN version is 1.8.1. To use a different version:
+
+| Version | Profile                 | Features                            |
+| ------- | ----------------------- | ----------------------------------- |
+| 1.8.1   | (default)               | Core DIA analysis                   |
+| 2.1.0   | `-profile diann_v2_1_0` | Native .raw support, reduced memory |
+| 2.2.0   | `-profile diann_v2_2_0` | Speed optimizations                 |
+| 2.3.2   | `-profile diann_v2_3_2` | DDA support, InfinDIA               |
+
+Example: `nextflow run bigbio/quantmsdiann -profile test_dia,docker,diann_v2_2_0`
+
+### Verbose Module Output
+
+Use `-profile verbose_modules` to publish intermediate files from all pipeline steps:
+
+```bash
+nextflow run bigbio/quantmsdiann -profile test_dia,docker,verbose_modules --outdir results
+```
+
+This publishes ThermoRawFileParser conversions, mzML indexing results, per-file DIA-NN logs, and spectral library intermediates.
+
 ### Pipeline settings via params file

 Pipeline settings can be provided in a `yaml` or `json` file via `-params-file `:

From 21fc10caea81099e910216c315a6ea662487e563 Mon Sep 17 00:00:00 2001
From: Yasset Perez-Riverol
Date: Fri, 3 Apr 2026 18:36:35 +0100
Subject: [PATCH 12/28] style: apply prettier formatting to parameters.md

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 docs/parameters.md | 202 ++++++++++++++++++++++-----------------------
 1 file changed, 101 insertions(+), 101 deletions(-)

diff --git a/docs/parameters.md b/docs/parameters.md
index fb6f229..aa0b645 100644
--- a/docs/parameters.md
+++ b/docs/parameters.md
@@ -6,126 +6,126 @@ in a Nextflow config file.
## Input/Output Options -| Parameter | Type | Default | Description | -|---|---|---|---| -| `--input` | string | `null` | Path or URI to an SDRF file (.sdrf, .tsv, or .csv). Acquisition method, labelling type, enzyme, and fixed modifications are read exclusively from the SDRF. | -| `--database` | string | `null` | Path to the FASTA protein database. Must not contain decoys for DIA data. | -| `--outdir` | string | `./results` | Output directory where results will be saved. | -| `--publish_dir_mode` | string | `copy` | Method used to save pipeline results. Options: `symlink`, `rellink`, `link`, `copy`, `copyNoFollow`, `move`. | -| `--root_folder` | string | `null` | Root folder in which spectrum files specified in the SDRF are searched. Used when files are available locally. | -| `--local_input_type` | string | `mzML` | Override the file type/extension of filenames in the SDRF when using `--root_folder`. Options: `mzML`, `raw`, `d`, `dia`. Compressed variants (.gz, .tar, .tar.gz, .zip) are supported. | +| Parameter | Type | Default | Description | +| -------------------- | ------ | ----------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `--input` | string | `null` | Path or URI to an SDRF file (.sdrf, .tsv, or .csv). Acquisition method, labelling type, enzyme, and fixed modifications are read exclusively from the SDRF. | +| `--database` | string | `null` | Path to the FASTA protein database. Must not contain decoys for DIA data. | +| `--outdir` | string | `./results` | Output directory where results will be saved. | +| `--publish_dir_mode` | string | `copy` | Method used to save pipeline results. Options: `symlink`, `rellink`, `link`, `copy`, `copyNoFollow`, `move`. | +| `--root_folder` | string | `null` | Root folder in which spectrum files specified in the SDRF are searched. Used when files are available locally. 
| +| `--local_input_type` | string | `mzML` | Override the file type/extension of filenames in the SDRF when using `--root_folder`. Options: `mzML`, `raw`, `d`, `dia`. Compressed variants (.gz, .tar, .tar.gz, .zip) are supported. | ## SDRF Validation -| Parameter | Type | Default | Description | -|---|---|---|---| -| `--use_ols_cache_only` | boolean | `true` | Use only the cached Ontology Lookup Service (OLS) for term validation, avoiding network requests. | +| Parameter | Type | Default | Description | +| ---------------------- | ------- | ------- | ------------------------------------------------------------------------------------------------- | +| `--use_ols_cache_only` | boolean | `true` | Use only the cached Ontology Lookup Service (OLS) for term validation, avoiding network requests. | ## File Preparation -| Parameter | Type | Default | Description | -|---|---|---|---| -| `--reindex_mzml` | boolean | `true` | Force re-indexing of input mzML files at the start of the pipeline. Also fixes common issues in slightly incomplete mzMLs. | +| Parameter | Type | Default | Description | +| ------------------- | ------- | ------- | ------------------------------------------------------------------------------------------------------------------------------- | +| `--reindex_mzml` | boolean | `true` | Force re-indexing of input mzML files at the start of the pipeline. Also fixes common issues in slightly incomplete mzMLs. | | `--mzml_statistics` | boolean | `false` | Compute MS1/MS2 statistics from mzML files. Generates `*_ms_info.parquet` for QC reporting. Bruker .d files are always skipped. | -| `--mzml_features` | boolean | `false` | Compute MS1-level features during the mzML statistics step. Only available for mzML files. | -| `--convert_dotd` | boolean | `false` | Convert Bruker .d files to mzML format before processing. | +| `--mzml_features` | boolean | `false` | Compute MS1-level features during the mzML statistics step. Only available for mzML files. 
| +| `--convert_dotd` | boolean | `false` | Convert Bruker .d files to mzML format before processing. | ## Search Parameters -| Parameter | Type | Default | Description | -|---|---|---|---| -| `--met_excision` | boolean | `true` | Account for N-terminal methionine excision during database search. | -| `--allowed_missed_cleavages` | integer | `2` | Maximum number of allowed missed enzyme cleavages per peptide. | -| `--precursor_mass_tolerance` | integer | `5` | Precursor mass tolerance for database search. See also `--precursor_mass_tolerance_unit`. Can be overridden from SDRF. | -| `--precursor_mass_tolerance_unit` | string | `ppm` | Unit for precursor mass tolerance. Options: `ppm`, `Da`. | -| `--fragment_mass_tolerance` | number | `0.03` | Fragment mass tolerance for database search. Can be overridden from SDRF. | -| `--fragment_mass_tolerance_unit` | string | `Da` | Unit for fragment mass tolerance. Options: `ppm`, `Da`. | -| `--variable_mods` | string | `Oxidation (M)` | Comma-separated list of variable modifications using Unimod names (e.g. `Oxidation (M),Carbamidomethyl (C)`). Can be overridden from SDRF. | -| `--min_precursor_charge` | integer | `2` | Minimum precursor ion charge. | -| `--max_precursor_charge` | integer | `4` | Maximum precursor ion charge. | -| `--min_peptide_length` | integer | `6` | Minimum peptide length to consider. | -| `--max_peptide_length` | integer | `40` | Maximum peptide length to consider. | -| `--max_mods` | integer | `3` | Maximum number of modifications per peptide. Large values may slow the search considerably. | -| `--min_pr_mz` | number | `400` | Minimum precursor m/z for in-silico library generation or library-free search. | -| `--max_pr_mz` | number | `2400` | Maximum precursor m/z for in-silico library generation or library-free search. | -| `--min_fr_mz` | number | `100` | Minimum fragment m/z for in-silico library generation or library-free search. 
| -| `--max_fr_mz` | number | `1800` | Maximum fragment m/z for in-silico library generation or library-free search. | +| Parameter | Type | Default | Description | +| --------------------------------- | ------- | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | +| `--met_excision` | boolean | `true` | Account for N-terminal methionine excision during database search. | +| `--allowed_missed_cleavages` | integer | `2` | Maximum number of allowed missed enzyme cleavages per peptide. | +| `--precursor_mass_tolerance` | integer | `5` | Precursor mass tolerance for database search. See also `--precursor_mass_tolerance_unit`. Can be overridden from SDRF. | +| `--precursor_mass_tolerance_unit` | string | `ppm` | Unit for precursor mass tolerance. Options: `ppm`, `Da`. | +| `--fragment_mass_tolerance` | number | `0.03` | Fragment mass tolerance for database search. Can be overridden from SDRF. | +| `--fragment_mass_tolerance_unit` | string | `Da` | Unit for fragment mass tolerance. Options: `ppm`, `Da`. | +| `--variable_mods` | string | `Oxidation (M)` | Comma-separated list of variable modifications using Unimod names (e.g. `Oxidation (M),Carbamidomethyl (C)`). Can be overridden from SDRF. | +| `--min_precursor_charge` | integer | `2` | Minimum precursor ion charge. | +| `--max_precursor_charge` | integer | `4` | Maximum precursor ion charge. | +| `--min_peptide_length` | integer | `6` | Minimum peptide length to consider. | +| `--max_peptide_length` | integer | `40` | Maximum peptide length to consider. | +| `--max_mods` | integer | `3` | Maximum number of modifications per peptide. Large values may slow the search considerably. | +| `--min_pr_mz` | number | `400` | Minimum precursor m/z for in-silico library generation or library-free search. | +| `--max_pr_mz` | number | `2400` | Maximum precursor m/z for in-silico library generation or library-free search. 
| +| `--min_fr_mz` | number | `100` | Minimum fragment m/z for in-silico library generation or library-free search. | +| `--max_fr_mz` | number | `1800` | Maximum fragment m/z for in-silico library generation or library-free search. | ## DIA-NN General -| Parameter | Type | Default | Description | -|---|---|---|---| -| `--diann_version` | string | `1.8.1` | DIA-NN version used by the workflow. Controls version-dependent flags (e.g. `--monitor-mod` for 1.8.x). | -| `--diann_debug` | integer | `3` | DIA-NN debug/verbosity level. Allowed values: 0, 1, 2, 3, 4. | -| `--diann_speclib` | string | `null` | Path to an existing spectral library. If provided, the pipeline uses it instead of predicting one from the FASTA. | -| `--diann_extra_args` | string | `null` | Extra command-line arguments appended to all DIA-NN steps. Flags incompatible with a specific step are automatically stripped with a warning. | -| `--diann_dda` | boolean | `false` | Enable DDA (Data-Dependent Acquisition) analysis mode. Passes `--dda` to all DIA-NN steps. Requires DIA-NN >= 2.3.2. Beta feature. | -| `--diann_light_models` | boolean | `false` | Enable `--light-models` for 10x faster in-silico library generation. Requires DIA-NN >= 2.0. | -| `--diann_export_quant` | boolean | `false` | Enable `--export-quant` for fragment-level parquet data export. Requires DIA-NN >= 2.0. | -| `--diann_site_ms1_quant` | boolean | `false` | Enable `--site-ms1-quant` to use MS1 apex intensities for PTM site quantification. Requires DIA-NN >= 2.0. | +| Parameter | Type | Default | Description | +| ------------------------ | ------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------- | +| `--diann_version` | string | `1.8.1` | DIA-NN version used by the workflow. Controls version-dependent flags (e.g. `--monitor-mod` for 1.8.x). | +| `--diann_debug` | integer | `3` | DIA-NN debug/verbosity level. 
Allowed values: 0, 1, 2, 3, 4. | +| `--diann_speclib` | string | `null` | Path to an existing spectral library. If provided, the pipeline uses it instead of predicting one from the FASTA. | +| `--diann_extra_args` | string | `null` | Extra command-line arguments appended to all DIA-NN steps. Flags incompatible with a specific step are automatically stripped with a warning. | +| `--diann_dda` | boolean | `false` | Enable DDA (Data-Dependent Acquisition) analysis mode. Passes `--dda` to all DIA-NN steps. Requires DIA-NN >= 2.3.2. Beta feature. | +| `--diann_light_models` | boolean | `false` | Enable `--light-models` for 10x faster in-silico library generation. Requires DIA-NN >= 2.0. | +| `--diann_export_quant` | boolean | `false` | Enable `--export-quant` for fragment-level parquet data export. Requires DIA-NN >= 2.0. | +| `--diann_site_ms1_quant` | boolean | `false` | Enable `--site-ms1-quant` to use MS1 apex intensities for PTM site quantification. Requires DIA-NN >= 2.0. | ## Mass Accuracy and Calibration -| Parameter | Type | Default | Description | -|---|---|---|---| -| `--mass_acc_automatic` | boolean | `true` | Automatically determine the MS2 mass accuracy setting. | -| `--mass_acc_ms1` | number | `15` | MS1 mass accuracy in ppm. Overrides automatic calibration when set. Maps to DIA-NN `--mass-acc-ms1`. | -| `--mass_acc_ms2` | number | `15` | MS2 mass accuracy in ppm. Overrides automatic calibration when set. Maps to DIA-NN `--mass-acc`. | -| `--scan_window` | integer | `8` | Scan window radius. Ideally approximately equal to the average number of data points per peak. | -| `--scan_window_automatic` | boolean | `true` | Automatically determine the scan window setting. | -| `--quick_mass_acc` | boolean | `true` | Use a fast heuristic algorithm instead of ID-number optimization when choosing MS2 mass accuracy automatically. | -| `--performance_mode` | boolean | `true` | Enable low-RAM/high-speed mode. 
Adds `--min-corr 2 --corr-diff 1 --time-corr-only` to DIA-NN. | +| Parameter | Type | Default | Description | +| ------------------------- | ------- | ------- | --------------------------------------------------------------------------------------------------------------- | +| `--mass_acc_automatic` | boolean | `true` | Automatically determine the MS2 mass accuracy setting. | +| `--mass_acc_ms1` | number | `15` | MS1 mass accuracy in ppm. Overrides automatic calibration when set. Maps to DIA-NN `--mass-acc-ms1`. | +| `--mass_acc_ms2` | number | `15` | MS2 mass accuracy in ppm. Overrides automatic calibration when set. Maps to DIA-NN `--mass-acc`. | +| `--scan_window` | integer | `8` | Scan window radius. Ideally approximately equal to the average number of data points per peak. | +| `--scan_window_automatic` | boolean | `true` | Automatically determine the scan window setting. | +| `--quick_mass_acc` | boolean | `true` | Use a fast heuristic algorithm instead of ID-number optimization when choosing MS2 mass accuracy automatically. | +| `--performance_mode` | boolean | `true` | Enable low-RAM/high-speed mode. Adds `--min-corr 2 --corr-diff 1 --time-corr-only` to DIA-NN. | ## Bruker/timsTOF -| Parameter | Type | Default | Description | -|---|---|---|---| -| `--diann_tims_sum` | boolean | `false` | Enable `--quant-tims-sum` for slice/scanning timsTOF methods. Highly recommended for Synchro-PASEF. | -| `--diann_im_window` | number | `null` | Set `--im-window` to ensure the ion mobility extraction window is not smaller than the specified value. | +| Parameter | Type | Default | Description | +| ------------------- | ------- | ------- | ------------------------------------------------------------------------------------------------------- | +| `--diann_tims_sum` | boolean | `false` | Enable `--quant-tims-sum` for slice/scanning timsTOF methods. Highly recommended for Synchro-PASEF. 
| +| `--diann_im_window` | number | `null` | Set `--im-window` to ensure the ion mobility extraction window is not smaller than the specified value. | ## PTM Localization -| Parameter | Type | Default | Description | -|---|---|---|---| -| `--enable_mod_localization` | boolean | `false` | Enable modification localization scoring in DIA-NN (`--monitor-mod`). | -| `--mod_localization` | string | `Phospho (S),Phospho (T),Phospho (Y)` | Comma-separated modification names or UniMod accessions for localization (e.g. `UniMod:21,UniMod:1`). | +| Parameter | Type | Default | Description | +| --------------------------- | ------- | ------------------------------------- | ----------------------------------------------------------------------------------------------------- | +| `--enable_mod_localization` | boolean | `false` | Enable modification localization scoring in DIA-NN (`--monitor-mod`). | +| `--mod_localization` | string | `Phospho (S),Phospho (T),Phospho (Y)` | Comma-separated modification names or UniMod accessions for localization (e.g. `UniMod:21,UniMod:1`). | ## Library Generation -| Parameter | Type | Default | Description | -|---|---|---|---| +| Parameter | Type | Default | Description | +| -------------------- | ------- | ------- | ----------------------------------------------------------------------------------------------------------- | | `--save_speclib_tsv` | boolean | `false` | Publish the human-readable TSV spectral library from the in-silico generation step to the output directory. | ## Preliminary Analysis -| Parameter | Type | Default | Description | -|---|---|---|---| -| `--skip_preliminary_analysis` | boolean | `false` | Skip the preliminary analysis step and use the provided spectral library as-is instead of building a local consensus library. | -| `--empirical_assembly_log` | string | `null` | Path to a pre-existing empirical assembly log file. Only used when `--skip_preliminary_analysis true` and `--diann_speclib` are set. 
| -| `--random_preanalysis` | boolean | `false` | Enable random selection of spectrum files for empirical library generation. | -| `--random_preanalysis_seed` | integer | `42` | Random seed for spectrum file selection when `--random_preanalysis` is enabled. | -| `--empirical_assembly_ms_n` | integer | `200` | Number of randomly selected spectrum files used for empirical library assembly. | +| Parameter | Type | Default | Description | +| ----------------------------- | ------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------ | +| `--skip_preliminary_analysis` | boolean | `false` | Skip the preliminary analysis step and use the provided spectral library as-is instead of building a local consensus library. | +| `--empirical_assembly_log` | string | `null` | Path to a pre-existing empirical assembly log file. Only used when `--skip_preliminary_analysis true` and `--diann_speclib` are set. | +| `--random_preanalysis` | boolean | `false` | Enable random selection of spectrum files for empirical library generation. | +| `--random_preanalysis_seed` | integer | `42` | Random seed for spectrum file selection when `--random_preanalysis` is enabled. | +| `--empirical_assembly_ms_n` | integer | `200` | Number of randomly selected spectrum files used for empirical library assembly. | ## Quantification and Output -| Parameter | Type | Default | Description | -|---|---|---|---| -| `--pg_level` | integer | `2` | Protein inference mode. 0 = isoforms, 1 = protein names from FASTA, 2 = genes. | -| `--species_genes` | boolean | `false` | Add the organism identifier to gene names in DIA-NN output. | -| `--diann_normalize` | boolean | `true` | Enable cross-run normalization in DIA-NN. | -| `--diann_report_decoys` | boolean | `false` | Include decoy PSMs in the main .parquet report. 
| -| `--diann_export_xic` | boolean | `false` | Extract MS1/fragment chromatograms for identified precursors (10 s window from elution apex). | +| Parameter | Type | Default | Description | +| ------------------------- | ------- | ------- | ------------------------------------------------------------------------------------------------------------------ | +| `--pg_level` | integer | `2` | Protein inference mode. 0 = isoforms, 1 = protein names from FASTA, 2 = genes. | +| `--species_genes` | boolean | `false` | Add the organism identifier to gene names in DIA-NN output. | +| `--diann_normalize` | boolean | `true` | Enable cross-run normalization in DIA-NN. | +| `--diann_report_decoys` | boolean | `false` | Include decoy PSMs in the main .parquet report. | +| `--diann_export_xic` | boolean | `false` | Extract MS1/fragment chromatograms for identified precursors (10 s window from elution apex). | | `--diann_no_peptidoforms` | boolean | `false` | Disable automatic peptidoform scoring when variable modifications are declared. Not recommended by DIA-NN authors. | -| `--diann_use_quant` | boolean | `true` | Reuse existing .quant files if available during final quantification (`--use-quant`). | -| `--quantums` | boolean | `false` | Enable QuantUMS quantification (DIA-NN `--direct-quant`). | -| `--quantums_train_runs` | string | `null` | Run index range for QuantUMS training (e.g. `0:5`). Maps to `--quant-train-runs`. | -| `--quantums_sel_runs` | integer | `null` | Number of automatically selected runs for QuantUMS training. Must be >= 6. Maps to `--quant-sel-runs`. | -| `--quantums_params` | string | `null` | Pre-calculated QuantUMS parameters. Maps to `--quant-params`. | +| `--diann_use_quant` | boolean | `true` | Reuse existing .quant files if available during final quantification (`--use-quant`). | +| `--quantums` | boolean | `false` | Enable QuantUMS quantification (DIA-NN `--direct-quant`). 
| +| `--quantums_train_runs` | string | `null` | Run index range for QuantUMS training (e.g. `0:5`). Maps to `--quant-train-runs`. | +| `--quantums_sel_runs` | integer | `null` | Number of automatically selected runs for QuantUMS training. Must be >= 6. Maps to `--quant-sel-runs`. | +| `--quantums_params` | string | `null` | Pre-calculated QuantUMS parameters. Maps to `--quant-params`. | ## DDA Mode -| Parameter | Type | Default | Description | -|---|---|---|---| +| Parameter | Type | Default | Description | +| ------------- | ------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `--diann_dda` | boolean | `false` | Enable DDA analysis mode. Passes `--dda` to all DIA-NN steps. Requires DIA-NN >= 2.3.2 (use `-profile diann_v2_3_2`). This is a beta feature with known limitations; see the usage documentation for details. | > **Note:** DDA support requires DIA-NN >= 2.3.2. Enable this profile with @@ -134,29 +134,29 @@ in a Nextflow config file. ## InfinDIA (Experimental) -| Parameter | Type | Default | Description | -|---|---|---|---| +| Parameter | Type | Default | Description | +| -------------------- | ------- | ------- | -------------------------------------------------------------------------------------- | | `--enable_infin_dia` | boolean | `false` | Enable InfinDIA for ultra-large search spaces. Requires DIA-NN >= 2.3.0. Experimental. | -| `--diann_pre_select` | integer | `null` | Precursor limit (`--pre-select N`) for InfinDIA pre-search. | +| `--diann_pre_select` | integer | `null` | Precursor limit (`--pre-select N`) for InfinDIA pre-search. | > **Note:** InfinDIA requires DIA-NN >= 2.3.0 and is considered experimental. ## Quality Control -| Parameter | Type | Default | Description | -|---|---|---|---| -| `--enable_pmultiqc` | boolean | `true` | Generate a pmultiqc proteomics QC report. 
| -| `--pmultiqc_idxml_skip` | boolean | `true` | Skip idXML files (do not generate search engine score plots) in the pmultiqc report. | -| `--contaminant_string` | string | `CONT` | Contaminant affix string used by pmultiqc to identify contaminant proteins. | -| `--protein_level_fdr_cutoff` | number | `0.01` | Experiment-wide protein/protein-group-level FDR cutoff. | +| Parameter | Type | Default | Description | +| ---------------------------- | ------- | ------- | ------------------------------------------------------------------------------------ | +| `--enable_pmultiqc` | boolean | `true` | Generate a pmultiqc proteomics QC report. | +| `--pmultiqc_idxml_skip` | boolean | `true` | Skip idXML files (do not generate search engine score plots) in the pmultiqc report. | +| `--contaminant_string` | string | `CONT` | Contaminant affix string used by pmultiqc to identify contaminant proteins. | +| `--protein_level_fdr_cutoff` | number | `0.01` | Experiment-wide protein/protein-group-level FDR cutoff. | ## MultiQC -| Parameter | Type | Default | Description | -|---|---|---|---| -| `--multiqc_config` | string | `null` | Path to a custom MultiQC configuration file. | -| `--multiqc_title` | string | `null` | Custom title for the MultiQC report. Used as page header and default filename. | -| `--multiqc_logo` | string | `null` | Path to a custom logo file for the MultiQC report. Must also be set in the MultiQC config. | -| `--skip_table_plots` | boolean | `false` | Skip protein/peptide table plots in pmultiqc. Useful for very large datasets. | -| `--max_multiqc_email_size` | string | `25.MB` | Maximum file size for MultiQC report attachments in summary emails. | -| `--multiqc_methods_description` | string | `null` | Path to a custom YAML file containing an HTML methods description for MultiQC. 
| +| Parameter | Type | Default | Description | +| ------------------------------- | ------- | ------- | ------------------------------------------------------------------------------------------ | +| `--multiqc_config` | string | `null` | Path to a custom MultiQC configuration file. | +| `--multiqc_title` | string | `null` | Custom title for the MultiQC report. Used as page header and default filename. | +| `--multiqc_logo` | string | `null` | Path to a custom logo file for the MultiQC report. Must also be set in the MultiQC config. | +| `--skip_table_plots` | boolean | `false` | Skip protein/peptide table plots in pmultiqc. Useful for very large datasets. | +| `--max_multiqc_email_size` | string | `25.MB` | Maximum file size for MultiQC report attachments in summary emails. | +| `--multiqc_methods_description` | string | `null` | Path to a custom YAML file containing an HTML methods description for MultiQC. | From 2cb774081933689b343f320b7a8ea9e6ac1d6c91 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 3 Apr 2026 18:38:34 +0100 Subject: [PATCH 13/28] chore: remove internal planning doc to resolve merge conflict Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/v1-release-roadmap.md | 282 ------------------------------------- 1 file changed, 282 deletions(-) delete mode 100644 docs/v1-release-roadmap.md diff --git a/docs/v1-release-roadmap.md b/docs/v1-release-roadmap.md deleted file mode 100644 index 1961f20..0000000 --- a/docs/v1-release-roadmap.md +++ /dev/null @@ -1,282 +0,0 @@ -# quantmsdiann v1.0.0 Release Roadmap — Design Spec - -**Date:** 2026-04-03 -**Author:** Yasset Perez-Riverol + Claude -**Status:** Approved -**Issues covered:** #1, #2, #3, #5, #7, #9, #10, #15, #17 - ---- - -## Overview - -Comprehensive pre-release work for quantmsdiann v1.0.0 covering robustness fixes, DDA support via DIA-NN 2.3.2, new feature parameters, and documentation. Four-week timeline. - -## Architecture - -No new workflows or modules. 
The existing `workflows/dia.nf` pipeline handles both DIA and DDA since DIA-NN uses the same steps with `--dda` appended. Default container stays 1.8.1; 2.3.2 is the "latest" option. - -``` -SDRF_PARSING -> FILE_PREPARATION -> INSILICO_LIBRARY -> PRELIMINARY_ANALYSIS -> -ASSEMBLE_EMPIRICAL -> INDIVIDUAL_ANALYSIS -> FINAL_QUANTIFICATION -> DIANN_MSSTATS -> PMULTIQC -``` - -DDA parallelization is identical to DIA — per-file parallel for PRELIMINARY_ANALYSIS and INDIVIDUAL_ANALYSIS, synchronization points at ASSEMBLE_EMPIRICAL and FINAL_QUANTIFICATION. Confirmed by DIA-NN author in vdemichev/DiaNN#1727. - ---- - -## Week 1: Robustness Fixes - -### 1.1 Fix tee pipes masking failures - -Add `set -o pipefail` or `exit ${PIPESTATUS[0]}` to script blocks in: - -- `modules/local/diann/generate_cfg/main.nf` -- `modules/local/diann/diann_msstats/main.nf` -- `modules/local/samplesheet_check/main.nf` -- `modules/local/sdrf_parsing/main.nf` - -**Risk:** Without this fix, if `quantmsutilsc` or `parse_sdrf` fails, the Nextflow task appears to succeed because `tee` returns exit code 0. - -### 1.2 Add error retry to long-running DIA-NN tasks - -Add `label 'error_retry'` to: - -- PRELIMINARY_ANALYSIS (process_high) -- INDIVIDUAL_ANALYSIS (process_high) -- FINAL_QUANTIFICATION (process_high) -- INSILICO_LIBRARY_GENERATION (process_medium) -- ASSEMBLE_EMPIRICAL_LIBRARY (process_medium) - -These are the longest-running tasks and most susceptible to transient failures (OOM, I/O timeouts). - -### 1.3 Empty input guards - -- `subworkflows/local/create_input_channel/main.nf` — Fail fast if SDRF has 0 data rows after `splitCsv`. Add a `.count()` check with error message. -- `workflows/dia.nf` — Guard `.first()` calls on `ch_searchdb` and `ch_experiment_meta` with `.ifEmpty { error("...") }` to prevent indefinite hangs on empty inputs. 
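The tee-masking behavior called out in 1.1 is easy to reproduce in plain bash, independent of the pipeline (illustrative sketch only, not part of the patch):

```shell
# Without pipefail, the pipeline's exit status is tee's (0), so the
# failure of the first command is invisible to the caller (e.g. Nextflow).
set +o pipefail
false | tee /dev/null
echo "without pipefail: $?"   # prints 0

# With pipefail, the non-zero status of `false` propagates.
set -o pipefail
false | tee /dev/null || echo "with pipefail: $?"   # prints a non-zero status
```

This is why `set -o pipefail` must come before any `command | tee` line in the affected script blocks.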
- -### 1.4 New test configs - -**`conf/tests/test_dia_skip_preanalysis.config`:** - -- Sets `skip_preliminary_analysis = true` -- Uses default `mass_acc_ms1`, `mass_acc_ms2`, `scan_window` params -- Same PXD026600 test data as test_dia -- Validates the skip path that is currently untested in CI - -**`conf/tests/test_dia_speclib.config`:** - -- Sets `diann_speclib` to a pre-built spectral library -- Skips INSILICO_LIBRARY_GENERATION (the `if` branch in dia.nf line 55-56) -- Requires a small test spectral library in quantms-test-datasets (or generated from existing test data) - -Both configs added to `extended_ci.yml` stage 2a. - ---- - -## Week 2: Container Build + DDA Support - -### 2.1 Container (PR to bigbio/quantms-containers) - -Build and push `ghcr.io/bigbio/diann:2.3.2` from existing Dockerfile at `quantms-containers/diann-2.3.2/Dockerfile`. The Dockerfile downloads `DIA-NN-2.3.2-Academia-Linux.zip` from the official GitHub release. - -### 2.2 Version config - -Add `conf/diann_versions/v2_3_2.config`: - -```groovy -params.diann_version = '2.3.2' -process { - withLabel: diann { - container = 'ghcr.io/bigbio/diann:2.3.2' - } -} -singularity.enabled = false -docker.enabled = true -``` - -Add profile in `nextflow.config`: - -```groovy -diann_v2_3_2 { includeConfig 'conf/diann_versions/v2_3_2.config' } -``` - -### 2.3 DDA implementation - -**New param** in `nextflow.config`: - -```groovy -diann_dda = false // Enable DDA analysis mode (requires DIA-NN >= 2.3.2) -``` - -**Version guard** in `workflows/dia.nf` at workflow start: - -```groovy -if (params.diann_dda && params.diann_version < '2.3.2') { - error("DDA mode requires DIA-NN >= 2.3.2. Current version: ${params.diann_version}. Use -profile diann_v2_3_2") -} -``` - -**Pass `--dda` to all DIA-NN modules** — In each module's script block, add: - -```groovy -diann_dda_flag = params.diann_dda ? "--dda" : "" -``` - -And append `${diann_dda_flag}` to the DIA-NN command. 
Add `'--dda'` to the `blocked` list in all 5 modules. - -**Accept DDA in create_input_channel** — Modify `create_input_channel/main.nf` lines 78-88: - -```groovy -if (acqMethod.toLowerCase().contains("data-independent acquisition") || acqMethod.toLowerCase().contains("dia")) { - meta.acquisition_method = "dia" -} else if (params.diann_dda && (acqMethod.toLowerCase().contains("data-dependent acquisition") || acqMethod.toLowerCase().contains("dda"))) { - meta.acquisition_method = "dda" -} else if (acqMethod.isEmpty()) { - meta.acquisition_method = params.diann_dda ? "dda" : "dia" -} else { - log.error("Unsupported acquisition method: '${acqMethod}'. ...") - exit(1) -} -``` - -### 2.4 Test data (PR to bigbio/quantms-test-datasets) - -Add `comment[proteomics data acquisition method]` column with value `NT=Data-Dependent Acquisition;AC=PRIDE:0000627` to `testdata/lfq_ci/BSA/BSA_design.sdrf.tsv`. The sdrf-pipelines `convert-diann` already extracts this column correctly — no sdrf-pipelines changes needed. 
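To make the 2.4 change concrete, here is a minimal, hypothetical two-line SDRF fragment carrying the acquisition-method comment column (the file name and sample value are illustrative; only the column header and `NT=...;AC=...` value format come from the plan above). `awk` pulls out the cell that `convert-diann` reads:

```shell
# Build a tiny tab-separated SDRF fragment (hypothetical sample row).
printf 'source name\tcomment[proteomics data acquisition method]\nBSA_1\tNT=Data-Dependent Acquisition;AC=PRIDE:0000627\n' > mini.sdrf.tsv

# Extract the acquisition-method cell from the first data row.
awk -F'\t' 'NR == 2 { print $2 }' mini.sdrf.tsv
# → NT=Data-Dependent Acquisition;AC=PRIDE:0000627
```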
- -### 2.5 Test config - -**`conf/tests/test_dda.config`:** - -- Points to BSA dataset from `bigbio/quantms-test-datasets/testdata/lfq_ci/BSA/` -- Sets `diann_dda = true` -- Pins to `ghcr.io/bigbio/diann:2.3.2` -- Added to `extended_ci.yml` stage 2a (private containers) - -### 2.6 Schema + blocked list - -- Add `diann_dda` to `nextflow_schema.json` with description and version note -- Add `'--dda'` to blocked lists in all 5 DIA-NN modules - ---- - -## Week 3: Features - -### 3.1 New DIA-NN parameters - -| Parameter | Flag | Min Version | Module | Default | -| ---------------------- | ------------------ | ----------- | --------------------------- | --------------- | -| `diann_light_models` | `--light-models` | 2.0 | INSILICO_LIBRARY_GENERATION | false | -| `diann_export_quant` | `--export-quant` | 2.0 | FINAL_QUANTIFICATION | false | -| `diann_read_threads` | `--read-threads N` | 2.0 | All DIA-NN steps | null (disabled) | -| `diann_site_ms1_quant` | `--site-ms1-quant` | 2.0 | FINAL_QUANTIFICATION | false | - -Each parameter: add to `nextflow.config`, `nextflow_schema.json`, module script block (with version guard where needed), and module blocked list. - -### 3.2 InfinDIA groundwork (issue #10) - -New params: - -- `enable_infin_dia` (boolean, default: false) — requires >= 2.3.0 -- `diann_pre_select` (integer, optional) — `--pre-select N` precursor limit - -Implementation: - -- Pass `--infin-dia` to INSILICO_LIBRARY_GENERATION when enabled -- Version guard: error if enabled with DIA-NN < 2.3.0 -- No test config — InfinDIA needs large databases to be meaningful -- Document as experimental/advanced feature - -### 3.3 Close resolved issues - -- **#17** (phospho monitor-mod) — Already implemented via `diann_config.cfg` extraction. Close with explanation. -- **#2** (param consolidation) — Superseded by #4 (Phase 6). Close as duplicate. -- **#3** (ext.args documentation) — Close with documentation update in Week 4. 
- ---- - -## Week 4: Documentation - -### 4.1 Create `docs/parameters.md` - -Comprehensive parameter reference with all ~70 params grouped by: - -- Input/output options -- File preparation (conversion, indexing, statistics) -- DIA-NN general settings -- Mass accuracy and calibration -- Library generation -- Quantification and output -- DDA mode -- InfinDIA (experimental) -- Quality control (pmultiqc) -- MultiQC options -- Boilerplate (nf-core standard) - -Each param: name, type, default, description, version requirement (if any). - -### 4.2 Complete `docs/usage.md` - -Add missing sections: - -- Preprocessing params (`reindex_mzml`, `mzml_statistics`, `convert_dotd`) -- QC params (`enable_pmultiqc`, `skip_table_plots`, `contaminant_string`) -- MultiQC options -- DDA mode with limitations -- InfinDIA (basic) -- `diann_extra_args` scope per module (closes #3) -- `--verbose_modules` profile -- Container version override guide (closes #9) -- Singularity usage with image caching -- SLURM example (from `pride_codon_slurm.config`) -- AWS/cloud basics (Wave profile) - -### 4.3 Update `docs/output.md` - -- Intermediate outputs under `--verbose_modules` -- Parquet vs TSV output explanation (DIA-NN 2.0+) -- MSstats format section - -### 4.4 Housekeeping - -- Add pmultiqc to `CITATIONS.md` -- Fix #15 (docs mismatch for `--input`) -- Update README with DIA-NN version table and link to parameter reference -- Close #1 (documentation issue), #9 (container docs), #15 (input mismatch) - ---- - -## Issues Status After Release - -| Issue | Status | Resolution | -| ----- | ---------------- | ---------------------------------------------------- | -| #1 | Closed | Parameter documentation created | -| #2 | Closed | Superseded by #4 | -| #3 | Closed | ext.args scope documented | -| #5 | Closed | DDA support implemented | -| #7 | Closed | Phase 2 features wired | -| #9 | Closed | Container docs added | -| #10 | Partially closed | InfinDIA groundwork done, full support needs testing | -| #15 
| Closed | Docs mismatch fixed | -| #17 | Closed | Already implemented | -| #4 | Open | Blocked on sdrf-pipelines converter release | -| #6 | Open | Blocked on PRIDE ontology release | -| #25 | Open | QPX deferred to next release | - ---- - -## External PRs Required - -1. **bigbio/quantms-containers** — Build and push `ghcr.io/bigbio/diann:2.3.2` -2. **bigbio/quantms-test-datasets** — Add `comment[proteomics data acquisition method]` column to BSA SDRF - ---- - -## Success Criteria - -- `nf-core pipelines lint --release` passes with 0 failures -- `pre-commit run --all-files` passes -- All existing CI tests still pass (test_dia, test_dia_dotd, etc.) -- New tests pass: test_dia_skip_preanalysis, test_dia_speclib, test_dda -- DDA test completes with BSA dataset on DIA-NN 2.3.2 -- `docs/parameters.md` covers all params in `nextflow_schema.json` -- `docs/usage.md` covers all major use cases From 046c7a1e1ce1d544ad7ae8ce20545c4c110bf025 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 3 Apr 2026 18:42:22 +0100 Subject: [PATCH 14/28] fix: address PR #32 review comments - Add version guard for DIA-NN 2.0+ params (--light-models, --export-quant, --site-ms1-quant) to prevent crashes with 1.8.1 - Add *.site_report.parquet as optional output in FINAL_QUANTIFICATION for site-level PTM quantification Co-Authored-By: Claude Opus 4.6 (1M context) --- modules/local/diann/final_quantification/main.nf | 1 + workflows/dia.nf | 9 +++++++++ 2 files changed, 10 insertions(+) diff --git a/modules/local/diann/final_quantification/main.nf b/modules/local/diann/final_quantification/main.nf index fc8bbd0..0c14f9a 100644 --- a/modules/local/diann/final_quantification/main.nf +++ b/modules/local/diann/final_quantification/main.nf @@ -35,6 +35,7 @@ process FINAL_QUANTIFICATION { // Different library files format are exported due to different DIA-NN versions path "empirical_library.tsv", emit: final_speclib, optional: true path "empirical_library.tsv.skyline.speclib", emit: 
skyline_speclib, optional: true + path "*.site_report.parquet", emit: site_report, optional: true path "versions.yml", emit: versions when: diff --git a/workflows/dia.nf b/workflows/dia.nf index 899d226..635295a 100644 --- a/workflows/dia.nf +++ b/workflows/dia.nf @@ -46,6 +46,15 @@ workflow DIA { error("InfinDIA requires DIA-NN >= 2.3.0. Current version: ${params.diann_version}. Use -profile diann_v2_3_2") } + // Version guard for DIA-NN 2.0+ features + if ((params.diann_light_models || params.diann_export_quant || params.diann_site_ms1_quant) && params.diann_version < '2.0') { + def enabled = [] + if (params.diann_light_models) enabled << '--light-models' + if (params.diann_export_quant) enabled << '--export-quant' + if (params.diann_site_ms1_quant) enabled << '--site-ms1-quant' + error("${enabled.join(', ')} require DIA-NN >= 2.0. Current version: ${params.diann_version}. Use -profile diann_v2_1_0 or later") + } + ch_searchdb = channel.fromPath(params.database, checkIfExists: true) .ifEmpty { error("No protein database found at '${params.database}'. Provide --database ") } .first() From 1a3610a7680300fa310408a2b3ed03de63391372 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 3 Apr 2026 18:54:11 +0100 Subject: [PATCH 15/28] fix: critical DDA bugs — missing version param and channel routing MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 1. test_dda.config: Add diann_version = '2.3.2' so the version guard doesn't reject DDA mode (default is 1.8.1, guard requires >= 2.3.2) 2. quantmsdiann.nf: Update branch condition to also match "dda" acquisition method. Previously "dda".contains("dia") was false, causing all DDA files to be silently dropped from processing.
Co-Authored-By: Claude Opus 4.6 (1M context) --- conf/tests/test_dda.config | 1 + workflows/quantmsdiann.nf | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/conf/tests/test_dda.config b/conf/tests/test_dda.config index c0ebb24..d9d7896 100644 --- a/conf/tests/test_dda.config +++ b/conf/tests/test_dda.config @@ -31,6 +31,7 @@ params { // DDA mode diann_dda = true + diann_version = '2.3.2' // Search parameters matching BSA dataset min_peptide_length = 7 diff --git a/workflows/quantmsdiann.nf b/workflows/quantmsdiann.nf index 9d869ac..0e31703 100644 --- a/workflows/quantmsdiann.nf +++ b/workflows/quantmsdiann.nf @@ -64,7 +64,7 @@ workflow QUANTMSDIANN { FILE_PREPARATION.out.results .branch { item -> - dia: item[0].acquisition_method.toLowerCase().contains("dia") + dia: item[0].acquisition_method.toLowerCase().contains("dia") || item[0].acquisition_method.toLowerCase().contains("dda") } .set { ch_fileprep_result } // From 7a20d7dad90e228234bfe204f8db84d18e7b2c0c Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Sat, 4 Apr 2026 07:32:24 +0100 Subject: [PATCH 16/28] fix: make --no-ifs-removal and --no-main-report version-conditional These flags exist in DIA-NN 1.8.x but were removed in 2.3.x, causing 'unrecognised option' warnings. Only pass them for versions < 2.3. Co-Authored-By: Claude Opus 4.6 (1M context) --- modules/local/diann/individual_analysis/main.nf | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/modules/local/diann/individual_analysis/main.nf b/modules/local/diann/individual_analysis/main.nf index 1e24905..4371b9c 100644 --- a/modules/local/diann/individual_analysis/main.nf +++ b/modules/local/diann/individual_analysis/main.nf @@ -84,6 +84,10 @@ process INDIVIDUAL_ANALYSIS { diann_im_window = params.diann_im_window ? "--im-window $params.diann_im_window" : "" diann_dda_flag = params.diann_dda ? 
"--dda" : "" + // Flags removed in DIA-NN 2.3.x — only pass for older versions + no_ifs_removal = params.diann_version < '2.3' ? "--no-ifs-removal" : "" + no_main_report = params.diann_version < '2.3' ? "--no-main-report" : "" + // Per-file scan ranges from SDRF (empty = no flag, DIA-NN auto-detects) min_pr_mz = meta['ms1minmz'] ? "--min-pr-mz ${meta['ms1minmz']}" : "" max_pr_mz = meta['ms1maxmz'] ? "--max-pr-mz ${meta['ms1maxmz']}" : "" @@ -103,8 +107,8 @@ process INDIVIDUAL_ANALYSIS { --mass-acc ${mass_acc_ms2} \\ --mass-acc-ms1 ${mass_acc_ms1} \\ --window ${scan_window} \\ - --no-ifs-removal \\ - --no-main-report \\ + ${no_ifs_removal} \\ + ${no_main_report} \\ --relaxed-prot-inf \\ --pg-level $params.pg_level \\ ${min_pr_mz} \\ From 1a929ae854862e27efda47663ee008b6c50be40b Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Mon, 6 Apr 2026 14:04:41 +0100 Subject: [PATCH 17/28] docs: update Zenodo DOI to 10.5281/zenodo.19437128 Co-Authored-By: Claude Opus 4.6 (1M context) --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 67c2de6..99ef33d 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ [![GitHub Actions CI Status](https://github.com/bigbio/quantmsdiann/actions/workflows/ci.yml/badge.svg)](https://github.com/bigbio/quantmsdiann/actions/workflows/ci.yml) [![GitHub Actions Linting Status](https://github.com/bigbio/quantmsdiann/actions/workflows/linting.yml/badge.svg)](https://github.com/bigbio/quantmsdiann/actions/workflows/linting.yml) -[![Cite with Zenodo](https://zenodo.org/badge/DOI/10.5281/zenodo.15573386.svg)](https://doi.org/10.5281/zenodo.15573386) +[![Cite with Zenodo](https://zenodo.org/badge/DOI/10.5281/zenodo.19437128.svg)](https://doi.org/10.5281/zenodo.19437128) [![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com) 
[![Nextflow](https://img.shields.io/badge/version-%E2%89%A525.04.0-green?style=flat&logo=nextflow&logoColor=white&color=%230DC09D&link=https%3A%2F%2Fnextflow.io)](https://www.nextflow.io/) @@ -103,7 +103,7 @@ If you would like to contribute to this pipeline, please see the [contributing g If you use quantmsdiann in your research, please cite: -> Dai et al. "quantms: a cloud-based pipeline for quantitative proteomics" (2024). DOI: [10.5281/zenodo.15573386](https://doi.org/10.5281/zenodo.15573386) +> Dai et al. "quantms: a cloud-based pipeline for quantitative proteomics" (2024). DOI: [10.5281/zenodo.19437128](https://doi.org/10.5281/zenodo.19437128) An extensive list of references for the tools used by the pipeline can be found in the [CITATIONS.md](CITATIONS.md) file. From bd80512dd91f51564f1cf96a17da533ec074c341 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Mon, 6 Apr 2026 21:10:20 +0100 Subject: [PATCH 18/28] fix: remove tdf2mzml module and references from documentation Bruker .d to mzML conversion via tdf2mzml is no longer needed. Co-Authored-By: Claude Opus 4.6 (1M context) --- AGENTS.md | 4 +- README.md | 2 +- modules/local/utils/tdf2mzml/main.nf | 38 ------------------ modules/local/utils/tdf2mzml/meta.yml | 42 -------------------- subworkflows/local/file_preparation/meta.yml | 1 - 5 files changed, 3 insertions(+), 84 deletions(-) delete mode 100644 modules/local/utils/tdf2mzml/main.nf delete mode 100644 modules/local/utils/tdf2mzml/meta.yml diff --git a/AGENTS.md b/AGENTS.md index 2c32d31..982fcfe 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -80,7 +80,7 @@ quantmsdiann/ │ ├── pmultiqc/ # QC reporting │ ├── sdrf_parsing/ # SDRF parsing │ ├── samplesheet_check/ # Input validation -│ └── utils/ # tdf2mzml, decompress, mzml stats +│ └── utils/ # decompress, mzml stats ├── conf/ │ ├── base.config # Resource definitions │ ├── modules/ # Module-specific configs @@ -97,7 +97,7 @@ quantmsdiann/ The pipeline executes the following steps: 1. 
**SDRF Validation & Parsing** - Validates input SDRF and extracts metadata -2. **File Preparation** - Converts RAW/mzML/.d/.dia files (ThermoRawFileParser, tdf2mzml) +2. **File Preparation** - Converts RAW/mzML/.d/.dia files (ThermoRawFileParser) 3. **Generate Config** - Creates DIA-NN config from enzyme/modifications (`quantmsutilsc dianncfg`) 4. **In-Silico Library Generation** - Predicts spectral library from FASTA (or uses provided library) 5. **Preliminary Analysis** - Per-file calibration and mass accuracy determination diff --git a/README.md b/README.md index 99ef33d..ee88b0e 100644 --- a/README.md +++ b/README.md @@ -25,7 +25,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool The pipeline takes [SDRF](https://github.com/bigbio/proteomics-metadata-standard) metadata and mass spectrometry data files (`.raw`, `.mzML`, `.d`, `.dia`) as input and performs: 1. **Input validation** — SDRF parsing and validation via [sdrf-pipelines](https://github.com/bigbio/sdrf-pipelines) -2. **File preparation** — RAW to mzML conversion ([ThermoRawFileParser](https://github.com/compomics/ThermoRawFileParser)), indexing, Bruker `.d` handling ([tdf2mzml](https://github.com/bigbio/tdf2mzml)) +2. **File preparation** — RAW to mzML conversion ([ThermoRawFileParser](https://github.com/compomics/ThermoRawFileParser)), indexing 3. **In-silico spectral library generation** — deep learning-based prediction, or use a user-provided library (`--diann_speclib`) 4. **Preliminary analysis** — per-file calibration and mass accuracy estimation (parallelized) 5. 
**Empirical library assembly** — consensus library from preliminary results with RT profiling diff --git a/modules/local/utils/tdf2mzml/main.nf b/modules/local/utils/tdf2mzml/main.nf deleted file mode 100644 index a242935..0000000 --- a/modules/local/utils/tdf2mzml/main.nf +++ /dev/null @@ -1,38 +0,0 @@ -process TDF2MZML { - tag "$meta.id" - label 'process_single' - label 'error_retry' - - container 'quay.io/bigbio/tdf2mzml:latest' // TODO: pin to a specific version tag for reproducibility - - input: - tuple val(meta), path(rawfile) - - output: - tuple val(meta), path("*.mzML"), emit: mzmls_converted - path "versions.yml", emit: versions - path "*.log", emit: log - - script: - def args = task.ext.args ?: '' - def prefix = task.ext.prefix ?: "${meta.id}" - - """ - echo "Converting..." | tee --append ${rawfile.baseName}_conversion.log - tdf2mzml.py -i *.d $args 2>&1 | tee --append ${rawfile.baseName}_conversion.log - - # Rename .mzml to .mzML via temp file to handle case-insensitive filesystems (e.g. macOS) - mv *.mzml __tmp_converted.mzML && mv __tmp_converted.mzML ${file(rawfile.baseName).baseName}.mzML - - # Rename .d directory only if the name differs (avoid 'same file' error) - target_d="${file(rawfile.baseName).baseName}.d" - if [ ! 
-d "\${target_d}" ]; then - mv *.d "\${target_d}" - fi - - cat <<-END_VERSIONS > versions.yml - "${task.process}": - tdf2mzml.py: \$(tdf2mzml.py --version) - END_VERSIONS - """ -} diff --git a/modules/local/utils/tdf2mzml/meta.yml b/modules/local/utils/tdf2mzml/meta.yml deleted file mode 100644 index ebb90b8..0000000 --- a/modules/local/utils/tdf2mzml/meta.yml +++ /dev/null @@ -1,42 +0,0 @@ -name: tdf2mzml -description: convert raw bruker files to mzml files -keywords: - - raw - - mzML - - .d -tools: - - tdf2mzml: - description: | - It takes a bruker .d raw file as input and outputs indexed mzML - homepage: https://github.com/mafreitas/tdf2mzml - documentation: https://github.com/mafreitas/tdf2mzml -input: - - meta: - type: map - description: | - Groovy Map containing sample information - - rawfile: - type: file - description: | - Bruker .d raw directory - pattern: "*.d" -output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'MD5', enzyme:trypsin ] - - mzml: - type: file - description: indexed mzML - pattern: "*.mzML" - - log: - type: file - description: log file - pattern: "*.log" - - version: - type: file - description: File containing software version - pattern: "versions.yml" -authors: - - "@jspaezp" diff --git a/subworkflows/local/file_preparation/meta.yml b/subworkflows/local/file_preparation/meta.yml index 54211c7..54d34fc 100644 --- a/subworkflows/local/file_preparation/meta.yml +++ b/subworkflows/local/file_preparation/meta.yml @@ -8,7 +8,6 @@ keywords: - proteomics components: - thermorawfileparser - - tdf2mzml - decompress - mzml/indexing - mzml/statistics From 66ece2699e848d80504da562c38a7cd6d1d8f628 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Mon, 6 Apr 2026 21:12:31 +0100 Subject: [PATCH 19/28] chore: bump version to 1.0.1dev for next release cycle Co-Authored-By: Claude Opus 4.6 (1M context) --- nextflow.config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git 
a/nextflow.config b/nextflow.config index 33c5c95..fe29615 100644 --- a/nextflow.config +++ b/nextflow.config @@ -362,7 +362,7 @@ manifest { mainScript = 'main.nf' defaultBranch = 'main' nextflowVersion = '!>=25.04.0' - version = '1.0.0' + version = '1.0.1dev' doi = '10.5281/zenodo.15573386' } From 1dd2786a5dcff7ebeb385dfa97f8dbc8870bdb91 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Tue, 7 Apr 2026 07:24:21 +0100 Subject: [PATCH 20/28] style: fix prettier formatting in parameters.md Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/parameters.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/parameters.md b/docs/parameters.md index b95bdd1..0a3eac8 100644 --- a/docs/parameters.md +++ b/docs/parameters.md @@ -58,10 +58,10 @@ This document lists every pipeline parameter organised by category. Default valu | `--diann_debug` | integer | `3` | DIA-NN debug/verbosity level (0-4). Higher values produce more verbose logs. | | `--diann_speclib` | string | `null` | Path to an external spectral library. If provided, the in-silico library generation step is skipped. | | `--diann_extra_args` | string | `null` | Extra arguments appended to all DIA-NN steps. Flags incompatible with a step are automatically stripped with a warning. See [Passing Extra Arguments to DIA-NN](usage.md#passing-extra-arguments-to-dia-nn). | -| `--diann_dda` | boolean | `false` | Enable DDA (Data-Dependent Acquisition) analysis mode. Passes `--dda` to all DIA-NN steps. Requires DIA-NN >= 2.3.2. Beta feature. | -| `--diann_light_models` | boolean | `false` | Enable `--light-models` for 10x faster in-silico library generation. Requires DIA-NN >= 2.0. | -| `--diann_export_quant` | boolean | `false` | Enable `--export-quant` for fragment-level parquet data export. Requires DIA-NN >= 2.0. | -| `--diann_site_ms1_quant` | boolean | `false` | Enable `--site-ms1-quant` to use MS1 apex intensities for PTM site quantification. Requires DIA-NN >= 2.0. 
| +| `--diann_dda` | boolean | `false` | Enable DDA (Data-Dependent Acquisition) analysis mode. Passes `--dda` to all DIA-NN steps. Requires DIA-NN >= 2.3.2. Beta feature. | +| `--diann_light_models` | boolean | `false` | Enable `--light-models` for 10x faster in-silico library generation. Requires DIA-NN >= 2.0. | +| `--diann_export_quant` | boolean | `false` | Enable `--export-quant` for fragment-level parquet data export. Requires DIA-NN >= 2.0. | +| `--diann_site_ms1_quant` | boolean | `false` | Enable `--site-ms1-quant` to use MS1 apex intensities for PTM site quantification. Requires DIA-NN >= 2.0. | ## 6. Mass Accuracy & Calibration @@ -102,8 +102,8 @@ This document lists every pipeline parameter organised by category. Default valu | `--skip_preliminary_analysis` | boolean | `false` | Skip preliminary analysis. Use the provided spectral library as-is instead of generating a local consensus library. | | `--empirical_assembly_log` | string | `null` | Path to a pre-existing empirical assembly log file. Only used when `--skip_preliminary_analysis true` and `--diann_speclib` are set. | | `--random_preanalysis` | boolean | `false` | Enable random selection of spectrum files for empirical library generation. | -| `--random_preanalysis_seed` | integer | `42` | Random seed for file selection when `--random_preanalysis` is enabled. | -| `--empirical_assembly_ms_n` | integer | `200` | Number of randomly selected spectrum files when `--random_preanalysis` is enabled. | +| `--random_preanalysis_seed` | integer | `42` | Random seed for file selection when `--random_preanalysis` is enabled. | +| `--empirical_assembly_ms_n` | integer | `200` | Number of randomly selected spectrum files when `--random_preanalysis` is enabled. | ## 11. 
Quantification & Output From 9d9d44405e3d582269c982cdaa41d37d98244244 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Tue, 7 Apr 2026 08:52:06 +0100 Subject: [PATCH 21/28] fix: add GHCR login to CI for test_dda private container The test_dda profile uses ghcr.io/bigbio/diann:2.3.2 which is a private container requiring authentication. Add Docker login step (matching merge_ci.yml) conditioned on test_dda profile. Co-Authored-By: Claude Opus 4.6 (1M context) --- .github/workflows/ci.yml | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 20bd668..aadf1dd 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -60,6 +60,15 @@ jobs: mkdir -p $NXF_SINGULARITY_CACHEDIR mkdir -p $NXF_SINGULARITY_LIBRARYDIR + - name: Log in to GitHub Container Registry + if: matrix.test_profile == 'test_dda' + env: + GHCR_TOKEN: ${{ secrets.GHCR_TOKEN }} + run: | + if [ -n "$GHCR_TOKEN" ]; then + echo "${{ secrets.GHCR_TOKEN }}" | docker login ghcr.io -u ${{ secrets.GHCR_USERNAME }} --password-stdin + fi + - name: Disk space cleanup uses: jlumbroso/free-disk-space@v1.3.1 From 5f716f61e2bb17866bc5c14249573466ff25a1d4 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Tue, 7 Apr 2026 09:17:15 +0100 Subject: [PATCH 22/28] fix: remove plan docs, add semantic version comparison utility - Remove implementation plan from repo, add docs/plans/ to .gitignore - Add lib/VersionUtils.groovy for semantic version comparison (prevents string comparison bugs like '2.10.0' < '2.3') - Update all version guards in dia.nf and module scripts to use VersionUtils.versionLessThan/versionAtLeast Co-Authored-By: Claude Opus 4.6 (1M context) --- .gitignore | 1 + .../2026-04-03-v1-release-implementation.md | 805 ------------------ lib/VersionUtils.groovy | 34 + .../local/diann/final_quantification/main.nf | 2 +- .../local/diann/individual_analysis/main.nf | 4 +- workflows/dia.nf | 6 +- 6 files changed, 41 insertions(+), 
811 deletions(-) delete mode 100644 docs/plans/2026-04-03-v1-release-implementation.md create mode 100644 lib/VersionUtils.groovy diff --git a/.gitignore b/.gitignore index 52812e6..6a54d55 100644 --- a/.gitignore +++ b/.gitignore @@ -21,3 +21,4 @@ null/ .codacy/ .github/instructions/codacy.instructions.md docs/superpowers/ +docs/plans/ diff --git a/docs/plans/2026-04-03-v1-release-implementation.md b/docs/plans/2026-04-03-v1-release-implementation.md deleted file mode 100644 index 8439513..0000000 --- a/docs/plans/2026-04-03-v1-release-implementation.md +++ /dev/null @@ -1,805 +0,0 @@ -# quantmsdiann v1.0.0 Release — Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Prepare quantmsdiann for a robust v1.0.0 release with DDA support, new DIA-NN parameters, and comprehensive documentation. - -**Architecture:** No new workflows or modules. All changes are additions to existing files — new params, flags, guards, test configs, and docs. DDA uses the same pipeline as DIA with `--dda` appended to all DIA-NN invocations. Default container stays 1.8.1; 2.3.2 is opt-in via profile. - -**Tech Stack:** Nextflow DSL2, nf-core, DIA-NN, Groovy, Bash - ---- - -## Task 1: Fix tee pipes masking failures - -**Files:** - -- Modify: `modules/local/diann/generate_cfg/main.nf:26` -- Modify: `modules/local/diann/diann_msstats/main.nf:21-26` -- Modify: `modules/local/samplesheet_check/main.nf:38-43` -- Modify: `modules/local/sdrf_parsing/main.nf:24-30` - -- [ ] **Step 1: Add pipefail to generate_cfg** - -In `modules/local/diann/generate_cfg/main.nf`, find the `"""` opening the script block (line 20) and add `set -o pipefail` as the first line: - -````groovy - """ - set -o pipefail - parse_sdrf generate-diann-cfg \\ - ... 
- ``` - -- [ ] **Step 2: Add pipefail to diann_msstats** - -In `modules/local/diann/diann_msstats/main.nf`, find the `"""` opening the script block (line 20) and add `set -o pipefail`: - -```groovy - """ - set -o pipefail - quantmsutilsc diann2msstats \\ - ... - ``` - -- [ ] **Step 3: Add pipefail to samplesheet_check** - -In `modules/local/samplesheet_check/main.nf`, find the `"""` opening the script block and add `set -o pipefail`: - -```groovy - """ - set -o pipefail - ... - ``` - -- [ ] **Step 4: Add pipefail to sdrf_parsing** - -In `modules/local/sdrf_parsing/main.nf`, find the `"""` opening the script block (line 22) and add `set -o pipefail`: - -```groovy - """ - set -o pipefail - parse_sdrf convert-diann \\ - ... - ``` - -- [ ] **Step 5: Validate and commit** - -```bash -conda run -n nfcore pre-commit run --all-files -git add modules/local/diann/generate_cfg/main.nf modules/local/diann/diann_msstats/main.nf modules/local/samplesheet_check/main.nf modules/local/sdrf_parsing/main.nf -git commit -m "fix: add pipefail to all modules with tee pipes - -Without pipefail, if the command before tee fails, tee returns 0 and -the Nextflow task appears to succeed. This masked failures in -generate_cfg, diann_msstats, samplesheet_check, and sdrf_parsing." -```` - ---- - -## Task 2: Add error retry to long-running DIA-NN tasks - -**Files:** - -- Modify: `modules/local/diann/preliminary_analysis/main.nf:3-4` -- Modify: `modules/local/diann/individual_analysis/main.nf:3-4` -- Modify: `modules/local/diann/final_quantification/main.nf:3-4` -- Modify: `modules/local/diann/insilico_library_generation/main.nf:3-4` -- Modify: `modules/local/diann/assemble_empirical_library/main.nf:3-4` - -- [ ] **Step 1: Add error_retry label to all 5 DIA-NN modules** - -In each file, add `label 'error_retry'` after the existing labels. 
For example, `preliminary_analysis/main.nf` currently has: - -```groovy - label 'process_high' - label 'diann' -``` - -Change to: - -```groovy - label 'process_high' - label 'diann' - label 'error_retry' -``` - -Do the same for: - -- `individual_analysis/main.nf` (after `label 'diann'`) -- `final_quantification/main.nf` (after `label 'diann'`) -- `insilico_library_generation/main.nf` (after `label 'diann'`) -- `assemble_empirical_library/main.nf` (after `label 'diann'`) - -- [ ] **Step 2: Validate and commit** - -```bash -conda run -n nfcore pre-commit run --all-files -git add modules/local/diann/preliminary_analysis/main.nf modules/local/diann/individual_analysis/main.nf modules/local/diann/final_quantification/main.nf modules/local/diann/insilico_library_generation/main.nf modules/local/diann/assemble_empirical_library/main.nf -git commit -m "fix: add error_retry label to all DIA-NN analysis modules - -These are the longest-running tasks and most susceptible to transient -failures (OOM, I/O timeouts). The error_retry label enables automatic -retry on signal exits (130-145, 104, 175)." -``` - ---- - -## Task 3: Add empty input guards - -**Files:** - -- Modify: `workflows/dia.nf:38,46` - -- [ ] **Step 1: Guard ch_searchdb with ifEmpty** - -In `workflows/dia.nf`, line 38, change: - -```groovy - ch_searchdb = channel.fromPath(params.database, checkIfExists: true).first() -``` - -To: - -```groovy - ch_searchdb = channel.fromPath(params.database, checkIfExists: true) - .ifEmpty { error("No protein database found at '${params.database}'. Provide --database ") } - .first() -``` - -- [ ] **Step 2: Guard ch_experiment_meta with ifEmpty** - -In `workflows/dia.nf`, line 46, change: - -```groovy - ch_experiment_meta = ch_result.meta.unique { m -> m.experiment_id }.first() -``` - -To: - -```groovy - ch_experiment_meta = ch_result.meta.unique { m -> m.experiment_id } - .ifEmpty { error("No valid input files found after SDRF parsing. 
Check your SDRF file and input paths.") } - .first() -``` - -- [ ] **Step 3: Validate and commit** - -```bash -conda run -n nfcore pre-commit run --all-files -git add workflows/dia.nf -git commit -m "fix: add empty input guards to prevent silent pipeline hangs - -Guard ch_searchdb and ch_experiment_meta with ifEmpty to fail fast -with clear error messages instead of hanging indefinitely." -``` - ---- - -## Task 4: Add DIA-NN 2.3.2 version config and profile - -**Files:** - -- Create: `conf/diann_versions/v2_3_2.config` -- Modify: `nextflow.config:245-247` (profiles section) - -- [ ] **Step 1: Create v2_3_2.config** - -Create `conf/diann_versions/v2_3_2.config`: - -```groovy -/* - * DIA-NN 2.3.2 container override (private ghcr.io) - * Latest release with DDA support and InfinDIA. - */ -params.diann_version = '2.3.2' - -process { - withLabel: diann { - container = 'ghcr.io/bigbio/diann:2.3.2' - } -} - -singularity.enabled = false -docker.enabled = true -``` - -- [ ] **Step 2: Add profile to nextflow.config** - -In `nextflow.config`, after the `diann_v2_2_0` profile line (around line 247), add: - -```groovy - diann_v2_3_2 { includeConfig 'conf/diann_versions/v2_3_2.config' } -``` - -- [ ] **Step 3: Validate and commit** - -```bash -conda run -n nfcore pre-commit run --all-files -git add conf/diann_versions/v2_3_2.config nextflow.config -git commit -m "feat: add DIA-NN 2.3.2 version config and profile - -Adds conf/diann_versions/v2_3_2.config with ghcr.io/bigbio/diann:2.3.2 -container. Use -profile diann_v2_3_2 to opt in. Default stays 1.8.1. -Enables DDA support and InfinDIA features." 
-``` - ---- - -## Task 5: Implement DDA support — params, version guard, flag passthrough - -**Files:** - -- Modify: `nextflow.config:53-57` (DIA-NN general params) -- Modify: `nextflow_schema.json` (DIA-NN section) -- Modify: `workflows/dia.nf:35-38` (version guard) -- Modify: `subworkflows/local/create_input_channel/main.nf:75-88` (acquisition method) -- Modify: `modules/local/diann/insilico_library_generation/main.nf` (blocked list + flag) -- Modify: `modules/local/diann/preliminary_analysis/main.nf` (blocked list + flag) -- Modify: `modules/local/diann/assemble_empirical_library/main.nf` (blocked list + flag) -- Modify: `modules/local/diann/individual_analysis/main.nf` (blocked list + flag) -- Modify: `modules/local/diann/final_quantification/main.nf` (blocked list + flag) - -- [ ] **Step 1: Add diann_dda param to nextflow.config** - -In `nextflow.config`, after `diann_extra_args = null` (line 57), add: - -```groovy - diann_dda = false // Enable DDA analysis mode (requires DIA-NN >= 2.3.2) -``` - -- [ ] **Step 2: Add diann_dda to nextflow_schema.json** - -In `nextflow_schema.json`, in the DIA-NN section (inside `"$defs"` > appropriate group), add: - -```json -"diann_dda": { - "type": "boolean", - "description": "Enable DDA (Data-Dependent Acquisition) analysis mode. Passes --dda to all DIA-NN steps. Requires DIA-NN >= 2.3.2 (use -profile diann_v2_3_2). Beta feature — only trust q-values, PEP, RT/IM, Ms1.Apex.Area. PTM localization unreliable with DDA.", - "fa_icon": "fas fa-flask", - "default": false -} -``` - -Add `"diann_dda"` to the corresponding `"required"` or `"properties"` list in the appropriate group. - -- [ ] **Step 3: Add version guard in workflows/dia.nf** - -In `workflows/dia.nf`, at the start of the `main:` block (after line 37), add: - -```groovy - // Version guard for DDA mode - if (params.diann_dda && params.diann_version < '2.3.2') { - error("DDA mode (--diann_dda) requires DIA-NN >= 2.3.2. Current version: ${params.diann_version}. 
Use -profile diann_v2_3_2") - } -``` - -- [ ] **Step 4: Accept DDA acquisition method in create_input_channel** - -In `subworkflows/local/create_input_channel/main.nf`, replace lines 75-88 (the acquisition method validation block): - -```groovy - // Validate acquisition method - def acqMethod = row.AcquisitionMethod?.toString()?.trim() ?: "" - if (acqMethod.toLowerCase().contains("data-independent acquisition") || acqMethod.toLowerCase().contains("dia")) { - meta.acquisition_method = "dia" - } else if (params.diann_dda && (acqMethod.toLowerCase().contains("data-dependent acquisition") || acqMethod.toLowerCase().contains("dda"))) { - meta.acquisition_method = "dda" - } else if (acqMethod.isEmpty()) { - meta.acquisition_method = params.diann_dda ? "dda" : "dia" - } else { - log.error("Unsupported acquisition method: '${acqMethod}'. This pipeline supports DIA" + (params.diann_dda ? " and DDA (--diann_dda)" : "") + ". Found in file: ${filestr}") - exit(1) - } -``` - -- [ ] **Step 5: Add --dda flag to all 5 DIA-NN modules** - -For each of the 5 DIA-NN modules, make two changes: - -**a) Add `'--dda'` to the blocked list.** In each module's `def blocked = [...]`, add `'--dda'` to the array. - -**b) Add the flag variable and append it to the command.** In each module's script block, after the existing flag variables (before the `"""` shell block), add: - -```groovy - diann_dda_flag = params.diann_dda ? "--dda" : "" -``` - -Then append `${diann_dda_flag} \\` to the DIA-NN command, before `\${mod_flags}` (or before `$args` if no mod_flags). 
- -Apply to: - -- `modules/local/diann/insilico_library_generation/main.nf` -- `modules/local/diann/preliminary_analysis/main.nf` -- `modules/local/diann/assemble_empirical_library/main.nf` -- `modules/local/diann/individual_analysis/main.nf` -- `modules/local/diann/final_quantification/main.nf` - -- [ ] **Step 6: Validate and commit** - -```bash -conda run -n nfcore pre-commit run --all-files -conda run -n nfcore nf-core pipelines lint --dir . -git add nextflow.config nextflow_schema.json workflows/dia.nf subworkflows/local/create_input_channel/main.nf modules/local/diann/*/main.nf -git commit -m "feat: add DDA support via --diann_dda flag (#5) - -- New param diann_dda (boolean, default: false) -- Version guard: requires DIA-NN >= 2.3.2 -- Passes --dda to all 5 DIA-NN modules when enabled -- Accepts DDA acquisition method in SDRF when diann_dda=true -- Added --dda to blocked lists in all modules - -Closes #5" -``` - ---- - -## Task 6: Add DDA test config - -**Files:** - -- Create: `conf/tests/test_dda.config` -- Modify: `.github/workflows/extended_ci.yml:110-191` (stage 2a) - -- [ ] **Step 1: Create test_dda.config** - -Create `conf/tests/test_dda.config`: - -```groovy -/* -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Nextflow config file for testing DDA analysis (requires DIA-NN >= 2.3.2) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Tests DDA mode using the BSA dataset with --diann_dda flag. - Uses ghcr.io/bigbio/diann:2.3.2. - - Use as follows: - nextflow run bigbio/quantmsdiann -profile test_dda,docker [--outdir ] - ------------------------------------------------------------------------------------------------- -*/ - -process { - resourceLimits = [ - cpus: 4, - memory: '12.GB', - time: '48.h' - ] -} - -params { - config_profile_name = 'Test profile for DDA analysis' - config_profile_description = 'DDA test using BSA dataset with DIA-NN 2.3.2.' 
- - outdir = './results_dda' - - // Input data — BSA DDA dataset - input = 'https://raw.githubusercontent.com/bigbio/quantms-test-datasets/quantms/testdata/lfq_ci/BSA/BSA_design.sdrf.tsv' - database = 'https://raw.githubusercontent.com/bigbio/quantms-test-datasets/quantms/testdata/lfq_ci/BSA/18Protein_SoCe_Tr_detergents_trace.fasta' - - // DDA mode - diann_dda = true - - // Search parameters matching BSA dataset - min_peptide_length = 7 - max_peptide_length = 30 - max_precursor_charge = 3 - allowed_missed_cleavages = 1 - diann_normalize = false - publish_dir_mode = 'symlink' - max_mods = 2 -} - -process { - withLabel: diann { - container = 'ghcr.io/bigbio/diann:2.3.2' - } -} - -singularity.enabled = false -docker.enabled = true -``` - -- [ ] **Step 2: Add test_dda profile to nextflow.config** - -In `nextflow.config`, after the `test_dia_2_2_0` profile line (around line 241), add: - -```groovy - test_dda { includeConfig 'conf/tests/test_dda.config' } -``` - -- [ ] **Step 3: Add test_dda to extended_ci.yml stage 2a** - -In `.github/workflows/extended_ci.yml`, in the `test-latest` job matrix (around line 120), add `"test_dda"` to the `test_profile` array: - -```yaml -test_profile: ["test_latest_dia", "test_dia_quantums", "test_dia_parquet", "test_dda"] -``` - -- [ ] **Step 4: Validate and commit** - -```bash -conda run -n nfcore pre-commit run --all-files -git add conf/tests/test_dda.config nextflow.config .github/workflows/extended_ci.yml -git commit -m "test: add DDA test config using BSA dataset with DIA-NN 2.3.2 - -Uses bigbio/quantms-test-datasets BSA LFQ dataset (~34 MB) with -diann_dda=true pinned to ghcr.io/bigbio/diann:2.3.2. Added to -extended_ci.yml stage 2a (private containers)." 
-``` - ---- - -## Task 7: Add test configs for skip_preliminary_analysis and speclib input - -**Files:** - -- Create: `conf/tests/test_dia_skip_preanalysis.config` -- Modify: `nextflow.config` (profiles section) -- Modify: `.github/workflows/extended_ci.yml` (stage 2a) - -- [ ] **Step 1: Create test_dia_skip_preanalysis.config** - -Create `conf/tests/test_dia_skip_preanalysis.config`: - -```groovy -/* -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Nextflow config file for testing skip_preliminary_analysis path -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Tests the pipeline with skip_preliminary_analysis=true, using default - mass accuracy parameters. Validates the untested code path in dia.nf. - - Use as follows: - nextflow run bigbio/quantmsdiann -profile test_dia_skip_preanalysis,docker [--outdir ] - ------------------------------------------------------------------------------------------------- -*/ - -process { - resourceLimits = [ - cpus: 4, - memory: '12.GB', - time: '48.h' - ] -} - -params { - config_profile_name = 'Test profile for skip preliminary analysis' - config_profile_description = 'Tests skip_preliminary_analysis path with default mass accuracy params.' 
- - outdir = './results_skip_preanalysis' - - // Input data — same as test_dia - input = 'https://raw.githubusercontent.com/bigbio/quantms-test-datasets/quantms/testdata/dia_ci/PXD026600.sdrf.tsv' - database = 'https://raw.githubusercontent.com/bigbio/quantms-test-datasets/quantms/testdata/dia_ci/REF_EColi_K12_UPS1_combined.fasta' - min_pr_mz = 350 - max_pr_mz = 950 - min_fr_mz = 500 - max_fr_mz = 1500 - min_peptide_length = 15 - max_peptide_length = 30 - max_precursor_charge = 3 - allowed_missed_cleavages = 1 - diann_normalize = false - publish_dir_mode = 'symlink' - max_mods = 2 - - // Skip preliminary analysis — use default mass accuracy params - skip_preliminary_analysis = true - mass_acc_ms2 = 15 - mass_acc_ms1 = 15 - scan_window = 8 -} -``` - -- [ ] **Step 2: Add profile to nextflow.config** - -After existing test profiles (around line 242), add: - -```groovy - test_dia_skip_preanalysis { includeConfig 'conf/tests/test_dia_skip_preanalysis.config' } -``` - -- [ ] **Step 3: Add to extended_ci.yml stage 2a** - -In the `test-latest` job matrix, add `"test_dia_skip_preanalysis"` to the `test_profile` array. - -- [ ] **Step 4: Validate and commit** - -```bash -conda run -n nfcore pre-commit run --all-files -git add conf/tests/test_dia_skip_preanalysis.config nextflow.config .github/workflows/extended_ci.yml -git commit -m "test: add test config for skip_preliminary_analysis path - -Tests the previously untested code path where preliminary analysis is -skipped and default mass accuracy parameters are used directly." 
-``` - ---- - -## Task 8: Add new DIA-NN feature parameters (light-models, export-quant, site-ms1-quant) - -**Files:** - -- Modify: `nextflow.config` (params section) -- Modify: `nextflow_schema.json` -- Modify: `modules/local/diann/insilico_library_generation/main.nf` (light-models) -- Modify: `modules/local/diann/final_quantification/main.nf` (export-quant, site-ms1-quant) - -- [ ] **Step 1: Add params to nextflow.config** - -In `nextflow.config`, in the DIA-NN general section (after `diann_dda`, around line 58), add: - -```groovy - diann_light_models = false // add '--light-models' for 10x faster library generation (DIA-NN >= 2.0) - diann_export_quant = false // add '--export-quant' for fragment-level parquet export (DIA-NN >= 2.0) - diann_site_ms1_quant = false // add '--site-ms1-quant' for MS1 apex PTM quantification (DIA-NN >= 2.0) -``` - -- [ ] **Step 2: Add params to nextflow_schema.json** - -Add each param to the DIA-NN section in the schema with type, description, default, and fa_icon. - -- [ ] **Step 3: Wire --light-models in insilico_library_generation** - -In `modules/local/diann/insilico_library_generation/main.nf`: - -a) Add `'--light-models'` to the blocked list (line 26-32). - -b) After `diann_no_peptidoforms` variable (line 47), add: - -```groovy - diann_light_models = params.diann_light_models ? "--light-models" : "" -``` - -c) Append `${diann_light_models} \\` to the DIA-NN command before `${met_excision}`. - -- [ ] **Step 4: Wire --export-quant and --site-ms1-quant in final_quantification** - -In `modules/local/diann/final_quantification/main.nf`: - -a) Add `'--export-quant'` and `'--site-ms1-quant'` to the blocked list (line 45-50). - -b) After `diann_dda_flag` variable, add: - -```groovy - diann_export_quant = params.diann_export_quant ? "--export-quant" : "" - diann_site_ms1_quant = params.diann_site_ms1_quant ? "--site-ms1-quant" : "" -``` - -c) Append both to the DIA-NN command before `\${mod_flags}`. 
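
The blocked lists referenced in Steps 3 and 4 above exist so that pipeline-managed flags are stripped from user-supplied `diann_extra_args` before the DIA-NN command is assembled. A rough Python sketch of that filtering (the real implementation is Groovy inside each module; names here are assumptions, and value-carrying flags are ignored for simplicity):

```python
# Hypothetical mirror of the per-module blocked-flag filter: flags the
# pipeline sets itself are removed from user extra args so DIA-NN never
# receives them twice with conflicting values.
BLOCKED = {"--light-models", "--export-quant", "--site-ms1-quant"}

def strip_blocked(extra_args):
    # Keep only tokens that are not pipeline-managed flags.
    kept = [tok for tok in extra_args.split() if tok not in BLOCKED]
    return " ".join(kept)

print(strip_blocked("--light-models --threads 8"))  # prints: --threads 8
```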
- -- [ ] **Step 5: Validate and commit** - -```bash -conda run -n nfcore pre-commit run --all-files -conda run -n nfcore nf-core pipelines lint --dir . -git add nextflow.config nextflow_schema.json modules/local/diann/insilico_library_generation/main.nf modules/local/diann/final_quantification/main.nf -git commit -m "feat: add --light-models, --export-quant, --site-ms1-quant params (#7) - -- diann_light_models: 10x faster in-silico library generation -- diann_export_quant: fragment-level parquet export -- diann_site_ms1_quant: MS1 apex intensities for PTM quantification -All require DIA-NN >= 2.0." -``` - ---- - -## Task 9: Add InfinDIA groundwork - -**Files:** - -- Modify: `nextflow.config` (params section) -- Modify: `nextflow_schema.json` -- Modify: `workflows/dia.nf` (version guard) -- Modify: `modules/local/diann/insilico_library_generation/main.nf` (flag) - -- [ ] **Step 1: Add InfinDIA params to nextflow.config** - -After the DDA param, add: - -```groovy - // DIA-NN: InfinDIA (experimental, v2.3.0+) - enable_infin_dia = false // Enable InfinDIA for ultra-large search spaces - diann_pre_select = null // --pre-select N precursor limit for InfinDIA -``` - -- [ ] **Step 2: Add to nextflow_schema.json** - -Add `enable_infin_dia` (boolean) and `diann_pre_select` (integer, optional) to the schema. - -- [ ] **Step 3: Add version guard in workflows/dia.nf** - -After the DDA version guard, add: - -```groovy - if (params.enable_infin_dia && params.diann_version < '2.3.0') { - error("InfinDIA requires DIA-NN >= 2.3.0. Current version: ${params.diann_version}. Use -profile diann_v2_3_2") - } -``` - -- [ ] **Step 4: Wire flags in insilico_library_generation** - -In `modules/local/diann/insilico_library_generation/main.nf`: - -a) Add `'--infin-dia'` and `'--pre-select'` to the blocked list. - -b) Add flag variables: - -```groovy - infin_dia_flag = params.enable_infin_dia ? "--infin-dia" : "" - pre_select_flag = params.diann_pre_select ? 
"--pre-select $params.diann_pre_select" : "" -``` - -c) Append both to the DIA-NN command. - -- [ ] **Step 5: Validate and commit** - -```bash -conda run -n nfcore pre-commit run --all-files -conda run -n nfcore nf-core pipelines lint --dir . -git add nextflow.config nextflow_schema.json workflows/dia.nf modules/local/diann/insilico_library_generation/main.nf -git commit -m "feat: add InfinDIA groundwork — enable_infin_dia param (#10) - -Experimental support for InfinDIA (DIA-NN 2.3.0+). Passes --infin-dia -to library generation when enabled. Version guard enforces >= 2.3.0. -No test config — InfinDIA requires large databases." -``` - ---- - -## Task 10: Documentation — parameters.md - -**Files:** - -- Create: `docs/parameters.md` - -- [ ] **Step 1: Create comprehensive parameter reference** - -Create `docs/parameters.md` with all params from `nextflow_schema.json` grouped by category. Read `nextflow.config` and `nextflow_schema.json` to get every param, its type, default, and description. Group into: - -1. Input/output options -2. File preparation -3. DIA-NN general -4. Mass accuracy and calibration -5. Library generation -6. Quantification and output -7. DDA mode -8. InfinDIA (experimental) -9. Quality control -10. MultiQC options -11. Boilerplate - -Each param entry: `| name | type | default | description |` - -- [ ] **Step 2: Commit** - -```bash -git add docs/parameters.md -git commit -m "docs: add comprehensive parameter reference (#1) - -Complete reference for all ~70 pipeline parameters grouped by category -with types, defaults, descriptions, and version requirements. 
- -Closes #1" -``` - ---- - -## Task 11: Documentation — complete usage.md and output.md - -**Files:** - -- Modify: `docs/usage.md` -- Modify: `docs/output.md` -- Modify: `CITATIONS.md` -- Modify: `README.md` - -- [ ] **Step 1: Add DDA section to usage.md** - -Add a "DDA Analysis Mode" section after the Bruker/timsTOF section with: - -- How to enable (`--diann_dda true -profile diann_v2_3_2`) -- Limitations (beta, trusted columns only, PTM unreliable, MBR limited) -- Example command -- Link to DIA-NN DDA documentation - -- [ ] **Step 2: Add missing param sections to usage.md** - -Add sections for: - -- Preprocessing params (`reindex_mzml`, `mzml_statistics`, `convert_dotd`) -- QC params (`enable_pmultiqc`, `skip_table_plots`, `contaminant_string`) -- `diann_extra_args` scope per module -- `--verbose_modules` profile -- Container version override guide (DIA-NN version profiles) -- Singularity usage -- SLURM example - -- [ ] **Step 3: Update output.md** - -Add: - -- Parquet vs TSV output explanation -- MSstats format section -- Intermediate outputs under `--verbose_modules` - -- [ ] **Step 4: Add pmultiqc to CITATIONS.md** - -Add pmultiqc citation after the MultiQC entry. - -- [ ] **Step 5: Update README.md** - -Add DIA-NN version support table and link to `docs/parameters.md`. 
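
The parameter tables required by Tasks 10 and 11 can be bootstrapped from `nextflow_schema.json` rather than transcribed by hand. A rough Python sketch — the `$defs`/`definitions` group layout follows nf-core schema convention and may need adjusting for this pipeline's actual schema:

```python
import json  # for real usage: schema = json.load(open("nextflow_schema.json"))

# Hypothetical helper: emit markdown table rows from an nf-core-style
# schema, one row per parameter with type, default, and description.
def schema_to_rows(schema):
    rows = []
    groups = schema.get("$defs", schema.get("definitions", {}))
    for group in groups.values():
        for name, spec in group.get("properties", {}).items():
            rows.append("| `--%s` | %s | `%s` | %s |" % (
                name,
                spec.get("type", ""),
                spec.get("default", "null"),
                spec.get("description", ""),
            ))
    return rows

demo = {"$defs": {"diann": {"properties": {
    "diann_dda": {"type": "boolean", "default": False,
                  "description": "Enable DDA mode."}}}}}
print("\n".join(schema_to_rows(demo)))
```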
- -- [ ] **Step 6: Validate and commit** - -```bash -conda run -n nfcore pre-commit run --all-files -git add docs/usage.md docs/output.md CITATIONS.md README.md -git commit -m "docs: complete usage.md, output.md, citations, README (#1, #3, #9, #15) - -- DDA mode documentation with limitations -- Missing param sections (preprocessing, QC, extra_args scope) -- Container version override and Singularity guides -- Parquet vs TSV output explanation -- pmultiqc citation added -- README updated with version table - -Closes #3, #9, #15" -``` - ---- - -## Task 12: Close resolved issues - -- [ ] **Step 1: Close issues via GitHub CLI** - -```bash -gh issue close 17 --repo bigbio/quantmsdiann --comment "Already implemented — --monitor-mod is extracted from diann_config.cfg (generated by sdrf-pipelines convert-diann) and passed to all DIA-NN steps via mod_flags." -gh issue close 2 --repo bigbio/quantmsdiann --comment "Superseded by #4 (Phase 6: consolidate param generation to sdrf-pipelines)." -gh issue close 1 --repo bigbio/quantmsdiann --comment "Resolved — docs/parameters.md created with comprehensive parameter reference." -gh issue close 3 --repo bigbio/quantmsdiann --comment "Resolved — diann_extra_args scope documented in docs/usage.md." -gh issue close 9 --repo bigbio/quantmsdiann --comment "Resolved — container version override guide and Singularity usage added to docs/usage.md." -gh issue close 15 --repo bigbio/quantmsdiann --comment "Resolved — docs/usage.md input documentation updated." -``` - ---- - -## Task 13: Final validation and push - -- [ ] **Step 1: Run full validation suite** - -```bash -conda run -n nfcore pre-commit run --all-files -conda run -n nfcore nf-core pipelines lint --release --dir . -``` - -Expected: 0 failures on both. 
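
The `lib/VersionUtils.groovy` file introduced further down this patch series replaces lexicographic version checks such as `params.diann_version < '2.3'`, which mis-order multi-digit components (`'2.10.0' < '2.3.2'` is true as strings). A minimal Python mirror of its `compare()` — same split-and-zero-pad logic, assumed equivalent to the Groovy:

```python
# Semantic version comparison: negative if a < b, zero if equal,
# positive if a > b. Non-numeric components fall back to 0, matching
# the Groovy isInteger() fallback.
def compare(a, b):
    pa = [int(p) if p.isdigit() else 0 for p in a.split(".")]
    pb = [int(p) if p.isdigit() else 0 for p in b.split(".")]
    for i in range(max(len(pa), len(pb))):
        va = pa[i] if i < len(pa) else 0
        vb = pb[i] if i < len(pb) else 0
        if va != vb:
            return (va > vb) - (va < vb)
    return 0

assert "2.10.0" < "2.3.2"              # lexicographic: wrong ordering
assert compare("2.10.0", "2.3.2") > 0  # semantic: correct ordering
```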
- -- [ ] **Step 2: Push dda branch and create PR** - -```bash -git push -u origin dda -gh pr create --title "feat: v1.0.0 release — robustness, DDA support, features, docs" --body "$(cat <<'PREOF' -## Summary -- Robustness fixes: pipefail, error_retry, empty input guards -- DDA support via --diann_dda flag (DIA-NN >= 2.3.2) -- New params: --light-models, --export-quant, --site-ms1-quant -- InfinDIA groundwork (experimental) -- DIA-NN 2.3.2 version config -- New test configs: test_dda, test_dia_skip_preanalysis -- Comprehensive docs: parameters.md, complete usage.md, output.md - -## Issues -Closes #1, #3, #5, #7, #9, #10, #15, #17 - -## Test plan -- [ ] Existing CI tests pass (test_dia, test_dia_dotd) -- [ ] New test_dda passes with BSA dataset on DIA-NN 2.3.2 -- [ ] test_dia_skip_preanalysis passes -- [ ] nf-core lint --release: 0 failures -- [ ] pre-commit: all passing -PREOF -)" --base dev -``` diff --git a/lib/VersionUtils.groovy b/lib/VersionUtils.groovy new file mode 100644 index 0000000..f340a61 --- /dev/null +++ b/lib/VersionUtils.groovy @@ -0,0 +1,34 @@ +/** + * Semantic version comparison utility for DIA-NN version guards. + * + * Nextflow auto-loads all classes in lib/, so these are available + * in workflows and module scripts without explicit imports. + */ +class VersionUtils { + + /** + * Compare two version strings semantically (e.g. '2.10.0' > '2.3.2'). + * Returns negative if a < b, zero if equal, positive if a > b. + */ + static int compare(String a, String b) { + def partsA = a.tokenize('.').collect { it.isInteger() ? it.toInteger() : 0 } + def partsB = b.tokenize('.').collect { it.isInteger() ? it.toInteger() : 0 } + def maxLen = Math.max(partsA.size(), partsB.size()) + for (int i = 0; i < maxLen; i++) { + int va = i < partsA.size() ? partsA[i] : 0 + int vb = i < partsB.size() ? partsB[i] : 0 + if (va != vb) return va <=> vb + } + return 0 + } + + /** True if version is strictly less than required. 
*/ + static boolean versionLessThan(String version, String required) { + return compare(version, required) < 0 + } + + /** True if version is greater than or equal to required. */ + static boolean versionAtLeast(String version, String required) { + return compare(version, required) >= 0 + } +} diff --git a/modules/local/diann/final_quantification/main.nf b/modules/local/diann/final_quantification/main.nf index a16f551..00e5659 100644 --- a/modules/local/diann/final_quantification/main.nf +++ b/modules/local/diann/final_quantification/main.nf @@ -66,7 +66,7 @@ process FINAL_QUANTIFICATION { report_decoys = params.diann_report_decoys ? "--report-decoys": "" diann_export_xic = params.diann_export_xic ? "--xic": "" // --direct-quant only exists in DIA-NN >= 1.9.2 (QuantUMS counterpart); skip for older versions - quantums = params.quantums ? "" : (params.diann_version >= '1.9' ? "--direct-quant" : "") + quantums = params.quantums ? "" : (VersionUtils.versionAtLeast(params.diann_version, '1.9') ? "--direct-quant" : "") quantums_train_runs = params.quantums_train_runs ? "--quant-train-runs $params.quantums_train_runs": "" quantums_sel_runs = params.quantums_sel_runs ? "--quant-sel-runs $params.quantums_sel_runs": "" quantums_params = params.quantums_params ? "--quant-params $params.quantums_params": "" diff --git a/modules/local/diann/individual_analysis/main.nf b/modules/local/diann/individual_analysis/main.nf index b4e8507..a17af6a 100644 --- a/modules/local/diann/individual_analysis/main.nf +++ b/modules/local/diann/individual_analysis/main.nf @@ -86,8 +86,8 @@ process INDIVIDUAL_ANALYSIS { diann_dda_flag = params.diann_dda ? "--dda" : "" // Flags removed in DIA-NN 2.3.x — only pass for older versions - no_ifs_removal = params.diann_version < '2.3' ? "--no-ifs-removal" : "" - no_main_report = params.diann_version < '2.3' ? "--no-main-report" : "" + no_ifs_removal = VersionUtils.versionLessThan(params.diann_version, '2.3') ? 
"--no-ifs-removal" : "" + no_main_report = VersionUtils.versionLessThan(params.diann_version, '2.3') ? "--no-main-report" : "" // Per-file scan ranges from SDRF (empty = no flag, DIA-NN auto-detects) min_pr_mz = meta['ms1minmz'] ? "--min-pr-mz ${meta['ms1minmz']}" : "" diff --git a/workflows/dia.nf b/workflows/dia.nf index 351f4f2..be3f850 100644 --- a/workflows/dia.nf +++ b/workflows/dia.nf @@ -36,17 +36,17 @@ workflow DIA { ch_software_versions = channel.empty() // Version guard for DDA mode - if (params.diann_dda && params.diann_version < '2.3.2') { + if (params.diann_dda && VersionUtils.versionLessThan(params.diann_version, '2.3.2')) { error("DDA mode (--diann_dda) requires DIA-NN >= 2.3.2. Current version: ${params.diann_version}. Use -profile diann_v2_3_2") } // Version guard for InfinDIA - if (params.enable_infin_dia && params.diann_version < '2.3.0') { + if (params.enable_infin_dia && VersionUtils.versionLessThan(params.diann_version, '2.3.0')) { error("InfinDIA requires DIA-NN >= 2.3.0. Current version: ${params.diann_version}. Use -profile diann_v2_3_2") } // Version guard for DIA-NN 2.0+ features - if ((params.diann_light_models || params.diann_export_quant || params.diann_site_ms1_quant) && params.diann_version < '2.0') { + if ((params.diann_light_models || params.diann_export_quant || params.diann_site_ms1_quant) && VersionUtils.versionLessThan(params.diann_version, '2.0')) { def enabled = [] if (params.diann_light_models) enabled << '--light-models' if (params.diann_export_quant) enabled << '--export-quant' From 7df29e2663ae58b2730a31bc69d187edacce1468 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Tue, 7 Apr 2026 09:22:37 +0100 Subject: [PATCH 23/28] chore: bump version to 2.0.0dev for DDA support release DDA analysis support is a major feature warranting a major version bump. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- nextflow.config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nextflow.config b/nextflow.config index 8d8677c..dd305a0 100644 --- a/nextflow.config +++ b/nextflow.config @@ -373,7 +373,7 @@ manifest { mainScript = 'main.nf' defaultBranch = 'main' nextflowVersion = '!>=25.04.0' - version = '1.0.1dev' + version = '2.0.0dev' doi = '10.5281/zenodo.15573386' } From a3f4e2563f9c10f649cc298035d01f6d2735b2f2 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Tue, 7 Apr 2026 10:02:20 +0100 Subject: [PATCH 24/28] fix: standardize meta.yml naming, fix descriptions and minor issues MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Rename output version→versions in sdrf_parsing/meta.yml - Add ch_ prefix to input_file→ch_input_file in input_check/meta.yml - Fix grammar in pmultiqc and diann_msstats meta.yml descriptions - Fix glob pattern in decompress_dotd/meta.yml (double-dot expansion) - Update CITATIONS.md to link published Nature Methods article - Fix schema_input.json error messages (source name, whitespace) - Standardize quantmsdiann keyword in utils meta.yml Co-Authored-By: Claude Opus 4.6 (1M context) --- CITATIONS.md | 2 +- assets/schema_input.json | 4 ++-- modules/local/diann/diann_msstats/meta.yml | 2 +- modules/local/pmultiqc/meta.yml | 2 +- modules/local/sdrf_parsing/meta.yml | 2 +- modules/local/utils/decompress_dotd/meta.yml | 2 +- subworkflows/local/input_check/meta.yml | 2 +- subworkflows/local/utils_nfcore_quantms_pipeline/meta.yml | 2 +- 8 files changed, 9 insertions(+), 9 deletions(-) diff --git a/CITATIONS.md b/CITATIONS.md index d74e0f9..8985a13 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -1,6 +1,6 @@ # bigbio/quantmsdiann: Citations -## [Pipeline](https://www.researchsquare.com/article/rs-3002027/v1) +## [Pipeline](https://doi.org/10.1038/s41592-024-02343-1) > Dai C, Pfeuffer J, Wang H, Zheng P, Käll L, Sachsenberg T, Demichev V, Bai 
M, Kohlbacher O, Perez-Riverol Y. quantms: a cloud-based pipeline for quantitative proteomics enables the reanalysis of public proteomics data. Nat Methods. 2024 Jul 4. doi: 10.1038/s41592-024-02343-1. Epub ahead of print. PMID: 38965444. diff --git a/assets/schema_input.json b/assets/schema_input.json index 7b15010..1699aad 100644 --- a/assets/schema_input.json +++ b/assets/schema_input.json @@ -11,7 +11,7 @@ "source name": { "type": "string", "pattern": "^\\S+$", - "errorMessage": "Sample name must be provided and cannot contain spaces" + "errorMessage": "Source name must be provided and cannot contain spaces" }, "comment[data file]": { "type": "string", @@ -22,7 +22,7 @@ "assay name": { "type": "string", "pattern": "^\\S+$", - "errorMessage": "Assay name must be provided and cannot contain spaces", + "errorMessage": "Assay name must be provided and cannot contain whitespace", "meta": ["assay"] } } diff --git a/modules/local/diann/diann_msstats/meta.yml b/modules/local/diann/diann_msstats/meta.yml index ac1f147..a440d61 100644 --- a/modules/local/diann/diann_msstats/meta.yml +++ b/modules/local/diann/diann_msstats/meta.yml @@ -17,7 +17,7 @@ input: pattern: "*.tsv" - exp_design: type: file - description: An experimental design file including Sample and replicates column et al. + description: An experimental design file including Sample and replicates column etc. 
pattern: "*.tsv" - report_pr: type: file diff --git a/modules/local/pmultiqc/meta.yml b/modules/local/pmultiqc/meta.yml index adf63f2..fcca33e 100644 --- a/modules/local/pmultiqc/meta.yml +++ b/modules/local/pmultiqc/meta.yml @@ -23,7 +23,7 @@ output: pattern: "*.html" - quantmsdb: type: file - description: Sqlite3 database file stored protein psm and quantification information + description: SQLite3 database file that stores protein, PSM, and quantification information pattern: "*.db" - data: type: dir diff --git a/modules/local/sdrf_parsing/meta.yml b/modules/local/sdrf_parsing/meta.yml index 7c311f4..846cfaa 100644 --- a/modules/local/sdrf_parsing/meta.yml +++ b/modules/local/sdrf_parsing/meta.yml @@ -28,7 +28,7 @@ output: type: file description: log file pattern: "*.log" - - version: + - versions: type: file description: File containing software version pattern: "versions.yml" diff --git a/modules/local/utils/decompress_dotd/meta.yml b/modules/local/utils/decompress_dotd/meta.yml index bbc7c58..55330d3 100644 --- a/modules/local/utils/decompress_dotd/meta.yml +++ b/modules/local/utils/decompress_dotd/meta.yml @@ -22,7 +22,7 @@ input: type: file description: | Bruker Raw file archived using tar - pattern: "*.{d.tar,.tar,.gz,.d.tar.gz}" + pattern: "*.{d.tar,tar,gz,d.tar.gz}" output: - meta: type: map diff --git a/subworkflows/local/input_check/meta.yml b/subworkflows/local/input_check/meta.yml index abe2c7f..3c88724 100644 --- a/subworkflows/local/input_check/meta.yml +++ b/subworkflows/local/input_check/meta.yml @@ -9,7 +9,7 @@ keywords: components: - samplesheet/check input: - - input_file: + - ch_input_file: type: file description: | Input file to be validated diff --git a/subworkflows/local/utils_nfcore_quantms_pipeline/meta.yml b/subworkflows/local/utils_nfcore_quantms_pipeline/meta.yml index 06365ae..cf1fd6d 100644 --- a/subworkflows/local/utils_nfcore_quantms_pipeline/meta.yml +++ b/subworkflows/local/utils_nfcore_quantms_pipeline/meta.yml @@ -4,7 +4,7 @@ 
description: Pipeline completion utilities for the nf-core quantmsdiann pipeline keywords: - utils - nf-core - - quantms + - quantmsdiann components: - completionemail - completionsummary From fc96eeb95e608f797e4bb904bd42b59616412f58 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Tue, 7 Apr 2026 13:10:10 +0100 Subject: [PATCH 25/28] feat: auto-detect DDA mode from SDRF acquisition method column DDA mode is now automatically detected from the SDRF `comment[proteomics data acquisition method]` column when it contains `data-dependent acquisition`. The `--diann_dda` flag is kept as a fallback for SDRFs that lack this column. - Modules now read acquisition method from meta instead of params - INSILICO_LIBRARY_GENERATION receives is_dda as input from workflow - Version guard triggers for both param and SDRF-detected DDA - Updated docs, parameters, and schema to reflect auto-detection Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/parameters.md | 33 ++++++++++--------- docs/usage.md | 13 ++++++-- .../diann/assemble_empirical_library/main.nf | 2 +- .../local/diann/final_quantification/main.nf | 2 +- .../local/diann/individual_analysis/main.nf | 2 +- .../diann/insilico_library_generation/main.nf | 3 +- .../insilico_library_generation/meta.yml | 3 ++ .../local/diann/preliminary_analysis/main.nf | 2 +- nextflow.config | 2 +- nextflow_schema.json | 2 +- .../local/create_input_channel/main.nf | 6 ++-- workflows/dia.nf | 13 ++++++-- 12 files changed, 53 insertions(+), 30 deletions(-) diff --git a/docs/parameters.md b/docs/parameters.md index 0a3eac8..f3c76c0 100644 --- a/docs/parameters.md +++ b/docs/parameters.md @@ -52,16 +52,16 @@ This document lists every pipeline parameter organised by category. Default valu ## 5. 
DIA-NN General -| Parameter | Type | Default | Description | -| ------------------------ | ------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| `--diann_version` | string | `1.8.1` | DIA-NN version used by the workflow. Controls version-dependent flags (e.g. `--monitor-mod` for 1.8.x). See [DIA-NN Version Selection](usage.md#dia-nn-version-selection). | -| `--diann_debug` | integer | `3` | DIA-NN debug/verbosity level (0-4). Higher values produce more verbose logs. | -| `--diann_speclib` | string | `null` | Path to an external spectral library. If provided, the in-silico library generation step is skipped. | -| `--diann_extra_args` | string | `null` | Extra arguments appended to all DIA-NN steps. Flags incompatible with a step are automatically stripped with a warning. See [Passing Extra Arguments to DIA-NN](usage.md#passing-extra-arguments-to-dia-nn). | -| `--diann_dda` | boolean | `false` | Enable DDA (Data-Dependent Acquisition) analysis mode. Passes `--dda` to all DIA-NN steps. Requires DIA-NN >= 2.3.2. Beta feature. | -| `--diann_light_models` | boolean | `false` | Enable `--light-models` for 10x faster in-silico library generation. Requires DIA-NN >= 2.0. | -| `--diann_export_quant` | boolean | `false` | Enable `--export-quant` for fragment-level parquet data export. Requires DIA-NN >= 2.0. | -| `--diann_site_ms1_quant` | boolean | `false` | Enable `--site-ms1-quant` to use MS1 apex intensities for PTM site quantification. Requires DIA-NN >= 2.0. 
| +| Parameter | Type | Default | Description | +| ------------------------ | ------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `--diann_version` | string | `1.8.1` | DIA-NN version used by the workflow. Controls version-dependent flags (e.g. `--monitor-mod` for 1.8.x). See [DIA-NN Version Selection](usage.md#dia-nn-version-selection). | +| `--diann_debug` | integer | `3` | DIA-NN debug/verbosity level (0-4). Higher values produce more verbose logs. | +| `--diann_speclib` | string | `null` | Path to an external spectral library. If provided, the in-silico library generation step is skipped. | +| `--diann_extra_args` | string | `null` | Extra arguments appended to all DIA-NN steps. Flags incompatible with a step are automatically stripped with a warning. See [Passing Extra Arguments to DIA-NN](usage.md#passing-extra-arguments-to-dia-nn). | +| `--diann_dda` | boolean | `false` | Explicitly enable DDA mode. Normally auto-detected from the SDRF `comment[proteomics data acquisition method]` column. Use this flag only when the SDRF lacks the acquisition method. Requires DIA-NN >= 2.3.2. | +| `--diann_light_models` | boolean | `false` | Enable `--light-models` for 10x faster in-silico library generation. Requires DIA-NN >= 2.0. | +| `--diann_export_quant` | boolean | `false` | Enable `--export-quant` for fragment-level parquet data export. Requires DIA-NN >= 2.0. | +| `--diann_site_ms1_quant` | boolean | `false` | Enable `--site-ms1-quant` to use MS1 apex intensities for PTM site quantification. Requires DIA-NN >= 2.0. | ## 6. Mass Accuracy & Calibration @@ -123,13 +123,14 @@ This document lists every pipeline parameter organised by category. Default valu ## 12. 
DDA Mode -| Parameter | Type | Default | Description | -| ------------- | ------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `--diann_dda` | boolean | `false` | Enable DDA analysis mode. Passes `--dda` to all DIA-NN steps. Requires DIA-NN >= 2.3.2 (use `-profile diann_v2_3_2`). This is a beta feature with known limitations; see the usage documentation for details. | +| Parameter | Type | Default | Description | +| ------------- | ------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `--diann_dda` | boolean | `false` | Explicitly enable DDA mode when SDRF lacks the acquisition method column. Normally DDA is auto-detected from the SDRF `comment[proteomics data acquisition method]`. Requires DIA-NN >= 2.3.2 (use `-profile diann_v2_3_2`). Beta feature. | -> **Note:** DDA support requires DIA-NN >= 2.3.2. Enable this profile with -> `-profile diann_v2_3_2`. The DDA mode is experimental and may not support -> all pipeline features available in DIA mode. +> **Note:** DDA mode is auto-detected from the SDRF when the `comment[proteomics data acquisition method]` +> column contains `data-dependent acquisition`. The `--diann_dda` flag is only needed as a +> fallback when the SDRF does not include this column. DDA requires DIA-NN >= 2.3.2 +> (`-profile diann_v2_3_2`). ## 13. InfinDIA (Experimental) diff --git a/docs/usage.md b/docs/usage.md index 4055f47..8b696df 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -66,7 +66,16 @@ For Synchro-PASEF data, enable `--diann_tims_sum` (which adds `--quant-tims-sum` ### DDA Analysis Mode (Beta) -DIA-NN 2.3.2+ supports DDA data analysis via the `--dda` flag. 
 Enable it with:
+DIA-NN 2.3.2+ supports DDA data analysis via the `--dda` flag. The pipeline **auto-detects DDA mode** from the SDRF `comment[proteomics data acquisition method]` column — no extra flags needed if your SDRF contains `data-dependent acquisition`:
+
+```bash
+nextflow run bigbio/quantmsdiann \
+  --input dda_sdrf.tsv \
+  --database proteins.fasta \
+  -profile diann_v2_3_2,docker
+```
+
+If your SDRF does not include the acquisition method column, you can explicitly enable DDA mode with `--diann_dda true`:
 
 ```bash
 nextflow run bigbio/quantmsdiann \
@@ -84,7 +93,7 @@ nextflow run bigbio/quantmsdiann \
 - No isobaric labeling or reporter-tag quantification
 - Primary use cases: legacy DDA reanalysis, spectral library creation, immunopeptidomics
 
-The pipeline uses the same workflow for DDA as DIA — the `--dda` flag is passed to all DIA-NN steps automatically.
+The pipeline uses the same workflow for DDA as DIA — the `--dda` flag is passed to all DIA-NN steps automatically when DDA is detected from the SDRF or enabled via `--diann_dda`.
 
 ### Preprocessing Options
 
diff --git a/modules/local/diann/assemble_empirical_library/main.nf b/modules/local/diann/assemble_empirical_library/main.nf
index b3f0181..2bfb67e 100644
--- a/modules/local/diann/assemble_empirical_library/main.nf
+++ b/modules/local/diann/assemble_empirical_library/main.nf
@@ -54,7 +54,7 @@ process ASSEMBLE_EMPIRICAL_LIBRARY {
     diann_no_peptidoforms = params.diann_no_peptidoforms ? "--no-peptidoforms" : ""
     diann_tims_sum = params.diann_tims_sum ? "--quant-tims-sum" : ""
     diann_im_window = params.diann_im_window ? "--im-window $params.diann_im_window" : ""
-    diann_dda_flag = params.diann_dda ? "--dda" : ""
+    diann_dda_flag = meta.acquisition_method == 'dda' ? "--dda" : ""
 
     """
     # Precursor Tolerance value was: ${meta['precursormasstolerance']}
diff --git a/modules/local/diann/final_quantification/main.nf b/modules/local/diann/final_quantification/main.nf
index 00e5659..a1ba609 100644
--- a/modules/local/diann/final_quantification/main.nf
+++ b/modules/local/diann/final_quantification/main.nf
@@ -72,7 +72,7 @@ process FINAL_QUANTIFICATION {
     quantums_params = params.quantums_params ? "--quant-params $params.quantums_params": ""
     diann_no_peptidoforms = params.diann_no_peptidoforms ? "--no-peptidoforms" : ""
     diann_use_quant = params.diann_use_quant ? "--use-quant" : ""
-    diann_dda_flag = params.diann_dda ? "--dda" : ""
+    diann_dda_flag = meta.acquisition_method == 'dda' ? "--dda" : ""
     diann_export_quant = params.diann_export_quant ? "--export-quant" : ""
     diann_site_ms1_quant = params.diann_site_ms1_quant ? "--site-ms1-quant" : ""
diff --git a/modules/local/diann/individual_analysis/main.nf b/modules/local/diann/individual_analysis/main.nf
index a17af6a..975ceaf 100644
--- a/modules/local/diann/individual_analysis/main.nf
+++ b/modules/local/diann/individual_analysis/main.nf
@@ -83,7 +83,7 @@ process INDIVIDUAL_ANALYSIS {
     diann_no_peptidoforms = params.diann_no_peptidoforms ? "--no-peptidoforms" : ""
     diann_tims_sum = params.diann_tims_sum ? "--quant-tims-sum" : ""
     diann_im_window = params.diann_im_window ? "--im-window $params.diann_im_window" : ""
-    diann_dda_flag = params.diann_dda ? "--dda" : ""
+    diann_dda_flag = meta.acquisition_method == 'dda' ? "--dda" : ""
 
     // Flags removed in DIA-NN 2.3.x — only pass for older versions
     no_ifs_removal = VersionUtils.versionLessThan(params.diann_version, '2.3') ? "--no-ifs-removal" : ""
diff --git a/modules/local/diann/insilico_library_generation/main.nf b/modules/local/diann/insilico_library_generation/main.nf
index 7de4665..8b05859 100644
--- a/modules/local/diann/insilico_library_generation/main.nf
+++ b/modules/local/diann/insilico_library_generation/main.nf
@@ -11,6 +11,7 @@ process INSILICO_LIBRARY_GENERATION {
     input:
     path(fasta)
     path(diann_config)
+    val(is_dda)
 
     output:
     path "versions.yml", emit: versions
@@ -47,7 +48,7 @@ process INSILICO_LIBRARY_GENERATION {
     max_fr_mz = params.max_fr_mz ? "--max-fr-mz $params.max_fr_mz":""
     met_excision = params.met_excision ? "--met-excision" : ""
     diann_no_peptidoforms = params.diann_no_peptidoforms ? "--no-peptidoforms" : ""
-    diann_dda_flag = params.diann_dda ? "--dda" : ""
+    diann_dda_flag = is_dda ? "--dda" : ""
     diann_light_models = params.diann_light_models ? "--light-models" : ""
     infin_dia_flag = params.enable_infin_dia ? "--infin-dia" : ""
     pre_select_flag = params.diann_pre_select ? "--pre-select $params.diann_pre_select" : ""
diff --git a/modules/local/diann/insilico_library_generation/meta.yml b/modules/local/diann/insilico_library_generation/meta.yml
index 5f9d68b..a6185d7 100644
--- a/modules/local/diann/insilico_library_generation/meta.yml
+++ b/modules/local/diann/insilico_library_generation/meta.yml
@@ -19,6 +19,9 @@ input:
       type: file
       description: specifies a configuration file to load options/commands from.
       pattern: "*.cfg"
+  - is_dda:
+      type: boolean
+      description: Whether DDA mode is enabled (auto-detected from SDRF or set via --diann_dda)
 output:
   - predict_speclib:
       type: file
diff --git a/modules/local/diann/preliminary_analysis/main.nf b/modules/local/diann/preliminary_analysis/main.nf
index 7e24280..f9a3508 100644
--- a/modules/local/diann/preliminary_analysis/main.nf
+++ b/modules/local/diann/preliminary_analysis/main.nf
@@ -68,7 +68,7 @@ process PRELIMINARY_ANALYSIS {
     scan_window = params.scan_window_automatic ? '' : "--window $params.scan_window"
     diann_tims_sum = params.diann_tims_sum ? "--quant-tims-sum" : ""
     diann_im_window = params.diann_im_window ? "--im-window $params.diann_im_window" : ""
-    diann_dda_flag = params.diann_dda ? "--dda" : ""
+    diann_dda_flag = meta.acquisition_method == 'dda' ? "--dda" : ""
 
     // Per-file scan ranges from SDRF (empty = no flag, DIA-NN auto-detects)
     min_pr_mz = meta['ms1minmz'] ? "--min-pr-mz ${meta['ms1minmz']}" : ""
diff --git a/nextflow.config b/nextflow.config
index dd305a0..696ebc4 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -52,7 +52,7 @@ params {
     diann_debug = 3
     diann_speclib = null
     diann_extra_args = null
-    diann_dda = false // Enable DDA analysis mode (requires DIA-NN >= 2.3.2)
+    diann_dda = false // Fallback: explicitly enable DDA when SDRF lacks acquisition method (requires DIA-NN >= 2.3.2)
     diann_light_models = false // add '--light-models' for 10x faster library generation (DIA-NN >= 2.0)
     diann_export_quant = false // add '--export-quant' for fragment-level parquet export (DIA-NN >= 2.0)
     diann_site_ms1_quant = false // add '--site-ms1-quant' for MS1 apex PTM quantification (DIA-NN >= 2.0)
diff --git a/nextflow_schema.json b/nextflow_schema.json
index ebe3a45..08d564b 100644
--- a/nextflow_schema.json
+++ b/nextflow_schema.json
@@ -458,7 +458,7 @@
         },
         "diann_dda": {
             "type": "boolean",
-            "description": "Enable DDA (Data-Dependent Acquisition) analysis mode. Passes --dda to all DIA-NN steps. Requires DIA-NN >= 2.3.2 (use -profile diann_v2_3_2). Beta feature.",
+            "description": "Explicitly enable DDA mode. Normally auto-detected from the SDRF acquisition method column. Use only when SDRF lacks this column. Requires DIA-NN >= 2.3.2.",
             "fa_icon": "fas fa-flask",
             "default": false
         },
diff --git a/subworkflows/local/create_input_channel/main.nf b/subworkflows/local/create_input_channel/main.nf
index c0249f1..a52b9ee 100644
--- a/subworkflows/local/create_input_channel/main.nf
+++ b/subworkflows/local/create_input_channel/main.nf
@@ -72,16 +72,16 @@ def create_meta_channel(LinkedHashMap row, enzymes, files, wrapper) {
         exit(1, "ERROR: Please check input file -> File Uri does not exist!\n${filestr}")
     }
 
-    // Validate acquisition method
+    // Detect acquisition method from SDRF or fallback to --diann_dda param
     def acqMethod = row.AcquisitionMethod?.toString()?.trim() ?: ""
     if (acqMethod.toLowerCase().contains("data-independent acquisition") || acqMethod.toLowerCase().contains("dia")) {
         meta.acquisition_method = "dia"
-    } else if (params.diann_dda && (acqMethod.toLowerCase().contains("data-dependent acquisition") || acqMethod.toLowerCase().contains("dda"))) {
+    } else if (acqMethod.toLowerCase().contains("data-dependent acquisition") || acqMethod.toLowerCase().contains("dda")) {
         meta.acquisition_method = "dda"
     } else if (acqMethod.isEmpty()) {
         meta.acquisition_method = params.diann_dda ? "dda" : "dia"
     } else {
-        log.error("Unsupported acquisition method: '${acqMethod}'. This pipeline supports DIA" + (params.diann_dda ? " and DDA (--diann_dda)" : "") + ". Found in file: ${filestr}")
+        log.error("Unsupported acquisition method: '${acqMethod}'. This pipeline supports DIA and DDA. Found in file: ${filestr}")
         exit(1)
     }
diff --git a/workflows/dia.nf b/workflows/dia.nf
index be3f850..69dd78d 100644
--- a/workflows/dia.nf
+++ b/workflows/dia.nf
@@ -35,7 +35,7 @@ workflow DIA {
 
     ch_software_versions = channel.empty()
 
-    // Version guard for DDA mode
+    // Version guard for DDA mode (when explicitly set via param)
     if (params.diann_dda && VersionUtils.versionLessThan(params.diann_version, '2.3.2')) {
         error("DDA mode (--diann_dda) requires DIA-NN >= 2.3.2. Current version: ${params.diann_version}. Use -profile diann_v2_3_2")
     }
@@ -68,6 +68,15 @@ workflow DIA {
         .ifEmpty { error("No valid input files found after SDRF parsing. Check your SDRF file and input paths.") }
         .first()
 
+    // Determine DDA mode: true if explicitly set via param OR auto-detected from SDRF
+    ch_is_dda = ch_experiment_meta.map { meta ->
+        def dda = params.diann_dda || meta.acquisition_method == 'dda'
+        if (dda && VersionUtils.versionLessThan(params.diann_version, '2.3.2')) {
+            error("DDA mode (detected from SDRF) requires DIA-NN >= 2.3.2. Current version: ${params.diann_version}. Use -profile diann_v2_3_2")
+        }
+        return dda
+    }
+
     // diann_config.cfg comes directly from SDRF_PARSING (convert-diann)
     // Use as value channel so it can be consumed by all per-file processes
     ch_diann_cfg_val = ch_diann_cfg
@@ -78,7 +87,7 @@ workflow DIA {
     if (params.diann_speclib != null && params.diann_speclib.toString() != "") {
         speclib = channel.from(file(params.diann_speclib, checkIfExists: true))
     } else {
-        INSILICO_LIBRARY_GENERATION(ch_searchdb, ch_diann_cfg_val)
+        INSILICO_LIBRARY_GENERATION(ch_searchdb, ch_diann_cfg_val, ch_is_dda)
         speclib = INSILICO_LIBRARY_GENERATION.out.predict_speclib
     }

From 9f4ee58d98a90a9eaec8ec03381519da7748f709 Mon Sep 17 00:00:00 2001
From: Yasset Perez-Riverol
Date: Tue, 7 Apr 2026 13:14:07 +0100
Subject: [PATCH 26/28] fix: address Copilot review comments

- Revert meta.yml input_file naming to match workflow interface
- Gate --pre-select behind enable_infin_dia (InfinDIA-only flag)
- Tighten --direct-quant version check from 1.9 to 1.9.2
- Improve GHCR login: guard both token and username, skip test_dda
  gracefully on fork PRs where credentials are unavailable

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .github/workflows/ci.yml                            | 10 +++++++---
 modules/local/diann/final_quantification/main.nf    |  4 ++--
 .../local/diann/insilico_library_generation/main.nf |  2 +-
 subworkflows/local/input_check/meta.yml             |  4 ++--
 4 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index aadf1dd..e80a477 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -64,16 +64,20 @@ jobs:
         if: matrix.test_profile == 'test_dda'
         env:
           GHCR_TOKEN: ${{ secrets.GHCR_TOKEN }}
+          GHCR_USERNAME: ${{ secrets.GHCR_USERNAME }}
         run: |
-          if [ -n "$GHCR_TOKEN" ]; then
-            echo "${{ secrets.GHCR_TOKEN }}" | docker login ghcr.io -u ${{ secrets.GHCR_USERNAME }} --password-stdin
+          if [ -z "$GHCR_TOKEN" ] || [ -z "$GHCR_USERNAME" ]; then
+            echo "::warning::Skipping test_dda: GHCR credentials not available (expected for fork PRs)"
+            echo "SKIP_DDA=true" >> $GITHUB_ENV
+            exit 0
           fi
+          echo "$GHCR_TOKEN" | docker login ghcr.io -u "$GHCR_USERNAME" --password-stdin

       - name: Disk space cleanup
         uses: jlumbroso/free-disk-space@v1.3.1

       - name: Run pipeline with test data in docker/singularity profile
-        if: github.event.pull_request.base.ref != 'master'
+        if: github.event.pull_request.base.ref != 'master' && env.SKIP_DDA != 'true'
         run: |
           nextflow run ${GITHUB_WORKSPACE} -profile $TEST_PROFILE,$EXEC_PROFILE,dev --outdir ${TEST_PROFILE}_${EXEC_PROFILE}_results
diff --git a/modules/local/diann/final_quantification/main.nf b/modules/local/diann/final_quantification/main.nf
index a1ba609..64aa5cc 100644
--- a/modules/local/diann/final_quantification/main.nf
+++ b/modules/local/diann/final_quantification/main.nf
@@ -65,8 +65,8 @@ process FINAL_QUANTIFICATION {
     no_norm = params.diann_normalize ? "" : "--no-norm"
     report_decoys = params.diann_report_decoys ? "--report-decoys": ""
     diann_export_xic = params.diann_export_xic ? "--xic": ""
-    // --direct-quant only exists in DIA-NN >= 1.9.2 (QuantUMS counterpart); skip for older versions
-    quantums = params.quantums ? "" : (VersionUtils.versionAtLeast(params.diann_version, '1.9') ? "--direct-quant" : "")
+    // --direct-quant exists in DIA-NN >= 1.9.2 (QuantUMS counterpart); skip for older versions
+    quantums = params.quantums ? "" : (VersionUtils.versionAtLeast(params.diann_version, '1.9.2') ? "--direct-quant" : "")
     quantums_train_runs = params.quantums_train_runs ? "--quant-train-runs $params.quantums_train_runs": ""
     quantums_sel_runs = params.quantums_sel_runs ? "--quant-sel-runs $params.quantums_sel_runs": ""
     quantums_params = params.quantums_params ? "--quant-params $params.quantums_params": ""
diff --git a/modules/local/diann/insilico_library_generation/main.nf b/modules/local/diann/insilico_library_generation/main.nf
index 8b05859..66bca5e 100644
--- a/modules/local/diann/insilico_library_generation/main.nf
+++ b/modules/local/diann/insilico_library_generation/main.nf
@@ -51,7 +51,7 @@ process INSILICO_LIBRARY_GENERATION {
     diann_dda_flag = is_dda ? "--dda" : ""
     diann_light_models = params.diann_light_models ? "--light-models" : ""
     infin_dia_flag = params.enable_infin_dia ? "--infin-dia" : ""
-    pre_select_flag = params.diann_pre_select ? "--pre-select $params.diann_pre_select" : ""
+    pre_select_flag = (params.enable_infin_dia && params.diann_pre_select) ? "--pre-select $params.diann_pre_select" : ""

     """
     diann `cat ${diann_config}` \\
diff --git a/subworkflows/local/input_check/meta.yml b/subworkflows/local/input_check/meta.yml
index 3c88724..1f2cefb 100644
--- a/subworkflows/local/input_check/meta.yml
+++ b/subworkflows/local/input_check/meta.yml
@@ -9,12 +9,12 @@ keywords:
 components:
   - samplesheet/check
 input:
-  - ch_input_file:
+  - input_file:
      type: file
      description: |
        Input file to be validated
 output:
-  - ch_input_file:
+  - input_file:
      type: file
      description: |
        Channel containing validated input files

From f04e754d0d17361686748c79369fa9e8ab5e2ddb Mon Sep 17 00:00:00 2001
From: yueqixuan
Date: Wed, 8 Apr 2026 10:59:01 +0800
Subject: [PATCH 27/28] Support multiplexing

---
 .../local/diann/final_quantification/main.nf |  14 +-
 nextflow.config                              |   2 +
 nextflow_schema.json                         |  13 ++
 .../local/create_input_channel/main.nf       | 136 ++++++++----------
 4 files changed, 86 insertions(+), 79 deletions(-)

diff --git a/modules/local/diann/final_quantification/main.nf b/modules/local/diann/final_quantification/main.nf
index 64aa5cc..f83af13 100644
--- a/modules/local/diann/final_quantification/main.nf
+++ b/modules/local/diann/final_quantification/main.nf
@@ -25,11 +25,11 @@ process FINAL_QUANTIFICATION {
     path "diann_report.{tsv,parquet}", emit: main_report, optional: true
     path "diann_report.manifest.txt", emit: report_manifest, optional: true
     path "diann_report.protein_description.tsv", emit: protein_description, optional: true
-    path "diann_report.stats.tsv", emit: report_stats
-    path "diann_report.pr_matrix.tsv", emit: pr_matrix
-    path "diann_report.pg_matrix.tsv", emit: pg_matrix
-    path "diann_report.gg_matrix.tsv", emit: gg_matrix
-    path "diann_report.unique_genes_matrix.tsv", emit: unique_gene_matrix
+    path "diann_report.stats.tsv", emit: report_stats, optional: true
+    path "diann_report.pr_matrix.tsv", emit: pr_matrix, optional: true
+    path "diann_report.pg_matrix.tsv", emit: pg_matrix, optional: true
+    path "diann_report.gg_matrix.tsv", emit: gg_matrix, optional: true
+    path "diann_report.unique_genes_matrix.tsv", emit: unique_gene_matrix, optional: true
     path "diannsummary.log", emit: log

     // Different library files format are exported due to different DIA-NN versions
@@ -75,6 +75,8 @@ process FINAL_QUANTIFICATION {
     diann_dda_flag = meta.acquisition_method == 'dda' ? "--dda" : ""
     diann_export_quant = params.diann_export_quant ? "--export-quant" : ""
     diann_site_ms1_quant = params.diann_site_ms1_quant ? "--site-ms1-quant" : ""
+    diann_channel_run_norm = params.diann_channel_run_norm ? "--channel-run-norm" : ""
+    diann_channel_spec_norm = params.diann_channel_spec_norm ? "--channel-spec-norm" : ""

     """
     # Notes: if .quant files are passed, mzml/.d files are not accessed, so the name needs to be passed but files
@@ -107,6 +109,8 @@ process FINAL_QUANTIFICATION {
         ${diann_dda_flag} \\
         ${diann_export_quant} \\
         ${diann_site_ms1_quant} \\
+        ${diann_channel_run_norm} \\
+        ${diann_channel_spec_norm} \\
         \${mod_flags} \\
         $args
diff --git a/nextflow.config b/nextflow.config
index 696ebc4..a8fbb0d 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -101,6 +101,8 @@ params {
     quantums_params = null
     diann_no_peptidoforms = false // add '--no-peptidoforms'
     diann_use_quant = true // add '--use-quant' to FINAL_QUANTIFICATION
+    diann_channel_run_norm = false // add '--channel-run-norm' to FINAL_QUANTIFICATION
+    diann_channel_spec_norm = false // add '--channel-spec-norm' to FINAL_QUANTIFICATION

     // pmultiqc options
     enable_pmultiqc = true
diff --git a/nextflow_schema.json b/nextflow_schema.json
index 08d564b..9fe8f21 100644
--- a/nextflow_schema.json
+++ b/nextflow_schema.json
@@ -411,6 +411,19 @@
             "fa_icon": "far fa-check-square",
             "default": true
         },
+
+        "diann_channel_run_norm": {
+            "type": "boolean",
+            "description": "Set '--channel-run-norm'. Run-specific normalisation for multiplexing (e.g., Protein Turnover SILAC)",
+            "fa_icon": "far fa-check-square",
+            "default": false
+        },
+        "diann_channel_spec_norm": {
+            "type": "boolean",
+            "description": "Set '--channel-spec-norm'. Channel-specific normalisation for multiplexing (Independent Samples)",
+            "fa_icon": "far fa-check-square",
+            "default": false
+        },
         "skip_preliminary_analysis": {
             "type": "boolean",
             "description": "Skip the preliminary analysis step, thus use the passed spectral library as-is instead of generating a local consensus library.",
diff --git a/subworkflows/local/create_input_channel/main.nf b/subworkflows/local/create_input_channel/main.nf
index a52b9ee..577441b 100644
--- a/subworkflows/local/create_input_channel/main.nf
+++ b/subworkflows/local/create_input_channel/main.nf
@@ -29,8 +29,23 @@ workflow CREATE_INPUT_CHANNEL {
         .combine(ch_expdesign)
         .splitCsv(header: true, sep: '\t')
         .map { experiment_id, row ->
+            def filestr
+            if (!params.root_folder) {
+                filestr = row.URI?.toString()?.trim() ? row.URI.toString() : row.Filename.toString()
+            } else {
+                filestr = row.Filename.toString()
+                filestr = params.root_folder + File.separator + filestr
+                filestr = (params.local_input_type
+                    ? filestr.take(filestr.lastIndexOf('.')) + '.' + params.local_input_type
+                    : filestr)
+            }
+            return [filestr, experiment_id, row]
+        }
+        .groupTuple(by: 0)
+        .map { filestr, experiment_ids, rows ->
+            def experiment_id = experiment_ids[0]
             def wrapper = [acquisition_method: "", experiment_id: experiment_id]
-            create_meta_channel(row, enzymes, files, wrapper)
+            return create_meta_channel_grouped(filestr, rows, wrapper)
         }
         .set { ch_meta_config_dia }

@@ -42,38 +57,23 @@ workflow CREATE_INPUT_CHANNEL {
 }

 // Function to get list of [meta, [ spectra_files ]]
-def create_meta_channel(LinkedHashMap row, enzymes, files, wrapper) {
+def create_meta_channel_grouped(String filestr, List rows, Map wrapper) {
     def meta = [:]
-    def filestr

-    // Always use SDRF format
-    if (!params.root_folder) {
-        filestr = row.URI?.toString()?.trim() ? row.URI.toString() : row.Filename.toString()
-    }
-    else {
-        filestr = row.Filename.toString()
-    }
+    def base_row = rows[0]

     def fileName = file(filestr).name
     def dotIndex = fileName.lastIndexOf('.')
     meta.id = dotIndex > 0 ? fileName.take(dotIndex) : fileName
     meta.experiment_id = wrapper.experiment_id

-    // apply transformations given by specified root_folder and type
-    if (params.root_folder) {
-        filestr = params.root_folder + File.separator + filestr
-        filestr = (params.local_input_type
-            ? filestr.take(filestr.lastIndexOf('.')) + '.' + params.local_input_type
-            : filestr)
-    }
-
     // existence check
     if (!file(filestr).exists()) {
         exit(1, "ERROR: Please check input file -> File Uri does not exist!\n${filestr}")
     }

     // Detect acquisition method from SDRF or fallback to --diann_dda param
-    def acqMethod = row.AcquisitionMethod?.toString()?.trim() ?: ""
+    def acqMethod = base_row.AcquisitionMethod?.toString()?.trim() ?: ""
     if (acqMethod.toLowerCase().contains("data-independent acquisition") || acqMethod.toLowerCase().contains("dia")) {
         meta.acquisition_method = "dia"
     } else if (acqMethod.toLowerCase().contains("data-dependent acquisition") || acqMethod.toLowerCase().contains("dda")) {
@@ -85,21 +85,35 @@ def create_meta_channel(LinkedHashMap row, enzymes, files, wrapper) {
         exit(1)
     }

-    // DissociationMethod is already normalized by convert-diann (HCD, CID, ETD, ECD)
-    meta.dissociationmethod = row.DissociationMethod?.toString()?.trim() ?: ""
-
+    meta.dissociationmethod = base_row.DissociationMethod?.toString()?.trim() ?: ""
     wrapper.acquisition_method = meta.acquisition_method

-    // Validate required SDRF columns - these parameters are exclusively read from SDRF (no command-line override)
+    def labels = rows.collect { it.Label?.toString()?.trim() }.findAll { it }.unique()
+    meta.labelling_type = labels.join(';')
+
+    def is_plexdia = labels.size() > 1 || (labels.size() == 1 && !labels[0].toLowerCase().contains("label free"))
+    meta.plexdia = is_plexdia
+
+    def enzymes = rows.collect { it.Enzyme?.toString()?.trim() }.findAll { it }.unique()
+    if (enzymes.size() > 1) {
+        log.error("Currently only one enzyme is supported per file. Found conflicting enzymes for ${filestr}: '${enzymes}'.")
+        exit(1)
+    }
+    meta.enzyme = enzymes ? enzymes[0] : null
+
+    def fixedMods = rows.collect { it.FixedModifications?.toString()?.trim() }.findAll { it }.unique()
+    meta.fixedmodifications = fixedMods ? fixedMods[0] : null
+
+    // Validate required SDRF columns
     def requiredColumns = [
-        'Label': row.Label,
-        'Enzyme': row.Enzyme,
-        'FixedModifications': row.FixedModifications
+        'Label': meta.labelling_type,
+        'Enzyme': meta.enzyme,
+        'FixedModifications': meta.fixedmodifications
     ]

     def missingColumns = []
     requiredColumns.each { colName, colValue ->
-        if (colValue == null || colValue.toString().trim().isEmpty()) {
+        if (colValue == null || colValue.toString().isEmpty()) {
             missingColumns.add(colName)
         }
     }
@@ -110,20 +124,13 @@ def create_meta_channel(LinkedHashMap row, enzymes, files, wrapper) {
         exit(1)
     }

-    // Set values from SDRF (required columns)
-    meta.labelling_type = row.Label
-    meta.fixedmodifications = row.FixedModifications
-    meta.enzyme = row.Enzyme
-
-    // Set tolerance values: use SDRF if available, otherwise fall back to params
     def validUnits = ['ppm', 'da', 'Da', 'PPM']

-    // Precursor mass tolerance
-    if (row.PrecursorMassTolerance != null && !row.PrecursorMassTolerance.toString().trim().isEmpty()) {
+    if (base_row.PrecursorMassTolerance != null && !base_row.PrecursorMassTolerance.toString().trim().isEmpty()) {
         try {
-            meta.precursormasstolerance = Double.parseDouble(row.PrecursorMassTolerance)
+            meta.precursormasstolerance = Double.parseDouble(base_row.PrecursorMassTolerance)
         } catch (NumberFormatException e) {
-            log.error("ERROR: Invalid PrecursorMassTolerance value '${row.PrecursorMassTolerance}' for file '${filestr}'. Must be a valid number.")
+            log.error("ERROR: Invalid PrecursorMassTolerance value '${base_row.PrecursorMassTolerance}' for file '${filestr}'. Must be a valid number.")
             exit(1)
         }
     } else {
@@ -131,23 +138,21 @@
         meta.precursormasstolerance = params.precursor_mass_tolerance
     }

-    // Precursor mass tolerance unit
-    if (row.PrecursorMassToleranceUnit != null && !row.PrecursorMassToleranceUnit.toString().trim().isEmpty()) {
-        if (!validUnits.any { row.PrecursorMassToleranceUnit.toString().equalsIgnoreCase(it) }) {
-            log.error("ERROR: Invalid PrecursorMassToleranceUnit '${row.PrecursorMassToleranceUnit}' for file '${filestr}'. Must be 'ppm' or 'Da'.")
+    if (base_row.PrecursorMassToleranceUnit != null && !base_row.PrecursorMassToleranceUnit.toString().trim().isEmpty()) {
+        if (!validUnits.any { base_row.PrecursorMassToleranceUnit.toString().equalsIgnoreCase(it) }) {
+            log.error("ERROR: Invalid PrecursorMassToleranceUnit '${base_row.PrecursorMassToleranceUnit}' for file '${filestr}'. Must be 'ppm' or 'Da'.")
             exit(1)
         }
-        meta.precursormasstoleranceunit = row.PrecursorMassToleranceUnit
+        meta.precursormasstoleranceunit = base_row.PrecursorMassToleranceUnit
     } else {
         meta.precursormasstoleranceunit = params.precursor_mass_tolerance_unit
     }

-    // Fragment mass tolerance
-    if (row.FragmentMassTolerance != null && !row.FragmentMassTolerance.toString().trim().isEmpty()) {
+    if (base_row.FragmentMassTolerance != null && !base_row.FragmentMassTolerance.toString().trim().isEmpty()) {
         try {
-            meta.fragmentmasstolerance = Double.parseDouble(row.FragmentMassTolerance)
+            meta.fragmentmasstolerance = Double.parseDouble(base_row.FragmentMassTolerance)
         } catch (NumberFormatException e) {
-            log.error("ERROR: Invalid FragmentMassTolerance value '${row.FragmentMassTolerance}' for file '${filestr}'. Must be a valid number.")
+            log.error("ERROR: Invalid FragmentMassTolerance value '${base_row.FragmentMassTolerance}' for file '${filestr}'. Must be a valid number.")
             exit(1)
         }
     } else {
@@ -155,43 +160,26 @@
         meta.fragmentmasstolerance = params.fragment_mass_tolerance
     }

-    // Fragment mass tolerance unit
-    if (row.FragmentMassToleranceUnit != null && !row.FragmentMassToleranceUnit.toString().trim().isEmpty()) {
-        if (!validUnits.any { row.FragmentMassToleranceUnit.toString().equalsIgnoreCase(it) }) {
-            log.error("ERROR: Invalid FragmentMassToleranceUnit '${row.FragmentMassToleranceUnit}' for file '${filestr}'. Must be 'ppm' or 'Da'.")
+    if (base_row.FragmentMassToleranceUnit != null && !base_row.FragmentMassToleranceUnit.toString().trim().isEmpty()) {
+        if (!validUnits.any { base_row.FragmentMassToleranceUnit.toString().equalsIgnoreCase(it) }) {
+            log.error("ERROR: Invalid FragmentMassToleranceUnit '${base_row.FragmentMassToleranceUnit}' for file '${filestr}'. Must be 'ppm' or 'Da'.")
             exit(1)
         }
-        meta.fragmentmasstoleranceunit = row.FragmentMassToleranceUnit
+        meta.fragmentmasstoleranceunit = base_row.FragmentMassToleranceUnit
     } else {
         meta.fragmentmasstoleranceunit = params.fragment_mass_tolerance_unit
     }

-    // Variable modifications: use SDRF if available, otherwise fall back to params
-    if (row.VariableModifications != null && !row.VariableModifications.toString().trim().isEmpty()) {
-        meta.variablemodifications = row.VariableModifications
+    if (base_row.VariableModifications != null && !base_row.VariableModifications.toString().trim().isEmpty()) {
+        meta.variablemodifications = base_row.VariableModifications
     } else {
         meta.variablemodifications = params.variable_mods
     }

-    // Per-file scan ranges (empty string = no flags passed, DIA-NN auto-detects)
-    meta.ms1minmz = row.MS1MinMz?.toString()?.trim() ?: ""
-    meta.ms1maxmz = row.MS1MaxMz?.toString()?.trim() ?: ""
-    meta.ms2minmz = row.MS2MinMz?.toString()?.trim() ?: ""
-    meta.ms2maxmz = row.MS2MaxMz?.toString()?.trim() ?: ""
-
-    enzymes += row.Enzyme
-    if (enzymes.size() > 1) {
-        log.error("Currently only one enzyme is supported for the whole experiment. Specified was '${enzymes}'. Check or split your SDRF.")
-        log.error(filestr)
-        exit(1)
-    }
-
-    // Check for duplicate files
-    if (filestr in files) {
-        log.error("Currently only one DIA-NN setting per file is supported for the whole experiment. ${filestr} has multiple entries in your SDRF. Consider splitting your design into multiple experiments.")
-        exit(1)
-    }
-    files += filestr
+    meta.ms1minmz = base_row.MS1MinMz?.toString()?.trim() ?: ""
+    meta.ms1maxmz = base_row.MS1MaxMz?.toString()?.trim() ?: ""
+    meta.ms2minmz = base_row.MS2MinMz?.toString()?.trim() ?: ""
+    meta.ms2maxmz = base_row.MS2MaxMz?.toString()?.trim() ?: ""

     return [meta, filestr]
-}
+}
\ No newline at end of file

From af505677ffb95a56f2cae4909920cba70cb036da Mon Sep 17 00:00:00 2001
From: yueqixuan
Date: Wed, 8 Apr 2026 11:02:28 +0800
Subject: [PATCH 28/28] Support multiplexing

---
 subworkflows/local/create_input_channel/main.nf | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/subworkflows/local/create_input_channel/main.nf b/subworkflows/local/create_input_channel/main.nf
index 577441b..2208eca 100644
--- a/subworkflows/local/create_input_channel/main.nf
+++ b/subworkflows/local/create_input_channel/main.nf
@@ -104,7 +104,7 @@ def create_meta_channel_grouped(String filestr, List rows, Map wrapper) {
     def fixedMods = rows.collect { it.FixedModifications?.toString()?.trim() }.findAll { it }.unique()
     meta.fixedmodifications = fixedMods ? fixedMods[0] : null

-    // Validate required SDRF columns 
+    // Validate required SDRF columns
     def requiredColumns = [
         'Label': meta.labelling_type,
         'Enzyme': meta.enzyme,
@@ -182,4 +182,4 @@ def create_meta_channel_grouped(String filestr, List rows, Map wrapper) {
     meta.ms2maxmz = base_row.MS2MaxMz?.toString()?.trim() ?: ""

     return [meta, filestr]
-}
\ No newline at end of file
+}