Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughInitial addition of a complete Nextflow-based DIA-NN quantitative mass spectrometry analysis pipeline for Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Main as main.nf
participant QM as QUANTMSDIANN
participant IC as INPUT_CHECK
participant CC as CREATE_INPUT_CHANNEL
participant FP as FILE_PREPARATION
participant DIA as DIA Workflow
participant MQ as PMULTIQC
User->>Main: nextflow run
Main->>QM: Invoke workflow
QM->>IC: Validate samplesheet
IC-->>QM: Validated SDRF
QM->>CC: Parse SDRF→DIA config
CC-->>QM: Meta, file paths, config
QM->>FP: Prepare raw files
FP-->>QM: Converted mzML, statistics
QM->>DIA: Run DIA analysis
DIA-->>QM: Reports, matrices, MSstats input
QM->>MQ: Aggregate QC metrics
MQ-->>QM: MultiQC HTML report
QM-->>Main: Complete
Main-->>User: Pipeline done
sequenceDiagram
participant Input as Input Files
participant IL as INSILICO_LIBRARY<br/>GENERATION
participant PA as PRELIMINARY<br/>ANALYSIS
participant AE as ASSEMBLE<br/>EMPIRICAL
participant IA as INDIVIDUAL<br/>ANALYSIS
participant FQ as FINAL<br/>QUANTIFICATION
participant MS as DIANN_MSSTATS
Input->>IL: FASTA
IL-->>PA: Predicted library
Input->>PA: mzML files
PA-->>AE: Preliminary quant
AE-->>IA: Empirical library
Input->>IA: Per-file mzML
IA-->>FQ: Ordered .quant files
FQ-->>MS: DIA-NN reports
MS-->>Input: MSstats input CSV
Estimated Code Review Effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly Related PRs
Suggested Reviewers
Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
Minor update
Refactor: extract calibration params to meta
There was a problem hiding this comment.
Pull request overview
This PR updates the DIA-NN workflow wiring to consume diann_config.cfg directly from sdrf-pipelines (convert-diann), adds support for DIA-NN version-aware SDRF conversion + PTM localization flags, and expands MultiQC inputs and documentation accordingly.
Changes:
- Pass
diann_config.cfgfromSDRF_PARSINGthroughCREATE_INPUT_CHANNELinto the DIA workflow; propagate DIA-NN summary logs into MultiQC inputs. - Add per-file scan range support (min/max precursor/fragment m/z) and broaden blocking/stripping of conflicting DIA-NN flags.
- Bump several tool containers and document DIA-NN parameter precedence / developer local-container testing.
Reviewed changes
Copilot reviewed 23 out of 24 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| workflows/quantmsdiann.nf | Pass DIA-NN cfg into DIA subworkflow; add DIA-NN log to MultiQC inputs. |
| workflows/dia.nf | Remove GENERATE_CFG usage; parse calibrated params and inject into per-file meta; emit DIA-NN log from final quant. |
| subworkflows/local/create_input_channel/main.nf | Expose DIA-NN cfg from SDRF parsing; adjust URI/acquisition method handling; add per-file scan range meta fields. |
| modules/local/sdrf_parsing/main.nf | Switch to convert-diann, emit diann_config.cfg, add mod-localization + DIA-NN version flags, update container. |
| modules/local/diann/preliminary_analysis/main.nf | Support per-file m/z scan flags; strip additional conflicting flags; include --monitor-mod extraction. |
| modules/local/diann/assemble_empirical_library/main.nf | Emit calibrated params file; include --monitor-mod extraction; extract calibration values from log. |
| modules/local/diann/assemble_empirical_library/meta.yml | Document new calibrated_params output. |
| modules/local/diann/individual_analysis/main.nf | Remove diann_log input; use meta-driven calibration; add per-file m/z scan flags; update blocked flags. |
| modules/local/diann/individual_analysis/meta.yml | Remove outdated diann_log input documentation. |
| modules/local/diann/final_quantification/main.nf | Include --monitor-mod extraction in blocked flags and mod flag parsing. |
| modules/local/samplesheet_check/main.nf | Update quantms-utils container and change checksamplesheet invocation to --minimal. |
| modules/local/pmultiqc/main.nf | Update pmultiqc container and CLI flags to hyphenated forms. |
| modules/local/utils/mzml_statistics/main.nf | Bump quantms-utils container tag. |
| modules/local/diann/generate_cfg/main.nf | Bump quantms-utils container tag (process appears unused by updated workflow). |
| modules/local/diann/diann_msstats/main.nf | Bump quantms-utils container tag. |
| nextflow.config | Add diann_version, mod-localization params, and default mass accuracy params. |
| nextflow_schema.json | Add schema entries for new DIA-NN params (version + localization + mass accuracy). |
| docs/usage.md | Add detailed DIA-NN parameter documentation and local-container developer testing instructions. |
| conf/tests/test_dia_local.config | Add local container override config for dev testing. |
| conf/diann_versions/v1_8_1.config | Set params.diann_version for this profile. |
| conf/diann_versions/v2_1_0.config | Set params.diann_version for this profile. |
| conf/diann_versions/v2_2_0.config | Set params.diann_version for this profile. |
| .gitignore | Ignore docs/superpowers/. |
Comments suppressed due to low confidence (1)
subworkflows/local/create_input_channel/main.nf:53
- The URI presence check uses
row.URI?.toString()?.trim()but then assignsrow.URI.toString()(untrimmed). If the URI field contains trailing/leading whitespace, the subsequentfile(filestr).exists()check can fail even though the intended path is valid. Consider assigning the trimmed value (or trimming after selection) before using it as a path.
// Always use SDRF format
if (!params.root_folder) {
filestr = row.URI?.toString()?.trim() ? row.URI.toString() : row.Filename.toString()
}
else {
filestr = row.Filename.toString()
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| path "*.log" , emit: log | ||
| path "versions.yml" , emit: versions | ||
| path "diann_design.tsv" , emit: ch_sdrf_config_file | ||
| path "diann_design.tsv" , emit: ch_expdesign |
There was a problem hiding this comment.
The process declares the same output file (diann_design.tsv) twice (emitted as both ch_sdrf_config_file and ch_expdesign). Nextflow processes generally do not allow duplicate path output declarations for the same file pattern and this can fail at compile/runtime. Consider emitting diann_design.tsv once and then duplicating/aliasing the resulting channel in the calling workflow (e.g., using into/tap/simple assignment).
| path "diann_design.tsv" , emit: ch_expdesign |
| if (params.mass_acc_automatic || params.scan_window_automatic) { | ||
| if (meta.mass_acc_ms2 != "0" && meta.mass_acc_ms2 != null) { | ||
| mass_acc_ms2 = meta.mass_acc_ms2 | ||
| mass_acc_ms1 = meta.mass_acc_ms1 | ||
| scan_window = meta.scan_window | ||
| } | ||
| else if (meta['fragmentmasstolerance']) { | ||
| mass_acc_ms2 = meta['fragmentmasstolerance'] | ||
| mass_acc_ms1 = meta['precursormasstolerance'] | ||
| scan_window = params.scan_window | ||
| } | ||
| else { | ||
| mass_acc_ms2 = params.mass_acc_ms2 | ||
| mass_acc_ms1 = params.mass_acc_ms1 | ||
| scan_window = params.scan_window | ||
| } | ||
| } else if (meta['precursormasstoleranceunit']?.toLowerCase()?.endsWith('ppm') && meta['fragmentmasstoleranceunit']?.toLowerCase()?.endsWith('ppm')) { | ||
| mass_acc_ms1 = meta["precursormasstolerance"] | ||
| mass_acc_ms2 = meta["fragmentmasstolerance"] | ||
| } else { | ||
| mass_acc_ms2 = "\$(cat ${diann_log} | grep \"Averaged recommended settings\" | cut -d ' ' -f 11 | tr -cd \"[0-9]\")" | ||
| scan_window = "\$(cat ${diann_log} | grep \"Averaged recommended settings\" | cut -d ' ' -f 19 | tr -cd \"[0-9]\")" | ||
| mass_acc_ms1 = "\$(cat ${diann_log} | grep \"Averaged recommended settings\" | cut -d ' ' -f 15 | tr -cd \"[0-9]\")" | ||
| if (meta.mass_acc_ms2 != "0" && meta.mass_acc_ms2 != null) { | ||
| mass_acc_ms2 = meta.mass_acc_ms2 | ||
| mass_acc_ms1 = meta.mass_acc_ms1 | ||
| scan_window = meta.scan_window | ||
| } else if (meta['fragmentmasstolerance']) { | ||
| mass_acc_ms2 = meta['fragmentmasstolerance'] | ||
| mass_acc_ms1 = meta['precursormasstolerance'] | ||
| scan_window = params.scan_window | ||
| } else { | ||
| mass_acc_ms2 = params.mass_acc_ms2 | ||
| mass_acc_ms1 = params.mass_acc_ms1 | ||
| scan_window = params.scan_window | ||
| } | ||
| } |
There was a problem hiding this comment.
scan_window is not assigned in the branch where both mass_acc_automatic and scan_window_automatic are false and the tolerances are in ppm. Since the command always interpolates --window ${scan_window}, this can cause an undefined-variable / null interpolation failure at runtime. Ensure scan_window is set in all code paths (e.g., default to params.scan_window when not auto).
| if (params.mass_acc_automatic || params.scan_window_automatic) { | ||
| if (meta.mass_acc_ms2 != "0" && meta.mass_acc_ms2 != null) { | ||
| mass_acc_ms2 = meta.mass_acc_ms2 | ||
| mass_acc_ms1 = meta.mass_acc_ms1 | ||
| scan_window = meta.scan_window | ||
| } | ||
| else if (meta['fragmentmasstolerance']) { | ||
| mass_acc_ms2 = meta['fragmentmasstolerance'] | ||
| mass_acc_ms1 = meta['precursormasstolerance'] | ||
| scan_window = params.scan_window | ||
| } | ||
| else { |
There was a problem hiding this comment.
When mass_acc_automatic/scan_window_automatic are enabled, the fallback uses meta['fragmentmasstolerance'] and meta['precursormasstolerance'] without checking that the corresponding units are ppm. If the SDRF (or defaults) specify Da, these values will still be passed to DIA-NN via --mass-acc/--mass-acc-ms1, which expects ppm (consistent with the unit-checking logic used elsewhere). Consider checking *toleranceunit before using these fields, otherwise fall back to calibrated values or params defaults.
| @@ -40,11 +36,8 @@ process SAMPLESHEET_CHECK { | |||
| cp "${input_file}" "\$OUTPUT_FILE" | |||
| fi | |||
|
|
|||
| quantmsutilsc checksamplesheet --exp_design "\$OUTPUT_FILE" --is_sdrf \\ | |||
| ${string_skip_sdrf_validation} \\ | |||
| ${string_skip_ms_validation} \\ | |||
| ${string_skip_factor_validation} \\ | |||
| ${string_skip_experimental_design_validation} \\ | |||
| quantmsutilsc checksamplesheet --exp_design "\$OUTPUT_FILE" \\ | |||
| --minimal \\ | |||
| ${string_use_ols_cache_only} \\ | |||
| $args \\ | |||
| 2>&1 | tee input_check.log | |||
There was a problem hiding this comment.
validate_ontologies, skip_ms_validation, skip_factor_validation, and skip_experimental_design_validation parameters are still defined in nextflow.config/nextflow_schema.json, but this process no longer uses them and always runs quantmsutilsc checksamplesheet --minimal. This makes the documented params ineffective and changes validation behavior unexpectedly. Either wire these params back into the command, or remove/deprecate them in config/schema/docs to avoid a silent behavior change.
| indiv_fin_analysis_in = ch_file_preparation_results | ||
| .combine(ch_searchdb) | ||
| .combine(ASSEMBLE_EMPIRICAL_LIBRARY.out.log) | ||
| .combine(ASSEMBLE_EMPIRICAL_LIBRARY.out.empirical_library) | ||
|
|
||
| .combine(ASSEMBLE_EMPIRICAL_LIBRARY.out.calibrated_params) | ||
| .map { meta_map, ms_file, fasta, library, param_file -> | ||
| def values = param_file.text.trim().split(',') | ||
| def new_meta = meta_map + [ | ||
| mass_acc_ms2 : values[0], | ||
| mass_acc_ms1 : values[1], | ||
| scan_window : values[2] | ||
| ] | ||
| return [ new_meta, ms_file, fasta, library ] | ||
| } |
There was a problem hiding this comment.
The workflow reads param_file.text from ASSEMBLE_EMPIRICAL_LIBRARY.out.calibrated_params on the Nextflow driver to set per-file meta fields. This can break on remote executors / object-store work dirs (where outputs may not be readable via local filesystem APIs) and also couples control-flow to file I/O on the head node. Prefer emitting the calibrated values as val (e.g., a tuple of numbers) from the process, or parsing within a small helper process so the values flow through channels without driver-side file reads.
CI Feedback 🧐A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
|
Summary by CodeRabbit
New Features
Documentation
Tests
Chores