Merged
2 changes: 1 addition & 1 deletion .gitignore
@@ -17,4 +17,4 @@ null/
.cursor/rules/codacy.mdc
.codacy/
.github/instructions/codacy.instructions.md
docs/superpowers/
71 changes: 37 additions & 34 deletions docs/usage.md
@@ -95,49 +95,50 @@ The pipeline passes parameters to DIA-NN at different steps. Some parameters com
### Parameter sources

Parameters are resolved in this priority order:

1. **SDRF metadata** (per-file, from `convert-diann` design file) — highest priority
2. **Pipeline parameters** (`--param_name` on command line or params file)
3. **Nextflow defaults** (`nextflow.config`) — lowest priority

### Pipeline steps

| Step | Description |
| ------------------------------- | ------------------------------------------------------------------- |
| **INSILICO_LIBRARY_GENERATION** | Predicts a spectral library from FASTA using DIA-NN's deep learning |
| **PRELIMINARY_ANALYSIS** | Per-file calibration and mass accuracy estimation (first pass) |
| **ASSEMBLE_EMPIRICAL_LIBRARY** | Builds consensus empirical library from preliminary results |
| **INDIVIDUAL_ANALYSIS** | Per-file quantification with the empirical library (second pass) |
| **FINAL_QUANTIFICATION** | Aggregates all files into protein/peptide matrices |

### Per-file parameters from SDRF

These parameters are extracted per-file from the SDRF via `convert-diann` and stored in `diann_design.tsv`:

| DIA-NN flag | SDRF column | Design column | Steps | Notes |
| ---------------- | -------------------------------------------------- | ------------------------ | ----------------------- | ----------------------------------------------- |
| `--mass-acc-ms1` | `comment[precursor mass tolerance]` | `PrecursorMassTolerance` | PRELIMINARY, INDIVIDUAL | Falls back to auto-detect if missing or not ppm |
| `--mass-acc` | `comment[fragment mass tolerance]` | `FragmentMassTolerance` | PRELIMINARY, INDIVIDUAL | Falls back to auto-detect if missing or not ppm |
| `--min-pr-mz` | `comment[ms1 scan range]` or `comment[ms min mz]` | `MS1MinMz` | PRELIMINARY, INDIVIDUAL | Per-file for GPF; global broadest for INSILICO |
| `--max-pr-mz` | `comment[ms1 scan range]` or `comment[ms max mz]` | `MS1MaxMz` | PRELIMINARY, INDIVIDUAL | Per-file for GPF; global broadest for INSILICO |
| `--min-fr-mz` | `comment[ms2 scan range]` or `comment[ms2 min mz]` | `MS2MinMz` | PRELIMINARY, INDIVIDUAL | Per-file for GPF; global broadest for INSILICO |
| `--max-fr-mz` | `comment[ms2 scan range]` or `comment[ms2 max mz]` | `MS2MaxMz` | PRELIMINARY, INDIVIDUAL | Per-file for GPF; global broadest for INSILICO |

### Global parameters from config

These parameters apply globally across all files. They are set in `diann_config.cfg` (from SDRF) or as pipeline parameters:

| DIA-NN flag | Pipeline parameter | Default | Steps | Notes |
| --------------------------------------------- | -------------------------------------------------- | ----------------------------------------------- | ---------------------------------------- | --------------------------------------------------------------- |
| `--cut` | (from SDRF enzyme) | — | ALL | Enzyme cut rule, derived from `comment[cleavage agent details]` |
| `--fixed-mod` | (from SDRF) | — | ALL | Fixed modifications from `comment[modification parameters]` |
| `--var-mod` | (from SDRF) | — | ALL | Variable modifications from `comment[modification parameters]` |
| `--monitor-mod` | `--enable_mod_localization` + `--mod_localization` | `false` / `Phospho (S),Phospho (T),Phospho (Y)` | PRELIMINARY, ASSEMBLE, INDIVIDUAL, FINAL | PTM site localization scoring (DIA-NN 1.8.x only) |
| `--window` | `--scan_window` | `8` | PRELIMINARY, ASSEMBLE, INDIVIDUAL | Scan window; auto-detected when `--scan_window_automatic=true` |
| `--quick-mass-acc` | `--quick_mass_acc` | `true` | PRELIMINARY | Fast mass accuracy calibration |
| `--min-corr 2 --corr-diff 1 --time-corr-only` | `--performance_mode` | `true` | PRELIMINARY | High-speed, low-RAM mode |
| `--pg-level` | `--pg_level` | `2` | INDIVIDUAL, FINAL | Protein grouping level |
| `--species-genes` | `--species_genes` | `false` | FINAL | Use species-specific gene names |
| `--no-norm` | `--diann_normalize` | `true` | FINAL | Disable normalization when `false` |

### PTM site localization (`--monitor-mod`)

@@ -161,19 +162,20 @@ nextflow run bigbio/quantmsdiann \
```

The parameter accepts two formats:

- **Modification names** (quantms-compatible): `Phospho (S),Phospho (T),Phospho (Y)` — site info in parentheses is stripped, the base name is mapped to UniMod
- **UniMod accessions** (direct): `UniMod:21,UniMod:1`

Supported modification name mappings:

| Name | UniMod ID | Example |
| ----------- | ------------ | ------------------------------------- |
| Phospho | `UniMod:21` | `Phospho (S),Phospho (T),Phospho (Y)` |
| GlyGly | `UniMod:121` | `GlyGly (K)` |
| Acetyl | `UniMod:1` | `Acetyl (Protein N-term)` |
| Oxidation | `UniMod:35` | `Oxidation (M)` |
| Deamidated | `UniMod:7` | `Deamidated (N),Deamidated (Q)` |
| Methylation | `UniMod:34` | `Methylation (K),Methylation (R)` |

## Optional outputs

@@ -269,6 +271,7 @@ nextflow run main.nf \
```

This config (`conf/tests/test_dia_local.config`) overrides:

- `SDRF_PARSING` → `local/sdrf-pipelines:dev`
- `SAMPLESHEET_CHECK` → `local/quantms-utils:dev`
- `DIANN_MSSTATS` → `local/quantms-utils:dev`
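
The name handling described for `--mod_localization` in `docs/usage.md` (site info in parentheses is stripped, the base name is mapped to UniMod, and accessions are accepted directly) can be sketched in a few lines of shell. This is only an illustration: `to_unimod` is a hypothetical helper, not a function of the pipeline, and the mapping is copied from the documentation table rather than from the pipeline's code.

```shell
#!/usr/bin/env bash
# to_unimod ENTRY: map one entry such as "Phospho (S)" or "UniMod:21"
# to a UniMod accession. Hypothetical helper for illustration only.
to_unimod() {
    local entry="$1"
    # UniMod accessions pass through unchanged.
    case "$entry" in
        UniMod:*) printf '%s\n' "$entry"; return 0 ;;
    esac
    # Strip the site info in parentheses, e.g. "Phospho (S)" -> "Phospho".
    local base="${entry%% \(*}"
    case "$base" in
        Phospho)     echo "UniMod:21" ;;
        GlyGly)      echo "UniMod:121" ;;
        Acetyl)      echo "UniMod:1" ;;
        Oxidation)   echo "UniMod:35" ;;
        Deamidated)  echo "UniMod:7" ;;
        Methylation) echo "UniMod:34" ;;
        *) echo "unknown modification: $entry" >&2; return 1 ;;
    esac
}

to_unimod "Phospho (S)"              # prints UniMod:21
to_unimod "Acetyl (Protein N-term)"  # prints UniMod:1
to_unimod "UniMod:35"                # prints UniMod:35
```

The sketch handles one entry at a time. How the pipeline splits the comma-separated list or treats repeated base names is not shown in this diff, so it is left out.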
9 changes: 9 additions & 0 deletions modules/local/diann/assemble_empirical_library/main.nf
@@ -19,6 +19,7 @@ process ASSEMBLE_EMPIRICAL_LIBRARY {
path "empirical_library.*", emit: empirical_library
path "assemble_empirical_library.log", emit: log
path "versions.yml", emit: versions
+path "diann_calibrated_params.csv", emit: calibrated_params

when:
task.ext.when == null || task.ext.when
@@ -83,6 +84,14 @@ process ASSEMBLE_EMPIRICAL_LIBRARY {

cp report.log.txt assemble_empirical_library.log

+val_mass_acc_ms2=\$(grep "Averaged recommended settings" assemble_empirical_library.log | cut -d ' ' -f 11 | tr -cd "[0-9.]")
+val_mass_acc_ms1=\$(grep "Averaged recommended settings" assemble_empirical_library.log | cut -d ' ' -f 15 | tr -cd "[0-9.]")
+val_scan_window=\$(grep "Averaged recommended settings" assemble_empirical_library.log | cut -d ' ' -f 19 | tr -cd "[0-9.]")
+if [ -z "\$val_mass_acc_ms2" ]; then val_mass_acc_ms2="0"; fi
+if [ -z "\$val_mass_acc_ms1" ]; then val_mass_acc_ms1="0"; fi
+if [ -z "\$val_scan_window" ]; then val_scan_window="0"; fi
+echo "\${val_mass_acc_ms2},\${val_mass_acc_ms1},\${val_scan_window}" > diann_calibrated_params.csv
+
cat <<-END_VERSIONS > versions.yml
"${task.process}":
DIA-NN: \$(diann 2>&1 | grep "DIA-NN" | grep -oP "\\d+\\.\\d+(\\.\\w+)*(\\.[\\d]+)?")
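
The extraction added to `ASSEMBLE_EMPIRICAL_LIBRARY` depends on fixed field positions (11, 15 and 19) in the "Averaged recommended settings" log line. A minimal sketch of what the three `cut`/`tr` calls produce follows. The sample line is an assumption: its layout is reconstructed from those field indices, not copied from a real DIA-NN run, and `extract_field` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical sample line; the layout is inferred from the field indices
# (11, 15, 19) used in the process script, not taken from a real log.
log_line='[0:24] Averaged recommended settings for this experiment: Mass accuracy = 13ppm, MS1 accuracy = 18ppm, Scan window = 7'

# extract_field N: print the N-th space-separated field, keeping digits and dots only.
extract_field() {
    printf '%s\n' "$log_line" | cut -d ' ' -f "$1" | tr -cd '0-9.'
}

val_mass_acc_ms2=$(extract_field 11)
val_mass_acc_ms1=$(extract_field 15)
val_scan_window=$(extract_field 19)

# Same "0" sentinel as the process script: an empty value means nothing was extracted.
csv_line="${val_mass_acc_ms2:-0},${val_mass_acc_ms1:-0},${val_scan_window:-0}"
echo "$csv_line"   # prints 13,18,7
```

Because the fields are positional, any change to the wording of the log line would shift them. In that case the values fall back to the `0` sentinel or pick up the wrong token, which is why the downstream process checks for `"0"`.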
4 changes: 4 additions & 0 deletions modules/local/diann/assemble_empirical_library/meta.yml
@@ -35,5 +35,9 @@ output:
type: file
description: File containing software version
pattern: "versions.yml"
+- calibrated_params:
+type: file
+description: A file containing mass_acc_ms2, mass_acc_ms1, and scan_window extracted from the DIA-NN log.
+pattern: "diann_calibrated_params.csv"
authors:
- "@daichengxin"
4 changes: 2 additions & 2 deletions modules/local/diann/diann_msstats/main.nf
@@ -3,8 +3,8 @@ process DIANN_MSSTATS {
label 'process_medium'

container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
-'https://depot.galaxyproject.org/singularity/quantms-utils:0.0.27--pyh106432d_0' :
-'biocontainers/quantms-utils:0.0.27--pyh106432d_0' }"
+'https://depot.galaxyproject.org/singularity/quantms-utils:0.0.28--pyh106432d_0' :
+'biocontainers/quantms-utils:0.0.28--pyh106432d_0' }"

input:
path(report)
4 changes: 2 additions & 2 deletions modules/local/diann/generate_cfg/main.nf
@@ -3,8 +3,8 @@ process GENERATE_CFG {
label 'process_tiny'

container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
-'https://depot.galaxyproject.org/singularity/quantms-utils:0.0.27--pyh106432d_0' :
-'biocontainers/quantms-utils:0.0.27--pyh106432d_0' }"
+'https://depot.galaxyproject.org/singularity/quantms-utils:0.0.28--pyh106432d_0' :
+'biocontainers/quantms-utils:0.0.28--pyh106432d_0' }"

input:
val(meta)
42 changes: 31 additions & 11 deletions modules/local/diann/individual_analysis/main.nf
@@ -8,7 +8,7 @@ process INDIVIDUAL_ANALYSIS {
'docker.io/biocontainers/diann:v1.8.1_cv1' }"

input:
-tuple val(meta), path(ms_file), path(fasta), path(diann_log), path(library)
+tuple val(meta), path(ms_file), path(fasta), path(library)
path(diann_config)

output:
@@ -44,19 +44,39 @@ process INDIVIDUAL_ANALYSIS {
}
}

-scan_window = params.scan_window
-
-if (params.mass_acc_automatic | params.scan_window_automatic) {
-    mass_acc_ms2 = "\$(cat ${diann_log} | grep \"Averaged recommended settings\" | cut -d ' ' -f 11 | tr -cd \"[0-9]\")"
-    scan_window = "\$(cat ${diann_log} | grep \"Averaged recommended settings\" | cut -d ' ' -f 19 | tr -cd \"[0-9]\")"
-    mass_acc_ms1 = "\$(cat ${diann_log} | grep \"Averaged recommended settings\" | cut -d ' ' -f 15 | tr -cd \"[0-9]\")"
-} else if (meta['precursormasstoleranceunit'].toLowerCase().endsWith('ppm') && meta['fragmentmasstoleranceunit'].toLowerCase().endsWith('ppm')) {
+if (params.mass_acc_automatic || params.scan_window_automatic) {
+    if (meta.mass_acc_ms2 != "0" && meta.mass_acc_ms2 != null) {
+        mass_acc_ms2 = meta.mass_acc_ms2
+        mass_acc_ms1 = meta.mass_acc_ms1
+        scan_window = meta.scan_window
+    }
+    else if (meta['fragmentmasstolerance']) {
+        mass_acc_ms2 = meta['fragmentmasstolerance']
+        mass_acc_ms1 = meta['precursormasstolerance']
+        scan_window = params.scan_window
+    }
+    else {
+        mass_acc_ms2 = params.mass_acc_ms2
+        mass_acc_ms1 = params.mass_acc_ms1
+        scan_window = params.scan_window
+    }
+} else if (meta['precursormasstoleranceunit']?.toLowerCase()?.endsWith('ppm') && meta['fragmentmasstoleranceunit']?.toLowerCase()?.endsWith('ppm')) {
mass_acc_ms1 = meta["precursormasstolerance"]
mass_acc_ms2 = meta["fragmentmasstolerance"]
} else {
-    mass_acc_ms2 = "\$(cat ${diann_log} | grep \"Averaged recommended settings\" | cut -d ' ' -f 11 | tr -cd \"[0-9]\")"
-    scan_window = "\$(cat ${diann_log} | grep \"Averaged recommended settings\" | cut -d ' ' -f 19 | tr -cd \"[0-9]\")"
-    mass_acc_ms1 = "\$(cat ${diann_log} | grep \"Averaged recommended settings\" | cut -d ' ' -f 15 | tr -cd \"[0-9]\")"
+    if (meta.mass_acc_ms2 != "0" && meta.mass_acc_ms2 != null) {
+        mass_acc_ms2 = meta.mass_acc_ms2
+        mass_acc_ms1 = meta.mass_acc_ms1
+        scan_window = meta.scan_window
+    } else if (meta['fragmentmasstolerance']) {
+        mass_acc_ms2 = meta['fragmentmasstolerance']
+        mass_acc_ms1 = meta['precursormasstolerance']
+        scan_window = params.scan_window
+    } else {
+        mass_acc_ms2 = params.mass_acc_ms2
+        mass_acc_ms1 = params.mass_acc_ms1
+        scan_window = params.scan_window
+    }
}

diann_no_peptidoforms = params.diann_no_peptidoforms ? "--no-peptidoforms" : ""
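
The new selection logic in `INDIVIDUAL_ANALYSIS` follows a three-level fallback: the calibrated value carried in `meta` (unless it is missing or the `"0"` sentinel), then the SDRF tolerance, then the pipeline default. A minimal shell sketch of that order is below. `resolve_mass_acc` is a hypothetical helper, and it resolves one value at a time, whereas the process decides once on `mass_acc_ms2` and takes the related values together.

```shell
#!/usr/bin/env bash
# resolve_mass_acc CALIBRATED SDRF DEFAULT
# Hypothetical helper mirroring the three-level fallback:
#   1. calibrated value, unless empty or the "0" sentinel
#   2. SDRF tolerance, if non-empty
#   3. pipeline default
resolve_mass_acc() {
    local calibrated="$1" sdrf="$2" default="$3"
    if [ -n "$calibrated" ] && [ "$calibrated" != "0" ]; then
        printf '%s\n' "$calibrated"
    elif [ -n "$sdrf" ]; then
        printf '%s\n' "$sdrf"
    else
        printf '%s\n' "$default"
    fi
}

resolve_mass_acc "13" "20" "15"   # prints 13 (calibrated value wins)
resolve_mass_acc "0"  "20" "15"   # prints 20 (sentinel, so the SDRF value is used)
resolve_mass_acc ""   ""   "15"   # prints 15 (pipeline default)
```

The default of `15` in the calls above matches the `mass_acc_ms2` and `mass_acc_ms1` values this change adds to `nextflow.config`; the other numbers are made up for the example.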
4 changes: 0 additions & 4 deletions modules/local/diann/individual_analysis/meta.yml
@@ -10,10 +10,6 @@ tools:
homepage: https://github.com/vdemichev/DiaNN
documentation: https://github.com/vdemichev/DiaNN
input:
-- diann_log:
-type: file
-description: DIA-NN log file
-pattern: "assemble_empirical_library.log"
- empirical_library:
type: file
description: An empirical spectral library from the .quant files.
4 changes: 2 additions & 2 deletions modules/local/pmultiqc/main.nf
@@ -2,8 +2,8 @@ process PMULTIQC {
label 'process_high'

container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
-'https://depot.galaxyproject.org/singularity/pmultiqc:0.0.42--pyhdfd78af_0' :
-'biocontainers/pmultiqc:0.0.42--pyhdfd78af_0' }"
+'https://depot.galaxyproject.org/singularity/pmultiqc:0.0.43--pyhdfd78af_0' :
+'biocontainers/pmultiqc:0.0.43--pyhdfd78af_0' }"

input:
path 'results/*'
4 changes: 2 additions & 2 deletions modules/local/samplesheet_check/main.nf
@@ -4,8 +4,8 @@ process SAMPLESHEET_CHECK {
label 'process_tiny'

container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
-'https://depot.galaxyproject.org/singularity/quantms-utils:0.0.27--pyh106432d_0' :
-'biocontainers/quantms-utils:0.0.27--pyh106432d_0' }"
+'https://depot.galaxyproject.org/singularity/quantms-utils:0.0.28--pyh106432d_0' :
+'biocontainers/quantms-utils:0.0.28--pyh106432d_0' }"

input:
path input_file
4 changes: 2 additions & 2 deletions modules/local/utils/mzml_statistics/main.nf
@@ -4,8 +4,8 @@ process MZML_STATISTICS {
label 'process_single'

container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
-'https://depot.galaxyproject.org/singularity/quantms-utils:0.0.27--pyh106432d_0' :
-'biocontainers/quantms-utils:0.0.27--pyh106432d_0' }"
+'https://depot.galaxyproject.org/singularity/quantms-utils:0.0.28--pyh106432d_0' :
+'biocontainers/quantms-utils:0.0.28--pyh106432d_0' }"

input:
tuple val(meta), path(ms_file)
4 changes: 4 additions & 0 deletions nextflow.config
@@ -85,6 +85,10 @@ params {
random_preanalysis_seed = 42
empirical_assembly_ms_n = 200

+// DIA-NN: INDIVIDUAL_ANALYSIS
+mass_acc_ms2 = 15
+mass_acc_ms1 = 15
+
// DIA-NN: FINAL_QUANTIFICATION — summarization & output
pg_level = 2
species_genes = false