Merged
65 commits
e31a2b4
Creating new module folder for starfusion, using fq2bam as a starting…
gburnett-nvidia Sep 24, 2025
4a06a59
Merge branch 'master' into parabricks_starfusion
gburnett-nvidia Sep 24, 2025
698f22e
resolving commit issue
gburnett-nvidia Sep 24, 2025
94ad33e
Updating files to reflect starfusion instead of fq2bam. Tests not pas…
gburnett-nvidia Sep 24, 2025
0179006
Parabricks is running but failing. Must be an issue with the input data.
gburnett-nvidia Sep 24, 2025
0e76b96
Merge branch 'master' into parabricks_starfusion
gburnett-nvidia Sep 25, 2025
1281359
Creating new starfusion_build folder. Copied from starfusion.
gburnett-nvidia Sep 25, 2025
bd4b208
Changing to using starfusion/build as base for new parabricks starfus…
gburnett-nvidia Sep 25, 2025
0a42ab9
Decremented starfusion_build to 1.7 for Parabricks compatibility
gburnett-nvidia Sep 25, 2025
a5cb70e
Tested and linted new starfusion model
gburnett-nvidia Sep 25, 2025
2443e06
Removing GPU docker options from nextflow.config
gburnett-nvidia Sep 25, 2025
fed75fb
Adding support for Chimeric.out.junction output in rnafq2bam
gburnett-nvidia Sep 26, 2025
573322c
Updating language in README.md
gburnett-nvidia Sep 26, 2025
a19d167
Adding version checking to snapshot assertion
gburnett-nvidia Sep 26, 2025
81ebabf
Adding out_dir to test assertions
gburnett-nvidia Sep 26, 2025
2ef1228
Merge branch 'master' into parabricks_starfusion
gburnett-nvidia Sep 26, 2025
b94b883
Removing stub fail test
gburnett-nvidia Sep 29, 2025
06f9e38
Merge branch 'parabricks_starfusion' of github.com:clara-parabricks-w…
gburnett-nvidia Sep 29, 2025
94b0126
Cleaning up data outputs
gburnett-nvidia Sep 30, 2025
4f85f5a
Removing unnecessary tags from starfusion nftest
gburnett-nvidia Sep 30, 2025
547ee70
Cleaning up tags in starfusion_build nftest
gburnett-nvidia Sep 30, 2025
8b708bc
Added test for chimeric output in rnafq2bam
gburnett-nvidia Sep 30, 2025
0f2bc48
Fixing issue with empty output (problem was with passing the chimeric…
gburnett-nvidia Sep 30, 2025
e15393b
Changing to starfusion test dataset to fix empty junction file
gburnett-nvidia Sep 30, 2025
efd0b79
Making Chimeric.out.junction output detection more generic
gburnett-nvidia Oct 1, 2025
f15a76f
Cleaning up rnafq2bam test assertions
gburnett-nvidia Oct 1, 2025
6b43978
Changing starfusion to output fusions and abridged fusions (for parit…
gburnett-nvidia Oct 1, 2025
72c25c3
Merge branch 'master' into parabricks_starfusion
gburnett-nvidia Oct 1, 2025
a5c2c45
Removing chimeric tests due to hanging issue with CI runners
gburnett-nvidia Oct 3, 2025
020f10e
Reducing memory requirements
gburnett-nvidia Oct 3, 2025
470b2fb
Updating mem requirements
gburnett-nvidia Oct 3, 2025
78bad19
ci: Try 12xl
edmundmiller Oct 3, 2025
97a81a3
Bringing back chimeric tests
gburnett-nvidia Oct 3, 2025
db1bd89
Increasing CI runner GPU size
gburnett-nvidia Oct 3, 2025
abc1986
Merge branch 'nf-core:parabricks_starfusion' into parabricks_starfusion
gburnett-nvidia Oct 3, 2025
bf4ca16
Updating chimeric.out.junction snapshot in rnafq2bam
gburnett-nvidia Oct 3, 2025
b22f6cf
Upgrading memory from 15 GB to 100 GB to meet minimum recommended req…
gburnett-nvidia Oct 7, 2025
c8fa6ad
Merge branch 'parabricks_starfusion' of github.com:clara-parabricks-w…
gburnett-nvidia Oct 7, 2025
8b5eab8
Chimeric.out.junction snapshot is not stable, so moving to just check…
gburnett-nvidia Oct 7, 2025
2f1616f
Moving module params out of the config and into the test file
gburnett-nvidia Oct 7, 2025
d96debc
Merge branch 'master' into parabricks_starfusion
gburnett-nvidia Oct 7, 2025
19c077a
Updating test data paths
gburnett-nvidia Oct 8, 2025
0bc879d
Merge branch 'parabricks_starfusion' of github.com:clara-parabricks-w…
gburnett-nvidia Oct 8, 2025
78f917b
Merge branch 'master' into parabricks_starfusion
gburnett-nvidia Oct 8, 2025
d94acd7
Updating readme
gburnett-nvidia Oct 8, 2025
9ee1ee4
Merge branch 'parabricks_starfusion' of github.com:clara-parabricks-w…
gburnett-nvidia Oct 8, 2025
a08ab56
Adding version printing back to snapshot
gburnett-nvidia Oct 8, 2025
bec434f
Adding version printing back to snapshot (more places)
gburnett-nvidia Oct 8, 2025
22f274e
Cleaning up module args
gburnett-nvidia Oct 8, 2025
488ceb6
Removing compatible versions from module (since it does not work)
gburnett-nvidia Oct 8, 2025
c7e33d3
Scaling down instance size to match minimum system requirements
gburnett-nvidia Oct 8, 2025
713259d
Updating stub test
gburnett-nvidia Oct 8, 2025
c215c33
Updating module outputs
gburnett-nvidia Oct 9, 2025
4afd993
Updating stub test
gburnett-nvidia Oct 9, 2025
593d117
Updating meta.yml to match new module outputs
gburnett-nvidia Oct 9, 2025
e958a39
Adding rnafq2bam snapshot
gburnett-nvidia Oct 9, 2025
196f759
Updating rnafq2bam outputs to fix stub test
gburnett-nvidia Oct 9, 2025
b3467ba
Fixing starfusion stub tests
gburnett-nvidia Oct 9, 2025
cdb1c1d
Updating README to include notes on testing for starfusion module
gburnett-nvidia Oct 9, 2025
4a8ea2d
Updating README to reflect new large instance for testing starfusion
gburnett-nvidia Oct 9, 2025
f0e433f
Merge branch 'master' into parabricks_starfusion
gburnett-nvidia Oct 9, 2025
83e2cea
Updating to 4.6
gburnett-nvidia Oct 13, 2025
7ad8a9e
Adjusting memory requirements for rnafq2bam and starfusion testing to…
gburnett-nvidia Oct 14, 2025
daaa45d
Reverting github ci runner instance
gburnett-nvidia Oct 15, 2025
cef8071
Merge branch 'master' into parabricks_starfusion
gburnett-nvidia Oct 15, 2025
2 changes: 1 addition & 1 deletion .github/workflows/nf-test-gpu.yml
Original file line number Diff line number Diff line change
@@ -91,7 +91,7 @@ jobs:
echo ${{ steps.set-shards.outputs.total_shards }}

nf-test-gpu:
runs-on: "runs-on=${{ github.run_id }}/family=g4dn.xlarge/image=ubuntu24-gpu-x64"
runs-on: "runs-on=${{ github.run_id }}/family=g4dn.8xlarge/image=ubuntu24-gpu-x64"
Contributor:
Looks good to me; it's just this change where I am unsure.

@gburnett-nvidia (Author), Oct 14, 2025:

Based on the other PR it looks like it's okay for me to change it back before merging. And we'll have some discussions elsewhere about if there's a better way we can do this in the future.

name: "GPU Test | ${{ matrix.profile }} | ${{ matrix.shard }}"
needs: [nf-test-changes]
if: ${{ needs.nf-test-changes.outputs.total_shards != '0' }}
46 changes: 34 additions & 12 deletions modules/nf-core/parabricks/README.md
@@ -1,18 +1,20 @@
# NVIDIA Clara Parabricks modules
# NVIDIA Parabricks Modules

These nf-core modules implement functionality of the NVIDIA Clara Parabricks programs for GPU-accelerated genomics tasks. Parabricks covers the functionality of alignment (replicating `bwa mem`), variant calling (replicating `gatk4` and `deepvariant`), and other common genomics tasks. Please see the [documentation](https://docs.nvidia.com/clara/parabricks/4.0.1/index.html) for additional details.
These nf-core modules implement functionality of the NVIDIA Parabricks software for GPU-accelerated genomics tasks. Parabricks covers alignment (e.g. `bwa mem`, `bwa-meth`, and `STAR`), variant and fusion calling (e.g. `deepvariant`, `mutectcaller`, and `starfusion`), and other common genomics tasks. Please see the [documentation](https://docs.nvidia.com/clara/parabricks/latest/index.html) for additional details.

## General considerations

Parabricks is available only through a docker container: `nvcr.io/nvidia/clara/clara-parabricks:4.5.1-1`. The main entrypoint to the Parabricks is the command line program `pbrun`, which calls several sub-programs within the container. The Prabricks authors combined several common patterns into a single tool, such as `fq2bam` performing alignment, sorting, mark duplicates, and base quality score recalibration, all within a single command. Generally, the tools can be used for only a subset of the entire functionality as well.
Parabricks is available only through a Docker container: `nvcr.io/nvidia/clara/clara-parabricks:latest`. The main entrypoint to Parabricks is the command line program `pbrun`, which calls several sub-programs within the container. The Parabricks authors sometimes combine several common tasks into a single tool; for example, `fq2bam` performs bwa alignment, sorting, duplicate marking, and base quality score recalibration within a single command. Generally, each tool can also be run with only a subset of its full functionality.

Parabricks tools must be run with at least one NVIDIA GPU with >16GB vRAM, and a usually require a large amount of resources (at least 8 threads and 30GB RAM). Please see the [resource requirements](https://docs.nvidia.com/clara/parabricks/4.0.1/GettingStarted.html) for more information. It is recommended to use fast local storage whenever possible to ensure the system is not bottlenecked by I/O.
## Hardware Requirements

Parabricks tools must be run with at least one NVIDIA GPU with at least 16GB vRAM, and usually require a large amount of resources (at least 8 threads and 30GB RAM). Please see the [installation requirements](https://docs.nvidia.com/clara/parabricks/latest/gettingstarted/installationrequirements.html) for more information. It is recommended to use fast local storage whenever possible to ensure the system is not bottlenecked by I/O.

To give docker or singularity access to GPUs present on the host system, add this line to the configuration file: `docker.runOptions = "--gpus all"` or `singularity.runOptions = "--nv"`.
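
For example, enabling GPU access in a Nextflow configuration could look like the sketch below (the profile names are illustrative assumptions, not part of these modules):

```groovy
// nextflow.config sketch — profile names are hypothetical
profiles {
    docker_gpu {
        docker.enabled    = true
        docker.runOptions = '--gpus all'   // expose all host GPUs to the container
    }
    singularity_gpu {
        singularity.enabled    = true
        singularity.runOptions = '--nv'    // enable NVIDIA GPU support in Singularity
    }
}
```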

## License

As of version 4.0, Parabricks is available for general use. The license from NVIDIA states:
As of version 4.0, Parabricks is available for free for general use. The license from NVIDIA states:

```
A license is no longer required to use Clara Parabricks. The container works out of the box once downloaded.
@@ -26,24 +28,44 @@ Clara Parabricks is free for
Users who would like to have Enterprise Support for Clara Parabricks can purchase NVIDIA AI Enterprise licenses, which provides full-stack support. To learn more about NVIDIA AI Enterprise, please visit https://www.nvidia.com/en-us/data-center/products/ai-enterprise/.
```

## Specific tools
## Notes on Specific tools

### parabricks/fq2bam
### fq2bam

`fq2bam` performs alignment, sorting, (optional) marking of duplicates, and (optional) base quality score recalibration (BQSR). There is no option to control the number of threads used with this tool - all available threads on the system are used by default.
The `fq2bam` module performs alignment, sorting, (optional) marking of duplicates, and (optional) base quality score recalibration (BQSR). There is no option to control the number of threads used with this tool - all available threads on the system are used by default.

Alignment and coordinate sorting are always performed. Duplicate marking can be performed by passing the option `markdups=true`. Duplicate marking and BQSR can be performed by passing the options `markdups=true` and `known_sites=$KNOWN_SITES_FILE`.

Please see the `fq2bam/meta.yml` file for a detailed list of required and optional inputs and outputs.

For additional considerations, including information about how readgroups are added to the resulting bam files, see the [tool documentation](https://docs.nvidia.com/clara/parabricks/latest/Documentation/ToolDocs/man_fq2bam.html).
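
As a rough illustration of the command shape this module wraps, the sketch below assembles a `pbrun fq2bam` invocation, adding the BQSR flags only when a known-sites file is supplied. All file names are placeholders, and the exact flags the module emits should be checked against `fq2bam/main.nf`:

```shell
# Sketch only: assemble (not run) a pbrun fq2bam command; file names are placeholders.
ref="genome.fa"
r1="sample_R1.fastq.gz"
r2="sample_R2.fastq.gz"
known_sites="known_sites.vcf.gz"   # leave empty to skip BQSR

cmd="pbrun fq2bam --ref $ref --in-fq $r1 $r2 --out-bam sample.bam"
if [ -n "$known_sites" ]; then
    # BQSR needs a known-sites VCF and a path for the recalibration table
    cmd="$cmd --knownSites $known_sites --out-recal-file sample.recal.txt"
fi
echo "$cmd"
```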

## parabricks/stargenomegenerate
### rnafq2bam

The `rnafq2bam` module is based on STAR version `2.7.2a`. Therefore, the genome lib directory required as input for this module must also be generated with this version of STAR. For convenience, a module providing this version of STAR is included in this directory under `parabricks/stargenomegenerate`.
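
The pairing described above can be sketched in a workflow as follows (channel names and include paths are illustrative assumptions; see the module tests for the canonical wiring):

```groovy
// Sketch: build the STAR 2.7.2a genome lib once, then reuse it for rnafq2bam.
include { PARABRICKS_STARGENOMEGENERATE } from './modules/nf-core/parabricks/stargenomegenerate/main'
include { PARABRICKS_RNAFQ2BAM          } from './modules/nf-core/parabricks/rnafq2bam/main'

workflow {
    ch_fasta = Channel.of([ [id:'ref'], file(params.fasta) ])
    ch_gtf   = Channel.of([ [id:'ref'], file(params.gtf)   ])

    PARABRICKS_STARGENOMEGENERATE(ch_fasta, ch_gtf)

    PARABRICKS_RNAFQ2BAM(
        ch_reads,                                 // [ meta, fastq ] channel, defined elsewhere
        ch_fasta,
        ch_bwa_index,                             // e.g. BWA_INDEX.out.index, defined elsewhere
        PARABRICKS_STARGENOMEGENERATE.out.index   // the STAR 2.7.2a genome lib
    )
}
```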

### starfusion

The `parabricks/stargenomegenerate` module is near identical to the existing `star/genomegenerate` module, however it runs on with older version of STAR (2.7.2a) that is required for Parabricks compatibility. This module does not exist in any previous versions of the `nf-core/modules` and therefore must be included here. In the future, it's possible that Parabricks will update to a newer version of STAR and this accessory module may become obselete, but for now it is required pre-processing if the Genome Lib Dir has not already been generated with this vesion of STAR.
The `starfusion` module is based on STAR-Fusion 1.7.0. Therefore, the genome lib directory required as input for this module must also be generated with this version of STAR-Fusion. For convenience, a module providing this version is included in this directory under `parabricks/starfusion_build`.

## Compatible with
### compatible_with.yaml

Is added as optional output to the stub section to make the compatible CPU version available to the end user. This section is not given for the subtools `applybqsr`, `fq2bammeth`, `genotypegvcf`, or `rnafq2bam`.
The `compatible_with.yaml` file is provided as an optional output in the stub section to make the compatible CPU software versions available to the end user. This output is not provided for the subtools `applybqsr`, `fq2bammeth`, `genotypegvcf`, `rnafq2bam`, or `starfusion`.

For the full list of compatible versions, check the [Parabricks documentation](https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/outputaccuracyandcompatiblecpusoftwareversions.html#).

## Notes on Testing

The Parabricks `starfusion` module requires testing on a `g4dn.8xlarge` instead of the default `g4dn.xlarge` due to higher system memory requirements. In [nf-test-gpu.yml](../../../.github/workflows/nf-test-gpu.yml), update the following line:

```yaml
nf-test-gpu:
runs-on: "runs-on=${{ github.run_id }}/family=g4dn.8xlarge/image=ubuntu24-gpu-x64"
```

Change it back before merging into master:

```yaml
nf-test-gpu:
runs-on: "runs-on=${{ github.run_id }}/family=g4dn.xlarge/image=ubuntu24-gpu-x64"
```
19 changes: 14 additions & 5 deletions modules/nf-core/parabricks/rnafq2bam/main.nf
@@ -5,18 +5,21 @@ process PARABRICKS_RNAFQ2BAM {
// needed by the module to work properly can be removed when fixed upstream - see: https://github.com/nf-core/modules/issues/7226
stageInMode 'copy'

container "nvcr.io/nvidia/clara/clara-parabricks:4.5.1-1"
container "nvcr.io/nvidia/clara/clara-parabricks:4.6.0-1"

input:
tuple val(meta), path(reads)
tuple val(meta), path(reads)
tuple val(meta1), path(fasta)
tuple val(meta2), path(index)
tuple val(meta3), path(genome_lib_dir)

output:
tuple val(meta), path("*.bam"), emit: bam
tuple val(meta), path("*.bai"), emit: bai
path "versions.yml", emit: versions
tuple val(meta), path("*.bam"), emit: bam
tuple val(meta), path("*.bai"), emit: bai
tuple val(meta), path("Chimeric.out.junction"), emit: junction, optional: true
tuple val(meta), path("*_qc_metrics"), emit: qc_metrics, optional: true
tuple val(meta), path("*.duplicate-metrics.txt"), emit: duplicate_metrics, optional: true
path "versions.yml", emit: versions

when:
task.ext.when == null || task.ext.when
@@ -46,6 +49,10 @@ process PARABRICKS_RNAFQ2BAM {
${num_gpus} \\
${args}

if [[ "${args}" == *"--out-chim-type"* ]]; then
mv ${prefix}/Chimeric.out.junction .
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
pbrun: \$(echo \$(pbrun version 2>&1) | sed 's/^Please.* //' )
@@ -59,11 +66,13 @@
}
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def chimeric_output = args.contains("--out-chim-type") ? "touch Chimeric.out.junction" : ""
def qc_metrics_output = args.contains("--out-qc-metrics-dir") ? "mkdir ${prefix}_qc_metrics" : ""
def duplicate_metrics_output = args.contains("--out-duplicate-metrics") ? "touch ${prefix}.duplicate-metrics.txt" : ""
"""
touch ${prefix}.bam
touch ${prefix}.bam.bai
${chimeric_output}
${qc_metrics_output}
${duplicate_metrics_output}

33 changes: 33 additions & 0 deletions modules/nf-core/parabricks/rnafq2bam/meta.yml
@@ -81,6 +81,39 @@ output:
pattern: "*.bai"
ontologies:
- edam: http://edamontology.org/format_3327 # BAI
junction:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "Chimeric.out.junction":
type: file
description: Chimeric junction output file
pattern: "Chimeric.out.junction"
ontologies: []
qc_metrics:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*_qc_metrics":
type: directory
description: (optional) directory of QC metrics
pattern: "*_qc_metrics"
duplicate_metrics:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*.duplicate-metrics.txt":
type: file
description: (optional) metrics calculated from marking duplicates in the
bam file
pattern: "*.duplicate-metrics.txt"
ontologies: []
versions:
- versions.yml:
type: file
107 changes: 92 additions & 15 deletions modules/nf-core/parabricks/rnafq2bam/tests/main.nf.test
@@ -18,14 +18,14 @@ nextflow_process {
script "../../stargenomegenerate/main.nf"
process {
"""
input[0] = Channel.of([
[ id:'test_fasta' ],
[ file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genome.fasta', checkIfExists: true) ]
])
input[1] = Channel.of([
[ id:'test_gtf' ],
[ file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genome.gtf', checkIfExists: true) ]
])
input[0] = [
[ id:'minigenome_fasta' ],
file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/minigenome.fa')
]
input[1] = [
[ id:'minigenome_gtf' ],
file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/minigenome.gtf')
]
"""
}
}
@@ -49,7 +49,7 @@

when {
params {
module_args = '--low-memory --read-files-command zcat'
module_args = '--low-memory --read-files-command zcat --x3'
// Ref: https://forums.developer.nvidia.com/t/problem-with-gpu/256825/6
// Parabricks’s rnafq2bam requires 24GB of memory.
// Using --low-memory for testing
@@ -58,11 +58,11 @@
"""
input[0] = Channel.of([
[ id:'test', single_end:true ], // meta map
[ file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/fastq/test_rnaseq_1.fastq.gz', checkIfExists: true) ]
[ file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/test_starfusion_rnaseq_1.fastq.gz', checkIfExists: true) ]
])
input[1] = Channel.of([
[ id:'test' ], // meta map
file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genome.fasta', checkIfExists: true)
file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/minigenome.gtf', checkIfExists: true)
])
input[2] = BWA_INDEX.out.index
input[3] = PARABRICKS_STARGENOMEGENERATE.out.index
@@ -96,11 +96,88 @@
"""
input[0] = Channel.of([
[ id:'test', single_end:true ], // meta map
[ file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/fastq/test_rnaseq_1.fastq.gz', checkIfExists: true) ]
[ file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/test_starfusion_rnaseq_1.fastq.gz', checkIfExists: true) ]
])
input[1] = Channel.of([
[ id:'test' ], // meta map
file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genome.fasta', checkIfExists: true)
file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/minigenome.gtf', checkIfExists: true)
])
input[2] = BWA_INDEX.out.index
input[3] = PARABRICKS_STARGENOMEGENERATE.out.index
"""
}
}

then {
assertAll(
{ assert process.success },
{ assert snapshot(
process.out,
path(process.out.versions[0]).yaml
).match() }
)
}
}

test("homo_sapiens_chimeric") {

config "./nextflow.config"

when {
params {
module_args = '--low-memory --read-files-command zcat --out-chim-type Junctions --min-chim-segment 15 --x3'
// Ref: https://forums.developer.nvidia.com/t/problem-with-gpu/256825/6
// Parabricks’s rnafq2bam requires 24GB of memory.
// Using --low-memory for testing
}
process {
"""
input[0] = Channel.of([
[ id:'test', single_end:true ], // meta map
[ file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/test_starfusion_rnaseq_1.fastq.gz', checkIfExists: true) ]
])
input[1] = Channel.of([
[ id:'test' ], // meta map
file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/minigenome.gtf', checkIfExists: true)
])
input[2] = BWA_INDEX.out.index
input[3] = PARABRICKS_STARGENOMEGENERATE.out.index
"""
}
}

then {
assertAll(
{ assert process.success },
{ assert snapshot(
bam(process.out.bam[0][1]).getReadsMD5(),
file(process.out.bai[0][1]).name,
file(process.out.junction[0][1]).name,
process.out.versions,
path(process.out.versions[0]).yaml
).match() }
)
}
}

test("homo_sapiens_chimeric - stub") {

config "./nextflow.config"
options "-stub"

when {
params {
module_args = '--out-chim-type Junctions'
}
process {
"""
input[0] = Channel.of([
[ id:'test', single_end:true ], // meta map
[ file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/test_starfusion_rnaseq_1.fastq.gz', checkIfExists: true) ]
])
input[1] = Channel.of([
[ id:'test' ], // meta map
file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/minigenome.gtf', checkIfExists: true)
])
input[2] = BWA_INDEX.out.index
input[3] = PARABRICKS_STARGENOMEGENERATE.out.index
@@ -113,8 +190,8 @@
{ assert process.success },
{ assert snapshot(
process.out,
path(process.out.versions.get(0)).yaml,
).match() }
path(process.out.versions[0]).yaml
).match() }
)
}
}