Control generation of --output-stats when running umi-tools dedup #827

Closed
lars-work-sund opened this issue May 19, 2022 · 3 comments

@lars-work-sund

lars-work-sund commented May 19, 2022

Description of feature

umi_tools dedup uses large amounts of memory and runs slowly when it generates output stats. To speed this up, the UMI-tools FAQ (point 4) recommends generating the stats on a single chromosome only.

Currently the --output-stats option is hard-coded in the module, see GitHub.

Applying the suggested single-chromosome restriction directly would cause the pipeline to analyze only that one chromosome, since --chrom seems to be intended mostly for debugging purposes (see here).

I suggest either making --output-stats optional, or running a second round of deduplication on a single chromosome to generate the output stats.

@lars-work-sund
Author

Running the analysis on a single chromosome is trickier than I expected. In the NFCORE_RNASEQ:RNASEQ:DEDUP_UMI_UMITOOLS_TRANSCRIPTOME:UMITOOLS_DEDUP step each transcript is considered a contig, so there is no chromosome name for --chrom to select.

@drpatelh
Member

Ok. In that case, we definitely need to fix the module in the next release.

@lars-work-sund tried the config below, which failed as reported in the comment above:

process {
    withName: 'NFCORE_RNASEQ:RNASEQ:DEDUP_UMI_UMITOOLS_GENOME:UMITOOLS_DEDUP' {
        cpus = 12
        memory = '72.GB'
        time = '16.h'
        ext.args = '--chrom=chr22'
    }

    withName: 'NFCORE_RNASEQ:RNASEQ:DEDUP_UMI_UMITOOLS_TRANSCRIPTOME:UMITOOLS_DEDUP' {
        cpus = 12
        memory = '72.GB'
        time = '16.h'
        ext.args = '--chrom=chr22'
    }
}

We will need to update the nf-core/module to make --output-stats optional, and possibly add a parameter to the pipeline to control this behaviour, e.g. --umitools_dedup_output_stats, which would be off by default.
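
A rough sketch of what "making --output-stats optional" could look like inside a Nextflow module is shown below. This is a hypothetical illustration, not the code that was merged in nf-core/modules: the input name `get_output_stats`, the channel layout and the output globs are assumptions. Only the `umi_tools dedup` options `-I`, `-S` and `--output-stats` come from the tool itself.

```groovy
// Hypothetical sketch: a dedup process whose stats output is switchable.
process UMITOOLS_DEDUP {
    input:
    tuple val(meta), path(bam), path(bai)
    val get_output_stats            // assumed boolean input: true => write stats

    output:
    tuple val(meta), path("*.bam"), emit: bam
    // Stats files only exist when requested, so the output is marked optional.
    tuple val(meta), path("*.tsv"), optional: true, emit: stats

    script:
    def args   = task.ext.args   ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}.dedup"   // assumed suffix, avoids clashing with the input BAM name
    // Emit the flag only on request instead of hard-coding it.
    def stats  = get_output_stats ? "--output-stats=${prefix}" : ''
    """
    umi_tools dedup \\
        -I $bam \\
        -S ${prefix}.bam \\
        $stats \\
        $args
    """
}
```

With the flag built conditionally like this, a pipeline-level boolean parameter can simply be passed through as the second input, and users no longer need a `--chrom` workaround in `ext.args` just to keep memory use down.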

@drpatelh drpatelh added this to the 3.8 milestone May 23, 2022
@drpatelh drpatelh changed the title from "umi-tools dedup: only run --output-stats on a single chromosome" to "Control generation of --output-stats when running umi-tools dedup" May 23, 2022
drpatelh added a commit to drpatelh/nf-core-rnaseq that referenced this issue May 24, 2022
@drpatelh
Member

drpatelh commented May 24, 2022

nf-core/module updated in nf-core/modules#1689
Fixed in pipeline in drpatelh@8102aa3

Added a new boolean pipeline parameter, --umitools_dedup_stats, that generates the output stats when provided. By default, the stats are not generated.
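
For anyone who prefers a config file over the command line, a minimal sketch of switching the stats back on might look like this. It assumes UMI processing is enabled through the pipeline's `with_umi` parameter; `umitools_dedup_stats` is the parameter described above.

```groovy
// Minimal sketch of a custom config that opts back in to dedup stats.
params {
    with_umi             = true   // assumption: UMI handling is enabled this way
    umitools_dedup_stats = true   // off by default; true => generate output stats
}
```

Saved under a hypothetical name such as custom.config, this would be supplied with Nextflow's `-c custom.config` option. Leaving the parameter unset keeps the default behaviour of skipping the stats.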
