Fix parsing of UMI tools dedup output to MultiQC #12

Closed
lianov opened this issue Apr 12, 2024 · 1 comment
Labels
enhancement (New feature or request)

Comments

@lianov
Member

lianov commented Apr 12, 2024

Description

Currently, because we split files by chromosome to speed up deduplication, MultiQC reports multiple QC files (one dedup QC file per chromosome). With multiple samples, this over-crowds the report, both in the General Stats section and in the UMI-tools section. A few strategies around this may include:

  1. Only include the main chromosomes (no haplotypes etc., given the lower read counts in these regions). This may still be too much for the report, but is worth a thought (the second option is more straightforward).
  2. Create a new UMI-tools stats output based on overall metrics from split files.
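
The second option could be sketched roughly as below. This is a minimal illustration, not the pipeline's implementation: it assumes the per-chromosome read counts have already been extracted from each split UMI-tools dedup log, and the function name `merge_dedup_stats` and the field names `input_reads`, `output_reads`, and `percent_passing_dedup` are hypothetical.

```python
from collections import defaultdict


def merge_dedup_stats(per_chrom_stats):
    """Sum per-chromosome dedup counts into one record per sample.

    per_chrom_stats: iterable of (sample, chrom, input_reads, output_reads)
    tuples. These field names are hypothetical, not a UMI-tools format.
    """
    totals = defaultdict(lambda: {"input_reads": 0, "output_reads": 0})
    for sample, _chrom, n_in, n_out in per_chrom_stats:
        totals[sample]["input_reads"] += n_in
        totals[sample]["output_reads"] += n_out

    merged = {}
    for sample, counts in totals.items():
        # Guard against samples with no input reads at all.
        if counts["input_reads"]:
            pct = 100.0 * counts["output_reads"] / counts["input_reads"]
        else:
            pct = 0.0
        merged[sample] = {**counts, "percent_passing_dedup": round(pct, 2)}
    return merged


records = [
    ("sample1", "chr1", 100, 80),
    ("sample1", "chr2", 50, 40),
    ("sample2", "chr1", 10, 10),
]
merged = merge_dedup_stats(records)
```

Only additive metrics (read counts) are summed here. Per-position averages from the split logs would need a weighted combination, which this sketch does not attempt.
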
@lianov added the bug and enhancement labels and removed the bug label on Apr 12, 2024
@lianov
Member Author

lianov commented Apr 29, 2024

As a note, our path for the first release will be to stop passing the UMI-tools outputs on to MultiQC for now. The pipeline already produces samtools stats from before and after dedup. Although this does not provide a bar plot of the % of reads detected by UMI-tools, one can see how many reads are dropped by dedup from the metrics already provided (at the top of the MultiQC report).
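
For anyone who wants the dropped-read figure as a number, a rough sketch of the arithmetic follows. It assumes the two `samtools stats` outputs are available as text and compares a single summary-number (`SN`) row. The helper names are hypothetical, and the choice of `reads mapped` as the metric is an assumption rather than what the pipeline reports.

```python
def read_summary_numbers(stats_text):
    """Collect the tab-separated 'SN' rows of samtools stats output."""
    numbers = {}
    for line in stats_text.splitlines():
        if not line.startswith("SN\t"):
            continue
        fields = line.split("\t")
        try:
            # fields[1] is the metric name with a trailing colon.
            numbers[fields[1].rstrip(":")] = float(fields[2])
        except (IndexError, ValueError):
            # Skip rows without a numeric value.
            continue
    return numbers


def percent_removed(before_text, after_text, key="reads mapped"):
    """Percentage of reads dropped between the pre- and post-dedup stats."""
    before = read_summary_numbers(before_text)[key]
    after = read_summary_numbers(after_text)[key]
    if before == 0:
        return 0.0
    return 100.0 * (before - after) / before


# Tiny made-up inputs, for illustration only.
before_dedup = "SN\treads mapped:\t200\n"
after_dedup = "SN\treads mapped:\t150\n"
dropped = percent_removed(before_dedup, after_dedup)
```
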

We will leave this issue open for reconsideration in the future. The UMI-tools files are still produced; they are simply not added to the scnanoseq "out of the box" MultiQC report.
