AddFlowBaseQuality tool #8235

Merged

dror27
Contributor

@dror27 dror27 commented Mar 7, 2023

A new tool for flow-based files. It writes the reads from a SAM-format file (SAM/BAM/CRAM) that pass the filtering criteria to a new file, adding a base-quality attribute (BQ) to each one.

@ilyasoifer ilyasoifer self-requested a review March 8, 2023 10:45
Collaborator

@ilyasoifer ilyasoifer left a comment


@dror27 - can you please annotate this tool as belonging to the FlowBasedTools group?
We did this for FlowFeatureMapper.

@ilyasoifer ilyasoifer requested a review from meganshand March 21, 2023 21:24
@dror27
Contributor Author

dror27 commented Apr 17, 2023

@ilyasoifer - I have addressed your comments. Please review.

@ilyasoifer
Collaborator

@meganshand - can you take a look, please? It is a nice tool that converts the indel qualities to base qualities, which people are sometimes interested in.
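For readers new to the idea, the last step of such a conversion can be sketched as follows. This is a minimal sketch that assumes each base already has an error probability. It turns that probability into a phred-scaled quality, floored and clipped. The names `errorProbsToQuals`, `minErrorRate` and `maxQualityScore` are illustrative. The last two only echo names that appear in the diff and do not reproduce the tool's actual code.

```java
import java.util.Arrays;

public class PhredSketch {
    // Hypothetical helper: converts per-base error probabilities to phred-scaled qualities.
    // minErrorRate floors the probability (so log10(0) never happens);
    // maxQualityScore clips the resulting quality from above.
    static byte[] errorProbsToQuals(double[] errorProbs, double minErrorRate, int maxQualityScore) {
        final byte[] quals = new byte[errorProbs.length];
        for (int i = 0; i < errorProbs.length; i++) {
            final double p = Math.max(errorProbs[i], minErrorRate);
            final long q = Math.round(-10.0 * Math.log10(p)); // Q = -10 * log10(p)
            quals[i] = (byte) Math.min(q, maxQualityScore);
        }
        return quals;
    }

    public static void main(String[] args) {
        final double[] probs = {0.1, 0.01, 0.001, 0.0};
        // The last base has probability 0, so it is floored to 1e-4 (Q40).
        System.out.println(Arrays.toString(errorProbsToQuals(probs, 1e-4, 126)));
    }
}
```

The floor guards against a zero probability. The clip keeps every value inside the chosen maximum quality.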

@meganshand meganshand self-assigned this Apr 28, 2023
final double[][] errorProbBands = extractErrorProbBands(fbRead, minErrorRate);
final double[] result = new double[fbRead.getBasesNoCopy().length];

// loop over hmers via flow key
Contributor


This section isn't clear to me. Can you add some documentation about the flow key? I think I'm just missing the structure of the flow error probabilities. You can either add some more comments here, or just point the reader to something if it already exists elsewhere.

Contributor Author


Added a javadoc link to FlowBasedRead#flowMatrix where the flow probabilities are described.
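The sketch below illustrates the flow key that the "loop over hmers via flow key" comment relies on. It is a minimal sketch that assumes a cyclic flow order (TGCA is only an example here). For each flow, the key records how many bases (the homopolymer, or hmer, length) were incorporated, and 0 means the flowed nucleotide did not match the next base. `basesToFlowKey` is a hypothetical name. The per-flow error probabilities documented at `FlowBasedRead#flowMatrix` are not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

public class FlowKeySketch {
    // Hypothetical helper: key[f] = hmer length incorporated at flow f (0 = no incorporation).
    static int[] basesToFlowKey(String bases, String flowOrder) {
        final List<Integer> key = new ArrayList<>();
        int pos = 0;
        int flow = 0;
        while (pos < bases.length()) {
            // A base missing from the flow order (e.g. 'N') would never be consumed.
            if (flowOrder.indexOf(bases.charAt(pos)) < 0) {
                throw new IllegalArgumentException("base not in flow order: " + bases.charAt(pos));
            }
            final char nuc = flowOrder.charAt(flow % flowOrder.length());
            int hmer = 0;
            while (pos < bases.length() && bases.charAt(pos) == nuc) {
                hmer++;
                pos++;
            }
            key.add(hmer);
            flow++;
        }
        return key.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        final String bases = "TTGCCA";
        final int[] key = basesToFlowKey(bases, "TGCA");
        int start = 0; // offset of the current hmer's first base in the read
        // Walk the hmers via the flow key, mapping each non-zero flow back to its bases.
        for (int f = 0; f < key.length; f++) {
            if (key[f] > 0) {
                System.out.printf("flow %d: %d-mer %s at bases [%d, %d)%n",
                        f, key[f], bases.substring(start, start + key[f]), start, start + key[f]);
                start += key[f];
            }
        }
    }
}
```

Walking the key this way pairs each non-zero flow with its base range. A per-hmer quality can therefore be spread back onto the individual bases.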

@Argument(fullName = MAXIMAL_QUALITY_SCORE_LONG_NAME, doc = "clip quality score to the given value (phred)")
public int maxQualityScore = 126;

@Argument(fullName = REPLACE_QUALITY_MODE_LONG_NAME, doc = "replace existing base qualities while saving previous qualities to OQ (when true) or simply write to BQ (when false) ")
Contributor


I'm not sure I understand the use case of writing to BQ without changing the actual base qualities. Both of these options make the file larger and contain the same information (since you continue to store the old base qualities either in the QUAL field or in the OQ tag and you store the new qualities either in the BQ tag or the QUAL field). Is there a reason to not replace the QUAL field and only keep the new qualities in the BQ tag?

Additionally the BQ tag is reserved in the spec for "Offset to base alignment quality (BAQ)" so might not be the best choice to put these new base qualities.

Contributor Author


As I understand the two modes, they do not involve storing the same information twice.

In the 'replace' mode, the QUAL field is replaced with qualities computed from the flow probabilities. The old QUAL field is saved in OQ.

In the 'non-replace' mode, the QUAL field is preserved and a new quality string, computed from the flow probabilities, is saved in BQ.

It is assumed that the qualities computed from the flow probabilities will differ from the original QUAL, which is the reason for this tool.

As for BQ being reserved, I have replaced it with XQ, which is in the user-defined tag space.
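The two output modes can be sketched as follows. In either mode, the original and the computed qualities are each stored once. This is a minimal sketch under stated assumptions: it uses a hypothetical `Read` class instead of GATK's `GATKRead` or htsjdk's `SAMRecord`, stores tags as phred+33 strings, and uses the final tag name XQ. `applyFlowQualities` and `replaceQuality` are illustrative names, not the tool's code.

```java
import java.util.HashMap;
import java.util.Map;

public class QualityModeSketch {
    // Hypothetical minimal read model, only for illustration.
    static final class Read {
        byte[] quals;                                      // the QUAL field, as phred values
        final Map<String, String> attrs = new HashMap<>(); // string-valued tags
        Read(byte[] quals) { this.quals = quals; }
    }

    // Encodes phred values as a SAM quality string (phred + 33).
    // Assumes 0 <= q <= 93, the range representable in a SAM quality string.
    static String toSamQualString(byte[] quals) {
        final StringBuilder sb = new StringBuilder(quals.length);
        for (byte q : quals) {
            sb.append((char) (q + 33));
        }
        return sb.toString();
    }

    // replaceQuality == true:  save the old QUAL in OQ and write the new values into QUAL.
    // replaceQuality == false: keep QUAL and store the new values in the XQ tag.
    static void applyFlowQualities(Read read, byte[] flowQuals, boolean replaceQuality) {
        if (replaceQuality) {
            read.attrs.put("OQ", toSamQualString(read.quals));
            read.quals = flowQuals.clone();
        } else {
            read.attrs.put("XQ", toSamQualString(flowQuals));
        }
    }

    public static void main(String[] args) {
        final Read replaced = new Read(new byte[]{30, 30});
        applyFlowQualities(replaced, new byte[]{20, 40}, true);
        System.out.println("replace: OQ=" + replaced.attrs.get("OQ"));
        final Read kept = new Read(new byte[]{30, 30});
        applyFlowQualities(kept, new byte[]{20, 40}, false);
        System.out.println("non-replace: XQ=" + kept.attrs.get("XQ"));
    }
}
```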

@meganshand meganshand assigned dror27 and unassigned meganshand May 1, 2023
ilyasoifer and others added 2 commits May 1, 2023 23:32
- added link to flow based prob doc
- renamed BQ to XQ to avoid clash with standard
@ilyasoifer ilyasoifer self-requested a review May 2, 2023 05:35
@ilyasoifer ilyasoifer merged commit d2783f7 into broadinstitute:master May 2, 2023
@ilyasoifer ilyasoifer deleted the ultima.add-flow-base-quality.devel branch May 2, 2023 08:09

3 participants