-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Included support for indel error detection and read filtering in CollectSamErrorMetrics #1370
Conversation
Yikes! that's some PR! could you split this into 3 separate PRs?
I think that it will make reviewing the code much easier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though I asked that you split this up, I'm still going to give a few comments here....
src/main/java/picard/sam/SamErrorMetric/BooleanSamErrorReadFilterCriterion.java
Outdated
Show resolved
Hide resolved
src/main/java/picard/sam/SamErrorMetric/NumericSamErrorReadFilterCriterion.java
Outdated
Show resolved
Hide resolved
src/main/java/picard/sam/SamErrorMetric/SamErrorReadFilterCriterion.java
Outdated
Show resolved
Hide resolved
@yfarjoun Since these features are built upon each other they'll be somewhat difficult to deconflict at this point. It's a larger PR, but not completely unwieldy so I think it'll be OK to review without splitting it up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this big PR!
the indel error stuff looks good, though I haven't tried it out yet (I'm excited to do so!)
the bulk of the code is about this new filtering DSL, but there's no real testing of it, nor any documentation as to what it is for...also the way it is implemented seems to store the entire input in memory...which doesn't seem to be a good idea (nor needed!) could you explain in the PR or the javaDoc what the filtering is actually for and how it works? it also seems that there are now lots of new output files, but it's unclear what they are for....
Lots of missing "finals" and parenthesis and brace formatting (ask IntelliJ to reformat your code)
src/main/java/picard/sam/SamErrorMetric/IndelErrorCalculator.java
Outdated
Show resolved
Hide resolved
@@ -0,0 +1,28 @@ | |||
package picard.sam.SamErrorMetric; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all new files need a license
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, added that. There were also some other files in the SamErrorMetric directory that didn't have licenses (e.g. OverlappingErrorMetric), I added a license to those as well if that's ok
src/main/java/picard/sam/SamErrorMetric/NumericSamErrorReadFilterCriterion.java
Outdated
Show resolved
Hide resolved
src/main/java/picard/sam/SamErrorMetric/ReadBaseStratification.java
Outdated
Show resolved
Hide resolved
return null; | ||
} | ||
if(operation == CollectSamErrorMetrics.BaseOperation.Insertion && cigarElement.getOperator() != CigarOperator.I) { | ||
log.warn("Wrong CIGAR operator for the given position. This is an error and should be fixed."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to be more serious than a "warning"....what's going on? shoudln't this throw an exception here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, unclear what "should be fixed" means...did the user do something wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case should never occur (if it does, it would mean an error in the CIGAR string or an error in the code, i.e. an error by me), however, it is seems like a sensible check to make. I now throw an exception instead of a log warning, let me know if you have a better suggestion
src/main/java/picard/sam/SamErrorMetric/CollectSamErrorMetrics.java
Outdated
Show resolved
Hide resolved
src/main/java/picard/sam/SamErrorMetric/CollectSamErrorMetrics.java
Outdated
Show resolved
Hide resolved
src/main/java/picard/sam/SamErrorMetric/CollectSamErrorMetrics.java
Outdated
Show resolved
Hide resolved
src/main/java/picard/sam/SamErrorMetric/CollectSamErrorMetrics.java
Outdated
Show resolved
Hide resolved
I now removed all the read output "filtering" from this PR and actually tried to split this PR up into smaller ones as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another round of mostly minor changes...but here's a bigger request:
could you change AbstractRecordAndOffset to include a Raison D'être (MATCH, INSERTION, DELETION) with default constructor providing MATCH? this way you will not need to change all the addBase signatures to bubble down the operation and instead just ask the RAO itself..... what do you think?
src/main/java/picard/sam/SamErrorMetric/CollectSamErrorMetrics.java
Outdated
Show resolved
Hide resolved
|
||
if (operation == CollectSamErrorMetrics.BaseOperation.Insertion) { | ||
nInserts++; | ||
CigarElement cigarElement = ReadBaseStratification.getIndelElement(recordAndOffset, operation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this line be extracted from the if statement?
/* | ||
* The MIT License | ||
* | ||
* Copyright (c) 2018 The Broad Institute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2019 here and elsewhere
|
||
/** | ||
* A calculator that estimates the error rate of the bases it observes for indels only. | ||
* Created by jonn on 6/13/19. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for "created by" line.
* Added alignment type to RecordAndOffset class - This was suggested as part of broadinstitute/picard#1370 - necessary to determine whether a RecordAndOffset object is an (alignment) match, insertion, or deletion when queried by a SamLocusIterator - default behavior is preserved by overloading the default constructor * Added tests for including the alignment type in RecordAndOffset
13f00e0
to
90f4b9f
Compare
@@ -36,8 +38,16 @@ | |||
**/ | |||
@Override | |||
public void addBase(final SamLocusIterator.RecordAndOffset recordAndOffset, final SamLocusAndReferenceIterator.SAMLocusAndReference locusAndRef) { | |||
if (!SequenceUtil.isNoCall(recordAndOffset.getReadBase())) { | |||
nBases++; | |||
// TODO mgatzen should deletions be counted towards total bases? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally think that only read-bases should count towards the total bases, which will be used as the denominator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree - thank you
nBases++; | ||
} | ||
} else if (recordAndOffset.getAlignmentType() == AbstractRecordAndOffset.AlignmentType.Insertion) { | ||
CigarElement cigarElement = ReadBaseStratification.getIndelElement(recordAndOffset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
/** | ||
* Stratifies according to the number of indel bases (from CIGAR string) that the read has. | ||
*/ | ||
public static class IndelsInReadStratifier extends RecordStratifier<Integer> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be implemented as a sum of two CigarOperatorsInReadStratifier
(I
and D
)? probably not as fast, but less code repetition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I moved the actual counting to a private static method so that we don't need to instantiate two Stratifier objects to get the result. (That probably made it more code overall, but less repetition.)
|
||
@Test(dataProvider = "provideForTestHasDeletionBeenProcessed") | ||
public void testHasDeletionBeenProcessed(final String cigarString, final List<Boolean> expected) { | ||
Cigar cigar = TextCigarCodec.decode(cigarString); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. thanks for all the test. just a few minor comments and I'm happy for this to be merged.
- Created new IndelErrorCalculator and IndelErrorMetric classes - Adjusted CollectSamErrorMetrics to be able to consider insertions and deletions, as well as number of inserted and deleted bases - Tests for indel error detection - Added indel_length stratifier - The PairOrientation stratifier now checks if the read is paired and returns null if it is not - Using operator information to that has been incorporated into htsjdk
- Merged mg_longread_error_reporting_iteration into mg_longread_error_reporting
1f9f5c2
to
0930965
Compare
deletions, as well as number of inserted and deleted bases
Description
Give your PR a concise yet descriptive title
Please explain the changes you made here.
Explain the motivation for making this change. What existing problem does the pull request solve?
Mention any issues fixed, addressed or otherwise related to this pull request, including issue numbers or hard links for issues in other repos.
You can delete these instructions once you have written your PR description.
Checklist (never delete this)
Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.
Content
Review
For more detailed guidelines, see https://github.com/broadinstitute/picard/wiki/Guidelines-for-pull-requests