-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Checkpoint of necessary changes for DRAGEN v3.7.8 concordance #8083
Conversation
Github actions tests reported job failures from actions build 3365045711
|
@Argument(fullName= PILEUP_DETECTION_FILTER_ASSEMBLY_HAPS_THRESHOLD, doc = "If enabled (set to non-zero), will apply the \"badness\" filter to compatable assembled haplotypes.", optional = true) | ||
public double assemblyBadReadThreshold = 0.4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment to self. This default is dangerous, almost certainly there should be an explicit OPT-IN to start filtering assembly haplotypes when in the normal pileupcaller mode under any circumstances!
final AssemblyResultSet assemblyResultSet, final List<VariantContext> pileupAllelesFoundShouldFilter, | ||
final List<VariantContext> pileupAllelesPassingFilters, final boolean debug) { | ||
List<Haplotype> haplotypesWithFilterAlleles = new ArrayList<>(); | ||
if (!pileupAllelesFoundShouldFilter.isEmpty() && !argumentCollection.pileupDetectionArgs.generatePDHaplotypes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is messy and dangerous, There should be an explicit opt-in and probably a user warning printed out if we are actually throwing away haplotypes here... The non-pdhmm version of this code should really be considered an off-label approach and it really wasn't extensively tested except as a stepping stone to the full PDHmm implementation so maybe we should re-evaluate if we want this at all..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesemery Back to you with my comments
src/main/java/org/broadinstitute/hellbender/engine/AssemblyRegionIterator.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/engine/AssemblyRegionIterator.java
Outdated
Show resolved
Hide resolved
// .filter(v -> PileupBasedAlleles.passesFilters(argumentCollection.pileupDetectionArgs, v)) | ||
// .collect(Collectors.toList()); | ||
// | ||
// applyPileupEventsAsForcedAlleles(region, argumentCollection, aligner, refHaplotype, assemblyResultSet, pileupAllelesFoundShouldFilter, pileupAllelesPassingFilters); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You removed the forcedPileupAlleles
arg, but the call into what I assume is your replacement for this functionality (applyPileupEventsAsForcedAlleles()
) is commented out. Should it be uncommented? Or is this replaced by the pdhmm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh this looks worse than it is....
The old GGA mode and the PileupCallerAllele modes were located here (after assembly) but at some point @davidbenjamin moved the GGA allele application to after assembly trimming because it was causing edge case problems. At the time I elected to leave the pileup calling application where it was but it was causing me too many headaches here so I moved it to the same place as the GGA code. The new home for this call is here: HaplotypeCaller.java 739
. It is however not in Mutect...
* @param pileupAllelesPassingFilters | ||
* @return | ||
*/ | ||
public static AssemblyResultSet applyPileupEventsAsForcedAlleles(final AssemblyRegion region, final AssemblyBasedCallerArgumentCollection argumentCollection, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this method have a unit test? Or is there any code path covered by integration tests that exercises it?
If it's unused in the pdhmm codepath, can we just remove it? Or do we actually need it for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its captured by some HaplotypeCallerIntegrationTest consistent-with-past-results tests... Its the old method for pileupcalling that Bhanu added for the bacterial pipeline. I leave it in for legacy reasons as its a reasonable alternative approach to the PDHMM dependent one that is more usable by other users than DRAGEN-GATK.
// we resort to a messy approach where we filter alleles by throwing away every haplotype supporting an allele. This is | ||
// very dangerous since this could easily destroy phased variants with the haplotype. | ||
if (!pileupAllelesFoundShouldFilter.isEmpty() && !argumentCollection.pileupDetectionArgs.generatePDHaplotypes) { | ||
// TODO this is a bad algorithm for bad people |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But is it faster than the Java version of the pdhmm? Does it do effectively the same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its slow, faster than PDHMM by a longshot though because currently the JavaPDHMM is veeery slow
// for (VariantContext v : variants) { | ||
// VariantContext actualVC = resultEMap.get(v.getStart()); | ||
// Assert.assertNotNull(actualVC); | ||
// Assert.assertEquals(actualVC.getAlleles(), v.getAlleles()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these assertions commented out because they fail? Is this test still valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this test failed because the tests i wrote above have alternative representations that realignment was causing the event map to be mismatching on... I've rewritten it to at least somewhat account for this fact... not perfect but alright and better than not testing the event set at all
} | ||
//TODO TESTS TO MAKE: | ||
// ASSSERT IT FAILS IF STARTS BEFORE OR AFTER | ||
// ASSERT it gives a good exception if the order is incorrect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe now is the time to add these tests?
|
||
//This test is here for the sake of them being related operations | ||
//Test some real cases where we were seeing failures: | ||
// (62,Rlen=1,[C])->(82,Rlen=1,[C])->(84,Rlen=13,[C]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is a little difficult to understand...
...der/tools/walkers/haplotypecaller/PartiallyDeterminedHaplotypeComputationEngineUnitTest.java
Outdated
Show resolved
Hide resolved
@@ -1978,6 +1978,120 @@ public void testPileupCallingDRAGENModeConsistentWithPastResults(final String in | |||
} | |||
} | |||
|
|||
@Test(dataProvider="HaplotypeCallerTestInputs") | |||
public void testPileupCallingDRAGEN378ModeConsistentWithPastResults(final String inputFileName, final String referenceFileName) throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to testing for consistency with past results, could we have a test that checks for concordance vs. DRAGEN output (within some tolerance)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to punt on this for now as its going to take some time to regenerate those and I would like to hand this off soon.
…t to make alternative impelementations easier
a70603e
to
1fec7b8
Compare
…n code harder to enable accidentally
This code is fairly concordant now with a new meta-argument added for enabling the dragen code. However there are still 2 algorithms that we know are necessary for final concordance.