
max alt allele bug fix #7655

Merged
merged 1 commit into from
Feb 4, 2022


ldgauthier
Contributor

Add new argument for GenomicsDB max alts, which must be >= GGVCFs max alts + 1
Fix exception arising from GDB output with too many alts and no likelihoods
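Here is a minimal sketch of the constraint in the first bullet. It assumes a standalone check, and the class and method names are hypothetical: this is not GATK's argument-validation code. The `+ 1` leaves room for the symbolic `<NON_REF>` allele that GVCFs carry, which GenomicsDB counts as one of the alternates it returns:

```java
/** Hypothetical validator illustrating the GenomicsDB vs. GenotypeGVCFs max-alts rule. */
final class MaxAltsCheck {

    /**
     * Throws if GenomicsDB would not return enough alternate alleles for genotyping.
     * GenomicsDB's alternate list also includes <NON_REF>, hence the extra slot.
     */
    static void validateMaxAlts(final int genomicsDbMaxAlts, final int genotypingMaxAlts) {
        if (genomicsDbMaxAlts < genotypingMaxAlts + 1) {
            throw new IllegalArgumentException(String.format(
                    "GenomicsDB max alts (%d) must be at least GenotypeGVCFs max alts + 1 (%d)",
                    genomicsDbMaxAlts, genotypingMaxAlts + 1));
        }
    }

    public static void main(final String[] args) {
        validateMaxAlts(6, 5); // accepted: room for 5 real alternates plus <NON_REF>
        try {
            validateMaxAlts(5, 5); // rejected: no slot left for <NON_REF>
            System.out.println("no exception");
        } catch (final IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With GenotypeGVCFs limited to 5 alternates, GenomicsDB must allow at least 6, so a 5/5 combination is rejected.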

Contributor

@droazen droazen left a comment


Back to you with my comments @ldgauthier

if (genotypeCalcArgs != null) {
this.maxDiploidAltAllelesThatCanBeGenotyped = genotypeCalcArgs.maxAlternateAlleles;
this.maxGenotypeCount = genotypeCalcArgs.maxGenotypeCount;
Contributor


Did we determine whether we also need a separate max genotype count arg for GenomicsDB? Does the GenomicsDB value also need to be greater than the value for genotyping? @mlathara opinions?

Contributor


I'm probably just looking in the wrong place, but I couldn't see where GenotypeGVCFs is limiting anything based on the max-genotype-count. There's one spot where it warns about PL field length but that is checking against a constant, not max-genotype-count.

So, if GenotypeGVCFs doesn't actually need to use this, then we're fine with a single argument.

Contributor Author


Searching for usages in IntelliJ, it looks like only HaplotypeCaller uses it.
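Some context on how allele limits relate to the genotype count (the quantity max-genotype-count caps) and to the PL field length mentioned above: a sample of ploidy P at a site with A alleles (reference plus alternates, including `<NON_REF>` when present) has C(A + P - 1, P) unordered genotypes, and its PL field holds one entry per genotype. A small sketch of that arithmetic; the class name is hypothetical:

```java
/** Hypothetical helper: how many genotypes (and thus PL entries) a sample has. */
final class GenotypeCount {

    /** Number of unordered genotypes for the given allele count and ploidy: C(A + P - 1, P). */
    static long genotypeCount(final int alleleCount, final int ploidy) {
        if (alleleCount < 1 || ploidy < 0) {
            throw new IllegalArgumentException("need at least one allele and non-negative ploidy");
        }
        // Multiplicative binomial: after step i, result == C(A - 1 + i, i),
        // so each division is exact.
        long result = 1;
        for (int i = 1; i <= ploidy; i++) {
            result = result * (alleleCount - 1 + i) / i;
        }
        return result;
    }

    public static void main(final String[] args) {
        // Diploid, reference + 6 alternates = 7 alleles -> 28 PL entries.
        System.out.println(genotypeCount(7, 2)); // 28
        // Adding <NON_REF> makes 8 alleles -> 36 PL entries.
        System.out.println(genotypeCount(8, 2)); // 36
        // Tetraploid, reference + 2 alternates -> 15 PL entries.
        System.out.println(genotypeCount(3, 4)); // 15
    }
}
```

The count grows quadratically in the number of alleles for diploids and much faster at higher ploidy, which is why the allele and genotype-count limits exist.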

public static boolean genotypeIsUsableForAFCalculation(Genotype g) {
    // before this PR:
    // return g.hasLikelihoods() || g.hasGQ() || g.getAlleles().stream().anyMatch(a -> a.isCalled() && a.isNonReference() && !a.isSymbolic());
    // after this PR:
    return g.hasLikelihoods() || (g.isHomRef() && g.hasGQ() && 2 == g.getPloidy());
}
Contributor


This condition has changed quite a bit in this PR! Can you briefly explain?

Contributor Author


The only scenario where we can (somewhat confidently) reconstruct PLs from GQ is hom-ref, and it has to be diploid (we could probably handle haploid and higher ploidies too, but the ROI is low).
The second case isn't useful: if we get there, there are no likelihoods and no GQ, so there's no useful data.
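To make the revised rule concrete, here is a self-contained sketch of the new `genotypeIsUsableForAFCalculation` condition. It uses a hypothetical `SimpleGenotype` class in place of htsjdk's `Genotype`, so only the boolean logic mirrors the PR. The `approximateHomRefPLs` helper is also an assumption for illustration, not GATK code: because GQ is the gap between the best and second-best PL, a diploid hom-ref call with GQ tells us both non-reference PLs are at least GQ, and the sketch conservatively uses GQ for both.

```java
import java.util.Arrays;

/** Hypothetical stand-in for htsjdk's Genotype, holding only the fields the rule needs. */
final class SimpleGenotype {
    final int[] pls;       // null when the genotype has no likelihoods
    final boolean homRef;
    final Integer gq;      // null when GQ is absent
    final int ploidy;

    SimpleGenotype(final int[] pls, final boolean homRef, final Integer gq, final int ploidy) {
        this.pls = pls;
        this.homRef = homRef;
        this.gq = gq;
        this.ploidy = ploidy;
    }

    boolean hasLikelihoods() { return pls != null; }
    boolean hasGQ() { return gq != null; }
}

final class AfUsability {

    /** Mirrors the revised condition: likelihoods, or a diploid hom-ref call with GQ. */
    static boolean isUsableForAFCalculation(final SimpleGenotype g) {
        return g.hasLikelihoods() || (g.homRef && g.hasGQ() && 2 == g.ploidy);
    }

    /**
     * Hypothetical helper: biallelic diploid PLs [AA, AB, BB] for a hom-ref call with GQ.
     * GQ is a lower bound on both non-ref PLs, so using it for both is conservative.
     */
    static int[] approximateHomRefPLs(final SimpleGenotype g) {
        if (!g.homRef || !g.hasGQ() || g.ploidy != 2) {
            throw new IllegalArgumentException("only diploid hom-ref calls with GQ are supported");
        }
        return new int[] {0, g.gq, g.gq};
    }

    public static void main(final String[] args) {
        final SimpleGenotype homRefWithGq = new SimpleGenotype(null, true, 30, 2);
        final SimpleGenotype hetNoData = new SimpleGenotype(null, false, null, 2);
        final SimpleGenotype haploidHomRef = new SimpleGenotype(null, true, 30, 1);

        System.out.println(isUsableForAFCalculation(homRefWithGq));  // true
        System.out.println(isUsableForAFCalculation(hetNoData));     // false
        System.out.println(isUsableForAFCalculation(haploidHomRef)); // false
        System.out.println(Arrays.toString(approximateHomRefPLs(homRefWithGq))); // [0, 30, 30]
    }
}
```

Under the old condition, `hetNoData` would have passed on its called non-reference allele alone even though it carries neither likelihoods nor GQ; the new condition rejects it.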

}

@Test
public void testMaxAltsToCombine() {
Contributor


testMaxAltsToCombine -> testMaxAltsForGenomicsDB

Contributor Author


Compromise: testMaxAltsToCombineInGenomicsDB?

Contributor

droazen commented Jan 28, 2022

@mlathara Would you mind reviewing as well when you get a chance? Thanks!

@droazen droazen self-assigned this Jan 28, 2022
Contributor

@mlathara mlathara left a comment


added my comments


@ldgauthier ldgauthier requested a review from droazen February 1, 2022 14:44
Contributor Author

@droazen back to you. Tests fixed locally and I will rebase and merge AlleleSubsettingUtils now.

@ldgauthier ldgauthier force-pushed the ldg_maxAltAlleleBugFix branch from 94b9c4b to 462c6f4 February 1, 2022 14:46
Contributor

@droazen droazen left a comment


@ldgauthier Looks good -- just one minor typo in a comment, then go ahead and merge when tests pass

args.add("--"+GenomicsDBArgumentCollection.MAX_ALTS_LONG_NAME);
args.add("5");
args.add("--"+GenotypeCalculationArgumentCollection.MAX_ALTERNATE_ALLELES_LONG_NAME);
args.add("5"); // GenotypeGVCFs value needs to be at least one more, should throw
Contributor


"GenotypeGVCFs value needs to be at least one more" should be "GenomicsDB value needs to be at least one more"

max alts + 1 (for non-ref)
Properly handle genotypes returned from GenomicsDB with no likelihoods
@ldgauthier ldgauthier force-pushed the ldg_maxAltAlleleBugFix branch from 2ba3a3b to 81a078b February 3, 2022 19:06
@ldgauthier ldgauthier merged commit da7cd83 into master Feb 4, 2022
@ldgauthier ldgauthier deleted the ldg_maxAltAlleleBugFix branch February 4, 2022 14:01
3 participants