-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VariantContextBuilder bug fix; better symbolic allele handling #6
Conversation
…thout loading attributes or explicitly calling .attributes(null) first
return false; | ||
else { | ||
final String strBases = new String(bases); | ||
return (bases[0] == '<' && bases[bases.length-1] == '>') || | ||
(strBases.contains("[") || strBases.contains("]")); | ||
return (bases[0] == '<' || bases[bases.length-1] == '>') || // symbolic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would "<" be considered a symbolic allele? Would "<>" be considered a symbolic allele? According to your changes they would be considered symbolic, but I'm not sure why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4.2 allows symbolic insertion alleles such as "A" and "A", hence the relaxation of the the <> requirements. < is still rejected as it is <= length 1 (3 lines up), and the VCF specifications do not require a non-zero length ID string, so yes, "<>" is technically valid (although I don't think it should be).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Daniel,
For your patch, "<ABC" would be considered a symbolic allele. I.e. there is a boolean OR operator between the test for "<" and for ">". Is that really what you intend?
-Alec
On May 20, 2014, at 10:51 PM, Daniel Cameron [email protected] wrote:
In src/java/htsjdk/variant/variantcontext/Allele.java:
return false; else { final String strBases = new String(bases);
return (bases[0] == '<' && bases[bases.length-1] == '>') ||
(strBases.contains("[") || strBases.contains("]"));
4.2 allows symbolic insertion alleles such as "A" and "A", hence the relaxation of the the <> requirements. < is still rejected as it is <= length 1 (3 lines up), and the VCF specifications do not require a non-zero length ID string, so yes, <> is technically valid (although I don't think it should be).return (bases[0] == '<' || bases[bases.length-1] == '>') || // symbolic
—
Reply to this email directly or view it on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was changed to OR so A and A (v4.2 section 5.4.2 Large Insertions) would return true.
It is true the check will now accept invalid alleles such as <ABC, but as noted elsewhere, it already accepts invalid alleles such as [ABC and downstream parsing and interpretation of the allele must be done by users of the API anyway so invalid alleles are likely to fail downstream of the API just as they do now with breakpoints. The intent of this minimal pull is for a least-impact update to htsjdk so it doesn't choke on valid v4.2 allele. Should I raise a feature request for first-class treatment of symbolic alleles? I'm happy to write a proper implementation of SV allele handling (eg: htsjdk is also missing definitions for the 31 spec-defined SV headers).
The other problem I encountered is the VCF spec itself is unclear as to what a valid symbolic allele is. For example, AA may or may not be valid so writing a unambiguously correct VCF symbolic allele parser is not actually possible at this point. I've raised this issue on samtools/hts-specs and the VCFTools-spec mailing list but I expect clarification will take some time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, OK, I see that. @rpoplin, can you OK this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this change plus the feature request is a sensible path forward.
Hi Daniel, I just rebased these changes and merged them. Thanks for bearing with me during this process. -Alec |
# This is the 1st commit message: move Container.setByteOffset() call inside ContainerIO.readContainer() - also ContainerHeaderIO.readContainerHeader() restrict access to setByteOffset() and encapsulate Container.slices # This is the commit message #2: oops # This is the commit message #3: a little unrelated cleanup # This is the commit message #4: oops # This is the commit message #5: better CRAIEntryTest # This is the commit message #6: test improvements and a fix # This is the commit message #7: comment # This is the commit message #8: javadoc # This is the commit message #9: review comments # This is the commit message #10: comments and clarification for CRAMBAIIndexer
Fix mixed VCF record typo (REF allele should be GCG, as in the wiki version). Hat tip @holyspidoo, fixes samtools#6. Clarify haplotype tables: C is only the reference base in the first few examples; align sequence letters via verbatim monospaced font.
Bug fix for VariantContextBuilder thowing NullPointerException if .attribute() is called before .attributes()
Accepting breakend alleles as symbolic alleles
Project renaming and lib dependencies added so htslib compiles in an eclipse workspace that also contains picard