Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VariantContextBuilder bug fix; better symbolic allele handling #6

Closed
wants to merge 4 commits into from
Closed

Conversation

d-cameron
Copy link
Contributor

Bug fix for VariantContextBuilder thowing NullPointerException if .attribute() is called before .attributes()
Accepting breakend alleles as symbolic alleles
Project renaming and lib dependencies added so htslib compiles in an eclipse workspace that also contains picard

return false;
else {
final String strBases = new String(bases);
return (bases[0] == '<' && bases[bases.length-1] == '>') ||
(strBases.contains("[") || strBases.contains("]"));
return (bases[0] == '<' || bases[bases.length-1] == '>') || // symbolic
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would "<" be considered a symbolic allele? Would "<>" be considered a symbolic allele? According to your changes they would be considered symbolic, but I'm not sure why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4.2 allows symbolic insertion alleles such as "A" and "A", hence the relaxation of the the <> requirements. < is still rejected as it is <= length 1 (3 lines up), and the VCF specifications do not require a non-zero length ID string, so yes, "<>" is technically valid (although I don't think it should be).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Daniel,

For your patch, "<ABC" would be considered a symbolic allele. I.e. there is a boolean OR operator between the test for "<" and for ">". Is that really what you intend?

-Alec

On May 20, 2014, at 10:51 PM, Daniel Cameron [email protected] wrote:

In src/java/htsjdk/variant/variantcontext/Allele.java:

         return false;
     else {
         final String strBases = new String(bases);
  •        return (bases[0] == '<' && bases[bases.length-1] == '>') ||
    
  •                (strBases.contains("[") || strBases.contains("]"));
    
  •        return (bases[0] == '<' || bases[bases.length-1] == '>') || // symbolic
    
    4.2 allows symbolic insertion alleles such as "A" and "A", hence the relaxation of the the <> requirements. < is still rejected as it is <= length 1 (3 lines up), and the VCF specifications do not require a non-zero length ID string, so yes, <> is technically valid (although I don't think it should be).


Reply to this email directly or view it on GitHub.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was changed to OR so A and A (v4.2 section 5.4.2 Large Insertions) would return true.

It is true the check will now accept invalid alleles such as <ABC, but as noted elsewhere, it already accepts invalid alleles such as [ABC and downstream parsing and interpretation of the allele must be done by users of the API anyway so invalid alleles are likely to fail downstream of the API just as they do now with breakpoints. The intent of this minimal pull is for a least-impact update to htsjdk so it doesn't choke on valid v4.2 allele. Should I raise a feature request for first-class treatment of symbolic alleles? I'm happy to write a proper implementation of SV allele handling (eg: htsjdk is also missing definitions for the 31 spec-defined SV headers).

The other problem I encountered is the VCF spec itself is unclear as to what a valid symbolic allele is. For example, AA may or may not be valid so writing a unambiguously correct VCF symbolic allele parser is not actually possible at this point. I've raised this issue on samtools/hts-specs and the VCFTools-spec mailing list but I expect clarification will take some time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK, I see that. @rpoplin, can you OK this?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this change plus the feature request is a sensible path forward.

@alecw
Copy link
Contributor

alecw commented May 22, 2014

Hi Daniel, I just rebased these changes and merged them. Thanks for bearing with me during this process.

-Alec

@alecw alecw closed this May 22, 2014
jmthibault79 added a commit that referenced this pull request Mar 26, 2019
# This is the 1st commit message:

move Container.setByteOffset() call inside ContainerIO.readContainer()
- also ContainerHeaderIO.readContainerHeader()

restrict access to setByteOffset() and encapsulate Container.slices

# This is the commit message #2:

oops

# This is the commit message #3:

a little unrelated cleanup

# This is the commit message #4:

oops

# This is the commit message #5:

better CRAIEntryTest

# This is the commit message #6:

test improvements and a fix

# This is the commit message #7:

comment

# This is the commit message #8:

javadoc

# This is the commit message #9:

review comments

# This is the commit message #10:

comments and clarification for CRAMBAIIndexer
yfarjoun pushed a commit to yfarjoun/htsjdk that referenced this pull request Jun 5, 2019
Fix mixed VCF record typo (REF allele should be GCG, as in the wiki
version).  Hat tip @holyspidoo, fixes samtools#6.

Clarify haplotype tables: C is only the reference base in the first
few examples; align sequence letters via verbatim monospaced font.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants