Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse valid VCF 4.2 with ##INFO containing Source + Version #1248

Merged
merged 42 commits into from
Jan 22, 2019
Merged

Parse valid VCF 4.2 with ##INFO containing Source + Version #1248

merged 42 commits into from
Jan 22, 2019

Conversation

romanzenka
Copy link
Contributor

@romanzenka romanzenka commented Dec 28, 2018

Description

As per bug #517 - despite VCFv4.2 spec allowing for "Source" and "Version" fields in ##INFO, htsjdk would throw an error. This fixes the issue.

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)

@romanzenka romanzenka changed the title Fix bug #517 Fix bug #517 - fails to parse valid VCF 4.2 with INFO containing Source + Version Dec 28, 2018
@romanzenka romanzenka changed the title Fix bug #517 - fails to parse valid VCF 4.2 with INFO containing Source + Version Parse valid VCF 4.2 with ##INFO containing Source + Version Dec 28, 2018
This allows us to validate a new feature in VCFv4.2 - optional tags
in ##INFO field
@codecov-io
Copy link

codecov-io commented Dec 28, 2018

Codecov Report

Merging #1248 into master will increase coverage by 0.029%.
The diff coverage is 73.684%.

@@               Coverage Diff               @@
##              master     #1248       +/-   ##
===============================================
+ Coverage     69.324%   69.353%   +0.029%     
- Complexity      8287      8301       +14     
===============================================
  Files            555       555               
  Lines          33075     33116       +41     
  Branches        5561      5572       +11     
===============================================
+ Hits           22929     22967       +38     
- Misses          7884      7890        +6     
+ Partials        2262      2259        -3
Impacted Files Coverage Δ Complexity Δ
...n/java/htsjdk/variant/vcf/VCFContigHeaderLine.java 76.471% <100%> (ø) 10 <1> (ø) ⬇️
...rc/main/java/htsjdk/variant/vcf/VCFHeaderLine.java 74.468% <100%> (+1.135%) 22 <0> (+2) ⬆️
...n/java/htsjdk/variant/vcf/VCFFilterHeaderLine.java 100% <100%> (ø) 5 <1> (ø) ⬇️
...main/java/htsjdk/variant/vcf/AbstractVCFCodec.java 73.006% <100%> (ø) 90 <0> (ø) ⬇️
...n/java/htsjdk/variant/vcf/VCFSimpleHeaderLine.java 75.61% <33.333%> (-3.877%) 12 <1> (ø)
...ain/java/htsjdk/variant/vcf/VCFInfoHeaderLine.java 83.333% <50%> (-16.667%) 6 <1> (+1)
...java/htsjdk/variant/vcf/VCFCompoundHeaderLine.java 75.51% <73.333%> (-0.1%) 48 <2> (+7)
...va/htsjdk/variant/vcf/VCFHeaderLineTranslator.java 92.593% <78.571%> (+5.093%) 3 <2> (+1) ⬆️
...sjdk/samtools/util/Md5CalculatingOutputStream.java 84.615% <0%> (+10.256%) 11% <0%> (+3%) ⬆️

@romanzenka
Copy link
Contributor Author

I will work on getting coverage back in shape

@lbergelson
Copy link
Member

@cmnbroad Could you take a look at this? It's one of the tickets that was slated to be closed by your giant refactor branch that never went int.

@lbergelson
Copy link
Member

@romanzenka Thanks for doing this! This warning might be too late, but the code coverage is really oversensitive and tends to fail at random due to some non-determinism in the test suite. So don't worry too much about pleasing it.

@romanzenka
Copy link
Contributor Author

I originally wanted to reuse parts of the original PR, but I could not figure out how it fixes this particular issue (VCF adding optional fields). So I enhanced the interface (retaining backwards compatibility) so now you can specify optional tags.

@cmnbroad
Copy link
Collaborator

cmnbroad commented Jan 3, 2019

Yes, the original PR refactored the entire header line class hierarchy, so it would be hard to extract just a piece of it. I'll take a look at the changes here.

Copy link
Contributor

@pshapiro4broad pshapiro4broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many minor comments

pshapiro4broad and others added 4 commits January 3, 2019 10:06
Cleaner call avoiding code duplication

Co-Authored-By: romanzenka <[email protected]>
1) switched optional flags from Set to List, assuming there will be typically very few of them
2) comments on VCFLineParser, deprecated version without optional flags, removed the redundant 'public'
3) clearer exception in VCF4Parser when tags are out of order versus tags are completely unexpected
4) unit test checks for error message to match expected one
Copy link
Contributor

@pshapiro4broad pshapiro4broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates the code looks 👍 to me, except for the issue with the case statement fall-through.

Will also need @lbergelson or @cmnbroad to check this off too, as I'm not a VCF format expert.

@cmnbroad
Copy link
Collaborator

cmnbroad commented Jan 3, 2019

@romanzenka Thanks for taking this on - we've been wanting to fix it for a while. It looks you've made changes to make "source" and "version" tolerated on read, but the values aren't retained, accessible to callers, or round-tripped on write. Would you be able to add getters so these values can be retrieved from the header lines, and update the encoder to propagate them on write, with corresponding tests ? Ideally we'd implement generalized attribute support, but thats a lot more work. For now it would be ok to handle these two attributes specifically since they're recommended in the spec, but I'd be reluctant to take a change that accepts and then silently drops them.

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the last changes - one minor minor change requested.

@lbergelson, except for one last comment on the tests, Source and Version can be round-tripped now. I think this PR strikes as good a balance as we're going to get for that without a major overhaul such as in my other PR. The only remaining question is whether we really want to deprecate the old parseLine in VCFLineParser, and the old constructor in VCFSimpleLineHeader, or just leave them. I think they're harmless to leave, though the new replacements are preferable.

@romanzenka
Copy link
Contributor Author

romanzenka commented Jan 11, 2019

I think I addressed everything - the only remaining issue was deprecation of those two constructors (that are not used anymore within the library). I removed the @deprecated for now and replaced it with @see, it can always be changed back later maybe during a larger deprecation activity when you decide to clean the library up.

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One hopefully final request and we can get this merged.

@romanzenka
Copy link
Contributor Author

I added a freebie that you may/may not like - I am tired of the MD5CalculatingOutputStream spuriously messing up my coverage, so I made it deterministic.

@lbergelson
Copy link
Member

@romanzenka 👍 to fixing that. Don't worry too much about the coverage though... it's weirdly oversensitive. We need to change the settings to be more lenient.

@romanzenka
Copy link
Contributor Author

@lbergelson I do not think it is even sensitivity - there is something random about how the coverage gets compared. The latest report makes no sense to me - I have not done anything with those 21 reported files. I suspect the code coverage tool itself might be confused?
What you would want to compare is coverage as it is right now versus coverage as it would be after you merge the PR. I do not think that is happening.

@lbergelson
Copy link
Member

Yeah, it choose funny comparisons sometimes and I'm not certain exactly what happens. I think it compares the coverage in the last run of master vs the coverage in the pr build, which isn't necessarily in sync although it should be close.

@romanzenka
Copy link
Contributor Author

romanzenka commented Jan 17, 2019

@lbergelson I am simply merging master for now to see what happens... my bad, that's probably best practice before accepting a PR anyway

@romanzenka
Copy link
Contributor Author

@cmnbroad Do you think this could be merged before Friday 2019-01-25? We are releasing an app and it would be nice to build it on top of an official commit instead of our own branch.
I am not hoping for an official release, although that would of course be awesome!

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbergelson this looks good to me now (except for the old copyright date), in case you want a final review.

@cmnbroad
Copy link
Collaborator

@romanzenka Thanks. Add yes, we should be able to get this merged in the next 24 hours.

@lbergelson
Copy link
Member

@cmnbroad @romanzenka 👍

Thanks for working through this one. It turned out to be more work than I expected. I don't know if we'll be able to get another release out by friday. I'll see what I can do...

@romanzenka
Copy link
Contributor Author

romanzenka commented Jan 22, 2019

@cmnbroad @romanzenka 👍

Thanks for working through this one. It turned out to be more work than I expected. I don't know if we'll be able to get another release out by friday. I'll see what I can do...

Don't worry about it. Release when it is appropriate to release. We'll build off a particular commit.

Oh and thanks for very thorough and high-quality review!

@cmnbroad cmnbroad merged commit 93250d5 into samtools:master Jan 22, 2019
@cmnbroad
Copy link
Collaborator

Thanks again for working through all the issues @romanzenka.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants