Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

One possible solution to cigarLen>65535 (DON'T MERGE; for discussion only) #938

Closed
wants to merge 3 commits into from
Closed

Conversation

lh3
Copy link
Member

@lh3 lh3 commented Jul 20, 2017

Description

First of all, this PR is not intended to be merged. It is only meant to show code changes for a solution to CIGARs with >65535 operations. The solution was first proposed here (solution 1). It is not the solution @yfarjoun and I have agreed on, but it probably requires the least code change. The other solution needs to add a few hundred lines of code to htsjdk across multiple files.

You can see how the approach works from the change to BAMRecordCodec.encode(). When cigarLen is too large, we write indexingBin, cigarLen and flags differently. The decoder sets the correct values for these fields once the BAMRecord is read to memory. ./gradlew test reported "all tests passed" in the end, but this doesn't tell us whether/how my changes work with a new BAM with long cigars.

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)

lh3 added 3 commits July 19, 2017 19:02
This approach was initially proposed in this thread (solution 1):

https://sourceforge.net/p/samtools/mailman/message/30672431/

It is reusing the bin field to store the higher 16 bits of cigarLen.
Importantly, this change is only applied to BAM files. Once the BAMRecord is
read into memory by new htsjdk, all the fields should be correctly populated.
It is silly to send a PR not compilable. Even for discussion purposes.
@codecov-io
Copy link

codecov-io commented Jul 20, 2017

Codecov Report

Merging #938 into master will decrease coverage by 0.012%.
The diff coverage is 41.667%.

@@               Coverage Diff               @@
##              master      #938       +/-   ##
===============================================
- Coverage     65.348%   65.336%   -0.012%     
- Complexity      7321      7323        +2     
===============================================
  Files            528       528               
  Lines          31975     31993       +18     
  Branches        5469      5472        +3     
===============================================
+ Hits           20895     20903        +8     
- Misses          8921      8930        +9     
- Partials        2159      2160        +1
Impacted Files Coverage Δ Complexity Δ
src/main/java/htsjdk/samtools/BAMRecordCodec.java 74.783% <41.667%> (-9.753%) 15 <0> (ø)
...a/htsjdk/samtools/cram/encoding/rans/Decoding.java 85.714% <0%> (+3.571%) 12% <0%> (+1%) ⬆️
...sjdk/samtools/util/Md5CalculatingOutputStream.java 76.923% <0%> (+7.692%) 9% <0%> (+1%) ⬆️

@lh3
Copy link
Member Author

lh3 commented Jul 20, 2017

PS: I submitted this PR because in an internal discussion, @jkbonfield thought this solution still has advantages: it crashes older tools sometimes and is simple to implement. I have PRs to htslib and hts-specs on the solution @yfarjoun and I have agreed on, but I am not familiar with htsjdk enough to implement that in short time. To me, both solutions (this PR and samtools/htslib#560) are fine as long as we can reach a consensus.

@lh3
Copy link
Member Author

lh3 commented Oct 3, 2017

I am closing this PR now, in favor of #1003.

@lh3 lh3 closed this Oct 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants