Support BAMs with >65535 CIGAR operations #1003
Codecov Report
@@              Coverage Diff              @@
##             master     #1003     +/-   ##
===============================================
+ Coverage     66.103%   66.277%   +0.173%
- Complexity      7562      7758      +196
===============================================
  Files            532       532
  Lines          32254     33072      +818
  Branches        5497      5701      +204
===============================================
+ Hits           21321     21919      +598
- Misses          8777      8954      +177
- Partials        2156      2199       +43
@lh3 there are failing tests. Could you please rebase (not squash) to see if that helps?
This is the solution that @yfarjoun and I have agreed on. When a CIGAR has more than 65535 operators, we move the real CIGAR to the CG tag and put a fake full-clipping CIGAR (i.e. `<readLength>S`) in its place. `./gradlew` reported one test failure, in `EnaRefServiceTest`. Judging from the call stack, I don't think this change caused it. The rest of the tests passed. On an example SAM and BAM (from <http://lh3lh3.users.sourceforge.net/data>), `PrintReadsExample` gives the desired BAM output with either SAM or BAM as input. Both shallow and deep decoding also write correct BAMs.
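A minimal sketch of the substitution described above, in plain Java with no htsjdk dependency. The class and method names (`LongCigarSketch`, `countOperators`, `needsCgTag`, `placeholderCigar`) are hypothetical and are not the PR's actual code. The only facts taken from the discussion are the 65535-operator limit and the `<readLength>S` placeholder.

```java
// Hypothetical illustration of the long-CIGAR rule; not htsjdk API.
final class LongCigarSketch {
    // BAM stores the number of CIGAR operators in an unsigned 16-bit field.
    static final int MAX_BAM_CIGAR_OPS = 65535;

    /** Counts operators in a textual CIGAR such as "10M2I5M" (3 operators). */
    static int countOperators(String cigar) {
        int count = 0;
        for (int i = 0; i < cigar.length(); i++) {
            // Every non-digit character terminates one <length><op> pair.
            if (!Character.isDigit(cigar.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    /** True when the real CIGAR cannot fit in the BAM CIGAR field. */
    static boolean needsCgTag(String cigar) {
        return countOperators(cigar) > MAX_BAM_CIGAR_OPS;
    }

    /** The fake full-clipping CIGAR stored in place of the real one. */
    static String placeholderCigar(int readLength) {
        return readLength + "S";
    }
}
```

In this sketch a writer would call `needsCgTag` on each record and, when it returns true, store `placeholderCigar(readLength)` in the CIGAR field while keeping the real CIGAR in the CG tag.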
For BAMs generated with old tools, the indexingBin field may not be correctly set. We need to manually update this field to avoid errors during random retrieval.
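To illustrate why the bin matters: the indexing bin is derived from the alignment span, and a soft-clip-only placeholder CIGAR consumes no reference bases, so a bin computed from it can differ from the bin of the real alignment. The sketch below uses the `reg2bin` function from the SAM specification. The class name `BinSketch` and the example coordinates are assumptions for illustration, and this is not how the PR itself recomputes the field.

```java
// Hypothetical illustration; reg2bin follows the SAM specification.
final class BinSketch {
    /** Bin for a 0-based, half-open interval [beg, end). */
    static int reg2bin(int beg, int end) {
        --end; // switch to an inclusive end coordinate
        if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14);
        if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17);
        if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);
        if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);
        if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);
        return 0;
    }
}
```

With these inputs, a read starting at 0-based position 10000 lands in bin 4681 if only one base of span is assumed (`reg2bin(10000, 10001)`), but in bin 585 once a 20000-base span is taken into account (`reg2bin(10000, 30000)`).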
The old code to lift tag to CIGAR did not work. This commit fixed several issues:
- byte order is not set when computing alignmentEnd
- forgot to add *4 to cigarLen in two cases
- getChar() is not what it is supposed to mean
The new code has been more carefully tested and seems to work.
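The first two issues in that list can be illustrated with a small sketch, assuming the BAM layout in which each CIGAR operation is a 32-bit little-endian integer holding the length in the upper 28 bits and the operator code in the low 4 bits. The class and method names (`PackedCigarSketch`, `referenceLength`, `pack`) are hypothetical, and this is not the code from the commit.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration of decoding packed BAM CIGAR operations.
final class PackedCigarSketch {
    // Operator codes 0..8 as defined by the BAM format.
    private static final String OPS = "MIDNSHP=X";

    /** Sums the reference bases consumed by cigarLen packed operations. */
    static int referenceLength(byte[] record, int cigarOffset, int cigarLen) {
        // Each operation is 4 bytes, so the block spans cigarLen * 4 bytes.
        // ByteBuffer defaults to big-endian, so the order must be set explicitly.
        ByteBuffer buf = ByteBuffer.wrap(record, cigarOffset, cigarLen * 4)
                .order(ByteOrder.LITTLE_ENDIAN);
        int refLen = 0;
        for (int i = 0; i < cigarLen; i++) {
            int packed = buf.getInt();
            int opLen = packed >>> 4;           // upper 28 bits: length
            char op = OPS.charAt(packed & 0xF); // low 4 bits: operator code
            if (op == 'M' || op == 'D' || op == 'N' || op == '=' || op == 'X') {
                refLen += opLen;                // these operators consume reference
            }
        }
        return refLen;
    }

    /** Packs parallel arrays of lengths and operators into BAM byte layout. */
    static byte[] pack(int[] lengths, char[] ops) {
        ByteBuffer buf = ByteBuffer.allocate(lengths.length * 4)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < lengths.length; i++) {
            buf.putInt((lengths[i] << 4) | OPS.indexOf(ops[i]));
        }
        return buf.array();
    }
}
```

The sketch does no validation: an operator code above 8 would throw from `charAt`, which real decoding code would need to handle.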
This fixes a test failure.
Master has a new test that leads to a failure (null is not checked). This has been fixed. All tests pass now.
I'd like to have a test for this. Do you mind if I add a test in a commit?
@yfarjoun Please go ahead and add the test. I really appreciate it!
- extracting constants etc.
Hi @lh3, I added some tests and some code changes here: https://github.com/samtools/htsjdk/tree/yf_cigar-64k-tag though one seemingly unrelated test is failing, and the second test I wrote isn't failing even though I designed it to fail. I was hoping that you could include something like the tests I wrote here in your branch. The second test adds wrong CG tags to the records and expects failure (since they should either be replaced by the new tags or confuse the system with wrong tags). Let me know how I can help.
I found the problem and fixed it. So now my only question is why I can add a CG tag and the BAM somehow manages not to get tripped up.
…ed the SamSetRecordBuilder..
@lh3 could you tell me what you think about the tests and my question?
On my laptop, there are 3 |
Scratch that. I did push the latest, and they pass on Travis: https://travis-ci.org/samtools/htsjdk/builds/319218545?utm_source=github_status&utm_medium=notification
Make sure you have the latest commit from my branch.
Thanks a lot, @yfarjoun. I have merged your branch into this PR.
//why is this not breaking?
@Test(dataProvider = "longCigarsData") |
Do you understand, @lh3, why this test isn't failing?
I'm adding a fake CG tag and I have long CIGARs, but somehow it manages to encode and decode the read correctly. I was expecting failure!
Superseded by #1086
Description
This PR implements the long-cigar solution that @yfarjoun and I have agreed on. In the presence of CIGAR with >65535 operators, we move the real CIGAR to the CG tag and put a fake full clipping CIGAR (i.e. `<readLength>S`) in place. `./gradlew test` reported one test failure due to `EnaRefServiceTest`. Looking at the call stack, I don't think that is my fault. The rest of tests have passed.

On an example SAM and BAM (from http://lh3lh3.users.sourceforge.net/data), `PrintReadsExample` gives the desired BAM output when taking both SAM and BAM as input. Both shallow and deep decoding also write correct BAMs.