Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bgzf/index: fix chunk reader iteration #14

Merged
merged 1 commit into from
Sep 17, 2015
Merged

bgzf/index: fix chunk reader iteration #14

merged 1 commit into from
Sep 17, 2015

Conversation

kortschak
Copy link
Member

No tests. The test needs to come, but it requires engineering a set of blocks and index chunks to mimic the output shown in #10.

@brentp Please take a look.

@brentp
Copy link
Contributor

brentp commented Sep 15, 2015

I don't grok why this works and why it didn't before.

@kortschak
Copy link
Member Author

The file offset part of a virtual offset is the unique identifier for a BGZF block. I was using the end block of the last read (which may actually be the next block - so the condition failed, and we did not truncate the []byte to the correct length - thus read potentially up to the end of the BGZF block).

The logic is still weird for me and I'm looking into it further.

@kortschak
Copy link
Member Author

I've improved the clarity, properly terminate in the case of non-scanner reading (it was covering for an error in my logic which is now fixed). But please run the new code on your complete set to see if you find any other cases.

PTAL

// length of the remaining chunk data.
var cursor uint16
if last.Begin.File == r.chunks[0].Begin.File {
cursor = last.End.Block
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I change this to:
cursor = r.chunks[0].Begin.Block

Then my tests (doing tons of intersections and comparing to tabix CLI tool) are getting a lot farther, but still failing eventually. I think it's in the block below for switching chunks to early??

@kortschak
Copy link
Member Author

PTAL

I resorted to paper and pencil after constructing tests.

@kortschak kortschak changed the title bgzf/index: use correct end of chunk for identification bgzf/index: fix chunk reader iteration Sep 16, 2015
@brentp
Copy link
Contributor

brentp commented Sep 16, 2015

LGTM

@kortschak
Copy link
Member Author

Nope, not yet.

@kortschak
Copy link
Member Author

There is a remaining issue, though it doesn't break things, it just results in unused reads. The fix requires new API in bgzf.Reader.

@kortschak kortschak merged commit df762dd into master Sep 17, 2015
@kortschak kortschak deleted the issue10 branch September 17, 2015 00:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants