Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix parsing headers 48105b07 #924

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

fix parsing headers 48105b07 #924

wants to merge 1 commit into from

Conversation

kuk0
Copy link

@kuk0 kuk0 commented Jun 23, 2021

see the comments in 48105b0
the two versions of isHeaderLine are not equivalent and the commit broke reading correct BED files

@38
Copy link
Contributor

38 commented Jun 23, 2021

Why revert it? This is a critical optimization that makes it significantly faster.

@38
Copy link
Contributor

38 commented Jun 23, 2021

I guess we could fix the parser - of course it has bug.

And please note, the original version isn't a reliable parsing code as well. The point is we need to handle most of the case.

see arq5x@48105b0
that commit broke reading correct BED files
@kuk0 kuk0 changed the title partial revert of 48105b07 fix parsing headers 48105b07 Jun 24, 2021
@kuk0
Copy link
Author

kuk0 commented Jun 24, 2021

This is a critical optimization that makes it significantly faster.

ok, well, but how much faster? do you have a benchmark?
how many times is that function (isHeaderLine) called?
e.g., if i have a file with 1M lines, but the header is just the first line, i would expect it to be called ~ once or twice (then it would be unnecessary to optimize it at all); if it's called 1M times, is it necessary?

@kuk0
Copy link
Author

kuk0 commented Jun 24, 2021

i have updated the diff and tested it on this BED file:

chr	start	end	name	score	strand
NC_045512.2	0	29903	sars2	.	+

bedtools v2.29.2 can read this file
bedtools v2.30.0 is broken

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants