-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve countlines() performance #11947
Conversation
I restarted two builds - one had a weird type inference error and the other had a network failure. |
Yeah, the failures seemed unrealted. Let's wait on AppVeyor then merge. This isn't anything spectacular, but close to 2x improvements in some cases is nice. |
Glad to see improvements on the data ingest end of the standard library! |
Is it computing the same thing ? The old impl. seems to not be counting empty lines. |
Yeah, I noticed that. I'm not sure why we wouldn't count empty lines? Looking at the source code for |
We could maybe provide a |
👍 LGTM |
Well, it was introduced in 2eb2493 with a reference to dlmread. I didn't dig any further, but is it (or was it) used to determine the size of an array to preallocate? |
OK, I've looked everywhere. This is documented as counting non-empty lines, however, it is not used anywhere in Base, and is only used in one place in any registered package, in |
Thanks for looking around @ScottPJones. My guess is it used to be used by |
Yes, I'm sad though, because your code looks very nice and fast, only to find out that it's not (currently) used except that once in a package. |
Improve countlines() performance
Ref: Luthaf/Jumos.jl#22 |
if !isascii(eol) | ||
throw(ArgumentError("only ASCII line terminators are supported")) | ||
end | ||
countlines(f::AbstractString,eol::Char='\n') = open(io->countlines(io,eol),f)::Int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting tidbit for others out there: while the open(func, file)
can be quick and tidy, it's also quite subtle from a type stability perspective. Without the ::Int
assertion here, @code_warntype
shows that the return type for this method is Any
, even though countlines
itself is type-stable.
The first 3 files were auto-generated from random characters + newlines. The 21GB file is from @jiahao's test file from the readdlm issue.
Also added previously missing tests