-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace other iter().position()
instances with memchr
#302
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, thanks. If the dependency is ever a problem we can make it optional with a naive implementation when disabled.
Let's merge this instead of #301. Can you delete the unneeded fixme?
I experimented with using reads instead of mmap (https://github.com/philipc/object/tree/pr302) and it seems promising:
Note that I changed the loop to cover the file open too (just the inner loop is slower, but that's expected to be better for mmap since it's already in memory after the first loop). It's still not ideal because it does a hash lookup and allocation for every read, but fixing that is harder and something I'll think about. Also not sure how well it translates to real workloads. |
This makes parsing of the archive headers significantly faster. The `ar` example adjusted to parse the same archive 1 million times, when run with the rlib of the `object` crate itself produces the following metrics: 788.19 msec task-clock:u # 0.998 CPUs utilized 2,502,967,113 cycles:u # 3.176 GHz 7,780,571,392 instructions:u # 3.11 insn per cycle In contrast to the following for the old code: 1,061.09 msec task-clock:u # 0.998 CPUs utilized 3,374,141,510 cycles:u # 3.180 GHz 12,012,570,139 instructions:u # 3.56 insn per cycle This results in a reduction of about 1B cycles, or 25% reduction in wall clock time. Originally `perf` would show a heavy hotspot (in the area of 50% of the total runtime) in `parse_sysv_extended_name`.
Here instead of figuring out the extents of the integer ahead of time we check for the spaces while we compute the number itself. This further reduces the runtime of the beforementioned case (see previous commit) to: 580.57 msec task-clock:u # 0.997 CPUs utilized 1,843,957,595 cycles:u # 3.176 GHz 5,901,570,558 instructions:u # 3.20 insn per cycle `perf report` still shows that the most of the time is spent parsing sysv archive names (which makes sense – its pretty much all the program does after all!).
This makes `memchr` a non-optional dependency, due to its use in `read_string` method of `pod::Bytes`.
b3d84de
to
de7ecdd
Compare
Done. |
Replace other `iter().position()` instances with `memchr`
This might provide similar improvements to those seen in #301, depending on how hot the relevant code is.. This makes
memchr
a non-optional dependency, due to its use inread_string
method ofpod::Bytes
. If this is a problem it is still possible to make it an optional dependency used bycoff
andmacho
code.