-
-
Notifications
You must be signed in to change notification settings - Fork 301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
switch to custom Hasher implementation #630
Conversation
1e783cf
to
269d59e
Compare
I would not be surprised. Now, having forgotten most of the history collected in the tracking issue I wonder why it decompresses entries to learn about the size instead of querying the index for that. Oh, right, because it first has to discover all objects and that tree traversal takes a long time even though it already skips over trees it has already seen. |
- rename `git-shamap` to `git-hash-collections`. That way more of these can be added as we go, starting with hashmaps (and maybe even ending with them). - remove unused 'raw' module. It can be brought back should it become relevant again. - change names to mirror `std::collections::*` and rely on prefixing with the whole cratename more. - pledge to not using unsafe code - prepare for adding 'breaking change' commit message to `git-revision`
…ons`. Due to the importance of best-suited data structures for maximizing performance we need to take control over them. This is best done using a dedicated crate that can cater to our very needs. That crate is named `git-hash-collections`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for contributing! Every percent of performance is appreciated!
As a general note, it's very important to split commits and use the correct commit message to signal breaking changes. I have created a write-up here: 932c353 .
Besides that I think it's ready for being merged, but wanted to see if you are OK with the changes first.
Thank you
Looks good to me, I will follow the commit conventions in the future. A few comnments:
Fine for now but custom collections (and hashtables in particular) are a prime candidate for unsafe code so we might have to remove that in the future.
Bikeshed: Maybe |
That sounds absolutely fair. All new crates (should) start out with these deny's (except for
Yes, let's do that instead! I think once the naming is fixed, we can merge this one. |
Closing as it's already merged, but I had to rewrite all commits but the first one in the end. |
Speeds up pack creation by about 1% and ensures
that the
Hasher
will remain robust whenmore variants (like sha256) are added to
ObjectId
in the future.Looking at the flamegraph, the
HashMap
doesn't even show up tough.It seems pack creation is almost entirely bound by zlip decoding during commit transversal.
So this might be a larger improvement in hashing performance but just doesn't matter as much overall.
Since git is still quite a bit faster during (single-threaded) pack creation it
would definitely be interesting to find out why
gitoxide
is slower.Since
gitoxide
uses azlib-ng
I would expect it to be faster thangit
if decoding is really the bottleneck.Without that advantage the gap would be even larger.
Perhaps git somehow manages to do less work during transversal compared to
gitoxide
?Ok for Byron review the PR on video?
If I think the review will be helpful for the community, then I might record and publish a video.