Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use rapidhash #57509

Draft
wants to merge 11 commits into
base: master
Choose a base branch
from
Draft

use rapidhash #57509

wants to merge 11 commits into from

Conversation

adienes
Copy link
Contributor

@adienes adienes commented Feb 23, 2025

closes #57235

todos:

  • are the test changes acceptable?
  • default seed going from 0 --> 0xbdd89aa982704029
  • lots and lots of benchmarking
  • I don't really understand the effects stuff (for the String hash)
  • how configurable should the secret be? to address randomize hash values per process #37166 should the seed/secret be an env var?
  • proper attribution to creators of rapidhash and also to reference implementations
  • I changed some h + seed to h ⊻ seed only because I don't really get why + was used in the first place
  • should instances of hash(x) - 3h instead be hash(x, h) ?

@adienes adienes added performance Must go faster hashing labels Feb 23, 2025
@adienes adienes requested a review from oscardssmith February 23, 2025 22:44
@oscardssmith
Copy link
Member

oscardssmith commented Feb 24, 2025

default seed going from 0 --> 0xbdd89aa982704029

yes

should instances of hash(x) - 3h instead be hash(x, h)

yes

I'm curious to see how the benchmarks look. It would be very cool if this was faster.

@nsajko
Copy link
Contributor

nsajko commented Feb 24, 2025

xref #52440

@nsajko
Copy link
Contributor

nsajko commented Feb 25, 2025

  • Why does this change the dispatch on the second argument to hash from UInt to UInt64 in some places?
  • I don't think this PR should add new arguments to hash, or change the meaning of existing argument positions. It's probably better to separate performance improvements and new features into separate PRs. You could use an internal function (_hash?) which accepts four, or however many necessary, arguments, and keep the dispatch for hash methods as-is. In this PR, at least.

@adienes
Copy link
Contributor Author

adienes commented Feb 27, 2025

should be in a more reviewable state now. I applied review comments so I (think) the hash methods are identical to master. benchmarks look pretty good locally --- enormously faster on String and about the same / slightly faster on everything else

note that I took some liberty with the hash(::UInt64) method to do a 2-round feistel design as suggested by Oscar on zulip. this has the downside that it's no longer "true" rapidhash as in the hash will not match a String with the same bitstring, but it has the upsides that it's quite a bit faster and also invertible. at least in terms of randomness used, true rapidhash on 8 bytes won't use the third secret term either

@nanosoldier runbenchmarks(ALL, vs=":master")

@adienes
Copy link
Contributor Author

adienes commented Feb 27, 2025

why didn't that trigger a benchmark run?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

switch MurmurHash3 to rapidhash ?
3 participants