Extract index from the high hash bits instead of the low bits. #71
Conversation
Some Hashers, and FxHasher in particular, mix the low bits poorly; so only use the high bits for both hashes.

Instead of introducing bucket_shift next to bucket_mask in the top-level struct, derive them both from capacity_log2. This avoids exacerbating struct size in rust-lang#69.

This change is also a prerequisite for rayon-based parallel collect, which requires that the high bits of the index are always in the same place regardless of table size.

name                     old ns/iter  new ns/iter  diff ns/iter  diff %   speedup
find_existing            0            0            0             NaN%     x NaN
find_existing_high_bits  84,320       3,442        -80,878       -95.92%  x 24.50
find_nonexisting         0            0            0             NaN%     x NaN
get_remove_insert        25           29           4             16.00%   x 0.86
grow_by_insertion        205          209          4             1.95%    x 0.98
grow_by_insertion_kb     290          180          -110          -37.93%  x 1.61
hashmap_as_queue         25           26           1             4.00%    x 0.96
insert_8_char_string     18,038       17,491       -547          -3.03%   x 1.03
new_drop                 0            0            0             NaN%     x NaN
new_insert_drop          45           50           5             11.11%   x 0.90
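To make the indexing change concrete, here is a minimal self-contained sketch of the two schemes — illustrative names, not hashbrown's actual code — assuming a 64-bit hash and a table of 2^capacity_log2 buckets:

```rust
// Illustrative sketch, not hashbrown's real implementation. Assumes a
// 64-bit hash and a table of 2^capacity_log2 buckets, 1 <= capacity_log2 < 64.

// Low-bit scheme: mask off the bottom bits. Poorly mixed low bits
// (e.g. from FxHasher) cluster keys into a few buckets.
fn bucket_index_low(hash: u64, capacity_log2: u32) -> usize {
    (hash & ((1u64 << capacity_log2) - 1)) as usize
}

// High-bit scheme: shift down so the top bits pick the bucket. The same
// high bits select the index at every table size, which is what a
// rayon-based parallel collect needs.
fn bucket_index_high(hash: u64, capacity_log2: u32) -> usize {
    (hash >> (64 - capacity_log2)) as usize
}

fn main() {
    let h = 0xDEAD_BEEF_0000_0000u64; // all entropy in the high bits
    assert_eq!(bucket_index_low(h, 6), 0); // low bits are useless here
    assert_eq!(bucket_index_high(h, 6), 0b110111); // top 6 bits = 55
    println!(
        "low: {}, high: {}",
        bucket_index_low(h, 6),
        bucket_index_high(h, 6)
    );
}
```

The single capacity_log2 field is enough to derive both the mask and the shift on the fly, which is why neither needs its own slot in the top-level struct.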
Unfortunately, the 2 benchmarks which actually matter (

Also, could you explain what you have in mind for parallel collect? I am not sure if it is possible to do better than the implementation we have now (parallel collect into a

I personally feel that we would be better off fixing the hash function to provide better output in the lower bits, but that is something to decide with benchmarks.
I implemented a parallel collect for the std HashMap, and it got a nearly linear speedup with more cores: edre/rayon-hash@f94751a The gist of the algorithm is:
Just this change of using the high bits for the index in the std HashMap made serial collect a couple percent faster. Probably because table resize writes to one spot in the new table instead of two. Adding a bit to the front of the index splits the new address into two possible spots.
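That resize property can be checked with a small self-contained demo (illustrative names, not the std or hashbrown code): with high-bit indexing, doubling the table appends one bit at the *bottom* of the index, so old bucket i maps into the contiguous pair {2i, 2i+1}; with low-bit indexing the new bit lands at the *top*, splitting old bucket i between i and i + old_len.

```rust
// Illustrative demo of how the bucket index changes when a table
// doubles (capacity_log2 goes from 5 to 6).
fn idx_high(hash: u64, log2: u32) -> usize {
    (hash >> (64 - log2)) as usize
}
fn idx_low(hash: u64, log2: u32) -> usize {
    (hash & ((1u64 << log2) - 1)) as usize
}

fn main() {
    for h in [0x8F00_0000_0000_1234u64, 0x1234_5678_9ABC_DEF0] {
        // High-bit scheme: new index = old index with one bit appended,
        // so old bucket i lands in the contiguous pair {2i, 2i+1}.
        assert_eq!(idx_high(h, 6) / 2, idx_high(h, 5));
        // Low-bit scheme: new index = old index with one bit prepended,
        // so old bucket i splits between i and i + 32.
        let (old, new) = (idx_low(h, 5), idx_low(h, 6));
        assert!(new == old || new == old + 32);
    }
}
```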
Can you retry the benchmarks with the latest master? They should give a better picture of the performance now that #72 is merged.
This change does a lot worse in the new benchmarks. My rust dev setup is in a VM, and there's about 10% noise, but this still looks bad.
My conclusion from this is that Fx is not a very good general hash function. These benchmarks use keys that are closely packed small integers, and it would be difficult for a hash function to mess up the perfect low-bit entropy handed to it here. I think you should reintroduce some form of the find_existing_high_bits benchmark, which highlights where Fx is problematic. Using fnv makes the benchmarks more similar, but is a bit slower.
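The low-bit weakness is easy to reproduce with a simplified Fx-style step — a single wrapping multiply by fxhash's 64-bit constant; this is a stand-in for the real FxHasher, which also rotates and xors state across words. Because bit i of a product depends only on bits 0..=i of its inputs, multiplication pushes entropy upward: keys that agree in their low bits produce hashes that agree in their low bits.

```rust
// Simplified Fx-style mixing step: one wrapping multiply by the 64-bit
// constant fxhash uses. A stand-in, not the real FxHasher.
const K: u64 = 0x517c_c1b7_2722_0a95;

fn fx_step(x: u64) -> u64 {
    x.wrapping_mul(K)
}

fn main() {
    // Keys 0, 256, 512, ... agree in their low 8 bits, so their hashes
    // do too: with low-bit indexing they would all share bucket 0 in
    // any table with at most 256 buckets.
    for i in 0..16u64 {
        assert_eq!(fx_step(i << 8) & 0xff, 0);
    }
    // The high byte varies, so high-bit indexing spreads these keys out.
    let highs: std::collections::HashSet<u64> =
        (0..16u64).map(|i| fx_step(i << 8) >> 56).collect();
    assert!(highs.len() > 1);
}
```

Keys like closely packed small integers already have good low-bit entropy on input, which is why the benchmarks above don't show the problem unless something like find_existing_high_bits is reintroduced.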
We've switched the hash function to AHash, which solves this issue. |
See #97 |