
Extract index from the high hash bits instead of the low bits. #71

Closed
wants to merge 3 commits into from

Conversation

edre
Contributor

@edre edre commented Apr 25, 2019

Some Hashers, and FxHasher in particular, mix the low bits poorly; so only use the high bits for both hashes.

Instead of introducing bucket_shift next to bucket_mask in the top-level struct, derive them both from capacity_log2. This avoids exacerbating struct size in #69.

This change is also a prerequisite for rayon-based parallel collect, which requires that the high bits of the index are always in the same place regardless of table size.

 name                     old ns/iter  new ns/iter  diff ns/iter   diff %  speedup
 find_existing            0            0                       0     NaN%    x NaN
 find_existing_high_bits  84,320       3,442             -80,878  -95.92%  x 24.50
 find_nonexisting         0            0                       0     NaN%    x NaN
 get_remove_insert        25           29                      4   16.00%   x 0.86
 grow_by_insertion        205          209                     4    1.95%   x 0.98
 grow_by_insertion_kb     290          180                  -110  -37.93%   x 1.61
 hashmap_as_queue         25           26                      1    4.00%   x 0.96
 insert_8_char_string     18,038       17,491               -547   -3.03%   x 1.03
 new_drop                 0            0                       0     NaN%    x NaN
 new_insert_drop          45           50                      5   11.11%   x 0.90
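As a rough sketch of the indexing scheme the PR describes (the struct and method names below are illustrative, not hashbrown's actual internals): both the mask and the shift are derived from a single `capacity_log2` field, and the bucket index comes from the top bits of the hash, so the index bits stay in the same position as the table grows.

```rust
// Illustrative sketch only; hashbrown's real internals differ.
struct Table {
    capacity_log2: u32, // table has 2^capacity_log2 buckets
}

impl Table {
    // Old scheme: index from the low bits, via a mask.
    fn bucket_mask(&self) -> u64 {
        (1u64 << self.capacity_log2) - 1
    }

    // New scheme: index from the high bits, via a shift. Both values
    // are derived from capacity_log2, so the struct stays small.
    fn bucket_shift(&self) -> u32 {
        64 - self.capacity_log2
    }

    fn index(&self, hash: u64) -> usize {
        (hash >> self.bucket_shift()) as usize
    }
}

fn main() {
    let t = Table { capacity_log2: 4 }; // 16 buckets
    assert_eq!(t.bucket_mask(), 0b1111);
    // The top 4 bits of the hash pick the bucket; growing the table only
    // exposes more high bits rather than moving the index bits around.
    assert_eq!(t.index(0xF000_0000_0000_0000), 15);
}
```

Doubling the table decrements `bucket_shift`, so each old bucket splits into two adjacent new buckets instead of two far-apart ones.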

@Amanieu
Member

Amanieu commented Apr 26, 2019

Unfortunately, the two benchmarks which actually matter (find_existing and find_nonexisting) are currently broken. I will prepare a PR to fix them, and then we can evaluate the impact that your change has on performance.

Also, could you explain what you have in mind for parallel collect? I am not sure it is possible to do better than the implementation we have now (a parallel collect into a Vec followed by a normal collect of that into a HashMap).

I personally feel that we would be better off fixing the hash function to provide better output in the lower bits, but that is something to decide with benchmarks.

@edre
Contributor Author

edre commented Apr 26, 2019

I implemented a parallel collect for the std HashMap, and it got a nearly linear speedup with more cores: edre/rayon-hash@f94751a

The gist of the algorithm is:

  1. Hash and partition items into shard buckets in parallel.
  2. Insert shard buckets into their segment of the hash table in parallel.
  3. Fix up any entries that needed to overflow into the next segment.

Just this change of using the high bits for the index in the std HashMap made serial collect a couple percent faster, probably because a table resize then writes each entry to one spot in the new table instead of choosing between two: with low-bit indexing, growing adds a bit to the front of the index, splitting each entry's new address across two possible spots.
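A rough, serial sketch of step 1 of the algorithm above (the version in edre/rayon-hash does this in parallel with rayon; the names and shard count below are hypothetical):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const SHARD_BITS: u32 = 2; // 4 shards; real code would size this by thread count

// The shard is taken from the top bits of the hash, so an item's shard
// (and its target table segment) is stable regardless of final table size.
fn shard_of<T: Hash>(item: &T) -> usize {
    let mut h = DefaultHasher::new();
    item.hash(&mut h);
    (h.finish() >> (64 - SHARD_BITS)) as usize
}

// Step 1: partition items into shard buckets keyed by the high hash bits.
fn partition<T: Hash>(items: Vec<T>) -> Vec<Vec<T>> {
    let mut shards: Vec<Vec<T>> = (0..(1usize << SHARD_BITS)).map(|_| Vec::new()).collect();
    for item in items {
        let s = shard_of(&item);
        shards[s].push(item); // step 2 would insert each shard into its own segment
    }
    shards
}

fn main() {
    let shards = partition((0u64..1000).collect());
    assert_eq!(shards.len(), 4);
    // No items are lost or duplicated by the partition.
    assert_eq!(shards.iter().map(Vec::len).sum::<usize>(), 1000);
}
```

Because shard boundaries align with contiguous segments of the final table, step 2 can fill each segment independently, leaving only the overflow fix-up of step 3 as serial work.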

@Amanieu
Member

Amanieu commented Apr 26, 2019

Can you retry the benchmarks with the latest master? They should give a better picture of the performance now that #72 is merged.

@edre
Contributor Author

edre commented Apr 26, 2019

This change does a lot worse in the new benchmarks. My Rust dev setup is in a VM with about 10% noise, but this still looks bad.

 name              old ns/iter  bigstruct ns/iter  diff ns/iter  diff %  speedup 
 insert_erase_i32  38,723       39,912                    1,189   3.07%   x 0.97 
 insert_erase_i64  36,316       41,743                    5,427  14.94%   x 0.87 
 insert_i32        31,225       29,544                   -1,681  -5.38%   x 1.06 
 insert_i64        28,472       31,279                    2,807   9.86%   x 0.91 
 iter_i32          2,091        2,375                       284  13.58%   x 0.88 
 iter_i64          2,096        2,541                       445  21.23%   x 0.82 
 lookup_fail_i32   3,813        5,684                     1,871  49.07%   x 0.67 
 lookup_fail_i64   3,524        3,324                      -200  -5.68%   x 1.06 
 lookup_i32        4,846        4,467                      -379  -7.82%   x 1.08 
 lookup_i64        5,420        9,067                     3,647  67.29%   x 0.60 

My conclusion from this is that Fx is not a very good general-purpose hash function. These benchmarks use keys that are closely packed small integers, and it would be difficult for any hash function to mess up the perfect low-bit entropy handed to it here. I think you should reintroduce some form of the find_existing_high_bits benchmark, which highlights where Fx is problematic.
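A toy illustration of the low-bit problem (this is just the multiply step, not the actual FxHasher, which also rotates and xors): in a multiplicative hash with an odd constant, bit i of the output depends only on bits 0 through i of the input, so high input bits never reach the low output bits.

```rust
// Toy multiplicative hash, not the real FxHasher.
const K: u64 = 0x517c_c1b7_2722_0a95; // the Fx multiplier, an odd constant

fn mul_hash(x: u64) -> u64 {
    x.wrapping_mul(K)
}

fn main() {
    // Flipping a high input bit changes the hash, but the low output
    // bits are untouched: a low-bit index would collide these keys.
    let a = mul_hash(1);
    let b = mul_hash(1 | (1 << 40));
    assert_eq!(a & 0xff, b & 0xff); // low byte identical
    assert_ne!(a, b);               // hashes differ only in high bits
}
```

Small dense integer keys dodge this because their entropy already lives in the low bits; keys whose entropy is in the high bits are the ones a low-bit index punishes.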

Using fnv makes the benchmarks more similar, but it is a bit slower.

@Amanieu
Member

Amanieu commented Aug 4, 2019

We've switched the hash function to AHash, which solves this issue.

@Amanieu Amanieu closed this Aug 4, 2019
@Amanieu
Member

Amanieu commented Aug 4, 2019

See #97
