Extract index from the high hash bits instead of the low bits. #71
Conversation
Some Hashers, and FxHasher in particular, mix the low bits poorly; so only use the high bits for both hashes.

Instead of introducing bucket_shift next to bucket_mask in the top-level struct, derive them both from capacity_log2. This avoids exacerbating struct size in rust-lang#69.

This change is also a prerequisite for rayon-based parallel collect, which requires that the high bits of the index are always in the same place regardless of table size.

name                     old ns/iter  new ns/iter  diff ns/iter  diff %   speedup
find_existing            0            0            0             NaN%     x NaN
find_existing_high_bits  84,320       3,442        -80,878       -95.92%  x 24.50
find_nonexisting         0            0            0             NaN%     x NaN
get_remove_insert        25           29           4             16.00%   x 0.86
grow_by_insertion        205          209          4             1.95%    x 0.98
grow_by_insertion_kb     290          180          -110          -37.93%  x 1.61
hashmap_as_queue         25           26           1             4.00%    x 0.96
insert_8_char_string     18,038       17,491       -547          -3.03%   x 1.03
new_drop                 0            0            0             NaN%     x NaN
new_insert_drop          45           50           5             11.11%   x 0.90
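To make the indexing change concrete, here is a minimal self-contained sketch of the two schemes — illustrative names, not hashbrown's actual code — assuming a 64-bit hash and a table of 2^capacity_log2 buckets:

```rust
// Illustrative sketch, not hashbrown's real implementation. Assumes a
// 64-bit hash and a table of 2^capacity_log2 buckets, 1 <= capacity_log2 < 64.

// Low-bit scheme: mask off the bottom bits. Poorly mixed low bits
// (e.g. from FxHasher) cluster keys into a few buckets.
fn bucket_index_low(hash: u64, capacity_log2: u32) -> usize {
    (hash & ((1u64 << capacity_log2) - 1)) as usize
}

// High-bit scheme: shift down so the top bits pick the bucket. The same
// high bits select the index at every table size, which is what a
// rayon-based parallel collect needs.
fn bucket_index_high(hash: u64, capacity_log2: u32) -> usize {
    (hash >> (64 - capacity_log2)) as usize
}

fn main() {
    let h = 0xDEAD_BEEF_0000_0000u64; // all entropy in the high bits
    assert_eq!(bucket_index_low(h, 6), 0); // low bits are useless here
    assert_eq!(bucket_index_high(h, 6), 0b110111); // top 6 bits = 55
    println!(
        "low: {}, high: {}",
        bucket_index_low(h, 6),
        bucket_index_high(h, 6)
    );
}
```

The single capacity_log2 field is enough to derive both the mask and the shift on the fly, which is why neither needs its own slot in the top-level struct.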
Unfortunately, the 2 benchmarks which actually matter (

Also, could you explain what you have in mind for parallel collect? I am not sure if it is possible to do better than the implementation we have now (parallel collect into a

I personally feel that we would be better off fixing the hash function to provide better output in the lower bits, but that is something to decide with benchmarks.
I implemented a parallel collect for the std HashMap, and it got a nearly linear speedup with more cores: edre/rayon-hash@f94751a The gist of the algorithm is:
Just this change of using the high bits for the index in the std HashMap made serial collect a couple percent faster. Probably because table resize writes to one spot in the new table instead of two. Adding a bit to the front of the index splits the new address into two possible spots.
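That resize property can be checked with a small self-contained demo (illustrative names, not the std or hashbrown code): with high-bit indexing, doubling the table appends one bit at the *bottom* of the index, so old bucket i maps into the contiguous pair {2i, 2i+1}; with low-bit indexing the new bit lands at the *top*, splitting old bucket i between i and i + old_len.

```rust
// Illustrative demo of how the bucket index changes when a table
// doubles (capacity_log2 goes from 5 to 6).
fn idx_high(hash: u64, log2: u32) -> usize {
    (hash >> (64 - log2)) as usize
}
fn idx_low(hash: u64, log2: u32) -> usize {
    (hash & ((1u64 << log2) - 1)) as usize
}

fn main() {
    for h in [0x8F00_0000_0000_1234u64, 0x1234_5678_9ABC_DEF0] {
        // High-bit scheme: new index = old index with one bit appended,
        // so old bucket i lands in the contiguous pair {2i, 2i+1}.
        assert_eq!(idx_high(h, 6) / 2, idx_high(h, 5));
        // Low-bit scheme: new index = old index with one bit prepended,
        // so old bucket i splits between i and i + 32.
        let (old, new) = (idx_low(h, 5), idx_low(h, 6));
        assert!(new == old || new == old + 32);
    }
}
```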
Can you retry the benchmarks with the latest master? They should give a better picture of the performance now that #72 is merged.
This change does a lot worse in the new benchmarks. My rust dev setup is in a VM, and there's about 10% noise, but this still looks bad.
My conclusion from this is that Fx is not a very good general hash function. These benchmarks use keys that are closely packed small integers, and it would be difficult for a hash function to mess up the perfect low-bit entropy handed to it here. I think you should reintroduce some form of the find_existing_high_bits benchmark, which highlights where Fx is problematic. Using fnv makes the benchmarks more similar, but is a bit slower.
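The low-bit weakness is easy to reproduce with a simplified Fx-style step — a single wrapping multiply by fxhash's 64-bit constant; this is a stand-in for the real FxHasher, which also rotates and xors state across words. Because bit i of a product depends only on bits 0..=i of its inputs, multiplication pushes entropy upward: keys that agree in their low bits produce hashes that agree in their low bits.

```rust
// Simplified Fx-style mixing step: one wrapping multiply by the 64-bit
// constant fxhash uses. A stand-in, not the real FxHasher.
const K: u64 = 0x517c_c1b7_2722_0a95;

fn fx_step(x: u64) -> u64 {
    x.wrapping_mul(K)
}

fn main() {
    // Keys 0, 256, 512, ... agree in their low 8 bits, so their hashes
    // do too: with low-bit indexing they would all share bucket 0 in
    // any table with at most 256 buckets.
    for i in 0..16u64 {
        assert_eq!(fx_step(i << 8) & 0xff, 0);
    }
    // The high byte varies, so high-bit indexing spreads these keys out.
    let highs: std::collections::HashSet<u64> =
        (0..16u64).map(|i| fx_step(i << 8) >> 56).collect();
    assert!(highs.len() > 1);
}
```

Keys like closely packed small integers already have good low-bit entropy on input, which is why the benchmarks above don't show the problem unless something like find_existing_high_bits is reintroduced.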
We've switched the hash function to AHash, which solves this issue. |
See #97 |