
Decrease HashMap Load Factor #38003

Closed
pssalmeida opened this issue Nov 25, 2016 · 34 comments
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR. I-needs-decision Issue: In need of a decision. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@pssalmeida commented Nov 25, 2016

Currently HashMap uses a load factor of approximately 0.9. While this is appropriate for reads (get), as Robin Hood hashing gives a low distance from the starting point with very low, effectively fixed variance, such is not the case for writes (insert).

When writing over an occupied bucket, the insert incurs a displacement involving reads and writes over several buckets, either for the new value or for the values already in the map. Not only is this number larger than for reads, it also has an increasing variance, quickly becoming very large as the load approaches 1.

For a load factor a (UPDATED: the previous lookup formula and values were for random probing):

  • the average number of probes in a read is ~~(1/a)·ln(1/(1-a))~~ (1 + 1/(1-a))/2
  • the average number of buckets written in a write is 1/(1-a)
  • the probability of writing no more than k buckets in a write is 1 - a^k

In a table, the average buckets written, the 95th percentile buckets written, and the average buckets probed in a get, for different load factors:

a     | Avg W | 95% W | Avg R
------|-------|-------|---------------
0.500 | 2     | 4.3   | ~~1.39~~ 1.5
0.666 | 3     | 7.4   | ~~1.65~~ 2.0
0.750 | 4     | 10.4  | ~~1.84~~ 2.5
0.800 | 5     | 13.4  | ~~2.01~~ 3.0
0.833 | 6     | 16.4  | ~~2.15~~ 3.5
0.857 | 7     | 19.4  | ~~2.27~~ 4.0
0.875 | 8     | 22.4  | ~~2.38~~ 4.5
0.888 | 9     | 25.4  | ~~2.47~~ 5.0
0.900 | 10    | 28.4  | ~~2.56~~ 5.5
0.950 | 20    | 58.4  | ~~3.15~~ 10.5
0.990 | 100   | 298.1 | ~~4.65~~ 50.5
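
For reference, a small sketch (not from the original comment) that recomputes the table from the three formulas above; the 95th-percentile column solves 1 - a^k = 0.95 for k:

```rust
fn main() {
    let loads = [0.500, 0.666, 0.750, 0.800, 0.833, 0.857, 0.875, 0.888, 0.900, 0.950, 0.990];
    println!("{:>6} {:>7} {:>7} {:>7}", "a", "Avg W", "95% W", "Avg R");
    for &a in &loads {
        let avg_w = 1.0 / (1.0 - a);               // average buckets written per insert
        let w95 = (0.05f64).ln() / a.ln();         // solve 1 - a^k = 0.95 for k
        let avg_r = (1.0 + 1.0 / (1.0 - a)) / 2.0; // average probes per successful lookup (LP)
        println!("{:>6.3} {:>7.1} {:>7.1} {:>7.1}", a, avg_w, w95, avg_r);
    }
}
```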

It can be seen that for a = 0.9, the average number of probes when reading is just ~~2.56~~ 5.5, but the average number of buckets written when inserting is 10, and the 95th percentile is 28 buckets. Deletes will not be much better when doing the backwards shift, as most keys will be displaced from their origin (the very-low-variance behaviour of RH).

While in many cases the average over the map lifetime may not be much impacted, if a map is operated near capacity, with elements constantly being inserted and others removed, e.g. when implementing an LRU cache, the number of buckets (re)written per insert (and also per delete) will be manifestly higher than desired.

While the exact choice of load factor can be argued about, 0.9 seems manifestly too high. Going from 0.9 to 0.8 costs 12.5% more memory, while going from 0.8 to 0.9 doubles the average number of buckets written when inserting, and more than doubles the 95th percentile. Therefore, a load factor around 0.8 seems more reasonable.

@sfackler (Member)

Concrete performance numbers for various load factors would probably be helpful.

@abonander (Contributor)

Honestly, why isn't the load factor configurable to begin with? That would make it quite a bit easier to tune for specific use cases.

@arthurprs (Contributor)

I was going to propose 0.833 (5/6) based on some tests I ran earlier, but we need more numbers to make an informed decision.

@funny-falcon

Looks like 0.8 gives a 20% improvement on reads (ok, the number of buckets read does not translate directly into performance, but still).

@pczarn (Contributor) commented Nov 27, 2016

I don't know the distribution for the number of buckets accessed by insertion, but here are accurate numbers for lookup.

a     | 50% R | 90% R | 95% R
------|-------|-------|------
0.8   | 1.8   | 5.5   | 7.0
0.833 | 2.1   | 6.7   | 8.7
0.9   | 3.5   | 10.9  | 14.5

[plot: lookup_cost_with_circles2]

@funny-falcon commented Nov 27, 2016

So, 0.8 compared to 0.833 consumes 5% more memory and gives 15-20% shorter collision chains.

@arthurprs (Contributor)

@funny-falcon how did you come up with these numbers?

@pssalmeida (Author)

@abonander I also found it surprising that the load factor cannot be chosen; making it configurable would be a good idea. Anyway, the question of what the sensible default should be remains.

@pczarn I find such long tails for the lookup distribution strange ... I had the impression the variance would be much lower. Also, the shape does not look right: for a high load factor it should start with a gentle slope and have a steep slope near the average.

Anyway, lookups are not the problem, as we are talking about 1 cache line most of the time, sometimes 2, rarely more than 2. I would also point out that for inserts the cost is "amplified" by the payload size: we are not talking about only traversing hashes. An average of 10 means, in addition to the hashes, displacing 10 times the size of key and value. This makes such large numbers even scarier.
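
For example, assuming 8-byte keys and 8-byte values (an assumption for illustration, not a measured workload), an average displacement of 10 would mean moving roughly 10 × (8 + 8) = 160 bytes of key/value data per insert, on top of the 10 stored hashes.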

@funny-falcon

@arthurprs only with mathematics and the numbers from the table above. (Note, I've edited my comment: 15-20% shorter collision chains, not performance. Because of the great cache locality of Robin Hood hashing, the performance difference will usually be smaller.)

For used space:

  • a 0.833 fill rate means one needs 1/0.833 = 1.2 space, or 20% excess space.
  • a 0.8 fill rate means one needs 1/0.8 = 1.25 space, or 25% excess space.
  • 1.25/1.2 = 1.042, so a 0.8 fill rate uses 4.2% more space (on average) than a 0.833 fill rate.

@matklad (Member) commented Nov 28, 2016

@pssalmeida are those back-of-the-envelope calculations? They feel a bit odd to me, because they say that the average probe count for reading at a 0.9 load factor should be 2.56. I've recently measured plain linear probing and Robin Hood hashing and I get 5.48 as the average probe length (for the elements which are in the table). Perhaps my measurements are wrong, but it still might be a good idea to instrument and measure the current implementation, as suggested by @sfackler.

@arthurprs (Contributor)

@matklad I was checking your test repo; I can't wrap my head around the timing/cache results. It should take less time and put less pressure on the cache; it makes no sense to me whatsoever.

@matklad (Member) commented Nov 28, 2016

@arthurprs this puzzles me as well! I want to plot the actual distribution of probe lengths and to run some specific tests about cache access (basically executing queries like "sum of the elements from i to k + i" with different distributions of k). But at least the paper you linked to in another issue seems to imply that RH is not automatically better than plain LP:

> We can conclude that RH provides a very interesting trade-off: for a small penalty (often within 1-5%) in peak performance on the best of cases (all lookups successful), RH significantly improves on the worst-case over LP in general, up to more than a factor 4.

@funny-falcon

A possible solution to the puzzle:

  • lower variance not only means there are fewer long chains, it can also mean there are fewer short chains,
  • longer chains have more impact on the mean value,
  • so, given that the mean over all chains is the same, RH hashing has a "larger mean of its short chains",
  • in other words: while RH has a better "worst time" for a single lookup, it has a worse "average time".

And that is a direct result of its algorithm: it prefers to make short chains longer in order to cut the probability of a "too long" chain.

I think that if you compare the distribution of the number of buckets probed between linear probing and RH hashing (as @pczarn did above for RH), it will explain your timing and cache-miss results.

@arthurprs (Contributor) commented Nov 28, 2016

That's it, @funny-falcon. It's probably worth pointing out that the hash algorithm is "~perfect" and there are no deletions involved, so this is the best case for LP.

Percentile | RH LP | LP
-----------|-------|-----
25th       | 2     | 1
50th       | 4     | 1
85th       | 10    | 6
90th       | 12    | 10
95th       | 15    | 21
99th       | 23    | 75
100th      | 56    | 1271
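
For reference, a minimal simulation sketch (not the code from the thread; the table size, load factor, sequential integer keys and std's RandomState hasher are arbitrary choices) that records, for plain LP with no deletions, how many slots each insertion probes, which is also that key's final lookup probe length:

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// Insert `load * capacity` integer keys into an open-addressing table using
// plain linear probing and record how many slots each insertion probes.
fn probe_length_percentiles(load: f64, capacity: usize) -> Vec<(u32, usize)> {
    assert!(capacity.is_power_of_two());
    let mask = capacity - 1;
    let mut slots: Vec<Option<u64>> = vec![None; capacity];
    let mut lens = Vec::new();
    let state = RandomState::new();
    let n = (capacity as f64 * load) as usize;
    for key in 0..n as u64 {
        let mut hasher = state.build_hasher();
        key.hash(&mut hasher);
        let mut idx = (hasher.finish() as usize) & mask;
        let mut probes = 1usize;
        while slots[idx].is_some() {
            idx = (idx + 1) & mask;
            probes += 1;
        }
        slots[idx] = Some(key);
        lens.push(probes);
    }
    lens.sort_unstable();
    [25u32, 50, 85, 90, 95, 99, 100]
        .iter()
        .map(|&p| (p, lens[(lens.len() - 1) * p as usize / 100]))
        .collect()
}

fn main() {
    for (p, len) in probe_length_percentiles(0.909, 1 << 16) {
        println!("{:>3}'th percentile is {}", p, len);
    }
}
```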

@pssalmeida (Author)

@matklad I made a mistake. I was using the results from https://arxiv.org/abs/1605.04031, which I had just skimmed, but now I noticed that they discuss random probing. Meanwhile I found this talk, https://www.ime.usp.br/~yoshi/FoCM2014/talks/viola.pdf, which states that the mean is (1 + 1/(1-a))/2, which matches your experimental results exactly!

So, the results are worse, even for lookups, which is another reason to decrease the load factor.

@pssalmeida (Author)

Updated the table with the new formula for the average number of buckets probed in a lookup, for linear probing.

@sfackler (Member)

Again, I care far more about actual timings of real operations rather than probe counts.

@arthurprs (Contributor)

@pssalmeida Let me know if you'd like me to run numbers. I've been running HashMap benchmarks extensively over the last couple of months.

@pssalmeida (Author)

@sfackler sure. If someone who has already been doing such measurements on HashMap could volunteer, that would be nice. The experiments by @matklad already show that timing and cache misses can increase over simple linear probing.

Intuitively, for simple LP, most hashes will have smaller distances to origin than with RH LP, and will incur a single cache line read. So even if some are farther away, it won't matter much as long as they remain within a second cache line, and only rarely will a third cache line be visited.

For RH LP, most hashes will have a distance to origin closer to the mean. If that is not very small, a non-negligible number of lookups may incur a second cache line read, giving worse results.
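
To make that concrete, a toy model (purely an assumption for illustration, not the actual bucket layout: hashes stored contiguously as 8-byte values, 64-byte cache lines, and the probe start falling at a uniformly random offset within a line):

```rust
// Probability that a probe sequence of `d` consecutive 8-byte hash slots spills
// into a second 64-byte cache line, assuming the first slot lands at a
// uniformly random offset within its line: roughly (d - 1) / 8, capped at 1.
fn second_line_probability(d: u64) -> f64 {
    (d.saturating_sub(1) as f64 / 8.0).min(1.0)
}

fn main() {
    // At a = 0.9 the mean RH LP probe distance is about 5.5 (see the table
    // above), so around half of the lookups would touch a second cache line
    // under these assumptions.
    for &d in &[1u64, 2, 4, 6, 8, 12] {
        println!("d = {:>2}: P(second cache line) ≈ {:.2}", d, second_line_probability(d));
    }
}
```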

@pssalmeida (Author)

@arthurprs If you could run some tests, it would be great. An interesting case would be doing lots of insert-delete cycles in a table fully loaded to capacity.

@arthurprs (Contributor) commented Nov 28, 2016

EDIT: updated numbers below.

@alexcrichton added the T-libs-api label Nov 28, 2016
@brson added the relnotes label Nov 28, 2016
@brson (Contributor) commented Nov 28, 2016

Numbers look good to me.

@sfackler (Member)

@abonander One potential concern there is that the load factor you'd select is somewhat tied to the implementation. If we significantly changed HashMap's internals (as we're probably going to do to resolve the n^2 thing), the load factor someone selected may all of a sudden be unacceptably slow.

@abonander (Contributor) commented Nov 29, 2016

@sfackler Then it may be better to abstract the load factor behind an enum that lets the user select a load factor that produces the desired effect from the implementation, maybe like the following:

```rust
/// An abstraction over a hashtable's load factor which allows the user to choose which aspect they
/// would prefer to optimize, possibly at a detriment to other metrics.
pub enum LoadFactor {
    /// Use a load factor that produces the fastest lookup times without seriously degrading other aspects.
    OptimizeLookup,
    /// Use a load factor that produces the fastest insertion times without seriously degrading other aspects.
    OptimizeInsert,
    /// Use a load factor that produces the lowest memory usage without seriously degrading other aspects.
    SaveMemory,
    /// Use a load factor with good all-round performance.
    Balanced,
    /// A custom load factor whose performance may be affected by changes to the implementation.
    Custom(f64),
}
```

Then the default would be Balanced. People using the Custom variant will be acknowledging that their chosen load-factor may change performance characteristics when the implementation is changed.
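
For illustration only, a sketch of how such an enum might be translated into a concrete maximum fill fraction inside the table, building on the enum above; the specific fractions are placeholder assumptions, not a proposal:

```rust
impl LoadFactor {
    /// Translate the user's preference into a maximum fill fraction.
    /// The concrete values here are illustrative placeholders only.
    fn max_fill(&self) -> f64 {
        match *self {
            LoadFactor::OptimizeLookup => 0.75,
            LoadFactor::OptimizeInsert => 0.66,
            LoadFactor::SaveMemory => 0.90,
            LoadFactor::Balanced => 0.80,
            // Clamp nonsensical custom values to something workable.
            LoadFactor::Custom(f) => f.max(0.1).min(0.99),
        }
    }
}
```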

@arthurprs (Contributor) commented Nov 29, 2016

The fill factor isn't guaranteed or exposed directly by the API, so it shouldn't be a problem. An API for exposing/changing the load factor will likely need an RFC.

@arthurprs (Contributor) commented Nov 29, 2016

Complete results now; the code is here: https://gist.github.com/arthurprs/97dabb2c28f8d778dc377d69d0b95758

It's really important to emphasize that only the grow benchmark is realistic; the others simulate the worst-case performance of each load factor.

(removed due to broken numbers, see below)

@pssalmeida (Author)

@arthurprs Thanks for the benchmarks. The lru_sim, even if not realistic, shows an impressive difference, but even the others show that there is a considerable impact going from 0.91 to lower load factors.

@arthurprs (Contributor)

Going over some issues I noticed that people subscribed to this might also be interested in #38368

@arthurprs (Contributor) commented Apr 19, 2017

Now that #40561 is in, this is the only remaining place I can think of where we can extract nice gains (at least without some kind of major rewrite with its own trade-offs).

With the updated code above I conducted several benchmarks with various load factors: 0.909 (today), 0.875, 0.857, 0.833 and 0.8.
The results can be seen in the gist (edit: removed).

The current implementation is reasonably compact, so it wouldn't be unreasonable to take it down to 0.80, as that's still high (the highest?) compared to most stdlibs. Something in between is probably the way to go, though.

@arthurprs (Contributor) commented May 1, 2017

I found out that the lookup benches are broken. I will do another round of tests soon.

@arthurprs (Contributor) commented May 26, 2017

Ok, I found some time today and here are the final results. It turns out that benchmarking across load factors is really hard.

Tested load factors: 90.9% (current), 87.5%, 85.7%, 83.3% and 80%.

 name                            s909_ ns/iter  s875_ ns/iter  diff ns/iter   diff % 
 ::grow_10_000                   711,918        629,273             -82,645  -11.61% 
 ::grow_big_value_10_000         1,420,685      1,286,508          -134,177   -9.44% 
 ::insert_100_000                3,504,832      3,453,408           -51,424   -1.47% 
 ::insert_int_big_value_100_000  7,560,059      7,538,186           -21,873   -0.29% 
 ::insert_str_100_000            3,761,604      3,632,067          -129,537   -3.44% 
 ::lookup_100_000                341,234        344,626               3,392    0.99% 
 ::lookup_100_000_unif           344,869        345,345                 476    0.14% 
 ::lru_sim                       9,748,455      9,800,613            52,158    0.54% 
 ::merge_shuffle                 1,062,361      970,109             -92,252   -8.68% 
➜  hashmap2 git:(adapt) ✗ cargo benchcmp s909_ s857_ bench.txt
 name                            s909_ ns/iter  s857_ ns/iter  diff ns/iter   diff % 
 ::grow_10_000                   711,918        590,330            -121,588  -17.08% 
 ::grow_big_value_10_000         1,420,685      1,231,806          -188,879  -13.29% 
 ::insert_100_000                3,504,832      3,543,471            38,639    1.10% 
 ::insert_int_big_value_100_000  7,560,059      7,601,703            41,644    0.55% 
 ::insert_str_100_000            3,761,604      3,688,459           -73,145   -1.94% 
 ::lookup_100_000                341,234        347,284               6,050    1.77% 
 ::lookup_100_000_unif           344,869        345,188                 319    0.09% 
 ::lru_sim                       9,748,455      9,665,578           -82,877   -0.85% 
 ::merge_shuffle                 1,062,361      956,885            -105,476   -9.93% 
➜  hashmap2 git:(adapt) ✗ cargo benchcmp s909_ s833_ bench.txt
 name                            s909_ ns/iter  s833_ ns/iter  diff ns/iter   diff % 
 ::grow_10_000                   711,918        564,378            -147,540  -20.72% 
 ::grow_big_value_10_000         1,420,685      1,177,490          -243,195  -17.12% 
 ::insert_100_000                3,504,832      3,484,783           -20,049   -0.57% 
 ::insert_int_big_value_100_000  7,560,059      7,541,711           -18,348   -0.24% 
 ::insert_str_100_000            3,761,604      3,615,516          -146,088   -3.88% 
 ::lookup_100_000                341,234        347,486               6,252    1.83% 
 ::lookup_100_000_unif           344,869        352,215               7,346    2.13% 
 ::lru_sim                       9,748,455      9,739,924            -8,531   -0.09% 
 ::merge_shuffle                 1,062,361      908,930            -153,431  -14.44% 
➜  hashmap2 git:(adapt) ✗ cargo benchcmp s909_ s80_ bench.txt
 name                            s909_ ns/iter  s80_ ns/iter  diff ns/iter   diff % 
 ::grow_10_000                   711,918        517,805           -194,113  -27.27% 
 ::grow_big_value_10_000         1,420,685      1,132,740         -287,945  -20.27% 
 ::insert_100_000                3,504,832      3,542,416           37,584    1.07% 
 ::insert_int_big_value_100_000  7,560,059      7,613,803           53,744    0.71% 
 ::insert_str_100_000            3,761,604      3,911,995          150,391    4.00% 
 ::lookup_100_000                341,234        345,034              3,800    1.11% 
 ::lookup_100_000_unif           344,869        343,880               -989   -0.29% 
 ::lru_sim                       9,748,455      9,656,683          -91,772   -0.94% 
 ::merge_shuffle                 1,062,361      863,767           -198,594  -18.69%

lru_sim (delete + insert), insert and lookup are expected to perform the same because the load factor is effectively fixed in those benchmarks.

@arthurprs (Contributor)

And some extra worst-case tests, where the constants are set to 95% of .capacity() after creation with with_capacity(X): https://gist.github.com/arthurprs/ae084a0c5a119ddd732d93eb6fd31d05

@Mark-Simulacrum added the C-feature-request label Jul 26, 2017
@dtolnay added the I-needs-decision label and removed the relnotes label Nov 18, 2017
@arthurprs (Contributor)

This may not be as significant with the new hashmap (hashbrown) implementation. @Amanieu might want to add his thoughts here.

@Amanieu (Member) commented Nov 8, 2019

The main concern here was that the high load factor would cause many entries to be moved during an insertion. Since the new hash table implementation does not move existing entries on insert, I think this can be closed.

@Amanieu closed this as completed Nov 8, 2019