Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Robin hood hashmap #12081

Merged
merged 1 commit into from
Mar 13, 2014
Merged

Robin hood hashmap #12081

merged 1 commit into from
Mar 13, 2014

Conversation

cgaebel
Copy link
Contributor

@cgaebel cgaebel commented Feb 7, 2014

Partially addresses #11783.

Previously, rust's hashtable was totally unoptimized. It used an Option
per key-value pair, and used very naive open allocation.

The old hashtable had very high variance in lookup time. For an example,
see the 'find_nonexisting' benchmark below. This is fixed by keys in
'lucky' spots with a low probe sequence length getting their good spots
stolen by keys with long probe sequence lengths. This reduces hashtable
probe length variance, while maintaining the same mean.

Also, other optimization liberties were taken. Everything is as cache
aware as possible, and this hashtable should perform extremely well for
both large and small keys and values.

Benchmarks:

comprehensive_old_hashmap         378 ns/iter (+/- 8)
comprehensive_new_hashmap         206 ns/iter (+/- 4)
1.8x faster

old_hashmap_as_queue              238 ns/iter (+/- 8)
new_hashmap_as_queue              119 ns/iter (+/- 2)
2x faster

old_hashmap_insert                172 ns/iter (+/- 8)
new_hashmap_insert                146 ns/iter (+/- 11)
1.17x faster

old_hashmap_find_existing         50 ns/iter (+/- 12)
new_hashmap_find_existing         35 ns/iter (+/- 6)
1.43x faster

old_hashmap_find_notexisting      49 ns/iter (+/- 49)
new_hashmap_find_notexisting      34 ns/iter (+/- 4)
1.44x faster

Memory usage of old hashtable (64-bit assumed):

aligned(8+sizeof(Option)+sizeof(K)+sizeof(V))/0.75 + 48ish bytes

Memory usage of new hashtable:

(aligned(sizeof(K))
+ aligned(sizeof(V))
+ 8)/0.9 + 112ish bytes

Timing of building librustc:

compile_and_link: x86_64-unknown-linux-gnu/stage0/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc
time: 0.457 s   parsing
time: 0.028 s   gated feature checking
time: 0.000 s   crate injection
time: 0.108 s   configuration 1
time: 1.049 s   expansion
time: 0.219 s   configuration 2
time: 0.222 s   maybe building test harness
time: 0.223 s   prelude injection
time: 0.268 s   assinging node ids and indexing ast
time: 0.075 s   external crate/lib resolution
time: 0.026 s   language item collection
time: 1.016 s   resolution
time: 0.038 s   lifetime resolution
time: 0.000 s   looking for entry point
time: 0.030 s   looking for macro registrar
time: 0.061 s   freevar finding
time: 0.138 s   region resolution
time: 0.110 s   type collecting
time: 0.072 s   variance inference
time: 0.126 s   coherence checking
time: 9.110 s   type checking
time: 0.186 s   const marking
time: 0.049 s   const checking
time: 0.418 s   privacy checking
time: 0.057 s   effect checking
time: 0.033 s   loop checking
time: 1.293 s   compute moves
time: 0.182 s   match checking
time: 0.242 s   liveness checking
time: 0.866 s   borrow checking
time: 0.150 s   kind checking
time: 0.013 s   reachability checking
time: 0.175 s   death checking
time: 0.461 s   lint checking
time: 13.112 s  translation
  time: 4.352 s llvm function passes
  time: 96.702 s    llvm module passes
  time: 50.574 s    codegen passes
time: 154.611 s LLVM passes
  time: 2.821 s running linker
time: 15.750 s  linking


compile_and_link: x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc
time: 0.422 s   parsing
time: 0.031 s   gated feature checking
time: 0.000 s   crate injection
time: 0.126 s   configuration 1
time: 1.014 s   expansion
time: 0.251 s   configuration 2
time: 0.249 s   maybe building test harness
time: 0.273 s   prelude injection
time: 0.279 s   assinging node ids and indexing ast
time: 0.076 s   external crate/lib resolution
time: 0.033 s   language item collection
time: 1.028 s   resolution
time: 0.036 s   lifetime resolution
time: 0.000 s   looking for entry point
time: 0.029 s   looking for macro registrar
time: 0.063 s   freevar finding
time: 0.133 s   region resolution
time: 0.111 s   type collecting
time: 0.077 s   variance inference
time: 0.565 s   coherence checking
time: 8.953 s   type checking
time: 0.176 s   const marking
time: 0.050 s   const checking
time: 0.401 s   privacy checking
time: 0.063 s   effect checking
time: 0.032 s   loop checking
time: 1.291 s   compute moves
time: 0.172 s   match checking
time: 0.249 s   liveness checking
time: 0.831 s   borrow checking
time: 0.121 s   kind checking
time: 0.013 s   reachability checking
time: 0.179 s   death checking
time: 0.503 s   lint checking
time: 14.385 s  translation
  time: 4.495 s llvm function passes
  time: 92.234 s    llvm module passes
  time: 51.172 s    codegen passes
time: 150.809 s LLVM passes
  time: 2.542 s running linker
time: 15.109 s  linking

BUT accesses are much more cache friendly. In fact, if the probe
sequence length is below 8, only two cache lines worth of hashes will be
pulled into cache. This is unlike the old version which would have to
stride over the stoerd keys and values, and would be more cache
unfriendly the bigger the stored values got.

And did you notice the higher load factor? We can now reasonably get a
load factor of 0.9 with very good performance.

Please review this very closely. This is my first major contribution to Rust. Sorry for the ugly diff!

use num;
use option::{None, Option, Some};
use rand::Rng;
use option::*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're trying to avoid glob imports.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I import so, so many things from iter that it makes more sense to glob.

All other globs have been removed.

@nikomatsakis
Copy link
Contributor

I'm not even sure how much controversy there is at this point, just thought I'd throw out my 2 cents ;)

mod table {
use std::clone::Clone;
use std::cmp::Eq;
use std::hash::{Hash, Hasher, sip};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the (legitimately) unused import, rather than the outer one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear god that's embarassing.

Thanks, though..

@huonw
Copy link
Member

huonw commented Mar 1, 2014

Re assertions: #12616 gives a precedent for writing

// FIXME #12049
if cfg!(test) { assert!(...) }

@cgaebel
Copy link
Contributor Author

cgaebel commented Mar 1, 2014

Added the "debug asserts" referencing #12049.

@emberian
Copy link
Member

emberian commented Mar 2, 2014

@cgaebel it's probably a staging thing. You might need a #[cfg(not(stage0))] use ... or something.

@cgaebel
Copy link
Contributor Author

cgaebel commented Mar 2, 2014

Naw huon got it. It was just me not looking at the line numbers and being confused. This should be good to merge.

@emberian
Copy link
Member

emberian commented Mar 2, 2014

/me quietly sits on his hands

@huonw
Copy link
Member

huonw commented Mar 2, 2014

BTW, are the numbers in the PR description still current with the various changes that have been made?

@cgaebel
Copy link
Contributor Author

cgaebel commented Mar 2, 2014

No, but should be same ballpark. Nothing major that should affect performance has changed.

@nikomatsakis
Copy link
Contributor

On a recent flight, I finally found the time to read the PR in
depth. This is some excellent work. I am satisfied with the
refactoring of the unsafe bits combined with the usage assertions,
it's even better than what I had in mind. I think though I have some
ideas for how to statically verify the usage of FullIndex and friends
are used properly, but this could be a follow-up PR (and I have to try
it out, there might be a complication I have not foreseen).

I will look over my notes later today and add them into the PR. I
seem to recall that mostly I just had nits, apart from my thoughts on
how to make it safer.

Regarding the mutable iterator in specific -- I think you could make
it "safe" along the model of the vector iterators, by having the
iterator take mutable slices that it shrinks for each step. But this
is just trading one unsafety for another, since those slices would
contain partially initialized data, and the compiler would be unaware
of it. I think the current setup (artifically prolonging the lifetime)
is a smaller sin.

@nikomatsakis
Copy link
Contributor

(So I investigated my "safety" ideas a bit more. I think it can be done, but it's a non-trivial patch, and it's unclear if the resulting code is clearer. It would be somewhat nicer if we could parameterize over mutability, though using traits it might be possible to write the relevant code generically. Regardless, as the patch'd be non-trivial, it might affect perf, and it may not be more readable really, so I think it ought to be considered as a follow up.)

//! # Example
//!
//! ```rust
//! use collections::HashMap;
//! use std::hashmap::HashMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a merge ended up with the old version here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Good catch!

@brson
Copy link
Contributor

brson commented Mar 11, 2014

The unsafety is unfortunate but we can't ignore this perf improvement. Let's land this but please do keep trying to abstract out the unsafe code here.

@cgaebel Thanks.

@cgaebel
Copy link
Contributor Author

cgaebel commented Mar 11, 2014

The public interface is safe. What pieces of unsafety should I focus on? The "double-take bug"?

@alexcrichton
Copy link
Member

Feel free to comment on the PR whenever you force push, sadly none of us get notifications!

@cgaebel
Copy link
Contributor Author

cgaebel commented Mar 12, 2014

Oops. /pushed

@nikomatsakis
Copy link
Contributor

The public interface is safe. What pieces of unsafety should I focus on? The "double-take bug"?

Note: I have an in-progress branch addressing the double take bug. I'll push it somewhere public.

Previously, rust's hashtable was totally unoptimized. It used an Option
per key-value pair, and used very naive open allocation.

The old hashtable had very high variance in lookup time. For an example,
see the 'find_nonexisting' benchmark below. This is fixed by keys in
'lucky' spots with a low probe sequence length getting their good spots
stolen by keys with long probe sequence lengths. This reduces hashtable
probe length variance, while maintaining the same mean.

Also, other optimization liberties were taken. Everything is as cache
aware as possible, and this hashtable should perform extremely well for
both large and small keys and values.

Benchmarks:

comprehensive_old_hashmap         378 ns/iter (+/- 8)
comprehensive_new_hashmap         206 ns/iter (+/- 4)
1.8x faster

old_hashmap_as_queue              238 ns/iter (+/- 8)
new_hashmap_as_queue              119 ns/iter (+/- 2)
2x faster

old_hashmap_insert                172 ns/iter (+/- 8)
new_hashmap_insert                146 ns/iter (+/- 11)
1.17x faster

old_hashmap_find_existing         50 ns/iter (+/- 12)
new_hashmap_find_existing         35 ns/iter (+/- 6)
1.43x faster

old_hashmap_find_notexisting      49 ns/iter (+/- 49)
new_hashmap_find_notexisting      34 ns/iter (+/- 4)
1.44x faster

Memory usage of old hashtable (64-bit assumed):

aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words

Memory usage of new hashtable:

(aligned(sizeof(K))
+ aligned(sizeof(V))
+ 8)/0.9 + 6.5 words

BUT accesses are much more cache friendly. In fact, if the probe
sequence length is below 8, only two cache lines worth of hashes will be
pulled into cache. This is unlike the old version which would have to
stride over the stoerd keys and values, and would be more cache
unfriendly the bigger the stored values got.

And did you notice the higher load factor? We can now reasonably get a
load factor of 0.9 with very good performance.
@cgaebel
Copy link
Contributor Author

cgaebel commented Mar 12, 2014

Force pushed.

bors added a commit that referenced this pull request Mar 13, 2014
Partially addresses #11783.

Previously, rust's hashtable was totally unoptimized. It used an Option
per key-value pair, and used very naive open allocation.

The old hashtable had very high variance in lookup time. For an example,
see the 'find_nonexisting' benchmark below. This is fixed by keys in
'lucky' spots with a low probe sequence length getting their good spots
stolen by keys with long probe sequence lengths. This reduces hashtable
probe length variance, while maintaining the same mean.

Also, other optimization liberties were taken. Everything is as cache
aware as possible, and this hashtable should perform extremely well for
both large and small keys and values.

Benchmarks:

```
comprehensive_old_hashmap         378 ns/iter (+/- 8)
comprehensive_new_hashmap         206 ns/iter (+/- 4)
1.8x faster

old_hashmap_as_queue              238 ns/iter (+/- 8)
new_hashmap_as_queue              119 ns/iter (+/- 2)
2x faster

old_hashmap_insert                172 ns/iter (+/- 8)
new_hashmap_insert                146 ns/iter (+/- 11)
1.17x faster

old_hashmap_find_existing         50 ns/iter (+/- 12)
new_hashmap_find_existing         35 ns/iter (+/- 6)
1.43x faster

old_hashmap_find_notexisting      49 ns/iter (+/- 49)
new_hashmap_find_notexisting      34 ns/iter (+/- 4)
1.44x faster

Memory usage of old hashtable (64-bit assumed):

aligned(8+sizeof(Option)+sizeof(K)+sizeof(V))/0.75 + 48ish bytes

Memory usage of new hashtable:

(aligned(sizeof(K))
+ aligned(sizeof(V))
+ 8)/0.9 + 112ish bytes

Timing of building librustc:

compile_and_link: x86_64-unknown-linux-gnu/stage0/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc
time: 0.457 s   parsing
time: 0.028 s   gated feature checking
time: 0.000 s   crate injection
time: 0.108 s   configuration 1
time: 1.049 s   expansion
time: 0.219 s   configuration 2
time: 0.222 s   maybe building test harness
time: 0.223 s   prelude injection
time: 0.268 s   assinging node ids and indexing ast
time: 0.075 s   external crate/lib resolution
time: 0.026 s   language item collection
time: 1.016 s   resolution
time: 0.038 s   lifetime resolution
time: 0.000 s   looking for entry point
time: 0.030 s   looking for macro registrar
time: 0.061 s   freevar finding
time: 0.138 s   region resolution
time: 0.110 s   type collecting
time: 0.072 s   variance inference
time: 0.126 s   coherence checking
time: 9.110 s   type checking
time: 0.186 s   const marking
time: 0.049 s   const checking
time: 0.418 s   privacy checking
time: 0.057 s   effect checking
time: 0.033 s   loop checking
time: 1.293 s   compute moves
time: 0.182 s   match checking
time: 0.242 s   liveness checking
time: 0.866 s   borrow checking
time: 0.150 s   kind checking
time: 0.013 s   reachability checking
time: 0.175 s   death checking
time: 0.461 s   lint checking
time: 13.112 s  translation
  time: 4.352 s llvm function passes
  time: 96.702 s    llvm module passes
  time: 50.574 s    codegen passes
time: 154.611 s LLVM passes
  time: 2.821 s running linker
time: 15.750 s  linking


compile_and_link: x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc
time: 0.422 s   parsing
time: 0.031 s   gated feature checking
time: 0.000 s   crate injection
time: 0.126 s   configuration 1
time: 1.014 s   expansion
time: 0.251 s   configuration 2
time: 0.249 s   maybe building test harness
time: 0.273 s   prelude injection
time: 0.279 s   assinging node ids and indexing ast
time: 0.076 s   external crate/lib resolution
time: 0.033 s   language item collection
time: 1.028 s   resolution
time: 0.036 s   lifetime resolution
time: 0.000 s   looking for entry point
time: 0.029 s   looking for macro registrar
time: 0.063 s   freevar finding
time: 0.133 s   region resolution
time: 0.111 s   type collecting
time: 0.077 s   variance inference
time: 0.565 s   coherence checking
time: 8.953 s   type checking
time: 0.176 s   const marking
time: 0.050 s   const checking
time: 0.401 s   privacy checking
time: 0.063 s   effect checking
time: 0.032 s   loop checking
time: 1.291 s   compute moves
time: 0.172 s   match checking
time: 0.249 s   liveness checking
time: 0.831 s   borrow checking
time: 0.121 s   kind checking
time: 0.013 s   reachability checking
time: 0.179 s   death checking
time: 0.503 s   lint checking
time: 14.385 s  translation
  time: 4.495 s llvm function passes
  time: 92.234 s    llvm module passes
  time: 51.172 s    codegen passes
time: 150.809 s LLVM passes
  time: 2.542 s running linker
time: 15.109 s  linking
```

BUT accesses are much more cache friendly. In fact, if the probe
sequence length is below 8, only two cache lines worth of hashes will be
pulled into cache. This is unlike the old version which would have to
stride over the stoerd keys and values, and would be more cache
unfriendly the bigger the stored values got.

And did you notice the higher load factor? We can now reasonably get a
load factor of 0.9 with very good performance.

Please review this very closely. This is my first major contribution to Rust. Sorry for the ugly diff!
@bors bors merged commit 5bdbd21 into rust-lang:master Mar 13, 2014
@huonw
Copy link
Member

huonw commented Mar 17, 2014

BTW, this reduced rustc's memory use (for building librustc) by ~100MB: http://huonw.github.io/isrustfastyet/mem/#2eebeb8,4d64441

@cgaebel
Copy link
Contributor Author

cgaebel commented Mar 17, 2014

NICE! Btw is that a CPU usage regression, or just noise?

@emberian
Copy link
Member

noise. I was probably building another copy of rustc at the time, or
playing a game.

On Mon, Mar 17, 2014 at 11:03 AM, Clark Gaebel [email protected]:

NICE! Btw is that a CPU usage regression, or just noise?


Reply to this email directly or view it on GitHubhttps://github.com//pull/12081#issuecomment-37825672
.

http://octayn.net/

flip1995 pushed a commit to flip1995/rust that referenced this pull request Feb 8, 2024
fix: incorrect suggestions generated by `manual_retain` lint

fixes rust-lang#10393, fixes rust-lang#11457, fixes rust-lang#12081

rust-lang#10393: In the current implementation of `manual_retain`, if the argument to the closure is matched using tuple, they are all treated as the result of a call to `map.into_iter().filter(<f>)`. However, such tuple pattern matching can also occur in many different containers that stores tuples internally. The correct approach is to apply different lint policies depending on whether the receiver of `into_iter` is a map or not.

rust-lang#11457 and rust-lang#12081: In the current implementation of `manual_retain`, if the argument to the closure is `Binding`, the closure will be used directly in the `retain` method, which will result in incorrect suggestion because the first argument to the `retain` closure may be of a different type. In addition, if the argument to the closure is `Ref + Binding`, the lint will simply remove the `Ref` part and use the `Binding` part as the argument to the new closure, which will lead to bad suggestion for the same reason. The correct approach is to detect each of these cases and apply lint suggestions conservatively.

changelog: [`manual_retain`] refactor and add check for various patterns
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.