Commit a902ea6

Save an instruction in EntityHasher (#10648)
# Objective

Keep essentially the same structure of `EntityHasher` from #9903, but rephrase the multiplication slightly to save an instruction.

cc @superdump

Discord thread: https://discord.com/channels/691052431525675048/1172033156845674507/1174969772522356756

## Solution

Today, the hash is

```rust
self.hash = i | (i.wrapping_mul(FRAC_U64MAX_PI) << 32);
```

with `i` being `(generation << 32) | index`.

Expanding things out, we get

```rust
i | ((i * CONST) << 32)
= (generation << 32) | index | ((((generation << 32) | index) * CONST) << 32)
= (generation << 32) | index | ((index * CONST) << 32) // because the generation overflowed
= ((index * CONST | generation) << 32) | index
```

What if we do the same thing, but with `+` instead of `|`? That's almost the same thing, except that it has carries, which are actually often better in a hash function anyway, since it doesn't saturate. (`|` can be dangerous, since once something becomes `-1` it'll stay that way, and there's no mixing available.)

```rust
(index * CONST + generation) << 32 + index
= (CONST << 32 + 1) * index + generation << 32
= (CONST << 32 + 1) * index + (WHATEVER << 32 + generation) << 32 // because the extra overflows and thus can be anything
= (CONST << 32 + 1) * index + ((CONST * generation) << 32 + generation) << 32 // pick "whatever" to be something convenient
= (CONST << 32 + 1) * index + ((CONST << 32 + 1) * generation) << 32
= (CONST << 32 + 1) * index + (CONST << 32 + 1) * (generation << 32)
= (CONST << 32 + 1) * (index + generation << 32)
= (CONST << 32 + 1) * (generation << 32 | index)
= (CONST << 32 + 1) * i
```

So we can do essentially the same thing using a single multiplication instead of doing multiply-shift-or.
LLVM was already smart enough to merge the shifting into a multiplication, but this saves the extra `or`:

![image](https://github.com/bevyengine/bevy/assets/18526288/d9396614-2326-4730-abbe-4908c01b5ace)

<https://rust.godbolt.org/z/MEvbz4eo4>

It's a very small change, and will often disappear into load latency anyway, but it's a couple percent faster in lookups:

![image](https://github.com/bevyengine/bevy/assets/18526288/c365ec85-6adc-4f6d-8fa6-a65146f55a75)

(There was more of an improvement here before #10558, but with `to_bits` now being a single `qword` load, keeping things mostly as-is turned out to be better than the bigger changes I'd tried in #10605.)

---

## Changelog

(Probably skip it)

## Migration Guide

(none needed)
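The derivation above can be sanity-checked in isolation. A minimal sketch, using the ⅟φ value `0x9e37_79b9` that the final code picks for `CONST` (the function and variable names here are illustrative, not bevy code):

```rust
// Sanity check for the derivation: under wrapping u64 arithmetic,
// `(index * CONST + generation) << 32 + index` equals
// `((CONST << 32) + 1) * i`, where `i = (generation << 32) | index`.
fn main() {
    const CONST: u64 = 0x9e37_79b9; // ⅟φ, the constant the final code uses
    const MERGED: u64 = (CONST << 32) + 1; // 0x9e37_79b9_0000_0001

    let samples: &[(u64, u64)] = &[
        (0, 0),
        (1, 42),
        (0xDEAD, 0xC0FFEE),
        (u32::MAX as u64, u32::MAX as u64),
    ];
    for &(generation, index) in samples {
        let i = (generation << 32) | index;

        // Multiply-shift-add form from the first line of the derivation.
        let step_by_step =
            (index.wrapping_mul(CONST).wrapping_add(generation) << 32).wrapping_add(index);
        // Single-multiplication form from the last line.
        let single_mul = i.wrapping_mul(MERGED);

        assert_eq!(step_by_step, single_mul);
    }
    println!("identity holds for all samples");
}
```

The `<< 32` discards the high half of `index * CONST + generation`, which is exactly the "because the extra overflows and thus can be anything" step in the derivation.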
1 parent: d95d20f

File tree

2 files changed: +69 −12

crates/bevy_ecs/src/entity/mod.rs

+32
```diff
@@ -998,4 +998,36 @@ mod tests {
         assert!(Entity::new(2, 2) > Entity::new(1, 2));
         assert!(Entity::new(2, 2) >= Entity::new(1, 2));
     }
+
+    // Feel free to change this test if needed, but it seemed like an important
+    // part of the best-case performance changes in PR #9903.
+    #[test]
+    fn entity_hash_keeps_similar_ids_together() {
+        use std::hash::BuildHasher;
+        let hash = bevy_utils::EntityHash;
+
+        let first_id = 0xC0FFEE << 8;
+        let first_hash = hash.hash_one(Entity::from_raw(first_id));
+
+        for i in 1..=255 {
+            let id = first_id + i;
+            let hash = hash.hash_one(Entity::from_raw(id));
+            assert_eq!(hash.wrapping_sub(first_hash) as u32, i);
+        }
+    }
+
+    #[test]
+    fn entity_hash_id_bitflip_affects_high_7_bits() {
+        use std::hash::BuildHasher;
+        let hash = bevy_utils::EntityHash;
+
+        let first_id = 0xC0FFEE;
+        let first_hash = hash.hash_one(Entity::from_raw(first_id)) >> 57;
+
+        for bit in 0..u32::BITS {
+            let id = first_id ^ (1 << bit);
+            let hash = hash.hash_one(Entity::from_raw(id)) >> 57;
+            assert_ne!(hash, first_hash);
+        }
+    }
 }
```
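The first test's property follows directly from the low 32 bits of the new constant being 1: `hash(id + i) - hash(id) == i * UPPER_PHI`, whose low 32 bits are just `i`. A minimal standalone sketch, with `hash_of` as a local stand-in for `EntityHasher` (not a bevy API):

```rust
// Local stand-in for what EntityHasher::write_u64 computes (see bevy_utils).
fn hash_of(bits: u64) -> u64 {
    const UPPER_PHI: u64 = 0x9e37_79b9_0000_0001;
    bits.wrapping_mul(UPPER_PHI)
}

fn main() {
    // Nearby ids hash to nearby values: the difference of the hashes
    // carries the id difference in its low 32 bits.
    let first_id: u64 = 0xC0FFEE << 8;
    let first_hash = hash_of(first_id);
    for i in 1..=255u64 {
        let hash = hash_of(first_id + i);
        assert_eq!(hash.wrapping_sub(first_hash) as u32, i as u32);
    }

    // The multiply is lossless: per the code comment, 0x144c_bc89 is the
    // modular inverse of 0x9e37_79b9 mod 2^32.
    assert_eq!(0x144c_bc89_u32.wrapping_mul(0x9e37_79b9), 1);
    println!("ok");
}
```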

crates/bevy_utils/src/lib.rs

+37-12
```diff
@@ -267,29 +267,54 @@ impl BuildHasher for EntityHash {
 /// A very fast hash that is only designed to work on generational indices
 /// like `Entity`. It will panic if attempting to hash a type containing
 /// non-u64 fields.
+///
+/// This is heavily optimized for typical cases, where you have mostly live
+/// entities, and works particularly well for contiguous indices.
+///
+/// If you have an unusual case -- say all your indices are multiples of 256
+/// or most of the entities are dead generations -- then you might also want
+/// to try [`AHasher`] for a slower hash computation but fewer lookup conflicts.
 #[derive(Debug, Default)]
 pub struct EntityHasher {
     hash: u64,
 }
 
-// This value comes from rustc-hash (also known as FxHasher) which in turn got
-// it from Firefox. It is something like `u64::MAX / N` for an N that gives a
-// value close to π and works well for distributing bits for hashing when used
-// with a wrapping multiplication.
-const FRAC_U64MAX_PI: u64 = 0x517cc1b727220a95;
-
 impl Hasher for EntityHasher {
     fn write(&mut self, _bytes: &[u8]) {
         panic!("can only hash u64 using EntityHasher");
     }
 
     #[inline]
-    fn write_u64(&mut self, i: u64) {
-        // Apparently hashbrown's hashmap uses the upper 7 bits for some SIMD
-        // optimisation that uses those bits for binning. This hash function
-        // was faster than i | (i << (64 - 7)) in the worst cases, and was
-        // faster than PassHasher for all cases tested.
-        self.hash = i | (i.wrapping_mul(FRAC_U64MAX_PI) << 32);
+    fn write_u64(&mut self, bits: u64) {
+        // SwissTable (and thus `hashbrown`) cares about two things from the hash:
+        // - H1: low bits (masked by `2ⁿ-1`) to pick the slot in which to store the item
+        // - H2: high 7 bits are used to SIMD optimize hash collision probing
+        // For more see <https://abseil.io/about/design/swisstables#metadata-layout>
+
+        // This hash function assumes that the entity ids are still well-distributed,
+        // so for H1 it leaves the entity id alone in the low bits so that id locality
+        // will also give memory locality for things spawned together.
+        // For H2, it takes advantage of the fact that while multiplication doesn't
+        // spread entropy to the low bits, it's incredibly good at spreading it
+        // upward, which is exactly where we need it the most.
+
+        // While this does include the generation in the output, it doesn't do so
+        // *usefully*. H1 won't care until you have over 3 billion entities in
+        // the table, and H2 won't care until something hits generation 33 million.
+        // Hence the doc comment suggesting that this is best for live entities,
+        // where there won't be generation conflicts where it would matter.
+
+        // The high 32 bits of this are ⅟φ for Fibonacci hashing. That works
+        // particularly well for hashing for the same reason as described in
+        // <https://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/>
+        // It loses no information because it has a modular inverse.
+        // (Specifically, `0x144c_bc89_u32 * 0x9e37_79b9_u32 == 1`.)
+        //
+        // The low 32 bits make that part of the product a pass-through.
+        const UPPER_PHI: u64 = 0x9e37_79b9_0000_0001;
+
+        // This is `(MAGIC * index + generation) << 32 + index`, in a single instruction.
+        self.hash = bits.wrapping_mul(UPPER_PHI);
     }
 
     #[inline]
```
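The H1/H2 split described in the new comments can be illustrated outside `hashbrown`. A sketch under the assumption of a SwissTable-style map with 2ⁿ slots; `hash_of` is a local stand-in, and the mask/shift values follow the comment above, not hashbrown internals:

```rust
// Local stand-in for what EntityHasher::write_u64 computes.
fn hash_of(bits: u64) -> u64 {
    const UPPER_PHI: u64 = 0x9e37_79b9_0000_0001;
    bits.wrapping_mul(UPPER_PHI)
}

fn main() {
    // For a table with 2^n slots, SwissTable-style maps use:
    // - H1: hash & (2^n - 1), to choose the slot,
    // - H2: the top 7 bits, as a per-slot control byte for SIMD probing.
    let capacity_mask: u64 = (1 << 10) - 1; // pretend 1024-slot table

    for id in 0..16u64 {
        let h = hash_of(id);
        let h1 = h & capacity_mask;
        let h2 = h >> 57;
        // Low bits pass through, so contiguous ids land in contiguous slots...
        assert_eq!(h1, id);
        // ...while H2 still varies, because the multiply spreads entropy upward.
        println!("id {id:2}: slot {h1:4}, control byte {h2:3}");
    }
}
```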
