
Reduce size_of<HashMap> ? #69

Open
SimonSapin opened this issue Apr 25, 2019 · 3 comments
SimonSapin commented Apr 25, 2019

With the switch to Hashbrown, `std::mem::size_of::<std::collections::HashMap<(), ()>>()` on 64-bit platforms grew from 40 bytes to 56 (from 24 to 40 bytes with `BuildHasherDefault`).
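Those per-hasher sizes can be checked directly. A minimal sketch (the absolute numbers differ across Rust versions, so it asserts only the relationship, not the 2019 values):

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashMap;
use std::hash::BuildHasherDefault;
use std::mem::size_of;

fn main() {
    // RandomState stores a 16-byte seed inline in the map header;
    // BuildHasherDefault is zero-sized, so it adds nothing.
    let with_seed = size_of::<HashMap<(), (), RandomState>>();
    let zero_sized = size_of::<HashMap<(), (), BuildHasherDefault<DefaultHasher>>>();
    println!("RandomState: {with_seed} bytes, zero-sized hasher: {zero_sized} bytes");
    assert_eq!(with_seed - zero_sized, size_of::<RandomState>());
}
```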

In Servo’s DOM implementation we have types whose `size_of` is several hundred bytes. Because some not-so-unusual pages can have very many DOM nodes, this size can add up to significant memory usage. We have unit tests on `size_of` to ensure it does not accidentally grow; these now fail on today’s Rust Nightly, because several types that contain a `HashMap` grew by 16 bytes.
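A regression test of that kind can be sketched like this (illustrative, not Servo’s actual test; `Vec<u8>` stands in for a DOM type whose layout is known):

```rust
use std::mem::size_of;

fn main() {
    // Fail loudly if a core type grows: Vec<u8> is exactly three
    // pointer-sized words (pointer, capacity, length).
    assert_eq!(size_of::<Vec<u8>>(), 3 * size_of::<usize>());
}
```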

Hashbrown’s HashMap contains a RawTable which has five pointer-sized fields:

hashbrown/src/raw/mod.rs, lines 328 to 348 at 7e79b0c:

```rust
/// A raw hash table with an unsafe API.
pub struct RawTable<T> {
    // Mask to get an index from a hash value. The value is one less than the
    // number of buckets in the table.
    bucket_mask: usize,
    // Pointer to the array of control bytes
    ctrl: NonNull<u8>,
    // Pointer to the array of buckets
    data: NonNull<T>,
    // Number of elements that can be inserted before we need to grow the table
    growth_left: usize,
    // Number of elements in the table, only really used by len()
    items: usize,
    // Tell dropck that we own instances of T.
    marker: PhantomData<T>,
}
```

Some of these fields seem potentially redundant, but I’m not sure. For example, the two pointers appear to point into the same allocation. How expensive would it be to re-compute the second pointer every time it is needed?

Are `bucket_mask`, `growth_left`, and `items` (the value behind `len()`) related such that one of them could be computed from the others?

SimonSapin added a commit to servo/servo that referenced this issue Apr 25, 2019
This includes a `size_of` regression for a few DOM types,
due to rust-lang/rust#58623 which replaces the
implementation of `HashMap` in the standard library with Hashbrown.

Although `size_of<HashMap>` grows, it’s not obvious how total memory usage
is going to be impacted: Hashbrown only has one `u8` instead of one `usize`
of overhead per hash table bucket for storing (part of) a hash,
and so might allocate less memory itself.

Hashbrown also typically has better run time performance:
https://github.com/rust-lang/hashbrown#performance

Still, I’ve filed rust-lang/hashbrown#69
about potentially reducing the `size_of<HashMap>` regression.
Amanieu commented Apr 25, 2019

The `data` pointer could be derived from the `ctrl` pointer, which would save 8 bytes. However, the other fields can't be derived from each other and must be kept.
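What “deriving `data` from `ctrl`” could look like, as a sketch under the assumption that the buckets sit immediately before the control bytes in a single allocation (hashbrown’s real layout logic is more involved; all names here are illustrative):

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::marker::PhantomData;

// Illustrative table that stores only the `ctrl` pointer and recomputes
// the data pointer on demand, saving one word in the header.
struct ThinTable<T> {
    bucket_mask: usize, // number of buckets minus one
    ctrl: *mut u8,      // points at the control-byte array
    marker: PhantomData<T>,
}

impl<T> ThinTable<T> {
    fn layout(buckets: usize) -> Layout {
        let data_bytes = buckets * std::mem::size_of::<T>();
        Layout::from_size_align(data_bytes + buckets, std::mem::align_of::<T>()).unwrap()
    }

    fn new(buckets: usize) -> Self {
        assert!(buckets.is_power_of_two());
        let data_bytes = buckets * std::mem::size_of::<T>();
        let base = unsafe { alloc(Self::layout(buckets)) };
        assert!(!base.is_null());
        ThinTable {
            bucket_mask: buckets - 1,
            // The control bytes start right after the bucket array.
            ctrl: unsafe { base.add(data_bytes) },
            marker: PhantomData,
        }
    }

    // Recomputing `data` costs one pointer subtraction per call.
    fn data(&self) -> *mut T {
        unsafe { (self.ctrl as *mut T).sub(self.bucket_mask + 1) }
    }
}

impl<T> Drop for ThinTable<T> {
    fn drop(&mut self) {
        let buckets = self.bucket_mask + 1;
        unsafe { dealloc(self.data() as *mut u8, Self::layout(buckets)) }
    }
}

fn main() {
    let t = ThinTable::<u64>::new(8);
    // The derived data pointer plus 8 buckets of u64 lands exactly on ctrl.
    assert_eq!(t.data() as usize + 8 * std::mem::size_of::<u64>(), t.ctrl as usize);
}
```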

bors-servo pushed a commit to servo/servo that referenced this issue Apr 25, 2019
Upgrade to rustc 1.36.0-nightly (e305df184 2019-04-24)

edre added a commit to edre/hashbrown that referenced this issue Apr 25, 2019
Some Hashers, and FxHasher in particular, mix the low bits poorly, so use only the high bits for both hashes.

Instead of introducing bucket_shift next to bucket_mask in the top-level struct, derive them both from capacity_log2. This avoids exacerbating struct size in rust-lang#69.

This change is also a prerequisite for rayon-based parallel collect, which requires that the high bits of the index are always in the same place regardless of table size.

```
name                     old ns/iter  new ns/iter  diff ns/iter   diff %  speedup
find_existing            0            0                       0     NaN%    x NaN
find_existing_high_bits  84,320       3,442             -80,878  -95.92%  x 24.50
find_nonexisting         0            0                       0     NaN%    x NaN
get_remove_insert        25           29                      4   16.00%   x 0.86
grow_by_insertion        205          209                     4    1.95%   x 0.98
grow_by_insertion_kb     290          180                  -110  -37.93%   x 1.61
hashmap_as_queue         25           26                      1    4.00%   x 0.96
insert_8_char_string     18,038       17,491               -547   -3.03%   x 1.03
new_drop                 0            0                       0     NaN%    x NaN
new_insert_drop          45           50                      5   11.11%   x 0.90
```
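The `capacity_log2` idea in the commit above can be sketched as follows (hypothetical helper names, not the patch’s actual code): both the mask and the index derive from one stored value, and indexing uses the *high* bits of the hash so the index bits stay in the same place as the table grows.

```rust
// Derive the mask from a single stored log2 of the bucket count.
fn bucket_mask(capacity_log2: u32) -> usize {
    (1usize << capacity_log2) - 1
}

// Index from the high bits of the hash, regardless of table size.
fn index(hash: u64, capacity_log2: u32) -> usize {
    assert!(capacity_log2 >= 1 && capacity_log2 < 64);
    (hash >> (64 - capacity_log2)) as usize
}

fn main() {
    assert_eq!(bucket_mask(3), 7);
    // The top 3 bits of an all-ones hash select the last of 8 buckets.
    assert_eq!(index(u64::MAX, 3), 7);
    // After doubling, the same hash still lands in the "same place":
    // the last bucket, now of 16.
    assert_eq!(index(u64::MAX, 4), 15);
}
```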
SimonSapin added a commit to SimonSapin/hashbrown that referenced this issue Apr 26, 2019
Fixes rust-lang#69
to the extent possible without more dramatic changes.
Gankra commented Apr 26, 2019

I think it would be much more interesting to investigate moving to a layout where all of the fields are in the same allocation as the keys and hashes, with an empty singleton to avoid allocating for empty maps, as is done in thin-vec and BTreeMap.

This would make it much cheaper to move HashMaps around, and would dramatically reduce the memory footprint of things that only sometimes create a real map (`Option<HashMap>`, `Vec<HashMap>`, or just a rarely used `HashMap::new()`).

It's non-trivial work though: it requires carefully ensuring you never break the CoW nature of the singleton, and that the size/alignment you report never mismatches the actual allocation. That said, I think it might be possible, because we should be able to avoid ever looking at the keys when the static ctrl array is all EMPTY. The `BuildHasher` is the most annoying part here: I think it has to be stored inline. That's fine for everyone using FxHasher or any other `BuildHasherDefault`, which is zero-sized. Sadly `RandomState` (std's default) needs 16 bytes for the seed.
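A minimal sketch of the empty-singleton idea (constants and names are illustrative, not hashbrown's actual internals): every empty map points at one shared, read-only, all-EMPTY control group, so `new()` never allocates and lookups never touch keys.

```rust
use std::ptr;

// Assumed control byte meaning "slot empty" (illustrative constant).
const EMPTY: u8 = 0xFF;
// One shared read-only control group for all empty maps.
static EMPTY_CTRL: [u8; 16] = [EMPTY; 16];

struct ThinMap {
    ctrl: *const u8, // a real table would also carry len, mask, etc.
}

impl ThinMap {
    // No heap allocation: point at the shared singleton.
    fn new() -> Self {
        ThinMap { ctrl: EMPTY_CTRL.as_ptr() }
    }

    fn points_at_singleton(&self) -> bool {
        ptr::eq(self.ctrl, EMPTY_CTRL.as_ptr())
    }

    fn get(&self, _key: u64) -> Option<u64> {
        // Probing only ever sees EMPTY control bytes, so a lookup on the
        // singleton returns None without dereferencing any bucket array.
        if self.points_at_singleton() {
            return None;
        }
        unimplemented!("real probing elided in this sketch")
    }
}

fn main() {
    let m = ThinMap::new();
    assert!(m.points_at_singleton());
    assert_eq!(m.get(42), None);
    // An insert would first have to copy-on-write: allocate a real table,
    // then swap the pointer away from the singleton.
}
```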

Gankra commented Apr 26, 2019

As an aside, this would be an exciting experiment because we would have a very optimized implementation to compare against, measuring the performance impact of this "trick" in an apples-to-apples way. I'm not aware of any detailed analysis of the second- and third-order effects of this optimization, like cache locality and whether it upsets the optimizer, which have historically been reasons to hand-wring over it.
