Additional memory segmentation for bucket index #94

dotnwat · 2023-10-17T20:52:56Z

Is your feature request related to a problem? Please describe.

Sort of. We are exploring the segmented map as means to work-around some constrains we have related to fragmented memory and not being able to allocate large contiguous memory regions.

However, in our experiments we found that it was fairly easy to get >1MB allocation on the bucket index, which exceeds the maximum allocation size we'd like to make.

Describe the solution you'd like

Would it be reasonable to also use the fragmented vector for the bucket index?
Is there a solution that we are missing here?

Describe alternatives you've considered

Using a different approach to indexing entirely for our problem.

martinus · 2023-10-18T17:20:54Z

Hi, I created the segmented_map as a way around the allocation spikes (when std::vector is full it first has to allocate a 2x size array, move things over, and then delete the old memory, thus needing 3x more memory). For the bucket index I don't think the segmented vector is of any use, because I need to allocate the memory anways; there is no memory saved. Or is the only reason you want a different allocation behavior because you need smaller chunks? Total memory usage would increase a bit by switching the bucket index from a plain memory to segments

dotnwat · 2023-10-18T17:57:33Z

Or is the only reason you want a different allocation behavior because you need smaller chunks

Correct. We operate in an environment where memory fragmentation exists. After the system runs for a while it can be difficult to allocate large (e.g. > 128 KB) chunks of contiguous memory. The segmented map offers us a solution, but it is limited because of the bucket index.

In segmented mode we only applied the segmenting to the values array but not the bucket array. As a result there the pattern of there still being a deallocation followed by an increased allocation when resizing the hash map continues to exist. Further, in environments where the max allocation size is limited because of fragmentation issues this can lead to problems. To avoid both of these issues this patch makes the bucket array use the same datastructure as the values array, i.e.: a `std::vector` when linear and `segmented_vector` when segmented (or the passed datastructure if specified). This extra indirection does add some overhead in the segmented case. Looking at the quick benchmarks we see: Before: ``` | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmarking |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:------------- | 8,912,995.09 | 112.20 | 0.1% | 225,712,537.08 | 26,628,198.00 | 8.476 | 25,133,812.23 | 0.1% | 1.15 | `ankerl::unordered_dense::map<uint64_t, size_t> segmented_vector iterate while adding then removing` | 65,440,597.50 | 15.28 | 0.1% | 496,971,523.50 | 195,721,929.00 | 2.539 | 64,749,156.50 | 11.2% | 1.44 | `ankerl::unordered_dense::map<uint64_t, size_t> segmented_vector random insert erase` | 63,254,162.50 | 15.81 | 0.1% | 540,753,642.50 | 188,790,381.00 | 2.864 | 101,168,500.00 | 6.3% | 1.39 | `ankerl::unordered_dense::map<uint64_t, size_t> segmented_vector 50% probability to find` | 9,777,270.50 | 102.28 | 0.2% | 281,149,360.00 | 28,833,467.00 | 9.751 | 25,968,567.75 | 0.1% | 1.19 | `ankerl::unordered_dense::map<std::string, size_t> segmented_vector iterate while adding then removing` | 220,368,952.00 | 4.54 | 0.2% |2,707,978,150.00 | 659,198,358.00 | 4.108 | 347,649,399.00 | 3.8% | 2.43 | `ankerl::unordered_dense::map<std::string, size_t> segmented_vector random insert erase` | 156,887,435.00 | 6.37 | 0.1% |2,166,844,490.00 | 464,728,290.00 | 4.663 | 266,835,027.00 | 2.5% | 1.73 | `ankerl::unordered_dense::map<std::string, size_t> segmented_vector 50% probability to find` ``` After: ``` | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmarking |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:------------- | 8,921,748.31 | 112.09 | 0.1% | 226,313,644.69 | 26,684,106.00 | 8.481 | 25,174,702.92 | 0.1% | 1.18 | `ankerl::unordered_dense::map<uint64_t, size_t> segmented_vector iterate while adding then removing` | 75,578,500.00 | 13.23 | 0.1% | 597,036,791.50 | 226,059,912.00 | 2.641 | 64,865,689.00 | 11.3% | 1.14 | `ankerl::unordered_dense::map<uint64_t, size_t> segmented_vector random insert erase` | 74,928,542.00 | 13.35 | 0.1% | 677,557,943.00 | 223,726,152.00 | 3.029 | 91,606,575.00 | 7.0% | 1.13 | `ankerl::unordered_dense::map<uint64_t, size_t> segmented_vector 50% probability to find` | 10,079,993.00 | 99.21 | 0.4% | 293,716,069.73 | 29,697,236.40 | 9.890 | 25,980,823.83 | 0.1% | 1.20 | `ankerl::unordered_dense::map<std::string, size_t> segmented_vector iterate while adding then removing` | 220,081,085.00 | 4.54 | 0.1% |2,721,992,469.00 | 658,245,042.00 | 4.135 | 345,686,575.00 | 3.8% | 2.42 | `ankerl::unordered_dense::map<std::string, size_t> segmented_vector random insert erase` | 158,126,693.00 | 6.32 | 0.1% |2,191,768,626.00 | 468,710,736.00 | 4.676 | 267,938,632.00 | 2.5% | 1.74 | `ankerl::unordered_dense::map<std::string, size_t> segmented_vector 50% probability to find` ``` If we think this is not unconditionally acceptable then we could possibly add another template parameter (or make IsSegmented an enum) to decide which parts are supposed to be segmented. Fixes martinus#94

StephanDollberg · 2024-03-06T15:38:40Z

I have created a PR implementing that here: #112

Let us know what you think.

dotnwat · 2024-03-08T02:23:05Z

I have created a PR implementing that here: #112

Let us know what you think.

👋 👋 👋

dotnwat added the enhancement New feature or request label Oct 17, 2023

dotnwat assigned martinus Oct 17, 2023

StephanDollberg mentioned this issue Mar 6, 2024

segment_map: Also segment the bucket array #112

Merged

martinus closed this as completed in #112 Oct 5, 2024

martinus closed this as completed in 6d20fbb Oct 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Additional memory segmentation for bucket index #94

Additional memory segmentation for bucket index #94

dotnwat commented Oct 17, 2023

martinus commented Oct 18, 2023

dotnwat commented Oct 18, 2023

StephanDollberg commented Mar 6, 2024

dotnwat commented Mar 8, 2024

Additional memory segmentation for bucket index #94

Additional memory segmentation for bucket index #94

Comments

dotnwat commented Oct 17, 2023

martinus commented Oct 18, 2023

dotnwat commented Oct 18, 2023

StephanDollberg commented Mar 6, 2024

dotnwat commented Mar 8, 2024