Introduce Overflow & Displacement tracking. #517

Open
wants to merge 1 commit into
base: master

matthieu-m

Changes:

  • Introduce Overflow Trackers, with features to select the desired variant.
  • Introduce Displacements, conditional on the Overflow Tracker variant tracking removals.
  • Adjust insertion/removal of items in RawTable to properly track overflow and displacement.
  • Adjust find in RawTable to short-circuit the probe sequence when overflow tracking ensures there is no need to probe further.
  • OF NOTE: enforce group alignment.

Motivation:

Overflow tracking allows cutting a probing sequence short, which may be beneficial.
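
As a rough illustration of how an overflow tracker can cut a probe sequence short, here is a toy sketch. It is not the code from this PR: `ToyTable`, `BloomTracker`, the 4-wide groups, the triangular probing over whole groups and the choice of hash bits are all assumptions made for the example, and keys double as their own hashes. Only the name `may_have_overflowed` is borrowed from the remarks further down.

```rust
/// Slots per group in this toy (an assumption; the real width is target-dependent).
pub const GROUP_WIDTH: usize = 4;

/// Hypothetical one-byte, Bloom-filter-like tracker attached to one group.
/// A set bit means "some element whose hash maps to this bit overflowed past this group".
#[derive(Clone, Copy, Default)]
pub struct BloomTracker(u8);

impl BloomTracker {
    /// Picks one of the 8 bits from the top bits of the hash (assumed mapping).
    fn bit(hash: u64) -> u8 {
        1u8 << ((hash >> 57) & 0b111)
    }

    pub fn mark_overflow(&mut self, hash: u64) {
        self.0 |= Self::bit(hash);
    }

    pub fn may_have_overflowed(self, hash: u64) -> bool {
        self.0 & Self::bit(hash) != 0
    }
}

/// Toy table without removals: each group stores up to GROUP_WIDTH hashes.
pub struct ToyTable {
    groups: Vec<Vec<u64>>,
    trackers: Vec<BloomTracker>,
}

impl ToyTable {
    pub fn new(num_groups: usize) -> Self {
        assert!(num_groups.is_power_of_two());
        ToyTable {
            groups: vec![Vec::new(); num_groups],
            trackers: vec![BloomTracker::default(); num_groups],
        }
    }

    /// Inserts `hash`; returns false if every group is full.
    pub fn insert(&mut self, hash: u64) -> bool {
        let mask = self.groups.len() - 1;
        let mut index = hash as usize & mask;
        for stride in 1..=self.groups.len() {
            if self.groups[index].len() < GROUP_WIDTH {
                self.groups[index].push(hash);
                return true;
            }
            // The element overflows past this full group: record that fact.
            self.trackers[index].mark_overflow(hash);
            index = (index + stride) & mask;
        }
        false
    }

    /// Returns (found, number of groups probed).
    pub fn find(&self, hash: u64) -> (bool, usize) {
        let mask = self.groups.len() - 1;
        let mut index = hash as usize & mask;
        let mut probed = 0;
        for stride in 1..=self.groups.len() {
            probed += 1;
            if self.groups[index].contains(&hash) {
                return (true, probed);
            }
            // Nothing matching this hash ever overflowed from here: stop early.
            if !self.trackers[index].may_have_overflowed(hash) {
                return (false, probed);
            }
            index = (index + stride) & mask;
        }
        (false, probed)
    }
}
```

With two groups and the hashes 0, 2, 4, 6, 8 inserted, group 0 is full and 8 spills into group 1. A failed lookup for `1 << 57` then stops after one group, because its tracker bit was never set, whereas a failed lookup for 10 shares a bit with 8 and has to visit both groups.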

Exposing several variants behind features makes it easier to test and benchmark them all, and thus to pick the right one... or none at all.

The groups are now forcibly aligned because overflow tracking is performed on a group basis, and does not work with "floating" groups.
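
To make the difference between "floating" and aligned groups concrete, here is a small sketch. The function names and the 16-slot group width are assumptions for illustration, not identifiers from this PR. The point is that an aligned probe position maps to exactly one group, and therefore to exactly one tracker, while a floating position can straddle two groups.

```rust
/// Assumed group width for the example; it must be a power of two.
pub const GROUP_WIDTH: usize = 16;

/// "Floating" start: the probe may begin at any bucket.
pub fn floating_start(hash: u64, bucket_mask: usize) -> usize {
    hash as usize & bucket_mask
}

/// Aligned start: round the floating start down to a group boundary.
pub fn aligned_start(hash: u64, bucket_mask: usize) -> usize {
    floating_start(hash, bucket_mask) & !(GROUP_WIDTH - 1)
}

/// Index of the group, and hence of its overflow tracker, for an aligned position.
pub fn group_index(aligned_pos: usize) -> usize {
    aligned_pos / GROUP_WIDTH
}
```

For a 64-bucket table (`bucket_mask = 63`) and a hash of 37, the floating start is bucket 37, which would span groups 2 and 3, while the aligned start is bucket 32, which belongs to group 2 alone.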

Design:

Overflow trackers and displacements are tacked onto the end of the allocation, and accesses to them are kept to a minimum so that their performance impact stays small.

In particular:

  1. An element which does not overflow on insertion need not trigger a write to any overflow tracker, nor to its displacement.
  2. Only if removals are tracked is the displacement read on removal.
  3. Only if removals are tracked and the displacement is non-0 are overflow trackers written to on removal.

This follows the philosophy of "You Don't Pay For What You Don't Use", and keeps the impact as small as possible.
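
The three rules above can be sketched with a counting tracker. This is a simplified model under stated assumptions, not the PR's implementation: `Bookkeeping`, `on_insert`, `on_remove` and the `tracker_writes` counter are hypothetical names, counters are plain `u8` values that panic instead of handling saturation, and the caller supplies the probe sequence.

```rust
/// Hypothetical side tables for a removal-tracking (counter) variant.
pub struct Bookkeeping {
    /// One counter per group: live elements that overflowed past it.
    trackers: Vec<u8>,
    /// One entry per slot: how many groups its occupant skipped.
    displacements: Vec<u8>,
    /// Counts tracker writes, purely to make the rules observable.
    pub tracker_writes: usize,
}

impl Bookkeeping {
    pub fn new(num_groups: usize, num_slots: usize) -> Self {
        Bookkeeping {
            trackers: vec![0; num_groups],
            displacements: vec![0; num_slots],
            tracker_writes: 0,
        }
    }

    /// Records an insertion into `slot` after skipping the full groups in
    /// `skipped`, listed in probe order.
    pub fn on_insert(&mut self, slot: usize, skipped: &[usize]) {
        // Rule 1: no overflow means no tracker write and no displacement write.
        if skipped.is_empty() {
            return;
        }
        assert!(skipped.len() <= u8::MAX as usize);
        for &group in skipped {
            // Saturation is not handled in this sketch.
            self.trackers[group] = self.trackers[group]
                .checked_add(1)
                .expect("counter saturated");
            self.tracker_writes += 1;
        }
        self.displacements[slot] = skipped.len() as u8;
    }

    /// Records a removal from `slot`; `probe` yields the element's probe
    /// sequence (group indices), starting at its home group.
    pub fn on_remove(&mut self, slot: usize, probe: impl Iterator<Item = usize>) {
        // Rule 2: the displacement is read because this variant tracks removals.
        let displacement = self.displacements[slot] as usize;
        // Rule 3: trackers are only written when the displacement is non-zero.
        if displacement == 0 {
            return;
        }
        for group in probe.take(displacement) {
            self.trackers[group] -= 1;
            self.tracker_writes += 1;
        }
        self.displacements[slot] = 0;
    }

    pub fn may_have_overflowed(&self, group: usize) -> bool {
        self.trackers[group] != 0
    }
}
```

A variant that does not track removals, such as a Bloom-style tracker, would skip `on_remove` entirely and would not need the displacement array at all.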

Benchmarks:

Methodology: each variant was benchmarked 3 times, and for each benchmark the best result was picked. Then all results were normalized on the current master for ease of comparison.
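
For readers who want to reproduce the normalization, one plausible reading of the methodology is sketched below. The function names are hypothetical, and two things are assumptions rather than facts from this PR: that "best" means the lowest time per iteration, and that each cell is the relative difference to master. The sign convention of the table and the way the (+/-) spreads were normalized are not stated in the description, so they are not modelled here.

```rust
/// Best of several runs, assuming lower (e.g. ns/iter) is better.
pub fn best_of(runs: &[f64]) -> f64 {
    runs.iter().copied().fold(f64::INFINITY, f64::min)
}

/// Relative difference of a variant to master, in percent (assumed formula).
pub fn relative_to_master(master_best: f64, variant_best: f64) -> f64 {
    (variant_best - master_best) / master_best * 100.0
}
```

For example, runs of 105, 100 and 102 ns/iter give a best of 100, and a variant whose best run is 102 ns/iter against a master of 100 ns/iter would be reported as roughly +2%.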

| Benchmark | master | none | bloom-1-u8 | bloom-1-u16 | counter-u8 | hybrid |
|---|---|---|---|---|---|---|
| clone_from_large | 100% (+/-19.77%) | +0.00% (+/-0.20%) | +0.17% (+/-0.10%) | +0.00% (+/-0.20%) | -0.94% (+/-0.00%) | +1.18% (+/-0.18%) |
| clone_from_small | 100% (+/-6.82%) | +0.00% (+/-0.07%) | +2.27% (+/-0.20%) | +2.27% (+/-0.04%) | +0.00% (+/-0.25%) | +0.00% (+/-0.05%) |
| clone_large | 100% (+/-8.86%) | +0.00% (+/-0.09%) | +1.24% (+/-0.14%) | -0.66% (+/-0.07%) | -0.86% (+/-0.09%) | -1.04% (+/-0.07%) |
| clone_small | 100% (+/-9.09%) | +0.00% (+/-0.09%) | +3.64% (+/-0.05%) | +1.82% (+/-0.07%) | +0.00% (+/-0.07%) | +1.82% (+/-0.04%) |
| grow_insert_ahash_highbits | 100% (+/-4.54%) | +0.00% (+/-0.05%) | +0.24% (+/-0.03%) | -0.65% (+/-0.00%) | -0.51% (+/-0.05%) | +2.29% (+/-0.00%) |
| grow_insert_ahash_random | 100% (+/-0.02%) | +0.00% (+/-0.00%) | +2.83% (+/-0.00%) | +0.88% (+/-0.00%) | +0.53% (+/-0.00%) | +1.58% (+/-0.00%) |
| grow_insert_ahash_serial | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +0.85% (+/-0.05%) | +0.22% (+/-0.00%) | +1.46% (+/-0.00%) | +4.13% (+/-0.00%) |
| grow_insert_std_highbits | 100% (+/-0.00%) | +0.00% (+/-0.00%) | +0.81% (+/-0.00%) | +1.54% (+/-0.00%) | +0.14% (+/-0.00%) | +0.93% (+/-0.00%) |
| grow_insert_std_random | 100% (+/-1.61%) | +0.00% (+/-0.02%) | +4.05% (+/-0.00%) | +2.37% (+/-0.00%) | +3.96% (+/-0.00%) | +3.10% (+/-0.00%) |
| grow_insert_std_serial | 100% (+/-0.00%) | +0.00% (+/-0.00%) | +4.50% (+/-0.00%) | +3.71% (+/-0.00%) | +1.83% (+/-0.00%) | +5.21% (+/-0.00%) |
| insert_ahash_highbits | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +2.64% (+/-0.00%) | +1.21% (+/-0.00%) | +2.07% (+/-0.00%) | +1.45% (+/-0.00%) |
| insert_ahash_random | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +6.36% (+/-0.00%) | +0.48% (+/-0.00%) | +0.62% (+/-0.00%) | +0.38% (+/-0.00%) |
| insert_ahash_serial | 100% (+/-3.56%) | +0.00% (+/-0.04%) | +5.62% (+/-0.00%) | +5.34% (+/-0.00%) | -0.12% (+/-0.00%) | +0.20% (+/-0.00%) |
| insert_erase_ahash_highbits | 100% (+/-4.64%) | +0.00% (+/-0.05%) | +2.98% (+/-0.05%) | +3.52% (+/-0.00%) | +3.19% (+/-0.04%) | +7.18% (+/-0.00%) |
| insert_erase_ahash_random | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +2.59% (+/-0.00%) | +3.44% (+/-0.00%) | +2.80% (+/-0.00%) | +4.72% (+/-0.03%) |
| insert_erase_ahash_serial | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +0.50% (+/-0.06%) | +0.83% (+/-0.00%) | +5.17% (+/-0.00%) | +3.54% (+/-0.02%) |
| insert_erase_std_highbits | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +2.06% (+/-0.00%) | +2.07% (+/-0.00%) | +0.14% (+/-0.00%) | +0.40% (+/-0.03%) |
| insert_erase_std_random | 100% (+/-0.01%) | +0.00% (+/-0.00%) | -0.06% (+/-0.00%) | +0.84% (+/-0.00%) | -1.83% (+/-0.00%) | +0.95% (+/-0.00%) |
| insert_erase_std_serial | 100% (+/-1.97%) | +0.00% (+/-0.02%) | +4.26% (+/-0.00%) | +4.75% (+/-0.00%) | -0.75% (+/-0.00%) | +2.14% (+/-0.00%) |
| insert_std_highbits | 100% (+/-0.00%) | +0.00% (+/-0.00%) | +0.35% (+/-0.00%) | -0.69% (+/-0.00%) | -1.61% (+/-0.04%) | -1.21% (+/-0.00%) |
| insert_std_random | 100% (+/-0.00%) | +0.00% (+/-0.00%) | -2.34% (+/-0.00%) | -0.57% (+/-0.00%) | -0.69% (+/-0.00%) | +0.45% (+/-0.00%) |
| insert_std_serial | 100% (+/-2.18%) | +0.00% (+/-0.02%) | -2.24% (+/-0.00%) | -2.86% (+/-0.05%) | +0.69% (+/-0.00%) | +1.62% (+/-0.00%) |
| iter_ahash_highbits | 100% (+/-10.23%) | +0.00% (+/-0.10%) | +3.41% (+/-0.12%) | -1.46% (+/-0.07%) | -0.32% (+/-0.11%) | -0.97% (+/-0.06%) |
| iter_ahash_random | 100% (+/-3.57%) | +0.00% (+/-0.04%) | +1.95% (+/-0.08%) | -0.97% (+/-0.06%) | -0.65% (+/-0.07%) | -0.81% (+/-0.05%) |
| iter_ahash_serial | 100% (+/-8.93%) | +0.00% (+/-0.09%) | +2.60% (+/-0.09%) | -0.97% (+/-0.06%) | -0.81% (+/-0.04%) | -0.49% (+/-0.05%) |
| iter_std_highbits | 100% (+/-4.52%) | +0.00% (+/-0.05%) | +2.42% (+/-0.09%) | -0.48% (+/-0.06%) | +0.65% (+/-0.13%) | -0.16% (+/-0.06%) |
| iter_std_random | 100% (+/-5.47%) | +0.00% (+/-0.05%) | -0.16% (+/-0.12%) | -0.80% (+/-0.07%) | +0.64% (+/-0.08%) | +0.32% (+/-0.06%) |
| iter_std_serial | 100% (+/-6.44%) | +0.00% (+/-0.06%) | +1.77% (+/-0.07%) | +0.64% (+/-0.08%) | +1.93% (+/-0.02%) | +0.16% (+/-0.05%) |
| lookup_ahash_highbits | 100% (+/-4.26%) | +0.00% (+/-0.04%) | +4.47% (+/-0.12%) | +1.63% (+/-0.10%) | -1.20% (+/-0.07%) | +1.02% (+/-0.07%) |
| lookup_ahash_random | 100% (+/-5.24%) | +0.00% (+/-0.05%) | +8.50% (+/-0.08%) | +7.26% (+/-0.09%) | -0.50% (+/-0.05%) | +7.41% (+/-0.13%) |
| lookup_ahash_serial | 100% (+/-4.51%) | +0.00% (+/-0.05%) | +8.28% (+/-0.05%) | +6.62% (+/-0.07%) | +0.25% (+/-0.14%) | +8.25% (+/-0.13%) |
| lookup_fail_ahash_highbits | 100% (+/-7.58%) | +0.00% (+/-0.08%) | +10.95% (+/-0.18%) | +7.62% (+/-0.03%) | +1.89% (+/-0.05%) | +9.13% (+/-0.06%) |
| lookup_fail_ahash_random | 100% (+/-7.33%) | +0.00% (+/-0.07%) | +13.83% (+/-0.16%) | +9.87% (+/-0.08%) | -0.34% (+/-0.05%) | +12.93% (+/-0.12%) |
| lookup_fail_ahash_serial | 100% (+/-6.37%) | +0.00% (+/-0.06%) | +7.33% (+/-0.05%) | +11.93% (+/-0.20%) | +1.36% (+/-0.06%) | +10.31% (+/-0.05%) |
| lookup_fail_std_highbits | 100% (+/-7.78%) | +0.00% (+/-0.08%) | +3.68% (+/-0.06%) | +5.35% (+/-0.03%) | +0.60% (+/-0.05%) | +4.09% (+/-0.05%) |
| lookup_fail_std_random | 100% (+/-5.59%) | +0.00% (+/-0.06%) | +5.37% (+/-0.11%) | +6.13% (+/-0.04%) | +1.06% (+/-0.00%) | +5.11% (+/-0.08%) |
| lookup_fail_std_serial | 100% (+/-4.02%) | +0.00% (+/-0.04%) | +1.58% (+/-0.06%) | +4.38% (+/-0.11%) | +0.55% (+/-0.00%) | +3.10% (+/-0.05%) |
| lookup_std_highbits | 100% (+/-3.36%) | +0.00% (+/-0.03%) | +5.24% (+/-0.00%) | +7.26% (+/-0.00%) | +1.65% (+/-0.00%) | +4.80% (+/-0.09%) |
| lookup_std_random | 100% (+/-2.47%) | +0.00% (+/-0.02%) | +3.76% (+/-0.03%) | +3.32% (+/-0.06%) | +3.57% (+/-0.11%) | +3.22% (+/-0.06%) |
| lookup_std_serial | 100% (+/-9.09%) | +0.00% (+/-0.09%) | +8.38% (+/-0.04%) | +7.50% (+/-0.08%) | +7.86% (+/-0.09%) | +8.46% (+/-0.09%) |
| rehash_in_place | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +2.49% (+/-0.00%) | -1.66% (+/-0.00%) | +1.48% (+/-0.00%) | +5.18% (+/-0.00%) |
| insert | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +0.25% (+/-0.11%) | -1.51% (+/-0.07%) | +4.53% (+/-0.13%) | +2.96% (+/-0.00%) |
| insert_unique_unchecked | 100% (+/-6.95%) | +0.00% (+/-0.07%) | -5.59% (+/-0.08%) | -10.45% (+/-0.06%) | -0.36% (+/-0.16%) | -4.54% (+/-0.05%) |

Remarks:

  • The none variant is completely neutral, which means that enforcing group alignment did not affect performance.
  • The other variants show some promise, but the results vary quite a bit depending on micro-optimizations. Aggressive (always) inlining of key methods seemed to help, for example, but I am not sure whether may_have_overflowed should be inlined, since calling it is expected to be rare.
  • Whether the benchmarks "suffer" from high probe counts is unknown to me. Overflow tracking only helps by cutting probing sequences short, and is thus pure overhead when there is no quadratic probing to cut short.

In any case, with the scaffolding in place it should at least be possible to experiment further, if there is any interest in doing so.


bors commented Jun 7, 2024

☔ The latest upstream changes (presumably #525) made this pull request unmergeable. Please resolve the merge conflicts.
