CalcNumBuckets searches for the best number of buckets, which currently is the first prime number that would give less than 5% collision rate for unique hash codes.

When bestNumCollisions was initialized to codes.Count (the number of unique hash codes), it meant "start the search assuming that the current best collision rate is 100%".
The first iteration would then check all values, because any result would be better than a 100% collision rate. It would set the new best collision rate, which subsequent iterations would then use.

Initializing bestNumCollisions to `codes.Count / 20 + 1` (just one more collision than the 5% threshold allows) means: find the first number of buckets that meets the criteria.

If none is found, the last prime number is returned, which matches the previous behavior.
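
To make the difference concrete, here is a minimal, self-contained sketch of the search with the initial value passed in as a parameter. It is not the code from this commit: `BucketSearchSketch`, `CountCollisions`, `FindNumBuckets`, and the `HashSet<uint>` (standing in for the rented bit array) are hypothetical names and simplifications, and the early exit from the counting loop is an assumption inferred from the description above.

```csharp
using System;
using System.Collections.Generic;

public static class BucketSearchSketch
{
    // Counts codes that land in an already-occupied bucket. Stops early once
    // the count reaches `limit`, because such a candidate can no longer win.
    // (Assumed behavior; the real method is not shown in full in this diff.)
    public static int CountCollisions(int[] uniqueCodes, int numBuckets, int limit)
    {
        var seenBuckets = new HashSet<uint>();
        int numCollisions = 0;
        foreach (int code in uniqueCodes)
        {
            uint bucket = (uint)code % (uint)numBuckets;
            if (!seenBuckets.Add(bucket))
            {
                numCollisions++;
                if (numCollisions >= limit)
                {
                    break;
                }
            }
        }
        return numCollisions;
    }

    // initialBest = uniqueCodes.Length           -> behavior before this commit
    // initialBest = uniqueCodes.Length / 20 + 1  -> behavior after this commit
    public static int FindNumBuckets(int[] uniqueCodes, int[] candidatePrimes, int initialBest)
    {
        // Fall back to the largest candidate if nothing meets the threshold.
        int bestNumBuckets = candidatePrimes[candidatePrimes.Length - 1];
        int bestNumCollisions = initialBest;

        foreach (int numBuckets in candidatePrimes)
        {
            int numCollisions = CountCollisions(uniqueCodes, numBuckets, bestNumCollisions);
            if (numCollisions < bestNumCollisions)
            {
                bestNumBuckets = numBuckets;
                if (numCollisions <= uniqueCodes.Length / 20)
                {
                    break; // at most 5% collisions: good enough
                }
                bestNumCollisions = numCollisions;
            }
        }
        return bestNumBuckets;
    }
}
```

With 40 unique codes (0 through 39) and candidates 7 and 41, both initial values select 41, but the lower initial value abandons the 7-bucket candidate after 3 collisions instead of counting all 33. Because the collision count is an integer, `numCollisions <= codes.Count / 20` with integer division accepts the same counts as the earlier floating-point comparison against 0.05.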

+23% improvement
adamsitnik committed Jun 16, 2023
1 parent 4ad0887 commit 4014ff8
Showing 1 changed file with 3 additions and 3 deletions.
@@ -147,7 +147,6 @@ private static int CalcNumBuckets(ReadOnlySpan<int> hashCodes, bool optimizeForR
 {
     Debug.Assert(hashCodes.Length != 0);

-    const double AcceptableCollisionRate = 0.05; // What is a satisfactory rate of hash collisions?
     const int LargeInputSizeThreshold = 1000; // What is the limit for an input to be considered "small"?
     const int MaxSmallBucketTableMultiplier = 16; // How large a bucket table should be allowed for small inputs?
     const int MaxLargeBucketTableMultiplier = 3; // How large a bucket table should be allowed for large inputs?
@@ -208,7 +207,8 @@ private static int CalcNumBuckets(ReadOnlySpan<int> hashCodes, bool optimizeForR
     int[] seenBuckets = ArrayPool<int>.Shared.Rent((maxNumBuckets / BitsPerInt32) + 1);

     int bestNumBuckets = maxNumBuckets;
-    int bestNumCollisions = codes.Count;
+    // just one more collision than the acceptable collision rate (5%)
+    int bestNumCollisions = (codes.Count / 20) + 1;

     // Iterate through each available prime between the min and max discovered. For each, compute
     // the collision ratio.
@@ -246,7 +246,7 @@ private static int CalcNumBuckets(ReadOnlySpan<int> hashCodes, bool optimizeForR
     {
         bestNumBuckets = numBuckets;

-        if (numCollisions / (double)codes.Count <= AcceptableCollisionRate)
+        if (numCollisions <= (codes.Count / 20))
         {
             break;
         }