-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster optimized frozen dictionary creation (3/n) #87688
Conversation
…` by writing the destination indexes to the provided buffer with hashcodes and moving the responsibility to the caller (1-4% gain)
…ly is the first prime number that would give less than 5% collision rate for unique hash codes. When bestNumCollisions was set to codes.Count (the number of unique hash codes), it meant "start the search assuming that current best collision rate is 100%". The first iteration would then check all values, as any result would be better than 100% collision rate. It would set the new best collision rate, which then would be used by next iterations. Setting bestNumCollisions to `codes.Count / 20 + 1` (just one more collision than 5%) at the beginning means: find me the first bucket that meets the criteria. If none is found, the last prime number is returned, which matches the previous behavior. +23% improvement
…y unique (because it comes from a dictionary or a hash set) there is no need to create another hash set Also, in cases where simply all hash codes are unique, we can iterate over a span rather than a hash set +9% gain for scenarios where the key was an integer (time), 10-20% allocations drop up to +5% gain where string keys turned out to have unique hash codes
Tagging subscribers to this area: @dotnet/area-system-collections Issue Details
My last idea is to use binary search in
|
I wrote some small utility for testing and found some differences with the old code, marking as DRAFT, will mark as ready for review when I solve the problem |
… currently is the first prime number that would give less than 5% collision rate for unique hash codes." as it's not finished yet This reverts commit 4014ff8. # Conflicts: # src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/FrozenHashTable.cs
I've decided to revert 4014ff8 for now and offer this PR with two improvements now, will send a separate PR with improved |
FWIW I've tried one more approach: 9ca2177 To filter out duplicate codes, I tried to sort the hash codes and just skip the duplicates (previous value == current). The allocations dropped by 6-21%, but the CPU time has regressed by 1-19%. |
src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/FrozenHashTable.cs
Outdated
Show resolved
Hide resolved
foreach (int code in hashCodes) | ||
{ | ||
seenBuckets[bucketNum / BitsPerInt32] |= 1 << (int)bucketNum; | ||
uint bucketNum = (uint)code % (uint)numBuckets; | ||
if ((seenBuckets[bucketNum / BitsPerInt32] & (1 << (int)bucketNum)) != 0) | ||
{ | ||
numCollisions++; | ||
if (numCollisions >= bestNumCollisions) | ||
{ | ||
// If we've already hit the previously known best number of collisions, | ||
// there's no point in continuing as worst case we'd just use that. | ||
break; | ||
} | ||
} | ||
else | ||
{ | ||
seenBuckets[bucketNum / BitsPerInt32] |= 1 << (int)bucketNum; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After all of the work done to clone the inputs, allocate the dictionaries, analyze the keys, and so on, iterating over a span instead of a HashSet really provides a meaningful enough gain to duplicate this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After all of the work done to clone the inputs, allocate the dictionaries, analyze the keys, and so on, iterating over a span instead of a HashSet really provides a meaningful enough gain to duplicate this?
Yes: "Up to +5% gain where string keys turned out to have unique hash codes"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw that in the PR description, but I'm still skeptical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used a profiler and benchmarked it more than once. This loop is the hottest place in the entire process of creating frozen dictionaries/hash sets (because all other parts got optimized in other PRs and are now relatively cheap).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still skeptical :) But putting aside my skepticism, can you at least dedup this by putting it into an aggressively-inlined helper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephentoub do you mind if I do that in my next PR that is going to touch this area?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/FrozenHashTable.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: Stephen Toub <[email protected]>
{ | ||
seenBuckets[bucketNum / BitsPerInt32] |= 1 << (int)bucketNum; | ||
uint bucketNum = (uint)code % (uint)numBuckets; | ||
if ((seenBuckets[bucketNum / BitsPerInt32] & (1 << (int)bucketNum)) != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would computing the bitmask value once (instead of in line 260 and 272) help anything?
numCollisions++; | ||
if (numCollisions >= bestNumCollisions) | ||
uint bucketNum = (uint)code % (uint)numBuckets; | ||
if ((seenBuckets[bucketNum / BitsPerInt32] & (1 << (int)bucketNum)) != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would computing the bitmask value once (instead of in line 238 and 250) help anything? e.g.
int bucketMask = 1 << (int)bucketNum;
if ((seenBuckets[bucketNum / BitsPerInt32] &bucketMask) != 0)
{
...
}
else
{
seenBuckets[bucketNum / BitsPerInt32] |= bucketMask;
}
Action<int, int> storeDestIndexFromSrcIndex
by writing the destination indexes to the provided buffer with hashcodes after we are done using the hashcodes, move the responsibility of updating the destination to the caller (1-4% gain)