Faster optimized frozen dictionary creation (3/n) #87688

adamsitnik · 2023-06-16T13:25:38Z

Avoid the need for the `Action<int, int> storeDestIndexFromSrcIndex` callback: once we are done using the hash codes, write the destination indexes into the provided hash code buffer and move the responsibility for updating the destination to the caller (1-4% gain).
- I know it's hacky and it's just 1-4%, so please let me know what you think.
For cases where the key is an integer and we know the input is already unique (because it comes from a dictionary or a hash set), there is no need to create another hash set.
- +15% gain in time for scenarios where the key is an integer, and a 13-19% drop in allocations.
- Also, in cases where all hash codes simply turn out to be unique, we can iterate over a span rather than a hash set. Up to +5% gain where string keys turned out to have unique hash codes.
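The buffer-reuse idea in the first item can be sketched as follows. This is a simplified illustration, not the code from this PR: `BufferReuseSketch` and `AssignDestinationIndexes` are hypothetical names, and the bucket layout (a counting pass followed by prefix sums) is an assumption chosen to keep the example short. The point it shows is that each hash code is read one last time and then overwritten with the destination index of the same source entry, so no callback is needed.

```csharp
using System;

public static class BufferReuseSketch
{
    // Hypothetical helper: on entry, hashCodes[i] is the hash code of source entry i.
    // On exit, hashCodes[i] is the destination index of source entry i.
    public static void AssignDestinationIndexes(Span<int> hashCodes, int numBuckets)
    {
        // Count how many entries fall into each bucket.
        int[] bucketStarts = new int[numBuckets + 1];
        foreach (int code in hashCodes)
        {
            bucketStarts[(int)((uint)code % (uint)numBuckets) + 1]++;
        }

        // Prefix sums: bucketStarts[b] becomes the first destination index of bucket b.
        for (int b = 0; b < numBuckets; b++)
        {
            bucketStarts[b + 1] += bucketStarts[b];
        }

        // Each hash code is read for the last time here, so its slot can be reused
        // to carry the destination index back to the caller.
        for (int src = 0; src < hashCodes.Length; src++)
        {
            int bucket = (int)((uint)hashCodes[src] % (uint)numBuckets);
            hashCodes[src] = bucketStarts[bucket]++;
        }
    }
}
```

The caller then moves its own data with a plain loop such as `destKeys[buffer[i]] = srcKeys[i]`, instead of passing an `Action<int, int>` that is invoked once per entry.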

| Type | Method | Toolchain | Size | Mean | Ratio | Allocated | Alloc Ratio |
|---|---|---|---:|---:|---:|---:|---:|
| CtorFromCollection<Int32> | FrozenDictionaryOptimized | this PR | 512 | 38.95 us | 0.84 | 34.48 KB | 0.81 |
| CtorFromCollection<Int32> | FrozenDictionaryOptimized | #87630 | 512 | 46.29 us | 1.00 | 42.84 KB | 1.00 |
| CtorFromCollection<String> | FrozenDictionaryOptimized | this PR | 512 | 63.21 us | 0.94 | 74.77 KB | 1.00 |
| CtorFromCollection<String> | FrozenDictionaryOptimized | #87630 | 512 | 67.59 us | 1.00 | 74.88 KB | 1.00 |
| CtorFromCollection<Int32> | FrozenSetOptimized | this PR | 512 | 52.80 us | 0.85 | 56.01 KB | 0.87 |
| CtorFromCollection<Int32> | FrozenSetOptimized | #87630 | 512 | 62.19 us | 1.00 | 64.27 KB | 1.00 |
| CtorFromCollection<String> | FrozenSetOptimized | this PR | 512 | 79.05 us | 0.94 | 77.05 KB | 1.00 |
| CtorFromCollection<String> | FrozenSetOptimized | #87630 | 512 | 83.85 us | 1.00 | 77.14 KB | 1.00 |

ghost · 2023-06-16T13:25:47Z

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

Avoid the need for the `Action<int, int> storeDestIndexFromSrcIndex` callback: once we are done using the hash codes, write the destination indexes into the provided hash code buffer and move the responsibility for updating the destination to the caller (1-4% gain).
- I know it's hacky and it's just 1-4%, so please let me know what you think.
`CalcNumBuckets` searches for the best number of buckets, which currently is the first prime number that would give less than a 5% collision rate for unique hash codes. When `bestNumCollisions` was set to `codes.Count` (the number of unique hash codes), it meant "start the search assuming that the current best collision rate is 100%". The first iteration would then check all values, as any result would be better than a 100% collision rate. It would set the new best collision rate, which would then be used by the next iterations.
- Setting `bestNumCollisions` to `codes.Count / 20 + 1` (just one more collision than 5%) at the beginning means: find the first bucket count that meets the criterion, and break out quickly for bucket counts that don't.
- If none is found, the last prime number is returned, which matches the previous behavior, assuming that the biggest prime number always produces the best result.
- +23% improvement for most collections!
For cases where the key is an integer and we know the input is already unique (because it comes from a dictionary or a hash set), there is no need to create another hash set.
- Also, in cases where all hash codes simply turn out to be unique, we can iterate over a span rather than a hash set.
- +9% gain in time for scenarios where the key is an integer, and a 10-20% drop in allocations.
- Up to +5% gain where string keys turned out to have unique hash codes.
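The `CalcNumBuckets` change described above can be sketched as a threshold search. This is a minimal illustration under stated assumptions, not the real method: `BucketSearchSketch`, `PickNumBuckets`, and the `candidatePrimes` parameter are hypothetical, and a `bool[]` stands in for the bitmap used in the actual code. Only the starting limit `codes.Count / 20 + 1`, the early break, and the fallback to the last prime come from the description.

```csharp
using System;
using System.Collections.Generic;

public static class BucketSearchSketch
{
    // Counts codes that land in an already occupied bucket and gives up
    // as soon as the count reaches `limit`.
    public static int CountCollisions(IReadOnlyCollection<int> uniqueCodes, int numBuckets, int limit)
    {
        var seen = new bool[numBuckets];
        int collisions = 0;
        foreach (int code in uniqueCodes)
        {
            uint bucket = (uint)code % (uint)numBuckets;
            if (seen[bucket])
            {
                if (++collisions >= limit)
                {
                    break; // this candidate cannot meet the criterion any more
                }
            }
            else
            {
                seen[bucket] = true;
            }
        }
        return collisions;
    }

    // Returns the first candidate with fewer than (count / 20 + 1) collisions,
    // or the last candidate if none qualifies.
    public static int PickNumBuckets(IReadOnlyCollection<int> uniqueCodes, int[] candidatePrimes)
    {
        int limit = uniqueCodes.Count / 20 + 1; // one more collision than 5%
        foreach (int prime in candidatePrimes)
        {
            if (CountCollisions(uniqueCodes, prime, limit) < limit)
            {
                return prime;
            }
        }
        return candidatePrimes[candidatePrimes.Length - 1];
    }
}
```

Starting from the 5% limit rather than from `codes.Count` means that even the first candidate is abandoned after a handful of collisions instead of being scanned to the end.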

| Type | Method | Job | Size | Mean | Ratio | Allocated | Alloc Ratio |
|---|---|---|---:|---:|---:|---:|---:|
| CtorFromCollection<Int32> | FrozenDictionaryOptimized | this PR | 512 | 24.05 us | 0.52 | 34.48 KB | 0.80 |
| CtorFromCollection<Int32> | FrozenDictionaryOptimized | #87630 | 512 | 44.46 us | 0.96 | 42.84 KB | 1.00 |
| CtorFromCollection<Int32> | FrozenDictionaryOptimized | #87510 | 512 | 46.12 us | 1.00 | 42.9 KB | 1.00 |
| CtorFromCollection<Int32> | FrozenDictionaryOptimized | before #87510 | 512 | 46.70 us | 1.00 | 42.9 KB | 1.00 |
| CtorFromCollection<String> | FrozenDictionaryOptimized | this PR | 512 | 46.33 us | 0.51 | 74.77 KB | 0.88 |
| CtorFromCollection<String> | FrozenDictionaryOptimized | #87630 | 512 | 67.60 us | 0.74 | 74.88 KB | 0.88 |
| CtorFromCollection<String> | FrozenDictionaryOptimized | #87510 | 512 | 75.48 us | 0.83 | 74.91 KB | 0.88 |
| CtorFromCollection<String> | FrozenDictionaryOptimized | before #87510 | 512 | 91.57 us | 1.00 | 85.21 KB | 1.00 |
| CtorFromCollection<Int32> | FrozenSetOptimized | this PR | 512 | 37.02 us | 0.61 | 56.01 KB | 0.87 |
| CtorFromCollection<Int32> | FrozenSetOptimized | #87630 | 512 | 59.76 us | 0.98 | 64.27 KB | 1.00 |
| CtorFromCollection<Int32> | FrozenSetOptimized | #87510 | 512 | 63.30 us | 1.04 | 64.35 KB | 1.00 |
| CtorFromCollection<Int32> | FrozenSetOptimized | before #87510 | 512 | 60.78 us | 1.00 | 64.35 KB | 1.00 |
| CtorFromCollection<String> | FrozenSetOptimized | this PR | 512 | 61.22 us | 0.60 | 77.05 KB | 0.88 |
| CtorFromCollection<String> | FrozenSetOptimized | #87630 | 512 | 83.31 us | 0.81 | 77.14 KB | 0.88 |
| CtorFromCollection<String> | FrozenSetOptimized | #87510 | 512 | 89.29 us | 0.86 | 77.2 KB | 0.88 |
| CtorFromCollection<String> | FrozenSetOptimized | before #87510 | 512 | 103.28 us | 1.00 | 87.51 KB | 1.00 |

My last idea is to use binary search in `CalcNumBuckets` to save time when searching for the best number of buckets. It should work if my assumption (the larger the prime number, the lower the collision ratio) is correct. I am going to write an app that generates tons of random inputs to verify this assumption. If somebody knows the answer already, please let me know ;)
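A small worked check of that assumption, assuming a bucket is computed as `(uint)code % numBuckets` as in the diff reviewed in this PR (`MonotonicityCheck` and `Collisions` are hypothetical names): hash codes that are all multiples of a prime collide completely for that prime, while a smaller prime can still separate them.

```csharp
using System;

public static class MonotonicityCheck
{
    // Number of codes that land in a bucket already taken by an earlier code.
    public static int Collisions(int[] uniqueCodes, int numBuckets)
    {
        var seen = new bool[numBuckets];
        int collisions = 0;
        foreach (int code in uniqueCodes)
        {
            uint bucket = (uint)code % (uint)numBuckets;
            if (seen[bucket])
            {
                collisions++;
            }
            else
            {
                seen[bucket] = true;
            }
        }
        return collisions;
    }
}
```

For the codes `{ 0, 11, 22, 33 }`, 7 buckets give 0 collisions (the buckets are 0, 4, 1, 5) while 11 buckets give 3, so the assumption cannot hold for every input. Whether it holds often enough on realistic inputs for a binary search to be safe is what the random-input experiment would have to show.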

Author:	adamsitnik
Assignees:	-
Labels:	`area-System.Collections`, `tenet-performance`
Milestone:	-

adamsitnik · 2023-06-19T06:54:58Z

I wrote a small utility for testing and found some differences from the old code. Marking as DRAFT; I will mark it as ready for review once I solve the problem.

… currently is the first prime number that would give less than 5% collision rate for unique hash codes." as it's not finished yet

This reverts commit 4014ff8.

Conflicts: src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/FrozenHashTable.cs

adamsitnik · 2023-06-19T07:40:09Z

I've decided to revert 4014ff8 for now and offer this PR with just the two remaining improvements. I will send a separate PR with the improved `CalcNumBuckets` logic.

adamsitnik · 2023-06-22T12:10:36Z

FWIW I've tried one more approach: 9ca2177

To filter out duplicate codes, I tried to sort the hash codes and just skip the duplicates (previous value == current).

Allocations dropped by 6-21%, but CPU time regressed by 1-19%.
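A minimal sketch of the sort-and-skip idea, assuming the hash codes sit in a buffer that may be reordered in place. `SortDedupSketch` and `SortAndDedup` are hypothetical names; the code in commit 9ca2177 is not reproduced here and may differ.

```csharp
using System;

public static class SortDedupSketch
{
    // Sorts the codes in place, compacts the distinct values to the front,
    // and returns how many distinct values there are.
    public static int SortAndDedup(Span<int> codes)
    {
        codes.Sort(); // MemoryExtensions.Sort, available since .NET 5

        int count = 0;
        for (int i = 0; i < codes.Length; i++)
        {
            // After sorting, duplicates are adjacent, so comparing against
            // the last kept value is enough.
            if (count == 0 || codes[i] != codes[count - 1])
            {
                codes[count++] = codes[i];
            }
        }
        return count;
    }
}
```

This avoids allocating a `HashSet<int>`, which fits the reported drop in allocations, while the O(n log n) sort is a plausible source of the extra CPU time.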

stephentoub · 2023-06-22T12:19:01Z

src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/FrozenHashTable.cs

```diff
+                    foreach (int code in hashCodes)
                     {
-                        seenBuckets[bucketNum / BitsPerInt32] |= 1 << (int)bucketNum;
+                        uint bucketNum = (uint)code % (uint)numBuckets;
+                        if ((seenBuckets[bucketNum / BitsPerInt32] & (1 << (int)bucketNum)) != 0)
+                        {
+                            numCollisions++;
+                            if (numCollisions >= bestNumCollisions)
+                            {
+                                // If we've already hit the previously known best number of collisions,
+                                // there's no point in continuing as worst case we'd just use that.
+                                break;
+                            }
+                        }
+                        else
+                        {
+                            seenBuckets[bucketNum / BitsPerInt32] |= 1 << (int)bucketNum;
+                        }
                     }
```


After all of the work done to clone the inputs, allocate the dictionaries, analyze the keys, and so on, iterating over a span instead of a HashSet really provides a meaningful enough gain to duplicate this?

> After all of the work done to clone the inputs, allocate the dictionaries, analyze the keys, and so on, iterating over a span instead of a HashSet really provides a meaningful enough gain to duplicate this?

Yes: "Up to +5% gain where string keys turned out to have unique hash codes"

I saw that in the PR description, but I'm still skeptical.

I've used a profiler and benchmarked it more than once. This loop is the hottest place in the entire process of creating frozen dictionaries/hash sets (because all other parts got optimized in other PRs and are now relatively cheap).

Still skeptical :) But putting aside my skepticism, can you at least dedup this by putting it into an aggressively-inlined helper?

@stephentoub do you mind if I do that in my next PR that is going to touch this area?
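The deduplication requested above could look roughly like the sketch below, where the span loop and the `HashSet<int>` loop share one aggressively inlined helper. All names here (`CollisionHelperSketch`, `IsFirstVisit`, `CountCollisions`, `limit`) are hypothetical and chosen for illustration; this is not the helper that was eventually merged.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class CollisionHelperSketch
{
    private const int BitsPerInt32 = 32;

    // Marks the bucket of `code` as seen; returns false if it was already marked.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static bool IsFirstVisit(int code, int numBuckets, int[] seenBuckets)
    {
        uint bucketNum = (uint)code % (uint)numBuckets;
        // For an int operand C# uses only the low 5 bits of the shift count,
        // so this selects bit (bucketNum % 32) of the word.
        int mask = 1 << (int)bucketNum;
        ref int word = ref seenBuckets[bucketNum / BitsPerInt32];
        if ((word & mask) != 0)
        {
            return false;
        }
        word |= mask;
        return true;
    }

    // Span path: used when all hash codes are already known to be unique.
    public static int CountCollisions(ReadOnlySpan<int> codes, int numBuckets, int limit)
    {
        int[] seenBuckets = new int[(numBuckets + BitsPerInt32 - 1) / BitsPerInt32];
        int collisions = 0;
        foreach (int code in codes)
        {
            if (!IsFirstVisit(code, numBuckets, seenBuckets) && ++collisions >= limit)
            {
                break;
            }
        }
        return collisions;
    }

    // HashSet path: used when duplicates had to be filtered out first.
    public static int CountCollisions(HashSet<int> codes, int numBuckets, int limit)
    {
        int[] seenBuckets = new int[(numBuckets + BitsPerInt32 - 1) / BitsPerInt32];
        int collisions = 0;
        foreach (int code in codes)
        {
            if (!IsFirstVisit(code, numBuckets, seenBuckets) && ++collisions >= limit)
            {
                break;
            }
        }
        return collisions;
    }
}
```

The two loops stay separate because `foreach` over a span and over a `HashSet<int>` compile to different enumeration code, but the per-element logic now lives in one place.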

IDisposable · 2023-06-23T16:01:59Z

src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/FrozenHashTable.cs

```diff
                     {
-                        seenBuckets[bucketNum / BitsPerInt32] |= 1 << (int)bucketNum;
+                        uint bucketNum = (uint)code % (uint)numBuckets;
+                        if ((seenBuckets[bucketNum / BitsPerInt32] & (1 << (int)bucketNum)) != 0)
```


Would computing the bitmask value once (instead of on lines 260 and 272) help anything?

IDisposable · 2023-06-23T16:04:48Z

src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/FrozenHashTable.cs

```diff
-                        numCollisions++;
-                        if (numCollisions >= bestNumCollisions)
+                        uint bucketNum = (uint)code % (uint)numBuckets;
+                        if ((seenBuckets[bucketNum / BitsPerInt32] & (1 << (int)bucketNum)) != 0)
```


Would computing the bitmask value once (instead of on lines 238 and 250) help anything? E.g.:

```csharp
int bucketMask = 1 << (int)bucketNum;
if ((seenBuckets[bucketNum / BitsPerInt32] & bucketMask) != 0)
{
    ...
}
else
{
    seenBuckets[bucketNum / BitsPerInt32] |= bucketMask;
}
```

adamsitnik added 3 commits June 16, 2023 11:34

avoid the need of having `Action<int, int> storeDestIndexFromSrcIndex…

4ad0887

…` by writing the destination indexes to the provided buffer with hashcodes and moving the responsibility to the caller (1-4% gain)

adamsitnik added the area-System.Collections and tenet-performance labels Jun 16, 2023

adamsitnik requested a review from stephentoub June 16, 2023 13:25

ghost assigned adamsitnik Jun 16, 2023

build-analysis bot mentioned this pull request Jun 16, 2023

Tracking issue for CI build timeouts #76454

Closed

adamsitnik marked this pull request as draft June 19, 2023 06:53

adamsitnik marked this pull request as ready for review June 19, 2023 07:39

adamsitnik mentioned this pull request Jun 21, 2023

Faster optimized frozen dictionary creation (4/n) #87876

Merged

stephentoub reviewed Jun 22, 2023

apply suggestion from Stephen

cfe0f2c

stephentoub reviewed Jun 22, 2023

stephentoub approved these changes Jun 22, 2023

Apply suggestions from code review

e62cb33

adamsitnik merged commit b878df7 into dotnet:main Jun 22, 2023

This was referenced Jun 23, 2023

Faster optimized frozen dictionary creation (5/n) #87960

Merged

Frozen collection construction performance #87964

Closed

IDisposable reviewed Jun 23, 2023

lewing mentioned this pull request Jun 29, 2023

[Perf] Linux/x64: 3 Improvements on 6/22/2023 7:04:23 PM dotnet/perf-autofiling-issues#19234

Open

ghost locked as resolved and limited conversation to collaborators Jul 23, 2023
