-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removed null checks of buckets and entries in Dictionary and HashSet #48725
Conversation
Tagging subscribers to this area: @eiriktsarpalis Issue DetailsRemoved null checks of buckets and entries in Dictionary and HashSet at the cost of some constructor performance. I still need to run perf measurements.
|
@@ -51,8 +51,14 @@ public Dictionary(int capacity, IEqualityComparer<TKey>? comparer) | |||
|
|||
if (capacity > 0) | |||
{ | |||
Initialize(capacity); | |||
Initialize(HashHelpers.GetPrime(capacity)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why has HashHelpers.GetPrime
been moved out of the Initialize
method? It doesn't like all callsites have been updated to reflect the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was moved out because TrimExcess
was already passing in a size computed from HashHelpers.GetPrime
so it was doing double work. That is the only callsite I did not update.
} | ||
else | ||
{ | ||
InitializeEmpty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is replaced with eventual calls to Initialize(0)
. Note however that Initialize(0)
will allocate arrays of size 3
instead of 1
in this case. What is the motivation behind this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that using InitializeEmpty
will reduce construction costs since it's just initializing with statically initialized arrays. If we called Initialize(0)
we'd have to pay those allocation costs upfront even if the dictionary is never written to. It also allows us to maintain current binary serialization logic of not serializing entries if the dictionary hasn't been written to and constructed with a capacity of 0, either explicitly or implicitly.
src/libraries/System.Private.CoreLib/src/System/Collections/HashHelpers.cs
Outdated
Show resolved
Hide resolved
Would you be able to share benchmark results from |
…at the cost of some constructor performance.
1b50167
to
b6f24a1
Compare
edit - oh, I see its built on the original backport. I thought it looked familiar! |
cc @benaadams as he works in dictionary periodically.. |
@jkotas I remember you had concerns about dotnet/coreclr#22599 but this is a bit different. |
} | ||
else | ||
{ | ||
uint hashCode = (uint)comparer.GetHashCode(key); | ||
int i = GetBucket(hashCode); | ||
Entry[] entries = _entries; | ||
uint collisionCount = 0; | ||
i--; // Value in _buckets is 1-based; subtract 1 from i. We do it here so it fuses with the following conditional. | ||
do | ||
{ | ||
// Should be a while loop https://github.com/dotnet/runtime/issues/9422 | ||
// Test in if to drop range check for following array access | ||
if ((uint)i >= (uint)entries.Length) | ||
{ | ||
goto ReturnNotFound; | ||
} | ||
|
||
entry = ref entries[i]; | ||
if (entry.hashCode == hashCode && comparer.Equals(entry.key, key)) | ||
{ | ||
goto ReturnFound; | ||
} | ||
|
||
goto ReturnNotFound; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still want this goto or its going to throw and exception when something is not found?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears this goto statement was only used when the entries was null
. Now it shows as unreachable code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes; though its now going to calculate the hashcode and bucket for empty dictionaries?
@@ -368,84 +372,49 @@ public virtual void GetObjectData(SerializationInfo info, StreamingContext conte | |||
} | |||
|
|||
ref Entry entry = ref Unsafe.NullRef<Entry>(); | |||
if (_buckets != null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still want an if
test here (perhaps .Length > 0
instead, or it will do a lot of work for an empty dictionary?
Yes, I am still concerned about this change. I am interested in the microbenchmark results for this change. I doubt that the improvement is going to be measurable. On the other hand, this change will increase the static Dictionary footprint and cost to create empty Dictionary in measurable way. |
So I'm trying to run some benchmarks on this and am having difficulty. I setup two separate copies of this repo on my machine, one for the main branch and one for my branch. I was able to build each of these eventually after several errors but a |
The functional tests are failing on the change in the CI. I guess that it is also what breaks the perf tests. |
It will throw |
Could you explain how it increases the static footprint? I must be missing it. |
It does make an |
Ah yes. |
Here's the benchmarks I ran. It looks like there's just marginal improvements to most tests if any. The biggest surprise is the improvement for the BenchmarkDotNet=v0.12.1.1466-nightly, OS=Windows 10.0.19041.867 (2004/May2020Update/20H1)
AMD Ryzen 9 3900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK=6.0.100-preview.2.21155.3
[Host] : .NET 6.0.0 (6.0.21.15406), X64 RyuJIT
Job-CSABXJ : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT
Job-DDDSNE : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT
PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:DebugType=portable IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1
|
fwiw the geo mean of the "branch" ratios above is 1.002. Exclude constructors, which are presumably more rarely called, and it's 0.98 |
…pty array instead of using Array.Empty<Entry>() since Entry is internal and to avoid the extra generic instantiation
I just ran it again and got these results. What's interesting is that
BenchmarkDotNet=v0.12.1.1466-nightly, OS=Windows 10.0.19041.867 (2004/May2020Update/20H1)
AMD Ryzen 9 3900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK=6.0.100-preview.2.21155.3
[Host] : .NET 6.0.0 (6.0.21.15406), X64 RyuJIT
Job-QHDUBI : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT
Job-CLVDGL : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT
PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:DebugType=portable IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1
|
Reviewers can we have a final look and sign off or provide feedback? Thanks! |
src/libraries/System.Private.CoreLib/src/System/Collections/HashHelpers.cs
Show resolved
Hide resolved
@@ -343,10 +343,10 @@ public MethodInfo GetArrayMethod(Type arrayClass, string methodName, CallingConv | |||
public EnumBuilder DefineEnum(string name, TypeAttributes visibility, Type underlyingType) | |||
{ | |||
ITypeIdentifier ident = TypeIdentifiers.FromInternal(name); | |||
if (name_cache.ContainsKey(ident)) | |||
if (name != null && name_cache.ContainsKey(ident)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that we will likely see more instances of breaks like this one in the code out there.
Do we need to file a breaking change notification for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you think we should file a breaking change notification to indicate that the GetHashCode
method will be called on a key provided to an initially empty Dictionary
whereas before it did not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to hear what others think about this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @marklio @GrabYourPitchforks for thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me what the break was. Was there a bug in the implementation of <some_concrete_type>.GetHashCode
where it was throwing an exception when called? Normally one would expect this method to be called as part of dictionary / hashset processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marklio If I remember correctly, we have done analysis on the capacity distribution for common collections in real apps. How common are empty dictionaries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it would take just one extra if-check to mitigate this issue: https://github.com/dotnet/runtime/pull/48725/files#r593419736
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TylerBrinkley do you know how reintroducing a check here will affect perf results? I'm guessing that an "is empty" check will cost about the same as the original "is null" check, but it'd be good to confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just saw this thread. I'll see if we had information on empty dictionaries today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last analysis we did of dictionaries was back in 2018, where we were interested in the usage of types as keys to help inform what investments we should make in dictionaries. Data retention policies mean that the underlying study data is long gone, so I can't provide a bunch of backing data, but from summaries, it did indicate that empty dictionaries are common (although we can't tell what operations/frequency are performed on them, or how long they stay empty). There aren't resources to do another study at this time. My guess based on available data here is that operations on empty dictionaries are pretty common. IMO, I tend to agree with @GrabYourPitchforks that perf-sensitive operations on "chronically empty" dictionaries probably aren't common if they were important enough to measure, and profiling would likely shine a pretty direct light on the culprit if it was important enough to measure (and differing perf characteristics are pretty expected in new versions). The functional breaks associated with this change (which we know exist due to the presence of such fixes in the PR) are likely unexpected and pretty hard to reason about if people hit them, and they may not be in a position to be able to address them if they don't own the types involved. I'd say if an easy acceptable mitigation exists, we should take it.
|
That'll teach me to edit directly within GitHub again. |
/azp run runtime |
Commenter does not have sufficient privileges for PR 48725 in repo dotnet/runtime |
/azp run runtime |
Azure Pipelines successfully started running 1 pipeline(s). |
@TylerBrinkley do you have time to fix the merge conflicts, and address feedback above? |
@TylerBrinkley could you please let us know if you're planning to get back to this? |
Thanks Tyler, reviewers, can we have another look? |
I think we should add back the early out for empty table. I do not think that the potential functional and performance regressions are worth it. |
Sorry, my computer's cpu cooler died on me so I can't do a comparison with adding the early empty collection check back in right now. If I had to guess I think the performance would be similar to the prior null check and thus would make this change not worthwhile. I think to justify the performance regression with empty collections we would need to know better how often an empty collection has lookups performed as opposed to collections with items. When I get my computer up and running again I can calculate that breakeven point where when a certain percentage of calls are made with items compared to empty collections, the net time throughput is the same. In regards to the functional difference, I don't think users should find it surprising that the |
@TylerBrinkley how is your computer 😄 Just going through old PR's. |
@jkotas @danmoseley should we maybe close this for now? It seems unlikely it would make it in time for .NET 6. |
Per the discussion, I'm going to close this. Thanks for all the great effort here. |
Sorry, I was very busy with other things and was encountering an unexpected regression I was trying to understand. As time allows I will look more into this for .NET 7.0. |
Removed null checks of buckets and entries in Dictionary and HashSet at the cost of some constructor performance. I still need to run perf measurements.
Re-done from dotnet/coreclr#23003