Faster optimized frozen dictionary creation (1/n) #87510

Merged: 3 commits into dotnet:main on Jun 14, 2023

Conversation

@adamsitnik (Member) commented on Jun 13, 2023:

  • Every strategy needs an array of keys, so we now create it up-front and iterate over it, rather than over the dictionary, to get the min and max lengths (1-2% gain).
  • Instead of ensuring that at least 95% of the data is good, we stop as soon as we know that at least 5% is bad (13-14% gain).
  • Toggle the direction and reuse the comparer and HashSet (3% time gain, 12% allocation reduction); all three changes are sketched together below.
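
Taken together, the three changes amount to roughly the following sketch. It is illustrative only, not the actual KeyAnalyzer code: `FindUniqueSubstring`, the candidate-length cap of 8, and `JustifiedSubstringComparer` (a hypothetical mutable comparer, sketched later in this thread) are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class KeyAnalysisSketch
{
    // Finds a substring position/length that is (almost) unique across the keys.
    static (int Index, int Count)? FindUniqueSubstring(IReadOnlyDictionary<string, int> dictionary)
    {
        string[] keys = dictionary.Keys.ToArray();   // built once up-front

        // Min/max lengths from a flat array walk instead of dictionary enumeration.
        int minLength = int.MaxValue, maxLength = 0;
        foreach (string key in keys)
        {
            if (key.Length < minLength) minLength = key.Length;
            if (key.Length > maxLength) maxLength = key.Length;
        }

        // One mutable comparer and one HashSet, reused for every candidate.
        var comparer = new JustifiedSubstringComparer();
        var set = new HashSet<string>(keys.Length, comparer);

        for (int count = 1; count <= Math.Min(minLength, 8); count++)  // arbitrary cap for the sketch
        {
            for (int index = 0; index <= minLength - count; index++)
            {
                comparer.Index = index;   // mutate the shared comparer in place
                comparer.Count = count;
                set.Clear();              // reuse instead of allocating a new HashSet

                int acceptableCollisions = keys.Length / 20;  // 5% of the entries
                bool viable = true;
                foreach (string key in keys)
                {
                    if (!set.Add(key) && --acceptableCollisions < 0)
                    {
                        viable = false;   // >5% conflicts proven; skip the remaining keys
                        break;
                    }
                }

                if (viable) return (index, count);
            }
        }

        return null;  // no sufficiently unique substring found
    }
}
```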
BenchmarkDotNet=v0.13.2.2052-nightly, OS=Windows 11 (10.0.22621.1702)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK=8.0.100-preview.4.23259.14
  [Host]     : .NET 8.0.0 (8.0.23.25905), X64 RyuJIT AVX2

LaunchCount=3  MaxIterationCount=20  MemoryRandomization=True
| Method                    | Job  | Size | Mean      | Ratio | Allocated | Alloc Ratio |
|---------------------------|------|------|-----------|-------|-----------|-------------|
| FrozenDictionaryOptimized | PR   | 512  | 73.236 us | 0.82  | 74.91 KB  | 0.88        |
| FrozenDictionaryOptimized | main | 512  | 89.431 us | 1.00  | 85.21 KB  | 1.00        |

For this particular benchmark, the initial gap between creating the non-optimized dictionary (4 us) and the optimized one (89.4 us) was 85.4 us; it is now 69.2 us.

@ghost commented on Jun 13, 2023:

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.
Issue Details
Author: adamsitnik
Labels: area-System.Collections, tenet-performance
@Tornhoof (Contributor) commented:

> Instead of ensuring that at least 95% of data is good, we stop when we know that at least 5% is bad (13-14% gain)

Do you mean at most 5% is bad? Otherwise you could simply nop everything and 100% is bad, which is at least 5% :)

@stephentoub (Member) commented:

> Do you mean at most 5% is bad? Otherwise you could simply nop everything and 100% is bad, which is at least 5% :)

The wording is correct. For a given capacity, once we've seen at least 5% of the entries conflict, we know we can't get to > 95% not conflicting, so we can give up on that capacity and try with the next. We can't give up until we've seen at least 5% conflict, and if we never see 5% conflict, then that capacity is good to go and we can use it.
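
Concretely, for the 512-entry benchmark above, the cutoff is 512 × 5% ≈ 25 tolerable conflicts: the moment a 26th conflict appears, the candidate capacity is rejected immediately, rather than first adding all 512 entries to the set and only then checking whether at least 95% of them remained unique.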

@Tornhoof (Contributor) commented:
Thank you for the explanation.

Review thread on the diff:

    -private sealed class RightJustifiedCaseInsensitiveSubstringComparer : SubstringComparer
    +private sealed class JustifiedCaseInsensitiveSubstringComparer : SubstringComparer
A member commented:
I'm happy to hear it's helpful, and the fewer the types the better, but I'm a little surprised this makes a positive impact on throughput, since it's adding more work on every comparison. What's the logic for why it makes things faster?

The author (adamsitnik) replied:

> it's adding more work on every comparison

That is true, but it's less work compared to creating a new HashSet<string>(keys.Length).

[image: profiler screenshot]
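
For reference, the "toggle the direction" idea can be sketched roughly as below. This is a hypothetical simplification, not the runtime's actual SubstringComparer hierarchy: a single comparer type whose sign-toggled `Index` field covers both the left-justified case (slice from the start) and the right-justified case (slice from the end), so the same comparer instance and the same HashSet can be reused across attempts.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical merged comparer: a non-negative Index slices from the start
// (left-justified); a negative Index counts back from the end (right-justified).
// One mutable instance replaces two comparer types and a second HashSet.
sealed class JustifiedSubstringComparer : IEqualityComparer<string>
{
    public int Index;  // >= 0: offset from the start; < 0: offset from the end
    public int Count;  // length of the substring being compared

    private ReadOnlySpan<char> Slice(string s) =>
        s.AsSpan(Index >= 0 ? Index : s.Length + Index, Count);

    public bool Equals(string? x, string? y) =>
        Slice(x!).SequenceEqual(Slice(y!));

    public int GetHashCode(string s) =>
        string.GetHashCode(Slice(s), StringComparison.Ordinal);
}
```

A case-insensitive variant would do the same with StringComparison.OrdinalIgnoreCase in both Equals and GetHashCode.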

The member replied:
Is that true just in the example you're profiling or generally? The HashSet construction will happen once regardless of how many retries are needed.

Review thread on the diff:

    {
        set.Clear();

        // SufficientUniquenessFactor of 95% is good enough.
A member commented:

This constant was deleted.

The author (adamsitnik) replied:

I was too lazy/tired to turn the const name into three separate words. If you don't mind, I'll do that in the next PR.

@adamsitnik (Member, Author) commented:

The failure is unrelated: #87505, merging

@adamsitnik adamsitnik merged commit 9b6bab4 into dotnet:main Jun 14, 2023
@IDisposable (Contributor) commented:

@adamsitnik Can you point me to how to generate the benchmarking analysis? I'm looking into tweaking this a tiny bit more...

@adamsitnik (Member, Author) replied:

> Can you point me to how to generate the benchmarking analysis?

Of course!

This doc describes how to run benchmarks against a local build of dotnet/runtime.

To get a trace file you can use the EtwProfiler.

Here is a short introduction to PerfView.

If you are already familiar with PerfView, the generation of the profile diffs is described here.

BTW, automating this diff is on my TODO list; I've already made the first step and implemented dotnet/BenchmarkDotNet#2116.
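
As a purely illustrative starting point, a standalone BenchmarkDotNet project can attach the EtwProfiler with the attribute below (it lives in the BenchmarkDotNet.Diagnostics.Windows package, is Windows-only, and needs an elevated prompt). The class name, key shapes, and sizes here are made up for the example and are not the dotnet/performance benchmarks.

```csharp
using System.Collections.Frozen;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Diagnostics.Windows.Configs;
using BenchmarkDotNet.Running;

[EtwProfiler]       // writes .etl trace files that PerfView can open and diff
[MemoryDiagnoser]   // reports the Allocated column shown in the tables above
public class FrozenDictionaryCreationSketch
{
    [Params(512)]
    public int Size;

    private Dictionary<string, int> _source = null!;

    [GlobalSetup]
    public void Setup() =>
        _source = Enumerable.Range(0, Size).ToDictionary(i => $"key{i}", i => i);

    [Benchmark]
    public FrozenDictionary<string, int> Create() => _source.ToFrozenDictionary();
}

public class Program
{
    public static void Main() =>
        BenchmarkRunner.Run<FrozenDictionaryCreationSketch>();
}
```

Running it with `dotnet run -c Release` produces the usual BenchmarkDotNet tables plus .etl traces in the BenchmarkDotNet artifacts folder; PerfView's diff view can then compare traces taken on a main build and a PR build.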

> I'm looking into tweaking this a tiny bit more...

I am working on it right now; I expect to introduce changes to FrozenHashTable.Create and CalcNumBuckets (just letting you know).
