Improve the performance of ConditionalWeakTable.TryGetValue #80059

Merged
merged 17 commits into from
Jan 5, 2023

AustinWise
Contributor

@AustinWise AustinWise commented Dec 30, 2022

Also fixes Objective-C reference tracking in NativeAOT, which was broken by #79519.

Fixes #80032
Related issue: #77472

@ghost ghost added the community-contribution label (indicates that the PR has been added by a community member) Dec 30, 2022
@ghost

ghost commented Dec 30, 2022

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #80032

Author: AustinWise
Assignees: -
Labels:

area-NativeAOT-coreclr

Milestone: -

@AustinWise AustinWise marked this pull request as ready for review December 30, 2022 20:37
@AustinWise

This comment was marked as outdated.

@jkotas
Member

jkotas commented Dec 30, 2022

I would not expect 20% perf regression with the current change. Is it repeatable? Have you looked at the generated code to see what caused it?

@AustinWise
Contributor Author

My bad, the benchmark runner was not using the correct runtime, because I did not pass the args in. I'll correct this and rerun.

@AustinWise
Contributor Author

I updated the benchmark to properly read arguments and to try a few different numbers of objects. It now shows that trying to get a value that is not in the ConditionalWeakTable is a fair amount faster.

using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace MyBenchmarks
{
    public class TryGetHashCode
    {
        private readonly List<object> mRootedObjects = new();
        private readonly ConditionalWeakTable<object, object> mWeakTable = new();
        private object mAnObjectInTheTable = null!;


        [Params(1, 100, 1000, 10000)]
        public int NumberOfObjects;

        [GlobalSetup]
        public void Setup()
        {
            for (int i = 0; i < NumberOfObjects; i++)
            {
                var obj = new object();
                mRootedObjects.Add(obj);
                mWeakTable.Add(obj, new object());
                mAnObjectInTheTable = obj;
            }
        }

        [Benchmark]
        public bool TryGetNonExistentValue()
        {
            return mWeakTable.TryGetValue(new object(), out object _);
        }

        [Benchmark]
        public bool TryGetExistingValue()
        {
            return mWeakTable.TryGetValue(mAnObjectInTheTable, out object _);
        }
    }

    public class Program
    {
        public static void Main(string[] args)
        {
            var summary = BenchmarkRunner.Run<TryGetHashCode>(null!, args);
        }
    }
}
BenchmarkDotNet=v0.13.3, OS=ubuntu 22.04
AMD Ryzen Threadripper PRO 3955WX 16-Cores, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.101
  [Host]     : .NET 7.0.1 (7.0.122.56804), X64 RyuJIT AVX2
  Job-FOVPQH : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-WAHSHU : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

// * Warnings *
MultimodalDistribution
  TryGetHashCode.TryGetExistingValue: Toolchain=merge-base -> It seems that the distribution is bimodal (mValue = 3.29)
| Method | Job | Toolchain | NumberOfObjects | Mean | Error | StdDev | Ratio | RatioSD |
|---|---|---|---|---|---|---|---|---|
| TryGetNonExistentValue | Job-FOVPQH | merge-base | 1 | 28.840 ns | 0.2900 ns | 0.2712 ns | 1.00 | 0.00 |
| TryGetNonExistentValue | Job-WAHSHU | PR | 1 | 7.504 ns | 0.0704 ns | 0.0624 ns | 0.26 | 0.00 |
| TryGetExistingValue | Job-FOVPQH | merge-base | 1 | 12.081 ns | 0.3200 ns | 0.9435 ns | 1.00 | 0.00 |
| TryGetExistingValue | Job-WAHSHU | PR | 1 | 10.224 ns | 0.2313 ns | 0.2475 ns | 0.84 | 0.08 |
| TryGetNonExistentValue | Job-FOVPQH | merge-base | 100 | 33.848 ns | 0.0376 ns | 0.0333 ns | 1.00 | 0.00 |
| TryGetNonExistentValue | Job-WAHSHU | PR | 100 | 7.568 ns | 0.0164 ns | 0.0145 ns | 0.22 | 0.00 |
| TryGetExistingValue | Job-FOVPQH | merge-base | 100 | 10.562 ns | 0.0760 ns | 0.0711 ns | 1.00 | 0.00 |
| TryGetExistingValue | Job-WAHSHU | PR | 100 | 10.089 ns | 0.1101 ns | 0.0976 ns | 0.96 | 0.01 |
| TryGetNonExistentValue | Job-FOVPQH | merge-base | 1000 | 35.367 ns | 0.0269 ns | 0.0224 ns | 1.00 | 0.00 |
| TryGetNonExistentValue | Job-WAHSHU | PR | 1000 | 7.527 ns | 0.0183 ns | 0.0152 ns | 0.21 | 0.00 |
| TryGetExistingValue | Job-FOVPQH | merge-base | 1000 | 10.223 ns | 0.2265 ns | 0.2945 ns | 1.00 | 0.00 |
| TryGetExistingValue | Job-WAHSHU | PR | 1000 | 9.775 ns | 0.1210 ns | 0.1132 ns | 0.95 | 0.03 |
| TryGetNonExistentValue | Job-FOVPQH | merge-base | 10000 | 35.562 ns | 0.0861 ns | 0.0719 ns | 1.00 | 0.00 |
| TryGetNonExistentValue | Job-WAHSHU | PR | 10000 | 7.795 ns | 0.1275 ns | 0.1192 ns | 0.22 | 0.00 |
| TryGetExistingValue | Job-FOVPQH | merge-base | 10000 | 11.644 ns | 0.2608 ns | 0.7227 ns | 1.00 | 0.00 |
| TryGetExistingValue | Job-WAHSHU | PR | 10000 | 10.384 ns | 0.0961 ns | 0.0852 ns | 0.89 | 0.07 |
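
Editorial sketch (not part of the PR): the thread does not spell out the mechanism, but the branch name `austin/TryGetHashCode` suggests one plausible reading of why the miss path gets so much faster. A key that has never had an identity hash code assigned cannot have been added to the table, so a lookup can return early without assigning one. The toy model below illustrates that idea only. `ToyObject`, `ToyTable`, `GetOrCreateHash`, and `TryGetHash` are hypothetical names invented for this sketch and are not runtime APIs.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an object whose hash code is assigned lazily.
public sealed class ToyObject
{
    private static int s_nextHash = 1; // not thread-safe; fine for a sketch
    private int _hash;                 // 0 means "no hash code assigned yet"

    // Assigns a hash code on first use and returns it.
    public int GetOrCreateHash()
    {
        if (_hash == 0) _hash = s_nextHash++;
        return _hash;
    }

    // Returns the existing hash code, or 0 without assigning one.
    public int TryGetHash() => _hash;
}

// Hypothetical table keyed by object identity.
public sealed class ToyTable
{
    private readonly Dictionary<int, List<(ToyObject Key, object Value)>> _buckets = new();

    public void Add(ToyObject key, object value)
    {
        int hash = key.GetOrCreateHash(); // adding always assigns a hash
        if (!_buckets.TryGetValue(hash, out var list))
            _buckets[hash] = list = new List<(ToyObject Key, object Value)>();
        list.Add((key, value));
    }

    public bool TryGetValue(ToyObject key, out object? value)
    {
        int hash = key.TryGetHash();
        // Early exit: a key that was never hashed was never added.
        if (hash != 0 && _buckets.TryGetValue(hash, out var list))
        {
            foreach (var (k, v) in list)
            {
                if (ReferenceEquals(k, key)) { value = v; return true; }
            }
        }
        value = null;
        return false;
    }
}

public static class Program
{
    public static void Main()
    {
        var table = new ToyTable();
        var present = new ToyObject();
        table.Add(present, "payload");

        Console.WriteLine(table.TryGetValue(present, out _));         // True
        Console.WriteLine(table.TryGetValue(new ToyObject(), out _)); // False
    }
}
```

In this model the miss path for a fresh object does one field read and no bucket lookup, which matches the shape of the TryGetNonExistentValue numbers above; whether the real change works exactly this way should be checked against the PR diff.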

@AustinWise
Contributor Author

I added an implementation for Mono.

One thing to note is that, unlike on CoreCLR, the hash codes Mono generates for objects are not deterministic. So the hash table could potentially have a different number of collisions from run to run. I ran the benchmarks twice and confirmed that the ratio between merge-base and PR was roughly the same.
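
Editorial sketch (not part of the PR): to make the run-to-run concern concrete, the snippet below counts how many identity hash codes land in an already occupied bucket. The modulo bucket scheme and the bucket count of 1024 are assumptions chosen for illustration; they are not how ConditionalWeakTable actually lays out its buckets. `RuntimeHelpers.GetHashCode` is the public identity-hash API.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class CollisionSketch
{
    // Counts hash codes that map to a bucket that is already occupied.
    // Assumed scheme: bucket = non-negative hash modulo bucketCount.
    public static int CountCollisions(IEnumerable<int> hashCodes, int bucketCount)
    {
        var occupied = new HashSet<int>();
        int collisions = 0;
        foreach (int hash in hashCodes)
        {
            int bucket = (hash & int.MaxValue) % bucketCount;
            if (!occupied.Add(bucket)) collisions++;
        }
        return collisions;
    }

    public static void Main()
    {
        var rooted = new List<object>(); // keep the objects alive
        var hashes = new List<int>();
        for (int i = 0; i < 1000; i++)
        {
            var obj = new object();
            rooted.Add(obj);
            hashes.Add(RuntimeHelpers.GetHashCode(obj));
        }

        // The printed count depends on the runtime's hash code generation,
        // so it may differ between runtimes and, on some, between runs.
        Console.WriteLine(CountCollisions(hashes, 1024));
    }
}
```

Running this a few times on each runtime is a cheap way to see whether collision counts are stable before comparing benchmark ratios.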

Here are the results for running Mono on x64 Linux. I used these directions to run the benchmarks. I used #80082 to include corerun in the testhost.

BenchmarkDotNet=v0.13.3, OS=ubuntu 22.04
AMD Ryzen Threadripper PRO 3955WX 16-Cores, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.101
  [Host]     : .NET 7.0.1 (7.0.122.56804), X64 RyuJIT AVX2
  Job-PGRQPW : .NET 8.0.0 (42.42.42.42424) using MonoVM, X64 VectorSize=128
  Job-SWPJPT : .NET 8.0.0 (42.42.42.42424) using MonoVM, X64 VectorSize=128
| Method | Job | Toolchain | NumberOfObjects | Mean | Error | StdDev | Ratio | RatioSD |
|---|---|---|---|---|---|---|---|---|
| TryGetNonExistentValue | Job-PGRQPW | merge-base | 1 | 55.36 ns | 0.017 ns | 0.015 ns | 1.00 | 0.00 |
| TryGetNonExistentValue | Job-SWPJPT | PR | 1 | 47.41 ns | 0.027 ns | 0.024 ns | 0.86 | 0.00 |
| TryGetExistingValue | Job-PGRQPW | merge-base | 1 | 51.71 ns | 0.093 ns | 0.087 ns | 1.00 | 0.00 |
| TryGetExistingValue | Job-SWPJPT | PR | 1 | 50.67 ns | 0.859 ns | 0.803 ns | 0.98 | 0.02 |
| TryGetNonExistentValue | Job-PGRQPW | merge-base | 100 | 59.69 ns | 0.078 ns | 0.073 ns | 1.00 | 0.00 |
| TryGetNonExistentValue | Job-SWPJPT | PR | 100 | 50.21 ns | 0.593 ns | 0.555 ns | 0.84 | 0.01 |
| TryGetExistingValue | Job-PGRQPW | merge-base | 100 | 54.07 ns | 0.512 ns | 0.479 ns | 1.00 | 0.00 |
| TryGetExistingValue | Job-SWPJPT | PR | 100 | 49.83 ns | 0.400 ns | 0.374 ns | 0.92 | 0.01 |
| TryGetNonExistentValue | Job-PGRQPW | merge-base | 1000 | 62.25 ns | 0.045 ns | 0.042 ns | 1.00 | 0.00 |
| TryGetNonExistentValue | Job-SWPJPT | PR | 1000 | 47.91 ns | 0.059 ns | 0.055 ns | 0.77 | 0.00 |
| TryGetExistingValue | Job-PGRQPW | merge-base | 1000 | 53.12 ns | 0.131 ns | 0.116 ns | 1.00 | 0.00 |
| TryGetExistingValue | Job-SWPJPT | PR | 1000 | 50.17 ns | 0.480 ns | 0.449 ns | 0.94 | 0.01 |
| TryGetNonExistentValue | Job-PGRQPW | merge-base | 10000 | 62.63 ns | 0.244 ns | 0.229 ns | 1.00 | 0.00 |
| TryGetNonExistentValue | Job-SWPJPT | PR | 10000 | 48.70 ns | 0.133 ns | 0.124 ns | 0.78 | 0.00 |
| TryGetExistingValue | Job-PGRQPW | merge-base | 10000 | 52.06 ns | 0.044 ns | 0.036 ns | 1.00 | 0.00 |
| TryGetExistingValue | Job-SWPJPT | PR | 10000 | 51.24 ns | 0.026 ns | 0.025 ns | 0.98 | 0.00 |

I also benchmarked WASM running on V8. I followed these directions to build the runtime. I modified my benchmark to run against the WASM runtime. See this branch: https://github.com/AustinWise/TryGetHashCodeBenchmark/tree/wasm

BenchmarkDotNet=v0.13.3, OS=ubuntu 22.04
AMD Ryzen Threadripper PRO 3955WX 16-Cores, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.101
  [Host]     : .NET 7.0.1 (7.0.122.56804), X64 RyuJIT AVX2
  PR         : .NET Core (Mono) 8.0.0-dev, Wasm AOT
  merge-base : .NET Core (Mono) 8.0.0-dev, Wasm AOT

Runtime=Wasm  IterationCount=3  LaunchCount=1  
WarmupCount=3  
V8 version 11.1.92
| Method | Job | Toolchain | NumberOfObjects | Mean | Error | StdDev | Ratio | RatioSD |
|---|---|---|---|---|---|---|---|---|
| TryGetNonExistentValue | PR | Wasm: PR | 1 | 235.6 ns | 23.61 ns | 1.29 ns | 0.84 | 0.01 |
| TryGetNonExistentValue | merge-base | Wasm: merge-base | 1 | 278.8 ns | 7.32 ns | 0.40 ns | 1.00 | 0.00 |
| TryGetExistingValue | PR | Wasm: PR | 1 | 325.4 ns | 31.42 ns | 1.72 ns | 0.93 | 0.04 |
| TryGetExistingValue | merge-base | Wasm: merge-base | 1 | 350.9 ns | 283.24 ns | 15.53 ns | 1.00 | 0.00 |
| TryGetNonExistentValue | PR | Wasm: PR | 100 | 224.6 ns | 5.16 ns | 0.28 ns | 0.76 | 0.00 |
| TryGetNonExistentValue | merge-base | Wasm: merge-base | 100 | 294.3 ns | 16.31 ns | 0.89 ns | 1.00 | 0.00 |
| TryGetExistingValue | PR | Wasm: PR | 100 | 329.3 ns | 15.52 ns | 0.85 ns | 0.94 | 0.01 |
| TryGetExistingValue | merge-base | Wasm: merge-base | 100 | 349.4 ns | 47.84 ns | 2.62 ns | 1.00 | 0.00 |
| TryGetNonExistentValue | PR | Wasm: PR | 1000 | 254.7 ns | 40.19 ns | 2.20 ns | 0.84 | 0.01 |
| TryGetNonExistentValue | merge-base | Wasm: merge-base | 1000 | 303.1 ns | 12.48 ns | 0.68 ns | 1.00 | 0.00 |
| TryGetExistingValue | PR | Wasm: PR | 1000 | 337.1 ns | 23.65 ns | 1.30 ns | 1.02 | 0.01 |
| TryGetExistingValue | merge-base | Wasm: merge-base | 1000 | 331.6 ns | 19.43 ns | 1.07 ns | 1.00 | 0.00 |
| TryGetNonExistentValue | PR | Wasm: PR | 10000 | 253.8 ns | 41.11 ns | 2.25 ns | 0.92 | 0.01 |
| TryGetNonExistentValue | merge-base | Wasm: merge-base | 10000 | 277.1 ns | 10.21 ns | 0.56 ns | 1.00 | 0.00 |
| TryGetExistingValue | PR | Wasm: PR | 10000 | 367.4 ns | 21.53 ns | 1.18 ns | 1.15 | 0.01 |
| TryGetExistingValue | merge-base | Wasm: merge-base | 10000 | 320.2 ns | 19.91 ns | 1.09 ns | 1.00 | 0.00 |

@AustinWise AustinWise changed the title from "[NativeAOT] Fix Objective-C reference tracking" to "Improves the performance of ConditionalWeakTable.TryGetValue" Dec 31, 2022
@AustinWise AustinWise changed the title from "Improves the performance of ConditionalWeakTable.TryGetValue" to "Improve the performance of ConditionalWeakTable.TryGetValue" Dec 31, 2022
@jkotas
Member

jkotas commented Jan 1, 2023

@vargaz @lambdageek Could you please review Mono changes?

Member

@lambdageek lambdageek left a comment

Mono changes LGTM

Review thread on src/mono/mono/metadata/object-internals.h (outdated, resolved)
@AustinWise
Contributor Author

With the interpreter transforms actually kicking in, the improvements for Mono WASM are even better. At least I assume that is the reason; I could not figure out how to run the interpreter under a debugger.

BenchmarkDotNet=v0.13.3, OS=ubuntu 22.04
AMD Ryzen Threadripper PRO 3955WX 16-Cores, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.101
  [Host]     : .NET 7.0.1 (7.0.122.56804), X64 RyuJIT AVX2
  PR         : .NET Core (Mono) 8.0.0-dev, Wasm AOT
  merge-base : .NET Core (Mono) 8.0.0-dev, Wasm AOT

Runtime=Wasm  IterationCount=3  LaunchCount=1  
WarmupCount=3
V8 version 11.1.92
| Method | Job | Toolchain | NumberOfObjects | Mean | Error | StdDev | Ratio | RatioSD |
|---|---|---|---|---|---|---|---|---|
| TryGetNonExistentValue | PR | Wasm: PR | 1 | 139.9 ns | 3.64 ns | 0.20 ns | 0.46 | 0.00 |
| TryGetNonExistentValue | merge-base | Wasm: merge-base | 1 | 302.9 ns | 23.78 ns | 1.30 ns | 1.00 | 0.00 |
| TryGetExistingValue | PR | Wasm: PR | 1 | 265.1 ns | 19.45 ns | 1.07 ns | 0.81 | 0.00 |
| TryGetExistingValue | merge-base | Wasm: merge-base | 1 | 326.2 ns | 14.47 ns | 0.79 ns | 1.00 | 0.00 |
| TryGetNonExistentValue | PR | Wasm: PR | 100 | 136.2 ns | 8.35 ns | 0.46 ns | 0.47 | 0.01 |
| TryGetNonExistentValue | merge-base | Wasm: merge-base | 100 | 292.6 ns | 98.27 ns | 5.39 ns | 1.00 | 0.00 |
| TryGetExistingValue | PR | Wasm: PR | 100 | 236.7 ns | 28.44 ns | 1.56 ns | 0.70 | 0.03 |
| TryGetExistingValue | merge-base | Wasm: merge-base | 100 | 340.1 ns | 267.10 ns | 14.64 ns | 1.00 | 0.00 |
| TryGetNonExistentValue | PR | Wasm: PR | 1000 | 138.3 ns | 10.93 ns | 0.60 ns | 0.45 | 0.00 |
| TryGetNonExistentValue | merge-base | Wasm: merge-base | 1000 | 307.0 ns | 36.50 ns | 2.00 ns | 1.00 | 0.00 |
| TryGetExistingValue | PR | Wasm: PR | 1000 | 233.9 ns | 22.86 ns | 1.25 ns | 0.71 | 0.01 |
| TryGetExistingValue | merge-base | Wasm: merge-base | 1000 | 330.4 ns | 43.60 ns | 2.39 ns | 1.00 | 0.00 |
| TryGetNonExistentValue | PR | Wasm: PR | 10000 | 140.2 ns | 13.44 ns | 0.74 ns | 0.43 | 0.00 |
| TryGetNonExistentValue | merge-base | Wasm: merge-base | 10000 | 322.6 ns | 19.63 ns | 1.08 ns | 1.00 | 0.00 |
| TryGetExistingValue | PR | Wasm: PR | 10000 | 241.0 ns | 24.04 ns | 1.32 ns | 0.74 | 0.01 |
| TryGetExistingValue | merge-base | Wasm: merge-base | 10000 | 327.1 ns | 51.33 ns | 2.81 ns | 1.00 | 0.00 |

@jkotas
Member

jkotas commented Jan 4, 2023

@AustinWise Could you please resolve the merge conflict?

@AustinWise
Contributor Author

AustinWise commented Jan 5, 2023

@jkotas I merged in main. The test failure looks like #74838, but I did not dig into this deeply.

@jkotas jkotas merged commit 5a10aa6 into dotnet:main Jan 5, 2023
@jkotas
Member

jkotas commented Jan 5, 2023

@AustinWise Thank you!

@VSadov
Member

VSadov commented Jan 5, 2023

@AustinWise Thanks for getting this through!!

@AustinWise AustinWise deleted the austin/TryGetHashCode branch January 6, 2023 04:29
@AustinWise
Contributor Author

Thanks for all the help with reviewing!

Perhaps this PR should be labeled with tenet-performance so that @stephentoub can find it and potentially include it in the .NET 8 performance improvements post 😀

@jkotas jkotas added the tenet-performance label (Performance related issue) Jan 6, 2023
@stephentoub
Member

stephentoub commented Jan 6, 2023

I don't look at the labels. I look at every pr that comes through and track them on a list of prs to consider... this one is already on my list :)

@ghost ghost locked as resolved and limited conversation to collaborators Feb 5, 2023
Labels
area-NativeAOT-coreclr, community-contribution (indicates that the PR has been added by a community member), tenet-performance (Performance related issue)
Development

Successfully merging this pull request may close these issues.

[NativeAOT + Objective-C Marshal] request for advise on restricted GC callouts
8 participants