[Perf] Regressions in System.Collections.TryGetValueFalse<String, String> #51258

DrewScoggins · 2021-04-14T18:56:52Z

Run Information

Architecture	x64
OS	ubuntu 18.04
Baseline	59c592cc8d2778bcc6173baa2b25b13190e42990
Compare	6bfc5f21dea7b550f1c807454d45408ef34764e1
Diff	Diff

Regressions in System.Collections.TryGetValueFalse<String, String>

Benchmark	Baseline	Test	Test/Base	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
IDictionary	9.34 μs	11.07 μs	1.19
Dictionary	8.17 μs	9.92 μs	1.21

![graph]
Historical Data in Reporting System

Repro

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Collections.TryGetValueFalse<String, String>*'

Payloads

Baseline
Compare

Histogram

System.Collections.TryGetValueFalse<String, String>.IDictionary(Size: 512)

System.Collections.TryGetValueFalse<String, String>.Dictionary(Size: 512)

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

category:performance
theme:benchmarks


ghost · 2021-04-14T18:56:58Z

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.

Author:	DrewScoggins
Assignees:	AndyAyersMS
Labels:	`arch-x64`, `area-System.Collections`, `os-linux`, `tenet-performance`, `tenet-performance-benchmarks`, `untriaged`
Milestone:	-

DrewScoggins · 2021-04-14T18:59:30Z

Run Information

Architecture	x64
OS	ubuntu 18.04
Baseline	59c592cc8d2778bcc6173baa2b25b13190e42990
Compare	6bfc5f21dea7b550f1c807454d45408ef34764e1
Diff	Diff

Improvements in System.Collections.ContainsKeyFalse<String, String>

Benchmark	Baseline	Test	Test/Base	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
IDictionary	11.48 μs	9.45 μs	0.82
Dictionary	9.95 μs	8.25 μs	0.83

![graph]
Historical Data in Reporting System

Repro

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Collections.ContainsKeyFalse<String, String>*'

Payloads

Baseline
Compare

Histogram

System.Collections.ContainsKeyFalse<String, String>.IDictionary(Size: 512)

System.Collections.ContainsKeyFalse<String, String>.Dictionary(Size: 512)

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

DrewScoggins · 2021-04-14T19:01:20Z

The full test trends for TryGetValueFalse and ContainsKeyFalse seem to show that when one gets faster the other gets slower.

AndyAyersMS · 2021-04-16T01:46:29Z

@DrewScoggins the zip archives linked above lose all file permissions (or else I'm doing it wrong). Makes it painful to download and then run the binaries.

andy@andy-ubuntu$ unzip -Zl 6d58d995-201b-444b-a499-a528190087f3.zip Core_Root/corerun 
?---------  2.0 unx   108912 b-    39058 defN 21-Apr-05 14:49 Core_Root/corerun

after an unzip:

---------- 1 andy andy   108912 Apr  5 14:49 corerun

Can we instead get a zipped up tar archive?

AndyAyersMS · 2021-04-16T01:47:56Z

FWIW I can't repro the above regression with local builds on my unix box, which is why I wanted to grab the exact bits used by the runs.

DrewScoggins · 2021-04-16T18:58:01Z

Those are the zips that we send to Helix, and we don't really have another place where we can easily get them. If I remember correctly, the only thing you had to chmod +x was corerun, and then everything worked. Maybe we can look at including a little shell script that sets up the binaries for repro?
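
A minimal sketch of what such a setup step could look like, written in Python rather than shell. It assumes the payload has already been unzipped, and that making every file readable and `corerun` executable is sufficient, which may not hold in practice. The `restore_permissions` helper and its defaults are illustrative and not part of the Helix tooling.

```python
import os
import stat
from pathlib import Path


def restore_permissions(root, executables=("corerun",)):
    """Give files under root sane modes after an unzip that dropped them.

    Hypothetical helper: directories and the named executables get 0o755,
    every other file gets 0o644. Returns the number of paths changed.
    """
    changed = 0
    for path in Path(root).rglob("*"):
        if path.is_dir() or path.name in executables:
            mode = 0o755
        else:
            mode = 0o644
        # Only touch paths whose permission bits differ from the target.
        if stat.S_IMODE(path.stat().st_mode) != mode:
            os.chmod(path, mode)
            changed += 1
    return changed
```

Calling `restore_permissions("Core_Root")` on the extracted payload would then replace the manual `chmod` calls.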

AndyAyersMS · 2021-04-16T19:46:58Z

If this is what you send, I wonder what helix does to work around this?

Let me see if fixing corerun to be +x and all the others +r does the trick.

AndyAyersMS · 2021-04-16T22:14:07Z

It requires more futzing about than just that, not quite sure what BDN is doing... at any rate I have the same non-result with the downloaded builds as I did with the local builds. Could be my ancient HW I suppose.

TryGetValueFalse

Base, Download

BenchmarkDotNet=v0.12.1.1521-nightly, OS=ubuntu 18.04
Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.100-preview.3.21202.5
  [Host]     : .NET 6.0.0 (6.0.21.20104), X64 RyuJIT
  Job-CXOBIO : .NET 6.0.0 (6.0.21.20502), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable  Toolchain=CoreRun  
IterationTime=250.0000 ms  MaxIterationCount=20  MinIterationCount=15  
WarmupCount=1

Method	Size	Mean	Error	StdDev	Median	Min	Max
Dictionary	512	14.85 µs	0.060 µs	0.047 µs	14.84 µs	14.79 µs	14.93 µs
IDictionary	512	16.57 µs	0.105 µs	0.093 µs	16.58 µs	16.41 µs	16.71 µs
SortedList	512	420.46 µs	1.288 µs	1.142 µs	420.31 µs	418.88 µs	422.94 µs
SortedDictionary	512	460.61 µs	17.330 µs	19.957 µs	447.18 µs	443.50 µs	487.07 µs
ConcurrentDictionary	512	22.84 µs	0.076 µs	0.068 µs	22.81 µs	22.77 µs	23.00 µs
ImmutableDictionary	512	37.58 µs	0.208 µs	0.184 µs	37.57 µs	37.33 µs	37.91 µs
ImmutableSortedDictionary	512	422.83 µs	1.118 µs	0.933 µs	422.94 µs	421.12 µs	424.30 µs

Base, Local

BenchmarkDotNet=v0.12.1.1521-nightly, OS=ubuntu 18.04
Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.100-preview.3.21202.5
  [Host]     : .NET 6.0.0 (6.0.21.20104), X64 RyuJIT
  Job-NDBUIM : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable  Toolchain=CoreRun  
IterationTime=250.0000 ms  MaxIterationCount=20  MinIterationCount=15  
WarmupCount=1

Method	Size	Mean	Error	StdDev	Median	Min	Max
Dictionary	512	14.88 µs	0.070 µs	0.062 µs	14.86 µs	14.80 µs	15.01 µs
IDictionary	512	16.67 µs	0.202 µs	0.169 µs	16.61 µs	16.53 µs	17.13 µs
SortedList	512	417.57 µs	1.038 µs	0.867 µs	417.49 µs	416.20 µs	418.93 µs
SortedDictionary	512	438.12 µs	3.399 µs	3.013 µs	438.20 µs	433.96 µs	443.15 µs
ConcurrentDictionary	512	22.92 µs	0.053 µs	0.044 µs	22.90 µs	22.86 µs	22.99 µs
ImmutableDictionary	512	36.94 µs	0.134 µs	0.126 µs	36.92 µs	36.75 µs	37.18 µs
ImmutableSortedDictionary	512	422.31 µs	0.916 µs	0.765 µs	422.20 µs	420.92 µs	424.09 µs

Diff, Download

BenchmarkDotNet=v0.12.1.1521-nightly, OS=ubuntu 18.04
Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.100-preview.3.21202.5
  [Host]     : .NET 6.0.0 (6.0.21.20104), X64 RyuJIT
  Job-PLJKLO : .NET 6.0.0 (6.0.21.20602), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable  Toolchain=CoreRun  
IterationTime=250.0000 ms  MaxIterationCount=20  MinIterationCount=15  
WarmupCount=1

Method	Size	Mean	Error	StdDev	Median	Min	Max
Dictionary	512	14.94 µs	0.108 µs	0.101 µs	14.92 µs	14.82 µs	15.18 µs
IDictionary	512	16.73 µs	0.142 µs	0.126 µs	16.78 µs	16.49 µs	16.90 µs
SortedList	512	434.83 µs	1.179 µs	0.985 µs	434.84 µs	433.51 µs	436.95 µs
SortedDictionary	512	468.35 µs	1.755 µs	1.465 µs	468.34 µs	466.65 µs	471.78 µs
ConcurrentDictionary	512	23.12 µs	0.299 µs	0.280 µs	22.98 µs	22.84 µs	23.75 µs
ImmutableDictionary	512	37.81 µs	0.702 µs	0.548 µs	37.64 µs	37.23 µs	39.04 µs
ImmutableSortedDictionary	512	445.27 µs	5.198 µs	4.608 µs	442.97 µs	441.13 µs	454.60 µs

Diff, Local

BenchmarkDotNet=v0.12.1.1521-nightly, OS=ubuntu 18.04
Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.100-preview.3.21202.5
  [Host]     : .NET 6.0.0 (6.0.21.20104), X64 RyuJIT
  Job-KXUFTT : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable  Toolchain=CoreRun  
IterationTime=250.0000 ms  MaxIterationCount=20  MinIterationCount=15  
WarmupCount=1

Method	Size	Mean	Error	StdDev	Median	Min	Max
Dictionary	512	14.74 µs	0.070 µs	0.062 µs	14.73 µs	14.64 µs	14.86 µs
IDictionary	512	16.36 µs	0.080 µs	0.075 µs	16.37 µs	16.27 µs	16.50 µs
SortedList	512	408.75 µs	3.088 µs	2.411 µs	407.43 µs	406.70 µs	413.31 µs
SortedDictionary	512	448.91 µs	1.650 µs	1.462 µs	449.24 µs	446.71 µs	451.09 µs
ConcurrentDictionary	512	22.67 µs	0.057 µs	0.048 µs	22.68 µs	22.58 µs	22.75 µs
ImmutableDictionary	512	37.48 µs	0.114 µs	0.101 µs	37.46 µs	37.31 µs	37.68 µs
ImmutableSortedDictionary	512	433.14 µs	3.722 µs	2.906 µs	433.14 µs	429.73 µs	436.68 µs

AndyAyersMS · 2021-04-16T22:22:54Z

Also note the same tests on windows x64 sped up at that same point:

Windows

Ubuntu

(likewise for IDictionary)

I don't suppose we have ETL/IR data for those windows runs...?

DrewScoggins · 2021-04-16T22:45:34Z

We do not, ETL collection has been really dodgy on lab machines as of late, with IR collection even worse. You could use the two Windows builds below to collect traces locally though.

Baseline
Compare

AndyAyersMS · 2021-06-11T21:25:56Z

Taking a fresh look, here's the recent history. We see 20% or so swings in perf. Also lately we seem to have reached a new low, but given history it's not clear how durable that is going to be.

Working back in time, these jumps all seem to be correlated with PGO updates:

[main] Update dependencies from dnceng/internal/dotnet-optimization #53864 (June 11)
[main] Update dependencies from dnceng/internal/dotnet-optimization #53672 (June 7)
[main] Update dependencies from dnceng/internal/dotnet-optimization #53343 (June 2)
[main] Update dependencies from dnceng/internal/dotnet-optimization #52966 (May 21)
[main] Update dependencies from dnceng/internal/dotnet-optimization #52901 (May 18)
[main] Update dependencies from dnceng/internal/dotnet-optimization #52678 (May 18)
[main] Update dependencies from dnceng/internal/dotnet-optimization #52338 (May 6)
Update pgo data versions, and remove excess nuget details handling old ibc data #52082 (April 29)

We should be able to use profiling to focus in on the key methods, and then check the optimization data we've gathered to see if it is fluctuating.

Also worth noting, the dictionary case (which is testing the same exact code), shows the same pattern of fluctuation.

AndyAyersMS · 2021-06-11T22:47:58Z

Profiling the test above (and filtering to just the "actual" timed runs done by BDN)

Looking at codegen, the method most likely to be impacted by PGO is FindValue; in my local run it has a couple of GDV sites. GetNonRandomizedHashCode does some block reordering with PGO, but the impact of that looks like it should be small. So we'll focus on the PGO data for FindValue.

AndyAyersMS · 2021-06-11T23:07:12Z

With current profile data, here are the PGO counts.

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd                 weight      IBC  lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             798k 797559    [000..008)-> BB03 ( cond )                     IBC 
BB02 [0001]  1                             0         0    [008..00E)                                     rare IBC 
BB03 [0002]  2                             798k 797559    [00E..01F)-> BB26 ( cond )                     IBC 
BB04 [0003]  1                             764k 763664    [01F..02C)-> BB18 ( cond )                     IBC 
BB05 [0004]  1                             158k 157883    [02C..060)-> BB12 ( cond )                     IBC 
BB06 [0005]  1                             0         0    [060..066)                                     rare IBC 
BB07 [0006]  2                             0         0    [066..071)-> BB26 ( cond )                     rare bwd bwd-target IBC 
BB08 [0007]  1                             0         0    [071..084)-> BB10 ( cond )                     rare bwd IBC 
BB09 [0008]  1                             0         0    [084..09A)-> BB24 ( cond )                     rare bwd IBC 
BB10 [0009]  2                             0         0    [09A..0B0)-> BB07 ( cond )                     rare bwd IBC 
BB11 [0010]  1                             0         0    [0B0..0B5)-> BB23 (always)                     rare IBC 
BB12 [0011]  1                             158k 157883    [0B5..0C2)                                     IBC 
BB13 [0012]  2                             176k 176445    [0C2..0CD)-> BB26 ( cond )                     bwd bwd-target IBC 
BB14 [0013]  1                             168k 167622    [0CD..0E0)-> BB16 ( cond )                     bwd IBC 
BB15 [0014]  1                             149k 149060    [0E0..0F3)-> BB24 ( cond )                     bwd IBC 
BB16 [0015]  2                           18562.  18562    [0F3..109)-> BB13 ( cond )                     bwd IBC 
BB17 [0016]  1                             0         0    [109..10B)-> BB23 (always)                     rare IBC 
BB18 [0017]  1                             606k 605781    [10B..130)                                     IBC 
BB19 [0018]  2                             874k 874419    [130..138)-> BB26 ( cond )                     bwd bwd-target IBC 
BB20 [0019]  1                             523k 522763    [138..14C)-> BB22 ( cond )                     bwd IBC 
BB21 [0020]  1                             254k 254125    [14C..15B)-> BB24 ( cond )                     bwd IBC 
BB22 [0021]  2                             269k 268638    [15B..171)-> BB19 ( cond )                     bwd IBC 
BB23 [0022]  3                             0         0    [171..176)                                     rare IBC 
BB24 [0023]  4                             403k 403185    [176..17D)                                     IBC 
BB25 [0024]  2                             798k 797559    [17D..17F)        (return)                     bwd-target IBC 
BB26 [0025]  4                             394k 394374    [17F..187)-> BB25 (always)                     bwd IBC 
-----------------------------------------------------------------------------------------------------------------------------------------

The GDV sites in FindValue are both kind of marginal (likelihoods of 37% and 40%):

impImportBlockPending for BB24

    [ 1]  46 (0x02e) constrained. (1B000018) callvirt 06000505
In Compiler::impImportCall: opcode is callvirt, kind=4, callRetType is int, structSize is 0

impDevirtualizeCall: Trying to devirtualize virtual call:
    class for 'this' is __Canon (attrib 20020000)
    base method is Object::GetHashCode
--- no derived method: object class was canonical
    Class not final or exact, and method not final
Considering guarded devirtualization at IL offset 52 (0x34)
Likely class for 00007FFED5605540 (__Canon) is 00007FFED56090B8 (RuntimeType) [likelihood:37 classes seen:7]
virtual call would invoke method GetHashCode
Marking call [000234] as guarded devirtualization candidate; will guess for class RuntimeType

Importing BB15 (PC=224) of 'Dictionary`2:FindValue(__Canon):byref:this'
    [ 0] 224 (0x0e0) ldloc.s 7
    [ 1] 226 (0x0e2) ldloc.0
    [ 2] 227 (0x0e3) ldfld 0A000C61
    [ 2] 232 (0x0e8) ldarg.1
    [ 3] 233 (0x0e9) callvirt 0A00048B
In Compiler::impImportCall: opcode is callvirt, kind=4, callRetType is bool, structSize is 0

impDevirtualizeCall: Trying to devirtualize virtual call:
    class for 'this' is EqualityComparer`1 (attrib 20020400)
    base method is EqualityComparer`1::Equals
    devirt to EqualityComparer`1::Equals -- inexact or not final
               [000340] --CXG-------              *  CALLV vt-ind int    EqualityComparer`1.Equals
               [000336] ------------ this in rcx  +--*  LCL_VAR   ref    V09 loc7         
               [000338] ---XG------- arg1         +--*  FIELD     ref    key
               [000337] ------------              |  \--*  LCL_VAR   byref  V02 loc0         
               [000339] ------------ arg2         \--*  LCL_VAR   ref    V01 arg1         
    Class not final or exact, and method not final
Considering guarded devirtualization at IL offset 233 (0xe9)
Likely class for 00007FFED56B6458 (EqualityComparer`1) is 00007FFED587BE58 (ObjectEqualityComparer`1) [likelihood:40 classes seen:7]
virtual call would invoke method Equals
Marking call [000340] as guarded devirtualization candidate; will guess for class ObjectEqualityComparer`1

So we should look into the stability of these class profiles over time.

AndyAyersMS · 2021-06-14T19:58:43Z

Comparing the codegen for FindValue at ccec848 (just before the June 7 update) to bbf9659 (a few days later), the only diff comes from a diff in the edge count info.

For the older data, we have

Profile summary: 8 runs, 0 block probes, 17 edge probes, 2 class profiles, 0 other records

Reconstructing block counts from sparse edge instrumentation
... adding known edge BB02 -> BB03: weight 0
... adding known edge BB07 -> BB26: weight 0
... adding known edge BB09 -> BB10: weight 0
... adding known edge BB09 -> BB24: weight 0
... adding known edge BB10 -> BB07: weight 0
... adding known edge BB13 -> BB26: weight 8927
... adding known edge BB15 -> BB16: weight 0
... adding known edge BB15 -> BB24: weight 150363
... adding known edge BB16 -> BB13: weight 39437
... adding known edge BB17 -> BB23: weight 0
... adding known edge BB19 -> BB26: weight 351612
... adding known edge BB21 -> BB22: weight 0
... adding known edge BB21 -> BB24: weight 253618
... adding known edge BB22 -> BB19: weight 268579
... adding known edge BB22 -> BB23: weight 0
... adding known edge BB24 -> BB25: weight 403980
... adding known edge BB25 -> BB01: weight 798427

and for the newer

Reconstructing block counts from sparse edge instrumentation
... adding known edge BB02 -> BB03: weight 0
... adding known edge BB07 -> BB26: weight 0
... adding known edge BB09 -> BB10: weight 0
... adding known edge BB09 -> BB24: weight 0
... adding known edge BB10 -> BB07: weight 0
... adding known edge BB13 -> BB26: weight 8823
... adding known edge BB15 -> BB16: weight 0
... adding known edge BB15 -> BB24: weight 149060
... adding known edge BB16 -> BB13: weight 18562
... adding known edge BB17 -> BB23: weight 0
... adding known edge BB19 -> BB26: weight 351656
... adding known edge BB21 -> BB22: weight 0
... adding known edge BB21 -> BB24: weight 254125
... adding known edge BB22 -> BB19: weight 268638
... adding known edge BB22 -> BB23: weight 0
... adding known edge BB24 -> BB25: weight 403185
... adding known edge BB25 -> BB01: weight 797559

They largely agree, save for the BB16 -> BB13 edge, which has a much lower count in the second collection.
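
As a rough illustration of that comparison, the sketch below diffs the nonzero edge weights from the two dumps and flags any edge whose relative change exceeds a tolerance. The 5% tolerance and the `unstable_edges` helper are assumptions made for this example, not anything the JIT or the PGO tooling does.

```python
# Nonzero edge weights copied from the two dumps above.
OLD = {
    ("BB13", "BB26"): 8927,
    ("BB15", "BB24"): 150363,
    ("BB16", "BB13"): 39437,
    ("BB19", "BB26"): 351612,
    ("BB21", "BB24"): 253618,
    ("BB22", "BB19"): 268579,
    ("BB24", "BB25"): 403980,
    ("BB25", "BB01"): 798427,
}
NEW = {
    ("BB13", "BB26"): 8823,
    ("BB15", "BB24"): 149060,
    ("BB16", "BB13"): 18562,
    ("BB19", "BB26"): 351656,
    ("BB21", "BB24"): 254125,
    ("BB22", "BB19"): 268638,
    ("BB24", "BB25"): 403185,
    ("BB25", "BB01"): 797559,
}


def unstable_edges(old, new, tolerance=0.05):
    """Return {edge: relative_change} for edges that moved more than tolerance."""
    flagged = {}
    for edge in sorted(old.keys() & new.keys()):
        a, b = old[edge], new[edge]
        base = max(a, b)
        if base == 0:
            continue  # both counts are zero, nothing to compare
        change = abs(a - b) / base
        if change > tolerance:
            flagged[edge] = change
    return flagged
```

With these inputs only the BB16 -> BB13 edge is flagged; every other edge differs by roughly 1% or less.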

After solving, the older data gives a higher weight for BB13, BB14, and BB16 (old profile below, new profile in the comment above).

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd                 weight      IBC  lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             798k 798427    [000..008)-> BB03 ( cond )                     IBC 
BB02 [0001]  1                             0         0    [008..00E)                                     rare IBC 
BB03 [0002]  2                             798k 798427    [00E..01F)-> BB26 ( cond )                     IBC 
BB04 [0003]  1                             765k 764520    [01F..02C)-> BB18 ( cond )                     IBC 
BB05 [0004]  1                             159k 159290    [02C..060)-> BB12 ( cond )                     IBC 
BB06 [0005]  1                             0         0    [060..066)                                     rare IBC 
BB07 [0006]  2                             0         0    [066..071)-> BB26 ( cond )                     rare bwd bwd-target IBC 
BB08 [0007]  1                             0         0    [071..084)-> BB10 ( cond )                     rare bwd IBC 
BB09 [0008]  1                             0         0    [084..09A)-> BB24 ( cond )                     rare bwd IBC 
BB10 [0009]  2                             0         0    [09A..0B0)-> BB07 ( cond )                     rare bwd IBC 
BB11 [0010]  1                             0         0    [0B0..0B5)-> BB23 (always)                     rare IBC 
BB12 [0011]  1                             159k 159290    [0B5..0C2)                                     IBC 
BB13 [0012]  2                             199k 198727    [0C2..0CD)-> BB26 ( cond )                     bwd bwd-target IBC 
BB14 [0013]  1                             190k 189800    [0CD..0E0)-> BB16 ( cond )                     bwd IBC 
BB15 [0014]  1                             150k 150363    [0E0..0F3)-> BB24 ( cond )                     bwd IBC 
BB16 [0015]  2                           39437.  39437    [0F3..109)-> BB13 ( cond )                     bwd IBC 
BB17 [0016]  1                             0         0    [109..10B)-> BB23 (always)                     rare IBC 
BB18 [0017]  1                             605k 605230    [10B..130)                                     IBC 
BB19 [0018]  2                             874k 873809    [130..138)-> BB26 ( cond )                     bwd bwd-target IBC 
BB20 [0019]  1                             522k 522197    [138..14C)-> BB22 ( cond )                     bwd IBC 
BB21 [0020]  1                             254k 253618    [14C..15B)-> BB24 ( cond )                     bwd IBC 
BB22 [0021]  2                             269k 268579    [15B..171)-> BB19 ( cond )                     bwd IBC 
BB23 [0022]  3                             0         0    [171..176)                                     rare IBC 
BB24 [0023]  4                             404k 403980    [176..17D)                                     IBC 
BB25 [0024]  2                             798k 798427    [17D..17F)        (return)                     bwd-target IBC 
BB26 [0025]  4                             394k 394447    [17F..187)-> BB25 (always)                     bwd IBC 
-----------------------------------------------------------------------------------------------------------------------------------------

The impact of this on codegen is fairly minimal: we end up reordering some blocks at the end of the method.
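
The block weights that differ can be reproduced from the edge counts with simple flow conservation. The sketch below is one reading of the tables, not the JIT's actual solver: it assumes BB12 falls through into BB13, that BB14 either branches to BB16 or falls through to BB15, and it takes the BB12 and BB15 weights from the solved tables as inputs.

```python
def solve_loop_blocks(bb12, bb15, edge_16_13, edge_13_26, edge_15_16=0):
    """Recover BB13, BB14 and BB16 weights from neighbouring counts.

    Illustrative only; parameter names are made up for this example.
    """
    # BB13 is entered from BB12 (fall-through) and from the BB16 back edge.
    bb13 = bb12 + edge_16_13
    # Whatever does not exit BB13 to BB26 continues into BB14.
    bb14 = bb13 - edge_13_26
    # BB14 either falls into BB15 or jumps to BB16; BB15 may also reach BB16.
    bb16 = (bb14 - bb15) + edge_15_16
    return bb13, bb14, bb16


old = solve_loop_blocks(bb12=159290, bb15=150363, edge_16_13=39437, edge_13_26=8927)
new = solve_loop_blocks(bb12=157883, bb15=149060, edge_16_13=18562, edge_13_26=8823)
```

Here `old` matches the BB13, BB14, and BB16 rows of the older table (198727, 189800, 39437) and `new` matches the newer one (176445, 167622, 18562), so the single unstable edge accounts for all three weight differences.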

AndyAyersMS · 2021-06-14T20:56:02Z

Similar diffs looking back to the May 21 codegen, edge weights differ leading to mainly different block layouts but otherwise "equivalent" code. Some of the layout diffs also come from shifting likelihoods in guarded devirtualization.

AndyAyersMS · 2021-06-14T22:22:32Z

On my box I'm able to repro about a 4% regression with the June 2 build, vs May 21 and June 7:

BenchmarkDotNet=v0.13.0.1555-nightly, OS=Windows 10.0.19043.1052 (21H1/May2021Update)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.100-preview.6.21275.3
  [Host]     : .NET 6.0.0 (6.0.21.27401), X64 RyuJIT
  Job-ERFQEC : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT
  Job-TXTABM : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT
  Job-HUUBVC : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable  IterationTime=250.0000 ms
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1

Method	Job	Toolchain	Size	Mean	Error	StdDev	Median	Min	Max	Ratio	Gen 0	Gen 1	Gen 2	Allocated
IDictionary	Job-ERFQEC	June 7	512	11.28 us	0.158 us	0.148 us	11.27 us	11.00 us	11.59 us	0.99	-	-	-	-
IDictionary	Job-TXTABM	June 2	512	11.70 us	0.179 us	0.167 us	11.72 us	11.42 us	11.90 us	1.03	-	-	-	-
IDictionary	Job-HUUBVC	May 21	512	11.39 us	0.101 us	0.094 us	11.39 us	11.22 us	11.53 us	1.00	-	-	-	-

None of the other key methods have diffs, so evidently the block layout changes in FindValue(__Canon) must be the cause of these perf swings.

Seems like if we fixed the class profile merge logic (see #48549 (comment)) that might lead to somewhat more stable profiles.

Not sure what leads to the remaining count instability. Could perhaps be lost counter updates from concurrency but would not expect such large swings.

danmoseley · 2021-07-14T18:30:41Z

Updating area based on analysis above.

AndyAyersMS · 2021-07-27T18:45:53Z

These tests are quite sensitive to fine details of PGO and perf swings back and forth depending on exact block layout.

Down the road we can likely improve the block layout algorithms to avoid being quite so sensitive. But I don't think we can address this for 6.0, so I am going to move this to Future.

danmoseley · 2021-07-27T21:49:45Z

Are there any characteristics of regressions caused by "fine details of PGO", changing PGO data etc that we can spot in the graphs for other regressions? It feels like when I suspect PGO data I'm just waving my hands. Eg., would it look like bimodal between builds, but stable within iterations on the same build?

AndyAyersMS · 2021-07-27T22:11:03Z

> would it look like bimodal between builds, but stable within iterations on the same build?

Yes, this (more or less): if you look up at #51258 (comment) you can see that the perf of the test switches between two levels over time, and the timing of the swings is correlated with PGO updates.

There are other factors that can cause this sort of behavior (memory alignment of data, etc.), so checking for the correlation with PGO updates is important.
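
One way to make that check less hand-wavy is sketched below: find the builds where the benchmark time jumps between levels, then split them into jumps that coincide with a PGO update and jumps that do not. The 10% threshold, the helper names, and the sample history are all made up for illustration; real data would come from the lab's reporting system.

```python
def level_shifts(series, threshold=0.10):
    """Return build ids where time changes by more than threshold vs. the previous build."""
    shifts = []
    for (_, prev), (build, cur) in zip(series, series[1:]):
        if abs(cur - prev) / prev > threshold:
            shifts.append(build)
    return shifts


def split_by_pgo_update(shifts, pgo_update_builds):
    """Partition level shifts into those that land on a PGO update and the rest."""
    explained = [b for b in shifts if b in pgo_update_builds]
    unexplained = [b for b in shifts if b not in pgo_update_builds]
    return explained, unexplained


# Made-up numbers for illustration only (microseconds per build).
history = [("b1", 9.3), ("b2", 9.4), ("b3", 11.1), ("b4", 11.0), ("b5", 9.5), ("b6", 9.4)]
pgo_updates = {"b3", "b5"}
explained, unexplained = split_by_pgo_update(level_shifts(history), pgo_updates)
```

If most shifts end up in `unexplained`, something other than PGO data (alignment, for example) is the more likely suspect.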

danmoseley · 2021-07-27T22:20:37Z

Is there a file in the repo that records the PGO update, so that I can correlate against its history (i.e., how do I know when there was an update)?

danmoseley · 2021-07-27T22:24:51Z

> memory alignment of data, etc

Do we still see significant impact from this? I guess there are next steps left in dotnet/performance#1602

On that subject, I guess remaining work on code alignment in #43227 will have to be prioritized one way or another for next cycle too.

AndyAyersMS · 2021-07-27T22:31:32Z

If you look at the test history data from the lab you can select a particular perf jump and left-click on the before level and "set baseline" and do the same for the after level and set compare.

Then you can look at the commit range between the two; you're looking at something like this and within there you'll see an "Update Dependencies" PR -- within that you'll see updates to eng/Version.Details.xml, and in particular this one line that updates the PGO version:

- <Dependency Name="optimization.PGO.CoreCLR" Version="1.0.0-prerelease.21320.4">
+ <Dependency Name="optimization.PGO.CoreCLR" Version="1.0.0-prerelease.21329.4">
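
For scanning many commits, the same check can be scripted. The sketch below pulls the old and new PGO versions out of a unified diff of eng/Version.Details.xml; the `pgo_version_change` helper is hypothetical, and it assumes the dependency line keeps the shape shown above.

```python
import re

# Matches a removed or added optimization.PGO.CoreCLR dependency line in a diff.
PGO_LINE = re.compile(
    r'^([-+])\s*<Dependency\s+Name="optimization\.PGO\.CoreCLR"\s+Version="([^"]+)"'
)


def pgo_version_change(diff_text):
    """Return (old_version, new_version), or None if the diff has no PGO bump."""
    old = new = None
    for line in diff_text.splitlines():
        match = PGO_LINE.match(line.strip())
        if not match:
            continue
        sign, version = match.groups()
        if sign == "-":
            old = version
        else:
            new = version
    if old is None or new is None:
        return None
    return old, new
```

The diff text could come from something like `git log -p -- eng/Version.Details.xml`, run over the commit range between the baseline and compare builds.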

danmoseley · 2021-07-28T19:28:07Z

Ah - that's what I need. I'll check that next time.

adamsitnik · 2021-09-14T16:50:15Z

When looking at the last manual perf run for 6.0 I assumed that these benchmarks are just flaky:

System.Collections.ContainsKeyFalse<Int32, Int32>.SortedList(Size: 512)

Result	Base	Diff	Ratio	Alloc Delta	Modality	Operating System	Bit	Processor Name	Base V	Diff V
Same	15630.24	17268.63	0.91	+0		Windows 10.0.19043.1165	X64	AMD Ryzen Threadripper PRO 3945WX 12-Cores	5.0.921.35908	6.0.21.41701
Slower	20678.27	23181.68	0.89	+0		Windows 10.0.20348	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	20180.17	23033.57	0.88	+0		Windows 10.0.20348	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Same	28702.59	31566.71	0.91	+0		Windows 10.0.18363.1621	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	5.0.921.35908	6.0.21.41701
Slower	33899.56	40314.91	0.84	+0		Windows 8.1	X64	Intel Core i7-3610QM CPU 2.30GHz (Ivy Bridge)	5.0.921.35908	6.0.21.45401
Same	32344.42	35359.89	0.91	+0		Windows 10.0.19042.685	X64	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	5.0.921.35908	6.0.21.41701
Same	27818.40	30054.14	0.93	+0		Windows 10.0.19043.1165	X64	Intel Core i7-6700 CPU 3.40GHz (Skylake)	5.0.921.35908	6.0.21.41701
Same	37190.26	39400.55	0.94	+0		Windows 10.0.22454	X64	Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R)	5.0.921.35908	6.0.21.41701
Same	25109.38	26330.44	0.95	+0		Windows 10.0.22451	X64	Intel Core i7-8700 CPU 3.20GHz (Coffee Lake)	5.0.921.35908	6.0.21.41701
Same	26131.44	28154.31	0.93	+0		Windows 10.0.19042.1165	X64	Intel Core i9-9900T CPU 2.10GHz	5.0.921.35908	6.0.21.41701
Slower	45064.25	62926.86	0.72	+0		Windows 7 SP1	X64	Intel Core2 Duo CPU T9600 2.80GHz	5.0.721.25508	6.0.21.41701
Slower	18594.02	23898.72	0.78	+0		centos 8	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	18902.81	23519.43	0.80	+0		debian 10	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	19385.65	24385.02	0.79	+0		rhel 7	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	18758.27	23875.44	0.79	+0		sles 15	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	19208.14	22716.50	0.85	+0		opensuse-leap 15.3	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	25873.41	29614.95	0.87	+0		ubuntu 18.04	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	5.0.921.35908	6.0.21.41701
Slower	34763.92	38984.71	0.89	+0		ubuntu 18.04	X64	Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge)	5.0.921.35908	6.0.21.41701
Same	31299.91	33194.56	0.94	+0		alpine 3.13	X64	Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)	5.0.921.35908	6.0.21.41701
Same	43801.13	44121.66	0.99	+0		ubuntu 16.04	Arm64	Unknown processor	5.0.421.11614	6.0.21.41701
Same	33187.39	36221.29	0.92	+0		Windows 10.0.19043.1165	Arm64	Microsoft SQ1 3.0 GHz	5.0.921.35908	6.0.21.41701
Same	34680.78	36189.27	0.96	+0		Windows 10.0.22000	Arm64	Microsoft SQ1 3.0 GHz	5.0.921.35908	6.0.21.41701
Slower	15620.29	18899.32	0.83	+0		Windows 10.0.19043.1165	X86	AMD Ryzen Threadripper PRO 3945WX 12-Cores	5.0.921.35908	6.0.21.41701
Same	28480.79	30515.61	0.93	+0	bimodal	Windows 10.0.18363.1621	X86	Intel Xeon CPU E5-1650 v4 3.60GHz	5.0.921.35908	6.0.21.41701
Slower	32878.72	37487.28	0.88	+0		Windows 10.0.19043.1165	Arm	Microsoft SQ1 3.0 GHz	5.0.921.35908	6.0.21.41701
Slower	34081.18	38946.04	0.88	+0		macOS Big Sur 11.5.2	X64	Intel Core i5-4278U CPU 2.60GHz (Haswell)	5.0.921.35908	6.0.21.41701
Slower	28952.99	32334.64	0.90	+0		macOS Big Sur 11.5.2	X64	Intel Core i7-4870HQ CPU 2.50GHz (Haswell)	5.0.921.35908	6.0.21.41701
Slower	30580.86	34945.53	0.88	+0		macOS Big Sur 11.4	X64	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	5.0.921.35908	6.0.21.41701
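
A note on reading the Ratio column in this table: the numbers are consistent with Ratio being Base divided by Diff, so values below 1.00 mean the 6.0 build is slower. That is the opposite of the Test/Base column in the original report, where values above 1.00 mean slower. This convention is inferred from the rows, not taken from the tool's documentation; a quick check on three of the rows:

```python
def ratio(base, diff):
    """Base/Diff rounded to two decimals, as the Ratio column appears to be."""
    return round(base / diff, 2)


# (Base, Diff, reported Ratio) copied from three rows of the table above.
ROWS = [
    (15630.24, 17268.63, 0.91),
    (45064.25, 62926.86, 0.72),
    (43801.13, 44121.66, 0.99),
]

matches = [ratio(base, diff) == reported for base, diff, reported in ROWS]
```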

System.Collections.TryGetValueFalse<Int32, Int32>.SortedList(Size: 512)

Result	Base	Diff	Ratio	Alloc Delta	Modality	Operating System	Bit	Processor Name	Base V	Diff V
Slower	14583.11	18196.25	0.80	+0		Windows 10.0.19043.1165	X64	AMD Ryzen Threadripper PRO 3945WX 12-Cores	5.0.921.35908	6.0.21.41701
Slower	19698.44	24552.94	0.80	+0		Windows 10.0.20348	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	19549.13	24505.74	0.80	+0		Windows 10.0.20348	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Same	28687.70	31988.37	0.90	+0	bimodal	Windows 10.0.18363.1621	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	5.0.921.35908	6.0.21.41701
Slower	34815.35	40048.39	0.87	+0		Windows 8.1	X64	Intel Core i7-3610QM CPU 2.30GHz (Ivy Bridge)	5.0.921.35908	6.0.21.45401
Same	32563.28	35785.74	0.91	+0		Windows 10.0.19042.685	X64	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	5.0.921.35908	6.0.21.41701
Same	27969.65	29438.33	0.95	+0		Windows 10.0.19043.1165	X64	Intel Core i7-6700 CPU 3.40GHz (Skylake)	5.0.921.35908	6.0.21.41701
Same	37241.70	39451.36	0.94	+0		Windows 10.0.22454	X64	Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R)	5.0.921.35908	6.0.21.41701
Same	24716.30	26593.07	0.93	+0		Windows 10.0.22451	X64	Intel Core i7-8700 CPU 3.20GHz (Coffee Lake)	5.0.921.35908	6.0.21.41701
Same	27215.48	27920.34	0.97	+0		Windows 10.0.19042.1165	X64	Intel Core i9-9900T CPU 2.10GHz	5.0.921.35908	6.0.21.41701
Slower	46122.94	63292.95	0.73	+0		Windows 7 SP1	X64	Intel Core2 Duo CPU T9600 2.80GHz	5.0.721.25508	6.0.21.41701
Slower	18494.56	22117.07	0.84	+0		centos 8	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	19264.81	25169.32	0.77	+0		debian 10	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	18693.45	22418.03	0.83	+0		rhel 7	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	20366.13	25532.80	0.80	+0		sles 15	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	19613.86	25289.33	0.78	+0		opensuse-leap 15.3	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	26467.75	30251.47	0.87	+0		ubuntu 18.04	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	5.0.921.35908	6.0.21.41701
Slower	34747.86	38926.07	0.89	+0		ubuntu 18.04	X64	Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge)	5.0.921.35908	6.0.21.41701
Slower	29441.72	33311.51	0.88	+0		alpine 3.13	X64	Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)	5.0.921.35908	6.0.21.41701
Same	43243.67	43650.36	0.99	+0		ubuntu 16.04	Arm64	Unknown processor	5.0.421.11614	6.0.21.41701
Same	33389.19	34600.58	0.96	+0		Windows 10.0.19043.1165	Arm64	Microsoft SQ1 3.0 GHz	5.0.921.35908	6.0.21.41701
Same	35076.73	34480.20	1.02	+0		Windows 10.0.22000	Arm64	Microsoft SQ1 3.0 GHz	5.0.921.35908	6.0.21.41701
Slower	15611.10	19623.90	0.80	+0		Windows 10.0.19043.1165	X86	AMD Ryzen Threadripper PRO 3945WX 12-Cores	5.0.921.35908	6.0.21.41701
Same	27990.94	30754.16	0.91	+0		Windows 10.0.18363.1621	X86	Intel Xeon CPU E5-1650 v4 3.60GHz	5.0.921.35908	6.0.21.41701
Same	35018.86	36948.07	0.95	+0		Windows 10.0.19043.1165	Arm	Microsoft SQ1 3.0 GHz	5.0.921.35908	6.0.21.41701
Slower	33809.69	42191.42	0.80	+0		macOS Big Sur 11.5.2	X64	Intel Core i5-4278U CPU 2.60GHz (Haswell)	5.0.921.35908	6.0.21.41701
Slower	28922.73	33947.31	0.85	+0		macOS Big Sur 11.5.2	X64	Intel Core i7-4870HQ CPU 2.50GHz (Haswell)	5.0.921.35908	6.0.21.41701
Slower	30420.86	35304.35	0.86	+0		macOS Big Sur 11.4	X64	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	5.0.921.35908	6.0.21.41701

But looking at the historical data it seems that we have slightly regressed SortedList:

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_Windows%2010.0.18362%2fSystem.Collections.TryGetValueFalse(Int32%2c%20Int32).SortedList(Size%3a%20512).html
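
For reading the tables above: the Ratio column appears to be Base divided by Diff, so values below 1.00 mean the Diff (6.0) run took longer than the Base (5.0) run. A minimal sketch that checks this reading against three rows copied from the SortedList table. The `ratio` helper is hypothetical and written only for this illustration. It does not reproduce how the Slower/Same verdict is decided, which is not a plain cutoff on the ratio (rows at 0.90 appear under both verdicts).

```python
def ratio(base: float, diff: float) -> float:
    """Return base / diff rounded to two decimals, as shown in the Ratio column.

    Assumption: Ratio = Base / Diff. This is inferred from the table values,
    not taken from the tool that produced them.
    """
    return round(base / diff, 2)


# (base, diff, ratio as printed) copied from three SortedList rows above
rows = [
    (14583.11, 18196.25, 0.80),  # Windows 10, AMD Ryzen Threadripper PRO 3945WX
    (46122.94, 63292.95, 0.73),  # Windows 7 SP1, Intel Core2 Duo T9600
    (35076.73, 34480.20, 1.02),  # Windows 10.0.22000, Arm64, Microsoft SQ1
]

for base, diff, printed in rows:
    # Relative change of the diff run versus the base run, in percent
    change_pct = (diff / base - 1) * 100
    print(
        f"base={base:.2f} diff={diff:.2f} "
        f"ratio={ratio(base, diff):.2f} (table: {printed:.2f}) "
        f"change={change_pct:+.1f}%"
    )
```

Read this way, a ratio of 0.80 corresponds to the Diff run taking roughly 25% longer than the Base run, not 20%.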

AndyAyersMS · 2024-06-19T16:31:04Z

Stale issue.

DrewScoggins added os-linux Linux OS (any supported distro) tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark arch-x64 labels Apr 14, 2021

DrewScoggins assigned AndyAyersMS Apr 14, 2021

dotnet-issue-labeler bot added area-System.Collections untriaged New issue has not been triaged by the area owner labels Apr 14, 2021

AndyAyersMS added this to the 6.0.0 milestone Apr 17, 2021

jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Jul 9, 2021

danmoseley added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-System.Collections labels Jul 14, 2021

AndyAyersMS modified the milestones: 6.0.0, Future Jul 27, 2021

AndyAyersMS mentioned this issue Sep 15, 2021: Investigate stability of PGO updates #52610 (Open)

adamsitnik mentioned this issue Sep 17, 2021: .NET 6.0 Microbenchmarks Performance Study Report #59272 (Closed, 19 tasks)

AndyAyersMS closed this as completed Jun 19, 2024

github-actions bot locked and limited conversation to collaborators Jul 20, 2024
