Perf regressions in Perf_Matrix benchmarks #59415

Closed
performanceautofiler bot opened this issue Sep 21, 2021 · 21 comments
Labels: arch-x64, area-CodeGen-coreclr, os-linux, tenet-performance, tenet-performance-benchmarks

@performanceautofiler

Run Information

| Architecture | x64 |
| --- | --- |
| OS | ubuntu 18.04 |
| Baseline | 10b2f7c858934c27c56aa13845cb47064bfe87e9 |
| Compare | a842e7a1dc6c241c928c6291411393cdb2516608 |

Regressions in System.IO.Tests.BinaryWriterTests

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| WriteAsciiChar - Duration of single invocation | 3.44 ns | 4.71 ns | 1.37 | 0.17 | True | | | | | |

Repro

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net6.0 --filter 'System.IO.Tests.BinaryWriterTests*'

Histogram

System.IO.Tests.BinaryWriterTests.WriteAsciiChar


Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Regressions in System.Numerics.Tests.Perf_Matrix3x2

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| InequalityOperatorBenchmark - Duration of single invocation | 6.33 ns | 12.21 ns | 1.93 | 0.07 | True | | | | | |
| LerpBenchmark - Duration of single invocation | 8.16 ns | 13.30 ns | 1.63 | 0.04 | True | | | | | |
| NegationOperatorBenchmark - Duration of single invocation | 6.35 ns | 10.87 ns | 1.71 | 0.10 | False | | | | | |
| AddBenchmark - Duration of single invocation | 8.96 ns | 15.89 ns | 1.77 | 0.08 | True | | | | | |

Repro

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net6.0 --filter 'System.Numerics.Tests.Perf_Matrix3x2*'

Histogram

System.Numerics.Tests.Perf_Matrix3x2.InequalityOperatorBenchmark


System.Numerics.Tests.Perf_Matrix3x2.LerpBenchmark


System.Numerics.Tests.Perf_Matrix3x2.NegationOperatorBenchmark


System.Numerics.Tests.Perf_Matrix3x2.AddBenchmark


Regressions in System.Text.Json.Serialization.Tests.ReadJson<Nullable<DateTimeOffset>>

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DeserializeFromUtf8Bytes - Duration of single invocation | 130.22 ns | 140.77 ns | 1.08 | 0.05 | True | | | | | |

Repro

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net6.0 --filter 'System.Text.Json.Serialization.Tests.ReadJson<Nullable<DateTimeOffset>>*'

Histogram

System.Text.Json.Serialization.Tests.ReadJson<Nullable<DateTimeOffset>>.DeserializeFromUtf8Bytes


Regressions in System.Collections.Sort<BigStruct>

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Array_ComparerStruct - Duration of single invocation | 33.94 μs | 46.92 μs | 1.38 | 0.02 | True | | | | | |
| Array_Comparison - Duration of single invocation | 30.67 μs | 45.16 μs | 1.47 | 0.03 | True | | | | | |
| Array_ComparerClass - Duration of single invocation | 31.22 μs | 44.95 μs | 1.44 | 0.07 | True | | | | | |

Repro

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net6.0 --filter 'System.Collections.Sort<BigStruct>*'

Histogram

System.Collections.Sort<BigStruct>.Array_ComparerStruct(Size: 512)


System.Collections.Sort<BigStruct>.Array_Comparison(Size: 512)


System.Collections.Sort<BigStruct>.Array_ComparerClass(Size: 512)


Regressions in System.Collections.IterateForEachNonGeneric<Int32>

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Queue - Duration of single invocation | 3.61 μs | 3.85 μs | 1.07 | 0.06 | False | | | | | |

Repro

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net6.0 --filter 'System.Collections.IterateForEachNonGeneric<Int32>*'

Histogram

System.Collections.IterateForEachNonGeneric<Int32>.Queue(Size: 512)


Regressions in System.Collections.IterateForEach<Int32>

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ConcurrentStack - Duration of single invocation | 2.49 μs | 2.69 μs | 1.08 | 0.01 | False | | | | | |

Repro

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net6.0 --filter 'System.Collections.IterateForEach<Int32>*'

Histogram

System.Collections.IterateForEach<Int32>.ConcurrentStack(Size: 512)


Regressions in System.Collections.ContainsFalse<Int32>

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ImmutableSortedSet - Duration of single invocation | 28.60 μs | 31.25 μs | 1.09 | 0.00 | True | | | | | |

Repro

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net6.0 --filter 'System.Collections.ContainsFalse<Int32>*'

Histogram

System.Collections.ContainsFalse<Int32>.ImmutableSortedSet(Size: 512)


Regressions in System.Numerics.Tests.Perf_Vector2

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TransformNormalByMatrix3x2Benchmark - Duration of single invocation | 1.99 ns | 8.09 ns | 4.06 | 0.14 | True | | | | | |

Repro

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net6.0 --filter 'System.Numerics.Tests.Perf_Vector2*'

Histogram

System.Numerics.Tests.Perf_Vector2.TransformNormalByMatrix3x2Benchmark


Regressions in System.Memory.Span<Byte>

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SequenceCompareTo - Duration of single invocation | 9.92 ns | 12.15 ns | 1.23 | 0.08 | False | | | | | |

Repro

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net6.0 --filter 'System.Memory.Span<Byte>*'

Histogram

System.Memory.Span<Byte>.SequenceCompareTo(Size: 512)


Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Common

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Uri_IsMatch - Duration of single invocation | 132.85 ns | 143.51 ns | 1.08 | 0.04 | False | | | | | |

Repro

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net6.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Common*'

Histogram

System.Text.RegularExpressions.Tests.Perf_Regex_Common.Uri_IsMatch(Options: IgnoreCase, Compiled)



@kunalspathak changed the title from "[Perf] Changes at 9/18/2021 1:42:06 AM" to "Perf regressions in Perf_Matrix benchmarks" on Sep 21, 2021
@kunalspathak transferred this issue from dotnet/perf-autofiling-issues on Sep 21, 2021
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler bot added the untriaged label on Sep 21, 2021
@kunalspathak added the arch-x64, os-linux, and area-CodeGen-coreclr labels and removed the untriaged label on Sep 21, 2021
@ghost

ghost commented Sep 21, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.


@kunalspathak
Member

Introduced in #55604

@kunalspathak
Member

@alexcovington

@kunalspathak
Member

kunalspathak commented Sep 21, 2021

windows/x64 regression - dotnet/perf-autofiling-issues#1487

@kunalspathak
Member

windows/x86 regression - dotnet/perf-autofiling-issues#1492

@kunalspathak
Member

AMD/Windows x64 regression - dotnet/perf-autofiling-issues#1510

@kunalspathak
Member

Improvements in Linux/x64 - dotnet/perf-autofiling-issues#1483

@kunalspathak
Member

Improvements in windows/x86 - dotnet/perf-autofiling-issues#1496

@kunalspathak
Member

Improvement AMD windows/x64 - dotnet/perf-autofiling-issues#1515

@alexcovington
Contributor

Thanks, @kunalspathak. Will take a look and update with any findings.

@JulieLeeMSFT added this to the 6.0.0 milestone on Sep 21, 2021
@alexcovington
Contributor

Looks like most of these regressions are due to pipeline stalls, specifically when handling the remainder of a block copy/init whose size is not a whole multiple of the SIMD move width.

For example, Compare generates block copies using only AVX:

; System.Numerics.Tests.Perf_Vector2.TransformNormalByMatrix3x2Benchmark()
       ...
       vmovdqu   xmm0,xmmword ptr [rsp+38]
       vmovdqu   xmmword ptr [rsp+20],xmm0
       vmovdqu   xmm0,xmmword ptr [rsp+40]
       vmovdqu   xmmword ptr [rsp+28],xmm0
       ...
; Total bytes of code 142

But Baseline uses AVX and handles the remainder with GPR:

; System.Numerics.Tests.Perf_Vector2.TransformNormalByMatrix3x2Benchmark()
       ...
       vmovdqu   xmm0,xmmword ptr [rsp+38]
       vmovdqu   xmmword ptr [rsp+20],xmm0
       mov       rax,[rsp+48]
       mov       [rsp+30],rax
       ...
; Total bytes of code 140
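
The two listings differ only in how the last 8 bytes of a 24-byte copy are moved (offsets are hexadecimal): Compare issues a second 16-byte SIMD move that overlaps the first, while Baseline uses one 8-byte GPR move. A rough Python sketch of the two tail strategies, assuming a 16-byte SIMD width; `plan_block_copy` and its parameters are hypothetical names for illustration, and this is not the JIT's actual code:

```python
def plan_block_copy(size, simd_width=16, gpr_remainder=True):
    """Return a list of (offset, width, kind) moves that copy `size` bytes.

    Illustrative model only: `kind` is "simd" or "gpr".
    """
    moves = []
    offset = 0

    # Use full-width SIMD moves for as long as a whole register fits.
    while size - offset >= simd_width:
        moves.append((offset, simd_width, "simd"))
        offset += simd_width

    if offset == size:
        return moves  # no remainder to handle

    if gpr_remainder or size < simd_width:
        # Cover the tail with progressively smaller general-purpose moves.
        for width in (8, 4, 2, 1):
            while size - offset >= width:
                moves.append((offset, width, "gpr"))
                offset += width
    else:
        # Cover the tail with one SIMD move that overlaps bytes already copied.
        moves.append((size - simd_width, simd_width, "simd"))

    return moves


if __name__ == "__main__":
    print("GPR remainder: ", plan_block_copy(24, gpr_remainder=True))
    print("SIMD remainder:", plan_block_copy(24, gpr_remainder=False))
```

For a 24-byte block the GPR plan is `[(0, 16, 'simd'), (16, 8, 'gpr')]` and the SIMD-only plan is `[(0, 16, 'simd'), (8, 16, 'simd')]`, which lines up with the destination offsets `rsp+20`/`rsp+30` and `rsp+20`/`rsp+28` in the listings.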

Reverting the behavior to use GPRs for the remainder fixes most of the regressions on my local machine:

[2021/09/22 12:02:37][INFO] // * Summary *
[2021/09/22 12:02:37][INFO]
[2021/09/22 12:02:37][INFO] BenchmarkDotNet=v0.13.1.1603-nightly, OS=Windows 10.0.19042.1237 (20H2/October2020Update)
[2021/09/22 12:02:37][INFO] AMD Ryzen 7 5800, 1 CPU, 16 logical and 8 physical cores
[2021/09/22 12:02:37][INFO] .NET SDK=6.0.100-rc.1.21417.19
[2021/09/22 12:02:37][INFO]   [Host]     : .NET 6.0.0 (6.0.21.41701), X64 RyuJIT
[2021/09/22 12:02:37][INFO]   Job-EABBZP : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
[2021/09/22 12:02:37][INFO]   Job-YIMGWV : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
[2021/09/22 12:02:37][INFO]   Job-BBDKJF : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
[2021/09/22 12:02:37][INFO]   Job-SGJNTS : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
[2021/09/22 12:02:37][INFO]
[2021/09/22 12:02:37][INFO] PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog  IterationCount=50
[2021/09/22 12:02:37][INFO] IterationTime=250.0000 ms  MaxIterationCount=20  MinIterationCount=15
[2021/09/22 12:02:37][INFO]
[2021/09/22 12:02:37][INFO] |                            Namespace |                            Type |                              Method |        Job |                                                                                                        Toolchain | InvocationCount | MinWarmupIterationCount | UnrollFactor | WarmupCount | Size |              Options |          Mean |       Error |      StdDev |        Median |           Min |           Max | Ratio | RatioSD | Allocated |
[2021/09/22 12:02:37][INFO] |------------------------------------- |-------------------------------- |------------------------------------ |----------- |----------------------------------------------------------------------------------------------------------------- |---------------- |------------------------ |------------- |------------ |----- |--------------------- |--------------:|------------:|------------:|--------------:|--------------:|--------------:|------:|--------:|----------:|
[2021/09/22 12:02:37][INFO] |                System.Numerics.Tests |                  Perf_Matrix3x2 |         InequalityOperatorBenchmark | Job-EABBZP | \runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |    ? |                    ? |     19.619 ns |   0.0041 ns |   0.0077 ns |     19.618 ns |     19.610 ns |     19.639 ns |  1.00 |    0.00 |         - |
[2021/09/22 12:02:37][INFO] |                System.Numerics.Tests |                  Perf_Matrix3x2 |         InequalityOperatorBenchmark | Job-YIMGWV |        \runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |    ? |                    ? |      6.904 ns |   0.0068 ns |   0.0133 ns |      6.904 ns |      6.881 ns |      6.941 ns |  0.35 |    0.00 |         - |
[2021/09/22 12:02:37][INFO] |                                      |                                 |                                     |            |                                                                                                                  |                 |                         |              |             |      |                      |               |             |             |               |               |               |       |         |           |
[2021/09/22 12:02:37][INFO] |                System.Numerics.Tests |                  Perf_Matrix3x2 |           NegationOperatorBenchmark | Job-EABBZP | \runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |    ? |                    ? |     10.288 ns |   0.0075 ns |   0.0144 ns |     10.287 ns |     10.256 ns |     10.324 ns |  1.00 |    0.00 |         - |
[2021/09/22 12:02:37][INFO] |                System.Numerics.Tests |                  Perf_Matrix3x2 |           NegationOperatorBenchmark | Job-YIMGWV |        \runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |    ? |                    ? |     10.463 ns |   0.0119 ns |   0.0232 ns |     10.462 ns |     10.419 ns |     10.524 ns |  1.02 |    0.00 |         - |
[2021/09/22 12:02:37][INFO] |                                      |                                 |                                     |            |                                                                                                                  |                 |                         |              |             |      |                      |               |             |             |               |               |               |       |         |           |
[2021/09/22 12:02:37][INFO] |                System.Numerics.Tests |                  Perf_Matrix3x2 |                        AddBenchmark | Job-EABBZP | \runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |    ? |                    ? |     10.865 ns |   0.0078 ns |   0.0155 ns |     10.864 ns |     10.836 ns |     10.904 ns |  1.00 |    0.00 |         - |
[2021/09/22 12:02:37][INFO] |                System.Numerics.Tests |                  Perf_Matrix3x2 |                        AddBenchmark | Job-YIMGWV |        \runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |    ? |                    ? |     11.125 ns |   0.0069 ns |   0.0133 ns |     11.123 ns |     11.099 ns |     11.159 ns |  1.02 |    0.00 |         - |
[2021/09/22 12:02:37][INFO] |                                      |                                 |                                     |            |                                                                                                                  |                 |                         |              |             |      |                      |               |             |             |               |               |               |       |         |           |
[2021/09/22 12:02:37][INFO] |                System.Numerics.Tests |                  Perf_Matrix3x2 |                       LerpBenchmark | Job-EABBZP | \runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |    ? |                    ? |     10.106 ns |   0.0355 ns |   0.0718 ns |     10.110 ns |      9.818 ns |     10.219 ns |  1.00 |    0.00 |         - |
[2021/09/22 12:02:37][INFO] |                System.Numerics.Tests |                  Perf_Matrix3x2 |                       LerpBenchmark | Job-YIMGWV |        \runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |    ? |                    ? |     10.102 ns |   0.0136 ns |   0.0268 ns |     10.100 ns |     10.045 ns |     10.161 ns |  1.00 |    0.01 |         - |
[2021/09/22 12:02:37][INFO] |                                      |                                 |                                     |            |                                                                                                                  |                 |                         |              |             |      |                      |               |             |             |               |               |               |       |         |           |
[2021/09/22 12:02:37][INFO] |                System.Numerics.Tests |                    Perf_Vector2 | TransformNormalByMatrix3x2Benchmark | Job-EABBZP | \runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |    ? |                    ? |      2.441 ns |   0.0043 ns |   0.0086 ns |      2.439 ns |      2.425 ns |      2.467 ns |  1.00 |    0.00 |         - |
[2021/09/22 12:02:37][INFO] |                System.Numerics.Tests |                    Perf_Vector2 | TransformNormalByMatrix3x2Benchmark | Job-YIMGWV |        \runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |    ? |                    ? |      2.465 ns |   0.0043 ns |   0.0081 ns |      2.465 ns |      2.451 ns |      2.485 ns |  1.01 |    0.00 |         - |
[2021/09/22 12:02:37][INFO] |                                      |                                 |                                     |            |                                                                                                                  |                 |                         |              |             |      |                      |               |             |             |               |               |               |       |         |           |
[2021/09/22 12:02:37][INFO] |                   System.Collections |                 Sort<BigStruct> |                 Array_ComparerClass | Job-BBDKJF | \runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |            5000 |                       6 |            1 |          -1 |  512 |                    ? | 18,309.981 ns |  39.4056 ns |  77.7827 ns | 18,298.790 ns | 18,179.920 ns | 18,525.080 ns |  1.00 |    0.00 |      64 B |
[2021/09/22 12:02:37][INFO] |                   System.Collections |                 Sort<BigStruct> |                 Array_ComparerClass | Job-SGJNTS |        \runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |            5000 |                       6 |            1 |          -1 |  512 |                    ? | 17,369.277 ns |  31.0949 ns |  59.9092 ns | 17,364.980 ns | 17,280.080 ns | 17,521.900 ns |  0.95 |    0.01 |      64 B |
[2021/09/22 12:02:37][INFO] |                                      |                                 |                                     |            |                                                                                                                  |                 |                         |              |             |      |                      |               |             |             |               |               |               |       |         |           |
[2021/09/22 12:02:37][INFO] |                   System.Collections |                 Sort<BigStruct> |                Array_ComparerStruct | Job-BBDKJF | \runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |            5000 |                       6 |            1 |          -1 |  512 |                    ? | 20,739.627 ns |  57.1131 ns | 110.0375 ns | 20,710.290 ns | 20,608.300 ns | 21,011.920 ns |  1.00 |    0.00 |      88 B |
[2021/09/22 12:02:37][INFO] |                   System.Collections |                 Sort<BigStruct> |                Array_ComparerStruct | Job-SGJNTS |        \runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |            5000 |                       6 |            1 |          -1 |  512 |                    ? | 18,781.162 ns |  63.7736 ns | 127.3627 ns | 18,739.540 ns | 18,655.720 ns | 19,083.100 ns |  0.91 |    0.01 |      88 B |
[2021/09/22 12:02:37][INFO] |                                      |                                 |                                     |            |                                                                                                                  |                 |                         |              |             |      |                      |               |             |             |               |               |               |       |         |           |
[2021/09/22 12:02:37][INFO] |                   System.Collections |                 Sort<BigStruct> |                    Array_Comparison | Job-BBDKJF | \runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |            5000 |                       6 |            1 |          -1 |  512 |                    ? | 18,387.790 ns |  48.0323 ns |  91.3865 ns | 18,394.220 ns | 18,241.740 ns | 18,567.620 ns |  1.00 |    0.00 |         - |
[2021/09/22 12:02:37][INFO] |                   System.Collections |                 Sort<BigStruct> |                    Array_Comparison | Job-SGJNTS |        \runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |            5000 |                       6 |            1 |          -1 |  512 |                    ? | 17,229.087 ns |  17.4683 ns |  35.2869 ns | 17,229.230 ns | 17,152.700 ns | 17,307.620 ns |  0.94 |    0.01 |         - |
[2021/09/22 12:02:37][INFO] |                                      |                                 |                                     |            |                                                                                                                  |                 |                         |              |             |      |                      |               |             |             |               |               |               |       |         |           |
[2021/09/22 12:02:37][INFO] |                   System.Collections | IterateForEachNonGeneric<Int32> |                               Queue | Job-EABBZP | \runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |  512 |                    ? |  3,764.629 ns |  32.6822 ns |  66.0196 ns |  3,729.569 ns |  3,696.452 ns |  3,872.348 ns |  1.00 |    0.00 |      40 B |
[2021/09/22 12:02:37][INFO] |                   System.Collections | IterateForEachNonGeneric<Int32> |                               Queue | Job-YIMGWV |        \runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |  512 |                    ? |  3,575.286 ns |  19.6849 ns |  36.9730 ns |  3,561.024 ns |  3,549.117 ns |  3,678.556 ns |  0.95 |    0.02 |      40 B |
[2021/09/22 12:02:37][INFO] |                                      |                                 |                                     |            |                                                                                                                  |                 |                         |              |             |      |                      |               |             |             |               |               |               |       |         |           |
[2021/09/22 12:02:37][INFO] |                        System.Memory |                      Span<Byte> |                   SequenceCompareTo | Job-EABBZP | \runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |  512 |                    ? |     12.714 ns |   0.0196 ns |   0.0383 ns |     12.719 ns |     12.662 ns |     12.814 ns |  1.00 |    0.00 |         - |
[2021/09/22 12:02:37][INFO] |                        System.Memory |                      Span<Byte> |                   SequenceCompareTo | Job-YIMGWV |        \runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |  512 |                    ? |     12.697 ns |   0.0217 ns |   0.0434 ns |     12.694 ns |     12.530 ns |     12.817 ns |  1.00 |    0.00 |         - |
[2021/09/22 12:02:37][INFO] |                                      |                                 |                                     |            |                                                                                                                  |                 |                         |              |             |      |                      |               |             |             |               |               |               |       |         |           |
[2021/09/22 12:02:37][INFO] |                   System.Collections |            ContainsFalse<Int32> |                  ImmutableSortedSet | Job-EABBZP | \runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |  512 |                    ? | 17,567.821 ns | 118.3061 ns | 236.2701 ns | 17,639.420 ns | 16,881.930 ns | 17,776.573 ns |  1.00 |    0.00 |         - |
[2021/09/22 12:02:37][INFO] |                   System.Collections |            ContainsFalse<Int32> |                  ImmutableSortedSet | Job-YIMGWV |        \runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |  512 |                    ? | 17,493.840 ns |  22.1354 ns |  44.2068 ns | 17,497.112 ns | 17,375.800 ns | 17,599.555 ns |  1.00 |    0.01 |         - |
[2021/09/22 12:02:37][INFO] |                                      |                                 |                                     |            |                                                                                                                  |                 |                         |              |             |      |                      |               |             |             |               |               |               |       |         |           |
[2021/09/22 12:02:37][INFO] |                   System.Collections |           IterateForEach<Int32> |                     ConcurrentStack | Job-EABBZP | \runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |  512 |                    ? |  2,668.227 ns |   2.4297 ns |   4.8523 ns |  2,669.479 ns |  2,658.646 ns |  2,675.242 ns |  1.00 |    0.00 |      40 B |
[2021/09/22 12:02:37][INFO] |                   System.Collections |           IterateForEach<Int32> |                     ConcurrentStack | Job-YIMGWV |        \runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |  512 |                    ? |  2,816.365 ns |   4.1483 ns |   8.1884 ns |  2,817.996 ns |  2,806.724 ns |  2,838.608 ns |  1.06 |    0.00 |      40 B |
[2021/09/22 12:02:37][INFO] |                                      |                                 |                                     |            |                                                                                                                  |                 |                         |              |             |      |                      |               |             |             |               |               |               |       |         |           |
[2021/09/22 12:02:37][INFO] | System.Text.RegularExpressions.Tests |               Perf_Regex_Common |                         Uri_IsMatch | Job-EABBZP | \runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |    ? |                 None |    265.914 ns |   0.5222 ns |   1.0061 ns |    265.682 ns |    264.148 ns |    268.109 ns |  1.00 |    0.00 |         - |
[2021/09/22 12:02:37][INFO] | System.Text.RegularExpressions.Tests |               Perf_Regex_Common |                         Uri_IsMatch | Job-YIMGWV |        \runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |    ? |                 None |    268.954 ns |   0.4903 ns |   0.9904 ns |    268.855 ns |    267.217 ns |    271.230 ns |  1.01 |    0.01 |         - |
[2021/09/22 12:02:37][INFO] |                                      |                                 |                                     |            |                                                                                                                  |                 |                         |              |             |      |                      |               |             |             |               |               |               |       |         |           |
[2021/09/22 12:02:37][INFO] | System.Text.RegularExpressions.Tests |               Perf_Regex_Common |                         Uri_IsMatch | Job-EABBZP | \runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |    ? |             Compiled |     79.100 ns |   0.1498 ns |   0.2851 ns |     79.133 ns |     78.384 ns |     79.658 ns |  1.00 |    0.00 |         - |
[2021/09/22 12:02:37][INFO] | System.Text.RegularExpressions.Tests |               Perf_Regex_Common |                         Uri_IsMatch | Job-YIMGWV |        \runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |    ? |             Compiled |     79.834 ns |   0.0811 ns |   0.1563 ns |     79.818 ns |     79.545 ns |     80.173 ns |  1.01 |    0.00 |         - |
[2021/09/22 12:02:37][INFO] |                                      |                                 |                                     |            |                                                                                                                  |                 |                         |              |             |      |                      |               |             |             |               |               |               |       |         |           |
[2021/09/22 12:02:37][INFO] | System.Text.RegularExpressions.Tests |               Perf_Regex_Common |                         Uri_IsMatch | Job-EABBZP | \runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |    ? | IgnoreCase, Compiled |    122.076 ns |   0.5854 ns |   1.1825 ns |    121.229 ns |    120.568 ns |    123.854 ns |  1.00 |    0.00 |         - |
[2021/09/22 12:02:37][INFO] | System.Text.RegularExpressions.Tests |               Perf_Regex_Common |                         Uri_IsMatch | Job-YIMGWV |        \runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |               1 |                 Default |           16 |           1 |    ? | IgnoreCase, Compiled |    121.028 ns |   0.1459 ns |   0.2705 ns |    120.929 ns |    120.693 ns |    121.651 ns |  0.99 |    0.01 |         - |
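
For anyone reading the Ratio column in the log above: it appears to be the mean of the `\runtime` job divided by the mean of the `\runtime-master` job, so values below 1.00 mean the `\runtime` build is faster. A minimal Python sketch, assuming that reading; `parse_row` is a hypothetical helper, and the toolchain path cell is shortened to `corerun.exe` for readability.

```python
def parse_row(line):
    """Return (job, mean_ns, reported_ratio) for one pipe-delimited log row."""
    # Drop the "[timestamp][INFO]" prefix, then split the row into cells.
    table = line.split("[INFO]", 1)[-1]
    cells = [c.strip() for c in table.strip().strip("|").split("|")]
    job = cells[3]
    # Counting from the right: Mean is the 9th cell from the end, Ratio the 3rd.
    mean_ns = float(cells[-9].replace(",", "").replace(" ns", ""))
    ratio = float(cells[-3])
    return job, mean_ns, ratio


# Array_ComparerStruct rows copied from the log (toolchain path shortened).
baseline_row = (
    "[2021/09/22 12:02:37][INFO] | System.Collections | Sort<BigStruct> | "
    "Array_ComparerStruct | Job-BBDKJF | corerun.exe | 5000 | 6 | 1 | -1 | 512 | ? | "
    "20,739.627 ns | 57.1131 ns | 110.0375 ns | 20,710.290 ns | 20,608.300 ns | "
    "21,011.920 ns | 1.00 | 0.00 | 88 B |"
)
diff_row = (
    "[2021/09/22 12:02:37][INFO] | System.Collections | Sort<BigStruct> | "
    "Array_ComparerStruct | Job-SGJNTS | corerun.exe | 5000 | 6 | 1 | -1 | 512 | ? | "
    "18,781.162 ns | 63.7736 ns | 127.3627 ns | 18,739.540 ns | 18,655.720 ns | "
    "19,083.100 ns | 0.91 | 0.01 | 88 B |"
)

_, base_mean, _ = parse_row(baseline_row)
job, diff_mean, reported = parse_row(diff_row)
computed = round(diff_mean / base_mean, 2)
print(job, computed, reported)
```

The recomputed value agrees with the reported one for this pair of rows; BenchmarkDotNet may derive the column from the full measurement distribution rather than from the two means, so small differences on other rows would not be surprising.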

The only case that isn't fixed by this is the System.Collections.IterateForEach<Int32>.ConcurrentStack scenario. I'm not sure that #55604 was the cause for this regression.

When I look at the disassembly produced by BDN, I don't see any block copies/inits or any SIMD use:

System.Collections.IterateForEach<Int32>.ConcurrentStack Disassembly

Quick diff comparison

PS C:\Users\acovingt> fc.exe C:\Users\acovingt\Documents\baseline-artifacts\results\System.Collections.IterateForEach_Int32_-asm.md C:\Users\acovingt\Documents\diff-artifacts\results\System.Collections.IterateForEach_Int32_-asm.md
Comparing files C:\USERS\ACOVINGT\DOCUMENTS\BASELINE-ARTIFACTS\RESULTS\System.Collections.IterateForEach_Int32_-asm.md and C:\USERS\ACOVINGT\DOCUMENTS\DIFF-ARTIFACTS\RESULTS\SYSTEM.COLLECTIONS.ITERATEFOREACH_INT32_-ASM.MD
***** C:\USERS\ACOVINGT\DOCUMENTS\BASELINE-ARTIFACTS\RESULTS\System.Collections.IterateForEach_Int32_-asm.md
       mov       rcx,rbx
       call      qword ptr [7FFE130973C0]
       test      eax,eax
***** C:\USERS\ACOVINGT\DOCUMENTS\DIFF-ARTIFACTS\RESULTS\SYSTEM.COLLECTIONS.ITERATEFOREACH_INT32_-ASM.MD
       mov       rcx,rbx
       call      qword ptr [7FFE130A73C0]
       test      eax,eax
*****

***** C:\USERS\ACOVINGT\DOCUMENTS\BASELINE-ARTIFACTS\RESULTS\System.Collections.IterateForEach_Int32_-asm.md
       mov       rcx,rbx
       mov       r11,7FFE12CC04A0
       call      qword ptr [7FFE130904A0]
       mov       esi,eax
***** C:\USERS\ACOVINGT\DOCUMENTS\DIFF-ARTIFACTS\RESULTS\SYSTEM.COLLECTIONS.ITERATEFOREACH_INT32_-ASM.MD
       mov       rcx,rbx
       mov       r11,7FFE12CD04A0
       call      qword ptr [7FFE130A04A0]
       mov       esi,eax
*****

***** C:\USERS\ACOVINGT\DOCUMENTS\BASELINE-ARTIFACTS\RESULTS\System.Collections.IterateForEach_Int32_-asm.md
       mov       rcx,rbx
       call      qword ptr [7FFE130973C0]
       test      eax,eax
***** C:\USERS\ACOVINGT\DOCUMENTS\DIFF-ARTIFACTS\RESULTS\SYSTEM.COLLECTIONS.ITERATEFOREACH_INT32_-ASM.MD
       mov       rcx,rbx
       call      qword ptr [7FFE130A73C0]
       test      eax,eax
*****

***** C:\USERS\ACOVINGT\DOCUMENTS\BASELINE-ARTIFACTS\RESULTS\System.Collections.IterateForEach_Int32_-asm.md
       mov       rcx,rbx
       call      qword ptr [7FFE130973B8]
       mov       eax,esi
***** C:\USERS\ACOVINGT\DOCUMENTS\DIFF-ARTIFACTS\RESULTS\SYSTEM.COLLECTIONS.ITERATEFOREACH_INT32_-ASM.MD
       mov       rcx,rbx
       call      qword ptr [7FFE130A73B8]
       mov       eax,esi
*****

***** C:\USERS\ACOVINGT\DOCUMENTS\BASELINE-ARTIFACTS\RESULTS\System.Collections.IterateForEach_Int32_-asm.md
       mov       rcx,[rbp+0FFE0]
       call      qword ptr [7FFE130973B8]
M00_L02:
***** C:\USERS\ACOVINGT\DOCUMENTS\DIFF-ARTIFACTS\RESULTS\SYSTEM.COLLECTIONS.ITERATEFOREACH_INT32_-ASM.MD
       mov       rcx,[rbp+0FFE0]
       call      qword ptr [7FFE130A73B8]
M00_L02:
*****
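
Every hunk that `fc.exe` reports above differs only in an absolute address. A minimal Python sketch of how that can be checked mechanically, by masking 12-digit hex addresses before comparing; the `ADDRESS` pattern and `normalize` helper are hypothetical names, and the assumption is that these values are just call targets and indirection cells that land at different locations in each process.

```python
import re

# Absolute addresses such as 7FFE130973C0 are 12 uppercase hex digits in this
# listing; shorter hex literals (0FFE0, 0FFFFFFFF) are deliberately left alone.
ADDRESS = re.compile(r"\b[0-9A-F]{12}\b")


def normalize(listing):
    """Strip blank lines and indentation, and mask absolute addresses."""
    return [
        ADDRESS.sub("<addr>", line.strip())
        for line in listing.splitlines()
        if line.strip()
    ]


# Second hunk from the fc.exe output above.
baseline = """
       mov       rcx,rbx
       mov       r11,7FFE12CC04A0
       call      qword ptr [7FFE130904A0]
       mov       esi,eax
"""
diff = """
       mov       rcx,rbx
       mov       r11,7FFE12CD04A0
       call      qword ptr [7FFE130A04A0]
       mov       esi,eax
"""

identical = normalize(baseline) == normalize(diff)
print("identical after masking addresses:", identical)
```

Running the same normalization over the two full `-asm.md` files would be one way to confirm that the generated code for this benchmark is unchanged apart from addresses.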

Baseline

; System.Collections.IterateForEach`1[[System.Int32, System.Private.CoreLib]].ConcurrentStack()
       push      rbp
       push      rdi
       push      rsi
       push      rbx
       sub       rsp,38
       lea       rbp,[rsp+50]
       mov       [rbp+0FFD0],rsp
       xor       esi,esi
       mov       rcx,[rcx+70]
       mov       rdi,[rcx+8]
       mov       rcx,offset MT_System.Collections.Concurrent.ConcurrentStack`1+<GetEnumerator>d__35[[System.Int32, System.Private.CoreLib]]
       call      CORINFO_HELP_NEWSFAST
       mov       rbx,rax
       xor       edx,edx
       mov       [rbx+18],edx
       lea       rcx,[rbx+8]
       mov       rdx,rdi
       call      CORINFO_HELP_ASSIGN_REF
       mov       [rbp+0FFE0],rbx
       mov       rcx,rbx
       call      qword ptr [7FFE130973C0]
       test      eax,eax
       je        short M00_L01
M00_L00:
       mov       rcx,rbx
       mov       r11,7FFE12CC04A0
       call      qword ptr [7FFE130904A0]
       mov       esi,eax
       mov       rcx,rbx
       call      qword ptr [7FFE130973C0]
       test      eax,eax
       jne       short M00_L00
M00_L01:
       mov       rcx,rbx
       call      qword ptr [7FFE130973B8]
       mov       eax,esi
       add       rsp,38
       pop       rbx
       pop       rsi
       pop       rdi
       pop       rbp
       ret
       push      rbp
       push      rdi
       push      rsi
       push      rbx
       sub       rsp,28
       mov       rbp,[rcx+20]
       mov       [rsp+20],rbp
       lea       rbp,[rbp+50]
       cmp       qword ptr [rbp+0FFE0],0
       je        short M00_L02
       mov       rcx,[rbp+0FFE0]
       call      qword ptr [7FFE130973B8]
M00_L02:
       nop
       add       rsp,28
       pop       rbx
       pop       rsi
       pop       rdi
       pop       rbp
       ret
; Total bytes of code 181
; System.Collections.Concurrent.ConcurrentStack`1+<GetEnumerator>d__35[[System.Int32, System.Private.CoreLib]].MoveNext()
       push      rdi
       push      rsi
       mov       rsi,rcx
       mov       eax,[rsi+18]
       test      eax,eax
       je        short M01_L00
       cmp       eax,1
       je        short M01_L02
       xor       eax,eax
       pop       rsi
       pop       rdi
       ret
M01_L00:
       mov       dword ptr [rsi+18],0FFFFFFFF
       mov       rdi,[rsi+8]
       lea       rcx,[rsi+10]
       mov       rdx,rdi
       call      CORINFO_HELP_ASSIGN_REF
       test      rdi,rdi
       je        short M01_L03
M01_L01:
       mov       rax,[rsi+10]
       mov       eax,[rax+10]
       mov       [rsi+1C],eax
       mov       dword ptr [rsi+18],1
       mov       eax,1
       pop       rsi
       pop       rdi
       ret
M01_L02:
       mov       dword ptr [rsi+18],0FFFFFFFF
       mov       rdx,[rsi+10]
       mov       rdi,[rdx+8]
       lea       rcx,[rsi+10]
       mov       rdx,rdi
       call      CORINFO_HELP_ASSIGN_REF
       test      rdi,rdi
       jne       short M01_L01
M01_L03:
       xor       eax,eax
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 112
; System.Collections.Concurrent.ConcurrentStack`1+<GetEnumerator>d__35[[System.Int32, System.Private.CoreLib]].System.IDisposable.Dispose()
       ret
; Total bytes of code 1

Updated to use GPR for remainder

; System.Collections.IterateForEach`1[[System.Int32, System.Private.CoreLib]].ConcurrentStack()
       push      rbp
       push      rdi
       push      rsi
       push      rbx
       sub       rsp,38
       lea       rbp,[rsp+50]
       mov       [rbp+0FFD0],rsp
       xor       esi,esi
       mov       rcx,[rcx+70]
       mov       rdi,[rcx+8]
       mov       rcx,offset MT_System.Collections.Concurrent.ConcurrentStack`1+<GetEnumerator>d__35[[System.Int32, System.Private.CoreLib]]
       call      CORINFO_HELP_NEWSFAST
       mov       rbx,rax
       xor       edx,edx
       mov       [rbx+18],edx
       lea       rcx,[rbx+8]
       mov       rdx,rdi
       call      CORINFO_HELP_ASSIGN_REF
       mov       [rbp+0FFE0],rbx
       mov       rcx,rbx
       call      qword ptr [7FFE130A73C0]
       test      eax,eax
       je        short M00_L01
M00_L00:
       mov       rcx,rbx
       mov       r11,7FFE12CD04A0
       call      qword ptr [7FFE130A04A0]
       mov       esi,eax
       mov       rcx,rbx
       call      qword ptr [7FFE130A73C0]
       test      eax,eax
       jne       short M00_L00
M00_L01:
       mov       rcx,rbx
       call      qword ptr [7FFE130A73B8]
       mov       eax,esi
       add       rsp,38
       pop       rbx
       pop       rsi
       pop       rdi
       pop       rbp
       ret
       push      rbp
       push      rdi
       push      rsi
       push      rbx
       sub       rsp,28
       mov       rbp,[rcx+20]
       mov       [rsp+20],rbp
       lea       rbp,[rbp+50]
       cmp       qword ptr [rbp+0FFE0],0
       je        short M00_L02
       mov       rcx,[rbp+0FFE0]
       call      qword ptr [7FFE130A73B8]
M00_L02:
       nop
       add       rsp,28
       pop       rbx
       pop       rsi
       pop       rdi
       pop       rbp
       ret
; Total bytes of code 181
; System.Collections.Concurrent.ConcurrentStack`1+<GetEnumerator>d__35[[System.Int32, System.Private.CoreLib]].MoveNext()
       push      rdi
       push      rsi
       mov       rsi,rcx
       mov       eax,[rsi+18]
       test      eax,eax
       je        short M01_L00
       cmp       eax,1
       je        short M01_L02
       xor       eax,eax
       pop       rsi
       pop       rdi
       ret
M01_L00:
       mov       dword ptr [rsi+18],0FFFFFFFF
       mov       rdi,[rsi+8]
       lea       rcx,[rsi+10]
       mov       rdx,rdi
       call      CORINFO_HELP_ASSIGN_REF
       test      rdi,rdi
       je        short M01_L03
M01_L01:
       mov       rax,[rsi+10]
       mov       eax,[rax+10]
       mov       [rsi+1C],eax
       mov       dword ptr [rsi+18],1
       mov       eax,1
       pop       rsi
       pop       rdi
       ret
M01_L02:
       mov       dword ptr [rsi+18],0FFFFFFFF
       mov       rdx,[rsi+10]
       mov       rdi,[rdx+8]
       lea       rcx,[rsi+10]
       mov       rdx,rdi
       call      CORINFO_HELP_ASSIGN_REF
       test      rdi,rdi
       jne       short M01_L01
M01_L03:
       xor       eax,eax
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 112
; System.Collections.Concurrent.ConcurrentStack`1+<GetEnumerator>d__35[[System.Int32, System.Private.CoreLib]].System.IDisposable.Dispose()
       ret
; Total bytes of code 1

Will submit a PR to fix this soon. Let me know if I can clarify anything.

@kunalspathak
Member

Thanks for promptly looking at this.

The only case that isn't fixed by this is the System.Collections.IterateForEach<Int32>.ConcurrentStack scenario. I'm not sure that #55604 was the cause for this regression.

image

This looks like noise, since the regression seems to have recovered. Don't worry about this benchmark.

@JulieLeeMSFT
Member

Moving this to .NET 7 since we have passed the .NET 6 RC2 snap.

@adamsitnik
Member

I confirm that System.Numerics.Tests.Perf_Matrix4x4.NegationOperatorBenchmark regression is still a thing for 7.0:

| Result | Base | Diff | Ratio | Operating System | Bit | Processor Name |
|--------|-----:|-----:|------:|------------------|-----|----------------|
| Slower | 4.58 | 7.99 | 0.57 | Windows 11 | X64 | AMD Ryzen Threadripper PRO 3945WX 12-Cores |
| Slower | 3.62 | 6.99 | 0.52 | Windows 11 | X64 | AMD Ryzen 9 5900X |
| Slower | 5.80 | 7.89 | 0.73 | Windows 10 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Same | 11.58 | 10.50 | 1.10 | Windows 11 | X64 | Intel Core i5-4300U CPU 1.90GHz (Haswell) |
| Same | 5.58 | 7.01 | 0.80 | Windows 10 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Same | 5.20 | 6.54 | 0.80 | Windows 11 | X64 | Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) |
| Same | 5.06 | 7.08 | 0.71 | Windows 11 | X64 | Intel Core i9-9900T CPU 2.10GHz |
| Slower | 8.45 | 10.77 | 0.78 | Windows 11 | X64 | Unknown processor |
| Same | 8.83 | 9.41 | 0.94 | Windows 11 | X64 | Unknown processor |
| Slower | 4.28 | 7.33 | 0.58 | ubuntu 20.04 | X64 | AMD Ryzen 9 5900X |
| Same | 7.32 | 8.63 | 0.85 | ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Same | 8.10 | 8.35 | 0.97 | centos 7 | X64 | Intel Xeon CPU E5530 2.40GHz |
| Same | 7.44 | 8.73 | 0.85 | ubuntu 18.04 | X64 | Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge) |
| Same | 6.03 | 8.02 | 0.75 | alpine 3.13 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Slower | 5.95 | 18.62 | 0.32 | ubuntu 18.04 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Slower | 5.95 | 8.28 | 0.72 | ubuntu 20.04 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Same | 11.59 | 11.56 | 1.00 | Windows 10 | Arm64 | Microsoft SQ1 3.0 GHz |
| Slower | 6.18 | 38.48 | 0.16 | Windows 10 | X86 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Slower | 27.73 | 32.56 | 0.85 | Windows 10 | Arm | Microsoft SQ1 3.0 GHz |
| Slower | 7.14 | 11.38 | 0.63 | macOS Big Sur 11.6.3 | X64 | Intel Core i5-4278U CPU 2.60GHz (Haswell) |
| Slower | 6.26 | 10.57 | 0.59 | macOS Big Sur 11.4 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) |
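
A note on reading the Ratio column in this table: it appears to be Base divided by Diff, so a value below 1 means the Diff build took longer. A small Python sketch that recomputes it for a few rows, assuming that reading; `comparer_ratio` is a hypothetical name. The Result label is not derived here, since rows with nearly equal ratios (0.71 "Same", 0.72 "Slower") carry different labels, which suggests it depends on more than the ratio alone.

```python
# (base, diff, reported ratio) copied from rows of the table above.
rows = [
    (4.58, 7.99, 0.57),    # Windows 11, Threadripper PRO 3945WX
    (3.62, 6.99, 0.52),    # Windows 11, Ryzen 9 5900X
    (11.58, 10.50, 1.10),  # Windows 11, Core i5-4300U
    (6.18, 38.48, 0.16),   # Windows 10 X86, Xeon E5-1650 v4
]


def comparer_ratio(base, diff):
    """Ratio as it seems to be reported here: base time over diff time."""
    return round(base / diff, 2)


for base, diff, reported in rows:
    print(base, diff, comparer_ratio(base, diff), reported)
```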

@adamsitnik
Member

Same goes for System.Collections.Sort<BigStruct>.Array_Comparison(Size: 512) but it's more hardware-specific

| Result | Base | Diff | Ratio | Operating System | Bit | Processor Name |
|--------|-----:|-----:|------:|------------------|-----|----------------|
| Same | 24599.98 | 23124.16 | 1.06 | Windows 11 | X64 | AMD Ryzen Threadripper PRO 3945WX 12-Cores |
| Faster | 22267.59 | 12209.82 | 1.82 | Windows 11 | X64 | AMD Ryzen 9 5900X |
| Same | 36100.30 | 33451.88 | 1.08 | Windows 10 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Slower | 47183.04 | 66218.78 | 0.71 | Windows 11 | X64 | Intel Core i5-4300U CPU 1.90GHz (Haswell) |
| Faster | 40294.54 | 28493.22 | 1.41 | Windows 10 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Same | 30304.77 | 27743.34 | 1.09 | Windows 11 | X64 | Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) |
| Same | 32046.90 | 32203.04 | 1.00 | Windows 11 | X64 | Intel Core i9-9900T CPU 2.10GHz |
| Same | 46669.91 | 43923.64 | 1.06 | Windows 11 | X64 | Unknown processor |
| Slower | 42951.49 | 50995.70 | 0.84 | Windows 11 | X64 | Unknown processor |
| Slower | 14197.04 | 37533.26 | 0.38 | ubuntu 20.04 | X64 | AMD Ryzen 9 5900X |
| Slower | 38299.66 | 55484.61 | 0.69 | ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Same | 72365.12 | 70844.77 | 1.02 | centos 7 | X64 | Intel Xeon CPU E5530 2.40GHz |
| Same | 48875.84 | 52673.91 | 0.93 | ubuntu 18.04 | X64 | Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge) |
| Slower | 36496.10 | 52447.27 | 0.70 | alpine 3.13 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Slower | 36711.46 | 52210.52 | 0.70 | ubuntu 18.04 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Slower | 37068.52 | 55272.38 | 0.67 | ubuntu 20.04 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Slower | 34329.86 | 49660.84 | 0.69 | ubuntu 20.04 | X64 | Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) |
| Same | 57540.97 | 56836.27 | 1.01 | ubuntu 20.04 | Arm64 | Unknown processor |
| Same | 58085.44 | 58366.52 | 1.00 | Windows 10 | Arm64 | Microsoft SQ1 3.0 GHz |
| Same | 59580.26 | 57693.12 | 1.03 | Windows 11 | Arm64 | Microsoft SQ1 3.0 GHz |
| Slower | 26869.67 | 50771.14 | 0.53 | Windows 11 | X86 | AMD Ryzen Threadripper PRO 3945WX 12-Cores |
| Slower | 39058.55 | 68290.00 | 0.57 | Windows 10 | X86 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Same | 82255.74 | 82176.45 | 1.00 | Windows 10 | Arm | Microsoft SQ1 3.0 GHz |
| Slower | 45699.41 | 71023.83 | 0.64 | macOS Big Sur 11.6.3 | X64 | Intel Core i5-4278U CPU 2.60GHz (Haswell) |
| Slower | 42134.09 | 68595.79 | 0.61 | macOS Big Sur 11.4 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) |

@kunalspathak
Member

@alexcovington - do you mind taking a look at the fresh numbers and checking them with and without #59497?

@alexcovington
Contributor

@kunalspathak Sorry for the delay. I'm not able to reproduce the regression on either my 3600 or 5900X systems.

I pulled the latest main and built two versions, one with and one without #59497.

Base commit is 77d6833.

To build the version without #59497 for comparison, I reverted commit b82e838:

git revert --strategy resolve b82e8389715b275

Here are the numbers from my Ryzen 5 3600 system (the results on my 5900X system are similar and don't show much change either):

System.Collections.Sort<BigStruct>.Array_Comparison
BenchmarkDotNet=v0.13.1.1694-nightly, OS=Windows 10 (10.0.19044.1466/21H2/November2021Update)
AMD Ryzen 5 3600, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-preview.2.22111.4
  [Host]     : .NET 7.0.0 (7.0.22.10302), X64 RyuJIT
  Job-VNDCGQ : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
  Job-ZZYUDP : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog  InvocationCount=5000  
IterationCount=30  IterationTime=250.0000 ms  LaunchCount=5  
MaxIterationCount=20  MinIterationCount=15  MinWarmupIterationCount=6  
UnrollFactor=1  WarmupCount=-1  

|           Method |        Job |                                                                                                      Toolchain | Size |     Mean |    Error |   StdDev |   Median |      Min |      Max | Ratio | RatioSD | Allocated | Alloc Ratio |
|----------------- |----------- |--------------------------------------------------------------------------------------------------------------- |----- |---------:|---------:|---------:|---------:|---------:|---------:|------:|--------:|----------:|------------:|
| Array_Comparison | Job-VNDCGQ | \runtime-base\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |  512 | 28.25 μs | 0.191 μs | 0.679 μs | 28.32 μs | 26.85 μs | 30.38 μs |  1.00 |    0.00 |         - |          NA |
| Array_Comparison | Job-ZZYUDP | \runtime-diff\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe |  512 | 27.84 μs | 0.106 μs | 0.380 μs | 27.80 μs | 26.92 μs | 29.05 μs |  0.99 |    0.03 |         - |          NA |

System.Numerics.Tests.Perf_Matrix4x4.NegationOperatorBenchmark
BenchmarkDotNet=v0.13.1.1694-nightly, OS=Windows 10 (10.0.19044.1466/21H2/November2021Update)
AMD Ryzen 5 3600, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-preview.2.22111.4
  [Host]     : .NET 7.0.0 (7.0.22.10302), X64 RyuJIT
  Job-HRURMT : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
  Job-KNGSMF : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog  IterationCount=30  
IterationTime=250.0000 ms  LaunchCount=5  MaxIterationCount=20  
MinIterationCount=15  WarmupCount=1  

|                    Method |        Job |                                                                                                      Toolchain |     Mean |     Error |    StdDev |   Median |      Min |      Max | Ratio | RatioSD | Allocated | Alloc Ratio |
|-------------------------- |----------- |--------------------------------------------------------------------------------------------------------------- |---------:|----------:|----------:|---------:|---------:|---------:|------:|--------:|----------:|------------:|
| NegationOperatorBenchmark | Job-HRURMT | \runtime-base\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | 9.258 ns | 0.1898 ns | 0.6657 ns | 9.368 ns | 8.492 ns | 10.24 ns |  1.00 |    0.00 |         - |          NA |
| NegationOperatorBenchmark | Job-KNGSMF | \runtime-diff\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | 9.318 ns | 0.1338 ns | 0.4709 ns | 9.373 ns | 8.502 ns | 10.02 ns |  1.01 |    0.06 |         - |          NA |

Please let me know if I can clarify anything.

@kunalspathak
Member

Perf_Matrix3x2 seems to have spikes historically:

image

image

image

@ghost ghost locked as resolved and limited conversation to collaborators Jul 4, 2022
@jozkee

jozkee commented Oct 14, 2022

A regression for System.Numerics.Tests.Perf_Matrix4x4.NegateBenchmark showed up in the 6.0 vs 7.0-rc2 report, potentially x64 specific. The historical data only goes back to October 2021, so we are unable to determine whether this is just noise or to see what might have caused it.

System.Numerics.Tests.Perf_Matrix4x4.NegateBenchmark

| Result | Ratio | Alloc Delta | Operating System | Bit | Processor Name | Modality |
|------- |------:|------------:|----------------- |---- |--------------- |--------- |
| Faster | 1.15 | +0 | ubuntu 18.04 | Arm64 | Unknown processor | |
| Slower | 0.67 | +0 | Windows 11 | Arm64 | Unknown processor | |
| Faster | 1.23 | +0 | Windows 11 | Arm64 | Microsoft SQ1 3.0 GHz | |
| Faster | 1.17 | +0 | Windows 11 | Arm64 | Microsoft SQ1 3.0 GHz | |
| Noise | - | +0 | macOS Monterey 12.6 | Arm64 | Apple M1 | |
| Noise | - | +0 | macOS Monterey 12.6 | Arm64 | Apple M1 Max | |
| Same | 0.96 | +0 | Windows 10 | X64 | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) | |
| Slower | 0.73 | +0 | Windows 11 | X64 | AMD Ryzen Threadripper PRO 3945WX 12-Cores | |
| Slower | 0.63 | +0 | Windows 11 | X64 | AMD Ryzen 9 5900X | |
| Slower | 0.67 | +0 | Windows 11 | X64 | AMD Ryzen 9 7950X | |
| Same | 0.91 | +0 | Windows 11 | X64 | Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) | several? |
| Same | 0.92 | +0 | debian 11 | X64 | Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) | |
| Slower | 0.29 | +0 | ubuntu 18.04 | X64 | AMD Ryzen 9 5900X | |
| Slower | 0.31 | +0 | ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | |
| Slower | 0.67 | +0 | ubuntu 20.04 | X64 | AMD Ryzen 9 5900X | |
| Same | 0.92 | +0 | ubuntu 20.04 | X64 | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) | |
| Same | 1.06 | +0 | ubuntu 20.04 | X64 | Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) | |
| Same | 0.92 | +0 | macOS Big Sur 11.7 | X64 | Intel Core i5-4278U CPU 2.60GHz (Haswell) | |
| Noise | - | +0 | macOS Monterey 12.6 | X64 | Intel Core i7-4870HQ CPU 2.50GHz (Haswell) | |
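
To make the "potentially x64 specific" reading easier to check, here is a rough Python sketch that tallies the rows above by architecture. The values are transcribed from the table; the helper names `slower_by_arch` and `worst` are hypothetical and not part of any reporting tool. It assumes, from the way the Result labels pair with the numbers, that a ratio below 1 means the 7.0-rc2 run was slower.

```python
# Rows transcribed from the table above: (result, ratio, os, arch, processor).
# ratio is None where the report shows "-" (rows classified as Noise).
ROWS = [
    ("Faster", 1.15, "ubuntu 18.04", "Arm64", "Unknown processor"),
    ("Slower", 0.67, "Windows 11", "Arm64", "Unknown processor"),
    ("Faster", 1.23, "Windows 11", "Arm64", "Microsoft SQ1 3.0 GHz"),
    ("Faster", 1.17, "Windows 11", "Arm64", "Microsoft SQ1 3.0 GHz"),
    ("Noise", None, "macOS Monterey 12.6", "Arm64", "Apple M1"),
    ("Noise", None, "macOS Monterey 12.6", "Arm64", "Apple M1 Max"),
    ("Same", 0.96, "Windows 10", "X64", "Intel Core i7-8650U"),
    ("Slower", 0.73, "Windows 11", "X64", "AMD Ryzen Threadripper PRO 3945WX"),
    ("Slower", 0.63, "Windows 11", "X64", "AMD Ryzen 9 5900X"),
    ("Slower", 0.67, "Windows 11", "X64", "AMD Ryzen 9 7950X"),
    ("Same", 0.91, "Windows 11", "X64", "Intel Core i7-8700"),
    ("Same", 0.92, "debian 11", "X64", "Intel Core i7-8700"),
    ("Slower", 0.29, "ubuntu 18.04", "X64", "AMD Ryzen 9 5900X"),
    ("Slower", 0.31, "ubuntu 18.04", "X64", "Intel Xeon CPU E5-1650 v4"),
    ("Slower", 0.67, "ubuntu 20.04", "X64", "AMD Ryzen 9 5900X"),
    ("Same", 0.92, "ubuntu 20.04", "X64", "Intel Core i7-8650U"),
    ("Same", 1.06, "ubuntu 20.04", "X64", "Intel Core i7-8700"),
    ("Same", 0.92, "macOS Big Sur 11.7", "X64", "Intel Core i5-4278U"),
    ("Noise", None, "macOS Monterey 12.6", "X64", "Intel Core i7-4870HQ"),
]


def slower_by_arch(rows):
    """Return {arch: (total_rows, rows_marked_slower)}."""
    counts = {}
    for result, _ratio, _os, arch, _cpu in rows:
        total, slower = counts.get(arch, (0, 0))
        counts[arch] = (total + 1, slower + (result == "Slower"))
    return counts


def worst(rows):
    """Return the row with the smallest reported ratio, skipping Noise rows."""
    measured = [row for row in rows if row[1] is not None]
    return min(measured, key=lambda row: row[1])


if __name__ == "__main__":
    for arch, (total, slower) in slower_by_arch(ROWS).items():
        print(f"{arch}: {slower} of {total} configurations marked Slower")
    print("Lowest ratio:", worst(ROWS))
```

On this data the tally gives 1 of 6 Arm64 configurations and 6 of 13 X64 configurations marked Slower, with the lowest ratio (0.29) on ubuntu 18.04 X64 with the AMD Ryzen 9 5900X.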

@jozkee jozkee reopened this Oct 14, 2022
@jozkee

jozkee commented Oct 14, 2022

Actually, this can be closed as a duplicate of #65191, which is tracking System.Numerics.Tests.Perf_Matrix4x4.NegateBenchmark and other regressions in the same area.

@jozkee jozkee closed this as completed Oct 14, 2022