Perf regressions in Perf_Matrix benchmarks #59415
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
Tagging subscribers to this area: @JulieLeeMSFT
Issue Details
Run Information
Regressions in System.IO.Tests.BinaryWriterTests
Repro
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.IO.Tests.BinaryWriterTests*'
Payloads
Histogram
System.IO.Tests.BinaryWriterTests.WriteAsciiChar
Docs
Profiling workflow for dotnet/runtime repository
Regressions in System.Numerics.Tests.Perf_Matrix3x2
Repro
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Numerics.Tests.Perf_Matrix3x2*'
Payloads
Histogram
System.Numerics.Tests.Perf_Matrix3x2.InequalityOperatorBenchmark
System.Numerics.Tests.Perf_Matrix3x2.LerpBenchmark
System.Numerics.Tests.Perf_Matrix3x2.NegationOperatorBenchmark
System.Numerics.Tests.Perf_Matrix3x2.AddBenchmark
Docs
Profiling workflow for dotnet/runtime repository
Run Information
Regressions in System.Text.Json.Serialization.Tests.ReadJson<Nullable<DateTimeOffset>>
Repro
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Text.Json.Serialization.Tests.ReadJson<Nullable<DateTimeOffset>>*'
Payloads
Histogram
System.Text.Json.Serialization.Tests.ReadJson<Nullable<DateTimeOffset>>.DeserializeFromUtf8Bytes
Docs
Profiling workflow for dotnet/runtime repository
Run Information
Regressions in System.Collections.Sort<BigStruct>
Repro
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Collections.Sort<BigStruct>*'
Payloads
Histogram
System.Collections.Sort<BigStruct>.Array_ComparerStruct(Size: 512)
System.Collections.Sort<BigStruct>.Array_Comparison(Size: 512)
System.Collections.Sort<BigStruct>.Array_ComparerClass(Size: 512)
Docs
Profiling workflow for dotnet/runtime repository
Run Information
Regressions in System.Collections.IterateForEachNonGeneric<Int32>
Repro
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Collections.IterateForEachNonGeneric<Int32>*'
Payloads
Histogram
System.Collections.IterateForEachNonGeneric<Int32>.Queue(Size: 512)
Docs
Profiling workflow for dotnet/runtime repository
Run Information
Regressions in System.Collections.IterateForEach<Int32>
Repro
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Collections.IterateForEach<Int32>*'
Payloads
Histogram
System.Collections.IterateForEach<Int32>.ConcurrentStack(Size: 512)
Docs
Profiling workflow for dotnet/runtime repository
Run Information
Regressions in System.Collections.ContainsFalse<Int32>
Repro
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Collections.ContainsFalse<Int32>*'
Payloads
Histogram
System.Collections.ContainsFalse<Int32>.ImmutableSortedSet(Size: 512)
Docs
Profiling workflow for dotnet/runtime repository
Run Information
Regressions in System.Numerics.Tests.Perf_Vector2
Repro
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Numerics.Tests.Perf_Vector2*'
Payloads
Histogram
System.Numerics.Tests.Perf_Vector2.TransformNormalByMatrix3x2Benchmark
Docs
Profiling workflow for dotnet/runtime repository
Run Information
Regressions in System.Memory.Span<Byte>
Repro
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Memory.Span<Byte>*'
Payloads
Histogram
System.Memory.Span<Byte>.SequenceCompareTo(Size: 512)
Docs
Profiling workflow for dotnet/runtime repository
Run Information
Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Common
Repro
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Common*'
Payloads
Histogram
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Uri_IsMatch(Options: IgnoreCase, Compiled)
Docs
Profiling workflow for dotnet/runtime repository
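The repro steps above all follow one pattern: the same script, the same `-f net6.0` argument, and a `--filter` that is the benchmark class name plus a trailing `*`. As a rough sketch of that pattern (the helper name `build_repro_command` and the list of classes are illustrative assumptions, not part of dotnet/performance), the argument lists could be built like this:

```python
from typing import List

# Script path and flags are copied from the repro steps in this issue.
# Everything else here (function name, default, class list) is hypothetical.
SCRIPT = r".\performance\scripts\benchmarks_ci.py"


def build_repro_command(benchmark_class: str, framework: str = "net6.0") -> List[str]:
    """Return the argv for one repro run covering every benchmark in a class."""
    return ["python3", SCRIPT, "-f", framework, "--filter", f"{benchmark_class}*"]


# A few of the regressed classes named in the issue details.
REGRESSED_CLASSES = [
    "System.Numerics.Tests.Perf_Matrix3x2",
    "System.Numerics.Tests.Perf_Vector2",
    "System.Collections.Sort<BigStruct>",
]

if __name__ == "__main__":
    for name in REGRESSED_CLASSES:
        print(build_repro_command(name))
```

Keeping the command as a list rather than a single string means the `<` and `>` in generic class names need no shell quoting if the list is handed to something like `subprocess.run`.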
Introduced in #55604
windows/x64 regression - dotnet/perf-autofiling-issues#1487
windows/x86 regression - dotnet/perf-autofiling-issues#1492
AMD/Windows x64 regression - dotnet/perf-autofiling-issues#1510
Improvements in Linux/x64 - dotnet/perf-autofiling-issues#1483
Improvements in windows/x86 - dotnet/perf-autofiling-issues#1496
Improvement AMD windows/x64 - dotnet/perf-autofiling-issues#1515
Thanks, @kunalspathak. Will take a look and update with any findings.
Looks like most of these regressions are due to pipeline stalls, specifically when handling the remainder of a block copy/init that doesn't divide evenly into SIMD moves. For example, Compare generates block copies using only AVX:
; System.Numerics.Tests.Perf_Vector2.TransformNormalByMatrix3x2Benchmark()
...
vmovdqu xmm0,xmmword ptr [rsp+38]
vmovdqu xmmword ptr [rsp+20],xmm0
vmovdqu xmm0,xmmword ptr [rsp+40]
vmovdqu xmmword ptr [rsp+28],xmm0
...
; Total bytes of code 142
But Baseline uses AVX and handles the remainder with GPR:
; System.Numerics.Tests.Perf_Vector2.TransformNormalByMatrix3x2Benchmark()
...
vmovdqu xmm0,xmmword ptr [rsp+38]
vmovdqu xmmword ptr [rsp+20],xmm0
mov rax,[rsp+48]
mov [rsp+30],rax
...
; Total bytes of code 140
Reverting the behavior to use GPRs for the remainder fixes most of the regressions on my local machine.
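To make the difference between the two listings concrete: both copy 24 bytes (source offsets 0x38 through 0x4F), but Compare covers the last 8 bytes with a second, overlapping 16-byte move, while Baseline finishes with one 8-byte GPR move. The sketch below models only that decomposition. It is not the JIT's implementation, and the function names and the `(kind, offset, width)` tuples are made up for illustration.

```python
from typing import List, Tuple

Move = Tuple[str, int, int]  # (register kind, byte offset, width in bytes)

XMM_BYTES = 16  # one xmm load/store pair moves 16 bytes
GPR_BYTES = 8   # one 64-bit general-purpose register moves 8 bytes


def plan_simd_only(size: int) -> List[Move]:
    """16-byte moves only; a trailing remainder is covered by shifting the
    last move back so it overlaps the previous one."""
    if size < XMM_BYTES:
        raise ValueError("sketch only covers blocks of at least 16 bytes")
    moves = [("xmm", off, XMM_BYTES)
             for off in range(0, size - XMM_BYTES + 1, XMM_BYTES)]
    if size % XMM_BYTES:
        moves.append(("xmm", size - XMM_BYTES, XMM_BYTES))
    return moves


def plan_simd_then_gpr(size: int) -> List[Move]:
    """16-byte moves while they fit, then 8-byte GPR moves for the remainder."""
    if size % GPR_BYTES:
        raise ValueError("sketch only covers sizes that are a multiple of 8")
    moves: List[Move] = []
    off = 0
    while size - off >= XMM_BYTES:
        moves.append(("xmm", off, XMM_BYTES))
        off += XMM_BYTES
    while off < size:
        moves.append(("gpr", off, GPR_BYTES))
        off += GPR_BYTES
    return moves


if __name__ == "__main__":
    print(plan_simd_only(24))      # [('xmm', 0, 16), ('xmm', 8, 16)]
    print(plan_simd_then_gpr(24))  # [('xmm', 0, 16), ('gpr', 16, 8)]
```

For 24 bytes the first plan matches the Compare listing (loads at +0x38 and +0x40) and the second matches Baseline (loads at +0x38 and +0x48). For sizes that are a multiple of 16 the two plans are identical, which fits the observation that only the remainder handling changed.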
The only case that isn't fixed by this is the System.Collections.IterateForEach.ConcurrentStack benchmark. When I look at the disassembly produced by BDN, I don't see any block copies/inits or any SIMD use:
System.Collections.IterateForEach.ConcurrentStack Disassembly
Baseline
; System.Collections.IterateForEach`1[[System.Int32, System.Private.CoreLib]].ConcurrentStack()
push rbp
push rdi
push rsi
push rbx
sub rsp,38
lea rbp,[rsp+50]
mov [rbp+0FFD0],rsp
xor esi,esi
mov rcx,[rcx+70]
mov rdi,[rcx+8]
mov rcx,offset MT_System.Collections.Concurrent.ConcurrentStack`1+<GetEnumerator>d__35[[System.Int32, System.Private.CoreLib]]
call CORINFO_HELP_NEWSFAST
mov rbx,rax
xor edx,edx
mov [rbx+18],edx
lea rcx,[rbx+8]
mov rdx,rdi
call CORINFO_HELP_ASSIGN_REF
mov [rbp+0FFE0],rbx
mov rcx,rbx
call qword ptr [7FFE130973C0]
test eax,eax
je short M00_L01
M00_L00:
mov rcx,rbx
mov r11,7FFE12CC04A0
call qword ptr [7FFE130904A0]
mov esi,eax
mov rcx,rbx
call qword ptr [7FFE130973C0]
test eax,eax
jne short M00_L00
M00_L01:
mov rcx,rbx
call qword ptr [7FFE130973B8]
mov eax,esi
add rsp,38
pop rbx
pop rsi
pop rdi
pop rbp
ret
push rbp
push rdi
push rsi
push rbx
sub rsp,28
mov rbp,[rcx+20]
mov [rsp+20],rbp
lea rbp,[rbp+50]
cmp qword ptr [rbp+0FFE0],0
je short M00_L02
mov rcx,[rbp+0FFE0]
call qword ptr [7FFE130973B8]
M00_L02:
nop
add rsp,28
pop rbx
pop rsi
pop rdi
pop rbp
ret
; Total bytes of code 181
; System.Collections.Concurrent.ConcurrentStack`1+<GetEnumerator>d__35[[System.Int32, System.Private.CoreLib]].MoveNext()
push rdi
push rsi
mov rsi,rcx
mov eax,[rsi+18]
test eax,eax
je short M01_L00
cmp eax,1
je short M01_L02
xor eax,eax
pop rsi
pop rdi
ret
M01_L00:
mov dword ptr [rsi+18],0FFFFFFFF
mov rdi,[rsi+8]
lea rcx,[rsi+10]
mov rdx,rdi
call CORINFO_HELP_ASSIGN_REF
test rdi,rdi
je short M01_L03
M01_L01:
mov rax,[rsi+10]
mov eax,[rax+10]
mov [rsi+1C],eax
mov dword ptr [rsi+18],1
mov eax,1
pop rsi
pop rdi
ret
M01_L02:
mov dword ptr [rsi+18],0FFFFFFFF
mov rdx,[rsi+10]
mov rdi,[rdx+8]
lea rcx,[rsi+10]
mov rdx,rdi
call CORINFO_HELP_ASSIGN_REF
test rdi,rdi
jne short M01_L01
M01_L03:
xor eax,eax
pop rsi
pop rdi
ret
; Total bytes of code 112
; System.Collections.Concurrent.ConcurrentStack`1+<GetEnumerator>d__35[[System.Int32, System.Private.CoreLib]].System.IDisposable.Dispose()
ret
; Total bytes of code 1
Updated to use GPR for remainder
; System.Collections.IterateForEach`1[[System.Int32, System.Private.CoreLib]].ConcurrentStack()
push rbp
push rdi
push rsi
push rbx
sub rsp,38
lea rbp,[rsp+50]
mov [rbp+0FFD0],rsp
xor esi,esi
mov rcx,[rcx+70]
mov rdi,[rcx+8]
mov rcx,offset MT_System.Collections.Concurrent.ConcurrentStack`1+<GetEnumerator>d__35[[System.Int32, System.Private.CoreLib]]
call CORINFO_HELP_NEWSFAST
mov rbx,rax
xor edx,edx
mov [rbx+18],edx
lea rcx,[rbx+8]
mov rdx,rdi
call CORINFO_HELP_ASSIGN_REF
mov [rbp+0FFE0],rbx
mov rcx,rbx
call qword ptr [7FFE130A73C0]
test eax,eax
je short M00_L01
M00_L00:
mov rcx,rbx
mov r11,7FFE12CD04A0
call qword ptr [7FFE130A04A0]
mov esi,eax
mov rcx,rbx
call qword ptr [7FFE130A73C0]
test eax,eax
jne short M00_L00
M00_L01:
mov rcx,rbx
call qword ptr [7FFE130A73B8]
mov eax,esi
add rsp,38
pop rbx
pop rsi
pop rdi
pop rbp
ret
push rbp
push rdi
push rsi
push rbx
sub rsp,28
mov rbp,[rcx+20]
mov [rsp+20],rbp
lea rbp,[rbp+50]
cmp qword ptr [rbp+0FFE0],0
je short M00_L02
mov rcx,[rbp+0FFE0]
call qword ptr [7FFE130A73B8]
M00_L02:
nop
add rsp,28
pop rbx
pop rsi
pop rdi
pop rbp
ret
; Total bytes of code 181
; System.Collections.Concurrent.ConcurrentStack`1+<GetEnumerator>d__35[[System.Int32, System.Private.CoreLib]].MoveNext()
push rdi
push rsi
mov rsi,rcx
mov eax,[rsi+18]
test eax,eax
je short M01_L00
cmp eax,1
je short M01_L02
xor eax,eax
pop rsi
pop rdi
ret
M01_L00:
mov dword ptr [rsi+18],0FFFFFFFF
mov rdi,[rsi+8]
lea rcx,[rsi+10]
mov rdx,rdi
call CORINFO_HELP_ASSIGN_REF
test rdi,rdi
je short M01_L03
M01_L01:
mov rax,[rsi+10]
mov eax,[rax+10]
mov [rsi+1C],eax
mov dword ptr [rsi+18],1
mov eax,1
pop rsi
pop rdi
ret
M01_L02:
mov dword ptr [rsi+18],0FFFFFFFF
mov rdx,[rsi+10]
mov rdi,[rdx+8]
lea rcx,[rsi+10]
mov rdx,rdi
call CORINFO_HELP_ASSIGN_REF
test rdi,rdi
jne short M01_L01
M01_L03:
xor eax,eax
pop rsi
pop rdi
ret
; Total bytes of code 112
; System.Collections.Concurrent.ConcurrentStack`1+<GetEnumerator>d__35[[System.Int32, System.Private.CoreLib]].System.IDisposable.Dispose()
ret
; Total bytes of code 1
Will submit a PR to fix this soon. Let me know if I can clarify anything.
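The two ConcurrentStack listings above appear to differ only in absolute addresses (for example `7FFE130973C0` versus `7FFE130A73C0`). One way to check that mechanically is sketched below: mask the addresses, compare what is left, and scan for SIMD registers. The helper names are hypothetical, and the rule that an absolute address is a 12-digit hex token is an assumption based on these listings.

```python
import re
from typing import List

# Assumption: absolute addresses in these listings are 12-digit hex tokens
# such as 7FFE130973C0; shorter constants such as 0FFFFFFFF are left alone.
ADDRESS = re.compile(r"\b[0-9A-F]{12}\b")
# xmm/ymm/zmm register names, e.g. xmm0 (does not match the "xmmword" keyword).
SIMD_REGISTER = re.compile(r"\b[xyz]mm\d+\b", re.IGNORECASE)


def normalize(listing: str) -> List[str]:
    """Trim each line, drop blank lines, and mask absolute addresses."""
    return [ADDRESS.sub("<addr>", line.strip())
            for line in listing.splitlines() if line.strip()]


def same_code(a: str, b: str) -> bool:
    """True when two listings match once addresses are masked."""
    return normalize(a) == normalize(b)


def uses_simd(listing: str) -> bool:
    """True when any SIMD register name appears in the listing."""
    return SIMD_REGISTER.search(listing) is not None


if __name__ == "__main__":
    baseline = "mov rcx,rbx\ncall qword ptr [7FFE130973C0]\ntest eax,eax"
    updated = "mov rcx,rbx\ncall qword ptr [7FFE130A73C0]\ntest eax,eax"
    print(same_code(baseline, updated))  # True
    print(uses_simd(baseline))           # False
```

If `same_code` holds for the full listings and `uses_simd` is false for both, the codegen for this benchmark did not change between the two builds, so any timing difference has to come from somewhere other than block copy/init.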
Thanks for promptly looking at this.
I see that this one is noise, because the regression seems to have recovered. Don't worry about this benchmark.
Moving this to .NET 7 since we passed .NET 6 RC2 snap.
I confirm that
Same goes for
@alexcovington - do you mind taking a look at fresh numbers and seeing how they compare with and without #59497?
@kunalspathak Sorry for the delay. I'm not able to reproduce the regression on either my 3600 or 5900X systems. I pulled the latest main and built two versions, one with and one without #59497. Base commit is 77d6833. To remove #59497 for the comparison, I reverted commit b82e838.
Here are my numbers on my Ryzen 5 3600 system (the results are similar on my 5900X system, where I don't see much change either):
System.Collections.Sort.Array_Comparison
System.Numerics.Tests.Perf_Matrix4x4.NegationOperatorBenchmark
Please let me know if I can clarify anything.
A regression for System.Numerics.Tests.Perf_Matrix4x4.NegateBenchmark
Actually, this can be closed as a duplicate of #65191, which is tracking System.Numerics.Tests.Perf_Matrix4x4.NegateBenchmark and other regressions in the same area.