[Perf] Windows/arm64: 1 Regression on 2/5/2024 4:01:54 PM #98176

performanceautofiler · 2024-02-08T07:48:08Z

Run Information

Name	Value
Architecture	arm64
OS	Windows 10.0.19041
Queue	SurfaceWindows
Baseline	64822a667f1f0b204ca030de78c5de2362e48029
Compare	89bba37a92053f52e38d5a5c81a1f3510319dabf
Diff	Diff
Configs	CompilationMode:tiered, RunKind:micro

Regressions in Benchstone.BenchI.Array2

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
Test (duration of single invocation)	1.23 secs	1.31 secs	1.07	0.00	False
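The Test/Base column is the ratio of the two mean durations. As a quick sanity check, here is that arithmetic in C#. This is a sketch using the rounded values from the table, so the result is approximate:

```csharp
using System;

// Mean durations reported in the table above, in seconds (rounded to 2 decimals).
double baseline = 1.23;
double test = 1.31;

// Test/Base > 1.0 means the compare build is slower than the baseline build.
double ratio = test / baseline;
double regressionPercent = (ratio - 1.0) * 100.0;

Console.WriteLine($"Test/Base = {ratio:F2} ({regressionPercent:F1}% slower)");
```

With these inputs the ratio is about 1.065, which rounds to the 1.07 shown in the table, i.e. roughly a 6.5% slowdown.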

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

```
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Benchstone.BenchI.Array2*'
```

Payloads

Baseline
Compare

Benchstone.BenchI.Array2.Test

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

DrewScoggins · 2024-02-08T17:32:25Z

#97921

ghost · 2024-02-09T17:19:31Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	performanceautofiler[bot]
Assignees:	-
Labels:	`arch-arm64`, `os-windows`, `area-CodeGen-coreclr`, `untriaged`, `runtime-coreclr`, `needs-area-label`
Milestone:	-

jakobbotsch · 2024-06-24T13:32:02Z

@EgorBot help

EgorBot · 2024-06-24T13:32:14Z

EgorBot manual

Usage: @EgorBot [-%target%] [-profiler] [raw args for BDN] `C# snippet surrounded with triple ticks`

-%target%:       Can be -arm64, -amd or -intel. Or multiple at once, e.g. '-amd -intel'
                 -intel is used when none of the targets are specified.
-profiler:       Use 'perf record' to collect a flamegraph/hot asm - shouldn't be used 
                 when the given benchmark snippet contains more than one [Benchmark]
                 Disabled by default.
-mono:           Use Mono runtime instead of CoreCLR for all targets. Should be possible to use
                 Mono interp too (LLVM is not supported yet).
                 Mono doesn't support -profiler (at least JIT)
                 To use mono-interp, use BDN args, e.g. --envvars MONO_ENV_OPTIONS:"--interpreter"
                 Disabled by default.
-[args for BDN]: Args directly passed to BDN e.g. '--disasm', see
                 https://github.com/dotnet/BenchmarkDotNet/blob/master/docs/articles/guides/console-args.md

All targets are Linux-only at the moment.
NOTE: BenchmarkRunner.Run or BenchmarkSwitcher.From* can be omitted (snippet without an entrypoint)
However, if they are present, Program's args must be forwarded to Run(args: args).

NOTE: [DisassemblyDiagnoser] may cause unexpected crashes in BDN on Linux (at least on x64)

Usage example: link

jakobbotsch · 2024-06-24T13:35:32Z

@EgorBot -arm64 -perf -commit 1ed67a5 vs 7c90c57 --disasm

```csharp
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.
//

using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;

namespace Benchstone.BenchI
{
    public class Array2
    {
        public const int Iterations = 500000;

        // Allocates a jagged array of shape n1 x n2 x n3.
        static T[][][] AllocArray<T>(int n1, int n2, int n3)
        {
            T[][][] a = new T[n1][][];
            for (int i = 0; i < n1; ++i)
            {
                a[i] = new T[n2][];
                for (int j = 0; j < n2; j++)
                {
                    a[i][j] = new T[n3];
                }
            }

            return a;
        }

        // Fills the 10 x 10 x 10 source array with deterministic values.
        static void Initialize(int[][][] s)
        {
            for (int i = 0; i < 10; i++)
            {
                for (int j = 0; j < 10; j++)
                {
                    for (int k = 0; k < 10; k++)
                    {
                        s[i][j][k] = (2 * i) - (3 * j) + (5 * k);
                    }
                }
            }
        }

        // Returns true if every element of d matches the corresponding element of s.
        static bool VerifyCopy(int[][][] s, int[][][] d)
        {
            for (int i = 0; i < 10; i++)
            {
                for (int j = 0; j < 10; j++)
                {
                    for (int k = 0; k < 10; k++)
                    {
                        if (s[i][j][k] != d[i][j][k])
                        {
                            return false;
                        }
                    }
                }
            }

            return true;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        static bool Bench(int loop)
        {
            int[][][] s = AllocArray<int>(10, 10, 10);
            int[][][] d = AllocArray<int>(10, 10, 10);

            Initialize(s);

            // Hot loop: copy all 1,000 elements of s into d, `loop` times.
            // Each access walks two levels of reference arrays before
            // reaching the int element.
            for (; loop != 0; loop--)
            {
                for (int i = 0; i < 10; i++)
                {
                    for (int j = 0; j < 10; j++)
                    {
                        for (int k = 0; k < 10; k++)
                        {
                            d[i][j][k] = s[i][j][k];
                        }
                    }
                }
            }

            bool result = VerifyCopy(s, d);

            return result;
        }

        [Benchmark]
        public bool Test() => Bench(Iterations);
    }
}
```
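To put the size of the regression in perspective, the report's mean durations can be divided by the work done per invocation: Bench(Iterations) makes Iterations passes over 10 x 10 x 10 elements. The sketch below does that back-of-the-envelope division using the rounded 1.23 s and 1.31 s from the Windows/arm64 table. Attributing the whole invocation to the copy loop is an assumption; the allocation, Initialize and VerifyCopy work is ignored because it runs only once per call.

```csharp
using System;

// Shape of the copy loop in Bench: Iterations passes over a 10x10x10 array.
const long Iterations = 500_000;
const long ElementsPerPass = 10 * 10 * 10;
long copies = Iterations * ElementsPerPass; // element copies per invocation

// Mean durations from the autofiler table, converted to nanoseconds.
double baselineNs = 1.23e9;
double testNs = 1.31e9;

// Approximate time per d[i][j][k] = s[i][j][k] copy (one-off setup ignored).
double baselinePerCopy = baselineNs / copies;
double testPerCopy = testNs / copies;

Console.WriteLine($"copies per invocation: {copies}");
Console.WriteLine($"baseline: {baselinePerCopy:F2} ns/copy, test: {testPerCopy:F2} ns/copy");
Console.WriteLine($"extra cost: {testPerCopy - baselinePerCopy:F2} ns/copy");
```

That is 500,000,000 copies per invocation, or about 2.46 ns versus 2.62 ns per copied element: roughly 0.16 ns of extra work per element.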

EgorBot · 2024-06-24T13:57:08Z

Benchmark results on Arm64

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-FZGSMY : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-QSREDL : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD

Method	Toolchain	Mean	Error	Ratio	Code Size
Test	Main	1.376 s	0.0001 s	1.00	760 B
Test	PR	1.293 s	0.0006 s	0.94	760 B

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

jakobbotsch · 2024-06-24T14:07:07Z

(I flipped the base and PR above; there is an actual ~5% regression.)
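BDN's Ratio column divides each mean by the Main row's mean, so the printed 0.94 is 1.293 / 1.376. Since base and PR were swapped in the command, the slowdown of the newer build is the inverse. A small sketch of that arithmetic, using the means from the EgorBot table:

```csharp
using System;

// Mean times from EgorBot's table, in seconds. The rows are labelled Main and PR,
// but the two commits were passed in the opposite order.
double labelledMain = 1.376;
double labelledPr = 1.293;

// Ratio as BDN printed it (PR row / Main row).
double printedRatio = labelledPr / labelledMain;

// Swapping the roles back gives new / old, i.e. the actual slowdown factor.
double actualRatio = labelledMain / labelledPr;

Console.WriteLine($"printed ratio: {printedRatio:F2}");
Console.WriteLine($"actual slowdown: {(actualRatio - 1) * 100:F1}%");
```

The inverted ratio is about 1.064, a slowdown of roughly 6%, in the same range as the 1.07 Test/Base reported on Windows/arm64.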

It looks like a case where we previously managed to CSE part of an address mode that ended up being repeated a number of times:
https://www.diffchecker.com/yHlIzQVp/

Notice the many new lsl #3 instructions appearing on the right; these were previously CSE'd into a ubfiz on the left.
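For readers less familiar with the two AArch64 instructions: lsl shifts a register left, while ubfiz (unsigned bitfield insert in zero) takes the low bits of a register, clears everything else, and places them at a given bit position. With operands #3, #32 it zero-extends a 32-bit index and scales it by 8 (the size of an object reference) in one step. The exact operands in the diff are not reproduced here; #3/#32 below is an assumption matching the lsl #3 scale. The Lsl and Ubfiz helpers are hypothetical bit-level models, and the register value with junk in the upper half is artificial, chosen only to make the difference visible.

```csharp
using System;

// LSL Xd, Xn, #shift: logical shift left of a 64-bit register.
static ulong Lsl(ulong xn, int shift) => xn << shift;

// UBFIZ Xd, Xn, #lsb, #width: keep the low `width` bits of Xn, clear all
// other bits, and move the field up to bit position `lsb`.
// Architecturally, lsb + width must not exceed 64.
static ulong Ubfiz(ulong xn, int lsb, int width)
{
    // Guard width == 64: C# masks shift counts, so 1UL << 64 would be 1.
    ulong mask = width == 64 ? ulong.MaxValue : (1UL << width) - 1;
    return (xn & mask) << lsb;
}

// A 32-bit index (7) whose upper register half is not known to be zero.
ulong reg = 0xDEAD_BEEF_0000_0007UL;

// One instruction: zero-extend the low 32 bits and multiply by 8.
ulong viaUbfiz = Ubfiz(reg, lsb: 3, width: 32);

// Explicit zero-extension followed by a plain shift gives the same offset.
ulong viaLsl = Lsl(reg & 0xFFFF_FFFFUL, shift: 3);

// A bare shift without zero-extension keeps the junk bits.
ulong lslOnly = Lsl(reg, shift: 3);

Console.WriteLine($"ubfiz: {viaUbfiz}, zero-extend + lsl: {viaLsl}, lsl only: {lslOnly}");
```

Both the ubfiz model and the zero-extend + lsl sequence produce the byte offset 56 (7 * 8). This sketch only models the instructions; it does not capture why the JIT stopped sharing the ubfiz result.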

Anyway, given that the regression is so small, I'm going to consider it acceptable. Looking more closely at CSE'ing address modes would require further investigation, I think.
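The JIT-level CSE discussed above concerns the scaled index computation shared by the s[...] and d[...] accesses, which C# source cannot express directly. The closest source-level analogue is to hoist the loop-invariant parts of the address computation by hand, so that s[i], d[i], s[i][j] and d[i][j] are loaded once per outer iteration rather than per element. The sketch below does that; CopyHoisted and Alloc are hypothetical helpers, not part of the benchmark, and whether this form changes the arm64 codegen or timing has not been measured.

```csharp
using System;

// Hypothetical hand-hoisted variant of the Bench copy loop: the plane and row
// arrays are loaded once per (i) and (i, j) instead of once per element.
static void CopyHoisted(int[][][] s, int[][][] d, int loop)
{
    for (; loop != 0; loop--)
    {
        for (int i = 0; i < 10; i++)
        {
            int[][] sPlane = s[i];
            int[][] dPlane = d[i];
            for (int j = 0; j < 10; j++)
            {
                int[] sRow = sPlane[j];
                int[] dRow = dPlane[j];
                for (int k = 0; k < 10; k++)
                {
                    dRow[k] = sRow[k];
                }
            }
        }
    }
}

// Same shape as AllocArray<int>(10, 10, 10) in the benchmark.
static int[][][] Alloc()
{
    var a = new int[10][][];
    for (int i = 0; i < 10; i++)
    {
        a[i] = new int[10][];
        for (int j = 0; j < 10; j++)
            a[i][j] = new int[10];
    }
    return a;
}

int[][][] src = Alloc();
int[][][] dst = Alloc();

// Same initialization formula as Initialize in the benchmark.
for (int i = 0; i < 10; i++)
    for (int j = 0; j < 10; j++)
        for (int k = 0; k < 10; k++)
            src[i][j][k] = (2 * i) - (3 * j) + (5 * k);

CopyHoisted(src, dst, loop: 1);
Console.WriteLine(dst[9][9][9] == src[9][9][9]);
```

Note that the JIT may already hoist some of these loads itself; the point of the sketch is only to make visible which parts of each address are invariant across the inner loops.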

performanceautofiler bot added the arch-arm64, os-windows, runtime-coreclr, and untriaged labels Feb 8, 2024

performanceautofiler bot mentioned this issue Feb 8, 2024

[SENTINEL] Autofile run complete at 2/8/2024 7:48:32 AM. 6 issues filed. dotnet/perf-autofiling-issues#28763 (Closed)

DrewScoggins removed the untriaged label Feb 8, 2024

DrewScoggins transferred this issue from dotnet/perf-autofiling-issues Feb 8, 2024

dotnet-issue-labeler bot added the needs-area-label label Feb 8, 2024

ghost added the untriaged label Feb 8, 2024

jeffschwMSFT added the area-CodeGen-coreclr label Feb 9, 2024

vcsjones removed the needs-area-label label Feb 13, 2024

BruceForstall assigned jakobbotsch Feb 13, 2024

BruceForstall removed the untriaged label Feb 13, 2024

BruceForstall added this to the 9.0.0 milestone Feb 13, 2024

jakobbotsch added the Priority:2 label May 3, 2024

jakobbotsch closed this as not planned Jun 24, 2024

github-actions bot locked and limited conversation to collaborators Jul 25, 2024
