[Perf] Windows/arm64: 1 Regression on 2/5/2024 4:01:54 PM #98176

Closed
performanceautofiler bot opened this issue Feb 8, 2024 · 7 comments
Labels: arch-arm64, area-CodeGen-coreclr, os-windows, Priority:2, runtime-coreclr


performanceautofiler bot commented Feb 8, 2024

Run Information

| Name | Value |
| --- | --- |
| Architecture | arm64 |
| OS | Windows 10.0.19041 |
| Queue | SurfaceWindows |
| Baseline | 64822a667f1f0b204ca030de78c5de2362e48029 |
| Compare | 89bba37a92053f52e38d5a5c81a1f3510319dabf |
| Diff | Diff |
| Configs | CompilationMode:tiered, RunKind:micro |

Regressions in Benchstone.BenchI.Array2

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 1.23 secs | 1.31 secs | 1.07 | 0.00 | False |  |  |  |


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Benchstone.BenchI.Array2*'

Payloads

Baseline
Compare

Benchstone.BenchI.Array2.Test

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@performanceautofiler performanceautofiler bot added arch-arm64 os-windows runtime-coreclr specific to the CoreCLR runtime untriaged New issue has not been triaged by the area owner labels Feb 8, 2024
@DrewScoggins DrewScoggins removed the untriaged New issue has not been triaged by the area owner label Feb 8, 2024
@DrewScoggins DrewScoggins transferred this issue from dotnet/perf-autofiling-issues Feb 8, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 8, 2024
DrewScoggins (Member) commented

#97921

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Feb 8, 2024
@jeffschwMSFT jeffschwMSFT added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 9, 2024

ghost commented Feb 9, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: performanceautofiler[bot]
Assignees: -
Labels: arch-arm64, os-windows, area-CodeGen-coreclr, untriaged, runtime-coreclr, needs-area-label

Milestone: -

@vcsjones vcsjones removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 13, 2024
@BruceForstall BruceForstall removed the untriaged New issue has not been triaged by the area owner label Feb 13, 2024
@BruceForstall BruceForstall added this to the 9.0.0 milestone Feb 13, 2024
@jakobbotsch jakobbotsch added the Priority:2 Work that is important, but not critical for the release label May 3, 2024
jakobbotsch (Member) commented

@EgorBot help


EgorBot commented Jun 24, 2024

EgorBot manual
Usage: @EgorBot [-%target%] [-profiler] [raw args for BDN] `C# snippet surrounded with triple ticks`
-%target%:       Can be -arm64, -amd or -intel. Or multiple at once, e.g. '-amd -intel'
                 -intel is used when none of the targets are specified.
-profiler:       Use 'perf record' to collect a flamegraph/hot asm - shouldn't be used 
                 when the given benchmark snippet contains more than one [Benchmark]
                 Disabled by default.
-mono:           Use Mono runtime instead of CoreCLR for all targets. Should be possible to use
                 Mono interp too (LLVM is not supported yet).
                 Mono doesn't support -profiler (at least JIT)
                 To use mono-interp, use BDN args, e.g. --envvars MONO_ENV_OPTIONS:"--interpreter"
                 Disabled by default.
-[args for BDN]: Args directly passed to BDN e.g. '--disasm', see
                 https://github.com/dotnet/BenchmarkDotNet/blob/master/docs/articles/guides/console-args.md

All targets are Linux-only at the moment.
NOTE: BenchmarkRunner.Run or BenchmarkSwitcher.From* can be omitted (snippet without an entrypoint).
If they are present, however, Program's args must be forwarded to Run(args: args)

NOTE: [DisassemblyDiagnoser] may cause unexpected crashes in BDN on Linux (at least on x64)

Usage example: link

jakobbotsch (Member) commented

@EgorBot -arm64 -perf -commit 1ed67a5 vs 7c90c57 --disasm

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.
//

using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;

namespace Benchstone.BenchI
{
    public class Array2
    {
        public const int Iterations = 500000;

        static T[][][] AllocArray<T>(int n1, int n2, int n3)
        {
            T[][][] a = new T[n1][][];
            for (int i = 0; i < n1; ++i)
            {
                a[i] = new T[n2][];
                for (int j = 0; j < n2; j++)
                {
                    a[i][j] = new T[n3];
                }
            }

            return a;
        }

        static void Initialize(int[][][] s)
        {
            for (int i = 0; i < 10; i++)
            {
                for (int j = 0; j < 10; j++)
                {
                    for (int k = 0; k < 10; k++)
                    {
                        s[i][j][k] = (2 * i) - (3 * j) + (5 * k);
                    }
                }
            }
        }

        static bool VerifyCopy(int[][][] s, int[][][] d)
        {
            for (int i = 0; i < 10; i++)
            {
                for (int j = 0; j < 10; j++)
                {
                    for (int k = 0; k < 10; k++)
                    {
                        if (s[i][j][k] != d[i][j][k])
                        {
                            return false;
                        }
                    }
                }
            }

            return true;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        static bool Bench(int loop)
        {

            int[][][] s = AllocArray<int>(10, 10, 10);
            int[][][] d = AllocArray<int>(10, 10, 10);

            Initialize(s);

            for (; loop != 0; loop--)
            {
                for (int i = 0; i < 10; i++)
                {
                    for (int j = 0; j < 10; j++)
                    {
                        for (int k = 0; k < 10; k++)
                        {
                            d[i][j][k] = s[i][j][k];
                        }
                    }
                }
            }

            bool result = VerifyCopy(s, d);

            return result;
        }

        [Benchmark]
        public bool Test() => Bench(Iterations);
    }
}


EgorBot commented Jun 24, 2024

Benchmark results on Arm64
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-FZGSMY : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-QSREDL : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
| Method | Toolchain | Mean | Error | Ratio | Code Size |
| --- | --- | --- | --- | --- | --- |
| Test | Main | 1.376 s | 0.0001 s | 1.00 | 760 B |
| Test | PR | 1.293 s | 0.0006 s | 0.94 | 760 B |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

jakobbotsch (Member) commented

(I flipped the base and PR above, there is an actual ~5% regression.)
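
To make the flipped labels concrete, here is a small sketch that recomputes the ratio from the two means in the EgorBot table, treating the 1.293 s run as the actual baseline and the 1.376 s run as the actual change. The class and method names are illustrative only and are not part of any tool used in this thread.

```csharp
using System;

public static class RatioCheck
{
    // Ratio of the changed build's mean time to the baseline's mean time.
    // A value above 1.0 means the change is slower.
    public static double Ratio(double baselineSeconds, double changedSeconds)
        => changedSeconds / baselineSeconds;

    public static void Main()
    {
        // Means taken from the EgorBot table, with the labels swapped
        // as described in the comment above.
        double actualBase = 1.293;
        double actualChange = 1.376;

        double ratio = Ratio(actualBase, actualChange);
        Console.WriteLine($"ratio = {ratio:F3}, slowdown = {(ratio - 1) * 100:F1}%");
    }
}
```

The result is a slowdown of roughly 6%, in the same range as the ~5% quoted here and the 1.07 Test/Base ratio in the original report.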

It looks like a case where we previously managed to CSE part of an address mode that ended up being repeated a number of times:
https://www.diffchecker.com/yHlIzQVp/

Notice a bunch of new lsl #3 appearing on the right; those were previously CSE'd to ubfiz on the left.

Anyway, given that the regression is so small, I'm going to consider it acceptable. If we want to look closer at CSE'ing address modes, I think that will require further investigation.
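
As a rough model of the codegen difference described above (an illustration only, not the JIT's actual IR or a proposed fix), the sketch below computes two element addresses from the same index. It assumes 8-byte elements, such as object references on a 64-bit target, which matches the `#3` shift. `AddressModeSketch`, `Recomputed`, and `Shared` are hypothetical names, and the base addresses are made up.

```csharp
using System;

public static class AddressModeSketch
{
    // Each address folds in its own shift, like repeated "lsl #3" operands.
    public static (ulong S, ulong D) Recomputed(ulong sBase, ulong dBase, uint index)
        => (sBase + ((ulong)index << 3), dBase + ((ulong)index << 3));

    // The scaled index is computed once and reused, like a single CSE'd "ubfiz".
    public static (ulong S, ulong D) Shared(ulong sBase, ulong dBase, uint index)
    {
        ulong scaled = (ulong)index << 3; // zero-extend, then multiply by 8
        return (sBase + scaled, dBase + scaled);
    }

    public static void Main()
    {
        // Made-up base addresses, purely for illustration.
        var a = Recomputed(0x1000, 0x2000, 5);
        var b = Shared(0x1000, 0x2000, 5);
        Console.WriteLine(a == b); // both forms yield the same addresses
    }
}
```

Both forms compute identical addresses. They differ only in how many times the shift is evaluated versus how long the shared value stays live, which is the kind of trade-off a CSE heuristic has to weigh.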

@jakobbotsch jakobbotsch closed this as not planned Won't fix, can't repro, duplicate, stale Jun 24, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jul 25, 2024