Add Arm N2 SVE perfscore file #95700
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
@kunalspathak @BruceForstall @TIHan Keeping as a draft PR for the moment in case anyone has a better idea. I wondered about replacing the switch with a lookup table, but all the exceptions make that non-obvious. Maybe best to keep the switch for the time being.
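For anyone skimming, here is a minimal self-contained sketch of the two shapes being weighed up; the enum values and sizes below are illustrative stand-ins, not the actual insOpts definitions from emitarm64.h:

```cpp
#include <cassert>

// Illustrative stand-in for the SVE-related insOpts values; the real enum is
// much larger and these names are assumptions made for the sketch.
enum insOptsSketch
{
    OPT_SCALABLE_B, // byte elements
    OPT_SCALABLE_H, // halfword elements
    OPT_SCALABLE_S, // word elements
    OPT_SCALABLE_D, // doubleword elements
};

// Switch form: verbose, but the exceptions are easy to special-case inline.
unsigned sveElemsizeSwitch(insOptsSketch opt)
{
    switch (opt)
    {
        case OPT_SCALABLE_B: return 1;
        case OPT_SCALABLE_H: return 2;
        case OPT_SCALABLE_S: return 4;
        case OPT_SCALABLE_D: return 8;
        default:             assert(!"unexpected insOpts"); return 0;
    }
}

// Lookup-table form: compact, but every exception needs an out-of-band check,
// which is why the switch may stay the clearer option here.
unsigned sveElemsizeTable(insOptsSketch opt)
{
    static const unsigned elemSizes[] = {1, 2, 4, 8}; // indexed by insOptsSketch
    assert(static_cast<unsigned>(opt) < 4);
    return elemSizes[opt];
}
```

Either shape could just as easily return bits (8/16/32/64) rather than bytes, which is relevant to the comment below.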
* Returns the 1/2/4/8 byte elemsize for an Arm64 Sve vector instruction
*/

/*static*/ unsigned emitter::insSveElemsize(insOpts opt)
I'm now thinking this function should return bits instead of bytes.
This looks completely reasonable to me.
Are they different from what is defined in emitPerfScore_sve.cpp that I have uploaded in #94549?
@@ -0,0 +1,868 @@
// Licensed to the .NET Foundation under one or more agreements.
This should be included in CMakeLists.txt.
#define PERFSCORE_THROUGHPUT_N2_LOOP_CONTROL_PROPAGATING PERFSCORE_THROUGHPUT_1C

// Loop control, propagating and flag setting
#define PERFSCORE_LATENCY_N2_LOOP_CONTROL_PROPAGATING_AND_FLAG_SETTING PERFSCORE_LATENCY_3C
We do not have platform- or hardware-specific perfscores. It is probably good to have it that way, but:
- During runtime, I am not sure if we detect hardware and will ever pick the appropriate perfscores, PERFSCORE_LATENCY_N2_* vs. PERFSCORE_LATENCY_N3_*, etc.
- This can easily get outdated (which happens with most of the perfscores) or go unused once we add scores for newer hardware.
- With new hardware perf scores added, it will be more confusing to pick which one is best for a given instruction.
- They are purely used for display purposes and we do not depend on them for any optimizations.
- IMO, we should just follow the existing approach of having these numbers to the best of our ability and then generalize them as the missing instruction numbers are made available.
Are they different from what is defined in emitPerfScore_sve.cpp that I have uploaded in #94549?
I hadn't spotted the latest update of emitPerfScore_sve.cpp.
But I now realise where the errors are coming in: for PERFSCORE_THROUGHPUT you've mixed up the X and C cases. X is when more than one instruction issues per cycle. Looking at the file, INS_sve_fsqrt is marked as PERFSCORE_THROUGHPUT_14X, which can't be right as it's always going to be a slow instruction.
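To make that concrete, this is how I read the existing convention; the numeric values below are illustrative, not the actual defines from emit.h:

```cpp
// Perfscore throughput is expressed in cycles per instruction.
// "nX" names mean n instructions can issue per cycle (a value below one cycle),
// "nC" names mean one instruction every n cycles.
// Sketch values only; the real constants live in emit.h.
constexpr float THROUGHPUT_2X = 1.0f / 2.0f; // dual issue: 0.5 cycles per instruction
constexpr float THROUGHPUT_1C = 1.0f;        // single issue: one cycle per instruction
constexpr float THROUGHPUT_2C = 2.0f;        // one instruction every two cycles

// Under that reading, a slow instruction like fsqrt wants a large "C" style
// value (many cycles per instruction), not "14X" (fourteen issued per cycle).
```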
With that fixed, I'm thinking maybe we should stick with your file instead of this PR.
What is missing from your file are the cases where the perfscore changes depending on the element size (eg IF_SVE_AK_3A). I noticed that NEON has these.
I also think for currently unsupported instructions we should set the score high, rather than 1C.
They are purely used for display purposes and we do not depend on them for any optimizations.
What are they used for? Is it just for manually comparing the relative performance of one version of a compiled routine against another? Deciding whether a new coreclr optimisation improves or regresses the code?
If so, then my concern is that the relative performance of different instructions can change quite a bit over time depending on microarchitecture design, and especially between different types of CPUs (eg a mobile chip vs a server one), as they are designed for different purposes. So, basing decisions on old architectures won't match the actual performance seen on real hardware. We could quite easily add an optimisation that has a better perfscore but worse real performance. Right now, comparing a NEON loop against an SVE loop using the existing perfscore data would be a bad idea, as I suspect NEON comes from a much older chip.
Ok, I can see why you wouldn't want multiple sets of perfscores. But on the other hand, I don't think a mixture of scores from a variety of sources is helpful. I'm not sure what a good solution would be, though.
Perfscore is currently an estimate of performance that can be used in asmdiffs to determine if a change is good from a performance perspective, not just a code size perspective. See "superpmi.py asmdiffs -metrics" and jit-analyze "--metrics".
We're not sure how good these scores are. @AndyAyersMS has been doing various analyses to determine how usable perfscore is for (auto) tuning CSE instead of running benchmarks and measuring time (which is noisy, etc.).
I think choosing a particular platform and using it for all the data makes sense. Whether we ever change the set of numbers we use based on the particular target platform is TBD. We're nowhere close to being able to make that decision. Updating the existing numbers to be based on a particular modern NEON platform makes sense, but might not be worth the effort, either.
X is when more than one instruction issues per cycle. Looking at the file, INS_sve_fsqrt is marked as PERFSCORE_THROUGHPUT_14X, which can't be right as it's always going to be a slow instruction.
The way X is currently used is for throughput < 1. For fsqrt, that sounds correct because it is 1/14 at minimum.
What are they used for? Is it just for manually comparing the relative performance of one version of a compiled routine against another? Deciding whether a new coreclr optimisation improves or regresses the code?
That's right. We do not even have them on by default in the spmi-diffs pipeline; the default we use is codeSize. But for register allocation changes, PerfScore comes in handy, because those changes can eliminate memory accesses, which increases the code size but lowers the perfscore.
If so, then my concern is that the relative performance of different instructions can change quite a bit over time depending on microarchitecture design, and especially between different types of CPUs
Yes, that is a true statement and we do not keep it up to date.
Right now, comparing a NEON loop against an SVE loop using the existing perfscore data would be a bad idea, as I suspect NEON comes from a much older chip.
I don't think we would do that. We will just compare the disassembly :)
So, basing decisions on old architectures won't match the actual performance seen on real hardware
Pretty much what @BruceForstall mentioned about our experimentation in this area.
Right now, comparing a NEON loop against an SVE loop using the existing perfscore data would be a bad idea, as I suspect NEON comes from a much older chip.
Would it make sense to just update all the data to be from a single chip then?
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.
The SVE perfscores were full of small errors. I've attempted to fix this by autogenerating a bunch of #defines in perfscorearm64sven2.h from the table in #94549 (comment).

Sadly, there's no obvious way of auto-mapping from instruction group to perfscore group; that still needs manual checking. Thankfully, many instruction groups map to exactly one perfscore group.

When adding new instruction groups to codegen, the easiest route for the perfscore is to look up the group in perfscorearm64sven2.h to find the correct defines.

I've named everything with N2 with the intention that at some point in the future we may want to add encodings for additional chipsets. It would also be useful at some point for the general and NEON instructions to match the same CPU.
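To show how a header like this would typically be consumed, here is a rough self-contained sketch of a per-format lookup, including an element-size-dependent case along the lines of IF_SVE_AK_3A. The format names, struct, score constants, and the doubleword special case are all placeholders for illustration, not the JIT's actual getInsExecutionCharacteristics code or the real N2 numbers:

```cpp
// Placeholder constants standing in for the generated N2 defines in
// perfscorearm64sven2.h; real names and values would come from the N2 table.
#define PERFSCORE_THROUGHPUT_N2_EXAMPLE_1C 1.0f
#define PERFSCORE_THROUGHPUT_N2_EXAMPLE_2C 2.0f
#define PERFSCORE_LATENCY_N2_EXAMPLE_2C    2.0f
#define PERFSCORE_LATENCY_N2_EXAMPLE_4C    4.0f

// Hypothetical subset of instruction formats and element sizes for the sketch.
enum class SveFormatSketch { ElemSizeDependent /* e.g. a format like IF_SVE_AK_3A */, Other };
enum class ElemSizeSketch { B = 1, H = 2, S = 4, D = 8 };

struct ExecCharacteristicsSketch
{
    float throughput;
    float latency;
};

// One perfscore group per instruction format, with the element size folded in
// where the chosen chip's numbers differ per element width.
ExecCharacteristicsSketch getSveScoresSketch(SveFormatSketch fmt, ElemSizeSketch size)
{
    switch (fmt)
    {
        case SveFormatSketch::ElemSizeDependent:
            // Assumed for the sketch: doubleword elements score differently.
            if (size == ElemSizeSketch::D)
            {
                return {PERFSCORE_THROUGHPUT_N2_EXAMPLE_1C, PERFSCORE_LATENCY_N2_EXAMPLE_2C};
            }
            return {PERFSCORE_THROUGHPUT_N2_EXAMPLE_2C, PERFSCORE_LATENCY_N2_EXAMPLE_4C};

        default:
            return {PERFSCORE_THROUGHPUT_N2_EXAMPLE_1C, PERFSCORE_LATENCY_N2_EXAMPLE_2C};
    }
}
```

For formats whose instructions aren't scored yet, the default branch could return a deliberately high score rather than 1C, per the earlier comment.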