ARM64-SVE: Implement IF_SVE_ED_1A, IF_SVE_EE_1A, IF_SVE_EB_1A, IF_SVE_EC_1A #97238

amanasifkhalid · 2024-01-20T00:01:44Z

Part of #94549. These formats include some add and mov encodings, which are among the instructions we're prioritizing here.

JitDisasm output:

mov     z0.b, #-128
mov     z1.h, #0, LSL #8
mov     z2.s, #5
mov     z3.d, #127
mov     z4.b, #0
mov     z5.h, #-128, LSL #8
mov     z6.s, #5, LSL #8
mov     z7.d, #127, LSL #8
add     z0.b, z0.b, #0
sqadd   z1.h, z1.h, #0, LSL #8
sqsub   z2.s, z2.s, #1
sub     z3.d, z3.d, #128
subr    z4.b, z4.b, #255
uqadd   z5.h, z5.h, #5, LSL #8
uqsub   z6.s, z6.s, #255, LSL #8
smax    z0.b, z0.b, #-128
smax    z1.h, z1.h, #127
smin    z2.s, z2.s, #-128
smin    z3.d, z3.d, #127
umax    z4.b, z4.b, #0
umax    z5.h, z5.h, #255
umin    z6.s, z6.s, #0
umin    z7.d, z7.d, #255
mul     z0.b, z0.b, #-128
mul     z1.h, z1.h, #0
mul     z2.s, z2.s, #5
mul     z3.d, z3.d, #127

cstool output:

mov   z0.b, #-0x80
mov   z1.h, #0, LSL #8
mov   z2.s, #5
mov   z3.d, #0x7F
mov   z4.b, #0
mov   z5.h, #-0x8000
mov   z6.s, #0x500
mov   z7.d, #0x7F00
add   z0.b, z0.b, #0
sqadd z1.h, z1.h, #0, LSL #8
sqsub z2.s, z2.s, #1
sub   z3.d, z3.d, #0x80
subr  z4.b, z4.b, #0xFF
uqadd z5.h, z5.h, #0x500
uqsub z6.s, z6.s, #0xFF00
smax  z0.b, z0.b, #-128
smax  z1.h, z1.h, #127
smin  z2.s, z2.s, #-128
smin  z3.d, z3.d, #127
umax  z4.b, z4.b, #0
umax  z5.h, z5.h, #0xFF
umin  z6.s, z6.s, #0
umin  z7.d, z7.d, #0xFF
mul   z0.b, z0.b, #-128
mul   z1.h, z1.h, #0
mul   z2.s, z2.s, #5
mul   z3.d, z3.d, #127

Note that there are some diffs in the above outputs due to differences in how immediate values are printed:

- cstool begins printing immediates in hex at smaller values than the JIT (e.g. 0xFF vs. 255).
- The JIT is stricter about printing left-shifted values as #imm, LSL #8, whereas cstool seems to do so only for immediate values of zero.
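
To make the two printing styles easier to compare, here is a minimal sketch of how an 8-bit immediate and the optional LSL #8 relate to the single value cstool prints. The helper names `effectiveImmediate` and `formatJitStyle` are hypothetical and are not part of the JIT; the code only reproduces the formatting visible in the listings above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the PR): the value an 8-bit immediate
// represents once the optional "LSL #8" is applied.
int32_t effectiveImmediate(int32_t imm8, bool hasShift)
{
    // Covers both the signed (-128..127) and unsigned (0..255) forms shown above.
    assert(imm8 >= -128 && imm8 <= 255);
    // Multiply rather than shift so negative immediates are handled portably.
    return hasShift ? imm8 * 256 : imm8;
}

// Hypothetical helper: prints the operand the way the JitDisasm listing does,
// i.e. the unshifted immediate in decimal followed by ", LSL #8" when shifted.
std::string formatJitStyle(int32_t imm8, bool hasShift)
{
    std::string text = "#" + std::to_string(imm8);
    if (hasShift)
    {
        text += ", LSL #8";
    }
    return text;
}
```

For example, `formatJitStyle(-128, true)` gives `#-128, LSL #8`, while `effectiveImmediate(-128, true)` is -32768, which is the `#-0x8000` that cstool prints for the same encoding.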

cc @dotnet/arm64-contrib.

ghost · 2024-01-20T00:01:54Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Author:	amanasifkhalid
Assignees:	amanasifkhalid
Labels:	`area-CodeGen-coreclr`
Milestone:	-

amanasifkhalid · 2024-01-22T15:41:46Z

src/coreclr/jit/emitarm64.cpp

+        case IF_SVE_EB_1A: // ........xx...... ..hiiiiiiiiddddd -- SVE broadcast integer immediate (unpredicated)
+            switch (ins)
+            {
+                // TODO-SVE: Why are these different? MOV is an alias for DUP


I forgot to mention this above, but should these PerfScore values be different? If so, if the instruction is INS_sve_dup in emitIns_R_I, we change it to INS_sve_mov since that is the preferred disassembly. That means the INS_sve_dup case is unreachable when determining PerfScores. I can change my logic so we set the instruction to INS_sve_dup in emitIns_R_I, and then print the instruction as a mov in emitDispInsHelp, but that will require introducing some edge case logic into the latter method, as we currently don't do anything special for printing aliased instructions.

If these PerfScores should be the same, then we can avoid that issue altogether.

The alias isn't changing the encoding at all. All the bits in the instruction are the same, it's just the way it's printed that changes.

Looking at the table here, mov and dup exist in multiple rows:

Broadcast logical bitmask immediate to vector	DUPM, MOV	2	2	V	-
Duplicate, immediate and indexed form	DUP, MOV	2	2	V	-
Duplicate, scalar form	DUP, MOV	3	1	M0	-

I think you should be using the first one (2,2) for all of IF_SVE_EB_1A.

Was this incorrect in the boiler plate autogenerated code?

I see, thank you for clarifying. Yes, the boilerplate code has different PerfScore values for mov and dup for the IF_SVE_EB_1A format -- it looks like the generator tool mixed up the PerfScore values from different formats. I'll update this to use (2,2).

it looks like the generator tool mixed up the PerfScore values from different formats

@kunalspathak something to fix

amanasifkhalid · 2024-01-22T15:42:19Z

Failures are unrelated.

a74nh

LGTM

kunalspathak · 2024-01-22T18:16:28Z

src/coreclr/jit/emitarm64.cpp

+        case IF_SVE_EC_1A: // ........xx...... ..hiiiiiiiiddddd -- SVE integer add/subtract immediate (unpredicated)
+            assert(insOptsScalableStandard(id->idInsOpt()));
+            // Size specifier must be able to fit left-shifted immediate
+            assert(insOptsScalableAtLeastHalf(id->idInsOpt()) || !id->idOptionalShift());


likewise here.

kunalspathak · 2024-01-22T18:17:23Z

src/coreclr/jit/emitarm64.cpp

+        case IF_SVE_EB_1A: // ........xx...... ..hiiiiiiiiddddd -- SVE broadcast integer immediate (unpredicated)
+            assert(insOptsScalableStandard(id->idInsOpt()));
+            // Size specifier must be able to fit left-shifted immediate
+            assert(insOptsScalableAtLeastHalf(id->idInsOpt()) || !id->idOptionalShift());


Not sure I follow why we need insOptsScalableAtLeastHalf(id->idInsOpt()) here. EB_1A is either https://docsmirror.github.io/A64/2023-06/dup_z_i.html or https://docsmirror.github.io/A64/2023-06/mov_dup_z_i.html, and they both take B.

You're correct that B is accepted. I believe if the immediate value is being left-shifted, the size specifier has to be at least 16 bits. The docs you linked say this: "The immediate operand is a signed value in the range -128 to +127, and for element widths of 16 bits or higher it may also be a signed multiple of 256 in the range -32768 to +32512 (excluding 0)."

cstool's behavior doesn't seem to match the documentation exactly, though. From my local testing, if the immediate is left-shifted but the specifier is B, cstool refuses to parse the instruction, even if the immediate being shifted is 0. I wrote this assert to match the behavior of cstool, such that if id->idOptionalShift is true, the size specifier cannot be B. @a74nh does this sound correct to you?

IMO, we should go with the Arm specs. Did you try validating it with LATE_DISASM?

I just tried it, and LATE_DISASM matches cstool's behavior, in that it refuses to decode the instruction if a left-shifted immediate (even 0) is used with the B size specifier. I think I'm just misinterpreting the "(excluding 0)" part of the documentation linked above, and the assert is correct in disallowing the B specifier to be used with shifted immediates.

got it. Let's wait for @a74nh to comment on this.

You're correct that B is accepted. I believe if the immediate value is being left-shifted, the size specifier has to be at least 16 bits. The docs you linked say this: "The immediate operand is a signed value in the range -128 to +127, and for element widths of 16 bits or higher it may also be a signed multiple of 256 in the range -32768 to +32512 (excluding 0)."

Yes. The optional shift is being used to specify the range -32768 to +32512. That range is only valid for widths of H or wider. Therefore, it's not valid to specify a shift with size B.

It also makes sense as a value shifted by 8 cannot fit into a B, and so would always be 0.

if the immediate is left-shifted but the specifier is B, cstool refuses to parse the instruction, even if the immediate being shifted is 0.

That makes sense. It's still trying to left shift 0 by 8, and the documentation says left shift isn't valid for B. At a microarchitecture level, it wouldn't be useful to encode a special case just to allow that.

I'm happy with the asserts.
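
The rule agreed on in this thread can be sketched as a standalone check. This is only an illustration under stated assumptions: `ElemSize` and `isValidSignedImm` are hypothetical names standing in for the JIT's insOpts values and asserts, and the check models only the signed broadcast form (mov/dup) quoted from the docs. The add/subtract form takes an unsigned immediate, judging by the #255 operands in the disassembly above, and is not modeled here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical element sizes, standing in for the JIT's scalable insOpts values.
enum class ElemSize { B, H, S, D };

// Returns true if (imm8, hasShift) is a legal operand for the signed broadcast
// form at the given element size:
//  - the unshifted immediate must be in -128..127;
//  - LSL #8 (giving multiples of 256 in -32768..32512) needs H or wider.
bool isValidSignedImm(ElemSize size, int32_t imm8, bool hasShift)
{
    if (imm8 < -128 || imm8 > 127)
    {
        return false;
    }
    if (hasShift && size == ElemSize::B)
    {
        // A value shifted left by 8 cannot fit in a byte element,
        // so this is rejected even when imm8 is 0.
        return false;
    }
    return true;
}

// Usage check mirroring the cases discussed in the thread.
void checkSignedImmRules()
{
    assert(!isValidSignedImm(ElemSize::B, 0, true));   // shifted zero with B
    assert(isValidSignedImm(ElemSize::H, -128, true)); // -32768
    assert(isValidSignedImm(ElemSize::B, -128, false));
}
```

Rejecting the shifted-zero case for B matches what cstool and LATE_DISASM were reported to do above.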

kunalspathak · 2024-01-22T18:21:51Z

src/coreclr/jit/emitarm64.cpp

@@ -6244,14 +6353,21 @@ void emitter::emitIns_R_I(instruction ins,
    assert(canEncode);
    assert(fmt != IF_NONE);

-    instrDesc* id = emitNewInstrSC(attr, imm);
+    // Instructions with optional shifts need larger instrDesc to store state
+    instrDesc* id = optionalShift ? emitNewInstrCns(attr, imm) : emitNewInstrSC(attr, imm);


please double check the TP cost for this.

Sure thing. This is from an older run, but TP shouldn't have changed since my last push. There is a TP impact, but it's quite small, and I think it only comes from the additional branch from checking optionalShift. The larger instrDesc should only affect SVE instructions as of writing, as optionalShift is only true for the new encodings added in this PR.

There are two compares happening for non-SVE instructions (the C++ compiler probably eliminates the redundant comparison), and that is increasing the TP. Maybe in the future we should have emitIns_SVE_R_I().

Can you change this so that the frequent branch is always the true branch?

instrDesc* id = nullptr;
if (!optionalShift)
{
    id = ... ...
}
else
{
    ...
    id = emitNewInstrCns(attr, imm);
    id->idOptionalShift(hasShift);
}

The updated diffs look the same.

Yeah, we might eventually have to move to emitIns_sve_, but let's hold off on that for now.

kunalspathak · 2024-01-22T19:33:03Z

Assembly diffs

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (+0.01%)

Collection	PDIFF
benchmarks.run.linux.arm64.checked.mch	+0.01%
benchmarks.run_pgo.linux.arm64.checked.mch	+0.01%
benchmarks.run_tiered.linux.arm64.checked.mch	+0.01%
coreclr_tests.run.linux.arm64.checked.mch	+0.01%
libraries.crossgen2.linux.arm64.checked.mch	+0.01%
libraries.pmi.linux.arm64.checked.mch	+0.01%
libraries_tests.run.linux.arm64.Release.mch	+0.01%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	+0.01%
realworld.run.linux.arm64.checked.mch	+0.01%
smoke_tests.nativeaot.linux.arm64.checked.mch	+0.01%

MinOpts (+0.00% to +0.04%)

Collection	PDIFF
benchmarks.run.linux.arm64.checked.mch	+0.03%
benchmarks.run_pgo.linux.arm64.checked.mch	+0.02%
benchmarks.run_tiered.linux.arm64.checked.mch	+0.02%
coreclr_tests.run.linux.arm64.checked.mch	+0.02%
libraries.crossgen2.linux.arm64.checked.mch	+0.02%
libraries_tests.run.linux.arm64.Release.mch	+0.02%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	+0.03%
realworld.run.linux.arm64.checked.mch	+0.04%
smoke_tests.nativeaot.linux.arm64.checked.mch	+0.01%

FullOpts (+0.01%)

Collection	PDIFF
benchmarks.run.linux.arm64.checked.mch	+0.01%
benchmarks.run_pgo.linux.arm64.checked.mch	+0.01%
benchmarks.run_tiered.linux.arm64.checked.mch	+0.01%
coreclr_tests.run.linux.arm64.checked.mch	+0.01%
libraries.crossgen2.linux.arm64.checked.mch	+0.01%
libraries.pmi.linux.arm64.checked.mch	+0.01%
libraries_tests.run.linux.arm64.Release.mch	+0.01%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	+0.01%
realworld.run.linux.arm64.checked.mch	+0.01%
smoke_tests.nativeaot.linux.arm64.checked.mch	+0.01%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (+0.01%)

Collection	PDIFF
benchmarks.run.osx.arm64.checked.mch	+0.01%
benchmarks.run_pgo.osx.arm64.checked.mch	+0.01%
benchmarks.run_tiered.osx.arm64.checked.mch	+0.01%
coreclr_tests.run.osx.arm64.checked.mch	+0.01%
libraries.crossgen2.osx.arm64.checked.mch	+0.01%
libraries.pmi.osx.arm64.checked.mch	+0.01%
libraries_tests.run.osx.arm64.Release.mch	+0.01%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch	+0.01%
realworld.run.osx.arm64.checked.mch	+0.01%

MinOpts (+0.00% to +0.04%)

Collection	PDIFF
benchmarks.run.osx.arm64.checked.mch	+0.01%
benchmarks.run_pgo.osx.arm64.checked.mch	+0.02%
benchmarks.run_tiered.osx.arm64.checked.mch	+0.02%
coreclr_tests.run.osx.arm64.checked.mch	+0.02%
libraries.crossgen2.osx.arm64.checked.mch	+0.02%
libraries_tests.run.osx.arm64.Release.mch	+0.02%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch	+0.03%
realworld.run.osx.arm64.checked.mch	+0.04%

FullOpts (+0.01%)

Collection	PDIFF
benchmarks.run.osx.arm64.checked.mch	+0.01%
benchmarks.run_pgo.osx.arm64.checked.mch	+0.01%
benchmarks.run_tiered.osx.arm64.checked.mch	+0.01%
coreclr_tests.run.osx.arm64.checked.mch	+0.01%
libraries.crossgen2.osx.arm64.checked.mch	+0.01%
libraries.pmi.osx.arm64.checked.mch	+0.01%
libraries_tests.run.osx.arm64.Release.mch	+0.01%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch	+0.01%
realworld.run.osx.arm64.checked.mch	+0.01%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (+0.01%)

Collection	PDIFF
benchmarks.run.windows.arm64.checked.mch	+0.01%
benchmarks.run_pgo.windows.arm64.checked.mch	+0.01%
benchmarks.run_tiered.windows.arm64.checked.mch	+0.01%
coreclr_tests.run.windows.arm64.checked.mch	+0.01%
libraries.crossgen2.windows.arm64.checked.mch	+0.01%
libraries.pmi.windows.arm64.checked.mch	+0.01%
libraries_tests.run.windows.arm64.Release.mch	+0.01%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch	+0.01%
realworld.run.windows.arm64.checked.mch	+0.01%
smoke_tests.nativeaot.windows.arm64.checked.mch	+0.01%

MinOpts (+0.00% to +0.04%)

Collection	PDIFF
benchmarks.run.windows.arm64.checked.mch	+0.01%
benchmarks.run_pgo.windows.arm64.checked.mch	+0.02%
benchmarks.run_tiered.windows.arm64.checked.mch	+0.02%
coreclr_tests.run.windows.arm64.checked.mch	+0.02%
libraries.crossgen2.windows.arm64.checked.mch	+0.02%
libraries_tests.run.windows.arm64.Release.mch	+0.02%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch	+0.03%
realworld.run.windows.arm64.checked.mch	+0.04%
smoke_tests.nativeaot.windows.arm64.checked.mch	+0.01%

FullOpts (+0.01%)

Collection	PDIFF
benchmarks.run.windows.arm64.checked.mch	+0.01%
benchmarks.run_pgo.windows.arm64.checked.mch	+0.01%
benchmarks.run_tiered.windows.arm64.checked.mch	+0.01%
coreclr_tests.run.windows.arm64.checked.mch	+0.01%
libraries.crossgen2.windows.arm64.checked.mch	+0.01%
libraries.pmi.windows.arm64.checked.mch	+0.01%
libraries_tests.run.windows.arm64.Release.mch	+0.01%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch	+0.01%
realworld.run.windows.arm64.checked.mch	+0.01%
smoke_tests.nativeaot.windows.arm64.checked.mch	+0.01%

Details here

a74nh · 2024-01-23T11:26:26Z

src/coreclr/jit/emitarm64.cpp

+        {
+            code = emitInsCodeSve(ins, fmt);
+            code |= insEncodeReg_V_4_to_0(id->idReg1());     // ddddd
+            code_t imm8 = (code_t)(emitGetInsSC(id) & 0xFF); // iiiiiiii


Given that everything else is in functions, this should be moved into a new helper function:

code_t imm8 = (code_t)(emitGetInsSC(id) & 0xFF);
code |= (imm8 << 5);

Got it, fixed.
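
For illustration, a standalone sketch of what such a helper could look like. The commit list names the real helper insEncodeImm8, but its actual signature is not shown in this thread, so `encodeImm8Sketch` and the `code_t` typedef below are assumptions. The bit position follows the format comment `..hiiiiiiiiddddd`, where the eight `i` bits sit directly above the five `d` bits.

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t code_t; // assumption: a 32-bit instruction word

// Hypothetical stand-in for the helper requested above: keeps the low 8 bits
// of the immediate and places them in bits 12-5 ("iiiiiiii").
code_t encodeImm8Sketch(int64_t imm)
{
    code_t imm8 = static_cast<code_t>(imm & 0xFF);
    return imm8 << 5;
}

// Usage check: #-128 has the byte pattern 0x80, which lands at bit 12.
void checkEncodeImm8Sketch()
{
    assert(encodeImm8Sketch(-128) == 0x1000u);
    assert(encodeImm8Sketch(255) == 0x1FE0u);
}
```

A caller would OR the result into the instruction word, as the two lines in the review comment do with `code |= (imm8 << 5)`.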

kunalspathak

LGTM

ryujit-bot · 2024-01-24T15:09:34Z

Diff results for #97238

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (+0.27% to +0.64%)

Collection	PDIFF
benchmarks.run.linux.arm64.checked.mch	+0.29%
benchmarks.run_pgo.linux.arm64.checked.mch	+0.34%
benchmarks.run_tiered.linux.arm64.checked.mch	+0.64%
coreclr_tests.run.linux.arm64.checked.mch	+0.56%
libraries.crossgen2.linux.arm64.checked.mch	+0.44%
libraries.pmi.linux.arm64.checked.mch	+0.30%
libraries_tests.run.linux.arm64.Release.mch	+0.44%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	+0.32%
realworld.run.linux.arm64.checked.mch	+0.29%
smoke_tests.nativeaot.linux.arm64.checked.mch	+0.27%

MinOpts (+0.70% to +1.21%)

Collection	PDIFF
benchmarks.run.linux.arm64.checked.mch	+0.96%
benchmarks.run_pgo.linux.arm64.checked.mch	+0.96%
benchmarks.run_tiered.linux.arm64.checked.mch	+0.97%
coreclr_tests.run.linux.arm64.checked.mch	+0.91%
libraries.crossgen2.linux.arm64.checked.mch	+0.99%
libraries.pmi.linux.arm64.checked.mch	+0.70%
libraries_tests.run.linux.arm64.Release.mch	+0.98%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	+0.97%
realworld.run.linux.arm64.checked.mch	+1.21%
smoke_tests.nativeaot.linux.arm64.checked.mch	+0.84%

FullOpts (+0.25% to +0.44%)

Collection	PDIFF
benchmarks.run.linux.arm64.checked.mch	+0.29%
benchmarks.run_pgo.linux.arm64.checked.mch	+0.26%
benchmarks.run_tiered.linux.arm64.checked.mch	+0.26%
coreclr_tests.run.linux.arm64.checked.mch	+0.29%
libraries.crossgen2.linux.arm64.checked.mch	+0.44%
libraries.pmi.linux.arm64.checked.mch	+0.30%
libraries_tests.run.linux.arm64.Release.mch	+0.25%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	+0.30%
realworld.run.linux.arm64.checked.mch	+0.28%
smoke_tests.nativeaot.linux.arm64.checked.mch	+0.27%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (+0.28% to +0.58%)

Collection	PDIFF
benchmarks.run.osx.arm64.checked.mch	+0.28%
benchmarks.run_pgo.osx.arm64.checked.mch	+0.40%
benchmarks.run_tiered.osx.arm64.checked.mch	+0.58%
coreclr_tests.run.osx.arm64.checked.mch	+0.55%
libraries.crossgen2.osx.arm64.checked.mch	+0.44%
libraries.pmi.osx.arm64.checked.mch	+0.30%
libraries_tests.run.osx.arm64.Release.mch	+0.50%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch	+0.32%
realworld.run.osx.arm64.checked.mch	+0.28%

MinOpts (+0.70% to +1.21%)

Collection	PDIFF
benchmarks.run.osx.arm64.checked.mch	+1.04%
benchmarks.run_pgo.osx.arm64.checked.mch	+0.98%
benchmarks.run_tiered.osx.arm64.checked.mch	+0.99%
coreclr_tests.run.osx.arm64.checked.mch	+0.89%
libraries.crossgen2.osx.arm64.checked.mch	+0.99%
libraries.pmi.osx.arm64.checked.mch	+0.70%
libraries_tests.run.osx.arm64.Release.mch	+0.98%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch	+0.97%
realworld.run.osx.arm64.checked.mch	+1.21%

FullOpts (+0.25% to +0.44%)

Collection	PDIFF
benchmarks.run.osx.arm64.checked.mch	+0.28%
benchmarks.run_pgo.osx.arm64.checked.mch	+0.25%
benchmarks.run_tiered.osx.arm64.checked.mch	+0.26%
coreclr_tests.run.osx.arm64.checked.mch	+0.29%
libraries.crossgen2.osx.arm64.checked.mch	+0.44%
libraries.pmi.osx.arm64.checked.mch	+0.30%
libraries_tests.run.osx.arm64.Release.mch	+0.25%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch	+0.30%
realworld.run.osx.arm64.checked.mch	+0.28%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (+0.27% to +0.57%)

Collection	PDIFF
benchmarks.run.windows.arm64.checked.mch	+0.27%
benchmarks.run_pgo.windows.arm64.checked.mch	+0.35%
benchmarks.run_tiered.windows.arm64.checked.mch	+0.57%
coreclr_tests.run.windows.arm64.checked.mch	+0.56%
libraries.crossgen2.windows.arm64.checked.mch	+0.44%
libraries.pmi.windows.arm64.checked.mch	+0.30%
libraries_tests.run.windows.arm64.Release.mch	+0.50%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch	+0.32%
realworld.run.windows.arm64.checked.mch	+0.29%
smoke_tests.nativeaot.windows.arm64.checked.mch	+0.27%

MinOpts (+0.70% to +1.21%)

Collection	PDIFF
benchmarks.run.windows.arm64.checked.mch	+1.04%
benchmarks.run_pgo.windows.arm64.checked.mch	+0.97%
benchmarks.run_tiered.windows.arm64.checked.mch	+0.99%
coreclr_tests.run.windows.arm64.checked.mch	+0.90%
libraries.crossgen2.windows.arm64.checked.mch	+0.99%
libraries.pmi.windows.arm64.checked.mch	+0.70%
libraries_tests.run.windows.arm64.Release.mch	+0.98%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch	+0.97%
realworld.run.windows.arm64.checked.mch	+1.21%
smoke_tests.nativeaot.windows.arm64.checked.mch	+0.83%

FullOpts (+0.25% to +0.44%)

Collection	PDIFF
benchmarks.run.windows.arm64.checked.mch	+0.27%
benchmarks.run_pgo.windows.arm64.checked.mch	+0.25%
benchmarks.run_tiered.windows.arm64.checked.mch	+0.26%
coreclr_tests.run.windows.arm64.checked.mch	+0.29%
libraries.crossgen2.windows.arm64.checked.mch	+0.44%
libraries.pmi.windows.arm64.checked.mch	+0.30%
libraries_tests.run.windows.arm64.Release.mch	+0.25%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch	+0.30%
realworld.run.windows.arm64.checked.mch	+0.28%
smoke_tests.nativeaot.windows.arm64.checked.mch	+0.27%

Details here

amanasifkhalid added 3 commits January 19, 2024 16:49

Implement IF_SVE_ED_1A, IF_SVE_EE_1A

30f0bfa

Implement IF_SVE_EB_1A

57d1e2c

Implement IF_SVE_EC_1A

ef49578

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 20, 2024

ghost assigned amanasifkhalid Jan 20, 2024

amanasifkhalid added the arm-sve Work related to arm64 SVE/SVE2 support label Jan 20, 2024

This was referenced Jan 20, 2024

Test failing: System.Globalization.Tests.CompareInfoHashCodeTests.CheckHashingOfSkippedChars #97241

Closed

System.Diagnostics.Tracing.Tests BasicEventSourceTests.TestsManifestGeneration test fails on windows x86 leg #97255

Closed

kunalspathak mentioned this pull request Jan 22, 2024

Arm64: Implement SVE encodings #94549

Closed

amanasifkhalid commented Jan 22, 2024

Fix PerfScore

e17493e

a74nh approved these changes Jan 22, 2024

kunalspathak reviewed Jan 22, 2024

Refactor instrDesc init

953f058

a74nh reviewed Jan 23, 2024

amanasifkhalid added 2 commits January 23, 2024 11:59

Create insEncodeImm8 helper

b57d6cc

merge from main

5239b91

kunalspathak approved these changes Jan 23, 2024

amanasifkhalid merged commit bf10f73 into dotnet:main Jan 23, 2024
129 checks passed

amanasifkhalid deleted the sve-add branch January 23, 2024 22:54

github-actions bot locked and limited conversation to collaborators Feb 24, 2024
