[Arm64] Vector Load/Store structure instructions #33461

echesakov · 2020-03-11T02:45:29Z

This adds support in the JIT emitter for Vector Load/Store structure instructions:

LD1 (1-4 registers)
LD2
LD3
LD4
LD1R
LD2R
LD3R
LD4R
ST1 (1-4 registers)
ST2
ST3
ST4

in the following addressing modes:

Base register only
Post-indexed by a 64-bit register
Post-indexed by an immediate, equal to the number of bytes transferred

I added support for both the multiple-structure and the single-structure variants.
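In the third addressing mode, the immediate follows directly from the instruction and its operands. The helper names below are hypothetical and are not part of the emitter. They show how the byte count works out for the multiple-structure, single-structure and load-and-replicate forms:

```cpp
#include <cassert>

// Hypothetical helpers (illustrative names, not the JIT's API).
// They compute the immediate required by the "post-indexed by an immediate"
// addressing mode, which must equal the number of bytes transferred.

// Multiple structures (LD1-LD4 / ST1-ST4, multiple-structure forms):
// each register in the list is transferred in full, i.e. 8 bytes for a
// 64-bit arrangement (Q = 0) or 16 bytes for a 128-bit arrangement (Q = 1).
unsigned postIndexImmMultiple(unsigned regCount, bool is128Bit)
{
    assert(regCount >= 1 && regCount <= 4);
    return regCount * (is128Bit ? 16u : 8u);
}

// Single structure (LDn/STn {...}[index]) and load-and-replicate (LDnR):
// exactly one element of 'elemBytes' bytes is transferred per register.
unsigned postIndexImmSingle(unsigned regCount, unsigned elemBytes)
{
    assert(regCount >= 1 && regCount <= 4);
    assert(elemBytes == 1 || elemBytes == 2 || elemBytes == 4 || elemBytes == 8);
    return regCount * elemBytes;
}
```

For instance, ld1 {v5.16b, v6.16b, v7.16b, v8.16b}, [x9], #64 transfers 4 × 16 = 64 bytes. By contrast, st1 {v0.b}[3], [x1], #1 transfers a single byte.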

This also adds support in JitDump for printing:

SIMD vector register list, e.g. ld1 {v5.16b, v6.16b, v7.16b, v8.16b}, [x9]
SIMD vector element list, e.g. st1 {v0.b}[3], [x1],#1
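Here is a minimal sketch of the two JitDump notations. The helpers formatVectorRegList and formatVectorElemList are hypothetical; the PR's own functions are emitDispVectorRegList and emitDispVectorElemList, and their bodies are not reproduced here. The sketch also assumes that a register list wraps from v31 back to v0, which is how the Arm architecture numbers consecutive registers:

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the JitDump register/element list formatting.
// Register numbers 0..31 stand for v0..v31; consecutive registers in a list
// wrap around modulo 32 (e.g. {v31.4s, v0.4s}).

// Formats a SIMD vector register list, e.g. "{v5.16b, v6.16b, v7.16b, v8.16b}".
std::string formatVectorRegList(unsigned firstReg, unsigned listLength, const std::string& arrangement)
{
    assert(firstReg < 32 && listLength >= 1 && listLength <= 4);
    std::string result = "{";
    for (unsigned i = 0; i < listLength; i++)
    {
        if (i != 0)
        {
            result += ", ";
        }
        result += "v" + std::to_string((firstReg + i) % 32) + "." + arrangement;
    }
    return result + "}";
}

// Formats a SIMD vector element list, e.g. "{v0.b}[3]".
// elemSuffix is one of "b", "h", "s", "d"; the lane index applies to every register in the list.
std::string formatVectorElemList(unsigned firstReg, unsigned listLength, const std::string& elemSuffix, unsigned index)
{
    return formatVectorRegList(firstReg, listLength, elemSuffix) + "[" + std::to_string(index) + "]";
}
```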

PerfScore numbers are based on the Arm Cortex-A55 Software Optimization Guide.

I validated the correctness of the instructions' encodings by comparing JitDump with WinDbg u command output. I attached the outputs of both:
jitDump.txt
windbg-u.txt

Examples of how the instructions are used can be found in the Arm64 emitter unit tests in codegenarm64.cpp.

This is needed for #24771


echesakov · 2020-03-11T18:03:17Z

@dotnet/jit-contrib PTAL

CarolEidt · 2020-03-12T00:23:56Z

This looks reasonable - is the plan to put these in now, and then defer the associated intrinsics that require the register allocator to support allocating contiguous registers?

echesakov · 2020-03-12T00:32:16Z


Yes, LD1/ST1 (multiple structures) will be used via LoadVector64/LoadVector128/Store, and there is a proposal for how to utilize LD1R (#33490).

But that is almost all for now. However, we should soon decide how we are going to expose contiguous registers in the API.

tannergooding · 2020-03-12T03:14:49Z


We can bring this up in the upcoming API review next Tuesday (the 17th).

echesakov · 2020-03-13T17:07:36Z

@dotnet/jit-contrib ping

src/coreclr/src/jit/emit.h

tannergooding · 2020-03-13T17:21:50Z

src/coreclr/src/jit/emitarm64.cpp

+//
+/*static*/ unsigned emitter::insGetLoadStoreVectorSelem(instruction ins)
+{
+    unsigned selem = 0;


Could we call this elementCount or structureElements or something a bit more descriptive?

(same goes for the method name)

Sure, I can rename it to insGetLoadStoreVectorStructureElements.

After reading the Arm specs more carefully, I don't think it should be called insGetLoadStoreVectorStructureElements.

It should be insGetLoadStoreRegisterListSize.
For example, for LD1 (2 registers) the number of structure elements is 1, since the instruction loads single-element structures into 2 registers.

This function reflects how many registers ins operates on.

That sounds good. I just don't think Selem is very self-explanatory.
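The sketch below shows what a register-list-length query might look like, to make the distinction concrete. The enum is a hypothetical stand-in for the JIT's generated instruction enum. The function name follows the ListLength naming suggested later in the review, not the name used in the PR:

```cpp
#include <cassert>

// Hypothetical stand-in for the subset of the JIT's 'instruction' enum covering the
// Load/Store Vector structure instructions (the real enum is generated from instrsarm64.h).
enum instruction
{
    INS_ld1, INS_ld2, INS_ld3, INS_ld4,
    INS_ld1_2regs, INS_ld1_3regs, INS_ld1_4regs,
    INS_ld1r, INS_ld2r, INS_ld3r, INS_ld4r,
    INS_st1, INS_st2, INS_st3, INS_st4,
    INS_st1_2regs, INS_st1_3regs, INS_st1_4regs,
    INS_add // not a Load/Store Vector instruction
};

// Returns the number of consecutive SIMD registers 'ins' loads to or stores from
// (the register list length). This differs from the number of elements per structure:
// ld1 (2 registers) uses 2 registers, but each structure has a single element.
unsigned getLoadStoreRegisterListLength(instruction ins)
{
    switch (ins)
    {
        case INS_ld1:
        case INS_ld1r:
        case INS_st1:
            return 1;

        case INS_ld1_2regs:
        case INS_ld2:
        case INS_ld2r:
        case INS_st1_2regs:
        case INS_st2:
            return 2;

        case INS_ld1_3regs:
        case INS_ld3:
        case INS_ld3r:
        case INS_st1_3regs:
        case INS_st3:
            return 3;

        case INS_ld1_4regs:
        case INS_ld4:
        case INS_ld4r:
        case INS_st1_4regs:
        case INS_st4:
            return 4;

        default:
            assert(!"not a Load/Store Vector instruction");
            return 0;
    }
}
```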

tannergooding · 2020-03-13T17:24:57Z

src/coreclr/src/jit/emitarm64.cpp

- *  Display a register
- */
+//------------------------------------------------------------------------
+// emitDispReg: Display a general-purpose register name or SIMD and floating-point scalar register name


nit: If we are updating these, we should probably do it fully and also document the parameters, etc.

Agreed, will do.

tannergooding

Overall LGTM, just a comment about not liking the selem name.

It also would have been nice to separate out the refactorings into a separate PR to help with review.

tannergooding · 2020-03-13T17:28:06Z

Just realized, what about C# tests for the functions?

echesakov · 2020-03-13T17:30:01Z


@tannergooding this PR only implements the instructions in the backend. There is a separate WIP PR for the Store intrinsic (#33535) where I have C# tests.

echesakov · 2020-03-13T17:30:32Z


Thanks for the review!

briansull

With comment changes

briansull · 2020-03-13T21:33:11Z

src/coreclr/src/jit/emitarm64.cpp

-            elemsize = optGetElemsize(id->idInsOpt());
-            imm      = emitGetInsSC(id);
+        case IF_LS_2F: // LS_2F   .Q.............. ...Sssnnnnnttttt      Vt[] Rn
+        case IF_LS_2G: // LS_2G   .Q.............. ...Sssnnnnnttttt      Vt[] Rn


This should have the xx fields as well:

.Q.............. xx.Sssnnnnnttttt Vt[] Rn

briansull · 2020-03-13T21:37:23Z

src/coreclr/src/jit/emitarm64.cpp

-            emitDispReg(id->idReg1(), emitInsTargetRegSize(id), true);
-            emitDispAddrRI(id->idReg2(), id->idInsOpt(), 0);
+        case IF_LS_2F: // LS_2F   .Q.............. ...Sssnnnnnttttt      Vt[] Rn
+        case IF_LS_2G: // LS_2G   .Q.............. ...Sssnnnnnttttt      Vt[] Rn


This should have the xx fields as well:

.Q.............. xx.Sssnnnnnttttt Vt[] Rn

briansull · 2020-03-13T21:38:19Z

src/coreclr/src/jit/emitarm64.cpp

+        case IF_LS_2D: // LS_2D   .Q.............. ....ssnnnnnttttt      Vt Rn
+        case IF_LS_2E: // LS_2E   .Q.............. ....ssnnnnnttttt      Vt Rn
+        case IF_LS_2F: // LS_2F   .Q.............. ...Sssnnnnnttttt      Vt[] Rn
+        case IF_LS_2G: // LS_2G   .Q.............. ...Sssnnnnnttttt      Vt[] Rn


These two IF_LS_2F and IF_LS_2G should have the xx fields as well:

.Q.............. xx.Sssnnnnnttttt Vt[] Rn

briansull · 2020-03-13T21:39:11Z

src/coreclr/src/jit/emitarm64.cpp

+        case IF_LS_2D: // LS_2D   .Q.............. ....ssnnnnnttttt      Vt Rn
+        case IF_LS_2E: // LS_2E   .Q.............. ....ssnnnnnttttt      Vt Rn
+        case IF_LS_2F: // LS_2F   .Q.............. ...Sssnnnnnttttt      Vt[] Rn
+        case IF_LS_2G: // LS_2G   .Q.............. ...Sssnnnnnttttt      Vt[] Rn


These two IF_LS_2F and IF_LS_2G should have the xx fields as well:

.Q.............. xx.Sssnnnnnttttt Vt[] Rn

I will make these changes in #33535
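The following sketch explains why the xx field matters in the LS_2F/LS_2G format comments. In the single-structure forms, the element size selects opcode bits 15:14, and the lane index is spread across Q, S and size. The sketch reflects my reading of the Arm Architecture Reference Manual encodings. The helper names and the st1 post-indexed base opcode 0x0D9F0000 are assumptions, not values taken from the PR:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: maps the element size and lane index of {Vt.<T>}[index]
// onto the Q (bit 30), xx (bits 15:14), S (bit 12) and ss (size, bits 11:10) fields of
//   .Q.............. xx.Sssnnnnnttttt
uint32_t encodeElemIndex(unsigned elemBytes, unsigned index)
{
    uint32_t q = 0, xx = 0, s = 0, size = 0;
    switch (elemBytes)
    {
        case 1: // B: index = Q:S:size (0-15)
            assert(index < 16);
            xx = 0; q = (index >> 3) & 1; s = (index >> 2) & 1; size = index & 3;
            break;
        case 2: // H: index = Q:S:size<1> (0-7), size<0> = 0
            assert(index < 8);
            xx = 1; q = (index >> 2) & 1; s = (index >> 1) & 1; size = (index & 1) << 1;
            break;
        case 4: // S: index = Q:S (0-3), size = 00
            assert(index < 4);
            xx = 2; q = (index >> 1) & 1; s = index & 1; size = 0;
            break;
        case 8: // D: index = Q (0-1), S = 0, size = 01
            assert(index < 2);
            xx = 2; q = index & 1; s = 0; size = 1;
            break;
        default:
            assert(!"unexpected element size");
            break;
    }
    return (q << 30) | (xx << 14) | (s << 12) | (size << 10);
}

// Combines an assumed base opcode with the element fields and the Rn / Rt register numbers.
uint32_t encodeSingleStructure(uint32_t baseOpcode, unsigned elemBytes, unsigned index, unsigned rn, unsigned rt)
{
    assert(rn < 32 && rt < 32);
    return baseOpcode | encodeElemIndex(elemBytes, index) | (rn << 5) | rt;
}
```

Under these assumptions, st1 {v0.b}[3], [x1],#1 from the PR description would encode as 0x0D9F0C20.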

BruceForstall

LGTM, just a few nits

BruceForstall · 2020-03-13T21:49:45Z

src/coreclr/src/jit/emitarm64.h

@@ -39,7 +39,10 @@ void emitDispLSExtendOpts(insOpts opt);
 void emitDispReg(regNumber reg, emitAttr attr, bool addComma);
 void emitDispVectorReg(regNumber reg, insOpts opt, bool addComma);
 void emitDispVectorRegIndex(regNumber reg, emitAttr elemsize, ssize_t index, bool addComma);
+void emitDispVectorRegList(regNumber firstReg, unsigned listSize, insOpts opt, bool addComma);


nit: listSize => listLength? (same below)

I will follow up in #33535

BruceForstall · 2020-03-13T21:50:22Z

src/coreclr/src/jit/emitarm64.h

 void emitDispArrangement(insOpts opt);
+void emitDispElemsize(emitAttr elemsize);


nit: capitalize "size", emitDispElemsize => emitDispElemSize?

BruceForstall · 2020-03-13T21:50:57Z

src/coreclr/src/jit/emitarm64.h

@@ -445,6 +448,10 @@ static emitAttr optGetSrcsize(insOpts conversion);
 //    for an element of size 'elemsize' in a vector register of size 'datasize'
 static bool isValidVectorIndex(emitAttr datasize, emitAttr elemsize, ssize_t index);

+// For a given Load/Store Vector instruction 'ins' returns a number of consecutive SIMD registers
+// the instruction loads to/store from.


nit: "store from" => "stores from"

BruceForstall · 2020-03-13T21:51:21Z

src/coreclr/src/jit/emitarm64.h

@@ -445,6 +448,10 @@ static emitAttr optGetSrcsize(insOpts conversion);
 //    for an element of size 'elemsize' in a vector register of size 'datasize'
 static bool isValidVectorIndex(emitAttr datasize, emitAttr elemsize, ssize_t index);

+// For a given Load/Store Vector instruction 'ins' returns a number of consecutive SIMD registers
+// the instruction loads to/store from.
+static unsigned insGetLoadStoreRegisterListSize(instruction ins);


nit: ListSize => ListLength?

BruceForstall · 2020-03-13T22:23:53Z

src/coreclr/src/jit/instrsarm64.h

+                                   //  ld2     {Vt,Vt2},[Xn]        LS_2D  0Q00110001000000 1000ssnnnnnttttt   0C40 8000   base register
+                                   //  ld2     {Vt,Vt2},[Xn],Xm     LS_3F  0Q001100110mmmmm 1000ssnnnnnttttt   0CC0 8000   post-indexed by a register
+                                   //  ld2     {Vt,Vt2},[Xn],#imm   LS_2E  0Q001100110mmmmm 1000ssnnnnnttttt   0CDF 8000   post-indexed by an immediate
+                                   //  ld2     {Vt,Vt2}[],[Xn]      LS_2F  0Q00110101100000 xx0Sssnnnnnttttt   0D60 0000   base register


I wonder if it would be helpful to include a comment for each of these that would lead you to the correct reference manual page, e.g., "C7.2.174 LD2 (single structure)" (hopefully ARM doesn't renumber these... but at least the name would be the same)

Yes, that's a good idea. I will add them.

BruceForstall · 2020-03-13T22:25:41Z

src/coreclr/src/jit/instrsarm64.h

@@ -378,6 +436,56 @@ INST3(mvn,     "mvn",    0, 0, IF_EN3I,   0x2A2003E0,  0x2A2003E0,  0x2E205800)
                                   //  mvn     Rd,(Rm,shk,imm)      DR_2F  X0101010sh1mmmmm iiiiii11111ddddd   2A20 03E0   Rm {LSL,LSR,ASR} imm(0-63)
                                   //  mvn     Vd,Vn                DV_2M  0Q10111000100000 010110nnnnnddddd   2E20 5800   Vd,Vn    (vector)

+//    enum     name     FP LD/ST            LS_2D        LS_3F        LS_2E
+INST3(ld1_2regs,"ld1",   0,LD, IF_EN3J,   0x0C40A000,  0x0CC0A000,  0x0CDFA000)


I guess you couldn't figure out a way to merge these (ld1_2regs, ld1_3regs, ld1_4regs, st1_2regs, etc.) with their respective ld1/st1 definitions, above? Or create a single ld1_multiregs/st1_multiregs that is distinguished in code with insOpts or similar?

Yes, I have considered different approaches while adding these instructions.

Since multiple-register variants exist only for a small number of instructions, namely ld[1234], ld[1234]r, st[1234], tbx and tbl, and only a subset of those requires a way to specify the number of registers (it is implied for ld[1234]r, ld[234] and st[234]), I decided not to disrupt other parts of the emitter by adding the number of registers to insOpts.
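The three opcode columns in each INST3 entry differ only in fixed bits. The post-indexed forms set bit 23, and the immediate form additionally fixes Rm to 31 (0b11111). The helper below is hypothetical and is not the emitter's code. It derives all three columns from the LS_2D base opcode and reproduces the ld1_2regs and ld2 values quoted above:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: the three addressing modes of a multiple-structures
// Load/Store Vector instruction, matching the LS_2D, LS_3F and LS_2E columns.
enum class AddrMode
{
    BaseReg,      // [Xn]
    PostIndexReg, // [Xn], Xm
    PostIndexImm, // [Xn], #imm (imm = number of bytes transferred)
};

// Builds a full encoding from the LS_2D (base register) opcode in instrsarm64.h.
//   q    - 0 for a 64-bit arrangement, 1 for a 128-bit arrangement
//   size - element size field: 0 = B, 1 = H, 2 = S, 3 = D
//   rn   - base register, rt - first register in the list, rm - index register
uint32_t encodeMultipleStructures(uint32_t baseRegOpcode, AddrMode mode, unsigned q, unsigned size,
                                  unsigned rn, unsigned rt, unsigned rm = 0)
{
    assert(q < 2 && size < 4 && rn < 32 && rt < 32 && rm < 32);

    uint32_t code = baseRegOpcode;
    if (mode != AddrMode::BaseReg)
    {
        code |= 1u << 23; // post-indexed forms set bit 23
    }
    if (mode == AddrMode::PostIndexReg)
    {
        assert(rm != 31); // Rm == 31 selects the immediate form instead
        code |= rm << 16;
    }
    else if (mode == AddrMode::PostIndexImm)
    {
        code |= 31u << 16; // immediate form: Rm field fixed to 0b11111
    }
    return code | (q << 30) | (size << 10) | (rn << 5) | rt;
}
```

For instance, ld1 {v5.16b, v6.16b}, [x9] (Q = 1, size = 0) comes out as 0x4C40A125 under these assumptions.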

BruceForstall · 2020-03-13T22:27:05Z

src/coreclr/src/jit/codegenarm64.cpp

@@ -5219,6 +5219,726 @@ void CodeGen::genArm64EmitterUnitTests()

 #endif // ALL_ARM64_EMITTER_UNIT_TESTS

+#ifdef ALL_ARM64_EMITTER_UNIT_TESTS
+    //
+    // Loads to /Stores from one, two, three, or four SIMD&FP registers


"Loads to /Stores" => "Loads to and Stores from"? (same below)

I thought "/Stores" was a typo so I was confused reading it.

BruceForstall · 2020-03-13T22:33:21Z

src/coreclr/src/jit/emitarm64.cpp

+//   ins - A Load/Store Vector instruction (e.g. ld1 (2 registers), ld1r, st1).
+//
+// Return value:
+//   A number of consecutive SIMD and floating-point registers the instruction loads to/store from.


nit: "store from"=>"stores from"

BruceForstall · 2020-03-13T22:40:30Z

src/coreclr/src/jit/emitarm64.cpp

- *  Display an vector register index suffix
- */
+//------------------------------------------------------------------------
+// emitDispVectorRegIndex: Display a SIMD vector register name with element index


Would this be clearer named emitDispVectorRegWithIndex? Or emitDispVectorRegWithSizeAndIndex?

The "official" name for this is SIMD vector element name (see C1.2.5 Register names), so we might also rename it to emitDispVectorElementName.

BruceForstall · 2020-03-13T22:42:42Z

src/coreclr/src/jit/emitarm64.cpp

+//------------------------------------------------------------------------
+// emitDispVectorElemList: Display a SIMD vector element list
+//
+void emitter::emitDispVectorElemList(


Would this be clearer as emitDispVectorRegListWithSize? As named, I wasn't sure the difference between emitDispVectorRegList and emitDispVectorElemList -- namely, what is a "VectorElem"?

This terminology comes from C1.2.5 Register names. I don't have a strong preference for how to name the function; I tried to follow what the Arm docs say.

echesakov · 2020-03-14T00:07:58Z

The two Linux sigsegv failures are due to an issue reported in #33562 (comment). An arm64 leg passes, so merging.

echesakov added arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Mar 11, 2020

echesakov added 22 commits March 10, 2020 20:00

Update ld1 in instrsarm64.h

e31f305

Add ld2, ld3, ld4, st1, st2, st3, st4 in instrsarm64.h

cddcb09

Add ld1, st1 operating on multiple registers in instrsarm64.h

d11a956

Add ld1r, ld2r, ld3r, ld4r in instrsarm64.h

66cb52e

Remove EN4J, add EN6B and EN3J in emitarm64.cpp emitfmtsarm64.h

d591c2b

Update LS_2D, LS_2E, LS_3F, LS_3G and add LS_2F, LS_2G in emitfmtsarm64.h

b559403

Add Arm64 emitter unit tests for "Load/Store Vector" instructions in codegenarm64.cpp

e8524fe

Add emitter::emitDispElemsize in emitarm64.cpp emitarm64.h

b4aa743

Update functions' headers in emitarm64.cpp

4e1ce14

Add emitDispVectorRegList and emitDispVectorElemList in emitarm64.cpp emitarm64.h

9cca19b

Add insGetLoadStoreVectorSelem in emitarm64.cpp emitarm64.h

cd88213

Update emitIns_R_R in emitarm64.cpp

8fe0715

* Load/Store multiple structures base register
* Load single structure and replicate base register

Update emitIns_R_R_I in emitarm64.cpp

e7ce778

* Load/Store multiple structures post-indexed by an immediate
* Load/Store single structure base register

Update emitIns_R_R_I in emitarm64.cpp

b493842

* Load single structure and replicate post-indexed by an immediate

Update emitIns_R_R_R in emitarm64.cpp

ee4415b

* Load/Store multiple structures post-indexed by a register
* Load single structure and replicate post-indexed by a register

Update emitIns_R_R_R_I in emitarm64.cpp

bdc9df6

* Load/Store single structure post-indexed by a register

Update emitIns_R_R_I_I in emitarm64.cpp emitarm64.h

66103d5

* Load/Store single structure post-indexed by an immediate

Update emitDispIns in emitarm64.cpp

a0bfd42

Update emitOutputInstr in emitarm64.cpp

e77156b

Update emitInsSanityCheck in emitarm64.cpp

43aed1d

Update emitInsMayWriteToGCReg in emitarm64.cpp

683a20d

Remove ld1 in emitInsTargetRegSize in emitarm64.cpp

6a53804

echesakov force-pushed the Arm64-Vector-Load-Store-Structure-Instructions branch from c0754ee to 7ecdcc6 Compare March 11, 2020 03:00

Update getMemoryOperation and getInsExecutionCharacteristics in emit.h emitarm64.cpp

1e482c5

echesakov force-pushed the Arm64-Vector-Load-Store-Structure-Instructions branch from 7ecdcc6 to 1e482c5 Compare March 11, 2020 17:49

echesakov marked this pull request as ready for review March 11, 2020 18:02

echesakov mentioned this pull request Mar 12, 2020

[Arm64] Implement Store Hardware Intrinsic #33535

Merged

echesakov mentioned this pull request Mar 13, 2020

Optimize System.Collections.BitArray using arm64 intrinsics #33309

Closed

tannergooding reviewed Mar 13, 2020

src/coreclr/src/jit/emit.h

tannergooding reviewed Mar 13, 2020

tannergooding approved these changes Mar 13, 2020

Address Tanner's feedback on GitHub.

c7af7d2

briansull approved these changes Mar 13, 2020

BruceForstall approved these changes Mar 13, 2020

echesakov merged commit 6b8cda0 into dotnet:master Mar 14, 2020

echesakov deleted the Arm64-Vector-Load-Store-Structure-Instructions branch March 14, 2020 00:11

echesakov added a commit to echesakov/runtime that referenced this pull request Mar 17, 2020

Address Brian's feedback on GitHub for dotnet#33461

c8d6cc4

echesakov added a commit to echesakov/runtime that referenced this pull request Mar 17, 2020

Partly address Bruce's feedback on GitHub for dotnet#33461

25fd0d8

echesakov mentioned this pull request Mar 17, 2020

[Arm64] Detect and emit right addressing mode for Load and Store intrinsics #33676

Open

echesakov mentioned this pull request Apr 20, 2020

API Proposal : Arm TableVectorLookup and TableVectorExtension intrinsics #1277

Closed

echesakov mentioned this pull request Apr 29, 2020

Add VectorTableList and TableVectorExtension intrinsics #35600

Merged

ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
