[Arm64] Vector Load/Store structure instructions #33461

Merged

Contributor

@echesakov echesakov commented Mar 11, 2020

This adds support in the JIT emitter for Vector Load/Store structure instructions:

  • LD1 (1-4 registers)
  • LD2
  • LD3
  • LD4
  • LD1R
  • LD2R
  • LD3R
  • LD4R
  • ST1 (1-4 registers)
  • ST2
  • ST3
  • ST4

in the following addressing modes:

  • Base register only
  • Post-indexed by a 64-bit register
  • Post-indexed by an immediate, equal to the number of bytes transferred

Both the multiple-structure and the single-structure variants are supported.
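As an illustration of the "post-indexed by an immediate" mode listed above, the following minimal sketch computes the only immediate the instruction accepts, i.e. the number of bytes transferred. The enum and the function name `postIndexImmediate` are hypothetical and are not taken from the JIT sources; the arithmetic follows the Arm definition of these instructions (whole registers for multiple structures, one element per register for single-structure and replicate forms).

```cpp
#include <cassert>

// Hypothetical illustration types; these are not the JIT's own declarations.
enum class StructureKind
{
    Multiple, // LD1-LD4 / ST1-ST4 (multiple structures): whole registers are transferred
    Single,   // LD1-LD4 / ST1-ST4 (single structure): one element per register
    Replicate // LD1R-LD4R: one element per register is read from memory, then broadcast
};

// Returns the immediate required by the post-indexed-by-immediate form,
// which is the total number of bytes the instruction transfers.
//   registerCount     - number of SIMD registers in the list (1..4)
//   registerSizeBytes - 8 for a 64-bit arrangement, 16 for a 128-bit arrangement
//   elementSizeBytes  - 1, 2, 4 or 8
unsigned postIndexImmediate(StructureKind kind,
                            unsigned      registerCount,
                            unsigned      registerSizeBytes,
                            unsigned      elementSizeBytes)
{
    assert(registerCount >= 1 && registerCount <= 4);

    if (kind == StructureKind::Multiple)
    {
        // Every register in the list is filled (or stored) completely.
        return registerCount * registerSizeBytes;
    }

    // Single-structure and replicate forms touch one element per register.
    return registerCount * elementSizeBytes;
}
```

For example, `ld1 {v5.16b, v6.16b, v7.16b, v8.16b}, [x9], #64` corresponds to `postIndexImmediate(StructureKind::Multiple, 4, 16, 1)`, and `st1 {v0.b}[3], [x1], #1` to `postIndexImmediate(StructureKind::Single, 1, 16, 1)`.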

This also adds support in JitDump for printing

  • SIMD vector register list, e.g. ld1 {v5.16b, v6.16b, v7.16b, v8.16b}, [x9]
  • SIMD vector element list, e.g. st1 {v0.b}[3], [x1], #1
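The two list forms above can be sketched as plain string formatting. This is a hypothetical stand-alone illustration, not the PR's `emitDispVectorRegList`/`emitDispVectorElemList` code: the function names and signatures are assumptions. The wrap from v31 to v0 reflects the Arm rule that list registers are consecutive modulo 32; whether JitDump handles that case the same way is not stated in this PR.

```cpp
#include <cassert>
#include <string>

// Formats a SIMD vector register list such as "{v5.16b, v6.16b}".
// Register numbers wrap around from v31 to v0.
std::string formatVectorRegList(unsigned firstReg, unsigned listSize, const std::string& arrangement)
{
    assert(firstReg < 32 && listSize >= 1 && listSize <= 4);

    std::string text = "{";
    for (unsigned i = 0; i < listSize; i++)
    {
        if (i != 0)
        {
            text += ", ";
        }
        text += "v" + std::to_string((firstReg + i) % 32) + "." + arrangement;
    }
    return text + "}";
}

// Formats a SIMD vector element list such as "{v0.b}[3]":
// the same register list, with an element-size suffix and an element index.
std::string formatVectorElemList(unsigned firstReg, unsigned listSize, char elemSuffix, unsigned index)
{
    return formatVectorRegList(firstReg, listSize, std::string(1, elemSuffix)) + "[" + std::to_string(index) + "]";
}
```

With these helpers, `formatVectorRegList(5, 4, "16b")` yields the register list from the first example, and `formatVectorElemList(0, 1, 'b', 3)` yields the element list from the second.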

PerfScore numbers follow the Arm Cortex-A55 Software Optimization Guide.

I validated the correctness of the instructions' encodings by comparing JitDump with WinDbg u command output. I attached the outputs of both:
jitDump.txt
windbg-u.txt

Usage examples for the instructions can be found in the Arm64 emitter unit tests in codegenarm64.cpp.

This is needed for #24771

@echesakov echesakov added the arch-arm64 and area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) labels Mar 11, 2020
* Load/Store multiple structures       base register

* Load single structure and replicate  base register
* Load/Store multiple structures  post-indexed by an immediate

* Load/Store single structure     base register
* Load single structure and replicate  post-indexed by an immediate
* Load/Store multiple structures       post-indexed by a register

* Load single structure and replicate  post-indexed by a register
* Load/Store single structure  post-indexed by a register
* Load/Store single structure  post-indexed by an immediate
@echesakov echesakov force-pushed the Arm64-Vector-Load-Store-Structure-Instructions branch from c0754ee to 7ecdcc6 Compare March 11, 2020 03:00
@echesakov echesakov force-pushed the Arm64-Vector-Load-Store-Structure-Instructions branch from 7ecdcc6 to 1e482c5 Compare March 11, 2020 17:49
@echesakov echesakov marked this pull request as ready for review March 11, 2020 18:02
@echesakov
Contributor Author

@dotnet/jit-contrib PTAL

@CarolEidt
Contributor

This looks reasonable - is the plan to put these in now, and then defer the associated intrinsics that require the register allocator to support allocating contiguous registers?

@echesakov
Contributor Author

This looks reasonable - is the plan to put these in now, and then defer the associated intrinsics that require the register allocator to support allocating contiguous registers?

Yes, LD1/ST1 (multiple structures) will be used via LoadVector64/LoadVector128/Store, and there is a proposal for how to utilize LD1R: #33490

But this is almost it for now. However, we should decide how we are going to expose contiguous registers in API soon.

@tannergooding
Member

But this is almost it for now. However, we should decide how we are going to expose contiguous registers in API soon.

We can bring this up in the upcoming API review next Tuesday (the 17th).

@echesakov
Contributor Author

@dotnet/jit-contrib ping

//
/*static*/ unsigned emitter::insGetLoadStoreVectorSelem(instruction ins)
{
unsigned selem = 0;
Member

could we call this elementCount or structureElements or something a bit more descriptive?

Member

(same goes for the method name)

Contributor Author

sure, I can rename it to insGetLoadStoreVectorStructureElements

Contributor Author

After reading the Arm specs more carefully, I don't think it should be called insGetLoadStoreVectorStructureElements.

It should be insGetLoadStoreRegisterListSize. For LD1 (2 registers), the number of structure elements is 1, since the instruction loads single-element structures into 2 registers.

This function reflects how many registers ins operates on.

Member

That sounds good. I just don't think Selem is very self-explanatory.
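To make the distinction in this thread concrete, here is a minimal sketch of a helper that returns the register list size for an instruction. It is not the PR's implementation of `insGetLoadStoreRegisterListSize`: the `Instruction` enum is a hypothetical stand-in for the JIT's instruction enum, the `st1_3regs`/`st1_4regs` names are assumed by analogy with `ld1_2regs`, and returning 0 for other instructions is an arbitrary choice made for this example.

```cpp
#include <cassert>

// Hypothetical stand-in for the JIT's instruction enum (illustration only).
enum Instruction
{
    INS_ld1, INS_ld1_2regs, INS_ld1_3regs, INS_ld1_4regs,
    INS_ld2, INS_ld3, INS_ld4,
    INS_ld1r, INS_ld2r, INS_ld3r, INS_ld4r,
    INS_st1, INS_st1_2regs, INS_st1_3regs, INS_st1_4regs,
    INS_st2, INS_st3, INS_st4,
    INS_add // any instruction that is not a vector load/store structure instruction
};

// Returns the number of consecutive SIMD registers the instruction loads to
// or stores from. Note that LD1 (2 registers) returns 2 even though each
// structure it transfers has a single element.
unsigned registerListSize(Instruction ins)
{
    switch (ins)
    {
        case INS_ld1: case INS_ld1r: case INS_st1:
            return 1;
        case INS_ld1_2regs: case INS_ld2: case INS_ld2r: case INS_st1_2regs: case INS_st2:
            return 2;
        case INS_ld1_3regs: case INS_ld3: case INS_ld3r: case INS_st1_3regs: case INS_st3:
            return 3;
        case INS_ld1_4regs: case INS_ld4: case INS_ld4r: case INS_st1_4regs: case INS_st4:
            return 4;
        default:
            return 0; // assumption for this sketch: not a load/store structure instruction
    }
}
```

The point of the rename is visible in the `INS_ld1_2regs` case: the structure has one element, but the register list has two entries.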

* Display a register
*/
//------------------------------------------------------------------------
// emitDispReg: Display a general-purpose register name or SIMD and floating-point scalar register name
Member

nit: If we are updating these, we should likely do it fully and also document the parameters, etc

Contributor Author

agree. will do

Member

@tannergooding tannergooding left a comment

Overall LGTM, just a comment about not liking the selem name.

It also would have been nice to separate out the refactorings into a separate PR to help with review.

@tannergooding
Member

Just realized, what about C# tests for the functions?

@echesakov
Contributor Author

Just realized, what about C# tests for the functions?

@tannergooding this part only implements the instructions in the backend. There is a separate WIP PR for the Store intrinsic, #33535, where I have C# tests.

@echesakov
Contributor Author

Overall LGTM, just a comment about not liking the selem name.
It also would have been nice to separate out the refactorings into a separate PR to help with review.

Thanks for the review!

Contributor

@briansull briansull left a comment

With comment changes

elemsize = optGetElemsize(id->idInsOpt());
imm = emitGetInsSC(id);
case IF_LS_2F: // LS_2F .Q.............. ...Sssnnnnnttttt Vt[] Rn
case IF_LS_2G: // LS_2G .Q.............. ...Sssnnnnnttttt Vt[] Rn
Contributor

This should have the xx fields as well:

.Q.............. xx.Sssnnnnnttttt Vt[] Rn
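To show why the `xx` bits matter in the pattern above, here is a minimal sketch that decodes the element size and element index of a single-structure encoding from the `Q`, `xx` (opcode<2:1>, bits 15:14), `S` and `ss` fields. The struct and function names are hypothetical and this is not code from the emitter; the field positions come from the comment pattern and the index rules from the Arm definition of the single-structure forms.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for this illustration.
struct SingleStructureElement
{
    unsigned elemSizeBytes; // 1, 2, 4 or 8; 0 when there is no valid element index
    unsigned index;         // element index within each register of the list
};

// Decodes the fields of the pattern ".Q.............. xx.Sssnnnnnttttt".
SingleStructureElement decodeSingleStructureElement(uint32_t code)
{
    const unsigned Q    = (code >> 30) & 1;
    const unsigned xx   = (code >> 14) & 3; // opcode<2:1>
    const unsigned S    = (code >> 12) & 1;
    const unsigned size = (code >> 10) & 3; // the "ss" field

    switch (xx)
    {
        case 0: // byte elements: index = Q:S:size
            return {1, (Q << 3) | (S << 2) | size};
        case 1: // halfword elements: index = Q:S:size<1>, size<0> must be 0
            if ((size & 1) != 0)
            {
                return {0, 0};
            }
            return {2, (Q << 2) | (S << 1) | (size >> 1)};
        case 2: // word (size == 00) or doubleword (size == 01, S == 0) elements
            if (size == 0)
            {
                return {4, (Q << 1) | S};
            }
            if (size == 1 && S == 0)
            {
                return {8, Q};
            }
            return {0, 0};
        default: // xx == 11: load single structure and replicate, no element index
            return {0, 0};
    }
}
```

For instance, the word 0x0D9F0C20, which by the layout above should correspond to `st1 {v0.b}[3], [x1], #1`, decodes to a 1-byte element at index 3. That value was worked out by hand from the bit layout, not copied from the attached dumps.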

emitDispReg(id->idReg1(), emitInsTargetRegSize(id), true);
emitDispAddrRI(id->idReg2(), id->idInsOpt(), 0);
case IF_LS_2F: // LS_2F .Q.............. ...Sssnnnnnttttt Vt[] Rn
case IF_LS_2G: // LS_2G .Q.............. ...Sssnnnnnttttt Vt[] Rn
Contributor

@briansull briansull Mar 13, 2020

This should have the xx fields as well:

.Q.............. xx.Sssnnnnnttttt Vt[] Rn

case IF_LS_2D: // LS_2D .Q.............. ....ssnnnnnttttt Vt Rn
case IF_LS_2E: // LS_2E .Q.............. ....ssnnnnnttttt Vt Rn
case IF_LS_2F: // LS_2F .Q.............. ...Sssnnnnnttttt Vt[] Rn
case IF_LS_2G: // LS_2G .Q.............. ...Sssnnnnnttttt Vt[] Rn
Contributor

These two IF_LS_2F and IF_LS_2G should have the xx fields as well:

.Q.............. xx.Sssnnnnnttttt Vt[] Rn

Contributor Author

I will make these changes in #33535

Member

@BruceForstall BruceForstall left a comment

LGTM, just a few nits

@@ -39,7 +39,10 @@ void emitDispLSExtendOpts(insOpts opt);
void emitDispReg(regNumber reg, emitAttr attr, bool addComma);
void emitDispVectorReg(regNumber reg, insOpts opt, bool addComma);
void emitDispVectorRegIndex(regNumber reg, emitAttr elemsize, ssize_t index, bool addComma);
void emitDispVectorRegList(regNumber firstReg, unsigned listSize, insOpts opt, bool addComma);
Member

nit: listSize => listLength? (same below)

Contributor Author

I will follow up in #33535

void emitDispArrangement(insOpts opt);
void emitDispElemsize(emitAttr elemsize);
Member

nit: capitalize "size", emitDispElemsize => emitDispElemSize?

@@ -445,6 +448,10 @@ static emitAttr optGetSrcsize(insOpts conversion);
// for an element of size 'elemsize' in a vector register of size 'datasize'
static bool isValidVectorIndex(emitAttr datasize, emitAttr elemsize, ssize_t index);

// For a given Load/Store Vector instruction 'ins' returns a number of consecutive SIMD registers
// the instruction loads to/store from.
Member

nit: "store from" => "stores from"

@@ -445,6 +448,10 @@ static emitAttr optGetSrcsize(insOpts conversion);
// for an element of size 'elemsize' in a vector register of size 'datasize'
static bool isValidVectorIndex(emitAttr datasize, emitAttr elemsize, ssize_t index);

// For a given Load/Store Vector instruction 'ins' returns a number of consecutive SIMD registers
// the instruction loads to/store from.
static unsigned insGetLoadStoreRegisterListSize(instruction ins);
Member

nit: ListSize => ListLength?

// ld2 {Vt,Vt2},[Xn] LS_2D 0Q00110001000000 1000ssnnnnnttttt 0C40 8000 base register
// ld2 {Vt,Vt2},[Xn],Xm LS_3F 0Q001100110mmmmm 1000ssnnnnnttttt 0CC0 8000 post-indexed by a register
// ld2 {Vt,Vt2},[Xn],#imm LS_2E 0Q001100110mmmmm 1000ssnnnnnttttt 0CDF 8000 post-indexed by an immediate
// ld2 {Vt,Vt2}[],[Xn] LS_2F 0Q00110101100000 xx0Sssnnnnnttttt 0D60 0000 base register
Member

I wonder if it would be helpful to include a comment for each of these that would lead you to the correct reference manual page, e.g., "C7.2.174 LD2 (single structure)" (hopefully ARM doesn't renumber these... but at least the name would be the same)

Contributor Author

Yes, it's a good idea - I will add
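The encoding comments quoted in this thread can be read as "base opcode plus operand fields". The sketch below assembles an `ld2` (multiple structures) instruction word from the three base values listed there (0C40 8000, 0CC0 8000, 0CDF 8000). The constant and function names are hypothetical and this is not the emitter's code; the field positions (`Q` at bit 30, `m` at bits 20:16, `ss` at bits 11:10, `n` at bits 9:5, `t` at bits 4:0) are read off the bit patterns in the comments.

```cpp
#include <cassert>
#include <cstdint>

// Base opcodes for ld2 (multiple structures), taken from the comments above.
const uint32_t LD2_BASE_REGISTER       = 0x0C408000; // LS_2D: ld2 {Vt,Vt2},[Xn]
const uint32_t LD2_POST_INDEX_REGISTER = 0x0CC08000; // LS_3F: ld2 {Vt,Vt2},[Xn],Xm
const uint32_t LD2_POST_INDEX_IMM      = 0x0CDF8000; // LS_2E: ld2 {Vt,Vt2},[Xn],#imm

// ORs the operand fields into a base opcode.
//   is128Bit - sets Q (bit 30): true for 128-bit arrangements such as 16b
//   size     - the "ss" field (bits 11:10): 0 = byte, 1 = halfword, 2 = word, 3 = doubleword
//   rn       - base address register number (bits 9:5)
//   rt       - first SIMD register of the list (bits 4:0)
//   rm       - post-index register number (bits 20:16); leave 0 for LS_2D and LS_2E,
//              because the LS_2E base opcode already has these bits set to 11111
uint32_t encodeMultipleStructures(
    uint32_t baseOpcode, bool is128Bit, unsigned size, unsigned rn, unsigned rt, unsigned rm = 0)
{
    assert(size < 4 && rn < 32 && rt < 32 && rm < 32);

    uint32_t code = baseOpcode;
    code |= (is128Bit ? 1u : 0u) << 30;
    code |= static_cast<uint32_t>(rm) << 16;
    code |= static_cast<uint32_t>(size) << 10;
    code |= static_cast<uint32_t>(rn) << 5;
    code |= static_cast<uint32_t>(rt);
    return code;
}
```

Under these assumptions, `ld2 {v0.16b, v1.16b}, [x1]` would be `encodeMultipleStructures(LD2_BASE_REGISTER, true, 0, 1, 0)`, and the register post-indexed form with x2 would pass `LD2_POST_INDEX_REGISTER` and `rm = 2`.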

@@ -378,6 +436,56 @@ INST3(mvn, "mvn", 0, 0, IF_EN3I, 0x2A2003E0, 0x2A2003E0, 0x2E205800)
// mvn Rd,(Rm,shk,imm) DR_2F X0101010sh1mmmmm iiiiii11111ddddd 2A20 03E0 Rm {LSL,LSR,ASR} imm(0-63)
// mvn Vd,Vn DV_2M 0Q10111000100000 010110nnnnnddddd 2E20 5800 Vd,Vn (vector)

// enum name FP LD/ST LS_2D LS_3F LS_2E
INST3(ld1_2regs,"ld1", 0,LD, IF_EN3J, 0x0C40A000, 0x0CC0A000, 0x0CDFA000)
Member

I guess you couldn't figure out a way to merge these (ld1_2regs, ld1_3regs, ld1_4regs, st1_2regs, etc.) with their respective ld1/st1 definitions, above? Or create a single ld1_multiregs/st1_multiregs that is distinguished in code with insOpts or similar?

Contributor Author

Yes, I have considered different approaches while adding these instructions.

Since multiple-register variants exist only for a small number of instructions (namely ld[1234], ld[1234]r, st[1234], tbx and tbl), and only a subset of those need a way to specify the number of registers (it is implied for ld[1234]r, ld[234] and st[234]), I decided not to disrupt other parts of the emitter by adding the number of registers to insOpts.

@@ -5219,6 +5219,726 @@ void CodeGen::genArm64EmitterUnitTests()

#endif // ALL_ARM64_EMITTER_UNIT_TESTS

#ifdef ALL_ARM64_EMITTER_UNIT_TESTS
//
// Loads to /Stores from one, two, three, or four SIMD&FP registers
Member

"Loads to /Stores" => "Loads to and Stores from"? (same below)

I thought "/Stores" was a typo so I was confused reading it.

// ins - A Load/Store Vector instruction (e.g. ld1 (2 registers), ld1r, st1).
//
// Return value:
// A number of consecutive SIMD and floating-point registers the instruction loads to/store from.
Member

nit: "store from"=>"stores from"

* Display an vector register index suffix
*/
//------------------------------------------------------------------------
// emitDispVectorRegIndex: Display a SIMD vector register name with element index
Member

Would this be clearer named emitDispVectorRegWithIndex? or emitDispVectorRegWithSizeAndIndex

Contributor Author

The "official" name for this is SIMD vector element name (see C1.2.5 Register names) so we might also rename it to emitDispVectorElementName

//------------------------------------------------------------------------
// emitDispVectorElemList: Display a SIMD vector element list
//
void emitter::emitDispVectorElemList(
Member

Would this be clearer as emitDispVectorRegListWithSize? As named, I wasn't sure the difference between emitDispVectorRegList and emitDispVectorElemList -- namely, what is a "VectorElem"?

Contributor Author

This terminology is from C1.2.5 Register names. I don't have a strong preference for how to name the function; I tried to follow what the Arm docs say.

@echesakov
Contributor Author

The two Linux sigsegv failures are due to an issue reported in #33562 (comment). An arm64 leg passes, so merging.

@echesakov echesakov merged commit 6b8cda0 into dotnet:master Mar 14, 2020
@echesakov echesakov deleted the Arm64-Vector-Load-Store-Structure-Instructions branch March 14, 2020 00:11
echesakov added a commit to echesakov/runtime that referenced this pull request Mar 17, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020