-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RyuJIT] Emit shlx, sarx, shrx on x64 #67182
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsFixes #41881.
Current codegen:
New codegen:
It needs further work to remove mov when memory address is used instead of all registers.
|
|
||
regNumber shiftByReg = shiftBy->GetRegNum(); | ||
emitAttr size = emitTypeSize(tree); | ||
GetEmitter()->emitIns_R_R_R(ins, size, tree->GetRegNum(), shiftByReg, operandReg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth a note that we don't currently support the contained form due to more complex changes being needed in the emitter.
Ideally we'd fix up the emitter and use inst_RV_RV_TT
instead so that we can emit shlx r32a, r/m32, r32b
. Someone would need to walk through the relevant IF_RWR_RRD_*RD
formats and ensure that it's all handled correctly (noting that technically the format is IF_RWR_*RD_RRD
but that should be the same as IF_RWR_RRD_*RD
with swapping op1/op2, like we do for a couple other BMI2 instructions, namely bextr
and bzhi
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #67314.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @tannergooding , lets have a comment about containment and also specify that here, the operands are swapped because the way operands are encoded in these 3 instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added commnets
As @tannergooding , lets have a comment about containment and also specify that here, the operands are swapped because the way operands are encoded in these 3 instructions.
Change generally LGTM. Left a suggestion around the ordering of the opportunistic check and potentially logging an issue or a comment around adding support for containment in the future. |
a1a37bd
to
fb558b3
Compare
Opened #67314. |
@kunalspathak PTAL. |
Can you check why there is a regression in coreclr_tests windows/x64?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments. I think we need to understand the regression in coreclr/libraries test.
src/coreclr/jit/lowerxarch.cpp
Outdated
@@ -4754,7 +4754,7 @@ void Lowering::ContainCheckDivOrMod(GenTreeOp* node) | |||
void Lowering::ContainCheckShiftRotate(GenTreeOp* node) | |||
{ | |||
assert(node->OperIsShiftOrRotate()); | |||
#ifdef TARGET_X86 | |||
#if defined(TARGET_X86) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could just revert this change...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -932,6 +932,16 @@ int LinearScan::BuildShiftRotate(GenTree* tree) | |||
{ | |||
assert(shiftBy->OperIsConst()); | |||
} | |||
#if defined(TARGET_64BIT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this instruction only applicable for x64? I thought it is also valid for x86? @tannergooding ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why it worked for x86 without changing this part of the code?
Is this instruction only applicable for x64? I thought it is also valid for x86? @tannergooding ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't InstructionSet_BMI2
the 32 bit version and for 64 bit you'd want InstructionSet_BMI2_X64
? If so and you checked 64 bit correctly elsewhere and then 32 here it would explain it working for 32.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double checked that my change handles only x64 case. Will open an issue to handle x86.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double checked that my change handles only x64 case. Will open an issue to handle x86.
src/coreclr/jit/codegenxarch.cpp
Outdated
@@ -4378,6 +4378,7 @@ void CodeGen::genCodeForShift(GenTree* tree) | |||
int shiftByValue = (int)shiftBy->AsIntConCommon()->IconValue(); | |||
|
|||
#if defined(TARGET_64BIT) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete the extra line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -4397,6 +4398,36 @@ void CodeGen::genCodeForShift(GenTree* tree) | |||
inst_RV_SH(ins, size, tree->GetRegNum(), shiftByValue); | |||
} | |||
} | |||
#if defined(TARGET_64BIT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here...why is it only for x64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handling for x64 only for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handling for x64 only for now.
|
||
regNumber shiftByReg = shiftBy->GetRegNum(); | ||
emitAttr size = emitTypeSize(tree); | ||
GetEmitter()->emitIns_R_R_R(ins, size, tree->GetRegNum(), shiftByReg, operandReg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @tannergooding , lets have a comment about containment and also specify that here, the operands are swapped because the way operands are encoded in these 3 instructions.
@@ -749,6 +749,9 @@ bool emitter::TakesRexWPrefix(instruction ins, emitAttr attr) | |||
case INS_pdep: | |||
case INS_pext: | |||
case INS_rorx: | |||
case INS_shlx: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per the changes at other places, we have the code to have it supported only for TARGET_64BIT
, but here, it is under TARGET_ARM64
. What is the difference and should it be consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is under #ifdef TARGET_AMD64, not ARM64. So, I guess it is correct.
@@ -987,17 +990,25 @@ unsigned emitter::emitOutputRexOrVexPrefixIfNeeded(instruction ins, BYTE* dst, c | |||
case INS_rorx: | |||
case INS_pdep: | |||
case INS_mulx: | |||
case INS_shrx: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code path is also for x86?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added #ifdef TARGET_64BIT.
{ | ||
// BMI bextr and bzhi encodes the reg2 in VEX.vvvv and reg3 in modRM, | ||
// BMI bextr,bzhi, shrx, shlx and sarx encode the reg2 in VEX.vvvv and reg3 in modRM, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// BMI bextr,bzhi, shrx, shlx and sarx encode the reg2 in VEX.vvvv and reg3 in modRM, | |
// BMI bextr, bzhi, shrx, shlx and sarx encode the reg2 in VEX.vvvv and reg3 in modRM, |
We need to have similar comment where we swap the operands I pointed out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You have already added a comment in codegenxarch.cpp
. No need here. The above comment covers it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
src/coreclr/jit/emitxarch.cpp
Outdated
@@ -10288,6 +10302,7 @@ BYTE* emitter::emitOutputAM(BYTE* dst, instrDesc* id, code_t code, CnsVal* addc) | |||
// For this format, moves do not support a third operand, so we only need to handle the binary ops. | |||
if (TakesVexPrefix(ins)) | |||
{ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/coreclr/jit/emitxarch.cpp
Outdated
case INS_sarx: | ||
case INS_shrx: | ||
{ | ||
result.insLatency = PERFSCORE_LATENCY_2C; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Qhere are you getting that number from? I think that matches the newer hardware and not Skylake, which is what we have used for the other numbers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to result.insLatency += PERFSCORE_LATENCY_1C;
.
if (resUInt != expectedUInt) | ||
{ | ||
Console.Write(" != {0} Failed.\n", expectedUInt); | ||
return 101; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should consider letting all the tests run, and not exiting on the first failure. Just make sure to return 101 if any test fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/coreclr/jit/codegenxarch.cpp
Outdated
genProduceReg(tree); | ||
|
||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete these lines, and let the normal fall-through make this call and return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Reran asmdiffs and it is an improvement. [14:06:59] Summary of Code Size diffs: Enabled this only for x64, not x86. Will open a new issue to address it for x86. |
fb558b3
to
53575ca
Compare
All tests passed. @kunalspathak PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. Added some minor suggestions.
src/tests/issues.targets
Outdated
@@ -1474,7 +1473,10 @@ | |||
<ExcludeList Include="$(XunitTestBinBase)/JIT/SIMD/Vector3Interop_ro/*"> | |||
<Issue>https://github.com/dotnet/runtime/issues/46174</Issue> | |||
</ExcludeList> | |||
|
|||
<ExcludeList Include="$(XunitTestBinBase)/JIT/SIMD/ShiftOperations/*"> | |||
<Issue>There is a known undefined behavior with shifts and 0x0FFFFFFFF overflows, so skip the test for mono.</Issue> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<Issue>There is a known undefined behavior with shifts and 0x0FFFFFFFF overflows, so skip the test for mono.</Issue> | |
<Issue>There is a known undefined behavior with shifts and 0xFFFFFFFF overflows, so skip the test for mono.</Issue> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -4803,7 +4803,7 @@ void Lowering::ContainCheckShiftRotate(GenTreeOp* node) | |||
assert(source->OperGet() == GT_LONG); | |||
MakeSrcContained(node, source); | |||
} | |||
#endif // !TARGET_X86 | |||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been alrady removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean put it back (revert the deletion of the comment // !TARGET_X86
).
src/coreclr/jit/codegenxarch.cpp
Outdated
unreached(); | ||
} | ||
|
||
// It handles all register forms, but it does not handle contained form for memory operand. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment makes more sense next to your lsraxarch.cpp
change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved it to lsraxarch.cpp
.
{ | ||
// BMI bextr and bzhi encodes the reg2 in VEX.vvvv and reg3 in modRM, | ||
// BMI bextr,bzhi, shrx, shlx and sarx encode the reg2 in VEX.vvvv and reg3 in modRM, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You have already added a comment in codegenxarch.cpp
. No need here. The above comment covers it.
@@ -0,0 +1,501 @@ | |||
using System; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include the license.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
shiftBy = 31; | ||
resUInt = Shlx32bit(valUInt, shiftBy); | ||
expectedUInt = (uint) (valUInt * Math.Pow(2, (shiftBy % MOD32))); | ||
Console.Write("UnitTest Shlx32bit({0},{1}): {2}", valUInt, shiftBy, resUInt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not create a Validate()
function that will do most of these things at one place?
public int Validate(..., actual, ...) {
expected =
if (expected != actual) {
Console.WriteLine("Fail");
return 101;
}
return 100;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can then just have an input array and iterate over it or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Also, there are some significant regressions. Did you figure out why? Eg. here is from asp.net collection.
|
They are due to register assignment changes. Bytes regressed in some files but perfscores are better. Some adds NOP of 4 bytes. benchmarks.run.windows.x64.checked.mch:Top method regressions (bytes):
Base:
Diff:
Base:
Diff:
aspnet.run.windows.x64.checked.mch:Top method regressions (bytes):
Diff
|
It was not a simple fall out, so I discussed with Kunal to skip x86 for now. |
Can you describe what problems were encountered, and what is required to implement it for x86? The tracking issue #67314 doesn't have any details. |
It has been a while so I cannot remember exactly, but it was not generating the right code. lsraxarch and emitxarch need to be looked into. Updated #67314. |
/azp list |
@kunalspathak and @BruceForstall it is ready to review. |
long expectedLong = 0; | ||
int MOD64 = 64; | ||
|
||
/* TODO: Enable 32bit test when x86 shift is enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest deleting the commented code.
|
||
try | ||
{ | ||
ulong valULong = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would define these variables closer to their usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
ulong valULong = 0; | ||
long valLong = 0; | ||
int shiftBy = 0; | ||
ulong resULong = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth adding short and ints as test cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM...added some suggestions.
switch (x) | ||
{ | ||
case ulong a: | ||
ulong resUlong = ((ulong)a) << y; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doesn't this generic work?
T res = ((T)x) << y;
return (R)Convert.ChangeType(res,typeof(R));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shift operators do not work on generics.
https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/operators/bitwise-and-shift-operators
Because the shift operators are defined only for the int, uint, long, and ulong types, the result of an operation always contains at least 32 bits. If the left-hand operand is of another integral type (sbyte, byte, short, ushort, or char), its value is converted to the int type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kunalspathak, this is the error message.
error CS0019: Operator '<<' cannot be applied to operands of type 'T' and 'int'
All tests passed, so merging it now. Thanks for all the code reviews.
Some improvements in windows x64: System.Collections.Tests.Perf_BitArray dotnet/perf-autofiling-issues#5495 |
Improvements in linux/x64 #67182 |
Improvements windows/x64: dotnet/perf-autofiling-issues#5460 |
Fixes #41881.
Generates shlx, sarx, shrx for 64 bit shifts if BMI2 platform.
Current codegen:
New codegen:
It needs further work to remove mov when memory address is used instead of all registers (handle it when enabling contained form in #67314).
x86 support needs to be enabled (added it in #67314).