[RyuJIT] Emit shlx, sarx, shrx on x64 #67182

Merged 15 commits, May 20, 2022


@JulieLeeMSFT JulieLeeMSFT commented Mar 26, 2022

Fixes #41881.
Generates shlx, sarx, and shrx for 64-bit shifts when the platform supports BMI2.

ulong Shlx(ulong x, int y) => x << y;
long Sarx(long x, int y) => x >> y;
ulong Shrx(ulong x, int y) => x >> y;
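
For reference, the semantics these three methods rely on can be modeled in C++. This is only a sketch: the names `shlx64`, `sarx64`, and `shrx64` are made up for illustration and are not part of the PR. It assumes the rule that both C# (for long/ulong operands) and the BMI2 instructions (for 64-bit operand size) follow, namely that only the low 6 bits of the shift count are used.

```cpp
#include <cstdint>

// Hypothetical helpers modeling 64-bit shifts in which only the low
// 6 bits of the count are used (count & 63).
constexpr std::uint64_t shlx64(std::uint64_t x, int count) {
    return x << (static_cast<unsigned>(count) & 63u);
}

constexpr std::uint64_t shrx64(std::uint64_t x, int count) {
    return x >> (static_cast<unsigned>(count) & 63u);
}

// Arithmetic right shift, written on the unsigned bit pattern so that it
// does not depend on how >> treats negative values before C++20.
constexpr std::int64_t sarx64(std::int64_t x, int count) {
    const unsigned n = static_cast<unsigned>(count) & 63u;
    const std::uint64_t ux = static_cast<std::uint64_t>(x);
    const std::uint64_t shifted = (x < 0) ? ~(~ux >> n) : (ux >> n);
    return static_cast<std::int64_t>(shifted);
}
```

With this model, a count of 65 behaves like a count of 1, which is the behavior a test comparing against expected values has to account for.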

Current codegen:

; Method  Test:Shlx(long,int):long
       mov      eax, ecx
       mov      ecx, edx
       shl      eax, cl
       ret      

; Method Test:Sarx(long,int):long
       mov      eax, ecx
       mov      ecx, edx
       sar      eax, cl
       ret      

; Method Test:Shrx(long,int):long
       mov      eax, ecx
       mov      ecx, edx
       shr      eax, cl
       ret   

New codegen:

for method Test:Shlx(long,int):long
    C4E2E9F7C1           shlx     rax, rcx, rdx

for method Test:Sarx(long,int):long
    C4E2EAF7C1           sarx     rax, rcx, rdx

for method Test:Shrx(long,int):long
    C4E2EBF7C1           shrx     rax, rcx, rdx

Further work is needed to remove the mov when the operand is a memory address rather than a register; this will be handled when enabling the contained form in #67314.

ulong ShrxRef(ulong *x, int y) => *x >> y;
    488B01              mov    rax, qword ptr [rcx]
    C4E2EBF7C0          shrx    rax, rax, rdx

x86 support still needs to be enabled (added to #67314).

@ghost ghost assigned JulieLeeMSFT Mar 26, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 26, 2022
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Mar 26, 2022

regNumber shiftByReg = shiftBy->GetRegNum();
emitAttr size = emitTypeSize(tree);
GetEmitter()->emitIns_R_R_R(ins, size, tree->GetRegNum(), shiftByReg, operandReg);
@tannergooding tannergooding Mar 27, 2022

Might be worth a note that we don't currently support the contained form due to more complex changes being needed in the emitter.

Ideally we'd fix up the emitter and use inst_RV_RV_TT instead so that we can emit shlx r32a, r/m32, r32b. Someone would need to walk through the relevant IF_RWR_RRD_*RD formats and ensure that it's all handled correctly (noting that technically the format is IF_RWR_*RD_RRD but that should be the same as IF_RWR_RRD_*RD with swapping op1/op2, like we do for a couple other BMI2 instructions, namely bextr and bzhi).
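
To make the operand order concrete, here is a standalone sketch of how the register form and a simple memory (contained) form would be encoded. It is not the emitter's code: `Bmi2Shift`, `encode`, `encodeRegForm`, and `encodeMemForm` are hypothetical names, only 64-bit operand size is modeled, and the memory form assumes plain `[base]` addressing with a base register other than rsp, rbp, r12, or r13 (those need a SIB byte or a displacement). The point it illustrates is that the shift count goes into VEX.vvvv while the value being shifted goes into ModRM.rm, which is why the call site passes `shiftByReg` before `operandReg`.

```cpp
#include <array>
#include <cstdint>

// Values of the VEX.pp field (66, F3, F2 prefixes respectively).
enum class Bmi2Shift : std::uint8_t { Shlx = 1, Sarx = 2, Shrx = 3 };

// Shared encoder: dest goes in ModRM.reg, rm in ModRM.rm, count in VEX.vvvv.
// Register numbers are 0-15 (0 = rax, 1 = rcx, 2 = rdx, ...).
inline std::array<std::uint8_t, 5> encode(Bmi2Shift op, unsigned dest, unsigned rm,
                                          unsigned count, unsigned mod) {
    const unsigned notR = ((dest >> 3) & 1u) ^ 1u; // VEX stores R, X, B inverted
    const unsigned notB = ((rm >> 3) & 1u) ^ 1u;
    const auto vex1 = static_cast<std::uint8_t>((notR << 7) | (1u << 6) | (notB << 5) | 0x02u);
    const auto vex2 = static_cast<std::uint8_t>(0x80u | ((~count & 0xFu) << 3) |
                                                static_cast<unsigned>(op));
    const auto modrm = static_cast<std::uint8_t>((mod << 6) | ((dest & 7u) << 3) | (rm & 7u));
    return {{0xC4, vex1, vex2, 0xF7, modrm}};
}

// Register form: dest = value shifted by count (all three in registers).
inline std::array<std::uint8_t, 5> encodeRegForm(Bmi2Shift op, unsigned dest,
                                                 unsigned value, unsigned count) {
    return encode(op, dest, value, count, 3u);
}

// Memory form: dest = qword ptr [base] shifted by count.
inline std::array<std::uint8_t, 5> encodeMemForm(Bmi2Shift op, unsigned dest,
                                                 unsigned base, unsigned count) {
    return encode(op, dest, base, count, 0u);
}
```

Under these assumptions, `encodeRegForm(Bmi2Shift::Shrx, 0, 0, 2)` produces C4 E2 EB F7 C0, the same bytes as the `shrx rax, rax, rdx` in the PR description, and `encodeMemForm(Bmi2Shift::Shrx, 0, 1, 2)` produces C4 E2 EB F7 01 for `shrx rax, qword ptr [rcx], rdx`, which would make the separate 3-byte mov unnecessary.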

Member Author

Opened #67314.

Member

As @tannergooding suggested, let's add a comment about containment and also note that the operands are swapped here because of the way operands are encoded in these three instructions.

Member Author

Added comments.

@tannergooding
Member

Change generally LGTM. Left a suggestion around the ordering of the opportunistic check and potentially logging an issue or a comment around adding support for containment in the future.

@JulieLeeMSFT JulieLeeMSFT changed the title [RyuJIT] Emit shlx, sarx, shrx on x64 [RyuJIT] Emit shlx, sarx, shrx on x64 and x86 Mar 29, 2022
@JulieLeeMSFT
Member Author

Change generally LGTM. Left a suggestion around the ordering of the opportunistic check and potentially logging an issue or a comment around adding support for containment in the future.

Opened #67314.

@JulieLeeMSFT
Member Author

@kunalspathak PTAL.
cc @dotnet/jit-contrib.

@kunalspathak
Member

Can you check why there is a regression in coreclr_tests windows/x64?

Total bytes of base: 59717535 (overridden on cmd)
Total bytes of diff: 59717754 (overridden on cmd)
Total bytes of delta: 219 (0.00 % of base)
    diff is a regression.
    relative diff is an improvement.

@kunalspathak kunalspathak left a comment

Added some comments. I think we need to understand the regression in coreclr/libraries test.

@@ -4754,7 +4754,7 @@ void Lowering::ContainCheckDivOrMod(GenTreeOp* node)
void Lowering::ContainCheckShiftRotate(GenTreeOp* node)
{
assert(node->OperIsShiftOrRotate());
-#ifdef TARGET_X86
+#if defined(TARGET_X86)
Member

you could just revert this change...

Member Author

Done.

@@ -932,6 +932,16 @@ int LinearScan::BuildShiftRotate(GenTree* tree)
{
assert(shiftBy->OperIsConst());
}
#if defined(TARGET_64BIT)
Member

Is this instruction only applicable for x64? I thought it is also valid for x86? @tannergooding ?

Member Author

I wonder why it worked for x86 without changing this part of the code?


Contributor

Isn't InstructionSet_BMI2 the 32-bit version, whereas for 64-bit you'd want InstructionSet_BMI2_X64? If so, and you checked 64-bit correctly elsewhere and then 32-bit here, that would explain it working for 32-bit.

Member Author

Double-checked that my change handles only the x64 case. I will open an issue to handle x86.

@@ -4378,6 +4378,7 @@ void CodeGen::genCodeForShift(GenTree* tree)
int shiftByValue = (int)shiftBy->AsIntConCommon()->IconValue();

#if defined(TARGET_64BIT)

Member

delete the extra line.

Member Author

Done.

@@ -4397,6 +4398,36 @@ void CodeGen::genCodeForShift(GenTree* tree)
inst_RV_SH(ins, size, tree->GetRegNum(), shiftByValue);
}
}
#if defined(TARGET_64BIT)
Member

same here...why is it only for x64?

Member Author

Handling for x64 only for now.

@@ -749,6 +749,9 @@ bool emitter::TakesRexWPrefix(instruction ins, emitAttr attr)
case INS_pdep:
case INS_pext:
case INS_rorx:
case INS_shlx:
Member

As per the changes at other places, we have the code to have it supported only for TARGET_64BIT, but here, it is under TARGET_ARM64. What is the difference and should it be consistent?

Member Author

It is under #ifdef TARGET_AMD64, not ARM64. So, I guess it is correct.

@@ -987,17 +990,25 @@ unsigned emitter::emitOutputRexOrVexPrefixIfNeeded(instruction ins, BYTE* dst, c
case INS_rorx:
case INS_pdep:
case INS_mulx:
case INS_shrx:
Member

this code path is also for x86?

Member Author

Added #ifdef TARGET_64BIT.

{
-// BMI bextr and bzhi encodes the reg2 in VEX.vvvv and reg3 in modRM,
+// BMI bextr,bzhi, shrx, shlx and sarx encode the reg2 in VEX.vvvv and reg3 in modRM,
Member

Suggested change
-// BMI bextr,bzhi, shrx, shlx and sarx encode the reg2 in VEX.vvvv and reg3 in modRM,
+// BMI bextr, bzhi, shrx, shlx and sarx encode the reg2 in VEX.vvvv and reg3 in modRM,

We need a similar comment at the place I pointed out, where we swap the operands.

Member Author

Added comment.

Member

nit: You have already added a comment in codegenxarch.cpp. No need here. The above comment covers it.

Member Author

Removed.

@@ -10288,6 +10302,7 @@ BYTE* emitter::emitOutputAM(BYTE* dst, instrDesc* id, code_t code, CnsVal* addc)
// For this format, moves do not support a third operand, so we only need to handle the binary ops.
if (TakesVexPrefix(ins))
{

Member

delete

Member Author

Done.

case INS_sarx:
case INS_shrx:
{
result.insLatency = PERFSCORE_LATENCY_2C;
Member

Suggested change
-result.insLatency = PERFSCORE_LATENCY_2C;
+result.insLatency += PERFSCORE_LATENCY_1C;

It should be similar to that of rorx AFAIK.

image

Member

Where are you getting that number from? I think that matches the newer hardware and not Skylake, which is what we have used for the other numbers.

Member Author

Updated to result.insLatency += PERFSCORE_LATENCY_1C;.

@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Mar 30, 2022
if (resUInt != expectedUInt)
{
Console.Write(" != {0} Failed.\n", expectedUInt);
return 101;
Member

You should consider letting all the tests run, and not exiting on the first failure. Just make sure to return 101 if any test fails.

Member Author

Done.

Comment on lines 4426 to 4428
genProduceReg(tree);

return;
Member

Delete these lines, and let the normal fall-through make this call and return

Member Author

Done.

@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Mar 31, 2022
@JulieLeeMSFT
Member Author

JulieLeeMSFT commented Apr 22, 2022

Reran asmdiffs and it is an improvement.

[14:06:59] Summary of Code Size diffs:
[14:06:59] (Lower is better)
[14:06:59]
[14:06:59] Total bytes of base: 129839366 (overridden on cmd)
[14:06:59] Total bytes of diff: 129826617 (overridden on cmd)
[14:06:59] Total bytes of delta: -12749 (-0.01 % of base)
[14:06:59]
[14:06:59]
[14:06:59] 0 total files with Code Size differences (0 improved, 0 regressed), 1125 unchanged.
[14:06:59]
[14:06:59] 0 total methods with Code Size differences (0 improved, 0 regressed), 0 unchanged.
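
The delta and percentage lines in summaries like this one follow from simple arithmetic on the two totals. As a hedged illustration (the helper names `byteDelta` and `percentOfBase` are invented here and are not part of the asmdiffs tooling), the figures can be reproduced as follows:

```cpp
#include <cstdint>

// Signed byte delta between the diff total and the base total.
constexpr std::int64_t byteDelta(std::int64_t baseBytes, std::int64_t diffBytes) {
    return diffBytes - baseBytes;
}

// The same delta expressed as a percentage of the base size.
constexpr double percentOfBase(std::int64_t baseBytes, std::int64_t diffBytes) {
    return 100.0 * static_cast<double>(byteDelta(baseBytes, diffBytes)) /
           static_cast<double>(baseBytes);
}
```

For the totals above, `byteDelta(129839366, 129826617)` is -12749, and the percentage is roughly -0.0098, which the summary rounds to -0.01 %.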

Enabled this only for x64, not x86. Will open a new issue to address it for x86.
Edit: Added comment in the existing issue: #67314

@JulieLeeMSFT JulieLeeMSFT changed the title [RyuJIT] Emit shlx, sarx, shrx on x64 and x86 [RyuJIT] Emit shlx, sarx, shrx on x64 Apr 22, 2022
@JulieLeeMSFT
Member Author

All tests passed. @kunalspathak PTAL.
cc @dotnet/jit-contrib.

@kunalspathak kunalspathak left a comment

Overall looks good. Added some minor suggestions.

@@ -1474,7 +1473,10 @@
<ExcludeList Include="$(XunitTestBinBase)/JIT/SIMD/Vector3Interop_ro/*">
<Issue>https://github.com/dotnet/runtime/issues/46174</Issue>
</ExcludeList>

<ExcludeList Include="$(XunitTestBinBase)/JIT/SIMD/ShiftOperations/*">
<Issue>There is a known undefined behavior with shifts and 0x0FFFFFFFF overflows, so skip the test for mono.</Issue>
Member

Suggested change
-<Issue>There is a known undefined behavior with shifts and 0x0FFFFFFFF overflows, so skip the test for mono.</Issue>
+<Issue>There is a known undefined behavior with shifts and 0xFFFFFFFF overflows, so skip the test for mono.</Issue>

Member Author

Done.

@@ -4803,7 +4803,7 @@ void Lowering::ContainCheckShiftRotate(GenTreeOp* node)
assert(source->OperGet() == GT_LONG);
MakeSrcContained(node, source);
}
-#endif // !TARGET_X86
+#endif
Member

revert this change.

Member Author

This has already been removed.

Member

I mean put it back (revert the deletion of the comment // !TARGET_X86).

unreached();
}

// It handles all register forms, but it does not handle contained form for memory operand.
Member

This comment makes more sense next to your lsraxarch.cpp change.

Member Author

Moved it to lsraxarch.cpp.

@@ -0,0 +1,501 @@
using System;
Member

Include the license.

Member Author

Done.

shiftBy = 31;
resUInt = Shlx32bit(valUInt, shiftBy);
expectedUInt = (uint) (valUInt * Math.Pow(2, (shiftBy % MOD32)));
Console.Write("UnitTest Shlx32bit({0},{1}): {2}", valUInt, shiftBy, resUInt);
Member

Why not create a Validate() function that will do most of these things at one place?

public int Validate(..., actual, ...) {
  expected = 
  if (expected != actual) { 
    Console.WriteLine("Fail");
    return 101;
  }
  return 100;
}

Member

You can then just have an input array and iterate over it or something like that.
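
The suggestion (a single Validate helper plus an input array, with every case run before the exit code is decided) could look roughly like the following. It is sketched in C++ rather than the test's C#, and everything in it is illustrative: `ShiftCase`, `Validate`, `RunAll`, and the stand-in `Shlx` are hypothetical names, not the code that was committed. The 100/101 return values follow the convention visible in the test snippets in this review.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// One row of the input table: operand, shift count, expected result.
struct ShiftCase {
    std::uint64_t value;
    int shiftBy;
    std::uint64_t expected;
};

// Stand-in for the method under test; the count is masked to 6 bits.
inline std::uint64_t Shlx(std::uint64_t x, int y) {
    return x << (static_cast<unsigned>(y) & 63u);
}

// Reports a mismatch but does not abort the run.
inline bool Validate(const char* name, const ShiftCase& c, std::uint64_t actual) {
    if (actual == c.expected) {
        return true;
    }
    std::printf("%s(%llu, %d) = %llu, expected %llu: Failed.\n", name,
                static_cast<unsigned long long>(c.value), c.shiftBy,
                static_cast<unsigned long long>(actual),
                static_cast<unsigned long long>(c.expected));
    return false;
}

// Runs every case; returns 100 only if all pass, otherwise 101.
inline int RunAll(const std::vector<ShiftCase>& cases) {
    bool allPassed = true;
    for (const ShiftCase& c : cases) {
        if (!Validate("Shlx", c, Shlx(c.value, c.shiftBy))) {
            allPassed = false;
        }
    }
    return allPassed ? 100 : 101;
}
```

Keeping the expected values in the table avoids recomputing them with floating-point helpers, and a failing row no longer hides the rows after it.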

Member Author

Done

@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 23, 2022
@kunalspathak
Member

Also, there are some significant regressions. Did you figure out why? Eg. here is from asp.net collection.

          16 (8.51 % of base) : 6903.dasm - System.Number:ComputeProductApproximation(int,long,long):System.ValueTuple`2[UInt64,UInt64]
          14 (0.87 % of base) : 13765.dasm - V8.Crypto.BigInteger:modPow(V8.Crypto.BigInteger,V8.Crypto.BigInteger):V8.Crypto.BigInteger:this
           9 (1.80 % of base) : 13749.dasm - V8.Crypto.BigInteger:subTo(V8.Crypto.BigInteger,V8.Crypto.BigInteger):this
           8 (0.27 % of base) : 26580.dasm - System.Text.RegularExpressions.Symbolic.SymbolicRegexMatcher`1[UInt64][System.UInt64]:FindEndPositionCapturing(System.ReadOnlySpan`1[Char],int,byref,PerThreadData[UInt64]):int:this
           8 (1.61 % of base) : 13767.dasm - V8.Crypto.BigInteger:addTo(V8.Crypto.BigInteger,V8.Crypto.BigInteger):this
           7 (4.43 % of base) : 15897.dasm - BenchmarksGame.ByteString:GetHashCode():int:this
           7 (0.08 % of base) : 7387.dasm - System.Text.RegularExpressions.RegexInterpreter:TryMatchAtCurrentPosition(System.ReadOnlySpan`1[Char]):bool:this
           6 (0.34 % of base) : 27178.dasm - Microsoft.CodeAnalysis.PEModule:GetTypeAndConstructor(System.Reflection.Metadata.MetadataReader,System.Reflection.Metadata.CustomAttributeHandle,byref,byref):bool
           6 (1.61 % of base) : 17686.dasm - System.Collections.BitArray:set_Length(int):this
           6 (5.45 % of base) : 3915.dasm - System.Text.Encodings.Web.TextEncoderSettings:AllowRange(System.Text.Unicode.UnicodeRange):this
           5 (1.18 % of base) : 16464.dasm - System.Number:AssembleFloatingPointBits(byref,long,int,bool):long
           4 (0.05 % of base) : 28137.dasm - Microsoft.CodeAnalysis.CSharp.Binder:FoldNeverOverflowBinaryOperators(int,Microsoft.CodeAnalysis.ConstantValue,Microsoft.CodeAnalysis.ConstantValue):System.Object
           4 (0.36 % of base) : 27480.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.Metadata.PE.PENamedTypeSymbol:MakeDeclaredBaseType():Microsoft.CodeAnalysis.CSharp.Symbols.NamedTypeSymbol:this
           4 (0.40 % of base) : 27192.dasm - Microsoft.CodeAnalysis.MetadataReaderExtensions:IsTheObjectClass(System.Reflection.Metadata.MetadataReader,System.Reflection.Metadata.TypeDefinition):bool
           4 (4.30 % of base) : 13741.dasm - MontgomeryReducer:.ctor(V8.Crypto.BigInteger):this
           4 (1.53 % of base) : 5738.dasm - System.Net.Http.HPack.IntegerEncoder:Encode(int,int,System.Span`1[Byte],byref):bool
           4 (3.45 % of base) : 16465.dasm - System.Number:RightShiftWithRounding(long,int,bool):long
           4 (4.40 % of base) : 12332.dasm - System.Text.RegularExpressions.Symbolic.BDD:GetMin():long:this
           3 (1.48 % of base) : 1197.dasm - <>c:<EmitMatchCharacterClass>b__122_1(System.Span`1[Char],System.String):this
           3 (0.18 % of base) : 9584.dasm - BenchmarksGame.BinaryTrees_5:Bench(int,bool):int

@JulieLeeMSFT
Member Author

Also, there are some significant regressions. Did you figure out why? Eg. here is from asp.net collection.

They are due to register assignment changes. The byte count regressed in some files, but the PerfScores are better. Some add 4 bytes of NOP padding.

benchmarks.run.windows.x64.checked.mch:

Top method regressions (bytes):
16 (8.51 % of base) : 6903.dasm
14 (0.87 % of base) : 13765.dasm

  • 16 (8.51 % of base) : 6903.dasm: better perfscore for diff.

Base:

       shr      r11, cl
						;; size=13 bbWeight=0.50 PerfScore 1.12

Diff:

       shrx     r11, r11, rdx
						;; size=15 bbWeight=0.50 PerfScore 0.38
  • 14 (0.87 % of base) : 13765.dasm

Base:

       sub      ecx, eax
       sar      edx, cl
       and      edx, dword ptr [rsp+7CH]
       jmp      SHORT G_M1573_IG21
						;; size=40 bbWeight=2    PerfScore 37.50

Diff:

       mov      edx, ebx
       sub      edx, eax
       sarx     ecx, ecx, edx
       and      ecx, dword ptr [rsp+7CH]
       jmp      G_M1573_IG21
						;; size=46 bbWeight=2    PerfScore 34.50

aspnet.run.windows.x64.checked.mch:

Top method regressions (bytes):

  • 31 (11.79 % of base) : 29879.dasm
    • Checked it and shlx block is 3 bytes larger but perfScore is 28.71 compared to base PerfScore 34.65.
  • 19 (8.56 % of base) : 6388.dasm
    Base:
    ; Total bytes of code 222, prolog size 14, PerfScore 3321.21, instruction count 69
    Diff
    ; Total bytes of code 241, prolog size 14, PerfScore 2796.93, instruction count 69
  • 19 (8.56 % of base) : 5120.dasm
    -- It adds 4 bytes of NOP padding.
    Base:
G_M35765_IG02:        ; gcrefRegs=000000C0 {rsi rdi}, byrefRegs=00000000 {}, byref, isz
       test     rdi, rdi
       je       SHORT G_M35765_IG09
       xor      ebx, ebx
       mov      ebp, dword ptr [rdi+8]
       test     ebp, ebp
       jle      SHORT G_M35765_IG08
						;; size=18 bbWeight=1    PerfScore 4.75

Diff

G_M35765_IG02:        ; gcrefRegs=000000C0 {rsi rdi}, byrefRegs=00000000 {}, byref
       test     rdi, rdi
       je       G_M35765_IG09
       xor      ebx, ebx
       mov      ebp, dword ptr [rdi+8]
       test     ebp, ebp
       jle      SHORT G_M35765_IG08
		  ;; NOP compensation instructions of 4 bytes.
						;; size=22 bbWeight=1    PerfScore 4.75

@JulieLeeMSFT
Member Author

Why is this only for x64, not for x86 as well? Is there significant work to enable it for x86? Shouldn't it just fall out?

It was not a simple fall out, so I discussed with Kunal to skip x86 for now.

@BruceForstall
Member

It was not a simple fall out, so I discussed with Kunal to skip x86 for now.

Can you describe what problems were encountered, and what is required to implement it for x86? The tracking issue #67314 doesn't have any details.

@JulieLeeMSFT
Member Author

It was not a simple fall out, so I discussed with Kunal to skip x86 for now.

Can you describe what problems were encountered, and what is required to implement it for x86? The tracking issue #67314 doesn't have any details.

It has been a while so I cannot remember exactly, but it was not generating the right code. lsraxarch and emitxarch need to be looked into. Updated #67314.

@JulieLeeMSFT
Member Author

/azp list

@JulieLeeMSFT
Member Author

@kunalspathak and @BruceForstall it is ready to review.

long expectedLong = 0;
int MOD64 = 64;

/* TODO: Enable 32bit test when x86 shift is enabled.
Member

I would suggest deleting the commented code.


try
{
ulong valULong = 0;
Member

I would define these variables closer to their usage.

Member Author

Fixed.

ulong valULong = 0;
long valLong = 0;
int shiftBy = 0;
ulong resULong = 0;
Member

Is it worth adding short and ints as test cases?

Member Author

Added.

@kunalspathak kunalspathak left a comment

LGTM...added some suggestions.

switch (x)
{
case ulong a:
ulong resUlong = ((ulong)a) << y;
Member

Why doesn't this generic work?

T res = ((T)x) << y;
return (R)Convert.ChangeType(res,typeof(R));

@JulieLeeMSFT JulieLeeMSFT May 19, 2022

Shift operators do not work on generic type parameters.
https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/operators/bitwise-and-shift-operators
Because the shift operators are defined only for the int, uint, long, and ulong types, the result of an operation always contains at least 32 bits. If the left-hand operand is of another integral type (sbyte, byte, short, ushort, or char), its value is converted to the int type

Member Author

@kunalspathak, this is the error message.
error CS0019: Operator '<<' cannot be applied to operands of type 'T' and 'int'
All tests passed, so merging it now. Thanks for all the code reviews.

@kunalspathak
Member

Some improvements in windows x64: System.Collections.Tests.Perf_BitArray dotnet/perf-autofiling-issues#5495

@kunalspathak
Member

Improvements in linux/x64 #67182

@kunalspathak
Member

Improvements windows/x64: dotnet/perf-autofiling-issues#5460

@ghost ghost locked as resolved and limited conversation to collaborators Jul 2, 2022
@JulieLeeMSFT JulieLeeMSFT deleted the 41881_shrx branch February 1, 2023 02:41
@JulieLeeMSFT JulieLeeMSFT added the arch-riscv Related to the RISC-V architecture label May 23, 2023
Labels
arch-riscv Related to the RISC-V architecture area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Development

Successfully merging this pull request may close these issues.

[RyuJIT] Emit shrx, sarx, shlx where needed
7 participants