[AMDGPU] Use correct operand order for shifts #68299

rovka · 2023-10-05T11:27:28Z

In a special case in frame index elimination (when the offset is 0), we generate either a S_LSHR_B32 or a V_LSHRREV_B32 using the same code. However, they don't expect their operands in the same order - S_LSHR_B32 takes the value to be shifted first and then the shift amount, whereas V_LSHRREV_B32 has the operands reversed (hence the REV in its name). Update the code & tests to take this into account. Also remove an outdated comment (this code is definitely reachable now that non-entry functions no longer have a fixed emergency scavenge slot).

llvmbot · 2023-10-05T11:28:38Z

@llvm/pr-subscribers-backend-amdgpu

Changes

In a special case in frame index elimination (when the offset is 0), we generate either a S_LSHR_B32 or a V_LSHRREV_B32 using the same code. However, they don't expect their operands in the same order - S_LSHR_B32 takes the value to be shifted first and then the shift amount, whereas V_LSHRREV_B32 has the operands reversed (hence the REV in its name). Update the code & tests to take this into account. Also remove an outdated comment (this code is definitely reachable now that non-entry functions no longer have a fixed emergency scavenge slot).

Full diff: https://github.com/llvm/llvm-project/pull/68299.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp (+7-4)
(modified) llvm/test/CodeGen/AMDGPU/frame-index.mir (+2-2)

diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index e065b104db4eb33..d0a81673d6528c2 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -2432,10 +2432,13 @@ bool SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
         if (Offset == 0) {
           unsigned OpCode = IsSALU && !LiveSCC ? AMDGPU::S_LSHR_B32
                                                : AMDGPU::V_LSHRREV_B32_e64;
-          // XXX - This never happens because of emergency scavenging slot at 0?
-          auto Shift = BuildMI(*MBB, MI, DL, TII->get(OpCode), ResultReg)
-                           .addImm(ST.getWavefrontSizeLog2())
-                           .addReg(FrameReg);
+          auto Shift = BuildMI(*MBB, MI, DL, TII->get(OpCode), ResultReg);
+          if (OpCode == AMDGPU::V_LSHRREV_B32_e64)
+            // For V_LSHRREV, the operands are reversed (the shift count goes
+            // first).
+            Shift.addImm(ST.getWavefrontSizeLog2()).addReg(FrameReg);
+          else
+            Shift.addReg(FrameReg).addImm(ST.getWavefrontSizeLog2());
           if (IsSALU && !LiveSCC)
             Shift.getInstr()->getOperand(3).setIsDead(); // Mark SCC as dead.
           if (IsSALU && LiveSCC) {
diff --git a/llvm/test/CodeGen/AMDGPU/frame-index.mir b/llvm/test/CodeGen/AMDGPU/frame-index.mir
index 31bc2b04dde8c13..d8736c5276a26d2 100644
--- a/llvm/test/CodeGen/AMDGPU/frame-index.mir
+++ b/llvm/test/CodeGen/AMDGPU/frame-index.mir
@@ -53,7 +53,7 @@ body:             |
     ; GCN-LABEL: name: func_add_constant_to_fi_uniform_i32
     ; GCN: liveins: $sgpr30_sgpr31
     ; GCN-NEXT: {{  $}}
-    ; GCN-NEXT: $sgpr0 = S_LSHR_B32 6, $sgpr32, implicit-def dead $scc
+    ; GCN-NEXT: $sgpr0 = S_LSHR_B32 $sgpr32, 6, implicit-def dead $scc
     ; GCN-NEXT: renamable $sgpr4 = nuw S_ADD_I32 killed $sgpr0, 4, implicit-def dead $scc
     ; GCN-NEXT: renamable $vgpr0 = COPY killed renamable $sgpr4, implicit $exec
     ; GCN-NEXT: $m0 = S_MOV_B32 -1
@@ -89,7 +89,7 @@ body:             |
     ; GCN-LABEL: name: func_add_constant_to_fi_uniform_SCC_clobber_i32
     ; GCN: liveins: $sgpr30_sgpr31
     ; GCN-NEXT: {{  $}}
-    ; GCN-NEXT: $sgpr0 = S_LSHR_B32 6, $sgpr32, implicit-def dead $scc
+    ; GCN-NEXT: $sgpr0 = S_LSHR_B32 $sgpr32, 6, implicit-def dead $scc
     ; GCN-NEXT: renamable $sgpr4 = nuw S_ADD_U32 killed $sgpr0, 4, implicit-def $scc
     ; GCN-NEXT: renamable $sgpr5 = S_ADDC_U32 $sgpr4, 1234567, implicit-def $scc, implicit $scc
     ; GCN-NEXT: $sgpr0 = S_LSHR_B32 $sgpr32, 6, implicit-def $scc

arsenm · 2023-10-05T15:09:22Z

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

@@ -2432,10 +2432,13 @@ bool SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
        if (Offset == 0) {
          unsigned OpCode = IsSALU && !LiveSCC ? AMDGPU::S_LSHR_B32
                                               : AMDGPU::V_LSHRREV_B32_e64;
-          // XXX - This never happens because of emergency scavenging slot at 0?


Is this reachable now?

Yep, I hit this in an OpenMP test after a change to FixSGPRCopies.

…#66882)" Teach the si-fix-sgpr-copies pass to deal with REG_SEQUENCE, PHI or INSERT_SUBREG where the result is an SGPR, but some of the inputs are constants materialized into VGPRs. This may happen in cases where for instance several instructions use an immediate zero and SelectionDAG chooses to put it in a VGPR to satisfy all of them. This however causes the si-fix-sgpr-copies to try to switch the whole chain to VGPR and may lead to illegal VGPR-to-SGPR copies. Rematerializing the constant into an SGPR fixes the issue. This was originally reverted because it triggered an unrelated bug in PEI on one of the OpenMP buildbots. That bug has been fixed in #68299, so it should be ok to try again.

llvmbot added the backend:AMDGPU label Oct 5, 2023

jayfoad approved these changes Oct 5, 2023

View reviewed changes

arsenm reviewed Oct 5, 2023

View reviewed changes

rovka merged commit be382de into llvm:main Oct 6, 2023
4 checks passed

rovka deleted the amdgpu-pei-swap-shift-ops branch October 6, 2023 07:43

stepthomas mentioned this pull request Oct 10, 2023

AMDGPU stepthomas atomic csub no rtn forms ver2 stepthomas/llvm-project#1

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AMDGPU] Use correct operand order for shifts #68299

[AMDGPU] Use correct operand order for shifts #68299

rovka commented Oct 5, 2023

llvmbot commented Oct 5, 2023

arsenm Oct 5, 2023

rovka Oct 6, 2023

[AMDGPU] Use correct operand order for shifts #68299

[AMDGPU] Use correct operand order for shifts #68299

Conversation

rovka commented Oct 5, 2023

llvmbot commented Oct 5, 2023

arsenm Oct 5, 2023

Choose a reason for hiding this comment

rovka Oct 6, 2023

Choose a reason for hiding this comment