Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Use correct operand order for shifts #68299

Merged
merged 1 commit into from
Oct 6, 2023

Conversation

rovka
Copy link
Collaborator

@rovka rovka commented Oct 5, 2023

In a special case in frame index elimination (when the offset is 0), we generate either a S_LSHR_B32 or a V_LSHRREV_B32 using the same code. However, they don't expect their operands in the same order - S_LSHR_B32 takes the value to be shifted first and then the shift amount, whereas V_LSHRREV_B32 has the operands reversed (hence the REV in its name). Update the code & tests to take this into account. Also remove an outdated comment (this code is definitely reachable now that non-entry functions no longer have a fixed emergency scavenge slot).

In a special case in frame index elimination (when the offset is 0), we
generate either a S_LSHR_B32 or a V_LSHRREV_B32 using the same code.
However, they don't expect their operands in the same order - S_LSHR_B32
takes the value to be shifted first and then the shift amount, whereas
V_LSHRREV_B32 has the operands reversed (hence the REV in its name).
Update the code & tests to take this into account. Also remove an
outdated comment (this code is definitely reachable now that non-entry
functions no longer have a fixed emergency scavenge slot).
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 5, 2023

@llvm/pr-subscribers-backend-amdgpu

Changes

In a special case in frame index elimination (when the offset is 0), we generate either a S_LSHR_B32 or a V_LSHRREV_B32 using the same code. However, they don't expect their operands in the same order - S_LSHR_B32 takes the value to be shifted first and then the shift amount, whereas V_LSHRREV_B32 has the operands reversed (hence the REV in its name). Update the code & tests to take this into account. Also remove an outdated comment (this code is definitely reachable now that non-entry functions no longer have a fixed emergency scavenge slot).


Full diff: https://github.com/llvm/llvm-project/pull/68299.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp (+7-4)
  • (modified) llvm/test/CodeGen/AMDGPU/frame-index.mir (+2-2)
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index e065b104db4eb33..d0a81673d6528c2 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -2432,10 +2432,13 @@ bool SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
         if (Offset == 0) {
           unsigned OpCode = IsSALU && !LiveSCC ? AMDGPU::S_LSHR_B32
                                                : AMDGPU::V_LSHRREV_B32_e64;
-          // XXX - This never happens because of emergency scavenging slot at 0?
-          auto Shift = BuildMI(*MBB, MI, DL, TII->get(OpCode), ResultReg)
-                           .addImm(ST.getWavefrontSizeLog2())
-                           .addReg(FrameReg);
+          auto Shift = BuildMI(*MBB, MI, DL, TII->get(OpCode), ResultReg);
+          if (OpCode == AMDGPU::V_LSHRREV_B32_e64)
+            // For V_LSHRREV, the operands are reversed (the shift count goes
+            // first).
+            Shift.addImm(ST.getWavefrontSizeLog2()).addReg(FrameReg);
+          else
+            Shift.addReg(FrameReg).addImm(ST.getWavefrontSizeLog2());
           if (IsSALU && !LiveSCC)
             Shift.getInstr()->getOperand(3).setIsDead(); // Mark SCC as dead.
           if (IsSALU && LiveSCC) {
diff --git a/llvm/test/CodeGen/AMDGPU/frame-index.mir b/llvm/test/CodeGen/AMDGPU/frame-index.mir
index 31bc2b04dde8c13..d8736c5276a26d2 100644
--- a/llvm/test/CodeGen/AMDGPU/frame-index.mir
+++ b/llvm/test/CodeGen/AMDGPU/frame-index.mir
@@ -53,7 +53,7 @@ body:             |
     ; GCN-LABEL: name: func_add_constant_to_fi_uniform_i32
     ; GCN: liveins: $sgpr30_sgpr31
     ; GCN-NEXT: {{  $}}
-    ; GCN-NEXT: $sgpr0 = S_LSHR_B32 6, $sgpr32, implicit-def dead $scc
+    ; GCN-NEXT: $sgpr0 = S_LSHR_B32 $sgpr32, 6, implicit-def dead $scc
     ; GCN-NEXT: renamable $sgpr4 = nuw S_ADD_I32 killed $sgpr0, 4, implicit-def dead $scc
     ; GCN-NEXT: renamable $vgpr0 = COPY killed renamable $sgpr4, implicit $exec
     ; GCN-NEXT: $m0 = S_MOV_B32 -1
@@ -89,7 +89,7 @@ body:             |
     ; GCN-LABEL: name: func_add_constant_to_fi_uniform_SCC_clobber_i32
     ; GCN: liveins: $sgpr30_sgpr31
     ; GCN-NEXT: {{  $}}
-    ; GCN-NEXT: $sgpr0 = S_LSHR_B32 6, $sgpr32, implicit-def dead $scc
+    ; GCN-NEXT: $sgpr0 = S_LSHR_B32 $sgpr32, 6, implicit-def dead $scc
     ; GCN-NEXT: renamable $sgpr4 = nuw S_ADD_U32 killed $sgpr0, 4, implicit-def $scc
     ; GCN-NEXT: renamable $sgpr5 = S_ADDC_U32 $sgpr4, 1234567, implicit-def $scc, implicit $scc
     ; GCN-NEXT: $sgpr0 = S_LSHR_B32 $sgpr32, 6, implicit-def $scc

@@ -2432,10 +2432,13 @@ bool SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
if (Offset == 0) {
unsigned OpCode = IsSALU && !LiveSCC ? AMDGPU::S_LSHR_B32
: AMDGPU::V_LSHRREV_B32_e64;
// XXX - This never happens because of emergency scavenging slot at 0?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this reachable now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I hit this in an OpenMP test after a change to FixSGPRCopies.

@rovka rovka merged commit be382de into llvm:main Oct 6, 2023
4 checks passed
@rovka rovka deleted the amdgpu-pei-swap-shift-ops branch October 6, 2023 07:43
rovka added a commit that referenced this pull request Oct 6, 2023
…#66882)"

Teach the si-fix-sgpr-copies pass to deal with REG_SEQUENCE, PHI or
INSERT_SUBREG where the result is an SGPR, but some of the inputs are
constants materialized into VGPRs. This may happen in cases where for
instance several instructions use an immediate zero and SelectionDAG
chooses to put it in a VGPR to satisfy all of them. This however causes
the si-fix-sgpr-copies to try to switch the whole chain to VGPR and may
lead to illegal VGPR-to-SGPR copies. Rematerializing the constant into
an SGPR fixes the issue.

This was originally reverted because it triggered an unrelated bug in
PEI on one of the OpenMP buildbots. That bug has been fixed in #68299,
so it should be ok to try again.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants