Skip to content

[AMDGPU][SILoadStoreOptimizer] Fix lds address operand offset#176816

Merged
arsenm merged 5 commits into
llvm:mainfrom
RyanRio:users/rmitchel/fix_load_store_opt_async_lds
Jan 26, 2026
Merged

[AMDGPU][SILoadStoreOptimizer] Fix lds address operand offset#176816
arsenm merged 5 commits into
llvm:mainfrom
RyanRio:users/rmitchel/fix_load_store_opt_async_lds

Conversation

@RyanRio
Copy link
Copy Markdown
Contributor

@RyanRio RyanRio commented Jan 19, 2026

The offset operand in GLOBAL_LOAD_ASYNC_TO_LDS_B128, for instance, is added to both the lds and global address, but SILoadStoreOptimizer is currently unaware of that. This PR inserts an add to counteract the offset meant for the global address. This one add is better than not doing the optimization at all, and having to insert 2 adds for each global address calculation (with no offset).

; ENABLE-LABEL: name: promote_async_load_offset
; ENABLE: liveins: $ttmp7, $vgpr0, $sgpr0_sgpr1
; ENABLE-NEXT: {{  $}}
; ENABLE-NEXT: renamable $vgpr1 = V_LSHLREV_B32_e32 8, $vgpr0, implicit $exec
; ENABLE-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 $vgpr0, 512, 0, implicit $exec
; ENABLE-NEXT: renamable $vgpr3, dead $sgpr_null = V_ADDC_U32_e64 0, killed $vgpr0, killed $vcc_lo, 0, implicit $exec
; ENABLE-NEXT: renamable $vgpr1 = disjoint V_OR_B32_e32 0, killed $vgpr1, implicit $exec
; ENABLE-NEXT: renamable $vgpr0 = V_ADD_U32_e32 256, $vgpr1, implicit $exec
; ENABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr0, $vgpr2_vgpr3, -256, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
; ENABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)

; DISABLE-LABEL: name: promote_async_load_offset
; DISABLE: liveins: $ttmp7, $vgpr0, $sgpr0_sgpr1
; DISABLE-NEXT: {{  $}}
; DISABLE-NEXT: renamable $vgpr1 = V_LSHLREV_B32_e32 8, $vgpr0, implicit $exec
; DISABLE-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 256, $vgpr0, 0, implicit $exec
; DISABLE-NEXT: renamable $vgpr3, $sgpr_null = V_ADDC_U32_e64 0, $vgpr0, killed $vcc_lo, 0, implicit $exec
; DISABLE-NEXT: renamable $vgpr1 = disjoint V_OR_B32_e32 0, killed $vgpr1, implicit $exec
; DISABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
; DISABLE-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 512, $vgpr0, 0, implicit $exec
; DISABLE-NEXT: renamable $vgpr3, $sgpr_null = V_ADDC_U32_e64 0, killed $vgpr0, killed $vcc_lo, 0, implicit $exec
; DISABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)

This PR also promotes the global address to an offset when the offset is calculated with V_ADD_U64 on applicable gfx versions, (and inversely adds the LDS offset), whereas previously the optimization opportunity was missed entirely.

@llvmbot
Copy link
Copy Markdown
Member

llvmbot commented Jan 19, 2026

@llvm/pr-subscribers-backend-amdgpu

Author: Ryan Mitchell (RyanRio)

Changes

The offset operand in GLOBAL_LOAD_ASYNC_TO_LDS_B128, for instance, is added to both the lds and global address, but SILoadStoreOptimizer is currently unaware of that. This PR inserts an add to counteract the offset meant for the global address. This one add is better than not doing the optimization at all, and having to insert 2 adds for each global address calculation (with no offset).

; ENABLE-LABEL: name: promote_async_load_offset
; ENABLE: liveins: $ttmp7, $vgpr0, $sgpr0_sgpr1
; ENABLE-NEXT: {{  $}}
; ENABLE-NEXT: renamable $vgpr1 = V_LSHLREV_B32_e32 8, $vgpr0, implicit $exec
; ENABLE-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 $vgpr0, 512, 0, implicit $exec
; ENABLE-NEXT: renamable $vgpr3, dead $sgpr_null = V_ADDC_U32_e64 0, killed $vgpr0, killed $vcc_lo, 0, implicit $exec
; ENABLE-NEXT: renamable $vgpr1 = disjoint V_OR_B32_e32 0, killed $vgpr1, implicit $exec
; ENABLE-NEXT: renamable $vgpr0 = V_ADD_U32_e32 256, $vgpr1, implicit $exec
; ENABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr0, $vgpr2_vgpr3, -256, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
; ENABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)

; DISABLE-LABEL: name: promote_async_load_offset
; DISABLE: liveins: $ttmp7, $vgpr0, $sgpr0_sgpr1
; DISABLE-NEXT: {{  $}}
; DISABLE-NEXT: renamable $vgpr1 = V_LSHLREV_B32_e32 8, $vgpr0, implicit $exec
; DISABLE-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 256, $vgpr0, 0, implicit $exec
; DISABLE-NEXT: renamable $vgpr3, $sgpr_null = V_ADDC_U32_e64 0, $vgpr0, killed $vcc_lo, 0, implicit $exec
; DISABLE-NEXT: renamable $vgpr1 = disjoint V_OR_B32_e32 0, killed $vgpr1, implicit $exec
; DISABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
; DISABLE-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 512, $vgpr0, 0, implicit $exec
; DISABLE-NEXT: renamable $vgpr3, $sgpr_null = V_ADDC_U32_e64 0, killed $vgpr0, killed $vcc_lo, 0, implicit $exec
; DISABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)

Full diff: https://github.com/llvm/llvm-project/pull/176816.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.h (+8)
  • (modified) llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp (+28-2)
  • (added) llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.mir (+74)
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index bd4029ca97014..2d13ee3800731 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -1020,6 +1020,14 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
     return MI.getDesc().TSFlags & SIInstrFlags::LGKM_CNT;
   }
 
+  static bool usesASYNC_CNT(const MachineInstr &MI) {
+    return MI.getDesc().TSFlags & SIInstrFlags::ASYNC_CNT;
+  }
+
+  bool usesASYNC_CNT(uint16_t Opcode) const {
+    return get(Opcode).TSFlags & SIInstrFlags::ASYNC_CNT;
+  }
+
   // Most sopk treat the immediate as a signed 16-bit, however some
   // use it as unsigned.
   static bool sopkIsZext(unsigned Opcode) {
diff --git a/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp b/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
index dc1592e5fb82a..f50333b5886a7 100644
--- a/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
@@ -279,6 +279,7 @@ class SILoadStoreOptimizer {
 
   void updateBaseAndOffset(MachineInstr &I, Register NewBase,
                            int32_t NewOffset) const;
+  void updateAsyncLDSAddress(MachineInstr &MI, int32_t OffsetDiff) const;
   Register computeBase(MachineInstr &MI, const MemAddress &Addr) const;
   MachineOperand createRegOrImm(int32_t Val, MachineInstr &MI) const;
   std::optional<int32_t> extractConstOffset(const MachineOperand &Op) const;
@@ -2311,6 +2312,27 @@ void SILoadStoreOptimizer::processBaseWithConstOffset(const MachineOperand &Base
   Addr.Offset = (*Offset0P & 0x00000000ffffffff) | (Offset1 << 32);
 }
 
+// Maintain the correct LDS address for async loads.
+// It becomes incorrect when promoteConstantOffsetToImm
+// adds an offset only meant for the src operand.
+void SILoadStoreOptimizer::updateAsyncLDSAddress(MachineInstr &MI,
+                                                 int32_t OffsetDiff) const {
+  if (!TII->usesASYNC_CNT(MI) || OffsetDiff == 0)
+    return;
+
+  Register OldVDst = TII->getNamedOperand(MI, AMDGPU::OpName::vdst)->getReg();
+  Register NewVDst =
+      MRI->createVirtualRegister(TRI->getRegClassForReg(*MRI, OldVDst));
+  MachineBasicBlock &MBB = *MI.getParent();
+  DebugLoc DL = MI.getDebugLoc();
+  BuildMI(MBB, MI, DL, TII->get(AMDGPU::V_ADD_U32_e64), NewVDst)
+      .addReg(OldVDst)
+      .addImm(-OffsetDiff)
+      .addImm(0);
+
+  MI.getOperand(0).setReg(NewVDst);
+}
+
 bool SILoadStoreOptimizer::promoteConstantOffsetToImm(
     MachineInstr &MI,
     MemInfoMap &Visited,
@@ -2440,7 +2462,9 @@ bool SILoadStoreOptimizer::promoteConstantOffsetToImm(
     // Instead of moving up, just re-compute anchor-instruction's base address.
     Register Base = computeBase(MI, AnchorAddr);
 
-    updateBaseAndOffset(MI, Base, MAddr.Offset - AnchorAddr.Offset);
+    int32_t OffsetDiff = MAddr.Offset - AnchorAddr.Offset;
+    updateBaseAndOffset(MI, Base, OffsetDiff);
+    updateAsyncLDSAddress(MI, OffsetDiff);
     LLVM_DEBUG(dbgs() << "  After promotion: "; MI.dump(););
 
     for (auto [OtherMI, OtherOffset] : InstsWCommonBase) {
@@ -2451,7 +2475,9 @@ bool SILoadStoreOptimizer::promoteConstantOffsetToImm(
       if (TLI->isLegalFlatAddressingMode(AM, AS)) {
         LLVM_DEBUG(dbgs() << "  Promote Offset(" << OtherOffset; dbgs() << ")";
                    OtherMI->dump());
-        updateBaseAndOffset(*OtherMI, Base, OtherOffset - AnchorAddr.Offset);
+        int32_t OtherOffsetDiff = OtherOffset - AnchorAddr.Offset;
+        updateBaseAndOffset(*OtherMI, Base, OtherOffsetDiff);
+        updateAsyncLDSAddress(*OtherMI, OtherOffsetDiff);
         LLVM_DEBUG(dbgs() << "     After promotion: "; OtherMI->dump());
       }
     }
diff --git a/llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.mir b/llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.mir
new file mode 100644
index 0000000000000..3f243f39fa585
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.mir
@@ -0,0 +1,74 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1250 -verify-machineinstrs -start-before=si-load-store-opt -stop-before=si-insert-waitcnts --verify-machineinstrs -o - %s | FileCheck -check-prefixes=GFX1250 %s
+
+---
+name: promote_async_load_offset
+machineFunctionInfo:
+  stackPtrOffsetReg: '$sgpr32'
+  frameOffsetReg: '$sgpr33'
+tracksRegLiveness: true
+body:             |
+  bb.0.entry:
+    liveins: $vgpr0, $sgpr0_sgpr1, $ttmp7
+    ; GFX1250-LABEL: name: promote_async_load_offset
+    ; GFX1250: liveins: $ttmp7, $vgpr0, $sgpr0_sgpr1
+    ; GFX1250-NEXT: {{  $}}
+    ; GFX1250-NEXT: renamable $vgpr1 = V_LSHLREV_B32_e32 8, $vgpr0, implicit $exec
+    ; GFX1250-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 $vgpr0, 512, 0, implicit $exec
+    ; GFX1250-NEXT: renamable $vgpr3, dead $sgpr_null = V_ADDC_U32_e64 0, killed $vgpr0, killed $vcc_lo, 0, implicit $exec
+    ; GFX1250-NEXT: renamable $vgpr1 = disjoint V_OR_B32_e32 0, killed $vgpr1, implicit $exec
+    ; GFX1250-NEXT: renamable $vgpr0 = V_ADD_U32_e32 256, $vgpr1, implicit $exec
+    ; GFX1250-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr0, $vgpr2_vgpr3, -256, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+    ; GFX1250-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+    %0:vgpr_32(s32) = COPY $vgpr0
+    %5:vgpr_32 = V_LSHLREV_B32_e64 8, %0:vgpr_32(s32), implicit $exec
+    %7:vgpr_32 = disjoint V_OR_B32_e64 %5:vgpr_32, 0, implicit $exec
+    %8:vgpr_32 = disjoint V_OR_B32_e64 %5:vgpr_32, 0, implicit $exec
+    %32:vgpr_32, %34:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 %0:vgpr_32, 256, 0, implicit $exec
+    %33:vgpr_32, %35:sreg_32_xm0_xexec = V_ADDC_U32_e64 %0:vgpr_32, 0, killed %34:sreg_32_xm0_xexec, 0, implicit $exec
+    %17:vreg_64_align2 = REG_SEQUENCE %32:vgpr_32, %subreg.sub0, %33:vgpr_32, %subreg.sub1
+    GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed %7:vgpr_32, killed %17:vreg_64_align2, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+    %40:vgpr_32, %36:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 %0:vgpr_32, 512, 0, implicit $exec
+    %41:vgpr_32, %37:sreg_32_xm0_xexec = V_ADDC_U32_e64 %0:vgpr_32, 0, killed %36:sreg_32_xm0_xexec, 0, implicit $exec
+    %21:vreg_64_align2 = REG_SEQUENCE %40:vgpr_32, %subreg.sub0, %41:vgpr_32, %subreg.sub1
+    GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed %8:vgpr_32, killed %21:vreg_64_align2, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+...
+
+---
+name: promote_async_load_offset_two_offsets
+machineFunctionInfo:
+  stackPtrOffsetReg: '$sgpr32'
+  frameOffsetReg: '$sgpr33'
+tracksRegLiveness: true
+body:             |
+  bb.0.entry:
+    liveins: $vgpr0, $sgpr0_sgpr1, $ttmp7
+    ; GFX1250-LABEL: name: promote_async_load_offset_two_offsets
+    ; GFX1250: liveins: $ttmp7, $vgpr0, $sgpr0_sgpr1
+    ; GFX1250-NEXT: {{  $}}
+    ; GFX1250-NEXT: renamable $vgpr1 = V_LSHLREV_B32_e32 8, $vgpr0, implicit $exec
+    ; GFX1250-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 $vgpr0, 516, 0, implicit $exec
+    ; GFX1250-NEXT: renamable $vgpr3, dead $sgpr_null = V_ADDC_U32_e64 0, killed $vgpr0, killed $vcc_lo, 0, implicit $exec
+    ; GFX1250-NEXT: renamable $vgpr1 = disjoint V_OR_B32_e32 0, killed $vgpr1, implicit $exec
+    ; GFX1250-NEXT: renamable $vgpr0 = V_ADD_U32_e32 260, $vgpr1, implicit $exec
+    ; GFX1250-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr0, $vgpr2_vgpr3, -260, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+    ; GFX1250-NEXT: renamable $vgpr0 = V_ADD_U32_e32 4, $vgpr1, implicit $exec
+    ; GFX1250-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr0, $vgpr2_vgpr3, -4, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+    ; GFX1250-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+    %0:vgpr_32(s32) = COPY $vgpr0
+    %5:vgpr_32 = V_LSHLREV_B32_e64 8, %0:vgpr_32(s32), implicit $exec
+    %7:vgpr_32 = disjoint V_OR_B32_e64 %5:vgpr_32, 0, implicit $exec
+    %8:vgpr_32 = disjoint V_OR_B32_e64 %5:vgpr_32, 0, implicit $exec
+    %32:vgpr_32, %34:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 %0:vgpr_32, 256, 0, implicit $exec
+    %33:vgpr_32, %35:sreg_32_xm0_xexec = V_ADDC_U32_e64 %0:vgpr_32, 0, killed %34:sreg_32_xm0_xexec, 0, implicit $exec
+    %17:vreg_64_align2 = REG_SEQUENCE %32:vgpr_32, %subreg.sub0, %33:vgpr_32, %subreg.sub1
+    GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed %7:vgpr_32, killed %17:vreg_64_align2, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+    %40:vgpr_32, %36:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 %0:vgpr_32, 512, 0, implicit $exec
+    %41:vgpr_32, %37:sreg_32_xm0_xexec = V_ADDC_U32_e64 %0:vgpr_32, 0, killed %36:sreg_32_xm0_xexec, 0, implicit $exec
+    %21:vreg_64_align2 = REG_SEQUENCE %40:vgpr_32, %subreg.sub0, %41:vgpr_32, %subreg.sub1
+    GLOBAL_LOAD_ASYNC_TO_LDS_B128 %8:vgpr_32, killed %21:vreg_64_align2, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+    %42:vgpr_32, %38:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 %0:vgpr_32, 516, 0, implicit $exec
+    %43:vgpr_32, %39:sreg_32_xm0_xexec = V_ADDC_U32_e64 %0:vgpr_32, 0, killed %38:sreg_32_xm0_xexec, 0, implicit $exec
+    %22:vreg_64_align2 = REG_SEQUENCE %42:vgpr_32, %subreg.sub0, %43:vgpr_32, %subreg.sub1
+    GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed %8:vgpr_32, killed %22:vreg_64_align2, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+...

@RyanRio
Copy link
Copy Markdown
Contributor Author

RyanRio commented Jan 19, 2026

@nhaehnle @Sisyph

Copy link
Copy Markdown
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also include an end to end ir test

Comment thread llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp Outdated
Comment thread llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp Outdated
Comment thread llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.mir Outdated
Comment thread llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.mir Outdated
@RyanRio RyanRio force-pushed the users/rmitchel/fix_load_store_opt_async_lds branch from d8ebc76 to e418da0 Compare January 21, 2026 10:37
Comment thread llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp Outdated
Comment thread llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp Outdated
Comment thread llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp Outdated
return std::nullopt;

// Handle S_MOV_B64_IMM_PSEUDO for 64-bit immediates
if (Def->getOpcode() == AMDGPU::S_MOV_B64_IMM_PSEUDO)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use getConstValDefinedInReg? Or extract SIFoldOperands::getImmOrMaterializedImm into SIInstrInfo

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to extract getImmOrMaterializedImm, maybe there's also room for improvement to clean up that getConstValDefinedInReg, but that's probably outside the scope of this PR

Comment thread llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp Outdated
Comment thread llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp Outdated
Comment thread llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp Outdated
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jan 21, 2026

🐧 Linux x64 Test Results

  • 189206 tests passed
  • 5015 tests skipped

✅ The build succeeded and all tests passed.

@RyanRio RyanRio requested a review from arsenm January 22, 2026 07:31
Comment thread llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp Outdated
Comment thread llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp Outdated
Comment thread llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp Outdated
Comment thread llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp Outdated
@RyanRio RyanRio requested a review from arsenm January 23, 2026 05:54
@RyanRio
Copy link
Copy Markdown
Contributor Author

RyanRio commented Jan 26, 2026

@arsenm Please merge on clean CI

@arsenm arsenm merged commit 13b20e7 into llvm:main Jan 26, 2026
11 checks passed

return Def->getOperand(1).getImm();
// Now extract the base register (which should be a 64-bit VGPR).
MachineInstr *BaseDef = MRI->getVRegDef(BaseOp->getReg());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseDef is unused when building without asserts.

googlewalt added a commit to googlewalt/llvm-project that referenced this pull request Jan 26, 2026
googlewalt added a commit that referenced this pull request Jan 26, 2026
Fix unused variable warning that fires in SILoadStoreOptimizer.cpp when
assertions are disabled. On review, we can just delete the whole assert,
since it isn't querying the def anymore. Fixes #176816
(13b20e7).
@RyanRio RyanRio deleted the users/rmitchel/fix_load_store_opt_async_lds branch January 27, 2026 07:56
antiagainst pushed a commit that referenced this pull request Feb 10, 2026
… async stores (#180220)

`updateAsyncLDSAddress`, introduces by
#176816, previously only
handled async loads , where the LDS address is in the `vdst` operand.
Therefore Async stores produced a nullptr dereference since the LDS
address is in `vdata` for those instructions.

---------

Co-authored-by: Jay Foad <jay.foad@gmail.com>
llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 10, 2026
…etToImm for async stores (#180220)

`updateAsyncLDSAddress`, introduces by
llvm/llvm-project#176816, previously only
handled async loads , where the LDS address is in the `vdst` operand.
Therefore Async stores produced a nullptr dereference since the LDS
address is in `vdata` for those instructions.

---------

Co-authored-by: Jay Foad <jay.foad@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants