Skip to content

[AMDGPU] Add liverange split instructions into BB Prolog#117544

Merged
cdevadas merged 2 commits intomainfrom
users/cdevadas/include-lr-split-insn-to-bb-prolog
Jan 9, 2026
Merged

[AMDGPU] Add liverange split instructions into BB Prolog#117544
cdevadas merged 2 commits intomainfrom
users/cdevadas/include-lr-split-insn-to-bb-prolog

Conversation

@cdevadas
Copy link
Collaborator

@cdevadas cdevadas commented Nov 25, 2024

The COPY inserted for liverange split during sgpr-regalloc
pipeline currently breaks the BB prolog during the subsequent
vgpr-regalloc phase while spilling and/or splitting the vector
liveranges. This patch fixes it by correctly including the
LR split instructions during sgpr-regalloc and wwm-regalloc
pipelines into the BB prolog.

Copy link
Collaborator Author

cdevadas commented Nov 25, 2024

@llvmbot
Copy link
Member

llvmbot commented Nov 25, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Christudasan Devadasan (cdevadas)

Changes

The COPY inserted for liverange split during sgpr-regalloc
pipeline currently breaks the BB prolog during the subsequent
vgpr-regalloc phase while spilling and/or splitting the vector
liveranges. This patch fixes it by correctly including the
the LR split instructions during sgpr-regalloc and wwm-regalloc
pipelines into the BB prolog.


Patch is 38.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117544.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+27-7)
  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.h (+2)
  • (modified) llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll (+57-59)
  • (modified) llvm/test/CodeGen/AMDGPU/ran-out-of-sgprs-allocation-failure.mir (+62-66)
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 4a94d690297949..204a575e2f64c1 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -8956,6 +8956,30 @@ unsigned SIInstrInfo::getLiveRangeSplitOpcode(Register SrcReg,
   return AMDGPU::COPY;
 }
 
+bool SIInstrInfo::canAddToBBProlog(const MachineInstr &MI) const {
+  uint16_t Opcode = MI.getOpcode();
+  // Check if it is SGPR spill or wwm-register spill Opcode.
+  if (isSGPRSpill(Opcode) || isWWMRegSpillOpcode(Opcode))
+    return true;
+
+  const MachineFunction *MF = MI.getMF();
+  const MachineRegisterInfo &MRI = MF->getRegInfo();
+  const SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
+
+  // See if this is Liverange split instruction inserted for SGPR or
+  // wwm-register. The implicit def inserted for wwm-registers should also be
+  // included as they can appear at the bb begin.
+  bool IsLRSplitInst = MI.getFlag(MachineInstr::LRSplit);
+  if (!IsLRSplitInst && Opcode != AMDGPU::IMPLICIT_DEF)
+    return false;
+
+  Register Reg = MI.getOperand(0).getReg();
+  if (RI.isSGPRClass(RI.getRegClassForReg(MRI, Reg)))
+    return IsLRSplitInst;
+
+  return MFI->isWWMReg(Reg);
+}
+
 bool SIInstrInfo::isBasicBlockPrologue(const MachineInstr &MI,
                                        Register Reg) const {
   // We need to handle instructions which may be inserted during register
@@ -8964,20 +8988,16 @@ bool SIInstrInfo::isBasicBlockPrologue(const MachineInstr &MI,
   // needed by the prolog. However, the insertions for scalar registers can
   // always be placed at the BB top as they are independent of the exec mask
   // value.
-  const MachineFunction *MF = MI.getParent()->getParent();
   bool IsNullOrVectorRegister = true;
   if (Reg) {
+    const MachineFunction *MF = MI.getMF();
     const MachineRegisterInfo &MRI = MF->getRegInfo();
     IsNullOrVectorRegister = !RI.isSGPRClass(RI.getRegClassForReg(MRI, Reg));
   }
 
-  uint16_t Opcode = MI.getOpcode();
-  const SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
   return IsNullOrVectorRegister &&
-         (isSGPRSpill(Opcode) || isWWMRegSpillOpcode(Opcode) ||
-          (Opcode == AMDGPU::IMPLICIT_DEF &&
-           MFI->isWWMReg(MI.getOperand(0).getReg())) ||
-          (!MI.isTerminator() && Opcode != AMDGPU::COPY &&
+         (canAddToBBProlog(MI) ||
+          (!MI.isTerminator() && MI.getOpcode() != AMDGPU::COPY &&
            MI.modifiesRegister(AMDGPU::EXEC, &RI)));
 }
 
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index e55418326a4bd0..ea1d16784645e1 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -1348,6 +1348,8 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
   bool isBasicBlockPrologue(const MachineInstr &MI,
                             Register Reg = Register()) const override;
 
+  bool canAddToBBProlog(const MachineInstr &MI) const;
+
   MachineInstr *createPHIDestinationCopy(MachineBasicBlock &MBB,
                                          MachineBasicBlock::iterator InsPt,
                                          const DebugLoc &DL, Register Src,
diff --git a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
index 5dff660912e402..d7c38f26957677 100644
--- a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
+++ b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
@@ -176,39 +176,39 @@ define void @main(i1 %arg) #0 {
 ; CHECK-NEXT:    v_readlane_b32 s17, v7, 37
 ; CHECK-NEXT:    v_readlane_b32 s18, v7, 38
 ; CHECK-NEXT:    v_readlane_b32 s19, v7, 39
-; CHECK-NEXT:    v_writelane_b32 v7, s4, 40
-; CHECK-NEXT:    v_writelane_b32 v7, s5, 41
-; CHECK-NEXT:    v_writelane_b32 v7, s6, 42
-; CHECK-NEXT:    v_writelane_b32 v7, s7, 43
-; CHECK-NEXT:    v_writelane_b32 v7, s8, 44
-; CHECK-NEXT:    v_writelane_b32 v7, s9, 45
-; CHECK-NEXT:    v_writelane_b32 v7, s10, 46
-; CHECK-NEXT:    v_writelane_b32 v7, s11, 47
-; CHECK-NEXT:    v_writelane_b32 v7, s12, 48
-; CHECK-NEXT:    v_writelane_b32 v7, s13, 49
-; CHECK-NEXT:    v_writelane_b32 v7, s14, 50
-; CHECK-NEXT:    v_writelane_b32 v7, s15, 51
-; CHECK-NEXT:    v_writelane_b32 v7, s16, 52
-; CHECK-NEXT:    v_writelane_b32 v7, s17, 53
-; CHECK-NEXT:    v_writelane_b32 v7, s18, 54
-; CHECK-NEXT:    v_writelane_b32 v7, s19, 55
+; CHECK-NEXT:    v_writelane_b32 v7, s4, 56
+; CHECK-NEXT:    v_writelane_b32 v7, s5, 57
+; CHECK-NEXT:    v_writelane_b32 v7, s6, 58
+; CHECK-NEXT:    v_writelane_b32 v7, s7, 59
+; CHECK-NEXT:    v_writelane_b32 v7, s8, 60
+; CHECK-NEXT:    v_writelane_b32 v7, s9, 61
+; CHECK-NEXT:    v_writelane_b32 v7, s10, 62
+; CHECK-NEXT:    v_writelane_b32 v7, s11, 63
+; CHECK-NEXT:    v_writelane_b32 v7, s52, 40
+; CHECK-NEXT:    v_writelane_b32 v7, s53, 41
+; CHECK-NEXT:    v_writelane_b32 v7, s54, 42
+; CHECK-NEXT:    v_writelane_b32 v7, s55, 43
+; CHECK-NEXT:    v_writelane_b32 v7, s56, 44
+; CHECK-NEXT:    v_writelane_b32 v7, s57, 45
+; CHECK-NEXT:    v_writelane_b32 v7, s58, 46
 ; CHECK-NEXT:    ; implicit-def: $vgpr6 : SGPR spill to VGPR lane
-; CHECK-NEXT:    v_writelane_b32 v7, s52, 56
-; CHECK-NEXT:    v_writelane_b32 v6, s60, 0
-; CHECK-NEXT:    v_writelane_b32 v7, s53, 57
-; CHECK-NEXT:    v_writelane_b32 v6, s61, 1
-; CHECK-NEXT:    v_writelane_b32 v7, s54, 58
-; CHECK-NEXT:    v_writelane_b32 v6, s62, 2
-; CHECK-NEXT:    v_writelane_b32 v7, s55, 59
-; CHECK-NEXT:    v_writelane_b32 v6, s63, 3
-; CHECK-NEXT:    v_writelane_b32 v7, s56, 60
-; CHECK-NEXT:    v_writelane_b32 v6, s64, 4
-; CHECK-NEXT:    v_writelane_b32 v7, s57, 61
-; CHECK-NEXT:    v_writelane_b32 v6, s65, 5
-; CHECK-NEXT:    v_writelane_b32 v7, s58, 62
-; CHECK-NEXT:    v_writelane_b32 v6, s66, 6
-; CHECK-NEXT:    v_writelane_b32 v7, s59, 63
-; CHECK-NEXT:    v_writelane_b32 v6, s67, 7
+; CHECK-NEXT:    v_writelane_b32 v7, s59, 47
+; CHECK-NEXT:    v_writelane_b32 v6, s12, 0
+; CHECK-NEXT:    v_writelane_b32 v7, s60, 48
+; CHECK-NEXT:    v_writelane_b32 v6, s13, 1
+; CHECK-NEXT:    v_writelane_b32 v7, s61, 49
+; CHECK-NEXT:    v_writelane_b32 v6, s14, 2
+; CHECK-NEXT:    v_writelane_b32 v7, s62, 50
+; CHECK-NEXT:    v_writelane_b32 v6, s15, 3
+; CHECK-NEXT:    v_writelane_b32 v7, s63, 51
+; CHECK-NEXT:    v_writelane_b32 v6, s16, 4
+; CHECK-NEXT:    v_writelane_b32 v7, s64, 52
+; CHECK-NEXT:    v_writelane_b32 v6, s17, 5
+; CHECK-NEXT:    v_writelane_b32 v7, s65, 53
+; CHECK-NEXT:    v_writelane_b32 v6, s18, 6
+; CHECK-NEXT:    v_writelane_b32 v7, s66, 54
+; CHECK-NEXT:    v_writelane_b32 v6, s19, 7
+; CHECK-NEXT:    v_writelane_b32 v7, s67, 55
 ; CHECK-NEXT:    s_andn2_saveexec_b64 s[20:21], s[26:27]
 ; CHECK-NEXT:    s_cbranch_execz .LBB0_10
 ; CHECK-NEXT:  ; %bb.4: ; %bb32
@@ -264,35 +264,39 @@ define void @main(i1 %arg) #0 {
 ; CHECK-NEXT:    buffer_store_dwordx4 v[0:3], off, s[8:11], 0
 ; CHECK-NEXT:    ; implicit-def: $vgpr0
 ; CHECK-NEXT:  .LBB0_6: ; %Flow12
-; CHECK-NEXT:    s_or_saveexec_b64 s[4:5], s[22:23]
-; CHECK-NEXT:    v_readlane_b32 s52, v7, 40
-; CHECK-NEXT:    v_readlane_b32 s53, v7, 41
-; CHECK-NEXT:    v_readlane_b32 s54, v7, 42
-; CHECK-NEXT:    v_readlane_b32 s55, v7, 43
-; CHECK-NEXT:    v_readlane_b32 s56, v7, 44
-; CHECK-NEXT:    v_readlane_b32 s57, v7, 45
-; CHECK-NEXT:    v_readlane_b32 s58, v7, 46
-; CHECK-NEXT:    v_readlane_b32 s59, v7, 47
-; CHECK-NEXT:    v_readlane_b32 s60, v7, 48
-; CHECK-NEXT:    v_readlane_b32 s61, v7, 49
-; CHECK-NEXT:    v_readlane_b32 s62, v7, 50
-; CHECK-NEXT:    v_readlane_b32 s63, v7, 51
-; CHECK-NEXT:    v_readlane_b32 s64, v7, 52
-; CHECK-NEXT:    v_readlane_b32 s65, v7, 53
-; CHECK-NEXT:    v_readlane_b32 s66, v7, 54
-; CHECK-NEXT:    v_readlane_b32 s67, v7, 55
-; CHECK-NEXT:    s_xor_b64 exec, exec, s[4:5]
+; CHECK-NEXT:    s_andn2_saveexec_b64 s[4:5], s[22:23]
 ; CHECK-NEXT:    s_cbranch_execz .LBB0_9
 ; CHECK-NEXT:  ; %bb.7: ; %bb33.preheader
 ; CHECK-NEXT:    s_mov_b32 s8, 0
 ; CHECK-NEXT:    s_mov_b32 s6, s8
+; CHECK-NEXT:    v_readlane_b32 s36, v7, 40
 ; CHECK-NEXT:    s_mov_b32 s7, s8
 ; CHECK-NEXT:    v_mov_b32_e32 v1, s6
-; CHECK-NEXT:    v_readlane_b32 s36, v7, 56
+; CHECK-NEXT:    v_readlane_b32 s37, v7, 41
 ; CHECK-NEXT:    s_mov_b32 s9, s8
 ; CHECK-NEXT:    s_mov_b32 s10, s8
 ; CHECK-NEXT:    s_mov_b32 s11, s8
 ; CHECK-NEXT:    v_mov_b32_e32 v2, s7
+; CHECK-NEXT:    v_readlane_b32 s38, v7, 42
+; CHECK-NEXT:    v_readlane_b32 s39, v7, 43
+; CHECK-NEXT:    v_readlane_b32 s40, v7, 44
+; CHECK-NEXT:    v_readlane_b32 s41, v7, 45
+; CHECK-NEXT:    v_readlane_b32 s42, v7, 46
+; CHECK-NEXT:    v_readlane_b32 s43, v7, 47
+; CHECK-NEXT:    v_readlane_b32 s44, v7, 48
+; CHECK-NEXT:    v_readlane_b32 s45, v7, 49
+; CHECK-NEXT:    v_readlane_b32 s46, v7, 50
+; CHECK-NEXT:    v_readlane_b32 s47, v7, 51
+; CHECK-NEXT:    v_readlane_b32 s48, v7, 52
+; CHECK-NEXT:    v_readlane_b32 s49, v7, 53
+; CHECK-NEXT:    v_readlane_b32 s50, v7, 54
+; CHECK-NEXT:    v_readlane_b32 s51, v7, 55
+; CHECK-NEXT:    s_mov_b64 s[12:13], s[36:37]
+; CHECK-NEXT:    s_mov_b64 s[14:15], s[38:39]
+; CHECK-NEXT:    s_mov_b64 s[16:17], s[40:41]
+; CHECK-NEXT:    s_mov_b64 s[18:19], s[42:43]
+; CHECK-NEXT:    image_sample_lz v3, v[1:2], s[36:43], s[8:11] dmask:0x1
+; CHECK-NEXT:    v_readlane_b32 s36, v7, 56
 ; CHECK-NEXT:    v_readlane_b32 s37, v7, 57
 ; CHECK-NEXT:    v_readlane_b32 s38, v7, 58
 ; CHECK-NEXT:    v_readlane_b32 s39, v7, 59
@@ -300,26 +304,20 @@ define void @main(i1 %arg) #0 {
 ; CHECK-NEXT:    v_readlane_b32 s41, v7, 61
 ; CHECK-NEXT:    v_readlane_b32 s42, v7, 62
 ; CHECK-NEXT:    v_readlane_b32 s43, v7, 63
-; CHECK-NEXT:    s_nop 4
-; CHECK-NEXT:    image_sample_lz v3, v[1:2], s[36:43], s[8:11] dmask:0x1
-; CHECK-NEXT:    image_sample_lz v4, v[1:2], s[52:59], s[8:11] dmask:0x1
 ; CHECK-NEXT:    ; kill: killed $vgpr1_vgpr2
-; CHECK-NEXT:    s_mov_b64 s[12:13], s[36:37]
 ; CHECK-NEXT:    s_and_b64 vcc, exec, 0
 ; CHECK-NEXT:    v_readlane_b32 s44, v6, 0
 ; CHECK-NEXT:    v_readlane_b32 s45, v6, 1
 ; CHECK-NEXT:    v_readlane_b32 s46, v6, 2
 ; CHECK-NEXT:    v_readlane_b32 s47, v6, 3
+; CHECK-NEXT:    image_sample_lz v4, v[1:2], s[36:43], s[8:11] dmask:0x1
 ; CHECK-NEXT:    v_readlane_b32 s48, v6, 4
 ; CHECK-NEXT:    v_readlane_b32 s49, v6, 5
 ; CHECK-NEXT:    v_readlane_b32 s50, v6, 6
 ; CHECK-NEXT:    v_readlane_b32 s51, v6, 7
-; CHECK-NEXT:    s_mov_b64 s[14:15], s[38:39]
-; CHECK-NEXT:    s_mov_b64 s[16:17], s[40:41]
-; CHECK-NEXT:    s_mov_b64 s[18:19], s[42:43]
 ; CHECK-NEXT:    ; kill: killed $sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19
 ; CHECK-NEXT:    ; kill: killed $sgpr8_sgpr9_sgpr10 killed $sgpr11
-; CHECK-NEXT:    ; kill: killed $sgpr52_sgpr53_sgpr54_sgpr55_sgpr56_sgpr57_sgpr58_sgpr59
+; CHECK-NEXT:    ; kill: killed $sgpr36_sgpr37_sgpr38_sgpr39_sgpr40_sgpr41_sgpr42_sgpr43
 ; CHECK-NEXT:    s_waitcnt vmcnt(0)
 ; CHECK-NEXT:    v_sub_f32_e32 v1, v4, v3
 ; CHECK-NEXT:    v_mul_f32_e32 v0, v1, v0
diff --git a/llvm/test/CodeGen/AMDGPU/ran-out-of-sgprs-allocation-failure.mir b/llvm/test/CodeGen/AMDGPU/ran-out-of-sgprs-allocation-failure.mir
index c6ee557d970cd7..b376648b29d7d8 100644
--- a/llvm/test/CodeGen/AMDGPU/ran-out-of-sgprs-allocation-failure.mir
+++ b/llvm/test/CodeGen/AMDGPU/ran-out-of-sgprs-allocation-failure.mir
@@ -41,20 +41,22 @@ body:             |
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $sgpr14, $sgpr15, $sgpr16, $vgpr0, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9, $sgpr10_sgpr11
   ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   renamable $sgpr60 = COPY $sgpr15
+  ; CHECK-NEXT:   renamable $sgpr62 = COPY $sgpr14
   ; CHECK-NEXT:   renamable $sgpr34_sgpr35 = V_CMP_GT_I32_e64 1, undef %18:vgpr_32, implicit $exec
   ; CHECK-NEXT:   renamable $sgpr36_sgpr37 = V_CMP_EQ_U32_e64 0, undef %18:vgpr_32, implicit $exec
   ; CHECK-NEXT:   renamable $sgpr38_sgpr39 = V_CMP_NE_U32_e64 0, undef %18:vgpr_32, implicit $exec
   ; CHECK-NEXT:   renamable $sgpr40_sgpr41 = V_CMP_GT_I32_e64 0, undef %18:vgpr_32, implicit $exec
-  ; CHECK-NEXT:   renamable $sgpr60 = S_MOV_B32 0
+  ; CHECK-NEXT:   renamable $sgpr64 = S_MOV_B32 0
   ; CHECK-NEXT:   renamable $sgpr42_sgpr43 = V_CMP_EQ_U32_e64 undef $sgpr4, undef %18:vgpr_32, implicit $exec
-  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:vreg_1024_align2 = COPY renamable $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55_sgpr56_sgpr57_sgpr58_sgpr59_sgpr60_sgpr61_sgpr62_sgpr63_sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75, implicit $exec
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:vreg_1024_align2 = COPY renamable $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55_sgpr56_sgpr57_sgpr58_sgpr59_sgpr60_sgpr61_sgpr62_sgpr63_sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79, implicit $exec
   ; CHECK-NEXT:   renamable $sgpr44_sgpr45 = V_CMP_NE_U32_e64 1, undef %18:vgpr_32, implicit $exec
-  ; CHECK-NEXT:   renamable $sgpr61 = S_MOV_B32 1083786240
+  ; CHECK-NEXT:   renamable $sgpr65 = S_MOV_B32 1083786240
   ; CHECK-NEXT:   S_BRANCH %bb.1
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   successors: %bb.2(0x40000000), %bb.17(0x40000000)
-  ; CHECK-NEXT:   liveins: $sgpr14, $sgpr15, $sgpr16, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9:0x0000000000000003, $sgpr10_sgpr11, $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr40_sgpr41, $sgpr42_sgpr43, $sgpr44_sgpr45, $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55_sgpr56_sgpr57_sgpr58_sgpr59_sgpr60_sgpr61_sgpr62_sgpr63_sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75:0x0000000F00000000
+  ; CHECK-NEXT:   liveins: $sgpr16, $sgpr60, $sgpr62, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9:0x0000000000000003, $sgpr10_sgpr11, $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr40_sgpr41, $sgpr42_sgpr43, $sgpr44_sgpr45, $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55_sgpr56_sgpr57_sgpr58_sgpr59_sgpr60_sgpr61_sgpr62_sgpr63_sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79:0x0000000F00000000
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT:   $vcc = S_AND_B64 $exec, renamable $sgpr44_sgpr45, implicit-def dead $scc
   ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:vreg_1024_align2 = COPY [[COPY]]
@@ -63,50 +65,48 @@ body:             |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   successors: %bb.11(0x40000000), %bb.5(0x40000000)
-  ; CHECK-NEXT:   liveins: $sgpr14, $sgpr15, $sgpr16, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9:0x0000000000000003, $sgpr10_sgpr11, $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr40_sgpr41, $sgpr42_sgpr43, $sgpr44_sgpr45, $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55_sgpr56_sgpr57_sgpr58_sgpr59_sgpr60_sgpr61_sgpr62_sgpr63_sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75:0x0000000F00000000
-  ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   renamable $sgpr64 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr65 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr66 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr67 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr68 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr69 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr70 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr71 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr72 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr73 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr74 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr75 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr76 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr77 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr78 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr79 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr80 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr81 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr82 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr83 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr84 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr85 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr86 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr87 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr88 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr89 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr90 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr91 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr92 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr93 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr94 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   renamable $sgpr95 = COPY renamable $sgpr60
-  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:vreg_1024_align2 = COPY killed renamable $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95, implicit $exec
+  ; CHECK-NEXT:   liveins: $sgpr16, $sgpr60, $sgpr62, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9:0x0000000000000003, $sgpr10_sgpr11, $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr40_sgpr41, $sgpr42_sgpr43, $sgpr44_sgpr45, $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55_sgpr56_sgpr57_sgpr58_sgpr59_sgpr60_sgpr61_sgpr62_sgpr63_sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79:0x0000000F00000000
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   renamable $sgpr68 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr69 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr70 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr71 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr72 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr73 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr74 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr75 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr76 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr77 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr78 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr79 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr80 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr81 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr82 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr83 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr84 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr85 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr86 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr87 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr88 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr89 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr90 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr91 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr92 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr93 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr94 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr95 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr96 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr97 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr98 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   renamable $sgpr99 = COPY renamable $sgpr64
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:vreg_1024_align2 = COPY killed renamable $sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95_sgpr96_sgpr97_sgpr98_sgpr99, implicit $exec
   ; CHECK-NEXT:   S_CBRANCH_EXECZ %bb.11, implicit $exec
   ; CHECK-NEXT:   S_BRANCH %bb.5
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.3:
   ; CHECK-NEXT:   successors: %bb.4(0x80000000)
-  ; CHECK-NEXT:   liveins: $sgpr14, $sgpr15, $sgpr16
+  ; CHECK-NEXT:   liv...
[truncated]

@cdevadas cdevadas force-pushed the users/cdevadas/include-lr-split-insn-to-bb-prolog branch from 2e2b216 to 80232b5 Compare November 7, 2025 06:50
@cdevadas cdevadas force-pushed the users/cdevadas/mi-flag-for-lr-split-insn branch from 90f830f to a9587c1 Compare November 7, 2025 06:50
@cdevadas
Copy link
Collaborator Author

This PR stack is essential for fixing multiple issues. Can we get this stack merged?

// See if this is Liverange split instruction inserted for SGPR or
// wwm-register. The implicit def inserted for wwm-registers should also be
// included as they can appear at the bb begin.
bool IsLRSplitInst = MI.getFlag(MachineInstr::LRSplit);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding a flag, would it be enough to check that this is an SGPR copy?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That won't help for the WWM case (though below it is checking the flag in the SGPR case too)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding a flag, would it be enough to check that this is an SGPR copy?

That won't help. It is important to note that this target hook is also invoked by other passes to identify the right insertion points, such as PHIElimination and MI Sink (called via SkipPHIsAndLabels & SkipPHIsLabelsAndDebug), which can lead to incorrect insertion points if all COPY instructions are included as part of BB Prolog (we have encountered some errors due to that. See this comment from the other PR #117543 (comment)). By adding an MI flag to LR_Split COPY instructions, we ensure that only the intended candidates are considered during RA while inserting spill and LR_split instructions.

@cdevadas
Copy link
Collaborator Author

cdevadas commented Jan 1, 2026

Ping.

Copy link
Collaborator

@qcolombet qcolombet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

For some targets, it is required to identify the COPY instruction
corresponds to the RA inserted live range split. Adding the new
flag `MachineInstr::LRSplit` to serve the purpose.
The COPY inserted for liverange split during sgpr-regalloc
pipeline currently breaks the BB prolog during the subsequent
vgpr-regalloc phase while spilling and/or splitting the vector
liveranges. This patch fixes it by correctly including the
the LR split instructions during sgpr-regalloc and wwm-regalloc
pipelines into the BB prolog.
@cdevadas cdevadas force-pushed the users/cdevadas/mi-flag-for-lr-split-insn branch from a9587c1 to 7d21940 Compare January 9, 2026 09:06
@cdevadas cdevadas force-pushed the users/cdevadas/include-lr-split-insn-to-bb-prolog branch from 80232b5 to 4f9a9eb Compare January 9, 2026 09:06
Copy link
Collaborator Author

cdevadas commented Jan 9, 2026

Merge activity

  • Jan 9, 12:40 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Jan 9, 3:41 PM UTC: The Graphite merge of this pull request was cancelled.
  • Jan 9, 3:54 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Jan 9, 3:55 PM UTC: @cdevadas merged this pull request with Graphite.

@cdevadas cdevadas force-pushed the users/cdevadas/mi-flag-for-lr-split-insn branch 12 times, most recently from afb5a58 to c328970 Compare January 9, 2026 14:15
@cdevadas cdevadas force-pushed the users/cdevadas/mi-flag-for-lr-split-insn branch 7 times, most recently from a519089 to 45b1aab Compare January 9, 2026 15:01
@cdevadas cdevadas force-pushed the users/cdevadas/mi-flag-for-lr-split-insn branch 4 times, most recently from 7fc9e73 to df7d4a7 Compare January 9, 2026 15:40
Base automatically changed from users/cdevadas/mi-flag-for-lr-split-insn to main January 9, 2026 15:53
@cdevadas cdevadas merged commit e486a26 into main Jan 9, 2026
13 of 17 checks passed
@cdevadas cdevadas deleted the users/cdevadas/include-lr-split-insn-to-bb-prolog branch January 9, 2026 15:55
akadutta added a commit that referenced this pull request Jan 15, 2026
…ics" (#176081)

This reverts #174224 and re-applies #131759 .

Note: If #117544 is reverted, this should also be reverted.
@akadutta
Copy link
Contributor

If this patch is reverted, please revert #176081 as well.

Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Jan 18, 2026
The COPY inserted for liverange split during sgpr-regalloc
pipeline currently breaks the BB prolog during the subsequent
vgpr-regalloc phase while spilling and/or splitting the vector
liveranges. This patch fixes it by correctly including the
LR split instructions during sgpr-regalloc and wwm-regalloc
pipelines into the BB prolog.
Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Jan 18, 2026
…ics" (llvm#176081)

This reverts llvm#174224 and re-applies llvm#131759 .

Note: If llvm#117544 is reverted, this should also be reverted.
BStott6 pushed a commit to BStott6/llvm-project that referenced this pull request Jan 22, 2026
…ics" (llvm#176081)

This reverts llvm#174224 and re-applies llvm#131759 .

Note: If llvm#117544 is reverted, this should also be reverted.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants