
[RegisterCoalescer] Fix SUBREG_TO_REG handling in the RegisterCoalescer. #96839

Merged
merged 4 commits
Jul 24, 2024

Conversation

stefanp-ibm
Contributor

@stefanp-ibm stefanp-ibm commented Jun 27, 2024

The issue with the handling of SUBREG_TO_REG is that we don't join the subranges correctly when we join live ranges across the SUBREG_TO_REG instruction. For example, when joining across this:

32B	  %2:gr64_nosp = SUBREG_TO_REG 0, %0:gr32, %subreg.sub_32bit

we want to join these live ranges:

%0 [16r,32r:0) 0@16r  weight:0.000000e+00
%2 [32r,112r:0) 0@32r  weight:0.000000e+00

Before the fix the range for the resulting merged %2 is:

%2 [16r,112r:0) 0@16r  weight:0.000000e+00

After the fix it is now this:

%2 [16r,112r:0) 0@16r  L000000000000000F [16r,112r:0) 0@16r  weight:0.000000e+00
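
To make the before/after shapes above concrete, here is a small stand-alone toy model. This is plain C++ with illustrative stand-in structs, not LLVM's actual LiveRange/LiveInterval classes; only the [16,112) bounds and the 0xF lane mask are taken from the dumps above.

```
// Toy model of a live interval with optional per-lane subranges, used only
// to illustrate the before/after shapes quoted above.
#include <cstdint>
#include <cstdio>
#include <vector>

struct Segment { unsigned Start, End; };   // half-open [Start, End)

struct SubRange {
  uint64_t LaneMask;                       // e.g. 0x000000000000000F
  std::vector<Segment> Segments;
};

struct Interval {
  std::vector<Segment> Main;               // main live range
  std::vector<SubRange> Subs;              // per-lane subranges
};

int main() {
  // Before the fix: the main range is widened to [16,112), but nothing
  // records that only the sub_32bit lanes (mask 0xF) are actually defined.
  Interval PreFix{{{16, 112}}, {}};

  // After the fix: the same main range plus a lane-mask-0xF subrange
  // covering [16,112), which is what subreg-liveness verification expects.
  Interval PostFix{{{16, 112}}, {{0xF, {{16, 112}}}}};

  std::printf("pre-fix subranges: %zu, post-fix subranges: %zu\n",
              PreFix.Subs.size(), PostFix.Subs.size());
  return 0;
}
```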

Two tests are added with this fix. The X86 test fails without the patch.
The PowerPC test passes both with and without the patch, but is added as
a way to track possible future failures when register classes are changed
in a later patch.

@llvmbot
Collaborator

llvmbot commented Jun 27, 2024

@llvm/pr-subscribers-backend-x86

Author: Stefan Pintilie (stefanp-ibm)

Changes

Two tests are added with this fix. The X86 test fails without the patch.
The PowerPC test passes both with and without the patch, but is added as
a way to track possible future failures when register classes are changed
in a later patch.


Full diff: https://github.com/llvm/llvm-project/pull/96839.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/RegisterCoalescer.cpp (+8)
  • (added) llvm/test/CodeGen/PowerPC/subreg-coalescer.mir (+26)
  • (added) llvm/test/CodeGen/X86/subreg-fail.mir (+124)
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index 3397cb4f3661fe..75cdcf811fd401 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -3671,6 +3671,14 @@ bool RegisterCoalescer::joinVirtRegs(CoalescerPair &CP) {
     // having stale segments.
     LHSVals.pruneMainSegments(LHS, ShrinkMainRange);
 
+    LHSVals.pruneSubRegValues(LHS, ShrinkMask);
+    RHSVals.pruneSubRegValues(LHS, ShrinkMask);
+  } else if (TrackSubRegLiveness && !CP.getDstIdx() && CP.getSrcIdx()) {
+    LHS.createSubRangeFrom(LIS->getVNInfoAllocator(),
+                           CP.getNewRC()->getLaneMask(), LHS);
+    mergeSubRangeInto(LHS, RHS, TRI->getSubRegIndexLaneMask(CP.getSrcIdx()), CP,
+                      CP.getDstIdx());
+    LHSVals.pruneMainSegments(LHS, ShrinkMainRange);
     LHSVals.pruneSubRegValues(LHS, ShrinkMask);
     RHSVals.pruneSubRegValues(LHS, ShrinkMask);
   }
diff --git a/llvm/test/CodeGen/PowerPC/subreg-coalescer.mir b/llvm/test/CodeGen/PowerPC/subreg-coalescer.mir
new file mode 100644
index 00000000000000..8b04997ff335f7
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/subreg-coalescer.mir
@@ -0,0 +1,26 @@
+# RUN: llc -mtriple powerpc64le-unknown-linux-gnu -mcpu=pwr8 -x mir < %s \
+# RUN:   -verify-machineinstrs --run-pass=register-coalescer -o - | FileCheck %s
+
+---
+name: check_subregs
+alignment:       16
+tracksRegLiveness: true
+body:             |
+  bb.0.entry:
+    liveins: $x3
+
+    %0:g8rc_and_g8rc_nox0 = COPY $x3
+    %3:f8rc, %4:g8rc_and_g8rc_nox0 = LFSUX %0, %0
+    %5:f4rc = FRSP killed %3, implicit $rm
+    %22:vslrc = SUBREG_TO_REG 1, %5, %subreg.sub_64
+    %11:vrrc = XVCVDPSP killed %22, implicit $rm
+    $v2 = COPY %11
+    BLR8 implicit $lr8, implicit $rm, implicit $v2
+...
+
+# CHECK:         %0:g8rc_and_g8rc_nox0 = COPY $x3
+# CHECK-NEXT:    %1:f8rc, dead %2:g8rc_and_g8rc_nox0 = LFSUX %0, %0
+# CHECK-NEXT:    undef %4.sub_64:vslrc = FRSP %1, implicit $rm
+# CHECK-NEXT:    %5:vrrc = XVCVDPSP %4, implicit $rm
+# CHECK-NEXT:    $v2 = COPY %5
+# CHECK-NEXT:    BLR8 implicit $lr8, implicit $rm, implicit $v2
diff --git a/llvm/test/CodeGen/X86/subreg-fail.mir b/llvm/test/CodeGen/X86/subreg-fail.mir
new file mode 100644
index 00000000000000..a563141d59c9fe
--- /dev/null
+++ b/llvm/test/CodeGen/X86/subreg-fail.mir
@@ -0,0 +1,124 @@
+# RUN: llc -mtriple x86_64-unknown-unknown -x mir < %s \
+# RUN:   -verify-machineinstrs -enable-subreg-liveness=true \
+# RUN:   --run-pass=register-coalescer -o - | FileCheck %s
+
+--- |
+  target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+  target triple = "x86_64-unknown-unknown"
+  
+  %pair = type { i64, double }
+  %t21 = type { ptr }
+  %t13 = type { ptr, %t15, %t15 }
+  %t15 = type { i8, i32, i32 }
+  
+  @__force_order = external hidden global i32, align 4
+  @.str = private unnamed_addr constant { [1 x i8], [63 x i8] } zeroinitializer, align 32
+  @a = external global i32, align 4
+  @fn1.g = private unnamed_addr constant [9 x ptr] [ptr null, ptr @a, ptr null, ptr null, ptr null, ptr null, ptr null, ptr null, ptr null], align 16
+  @e = external global i32, align 4
+  @__stack_chk_guard = external dso_local global ptr
+  
+  ; Function Attrs: nounwind ssp
+  define i32 @test1() #0 {
+  entry:
+    %tmp5.i = load volatile i32, ptr undef, align 4
+    %conv.i = zext i32 %tmp5.i to i64
+    %tmp12.i = load volatile i32, ptr undef, align 4
+    %conv13.i = zext i32 %tmp12.i to i64
+    %shl.i = shl i64 %conv13.i, 32
+    %or.i = or i64 %shl.i, %conv.i
+    %add16.i = add i64 %or.i, 256
+    %shr.i = lshr i64 %add16.i, 8
+    %conv19.i = trunc i64 %shr.i to i32
+    store volatile i32 %conv19.i, ptr undef, align 4
+    ret i32 undef
+  }
+...
+---
+name:            test1
+alignment:       16
+exposesReturnsTwice: false
+legalized:       false
+regBankSelected: false
+selected:        false
+failedISel:      false
+tracksRegLiveness: true
+hasWinCFI:       false
+callsEHReturn:   false
+callsUnwindInit: false
+hasEHCatchret:   false
+hasEHScopes:     false
+hasEHFunclets:   false
+isOutlined:      false
+debugInstrRef:   true
+failsVerification: false
+tracksDebugUserValues: false
+registers:
+  - { id: 0, class: gr32, preferred-register: '' }
+  - { id: 1, class: gr64, preferred-register: '' }
+  - { id: 2, class: gr64_nosp, preferred-register: '' }
+  - { id: 3, class: gr32, preferred-register: '' }
+  - { id: 4, class: gr64, preferred-register: '' }
+  - { id: 5, class: gr64, preferred-register: '' }
+  - { id: 6, class: gr64, preferred-register: '' }
+  - { id: 7, class: gr64, preferred-register: '' }
+  - { id: 8, class: gr64, preferred-register: '' }
+  - { id: 9, class: gr32, preferred-register: '' }
+  - { id: 10, class: gr64, preferred-register: '' }
+  - { id: 11, class: gr32, preferred-register: '' }
+liveins:         []
+frameInfo:
+  isFrameAddressTaken: false
+  isReturnAddressTaken: false
+  hasStackMap:     false
+  hasPatchPoint:   false
+  stackSize:       0
+  offsetAdjustment: 0
+  maxAlignment:    1
+  adjustsStack:    false
+  hasCalls:        false
+  stackProtector:  ''
+  functionContext: ''
+  maxCallFrameSize: 4294967295
+  cvBytesOfCalleeSavedRegisters: 0
+  hasOpaqueSPAdjustment: false
+  hasVAStart:      false
+  hasMustTailInVarArgFunc: false
+  hasTailCall:     false
+  isCalleeSavedInfoValid: false
+  localFrameSize:  0
+  savePoint:       ''
+  restorePoint:    ''
+fixedStack:      []
+stack:           []
+entry_values:    []
+callSites:       []
+debugValueSubstitutions: []
+constants:       []
+machineFunctionInfo:
+  amxProgModel:    None
+body:             |
+  bb.0.entry:
+    %0:gr32 = MOV32rm undef %1:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
+    %2:gr64_nosp = SUBREG_TO_REG 0, killed %0, %subreg.sub_32bit
+    %3:gr32 = MOV32rm undef %4:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
+    %5:gr64 = SUBREG_TO_REG 0, killed %3, %subreg.sub_32bit
+    %6:gr64 = COPY killed %5
+    %6:gr64 = SHL64ri %6, 32, implicit-def dead $eflags
+    %7:gr64 = LEA64r killed %6, 1, killed %2, 256, $noreg
+    %8:gr64 = COPY killed %7
+    %8:gr64 = SHR64ri %8, 8, implicit-def dead $eflags
+    %9:gr32 = COPY killed %8.sub_32bit
+    MOV32mr undef %10:gr64, 1, $noreg, 0, $noreg, killed %9 :: (volatile store (s32) into `ptr undef`)
+    RET 0, undef $eax
+
+...
+
+# CHECK:         undef %2.sub_32bit:gr64_nosp = MOV32rm undef %1:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
+# CHECK-NEXT:    undef %6.sub_32bit:gr64_with_sub_8bit = MOV32rm undef %4:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
+# CHECK-NEXT:    %6:gr64_with_sub_8bit = SHL64ri %6, 32, implicit-def dead $eflags
+# CHECK-NEXT:    %8:gr64_with_sub_8bit = LEA64r %6, 1, %2, 256, $noreg
+# CHECK-NEXT:    %8:gr64_with_sub_8bit = SHR64ri %8, 8, implicit-def dead $eflags
+# CHECK-NEXT:    MOV32mr undef %10:gr64, 1, $noreg, 0, $noreg, %8.sub_32bit :: (volatile store (s32) into `ptr undef`)
+# CHECK-NEXT:    RET 0, undef $eax
+

@llvmbot
Collaborator

llvmbot commented Jun 27, 2024

@llvm/pr-subscribers-llvm-regalloc

@llvmbot
Collaborator

llvmbot commented Jun 27, 2024

@llvm/pr-subscribers-backend-powerpc

@bzEq
Collaborator

bzEq commented Jun 27, 2024

I think we should pre-commit the X86 test case to show where failure occurs.

@arsenm
Contributor

arsenm commented Jun 27, 2024

Title should be descriptive of what the problem is, not a vague "fix issue"

alignment: 16
tracksRegLiveness: true
body: |
bb.0.entry:
Contributor

Remove .entry block labels

liveins: $x3

%0:g8rc_and_g8rc_nox0 = COPY $x3
%3:f8rc, %4:g8rc_and_g8rc_nox0 = LFSUX %0, %0
Contributor

Can you run run-pass=none to compact the virtual register numbers

@@ -0,0 +1,26 @@
# RUN: llc -mtriple powerpc64le-unknown-linux-gnu -mcpu=pwr8 -x mir < %s \
# RUN: -verify-machineinstrs --run-pass=register-coalescer -o - | FileCheck %s
Contributor

-verify-coalescing would be more useful than -verify-machineinstrs. Also, should use update_mir_test_checks

Contributor

Also not using stdin and directly reading the file avoids the need for -x mir

@qcolombet
Collaborator

Title should be descriptive of what the problem is, not a vague "fix issue"

+1 on that.

Could you describe what the issue was and how this patch fixes it?

Contributor

@arsenm arsenm left a comment

Test looks better but title and description still need details

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
# RUN: llc -mtriple powerpc64le-unknown-linux-gnu -mcpu=pwr8 -x mir < %s \
# RUN: -verify-coalescing --run-pass=register-coalescer -o - | FileCheck %s

Contributor

Can you add a comment explaining the test?

# RUN: llc -mtriple x86_64-unknown-unknown -x mir < %s \
# RUN: -verify-coalescing -enable-subreg-liveness=true \
# RUN: --run-pass=register-coalescer -o - | FileCheck %s

Contributor

Can you add a comment explaining the test?

@stefanp-ibm stefanp-ibm changed the title [RegisterCoalescer] Fix issue in the RegisterCoalescer. [RegisterCoalescer] Fix SUBREG_TO_REG handling in the RegisterCoalescer. Jun 27, 2024
@stefanp-ibm
Contributor Author

I think we should pre-commit the X86 test case to show where failure occurs.

I can do this once the reviewers are happy with the test cases. For the X86 test case I will probably just use -enable-subreg-liveness=false so that it actually passes, and then set it to true when this patch goes in.

@qcolombet
Collaborator

%2 [32r,112r:0) 0@32r L000000000000000F [16r,112r:0) 0@16r weight:0.000000e+00

What were we generating previously?

Also, shouldn't it be L000000000000000F [16r,32r:0) 0@16r, i.e., 32r instead of 112r?

@bzEq
Collaborator

bzEq commented Jun 28, 2024

%2 [32r,112r:0) 0@32r L000000000000000F [16r,112r:0) 0@16r weight:0.000000e+00

Is it a typo? The main live range should not be shorter than the subrange. I tried this patch; it looks like it should be
Before coalescing

%2 [32r,112r:0) 0@32r  weight:0.000000e+00

After coalescing

%2 [16r,112r:0) 0@16r  L000000000000000F [16r,112r:0) 0@16r  weight:0.000000e+00

During coalescing,

32B %2:gr64_nosp = SUBREG_TO_REG 0, %0:gr32, %subreg.sub_32bit                                                                                                                               
  Considering merging to GR64_NOSP with %0 in %2:sub_32bit                                                                                                                                   
    RHS = %0 [16r,32r:0) 0@16r  weight:0.000000e+00                                                                                                                                          
    LHS = %2 [32r,112r:0) 0@32r  weight:0.000000e+00                                                                                                                                         
    merge %2:0@32r into %0:0@16r --> @16r                                                                                                                                                    
    merge %2:0@32r into %0:0@16r --> @16r                                                                                                                                                    
    joined lanes: 000000000000000F [16r,112r:0) 0@16r                                                                                                                                        
    Expecting instruction removal at 32r                                                                                                                                                     
    erased: 32r %2:gr64_nosp = SUBREG_TO_REG 0, %0:gr32, %subreg.sub_32bit
    updated: 16B  undef %2.sub_32bit:gr64_nosp = MOV32rm undef %1:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)                                                       
  Success: %0:sub_32bit -> %2                                                                                                                                                                
  Result = %2 [16r,112r:0) 0@16r  L000000000000000F [16r,112r:0) 0@16r  weight:0.000000e+00

@stefanp-ibm
Contributor Author

%2 [32r,112r:0) 0@32r L000000000000000F [16r,112r:0) 0@16r weight:0.000000e+00

What were we generating previously?

Also, shouldn't it be L000000000000000F [16r,32r:0) 0@16r, i.e., 32r instead of 112r?

@qcolombet

I'm sorry the line above is a typo. The range should be 16r to 112r.

Before we added the fix we would have:

32B	%2:gr64_nosp = SUBREG_TO_REG 0, %0:gr32, %subreg.sub_32bit
	Considering merging to GR64_NOSP with %0 in %2:sub_32bit
		RHS = %0 [16r,32r:0) 0@16r  weight:0.000000e+00
		LHS = %2 [32r,112r:0) 0@32r  weight:0.000000e+00
		merge %2:0@32r into %0:0@16r --> @16r
		erased:	32r	%2:gr64_nosp = SUBREG_TO_REG 0, %0:gr32, %subreg.sub_32bit
AllocationOrder(GR64) = [ $rax $rcx $rdx $rsi $rdi $r8 $r9 $r10 $r11 $rbx $r14 $r15 $r12 $r13 $rbp ]
AllocationOrder(GR64_NOSP) = [ $rax $rcx $rdx $rsi $rdi $r8 $r9 $r10 $r11 $rbx $r14 $r15 $r12 $r13 $rbp ]
		updated: 16B	undef %2.sub_32bit:gr64_nosp = MOV32rm undef %1:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
	Success: %0:sub_32bit -> %2
	Result = %2 [16r,112r:0) 0@16r  weight:0.000000e+00

So, basically, the result is %2 [16r,112r:0) 0@16r weight:0.000000e+00.
When this is verified with -verify-coalescing, we get the following error:

*** Bad machine code: Live interval for subreg operand has no subranges ***
- function:    test1
- basic block: %bb.0  (0xf2913e5e220) [0B;208B)
- instruction: 16B	undef %2.sub_32bit:gr64_nosp = MOV32rm undef %1:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
- operand 0:   undef %2.sub_32bit:gr64_nosp

After we add the fix we get the following:

32B	%2:gr64_nosp = SUBREG_TO_REG 0, %0:gr32, %subreg.sub_32bit
	Considering merging to GR64_NOSP with %0 in %2:sub_32bit
		RHS = %0 [16r,32r:0) 0@16r  weight:0.000000e+00
		LHS = %2 [32r,112r:0) 0@32r  weight:0.000000e+00
		merge %2:0@32r into %0:0@16r --> @16r
		merge %2:0@32r into %0:0@16r --> @16r
		joined lanes: 000000000000000F [16r,112r:0) 0@16r
		Expecting instruction removal at 32r
		erased:	32r	%2:gr64_nosp = SUBREG_TO_REG 0, %0:gr32, %subreg.sub_32bit
AllocationOrder(GR64) = [ $rax $rcx $rdx $rsi $rdi $r8 $r9 $r10 $r11 $rbx $r14 $r15 $r12 $r13 $rbp ]
AllocationOrder(GR64_NOSP) = [ $rax $rcx $rdx $rsi $rdi $r8 $r9 $r10 $r11 $rbx $r14 $r15 $r12 $r13 $rbp ]
		updated: 16B	undef %2.sub_32bit:gr64_nosp = MOV32rm undef %1:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
	Success: %0:sub_32bit -> %2
	Result = %2 [16r,112r:0) 0@16r  L000000000000000F [16r,112r:0) 0@16r  weight:0.000000e+00

So the result is %2 [16r,112r:0) 0@16r L000000000000000F [16r,112r:0) 0@16r weight:0.000000e+00 and in this case the verification doesn't find any issues with it.
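
In API terms, the difference comes down to whether the joined LiveInterval carries any subranges at all. As a rough illustration only (a sketch against the public LiveInterval API, not the MachineVerifier's actual check):

```
#include "llvm/CodeGen/LiveInterval.h"
#include "llvm/MC/LaneBitmask.h"
#include "llvm/Support/raw_ostream.h"

// Sketch: report whether a coalesced interval carries per-lane subranges.
// Before the fix, the interval for %2 has none, which is what the
// "Live interval for subreg operand has no subranges" error is about;
// after the fix it has one subrange with lane mask 000000000000000F.
static void reportSubRanges(const llvm::LiveInterval &LI) {
  if (!LI.hasSubRanges()) {
    llvm::errs() << "no subranges\n";
    return;
  }
  for (const llvm::LiveInterval::SubRange &SR : LI.subranges())
    llvm::errs() << "subrange with lane mask "
                 << llvm::PrintLaneMask(SR.LaneMask) << '\n';
}
```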

@stefanp-ibm
Contributor Author

%2 [32r,112r:0) 0@32r L000000000000000F [16r,112r:0) 0@16r weight:0.000000e+00

Is it a typo? The main live range should not be shorter than the subrange. I tried this patch; it looks like it should be Before coalescing

Yes, this is a typo and I will fix the description. As you mentioned the result should be:

%2 [16r,112r:0) 0@16r  L000000000000000F [16r,112r:0) 0@16r  weight:0.000000e+00

@stefanp-ibm
Contributor Author

I think we should pre-commit the X86 test case to show where failure occurs.

I would like to do this, but the test actually hits an assert, so we don't generate any code for the X86 test unless I remove the -verify-coalescing option, which then shows the actual failure.

@stefanp-ibm
Contributor Author

Gentle ping.

@bzEq bzEq requested review from jayfoad and arsenm July 18, 2024 03:25
@stefanp-ibm stefanp-ibm force-pushed the stefanp/RegCoalesceFix branch 2 times, most recently from 303826f to 6c707ed Compare July 18, 2024 19:40
@@ -0,0 +1,37 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
Collaborator

Pre-commit this case to show where MachineVerifier complains when specifying -verify-coalescing -enable-subreg-liveness=true?

@bzEq
Collaborator

bzEq commented Jul 23, 2024

I would like to do this but the test actually hits an assert so we don't actually generate any code for the X86 test unless I remove the -verify-coalescing option which shows the actual failure.

We can include error messages in the pre-commit case to show where the problem is. Something like

not --crash llc .... | FileCheck ...
; CHECK: <error information>

I think it's OK that we don't have code generated for an expected failing case.

Contributor

@arsenm arsenm left a comment

lgtm with nit

@@ -0,0 +1,34 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
# RUN: llc -mtriple powerpc64le-unknown-linux-gnu -mcpu=pwr8 -x mir < %s \
Contributor

Suggested change
# RUN: llc -mtriple powerpc64le-unknown-linux-gnu -mcpu=pwr8 -x mir < %s \
# RUN: llc -mtriple powerpc64le-unknown-linux-gnu -mcpu=pwr8 %s \

; CHECK-NEXT: BLR8 implicit $lr8, implicit $rm, implicit $v2
%0:g8rc_and_g8rc_nox0 = COPY $x3
%1:f8rc, %2:g8rc_and_g8rc_nox0 = LFSUX %0, %0
%3:f4rc = FRSP killed %1, implicit $rm
Contributor

Compact the register numbers, can do this with -run-pass=none

Contributor Author

I did do this but the numbers didn't change. I think it is because both %1 and %2 are defined on line 27.

@@ -0,0 +1,37 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
# RUN: llc -mtriple x86_64-unknown-unknown -x mir < %s \
Contributor

Suggested change
# RUN: llc -mtriple x86_64-unknown-unknown -x mir < %s \
# RUN: llc -mtriple x86_64-unknown-unknown %s \

@@ -0,0 +1,37 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
# RUN: llc -mtriple x86_64-unknown-unknown -x mir < %s \
# RUN: -verify-coalescing -enable-subreg-liveness=true \
Contributor

Suggested change
# RUN: -verify-coalescing -enable-subreg-liveness=true \
# RUN: -verify-coalescing -enable-subreg-liveness \

Collaborator

@qcolombet qcolombet left a comment

Thanks for the update.

Makes sense now!

@stefanp-ibm stefanp-ibm merged commit 26fa399 into llvm:main Jul 24, 2024
4 of 7 checks passed
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
…er. (#96839)

@daltenty daltenty added this to the LLVM 19.X Release milestone Jul 29, 2024
@stefanp-ibm
Contributor Author

/cherry-pick 26fa399

llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Jul 29, 2024
…er. (llvm#96839)

(cherry picked from commit 26fa399)
@llvmbot
Collaborator

llvmbot commented Jul 29, 2024

/pull-request #101071

tru pushed a commit that referenced this pull request Jul 31, 2024
…er. (#96839)

(cherry picked from commit 26fa399)
Harini0924 pushed a commit to Harini0924/llvm-project that referenced this pull request Aug 1, 2024
…er. (llvm#96839)
