-
Notifications
You must be signed in to change notification settings - Fork 11.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RegisterCoalescer] Fix SUBREG_TO_REG handling in the RegisterCoalescer. #96839
Conversation
@llvm/pr-subscribers-backend-x86 Author: Stefan Pintilie (stefanp-ibm) ChangesTwo tests are added to this fix. The X86 test fails without the patch. Full diff: https://github.com/llvm/llvm-project/pull/96839.diff 3 Files Affected:
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index 3397cb4f3661fe..75cdcf811fd401 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -3671,6 +3671,14 @@ bool RegisterCoalescer::joinVirtRegs(CoalescerPair &CP) {
// having stale segments.
LHSVals.pruneMainSegments(LHS, ShrinkMainRange);
+ LHSVals.pruneSubRegValues(LHS, ShrinkMask);
+ RHSVals.pruneSubRegValues(LHS, ShrinkMask);
+ } else if (TrackSubRegLiveness && !CP.getDstIdx() && CP.getSrcIdx()) {
+ LHS.createSubRangeFrom(LIS->getVNInfoAllocator(),
+ CP.getNewRC()->getLaneMask(), LHS);
+ mergeSubRangeInto(LHS, RHS, TRI->getSubRegIndexLaneMask(CP.getSrcIdx()), CP,
+ CP.getDstIdx());
+ LHSVals.pruneMainSegments(LHS, ShrinkMainRange);
LHSVals.pruneSubRegValues(LHS, ShrinkMask);
RHSVals.pruneSubRegValues(LHS, ShrinkMask);
}
diff --git a/llvm/test/CodeGen/PowerPC/subreg-coalescer.mir b/llvm/test/CodeGen/PowerPC/subreg-coalescer.mir
new file mode 100644
index 00000000000000..8b04997ff335f7
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/subreg-coalescer.mir
@@ -0,0 +1,26 @@
+# RUN: llc -mtriple powerpc64le-unknown-linux-gnu -mcpu=pwr8 -x mir < %s \
+# RUN: -verify-machineinstrs --run-pass=register-coalescer -o - | FileCheck %s
+
+---
+name: check_subregs
+alignment: 16
+tracksRegLiveness: true
+body: |
+ bb.0.entry:
+ liveins: $x3
+
+ %0:g8rc_and_g8rc_nox0 = COPY $x3
+ %3:f8rc, %4:g8rc_and_g8rc_nox0 = LFSUX %0, %0
+ %5:f4rc = FRSP killed %3, implicit $rm
+ %22:vslrc = SUBREG_TO_REG 1, %5, %subreg.sub_64
+ %11:vrrc = XVCVDPSP killed %22, implicit $rm
+ $v2 = COPY %11
+ BLR8 implicit $lr8, implicit $rm, implicit $v2
+...
+
+# CHECK: %0:g8rc_and_g8rc_nox0 = COPY $x3
+# CHECK-NEXT: %1:f8rc, dead %2:g8rc_and_g8rc_nox0 = LFSUX %0, %0
+# CHECK-NEXT: undef %4.sub_64:vslrc = FRSP %1, implicit $rm
+# CHECK-NEXT: %5:vrrc = XVCVDPSP %4, implicit $rm
+# CHECK-NEXT: $v2 = COPY %5
+# CHECK-NEXT: BLR8 implicit $lr8, implicit $rm, implicit $v2
diff --git a/llvm/test/CodeGen/X86/subreg-fail.mir b/llvm/test/CodeGen/X86/subreg-fail.mir
new file mode 100644
index 00000000000000..a563141d59c9fe
--- /dev/null
+++ b/llvm/test/CodeGen/X86/subreg-fail.mir
@@ -0,0 +1,124 @@
+# RUN: llc -mtriple x86_64-unknown-unknown -x mir < %s \
+# RUN: -verify-machineinstrs -enable-subreg-liveness=true \
+# RUN: --run-pass=register-coalescer -o - | FileCheck %s
+
+--- |
+ target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+ target triple = "x86_64-unknown-unknown"
+
+ %pair = type { i64, double }
+ %t21 = type { ptr }
+ %t13 = type { ptr, %t15, %t15 }
+ %t15 = type { i8, i32, i32 }
+
+ @__force_order = external hidden global i32, align 4
+ @.str = private unnamed_addr constant { [1 x i8], [63 x i8] } zeroinitializer, align 32
+ @a = external global i32, align 4
+ @fn1.g = private unnamed_addr constant [9 x ptr] [ptr null, ptr @a, ptr null, ptr null, ptr null, ptr null, ptr null, ptr null, ptr null], align 16
+ @e = external global i32, align 4
+ @__stack_chk_guard = external dso_local global ptr
+
+ ; Function Attrs: nounwind ssp
+ define i32 @test1() #0 {
+ entry:
+ %tmp5.i = load volatile i32, ptr undef, align 4
+ %conv.i = zext i32 %tmp5.i to i64
+ %tmp12.i = load volatile i32, ptr undef, align 4
+ %conv13.i = zext i32 %tmp12.i to i64
+ %shl.i = shl i64 %conv13.i, 32
+ %or.i = or i64 %shl.i, %conv.i
+ %add16.i = add i64 %or.i, 256
+ %shr.i = lshr i64 %add16.i, 8
+ %conv19.i = trunc i64 %shr.i to i32
+ store volatile i32 %conv19.i, ptr undef, align 4
+ ret i32 undef
+ }
+...
+---
+name: test1
+alignment: 16
+exposesReturnsTwice: false
+legalized: false
+regBankSelected: false
+selected: false
+failedISel: false
+tracksRegLiveness: true
+hasWinCFI: false
+callsEHReturn: false
+callsUnwindInit: false
+hasEHCatchret: false
+hasEHScopes: false
+hasEHFunclets: false
+isOutlined: false
+debugInstrRef: true
+failsVerification: false
+tracksDebugUserValues: false
+registers:
+ - { id: 0, class: gr32, preferred-register: '' }
+ - { id: 1, class: gr64, preferred-register: '' }
+ - { id: 2, class: gr64_nosp, preferred-register: '' }
+ - { id: 3, class: gr32, preferred-register: '' }
+ - { id: 4, class: gr64, preferred-register: '' }
+ - { id: 5, class: gr64, preferred-register: '' }
+ - { id: 6, class: gr64, preferred-register: '' }
+ - { id: 7, class: gr64, preferred-register: '' }
+ - { id: 8, class: gr64, preferred-register: '' }
+ - { id: 9, class: gr32, preferred-register: '' }
+ - { id: 10, class: gr64, preferred-register: '' }
+ - { id: 11, class: gr32, preferred-register: '' }
+liveins: []
+frameInfo:
+ isFrameAddressTaken: false
+ isReturnAddressTaken: false
+ hasStackMap: false
+ hasPatchPoint: false
+ stackSize: 0
+ offsetAdjustment: 0
+ maxAlignment: 1
+ adjustsStack: false
+ hasCalls: false
+ stackProtector: ''
+ functionContext: ''
+ maxCallFrameSize: 4294967295
+ cvBytesOfCalleeSavedRegisters: 0
+ hasOpaqueSPAdjustment: false
+ hasVAStart: false
+ hasMustTailInVarArgFunc: false
+ hasTailCall: false
+ isCalleeSavedInfoValid: false
+ localFrameSize: 0
+ savePoint: ''
+ restorePoint: ''
+fixedStack: []
+stack: []
+entry_values: []
+callSites: []
+debugValueSubstitutions: []
+constants: []
+machineFunctionInfo:
+ amxProgModel: None
+body: |
+ bb.0.entry:
+ %0:gr32 = MOV32rm undef %1:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
+ %2:gr64_nosp = SUBREG_TO_REG 0, killed %0, %subreg.sub_32bit
+ %3:gr32 = MOV32rm undef %4:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
+ %5:gr64 = SUBREG_TO_REG 0, killed %3, %subreg.sub_32bit
+ %6:gr64 = COPY killed %5
+ %6:gr64 = SHL64ri %6, 32, implicit-def dead $eflags
+ %7:gr64 = LEA64r killed %6, 1, killed %2, 256, $noreg
+ %8:gr64 = COPY killed %7
+ %8:gr64 = SHR64ri %8, 8, implicit-def dead $eflags
+ %9:gr32 = COPY killed %8.sub_32bit
+ MOV32mr undef %10:gr64, 1, $noreg, 0, $noreg, killed %9 :: (volatile store (s32) into `ptr undef`)
+ RET 0, undef $eax
+
+...
+
+# CHECK: undef %2.sub_32bit:gr64_nosp = MOV32rm undef %1:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
+# CHECK-NEXT: undef %6.sub_32bit:gr64_with_sub_8bit = MOV32rm undef %4:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
+# CHECK-NEXT: %6:gr64_with_sub_8bit = SHL64ri %6, 32, implicit-def dead $eflags
+# CHECK-NEXT: %8:gr64_with_sub_8bit = LEA64r %6, 1, %2, 256, $noreg
+# CHECK-NEXT: %8:gr64_with_sub_8bit = SHR64ri %8, 8, implicit-def dead $eflags
+# CHECK-NEXT: MOV32mr undef %10:gr64, 1, $noreg, 0, $noreg, %8.sub_32bit :: (volatile store (s32) into `ptr undef`)
+# CHECK-NEXT: RET 0, undef $eax
+
|
@llvm/pr-subscribers-llvm-regalloc Author: Stefan Pintilie (stefanp-ibm) ChangesTwo tests are added to this fix. The X86 test fails without the patch. Full diff: https://github.com/llvm/llvm-project/pull/96839.diff 3 Files Affected:
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index 3397cb4f3661fe..75cdcf811fd401 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -3671,6 +3671,14 @@ bool RegisterCoalescer::joinVirtRegs(CoalescerPair &CP) {
// having stale segments.
LHSVals.pruneMainSegments(LHS, ShrinkMainRange);
+ LHSVals.pruneSubRegValues(LHS, ShrinkMask);
+ RHSVals.pruneSubRegValues(LHS, ShrinkMask);
+ } else if (TrackSubRegLiveness && !CP.getDstIdx() && CP.getSrcIdx()) {
+ LHS.createSubRangeFrom(LIS->getVNInfoAllocator(),
+ CP.getNewRC()->getLaneMask(), LHS);
+ mergeSubRangeInto(LHS, RHS, TRI->getSubRegIndexLaneMask(CP.getSrcIdx()), CP,
+ CP.getDstIdx());
+ LHSVals.pruneMainSegments(LHS, ShrinkMainRange);
LHSVals.pruneSubRegValues(LHS, ShrinkMask);
RHSVals.pruneSubRegValues(LHS, ShrinkMask);
}
diff --git a/llvm/test/CodeGen/PowerPC/subreg-coalescer.mir b/llvm/test/CodeGen/PowerPC/subreg-coalescer.mir
new file mode 100644
index 00000000000000..8b04997ff335f7
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/subreg-coalescer.mir
@@ -0,0 +1,26 @@
+# RUN: llc -mtriple powerpc64le-unknown-linux-gnu -mcpu=pwr8 -x mir < %s \
+# RUN: -verify-machineinstrs --run-pass=register-coalescer -o - | FileCheck %s
+
+---
+name: check_subregs
+alignment: 16
+tracksRegLiveness: true
+body: |
+ bb.0.entry:
+ liveins: $x3
+
+ %0:g8rc_and_g8rc_nox0 = COPY $x3
+ %3:f8rc, %4:g8rc_and_g8rc_nox0 = LFSUX %0, %0
+ %5:f4rc = FRSP killed %3, implicit $rm
+ %22:vslrc = SUBREG_TO_REG 1, %5, %subreg.sub_64
+ %11:vrrc = XVCVDPSP killed %22, implicit $rm
+ $v2 = COPY %11
+ BLR8 implicit $lr8, implicit $rm, implicit $v2
+...
+
+# CHECK: %0:g8rc_and_g8rc_nox0 = COPY $x3
+# CHECK-NEXT: %1:f8rc, dead %2:g8rc_and_g8rc_nox0 = LFSUX %0, %0
+# CHECK-NEXT: undef %4.sub_64:vslrc = FRSP %1, implicit $rm
+# CHECK-NEXT: %5:vrrc = XVCVDPSP %4, implicit $rm
+# CHECK-NEXT: $v2 = COPY %5
+# CHECK-NEXT: BLR8 implicit $lr8, implicit $rm, implicit $v2
diff --git a/llvm/test/CodeGen/X86/subreg-fail.mir b/llvm/test/CodeGen/X86/subreg-fail.mir
new file mode 100644
index 00000000000000..a563141d59c9fe
--- /dev/null
+++ b/llvm/test/CodeGen/X86/subreg-fail.mir
@@ -0,0 +1,124 @@
+# RUN: llc -mtriple x86_64-unknown-unknown -x mir < %s \
+# RUN: -verify-machineinstrs -enable-subreg-liveness=true \
+# RUN: --run-pass=register-coalescer -o - | FileCheck %s
+
+--- |
+ target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+ target triple = "x86_64-unknown-unknown"
+
+ %pair = type { i64, double }
+ %t21 = type { ptr }
+ %t13 = type { ptr, %t15, %t15 }
+ %t15 = type { i8, i32, i32 }
+
+ @__force_order = external hidden global i32, align 4
+ @.str = private unnamed_addr constant { [1 x i8], [63 x i8] } zeroinitializer, align 32
+ @a = external global i32, align 4
+ @fn1.g = private unnamed_addr constant [9 x ptr] [ptr null, ptr @a, ptr null, ptr null, ptr null, ptr null, ptr null, ptr null, ptr null], align 16
+ @e = external global i32, align 4
+ @__stack_chk_guard = external dso_local global ptr
+
+ ; Function Attrs: nounwind ssp
+ define i32 @test1() #0 {
+ entry:
+ %tmp5.i = load volatile i32, ptr undef, align 4
+ %conv.i = zext i32 %tmp5.i to i64
+ %tmp12.i = load volatile i32, ptr undef, align 4
+ %conv13.i = zext i32 %tmp12.i to i64
+ %shl.i = shl i64 %conv13.i, 32
+ %or.i = or i64 %shl.i, %conv.i
+ %add16.i = add i64 %or.i, 256
+ %shr.i = lshr i64 %add16.i, 8
+ %conv19.i = trunc i64 %shr.i to i32
+ store volatile i32 %conv19.i, ptr undef, align 4
+ ret i32 undef
+ }
+...
+---
+name: test1
+alignment: 16
+exposesReturnsTwice: false
+legalized: false
+regBankSelected: false
+selected: false
+failedISel: false
+tracksRegLiveness: true
+hasWinCFI: false
+callsEHReturn: false
+callsUnwindInit: false
+hasEHCatchret: false
+hasEHScopes: false
+hasEHFunclets: false
+isOutlined: false
+debugInstrRef: true
+failsVerification: false
+tracksDebugUserValues: false
+registers:
+ - { id: 0, class: gr32, preferred-register: '' }
+ - { id: 1, class: gr64, preferred-register: '' }
+ - { id: 2, class: gr64_nosp, preferred-register: '' }
+ - { id: 3, class: gr32, preferred-register: '' }
+ - { id: 4, class: gr64, preferred-register: '' }
+ - { id: 5, class: gr64, preferred-register: '' }
+ - { id: 6, class: gr64, preferred-register: '' }
+ - { id: 7, class: gr64, preferred-register: '' }
+ - { id: 8, class: gr64, preferred-register: '' }
+ - { id: 9, class: gr32, preferred-register: '' }
+ - { id: 10, class: gr64, preferred-register: '' }
+ - { id: 11, class: gr32, preferred-register: '' }
+liveins: []
+frameInfo:
+ isFrameAddressTaken: false
+ isReturnAddressTaken: false
+ hasStackMap: false
+ hasPatchPoint: false
+ stackSize: 0
+ offsetAdjustment: 0
+ maxAlignment: 1
+ adjustsStack: false
+ hasCalls: false
+ stackProtector: ''
+ functionContext: ''
+ maxCallFrameSize: 4294967295
+ cvBytesOfCalleeSavedRegisters: 0
+ hasOpaqueSPAdjustment: false
+ hasVAStart: false
+ hasMustTailInVarArgFunc: false
+ hasTailCall: false
+ isCalleeSavedInfoValid: false
+ localFrameSize: 0
+ savePoint: ''
+ restorePoint: ''
+fixedStack: []
+stack: []
+entry_values: []
+callSites: []
+debugValueSubstitutions: []
+constants: []
+machineFunctionInfo:
+ amxProgModel: None
+body: |
+ bb.0.entry:
+ %0:gr32 = MOV32rm undef %1:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
+ %2:gr64_nosp = SUBREG_TO_REG 0, killed %0, %subreg.sub_32bit
+ %3:gr32 = MOV32rm undef %4:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
+ %5:gr64 = SUBREG_TO_REG 0, killed %3, %subreg.sub_32bit
+ %6:gr64 = COPY killed %5
+ %6:gr64 = SHL64ri %6, 32, implicit-def dead $eflags
+ %7:gr64 = LEA64r killed %6, 1, killed %2, 256, $noreg
+ %8:gr64 = COPY killed %7
+ %8:gr64 = SHR64ri %8, 8, implicit-def dead $eflags
+ %9:gr32 = COPY killed %8.sub_32bit
+ MOV32mr undef %10:gr64, 1, $noreg, 0, $noreg, killed %9 :: (volatile store (s32) into `ptr undef`)
+ RET 0, undef $eax
+
+...
+
+# CHECK: undef %2.sub_32bit:gr64_nosp = MOV32rm undef %1:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
+# CHECK-NEXT: undef %6.sub_32bit:gr64_with_sub_8bit = MOV32rm undef %4:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
+# CHECK-NEXT: %6:gr64_with_sub_8bit = SHL64ri %6, 32, implicit-def dead $eflags
+# CHECK-NEXT: %8:gr64_with_sub_8bit = LEA64r %6, 1, %2, 256, $noreg
+# CHECK-NEXT: %8:gr64_with_sub_8bit = SHR64ri %8, 8, implicit-def dead $eflags
+# CHECK-NEXT: MOV32mr undef %10:gr64, 1, $noreg, 0, $noreg, %8.sub_32bit :: (volatile store (s32) into `ptr undef`)
+# CHECK-NEXT: RET 0, undef $eax
+
|
@llvm/pr-subscribers-backend-powerpc Author: Stefan Pintilie (stefanp-ibm) ChangesTwo tests are added to this fix. The X86 test fails without the patch. Full diff: https://github.com/llvm/llvm-project/pull/96839.diff 3 Files Affected:
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index 3397cb4f3661fe..75cdcf811fd401 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -3671,6 +3671,14 @@ bool RegisterCoalescer::joinVirtRegs(CoalescerPair &CP) {
// having stale segments.
LHSVals.pruneMainSegments(LHS, ShrinkMainRange);
+ LHSVals.pruneSubRegValues(LHS, ShrinkMask);
+ RHSVals.pruneSubRegValues(LHS, ShrinkMask);
+ } else if (TrackSubRegLiveness && !CP.getDstIdx() && CP.getSrcIdx()) {
+ LHS.createSubRangeFrom(LIS->getVNInfoAllocator(),
+ CP.getNewRC()->getLaneMask(), LHS);
+ mergeSubRangeInto(LHS, RHS, TRI->getSubRegIndexLaneMask(CP.getSrcIdx()), CP,
+ CP.getDstIdx());
+ LHSVals.pruneMainSegments(LHS, ShrinkMainRange);
LHSVals.pruneSubRegValues(LHS, ShrinkMask);
RHSVals.pruneSubRegValues(LHS, ShrinkMask);
}
diff --git a/llvm/test/CodeGen/PowerPC/subreg-coalescer.mir b/llvm/test/CodeGen/PowerPC/subreg-coalescer.mir
new file mode 100644
index 00000000000000..8b04997ff335f7
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/subreg-coalescer.mir
@@ -0,0 +1,26 @@
+# RUN: llc -mtriple powerpc64le-unknown-linux-gnu -mcpu=pwr8 -x mir < %s \
+# RUN: -verify-machineinstrs --run-pass=register-coalescer -o - | FileCheck %s
+
+---
+name: check_subregs
+alignment: 16
+tracksRegLiveness: true
+body: |
+ bb.0.entry:
+ liveins: $x3
+
+ %0:g8rc_and_g8rc_nox0 = COPY $x3
+ %3:f8rc, %4:g8rc_and_g8rc_nox0 = LFSUX %0, %0
+ %5:f4rc = FRSP killed %3, implicit $rm
+ %22:vslrc = SUBREG_TO_REG 1, %5, %subreg.sub_64
+ %11:vrrc = XVCVDPSP killed %22, implicit $rm
+ $v2 = COPY %11
+ BLR8 implicit $lr8, implicit $rm, implicit $v2
+...
+
+# CHECK: %0:g8rc_and_g8rc_nox0 = COPY $x3
+# CHECK-NEXT: %1:f8rc, dead %2:g8rc_and_g8rc_nox0 = LFSUX %0, %0
+# CHECK-NEXT: undef %4.sub_64:vslrc = FRSP %1, implicit $rm
+# CHECK-NEXT: %5:vrrc = XVCVDPSP %4, implicit $rm
+# CHECK-NEXT: $v2 = COPY %5
+# CHECK-NEXT: BLR8 implicit $lr8, implicit $rm, implicit $v2
diff --git a/llvm/test/CodeGen/X86/subreg-fail.mir b/llvm/test/CodeGen/X86/subreg-fail.mir
new file mode 100644
index 00000000000000..a563141d59c9fe
--- /dev/null
+++ b/llvm/test/CodeGen/X86/subreg-fail.mir
@@ -0,0 +1,124 @@
+# RUN: llc -mtriple x86_64-unknown-unknown -x mir < %s \
+# RUN: -verify-machineinstrs -enable-subreg-liveness=true \
+# RUN: --run-pass=register-coalescer -o - | FileCheck %s
+
+--- |
+ target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+ target triple = "x86_64-unknown-unknown"
+
+ %pair = type { i64, double }
+ %t21 = type { ptr }
+ %t13 = type { ptr, %t15, %t15 }
+ %t15 = type { i8, i32, i32 }
+
+ @__force_order = external hidden global i32, align 4
+ @.str = private unnamed_addr constant { [1 x i8], [63 x i8] } zeroinitializer, align 32
+ @a = external global i32, align 4
+ @fn1.g = private unnamed_addr constant [9 x ptr] [ptr null, ptr @a, ptr null, ptr null, ptr null, ptr null, ptr null, ptr null, ptr null], align 16
+ @e = external global i32, align 4
+ @__stack_chk_guard = external dso_local global ptr
+
+ ; Function Attrs: nounwind ssp
+ define i32 @test1() #0 {
+ entry:
+ %tmp5.i = load volatile i32, ptr undef, align 4
+ %conv.i = zext i32 %tmp5.i to i64
+ %tmp12.i = load volatile i32, ptr undef, align 4
+ %conv13.i = zext i32 %tmp12.i to i64
+ %shl.i = shl i64 %conv13.i, 32
+ %or.i = or i64 %shl.i, %conv.i
+ %add16.i = add i64 %or.i, 256
+ %shr.i = lshr i64 %add16.i, 8
+ %conv19.i = trunc i64 %shr.i to i32
+ store volatile i32 %conv19.i, ptr undef, align 4
+ ret i32 undef
+ }
+...
+---
+name: test1
+alignment: 16
+exposesReturnsTwice: false
+legalized: false
+regBankSelected: false
+selected: false
+failedISel: false
+tracksRegLiveness: true
+hasWinCFI: false
+callsEHReturn: false
+callsUnwindInit: false
+hasEHCatchret: false
+hasEHScopes: false
+hasEHFunclets: false
+isOutlined: false
+debugInstrRef: true
+failsVerification: false
+tracksDebugUserValues: false
+registers:
+ - { id: 0, class: gr32, preferred-register: '' }
+ - { id: 1, class: gr64, preferred-register: '' }
+ - { id: 2, class: gr64_nosp, preferred-register: '' }
+ - { id: 3, class: gr32, preferred-register: '' }
+ - { id: 4, class: gr64, preferred-register: '' }
+ - { id: 5, class: gr64, preferred-register: '' }
+ - { id: 6, class: gr64, preferred-register: '' }
+ - { id: 7, class: gr64, preferred-register: '' }
+ - { id: 8, class: gr64, preferred-register: '' }
+ - { id: 9, class: gr32, preferred-register: '' }
+ - { id: 10, class: gr64, preferred-register: '' }
+ - { id: 11, class: gr32, preferred-register: '' }
+liveins: []
+frameInfo:
+ isFrameAddressTaken: false
+ isReturnAddressTaken: false
+ hasStackMap: false
+ hasPatchPoint: false
+ stackSize: 0
+ offsetAdjustment: 0
+ maxAlignment: 1
+ adjustsStack: false
+ hasCalls: false
+ stackProtector: ''
+ functionContext: ''
+ maxCallFrameSize: 4294967295
+ cvBytesOfCalleeSavedRegisters: 0
+ hasOpaqueSPAdjustment: false
+ hasVAStart: false
+ hasMustTailInVarArgFunc: false
+ hasTailCall: false
+ isCalleeSavedInfoValid: false
+ localFrameSize: 0
+ savePoint: ''
+ restorePoint: ''
+fixedStack: []
+stack: []
+entry_values: []
+callSites: []
+debugValueSubstitutions: []
+constants: []
+machineFunctionInfo:
+ amxProgModel: None
+body: |
+ bb.0.entry:
+ %0:gr32 = MOV32rm undef %1:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
+ %2:gr64_nosp = SUBREG_TO_REG 0, killed %0, %subreg.sub_32bit
+ %3:gr32 = MOV32rm undef %4:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
+ %5:gr64 = SUBREG_TO_REG 0, killed %3, %subreg.sub_32bit
+ %6:gr64 = COPY killed %5
+ %6:gr64 = SHL64ri %6, 32, implicit-def dead $eflags
+ %7:gr64 = LEA64r killed %6, 1, killed %2, 256, $noreg
+ %8:gr64 = COPY killed %7
+ %8:gr64 = SHR64ri %8, 8, implicit-def dead $eflags
+ %9:gr32 = COPY killed %8.sub_32bit
+ MOV32mr undef %10:gr64, 1, $noreg, 0, $noreg, killed %9 :: (volatile store (s32) into `ptr undef`)
+ RET 0, undef $eax
+
+...
+
+# CHECK: undef %2.sub_32bit:gr64_nosp = MOV32rm undef %1:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
+# CHECK-NEXT: undef %6.sub_32bit:gr64_with_sub_8bit = MOV32rm undef %4:gr64, 1, $noreg, 0, $noreg :: (volatile load (s32) from `ptr undef`)
+# CHECK-NEXT: %6:gr64_with_sub_8bit = SHL64ri %6, 32, implicit-def dead $eflags
+# CHECK-NEXT: %8:gr64_with_sub_8bit = LEA64r %6, 1, %2, 256, $noreg
+# CHECK-NEXT: %8:gr64_with_sub_8bit = SHR64ri %8, 8, implicit-def dead $eflags
+# CHECK-NEXT: MOV32mr undef %10:gr64, 1, $noreg, 0, $noreg, %8.sub_32bit :: (volatile store (s32) into `ptr undef`)
+# CHECK-NEXT: RET 0, undef $eax
+
|
I think we should pre-commit the X86 test case to show where failure occurs. |
Title should be descriptive of what the problem is, not a vague "fix issue" |
alignment: 16 | ||
tracksRegLiveness: true | ||
body: | | ||
bb.0.entry: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove .entry block labels
liveins: $x3 | ||
|
||
%0:g8rc_and_g8rc_nox0 = COPY $x3 | ||
%3:f8rc, %4:g8rc_and_g8rc_nox0 = LFSUX %0, %0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you run run-pass=none to compact the virtual register numbers
@@ -0,0 +1,26 @@ | |||
# RUN: llc -mtriple powerpc64le-unknown-linux-gnu -mcpu=pwr8 -x mir < %s \ | |||
# RUN: -verify-machineinstrs --run-pass=register-coalescer -o - | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-verify-coalescing would be more useful than -verify-machineinstrs. Also, should use update_mir_test_checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also not using stdin and directly reading the file avoids the need for -x mir
+1 on that. Could you describe what was the issue and how this patch fixes it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test looks better but title and description still need details
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 | ||
# RUN: llc -mtriple powerpc64le-unknown-linux-gnu -mcpu=pwr8 -x mir < %s \ | ||
# RUN: -verify-coalescing --run-pass=register-coalescer -o - | FileCheck %s | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment explaining the test?
# RUN: llc -mtriple x86_64-unknown-unknown -x mir < %s \ | ||
# RUN: -verify-coalescing -enable-subreg-liveness=true \ | ||
# RUN: --run-pass=register-coalescer -o - | FileCheck %s | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment explaining the test?
I can do this once the reviewers are happy with the test cases. For the X86 testcase I will probably just use |
What were we generating previously? Also, shouldn't it be |
Is it typo? The main liverange should not be shorter than subrange. I tried this patch, looks it should be
After coalescing
During coalescing,
|
I'm sorry the line above is a typo. The range should be Before we added the fix we would have:
So, basically the result is
After we add the fix we get the following:
So the result is |
Yes, this is a typo and I will fix the description. As you mentioned the result should be:
|
I would like to do this but the test actually hits an assert so we don't actually generate any code for the X86 test unless I remove the |
Gentle ping. |
303826f
to
6c707ed
Compare
@@ -0,0 +1,37 @@ | |||
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-commit this case to show where MachineVerifier
complains when specifying -verify-coalescing -enable-subreg-liveness=true
?
We can include error messages in the pre-commit case to show where the problem is. Something like
I think it's ok we dont't have code generated for an expected failing case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm with nit
@@ -0,0 +1,34 @@ | |||
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 | |||
# RUN: llc -mtriple powerpc64le-unknown-linux-gnu -mcpu=pwr8 -x mir < %s \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# RUN: llc -mtriple powerpc64le-unknown-linux-gnu -mcpu=pwr8 -x mir < %s \ | |
# RUN: llc -mtriple powerpc64le-unknown-linux-gnu -mcpu=pwr8 %s \ |
; CHECK-NEXT: BLR8 implicit $lr8, implicit $rm, implicit $v2 | ||
%0:g8rc_and_g8rc_nox0 = COPY $x3 | ||
%1:f8rc, %2:g8rc_and_g8rc_nox0 = LFSUX %0, %0 | ||
%3:f4rc = FRSP killed %1, implicit $rm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compact the register numbers, can do this with -run-pass=none
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did do this but the numbers didn't change. I think it is because both %1
and %2
are defined on line 27.
@@ -0,0 +1,37 @@ | |||
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 | |||
# RUN: llc -mtriple x86_64-unknown-unknown -x mir < %s \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# RUN: llc -mtriple x86_64-unknown-unknown -x mir < %s \ | |
# RUN: llc -mtriple x86_64-unknown-unknown %s \ |
@@ -0,0 +1,37 @@ | |||
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 | |||
# RUN: llc -mtriple x86_64-unknown-unknown -x mir < %s \ | |||
# RUN: -verify-coalescing -enable-subreg-liveness=true \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# RUN: -verify-coalescing -enable-subreg-liveness=true \ | |
# RUN: -verify-coalescing -enable-subreg-liveness \ |
6c707ed
to
a9c6e9d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update.
Makes sense now!
Two tests are added to this fix. The X86 test fails without the patch. The PowerPC test passes with and without the patch but is added as a way track future possible failures when register classes are changed in a future patch.
a9c6e9d
to
d23a9b4
Compare
…er. (#96839) Summary: The issue with the handling of the SUBREG_TO_REG is that we don't join the subranges correctly when we join live ranges across the SUBREG_TO_REG. For example when joining across this: ``` 32B %2:gr64_nosp = SUBREG_TO_REG 0, %0:gr32, %subreg.sub_32bit ``` we want to join these live ranges: ``` %0 [16r,32r:0) 0@16r weight:0.000000e+00 %2 [32r,112r:0) 0@32r weight:0.000000e+00 ``` Before the fix the range for the resulting merged `%2` is: ``` %2 [16r,112r:0) 0@16r weight:0.000000e+00 ``` After the fix it is now this: ``` %2 [16r,112r:0) 0@16r L000000000000000F [16r,112r:0) 0@16r weight:0.000000e+00 ``` Two tests are added to this fix. The X86 test fails without the patch. The PowerPC test passes with and without the patch but is added as a way track future possible failures when register classes are changed in a future patch. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250666
/cherry-pick 26fa399 |
…er. (llvm#96839) The issue with the handling of the SUBREG_TO_REG is that we don't join the subranges correctly when we join live ranges across the SUBREG_TO_REG. For example when joining across this: ``` 32B %2:gr64_nosp = SUBREG_TO_REG 0, %0:gr32, %subreg.sub_32bit ``` we want to join these live ranges: ``` %0 [16r,32r:0) 0@16r weight:0.000000e+00 %2 [32r,112r:0) 0@32r weight:0.000000e+00 ``` Before the fix the range for the resulting merged `%2` is: ``` %2 [16r,112r:0) 0@16r weight:0.000000e+00 ``` After the fix it is now this: ``` %2 [16r,112r:0) 0@16r L000000000000000F [16r,112r:0) 0@16r weight:0.000000e+00 ``` Two tests are added to this fix. The X86 test fails without the patch. The PowerPC test passes with and without the patch but is added as a way track future possible failures when register classes are changed in a future patch. (cherry picked from commit 26fa399)
/pull-request #101071 |
…er. (#96839) The issue with the handling of the SUBREG_TO_REG is that we don't join the subranges correctly when we join live ranges across the SUBREG_TO_REG. For example when joining across this: ``` 32B %2:gr64_nosp = SUBREG_TO_REG 0, %0:gr32, %subreg.sub_32bit ``` we want to join these live ranges: ``` %0 [16r,32r:0) 0@16r weight:0.000000e+00 %2 [32r,112r:0) 0@32r weight:0.000000e+00 ``` Before the fix the range for the resulting merged `%2` is: ``` %2 [16r,112r:0) 0@16r weight:0.000000e+00 ``` After the fix it is now this: ``` %2 [16r,112r:0) 0@16r L000000000000000F [16r,112r:0) 0@16r weight:0.000000e+00 ``` Two tests are added to this fix. The X86 test fails without the patch. The PowerPC test passes with and without the patch but is added as a way track future possible failures when register classes are changed in a future patch. (cherry picked from commit 26fa399)
…er. (llvm#96839) The issue with the handling of the SUBREG_TO_REG is that we don't join the subranges correctly when we join live ranges across the SUBREG_TO_REG. For example when joining across this: ``` 32B %2:gr64_nosp = SUBREG_TO_REG 0, %0:gr32, %subreg.sub_32bit ``` we want to join these live ranges: ``` %0 [16r,32r:0) 0@16r weight:0.000000e+00 %2 [32r,112r:0) 0@32r weight:0.000000e+00 ``` Before the fix the range for the resulting merged `%2` is: ``` %2 [16r,112r:0) 0@16r weight:0.000000e+00 ``` After the fix it is now this: ``` %2 [16r,112r:0) 0@16r L000000000000000F [16r,112r:0) 0@16r weight:0.000000e+00 ``` Two tests are added to this fix. The X86 test fails without the patch. The PowerPC test passes with and without the patch but is added as a way track future possible failures when register classes are changed in a future patch.
The issue with the handling of the SUBREG_TO_REG is that we don't join the subranges correctly when we join live ranges across the SUBREG_TO_REG. For example when joining across this:
we want to join these live ranges:
Before the fix the range for the resulting merged
%2
is:After the fix it is now this:
Two tests are added to this fix. The X86 test fails without the patch.
The PowerPC test passes with and without the patch but is added as a way
track future possible failures when register classes are changed in a
future patch.