[AMDGPU] Try to reuse register with the constant from compare in v_cndmask #148740

dfukalov · 2025-07-14T22:30:16Z

For some targets, the optimization X == Const ? X : Y -> X == Const ? Const : Y can cause extra register usage or redundant immediate encoding for the constant in cndmask generated from the ternary operation.

This patch detects such cases and reuses the register from the compare instruction that already holds the constant, instead of materializing it again for cndmask.

The optimization avoids immediates that can be encoded into cndmask instruction (including +-0.0), as well as !isNormal() constants.

The change is reworked on the base of #131146

…dmask. For some targets, the optimization X == Const ? X : Y -> X == Const ? Const : Y can cause extra register usage or redundant immediate encoding for the constant in cndmask generated from the ternary operation. This patch detects such cases and reuses the register from the compare instruction that already holds the constant, instead of materializing it again for cndmask. The optimization avoids immediates that can be encoded into cndmask instruction (including +-0.0), as well as !isNormal() constants.

llvmbot · 2025-07-14T22:30:48Z

@llvm/pr-subscribers-backend-amdgpu

Author: Daniil Fukalov (dfukalov)

Changes

For some targets, the optimization X == Const ? X : Y -> X == Const ? Const : Y can cause extra register usage or redundant immediate encoding for the constant in cndmask generated from the ternary operation.

This patch detects such cases and reuses the register from the compare instruction that already holds the constant, instead of materializing it again for cndmask.

The optimization avoids immediates that can be encoded into cndmask instruction (including +-0.0), as well as !isNormal() constants.

The change is reworked on the base of #131146

Patch is 101.97 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/148740.diff

3 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp (+82)
(added) llvm/test/CodeGen/AMDGPU/select-cmp-shared-constant-fp.ll (+1429)
(added) llvm/test/CodeGen/AMDGPU/select-cmp-shared-constant-int.ll (+955)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index e64d2162441ab..4e7d95714722e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -4842,11 +4842,93 @@ AMDGPUTargetLowering::foldFreeOpFromSelect(TargetLowering::DAGCombinerInfo &DCI,
   return SDValue();
 }
 
+// Detect when CMP and SELECT use the same constant and fold them to avoid
+// loading the constant twice. Specifically handles patterns like:
+// %cmp = icmp eq i32 %val, 4242
+// %sel = select i1 %cmp, i32 4242, i32 %other
+// It can be optimized to reuse %val instead of 4242 in select.
+static SDValue
+foldCmpSelectWithSharedConstant(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
+                                const AMDGPUSubtarget *ST) {
+  SDValue Cond = N->getOperand(0);
+  SDValue TrueVal = N->getOperand(1);
+  SDValue FalseVal = N->getOperand(2);
+
+  // Check if condition is a comparison.
+  if (Cond.getOpcode() != ISD::SETCC)
+    return SDValue();
+
+  SDValue LHS = Cond.getOperand(0);
+  SDValue RHS = Cond.getOperand(1);
+  ISD::CondCode CC = cast<CondCodeSDNode>(Cond.getOperand(2))->get();
+
+  bool isFloatingPoint = LHS.getValueType().isFloatingPoint();
+  bool isInteger = LHS.getValueType().isInteger();
+
+  // Handle simple floating-point and integer types only.
+  if (!isFloatingPoint && !isInteger)
+    return SDValue();
+
+  bool isEquality = CC == (isFloatingPoint ? ISD::SETOEQ : ISD::SETEQ);
+  bool isNonEquality = CC == (isFloatingPoint ? ISD::SETONE : ISD::SETNE);
+  if (!isEquality && !isNonEquality)
+    return SDValue();
+
+  SDValue ArgVal, ConstVal;
+  if ((isFloatingPoint && isa<ConstantFPSDNode>(RHS)) ||
+      (isInteger && isa<ConstantSDNode>(RHS))) {
+    ConstVal = RHS;
+    ArgVal = LHS;
+  } else if ((isFloatingPoint && isa<ConstantFPSDNode>(LHS)) ||
+             (isInteger && isa<ConstantSDNode>(LHS))) {
+    ConstVal = LHS;
+    ArgVal = RHS;
+  } else {
+    return SDValue();
+  }
+
+  // Check if constant should not be optimized - early return if not.
+  if (isFloatingPoint) {
+    const APFloat &Val = cast<ConstantFPSDNode>(ConstVal)->getValueAPF();
+    const GCNSubtarget *GCNST = static_cast<const GCNSubtarget *>(ST);
+
+    // Only optimize normal floating-point values, skip optimization for
+    // inlinable floating-point constants.
+    if (!Val.isNormal() || GCNST->getInstrInfo()->isInlineConstant(Val))
+      return SDValue();
+  } else {
+    int64_t IntVal = cast<ConstantSDNode>(ConstVal)->getSExtValue();
+
+    // Skip optimization for inlinable integer immediates.
+    // Inlinable immediates include: -16 to 64 (inclusive).
+    if (IntVal >= -16 && IntVal <= 64)
+      return SDValue();
+  }
+
+  // For equality and non-equality comparisons, patterns:
+  // select (setcc x, const), const, y -> select (setcc x, const), x, y
+  // select (setccinv x, const), y, const -> select (setccinv x, const), y, x
+  if (!(isEquality && TrueVal == ConstVal) &&
+      !(isNonEquality && FalseVal == ConstVal))
+    return SDValue();
+
+  SDValue SelectLHS = (isEquality && TrueVal == ConstVal) ? ArgVal : TrueVal;
+  SDValue SelectRHS =
+      (isNonEquality && FalseVal == ConstVal) ? ArgVal : FalseVal;
+  return DCI.DAG.getNode(ISD::SELECT, SDLoc(N), N->getValueType(0), Cond,
+                         SelectLHS, SelectRHS);
+}
+
 SDValue AMDGPUTargetLowering::performSelectCombine(SDNode *N,
                                                    DAGCombinerInfo &DCI) const {
   if (SDValue Folded = foldFreeOpFromSelect(DCI, SDValue(N, 0)))
     return Folded;
 
+  // Try to fold CMP + SELECT patterns with shared constants (both FP and
+  // integer).
+  if (SDValue Folded = foldCmpSelectWithSharedConstant(N, DCI, Subtarget))
+    return Folded;
+
   SDValue Cond = N->getOperand(0);
   if (Cond.getOpcode() != ISD::SETCC)
     return SDValue();
diff --git a/llvm/test/CodeGen/AMDGPU/select-cmp-shared-constant-fp.ll b/llvm/test/CodeGen/AMDGPU/select-cmp-shared-constant-fp.ll
new file mode 100644
index 0000000000000..11af704d30973
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/select-cmp-shared-constant-fp.ll
@@ -0,0 +1,1429 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 < %s | FileCheck -check-prefix=GFX900 %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 < %s | FileCheck -check-prefix=GFX1010 %s
+
+; Test the CMP+SELECT optimization that folds shared constants to reduce
+; register pressure.
+
+;------------------------------------------------------------------------------
+; F32 Tests
+;------------------------------------------------------------------------------
+
+; Should be folded: fcmp oeq + select with constant in true value
+define float @fcmp_select_fold_oeq_f32_imm(float %arg, float %other) {
+; GFX900-LABEL: fcmp_select_fold_oeq_f32_imm:
+; GFX900:       ; %bb.0: ; %entry
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    s_mov_b32 s4, 0x42487ed8
+; GFX900-NEXT:    v_cmp_eq_f32_e32 vcc, s4, v0
+; GFX900-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1010-LABEL: fcmp_select_fold_oeq_f32_imm:
+; GFX1010:       ; %bb.0: ; %entry
+; GFX1010-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1010-NEXT:    v_cmp_eq_f32_e32 vcc_lo, 0x42487ed8, v0
+; GFX1010-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc_lo
+; GFX1010-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %cmp = fcmp oeq float %arg, 0x40490FDB00000000
+  %sel = select i1 %cmp, float 0x40490FDB00000000, float %other
+  ret float %sel
+}
+
+; Should be folded: fcmp oeq + select with constant in true value (commutative)
+define float @fcmp_select_fold_oeq_imm_f32(float %arg, float %other) {
+; GFX900-LABEL: fcmp_select_fold_oeq_imm_f32:
+; GFX900:       ; %bb.0: ; %entry
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    s_mov_b32 s4, 0x42487ed8
+; GFX900-NEXT:    v_cmp_eq_f32_e32 vcc, s4, v0
+; GFX900-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1010-LABEL: fcmp_select_fold_oeq_imm_f32:
+; GFX1010:       ; %bb.0: ; %entry
+; GFX1010-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1010-NEXT:    v_cmp_eq_f32_e32 vcc_lo, 0x42487ed8, v0
+; GFX1010-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc_lo
+; GFX1010-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %cmp = fcmp oeq float 0x40490FDB00000000, %arg
+  %sel = select i1 %cmp, float 0x40490FDB00000000, float %other
+  ret float %sel
+}
+
+; Should be folded: fcmp one + select with constant in false value
+define float @fcmp_select_fold_one_f32_imm(float %arg, float %other) {
+; GFX900-LABEL: fcmp_select_fold_one_f32_imm:
+; GFX900:       ; %bb.0: ; %entry
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    s_mov_b32 s4, 0x402df850
+; GFX900-NEXT:    v_cmp_lg_f32_e32 vcc, s4, v0
+; GFX900-NEXT:    v_cndmask_b32_e32 v0, v0, v1, vcc
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1010-LABEL: fcmp_select_fold_one_f32_imm:
+; GFX1010:       ; %bb.0: ; %entry
+; GFX1010-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1010-NEXT:    v_cmp_lg_f32_e32 vcc_lo, 0x402df850, v0
+; GFX1010-NEXT:    v_cndmask_b32_e32 v0, v0, v1, vcc_lo
+; GFX1010-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %cmp = fcmp one float %arg, 0x4005BF0A00000000
+  %sel = select i1 %cmp, float %other, float 0x4005BF0A00000000
+  ret float %sel
+}
+
+; Should be folded: fcmp one + select with constant in false value (commutative)
+define float @fcmp_select_fold_one_imm_f32(float %arg, float %other) {
+; GFX900-LABEL: fcmp_select_fold_one_imm_f32:
+; GFX900:       ; %bb.0: ; %entry
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    s_mov_b32 s4, 0x402df850
+; GFX900-NEXT:    v_cmp_lg_f32_e32 vcc, s4, v0
+; GFX900-NEXT:    v_cndmask_b32_e32 v0, v0, v1, vcc
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1010-LABEL: fcmp_select_fold_one_imm_f32:
+; GFX1010:       ; %bb.0: ; %entry
+; GFX1010-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1010-NEXT:    v_cmp_lg_f32_e32 vcc_lo, 0x402df850, v0
+; GFX1010-NEXT:    v_cndmask_b32_e32 v0, v0, v1, vcc_lo
+; GFX1010-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %cmp = fcmp one float 0x4005BF0A00000000, %arg
+  %sel = select i1 %cmp, float %other, float 0x4005BF0A00000000
+  ret float %sel
+}
+
+; Should NOT be folded: different constants
+define float @fcmp_select_no_fold_f32_different_const(float %arg, float %other) {
+; GFX900-LABEL: fcmp_select_no_fold_f32_different_const:
+; GFX900:       ; %bb.0: ; %entry
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    s_mov_b32 s4, 0x42487ed8
+; GFX900-NEXT:    v_mov_b32_e32 v2, 0x46487ed8
+; GFX900-NEXT:    v_cmp_neq_f32_e32 vcc, s4, v0
+; GFX900-NEXT:    v_cndmask_b32_e32 v0, v2, v1, vcc
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1010-LABEL: fcmp_select_no_fold_f32_different_const:
+; GFX1010:       ; %bb.0: ; %entry
+; GFX1010-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1010-NEXT:    v_cmp_neq_f32_e32 vcc_lo, 0x42487ed8, v0
+; GFX1010-NEXT:    v_cndmask_b32_e32 v0, 0x46487ed8, v1, vcc_lo
+; GFX1010-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %cmp = fcmp oeq float %arg, 0x40490FDB00000000
+  %sel = select i1 %cmp, float 0x40C90FDB00000000, float %other
+  ret float %sel
+}
+
+; Should NOT be folded: fcmp oeq with constant in other position
+define float @fcmp_select_no_fold_f32_other_pos(float %arg, float %other) {
+; GFX900-LABEL: fcmp_select_no_fold_f32_other_pos:
+; GFX900:       ; %bb.0: ; %entry
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    s_mov_b32 s4, 0x42487ed8
+; GFX900-NEXT:    v_mov_b32_e32 v2, 0x42487ed8
+; GFX900-NEXT:    v_cmp_eq_f32_e32 vcc, s4, v0
+; GFX900-NEXT:    v_cndmask_b32_e32 v0, v2, v1, vcc
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1010-LABEL: fcmp_select_no_fold_f32_other_pos:
+; GFX1010:       ; %bb.0: ; %entry
+; GFX1010-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1010-NEXT:    v_cmp_eq_f32_e32 vcc_lo, 0x42487ed8, v0
+; GFX1010-NEXT:    v_cndmask_b32_e32 v0, 0x42487ed8, v1, vcc_lo
+; GFX1010-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %cmp = fcmp oeq float %arg, 0x40490FDB00000000
+  %sel = select i1 %cmp, float %other, float 0x40490FDB00000000
+  ret float %sel
+}
+
+; Should NOT be folded: unsupported comparison type
+define float @fcmp_select_no_fold_f32_unsupported_cmp(float %arg, float %other) {
+; GFX900-LABEL: fcmp_select_no_fold_f32_unsupported_cmp:
+; GFX900:       ; %bb.0: ; %entry
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    s_mov_b32 s4, 0x42487ed8
+; GFX900-NEXT:    v_mov_b32_e32 v2, 0x42487ed8
+; GFX900-NEXT:    v_cmp_gt_f32_e32 vcc, s4, v0
+; GFX900-NEXT:    v_cndmask_b32_e32 v0, v2, v1, vcc
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1010-LABEL: fcmp_select_no_fold_f32_unsupported_cmp:
+; GFX1010:       ; %bb.0: ; %entry
+; GFX1010-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1010-NEXT:    v_cmp_gt_f32_e32 vcc_lo, 0x42487ed8, v0
+; GFX1010-NEXT:    v_cndmask_b32_e32 v0, 0x42487ed8, v1, vcc_lo
+; GFX1010-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %cmp = fcmp olt float %arg, 0x40490FDB00000000
+  %sel = select i1 %cmp, float %other, float 0x40490FDB00000000
+  ret float %sel
+}
+
+; Should NOT be folded: imm can be encoded into cndmask
+define float @fcmp_select_no_fold_f32_enc_imm(float %arg, float %other) {
+; GFX900-LABEL: fcmp_select_no_fold_f32_enc_imm:
+; GFX900:       ; %bb.0: ; %entry
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    v_cmp_neq_f32_e32 vcc, 1.0, v0
+; GFX900-NEXT:    v_cndmask_b32_e32 v0, 1.0, v1, vcc
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1010-LABEL: fcmp_select_no_fold_f32_enc_imm:
+; GFX1010:       ; %bb.0: ; %entry
+; GFX1010-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1010-NEXT:    v_cmp_neq_f32_e32 vcc_lo, 1.0, v0
+; GFX1010-NEXT:    v_cndmask_b32_e32 v0, 1.0, v1, vcc_lo
+; GFX1010-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %cmp = fcmp oeq float %arg, 1.0
+  %sel = select i1 %cmp, float 1.0, float %other
+  ret float %sel
+}
+
+; Should NOT be folded: imm can be encoded into cndmask
+define float @fcmp_select_no_fold_f32_enc_imm_2(float %arg, float %other) {
+; GFX900-LABEL: fcmp_select_no_fold_f32_enc_imm_2:
+; GFX900:       ; %bb.0: ; %entry
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    v_cmp_lg_f32_e32 vcc, -4.0, v0
+; GFX900-NEXT:    v_cndmask_b32_e32 v0, -4.0, v1, vcc
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1010-LABEL: fcmp_select_no_fold_f32_enc_imm_2:
+; GFX1010:       ; %bb.0: ; %entry
+; GFX1010-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1010-NEXT:    v_cmp_lg_f32_e32 vcc_lo, -4.0, v0
+; GFX1010-NEXT:    v_cndmask_b32_e32 v0, -4.0, v1, vcc_lo
+; GFX1010-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %cmp = fcmp one float -4.0, %arg
+  %sel = select i1 %cmp, float %other, float -4.0
+  ret float %sel
+}
+
+; Should NOT be folded: fcmp oeq with zero constant
+define float @fcmp_select_no_fold_oeq_f32_zero(float %arg, float %other) {
+; GFX900-LABEL: fcmp_select_no_fold_oeq_f32_zero:
+; GFX900:       ; %bb.0: ; %entry
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    v_cmp_neq_f32_e32 vcc, 0, v0
+; GFX900-NEXT:    v_cndmask_b32_e32 v0, 0, v1, vcc
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1010-LABEL: fcmp_select_no_fold_oeq_f32_zero:
+; GFX1010:       ; %bb.0: ; %entry
+; GFX1010-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1010-NEXT:    v_cmp_neq_f32_e32 vcc_lo, 0, v0
+; GFX1010-NEXT:    v_cndmask_b32_e32 v0, 0, v1, vcc_lo
+; GFX1010-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %cmp = fcmp oeq float %arg, 0.0
+  %sel = select i1 %cmp, float 0.0, float %other
+  ret float %sel
+}
+
+; Should NOT be folded: fcmp one with negative zero constant
+define float @fcmp_select_no_fold_one_f32_negzero(float %arg, float %other) {
+; GFX900-LABEL: fcmp_select_no_fold_one_f32_negzero:
+; GFX900:       ; %bb.0: ; %entry
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    s_brev_b32 s4, 1
+; GFX900-NEXT:    v_bfrev_b32_e32 v2, 1
+; GFX900-NEXT:    v_cmp_lg_f32_e32 vcc, s4, v0
+; GFX900-NEXT:    v_cndmask_b32_e32 v0, v2, v1, vcc
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1010-LABEL: fcmp_select_no_fold_one_f32_negzero:
+; GFX1010:       ; %bb.0: ; %entry
+; GFX1010-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1010-NEXT:    v_cmp_lg_f32_e32 vcc_lo, 0x80000000, v0
+; GFX1010-NEXT:    v_cndmask_b32_e32 v0, 0x80000000, v1, vcc_lo
+; GFX1010-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %cmp = fcmp one float -0.0, %arg ; 0x8000000000000000
+  %sel = select i1 %cmp, float %other, float -0.0 ;0x8000000000000000
+  ret float %sel
+}
+
+; NaN values should bypass the optimization due to special IEEE 754 behavior
+; fcmp oeq with NaN always returns false, so select always chooses %other
+define float @fcmp_select_no_fold_oeq_f32_nan(float %arg, float %other) {
+; GFX900-LABEL: fcmp_select_no_fold_oeq_f32_nan:
+; GFX900:       ; %bb.0: ; %entry
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    v_mov_b32_e32 v0, v1
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1010-LABEL: fcmp_select_no_fold_oeq_f32_nan:
+; GFX1010:       ; %bb.0: ; %entry
+; GFX1010-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1010-NEXT:    v_mov_b32_e32 v0, v1
+; GFX1010-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %cmp = fcmp oeq float %arg, 0x7FF8000000000000
+  %sel = select i1 %cmp, float 0x7FF8000000000000, float %other
+  ret float %sel
+}
+
+; NaN values should bypass the optimization due to special IEEE 754 behavior
+; fcmp one with NaN always returns false, so select always chooses the NaN constant
+define float @fcmp_select_no_fold_one_f32_nan(float %arg, float %other) {
+; GFX900-LABEL: fcmp_select_no_fold_one_f32_nan:
+; GFX900:       ; %bb.0: ; %entry
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    v_mov_b32_e32 v0, 0x7fc00000
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1010-LABEL: fcmp_select_no_fold_one_f32_nan:
+; GFX1010:       ; %bb.0: ; %entry
+; GFX1010-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1010-NEXT:    v_mov_b32_e32 v0, 0x7fc00000
+; GFX1010-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %cmp = fcmp one float 0x7FF8000000000000, %arg
+  %sel = select i1 %cmp, float %other, float 0x7FF8000000000000
+  ret float %sel
+}
+
+; Should NOT be folded: fcmp one with positive infinity
+; Infinity values should bypass the optimization, generating unfolded code
+define float @fcmp_select_no_fold_posinf_oeq_f32(float %arg, float %other) {
+; GFX900-LABEL: fcmp_select_no_fold_posinf_oeq_f32:
+; GFX900:       ; %bb.0: ; %entry
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    s_mov_b32 s4, 0x7f800000
+; GFX900-NEXT:    v_mov_b32_e32 v2, 0x7f800000
+; GFX900-NEXT:    v_cmp_neq_f32_e32 vcc, s4, v0
+; GFX900-NEXT:    v_cndmask_b32_e32 v0, v2, v1, vcc
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1010-LABEL: fcmp_select_no_fold_posinf_oeq_f32:
+; GFX1010:       ; %bb.0: ; %entry
+; GFX1010-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1010-NEXT:    v_cmp_neq_f32_e32 vcc_lo, 0x7f800000, v0
+; GFX1010-NEXT:    v_cndmask_b32_e32 v0, 0x7f800000, v1, vcc_lo
+; GFX1010-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %cmp = fcmp oeq float %arg, 0x7FF0000000000000
+  %sel = select i1 %cmp, float 0x7FF0000000000000, float %other
+  ret float %sel
+}
+
+; Should NOT be folded: fcmp one with negative infinity
+; Infinity values should bypass the optimization, generating unfolded code
+define float @fcmp_select_no_fold_neginf_f32_one(float %arg, float %other) {
+; GFX900-LABEL: fcmp_select_no_fold_neginf_f32_one:
+; GFX900:       ; %bb.0: ; %entry
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    s_mov_b32 s4, 0xff800000
+; GFX900-NEXT:    v_mov_b32_e32 v2, 0xff800000
+; GFX900-NEXT:    v_cmp_lg_f32_e32 vcc, s4, v0
+; GFX900-NEXT:    v_cndmask_b32_e32 v0, v2, v1, vcc
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1010-LABEL: fcmp_select_no_fold_neginf_f32_one:
+; GFX1010:       ; %bb.0: ; %entry
+; GFX1010-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1010-NEXT:    v_cmp_lg_f32_e32 vcc_lo, 0xff800000, v0
+; GFX1010-NEXT:    v_cndmask_b32_e32 v0, 0xff800000, v1, vcc_lo
+; GFX1010-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %cmp = fcmp one float 0xFFF0000000000000, %arg
+  %sel = select i1 %cmp, float %other, float 0xFFF0000000000000
+  ret float %sel
+}
+
+;------------------------------------------------------------------------------
+; F64 Tests
+;------------------------------------------------------------------------------
+
+; Should be folded: f64 fcmp oeq + select with constant in true value
+define double @fcmp_select_fold_oeq_f64_imm(double %arg, double %other) {
+; GFX900-LABEL: fcmp_select_fold_oeq_f64_imm:
+; GFX900:       ; %bb.0: ; %entry
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    s_mov_b32 s4, 0x54442d18
+; GFX900-NEXT:    s_mov_b32 s5, 0x400921fb
+; GFX900-NEXT:    v_cmp_eq_f64_e32 vcc, s[4:5], v[0:1]
+; GFX900-NEXT:    v_cndmask_b32_e32 v0, v2, v0, vcc
+; GFX900-NEXT:    v_cndmask_b32_e32 v1, v3, v1, vcc
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX1010-LABEL: fcmp_select_fold_oeq_f64_imm:
+; GFX1010:       ; %bb.0: ; %entry
+; GFX1010-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX1010-NEXT:    s_mov_b32 s4, 0x54442d18
+; GFX1010-NEXT:    s_mov_b32 s5, 0x400921fb
+; GFX1010-NEXT:    v_cmp_eq_f64_e32 vcc_lo, s[4:5], v[0:1]
+; GFX1010-NEXT:    v_cndmask_b32_e32 v0, v2, v0, vcc_lo
+; GFX1010-NEXT:    v_cndmask_b32_e32 v1, v3, v1, vcc_lo
+; GFX1010-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %cmp = fcmp oeq double %arg, 3.141592653589793
+  %sel = select i1 %cmp, double 3.141592653589793, double %other
+  ret double %sel
+}
+; Should be folded: f64 fcmp oeq + select with constant in true value (commutative)
+define double @fcmp_select_fold_oeq_imm_f64(double %arg, double %other) {
+; GFX900-LABEL: fcmp_select_fold_oeq_imm_f64:
+; GFX900:       ; %bb.0: ; %entry
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    s_mov_b32 s4, 0x54442d18
+; GFX900-NEXT:    s_mov_b32 s5, 0x400921fb
+; GFX900-NEXT:    v_cmp_eq_f64_e32 vcc, s[...
[truncated]

Copilot

Pull Request Overview

This PR implements an optimization for the AMDGPU backend that reduces register pressure by reusing registers containing constants from compare instructions instead of materializing the same constant again in conditional select operations. The optimization specifically targets patterns like X == Const ? X : Y -> X == Const ? Const : Y by detecting when the comparison and select share the same constant value.

Key changes:

Added foldCmpSelectWithSharedConstant function to detect and optimize CMP+SELECT patterns with shared constants
Supports both integer and floating-point types with appropriate handling for special values (NaN, infinity, zero)
Avoids optimizing constants that can be efficiently encoded as inline immediates
Comprehensive test coverage for various data types (i8, i16, i32, i64, f16, f32, f64, bf16)

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File	Description
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp	Implements the core optimization logic in `foldCmpSelectWithSharedConstant` and integrates it into `performSelectCombine`
llvm/test/CodeGen/AMDGPU/select-cmp-shared-constant-int.ll	Test cases for integer types showing folded vs non-folded scenarios
llvm/test/CodeGen/AMDGPU/select-cmp-shared-constant-fp.ll	Test cases for floating-point types including special value handling

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

Copilot · 2025-07-14T22:31:05Z

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

+  if ((isFloatingPoint && isa<ConstantFPSDNode>(RHS)) ||
+      (isInteger && isa<ConstantSDNode>(RHS))) {
+    ConstVal = RHS;
+    ArgVal = LHS;
+  } else if ((isFloatingPoint && isa<ConstantFPSDNode>(LHS)) ||
+             (isInteger && isa<ConstantSDNode>(LHS))) {
+    ConstVal = LHS;
+    ArgVal = RHS;
+  } else {
+    return SDValue();
+  }


[nitpick] The nested conditional logic for determining ConstVal and ArgVal could be simplified by extracting helper functions or using a more structured approach to reduce code duplication.

Suggested change

if ((isFloatingPoint && isa<ConstantFPSDNode>(RHS)) ||

(isInteger && isa<ConstantSDNode>(RHS))) {

ConstVal = RHS;

ArgVal = LHS;

} else if ((isFloatingPoint && isa<ConstantFPSDNode>(LHS)) ||

(isInteger && isa<ConstantSDNode>(LHS))) {

ConstVal = LHS;

ArgVal = RHS;

} else {

return SDValue();

}

std::tie(ConstVal, ArgVal) = getConstAndArgValues(LHS, RHS, isFloatingPoint, isInteger);

if (!ConstVal || !ArgVal)

return SDValue();

Copilot · 2025-07-14T22:31:06Z

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

+  // Check if constant should not be optimized - early return if not.
+  if (isFloatingPoint) {
+    const APFloat &Val = cast<ConstantFPSDNode>(ConstVal)->getValueAPF();
+    const GCNSubtarget *GCNST = static_cast<const GCNSubtarget *>(ST);


Consider using dynamic_cast instead of static_cast for safer type conversion, or add an assertion to verify the cast is valid.

Suggested change

const GCNSubtarget *GCNST = static_cast<const GCNSubtarget *>(ST);

const GCNSubtarget *GCNST = dynamic_cast<const GCNSubtarget *>(ST);

if (!GCNST) {

// Return early if the cast is invalid

return SDValue();

}

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

Co-authored-by: Copilot <[email protected]>

rampitec

LGTM

llvm-ci · 2025-07-16T21:33:22Z

LLVM Buildbot has detected a new failure on builder fuchsia-x86_64-linux running on fuchsia-debian-64-us-central1-a-1 while building llvm at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/11/builds/19718

Here is the relevant piece of the build log for the reference

Step 4 (annotate) failure: 'python ../llvm-zorg/zorg/buildbot/builders/annotated/fuchsia-linux.py ...' (failure)
...
[192/2517] Copying CXX header __numeric/pstl.h
[193/2517] Copying CXX header __numeric/transform_reduce.h
[194/2517] Copying CXX header __ostream/basic_ostream.h
[195/2517] Copying CXX header __ostream/print.h
[196/2517] Copying CXX header __ostream/put_character_sequence.h
[197/2517] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.isnanl.dir/isnanl.cpp.obj
[198/2517] Copying CXX header __pstl/backend.h
[199/2517] Copying CXX header __pstl/backend_fwd.h
[200/2517] Building CXX object libc/src/stdlib/baremetal/CMakeFiles/libc.src.stdlib.baremetal.abort.dir/abort.cpp.obj
[201/2517] Building CXX object libc/startup/baremetal/CMakeFiles/libc.startup.baremetal.init.dir/init.cpp.obj
FAILED: libc/startup/baremetal/CMakeFiles/libc.startup.baremetal.init.dir/init.cpp.obj 
/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-qdauij6a/./bin/clang++ --target=armv8m.main-none-eabi -DLIBC_NAMESPACE=__llvm_libc_22_0_0_git -I/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc -isystem /var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-qdauij6a/include/armv8m.main-unknown-none-eabi --target=armv8m.main-none-eabi -Wno-atomic-alignment "-Dvfprintf(stream, format, vlist)=vprintf(format, vlist)" "-Dfprintf(stream, format, ...)=printf(format)" -D_LIBCPP_PRINT=1 -mthumb -mfloat-abi=softfp -march=armv8m.main+fp+dsp -mcpu=cortex-m33 -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -ffunction-sections -fdata-sections -ffile-prefix-map=/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-qdauij6a/runtimes/runtimes-armv8m.main-none-eabi-bins=../../../../llvm-project -ffile-prefix-map=/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/= -no-canonical-prefixes -Os -DNDEBUG --target=armv8m.main-none-eabi -DLIBC_QSORT_IMPL=LIBC_QSORT_HEAP_SORT -DLIBC_TYPES_TIME_T_IS_32_BIT -DLIBC_ADD_NULL_CHECKS "-DLIBC_MATH=(LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES)" -DLIBC_ERRNO_MODE=LIBC_ERRNO_MODE_EXTERNAL -fpie -ffreestanding -DLIBC_FULL_BUILD -nostdlibinc -ffixed-point -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wdeprecated -Wno-c99-extensions -Wno-gnu-imaginary-constant -Wno-pedantic -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -DLIBC_COPT_PUBLIC_PACKAGING -MD -MT libc/startup/baremetal/CMakeFiles/libc.startup.baremetal.init.dir/init.cpp.obj -MF libc/startup/baremetal/CMakeFiles/libc.startup.baremetal.init.dir/init.cpp.obj.d -o libc/startup/baremetal/CMakeFiles/libc.startup.baremetal.init.dir/init.cpp.obj -c /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/startup/baremetal/init.cpp
/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/startup/baremetal/init.cpp:16:8: error: unknown type name 'uintptr_t'
   16 | extern uintptr_t __preinit_array_start[];
      |        ^
/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/startup/baremetal/init.cpp:17:8: error: unknown type name 'uintptr_t'
   17 | extern uintptr_t __preinit_array_end[];
      |        ^
/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/startup/baremetal/init.cpp:18:8: error: unknown type name 'uintptr_t'
   18 | extern uintptr_t __init_array_start[];
      |        ^
/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/startup/baremetal/init.cpp:19:8: error: unknown type name 'uintptr_t'
   19 | extern uintptr_t __init_array_end[];
      |        ^
4 errors generated.
[202/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.memrchr.dir/memrchr.cpp.obj
[203/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strcoll_l.dir/strcoll_l.cpp.obj
[204/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.memccpy.dir/memccpy.cpp.obj
[205/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strcmp.dir/strcmp.cpp.obj
[206/2517] Generating header errno.h from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/runtimes/../libc/include/errno.yaml
[207/2517] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.rand_util.dir/rand_util.cpp.obj
[208/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strncpy.dir/strncpy.cpp.obj
[209/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.memmem.dir/memmem.cpp.obj
[210/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strcoll.dir/strcoll.cpp.obj
[211/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strncmp.dir/strncmp.cpp.obj
[212/2517] Copying CXX header __atomic/atomic_init.h
[213/2517] Copying CXX header __atomic/atomic_flag.h
[214/2517] Building CXX object libc/startup/baremetal/CMakeFiles/libc.startup.baremetal.fini.dir/fini.cpp.obj
[215/2517] Copying CXX header __algorithm/ranges_copy_n.h
[216/2517] Copying CXX header __algorithm/remove_if.h
[217/2517] Generating header assert.h from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/runtimes/../libc/include/assert.yaml
[218/2517] Building CXX object libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.putchar.dir/putchar.cpp.obj
[219/2517] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.inv_trigf_utils.dir/inv_trigf_utils.cpp.obj
[220/2517] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.llabs.dir/llabs.cpp.obj
[221/2517] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.abs.dir/abs.cpp.obj
[222/2517] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.a64l.dir/a64l.cpp.obj
[223/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strspn.dir/strspn.cpp.obj
[224/2517] Building CXX object libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.puts.dir/puts.cpp.obj
[225/2517] Generating header features.h from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/runtimes/../libc/include/features.yaml
Step 6 (build) failure: build (failure)
...
[192/2517] Copying CXX header __numeric/pstl.h
[193/2517] Copying CXX header __numeric/transform_reduce.h
[194/2517] Copying CXX header __ostream/basic_ostream.h
[195/2517] Copying CXX header __ostream/print.h
[196/2517] Copying CXX header __ostream/put_character_sequence.h
[197/2517] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.isnanl.dir/isnanl.cpp.obj
[198/2517] Copying CXX header __pstl/backend.h
[199/2517] Copying CXX header __pstl/backend_fwd.h
[200/2517] Building CXX object libc/src/stdlib/baremetal/CMakeFiles/libc.src.stdlib.baremetal.abort.dir/abort.cpp.obj
[201/2517] Building CXX object libc/startup/baremetal/CMakeFiles/libc.startup.baremetal.init.dir/init.cpp.obj
FAILED: libc/startup/baremetal/CMakeFiles/libc.startup.baremetal.init.dir/init.cpp.obj 
/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-qdauij6a/./bin/clang++ --target=armv8m.main-none-eabi -DLIBC_NAMESPACE=__llvm_libc_22_0_0_git -I/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc -isystem /var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-qdauij6a/include/armv8m.main-unknown-none-eabi --target=armv8m.main-none-eabi -Wno-atomic-alignment "-Dvfprintf(stream, format, vlist)=vprintf(format, vlist)" "-Dfprintf(stream, format, ...)=printf(format)" -D_LIBCPP_PRINT=1 -mthumb -mfloat-abi=softfp -march=armv8m.main+fp+dsp -mcpu=cortex-m33 -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -ffunction-sections -fdata-sections -ffile-prefix-map=/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-qdauij6a/runtimes/runtimes-armv8m.main-none-eabi-bins=../../../../llvm-project -ffile-prefix-map=/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/= -no-canonical-prefixes -Os -DNDEBUG --target=armv8m.main-none-eabi -DLIBC_QSORT_IMPL=LIBC_QSORT_HEAP_SORT -DLIBC_TYPES_TIME_T_IS_32_BIT -DLIBC_ADD_NULL_CHECKS "-DLIBC_MATH=(LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES)" -DLIBC_ERRNO_MODE=LIBC_ERRNO_MODE_EXTERNAL -fpie -ffreestanding -DLIBC_FULL_BUILD -nostdlibinc -ffixed-point -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wdeprecated -Wno-c99-extensions -Wno-gnu-imaginary-constant -Wno-pedantic -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -DLIBC_COPT_PUBLIC_PACKAGING -MD -MT libc/startup/baremetal/CMakeFiles/libc.startup.baremetal.init.dir/init.cpp.obj -MF libc/startup/baremetal/CMakeFiles/libc.startup.baremetal.init.dir/init.cpp.obj.d -o libc/startup/baremetal/CMakeFiles/libc.startup.baremetal.init.dir/init.cpp.obj -c /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/startup/baremetal/init.cpp
/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/startup/baremetal/init.cpp:16:8: error: unknown type name 'uintptr_t'
   16 | extern uintptr_t __preinit_array_start[];
      |        ^
/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/startup/baremetal/init.cpp:17:8: error: unknown type name 'uintptr_t'
   17 | extern uintptr_t __preinit_array_end[];
      |        ^
/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/startup/baremetal/init.cpp:18:8: error: unknown type name 'uintptr_t'
   18 | extern uintptr_t __init_array_start[];
      |        ^
/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/startup/baremetal/init.cpp:19:8: error: unknown type name 'uintptr_t'
   19 | extern uintptr_t __init_array_end[];
      |        ^
4 errors generated.
[202/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.memrchr.dir/memrchr.cpp.obj
[203/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strcoll_l.dir/strcoll_l.cpp.obj
[204/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.memccpy.dir/memccpy.cpp.obj
[205/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strcmp.dir/strcmp.cpp.obj
[206/2517] Generating header errno.h from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/runtimes/../libc/include/errno.yaml
[207/2517] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.rand_util.dir/rand_util.cpp.obj
[208/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strncpy.dir/strncpy.cpp.obj
[209/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.memmem.dir/memmem.cpp.obj
[210/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strcoll.dir/strcoll.cpp.obj
[211/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strncmp.dir/strncmp.cpp.obj
[212/2517] Copying CXX header __atomic/atomic_init.h
[213/2517] Copying CXX header __atomic/atomic_flag.h
[214/2517] Building CXX object libc/startup/baremetal/CMakeFiles/libc.startup.baremetal.fini.dir/fini.cpp.obj
[215/2517] Copying CXX header __algorithm/ranges_copy_n.h
[216/2517] Copying CXX header __algorithm/remove_if.h
[217/2517] Generating header assert.h from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/runtimes/../libc/include/assert.yaml
[218/2517] Building CXX object libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.putchar.dir/putchar.cpp.obj
[219/2517] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.inv_trigf_utils.dir/inv_trigf_utils.cpp.obj
[220/2517] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.llabs.dir/llabs.cpp.obj
[221/2517] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.abs.dir/abs.cpp.obj
[222/2517] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.a64l.dir/a64l.cpp.obj
[223/2517] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strspn.dir/strspn.cpp.obj
[224/2517] Building CXX object libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.puts.dir/puts.cpp.obj
[225/2517] Generating header features.h from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/runtimes/../libc/include/features.yaml

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

…g. (#150929) As requested in #148740.

dfukalov requested review from arsenm, Copilot, jayfoad and rampitec July 14, 2025 22:30

llvmbot added the backend:AMDGPU label Jul 14, 2025

Copilot AI reviewed Jul 14, 2025

View reviewed changes

Update llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

f8c73ee

Co-authored-by: Copilot <[email protected]>

rampitec approved these changes Jul 15, 2025

View reviewed changes

dfukalov merged commit b7f6abd into llvm:main Jul 16, 2025
9 checks passed

arsenm reviewed Jul 18, 2025

View reviewed changes

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp Show resolved Hide resolved

This was referenced Jul 23, 2025

test abhinavgaba/llvm-project#2

Closed

Add dataFence plugin interface abhinavgaba/llvm-project#3

Closed

dfukalov mentioned this pull request Jul 28, 2025

[NFC][AMDGPU] Move cmp+select arguments optimization to SIISelLowering. #150929

Merged

dfukalov added a commit that referenced this pull request Jul 28, 2025

[NFC][AMDGPU] Move cmp+select arguments optimization to SIISelLowerin…

e650c4b

…g. (#150929) As requested in #148740.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AMDGPU] Try to reuse register with the constant from compare in v_cndmask #148740

[AMDGPU] Try to reuse register with the constant from compare in v_cndmask #148740

Uh oh!

dfukalov commented Jul 14, 2025

Uh oh!

llvmbot commented Jul 14, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI Jul 14, 2025

Uh oh!

Copilot AI Jul 14, 2025

Uh oh!

Uh oh!

rampitec left a comment

Uh oh!

Uh oh!

llvm-ci commented Jul 16, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

-    const GCNSubtarget *GCNST = static_cast<const GCNSubtarget *>(ST);
+    const GCNSubtarget *GCNST = dynamic_cast<const GCNSubtarget *>(ST);
+    if (!GCNST) {
+        // Return early if the cast is invalid
+        return SDValue();
+    }

[AMDGPU] Try to reuse register with the constant from compare in v_cndmask #148740

[AMDGPU] Try to reuse register with the constant from compare in v_cndmask #148740

Uh oh!

Conversation

dfukalov commented Jul 14, 2025

Uh oh!

llvmbot commented Jul 14, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Copilot AI Jul 14, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jul 14, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

rampitec left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvm-ci commented Jul 16, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants