InstCombine: Fold out nanless canonicalize pattern by arsenm · Pull Request #172998 · llvm/llvm-project

arsenm · 2025-12-19T12:26:44Z

Pattern match a wrapper around llvm.canonicalize which
weakens the semantics to not require quieting signaling
nans. Depending on the denormal mode and FP type, we can
either drop the pattern entirely or reduce it only to
a canonicalize call. I'm inventing this pattern to deal
with LLVM's lax canonicalization model in math library
code.

The math library code currently has explicit checks for
the denormal mode, and conditionally canonicalizes the
result if there is flushing. Semantically, this could be
directly replaced with a simple call to llvm.canonicalize,
but doing so would incur an additional cost when using
standard IEEE behavior. If we do not care about quieting
a signaling nan, this should be a no-op unless the denormal
mode may flush. This will allow replacement of the
conditional code with a zero cost abstraction utility
function.
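
As a rough illustration of the utility function described above, here is a minimal C++ sketch that models the behavior in software. The names `DenormMode`, `canonicalizeNonNan`, `explicitModeCheck`, `nanlessCanonicalize`, `bitsOf` and `smallestDenormal` are hypothetical and are not part of LLVM or any math library. Flushing is emulated by hand rather than by changing the hardware mode, and nans simply pass through to stand in for "any nan is acceptable".

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical stand-in for the function's denormal mode; not an LLVM type.
enum class DenormMode { IEEE, FlushToZero };

// Bit pattern of a float, so results can be compared exactly.
inline std::uint32_t bitsOf(float V) {
  std::uint32_t B;
  std::memcpy(&B, &V, sizeof B);
  return B;
}

inline float smallestDenormal() {
  return std::numeric_limits<float>::denorm_min();
}

// Software model of canonicalizing a non-nan value: only denormals change,
// and only when the mode flushes them (to a zero with the same sign).
inline float canonicalizeNonNan(float X, DenormMode Mode) {
  assert(!std::isnan(X) && "nan inputs are handled by the callers");
  if (Mode == DenormMode::FlushToZero && std::fpclassify(X) == FP_SUBNORMAL)
    return std::copysign(0.0f, X);
  return X;
}

// Shape of the current code: branch on the denormal mode explicitly.
inline float explicitModeCheck(float X, DenormMode Mode) {
  if (Mode == DenormMode::IEEE || std::isnan(X))
    return X; // nothing to flush
  return canonicalizeNonNan(X, Mode);
}

// Shape of the proposed wrapper: no mode check in the source; ordered
// values are canonicalized and nans are returned as-is.
inline float nanlessCanonicalize(float X, DenormMode Mode) {
  return std::isnan(X) ? X : canonicalizeNonNan(X, Mode);
}
```

With `DenormMode::IEEE` the wrapper returns its input bit for bit, which is the property that lets the combine drop the pattern entirely.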

Note we need a standard LLVM floating-point operation
in the nan case to assert that we do not care about preserving
the nan payload and sign bit. This could be any no-op fp
instruction; the natural choice would be fmul by 1.0. Using
that presents an ordering problem: since LLVM fp operations
are not required to canonicalize, instcombine would fold
out the fmul before reaching this select combine. The galaxy
brain solution here is to use fdiv 1.0, %x in place of the no-op.

The fdiv is not actually a no-op - it could potentially return
infinity if %x were 0 (or very close to 0), so it will not be dropped.
For the purposes here, that's fine since it's only being used
as a nan sink.

https://alive2.llvm.org/ce/z/QYS4en
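
To make the "nan sink" point concrete, the following is a small C++ sketch of the reciprocal arm, assuming IEEE-754 binary32 `float` and the default rounding mode. The function names `reciprocal`, `selectPattern` and `reciprocalExamples` are made up for illustration, and the canonicalize arm is modelled as the identity (that is, IEEE denormal handling).

```cpp
#include <cassert>
#include <cmath>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559,
              "the sketch assumes IEEE-754 binary32 floats");

// The "soft canonical" arm of the pattern: fdiv 1.0, %x.
// volatile keeps the compiler from folding the division at compile time.
inline float reciprocal(float X) {
  volatile float V = X;
  return 1.0f / V;
}

// The whole select: ordered inputs take the canonicalize arm (modelled as
// the identity here), nans take the reciprocal arm.
inline float selectPattern(float X) {
  return std::isnan(X) ? reciprocal(X) : X;
}

inline void reciprocalExamples() {
  const float Tiny = std::numeric_limits<float>::denorm_min();
  const float Nan = std::numeric_limits<float>::quiet_NaN();
  assert(std::isinf(reciprocal(Tiny))); // not value preserving in general
  assert(std::isnan(reciprocal(Nan)));  // but a nan input stays a nan
  assert(selectPattern(Tiny) == Tiny);  // ordered inputs never see the fdiv
}
```

Because the select only takes the reciprocal arm when the input is a nan, the overflow to infinity for tiny inputs is never observable through the pattern.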

llvmbot · 2025-12-19T12:28:29Z

@llvm/pr-subscribers-llvm-transforms

Author: Matt Arsenault (arsenm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/172998.diff

2 Files Affected:

(modified) llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp (+65)
(modified) llvm/test/Transforms/InstCombine/nanless-canonicalize-combine.ll (+20-81)

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
index f52bac5e600cb..33170e12ed629 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
@@ -4343,6 +4343,71 @@ Instruction *InstCombinerImpl::visitSelectInst(SelectInst &SI) {
           matchFMulByZeroIfResultEqZero(*this, Cmp0, Cmp1, MatchCmp1, MatchCmp0,
                                         SI, SIFPOp->hasNoSignedZeros()))
         return replaceInstUsesWith(SI, Cmp0);
+
+      if (Pred == CmpInst::FCMP_ORD || Pred == CmpInst::FCMP_UNO) {
+        // Fold out only-canonicalize-non-nans pattern. This implements a
+        // wrapper around llvm.canonicalize which is not required to quiet
+        // signaling nans or preserve nan payload bits.
+        //
+        //   %hard.canonical = call @llvm.canonicalize(%x)
+        //   %soft.canonical = fdiv 1.0, %x
+        //   %ord = fcmp ord %x, 0.0
+        //   %x.canon = select i1 %ord, %hard.canonical, %soft.canonical
+        //
+        // With known IEEE handling:
+        //   => %x
+        //
+        // With other denormal behaviors or exotic types:
+        //   => llvm.canonicalize(%x)
+        //
+        // Note the fdiv could be any value preserving, potentially
+        // canonicalizing floating-point operation such as fmul by 1.0. However,
+        // since in the llvm model canonicalization is not mandatory, the fmul
+        // would have been dropped by the time we reached here. The trick here
+        // is to use a reciprocal fdiv. It's not a droppable no-op, as it could
+        // return an infinity if %x were sufficiently small, but in this pattern
+        // we're only using the output for nan values.
+
+        if (Pred == CmpInst::FCMP_ORD) {
+          MatchCmp0 = TrueVal;
+          MatchCmp1 = FalseVal;
+        } else {
+          MatchCmp0 = FalseVal;
+          MatchCmp1 = TrueVal;
+        }
+
+        if (match(MatchCmp0, m_FCanonicalize(m_Specific(Cmp0))) &&
+            match(Cmp1, m_PosZeroFP())) {
+          const fltSemantics &FPSem =
+              SelType->getScalarType()->getFltSemantics();
+          if (APFloat::isIEEELikeFP(FPSem)) {
+            // IEEE handling does not have non-canonical values, so the
+            // canonicalize can be dropped for direct replacement without
+            // looking for the intermediate maybe-canonicalizing operation.
+            if (Cmp0 == MatchCmp1 && SI.getFunction()->getDenormalMode(FPSem) ==
+                                         DenormalMode::getIEEE())
+              return replaceInstUsesWith(SI, Cmp0);
+
+            // If denormals may be flushed, we need to retain the canonicalize
+            // call. This introduces a canonicalization on the nan path, which
+            // we are not free to do as that could change the sign bit or
+            // payload bits. We can only do this if there were a no-op like
+            // floating-point instruction which may have changed the nan bits
+            // anyway.
+            if (match(MatchCmp1, m_FDiv(m_FPOne(), m_Specific(Cmp0)))) {
+              DenormalMode Mode = SI.getFunction()->getDenormalMode(FPSem);
+              if (Mode == DenormalMode::getIEEE())
+                return replaceInstUsesWith(SI, Cmp0);
+
+              if (Mode.inputsAreZero() || Mode.outputsAreZero())
+                return replaceInstUsesWith(SI, MatchCmp0);
+            }
+
+            // Leave the dynamic mode case alone. This would introduce new
+            // constraints if the mode may be refined later.
+          }
+        }
+      }
     }
   }
 
diff --git a/llvm/test/Transforms/InstCombine/nanless-canonicalize-combine.ll b/llvm/test/Transforms/InstCombine/nanless-canonicalize-combine.ll
index 5aa156753e860..0937f6ff19c2b 100644
--- a/llvm/test/Transforms/InstCombine/nanless-canonicalize-combine.ll
+++ b/llvm/test/Transforms/InstCombine/nanless-canonicalize-combine.ll
@@ -9,11 +9,7 @@
 define float @canonicalize_ieee_0(float %x) #0 {
 ; CHECK-LABEL: define float @canonicalize_ieee_0(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0:[0-9]+]] {
-; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT:    [[SOFT_CANONICAL:%.*]] = fdiv float 1.000000e+00, [[X]]
-; CHECK-NEXT:    [[ORD:%.*]] = fcmp ord float [[X]], 0.000000e+00
-; CHECK-NEXT:    [[X_CANON:%.*]] = select i1 [[ORD]], float [[HARD_CANONICAL]], float [[SOFT_CANONICAL]]
-; CHECK-NEXT:    ret float [[X_CANON]]
+; CHECK-NEXT:    ret float [[X]]
 ;
   %hard.canonical = call float @llvm.canonicalize.f32(float %x)
   %soft.canonical = fdiv float 1.0, %x
@@ -26,11 +22,7 @@ define float @canonicalize_ieee_0(float %x) #0 {
 define float @canonicalize_ieee_1(float %x) #0 {
 ; CHECK-LABEL: define float @canonicalize_ieee_1(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT:    [[SOFT_CANONICAL:%.*]] = fdiv float 1.000000e+00, [[X]]
-; CHECK-NEXT:    [[UNO:%.*]] = fcmp uno float [[X]], 0.000000e+00
-; CHECK-NEXT:    [[X_CANON:%.*]] = select i1 [[UNO]], float [[SOFT_CANONICAL]], float [[HARD_CANONICAL]]
-; CHECK-NEXT:    ret float [[X_CANON]]
+; CHECK-NEXT:    ret float [[X]]
 ;
   %hard.canonical = call float @llvm.canonicalize.f32(float %x)
   %soft.canonical = fdiv float 1.0, %x
@@ -44,10 +36,7 @@ define float @canonicalize_ieee_1(float %x) #0 {
 define float @canonicalize_ieee_0_fmul(float %x) #0 {
 ; CHECK-LABEL: define float @canonicalize_ieee_0_fmul(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT:    [[ORD:%.*]] = fcmp ord float [[X]], 0.000000e+00
-; CHECK-NEXT:    [[X_CANON:%.*]] = select i1 [[ORD]], float [[HARD_CANONICAL]], float [[X]]
-; CHECK-NEXT:    ret float [[X_CANON]]
+; CHECK-NEXT:    ret float [[X]]
 ;
   %hard.canonical = call float @llvm.canonicalize.f32(float %x)
   %soft.canonical = fmul float %x, 1.0
@@ -61,10 +50,7 @@ define float @canonicalize_ieee_0_fmul(float %x) #0 {
 define float @canonicalize_ieee_0_fdiv_commute(float %x) #0 {
 ; CHECK-LABEL: define float @canonicalize_ieee_0_fdiv_commute(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT:    [[ORD:%.*]] = fcmp ord float [[X]], 0.000000e+00
-; CHECK-NEXT:    [[X_CANON:%.*]] = select i1 [[ORD]], float [[HARD_CANONICAL]], float [[X]]
-; CHECK-NEXT:    ret float [[X_CANON]]
+; CHECK-NEXT:    ret float [[X]]
 ;
   %hard.canonical = call float @llvm.canonicalize.f32(float %x)
   %soft.canonical = fdiv float %x, 1.0
@@ -79,10 +65,7 @@ define float @canonicalize_daz_0(float %x) #1 {
 ; CHECK-LABEL: define float @canonicalize_daz_0(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR1:[0-9]+]] {
 ; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT:    [[SOFT_CANONICAL:%.*]] = fdiv float 1.000000e+00, [[X]]
-; CHECK-NEXT:    [[ORD:%.*]] = fcmp ord float [[X]], 0.000000e+00
-; CHECK-NEXT:    [[X_CANON:%.*]] = select i1 [[ORD]], float [[HARD_CANONICAL]], float [[SOFT_CANONICAL]]
-; CHECK-NEXT:    ret float [[X_CANON]]
+; CHECK-NEXT:    ret float [[HARD_CANONICAL]]
 ;
   %hard.canonical = call float @llvm.canonicalize.f32(float %x)
   %soft.canonical = fdiv float 1.0, %x
@@ -96,11 +79,8 @@ define float @canonicalize_daz_0(float %x) #1 {
 define float @canonicalize_daz_1(float %x) #1 {
 ; CHECK-LABEL: define float @canonicalize_daz_1(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT:    [[SOFT_CANONICAL:%.*]] = fdiv float 1.000000e+00, [[X]]
-; CHECK-NEXT:    [[UNO:%.*]] = fcmp uno float [[X]], 0.000000e+00
-; CHECK-NEXT:    [[X_CANON:%.*]] = select i1 [[UNO]], float [[SOFT_CANONICAL]], float [[HARD_CANONICAL]]
-; CHECK-NEXT:    ret float [[X_CANON]]
+; CHECK-NEXT:    [[SOFT_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
+; CHECK-NEXT:    ret float [[SOFT_CANONICAL]]
 ;
   %hard.canonical = call float @llvm.canonicalize.f32(float %x)
   %soft.canonical = fdiv float 1.0, %x
@@ -146,11 +126,7 @@ define float @canonicalize_dynamic_1(float %x) #2 {
 define <2 x float> @canonicalize_ieee_0_vec(<2 x float> %x) #0 {
 ; CHECK-LABEL: define <2 x float> @canonicalize_ieee_0_vec(
 ; CHECK-SAME: <2 x float> [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> [[X]])
-; CHECK-NEXT:    [[SOFT_CANONICAL:%.*]] = fdiv <2 x float> splat (float 1.000000e+00), [[X]]
-; CHECK-NEXT:    [[ORD:%.*]] = fcmp ord <2 x float> [[X]], zeroinitializer
-; CHECK-NEXT:    [[X_CANON:%.*]] = select <2 x i1> [[ORD]], <2 x float> [[HARD_CANONICAL]], <2 x float> [[SOFT_CANONICAL]]
-; CHECK-NEXT:    ret <2 x float> [[X_CANON]]
+; CHECK-NEXT:    ret <2 x float> [[X]]
 ;
   %hard.canonical = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> %x)
   %soft.canonical = fdiv <2 x float> splat (float 1.0), %x
@@ -162,11 +138,7 @@ define <2 x float> @canonicalize_ieee_0_vec(<2 x float> %x) #0 {
 define <2 x float> @canonicalize_ieee_1_vec(<2 x float> %x) #0 {
 ; CHECK-LABEL: define <2 x float> @canonicalize_ieee_1_vec(
 ; CHECK-SAME: <2 x float> [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> [[X]])
-; CHECK-NEXT:    [[SOFT_CANONICAL:%.*]] = fdiv <2 x float> splat (float 1.000000e+00), [[X]]
-; CHECK-NEXT:    [[UNO:%.*]] = fcmp uno <2 x float> [[X]], zeroinitializer
-; CHECK-NEXT:    [[X_CANON:%.*]] = select <2 x i1> [[UNO]], <2 x float> [[SOFT_CANONICAL]], <2 x float> [[HARD_CANONICAL]]
-; CHECK-NEXT:    ret <2 x float> [[X_CANON]]
+; CHECK-NEXT:    ret <2 x float> [[X]]
 ;
   %hard.canonical = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> %x)
   %soft.canonical = fdiv <2 x float> splat (float 1.0), %x
@@ -195,10 +167,7 @@ define <2 x float> @canonicalize_daz_0_vec(<2 x float> %x) #1 {
 ; CHECK-LABEL: define <2 x float> @canonicalize_daz_0_vec(
 ; CHECK-SAME: <2 x float> [[X:%.*]]) #[[ATTR1]] {
 ; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> [[X]])
-; CHECK-NEXT:    [[SOFT_CANONICAL:%.*]] = fdiv <2 x float> splat (float 1.000000e+00), [[X]]
-; CHECK-NEXT:    [[ORD:%.*]] = fcmp ord <2 x float> [[X]], zeroinitializer
-; CHECK-NEXT:    [[X_CANON:%.*]] = select <2 x i1> [[ORD]], <2 x float> [[HARD_CANONICAL]], <2 x float> [[SOFT_CANONICAL]]
-; CHECK-NEXT:    ret <2 x float> [[X_CANON]]
+; CHECK-NEXT:    ret <2 x float> [[HARD_CANONICAL]]
 ;
   %hard.canonical = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> %x)
   %soft.canonical = fdiv <2 x float> splat (float 1.0), %x
@@ -210,11 +179,8 @@ define <2 x float> @canonicalize_daz_0_vec(<2 x float> %x) #1 {
 define <2 x float> @canonicalize_daz_1_vec(<2 x float> %x) #1 {
 ; CHECK-LABEL: define <2 x float> @canonicalize_daz_1_vec(
 ; CHECK-SAME: <2 x float> [[X:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> [[X]])
-; CHECK-NEXT:    [[SOFT_CANONICAL:%.*]] = fdiv <2 x float> splat (float 1.000000e+00), [[X]]
-; CHECK-NEXT:    [[UNO:%.*]] = fcmp uno <2 x float> [[X]], zeroinitializer
-; CHECK-NEXT:    [[X_CANON:%.*]] = select <2 x i1> [[UNO]], <2 x float> [[SOFT_CANONICAL]], <2 x float> [[HARD_CANONICAL]]
-; CHECK-NEXT:    ret <2 x float> [[X_CANON]]
+; CHECK-NEXT:    [[SOFT_CANONICAL:%.*]] = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> [[X]])
+; CHECK-NEXT:    ret <2 x float> [[SOFT_CANONICAL]]
 ;
   %hard.canonical = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> %x)
   %soft.canonical = fdiv <2 x float> splat (float 1.0), %x
@@ -226,11 +192,7 @@ define <2 x float> @canonicalize_daz_1_vec(<2 x float> %x) #1 {
 define bfloat @canonicalize_ieee_bf16(bfloat %x) #0 {
 ; CHECK-LABEL: define bfloat @canonicalize_ieee_bf16(
 ; CHECK-SAME: bfloat [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call bfloat @llvm.canonicalize.bf16(bfloat [[X]])
-; CHECK-NEXT:    [[SOFT_CANONICAL:%.*]] = fdiv bfloat 0xR3F80, [[X]]
-; CHECK-NEXT:    [[ORD:%.*]] = fcmp ord bfloat [[X]], 0xR0000
-; CHECK-NEXT:    [[X_CANON:%.*]] = select i1 [[ORD]], bfloat [[HARD_CANONICAL]], bfloat [[SOFT_CANONICAL]]
-; CHECK-NEXT:    ret bfloat [[X_CANON]]
+; CHECK-NEXT:    ret bfloat [[X]]
 ;
   %hard.canonical = call bfloat @llvm.canonicalize.bf16(bfloat %x)
   %soft.canonical = fdiv bfloat 1.0, %x
@@ -242,11 +204,7 @@ define bfloat @canonicalize_ieee_bf16(bfloat %x) #0 {
 define half @canonicalize_ieee_f16(half %x) #0 {
 ; CHECK-LABEL: define half @canonicalize_ieee_f16(
 ; CHECK-SAME: half [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call half @llvm.canonicalize.f16(half [[X]])
-; CHECK-NEXT:    [[SOFT_CANONICAL:%.*]] = fdiv half 0xH3C00, [[X]]
-; CHECK-NEXT:    [[ORD:%.*]] = fcmp ord half [[X]], 0xH0000
-; CHECK-NEXT:    [[X_CANON:%.*]] = select i1 [[ORD]], half [[HARD_CANONICAL]], half [[SOFT_CANONICAL]]
-; CHECK-NEXT:    ret half [[X_CANON]]
+; CHECK-NEXT:    ret half [[X]]
 ;
   %hard.canonical = call half @llvm.canonicalize.f16(half %x)
   %soft.canonical = fdiv half 1.0, %x
@@ -258,11 +216,7 @@ define half @canonicalize_ieee_f16(half %x) #0 {
 define double @canonicalize_ieee_f64(double %x) #0 {
 ; CHECK-LABEL: define double @canonicalize_ieee_f64(
 ; CHECK-SAME: double [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call double @llvm.canonicalize.f64(double [[X]])
-; CHECK-NEXT:    [[SOFT_CANONICAL:%.*]] = fdiv double 1.000000e+00, [[X]]
-; CHECK-NEXT:    [[ORD:%.*]] = fcmp ord double [[X]], 0.000000e+00
-; CHECK-NEXT:    [[X_CANON:%.*]] = select i1 [[ORD]], double [[HARD_CANONICAL]], double [[SOFT_CANONICAL]]
-; CHECK-NEXT:    ret double [[X_CANON]]
+; CHECK-NEXT:    ret double [[X]]
 ;
   %hard.canonical = call double @llvm.canonicalize.f64(double %x)
   %soft.canonical = fdiv double 1.0, %x
@@ -274,10 +228,7 @@ define double @canonicalize_ieee_f64(double %x) #0 {
 define fp128 @canonicalize_ieee_f128(fp128 %x) #0 {
 ; CHECK-LABEL: define fp128 @canonicalize_ieee_f128(
 ; CHECK-SAME: fp128 [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call fp128 @llvm.canonicalize.f128(fp128 [[X]])
-; CHECK-NEXT:    [[ORD:%.*]] = fcmp ord fp128 [[X]], 0xL00000000000000000000000000000000
-; CHECK-NEXT:    [[X_CANON:%.*]] = select i1 [[ORD]], fp128 [[HARD_CANONICAL]], fp128 [[X]]
-; CHECK-NEXT:    ret fp128 [[X_CANON]]
+; CHECK-NEXT:    ret fp128 [[X]]
 ;
   %hard.canonical = call fp128 @llvm.canonicalize.f128(fp128 %x)
   %ord = fcmp ord fp128 %x, 0xL00000000000000000000000000000000
@@ -503,10 +454,7 @@ define ppc_fp128 @ignore_ppc_fp128(ppc_fp128 %x) #0 {
 define float @canonicalize_ieee_0_missing_noop(float %x) #0 {
 ; CHECK-LABEL: define float @canonicalize_ieee_0_missing_noop(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT:    [[ORD:%.*]] = fcmp ord float [[X]], 0.000000e+00
-; CHECK-NEXT:    [[X_CANON:%.*]] = select i1 [[ORD]], float [[HARD_CANONICAL]], float [[X]]
-; CHECK-NEXT:    ret float [[X_CANON]]
+; CHECK-NEXT:    ret float [[X]]
 ;
   %hard.canonical = call float @llvm.canonicalize.f32(float %x)
   %ord = fcmp ord float %x, 0.0
@@ -519,10 +467,7 @@ define float @canonicalize_ieee_0_missing_noop(float %x) #0 {
 define float @canonicalize_ieee_1_missing_noop(float %x) #0 {
 ; CHECK-LABEL: define float @canonicalize_ieee_1_missing_noop(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT:    [[UNO:%.*]] = fcmp uno float [[X]], 0.000000e+00
-; CHECK-NEXT:    [[X_CANON:%.*]] = select i1 [[UNO]], float [[X]], float [[HARD_CANONICAL]]
-; CHECK-NEXT:    ret float [[X_CANON]]
+; CHECK-NEXT:    ret float [[X]]
 ;
   %hard.canonical = call float @llvm.canonicalize.f32(float %x)
   %uno = fcmp uno float %x, 0.0
@@ -567,10 +512,7 @@ define float @canonicalize_only_ftz(float %x) "denormal-fp-math"="preserve-sign,
 ; CHECK-LABEL: define float @canonicalize_only_ftz(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR3:[0-9]+]] {
 ; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT:    [[SOFT_CANONICAL:%.*]] = fdiv float 1.000000e+00, [[X]]
-; CHECK-NEXT:    [[ORD:%.*]] = fcmp ord float [[X]], 0.000000e+00
-; CHECK-NEXT:    [[X_CANON:%.*]] = select i1 [[ORD]], float [[HARD_CANONICAL]], float [[SOFT_CANONICAL]]
-; CHECK-NEXT:    ret float [[X_CANON]]
+; CHECK-NEXT:    ret float [[HARD_CANONICAL]]
 ;
   %hard.canonical = call float @llvm.canonicalize.f32(float %x)
   %soft.canonical = fdiv float 1.0, %x
@@ -584,10 +526,7 @@ define float @canonicalize_only_daz(float %x) "denormal-fp-math"="ieee,preserve-
 ; CHECK-LABEL: define float @canonicalize_only_daz(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR4:[0-9]+]] {
 ; CHECK-NEXT:    [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT:    [[SOFT_CANONICAL:%.*]] = fdiv float 1.000000e+00, [[X]]
-; CHECK-NEXT:    [[ORD:%.*]] = fcmp ord float [[X]], 0.000000e+00
-; CHECK-NEXT:    [[X_CANON:%.*]] = select i1 [[ORD]], float [[HARD_CANONICAL]], float [[SOFT_CANONICAL]]
-; CHECK-NEXT:    ret float [[X_CANON]]
+; CHECK-NEXT:    ret float [[HARD_CANONICAL]]
 ;
   %hard.canonical = call float @llvm.canonicalize.f32(float %x)
   %soft.canonical = fdiv float 1.0, %x
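
The decision logic in the InstCombineSelect.cpp hunk above can be summarized with a standalone C++ sketch. This is a simplified model, not LLVM code: `Denorm`, `Mode`, `NanArm`, `Result` and `foldNanlessCanonicalize` are hypothetical names, and the sketch assumes the preconditions of the patch already hold (the ordered arm is `llvm.canonicalize(%x)`, the compare is against +0.0, and the type is IEEE-like).

```cpp
#include <cassert>

// Hypothetical, simplified mirror of one half of a denormal mode.
enum class Denorm { IEEE, PreserveSign, PositiveZero, Dynamic };

struct Mode {
  Denorm Output;
  Denorm Input;

  static bool isZero(Denorm D) {
    return D == Denorm::PreserveSign || D == Denorm::PositiveZero;
  }
  bool isIEEE() const {
    return Output == Denorm::IEEE && Input == Denorm::IEEE;
  }
  // Models `Mode.inputsAreZero() || Mode.outputsAreZero()` from the patch.
  bool mayFlush() const { return isZero(Output) || isZero(Input); }
};

// What the select's nan arm looks like when the combine runs.
enum class NanArm { X, RcpOfX, Other };

// What the select is replaced with.
enum class Result { X, CanonicalizeX, NoFold };

inline Result foldNanlessCanonicalize(Mode M, NanArm Arm) {
  // The nan arm is already %x (e.g. an fmul by 1.0 was folded away earlier).
  if (Arm == NanArm::X && M.isIEEE())
    return Result::X;

  // The nan arm is fdiv 1.0, %x, so the nan bits were never guaranteed.
  if (Arm == NanArm::RcpOfX) {
    if (M.isIEEE())
      return Result::X;
    if (M.mayFlush())
      return Result::CanonicalizeX;
  }

  // Dynamic modes and unrecognized nan arms are left alone.
  return Result::NoFold;
}

inline void foldExamples() {
  const Mode IEEE{Denorm::IEEE, Denorm::IEEE};
  const Mode DAZ{Denorm::PreserveSign, Denorm::PreserveSign};
  const Mode Dyn{Denorm::Dynamic, Denorm::Dynamic};
  assert(foldNanlessCanonicalize(IEEE, NanArm::RcpOfX) == Result::X);
  assert(foldNanlessCanonicalize(DAZ, NanArm::RcpOfX) == Result::CanonicalizeX);
  assert(foldNanlessCanonicalize(Dyn, NanArm::RcpOfX) == Result::NoFold);
}
```

The three examples line up with the `canonicalize_ieee_0`, `canonicalize_daz_0` and `canonicalize_dynamic_1` style tests in the diff, under the assumption that attribute `#1` selects a flushing mode.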

dtcxzyw

I don't think your proof works: https://alive2.llvm.org/ce/z/T_dpgL
See https://github.com/AliveToolkit/alive2/blob/20a8472b77ba8a1f0b172059d422a55cb4dff120/ir/instr.cpp#L1168-L1170
It looks like alive2 only respects the standard IEEE fp semantics.

BTW, I'd like to see a two-stage fold: https://alive2.llvm.org/ce/z/DMKuW4

define half @src0(half %x, half %y) {
  %hard.canonical = call half @llvm.canonicalize.f16(half %x)
  %ord = fcmp ord half %x, 0.0
  %x.canon = select i1 %ord, half %hard.canonical, half %y
  ret half %x.canon
}

define half @tgt0(half %x, half %y) {
  %ord = fcmp ord half %x, 0.0
  %x.canon = select i1 %ord, half %x, half %y
  ret half %x.canon
}

define half @src1(half %x, half %y) {
  %cond = fcmp ord half %x, 0.0
  %rcp = fdiv half 1.0, %x
  %sel = select i1 %cond, half %y, half %rcp
  ret half %sel
}

define half @tgt1(half %x, half %y) {
  %cond = fcmp ord half %x, 0.0
  %sel = select i1 %cond, half %y, half %x
  ret half %sel
}

Both arms should be simplified independently.

arsenm · 2025-12-19T21:47:36Z

I don't think your proof works: https://alive2.llvm.org/ce/z/T_dpgL See https://github.com/AliveToolkit/alive2/blob/20a8472b77ba8a1f0b172059d422a55cb4dff120/ir/instr.cpp#L1168-L1170 Looks like alive2 only respects the standard IEEE fp semantics.

I know I saw this catching issues with constant folding of canonicalize before. But this is broken even with IEEE, because signaling nan quieting is still mandatory.
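
To illustrate why a bare `canonicalize(%x) => %x` rewrite is not valid even with IEEE denormal handling, here is a small bit-level C++ sketch. It assumes the common binary32 encoding in which the most significant mantissa bit set marks a quiet nan; some legacy targets use the opposite convention. The helper names are hypothetical, and the code works on raw bit patterns so that no hardware quieting interferes.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t ExpMask = 0x7f800000u;  // all exponent bits
constexpr std::uint32_t MantMask = 0x007fffffu; // all mantissa bits
constexpr std::uint32_t QuietBit = 0x00400000u; // top mantissa bit

constexpr bool isNan(std::uint32_t B) {
  return (B & ExpMask) == ExpMask && (B & MantMask) != 0;
}

constexpr bool isSignalingNan(std::uint32_t B) {
  return isNan(B) && (B & QuietBit) == 0;
}

// Models only the nan part of a strict canonicalize: signaling nans must be
// quieted, everything else is returned unchanged in this sketch.
constexpr std::uint32_t quietNan(std::uint32_t B) {
  return isNan(B) ? (B | QuietBit) : B;
}

inline void quietingExamples() {
  const std::uint32_t SNan = 0x7fa00000u;
  assert(isSignalingNan(SNan));
  assert(quietNan(SNan) != SNan);            // the identity would be wrong here
  assert(quietNan(0x3f800000u) == 0x3f800000u); // 1.0f is untouched
}
```

The select wrapper in this PR sidesteps the issue by routing nan inputs through a separate arm whose nan bits are not guaranteed.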

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp

arsenm · 2025-12-19T22:39:07Z

Both arms should be simplified independently.

I'm not sure that's possible for the second case. The intermediate fdiv cannot disappear while handling the canonicalize, the canonicalize is absorbing the effects from the fdiv

arsenm · 2025-12-19T22:47:40Z

Both arms should be simplified independently.

I'm not sure that's possible for the second case. The intermediate fdiv cannot disappear while handling the canonicalize, the canonicalize is absorbing the effects from the fdiv

By second case, I mean all of the DAZ cases

arsenm · 2025-12-20T13:07:00Z

https://alive2.llvm.org/ce/z/DMKuW4

The second case here has trouble when %x is undef (though the online version always times out)

arsenm · 2025-12-20T13:44:14Z

Switched to replacing the select operand in the cases where that is possible. The DAZ cases still need to treat this as a combined operation. The reciprocal fold also special-cases the full operation to avoid introducing an unnecessary freeze. I'm also not confident I'm handling the undef case correctly.

arsenm · 2026-02-04T14:27:25Z

ping

arsenm · 2026-02-23T18:38:51Z

ping

dtcxzyw · 2026-02-26T17:42:33Z

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp

+                // select (fcmp ord %cmp0, 0), y, (fdiv 1, x)
+                //   => select (fcmp ord %cmp0, 0), y, x
+                //
+                // select (fcmp uno %cmp0, 0), (fdiv 1, x), y
+                //   => select (fcmp uno %cmp0, 0), x, y


Suggested change

-                // select (fcmp ord %cmp0, 0), y, (fdiv 1, x)
-                //   => select (fcmp ord %cmp0, 0), y, x
-                //
-                // select (fcmp uno %cmp0, 0), (fdiv 1, x), y
-                //   => select (fcmp uno %cmp0, 0), x, y
+                // select (fcmp ord x, 0), y, (fdiv 1, x)
+                //   => select (fcmp ord x, 0), y, x
+                //
+                // select (fcmp uno x, 0), (fdiv 1, x), y
+                //   => select (fcmp uno x, 0), x, y

dtcxzyw · 2026-02-26T17:48:10Z

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp

+              // select (fcmp ord %cmp0, 0), canonicalize(x), y
+              //  => select (fcmp ord %cmp0, 0), x, y


Suggested change

-              // select (fcmp ord %cmp0, 0), canonicalize(x), y
-              //  => select (fcmp ord %cmp0, 0), x, y
+              // select (fcmp ord x, 0), canonicalize(x), y
+              //  => select (fcmp ord x, 0), x, y

dtcxzyw · 2026-02-26T17:52:30Z

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp

+          if (RcpIfNan) {
+            if (Mode == DenormalMode::getIEEE()) {


Suggested change

-          if (RcpIfNan) {
-            if (Mode == DenormalMode::getIEEE()) {
+          if (RcpIfNan && Mode == DenormalMode::getIEEE()) {

dtcxzyw · 2026-02-26T17:53:48Z

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp

+            // anyway.
+            if (RcpIfNan) {
+              if (Mode == DenormalMode::getIEEE())
+                return replaceInstUsesWith(SI, Cmp0);


Unreachable path.

arsenm mentioned this pull request Dec 19, 2025

InstCombine: Add baseline test for nanless canonicalize combine #172997

Open

arsenm added the floating-point and llvm:instcombine labels Dec 19, 2025 (with Graphite App)

arsenm requested review from andykaylor, dtcxzyw, guy-david, jayfoad and jcranmer-intel December 19, 2025 12:27

arsenm marked this pull request as ready for review December 19, 2025 12:27

arsenm requested a review from nikic as a code owner December 19, 2025 12:27

llvmbot added the llvm:transforms label Dec 19, 2025

dtcxzyw reviewed Dec 19, 2025

jcranmer-intel reviewed Dec 19, 2025

arsenm force-pushed the users/arsenm/instcombine/fold-out-nanless-canonicalize-pattern branch from eb788e6 to 6e74f3b Compare December 20, 2025 13:43

arsenm force-pushed the users/arsenm/instcombine/add-baseline-test-nanless-canonicalize branch from 340f7e9 to 7c77545 Compare December 20, 2025 13:43

arsenm force-pushed the users/arsenm/instcombine/fold-out-nanless-canonicalize-pattern branch from 6e74f3b to b1e7ee1 Compare January 22, 2026 21:27

arsenm force-pushed the users/arsenm/instcombine/add-baseline-test-nanless-canonicalize branch from 7c77545 to f793b21 Compare January 22, 2026 21:27

arsenm mentioned this pull request Jan 26, 2026

[libclc] Refine __clc_fp*_subnormals_supported #157633

Open

arsenm force-pushed the users/arsenm/instcombine/add-baseline-test-nanless-canonicalize branch from f793b21 to 2f43a1e Compare February 21, 2026 19:09

arsenm force-pushed the users/arsenm/instcombine/fold-out-nanless-canonicalize-pattern branch from b1e7ee1 to b785d4a Compare February 21, 2026 19:09

dtcxzyw reviewed Feb 26, 2026
