InstCombine: Fold out nanless canonicalize pattern #172998
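@llvm/pr-subscribers-llvm-transforms
Author: Matt Arsenault (arsenm)
Changes
Pattern match a wrapper around llvm.canonicalize which weakens the semantics to not require quieting signaling nans. The math library code currently has explicit checks for the denormal mode, and conditionally canonicalizes the result if there is flushing. Note we need a standard LLVM floating-point operation in the nan case to assert we do not care about preserving the nan payload and sign bit. The fdiv 1.0, %x used for that is not a no-op: it could potentially return infinity if %x were 0 (or very close to 0), so it will not be dropped.
https://alive2.llvm.org/ce/z/QYS4en
Full diff: https://github.com/llvm/llvm-project/pull/172998.diff
2 Files Affected: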
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
index f52bac5e600cb..33170e12ed629 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
@@ -4343,6 +4343,71 @@ Instruction *InstCombinerImpl::visitSelectInst(SelectInst &SI) {
matchFMulByZeroIfResultEqZero(*this, Cmp0, Cmp1, MatchCmp1, MatchCmp0,
SI, SIFPOp->hasNoSignedZeros()))
return replaceInstUsesWith(SI, Cmp0);
+
+ if (Pred == CmpInst::FCMP_ORD || Pred == CmpInst::FCMP_UNO) {
+ // Fold out only-canonicalize-non-nans pattern. This implements a
+ // wrapper around llvm.canonicalize which is not required to quiet
+ // signaling nans or preserve nan payload bits.
+ //
+ // %hard.canonical = call @llvm.canonicalize(%x)
+ // %soft.canonical = fdiv 1.0, %x
+ // %ord = fcmp ord %x, 0.0
+ // %x.canon = select i1 %ord, %hard.canonical, %soft.canonical
+ //
+ // With known IEEE handling:
+ // => %x
+ //
+ // With other denormal behaviors or exotic types:
+ // => llvm.canonicalize(%x)
+ //
+ // Note the fdiv could be any value preserving, potentially
+ // canonicalizing floating-point operation such as fmul by 1.0. However,
+ // since in the llvm model canonicalization is not mandatory, the fmul
+ // would have been dropped by the time we reached here. The trick here
+ // is to use a reciprocal fdiv. It's not a droppable no-op, as it could
+ // return an infinity if %x were sufficiently small, but in this pattern
+ // we're only using the output for nan values.
+
+ if (Pred == CmpInst::FCMP_ORD) {
+ MatchCmp0 = TrueVal;
+ MatchCmp1 = FalseVal;
+ } else {
+ MatchCmp0 = FalseVal;
+ MatchCmp1 = TrueVal;
+ }
+
+ if (match(MatchCmp0, m_FCanonicalize(m_Specific(Cmp0))) &&
+ match(Cmp1, m_PosZeroFP())) {
+ const fltSemantics &FPSem =
+ SelType->getScalarType()->getFltSemantics();
+ if (APFloat::isIEEELikeFP(FPSem)) {
+ // IEEE handling does not have non-canonical values, so the
+ // canonicalize can be dropped for direct replacement without
+ // looking for the intermediate maybe-canonicalizing operation.
+ if (Cmp0 == MatchCmp1 && SI.getFunction()->getDenormalMode(FPSem) ==
+ DenormalMode::getIEEE())
+ return replaceInstUsesWith(SI, Cmp0);
+
+ // If denormals may be flushed, we need to retain the canonicalize
+ // call. This introduces a canonicalization on the nan path, which
+ // we are not free to do as that could change the sign bit or
+ // payload bits. We can only do this if there were a no-op like
+ // floating-point instruction which may have changed the nan bits
+ // anyway.
+ if (match(MatchCmp1, m_FDiv(m_FPOne(), m_Specific(Cmp0)))) {
+ DenormalMode Mode = SI.getFunction()->getDenormalMode(FPSem);
+ if (Mode == DenormalMode::getIEEE())
+ return replaceInstUsesWith(SI, Cmp0);
+
+ if (Mode.inputsAreZero() || Mode.outputsAreZero())
+ return replaceInstUsesWith(SI, MatchCmp0);
+ }
+
+ // Leave the dynamic mode case alone. This would introduce new
+ // constraints if the mode may be refined later.
+ }
+ }
+ }
}
}
diff --git a/llvm/test/Transforms/InstCombine/nanless-canonicalize-combine.ll b/llvm/test/Transforms/InstCombine/nanless-canonicalize-combine.ll
index 5aa156753e860..0937f6ff19c2b 100644
--- a/llvm/test/Transforms/InstCombine/nanless-canonicalize-combine.ll
+++ b/llvm/test/Transforms/InstCombine/nanless-canonicalize-combine.ll
@@ -9,11 +9,7 @@
define float @canonicalize_ieee_0(float %x) #0 {
; CHECK-LABEL: define float @canonicalize_ieee_0(
; CHECK-SAME: float [[X:%.*]]) #[[ATTR0:[0-9]+]] {
-; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT: [[SOFT_CANONICAL:%.*]] = fdiv float 1.000000e+00, [[X]]
-; CHECK-NEXT: [[ORD:%.*]] = fcmp ord float [[X]], 0.000000e+00
-; CHECK-NEXT: [[X_CANON:%.*]] = select i1 [[ORD]], float [[HARD_CANONICAL]], float [[SOFT_CANONICAL]]
-; CHECK-NEXT: ret float [[X_CANON]]
+; CHECK-NEXT: ret float [[X]]
;
%hard.canonical = call float @llvm.canonicalize.f32(float %x)
%soft.canonical = fdiv float 1.0, %x
@@ -26,11 +22,7 @@ define float @canonicalize_ieee_0(float %x) #0 {
define float @canonicalize_ieee_1(float %x) #0 {
; CHECK-LABEL: define float @canonicalize_ieee_1(
; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT: [[SOFT_CANONICAL:%.*]] = fdiv float 1.000000e+00, [[X]]
-; CHECK-NEXT: [[UNO:%.*]] = fcmp uno float [[X]], 0.000000e+00
-; CHECK-NEXT: [[X_CANON:%.*]] = select i1 [[UNO]], float [[SOFT_CANONICAL]], float [[HARD_CANONICAL]]
-; CHECK-NEXT: ret float [[X_CANON]]
+; CHECK-NEXT: ret float [[X]]
;
%hard.canonical = call float @llvm.canonicalize.f32(float %x)
%soft.canonical = fdiv float 1.0, %x
@@ -44,10 +36,7 @@ define float @canonicalize_ieee_1(float %x) #0 {
define float @canonicalize_ieee_0_fmul(float %x) #0 {
; CHECK-LABEL: define float @canonicalize_ieee_0_fmul(
; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT: [[ORD:%.*]] = fcmp ord float [[X]], 0.000000e+00
-; CHECK-NEXT: [[X_CANON:%.*]] = select i1 [[ORD]], float [[HARD_CANONICAL]], float [[X]]
-; CHECK-NEXT: ret float [[X_CANON]]
+; CHECK-NEXT: ret float [[X]]
;
%hard.canonical = call float @llvm.canonicalize.f32(float %x)
%soft.canonical = fmul float %x, 1.0
@@ -61,10 +50,7 @@ define float @canonicalize_ieee_0_fmul(float %x) #0 {
define float @canonicalize_ieee_0_fdiv_commute(float %x) #0 {
; CHECK-LABEL: define float @canonicalize_ieee_0_fdiv_commute(
; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT: [[ORD:%.*]] = fcmp ord float [[X]], 0.000000e+00
-; CHECK-NEXT: [[X_CANON:%.*]] = select i1 [[ORD]], float [[HARD_CANONICAL]], float [[X]]
-; CHECK-NEXT: ret float [[X_CANON]]
+; CHECK-NEXT: ret float [[X]]
;
%hard.canonical = call float @llvm.canonicalize.f32(float %x)
%soft.canonical = fdiv float %x, 1.0
@@ -79,10 +65,7 @@ define float @canonicalize_daz_0(float %x) #1 {
; CHECK-LABEL: define float @canonicalize_daz_0(
; CHECK-SAME: float [[X:%.*]]) #[[ATTR1:[0-9]+]] {
; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT: [[SOFT_CANONICAL:%.*]] = fdiv float 1.000000e+00, [[X]]
-; CHECK-NEXT: [[ORD:%.*]] = fcmp ord float [[X]], 0.000000e+00
-; CHECK-NEXT: [[X_CANON:%.*]] = select i1 [[ORD]], float [[HARD_CANONICAL]], float [[SOFT_CANONICAL]]
-; CHECK-NEXT: ret float [[X_CANON]]
+; CHECK-NEXT: ret float [[HARD_CANONICAL]]
;
%hard.canonical = call float @llvm.canonicalize.f32(float %x)
%soft.canonical = fdiv float 1.0, %x
@@ -96,11 +79,8 @@ define float @canonicalize_daz_0(float %x) #1 {
define float @canonicalize_daz_1(float %x) #1 {
; CHECK-LABEL: define float @canonicalize_daz_1(
; CHECK-SAME: float [[X:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT: [[SOFT_CANONICAL:%.*]] = fdiv float 1.000000e+00, [[X]]
-; CHECK-NEXT: [[UNO:%.*]] = fcmp uno float [[X]], 0.000000e+00
-; CHECK-NEXT: [[X_CANON:%.*]] = select i1 [[UNO]], float [[SOFT_CANONICAL]], float [[HARD_CANONICAL]]
-; CHECK-NEXT: ret float [[X_CANON]]
+; CHECK-NEXT: [[SOFT_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
+; CHECK-NEXT: ret float [[SOFT_CANONICAL]]
;
%hard.canonical = call float @llvm.canonicalize.f32(float %x)
%soft.canonical = fdiv float 1.0, %x
@@ -146,11 +126,7 @@ define float @canonicalize_dynamic_1(float %x) #2 {
define <2 x float> @canonicalize_ieee_0_vec(<2 x float> %x) #0 {
; CHECK-LABEL: define <2 x float> @canonicalize_ieee_0_vec(
; CHECK-SAME: <2 x float> [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> [[X]])
-; CHECK-NEXT: [[SOFT_CANONICAL:%.*]] = fdiv <2 x float> splat (float 1.000000e+00), [[X]]
-; CHECK-NEXT: [[ORD:%.*]] = fcmp ord <2 x float> [[X]], zeroinitializer
-; CHECK-NEXT: [[X_CANON:%.*]] = select <2 x i1> [[ORD]], <2 x float> [[HARD_CANONICAL]], <2 x float> [[SOFT_CANONICAL]]
-; CHECK-NEXT: ret <2 x float> [[X_CANON]]
+; CHECK-NEXT: ret <2 x float> [[X]]
;
%hard.canonical = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> %x)
%soft.canonical = fdiv <2 x float> splat (float 1.0), %x
@@ -162,11 +138,7 @@ define <2 x float> @canonicalize_ieee_0_vec(<2 x float> %x) #0 {
define <2 x float> @canonicalize_ieee_1_vec(<2 x float> %x) #0 {
; CHECK-LABEL: define <2 x float> @canonicalize_ieee_1_vec(
; CHECK-SAME: <2 x float> [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> [[X]])
-; CHECK-NEXT: [[SOFT_CANONICAL:%.*]] = fdiv <2 x float> splat (float 1.000000e+00), [[X]]
-; CHECK-NEXT: [[UNO:%.*]] = fcmp uno <2 x float> [[X]], zeroinitializer
-; CHECK-NEXT: [[X_CANON:%.*]] = select <2 x i1> [[UNO]], <2 x float> [[SOFT_CANONICAL]], <2 x float> [[HARD_CANONICAL]]
-; CHECK-NEXT: ret <2 x float> [[X_CANON]]
+; CHECK-NEXT: ret <2 x float> [[X]]
;
%hard.canonical = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> %x)
%soft.canonical = fdiv <2 x float> splat (float 1.0), %x
@@ -195,10 +167,7 @@ define <2 x float> @canonicalize_daz_0_vec(<2 x float> %x) #1 {
; CHECK-LABEL: define <2 x float> @canonicalize_daz_0_vec(
; CHECK-SAME: <2 x float> [[X:%.*]]) #[[ATTR1]] {
; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> [[X]])
-; CHECK-NEXT: [[SOFT_CANONICAL:%.*]] = fdiv <2 x float> splat (float 1.000000e+00), [[X]]
-; CHECK-NEXT: [[ORD:%.*]] = fcmp ord <2 x float> [[X]], zeroinitializer
-; CHECK-NEXT: [[X_CANON:%.*]] = select <2 x i1> [[ORD]], <2 x float> [[HARD_CANONICAL]], <2 x float> [[SOFT_CANONICAL]]
-; CHECK-NEXT: ret <2 x float> [[X_CANON]]
+; CHECK-NEXT: ret <2 x float> [[HARD_CANONICAL]]
;
%hard.canonical = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> %x)
%soft.canonical = fdiv <2 x float> splat (float 1.0), %x
@@ -210,11 +179,8 @@ define <2 x float> @canonicalize_daz_0_vec(<2 x float> %x) #1 {
define <2 x float> @canonicalize_daz_1_vec(<2 x float> %x) #1 {
; CHECK-LABEL: define <2 x float> @canonicalize_daz_1_vec(
; CHECK-SAME: <2 x float> [[X:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> [[X]])
-; CHECK-NEXT: [[SOFT_CANONICAL:%.*]] = fdiv <2 x float> splat (float 1.000000e+00), [[X]]
-; CHECK-NEXT: [[UNO:%.*]] = fcmp uno <2 x float> [[X]], zeroinitializer
-; CHECK-NEXT: [[X_CANON:%.*]] = select <2 x i1> [[UNO]], <2 x float> [[SOFT_CANONICAL]], <2 x float> [[HARD_CANONICAL]]
-; CHECK-NEXT: ret <2 x float> [[X_CANON]]
+; CHECK-NEXT: [[SOFT_CANONICAL:%.*]] = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> [[X]])
+; CHECK-NEXT: ret <2 x float> [[SOFT_CANONICAL]]
;
%hard.canonical = call <2 x float> @llvm.canonicalize.v2f32(<2 x float> %x)
%soft.canonical = fdiv <2 x float> splat (float 1.0), %x
@@ -226,11 +192,7 @@ define <2 x float> @canonicalize_daz_1_vec(<2 x float> %x) #1 {
define bfloat @canonicalize_ieee_bf16(bfloat %x) #0 {
; CHECK-LABEL: define bfloat @canonicalize_ieee_bf16(
; CHECK-SAME: bfloat [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call bfloat @llvm.canonicalize.bf16(bfloat [[X]])
-; CHECK-NEXT: [[SOFT_CANONICAL:%.*]] = fdiv bfloat 0xR3F80, [[X]]
-; CHECK-NEXT: [[ORD:%.*]] = fcmp ord bfloat [[X]], 0xR0000
-; CHECK-NEXT: [[X_CANON:%.*]] = select i1 [[ORD]], bfloat [[HARD_CANONICAL]], bfloat [[SOFT_CANONICAL]]
-; CHECK-NEXT: ret bfloat [[X_CANON]]
+; CHECK-NEXT: ret bfloat [[X]]
;
%hard.canonical = call bfloat @llvm.canonicalize.bf16(bfloat %x)
%soft.canonical = fdiv bfloat 1.0, %x
@@ -242,11 +204,7 @@ define bfloat @canonicalize_ieee_bf16(bfloat %x) #0 {
define half @canonicalize_ieee_f16(half %x) #0 {
; CHECK-LABEL: define half @canonicalize_ieee_f16(
; CHECK-SAME: half [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call half @llvm.canonicalize.f16(half [[X]])
-; CHECK-NEXT: [[SOFT_CANONICAL:%.*]] = fdiv half 0xH3C00, [[X]]
-; CHECK-NEXT: [[ORD:%.*]] = fcmp ord half [[X]], 0xH0000
-; CHECK-NEXT: [[X_CANON:%.*]] = select i1 [[ORD]], half [[HARD_CANONICAL]], half [[SOFT_CANONICAL]]
-; CHECK-NEXT: ret half [[X_CANON]]
+; CHECK-NEXT: ret half [[X]]
;
%hard.canonical = call half @llvm.canonicalize.f16(half %x)
%soft.canonical = fdiv half 1.0, %x
@@ -258,11 +216,7 @@ define half @canonicalize_ieee_f16(half %x) #0 {
define double @canonicalize_ieee_f64(double %x) #0 {
; CHECK-LABEL: define double @canonicalize_ieee_f64(
; CHECK-SAME: double [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call double @llvm.canonicalize.f64(double [[X]])
-; CHECK-NEXT: [[SOFT_CANONICAL:%.*]] = fdiv double 1.000000e+00, [[X]]
-; CHECK-NEXT: [[ORD:%.*]] = fcmp ord double [[X]], 0.000000e+00
-; CHECK-NEXT: [[X_CANON:%.*]] = select i1 [[ORD]], double [[HARD_CANONICAL]], double [[SOFT_CANONICAL]]
-; CHECK-NEXT: ret double [[X_CANON]]
+; CHECK-NEXT: ret double [[X]]
;
%hard.canonical = call double @llvm.canonicalize.f64(double %x)
%soft.canonical = fdiv double 1.0, %x
@@ -274,10 +228,7 @@ define double @canonicalize_ieee_f64(double %x) #0 {
define fp128 @canonicalize_ieee_f128(fp128 %x) #0 {
; CHECK-LABEL: define fp128 @canonicalize_ieee_f128(
; CHECK-SAME: fp128 [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call fp128 @llvm.canonicalize.f128(fp128 [[X]])
-; CHECK-NEXT: [[ORD:%.*]] = fcmp ord fp128 [[X]], 0xL00000000000000000000000000000000
-; CHECK-NEXT: [[X_CANON:%.*]] = select i1 [[ORD]], fp128 [[HARD_CANONICAL]], fp128 [[X]]
-; CHECK-NEXT: ret fp128 [[X_CANON]]
+; CHECK-NEXT: ret fp128 [[X]]
;
%hard.canonical = call fp128 @llvm.canonicalize.f128(fp128 %x)
%ord = fcmp ord fp128 %x, 0xL00000000000000000000000000000000
@@ -503,10 +454,7 @@ define ppc_fp128 @ignore_ppc_fp128(ppc_fp128 %x) #0 {
define float @canonicalize_ieee_0_missing_noop(float %x) #0 {
; CHECK-LABEL: define float @canonicalize_ieee_0_missing_noop(
; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT: [[ORD:%.*]] = fcmp ord float [[X]], 0.000000e+00
-; CHECK-NEXT: [[X_CANON:%.*]] = select i1 [[ORD]], float [[HARD_CANONICAL]], float [[X]]
-; CHECK-NEXT: ret float [[X_CANON]]
+; CHECK-NEXT: ret float [[X]]
;
%hard.canonical = call float @llvm.canonicalize.f32(float %x)
%ord = fcmp ord float %x, 0.0
@@ -519,10 +467,7 @@ define float @canonicalize_ieee_0_missing_noop(float %x) #0 {
define float @canonicalize_ieee_1_missing_noop(float %x) #0 {
; CHECK-LABEL: define float @canonicalize_ieee_1_missing_noop(
; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT: [[UNO:%.*]] = fcmp uno float [[X]], 0.000000e+00
-; CHECK-NEXT: [[X_CANON:%.*]] = select i1 [[UNO]], float [[X]], float [[HARD_CANONICAL]]
-; CHECK-NEXT: ret float [[X_CANON]]
+; CHECK-NEXT: ret float [[X]]
;
%hard.canonical = call float @llvm.canonicalize.f32(float %x)
%uno = fcmp uno float %x, 0.0
@@ -567,10 +512,7 @@ define float @canonicalize_only_ftz(float %x) "denormal-fp-math"="preserve-sign,
; CHECK-LABEL: define float @canonicalize_only_ftz(
; CHECK-SAME: float [[X:%.*]]) #[[ATTR3:[0-9]+]] {
; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT: [[SOFT_CANONICAL:%.*]] = fdiv float 1.000000e+00, [[X]]
-; CHECK-NEXT: [[ORD:%.*]] = fcmp ord float [[X]], 0.000000e+00
-; CHECK-NEXT: [[X_CANON:%.*]] = select i1 [[ORD]], float [[HARD_CANONICAL]], float [[SOFT_CANONICAL]]
-; CHECK-NEXT: ret float [[X_CANON]]
+; CHECK-NEXT: ret float [[HARD_CANONICAL]]
;
%hard.canonical = call float @llvm.canonicalize.f32(float %x)
%soft.canonical = fdiv float 1.0, %x
@@ -584,10 +526,7 @@ define float @canonicalize_only_daz(float %x) "denormal-fp-math"="ieee,preserve-
; CHECK-LABEL: define float @canonicalize_only_daz(
; CHECK-SAME: float [[X:%.*]]) #[[ATTR4:[0-9]+]] {
; CHECK-NEXT: [[HARD_CANONICAL:%.*]] = call float @llvm.canonicalize.f32(float [[X]])
-; CHECK-NEXT: [[SOFT_CANONICAL:%.*]] = fdiv float 1.000000e+00, [[X]]
-; CHECK-NEXT: [[ORD:%.*]] = fcmp ord float [[X]], 0.000000e+00
-; CHECK-NEXT: [[X_CANON:%.*]] = select i1 [[ORD]], float [[HARD_CANONICAL]], float [[SOFT_CANONICAL]]
-; CHECK-NEXT: ret float [[X_CANON]]
+; CHECK-NEXT: ret float [[HARD_CANONICAL]]
;
%hard.canonical = call float @llvm.canonicalize.f32(float %x)
%soft.canonical = fdiv float 1.0, %x
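The case analysis in the visitSelectInst hunk above can be summarised as a small decision function. This is only a sketch of the logic as it appears in this revision of the diff: `DenormKind`, `DenormModeModel`, `NanArm`, `Fold` and `classifyFold` are hypothetical names invented for illustration and are not LLVM APIs.

```cpp
#include <cassert>

// Hypothetical mirror of the four denormal behaviours LLVM distinguishes.
enum class DenormKind { IEEE, PreserveSign, PositiveZero, Dynamic };

// Hypothetical stand-in for a function's denormal mode (input and output).
struct DenormModeModel {
  DenormKind Input;
  DenormKind Output;
  bool isIEEE() const {
    return Input == DenormKind::IEEE && Output == DenormKind::IEEE;
  }
  bool inputsAreZero() const {
    return Input == DenormKind::PreserveSign ||
           Input == DenormKind::PositiveZero;
  }
  bool outputsAreZero() const {
    return Output == DenormKind::PreserveSign ||
           Output == DenormKind::PositiveZero;
  }
};

// What sits on the NaN side of the select.
enum class NanArm { PlainX, RcpOfX, Other };

// Outcome of the fold for the whole select.
enum class Fold { None, ReplaceWithX, ReplaceWithCanonicalize };

// Assumes the ordered arm is canonicalize(x) and the compare is against +0.0.
inline Fold classifyFold(bool IsIEEELikeType, DenormModeModel Mode,
                         NanArm Arm) {
  if (!IsIEEELikeType)
    return Fold::None; // e.g. ppc_fp128 is left alone
  if (Arm == NanArm::PlainX)
    return Mode.isIEEE() ? Fold::ReplaceWithX : Fold::None;
  if (Arm == NanArm::RcpOfX) {
    if (Mode.isIEEE())
      return Fold::ReplaceWithX;
    if (Mode.inputsAreZero() || Mode.outputsAreZero())
      return Fold::ReplaceWithCanonicalize;
  }
  return Fold::None; // dynamic mode, or an unrecognised NaN arm
}
```

Under these assumptions the results line up with the updated CHECK lines: the IEEE tests reduce to `%x`, the flushing tests reduce to the canonicalize call, and the dynamic-mode and ppc_fp128 tests are untouched.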
dtcxzyw
left a comment
I don't think your proof works: https://alive2.llvm.org/ce/z/T_dpgL
See https://github.com/AliveToolkit/alive2/blob/20a8472b77ba8a1f0b172059d422a55cb4dff120/ir/instr.cpp#L1168-L1170. It looks like alive2 only respects the standard IEEE fp semantics.
BTW, I'd like to see a two-stage fold: https://alive2.llvm.org/ce/z/DMKuW4
define half @src0(half %x, half %y) {
%hard.canonical = call half @llvm.canonicalize.f16(half %x)
%ord = fcmp ord half %x, 0.0
%x.canon = select i1 %ord, half %hard.canonical, half %y
ret half %x.canon
}
define half @tgt0(half %x, half %y) {
%ord = fcmp ord half %x, 0.0
%x.canon = select i1 %ord, half %x, half %y
ret half %x.canon
}
define half @src1(half %x, half %y) {
%cond = fcmp ord half %x, 0.0
%rcp = fdiv half 1.0, %x
%sel = select i1 %cond, half %y, half %rcp
ret half %sel
}
define half @tgt1(half %x, half %y) {
%cond = fcmp ord half %x, 0.0
%sel = select i1 %cond, half %y, half %x
ret half %sel
}
Both arms should be simplified independently.
I know I saw this catching issues with constant folding of canonicalize before. But this is broken even with IEEE, because signaling nan quieting is still mandatory.
I'm not sure that's possible for the second case. The intermediate fdiv cannot disappear while handling the canonicalize; the canonicalize is absorbing the effects from the fdiv.
By second case, I mean all of the DAZ cases.
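To illustrate why the fdiv has to survive until the canonicalize is handled, here is a bit-level sketch of the select once the NaN arm has already been rewritten to plain %x. It assumes the usual binary32 encoding in which quieting a NaN sets the top mantissa bit; the helper names are hypothetical, and the model works on raw bits so that no hardware instruction quiets the value behind our back.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t ExpMask = 0x7F800000u;  // binary32 exponent field
constexpr std::uint32_t MantMask = 0x007FFFFFu; // binary32 mantissa field
constexpr std::uint32_t QuietBit = 0x00400000u; // assumed quiet-NaN bit

inline bool isNaNBits(std::uint32_t B) {
  return (B & ExpMask) == ExpMask && (B & MantMask) != 0;
}

inline bool isSignalingBits(std::uint32_t B) {
  return isNaNBits(B) && (B & QuietBit) == 0;
}

// Model of canonicalize restricted to what matters here: a NaN is quieted,
// everything else passes through unchanged (no flushing is modelled).
inline std::uint32_t modelCanonicalizeBits(std::uint32_t B) {
  return isNaNBits(B) ? (B | QuietBit) : B;
}

// select (fcmp ord x, 0), canonicalize(x), x
// i.e. the pattern after the fdiv arm has been replaced by plain x.
inline std::uint32_t selectWithPlainX(std::uint32_t B) {
  return isNaNBits(B) ? B : modelCanonicalizeBits(B);
}
```

In this model a signaling NaN comes out of the select bit-for-bit unchanged, while a bare canonicalize would quiet it. Replacing the whole select with the canonicalize call is therefore only justified while the NaN arm is still a floating-point operation that was free to change the NaN bits anyway.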
The second case here has trouble when %x is undef (though the online version always times out) |
Switched to replace the select operand in the cases where possible. The DAZ cases still need to treat this as a combined operation. The reciprocal fold also special-cases the full operation to avoid introducing an unnecessary freeze. I'm also not confident I'm handling the undef case correctly.
ping
Pattern match a wrapper around llvm.canonicalize which weakens the semantics to not require quieting signaling nans. Depending on the denormal mode and FP type, we can either drop the pattern entirely or reduce it to just a canonicalize call. I'm inventing this pattern to deal with LLVM's lax canonicalization model in math library code.

The math library code currently has explicit checks for the denormal mode, and conditionally canonicalizes the result if there is flushing. Semantically, this could be directly replaced with a simple call to llvm.canonicalize, but doing so would incur an additional cost when using standard IEEE behavior. If we do not care about quieting a signaling nan, this should be a no-op unless the denormal mode may flush. This will allow replacement of the conditional code with a zero-cost abstraction utility function.

Note we need a standard LLVM floating-point operation in the nan case to assert we do not care about preserving the nan payload and sign bit. This could be any no-op fp instruction; a normal choice would be fmul by 1.0. Using that presents an ordering problem: since LLVM fp operations are not required to canonicalize, instcombine would fold out the fmul before reaching this select combine. The galaxy brain solution here is to use fdiv 1.0, %x in place of the no-op. It is not actually a no-op, since it could potentially return infinity if %x were 0 (or very close to 0), so it will not be dropped. For the purposes here, that's fine since it's only being used as a nan sink.

https://alive2.llvm.org/ce/z/QYS4en
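The value semantics of the wrapper described above can be modelled in plain C++. This is a minimal sketch under stated assumptions, not the math library's utility function: `DenormModel`, `modelCanonicalize`, `nanlessCanonicalize` and `bitsOf` are hypothetical names, flushing is simulated in software rather than taken from the hardware mode, and NaN quieting is deliberately left out of the model.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical stand-in for the function's denormal mode; not an LLVM type.
enum class DenormModel { IEEE, FlushToZero };

// Simplified model of llvm.canonicalize on a non-NaN float: the only
// observable change is flushing a subnormal to a zero of the same sign
// when the mode flushes.
inline float modelCanonicalize(float X, DenormModel Mode) {
  if (Mode == DenormModel::FlushToZero && std::fpclassify(X) == FP_SUBNORMAL)
    return std::copysign(0.0f, X);
  return X;
}

// select (fcmp ord x, 0.0), canonicalize(x), (fdiv 1.0, x)
inline float nanlessCanonicalize(float X, DenormModel Mode) {
  if (std::isnan(X))
    return 1.0f / X; // NaN sink: some NaN comes back, its bits are unspecified
  return modelCanonicalize(X, Mode);
}

// Raw bit pattern, so that +0.0 and -0.0 compare as different values.
inline std::uint32_t bitsOf(float X) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "binary32 assumed");
  std::uint32_t B;
  std::memcpy(&B, &X, sizeof B);
  return B;
}
```

In this model the wrapper is bit-identical to its input for every non-NaN value when the mode is IEEE, which is why the whole pattern can fold to %x there. With flushing it behaves like the canonicalize call on the ordered path, and the NaN path only promises that some NaN is returned.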
ping
Suggested change:
-// select (fcmp ord %cmp0, 0), y, (fdiv 1, x)
-// => select (fcmp ord %cmp0, 0), y, x
-//
-// select (fcmp uno %cmp0, 0), (fdiv 1, x), y
-// => select (fcmp uno %cmp0, 0), x, y
+// select (fcmp ord x, 0), y, (fdiv 1, x)
+// => select (fcmp ord x, 0), y, x
+//
+// select (fcmp uno x, 0), (fdiv 1, x), y
+// => select (fcmp uno x, 0), x, y
Suggested change:
-// select (fcmp ord %cmp0, 0), canonicalize(x), y
-// => select (fcmp ord %cmp0, 0), x, y
+// select (fcmp ord x, 0), canonicalize(x), y
+// => select (fcmp ord x, 0), x, y
Suggested change:
-if (RcpIfNan) {
-  if (Mode == DenormalMode::getIEEE()) {
+if (RcpIfNan && Mode == DenormalMode::getIEEE()) {
// anyway.
if (RcpIfNan) {
  if (Mode == DenormalMode::getIEEE())
    return replaceInstUsesWith(SI, Cmp0);
