[DAGCombiner] Freeze maybe poison operands when folding select to logic #84924
@llvm/pr-subscribers-backend-nvptx @llvm/pr-subscribers-llvm-selectiondag
Author: Björn Pettersson (bjope)
Changes
Work-in-progress, to fix #84653
Full diff: https://github.com/llvm/llvm-project/pull/84924.diff
1 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 5476ef87971436..46675f94642cc9 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -11344,28 +11344,34 @@ static SDValue foldBoolSelectToLogic(SDNode *N, SelectionDAG &DAG) {
if (VT != Cond.getValueType() || VT.getScalarSizeInBits() != 1)
return SDValue();
- // select Cond, Cond, F --> or Cond, F
- // select Cond, 1, F --> or Cond, F
+ auto FreezeIfNeeded = [&](SDValue V) {
+ if (!DAG.isGuaranteedNotToBePoison(V))
+ return DAG.getFreeze(V);
+ return V;
+ };
+
+ // select Cond, Cond, F --> or Cond, freeze(F)
+ // select Cond, 1, F --> or Cond, freeze(F)
if (Cond == T || isOneOrOneSplat(T, /* AllowUndefs */ true))
- return matcher.getNode(ISD::OR, SDLoc(N), VT, Cond, F);
+ return matcher.getNode(ISD::OR, SDLoc(N), VT, Cond, FreezeIfNeeded(F));
// select Cond, T, Cond --> and Cond, T
// select Cond, T, 0 --> and Cond, T
if (Cond == F || isNullOrNullSplat(F, /* AllowUndefs */ true))
- return matcher.getNode(ISD::AND, SDLoc(N), VT, Cond, T);
+ return matcher.getNode(ISD::AND, SDLoc(N), VT, Cond, FreezeIfNeeded(T));
// select Cond, T, 1 --> or (not Cond), T
if (isOneOrOneSplat(F, /* AllowUndefs */ true)) {
SDValue NotCond = matcher.getNode(ISD::XOR, SDLoc(N), VT, Cond,
DAG.getAllOnesConstant(SDLoc(N), VT));
- return matcher.getNode(ISD::OR, SDLoc(N), VT, NotCond, T);
+ return matcher.getNode(ISD::OR, SDLoc(N), VT, NotCond, FreezeIfNeeded(T));
}
// select Cond, 0, F --> and (not Cond), F
if (isNullOrNullSplat(T, /* AllowUndefs */ true)) {
SDValue NotCond = matcher.getNode(ISD::XOR, SDLoc(N), VT, Cond,
DAG.getAllOnesConstant(SDLoc(N), VT));
- return matcher.getNode(ISD::AND, SDLoc(N), VT, NotCond, F);
+ return matcher.getNode(ISD::AND, SDLoc(N), VT, NotCond, FreezeIfNeeded(F));
}
return SDValue();
@@ -11394,12 +11400,18 @@ static SDValue foldVSelectToSignBitSplatMask(SDNode *N, SelectionDAG &DAG) {
else
return SDValue();
+ auto FreezeIfNeeded = [&](SDValue V) {
+ if (!DAG.isGuaranteedNotToBePoison(V))
+ return DAG.getFreeze(V);
+ return V;
+ };
+
// (Cond0 s< 0) ? N1 : 0 --> (Cond0 s>> BW-1) & N1
if (isNullOrNullSplat(N2)) {
SDLoc DL(N);
SDValue ShiftAmt = DAG.getConstant(VT.getScalarSizeInBits() - 1, DL, VT);
SDValue Sra = DAG.getNode(ISD::SRA, DL, VT, Cond0, ShiftAmt);
- return DAG.getNode(ISD::AND, DL, VT, Sra, N1);
+ return DAG.getNode(ISD::AND, DL, VT, Sra, FreezeIfNeeded(N1));
}
// (Cond0 s< 0) ? -1 : N2 --> (Cond0 s>> BW-1) | N2
@@ -11407,7 +11419,7 @@ static SDValue foldVSelectToSignBitSplatMask(SDNode *N, SelectionDAG &DAG) {
SDLoc DL(N);
SDValue ShiftAmt = DAG.getConstant(VT.getScalarSizeInBits() - 1, DL, VT);
SDValue Sra = DAG.getNode(ISD::SRA, DL, VT, Cond0, ShiftAmt);
- return DAG.getNode(ISD::OR, DL, VT, Sra, N2);
+ return DAG.getNode(ISD::OR, DL, VT, Sra, FreezeIfNeeded(N2));
}
// If we have to invert the sign bit mask, only do that transform if the
@@ -11419,7 +11431,7 @@ static SDValue foldVSelectToSignBitSplatMask(SDNode *N, SelectionDAG &DAG) {
SDValue ShiftAmt = DAG.getConstant(VT.getScalarSizeInBits() - 1, DL, VT);
SDValue Sra = DAG.getNode(ISD::SRA, DL, VT, Cond0, ShiftAmt);
SDValue Not = DAG.getNOT(DL, Sra, VT);
- return DAG.getNode(ISD::AND, DL, VT, Not, N2);
+ return DAG.getNode(ISD::AND, DL, VT, Not, FreezeIfNeeded(N2));
}
// TODO: There's another pattern in this family, but it may require
@@ -6,21 +6,14 @@
; (x0 < x1) && (x2 > x3)
define i32 @cmp_and2(i32 %0, i32 %1, i32 %2, i32 %3) {
I guess the typical problem here is that the arguments aren't marked as noundef. But even if they had been marked as noundef, the lowering of arguments wouldn't result in adding FREEZE nodes to indicate that the inputs aren't poison. Nor do we have a way to put such attributes on the CopyFromReg (or loads from stack) instructions that would map to accessing the arguments.
(No idea if there is something in AArch64 that could be changed here to still optimize this in a valid way or if the old transform is unsafe from a poison/undef perspective.)
Does SDAG ever look through CopyFromReg in some way? If not, we could assume that the value is frozen.
We can definitely treat function arguments as frozen in SDAG, I'm just not familiar with where else CopyFromReg gets used...
> I'm just not familiar with where else CopyFromReg gets used
SDAG operates per basic block, and I believe CopyFromReg/CopyToReg are generated for any use/def of a "global" value, i.e. one that is live outside of the current basic block.
I actually started on a patch that simply treated CopyFromReg similarly to FREEZE in SelectionDAG::isGuaranteedNotToBeUndefOrPoison. It was kind of based on the comments in ISDOpcodes.h about CopyFromReg loading from a register "that is defined outside of the scope of this SelectionDAG". So one could for example assume that we get the same value.
I'm not sure however if there could be other problems. On IR level we only treat arguments as not being poison depending on the noundef attribute. So then I started to think that maybe it was more complicated also for SelectionDAG. But maybe we are saved by only looking at one BB at a time.
I think CopyFromReg can be used in any BB, also for values passed on from one BB to another. So it is at least not only about input arguments to the function (but maybe it can be seen as input arguments to the basic block). Would it for example be a problem if the value tracking made assumptions based on dominating conditions in predecessor basic blocks? But maybe we avoid such things.
(I know that there was some discussion a couple of years back about poison in SelectionDAG, https://discourse.llvm.org/t/funnel-shift-select-and-poison/51255/25, but that never really landed in any formal decision about how to deal with it.)
@@ -295,22 +295,28 @@ define float @select_icmp_sle(i32 %x, i32 %y, float %a, float %b) {
; Test peephole optimizations for select.
define zeroext i1 @select_opt1(i1 zeroext %c, i1 zeroext %a) {
; CHECK-LABEL: select_opt1
I figure that these diffs are a bit annoying. But given that AssertZext/AssertSext are handled as maybe producing poison, we can't hoist the freeze above them. And then I think we lose the optimizations that rely on the inputs being masked down to a single bit. But maybe this also kind of originates from CopyFromReg not being treated as an implicit FREEZE (at least if the arguments would have been marked as noundef).
Another thing here is that the fast-isel code still optimizes this case. So either there is a bug in fast-isel that it doesn't consider poison here. Or we should be allowed to optimize this also for the regular SelectionDAG ISel.
Worth mentioning is that GlobalISel isn't part of the checks here, but it seems like GlobalISel also would emit the explicit AND.
I have wondered whether SelectionDAGBuilder should be inserting freeze to incoming nodes if we know they aren't undef/poison - that would allow freeze(assertzext(x)) or whatever?
Hi @bjope, could you please add the following test for #85190?
-  // select Cond, Cond, F --> or Cond, F
-  // select Cond, 1, F    --> or Cond, F
+  auto FreezeIfNeeded = [&](SDValue V) {
+    if (!DAG.isGuaranteedNotToBePoison(V))
Do we have a plan to implement impliesPoison for SDAG?
getNode(ISD::FREEZE) should do this for us?
llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
Lines 5718 to 5723 in 02cb89b
case ISD::FREEZE:
  assert(VT == N1.getValueType() && "Unexpected VT!");
  if (isGuaranteedNotToBeUndefOrPoison(N1, /*PoisonOnly*/ false,
                                       /*Depth*/ 1))
    return N1;
  break;
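The consequence of that fold for the `FreezeIfNeeded` lambda can be sketched with a toy model. This is not LLVM code: `ToyNode`, `ToyDAG`, and the `NotPoison` flag are hypothetical stand-ins for `SDNode`, `SelectionDAG`, and the `isGuaranteedNotToBeUndefOrPoison` query, assumed only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical toy node: NotPoison stands in for what
// isGuaranteedNotToBeUndefOrPoison would report in the real DAG.
enum class Op { Leaf, Freeze };
struct ToyNode {
  Op Opcode;
  bool NotPoison;
  const ToyNode *Operand;
};

class ToyDAG {
  std::vector<std::unique_ptr<ToyNode>> Nodes;

public:
  const ToyNode *getLeaf(bool NotPoison) {
    Nodes.push_back(
        std::make_unique<ToyNode>(ToyNode{Op::Leaf, NotPoison, nullptr}));
    return Nodes.back().get();
  }

  // Mirrors the quoted fold: a freeze of a value that is already known
  // not to be poison is folded away and the operand itself is returned.
  const ToyNode *getFreeze(const ToyNode *V) {
    assert(V && "freeze needs an operand");
    if (V->NotPoison)
      return V;
    Nodes.push_back(std::make_unique<ToyNode>(ToyNode{Op::Freeze, true, V}));
    return Nodes.back().get();
  }

  std::size_t size() const { return Nodes.size(); }
};

// The wrapper from the patch, expressed against the toy DAG. In this model
// it returns exactly what getFreeze returns on its own.
const ToyNode *freezeIfNeeded(ToyDAG &DAG, const ToyNode *V) {
  if (!V->NotPoison)
    return DAG.getFreeze(V);
  return V;
}
```

In this model the wrapper and a direct `getFreeze` call always agree, which is the point made above. The wrapper only earns its keep if it carries extra conditions, such as the target-specific checks mentioned in the next comment.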
Ah. Nice!
Downstream I also needed some additional checks to just do this for my target (since it is a pain doing it for all targets until this has landed upstream). So that is one reason why I've wrapped the check this way.
Ping for review :)
llvm/test/CodeGen/PowerPC/pr40922.ll (outdated)
; CHECK-NEXT: crorc 20, 1, 4
; CHECK-NEXT: andi. 5, 5, 1
; CHECK-NEXT: crnot 20, 4
; CHECK-NEXT: cror 20, 1, 20
This becomes worse. A combine for i1 type seems broken. Let me check why this happens.
t43: i32,ch = load<(dereferenceable load (s32) from @a.b)> t10, t63, undef:i32
t44: i32,glue = addc t43, Constant:i32<6>
t46: i32 = and t44, Constant:i32<-17>
t48: i1 = setcc t46, t43, setuge:ch
t54: i1 = freeze t48 ;;This freeze breaks the instruction selection pattern for or(setcc) in PPC td files.
t55: i1 = or t58, t54
I think this is a situation where we do not fold the freeze over an operation (in this case a setcc) because the operation has more than one operand that may be poison.
However, we got
t43: i32,ch = load<(dereferenceable load (s32) from @a.b)> t10, GlobalAddress:i32<ptr @a.b> 0, undef:i32
t44: i32,glue = addc t43, Constant:i32<6>
t46: i32 = and t44, Constant:i32<-17>
t48: i1 = setcc t46, t43, setuge:ch
t54: i1 = freeze t48
so I think that t46 and t43 only could be poison/undef if t43 is poison/undef. So it would be enough to freeze t43, like this:
t43: i32,ch = load<(dereferenceable load (s32) from @a.b)> t10, GlobalAddress:i32<ptr @a.b> 0, undef:i32
t99: i32 = freeze t43
t44: i32,glue = addc t99, Constant:i32<6>
t46: i32 = and t44, Constant:i32<-17>
t54: i1 = setcc t46, t99, setuge:ch
Not exactly sure, but maybe it would be possible to improve the analysis in DAGCombiner::visitFREEZE. The idea would be to allow pushing freeze through more operations with multiple maybe poison/undef operands (as we do for BUILD_VECTOR etc). One could limit it to the scenario when the isGuaranteedNotToBeUndefOrPoison checks for those operands reach the same DAG node when recursing, i.e. when it is likely that we would still end up with a single freeze in the end.
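To make the single-freeze idea concrete, here is a small self-contained C++ model of the DAG above (not LLVM code). `std::nullopt` stands in for poison. The helper names `freezeAtRoot` and `freezeAtLeaf`, and the explicit `Pick` argument for the value a freeze settles on, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

using Val = std::optional<uint32_t>; // std::nullopt models poison
using Bit = std::optional<bool>;     // i1 result, std::nullopt models poison

// Poison propagates through the arithmetic nodes.
Val addC(Val A, uint32_t C) { return A ? Val(*A + C) : std::nullopt; }
Val andC(Val A, uint32_t C) { return A ? Val(*A & C) : std::nullopt; }

// setcc ... setuge: poison if either operand is poison.
Bit setUGE(Val A, Val B) {
  if (!A || !B)
    return std::nullopt;
  return *A >= *B;
}

// t48 = setcc (and (addc t43, 6), -17), t43, setuge
Bit compare(Val T43) { return setUGE(andC(addC(T43, 6), 0xFFFFFFEFu), T43); }

// t54 = freeze t48: the freeze sits on the setcc result.
bool freezeAtRoot(Val T43, bool Pick) {
  Bit T48 = compare(T43);
  return T48 ? *T48 : Pick;
}

// t99 = freeze t43: one freeze on the shared leaf, setcc computed from it.
bool freezeAtLeaf(Val T43, uint32_t Pick) {
  Val T99 = T43 ? T43 : Val(Pick);
  Bit T54 = compare(T99);
  assert(T54.has_value() && "both setcc operands derive from the frozen leaf");
  return *T54;
}
```

When t43 is not poison the two forms compute the same bit. When t43 is poison, the leaf form still yields a definite value, and every such value is one the root freeze could also have chosen, so in this model moving the freeze to the leaf is a refinement that needs only a single freeze node.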
For setcc, it's probably profitable to always push freeze, even if there are multiple operands.
The regressions need to be mitigated to the maximum extent feasible before we make this change.
Ping?
@bjope Can you please rebase this PR?
Seems like I never split this PR into two before going on vacation. I'll fix that and rebase.
; CHECK: # %bb.0:
; CHECK-NEXT: movb $1, %al
; CHECK-NEXT: retq
; SSE-LABEL: pmaddwd_pcmpgt_infinite_loop:
Looks like we get:
t3: v8i16 = BUILD_VECTOR Constant:i16<-32768>, Constant:i16<-32768>, Constant:i16<-32768>, Constant:i16<-32768>, Constant:i16<-32768>, Constant:i16<-32768>, Constant:i16<-32768>, Constant:i16<-32768>
t4: v4i32 = llvm.x86.sse2.pmadd.wd TargetConstant:i64<13393>, t3, t3
t62: v4i32 = freeze t4
X86 can perhaps model that intrinsic as not creating poison. But a more general solution could be to let SelectionDAG::computeKnownBits look through FREEZE (and then I guess the same should be done for SelectionDAG::ComputeNumSignBits as well).
Btw, it is a bit weird that SelectionDAG::computeKnownBits is handling AssertZext as if the upper bits are known to be zero. At least as long as we say that AssertZext may result in a poison value that is violating the asserted condition.
But if we consider these three examples, then I think computeKnownBits on %3 should give the same result, right?
%1 = AssertZext i32 %0, i1
%2 = lshr i32 %1, 1
%3 = freeze i32 %2

%1 = AssertZext i32 %0, i1
%2 = freeze i32 %1
%3 = lshr i32 %2, 1

%1 = AssertZext i32 %0, i1
%3 = lshr i32 %1, 1
All computeKnownBits() style APIs return "known or poison" results. It's not possible to look through freeze.
> All computeKnownBits() style APIs return "known or poison" results. It's not possible to look through freeze.
Right. Hm. And pushing freeze through operations aggressively/early might block other simplifications as in the example with lshr (which is why we currently avoid it for SRA and SRL).
Another idea then is to let DAGCombiner::visitFREEZE (or getNode for FREEZE) constant fold the operand (checking if computeKnownBits returns a constant). This to avoid that we mess up constant folding by pushing the freeze through an operation that hasn't been constant folded yet.
Or is that also incorrect given that computeKnownBits has returned "known or poison"? If any constant folded value that has been derived via computeKnownBits is a possible poison value, then how can SelectionDAG::isGuaranteedNotToBeUndefOrPoison return true for constants (we can't know if a constant is a possible poison value)?
Anyway, maybe we can rely on intrinsic calls like the one here having been constant folded already before ISel. And this test case isn't important to optimize (it is just a regression test for something that used to hit an infinite loop).
> Another idea then is to let DAGCombiner::visitFREEZE (or getNode for FREEZE) constant fold the operand (checking if computeKnownBits returns a constant). This to avoid that we mess up constant folding by pushing the freeze through an operation that hasn't been constant folded yet. Or is that also incorrect given that computeKnownBits has returned "known or poison"? If any constant folded value that has been derived via computeKnownBits is a possible poison value, then how can SelectionDAG::isGuaranteedNotToBeUndefOrPoison return true for constants (we can't know if a constant is a possible poison value)?
To answer that myself, constant folding the freeze should be ok. But just looking through FREEZE in computeKnownBits etc is not valid.
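A small self-contained C++ model (not LLVM code) may help illustrate why a "known or poison" result cannot simply be carried across a freeze. It assumes the reading discussed above, where a violated AssertZext yields poison. `std::nullopt` models poison, and `assertZextI1`, `knownZeroOrPoison`, and the `Pick` argument for the value freeze settles on are hypothetical names introduced for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

using Val = std::optional<uint32_t>; // std::nullopt models poison

// AssertZext i32 %x, i1: poison when any bit above bit 0 is set
// (assumption: a violated assertion produces poison).
Val assertZextI1(uint32_t X) {
  return (X & ~1u) == 0 ? Val(X) : std::nullopt;
}

// lshr propagates poison.
Val lshr(Val V, unsigned Amt) {
  assert(Amt < 32 && "shift amount out of range for i32");
  return V ? Val(*V >> Amt) : std::nullopt;
}

// freeze: a poison input becomes some arbitrary but fixed value (Pick).
uint32_t freeze(Val V, uint32_t Pick) { return V ? *V : Pick; }

// A "known or poison" claim: the bits in Mask are zero, or V is poison.
bool knownZeroOrPoison(Val V, uint32_t Mask) {
  return !V || (*V & Mask) == 0;
}
```

For `%2 = lshr (AssertZext %0, i1), 1` the claim "all bits are zero, or the value is poison" holds for every input. If `%0` violates the assertion, though, `freeze %2` may settle on any value, so "all bits are zero" no longer holds for the frozen result, and looking through the freeze would be wrong. Freezing first and shifting afterwards only guarantees that the top bit is zero.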
I'll get X86ISD VPMADDUBSW/VPMADDWD handling added to canCreateUndefOrPoisonForTargetNode shortly
So can I land this now? Branch out for llvm 19 is tomorrow.
PMADD guarantee inbounds/saturated ext-multiply-add results Test to help with regression identified on #84924
…on't create poison Help with regression identified on #84924
Could you please land #94492 first and then rebase?
Allow pushing freeze through SETCC and SELECT_CC even if there are multiple "maybe poison" operands. In the past we have limited it to a single "maybe poison" operand, but it seems profitable to also allow the multiple operand scenario. One goal here is to avoid some regressions seen in review of llvm#84924 when solving the select->and miscompiles described in llvm#84653
Just like for regular IR we need to treat SELECT as conditionally blocking poison. So (unless the condition itself is poison) the result is only poison if the selected true/false value is poison. Thus, when doing DAG combines that turn SELECT into arithmetic/logical operations (e.g. AND/OR) we need to make sure that the new operations aren't more poisonous. One way to do that is to use FREEZE to make sure the operands aren't poison. This patch aims at fixing the kind of miscompiles reported in llvm#84653 and llvm#85190. The solution is to make sure that we insert FREEZE, if needed to make the fold sound, when using the foldBoolSelectToLogic and foldVSelectToSignBitSplatMask DAG combines.
LGTM
Thanks!
…ics don't create poison Fix regression introduced by #84924
Just like for regular IR we need to treat SELECT as conditionally
blocking poison in SelectionDAG. So (unless the condition itself is
poison) the result is only poison if the selected true/false value is
poison.
Thus, when doing DAG combines that turn SELECT into arithmetic/logical
operations (e.g. AND/OR) we need to make sure that the new operations
aren't more poisonous. One way to do that is to use FREEZE to make
sure the operands aren't poison.
This patch aims at fixing the kind of miscompiles reported in
#84653 and #85190.
Solution is to make sure that we insert FREEZE, if needed to make
the fold sound, when using the foldBoolSelectToLogic and
foldVSelectToSignBitSplatMask DAG combines.
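
The poison argument above can be checked with a small self-contained C++ model (not LLVM code). It is a sketch under stated assumptions: `std::nullopt` stands in for poison on an i1 value, and `selectOp`, `orOp`, `freezeOp`, `foldUnsound`, `foldWithFreeze`, and the `Pick` argument for the value freeze settles on are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <optional>

using I1 = std::optional<bool>; // std::nullopt models poison

// SELECT only propagates poison from the condition and the chosen operand.
I1 selectOp(I1 Cond, I1 T, I1 F) {
  if (!Cond)
    return std::nullopt;
  return *Cond ? T : F;
}

// OR is poison as soon as either operand is poison.
I1 orOp(I1 A, I1 B) {
  if (!A || !B)
    return std::nullopt;
  return *A || *B;
}

// FREEZE turns poison into an arbitrary but fixed value (Pick).
I1 freezeOp(I1 V, bool Pick) { return V ? V : I1(Pick); }

// select Cond, 1, F --> or Cond, F
I1 foldUnsound(I1 Cond, I1 F) { return orOp(Cond, F); }

// select Cond, 1, F --> or Cond, freeze(F)
I1 foldWithFreeze(I1 Cond, I1 F, bool Pick) {
  I1 Frozen = freezeOp(F, Pick);
  assert(Frozen.has_value() && "freeze never yields poison");
  return orOp(Cond, Frozen);
}
```

With `Cond = true` and a poison `F`, the select yields `true`, the unfrozen OR yields poison, and the frozen OR yields `true` for either choice of `Pick`. With `Cond = false` and a poison `F` the select itself is poison, so any value the frozen form produces is an allowed refinement.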