-
Notifications
You must be signed in to change notification settings - Fork 11.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LICM] Use DomTreeUpdater version of SplitBlockPredecessors, nfc #107190
Conversation
When SplitBlockPredecessors uses DomTreeUpdater, it always assumes that the newly created BB dominates the old BB. This is not true if the consumer only passes in a subset of producers. For example: ``` bb0: br label %bb2 bb1: br label %bb2 bb2: ... ``` If the consumer calls SplitBlockPredecessors on bb2 with only bb0 as the predecessor: ``` bb0: br label %new_bb new_bb: br label %bb2 bb1: br label %bb2 bb2: ... ``` In this case, new_bb does NOT dominate bb2 because there is an alternative path to bb2 from bb1. We fix this by only marking the new BB as dominating the old BB(bb2) if the old BB has a single predecessor. SplitBlockPredecessors with DominatorTree is deprecated in favor of using the function with DomTreeUpdater. We move over a consumer in LICM::splitPredecessorsOfLoopExit() in this patch, which would have triggered the error.
Found this issue while working on #97196 |
@llvm/pr-subscribers-llvm-transforms Author: Joshua Cao (caojoshua) ChangesWhen SplitBlockPredecessors uses DomTreeUpdater, it always assumes that the newly created BB dominates the old BB. This is not true if the consumer only passes in a subset of producers. For example:
If the consumer calls SplitBlockPredecessors on bb2 with only bb0 as the predecessor:
In this case, new_bb does NOT dominate bb2 because there is an alternative path to bb2 from bb1. We fix this by only marking the new BB as dominating the old BB(bb2) if the old BB has a single predecessor. SplitBlockPredecessors with DominatorTree is deprecated in favor of using the function with DomTreeUpdater. We move over a consumer in LICM::splitPredecessorsOfLoopExit() in this patch, which would have triggered the error. Full diff: https://github.com/llvm/llvm-project/pull/107190.diff 3 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp
index 86c7dceffc5245..d16a2a5fcad3a9 100644
--- a/llvm/lib/Transforms/Scalar/LICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LICM.cpp
@@ -44,6 +44,7 @@
#include "llvm/Analysis/AliasSetTracker.h"
#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/CaptureTracking.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
#include "llvm/Analysis/GuardUtils.h"
#include "llvm/Analysis/LazyBlockFrequencyInfo.h"
#include "llvm/Analysis/Loads.h"
@@ -1603,13 +1604,14 @@ static void splitPredecessorsOfLoopExit(PHINode *PN, DominatorTree *DT,
//
const auto &BlockColors = SafetyInfo->getBlockColors();
SmallSetVector<BasicBlock *, 8> PredBBs(pred_begin(ExitBB), pred_end(ExitBB));
+ DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Lazy);
while (!PredBBs.empty()) {
BasicBlock *PredBB = *PredBBs.begin();
assert(CurLoop->contains(PredBB) &&
"Expect all predecessors are in the loop");
if (PN->getBasicBlockIndex(PredBB) >= 0) {
BasicBlock *NewPred = SplitBlockPredecessors(
- ExitBB, PredBB, ".split.loop.exit", DT, LI, MSSAU, true);
+ ExitBB, PredBB, ".split.loop.exit", &DTU, LI, MSSAU, true);
// Since we do not allow splitting EH-block with BlockColors in
// canSplitPredecessors(), we can simply assign predecessor's color to
// the new block.
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index 4144c7993b7e42..24d9d023211499 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -1160,7 +1160,10 @@ static void UpdateAnalysisInformation(BasicBlock *OldBB, BasicBlock *NewBB,
// Split block expects NewBB to have a non-empty set of predecessors.
SmallVector<DominatorTree::UpdateType, 8> Updates;
SmallPtrSet<BasicBlock *, 8> UniquePreds;
- Updates.push_back({DominatorTree::Insert, NewBB, OldBB});
+ if (OldBB->getSinglePredecessor()) {
+ assert(OldBB->getSinglePredecessor() == NewBB);
+ Updates.push_back({DominatorTree::Insert, NewBB, OldBB});
+ }
Updates.reserve(Updates.size() + 2 * Preds.size());
for (auto *Pred : Preds)
if (UniquePreds.insert(Pred).second) {
diff --git a/llvm/unittests/Transforms/Utils/BasicBlockUtilsTest.cpp b/llvm/unittests/Transforms/Utils/BasicBlockUtilsTest.cpp
index 56692cf25b7972..8b452e852f4565 100644
--- a/llvm/unittests/Transforms/Utils/BasicBlockUtilsTest.cpp
+++ b/llvm/unittests/Transforms/Utils/BasicBlockUtilsTest.cpp
@@ -414,6 +414,77 @@ define i32 @basic_func(i1 %cond) {
EXPECT_TRUE(DT.verify());
}
+TEST(BasicBlockUtils, SplitBlockPredecessorsLinear) {
+ LLVMContext C;
+ std::unique_ptr<Module> M = parseIR(C, R"IR(
+define i32 @basic_func() {
+entry:
+ br label %bb0
+bb0:
+ ret i32 0
+}
+)IR");
+ Function *F = M->getFunction("basic_func");
+ DominatorTree DT(*F);
+ DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
+
+ BasicBlock *Entry = &F->getEntryBlock();
+ BasicBlock *BB0 = getBasicBlockByName(*F, "bb0");
+ SplitBlockPredecessors(BB0, {Entry}, "split.entry", &DTU);
+ EXPECT_TRUE(DT.verify());
+}
+
+TEST(BasicBlockUtils, SplitBlockPredecessorsSubsetOfPreds) {
+ LLVMContext C;
+ std::unique_ptr<Module> M = parseIR(C, R"IR(
+define i32 @basic_func(i1 %cond) {
+entry:
+ br i1 %cond, label %bb0, label %bb1
+bb0:
+ br label %bb2
+bb1:
+ br label %bb2
+bb2:
+ %phi = phi i32 [ 0, %bb0 ], [ 1, %bb1 ]
+ ret i32 %phi
+}
+)IR");
+ Function *F = M->getFunction("basic_func");
+ DominatorTree DT(*F);
+ DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
+
+ BasicBlock *BB0 = getBasicBlockByName(*F, "bb0");
+ BasicBlock *BB2 = getBasicBlockByName(*F, "bb2");
+ SplitBlockPredecessors(BB2, {BB0}, "split.entry", &DTU);
+ EXPECT_TRUE(DT.verify());
+}
+
+TEST(BasicBlockUtils, SplitBlockPredecessorsAllPreds) {
+ LLVMContext C;
+ std::unique_ptr<Module> M = parseIR(C, R"IR(
+define i32 @basic_func(i1 %cond) {
+entry:
+ br i1 %cond, label %bb0, label %bb1
+bb0:
+ br label %bb2
+bb1:
+ br label %bb2
+bb2:
+ %phi = phi i32 [ 0, %bb0 ], [ 1, %bb1 ]
+ ret i32 %phi
+}
+)IR");
+ Function *F = M->getFunction("basic_func");
+ DominatorTree DT(*F);
+ DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
+
+ BasicBlock *BB0 = getBasicBlockByName(*F, "bb0");
+ BasicBlock *BB1 = getBasicBlockByName(*F, "bb1");
+ BasicBlock *BB2 = getBasicBlockByName(*F, "bb2");
+ SplitBlockPredecessors(BB2, {BB0, BB1}, "split.entry", &DTU);
+ EXPECT_TRUE(DT.verify());
+}
+
TEST(BasicBlockUtils, SplitCriticalEdge) {
LLVMContext C;
std::unique_ptr<Module> M = parseIR(C, R"IR(
|
if (OldBB->getSinglePredecessor()) { | ||
assert(OldBB->getSinglePredecessor() == NewBB); | ||
Updates.push_back({DominatorTree::Insert, NewBB, OldBB}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand this fix. The updates here are just the CFG changes, and as far as I can tell we do indeed always insert a CFG edge from NewBB to OldBB, so it should be reflected in the updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we do always update a CFG edge. But that might not be a dominating edge. In the example above (bb2
is OldBB
):
bb0:
br label %new_bb
new_bb:
br label %bb2
bb1:
br label %bb2
bb2: ...
Without this patch, this would have added a edge from new_bb
to bb2
. But new_bb
does not actually dominate bb2
, since we can reach bb2
from another path bb1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DTU updates are about edges in the CFG, not the DT. The point of DTU is to determine the correct DT updates from CFG updates, so this kind of reasoning in the caller should never be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, didn't know that about DTU. Actually just tested this and even if I comment out the if statement, it still passes the unit tests. However, if I use the built clang to compile anything, it runs into assertions in LICM. I'll look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK yeah this isn't right. The issue is actually in LICM. If I modify it to iterate over BBs instead of DomTreeNodes, it works as intended. I think the DomTreeNodes are getting modified at some point while modifying the DomTree. The if statement somehow avoids the issue.
I'm having trouble creating a smaller reproducer. But the full issue is in https://buildkite.com/llvm-project/github-pull-requests/builds/88168#01911957-5a9a-4e62-bda6-b2f031368134 (from #97196)
This reverts commit a5ec0c3.
…tor insert" This reverts commit 0caf5b5.
The DominatorTree version is marked for deprecation, so we use the DomTreeUpdater version. We also update sinkRegion() to iterate over basic blocks instead of DomTreeNodes. The loop body calls SplitBlockPredecessors. The DTU version calls DomTreeUpdater::apply_updates(), which may call DominatorTree::reset(). This invalidates the worklist of DomTreeNodes to iterate over. The added test would crash if we iterated over DomTreeNodes and called the SplitBlockPredecessors DTU version.
I completed reworked this patch. I found the root cause of the bugs after modifying LICM to use the DTU version of SplitBlockPredecessors was that we were iterating over DomTreeNodes while updating the DomTree at the same time. I updated the main PR post. The test case is pretty complicated and its not obvious why it triggers the issue. Its the smallest test case I could create following @nikic 's How to reduce LLVM crashes. The original case comes from RustDemangle.cpp. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
} | ||
|
||
; uselistorder directives | ||
uselistorder ptr @_ZNK4llvm16itanium_demangle12OutputBuffer18getCurrentPositionEv, { 1, 0 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt the uselistorder is necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, removing this in latest commit
@@ -0,0 +1,889 @@ | |||
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5 | |||
; RUN: opt -S -passes='cgscc(devirt<4>(inline,function<eager-inv;no-rerun>(early-cse<memssa>,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,loop-mssa(licm<allowspeculation>))))' < %s | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be able to get this down to one pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, not for this example. This is the best reduce_pipeline.py
was able to do. In general this issue has been very hard to reproduce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to do some manual pipeline cleanup after reduce_pipeline. Things like the cgscc and devirt wrapper are usually not needed, as are most pass options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's usually also possible to use some at least partially inlined IR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I shortened this line quite a bit. There are still four passes and the devirt. Anything less does not reproduce the error.
; Tests that we should not iterate over DomTreeNodes and modify the DomTree at the same time. | ||
|
||
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128" | ||
target triple = "x86_64-unknown-linux-gnu" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't need triple or datalayout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remove these, and make LICM iterate over DomTreeNodes, it does not reproduce the crash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I think I'd prefer just not having a test for this than having this test. Presumably this is making the stars align in exactly the right way to trigger a crash, but it's probably not going to still be doing that half a year form now when something, somewhere changes in a small way. If there's no way to make the issue more reliably reproducible (e.g. via an extra assert somewhere) then this test isn't adding much, and will likely be a PITA to update in the future for random changes across four passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Its not really testing for the changes in this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I noticed that this patch caused a significant compilation time regression (~7%) when building php.
with this patch (commit 0bc9834):
See dtcxzyw/llvm-opt-benchmark#1399 for more details. |
I can't say I'm particularly surprised. Manual DT updates are going to be faster than generic DTU updates, especially in pathological cases. We've observed similar things e.g. with DT updates during unrolling, and ended up replacing DTU with manual updates to address it. TBH this makes me think that replacing existing uses of SplitBlockPredecessors directly updating DT with DTU is not something we should pursue. |
I suppose the SplitBlockPredecessors DTU implementation is not that smart. If we have a lot of I think its better to not have separate APIs for DominatorTree and DomTreeUpdater. Maybe we can just call |
…m#107190) The DominatorTree version is marked for deprecation, so we use the DomTreeUpdater version. We also update sinkRegion() to iterate over basic blocks instead of DomTreeNodes. The loop body calls SplitBlockPredecessors. The DTU version calls DomTreeUpdater::apply_updates(), which may call DominatorTree::reset(). This invalidates the worklist of DomTreeNodes to iterate over.
The DominatorTree version is marked for deprecation, so we use the
DomTreeUpdater version. We also update sinkRegion() to iterate over
basic blocks instead of DomTreeNodes. The loop body calls
SplitBlockPredecessors. The DTU version calls
DomTreeUpdater::apply_updates(), which may call DominatorTree::reset().
This invalidates the worklist of DomTreeNodes to iterate over.
The added test would crash if we iterated over DomTreeNodes and called
the SplitBlockPredecessors DTU version.