Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LICM] Use DomTreeUpdater version of SplitBlockPredecessors, nfc #107190

Merged
merged 12 commits into from
Sep 30, 2024

Conversation

caojoshua
Copy link
Contributor

@caojoshua caojoshua commented Sep 4, 2024

The DominatorTree version is marked for deprecation, so we use the
DomTreeUpdater version. We also update sinkRegion() to iterate over
basic blocks instead of DomTreeNodes. The loop body calls
SplitBlockPredecessors. The DTU version calls
DomTreeUpdater::apply_updates(), which may call DominatorTree::reset().
This invalidates the worklist of DomTreeNodes to iterate over.

The added test would crash if we iterated over DomTreeNodes and called
the SplitBlockPredecessors DTU version.

When SplitBlockPredecessors uses DomTreeUpdater, it always assumes that
the newly created BB dominates the old BB. This is not true if the
consumer only passes in a subset of producers. For example:

```
bb0:
  br label %bb2
bb1:
  br label %bb2
bb2: ...
```

If the consumer calls SplitBlockPredecessors on bb2 with only bb0 as the
predecessor:

```
bb0:
  br label %new_bb
new_bb:
  br label %bb2
bb1:
  br label %bb2
bb2: ...
```

In this case, new_bb does NOT dominate bb2 because there is an
alternative path to bb2 from bb1. We fix this by only marking the new BB
as dominating the old BB(bb2) if the old BB has a single predecessor.

SplitBlockPredecessors with DominatorTree is deprecated in favor of
using the function with DomTreeUpdater. We move over a consumer in
LICM::splitPredecessorsOfLoopExit() in this patch, which would have
triggered the error.
@caojoshua
Copy link
Contributor Author

Found this issue while working on #97196

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 4, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Joshua Cao (caojoshua)

Changes

When SplitBlockPredecessors uses DomTreeUpdater, it always assumes that the newly created BB dominates the old BB. This is not true if the consumer only passes in a subset of producers. For example:

bb0:
  br label %bb2
bb1:
  br label %bb2
bb2: ...

If the consumer calls SplitBlockPredecessors on bb2 with only bb0 as the predecessor:

bb0:
  br label %new_bb
new_bb:
  br label %bb2
bb1:
  br label %bb2
bb2: ...

In this case, new_bb does NOT dominate bb2 because there is an alternative path to bb2 from bb1. We fix this by only marking the new BB as dominating the old BB(bb2) if the old BB has a single predecessor.

SplitBlockPredecessors with DominatorTree is deprecated in favor of using the function with DomTreeUpdater. We move over a consumer in LICM::splitPredecessorsOfLoopExit() in this patch, which would have triggered the error.


Full diff: https://github.com/llvm/llvm-project/pull/107190.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LICM.cpp (+3-1)
  • (modified) llvm/lib/Transforms/Utils/BasicBlockUtils.cpp (+4-1)
  • (modified) llvm/unittests/Transforms/Utils/BasicBlockUtilsTest.cpp (+71)
diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp
index 86c7dceffc5245..d16a2a5fcad3a9 100644
--- a/llvm/lib/Transforms/Scalar/LICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LICM.cpp
@@ -44,6 +44,7 @@
 #include "llvm/Analysis/AliasSetTracker.h"
 #include "llvm/Analysis/AssumptionCache.h"
 #include "llvm/Analysis/CaptureTracking.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
 #include "llvm/Analysis/GuardUtils.h"
 #include "llvm/Analysis/LazyBlockFrequencyInfo.h"
 #include "llvm/Analysis/Loads.h"
@@ -1603,13 +1604,14 @@ static void splitPredecessorsOfLoopExit(PHINode *PN, DominatorTree *DT,
   //
   const auto &BlockColors = SafetyInfo->getBlockColors();
   SmallSetVector<BasicBlock *, 8> PredBBs(pred_begin(ExitBB), pred_end(ExitBB));
+  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Lazy);
   while (!PredBBs.empty()) {
     BasicBlock *PredBB = *PredBBs.begin();
     assert(CurLoop->contains(PredBB) &&
            "Expect all predecessors are in the loop");
     if (PN->getBasicBlockIndex(PredBB) >= 0) {
       BasicBlock *NewPred = SplitBlockPredecessors(
-          ExitBB, PredBB, ".split.loop.exit", DT, LI, MSSAU, true);
+          ExitBB, PredBB, ".split.loop.exit", &DTU, LI, MSSAU, true);
       // Since we do not allow splitting EH-block with BlockColors in
       // canSplitPredecessors(), we can simply assign predecessor's color to
       // the new block.
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index 4144c7993b7e42..24d9d023211499 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -1160,7 +1160,10 @@ static void UpdateAnalysisInformation(BasicBlock *OldBB, BasicBlock *NewBB,
       // Split block expects NewBB to have a non-empty set of predecessors.
       SmallVector<DominatorTree::UpdateType, 8> Updates;
       SmallPtrSet<BasicBlock *, 8> UniquePreds;
-      Updates.push_back({DominatorTree::Insert, NewBB, OldBB});
+      if (OldBB->getSinglePredecessor()) {
+        assert(OldBB->getSinglePredecessor() == NewBB);
+        Updates.push_back({DominatorTree::Insert, NewBB, OldBB});
+      }
       Updates.reserve(Updates.size() + 2 * Preds.size());
       for (auto *Pred : Preds)
         if (UniquePreds.insert(Pred).second) {
diff --git a/llvm/unittests/Transforms/Utils/BasicBlockUtilsTest.cpp b/llvm/unittests/Transforms/Utils/BasicBlockUtilsTest.cpp
index 56692cf25b7972..8b452e852f4565 100644
--- a/llvm/unittests/Transforms/Utils/BasicBlockUtilsTest.cpp
+++ b/llvm/unittests/Transforms/Utils/BasicBlockUtilsTest.cpp
@@ -414,6 +414,77 @@ define i32 @basic_func(i1 %cond) {
   EXPECT_TRUE(DT.verify());
 }
 
+TEST(BasicBlockUtils, SplitBlockPredecessorsLinear) {
+  LLVMContext C;
+  std::unique_ptr<Module> M = parseIR(C, R"IR(
+define i32 @basic_func() {
+entry:
+  br label %bb0
+bb0:
+  ret i32 0
+}
+)IR");
+  Function *F = M->getFunction("basic_func");
+  DominatorTree DT(*F);
+  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
+
+  BasicBlock *Entry = &F->getEntryBlock();
+  BasicBlock *BB0 = getBasicBlockByName(*F, "bb0");
+  SplitBlockPredecessors(BB0, {Entry}, "split.entry", &DTU);
+  EXPECT_TRUE(DT.verify());
+}
+
+TEST(BasicBlockUtils, SplitBlockPredecessorsSubsetOfPreds) {
+  LLVMContext C;
+  std::unique_ptr<Module> M = parseIR(C, R"IR(
+define i32 @basic_func(i1 %cond) {
+entry:
+  br i1 %cond, label %bb0, label %bb1
+bb0:
+  br label %bb2
+bb1:
+  br label %bb2
+bb2:
+  %phi = phi i32 [ 0, %bb0 ], [ 1, %bb1 ]
+  ret i32 %phi
+}
+)IR");
+  Function *F = M->getFunction("basic_func");
+  DominatorTree DT(*F);
+  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
+
+  BasicBlock *BB0 = getBasicBlockByName(*F, "bb0");
+  BasicBlock *BB2 = getBasicBlockByName(*F, "bb2");
+  SplitBlockPredecessors(BB2, {BB0}, "split.entry", &DTU);
+  EXPECT_TRUE(DT.verify());
+}
+
+TEST(BasicBlockUtils, SplitBlockPredecessorsAllPreds) {
+  LLVMContext C;
+  std::unique_ptr<Module> M = parseIR(C, R"IR(
+define i32 @basic_func(i1 %cond) {
+entry:
+  br i1 %cond, label %bb0, label %bb1
+bb0:
+  br label %bb2
+bb1:
+  br label %bb2
+bb2:
+  %phi = phi i32 [ 0, %bb0 ], [ 1, %bb1 ]
+  ret i32 %phi
+}
+)IR");
+  Function *F = M->getFunction("basic_func");
+  DominatorTree DT(*F);
+  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
+
+  BasicBlock *BB0 = getBasicBlockByName(*F, "bb0");
+  BasicBlock *BB1 = getBasicBlockByName(*F, "bb1");
+  BasicBlock *BB2 = getBasicBlockByName(*F, "bb2");
+  SplitBlockPredecessors(BB2, {BB0, BB1}, "split.entry", &DTU);
+  EXPECT_TRUE(DT.verify());
+}
+
 TEST(BasicBlockUtils, SplitCriticalEdge) {
   LLVMContext C;
   std::unique_ptr<Module> M = parseIR(C, R"IR(

if (OldBB->getSinglePredecessor()) {
assert(OldBB->getSinglePredecessor() == NewBB);
Updates.push_back({DominatorTree::Insert, NewBB, OldBB});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this fix. The updates here are just the CFG changes, and as far as I can tell we do indeed always insert a CFG edge from NewBB to OldBB, so it should be reflected in the updates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do always update a CFG edge. But that might not be a dominating edge. In the example above (bb2 is OldBB):

bb0:
  br label %new_bb
new_bb:
  br label %bb2
bb1:
  br label %bb2
bb2: ...

Without this patch, this would have added a edge from new_bb to bb2. But new_bb does not actually dominate bb2, since we can reach bb2 from another path bb1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DTU updates are about edges in the CFG, not the DT. The point of DTU is to determine the correct DT updates from CFG updates, so this kind of reasoning in the caller should never be necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, didn't know that about DTU. Actually just tested this and even if I comment out the if statement, it still passes the unit tests. However, if I use the built clang to compile anything, it runs into assertions in LICM. I'll look into it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK yeah this isn't right. The issue is actually in LICM. If I modify it to iterate over BBs instead of DomTreeNodes, it works as intended. I think the DomTreeNodes are getting modified at some point while modifying the DomTree. The if statement somehow avoids the issue.

I'm having trouble creating a smaller reproducer. But the full issue is in https://buildkite.com/llvm-project/github-pull-requests/builds/88168#01911957-5a9a-4e62-bda6-b2f031368134 (from #97196)

This reverts commit a5ec0c3.
The DominatorTree version is marked for deprecation, so we use the
DomTreeUpdater version. We also update sinkRegion() to iterate over
basic blocks instead of DomTreeNodes. The loop body calls
SplitBlockPredecessors. The DTU version calls
DomTreeUpdater::apply_updates(), which may call DominatorTree::reset().
This invalidates the worklist of DomTreeNodes to iterate over.

The added test would crash if we iterated over DomTreeNodes and called
the SplitBlockPredecessors DTU version.
@caojoshua caojoshua changed the title [BasicBlockUtils] Fix SplitBlockPredecessors incorrect dominator insert [LICM] Use DomTreeUpdater version of SplitBlockPredecessors, nfc Sep 23, 2024
@caojoshua
Copy link
Contributor Author

caojoshua commented Sep 23, 2024

I completed reworked this patch. I found the root cause of the bugs after modifying LICM to use the DTU version of SplitBlockPredecessors was that we were iterating over DomTreeNodes while updating the DomTree at the same time. I updated the main PR post.

The test case is pretty complicated and its not obvious why it triggers the issue. Its the smallest test case I could create following @nikic 's How to reduce LLVM crashes. The original case comes from RustDemangle.cpp.

Copy link

github-actions bot commented Sep 23, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

}

; uselistorder directives
uselistorder ptr @_ZNK4llvm16itanium_demangle12OutputBuffer18getCurrentPositionEv, { 1, 0 }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt the uselistorder is necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, removing this in latest commit

@@ -0,0 +1,889 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt -S -passes='cgscc(devirt<4>(inline,function<eager-inv;no-rerun>(early-cse<memssa>,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,loop-mssa(licm<allowspeculation>))))' < %s | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be able to get this down to one pass?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, not for this example. This is the best reduce_pipeline.py was able to do. In general this issue has been very hard to reproduce.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to do some manual pipeline cleanup after reduce_pipeline. Things like the cgscc and devirt wrapper are usually not needed, as are most pass options.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's usually also possible to use some at least partially inlined IR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I shortened this line quite a bit. There are still four passes and the devirt. Anything less does not reproduce the error.

; Tests that we should not iterate over DomTreeNodes and modify the DomTree at the same time.

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need triple or datalayout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remove these, and make LICM iterate over DomTreeNodes, it does not reproduce the crash.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I think I'd prefer just not having a test for this than having this test. Presumably this is making the stars align in exactly the right way to trigger a crash, but it's probably not going to still be doing that half a year form now when something, somewhere changes in a small way. If there's no way to make the issue more reliably reproducible (e.g. via an extra assert somewhere) then this test isn't adding much, and will likely be a PITA to update in the future for random changes across four passes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Its not really testing for the changes in this patch.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@caojoshua caojoshua merged commit 0bc9834 into llvm:main Sep 30, 2024
8 checks passed
@dtcxzyw
Copy link
Member

dtcxzyw commented Sep 30, 2024

I noticed that this patch caused a significant compilation time regression (~7%) when building php.
Reproducer: https://github.com/dtcxzyw/llvm-opt-benchmark/blob/main/bench/php/original/parse_date.ll
with commit 6f3c151:

perf stat -e instructions:u bin/opt -O3 ../../llvm-opt-benchmark/bench/php/original/parse_date.ll --disable-output

Performance counter stats for 'bin/opt -O3 ../../llvm-opt-benchmark/bench/php/original/parse_date.ll --disable-output':

    16,235,016,831      instructions:u                                                        

       2.118263921 seconds time elapsed

       2.101738000 seconds user
       0.015998000 seconds sys

with this patch (commit 0bc9834):

perf stat -e instructions:u bin/opt -O3 ../../llvm-opt-benchmark/bench/php/original/parse_date.ll --disable-output

 Performance counter stats for 'bin/opt -O3 ../../llvm-opt-benchmark/bench/php/original/parse_date.ll --disable-output':

    44,855,633,670      instructions:u                                                        

       5.125998136 seconds time elapsed

       5.106567000 seconds user
       0.019002000 seconds sys

See dtcxzyw/llvm-opt-benchmark#1399 for more details.
Can someone have a look?

@nikic
Copy link
Contributor

nikic commented Sep 30, 2024

I can't say I'm particularly surprised. Manual DT updates are going to be faster than generic DTU updates, especially in pathological cases. We've observed similar things e.g. with DT updates during unrolling, and ended up replacing DTU with manual updates to address it.

TBH this makes me think that replacing existing uses of SplitBlockPredecessors directly updating DT with DTU is not something we should pursue.

@caojoshua
Copy link
Contributor Author

I suppose the SplitBlockPredecessors DTU implementation is not that smart. If we have a lot of Preds we might have to re-evaluate the DomTree many times. It would be more efficient to just find the common dominator D and insert an edge from D to NewBB.

I think its better to not have separate APIs for DominatorTree and DomTreeUpdater. Maybe we can just call DTU->getDomTree()->splitBlock()?

xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
…m#107190)

The DominatorTree version is marked for deprecation, so we use the
DomTreeUpdater version. We also update sinkRegion() to iterate over
basic blocks instead of DomTreeNodes. The loop body calls
SplitBlockPredecessors. The DTU version calls
DomTreeUpdater::apply_updates(), which may call DominatorTree::reset().
This invalidates the worklist of DomTreeNodes to iterate over.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants