[LICM] Use DomTreeUpdater version of SplitBlockPredecessors, nfc #107190

caojoshua · 2024-09-04T06:31:34Z

The DominatorTree version is marked for deprecation, so we use the
DomTreeUpdater version. We also update sinkRegion() to iterate over
basic blocks instead of DomTreeNodes. The loop body calls
SplitBlockPredecessors. The DTU version calls
DomTreeUpdater::apply_updates(), which may call DominatorTree::reset().
This invalidates the worklist of DomTreeNodes to iterate over.

The added test would crash if we iterated over DomTreeNodes and called
the SplitBlockPredecessors DTU version.

When SplitBlockPredecessors uses DomTreeUpdater, it always assumes that the newly created BB dominates the old BB. This is not true if the consumer only passes in a subset of producers. For example: ``` bb0: br label %bb2 bb1: br label %bb2 bb2: ... ``` If the consumer calls SplitBlockPredecessors on bb2 with only bb0 as the predecessor: ``` bb0: br label %new_bb new_bb: br label %bb2 bb1: br label %bb2 bb2: ... ``` In this case, new_bb does NOT dominate bb2 because there is an alternative path to bb2 from bb1. We fix this by only marking the new BB as dominating the old BB(bb2) if the old BB has a single predecessor. SplitBlockPredecessors with DominatorTree is deprecated in favor of using the function with DomTreeUpdater. We move over a consumer in LICM::splitPredecessorsOfLoopExit() in this patch, which would have triggered the error.

caojoshua · 2024-09-04T06:31:54Z

Found this issue while working on #97196

llvmbot · 2024-09-04T06:32:04Z

@llvm/pr-subscribers-llvm-transforms

Author: Joshua Cao (caojoshua)

Changes

When SplitBlockPredecessors uses DomTreeUpdater, it always assumes that the newly created BB dominates the old BB. This is not true if the consumer only passes in a subset of producers. For example:

bb0:
  br label %bb2
bb1:
  br label %bb2
bb2: ...

If the consumer calls SplitBlockPredecessors on bb2 with only bb0 as the predecessor:

bb0:
  br label %new_bb
new_bb:
  br label %bb2
bb1:
  br label %bb2
bb2: ...

In this case, new_bb does NOT dominate bb2 because there is an alternative path to bb2 from bb1. We fix this by only marking the new BB as dominating the old BB(bb2) if the old BB has a single predecessor.

SplitBlockPredecessors with DominatorTree is deprecated in favor of using the function with DomTreeUpdater. We move over a consumer in LICM::splitPredecessorsOfLoopExit() in this patch, which would have triggered the error.

Full diff: https://github.com/llvm/llvm-project/pull/107190.diff

3 Files Affected:

(modified) llvm/lib/Transforms/Scalar/LICM.cpp (+3-1)
(modified) llvm/lib/Transforms/Utils/BasicBlockUtils.cpp (+4-1)
(modified) llvm/unittests/Transforms/Utils/BasicBlockUtilsTest.cpp (+71)

diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp
index 86c7dceffc5245..d16a2a5fcad3a9 100644
--- a/llvm/lib/Transforms/Scalar/LICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LICM.cpp
@@ -44,6 +44,7 @@
 #include "llvm/Analysis/AliasSetTracker.h"
 #include "llvm/Analysis/AssumptionCache.h"
 #include "llvm/Analysis/CaptureTracking.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
 #include "llvm/Analysis/GuardUtils.h"
 #include "llvm/Analysis/LazyBlockFrequencyInfo.h"
 #include "llvm/Analysis/Loads.h"
@@ -1603,13 +1604,14 @@ static void splitPredecessorsOfLoopExit(PHINode *PN, DominatorTree *DT,
   //
   const auto &BlockColors = SafetyInfo->getBlockColors();
   SmallSetVector<BasicBlock *, 8> PredBBs(pred_begin(ExitBB), pred_end(ExitBB));
+  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Lazy);
   while (!PredBBs.empty()) {
     BasicBlock *PredBB = *PredBBs.begin();
     assert(CurLoop->contains(PredBB) &&
            "Expect all predecessors are in the loop");
     if (PN->getBasicBlockIndex(PredBB) >= 0) {
       BasicBlock *NewPred = SplitBlockPredecessors(
-          ExitBB, PredBB, ".split.loop.exit", DT, LI, MSSAU, true);
+          ExitBB, PredBB, ".split.loop.exit", &DTU, LI, MSSAU, true);
       // Since we do not allow splitting EH-block with BlockColors in
       // canSplitPredecessors(), we can simply assign predecessor's color to
       // the new block.
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index 4144c7993b7e42..24d9d023211499 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -1160,7 +1160,10 @@ static void UpdateAnalysisInformation(BasicBlock *OldBB, BasicBlock *NewBB,
       // Split block expects NewBB to have a non-empty set of predecessors.
       SmallVector<DominatorTree::UpdateType, 8> Updates;
       SmallPtrSet<BasicBlock *, 8> UniquePreds;
-      Updates.push_back({DominatorTree::Insert, NewBB, OldBB});
+      if (OldBB->getSinglePredecessor()) {
+        assert(OldBB->getSinglePredecessor() == NewBB);
+        Updates.push_back({DominatorTree::Insert, NewBB, OldBB});
+      }
       Updates.reserve(Updates.size() + 2 * Preds.size());
       for (auto *Pred : Preds)
         if (UniquePreds.insert(Pred).second) {
diff --git a/llvm/unittests/Transforms/Utils/BasicBlockUtilsTest.cpp b/llvm/unittests/Transforms/Utils/BasicBlockUtilsTest.cpp
index 56692cf25b7972..8b452e852f4565 100644
--- a/llvm/unittests/Transforms/Utils/BasicBlockUtilsTest.cpp
+++ b/llvm/unittests/Transforms/Utils/BasicBlockUtilsTest.cpp
@@ -414,6 +414,77 @@ define i32 @basic_func(i1 %cond) {
   EXPECT_TRUE(DT.verify());
 }
 
+TEST(BasicBlockUtils, SplitBlockPredecessorsLinear) {
+  LLVMContext C;
+  std::unique_ptr<Module> M = parseIR(C, R"IR(
+define i32 @basic_func() {
+entry:
+  br label %bb0
+bb0:
+  ret i32 0
+}
+)IR");
+  Function *F = M->getFunction("basic_func");
+  DominatorTree DT(*F);
+  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
+
+  BasicBlock *Entry = &F->getEntryBlock();
+  BasicBlock *BB0 = getBasicBlockByName(*F, "bb0");
+  SplitBlockPredecessors(BB0, {Entry}, "split.entry", &DTU);
+  EXPECT_TRUE(DT.verify());
+}
+
+TEST(BasicBlockUtils, SplitBlockPredecessorsSubsetOfPreds) {
+  LLVMContext C;
+  std::unique_ptr<Module> M = parseIR(C, R"IR(
+define i32 @basic_func(i1 %cond) {
+entry:
+  br i1 %cond, label %bb0, label %bb1
+bb0:
+  br label %bb2
+bb1:
+  br label %bb2
+bb2:
+  %phi = phi i32 [ 0, %bb0 ], [ 1, %bb1 ]
+  ret i32 %phi
+}
+)IR");
+  Function *F = M->getFunction("basic_func");
+  DominatorTree DT(*F);
+  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
+
+  BasicBlock *BB0 = getBasicBlockByName(*F, "bb0");
+  BasicBlock *BB2 = getBasicBlockByName(*F, "bb2");
+  SplitBlockPredecessors(BB2, {BB0}, "split.entry", &DTU);
+  EXPECT_TRUE(DT.verify());
+}
+
+TEST(BasicBlockUtils, SplitBlockPredecessorsAllPreds) {
+  LLVMContext C;
+  std::unique_ptr<Module> M = parseIR(C, R"IR(
+define i32 @basic_func(i1 %cond) {
+entry:
+  br i1 %cond, label %bb0, label %bb1
+bb0:
+  br label %bb2
+bb1:
+  br label %bb2
+bb2:
+  %phi = phi i32 [ 0, %bb0 ], [ 1, %bb1 ]
+  ret i32 %phi
+}
+)IR");
+  Function *F = M->getFunction("basic_func");
+  DominatorTree DT(*F);
+  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
+
+  BasicBlock *BB0 = getBasicBlockByName(*F, "bb0");
+  BasicBlock *BB1 = getBasicBlockByName(*F, "bb1");
+  BasicBlock *BB2 = getBasicBlockByName(*F, "bb2");
+  SplitBlockPredecessors(BB2, {BB0, BB1}, "split.entry", &DTU);
+  EXPECT_TRUE(DT.verify());
+}
+
 TEST(BasicBlockUtils, SplitCriticalEdge) {
   LLVMContext C;
   std::unique_ptr<Module> M = parseIR(C, R"IR(

nikic · 2024-09-04T07:16:22Z

llvm/lib/Transforms/Utils/BasicBlockUtils.cpp

+      if (OldBB->getSinglePredecessor()) {
+        assert(OldBB->getSinglePredecessor() == NewBB);
+        Updates.push_back({DominatorTree::Insert, NewBB, OldBB});
+      }


I don't really understand this fix. The updates here are just the CFG changes, and as far as I can tell we do indeed always insert a CFG edge from NewBB to OldBB, so it should be reflected in the updates.

Yes, we do always update a CFG edge. But that might not be a dominating edge. In the example above (bb2 is OldBB):

bb0: br label %new_bb new_bb: br label %bb2 bb1: br label %bb2 bb2: ...

Without this patch, this would have added a edge from new_bb to bb2. But new_bb does not actually dominate bb2, since we can reach bb2 from another path bb1

DTU updates are about edges in the CFG, not the DT. The point of DTU is to determine the correct DT updates from CFG updates, so this kind of reasoning in the caller should never be necessary.

OK, didn't know that about DTU. Actually just tested this and even if I comment out the if statement, it still passes the unit tests. However, if I use the built clang to compile anything, it runs into assertions in LICM. I'll look into it.

OK yeah this isn't right. The issue is actually in LICM. If I modify it to iterate over BBs instead of DomTreeNodes, it works as intended. I think the DomTreeNodes are getting modified at some point while modifying the DomTree. The if statement somehow avoids the issue.

I'm having trouble creating a smaller reproducer. But the full issue is in https://buildkite.com/llvm-project/github-pull-requests/builds/88168#01911957-5a9a-4e62-bda6-b2f031368134 (from #97196)

This reverts commit a5ec0c3.

…tor insert" This reverts commit 0caf5b5.

The DominatorTree version is marked for deprecation, so we use the DomTreeUpdater version. We also update sinkRegion() to iterate over basic blocks instead of DomTreeNodes. The loop body calls SplitBlockPredecessors. The DTU version calls DomTreeUpdater::apply_updates(), which may call DominatorTree::reset(). This invalidates the worklist of DomTreeNodes to iterate over. The added test would crash if we iterated over DomTreeNodes and called the SplitBlockPredecessors DTU version.

caojoshua · 2024-09-23T08:44:26Z

I completed reworked this patch. I found the root cause of the bugs after modifying LICM to use the DTU version of SplitBlockPredecessors was that we were iterating over DomTreeNodes while updating the DomTree at the same time. I updated the main PR post.

The test case is pretty complicated and its not obvious why it triggers the issue. Its the smallest test case I could create following @nikic 's How to reduce LLVM crashes. The original case comes from RustDemangle.cpp.

github-actions · 2024-09-23T08:46:28Z

✅ With the latest revision this PR passed the C/C++ code formatter.

arsenm · 2024-09-23T11:34:17Z

llvm/test/Transforms/LICM/iterate-over-dom-nodes.ll

+}
+
+; uselistorder directives
+uselistorder ptr @_ZNK4llvm16itanium_demangle12OutputBuffer18getCurrentPositionEv, { 1, 0 }


I doubt the uselistorder is necessary

Yes, removing this in latest commit

arsenm · 2024-09-23T11:34:44Z

llvm/test/Transforms/LICM/iterate-over-dom-nodes.ll

@@ -0,0 +1,889 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes='cgscc(devirt<4>(inline,function<eager-inv;no-rerun>(early-cse<memssa>,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,loop-mssa(licm<allowspeculation>))))' < %s | FileCheck %s


Should be able to get this down to one pass?

I don't think so, not for this example. This is the best reduce_pipeline.py was able to do. In general this issue has been very hard to reproduce.

You need to do some manual pipeline cleanup after reduce_pipeline. Things like the cgscc and devirt wrapper are usually not needed, as are most pass options.

It's usually also possible to use some at least partially inlined IR.

I shortened this line quite a bit. There are still four passes and the devirt. Anything less does not reproduce the error.

arsenm · 2024-09-23T11:34:54Z

llvm/test/Transforms/LICM/iterate-over-dom-nodes.ll

+; Tests that we should not iterate over DomTreeNodes and modify the DomTree at the same time.
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"


Shouldn't need triple or datalayout

If I remove these, and make LICM iterate over DomTreeNodes, it does not reproduce the crash.

nikic · 2024-09-25T08:07:16Z

llvm/test/Transforms/LICM/iterate-over-dom-nodes.ll

TBH, I think I'd prefer just not having a test for this than having this test. Presumably this is making the stars align in exactly the right way to trigger a crash, but it's probably not going to still be doing that half a year form now when something, somewhere changes in a small way. If there's no way to make the issue more reliably reproducible (e.g. via an extra assert somewhere) then this test isn't adding much, and will likely be a PITA to update in the future for random changes across four passes.

I agree. Its not really testing for the changes in this patch.

nikic

LGTM

dtcxzyw · 2024-09-30T16:00:03Z

I noticed that this patch caused a significant compilation time regression (~7%) when building php.
Reproducer: https://github.com/dtcxzyw/llvm-opt-benchmark/blob/main/bench/php/original/parse_date.ll
with commit 6f3c151:

perf stat -e instructions:u bin/opt -O3 ../../llvm-opt-benchmark/bench/php/original/parse_date.ll --disable-output

Performance counter stats for 'bin/opt -O3 ../../llvm-opt-benchmark/bench/php/original/parse_date.ll --disable-output':

    16,235,016,831      instructions:u                                                        

       2.118263921 seconds time elapsed

       2.101738000 seconds user
       0.015998000 seconds sys

with this patch (commit 0bc9834):

perf stat -e instructions:u bin/opt -O3 ../../llvm-opt-benchmark/bench/php/original/parse_date.ll --disable-output

 Performance counter stats for 'bin/opt -O3 ../../llvm-opt-benchmark/bench/php/original/parse_date.ll --disable-output':

    44,855,633,670      instructions:u                                                        

       5.125998136 seconds time elapsed

       5.106567000 seconds user
       0.019002000 seconds sys

See dtcxzyw/llvm-opt-benchmark#1399 for more details.
Can someone have a look?

nikic · 2024-09-30T16:14:42Z

I can't say I'm particularly surprised. Manual DT updates are going to be faster than generic DTU updates, especially in pathological cases. We've observed similar things e.g. with DT updates during unrolling, and ended up replacing DTU with manual updates to address it.

TBH this makes me think that replacing existing uses of SplitBlockPredecessors directly updating DT with DTU is not something we should pursue.

caojoshua · 2024-10-03T08:52:09Z

I suppose the SplitBlockPredecessors DTU implementation is not that smart. If we have a lot of Preds we might have to re-evaluate the DomTree many times. It would be more efficient to just find the common dominator D and insert an edge from D to NewBB.

I think its better to not have separate APIs for DominatorTree and DomTreeUpdater. Maybe we can just call DTU->getDomTree()->splitBlock()?

…m#107190) The DominatorTree version is marked for deprecation, so we use the DomTreeUpdater version. We also update sinkRegion() to iterate over basic blocks instead of DomTreeNodes. The loop body calls SplitBlockPredecessors. The DTU version calls DomTreeUpdater::apply_updates(), which may call DominatorTree::reset(). This invalidates the worklist of DomTreeNodes to iterate over.

caojoshua requested review from arsenm, nikic and kuhar September 4, 2024 06:31

llvmbot added the llvm:transforms label Sep 4, 2024

caojoshua mentioned this pull request Sep 4, 2024

[BBUtils][nfc] Delete SplitBlockPredecessors with DT #97196

Open

Fix formatting

a5ec0c3

nikic reviewed Sep 4, 2024

View reviewed changes

caojoshua added 6 commits September 23, 2024 00:47

Add reproducing test

785a1d7

Revert "Fix formatting"

e51beac

This reverts commit a5ec0c3.

Revert "[BasicBlockUtils] Fix SplitBlockPredecessors incorrect domina…

b63245d

…tor insert" This reverts commit 0caf5b5.

Auto generate file checks

c898316

Add a note to the test

842a1ab

caojoshua changed the title ~~[BasicBlockUtils] Fix SplitBlockPredecessors incorrect dominator insert~~ [LICM] Use DomTreeUpdater version of SplitBlockPredecessors, nfc Sep 23, 2024

Fix formatting

feb3971

arsenm reviewed Sep 23, 2024

View reviewed changes

caojoshua added 2 commits September 24, 2024 01:48

Remove uselistorder from test

860f4ac

Reduce passes

fa0091d

nikic reviewed Sep 25, 2024

View reviewed changes

Remove test

f667520

nikic approved these changes Sep 27, 2024

View reviewed changes

caojoshua merged commit 0bc9834 into llvm:main Sep 30, 2024
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[LICM] Use DomTreeUpdater version of SplitBlockPredecessors, nfc #107190

[LICM] Use DomTreeUpdater version of SplitBlockPredecessors, nfc #107190

caojoshua commented Sep 4, 2024 •

edited

Loading

caojoshua commented Sep 4, 2024

llvmbot commented Sep 4, 2024

nikic Sep 4, 2024

caojoshua Sep 4, 2024

nikic Sep 4, 2024

caojoshua Sep 4, 2024

caojoshua Sep 9, 2024

caojoshua commented Sep 23, 2024 •

edited

Loading

github-actions bot commented Sep 23, 2024 •

edited

Loading

arsenm Sep 23, 2024

caojoshua Sep 24, 2024

arsenm Sep 23, 2024

caojoshua Sep 24, 2024

nikic Sep 24, 2024

nikic Sep 24, 2024

caojoshua Sep 25, 2024

arsenm Sep 23, 2024

caojoshua Sep 24, 2024

nikic Sep 25, 2024

caojoshua Sep 27, 2024

nikic left a comment

dtcxzyw commented Sep 30, 2024

nikic commented Sep 30, 2024

caojoshua commented Oct 3, 2024

		@@ -0,0 +1,889 @@
		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
		; RUN: opt -S -passes='cgscc(devirt<4>(inline,function<eager-inv;no-rerun>(early-cse<memssa>,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,loop-mssa(licm<allowspeculation>))))' < %s \| FileCheck %s

[LICM] Use DomTreeUpdater version of SplitBlockPredecessors, nfc #107190

[LICM] Use DomTreeUpdater version of SplitBlockPredecessors, nfc #107190

Conversation

caojoshua commented Sep 4, 2024 • edited Loading

caojoshua commented Sep 4, 2024

llvmbot commented Sep 4, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

caojoshua commented Sep 23, 2024 • edited Loading

github-actions bot commented Sep 23, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nikic left a comment

Choose a reason for hiding this comment

dtcxzyw commented Sep 30, 2024

nikic commented Sep 30, 2024

caojoshua commented Oct 3, 2024

caojoshua commented Sep 4, 2024 •

edited

Loading

caojoshua commented Sep 23, 2024 •

edited

Loading

github-actions bot commented Sep 23, 2024 •

edited

Loading