Skip to content

Conversation

@ro-i
Copy link
Contributor

@ro-i ro-i commented Jul 17, 2025

First batch of changes to add support for inline-asm callbr for the AMDGPU backend.

@llvmbot
Copy link
Member

llvmbot commented Jul 17, 2025

@llvm/pr-subscribers-llvm-adt
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-support

Author: Robert Imschweiler (ro-i)

Changes

First batch of changes to add support for basic inline-asm callbr for the AMDGPU backend.


Patch is 132.46 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/149308.diff

16 Files Affected:

  • (modified) llvm/include/llvm/Support/GenericLoopInfoImpl.h (+1-1)
  • (modified) llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h (+8-1)
  • (modified) llvm/include/llvm/Transforms/Utils/ControlFlowUtils.h (+34-1)
  • (modified) llvm/lib/Transforms/Utils/BasicBlockUtils.cpp (+12-3)
  • (modified) llvm/lib/Transforms/Utils/ControlFlowUtils.cpp (+56-1)
  • (modified) llvm/lib/Transforms/Utils/FixIrreducible.cpp (+102-25)
  • (modified) llvm/lib/Transforms/Utils/UnifyLoopExits.cpp (+55-17)
  • (modified) llvm/test/Transforms/FixIrreducible/bug45623.ll (+109)
  • (added) llvm/test/Transforms/FixIrreducible/callbr.ll (+842)
  • (modified) llvm/test/Transforms/FixIrreducible/nested.ll (+676)
  • (modified) llvm/test/Transforms/FixIrreducible/unreachable.ll (+23)
  • (modified) llvm/test/Transforms/UnifyLoopExits/basic.ll (+128-3)
  • (modified) llvm/test/Transforms/UnifyLoopExits/integer_guards.ll (+410)
  • (modified) llvm/test/Transforms/UnifyLoopExits/nested.ll (+90)
  • (modified) llvm/test/Transforms/UnifyLoopExits/restore-ssa.ll (+236)
  • (modified) llvm/test/Transforms/UnifyLoopExits/undef-phis.ll (+68)
diff --git a/llvm/include/llvm/Support/GenericLoopInfoImpl.h b/llvm/include/llvm/Support/GenericLoopInfoImpl.h
index 6fc508b0e0cca..8b7927357d57d 100644
--- a/llvm/include/llvm/Support/GenericLoopInfoImpl.h
+++ b/llvm/include/llvm/Support/GenericLoopInfoImpl.h
@@ -355,7 +355,7 @@ void LoopBase<BlockT, LoopT>::verifyLoop() const {
     if (BB == getHeader()) {
       assert(!OutsideLoopPreds.empty() && "Loop is unreachable!");
     } else if (!OutsideLoopPreds.empty()) {
-      // A non-header loop shouldn't be reachable from outside the loop,
+      // A non-header loop block shouldn't be reachable from outside the loop,
       // though it is permitted if the predecessor is not itself actually
       // reachable.
       BlockT *EntryBB = &BB->getParent()->front();
diff --git a/llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h b/llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h
index 979f3b3eb72ff..fc7b313eb552a 100644
--- a/llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h
@@ -607,10 +607,17 @@ LLVM_ABI bool SplitIndirectBrCriticalEdges(Function &F,
 // successors
 LLVM_ABI void InvertBranch(BranchInst *PBI, IRBuilderBase &Builder);
 
-// Check whether the function only has simple terminator:
+template <typename... TermInst>
+LLVM_ABI bool hasOnlyGivenTerminators(const Function &F);
+
+// Check whether the function only has blocks with simple terminators:
 // br/brcond/unreachable/ret
 LLVM_ABI bool hasOnlySimpleTerminator(const Function &F);
 
+// Check whether the function only has blocks with simple terminators
+// (br/brcond/unreachable/ret) or callbr.
+LLVM_ABI bool hasOnlySimpleTerminatorOrCallBr(const Function &F);
+
 } // end namespace llvm
 
 #endif // LLVM_TRANSFORMS_UTILS_BASICBLOCKUTILS_H
diff --git a/llvm/include/llvm/Transforms/Utils/ControlFlowUtils.h b/llvm/include/llvm/Transforms/Utils/ControlFlowUtils.h
index 810fef29f4010..e55efbc907d42 100644
--- a/llvm/include/llvm/Transforms/Utils/ControlFlowUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/ControlFlowUtils.h
@@ -15,10 +15,13 @@
 
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/IR/CycleInfo.h"
 
 namespace llvm {
 
 class BasicBlock;
+class CallBrInst;
+class LoopInfo;
 class DomTreeUpdater;
 
 /// Given a set of branch descriptors [BB, Succ0, Succ1], create a "hub" such
@@ -104,7 +107,8 @@ struct ControlFlowHub {
         : BB(BB), Succ0(Succ0), Succ1(Succ1) {}
   };
 
-  void addBranch(BasicBlock *BB, BasicBlock *Succ0, BasicBlock *Succ1) {
+  void addBranch(BasicBlock *BB, BasicBlock *Succ0,
+                 BasicBlock *Succ1 = nullptr) {
     assert(BB);
     assert(Succ0 || Succ1);
     Branches.emplace_back(BB, Succ0, Succ1);
@@ -118,6 +122,35 @@ struct ControlFlowHub {
            std::optional<unsigned> MaxControlFlowBooleans = std::nullopt);
 
   SmallVector<BranchDescriptor> Branches;
+
+  /**
+   * \brief Create a new intermediate target block for a callbr edge.
+   *
+   * This function creates a new basic block (the "target block") that sits
+   * between a callbr instruction and one of its successors. The callbr's
+   * successor is rewired to this new block, and the new block unconditionally
+   * branches to the original successor. This is useful for normalizing control
+   * flow, e.g., when transforming irreducible loops.
+   *
+   * \param CallBr         The callbr instruction whose edge is to be split.
+   * \param Succ           The original successor basic block to be reached.
+   * \param SuccIdx        The index of the successor in the callbr instruction.
+   * \param AttachToCallBr If true, the new block is associated with the
+   * callbr's parent for loop/cycle info. If false, the new block is associated
+   * with the callbr's successor for loop/cycle info. \param CI Optional
+   * CycleInfo for updating cycle membership. \param DTU            Optional
+   * DomTreeUpdater for updating the dominator tree. \param LI Optional LoopInfo
+   * for updating loop membership.
+   *
+   * \returns The newly created intermediate target block.
+   *
+   * \note This function updates PHI nodes, dominator tree, loop info, and cycle
+   * info as needed.
+   */
+  static BasicBlock *
+  createCallBrTarget(CallBrInst *CallBr, BasicBlock *Succ, unsigned SuccIdx,
+                     bool AttachToCallBr = true, CycleInfo *CI = nullptr,
+                     DomTreeUpdater *DTU = nullptr, LoopInfo *LI = nullptr);
 };
 
 } // end namespace llvm
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index c8255742c41ba..6103d07212fc2 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -1766,12 +1766,21 @@ void llvm::InvertBranch(BranchInst *PBI, IRBuilderBase &Builder) {
   PBI->swapSuccessors();
 }
 
-bool llvm::hasOnlySimpleTerminator(const Function &F) {
+template <typename... TermInst>
+bool llvm::hasOnlyGivenTerminators(const Function &F) {
   for (auto &BB : F) {
     auto *Term = BB.getTerminator();
-    if (!(isa<ReturnInst>(Term) || isa<UnreachableInst>(Term) ||
-          isa<BranchInst>(Term)))
+    if (!(isa<TermInst>(Term) || ...))
       return false;
   }
   return true;
 }
+
+bool llvm::hasOnlySimpleTerminator(const Function &F) {
+  return hasOnlyGivenTerminators<ReturnInst, UnreachableInst, BranchInst>(F);
+}
+
+bool llvm::hasOnlySimpleTerminatorOrCallBr(const Function &F) {
+  return hasOnlyGivenTerminators<ReturnInst, UnreachableInst, BranchInst,
+                                 CallBrInst>(F);
+}
\ No newline at end of file
diff --git a/llvm/lib/Transforms/Utils/ControlFlowUtils.cpp b/llvm/lib/Transforms/Utils/ControlFlowUtils.cpp
index 4b0065d0030cd..f7197a68813dd 100644
--- a/llvm/lib/Transforms/Utils/ControlFlowUtils.cpp
+++ b/llvm/lib/Transforms/Utils/ControlFlowUtils.cpp
@@ -14,6 +14,7 @@
 #include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/SmallSet.h"
 #include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/Analysis/LoopInfo.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/ValueHandle.h"
@@ -282,7 +283,9 @@ std::pair<BasicBlock *, bool> ControlFlowHub::finalize(
 
   for (auto [BB, Succ0, Succ1] : Branches) {
 #ifndef NDEBUG
-    assert(Incoming.insert(BB).second && "Duplicate entry for incoming block.");
+    assert(
+        (Incoming.insert(BB).second || isa<CallBrInst>(BB->getTerminator())) &&
+        "Duplicate entry for incoming block.");
 #endif
     if (Succ0)
       Outgoing.insert(Succ0);
@@ -342,3 +345,55 @@ std::pair<BasicBlock *, bool> ControlFlowHub::finalize(
 
   return {FirstGuardBlock, true};
 }
+
+BasicBlock *ControlFlowHub::createCallBrTarget(
+    CallBrInst *CallBr, BasicBlock *Succ, unsigned SuccIdx, bool AttachToCallBr,
+    CycleInfo *CI, DomTreeUpdater *DTU, LoopInfo *LI) {
+  BasicBlock *CallBrBlock = CallBr->getParent();
+  BasicBlock *CallBrTarget =
+      BasicBlock::Create(CallBrBlock->getContext(),
+                         CallBrBlock->getName() + ".target." + Succ->getName(),
+                         CallBrBlock->getParent());
+  // Rewire control flow from callbr to the new target block.
+  Succ->replacePhiUsesWith(CallBrBlock, CallBrTarget);
+  CallBr->setSuccessor(SuccIdx, CallBrTarget);
+  // Jump from the new target block to the original successor.
+  BranchInst::Create(Succ, CallBrTarget);
+  if (LI) {
+    if (Loop *L = LI->getLoopFor(AttachToCallBr ? CallBrBlock : Succ); L) {
+      bool AddToLoop = true;
+      if (AttachToCallBr) {
+        // Check if the loops are disjoint. In that case, we do not add the
+        // intermediate target to any loop.
+        if (auto *LL = LI->getLoopFor(Succ);
+            LL && !L->contains(LL) && !LL->contains(L))
+          AddToLoop = false;
+      }
+      if (AddToLoop)
+        L->addBasicBlockToLoop(CallBrTarget, *LI);
+    }
+  }
+  if (CI) {
+    if (auto *C = CI->getCycle(AttachToCallBr ? CallBrBlock : Succ); C) {
+      bool AddToCycle = true;
+      if (AttachToCallBr) {
+        // Check if the cycles are disjoint. In that case, we do not add the
+        // intermediate target to any cycle.
+        if (auto *CC = CI->getCycle(Succ); CC) {
+          auto *CommonC = CI->getSmallestCommonCycle(C, CC);
+          if (CommonC != C && CommonC != CC)
+            AddToCycle = false;
+        }
+      }
+      if (AddToCycle)
+        CI->addBlockToCycle(CallBrTarget, C);
+    }
+  }
+  if (DTU) {
+    DTU->applyUpdates({{DominatorTree::Insert, CallBrBlock, CallBrTarget}});
+    if (DTU->getDomTree().dominates(CallBrBlock, Succ))
+      DTU->applyUpdates({{DominatorTree::Delete, CallBrBlock, Succ},
+                         {DominatorTree::Insert, CallBrTarget, Succ}});
+  }
+  return CallBrTarget;
+}
\ No newline at end of file
diff --git a/llvm/lib/Transforms/Utils/FixIrreducible.cpp b/llvm/lib/Transforms/Utils/FixIrreducible.cpp
index 45e1d12c2bfff..ade23f942352d 100644
--- a/llvm/lib/Transforms/Utils/FixIrreducible.cpp
+++ b/llvm/lib/Transforms/Utils/FixIrreducible.cpp
@@ -79,6 +79,53 @@
 // Limitation: The pass cannot handle switch statements and indirect
 //             branches. Both must be lowered to plain branches first.
 //
+// CallBr support: CallBr is handled as a more general branch instruction which
+// can have multiple successors. The pass redirects the edges to intermediate
+// target blocks that unconditionally branch to the original callbr target
+// blocks. This allows the control flow hub to know to which of the original
+// target blocks to jump to.
+// Example input CFG:
+//                        Entry (callbr)
+//                       /     \
+//                      v       v
+//                      H ----> B
+//                      ^      /|
+//                       `----' |
+//                              v
+//                             Exit
+//
+// becomes:
+//                        Entry (callbr)
+//                       /     \
+//                      v       v
+//                 target.H   target.B
+//                      |       |
+//                      v       v
+//                      H ----> B
+//                      ^      /|
+//                       `----' |
+//                              v
+//                             Exit
+//
+// Note
+// OUTPUT CFG: Converted to a natural loop with a new header N.
+//
+//                        Entry (callbr)
+//                       /     \
+//                      v       v
+//                 target.H   target.B
+//                      \       /
+//                       \     /
+//                        v   v
+//                          N <---.
+//                         / \     \
+//                        /   \     |
+//                       v     v    /
+//                       H --> B --'
+//                             |
+//                             v
+//                            Exit
+//
 //===----------------------------------------------------------------------===//
 
 #include "llvm/Transforms/Utils/FixIrreducible.h"
@@ -231,6 +278,7 @@ static bool fixIrreducible(Cycle &C, CycleInfo &CI, DominatorTree &DT,
     return false;
   LLVM_DEBUG(dbgs() << "Processing cycle:\n" << CI.print(&C) << "\n";);
 
+  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
   ControlFlowHub CHub;
   SetVector<BasicBlock *> Predecessors;
 
@@ -242,18 +290,33 @@ static bool fixIrreducible(Cycle &C, CycleInfo &CI, DominatorTree &DT,
   }
 
   for (BasicBlock *P : Predecessors) {
-    auto *Branch = cast<BranchInst>(P->getTerminator());
-    // Exactly one of the two successors is the header.
-    BasicBlock *Succ0 = Branch->getSuccessor(0) == Header ? Header : nullptr;
-    BasicBlock *Succ1 = Succ0 ? nullptr : Header;
-    if (!Succ0)
-      assert(Branch->getSuccessor(1) == Header);
-    assert(Succ0 || Succ1);
-    CHub.addBranch(P, Succ0, Succ1);
-
-    LLVM_DEBUG(dbgs() << "Added internal branch: " << P->getName() << " -> "
-                      << (Succ0 ? Succ0->getName() : "") << " "
-                      << (Succ1 ? Succ1->getName() : "") << "\n");
+    if (BranchInst *Branch = dyn_cast<BranchInst>(P->getTerminator()); Branch) {
+      // Exactly one of the two successors is the header.
+      BasicBlock *Succ0 = Branch->getSuccessor(0) == Header ? Header : nullptr;
+      BasicBlock *Succ1 = Succ0 ? nullptr : Header;
+      if (!Succ0)
+        assert(Branch->getSuccessor(1) == Header);
+      assert(Succ0 || Succ1);
+      CHub.addBranch(P, Succ0, Succ1);
+
+      LLVM_DEBUG(dbgs() << "Added internal branch: " << P->getName() << " -> "
+                        << (Succ0 ? Succ0->getName() : "") << " "
+                        << (Succ1 ? Succ1->getName() : "") << "\n");
+    } else if (CallBrInst *CallBr = dyn_cast<CallBrInst>(P->getTerminator());
+               CallBr) {
+      for (unsigned I = 0; I < CallBr->getNumSuccessors(); ++I) {
+        BasicBlock *Succ = CallBr->getSuccessor(I);
+        if (Succ != Header)
+          continue;
+        BasicBlock *NewSucc = llvm::ControlFlowHub::createCallBrTarget(
+            CallBr, Succ, I, false, &CI, &DTU, LI);
+        CHub.addBranch(NewSucc, Succ);
+        LLVM_DEBUG(dbgs() << "Added internal branch: " << NewSucc->getName()
+                          << " -> " << Succ->getName() << "\n");
+      }
+    } else {
+      llvm_unreachable("Unsupported block terminator.");
+    }
   }
 
   // Redirect external incoming edges. This includes the edges on the header.
@@ -266,17 +329,32 @@ static bool fixIrreducible(Cycle &C, CycleInfo &CI, DominatorTree &DT,
   }
 
   for (BasicBlock *P : Predecessors) {
-    auto *Branch = cast<BranchInst>(P->getTerminator());
-    BasicBlock *Succ0 = Branch->getSuccessor(0);
-    Succ0 = C.contains(Succ0) ? Succ0 : nullptr;
-    BasicBlock *Succ1 =
-        Branch->isUnconditional() ? nullptr : Branch->getSuccessor(1);
-    Succ1 = Succ1 && C.contains(Succ1) ? Succ1 : nullptr;
-    CHub.addBranch(P, Succ0, Succ1);
-
-    LLVM_DEBUG(dbgs() << "Added external branch: " << P->getName() << " -> "
-                      << (Succ0 ? Succ0->getName() : "") << " "
-                      << (Succ1 ? Succ1->getName() : "") << "\n");
+    if (BranchInst *Branch = dyn_cast<BranchInst>(P->getTerminator()); Branch) {
+      BasicBlock *Succ0 = Branch->getSuccessor(0);
+      Succ0 = C.contains(Succ0) ? Succ0 : nullptr;
+      BasicBlock *Succ1 =
+          Branch->isUnconditional() ? nullptr : Branch->getSuccessor(1);
+      Succ1 = Succ1 && C.contains(Succ1) ? Succ1 : nullptr;
+      CHub.addBranch(P, Succ0, Succ1);
+
+      LLVM_DEBUG(dbgs() << "Added external branch: " << P->getName() << " -> "
+                        << (Succ0 ? Succ0->getName() : "") << " "
+                        << (Succ1 ? Succ1->getName() : "") << "\n");
+    } else if (CallBrInst *CallBr = dyn_cast<CallBrInst>(P->getTerminator());
+               CallBr) {
+      for (unsigned I = 0; I < CallBr->getNumSuccessors(); ++I) {
+        BasicBlock *Succ = CallBr->getSuccessor(I);
+        if (!C.contains(Succ))
+          continue;
+        BasicBlock *NewSucc = llvm::ControlFlowHub::createCallBrTarget(
+            CallBr, Succ, I, true, &CI, &DTU, LI);
+        CHub.addBranch(NewSucc, Succ);
+        LLVM_DEBUG(dbgs() << "Added external branch: " << NewSucc->getName()
+                          << " -> " << Succ->getName() << "\n");
+      }
+    } else {
+      llvm_unreachable("Unsupported block terminator.");
+    }
   }
 
   // Redirect all the backedges through a "hub" consisting of a series
@@ -292,7 +370,6 @@ static bool fixIrreducible(Cycle &C, CycleInfo &CI, DominatorTree &DT,
   SetVector<BasicBlock *> Entries;
   Entries.insert(C.entry_rbegin(), C.entry_rend());
 
-  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
   CHub.finalize(&DTU, GuardBlocks, "irr");
 #if defined(EXPENSIVE_CHECKS)
   assert(DT.verify(DominatorTree::VerificationLevel::Full));
@@ -325,7 +402,7 @@ static bool FixIrreducibleImpl(Function &F, CycleInfo &CI, DominatorTree &DT,
   LLVM_DEBUG(dbgs() << "===== Fix irreducible control-flow in function: "
                     << F.getName() << "\n");
 
-  assert(hasOnlySimpleTerminator(F) && "Unsupported block terminator.");
+  assert(hasOnlySimpleTerminatorOrCallBr(F) && "Unsupported block terminator.");
 
   bool Changed = false;
   for (Cycle *TopCycle : CI.toplevel_cycles()) {
diff --git a/llvm/lib/Transforms/Utils/UnifyLoopExits.cpp b/llvm/lib/Transforms/Utils/UnifyLoopExits.cpp
index 9f338dbc78cff..51e5aaa5225e1 100644
--- a/llvm/lib/Transforms/Utils/UnifyLoopExits.cpp
+++ b/llvm/lib/Transforms/Utils/UnifyLoopExits.cpp
@@ -12,7 +12,11 @@
 //
 // Limitation: This assumes that all terminators in the CFG are direct branches
 //             (the "br" instruction). The presence of any other control flow
-//             such as indirectbr, switch or callbr will cause an assert.
+//             such as indirectbr ot switch will cause an assert.
+//             The callbr terminator is supported by creating intermediate
+//             target blocks that unconditionally branch to the original target
+//             blocks. These intermediate target blocks can then be redirected
+//             through the ControlFlowHub as usual.
 //
 //===----------------------------------------------------------------------===//
 
@@ -150,25 +154,53 @@ static bool unifyLoopExits(DominatorTree &DT, LoopInfo &LI, Loop *L) {
   SmallVector<BasicBlock *, 8> ExitingBlocks;
   L->getExitingBlocks(ExitingBlocks);
 
+  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
+  SmallVector<BasicBlock *, 8> CallBrTargetBlocks;
   // Redirect exiting edges through a control flow hub.
   ControlFlowHub CHub;
-  for (auto *BB : ExitingBlocks) {
-    auto *Branch = cast<BranchInst>(BB->getTerminator());
-    BasicBlock *Succ0 = Branch->getSuccessor(0);
-    Succ0 = L->contains(Succ0) ? nullptr : Succ0;
-
-    BasicBlock *Succ1 =
-        Branch->isUnconditional() ? nullptr : Branch->getSuccessor(1);
-    Succ1 = L->contains(Succ1) ? nullptr : Succ1;
-    CHub.addBranch(BB, Succ0, Succ1);
-
-    LLVM_DEBUG(dbgs() << "Added exiting branch: " << BB->getName() << " -> {"
-                      << (Succ0 ? Succ0->getName() : "<none>") << ", "
-                      << (Succ1 ? Succ1->getName() : "<none>") << "}\n");
+
+  for (unsigned I = 0; I < ExitingBlocks.size(); ++I) {
+    BasicBlock *BB = ExitingBlocks[I];
+    if (BranchInst *Branch = dyn_cast<BranchInst>(BB->getTerminator());
+        Branch) {
+      BasicBlock *Succ0 = Branch->getSuccessor(0);
+      Succ0 = L->contains(Succ0) ? nullptr : Succ0;
+
+      BasicBlock *Succ1 =
+          Branch->isUnconditional() ? nullptr : Branch->getSuccessor(1);
+      Succ1 = L->contains(Succ1) ? nullptr : Succ1;
+      CHub.addBranch(BB, Succ0, Succ1);
+
+      LLVM_DEBUG(dbgs() << "Added exiting branch: " << BB->getName() << " -> {"
+                        << (Succ0 ? Succ0->getName() : "<none>") << ", "
+                        << (Succ1 ? Succ1->getName() : "<none>") << "}\n");
+    } else if (CallBrInst *CallBr = dyn_cast<CallBrInst>(BB->getTerminator());
+               CallBr) {
+      for (unsigned J = 0; J < CallBr->getNumSuccessors(); ++J) {
+        BasicBlock *Succ = CallBr->getSuccessor(J);
+        if (L->contains(Succ))
+          continue;
+        BasicBlock *NewSucc = ControlFlowHub::createCallBrTarget(
+            CallBr, Succ, J, false, nullptr, &DTU, &LI);
+        // ExitingBlocks is later used to restore SSA, so we need to make sure
+        // that the blocks used for phi nodes in the guard blocks match the
+        // predecessors of the guard blocks, which, in the case of callbr, are
+        // the new intermediate target blocks instead of the callbr blocks
+        // themselves.
+        ExitingBlocks[I] = NewSucc;
+        CHub.addBranch(NewSucc, Succ);
+        LLVM_DEBUG(dbgs() << "Added exiting branch: " << NewSucc->getName()
+                          << " -> " << Succ->getName() << "\n");
+        // Also add the new target block to the list of exiting blocks that
+        // should later be added to the parent loops.
+        CallBrTargetBlocks.push_back(NewSucc);
+      }
+    } else {
+      llvm_unreachable("Unsupported block terminator.");
+    }
   }
 
   SmallVector<BasicBlock *, 8> GuardBlocks;
-  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
   Ba...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jul 17, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Robert Imschweiler (ro-i)

Changes

First batch of changes to add support for basic inline-asm callbr for the AMDGPU backend.


Patch is 132.46 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/149308.diff

16 Files Affected:

  • (modified) llvm/include/llvm/Support/GenericLoopInfoImpl.h (+1-1)
  • (modified) llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h (+8-1)
  • (modified) llvm/include/llvm/Transforms/Utils/ControlFlowUtils.h (+34-1)
  • (modified) llvm/lib/Transforms/Utils/BasicBlockUtils.cpp (+12-3)
  • (modified) llvm/lib/Transforms/Utils/ControlFlowUtils.cpp (+56-1)
  • (modified) llvm/lib/Transforms/Utils/FixIrreducible.cpp (+102-25)
  • (modified) llvm/lib/Transforms/Utils/UnifyLoopExits.cpp (+55-17)
  • (modified) llvm/test/Transforms/FixIrreducible/bug45623.ll (+109)
  • (added) llvm/test/Transforms/FixIrreducible/callbr.ll (+842)
  • (modified) llvm/test/Transforms/FixIrreducible/nested.ll (+676)
  • (modified) llvm/test/Transforms/FixIrreducible/unreachable.ll (+23)
  • (modified) llvm/test/Transforms/UnifyLoopExits/basic.ll (+128-3)
  • (modified) llvm/test/Transforms/UnifyLoopExits/integer_guards.ll (+410)
  • (modified) llvm/test/Transforms/UnifyLoopExits/nested.ll (+90)
  • (modified) llvm/test/Transforms/UnifyLoopExits/restore-ssa.ll (+236)
  • (modified) llvm/test/Transforms/UnifyLoopExits/undef-phis.ll (+68)
diff --git a/llvm/include/llvm/Support/GenericLoopInfoImpl.h b/llvm/include/llvm/Support/GenericLoopInfoImpl.h
index 6fc508b0e0cca..8b7927357d57d 100644
--- a/llvm/include/llvm/Support/GenericLoopInfoImpl.h
+++ b/llvm/include/llvm/Support/GenericLoopInfoImpl.h
@@ -355,7 +355,7 @@ void LoopBase<BlockT, LoopT>::verifyLoop() const {
     if (BB == getHeader()) {
       assert(!OutsideLoopPreds.empty() && "Loop is unreachable!");
     } else if (!OutsideLoopPreds.empty()) {
-      // A non-header loop shouldn't be reachable from outside the loop,
+      // A non-header loop block shouldn't be reachable from outside the loop,
       // though it is permitted if the predecessor is not itself actually
       // reachable.
       BlockT *EntryBB = &BB->getParent()->front();
diff --git a/llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h b/llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h
index 979f3b3eb72ff..fc7b313eb552a 100644
--- a/llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h
@@ -607,10 +607,17 @@ LLVM_ABI bool SplitIndirectBrCriticalEdges(Function &F,
 // successors
 LLVM_ABI void InvertBranch(BranchInst *PBI, IRBuilderBase &Builder);
 
-// Check whether the function only has simple terminator:
+template <typename... TermInst>
+LLVM_ABI bool hasOnlyGivenTerminators(const Function &F);
+
+// Check whether the function only has blocks with simple terminators:
 // br/brcond/unreachable/ret
 LLVM_ABI bool hasOnlySimpleTerminator(const Function &F);
 
+// Check whether the function only has blocks with simple terminators
+// (br/brcond/unreachable/ret) or callbr.
+LLVM_ABI bool hasOnlySimpleTerminatorOrCallBr(const Function &F);
+
 } // end namespace llvm
 
 #endif // LLVM_TRANSFORMS_UTILS_BASICBLOCKUTILS_H
diff --git a/llvm/include/llvm/Transforms/Utils/ControlFlowUtils.h b/llvm/include/llvm/Transforms/Utils/ControlFlowUtils.h
index 810fef29f4010..e55efbc907d42 100644
--- a/llvm/include/llvm/Transforms/Utils/ControlFlowUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/ControlFlowUtils.h
@@ -15,10 +15,13 @@
 
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/IR/CycleInfo.h"
 
 namespace llvm {
 
 class BasicBlock;
+class CallBrInst;
+class LoopInfo;
 class DomTreeUpdater;
 
 /// Given a set of branch descriptors [BB, Succ0, Succ1], create a "hub" such
@@ -104,7 +107,8 @@ struct ControlFlowHub {
         : BB(BB), Succ0(Succ0), Succ1(Succ1) {}
   };
 
-  void addBranch(BasicBlock *BB, BasicBlock *Succ0, BasicBlock *Succ1) {
+  void addBranch(BasicBlock *BB, BasicBlock *Succ0,
+                 BasicBlock *Succ1 = nullptr) {
     assert(BB);
     assert(Succ0 || Succ1);
     Branches.emplace_back(BB, Succ0, Succ1);
@@ -118,6 +122,35 @@ struct ControlFlowHub {
            std::optional<unsigned> MaxControlFlowBooleans = std::nullopt);
 
   SmallVector<BranchDescriptor> Branches;
+
+  /**
+   * \brief Create a new intermediate target block for a callbr edge.
+   *
+   * This function creates a new basic block (the "target block") that sits
+   * between a callbr instruction and one of its successors. The callbr's
+   * successor is rewired to this new block, and the new block unconditionally
+   * branches to the original successor. This is useful for normalizing control
+   * flow, e.g., when transforming irreducible loops.
+   *
+   * \param CallBr         The callbr instruction whose edge is to be split.
+   * \param Succ           The original successor basic block to be reached.
+   * \param SuccIdx        The index of the successor in the callbr instruction.
+   * \param AttachToCallBr If true, the new block is associated with the
+   * callbr's parent for loop/cycle info. If false, the new block is associated
+   * with the callbr's successor for loop/cycle info. \param CI Optional
+   * CycleInfo for updating cycle membership. \param DTU            Optional
+   * DomTreeUpdater for updating the dominator tree. \param LI Optional LoopInfo
+   * for updating loop membership.
+   *
+   * \returns The newly created intermediate target block.
+   *
+   * \note This function updates PHI nodes, dominator tree, loop info, and cycle
+   * info as needed.
+   */
+  static BasicBlock *
+  createCallBrTarget(CallBrInst *CallBr, BasicBlock *Succ, unsigned SuccIdx,
+                     bool AttachToCallBr = true, CycleInfo *CI = nullptr,
+                     DomTreeUpdater *DTU = nullptr, LoopInfo *LI = nullptr);
 };
 
 } // end namespace llvm
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index c8255742c41ba..6103d07212fc2 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -1766,12 +1766,21 @@ void llvm::InvertBranch(BranchInst *PBI, IRBuilderBase &Builder) {
   PBI->swapSuccessors();
 }
 
-bool llvm::hasOnlySimpleTerminator(const Function &F) {
+template <typename... TermInst>
+bool llvm::hasOnlyGivenTerminators(const Function &F) {
   for (auto &BB : F) {
     auto *Term = BB.getTerminator();
-    if (!(isa<ReturnInst>(Term) || isa<UnreachableInst>(Term) ||
-          isa<BranchInst>(Term)))
+    if (!(isa<TermInst>(Term) || ...))
       return false;
   }
   return true;
 }
+
+bool llvm::hasOnlySimpleTerminator(const Function &F) {
+  return hasOnlyGivenTerminators<ReturnInst, UnreachableInst, BranchInst>(F);
+}
+
+bool llvm::hasOnlySimpleTerminatorOrCallBr(const Function &F) {
+  return hasOnlyGivenTerminators<ReturnInst, UnreachableInst, BranchInst,
+                                 CallBrInst>(F);
+}
\ No newline at end of file
diff --git a/llvm/lib/Transforms/Utils/ControlFlowUtils.cpp b/llvm/lib/Transforms/Utils/ControlFlowUtils.cpp
index 4b0065d0030cd..f7197a68813dd 100644
--- a/llvm/lib/Transforms/Utils/ControlFlowUtils.cpp
+++ b/llvm/lib/Transforms/Utils/ControlFlowUtils.cpp
@@ -14,6 +14,7 @@
 #include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/SmallSet.h"
 #include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/Analysis/LoopInfo.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/ValueHandle.h"
@@ -282,7 +283,9 @@ std::pair<BasicBlock *, bool> ControlFlowHub::finalize(
 
   for (auto [BB, Succ0, Succ1] : Branches) {
 #ifndef NDEBUG
-    assert(Incoming.insert(BB).second && "Duplicate entry for incoming block.");
+    assert(
+        (Incoming.insert(BB).second || isa<CallBrInst>(BB->getTerminator())) &&
+        "Duplicate entry for incoming block.");
 #endif
     if (Succ0)
       Outgoing.insert(Succ0);
@@ -342,3 +345,55 @@ std::pair<BasicBlock *, bool> ControlFlowHub::finalize(
 
   return {FirstGuardBlock, true};
 }
+
+BasicBlock *ControlFlowHub::createCallBrTarget(
+    CallBrInst *CallBr, BasicBlock *Succ, unsigned SuccIdx, bool AttachToCallBr,
+    CycleInfo *CI, DomTreeUpdater *DTU, LoopInfo *LI) {
+  BasicBlock *CallBrBlock = CallBr->getParent();
+  BasicBlock *CallBrTarget =
+      BasicBlock::Create(CallBrBlock->getContext(),
+                         CallBrBlock->getName() + ".target." + Succ->getName(),
+                         CallBrBlock->getParent());
+  // Rewire control flow from callbr to the new target block.
+  Succ->replacePhiUsesWith(CallBrBlock, CallBrTarget);
+  CallBr->setSuccessor(SuccIdx, CallBrTarget);
+  // Jump from the new target block to the original successor.
+  BranchInst::Create(Succ, CallBrTarget);
+  if (LI) {
+    if (Loop *L = LI->getLoopFor(AttachToCallBr ? CallBrBlock : Succ); L) {
+      bool AddToLoop = true;
+      if (AttachToCallBr) {
+        // Check if the loops are disjoint. In that case, we do not add the
+        // intermediate target to any loop.
+        if (auto *LL = LI->getLoopFor(Succ);
+            LL && !L->contains(LL) && !LL->contains(L))
+          AddToLoop = false;
+      }
+      if (AddToLoop)
+        L->addBasicBlockToLoop(CallBrTarget, *LI);
+    }
+  }
+  if (CI) {
+    if (auto *C = CI->getCycle(AttachToCallBr ? CallBrBlock : Succ); C) {
+      bool AddToCycle = true;
+      if (AttachToCallBr) {
+        // Check if the cycles are disjoint. In that case, we do not add the
+        // intermediate target to any cycle.
+        if (auto *CC = CI->getCycle(Succ); CC) {
+          auto *CommonC = CI->getSmallestCommonCycle(C, CC);
+          if (CommonC != C && CommonC != CC)
+            AddToCycle = false;
+        }
+      }
+      if (AddToCycle)
+        CI->addBlockToCycle(CallBrTarget, C);
+    }
+  }
+  if (DTU) {
+    DTU->applyUpdates({{DominatorTree::Insert, CallBrBlock, CallBrTarget}});
+    if (DTU->getDomTree().dominates(CallBrBlock, Succ))
+      DTU->applyUpdates({{DominatorTree::Delete, CallBrBlock, Succ},
+                         {DominatorTree::Insert, CallBrTarget, Succ}});
+  }
+  return CallBrTarget;
+}
\ No newline at end of file
diff --git a/llvm/lib/Transforms/Utils/FixIrreducible.cpp b/llvm/lib/Transforms/Utils/FixIrreducible.cpp
index 45e1d12c2bfff..ade23f942352d 100644
--- a/llvm/lib/Transforms/Utils/FixIrreducible.cpp
+++ b/llvm/lib/Transforms/Utils/FixIrreducible.cpp
@@ -79,6 +79,53 @@
 // Limitation: The pass cannot handle switch statements and indirect
 //             branches. Both must be lowered to plain branches first.
 //
+// CallBr support: CallBr is handled as a more general branch instruction which
+// can have multiple successors. The pass redirects the edges to intermediate
+// target blocks that unconditionally branch to the original callbr target
+// blocks. This allows the control flow hub to know to which of the original
+// target blocks to jump to.
+// Example input CFG:
+//                        Entry (callbr)
+//                       /     \
+//                      v       v
+//                      H ----> B
+//                      ^      /|
+//                       `----' |
+//                              v
+//                             Exit
+//
+// becomes:
+//                        Entry (callbr)
+//                       /     \
+//                      v       v
+//                 target.H   target.B
+//                      |       |
+//                      v       v
+//                      H ----> B
+//                      ^      /|
+//                       `----' |
+//                              v
+//                             Exit
+//
+// Note
+// OUTPUT CFG: Converted to a natural loop with a new header N.
+//
+//                        Entry (callbr)
+//                       /     \
+//                      v       v
+//                 target.H   target.B
+//                      \       /
+//                       \     /
+//                        v   v
+//                          N <---.
+//                         / \     \
+//                        /   \     |
+//                       v     v    /
+//                       H --> B --'
+//                             |
+//                             v
+//                            Exit
+//
 //===----------------------------------------------------------------------===//
 
 #include "llvm/Transforms/Utils/FixIrreducible.h"
@@ -231,6 +278,7 @@ static bool fixIrreducible(Cycle &C, CycleInfo &CI, DominatorTree &DT,
     return false;
   LLVM_DEBUG(dbgs() << "Processing cycle:\n" << CI.print(&C) << "\n";);
 
+  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
   ControlFlowHub CHub;
   SetVector<BasicBlock *> Predecessors;
 
@@ -242,18 +290,33 @@ static bool fixIrreducible(Cycle &C, CycleInfo &CI, DominatorTree &DT,
   }
 
   for (BasicBlock *P : Predecessors) {
-    auto *Branch = cast<BranchInst>(P->getTerminator());
-    // Exactly one of the two successors is the header.
-    BasicBlock *Succ0 = Branch->getSuccessor(0) == Header ? Header : nullptr;
-    BasicBlock *Succ1 = Succ0 ? nullptr : Header;
-    if (!Succ0)
-      assert(Branch->getSuccessor(1) == Header);
-    assert(Succ0 || Succ1);
-    CHub.addBranch(P, Succ0, Succ1);
-
-    LLVM_DEBUG(dbgs() << "Added internal branch: " << P->getName() << " -> "
-                      << (Succ0 ? Succ0->getName() : "") << " "
-                      << (Succ1 ? Succ1->getName() : "") << "\n");
+    if (BranchInst *Branch = dyn_cast<BranchInst>(P->getTerminator()); Branch) {
+      // Exactly one of the two successors is the header.
+      BasicBlock *Succ0 = Branch->getSuccessor(0) == Header ? Header : nullptr;
+      BasicBlock *Succ1 = Succ0 ? nullptr : Header;
+      if (!Succ0)
+        assert(Branch->getSuccessor(1) == Header);
+      assert(Succ0 || Succ1);
+      CHub.addBranch(P, Succ0, Succ1);
+
+      LLVM_DEBUG(dbgs() << "Added internal branch: " << P->getName() << " -> "
+                        << (Succ0 ? Succ0->getName() : "") << " "
+                        << (Succ1 ? Succ1->getName() : "") << "\n");
+    } else if (CallBrInst *CallBr = dyn_cast<CallBrInst>(P->getTerminator());
+               CallBr) {
+      for (unsigned I = 0; I < CallBr->getNumSuccessors(); ++I) {
+        BasicBlock *Succ = CallBr->getSuccessor(I);
+        if (Succ != Header)
+          continue;
+        BasicBlock *NewSucc = llvm::ControlFlowHub::createCallBrTarget(
+            CallBr, Succ, I, false, &CI, &DTU, LI);
+        CHub.addBranch(NewSucc, Succ);
+        LLVM_DEBUG(dbgs() << "Added internal branch: " << NewSucc->getName()
+                          << " -> " << Succ->getName() << "\n");
+      }
+    } else {
+      llvm_unreachable("Unsupported block terminator.");
+    }
   }
 
   // Redirect external incoming edges. This includes the edges on the header.
@@ -266,17 +329,32 @@ static bool fixIrreducible(Cycle &C, CycleInfo &CI, DominatorTree &DT,
   }
 
   for (BasicBlock *P : Predecessors) {
-    auto *Branch = cast<BranchInst>(P->getTerminator());
-    BasicBlock *Succ0 = Branch->getSuccessor(0);
-    Succ0 = C.contains(Succ0) ? Succ0 : nullptr;
-    BasicBlock *Succ1 =
-        Branch->isUnconditional() ? nullptr : Branch->getSuccessor(1);
-    Succ1 = Succ1 && C.contains(Succ1) ? Succ1 : nullptr;
-    CHub.addBranch(P, Succ0, Succ1);
-
-    LLVM_DEBUG(dbgs() << "Added external branch: " << P->getName() << " -> "
-                      << (Succ0 ? Succ0->getName() : "") << " "
-                      << (Succ1 ? Succ1->getName() : "") << "\n");
+    if (BranchInst *Branch = dyn_cast<BranchInst>(P->getTerminator()); Branch) {
+      BasicBlock *Succ0 = Branch->getSuccessor(0);
+      Succ0 = C.contains(Succ0) ? Succ0 : nullptr;
+      BasicBlock *Succ1 =
+          Branch->isUnconditional() ? nullptr : Branch->getSuccessor(1);
+      Succ1 = Succ1 && C.contains(Succ1) ? Succ1 : nullptr;
+      CHub.addBranch(P, Succ0, Succ1);
+
+      LLVM_DEBUG(dbgs() << "Added external branch: " << P->getName() << " -> "
+                        << (Succ0 ? Succ0->getName() : "") << " "
+                        << (Succ1 ? Succ1->getName() : "") << "\n");
+    } else if (CallBrInst *CallBr = dyn_cast<CallBrInst>(P->getTerminator());
+               CallBr) {
+      for (unsigned I = 0; I < CallBr->getNumSuccessors(); ++I) {
+        BasicBlock *Succ = CallBr->getSuccessor(I);
+        if (!C.contains(Succ))
+          continue;
+        BasicBlock *NewSucc = llvm::ControlFlowHub::createCallBrTarget(
+            CallBr, Succ, I, true, &CI, &DTU, LI);
+        CHub.addBranch(NewSucc, Succ);
+        LLVM_DEBUG(dbgs() << "Added external branch: " << NewSucc->getName()
+                          << " -> " << Succ->getName() << "\n");
+      }
+    } else {
+      llvm_unreachable("Unsupported block terminator.");
+    }
   }
 
   // Redirect all the backedges through a "hub" consisting of a series
@@ -292,7 +370,6 @@ static bool fixIrreducible(Cycle &C, CycleInfo &CI, DominatorTree &DT,
   SetVector<BasicBlock *> Entries;
   Entries.insert(C.entry_rbegin(), C.entry_rend());
 
-  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
   CHub.finalize(&DTU, GuardBlocks, "irr");
 #if defined(EXPENSIVE_CHECKS)
   assert(DT.verify(DominatorTree::VerificationLevel::Full));
@@ -325,7 +402,7 @@ static bool FixIrreducibleImpl(Function &F, CycleInfo &CI, DominatorTree &DT,
   LLVM_DEBUG(dbgs() << "===== Fix irreducible control-flow in function: "
                     << F.getName() << "\n");
 
-  assert(hasOnlySimpleTerminator(F) && "Unsupported block terminator.");
+  assert(hasOnlySimpleTerminatorOrCallBr(F) && "Unsupported block terminator.");
 
   bool Changed = false;
   for (Cycle *TopCycle : CI.toplevel_cycles()) {
diff --git a/llvm/lib/Transforms/Utils/UnifyLoopExits.cpp b/llvm/lib/Transforms/Utils/UnifyLoopExits.cpp
index 9f338dbc78cff..51e5aaa5225e1 100644
--- a/llvm/lib/Transforms/Utils/UnifyLoopExits.cpp
+++ b/llvm/lib/Transforms/Utils/UnifyLoopExits.cpp
@@ -12,7 +12,11 @@
 //
 // Limitation: This assumes that all terminators in the CFG are direct branches
 //             (the "br" instruction). The presence of any other control flow
-//             such as indirectbr, switch or callbr will cause an assert.
+//             such as indirectbr ot switch will cause an assert.
+//             The callbr terminator is supported by creating intermediate
+//             target blocks that unconditionally branch to the original target
+//             blocks. These intermediate target blocks can then be redirected
+//             through the ControlFlowHub as usual.
 //
 //===----------------------------------------------------------------------===//
 
@@ -150,25 +154,53 @@ static bool unifyLoopExits(DominatorTree &DT, LoopInfo &LI, Loop *L) {
   SmallVector<BasicBlock *, 8> ExitingBlocks;
   L->getExitingBlocks(ExitingBlocks);
 
+  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
+  SmallVector<BasicBlock *, 8> CallBrTargetBlocks;
   // Redirect exiting edges through a control flow hub.
   ControlFlowHub CHub;
-  for (auto *BB : ExitingBlocks) {
-    auto *Branch = cast<BranchInst>(BB->getTerminator());
-    BasicBlock *Succ0 = Branch->getSuccessor(0);
-    Succ0 = L->contains(Succ0) ? nullptr : Succ0;
-
-    BasicBlock *Succ1 =
-        Branch->isUnconditional() ? nullptr : Branch->getSuccessor(1);
-    Succ1 = L->contains(Succ1) ? nullptr : Succ1;
-    CHub.addBranch(BB, Succ0, Succ1);
-
-    LLVM_DEBUG(dbgs() << "Added exiting branch: " << BB->getName() << " -> {"
-                      << (Succ0 ? Succ0->getName() : "<none>") << ", "
-                      << (Succ1 ? Succ1->getName() : "<none>") << "}\n");
+
+  for (unsigned I = 0; I < ExitingBlocks.size(); ++I) {
+    BasicBlock *BB = ExitingBlocks[I];
+    if (BranchInst *Branch = dyn_cast<BranchInst>(BB->getTerminator());
+        Branch) {
+      BasicBlock *Succ0 = Branch->getSuccessor(0);
+      Succ0 = L->contains(Succ0) ? nullptr : Succ0;
+
+      BasicBlock *Succ1 =
+          Branch->isUnconditional() ? nullptr : Branch->getSuccessor(1);
+      Succ1 = L->contains(Succ1) ? nullptr : Succ1;
+      CHub.addBranch(BB, Succ0, Succ1);
+
+      LLVM_DEBUG(dbgs() << "Added exiting branch: " << BB->getName() << " -> {"
+                        << (Succ0 ? Succ0->getName() : "<none>") << ", "
+                        << (Succ1 ? Succ1->getName() : "<none>") << "}\n");
+    } else if (CallBrInst *CallBr = dyn_cast<CallBrInst>(BB->getTerminator());
+               CallBr) {
+      for (unsigned J = 0; J < CallBr->getNumSuccessors(); ++J) {
+        BasicBlock *Succ = CallBr->getSuccessor(J);
+        if (L->contains(Succ))
+          continue;
+        BasicBlock *NewSucc = ControlFlowHub::createCallBrTarget(
+            CallBr, Succ, J, false, nullptr, &DTU, &LI);
+        // ExitingBlocks is later used to restore SSA, so we need to make sure
+        // that the blocks used for phi nodes in the guard blocks match the
+        // predecessors of the guard blocks, which, in the case of callbr, are
+        // the new intermediate target blocks instead of the callbr blocks
+        // themselves.
+        ExitingBlocks[I] = NewSucc;
+        CHub.addBranch(NewSucc, Succ);
+        LLVM_DEBUG(dbgs() << "Added exiting branch: " << NewSucc->getName()
+                          << " -> " << Succ->getName() << "\n");
+        // Also add the new target block to the list of exiting blocks that
+        // should later be added to the parent loops.
+        CallBrTargetBlocks.push_back(NewSucc);
+      }
+    } else {
+      llvm_unreachable("Unsupported block terminator.");
+    }
   }
 
   SmallVector<BasicBlock *, 8> GuardBlocks;
-  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
   Ba...
[truncated]

ro-i added a commit that referenced this pull request Aug 5, 2025
…lbr instruction with basic inline-asm

Finishes adding basic inline-asm callbr support for AMDGPU, started by
#149308.
@ro-i ro-i force-pushed the users/ro-i/callbr-amdgpu_1 branch from 28518f9 to 691bb41 Compare August 13, 2025 20:23
ro-i added a commit that referenced this pull request Aug 13, 2025
…lbr instruction with basic inline-asm

Finishes adding basic inline-asm callbr support for AMDGPU, started by
#149308.
@ro-i
Copy link
Contributor Author

ro-i commented Oct 27, 2025

Ping

Comment on lines 149 to 152
static BasicBlock *
createCallBrTarget(CallBrInst *CallBr, BasicBlock *Succ, unsigned SuccIdx,
CycleInfo *CI = nullptr, DomTreeUpdater *DTU = nullptr,
LoopInfo *LI = nullptr, bool *UpdatedLI = nullptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this like llvm::SplitEdge?

Consider moving it to BasicBlockUtils.h and a similar name such as splitCallBrEdge, or even reusing SplitBlock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the interface is generic enough? It's specifically crafted to add the intermediate callbr target blocks in the context of the control flow hub, where it's important that the cycle and loop info is updated as it's done by updateCycleLoopInfo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that the semantics may have some nuance. Splitting the predecessor moves the existing branch instruction (in this case callbr) behind the new block. Identifying the edge with the sources and destination BasicBlock can be ambiguous if there are multiple such edges. So there may be a reason to introduce a new function, but it sounds to me as it should be an overload of SplitEdge.

SplitEdge only does not take a CycleInfo argument because CycleInfo is newer than SplitEdge. Preserving additional analyses can easily be added. MemorySSAUpdater was also added retroactively.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this and the other comment: I now moved the function. But I think that there's not a lot of potential for code re-using due to the different scenarios (see the comments in the code)

Comment on lines +98 to +108
// Entry (callbr)
// / \
// v v
// target.H target.B
// | |
// v v
// H ----> B
// ^ /|
// `----' |
// v
// Exit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm doing with callbr in this context is not really related to critical edges, I think. The added blocks between callbr and its original target blocks only facilitate the callbr handling. So that you know to which callbr target you should branch to from the control flow hub (that is being introduced by FixIrreducible/UnifyLoopExits).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds exactly like what SplitEdge is doing. The only complicated case is the one of critical edges, otherwise SplitEdge is identical to splitting the predecessor or successor block. For normalzing the CFG, splitting may not be necessary except for critical edges.

First batch of changes to add support for basic inline-asm callbr for
the AMDGPU backend.
@ro-i ro-i force-pushed the users/ro-i/callbr-amdgpu_1 branch from 1564b82 to 11fcaae Compare October 29, 2025 11:17
Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

If in the future someone is refactoring SplitEdge, they may now also see and possibly include SplitBrEdge.

@ro-i ro-i merged commit 8954011 into main Oct 30, 2025
10 checks passed
@ro-i ro-i deleted the users/ro-i/callbr-amdgpu_1 branch October 30, 2025 13:22
@ro-i
Copy link
Contributor Author

ro-i commented Oct 30, 2025

merged, thanks for all the reviews :)
I'll now rebase the other PRs up in the PR stack. But I'll wait at least a couple of days to merge the next one so that there won't be a mess if something needs to be reverted or fixed

ro-i added a commit that referenced this pull request Oct 30, 2025
…lbr instruction with basic inline-asm

Finishes adding basic inline-asm callbr support for AMDGPU, started by
#149308.
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
…sm (llvm#149308)

First batch of changes to add support for inline-asm callbr for the
AMDGPU backend.
luciechoi pushed a commit to luciechoi/llvm-project that referenced this pull request Nov 1, 2025
…sm (llvm#149308)

First batch of changes to add support for inline-asm callbr for the
AMDGPU backend.
DEBADRIBASAK pushed a commit to DEBADRIBASAK/llvm-project that referenced this pull request Nov 3, 2025
…sm (llvm#149308)

First batch of changes to add support for inline-asm callbr for the
AMDGPU backend.
ro-i added a commit that referenced this pull request Nov 3, 2025
…lbr instruction with inline-asm (#152161)

Finishes adding inline-asm callbr support for AMDGPU, started by
#149308.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Nov 3, 2025
…ort for callbr instruction with inline-asm (#152161)

Finishes adding inline-asm callbr support for AMDGPU, started by
llvm/llvm-project#149308.
ro-i added a commit that referenced this pull request Nov 3, 2025
…lbr instruction with basic inline-asm

Finishes adding basic inline-asm callbr support for AMDGPU, started by
#149308.
ro-i added a commit that referenced this pull request Nov 3, 2025
…t for callbr instruction with inline-asm (#152161)

Finishes adding inline-asm callbr support for AMDGPU, started by
#149308.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants