fhahn commented Jun 12, 2025

Connect the SCEV and memory runtime check blocks directly in VPlan as VPIRBasicBlocks, removing ILV::emitSCEVChecks and ILV::emitMemRuntimeChecks.

The new logic is currently split across two places. LoopVectorizationPlanner::addRuntimeChecks collects a list of {Condition, CheckBlock} pairs, asserts that checks are only emitted when allowed (e.g. not when optimizing for size unless vectorization is forced), and emits an optimization remark if needed. The list of checks is then added to VPlan in VPlanTransforms::connectCheckBlocks.
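To make the CFG rewiring done by connectCheckBlocks easier to follow, here is a minimal C++17 sketch on a toy CFG. `Block`, `replaceEdge` and `connectChecks` are hypothetical stand-ins, not the VPlan API; the real transform uses `VPBlockUtils::insertOnEdge`, `VPBlockUtils::connectBlocks` and `swapSuccessors` on VPlan blocks. The sketch only models the end state: each check block lands on the single incoming edge of the vector preheader, its successor 0 is the bypass to the scalar preheader (taken when the condition is true), successor 1 is the vector preheader, and every phi in the scalar preheader gets an extra incoming value that replicates its previous last one.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy CFG node standing in for VPBlockBase; not the VPlan API.
struct Block {
  std::string Name;
  std::vector<Block *> Preds;
  std::vector<Block *> Succs;
  // Incoming values of each phi in this block, indexed like Preds.
  std::vector<std::vector<std::string>> Phis;
  // Condition of the terminating conditional branch; empty if none.
  std::string BranchCond;
};

// Replace every occurrence of Old with New in Edges.
static void replaceEdge(std::vector<Block *> &Edges, Block *Old, Block *New) {
  for (Block *&B : Edges)
    if (B == Old)
      B = New;
}

// Sketch of the end state connectCheckBlocks aims for: each check block is
// placed on the single incoming edge of the vector preheader. Successor 0 is
// the bypass to the scalar preheader (taken when the condition is true),
// successor 1 is the vector preheader.
void connectChecks(Block *VectorPH, Block *ScalarPH,
                   const std::vector<std::pair<std::string, Block *>> &Checks) {
  for (const auto &[Cond, Check] : Checks) {
    assert(VectorPH->Preds.size() == 1 && "expected a single predecessor");
    Block *PreVectorPH = VectorPH->Preds.front();

    // Insert Check on the edge PreVectorPH -> VectorPH.
    replaceEdge(PreVectorPH->Succs, VectorPH, Check);
    replaceEdge(VectorPH->Preds, PreVectorPH, Check);
    Check->Preds = {PreVectorPH};

    // Add the bypass edge and order successors as {bypass, vector preheader}.
    Check->Succs = {ScalarPH, VectorPH};
    ScalarPH->Preds.push_back(Check);
    Check->BranchCond = Cond;

    // The scalar preheader gained a predecessor: give each phi an incoming
    // value for it by replicating the previous last incoming value.
    for (std::vector<std::string> &Incoming : ScalarPH->Phis) {
      assert(Incoming.size() == ScalarPH->Preds.size() - 1 &&
             "phi must have one incoming value per old predecessor");
      std::string Last = Incoming.back();
      Incoming.push_back(Last);
    }
  }
}
```

For example, starting from a hypothetical `iter.check` block branching to `scalar.ph` and `vector.ph`, passing the SCEV check followed by the memory check yields the chain `iter.check -> vector.scevcheck -> vector.memcheck -> vector.ph`, with each of the three blocks also branching to `scalar.ph`. Processing the checks in order and always splitting the edge into the vector preheader is what keeps the SCEV check ahead of the memory check.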

llvmbot commented Jun 12, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-risc-v

Author: Florian Hahn (fhahn)

Changes

Patch is 22.20 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143879.diff

6 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h (+7)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+77-132)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+2-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp (+39)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+7)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll (+4)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index 70f541d64b305..aae7c9a0075d1 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -28,6 +28,10 @@
 #include "llvm/ADT/SmallSet.h"
 #include "llvm/Support/InstructionCost.h"
 
+namespace {
+class GeneratedRTChecks;
+}
+
 namespace llvm {
 
 class LoopInfo;
@@ -548,6 +552,9 @@ class LoopVectorizationPlanner {
                                   VPRecipeBuilder &RecipeBuilder,
                                   ElementCount MinVF);
 
+  /// Add the runtime checks from \p RTChecks to \p Plan.
+  void addRuntimeChecks(VPlan &Plan, GeneratedRTChecks &RTChecks) const;
+
 #ifndef NDEBUG
   /// \return The most profitable vectorization factor for the available VPlans
   /// and the cost of that VF.
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 474f856d20461..50ee102b92b53 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -399,12 +399,6 @@ static cl::opt<bool> EnableEarlyExitVectorization(
     cl::desc(
         "Enable vectorization of early exit loops with uncountable exits."));
 
-// Likelyhood of bypassing the vectorized loop because assumptions about SCEV
-// variables not overflowing do not hold. See `emitSCEVChecks`.
-static constexpr uint32_t SCEVCheckBypassWeights[] = {1, 127};
-// Likelyhood of bypassing the vectorized loop because pointers overlap. See
-// `emitMemRuntimeChecks`.
-static constexpr uint32_t MemCheckBypassWeights[] = {1, 127};
 // Likelyhood of bypassing the vectorized loop because there are zero trips left
 // after prolog. See `emitIterationCountCheck`.
 static constexpr uint32_t MinItersBypassWeights[] = {1, 127};
@@ -534,16 +528,6 @@ class InnerLoopVectorizer {
   /// it overflows.
   void emitIterationCountCheck(BasicBlock *Bypass);
 
-  /// Emit a bypass check to see if all of the SCEV assumptions we've
-  /// had to make are correct. Returns the block containing the checks or
-  /// nullptr if no checks have been added.
-  BasicBlock *emitSCEVChecks(BasicBlock *Bypass);
-
-  /// Emit bypass checks to check any memory assumptions we may have made.
-  /// Returns the block containing the checks or nullptr if no checks have been
-  /// added.
-  BasicBlock *emitMemRuntimeChecks(BasicBlock *Bypass);
-
   /// Emit basic blocks (prefixed with \p Prefix) for the iteration check,
   /// vector loop preheader, middle block and scalar preheader.
   void createVectorLoopSkeleton(StringRef Prefix);
@@ -1782,7 +1766,6 @@ class GeneratedRTChecks {
   SCEVExpander MemCheckExp;
 
   bool CostTooHigh = false;
-  const bool AddBranchWeights;
 
   Loop *OuterLoop = nullptr;
 
@@ -1794,11 +1777,10 @@ class GeneratedRTChecks {
 public:
   GeneratedRTChecks(PredicatedScalarEvolution &PSE, DominatorTree *DT,
                     LoopInfo *LI, TargetTransformInfo *TTI,
-                    const DataLayout &DL, bool AddBranchWeights,
-                    TTI::TargetCostKind CostKind)
+                    const DataLayout &DL, TTI::TargetCostKind CostKind)
       : DT(DT), LI(LI), TTI(TTI), SCEVExp(*PSE.getSE(), DL, "scev.check"),
-        MemCheckExp(*PSE.getSE(), DL, "scev.check"),
-        AddBranchWeights(AddBranchWeights), PSE(PSE), CostKind(CostKind) {}
+        MemCheckExp(*PSE.getSE(), DL, "scev.check"), PSE(PSE),
+        CostKind(CostKind) {}
 
   /// Generate runtime checks in SCEVCheckBlock and MemCheckBlock, so we can
   /// accurately estimate the cost of the runtime checks. The blocks are
@@ -2016,61 +1998,35 @@ class GeneratedRTChecks {
   /// Adds the generated SCEVCheckBlock before \p LoopVectorPreHeader and
   /// adjusts the branches to branch to the vector preheader or \p Bypass,
   /// depending on the generated condition.
-  BasicBlock *emitSCEVChecks(BasicBlock *Bypass,
-                             BasicBlock *LoopVectorPreHeader) {
+  std::pair<Value *, BasicBlock *> emitSCEVChecks() {
     using namespace llvm::PatternMatch;
     if (!SCEVCheckCond || match(SCEVCheckCond, m_ZeroInt()))
-      return nullptr;
+      return {nullptr, nullptr};
 
-    auto *Pred = LoopVectorPreHeader->getSinglePredecessor();
-    BranchInst::Create(LoopVectorPreHeader, SCEVCheckBlock);
-
-    SCEVCheckBlock->getTerminator()->eraseFromParent();
-    SCEVCheckBlock->moveBefore(LoopVectorPreHeader);
-    Pred->getTerminator()->replaceSuccessorWith(LoopVectorPreHeader,
-                                                SCEVCheckBlock);
-
-    BranchInst &BI =
-        *BranchInst::Create(Bypass, LoopVectorPreHeader, SCEVCheckCond);
-    if (AddBranchWeights)
-      setBranchWeights(BI, SCEVCheckBypassWeights, /*IsExpected=*/false);
-    ReplaceInstWithInst(SCEVCheckBlock->getTerminator(), &BI);
-    // Mark the check as used, to prevent it from being removed during cleanup.
+    Value *Cond = SCEVCheckCond;
     SCEVCheckCond = nullptr;
     AddedAnyChecks = true;
-    return SCEVCheckBlock;
+    return {Cond, SCEVCheckBlock};
   }
 
   /// Adds the generated MemCheckBlock before \p LoopVectorPreHeader and adjusts
   /// the branches to branch to the vector preheader or \p Bypass, depending on
   /// the generated condition.
-  BasicBlock *emitMemRuntimeChecks(BasicBlock *Bypass,
-                                   BasicBlock *LoopVectorPreHeader) {
+  std::pair<Value *, BasicBlock *> emitMemRuntimeChecks() {
     // Check if we generated code that checks in runtime if arrays overlap.
     if (!MemRuntimeCheckCond)
-      return nullptr;
-
-    auto *Pred = LoopVectorPreHeader->getSinglePredecessor();
-    Pred->getTerminator()->replaceSuccessorWith(LoopVectorPreHeader,
-                                                MemCheckBlock);
-
-    MemCheckBlock->moveBefore(LoopVectorPreHeader);
-
-    BranchInst &BI =
-        *BranchInst::Create(Bypass, LoopVectorPreHeader, MemRuntimeCheckCond);
-    if (AddBranchWeights) {
-      setBranchWeights(BI, MemCheckBypassWeights, /*IsExpected=*/false);
-    }
-    ReplaceInstWithInst(MemCheckBlock->getTerminator(), &BI);
-    MemCheckBlock->getTerminator()->setDebugLoc(
-        Pred->getTerminator()->getDebugLoc());
+      return {nullptr, nullptr};
 
     // Mark the check as used, to prevent it from being removed during cleanup.
+    Value *Cond = MemRuntimeCheckCond;
     MemRuntimeCheckCond = nullptr;
     AddedAnyChecks = true;
-    return MemCheckBlock;
+    return {Cond, MemCheckBlock};
   }
 
+  BasicBlock *getSCEVCheckBlock() const { return SCEVCheckBlock; }
+  BasicBlock *getMemCheckBlock() const { return MemCheckBlock; }
+
   /// Return true if any runtime checks have been added
   bool hasChecks() const { return AddedAnyChecks; }
 };
@@ -2451,53 +2407,6 @@ void InnerLoopVectorizer::emitIterationCountCheck(BasicBlock *Bypass) {
          "Plan's entry must be TCCCheckBlock");
 }
 
-BasicBlock *InnerLoopVectorizer::emitSCEVChecks(BasicBlock *Bypass) {
-  BasicBlock *const SCEVCheckBlock =
-      RTChecks.emitSCEVChecks(Bypass, LoopVectorPreHeader);
-  if (!SCEVCheckBlock)
-    return nullptr;
-
-  assert((!Cost->OptForSize ||
-          Cost->Hints->getForce() == LoopVectorizeHints::FK_Enabled) &&
-         "Cannot SCEV check stride or overflow when optimizing for size");
-
-  introduceCheckBlockInVPlan(SCEVCheckBlock);
-  return SCEVCheckBlock;
-}
-
-BasicBlock *InnerLoopVectorizer::emitMemRuntimeChecks(BasicBlock *Bypass) {
-  BasicBlock *const MemCheckBlock =
-      RTChecks.emitMemRuntimeChecks(Bypass, LoopVectorPreHeader);
-
-  // Check if we generated code that checks in runtime if arrays overlap. We put
-  // the checks into a separate block to make the more common case of few
-  // elements faster.
-  if (!MemCheckBlock)
-    return nullptr;
-
-  // VPlan-native path does not do any analysis for runtime checks currently.
-  assert((!EnableVPlanNativePath || OrigLoop->begin() == OrigLoop->end()) &&
-         "Runtime checks are not supported for outer loops yet");
-
-  if (Cost->OptForSize) {
-    assert(Cost->Hints->getForce() == LoopVectorizeHints::FK_Enabled &&
-           "Cannot emit memory checks when optimizing for size, unless forced "
-           "to vectorize.");
-    ORE->emit([&]() {
-      return OptimizationRemarkAnalysis(DEBUG_TYPE, "VectorizationCodeSize",
-                                        OrigLoop->getStartLoc(),
-                                        OrigLoop->getHeader())
-             << "Code-size may be reduced by not forcing "
-                "vectorization, or by source-code modifications "
-                "eliminating the need for runtime checks "
-                "(e.g., adding 'restrict').";
-    });
-  }
-
-  introduceCheckBlockInVPlan(MemCheckBlock);
-  return MemCheckBlock;
-}
-
 /// Replace \p VPBB with a VPIRBasicBlock wrapping \p IRBB. All recipes from \p
 /// VPBB are moved to the end of the newly created VPIRBasicBlock. VPBB must
 /// have a single predecessor, which is rewired to the new VPIRBasicBlock. All
@@ -2614,15 +2523,6 @@ BasicBlock *InnerLoopVectorizer::createVectorizedLoopSkeleton() {
   // to the scalar loop.
   emitIterationCountCheck(LoopScalarPreHeader);
 
-  // Generate the code to check any assumptions that we've made for SCEV
-  // expressions.
-  emitSCEVChecks(LoopScalarPreHeader);
-
-  // Generate the code that checks in runtime if arrays overlap. We put the
-  // checks into a separate block to make the more common case of few elements
-  // faster.
-  emitMemRuntimeChecks(LoopScalarPreHeader);
-
   replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader);
   return LoopVectorPreHeader;
 }
@@ -7312,6 +7212,11 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   VPlanTransforms::runPass(VPlanTransforms::unrollByUF, BestVPlan, BestUF,
                            OrigLoop->getHeader()->getContext());
   VPlanTransforms::runPass(VPlanTransforms::materializeBroadcasts, BestVPlan);
+
+  if (!VectorizingEpilogue)
+    addRuntimeChecks(BestVPlan, ILV.RTChecks);
+
+  VPBasicBlock *VectorPH = cast<VPBasicBlock>(BestVPlan.getVectorPreheader());
   VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);
   VPlanTransforms::simplifyRecipes(BestVPlan, *Legal->getWidestInductionType());
   VPlanTransforms::narrowInterleaveGroups(
@@ -7359,7 +7264,8 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
 
   // 1. Set up the skeleton for vectorization, including vector pre-header and
   // middle block. The vector loop is created during VPlan execution.
-  VPBasicBlock *VectorPH = cast<VPBasicBlock>(Entry->getSuccessors()[1]);
+  BasicBlock *EntryBB =
+      cast<VPIRBasicBlock>(BestVPlan.getEntry())->getIRBasicBlock();
   State.CFG.PrevBB = ILV.createVectorizedLoopSkeleton();
   if (VectorizingEpilogue)
     VPlanTransforms::removeDeadRecipes(BestVPlan);
@@ -7383,6 +7289,12 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
       ILV.getOrCreateVectorTripCount(ILV.LoopVectorPreHeader), State);
   replaceVPBBWithIRVPBB(VectorPH, State.CFG.PrevBB);
 
+  // Move check blocks to their final position.
+  if (BasicBlock *MemCheckBlock = ILV.RTChecks.getMemCheckBlock())
+    MemCheckBlock->moveAfter(EntryBB);
+  if (BasicBlock *SCEVCheckBlock = ILV.RTChecks.getSCEVCheckBlock())
+    SCEVCheckBlock->moveAfter(EntryBB);
+
   BestVPlan.execute(&State);
 
   // 2.5 When vectorizing the epilogue, fix reduction resume values from the
@@ -7483,15 +7395,6 @@ BasicBlock *EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton() {
       emitIterationCountCheck(LoopScalarPreHeader, true);
   EPI.EpilogueIterationCountCheck->setName("iter.check");
 
-  // Generate the code to check any assumptions that we've made for SCEV
-  // expressions.
-  EPI.SCEVSafetyCheck = emitSCEVChecks(LoopScalarPreHeader);
-
-  // Generate the code that checks at runtime if arrays overlap. We put the
-  // checks into a separate block to make the more common case of few elements
-  // faster.
-  EPI.MemSafetyCheck = emitMemRuntimeChecks(LoopScalarPreHeader);
-
   // Generate the iteration count check for the main loop, *after* the check
   // for the epilogue loop, so that the path-length is shorter for the case
   // that goes directly through the vector epilogue. The longer-path length for
@@ -7608,6 +7511,13 @@ EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton() {
   EPI.EpilogueIterationCountCheck->getTerminator()->replaceUsesOfWith(
       VecEpilogueIterationCountCheck, LoopScalarPreHeader);
 
+  BasicBlock *SCEVCheckBlock = RTChecks.getSCEVCheckBlock();
+  if (SCEVCheckBlock && SCEVCheckBlock->hasNPredecessorsOrMore(1))
+    EPI.SCEVSafetyCheck = SCEVCheckBlock;
+
+  BasicBlock *MemCheckBlock = RTChecks.getMemCheckBlock();
+  if (MemCheckBlock && MemCheckBlock->hasNPredecessorsOrMore(1))
+    EPI.MemSafetyCheck = MemCheckBlock;
   if (EPI.SCEVSafetyCheck)
     EPI.SCEVSafetyCheck->getTerminator()->replaceUsesOfWith(
         VecEpilogueIterationCountCheck, LoopScalarPreHeader);
@@ -9325,6 +9235,47 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
   VPlanTransforms::runPass(VPlanTransforms::clearReductionWrapFlags, *Plan);
 }
 
+void LoopVectorizationPlanner::addRuntimeChecks(
+    VPlan &Plan, GeneratedRTChecks &RTChecks) const {
+  SmallVector<std::pair<VPValue *, VPIRBasicBlock *>> Checks;
+  const auto &[SCEVCheckCond, SCEVCheckBlock] = RTChecks.emitSCEVChecks();
+  if (SCEVCheckBlock) {
+    assert((!CM.OptForSize ||
+            CM.Hints->getForce() == LoopVectorizeHints::FK_Enabled) &&
+           "Cannot SCEV check stride or overflow when optimizing for size");
+    Checks.emplace_back(Plan.getOrAddLiveIn(SCEVCheckCond),
+                        Plan.createVPIRBasicBlock(SCEVCheckBlock));
+  }
+  const auto &[MemCheckCond, MemCheckBlock] = RTChecks.emitMemRuntimeChecks();
+  if (MemCheckBlock) {
+    // VPlan-native path does not do any analysis for runtime checks
+    // currently.
+    assert((!EnableVPlanNativePath || OrigLoop->begin() == OrigLoop->end()) &&
+           "Runtime checks are not supported for outer loops yet");
+
+    if (CM.OptForSize) {
+      assert(
+          CM.Hints->getForce() == LoopVectorizeHints::FK_Enabled &&
+          "Cannot emit memory checks when optimizing for size, unless forced "
+          "to vectorize.");
+      ORE->emit([&]() {
+        return OptimizationRemarkAnalysis(DEBUG_TYPE, "VectorizationCodeSize",
+                                          OrigLoop->getStartLoc(),
+                                          OrigLoop->getHeader())
+               << "Code-size may be reduced by not forcing "
+                  "vectorization, or by source-code modifications "
+                  "eliminating the need for runtime checks "
+                  "(e.g., adding 'restrict').";
+      });
+    }
+    Checks.emplace_back(Plan.getOrAddLiveIn(MemCheckCond),
+                        Plan.createVPIRBasicBlock(MemCheckBlock));
+  }
+  VPlanTransforms::connectCheckBlocks(
+      Plan, Checks,
+      hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator()));
+}
+
 void VPDerivedIVRecipe::execute(VPTransformState &State) {
   assert(!State.Lane && "VPDerivedIVRecipe being replicated.");
 
@@ -9446,10 +9397,7 @@ static bool processLoopInVPlanNativePath(
   VPlan &BestPlan = LVP.getPlanFor(VF.Width);
 
   {
-    bool AddBranchWeights =
-        hasBranchWeightMD(*L->getLoopLatch()->getTerminator());
-    GeneratedRTChecks Checks(PSE, DT, LI, TTI, F->getDataLayout(),
-                             AddBranchWeights, CM.CostKind);
+    GeneratedRTChecks Checks(PSE, DT, LI, TTI, F->getDataLayout(), CM.CostKind);
     InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width,
                            VF.Width, 1, &CM, BFI, PSI, Checks, BestPlan);
     LLVM_DEBUG(dbgs() << "Vectorizing outer loop in \""
@@ -10085,10 +10033,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
   if (ORE->allowExtraAnalysis(LV_NAME))
     LVP.emitInvalidCostRemarks(ORE);
 
-  bool AddBranchWeights =
-      hasBranchWeightMD(*L->getLoopLatch()->getTerminator());
-  GeneratedRTChecks Checks(PSE, DT, LI, TTI, F->getDataLayout(),
-                           AddBranchWeights, CM.CostKind);
+  GeneratedRTChecks Checks(PSE, DT, LI, TTI, F->getDataLayout(), CM.CostKind);
   if (LVP.hasPlanWithVF(VF.Width)) {
     // Select the interleave count.
     IC = CM.selectInterleaveCount(LVP.getPlanFor(VF.Width), VF.Width, VF.Cost);
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 1838562f26b82..8068a6b3b968f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -484,7 +484,8 @@ void VPBasicBlock::connectToPredecessors(VPTransformState &State) {
       unsigned idx = PredVPSuccessors.front() == this ? 0 : 1;
       assert((TermBr && (!TermBr->getSuccessor(idx) ||
                          (isa<VPIRBasicBlock>(this) &&
-                          TermBr->getSuccessor(idx) == NewBB))) &&
+                          (TermBr->getSuccessor(idx) == NewBB ||
+                           PredVPBlock == getPlan()->getEntry())))) &&
              "Trying to reset an existing successor block.");
       TermBr->setSuccessor(idx, NewBB);
     }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
index 593e5063802ba..a55e95ef274b7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
@@ -20,6 +20,7 @@
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/Analysis/LoopIterator.h"
 #include "llvm/Analysis/ScalarEvolution.h"
+#include "llvm/IR/MDBuilder.h"
 
 #define DEBUG_TYPE "vplan"
 
@@ -589,3 +590,41 @@ void VPlanTransforms::createLoopRegions(VPlan &Plan) {
   TopRegion->setName("vector loop");
   TopRegion->getEntryBasicBlock()->setName("vector.body");
 }
+
+// Likelihood of bypassing the vectorized loop because SCEV assumptions do not
+// hold or memory runtime checks fail.
+static constexpr uint32_t CheckBypassWeights[] = {1, 127};
+
+void VPlanTransforms::connectCheckBlocks(
+    VPlan &Plan, ArrayRef<std::pair<VPValue *, VPIRBasicBlock *>> Checks,
+    bool AddBranchWeights) {
+  VPBlockBase *VectorPH = Plan.getVectorPreheader();
+  VPBlockBase *ScalarPH = Plan.getScalarPreheader();
+  for (const auto &[Cond, CheckBlock] : Checks) {
+    VPBlockBase *PreVectorPH = VectorPH->getSinglePredecessor();
+    VPBlockUtils::insertOnEdge(PreVectorPH, VectorPH, CheckBlock);
+    VPBlockUtils::connectBlocks(CheckBlock, ScalarPH);
+    CheckBlock->swapSuccessors();
+
+    // We just connected a new block to the scalar preheader. Update all
+    // VPPhis by adding an incoming value for it, replicating the last value.
+    unsigned NumPredecessors = ScalarPH->getNumPredecessors();
+    for (VPRecipeBase &R : cast<VPBasicBlock>(ScalarPH)->phis()) {
+      assert(isa<VPPhi>(&R) && "Phi expected to be VPPhi");
+      assert(cast<VPPhi>(&R)->getNumIncoming() == NumPredecessors - 1 &&
+             "must have incoming values for all operands");
+      R.addOperand(R.getOperand(NumPredecessors - 2));
+    }
+
+    VPIRMetadata VPBranchWeights;
+    auto *Term = VPBuilder(CheckBlock)
+                     .createNaryOp(VPInstruction::BranchOnCond, {Cond},
+                                   Plan.getCanonicalIV()->getDebugLoc());
+    if (AddBranchWeights) {
+      MDBuilder MDB(Plan.getScalarHeader()->getIRBasicBlock()->getContext());
+      MDNode *BranchWeights =
+          MDB.createBranchWeights(CheckBypassWeights, /*IsExpected=*/false);
+      Term->addMetadata(LLVMContext::MD_prof, BranchWeights);
+    }
+  }
+}
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 34e2de4eb3b74..c13cabc87ce31 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -74,6 +74,13 @@ struct VPlanTransforms {
   /// flat CFG into a hierarchical CFG.
   static void createLoopRegions(VPlan &Plan);
 
+  /// Connect the blocks in \p Checks to \p Plan, using the corresponding
+  /// VPValue as branch condition.
+  static void
+  connectCheckBlocks(VPlan &Plan,
+                     ArrayRef<std::pair<VPValue *, VPIRBasicBlock *>> ...
[truncated]
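The diff moves the check blocks after the plan's entry block in executePlan, moving MemCheckBlock first and SCEVCheckBlock second. Because each `moveAfter(EntryBB)` places its block directly after the entry, the block moved last ends up first, so the layout matches the CFG order (entry, SCEV check, memory check). The sketch below models only the block layout as a `std::list` of names; `Layout`, `moveAfter` and the block names are hypothetical, `moveAfter` merely mimics the layout effect of `llvm::BasicBlock::moveAfter`, and the initial positions of the check blocks are an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical model of a function's block layout, identified by name.
using Layout = std::list<std::string>;

// Move block Name so that it directly follows block After, mimicking the
// effect of BasicBlock::moveAfter on the layout (the CFG is not modelled).
void moveAfter(Layout &L, const std::string &Name, const std::string &After) {
  auto It = std::find(L.begin(), L.end(), Name);
  auto Pos = std::find(L.begin(), L.end(), After);
  assert(It != L.end() && Pos != L.end() && It != Pos && "bad block names");
  // Relink the node for Name right after Pos; no elements are copied.
  L.splice(std::next(Pos), L, It);
}
```

Starting from `{"entry", "vector.ph", "scalar.ph", "vector.scevcheck", "vector.memcheck"}`, moving `vector.memcheck` and then `vector.scevcheck` after `entry` gives `{"entry", "vector.scevcheck", "vector.memcheck", "vector.ph", "scalar.ph"}`.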

llvmbot commented Jun 12, 2025

@llvm/pr-subscribers-vectorizers

+  BasicBlock *EntryBB =
+      cast<VPIRBasicBlock>(BestVPlan.getEntry())->getIRBasicBlock();
   State.CFG.PrevBB = ILV.createVectorizedLoopSkeleton();
   if (VectorizingEpilogue)
     VPlanTransforms::removeDeadRecipes(BestVPlan);
@@ -7383,6 +7289,12 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
       ILV.getOrCreateVectorTripCount(ILV.LoopVectorPreHeader), State);
   replaceVPBBWithIRVPBB(VectorPH, State.CFG.PrevBB);
 
+  // Move check blocks to their final position.
+  if (BasicBlock *MemCheckBlock = ILV.RTChecks.getMemCheckBlock())
+    MemCheckBlock->moveAfter(EntryBB);
+  if (BasicBlock *SCEVCheckBlock = ILV.RTChecks.getSCEVCheckBlock())
+    SCEVCheckBlock->moveAfter(EntryBB);
+
   BestVPlan.execute(&State);
 
   // 2.5 When vectorizing the epilogue, fix reduction resume values from the
@@ -7483,15 +7395,6 @@ BasicBlock *EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton() {
       emitIterationCountCheck(LoopScalarPreHeader, true);
   EPI.EpilogueIterationCountCheck->setName("iter.check");
 
-  // Generate the code to check any assumptions that we've made for SCEV
-  // expressions.
-  EPI.SCEVSafetyCheck = emitSCEVChecks(LoopScalarPreHeader);
-
-  // Generate the code that checks at runtime if arrays overlap. We put the
-  // checks into a separate block to make the more common case of few elements
-  // faster.
-  EPI.MemSafetyCheck = emitMemRuntimeChecks(LoopScalarPreHeader);
-
   // Generate the iteration count check for the main loop, *after* the check
   // for the epilogue loop, so that the path-length is shorter for the case
   // that goes directly through the vector epilogue. The longer-path length for
@@ -7608,6 +7511,13 @@ EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton() {
   EPI.EpilogueIterationCountCheck->getTerminator()->replaceUsesOfWith(
       VecEpilogueIterationCountCheck, LoopScalarPreHeader);
 
+  BasicBlock *SCEVCheckBlock = RTChecks.getSCEVCheckBlock();
+  if (SCEVCheckBlock && SCEVCheckBlock->hasNPredecessorsOrMore(1))
+    EPI.SCEVSafetyCheck = SCEVCheckBlock;
+
+  BasicBlock *MemCheckBlock = RTChecks.getMemCheckBlock();
+  if (MemCheckBlock && MemCheckBlock->hasNPredecessorsOrMore(1))
+    EPI.MemSafetyCheck = MemCheckBlock;
   if (EPI.SCEVSafetyCheck)
     EPI.SCEVSafetyCheck->getTerminator()->replaceUsesOfWith(
         VecEpilogueIterationCountCheck, LoopScalarPreHeader);
@@ -9325,6 +9235,47 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
   VPlanTransforms::runPass(VPlanTransforms::clearReductionWrapFlags, *Plan);
 }
 
+void LoopVectorizationPlanner::addRuntimeChecks(
+    VPlan &Plan, GeneratedRTChecks &RTChecks) const {
+  SmallVector<std::pair<VPValue *, VPIRBasicBlock *>> Checks;
+  const auto &[SCEVCheckCond, SCEVCheckBlock] = RTChecks.emitSCEVChecks();
+  if (SCEVCheckBlock) {
+    assert((!CM.OptForSize ||
+            CM.Hints->getForce() == LoopVectorizeHints::FK_Enabled) &&
+           "Cannot SCEV check stride or overflow when optimizing for size");
+    Checks.emplace_back(Plan.getOrAddLiveIn(SCEVCheckCond),
+                        Plan.createVPIRBasicBlock(SCEVCheckBlock));
+  }
+  const auto &[MemCheckCond, MemCheckBlock] = RTChecks.emitMemRuntimeChecks();
+  if (MemCheckBlock) {
+    // VPlan-native path does not do any analysis for runtime checks
+    // currently.
+    assert((!EnableVPlanNativePath || OrigLoop->begin() == OrigLoop->end()) &&
+           "Runtime checks are not supported for outer loops yet");
+
+    if (CM.OptForSize) {
+      assert(
+          CM.Hints->getForce() == LoopVectorizeHints::FK_Enabled &&
+          "Cannot emit memory checks when optimizing for size, unless forced "
+          "to vectorize.");
+      ORE->emit([&]() {
+        return OptimizationRemarkAnalysis(DEBUG_TYPE, "VectorizationCodeSize",
+                                          OrigLoop->getStartLoc(),
+                                          OrigLoop->getHeader())
+               << "Code-size may be reduced by not forcing "
+                  "vectorization, or by source-code modifications "
+                  "eliminating the need for runtime checks "
+                  "(e.g., adding 'restrict').";
+      });
+    }
+    Checks.emplace_back(Plan.getOrAddLiveIn(MemCheckCond),
+                        Plan.createVPIRBasicBlock(MemCheckBlock));
+  }
+  VPlanTransforms::connectCheckBlocks(
+      Plan, Checks,
+      hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator()));
+}
+
 void VPDerivedIVRecipe::execute(VPTransformState &State) {
   assert(!State.Lane && "VPDerivedIVRecipe being replicated.");
 
@@ -9446,10 +9397,7 @@ static bool processLoopInVPlanNativePath(
   VPlan &BestPlan = LVP.getPlanFor(VF.Width);
 
   {
-    bool AddBranchWeights =
-        hasBranchWeightMD(*L->getLoopLatch()->getTerminator());
-    GeneratedRTChecks Checks(PSE, DT, LI, TTI, F->getDataLayout(),
-                             AddBranchWeights, CM.CostKind);
+    GeneratedRTChecks Checks(PSE, DT, LI, TTI, F->getDataLayout(), CM.CostKind);
     InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width,
                            VF.Width, 1, &CM, BFI, PSI, Checks, BestPlan);
     LLVM_DEBUG(dbgs() << "Vectorizing outer loop in \""
@@ -10085,10 +10033,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
   if (ORE->allowExtraAnalysis(LV_NAME))
     LVP.emitInvalidCostRemarks(ORE);
 
-  bool AddBranchWeights =
-      hasBranchWeightMD(*L->getLoopLatch()->getTerminator());
-  GeneratedRTChecks Checks(PSE, DT, LI, TTI, F->getDataLayout(),
-                           AddBranchWeights, CM.CostKind);
+  GeneratedRTChecks Checks(PSE, DT, LI, TTI, F->getDataLayout(), CM.CostKind);
   if (LVP.hasPlanWithVF(VF.Width)) {
     // Select the interleave count.
     IC = CM.selectInterleaveCount(LVP.getPlanFor(VF.Width), VF.Width, VF.Cost);
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 1838562f26b82..8068a6b3b968f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -484,7 +484,8 @@ void VPBasicBlock::connectToPredecessors(VPTransformState &State) {
       unsigned idx = PredVPSuccessors.front() == this ? 0 : 1;
       assert((TermBr && (!TermBr->getSuccessor(idx) ||
                          (isa<VPIRBasicBlock>(this) &&
-                          TermBr->getSuccessor(idx) == NewBB))) &&
+                          (TermBr->getSuccessor(idx) == NewBB ||
+                           PredVPBlock == getPlan()->getEntry())))) &&
              "Trying to reset an existing successor block.");
       TermBr->setSuccessor(idx, NewBB);
     }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
index 593e5063802ba..a55e95ef274b7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
@@ -20,6 +20,7 @@
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/Analysis/LoopIterator.h"
 #include "llvm/Analysis/ScalarEvolution.h"
+#include "llvm/IR/MDBuilder.h"
 
 #define DEBUG_TYPE "vplan"
 
@@ -589,3 +590,41 @@ void VPlanTransforms::createLoopRegions(VPlan &Plan) {
   TopRegion->setName("vector loop");
   TopRegion->getEntryBasicBlock()->setName("vector.body");
 }
+
+// Likelyhood of bypassing the vectorized loop because SCEV assumptions or
+// memory runtime checks.
+static constexpr uint32_t CheckBypassWeights[] = {1, 127};
+
+void VPlanTransforms::connectCheckBlocks(
+    VPlan &Plan, ArrayRef<std::pair<VPValue *, VPIRBasicBlock *>> Checks,
+    bool AddBranchWeights) {
+  VPBlockBase *VectorPH = Plan.getVectorPreheader();
+  VPBlockBase *ScalarPH = Plan.getScalarPreheader();
+  for (const auto &[Cond, CheckBlock] : Checks) {
+    VPBlockBase *PreVectorPH = VectorPH->getSinglePredecessor();
+    VPBlockUtils::insertOnEdge(PreVectorPH, VectorPH, CheckBlock);
+    VPBlockUtils::connectBlocks(CheckBlock, ScalarPH);
+    CheckBlock->swapSuccessors();
+
+    // We just connected a new block to the scalar preheader. Update all
+    // VPPhis by adding an incoming value for it, replicating the last value.
+    unsigned NumPredecessors = ScalarPH->getNumPredecessors();
+    for (VPRecipeBase &R : cast<VPBasicBlock>(ScalarPH)->phis()) {
+      assert(isa<VPPhi>(&R) && "Phi expected to be VPPhi");
+      assert(cast<VPPhi>(&R)->getNumIncoming() == NumPredecessors - 1 &&
+             "must have incoming values for all operands");
+      R.addOperand(R.getOperand(NumPredecessors - 2));
+    }
+
+    VPIRMetadata VPBranchWeights;
+    auto *Term = VPBuilder(CheckBlock)
+                     .createNaryOp(VPInstruction::BranchOnCond, {Cond},
+                                   Plan.getCanonicalIV()->getDebugLoc());
+    if (AddBranchWeights) {
+      MDBuilder MDB(Plan.getScalarHeader()->getIRBasicBlock()->getContext());
+      MDNode *BranchWeights =
+          MDB.createBranchWeights(CheckBypassWeights, /*IsExpected=*/false);
+      Term->addMetadata(LLVMContext::MD_prof, BranchWeights);
+    }
+  }
+}
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 34e2de4eb3b74..c13cabc87ce31 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -74,6 +74,13 @@ struct VPlanTransforms {
   /// flat CFG into a hierarchical CFG.
   static void createLoopRegions(VPlan &Plan);
 
+  /// Connect the blocks in \p Checks to \p Plan, using the corresponding
+  /// VPValue as branch condition.
+  static void
+  connectCheckBlocks(VPlan &Plan,
+                     ArrayRef<std::pair<VPValue *, VPIRBasicBlock *>> ...
[truncated]
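To make the rewiring in connectCheckBlocks easier to follow, here is a toy model of it on a hand-rolled CFG. `Block`, `insertOnEdge`, `connectBlocks` and this `connectCheckBlocks` are hypothetical stand-ins, not the VPlan API. The sketch also assumes that the scalar preheader's most recently added predecessor is the entry's bypass edge, which the replicate-the-last-operand step in the patch appears to rely on. Under that assumption, that predecessor's incoming value (the start value) is the right one to copy for each new check block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a VPlan block (hypothetical, not llvm::VPBlockBase).
struct Block {
  std::string Name;
  std::vector<Block *> Preds;
  std::vector<Block *> Succs;
  // Each phi holds one incoming value per predecessor, in predecessor order.
  std::vector<std::vector<int>> Phis;
};

// Put NewBB on the edge From -> To, reusing the existing edge slots.
static void insertOnEdge(Block *From, Block *To, Block *NewBB) {
  std::replace(From->Succs.begin(), From->Succs.end(), To, NewBB);
  std::replace(To->Preds.begin(), To->Preds.end(), From, NewBB);
  NewBB->Preds.push_back(From);
  NewBB->Succs.push_back(To);
}

// Append the edge From -> To.
static void connectBlocks(Block *From, Block *To) {
  From->Succs.push_back(To);
  To->Preds.push_back(From);
}

// Mirrors the shape of VPlanTransforms::connectCheckBlocks on the toy CFG.
static void connectCheckBlocks(Block *VectorPH, Block *ScalarPH,
                               const std::vector<Block *> &Checks) {
  for (Block *Check : Checks) {
    assert(VectorPH->Preds.size() == 1 && "expected a single predecessor");
    Block *PreVectorPH = VectorPH->Preds.front();
    insertOnEdge(PreVectorPH, VectorPH, Check); // Check -> vector.ph
    connectBlocks(Check, ScalarPH);             // Check -> scalar.ph
    // Successor 0 (the "true" edge of BranchOnCond) now bypasses to scalar.ph.
    std::swap(Check->Succs[0], Check->Succs[1]);
    // The new predecessor reuses the incoming value of the previous last one.
    std::size_t NumPreds = ScalarPH->Preds.size();
    for (std::vector<int> &Phi : ScalarPH->Phis) {
      assert(Phi.size() == NumPreds - 1 && "missing incoming values");
      Phi.push_back(Phi[NumPreds - 2]);
    }
  }
}
```

For example, start from entry → {scalar.ph, vector.ph}, with middle.block → scalar.ph, and one phi in scalar.ph with incoming values {42 (from middle.block), 0 (from entry)}. Attaching an SCEV check and then a memory check yields entry → vector.scevcheck → vector.memcheck → vector.ph. Each check's first successor is scalar.ph, and the phi becomes {42, 0, 0, 0}.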

@fhahn fhahn force-pushed the vplan-connect-memcheck-scevcheck-early branch from 6cc3fc0 to 205fad6 Compare June 17, 2025 21:52
Comment on lines 2004 to 2007
/// Adds the generated SCEVCheckBlock before \p LoopVectorPreHeader and
/// adjusts the branches to branch to the vector preheader or \p Bypass,
/// depending on the generated condition.
Collaborator

Documentation needs updating. This method no longer adds or adjusts anything, mostly retrieves information.

Contributor Author

Updated, thanks

setBranchWeights(BI, SCEVCheckBypassWeights, /*IsExpected=*/false);
ReplaceInstWithInst(SCEVCheckBlock->getTerminator(), &BI);
// Mark the check as used, to prevent it from being removed during cleanup.
Value *Cond = SCEVCheckCond;
Collaborator

Suggested change
Value *Cond = SCEVCheckCond;
// Mark the check as used, to prevent it from being removed during cleanup.
Value *Cond = SCEVCheckCond;

Is this prevention still needed? The comment could be improved; this seems to prevent repeated retrieval/reuse.

Contributor Author

This is not really needed, independent of this patch; simplified in 786ccb2

BasicBlock *LoopVectorPreHeader) {
std::pair<Value *, BasicBlock *> emitSCEVChecks() {
using namespace llvm::PatternMatch;
if (!SCEVCheckCond || match(SCEVCheckCond, m_ZeroInt()))
Collaborator

Better optimize branch-on-false elsewhere? (SCEVCheckCond being zero)

Contributor Author

Yes, this will be possible after this change, although it is probably best done and tested separately as a follow-up?

/// depending on the generated condition.
BasicBlock *emitSCEVChecks(BasicBlock *Bypass,
BasicBlock *LoopVectorPreHeader) {
std::pair<Value *, BasicBlock *> emitSCEVChecks() {
Collaborator

Should this simply retrieve SCEVCheckCond and SCEVCheckBlock, i.e., getSCEVChecks(), provided other things are done elsewhere? See below.

Contributor Author

Updated to return the block and condition, while still checking whether there's a non-trivial check, thanks

Collaborator

Still called emit and documented as such.

Contributor Author

Updated to getSCEVChecks, thanks
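The getter-style interface discussed in this thread can be sketched as follows. The reviewer's question was whether the method should just retrieve SCEVCheckCond and SCEVCheckBlock while still filtering out trivial checks. This is a minimal sketch under stated assumptions: `ToyCond`, `ToyBlock` and `ToyRTChecks` are hypothetical stand-ins, not the real GeneratedRTChecks class. It also assumes that a missing or constant-false condition yields {nullptr, nullptr}, as the quoted `!SCEVCheckCond || match(SCEVCheckCond, m_ZeroInt())` test suggests.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-ins: a condition is either a known constant or opaque.
struct ToyCond {
  bool IsConstant;
  long ConstValue; // only meaningful when IsConstant is true
};
struct ToyBlock {
  const char *Name;
};

// Toy model of a runtime-check holder; not llvm's GeneratedRTChecks.
class ToyRTChecks {
  ToyCond *SCEVCheckCond = nullptr;
  ToyBlock *SCEVCheckBlock = nullptr;

public:
  ToyRTChecks(ToyCond *Cond, ToyBlock *BB)
      : SCEVCheckCond(Cond), SCEVCheckBlock(BB) {}

  // Pure retrieval: returns {condition, block}, or {nullptr, nullptr} when
  // there is no check or the condition is constant false (never bypasses).
  std::pair<ToyCond *, ToyBlock *> getSCEVChecks() const {
    if (!SCEVCheckCond ||
        (SCEVCheckCond->IsConstant && SCEVCheckCond->ConstValue == 0))
      return {nullptr, nullptr};
    return {SCEVCheckCond, SCEVCheckBlock};
  }
};
```

Because the getter neither creates nor rewires anything, the caller decides whether and where to attach the returned block.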

// Mark the check as used, to prevent it from being removed during cleanup.
Value *Cond = SCEVCheckCond;
SCEVCheckCond = nullptr;
AddedAnyChecks = true;
Collaborator

Should AddedAnyChecks be accumulated elsewhere rather than here upon retrieval?

Contributor Author

Yes, we could probably just check whether the check blocks have been inserted; will check separately, thanks

Comment on lines 286 to 289
; CHECK-NEXT: br i1 %min.iters.check, label %scalar.ph, label %vector.ph
; CHECK-NEXT: LV: vectorizing VPBB:ir-bb<vector.scevcheck> in BB:vector.scevcheck
; CHECK-NEXT: LV: filled BB:
; CHECK-NEXT: vector.scevcheck: ; preds = %for.body.preheader
; CHECK-NEXT: vector.scevcheck: ; No predecessors!
Collaborator

Is this correct?
Also below.

Contributor Author

Yep, at this point it has not been connected yet; the message is printed after executing all recipes in a block, but before connecting the block to its predecessors.

// Move check blocks to their final position.
if (BasicBlock *MemCheckBlock = ILV.RTChecks.getMemCheckBlock())
MemCheckBlock->moveAfter(EntryBB);
if (BasicBlock *SCEVCheckBlock = ILV.RTChecks.getSCEVCheckBlock())
SCEVCheckBlock->moveAfter(EntryBB);

Collaborator

@ayalz ayalz Jun 25, 2025

Should this be part of VPIRBB::execute(), for VPIRBB's who are not placed correctly when generated? VPlan should model the final position of EntryVPBB->SCEVCheckBlock->MemCheckBlock, and VPlan::execute should take care of (re)wiring its new and existing IRBB's accordingly?

Contributor Author

It could be part of VPIRBB's execute, except that it would change the block order for epilogue vectorization, which inserts an additional block with an iteration count check after the original entry block. Could this be adjusted as a follow-up?

Collaborator

Sure, can leave behind a TODO for later. Ideally VPlan should model the final control-flow fully and accurately, also for epilog vectorization, and when executed should make sure the generated control-flow matches that modeled by VPlan.

Contributor Author

The control flow is completely modeled in VPlan; this only concerns the blocks' position within the function. Added a TODO.
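The layout fix-up in executePlan moves MemCheckBlock after EntryBB first and SCEVCheckBlock after EntryBB second. Each call places its block immediately after the entry, so the second call pushes the first one down, giving entry, SCEV check, memory check. This only affects the block order within the function, not control flow. Below is a toy sketch of that ordering: `BlockList` and `moveAfter` are hypothetical stand-ins for a function's block list and `BasicBlock::moveAfter`, and the starting order is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Toy model of a function's block list (hypothetical, not llvm::Function).
using BlockList = std::list<std::string>;

// Move block Name so that it directly follows block After.
static void moveAfter(BlockList &Blocks, const std::string &Name,
                      const std::string &After) {
  auto It = std::find(Blocks.begin(), Blocks.end(), Name);
  auto Pos = std::find(Blocks.begin(), Blocks.end(), After);
  assert(It != Blocks.end() && Pos != Blocks.end() && "blocks must exist");
  // Single-element splice; a no-op if the block is already in place.
  Blocks.splice(std::next(Pos), Blocks, It);
}
```

Starting from {entry, vector.ph, middle.block, scalar.ph, vector.scevcheck, vector.memcheck}, first move vector.memcheck after entry, then vector.scevcheck after entry. The result is {entry, vector.scevcheck, vector.memcheck, vector.ph, middle.block, scalar.ph}, matching the entry → SCEV check → memory check → vector.ph order modeled in VPlan.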

Comment on lines 7520 to 7526
BasicBlock *SCEVCheckBlock = RTChecks.getSCEVCheckBlock();
if (SCEVCheckBlock && SCEVCheckBlock->hasNPredecessorsOrMore(1))
EPI.SCEVSafetyCheck = SCEVCheckBlock;

BasicBlock *MemCheckBlock = RTChecks.getMemCheckBlock();
if (MemCheckBlock && MemCheckBlock->hasNPredecessorsOrMore(1))
EPI.MemSafetyCheck = MemCheckBlock;
Collaborator

Sigh, worth a comment to explain this hook up?

Contributor Author

Added, also removed SCEVSafetyCheck and MemSafetyCheck from EPI, as it is now only needed in this function.
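The epilogue hook-up keeps a runtime-check block only if it has at least one predecessor, that is, only if VPlan execution actually wired it into the CFG. A block left unconnected is unused and gets deleted later. A minimal sketch of that filter follows, assuming a hypothetical `ToyBB` that only tracks predecessors; its `hasNPredecessorsOrMore` mimics the LLVM method of the same name used in the patch, and `getConnectedCheck` is an illustrative helper, not part of LLVM.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a basic block that only tracks predecessors.
struct ToyBB {
  std::vector<ToyBB *> Preds;
  // Analogue of BasicBlock::hasNPredecessorsOrMore(N).
  bool hasNPredecessorsOrMore(unsigned N) const { return Preds.size() >= N; }
};

// Return the check block only if it has been connected to the CFG;
// an unconnected block is treated as unused (left for later deletion).
static ToyBB *getConnectedCheck(ToyBB *CheckBB) {
  if (CheckBB && CheckBB->hasNPredecessorsOrMore(1))
    return CheckBB;
  return nullptr;
}
```

The same "is it connected?" question is what later replaced the sticky AddedAnyChecks flag.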

Comment on lines -7507 to -7509
EPI.SCEVSafetyCheck = emitSCEVChecks(LoopScalarPreHeader);

// Generate the code that checks at runtime if arrays overlap. We put the
// checks into a separate block to make the more common case of few elements
// faster.
EPI.MemSafetyCheck = emitMemRuntimeChecks(LoopScalarPreHeader);
Collaborator

These are replaced by calling getSCEVCheckBlock() and getMemCheckBlock() instead, so can emitSCEVChecks() and emitMemRuntimeChecks() be simplified, regarding "Mark the check as used, to prevent it from being removed during cleanup"?

Contributor Author

Yep should be done in the latest version, thanks

Comment on lines 2033 to 2030
BasicBlock *getSCEVCheckBlock() const { return SCEVCheckBlock; }
BasicBlock *getMemCheckBlock() const { return MemCheckBlock; }
Collaborator

Mostly needed for hooking up epilog vectorization.

Contributor Author

removed, now just uses getSCEVChecks()/getMemRuntimeChecks()

Collaborator

Still seem to be here.

Suggested change
BasicBlock *getSCEVCheckBlock() const { return SCEVCheckBlock; }
BasicBlock *getMemCheckBlock() const { return MemCheckBlock; }

Note that getSCEVChecks()/getMemRuntimeChecks() set the AddedAnyChecks sticky bit when called.
If SCEVCheckCond is zero, getSCEVCheckBlock() should return null?

Contributor Author

Hm, yeah it looks like some updates weren't pushed, sorry about that

fhahn added 3 commits June 26, 2025 21:43
Connect SCEV and memory runtime check block directly in VPlan as
VPIRBasicBlocks, removing ILV::emitSCEVChecks and ILV::emitMemRuntimeChecks.

The new logic is currently split across LoopVectorizationPlanner::addRuntimeChecks
which collects a list of {Condition, CheckBlock} pairs and performs some
checks and emits remarks if needed. The list of checks is then added to
VPlan in VPlanTransforms::connectCheckBlocks.
@fhahn fhahn force-pushed the vplan-connect-memcheck-scevcheck-early branch from 205fad6 to 7ba9734 Compare June 27, 2025 09:46
@github-actions

github-actions bot commented Jun 27, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@fhahn fhahn force-pushed the vplan-connect-memcheck-scevcheck-early branch from 7ba9734 to 044d73a Compare June 27, 2025 09:51
Comment on lines 9299 to 9301
VPlanTransforms::connectCheckBlocks(
Plan, Checks,
hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator()));
Collaborator

Better call attachCheckBlock() given each pair of cond and check blocks separately, twice, above?

Perhaps "attach" is more accurate than "add" or "connect", as this transformation wraps existing IR check conditions and blocks (held detached from original IR, and detached from VPlan until now) in VPValues and VPIRBBs, as when forming the initial VPlan from IR, rather than connecting existing VPlan entities together, or adding completely new entities to VPlan.

Contributor Author

Updated to be called attachCheckBlock, thanks!

Comment on lines 7235 to 7231
if (!VectorizingEpilogue)
addRuntimeChecks(BestVPlan, ILV.RTChecks);
Collaborator

Worth noting, e.g.,
// Checks are the same for all VPlans, added to BestVPlan only for compactness.
?

Comment on lines 567 to 568
/// Add the runtime checks from \p RTChecks to \p VPlan.
void addRuntimeChecks(VPlan &Plan, GeneratedRTChecks &RTChecks) const;
Collaborator

Suggested change
/// Add the runtime checks from \p RTChecks to \p VPlan.
void addRuntimeChecks(VPlan &Plan, GeneratedRTChecks &RTChecks) const;
/// Attach the runtime checks of \p RTChecks to \p VPlan.
void attachRuntimeChecks(VPlan &Plan, GeneratedRTChecks &RTChecks) const;

?
The runtime checks are conceptually "there", but held detached from VPlan(s) until now for compactness, and detached from IR until VPlan execution.
This method wraps connectRuntimeChecks() with asserts and remarks. Suffice to have a single VPlanTransform::attachRuntimeCheck() with a unified assert and remark - of a conflict between a runtime check and OptForSize?

Contributor Author

updated the naming, but still left the separate function in LoopVectorizationPlanner, to avoid having to pass Hints/OptForSize (and ORE). I think we can move the asserts to getSCEVChecks/getMemRuntimeChecks and emit a remark in both cases, but this should probably also be done separately?

/// depending on the generated condition.
BasicBlock *emitSCEVChecks(BasicBlock *Bypass,
BasicBlock *LoopVectorPreHeader) {
std::pair<Value *, BasicBlock *> emitSCEVChecks() {
Collaborator

Still called emit and documented as such.

Comment on lines 2018 to 2020
/// Adds the generated MemCheckBlock before \p LoopVectorPreHeader and adjusts
/// the branches to branch to the vector preheader or \p Bypass, depending on
/// the generated condition.
BasicBlock *emitMemRuntimeChecks(BasicBlock *Bypass,
BasicBlock *LoopVectorPreHeader) {
std::pair<Value *, BasicBlock *> emitMemRuntimeChecks() {
Collaborator

Still called emit and documented as such.

OrigLoop->getHeader()->getContext());
VPlanTransforms::runPass(VPlanTransforms::replicateByVF, BestVPlan, BestVF);
VPlanTransforms::runPass(VPlanTransforms::materializeBroadcasts, BestVPlan);
if (hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator()))
Collaborator

Suggested change
if (hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator()))
bool hasBranchWeights = hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator());
if (hasBranchWeights)

Contributor Author

Updated, thanks

Comment on lines 7230 to 7231
if (!VectorizingEpilogue)
addRuntimeChecks(BestVPlan, ILV.RTChecks);
Collaborator

Suggested change
if (!VectorizingEpilogue)
addRuntimeChecks(BestVPlan, ILV.RTChecks);
// Runtime checks are the same for all VPlans, attach them to main loop of BestVPlan only, for compactness.
if (!VectorizingEpilogue)
attachRuntimeChecks(BestVPlan, ILV.RTChecks, hasBranchWeights);

Contributor Author

Done thanks

Comment on lines 9262 to 9264
void LoopVectorizationPlanner::addRuntimeChecks(
VPlan &Plan, GeneratedRTChecks &RTChecks) const {
SmallVector<std::pair<VPValue *, VPIRBasicBlock *>> Checks;
Collaborator

Suggested change
void LoopVectorizationPlanner::addRuntimeChecks(
VPlan &Plan, GeneratedRTChecks &RTChecks) const {
SmallVector<std::pair<VPValue *, VPIRBasicBlock *>> Checks;
void LoopVectorizationPlanner::attachRuntimeChecks(
VPlan &Plan, GeneratedRTChecks &RTChecks, bool hasBranchWeights) const {

Contributor Author

Updated thanks

Comment on lines 594 to 595
// Likelyhood of bypassing the vectorized loop due to SCEV or memory runtime
// checks.
Collaborator

Suggested change
// Likelyhood of bypassing the vectorized loop due to SCEV or memory runtime
// checks.
// Likelyhood of bypassing the vectorized loop due to a runtime check block, including memory overlap checks block and wrapping/unit-stride checks block.

Contributor Author

Updated, thanks
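The CheckBypassWeights array {1, 127} discussed here can be read in successor order, as LLVM `branch_weights` metadata is. In connectCheckBlocks, each check's BranchOnCond has successor 0 = scalar preheader after `swapSuccessors()`. Under that reading, the weights encode a bypass probability of 1/128. A small arithmetic sketch follows; `branchProbability` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Weights from the patch, in successor order:
// {true edge -> scalar preheader (bypass), false edge -> vector preheader}.
static constexpr uint32_t CheckBypassWeights[] = {1, 127};

// Probability implied by a two-way branch-weight pair for successor Idx.
static double branchProbability(const uint32_t (&Weights)[2], unsigned Idx) {
  assert(Weights[0] + Weights[1] != 0 && "weights must not both be zero");
  return static_cast<double>(Weights[Idx]) /
         (static_cast<double>(Weights[0]) + Weights[1]);
}
```

So each runtime-check block is modeled as taking the bypass to the scalar loop 1 time in 128 (about 0.78%). The weights are attached with `IsExpected=false`.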

void VPlanTransforms::connectCheckBlocks(
VPlan &Plan, ArrayRef<std::pair<VPValue *, VPIRBasicBlock *>> Checks,
bool AddBranchWeights) {
VPBlockBase *VectorPH = Plan.getVectorPreheader();
Collaborator

Would be good to

    assert((!CM.OptForSize ||
            CM.Hints->getForce() == LoopVectorizeHints::FK_Enabled) &&
           "Runtime checks require forcing vectorization when optimizing for size");

here, but may need to convey OptForSize and ForcedToVectorize.

Contributor Author

I think it would be best to leave the assertions as is for now, but potentially move them to getSCEVChecks/getMemRuntimeChecks?

Collaborator

@ayalz ayalz left a comment

This LGTM, thanks for accommodating!

Comment on lines 7518 to 7520
// Retrieve blocks with SCEV and memory runtime checks, if they have been
// connected to the CFG, otherwise they are unused and will be deleted. Their
// terminators and phis using them need adjusting below.
Collaborator

Suggested change
// Retrieve blocks with SCEV and memory runtime checks, if they have been
// connected to the CFG, otherwise they are unused and will be deleted. Their
// terminators and phis using them need adjusting below.
// Adjust the terminators of runtime check blocks and phis using them.

Contributor Author

Updated, thanks


void LoopVectorizationPlanner::attachRuntimeChecks(
VPlan &Plan, GeneratedRTChecks &RTChecks, bool HasBranchWeights) const {
SmallVector<std::pair<VPValue *, VPIRBasicBlock *>> Checks;
Collaborator

Suggested change
SmallVector<std::pair<VPValue *, VPIRBasicBlock *>> Checks;

Contributor Author

Removed thanks

/// flat CFG into a hierarchical CFG.
static void createLoopRegions(VPlan &Plan);

/// Wrap runtime check block \p CHeckBlock in a VPIRBB and \p Cond in a
Collaborator

Suggested change
/// Wrap runtime check block \p CHeckBlock in a VPIRBB and \p Cond in a
/// Wrap runtime check block \p CheckBlock in a VPIRBB and \p Cond in a

Contributor Author

Fixed, thanks

fhahn added a commit that referenced this pull request Jul 6, 2025
As suggested in #143879, remove
AddedAnyChecks member and directly check if there are any relevant
runtime check blocks.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 6, 2025
As suggested in llvm/llvm-project#143879, remove
AddedAnyChecks member and directly check if there are any relevant
runtime check blocks.
@fhahn fhahn merged commit 64686c5 into llvm:main Jul 9, 2025
9 checks passed
@fhahn fhahn deleted the vplan-connect-memcheck-scevcheck-early branch July 9, 2025 12:03
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 9, 2025
…nsform (NFC). (#143879)

Connect SCEV and memory runtime check block directly in VPlan as
VPIRBasicBlocks, removing ILV::emitSCEVChecks and
ILV::emitMemRuntimeChecks.

The new logic is currently split across
LoopVectorizationPlanner::addRuntimeChecks which collects a list of
{Condition, CheckBlock} pairs and performs some checks and emits remarks
if needed. The list of checks is then added to VPlan in
VPlanTransforms::connectCheckBlocks.

PR: llvm/llvm-project#143879