[NFC][VPlan] Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationPHIRecipe #177114
Conversation
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms

Author: Shih-Po Hung (arcbbb)

Changes

This patch introduces VPCumulativeIVPHIRecipe to track the cumulative count of processed elements across loop iterations. Unlike CanonicalIV, which always increments by VF*UF, CumulativeIV can step by a variable amount (e.g., EVL) per iteration. Key changes:
This also addresses the issue in #166164 (comment), which needs a cumulative element count in createScalarIVSteps. Patch is 579.98 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/177114.diff

79 Files Affected:
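The difference between the two counters can be sketched with a toy model (an illustration only; the function and variable names here are mine, not the LLVM API):

```python
# Toy model of the two header phis: the canonical IV always steps by
# VF*UF, while the cumulative IV steps by the number of elements
# actually processed that iteration (e.g. the EVL on the final trip).
def simulate(trip_count, vf_x_uf):
    canonical_iv = 0   # counts vector iterations in units of VF*UF
    cumulative_iv = 0  # counts elements processed so far
    steps = []
    while cumulative_iv < trip_count:
        evl = min(vf_x_uf, trip_count - cumulative_iv)  # variable step
        steps.append(evl)
        cumulative_iv += evl      # may be < VF*UF on the last iteration
        canonical_iv += vf_x_uf   # fixed step, may overshoot trip_count
    return canonical_iv, cumulative_iv, steps

print(simulate(10, 4))  # (12, 10, [4, 4, 2])
```

When the step is always exactly VF*UF (no tail folding by EVL), the two counters coincide, which is why the fixed-step cumulative IV can be folded back into the canonical IV.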
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 346b8a1f9e420..3da0c6f206ec1 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -4117,10 +4117,10 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
case VPDef::VPReplicateSC:
case VPDef::VPInstructionSC:
case VPDef::VPCanonicalIVPHISC:
+ case VPDef::VPCumulativeIVPHISC:
case VPDef::VPVectorPointerSC:
case VPDef::VPVectorEndPointerSC:
case VPDef::VPExpandSCEVSC:
- case VPDef::VPEVLBasedIVPHISC:
case VPDef::VPPredInstPHISC:
case VPDef::VPBranchOnMaskSC:
continue;
@@ -4632,8 +4632,9 @@ LoopVectorizationPlanner::selectInterleaveCount(VPlan &Plan, ElementCount VF,
!(CM.preferPredicatedLoop() && CM.useWideActiveLaneMask()))
return 1;
+ // TODO: Support interleave for loop with variable-length stepping.
if (any_of(Plan.getVectorLoopRegion()->getEntryBasicBlock()->phis(),
- IsaPred<VPEVLBasedIVPHIRecipe>)) {
+ IsaPred<VPCumulativeIVPHIRecipe>)) {
LLVM_DEBUG(dbgs() << "LV: Preference for VP intrinsics indicated. "
"Unroll factor forced to be 1.\n");
return 1;
@@ -7443,8 +7444,8 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
// Expand BranchOnTwoConds after dissolution, when latch has direct access to
// its successors.
VPlanTransforms::expandBranchOnTwoConds(BestVPlan);
- // Canonicalize EVL loops after regions are dissolved.
- VPlanTransforms::canonicalizeEVLLoops(BestVPlan);
+ VPlanTransforms::convertToVariableLengthStep(BestVPlan,
+ CM.foldTailByMasking());
VPlanTransforms::materializeBackedgeTakenCount(BestVPlan, VectorPH);
VPlanTransforms::materializeVectorTripCount(
BestVPlan, VectorPH, CM.foldTailByMasking(),
@@ -8371,6 +8372,10 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
*Plan, CM.getMaxSafeElements());
VPlanTransforms::runPass(VPlanTransforms::optimizeEVLMasks, *Plan);
}
+ // TODO: Place this before optimization after addExplicitVectorLength
+ // is placed close to addActiveLaneMask.
+ VPlanTransforms::runPass(VPlanTransforms::removeFixedStepCumulativeIV,
+ *Plan);
assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
VPlans.push_back(std::move(Plan));
}
@@ -8674,6 +8679,7 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlan(VFRange &Range) {
// failures.
DenseMap<VPValue *, VPValue *> IVEndValues;
VPlanTransforms::updateScalarResumePhis(*Plan, IVEndValues);
+ VPlanTransforms::removeFixedStepCumulativeIV(*Plan);
assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
return Plan;
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 329181df443db..13e42bde49925 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -541,7 +541,6 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPRecipeValue {
static inline bool classof(const VPRecipeBase *R) {
switch (R->getVPDefID()) {
case VPRecipeBase::VPDerivedIVSC:
- case VPRecipeBase::VPEVLBasedIVPHISC:
case VPRecipeBase::VPExpandSCEVSC:
case VPRecipeBase::VPExpressionSC:
case VPRecipeBase::VPInstructionSC:
@@ -560,6 +559,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPRecipeValue {
case VPRecipeBase::VPBlendSC:
case VPRecipeBase::VPPredInstPHISC:
case VPRecipeBase::VPCanonicalIVPHISC:
+ case VPRecipeBase::VPCumulativeIVPHISC:
case VPRecipeBase::VPActiveLaneMaskPHISC:
case VPRecipeBase::VPFirstOrderRecurrencePHISC:
case VPRecipeBase::VPWidenPHISC:
@@ -3669,28 +3669,32 @@ class VPActiveLaneMaskPHIRecipe : public VPHeaderPHIRecipe {
};
/// A recipe for generating the phi node for the current index of elements,
-/// adjusted in accordance with EVL value. It starts at the start value of the
-/// canonical induction and gets incremented by EVL in each iteration of the
-/// vector loop.
-class VPEVLBasedIVPHIRecipe : public VPHeaderPHIRecipe {
+/// may be adjusted by variable-length-stepping transform. It starts at the
+/// start value of the canonical induction and gets incremented by the number
+/// of elements processed in each iteration of the vector loop.
+/// When the step equals VFxUF, this can be replaced by
+/// VPCanonicalIVPHIRecipe.
+class VPCumulativeIVPHIRecipe : public VPHeaderPHIRecipe {
public:
- VPEVLBasedIVPHIRecipe(VPValue *StartIV, DebugLoc DL)
- : VPHeaderPHIRecipe(VPDef::VPEVLBasedIVPHISC, nullptr, StartIV, DL) {}
+ VPCumulativeIVPHIRecipe(VPValue *StartIV, DebugLoc DL)
+ : VPHeaderPHIRecipe(VPDef::VPCumulativeIVPHISC, nullptr, StartIV, DL) {}
- ~VPEVLBasedIVPHIRecipe() override = default;
+ ~VPCumulativeIVPHIRecipe() override = default;
- VPEVLBasedIVPHIRecipe *clone() override {
- llvm_unreachable("cloning not implemented yet");
+ VPCumulativeIVPHIRecipe *clone() override {
+ auto *R = new VPCumulativeIVPHIRecipe(getStartValue(), getDebugLoc());
+ R->addOperand(getBackedgeValue());
+ return R;
}
- VP_CLASSOF_IMPL(VPDef::VPEVLBasedIVPHISC)
+ VP_CLASSOF_IMPL(VPDef::VPCumulativeIVPHISC)
void execute(VPTransformState &State) override {
llvm_unreachable("cannot execute this recipe, should be replaced by a "
"scalar phi recipe");
}
- /// Return the cost of this VPEVLBasedIVPHIRecipe.
+ /// Return the cost of this VPCumulativeIVPHIRecipe.
InstructionCost computeCost(ElementCount VF,
VPCostContext &Ctx) const override {
// For now, match the behavior of the legacy cost model.
@@ -4295,6 +4299,13 @@ class LLVM_ABI_FOR_TEST VPRegionBlock : public VPBlockBase {
return const_cast<VPRegionBlock *>(this)->getCanonicalIV();
}
+ VPCumulativeIVPHIRecipe *getCumulativeIV() {
+ return cast<VPCumulativeIVPHIRecipe>(getCanonicalIV()->getNextNode());
+ }
+ const VPCumulativeIVPHIRecipe *getCumulativeIV() const {
+ return const_cast<VPRegionBlock *>(this)->getCumulativeIV();
+ }
+
/// Return the type of the canonical IV for loop regions.
Type *getCanonicalIVType() { return getCanonicalIV()->getScalarType(); }
const Type *getCanonicalIVType() const {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 994a4d8921480..25102ccec9c46 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -269,7 +269,7 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
TypeSwitch<const VPRecipeBase *, Type *>(V->getDefiningRecipe())
.Case<VPActiveLaneMaskPHIRecipe, VPCanonicalIVPHIRecipe,
VPFirstOrderRecurrencePHIRecipe, VPReductionPHIRecipe,
- VPWidenPointerInductionRecipe, VPEVLBasedIVPHIRecipe>(
+ VPWidenPointerInductionRecipe, VPCumulativeIVPHIRecipe>(
[this](const auto *R) {
// Handle header phi recipes, except VPWidenIntOrFpInduction
// which needs special handling due it being possibly truncated.
@@ -542,7 +542,7 @@ SmallVector<VPRegisterUsage, 8> llvm::calculateRegisterUsageForPlan(
if (VFs[J].isScalar() ||
isa<VPCanonicalIVPHIRecipe, VPReplicateRecipe, VPDerivedIVRecipe,
- VPEVLBasedIVPHIRecipe, VPScalarIVStepsRecipe>(VPV) ||
+ VPCumulativeIVPHIRecipe, VPScalarIVStepsRecipe>(VPV) ||
(isa<VPInstruction>(VPV) && vputils::onlyScalarValuesUsed(VPV)) ||
(isa<VPReductionPHIRecipe>(VPV) &&
(cast<VPReductionPHIRecipe>(VPV))->isInLoop())) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
index 96dd3aff80eb4..ef76c452798fc 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
@@ -484,6 +484,25 @@ static void addCanonicalIVRecipes(VPlan &Plan, VPBasicBlock *HeaderVPBB,
LatchDL);
}
+static void addCumulativeIVRecipes(VPlan &Plan, VPBasicBlock *HeaderVPBB,
+ VPBasicBlock *LatchVPBB, Type *IdxTy,
+ DebugLoc DL) {
+ auto *CanonicalIV = cast<VPCanonicalIVPHIRecipe>(&*HeaderVPBB->begin());
+ Value *StartIdx = ConstantInt::get(IdxTy, 0);
+ auto *StartV = Plan.getOrAddLiveIn(StartIdx);
+ // Add a CumulativeIV after CanonicalIV.
+ auto *CumulativeIVPHI = new VPCumulativeIVPHIRecipe(StartV, DL);
+ CumulativeIVPHI->insertAfter(CanonicalIV);
+
+ // Add the CumulativeIV increment. Initially steps by VFxUF.
+ VPBuilder Builder(LatchVPBB,
+ std::next(CanonicalIV->getBackedgeRecipe().getIterator()));
+ auto *CumulativeIVIncrement = Builder.createOverflowingOp(
+ Instruction::Add, {&Plan.getVFxUF(), CumulativeIVPHI}, {true, false}, DL,
+ "cumulative.iv.next");
+ CumulativeIVPHI->addOperand(CumulativeIVIncrement);
+}
+
/// Creates extracts for values in \p Plan defined in a loop region and used
/// outside a loop region.
static void createExtractsForLiveOuts(VPlan &Plan, VPBasicBlock *MiddleVPBB) {
@@ -567,6 +586,8 @@ static void addInitialSkeleton(VPlan &Plan, Type *InductionTy, DebugLoc IVDL,
{VectorPhiR, VectorPhiR->getOperand(0)}, VectorPhiR->getDebugLoc());
cast<VPIRPhi>(&ScalarPhiR)->addOperand(ResumePhiR);
}
+
+ addCumulativeIVRecipes(Plan, HeaderVPBB, LatchVPBB, InductionTy, IVDL);
}
/// Check \p Plan's live-in and replace them with constants, if they can be
@@ -694,7 +715,7 @@ void VPlanTransforms::createHeaderPhiRecipes(
};
for (VPRecipeBase &R : make_early_inc_range(HeaderVPBB->phis())) {
- if (isa<VPCanonicalIVPHIRecipe>(&R))
+ if (isa<VPCanonicalIVPHIRecipe, VPCumulativeIVPHIRecipe>(&R))
continue;
auto *PhiR = cast<VPPhi>(&R);
VPHeaderPHIRecipe *HeaderPhiR = CreateHeaderPhiRecipe(PhiR);
@@ -1161,7 +1182,8 @@ bool VPlanTransforms::handleMaxMinNumReductions(VPlan &Plan) {
MinMaxNumReductionsToHandle;
bool HasUnsupportedPhi = false;
for (auto &R : LoopRegion->getEntryBasicBlock()->phis()) {
- if (isa<VPCanonicalIVPHIRecipe, VPWidenIntOrFpInductionRecipe>(&R))
+ if (isa<VPCanonicalIVPHIRecipe, VPCumulativeIVPHIRecipe,
+ VPWidenIntOrFpInductionRecipe>(&R))
continue;
auto *Cur = dyn_cast<VPReductionPHIRecipe>(&R);
if (!Cur) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 11e4f930f1e85..8960733183fc6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -74,6 +74,7 @@ bool VPRecipeBase::mayWriteToMemory() const {
case VPWidenIntrinsicSC:
return cast<VPWidenIntrinsicRecipe>(this)->mayWriteToMemory();
case VPCanonicalIVPHISC:
+ case VPCumulativeIVPHISC:
case VPBranchOnMaskSC:
case VPDerivedIVSC:
case VPFirstOrderRecurrencePHISC:
@@ -4494,9 +4495,9 @@ void VPActiveLaneMaskPHIRecipe::printRecipe(raw_ostream &O, const Twine &Indent,
#endif
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
-void VPEVLBasedIVPHIRecipe::printRecipe(raw_ostream &O, const Twine &Indent,
- VPSlotTracker &SlotTracker) const {
- O << Indent << "EXPLICIT-VECTOR-LENGTH-BASED-IV-PHI ";
+void VPCumulativeIVPHIRecipe::printRecipe(raw_ostream &O, const Twine &Indent,
+ VPSlotTracker &SlotTracker) const {
+ O << Indent << "Cumulative-IV-PHI ";
printAsOperand(O, SlotTracker);
O << " = phi ";
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index bfef277070db7..d58fa8c6eaa73 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -2040,7 +2040,7 @@ static bool simplifyBranchConditionForVFAndUF(VPlan &Plan, ElementCount BestVF,
if (all_of(Header->phis(), [](VPRecipeBase &Phi) {
if (auto *R = dyn_cast<VPWidenIntOrFpInductionRecipe>(&Phi))
return R->isCanonical();
- return isa<VPCanonicalIVPHIRecipe, VPEVLBasedIVPHIRecipe,
+ return isa<VPCanonicalIVPHIRecipe, VPCumulativeIVPHIRecipe,
VPFirstOrderRecurrencePHIRecipe, VPPhi>(&Phi);
})) {
for (VPRecipeBase &HeaderR : make_early_inc_range(Header->phis())) {
@@ -3121,10 +3121,8 @@ static void fixupVFUsersForEVL(VPlan &Plan, VPValue &EVL) {
/// Converts a tail folded vector loop region to step by
/// VPInstruction::ExplicitVectorLength elements instead of VF elements each
/// iteration.
-///
-/// - Add a VPEVLBasedIVPHIRecipe and related recipes to \p Plan and
-/// replaces all uses except the canonical IV increment of
-/// VPCanonicalIVPHIRecipe with a VPEVLBasedIVPHIRecipe.
+/// This transformation:
+/// - Makes VPCumulativeIVPHIRecipe step by EVL instead of VFxUF.
/// VPCanonicalIVPHIRecipe is used only for loop iterations counting after
/// this transformation.
///
@@ -3134,6 +3132,8 @@ static void fixupVFUsersForEVL(VPlan &Plan, VPValue &EVL) {
/// previous iteration, and VPFirstOrderRecurrencePHIRecipes are replaced with
/// @llvm.vp.splice.
///
+/// - Switches the loop from up-counting to down-counting.
+///
/// The function uses the following definitions:
/// %StartV is the canonical induction start value.
///
@@ -3144,13 +3144,13 @@ static void fixupVFUsersForEVL(VPlan &Plan, VPValue &EVL) {
///
/// vector.body:
/// ...
-/// %EVLPhi = EXPLICIT-VECTOR-LENGTH-BASED-IV-PHI [ %StartV, %vector.ph ],
-/// [ %NextEVLIV, %vector.body ]
+/// %CumulativeIVPhi = Cumulative-IV-PHI [ %StartV, %vector.ph ],
+/// [ %NextIV, %vector.body ]
/// %AVL = phi [ trip-count, %vector.ph ], [ %NextAVL, %vector.body ]
/// %VPEVL = EXPLICIT-VECTOR-LENGTH %AVL
/// ...
/// %OpEVL = cast i32 %VPEVL to IVSize
-/// %NextEVLIV = add IVSize %OpEVL, %EVLPhi
+/// %NextIV = add IVSize %OpEVL, %CumulativeIVPhi
/// %NextAVL = sub IVSize nuw %AVL, %OpEVL
/// ...
///
@@ -3160,15 +3160,15 @@ static void fixupVFUsersForEVL(VPlan &Plan, VPValue &EVL) {
///
/// vector.body:
/// ...
-/// %EVLPhi = EXPLICIT-VECTOR-LENGTH-BASED-IV-PHI [ %StartV, %vector.ph ],
-/// [ %NextEVLIV, %vector.body ]
+/// %CumulativeIVPhi = Cumulative-IV-PHI [ %StartV, %vector.ph ],
+/// [ %NextIV, %vector.body ]
/// %AVL = phi [ trip-count, %vector.ph ], [ %NextAVL, %vector.body ]
/// %cmp = cmp ult %AVL, MaxSafeElements
/// %SAFE_AVL = select %cmp, %AVL, MaxSafeElements
/// %VPEVL = EXPLICIT-VECTOR-LENGTH %SAFE_AVL
/// ...
/// %OpEVL = cast i32 %VPEVL to IVSize
-/// %NextEVLIV = add IVSize %OpEVL, %EVLPhi
+/// %NextIV = add IVSize %OpEVL, %CumulativeIVPhi
/// %NextAVL = sub IVSize nuw %AVL, %OpEVL
/// ...
///
@@ -3181,11 +3181,9 @@ void VPlanTransforms::addExplicitVectorLength(
auto *CanonicalIVPHI = LoopRegion->getCanonicalIV();
auto *CanIVTy = LoopRegion->getCanonicalIVType();
- VPValue *StartV = CanonicalIVPHI->getStartValue();
- // Create the ExplicitVectorLengthPhi recipe in the main loop.
- auto *EVLPhi = new VPEVLBasedIVPHIRecipe(StartV, DebugLoc::getUnknown());
- EVLPhi->insertAfter(CanonicalIVPHI);
+ VPCumulativeIVPHIRecipe *CumulativeIVPhi = LoopRegion->getCumulativeIV();
+ VPRecipeBase *CumulativeIVInc = &CumulativeIVPhi->getBackedgeRecipe();
VPBuilder Builder(Header, Header->getFirstNonPhi());
// Create the AVL (application vector length), starting from TC -> 0 in steps
// of EVL.
@@ -3205,19 +3203,17 @@ void VPlanTransforms::addExplicitVectorLength(
auto *CanonicalIVIncrement =
cast<VPInstruction>(CanonicalIVPHI->getBackedgeValue());
- Builder.setInsertPoint(CanonicalIVIncrement);
+ Builder.setInsertPoint(CumulativeIVInc);
VPValue *OpVPEVL = VPEVL;
auto *I32Ty = Type::getInt32Ty(Plan.getContext());
- OpVPEVL = Builder.createScalarZExtOrTrunc(
- OpVPEVL, CanIVTy, I32Ty, CanonicalIVIncrement->getDebugLoc());
+ OpVPEVL = Builder.createScalarZExtOrTrunc(OpVPEVL, CanIVTy, I32Ty,
+ CumulativeIVInc->getDebugLoc());
+
+ CumulativeIVInc->setOperand(0, OpVPEVL);
- auto *NextEVLIV = Builder.createOverflowingOp(
- Instruction::Add, {OpVPEVL, EVLPhi},
- {CanonicalIVIncrement->hasNoUnsignedWrap(),
- CanonicalIVIncrement->hasNoSignedWrap()},
- CanonicalIVIncrement->getDebugLoc(), "index.evl.next");
- EVLPhi->addOperand(NextEVLIV);
+ Builder.setInsertPoint(CumulativeIVInc->getParent(),
+ std::next(CumulativeIVInc->getIterator()));
VPValue *NextAVL = Builder.createOverflowingOp(
Instruction::Sub, {AVLPhi, OpVPEVL}, {/*hasNUW=*/true, /*hasNSW=*/false},
@@ -3228,89 +3224,135 @@ void VPlanTransforms::addExplicitVectorLength(
removeDeadRecipes(Plan);
// Replace all uses of VPCanonicalIVPHIRecipe by
- // VPEVLBasedIVPHIRecipe except for the canonical IV increment.
- CanonicalIVPHI->replaceAllUsesWith(EVLPhi);
+ // VPCumulativeIVPHIRecipe except for the canonical IV increment.
+ CanonicalIVPHI->replaceAllUsesWith(CumulativeIVPhi);
CanonicalIVIncrement->setOperand(0, CanonicalIVPHI);
+
// TODO: support unroll factor > 1.
Plan.setUF(1);
+
+ // Switch the loop from up-counting to down counting.
+ // convert (branch-on-count (CanonicalInc, VTC)
+ // -> (branch-on-count (sub VTC, CanonicalIVInc), 0)
+ VPBasicBlock *LatchVPBB = LoopRegion->getExitingBasicBlock();
+ auto *LatchExitingBranch = cast<VPInstruction>(LatchVPBB->getTerminator());
+ if (match(LatchExitingBranch, m_BranchOnCond(m_True())))
+ return;
+ assert(match(LatchExitingBranch,
+ m_BranchOnCount(m_Specific(CanonicalIVIncrement),
+ m_Specific(&Plan.getVectorTripCount()))) &&
+ "Unexpected terminator");
+ Builder.setInsertPoint(LatchExitingBranch);
+ VPValue *RemainElementCount = Builder.createOverflowingOp(
+ Instruction::Sub, {&Plan.getVectorTripCount(), CanonicalIVIncrement},
+ {/*hasNUW=*/true, /*hasNSW=*/false}, DebugLoc::getCompilerGenerated(),
+ "remain.element.count");
+ auto *Zero = Plan.getOrAddLiveIn(ConstantInt::get(CanIVTy, 0));
+ LatchExitingBranch->setOperand(0, RemainElementCount);
+ LatchExitingBranch->setOperand(1, Zero);
+}
+
+void VPlanTransforms::removeFixedStepCumulativeIV(VPlan &Plan) {
+ VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
+ auto *CanonicalIV = LoopRegion->getCanonicalIV();
+ VPCumulativeIVPHIRecipe *CumulativeIVPhi = LoopRegion->getCumulativeIV();
+ VPRecipeBase *CumulativeIVInc = &CumulativeIVPhi->getBackedgeRecipe();
+ // Replace all uses with CanonicalIV if it steps by VF*UF.
+ if (match(CumulativeIVInc,
+ m_Binary<Instruction::Add>(m_Specific(&Plan.getVFxUF()),
+ m_Specific(CumulativeIVPhi)))) {
+ CumulativeIVPhi->replaceAllUsesWith(CanonicalIV);
+ CumulativeIVPhi->eraseFromParent();
+ CumulativeIVInc->eraseFromParent();
+ }
}
-void VPlanTransforms::canonicalizeEVLLoops(VPlan &Plan) {
- // Find EVL loop entries by locating VPEVLBasedIVPHIRecipe.
- // There should be only one EVL PHI in the entire plan.
- VPEVLBasedIVPHIRecipe *EVLPhi = nullptr;
+void VPlanTransforms::convertToVariableLengthStep(VPlan &Plan,
+ bool TailByMasking) {
+ VPCumulativeIVPHIRecipe *CumulativeIVPhi = nullptr;
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
vp_depth_first_shallow(Plan.getEntry())))
for (VPRecipeBase &R : VPBB->phis())
- if (auto *PhiR = dyn_cast<VPEVLBasedIVPHIRecipe>(&R)) {
- assert(!EVLPhi && "Found multiple EVL PHIs. Only one expected");
- EVLPhi = PhiR;
+ if (auto *PhiR = dyn_cast<VPCumulativeIVPHIRecipe>(&R)) {
+ assert(!CumulativeIVPhi &&
+ "Found multiple CumulativeIV. Only one expected");
+ CumulativeI...
[truncated]
@@ -567,6 +586,8 @@ static void addInitialSkeleton(VPlan &Plan, Type *InductionTy, DebugLoc IVDL,
         {VectorPhiR, VectorPhiR->getOperand(0)}, VectorPhiR->getDebugLoc());
     cast<VPIRPhi>(&ScalarPhiR)->addOperand(ResumePhiR);
   }
+
+  addCumulativeIVRecipes(Plan, HeaderVPBB, LatchVPBB, InductionTy, IVDL);
Is there a particular reason why we always add the cumulative IV recipe for every VPlan? I would have thought we would only need to add it in the cases where we need to convert to a variably stepping loop region.
It is inspired by #166164; transforms after addExplicitVectorLength are expected to use CumulativeIV instead of CanonicalIV. For instance, the diff in createScalarIVSteps:

VPHeaderPHIRecipe *IV = LoopRegion->getCanonicalIV();
if (auto *EVLIV =
        dyn_cast<VPEVLBasedIVPHIRecipe>(std::next(IV->getIterator())))
  IV = EVLIV;

The transform can simply use getCumulativeIV() when it cares about the processed element count.
To support getCumulativeIV(), I chose to create it by default during VPlan construction and remove it later if unused.
An alternative approach would be to only create it after variable-length transforms (e.g., addExplicitVectorLength), and have getCumulativeIV() fall back to CanonicalIV when CumulativeIV doesn't exist.
Would that be preferable?
An alternative approach would be to only create it after variable-length transforms (e.g., addExplicitVectorLength), and have getCumulativeIV() fall back to CanonicalIV when CumulativeIV doesn't exist.
Would that be preferable?
Yes, I think that's what I had in mind. That way we don't change any VPlans that don't care about CumulativeIV, and we don't need removeFixedStepCumulativeIV.
What we could also do in a prior NFC is add VPRegionBlock::getCumulativeIV, returning just getCanonicalIV() for now, and go through every caller of getCanonicalIV() to check whether it should be moved to getCumulativeIV().
Then in this PR you can make getCumulativeIV() return the VPCumulativeIVPHIRecipe when it exists.
      "remain.element.count");
  auto *Zero = Plan.getOrAddLiveIn(ConstantInt::get(CanIVTy, 0));
  LatchExitingBranch->setOperand(0, RemainElementCount);
  LatchExitingBranch->setOperand(1, Zero);
How come we're now converting the branch condition earlier in the EVL specific transform? I think fault-only-first loads will want this format as well so I would have thought we would want to keep it shared in VPlanTransforms::convertToVariableLengthStep
I made it a multi-step transform because AVL {TC,-,Step} may be optional.
(branch-on-count CanonicalIVInc, VTC)
-> (branch-on-count (sub VectorTripCount, CanonicalIVInc), 0)
-> (branch-on-count (sub TripCount, (add Step, CumulativeIV)), 0)
-> (branch-on-count (sub AVL, Step), 0)
For a mask-based loop with llvm.masked.load.ff, I'm not sure if we need an AVL to generate the mask.
For example:
%mask = call <VF x i1> @llvm.get.active.lane.mask(i32 %cumulative.iv, i32 %TC)
%ret = call {<VF x ty>, <VF x i1>} @llvm.masked.load.ff (ptr %ptr, <VF x i1> %mask)
%ret.mask = extractvalue {<VF x ty>, <VF x i1>} %ret, 1
...
%count = call @llvm.cttz.elts (<VF x i1> %ret.mask, i1 true)
%cumulative.iv.next = add %count, %cumulative.iv
Hence I moved the down-counting decision into addExplicitVectorLength.
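To see why the first rewrite step preserves the exit behavior, here is a small sketch (plain Python integers; it assumes the vector trip count is a multiple of the step, as it is for the fixed-step branch-on-count being rewritten):

```python
# Up-counting exit: branch-on-count(CanonicalIVInc, VTC)
def up_counting_iters(vtc, step):
    iv, iters = 0, 0
    while True:
        iv += step            # CanonicalIVInc
        iters += 1
        if iv == vtc:         # exit when the incremented IV reaches VTC
            return iters

# Down-counting exit: branch-on-count(sub(VTC, CanonicalIVInc), 0)
def down_counting_iters(vtc, step):
    iv, iters = 0, 0
    while True:
        iv += step
        iters += 1
        remaining = vtc - iv  # remain.element.count
        if remaining == 0:    # exit when no elements remain
            return iters

print(up_counting_iters(12, 4), down_counting_iters(12, 4))  # 3 3
```

Both forms take the backedge the same number of times; the down-counting form is just one subtraction away, which is what makes the later AVL-based steps possible.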
I see what you mean about llvm.masked.load.ff, but I think all the different transforms needed for branch-on-count is kind of complicated.
If we want to do the "downward counting" transform only for EVL tail-folded loops and not for every variably stepped loop, I think it would be easier to just split it out from canonicalizeEVLLoops. I've opened #178181 for this; after that, this PR shouldn't need to worry about the exit condition.
I've landed #178181 so that the EVL exit condition transform is split out. I've also relaxed the assertion that the VPEVLBasedIVPHIRecipe should have an ExplicitVectorLength in its backedge value, so I think you can probably get away with just renaming VPEVLBasedIVPHIRecipe in it.
Thanks for #178181! That makes this an NFC now.
By the way, as a topic for bikeshedding: I think we might need to find a term other than "cumulative IV", since IIUC an induction variable by definition has to increment/decrement by a fixed amount each time.
fhahn left a comment
It would be good if you could add a bit more detail on why this is needed and what the benefit is vs. the EVL-based IV. The description mostly describes how things were moved around, but it would be good to clarify why this is needed.
I don't think this recipe should be added to all plans unless there's a clear benefit in all cases. In terms of terminology, IV refers to induction, but induction variables in LLVM terminology step by a loop-invariant amount, which the new recipe would not do in some cases.
Thanks! I've updated the description to clarify the motivation. Please take another look!
This is split out from llvm#177114. In order to make canonicalizeEVLLoops a generic "convert to variable stepping" transform, move the code that changes the exit condition to a separate transform. Run it before canonicalizeEVLLoops, before VPEVLBasedIVPHIRecipe is expanded. Also relax the assertion for VPInstruction::ExplicitVectorLength to just bail instead, since eventually VPEVLBasedIVPHIRecipe will be used by other loops that aren't EVL tail folded.
  llvm_unreachable("cloning not implemented yet");
  VPNumProcessedElementsPHIRecipe *clone() override {
    auto *R =
        new VPNumProcessedElementsPHIRecipe(getStartValue(), getDebugLoc());
Just curious, does the phi get cloned when unrolling with #151300?
…NFC (#178181)

This is split out from #177114. In order to make canonicalizeEVLLoops a generic "convert to variable stepping" transform, move the code that changes the exit condition to a separate transform, since not all variable-stepping loops will want to transform the exit condition. Run it before canonicalizeEVLLoops, before VPEVLBasedIVPHIRecipe is expanded. Also relax the assertion for VPInstruction::ExplicitVectorLength to just bail instead, since eventually VPEVLBasedIVPHIRecipe will be used by other loops that aren't EVL tail folded.
I think VPNumProcessedElements is a bit non-standard, and it's not clear to me what constitutes an element being processed. E.g. if the original scalar loop has two loads and two stores, is each load/store pair one element processed? How about something like CurrentTripCount? That way it can be defined in terms of the original scalar loop.
This can be confusing as well; one might think of the number of iterations remaining (maybe?). How about
I see what you mean; I don't think the term "processed" is used much in the loop vectorizer currently, though. How about CurrentIteration? That would match how VPlan::TripCount/VPlan::BackedgeTakenCount implicitly refer to the scalar loop. The vector equivalent would be VPlan::VectorCurrentIteration.
This is groundwork for llvm#151300, which aims to support first-faulting loads in non-tail-folded early-exit loops. Per llvm#175900, we need a variable-length stepping transform that can be shared between EVL and non-EVL loops. The idea is to have an EVL-independent counter and transform for tracking the cumulative number of processed elements. This patch renames the existing counter (VPEVLBasedIVPHIRecipe) and transform (canonicalizeEVLLoops) to be EVL-independent:
- Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationRecipe to reflect its general purpose of tracking the processed element count.
- Rename canonicalizeEVLLoops to convertToVariableLengthStep.
I'm renaming the title again, apologies for the confusion.
lukel97 left a comment
LGTM, just left some nits. Can you also update the PR title to mention it's NFC?
fhahn left a comment
It would probably be good to update the title/description to clarify that this renames the EVL-based PHI recipe to a more general name. My reading of the current title implies new functionality.
1. Rename VPCurrentIterationRecipe to VPCurrentIterationPHIRecipe.
2. Rename VPCurrentIterationSC to VPCurrentIterationPHISC.
3. Rephrase "current index of elements".
4. Update the assertion string in the verifier.
@fhahn is there anything blocking this, or any changes you'd like me to make?
  // The EVL IV is always immediately after the canonical IV.
  auto *EVLPhi =
      dyn_cast_or_null<VPEVLBasedIVPHIRecipe>(std::next(CanIV->getIterator()));
  auto *EVLPhi = dyn_cast_or_null<VPCurrentIterationPHIRecipe>(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably also needs updating; whether it is EVL-based will be determined later, by checking the increment, I think.
Right after this, we check that the increment is EVL and bail out if it's not.
fhahn left a comment
I think there's also llvm/test/Transforms/LoopVectorize/vplan-force-tail-with-evl.ll, which has ; NO-VP-NOT: EXPLICIT-VECTOR-LENGTH-BASED-IV-PHI
This is groundwork for #151300, which aims to support first-faulting loads in non-tail-folded early-exit loops.

Per #175900, we need a variable-length stepping transform that can be shared between EVL and non-EVL loops. The idea is to have an EVL-independent counter and transform for tracking the cumulative number of processed elements.

This patch renames the existing counter (VPEVLBasedIVPHIRecipe) and transform (canonicalizeEVLLoops) to be EVL-independent:
- Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationPHIRecipe to reflect its general purpose of tracking the processed element count.
- Rename canonicalizeEVLLoops to convertToVariableLengthStep.

This is NFC.