Contributor

@arcbbb arcbbb commented Jan 21, 2026

This is groundwork for #151300, which aims to support first-faulting
loads in non-tail-folded early-exit loops.
Per #175900, we need a variable-length stepping transform that can be
shared between EVL and non-EVL loops.
The idea is to have an EVL-independent counter and transform for
tracking the cumulative number of processed elements.

This patch renames the existing counter (VPEVLBasedIVPHIRecipe) and
transform (canonicalizeEVLLoops) to be EVL-independent:

  • Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationRecipe to
    reflect its general purpose of tracking processed element count.
  • Rename canonicalizeEVLLoops to convertToVariableLengthStep.

This is NFC.

Member

llvmbot commented Jan 21, 2026

@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-transforms

Author: Shih-Po Hung (arcbbb)

Changes

This patch introduces VPCumulativeIVPHIRecipe to track the cumulative count of processed elements across loop iterations. Unlike CanonicalIV, which always increments by VF*UF, CumulativeIV can step by a variable amount (e.g., EVL) per iteration.
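To make the difference between the two counters concrete, here is a minimal scalar sketch rather than VPlan code. The names `IterationState` and `simulateIVs` are hypothetical, and the step rule `EVL = min(remaining, VF*UF)` is an assumption chosen for illustration; a real target may return other legal EVL values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One record per point in the toy loop (hypothetical type, not part of VPlan).
struct IterationState {
  uint64_t CanonicalIV;  // Always advances by VF * UF.
  uint64_t CumulativeIV; // Advances by the number of elements processed.
};

// Records both counters at the start of every iteration, plus their final
// values after the last iteration.
// Assumption: EVL = min(remaining elements, VF * UF).
std::vector<IterationState> simulateIVs(uint64_t TripCount, uint64_t VFxUF) {
  assert(VFxUF > 0 && "step must be non-zero");
  std::vector<IterationState> Trace;
  uint64_t CanonicalIV = 0, CumulativeIV = 0;
  while (CumulativeIV < TripCount) {
    Trace.push_back({CanonicalIV, CumulativeIV});
    uint64_t EVL = std::min(TripCount - CumulativeIV, VFxUF);
    CanonicalIV += VFxUF; // Fixed step.
    CumulativeIV += EVL;  // Variable step.
  }
  Trace.push_back({CanonicalIV, CumulativeIV});
  return Trace;
}
```

With a trip count of 10 and VF*UF of 4, the two counters agree until the last iteration, where the canonical counter overshoots to 12 while the cumulative counter stops at exactly 10.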

Key changes:

  • Rename VPEVLBasedIVPHIRecipe to VPCumulativeIVPHIRecipe and create it during initial VPlan construction (in addCumulativeIVRecipes).
  • Initially, CumulativeIV steps by VF*UF, same as CanonicalIV.
  • In addExplicitVectorLength, modify CumulativeIV to step by EVL and switch the loop from up-counting to down-counting.
  • Add removeFixedStepCumulativeIV to eliminate redundant CumulativeIV when it steps by VF*UF (i.e., equivalent to CanonicalIV).
  • In convertToVariableLengthStep (formerly canonicalizeEVLLoops), replace CanonicalIV with CumulativeIV and lower to concrete recipes.

This also addresses the issue in #166164 (comment) which needs a cumulative element count in createScalarIVSteps.


Patch is 579.98 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/177114.diff

79 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+10-4)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+23-12)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+2-2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp (+24-2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+4-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+130-88)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+14-16)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+1-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp (+3-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/bf16.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/blocks-with-dead-instructions.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/dead-ops-cost.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/defaults.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/evl-compatible-loops.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/f16.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/first-order-recurrence-scalable-vf1.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/fminimumnum.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/induction-costs.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll (+14-14)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-masked-access.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/ordered-reduction.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/partial-reduce-dot-product.ll (+16-16)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/predicated-costs.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/reductions.ll (+36-36)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-prune-vf.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll (+16-16)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/scalable-basics.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/scalable-tailfold.ll (+18-18)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll (+17-17)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-bin-unary-ops-args.ll (+72-72)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-call-intrinsics.ll (+40-40)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-cast-intrinsics.ll (+43-43)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-complex-mask.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-cond-reduction.ll (+24-24)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-div.ll (+36-36)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-fixed-order-recurrence.ll (+17-17)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-inloop-reduction.ll (+59-59)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-interleave.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-intermediate-store.ll (+7-7)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-iv32.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-known-no-overflow.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-masked-loadstore.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-ordered-reduction.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reduction.ll (+43-43)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reverse-load-store.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-safe-dep-distance.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-uniform-store.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/transform-narrow-interleave-to-widen-memory.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-cost.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+32-32)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vplan-riscv-vector-reverse.ll (+5-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics-fixed-order-recurrence.ll (+7-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics-reduction.ll (+11-9)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics.ll (+5-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/drop-inbounds-flags-for-reverse-vector-pointer.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/fmax-without-fast-math-flags-interleave.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/iv-select-cmp-decreasing.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/pr51614-fold-tail-by-masking.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-min-max.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll (+13-13)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-uf4.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+38-38)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-minmax-users-and-predicated.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-order.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-predselect.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/remarks-reduction-inloop.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/select-reduction.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/store-reduction-results-in-tail-folded-loop.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/strict-fadd-interleave-only.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll (+19-19)
  • (modified) llvm/unittests/Transforms/Vectorize/VPlanHCFGTest.cpp (+16-8)
  • (modified) llvm/unittests/Transforms/Vectorize/VPlanSlpTest.cpp (+22-22)
  • (modified) llvm/unittests/Transforms/Vectorize/VPlanTest.cpp (+1-1)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 346b8a1f9e420..3da0c6f206ec1 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -4117,10 +4117,10 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
       case VPDef::VPReplicateSC:
       case VPDef::VPInstructionSC:
       case VPDef::VPCanonicalIVPHISC:
+      case VPDef::VPCumulativeIVPHISC:
       case VPDef::VPVectorPointerSC:
       case VPDef::VPVectorEndPointerSC:
       case VPDef::VPExpandSCEVSC:
-      case VPDef::VPEVLBasedIVPHISC:
       case VPDef::VPPredInstPHISC:
       case VPDef::VPBranchOnMaskSC:
         continue;
@@ -4632,8 +4632,9 @@ LoopVectorizationPlanner::selectInterleaveCount(VPlan &Plan, ElementCount VF,
       !(CM.preferPredicatedLoop() && CM.useWideActiveLaneMask()))
     return 1;
 
+  // TODO: Support interleave for loop with variable-length stepping.
   if (any_of(Plan.getVectorLoopRegion()->getEntryBasicBlock()->phis(),
-             IsaPred<VPEVLBasedIVPHIRecipe>)) {
+             IsaPred<VPCumulativeIVPHIRecipe>)) {
     LLVM_DEBUG(dbgs() << "LV: Preference for VP intrinsics indicated. "
                          "Unroll factor forced to be 1.\n");
     return 1;
@@ -7443,8 +7444,8 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   // Expand BranchOnTwoConds after dissolution, when latch has direct access to
   // its successors.
   VPlanTransforms::expandBranchOnTwoConds(BestVPlan);
-  // Canonicalize EVL loops after regions are dissolved.
-  VPlanTransforms::canonicalizeEVLLoops(BestVPlan);
+  VPlanTransforms::convertToVariableLengthStep(BestVPlan,
+                                               CM.foldTailByMasking());
   VPlanTransforms::materializeBackedgeTakenCount(BestVPlan, VectorPH);
   VPlanTransforms::materializeVectorTripCount(
       BestVPlan, VectorPH, CM.foldTailByMasking(),
@@ -8371,6 +8372,10 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
                                  *Plan, CM.getMaxSafeElements());
         VPlanTransforms::runPass(VPlanTransforms::optimizeEVLMasks, *Plan);
       }
+      // TODO: Place this before optimization after addExplicitVectorLength
+      // is placed close to addActiveLaneMask.
+      VPlanTransforms::runPass(VPlanTransforms::removeFixedStepCumulativeIV,
+                               *Plan);
       assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
       VPlans.push_back(std::move(Plan));
     }
@@ -8674,6 +8679,7 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlan(VFRange &Range) {
   // failures.
   DenseMap<VPValue *, VPValue *> IVEndValues;
   VPlanTransforms::updateScalarResumePhis(*Plan, IVEndValues);
+  VPlanTransforms::removeFixedStepCumulativeIV(*Plan);
 
   assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
   return Plan;
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 329181df443db..13e42bde49925 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -541,7 +541,6 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPRecipeValue {
   static inline bool classof(const VPRecipeBase *R) {
     switch (R->getVPDefID()) {
     case VPRecipeBase::VPDerivedIVSC:
-    case VPRecipeBase::VPEVLBasedIVPHISC:
     case VPRecipeBase::VPExpandSCEVSC:
     case VPRecipeBase::VPExpressionSC:
     case VPRecipeBase::VPInstructionSC:
@@ -560,6 +559,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPRecipeValue {
     case VPRecipeBase::VPBlendSC:
     case VPRecipeBase::VPPredInstPHISC:
     case VPRecipeBase::VPCanonicalIVPHISC:
+    case VPRecipeBase::VPCumulativeIVPHISC:
     case VPRecipeBase::VPActiveLaneMaskPHISC:
     case VPRecipeBase::VPFirstOrderRecurrencePHISC:
     case VPRecipeBase::VPWidenPHISC:
@@ -3669,28 +3669,32 @@ class VPActiveLaneMaskPHIRecipe : public VPHeaderPHIRecipe {
 };
 
 /// A recipe for generating the phi node for the current index of elements,
-/// adjusted in accordance with EVL value. It starts at the start value of the
-/// canonical induction and gets incremented by EVL in each iteration of the
-/// vector loop.
-class VPEVLBasedIVPHIRecipe : public VPHeaderPHIRecipe {
+/// may be adjusted by variable-length-stepping transform. It starts at the
+/// start value of the canonical induction and gets incremented by the number
+/// of elements processed in each iteration of the vector loop.
+/// When the step equals VFxUF, this can be replaced by
+/// VPCanonicalIVPHIRecipe.
+class VPCumulativeIVPHIRecipe : public VPHeaderPHIRecipe {
 public:
-  VPEVLBasedIVPHIRecipe(VPValue *StartIV, DebugLoc DL)
-      : VPHeaderPHIRecipe(VPDef::VPEVLBasedIVPHISC, nullptr, StartIV, DL) {}
+  VPCumulativeIVPHIRecipe(VPValue *StartIV, DebugLoc DL)
+      : VPHeaderPHIRecipe(VPDef::VPCumulativeIVPHISC, nullptr, StartIV, DL) {}
 
-  ~VPEVLBasedIVPHIRecipe() override = default;
+  ~VPCumulativeIVPHIRecipe() override = default;
 
-  VPEVLBasedIVPHIRecipe *clone() override {
-    llvm_unreachable("cloning not implemented yet");
+  VPCumulativeIVPHIRecipe *clone() override {
+    auto *R = new VPCumulativeIVPHIRecipe(getStartValue(), getDebugLoc());
+    R->addOperand(getBackedgeValue());
+    return R;
   }
 
-  VP_CLASSOF_IMPL(VPDef::VPEVLBasedIVPHISC)
+  VP_CLASSOF_IMPL(VPDef::VPCumulativeIVPHISC)
 
   void execute(VPTransformState &State) override {
     llvm_unreachable("cannot execute this recipe, should be replaced by a "
                      "scalar phi recipe");
   }
 
-  /// Return the cost of this VPEVLBasedIVPHIRecipe.
+  /// Return the cost of this VPCumulativeIVPHIRecipe.
   InstructionCost computeCost(ElementCount VF,
                               VPCostContext &Ctx) const override {
     // For now, match the behavior of the legacy cost model.
@@ -4295,6 +4299,13 @@ class LLVM_ABI_FOR_TEST VPRegionBlock : public VPBlockBase {
     return const_cast<VPRegionBlock *>(this)->getCanonicalIV();
   }
 
+  VPCumulativeIVPHIRecipe *getCumulativeIV() {
+    return cast<VPCumulativeIVPHIRecipe>(getCanonicalIV()->getNextNode());
+  }
+  const VPCumulativeIVPHIRecipe *getCumulativeIV() const {
+    return const_cast<VPRegionBlock *>(this)->getCumulativeIV();
+  }
+
   /// Return the type of the canonical IV for loop regions.
   Type *getCanonicalIVType() { return getCanonicalIV()->getScalarType(); }
   const Type *getCanonicalIVType() const {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 994a4d8921480..25102ccec9c46 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -269,7 +269,7 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
       TypeSwitch<const VPRecipeBase *, Type *>(V->getDefiningRecipe())
           .Case<VPActiveLaneMaskPHIRecipe, VPCanonicalIVPHIRecipe,
                 VPFirstOrderRecurrencePHIRecipe, VPReductionPHIRecipe,
-                VPWidenPointerInductionRecipe, VPEVLBasedIVPHIRecipe>(
+                VPWidenPointerInductionRecipe, VPCumulativeIVPHIRecipe>(
               [this](const auto *R) {
                 // Handle header phi recipes, except VPWidenIntOrFpInduction
                 // which needs special handling due it being possibly truncated.
@@ -542,7 +542,7 @@ SmallVector<VPRegisterUsage, 8> llvm::calculateRegisterUsageForPlan(
 
         if (VFs[J].isScalar() ||
             isa<VPCanonicalIVPHIRecipe, VPReplicateRecipe, VPDerivedIVRecipe,
-                VPEVLBasedIVPHIRecipe, VPScalarIVStepsRecipe>(VPV) ||
+                VPCumulativeIVPHIRecipe, VPScalarIVStepsRecipe>(VPV) ||
             (isa<VPInstruction>(VPV) && vputils::onlyScalarValuesUsed(VPV)) ||
             (isa<VPReductionPHIRecipe>(VPV) &&
              (cast<VPReductionPHIRecipe>(VPV))->isInLoop())) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
index 96dd3aff80eb4..ef76c452798fc 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
@@ -484,6 +484,25 @@ static void addCanonicalIVRecipes(VPlan &Plan, VPBasicBlock *HeaderVPBB,
                        LatchDL);
 }
 
+static void addCumulativeIVRecipes(VPlan &Plan, VPBasicBlock *HeaderVPBB,
+                                   VPBasicBlock *LatchVPBB, Type *IdxTy,
+                                   DebugLoc DL) {
+  auto *CanonicalIV = cast<VPCanonicalIVPHIRecipe>(&*HeaderVPBB->begin());
+  Value *StartIdx = ConstantInt::get(IdxTy, 0);
+  auto *StartV = Plan.getOrAddLiveIn(StartIdx);
+  // Add a CumulativeIV after CanonicalIV.
+  auto *CumulativeIVPHI = new VPCumulativeIVPHIRecipe(StartV, DL);
+  CumulativeIVPHI->insertAfter(CanonicalIV);
+
+  // Add the CumulativeIV increment. Initially steps by VFxUF.
+  VPBuilder Builder(LatchVPBB,
+                    std::next(CanonicalIV->getBackedgeRecipe().getIterator()));
+  auto *CumulativeIVIncrement = Builder.createOverflowingOp(
+      Instruction::Add, {&Plan.getVFxUF(), CumulativeIVPHI}, {true, false}, DL,
+      "cumulative.iv.next");
+  CumulativeIVPHI->addOperand(CumulativeIVIncrement);
+}
+
 /// Creates extracts for values in \p Plan defined in a loop region and used
 /// outside a loop region.
 static void createExtractsForLiveOuts(VPlan &Plan, VPBasicBlock *MiddleVPBB) {
@@ -567,6 +586,8 @@ static void addInitialSkeleton(VPlan &Plan, Type *InductionTy, DebugLoc IVDL,
         {VectorPhiR, VectorPhiR->getOperand(0)}, VectorPhiR->getDebugLoc());
     cast<VPIRPhi>(&ScalarPhiR)->addOperand(ResumePhiR);
   }
+
+  addCumulativeIVRecipes(Plan, HeaderVPBB, LatchVPBB, InductionTy, IVDL);
 }
 
 /// Check \p Plan's live-in and replace them with constants, if they can be
@@ -694,7 +715,7 @@ void VPlanTransforms::createHeaderPhiRecipes(
   };
 
   for (VPRecipeBase &R : make_early_inc_range(HeaderVPBB->phis())) {
-    if (isa<VPCanonicalIVPHIRecipe>(&R))
+    if (isa<VPCanonicalIVPHIRecipe, VPCumulativeIVPHIRecipe>(&R))
       continue;
     auto *PhiR = cast<VPPhi>(&R);
     VPHeaderPHIRecipe *HeaderPhiR = CreateHeaderPhiRecipe(PhiR);
@@ -1161,7 +1182,8 @@ bool VPlanTransforms::handleMaxMinNumReductions(VPlan &Plan) {
       MinMaxNumReductionsToHandle;
   bool HasUnsupportedPhi = false;
   for (auto &R : LoopRegion->getEntryBasicBlock()->phis()) {
-    if (isa<VPCanonicalIVPHIRecipe, VPWidenIntOrFpInductionRecipe>(&R))
+    if (isa<VPCanonicalIVPHIRecipe, VPCumulativeIVPHIRecipe,
+            VPWidenIntOrFpInductionRecipe>(&R))
       continue;
     auto *Cur = dyn_cast<VPReductionPHIRecipe>(&R);
     if (!Cur) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 11e4f930f1e85..8960733183fc6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -74,6 +74,7 @@ bool VPRecipeBase::mayWriteToMemory() const {
   case VPWidenIntrinsicSC:
     return cast<VPWidenIntrinsicRecipe>(this)->mayWriteToMemory();
   case VPCanonicalIVPHISC:
+  case VPCumulativeIVPHISC:
   case VPBranchOnMaskSC:
   case VPDerivedIVSC:
   case VPFirstOrderRecurrencePHISC:
@@ -4494,9 +4495,9 @@ void VPActiveLaneMaskPHIRecipe::printRecipe(raw_ostream &O, const Twine &Indent,
 #endif
 
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
-void VPEVLBasedIVPHIRecipe::printRecipe(raw_ostream &O, const Twine &Indent,
-                                        VPSlotTracker &SlotTracker) const {
-  O << Indent << "EXPLICIT-VECTOR-LENGTH-BASED-IV-PHI ";
+void VPCumulativeIVPHIRecipe::printRecipe(raw_ostream &O, const Twine &Indent,
+                                          VPSlotTracker &SlotTracker) const {
+  O << Indent << "Cumulative-IV-PHI ";
 
   printAsOperand(O, SlotTracker);
   O << " = phi ";
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index bfef277070db7..d58fa8c6eaa73 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -2040,7 +2040,7 @@ static bool simplifyBranchConditionForVFAndUF(VPlan &Plan, ElementCount BestVF,
   if (all_of(Header->phis(), [](VPRecipeBase &Phi) {
         if (auto *R = dyn_cast<VPWidenIntOrFpInductionRecipe>(&Phi))
           return R->isCanonical();
-        return isa<VPCanonicalIVPHIRecipe, VPEVLBasedIVPHIRecipe,
+        return isa<VPCanonicalIVPHIRecipe, VPCumulativeIVPHIRecipe,
                    VPFirstOrderRecurrencePHIRecipe, VPPhi>(&Phi);
       })) {
     for (VPRecipeBase &HeaderR : make_early_inc_range(Header->phis())) {
@@ -3121,10 +3121,8 @@ static void fixupVFUsersForEVL(VPlan &Plan, VPValue &EVL) {
 /// Converts a tail folded vector loop region to step by
 /// VPInstruction::ExplicitVectorLength elements instead of VF elements each
 /// iteration.
-///
-/// - Add a VPEVLBasedIVPHIRecipe and related recipes to \p Plan and
-///   replaces all uses except the canonical IV increment of
-///   VPCanonicalIVPHIRecipe with a VPEVLBasedIVPHIRecipe.
+/// This transformation:
+/// - Makes VPCumulativeIVPHIRecipe step by EVL instead of VFxUF.
 ///   VPCanonicalIVPHIRecipe is used only for loop iterations counting after
 ///   this transformation.
 ///
@@ -3134,6 +3132,8 @@ static void fixupVFUsersForEVL(VPlan &Plan, VPValue &EVL) {
 ///   previous iteration, and VPFirstOrderRecurrencePHIRecipes are replaced with
 ///   @llvm.vp.splice.
 ///
+/// - Switches the loop from up-counting to down-counting.
+///
 /// The function uses the following definitions:
 ///  %StartV is the canonical induction start value.
 ///
@@ -3144,13 +3144,13 @@ static void fixupVFUsersForEVL(VPlan &Plan, VPValue &EVL) {
 ///
 /// vector.body:
 /// ...
-/// %EVLPhi = EXPLICIT-VECTOR-LENGTH-BASED-IV-PHI [ %StartV, %vector.ph ],
-///                                               [ %NextEVLIV, %vector.body ]
+/// %CumulativeIVPhi = Cumulative-IV-PHI [ %StartV, %vector.ph ],
+///                                      [ %NextIV, %vector.body ]
 /// %AVL = phi [ trip-count, %vector.ph ], [ %NextAVL, %vector.body ]
 /// %VPEVL = EXPLICIT-VECTOR-LENGTH %AVL
 /// ...
 /// %OpEVL = cast i32 %VPEVL to IVSize
-/// %NextEVLIV = add IVSize %OpEVL, %EVLPhi
+/// %NextIV = add IVSize %OpEVL, %CumulativeIVPhi
 /// %NextAVL = sub IVSize nuw %AVL, %OpEVL
 /// ...
 ///
@@ -3160,15 +3160,15 @@ static void fixupVFUsersForEVL(VPlan &Plan, VPValue &EVL) {
 ///
 /// vector.body:
 /// ...
-/// %EVLPhi = EXPLICIT-VECTOR-LENGTH-BASED-IV-PHI [ %StartV, %vector.ph ],
-///                                               [ %NextEVLIV, %vector.body ]
+/// %CumulativeIVPhi = Cumulative-IV-PHI [ %StartV, %vector.ph ],
+///                                      [ %NextIV, %vector.body ]
 /// %AVL = phi [ trip-count, %vector.ph ], [ %NextAVL, %vector.body ]
 /// %cmp = cmp ult %AVL, MaxSafeElements
 /// %SAFE_AVL = select %cmp, %AVL, MaxSafeElements
 /// %VPEVL = EXPLICIT-VECTOR-LENGTH %SAFE_AVL
 /// ...
 /// %OpEVL = cast i32 %VPEVL to IVSize
-/// %NextEVLIV = add IVSize %OpEVL, %EVLPhi
+/// %NextIV = add IVSize %OpEVL, %CumulativeIVPhi
 /// %NextAVL = sub IVSize nuw %AVL, %OpEVL
 /// ...
 ///
@@ -3181,11 +3181,9 @@ void VPlanTransforms::addExplicitVectorLength(
 
   auto *CanonicalIVPHI = LoopRegion->getCanonicalIV();
   auto *CanIVTy = LoopRegion->getCanonicalIVType();
-  VPValue *StartV = CanonicalIVPHI->getStartValue();
 
-  // Create the ExplicitVectorLengthPhi recipe in the main loop.
-  auto *EVLPhi = new VPEVLBasedIVPHIRecipe(StartV, DebugLoc::getUnknown());
-  EVLPhi->insertAfter(CanonicalIVPHI);
+  VPCumulativeIVPHIRecipe *CumulativeIVPhi = LoopRegion->getCumulativeIV();
+  VPRecipeBase *CumulativeIVInc = &CumulativeIVPhi->getBackedgeRecipe();
   VPBuilder Builder(Header, Header->getFirstNonPhi());
   // Create the AVL (application vector length), starting from TC -> 0 in steps
   // of EVL.
@@ -3205,19 +3203,17 @@ void VPlanTransforms::addExplicitVectorLength(
 
   auto *CanonicalIVIncrement =
       cast<VPInstruction>(CanonicalIVPHI->getBackedgeValue());
-  Builder.setInsertPoint(CanonicalIVIncrement);
+  Builder.setInsertPoint(CumulativeIVInc);
   VPValue *OpVPEVL = VPEVL;
 
   auto *I32Ty = Type::getInt32Ty(Plan.getContext());
-  OpVPEVL = Builder.createScalarZExtOrTrunc(
-      OpVPEVL, CanIVTy, I32Ty, CanonicalIVIncrement->getDebugLoc());
+  OpVPEVL = Builder.createScalarZExtOrTrunc(OpVPEVL, CanIVTy, I32Ty,
+                                            CumulativeIVInc->getDebugLoc());
+
+  CumulativeIVInc->setOperand(0, OpVPEVL);
 
-  auto *NextEVLIV = Builder.createOverflowingOp(
-      Instruction::Add, {OpVPEVL, EVLPhi},
-      {CanonicalIVIncrement->hasNoUnsignedWrap(),
-       CanonicalIVIncrement->hasNoSignedWrap()},
-      CanonicalIVIncrement->getDebugLoc(), "index.evl.next");
-  EVLPhi->addOperand(NextEVLIV);
+  Builder.setInsertPoint(CumulativeIVInc->getParent(),
+                         std::next(CumulativeIVInc->getIterator()));
 
   VPValue *NextAVL = Builder.createOverflowingOp(
       Instruction::Sub, {AVLPhi, OpVPEVL}, {/*hasNUW=*/true, /*hasNSW=*/false},
@@ -3228,89 +3224,135 @@ void VPlanTransforms::addExplicitVectorLength(
   removeDeadRecipes(Plan);
 
   // Replace all uses of VPCanonicalIVPHIRecipe by
-  // VPEVLBasedIVPHIRecipe except for the canonical IV increment.
-  CanonicalIVPHI->replaceAllUsesWith(EVLPhi);
+  // VPCumulativeIVPHIRecipe except for the canonical IV increment.
+  CanonicalIVPHI->replaceAllUsesWith(CumulativeIVPhi);
   CanonicalIVIncrement->setOperand(0, CanonicalIVPHI);
+
   // TODO: support unroll factor > 1.
   Plan.setUF(1);
+
+  // Switch the loop from up-counting to down counting.
+  // convert (branch-on-count (CanonicalInc, VTC)
+  // -> (branch-on-count (sub VTC, CanonicalIVInc), 0)
+  VPBasicBlock *LatchVPBB = LoopRegion->getExitingBasicBlock();
+  auto *LatchExitingBranch = cast<VPInstruction>(LatchVPBB->getTerminator());
+  if (match(LatchExitingBranch, m_BranchOnCond(m_True())))
+    return;
+  assert(match(LatchExitingBranch,
+               m_BranchOnCount(m_Specific(CanonicalIVIncrement),
+                               m_Specific(&Plan.getVectorTripCount()))) &&
+         "Unexpected terminator");
+  Builder.setInsertPoint(LatchExitingBranch);
+  VPValue *RemainElementCount = Builder.createOverflowingOp(
+      Instruction::Sub, {&Plan.getVectorTripCount(), CanonicalIVIncrement},
+      {/*hasNUW=*/true, /*hasNSW=*/false}, DebugLoc::getCompilerGenerated(),
+      "remain.element.count");
+  auto *Zero = Plan.getOrAddLiveIn(ConstantInt::get(CanIVTy, 0));
+  LatchExitingBranch->setOperand(0, RemainElementCount);
+  LatchExitingBranch->setOperand(1, Zero);
+}
+
+void VPlanTransforms::removeFixedStepCumulativeIV(VPlan &Plan) {
+  VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
+  auto *CanonicalIV = LoopRegion->getCanonicalIV();
+  VPCumulativeIVPHIRecipe *CumulativeIVPhi = LoopRegion->getCumulativeIV();
+  VPRecipeBase *CumulativeIVInc = &CumulativeIVPhi->getBackedgeRecipe();
+  // Replace all uses with CanonicalIV if it steps by VF*UF.
+  if (match(CumulativeIVInc,
+            m_Binary<Instruction::Add>(m_Specific(&Plan.getVFxUF()),
+                                       m_Specific(CumulativeIVPhi)))) {
+    CumulativeIVPhi->replaceAllUsesWith(CanonicalIV);
+    CumulativeIVPhi->eraseFromParent();
+    CumulativeIVInc->eraseFromParent();
+  }
 }
 
-void VPlanTransforms::canonicalizeEVLLoops(VPlan &Plan) {
-  // Find EVL loop entries by locating VPEVLBasedIVPHIRecipe.
-  // There should be only one EVL PHI in the entire plan.
-  VPEVLBasedIVPHIRecipe *EVLPhi = nullptr;
+void VPlanTransforms::convertToVariableLengthStep(VPlan &Plan,
+                                                  bool TailByMasking) {
+  VPCumulativeIVPHIRecipe *CumulativeIVPhi = nullptr;
 
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
            vp_depth_first_shallow(Plan.getEntry())))
     for (VPRecipeBase &R : VPBB->phis())
-      if (auto *PhiR = dyn_cast<VPEVLBasedIVPHIRecipe>(&R)) {
-        assert(!EVLPhi && "Found multiple EVL PHIs. Only one expected");
-        EVLPhi = PhiR;
+      if (auto *PhiR = dyn_cast<VPCumulativeIVPHIRecipe>(&R)) {
+        assert(!CumulativeIVPhi &&
+               "Found multiple CumulativeIV. Only one expected");
+        CumulativeI...
[truncated]

@@ -567,6 +586,8 @@ static void addInitialSkeleton(VPlan &Plan, Type *InductionTy, DebugLoc IVDL,
{VectorPhiR, VectorPhiR->getOperand(0)}, VectorPhiR->getDebugLoc());
cast<VPIRPhi>(&ScalarPhiR)->addOperand(ResumePhiR);
}

addCumulativeIVRecipes(Plan, HeaderVPBB, LatchVPBB, InductionTy, IVDL);
Contributor

Is there a particular reason why we always add the cumulative IV recipe for every VPlan? I would have thought we would only need to add it in the cases where we need to convert to a variably stepping loop region.

Contributor Author

It is inspired by #166164: transforms that run after addExplicitVectorLength are expected to use CumulativeIV instead of CanonicalIV.
For instance, the diff in createScalarIVSteps:

VPHeaderPHIRecipe *IV = LoopRegion->getCanonicalIV();
if (auto *EVLIV =
        dyn_cast<VPEVLBasedIVPHIRecipe>(std::next(IV->getIterator())))
  IV = EVLIV;

With this patch, the transform can simply use getCumulativeIV() when it cares about the processed element count.

To support getCumulativeIV(), I chose to create it by default during VPlan construction and remove it later if unused.

An alternative approach would be to only create it after variable-length transforms (e.g., addExplicitVectorLength), and have getCumulativeIV() fall back to CanonicalIV when CumulativeIV doesn't exist.
Would that be preferable?

Contributor

> An alternative approach would be to only create it after variable-length transforms (e.g., addExplicitVectorLength), and have getCumulativeIV() fall back to CanonicalIV when CumulativeIV doesn't exist.
> Would that be preferable?

Yes, I think that's what I had in mind. That way we don't change any VPlans that don't care about CumulativeIV, and we don't need removeFixedStepCumulativeIV.

What we could also do, in a prior NFC patch, is add VPRegionBlock::getCumulativeIV, returning just getCanonicalIV() for now, and go through every caller of getCanonicalIV() to check whether it should be moved to getCumulativeIV().

Then in this PR you can make getCumulativeIV() return the VPCumulativeIVPHIRecipe when it exists.
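The fallback accessor suggested here can be sketched with toy types. `ToyPhi`, `ToyLoopRegion`, and `makeVariableLengthStep` are hypothetical stand-ins and not the real VPlan classes; the sketch only illustrates the shape of the idea, in which callers ask for the cumulative IV and get the canonical IV until a variable-length transform has created a separate one.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy stand-in for a header phi recipe (NOT the real VPlan class).
struct ToyPhi {
  std::string Name;
};

// Toy stand-in for a loop region (NOT VPRegionBlock).
struct ToyLoopRegion {
  ToyPhi CanonicalIV{"canonical-iv"};
  // Only created by a variable-length transform.
  std::unique_ptr<ToyPhi> CumulativeIV;

  ToyPhi &getCanonicalIV() { return CanonicalIV; }

  // Fall back to the canonical IV while no cumulative IV exists, so plans
  // that never need variable-length stepping are left unchanged.
  ToyPhi &getCumulativeIV() {
    return CumulativeIV ? *CumulativeIV : CanonicalIV;
  }

  // Stand-in for a transform such as addExplicitVectorLength.
  void makeVariableLengthStep() {
    assert(!CumulativeIV && "only one cumulative IV expected");
    CumulativeIV.reset(new ToyPhi{"cumulative-iv"});
  }
};
```

Under this scheme no cleanup step is needed for plans that keep a fixed step, because nothing extra was ever created for them.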

"remain.element.count");
auto *Zero = Plan.getOrAddLiveIn(ConstantInt::get(CanIVTy, 0));
LatchExitingBranch->setOperand(0, RemainElementCount);
LatchExitingBranch->setOperand(1, Zero);
Contributor

How come we're now converting the branch condition earlier, in the EVL-specific transform? I think fault-only-first loads will want this format as well, so I would have thought we would want to keep it shared in VPlanTransforms::convertToVariableLengthStep.

Contributor Author

I made it a multi-step transform because AVL {TC,-,Step} may be optional.
(branch-on-count CanonicalIVInc, VTC)
-> (branch-on-count (sub VectorTripCount, CanonicalIVInc), 0)
-> (branch-on-count (sub TripCount, (add Step, CumulativeIV)), 0)
-> (branch-on-count (sub AVL, Step), 0)

For a mask-based loop with llvm.masked.load.ff, I'm not sure we need an AVL to generate the mask.
For example:

%mask = call <VF x i1> @llvm.get.active.lane.mask(i32 %cumulative.iv, i32 %TC)
%ret = call {<VF x ty>, <VF x i1>} @llvm.masked.load.ff (ptr %ptr,  <VF x i1> %mask)
%ret.mask = extractvalue {<VF x ty>, <VF x i1>} %ret, 1
...
%count = call @llvm.cttz.elts (<VF x i1> %ret.mask, i1 true)
%cumulative.iv.next = add %count, %cumulative.iv

Hence I moved the down-counting decision into addExplicitVectorLength.
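
To make the two exit-condition forms in the chain above concrete, here is a small scalar simulation. It is only a sketch: the function names are hypothetical, and the per-iteration step is modelled as `min(VF, remaining)`, which is an assumption standing in for an EVL-like value rather than what any target actually returns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Up-counting form: a counter of processed elements is advanced by a
// variable step and the loop exits once (TripCount - NextCount) == 0.
inline uint64_t iterationsUpCounting(uint64_t TC, uint64_t VF) {
  assert(VF > 0 && "a zero step would never make progress");
  uint64_t Processed = 0, Iters = 0;
  while (true) {
    uint64_t Step = std::min(VF, TC - Processed); // assumed EVL-like step
    Processed += Step;
    ++Iters;
    if (TC - Processed == 0) // (sub TripCount, (add Step, Count)) == 0
      break;
  }
  return Iters;
}

// Down-counting form: AVL starts at the trip count, shrinks by the same
// step, and the loop exits once (AVL - Step) == 0.
inline uint64_t iterationsDownCounting(uint64_t TC, uint64_t VF) {
  assert(VF > 0 && "a zero step would never make progress");
  uint64_t AVL = TC, Iters = 0;
  while (true) {
    uint64_t Step = std::min(VF, AVL);
    AVL -= Step;
    ++Iters;
    if (AVL == 0) // (sub AVL, Step) == 0
      break;
  }
  return Iters;
}
```

Both forms run the same number of iterations for any positive `VF`; the down-counting form simply needs an AVL to exist, which is the part that may be optional for a mask-based loop.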

Contributor

I see what you mean about llvm.masked.load.ff, but I think all the different transforms needed for branch-on-count are kind of complicated.

If we only want to do the "downward counting" transform for EVL tail folded loops and not for every variably stepped loop, I think it would be easier to just split it out from canonicalizeEVLLoops. I've opened #178181 for this; after that, I think this PR shouldn't need to worry about the exit condition.

Contributor

I've landed #178181 so that the EVL exit condition transform is split out. I've also relaxed the assertion that the VPEVLBasedIVPHIRecipe should have an ExplicitVectorLength in its backedge value, so I think you can probably get away with just renaming VPEVLBasedIVPHIRecipe in this PR.

Contributor Author

Thanks for #178181! That makes this an NFC now.

@lukel97
Contributor

lukel97 commented Jan 21, 2026

By the way, as a topic for bikeshedding I think we might need to find another term other than "cumulative IV", since IIUC an induction variable by definition has to increment/decrement by a fixed amount each time.

Contributor

@fhahn fhahn left a comment

It would be good if you could add a bit more detail on why this is needed and what the benefit is vs the EVL based IV. The description mostly describes how things were moved around, but it would be good to clarify why this is needed.

I don't think this recipe should be added to all plans unless there's a clear benefit in all cases. In terms of terminology, IV refers to induction, but induction variables in LLVM terminology step by a loop-invariant amount, which the new recipe would not do in some cases.
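
The terminology point can be illustrated with a small check of whether a counter's header values follow the closed form `Start + I * Step` for a single loop-invariant `Step`. This is only a sketch with hypothetical helper names, and the step sequences in the usage note are made-up examples, not output from the vectorizer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Values the counter holds at the start of each iteration, given the step
// taken at the end of each iteration.
inline std::vector<uint64_t> counterValues(const std::vector<uint64_t> &Steps) {
  std::vector<uint64_t> Vals;
  uint64_t Cur = 0;
  for (uint64_t S : Steps) {
    Vals.push_back(Cur);
    Cur += S;
  }
  return Vals;
}

// True if every value matches Vals[0] + I * Step for one fixed Step, i.e.
// the counter behaves like an induction with a loop-invariant step.
inline bool hasInvariantStep(const std::vector<uint64_t> &Vals) {
  if (Vals.size() < 2)
    return true;
  uint64_t Step = Vals[1] - Vals[0];
  for (std::size_t I = 0; I < Vals.size(); ++I)
    if (Vals[I] != Vals[0] + I * Step)
      return false;
  return true;
}
```

A counter stepping by a fixed 4 each iteration passes the check, while one stepping by 4, 3, 4, 1 (as a first-faulting load might cause) does not, which is why calling it an "IV" is questionable.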

@eas
Contributor

eas commented Jan 21, 2026

> By the way, as a topic for bikeshedding I think we might need to find another term other than "cumulative IV", since IIUC an induction variable by definition has to increment/decrement by a fixed amount each time.

VPNumProcessedElements, to be used for scalable vectors/first-fault loads/non-power-of-two vectorization? The last two could be beneficial for targets with fixed width vectors.

@arcbbb arcbbb force-pushed the variable-stepping-ups branch from 9dce672 to 38aa9d6 Compare January 22, 2026 12:36
@arcbbb arcbbb changed the title [VPlan] Introduce VPCumulativeIVPHIRecipe for variable-length stepping [VPlan] Introduce VPNumProcessedElementsPHIRecipe for variable-length stepping Jan 22, 2026
@arcbbb arcbbb force-pushed the variable-stepping-ups branch from 38aa9d6 to 49f4470 Compare January 22, 2026 12:45
@llvm llvm deleted a comment from github-actions bot Jan 22, 2026
@arcbbb
Contributor Author

arcbbb commented Jan 22, 2026

> It would be good if you could add a bit more detail on why this is needed and what the benefit is vs the EVL based IV. The description mostly describes how things were moved around, but it would be good to clarify why this is needed.

Thanks! I've updated the description to clarify the motivation. Please take another look!

@arcbbb
Contributor Author

arcbbb commented Jan 22, 2026

> By the way, as a topic for bikeshedding I think we might need to find another term other than "cumulative IV", since IIUC an induction variable by definition has to increment/decrement by a fixed amount each time.
>
> VPNumProcessedElements, to be used for scalable vectors/first-fault loads/non-power-of-two vectorization? The last two could be beneficial for targets with fixed width vectors.

Updated. Thanks @lukel97 and @eas !

lukel97 added a commit to lukel97/llvm-project that referenced this pull request Jan 27, 2026
This is split out from llvm#177114.

In order to make canonicalizeEVLLoops a generic "convert to variable stepping" transform, move the code that changes the exit condition to a separate transform. Run it before canonicalizeEVLLoops, i.e. before VPEVLBasedIVPHIRecipe is expanded.

Also relax the assertion for VPInstruction::ExplicitVectorLength to just bail instead, since eventually VPEVLBasedIVPHIRecipe will be used by other loops that aren't EVL tail folded.

- llvm_unreachable("cloning not implemented yet");
+ VPNumProcessedElementsPHIRecipe *clone() override {
+   auto *R =
+       new VPNumProcessedElementsPHIRecipe(getStartValue(), getDebugLoc());
Contributor

Just curious, does the phi get cloned when unrolling with #151300?

lukel97 added a commit that referenced this pull request Feb 2, 2026
…NFC (#178181)

This is split out from #177114.

In order to make canonicalizeEVLLoops a generic "convert to variable
stepping" transform, move the code that changes the exit condition to a
separate transform since not all variable stepping loops will want to
transform the exit condition. Run it before canonicalizeEVLLoops, i.e. before
VPEVLBasedIVPHIRecipe is expanded.

Also relax the assertion for VPInstruction::ExplicitVectorLength to just
bail instead, since eventually VPEVLBasedIVPHIRecipe will be used by
other loops that aren't EVL tail folded.
@lukel97
Contributor

lukel97 commented Feb 2, 2026

> By the way, as a topic for bikeshedding I think we might need to find another term other than "cumulative IV", since IIUC an induction variable by definition has to increment/decrement by a fixed amount each time.
>
> VPNumProcessedElements, to be used for scalable vectors/first-fault loads/non-power-of-two vectorization? The last two could be beneficial for targets with fixed width vectors.

I think VPNumProcessedElements is a bit non-standard, and it's not clear to me what constitutes an element being processed. E.g. if the original scalar loop has two loads and two stores, does each load/store pair count as one element processed?

How about something like CurrentTripCount? That way it can be defined in terms of the original scalar loop.

@eas
Contributor

eas commented Feb 2, 2026

> How about something like CurrentTripCount?

This can be confusing as well; one might think about the number of iterations remaining (maybe?). How about [Orig|Scalar]IterationsProcessed?

@lukel97
Contributor

lukel97 commented Feb 3, 2026

> How about something like CurrentTripCount?
>
> This can be confusing as well; one might think about the number of iterations remaining (maybe?). How about [Orig|Scalar]IterationsProcessed?

I see what you mean. I don't think the term "processed" is used much in the loop vectorizer currently, though.

How about CurrentIteration? That would match with how VPlan::TripCount/VPlan::BackedgeTakenCount implicitly refer to the scalar loop. The vector equivalent would be VPlan::VectorCurrentIteration.

This is groundwork for llvm#151300, which aims to support first-faulting
loads in non-tail-folded early-exit loops.
Per llvm#175900, we need a variable-length stepping transform that can
be shared between EVL and non-EVL loops.
The idea is to have an EVL-independent counter and transform for
tracking the cumulative number of processed elements.

This patch renames the existing counter (VPEVLBasedIVPHIRecipe) and
transform (canonicalizeEVLLoops) to be EVL-independent:
- Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationRecipe to
  reflect its general purpose of tracking processed element count.
- Rename canonicalizeEVLLoops to convertToVariableLengthStep.
@arcbbb arcbbb force-pushed the variable-stepping-ups branch from 49f4470 to c95a2e5 Compare February 4, 2026 09:24
@arcbbb
Contributor Author

arcbbb commented Feb 4, 2026

I'm renaming the title again, apologies for the confusion.

@arcbbb arcbbb changed the title [VPlan] Introduce VPNumProcessedElementsPHIRecipe for variable-length stepping [VPlan] Introduce generic variable-length step support Feb 4, 2026
Contributor

@lukel97 lukel97 left a comment

LGTM, just left some nits. Can you also update the PR title to mention it's NFC?

@arcbbb arcbbb changed the title [VPlan] Introduce generic variable-length step support [NFC][VPlan] Introduce generic variable-length step support Feb 6, 2026
@arcbbb arcbbb enabled auto-merge (squash) February 6, 2026 09:33
Contributor

@fhahn fhahn left a comment

It would probably be good to update the title/description to clarify that this renames the EVL-based PHI recipe to a more general name. My reading of the current title is that it implies new functionality.

@fhahn fhahn disabled auto-merge February 6, 2026 09:55
1. Rename VPCurrentIterationRecipe to VPCurrentIterationPHIRecipe.
2. Rename VPCurrentIterationSC to VPCurrentIterationPHISC.
3. Rephrase current index of elements.
4. Update assertion string in verifier.
@llvm llvm deleted a comment from github-actions bot Feb 6, 2026
@arcbbb arcbbb changed the title [NFC][VPlan] Introduce generic variable-length step support [NFC][VPlan] Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationPHIRecipe Feb 6, 2026
@arcbbb
Contributor Author

arcbbb commented Feb 9, 2026

@fhahn is there anything blocking this or any changes you'd like me to make?

Comment on lines 3373 to +3374
  // The EVL IV is always immediately after the canonical IV.
- auto *EVLPhi =
-     dyn_cast_or_null<VPEVLBasedIVPHIRecipe>(std::next(CanIV->getIterator()));
+ auto *EVLPhi = dyn_cast_or_null<VPCurrentIterationPHIRecipe>(
Contributor

This probably also needs updating; whether it is EVL based will be determined later, by checking the increment, I think.

Contributor Author

Right after this, we check that the increment is EVL and bail out if it's not.
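
The two-step check described in this exchange (find the phi that directly follows the canonical IV, then bail out unless its increment is EVL-based) can be sketched as follows. The types are toy stand-ins, not LLVM's classes: `Recipe`, `CanonicalIVPhi`, `CurrentIterationPhi`, `StepSource`, and `findEVLBasedPhi` are all hypothetical names, and `dynamic_cast` is used here in place of LLVM's `dyn_cast` machinery.

```cpp
#include <cassert>

// Toy recipe hierarchy; all names here are hypothetical, NOT LLVM's API.
struct Recipe {
  virtual ~Recipe() = default;
  Recipe *Next = nullptr; // next recipe in the loop header
};
struct CanonicalIVPhi : Recipe {};
struct ExplicitVectorLength : Recipe {};
struct CurrentIterationPhi : Recipe {
  Recipe *StepSource = nullptr; // what the backedge increment is derived from
};

// Returns the phi only if it directly follows the canonical IV AND its step
// comes from an explicit vector length; otherwise bails out with nullptr.
inline CurrentIterationPhi *findEVLBasedPhi(CanonicalIVPhi &CanIV) {
  auto *Phi = dynamic_cast<CurrentIterationPhi *>(CanIV.Next);
  if (!Phi)
    return nullptr; // no current-iteration phi at all
  if (!dynamic_cast<ExplicitVectorLength *>(Phi->StepSource))
    return nullptr; // phi exists, but its increment is not EVL-based
  return Phi;
}
```

Under this shape the recipe's type no longer implies EVL; only the increment check does, which matches the point that a renamed recipe can later be shared with non-EVL loops.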

Contributor

@fhahn fhahn left a comment

I think there's also llvm/test/Transforms/LoopVectorize/vplan-force-tail-with-evl.ll which has ; NO-VP-NOT: EXPLICIT-VECTOR-LENGTH-BASED-IV-PHI
