Skip to content

[VPlan][NFC] Extract addCurrentIterationPhi from addExplicitVectorLength#182650

Closed
arcbbb wants to merge 3 commits intollvm:mainfrom
arcbbb:split-current-iteration
Closed

[VPlan][NFC] Extract addCurrentIterationPhi from addExplicitVectorLength#182650
arcbbb wants to merge 3 commits intollvm:mainfrom
arcbbb:split-current-iteration

Conversation

@arcbbb
Copy link
Copy Markdown
Contributor

@arcbbb arcbbb commented Feb 21, 2026

Refactor the creation of VPCurrentIterationPHIRecipe into a separate transformation pass addCurrentIterationPhi. This separates concerns and allows the CurrentIteration PHI to be created independently of EVL-based tail folding.
Changes:

  • Add new VPlanTransforms::addCurrentIterationPhi() that creates VPCurrentIterationPHIRecipe and replaces uses of VPCanonicalIVPHIRecipe
  • Add vputils::getCurrentIterationPhi() helper to locate the VPCurrentIterationPHIRecipe in a VPlan
  • Add vputils::getLoopIndex() helper that returns either VPCurrentIterationPHIRecipe or VPCanonicalIVPHIRecipe

@llvmbot
Copy link
Copy Markdown
Member

llvmbot commented Feb 21, 2026

@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Shih-Po Hung (arcbbb)

Changes

Refactor the creation of VPCurrentIterationPHIRecipe into a separate transformation pass addCurrentIterationPhi. This separates concerns and allows the CurrentIteration PHI to be created independently of EVL-based tail folding.
Changes:

  • Add new VPlanTransforms::addCurrentIterationPhi() that creates VPCurrentIterationPHIRecipe and replaces uses of VPCanonicalIVPHIRecipe
  • Add vputils::getCurrentIterationPhi() helper to locate the VPCurrentIterationPHIRecipe in a VPlan
  • Add vputils::getLoopIndex() helper that returns either VPCurrentIterationPHIRecipe or VPCanonicalIVPHIRecipe

Full diff: https://github.com/llvm/llvm-project/pull/182650.diff

5 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+48-38)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+6-5)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanUtils.cpp (+38-6)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanUtils.h (+5)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 69d2b9f2c1a28..a9abacf38c4fa 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8145,6 +8145,7 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
       RUN_VPLAN_PASS(VPlanTransforms::optimize, *Plan);
       // TODO: try to put addExplicitVectorLength close to addActiveLaneMask
       if (CM.foldTailWithEVL()) {
+        RUN_VPLAN_PASS(VPlanTransforms::addCurrentIterationPhi, *Plan);
         RUN_VPLAN_PASS(VPlanTransforms::addExplicitVectorLength, *Plan,
                        CM.getMaxSafeElements());
         RUN_VPLAN_PASS(VPlanTransforms::optimizeEVLMasks, *Plan);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index b4c74c8776fc3..885aadac21316 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -3207,16 +3207,43 @@ static void fixupVFUsersForEVL(VPlan &Plan, VPValue &EVL) {
   HeaderMask->replaceAllUsesWith(EVLMask);
 }
 
+/// Add a VPCurrentIterationPHIRecipe to \p Plan and replaces all uses of
+/// VPCanonicalIVPHIRecipe with VPCurrentIterationPHIRecipe, except for the
+/// canonical IV increment. After this transfomration, VPCanonicalIVPHIRecipe
+/// is used only for loop iterations counting.
+void VPlanTransforms::addCurrentIterationPhi(VPlan &Plan) {
+  if (Plan.hasScalarVFOnly())
+    return;
+  VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
+  VPBasicBlock *Header = LoopRegion->getEntryBasicBlock();
+
+  auto *CanonicalIVPHI = LoopRegion->getCanonicalIV();
+  auto *CanonicalIVIncrement =
+      cast<VPInstruction>(CanonicalIVPHI->getBackedgeValue());
+  auto *CanIVTy = LoopRegion->getCanonicalIVType();
+  VPValue *StartV = CanonicalIVPHI->getStartValue();
+  // Create the CurrentIteration recipe in the vector loop.
+  auto *CurrentIteration =
+      new VPCurrentIterationPHIRecipe(StartV, DebugLoc::getUnknown());
+  CurrentIteration->insertAfter(CanonicalIVPHI);
+  VPBuilder Builder(CanonicalIVIncrement);
+  auto *NextIter = Builder.createAdd(
+      CanonicalIVIncrement->getOperand(1), CurrentIteration,
+      CanonicalIVIncrement->getDebugLoc(), "current.iteration.next",
+      {CanonicalIVIncrement->hasNoUnsignedWrap(),
+       CanonicalIVIncrement->hasNoSignedWrap()});
+  CurrentIteration->addOperand(NextIter);
+
+  // Replace all uses of VPCanonicalIVPHIRecipe by
+  // VPCurrentIterationPHIRecipe except for the canonical IV increment.
+  CanonicalIVPHI->replaceAllUsesWith(CurrentIteration);
+  CanonicalIVIncrement->setOperand(0, CanonicalIVPHI);
+}
+
 /// Converts a tail folded vector loop region to step by
 /// VPInstruction::ExplicitVectorLength elements instead of VF elements each
 /// iteration.
 ///
-/// - Add a VPCurrentIterationPHIRecipe and related recipes to \p Plan and
-///   replaces all uses except the canonical IV increment of
-///   VPCanonicalIVPHIRecipe with a VPCurrentIterationPHIRecipe.
-///   VPCanonicalIVPHIRecipe is used only for loop iterations counting after
-///   this transformation.
-///
 /// - The header mask is replaced with a header mask based on the EVL.
 ///
 /// - Plans with FORs have a new phi added to keep track of the EVL of the
@@ -3272,10 +3299,8 @@ void VPlanTransforms::addExplicitVectorLength(
   auto *CanIVTy = LoopRegion->getCanonicalIVType();
   VPValue *StartV = CanonicalIVPHI->getStartValue();
 
-  // Create the CurrentIteration recipe in the vector loop.
-  auto *CurrentIteration =
-      new VPCurrentIterationPHIRecipe(StartV, DebugLoc::getUnknown());
-  CurrentIteration->insertAfter(CanonicalIVPHI);
+  auto *CurrentIteration = vputils::getCurrentIterationPhi(Plan);
+  assert(CurrentIteration && "must have CurrentIteration");
   VPBuilder Builder(Header, Header->getFirstNonPhi());
   // Create the AVL (application vector length), starting from TC -> 0 in steps
   // of EVL.
@@ -3293,22 +3318,21 @@ void VPlanTransforms::addExplicitVectorLength(
   auto *VPEVL = Builder.createNaryOp(VPInstruction::ExplicitVectorLength, AVL,
                                      DebugLoc::getUnknown(), "evl");
 
-  auto *CanonicalIVIncrement =
-      cast<VPInstruction>(CanonicalIVPHI->getBackedgeValue());
-  Builder.setInsertPoint(CanonicalIVIncrement);
+  auto *NextIter = cast<VPInstruction>(CurrentIteration->getBackedgeValue());
+  assert(NextIter->getOperand(0) == &Plan.getVFxUF() &&
+         "CurrentIteration increment should be VFxUF");
+  Builder.setInsertPoint(NextIter);
   VPValue *OpVPEVL = VPEVL;
 
   auto *I32Ty = Type::getInt32Ty(Plan.getContext());
-  OpVPEVL = Builder.createScalarZExtOrTrunc(
-      OpVPEVL, CanIVTy, I32Ty, CanonicalIVIncrement->getDebugLoc());
-
-  auto *NextIter = Builder.createAdd(OpVPEVL, CurrentIteration,
-                                     CanonicalIVIncrement->getDebugLoc(),
-                                     "current.iteration.next",
-                                     {CanonicalIVIncrement->hasNoUnsignedWrap(),
-                                      CanonicalIVIncrement->hasNoSignedWrap()});
-  CurrentIteration->addOperand(NextIter);
+  OpVPEVL = Builder.createScalarZExtOrTrunc(OpVPEVL, CanIVTy, I32Ty,
+                                            NextIter->getDebugLoc());
+
+  NextIter->setOperand(0, OpVPEVL);
 
+  auto *CanonicalIVIncrement =
+      cast<VPInstruction>(CanonicalIVPHI->getBackedgeValue());
+  Builder.setInsertPoint(CanonicalIVIncrement);
   VPValue *NextAVL =
       Builder.createSub(AVLPhi, OpVPEVL, DebugLoc::getCompilerGenerated(),
                         "avl.next", {/*NUW=*/true, /*NSW=*/false});
@@ -3317,27 +3341,13 @@ void VPlanTransforms::addExplicitVectorLength(
   fixupVFUsersForEVL(Plan, *VPEVL);
   removeDeadRecipes(Plan);
 
-  // Replace all uses of VPCanonicalIVPHIRecipe by
-  // VPCurrentIterationPHIRecipe except for the canonical IV increment.
-  CanonicalIVPHI->replaceAllUsesWith(CurrentIteration);
-  CanonicalIVIncrement->setOperand(0, CanonicalIVPHI);
   // TODO: support unroll factor > 1.
   Plan.setUF(1);
 }
 
 void VPlanTransforms::convertToVariableLengthStep(VPlan &Plan) {
-  // Find the vector loop entry by locating VPCurrentIterationPHIRecipe.
-  // There should be only one VPCurrentIteration in the entire plan.
-  VPCurrentIterationPHIRecipe *CurrentIteration = nullptr;
-
-  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
-           vp_depth_first_shallow(Plan.getEntry())))
-    for (VPRecipeBase &R : VPBB->phis())
-      if (auto *PhiR = dyn_cast<VPCurrentIterationPHIRecipe>(&R)) {
-        assert(!CurrentIteration &&
-               "Found multiple CurrentIteration. Only one expected");
-        CurrentIteration = PhiR;
-      }
+  VPCurrentIterationPHIRecipe *CurrentIteration =
+      vputils::getCurrentIterationPhi(Plan);
 
   // Early return if it is not variable-length stepping.
   if (!CurrentIteration)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 972a18ebded63..3ece149b8b9c6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -285,11 +285,12 @@ struct VPlanTransforms {
       VPlan &Plan,
       const std::function<bool(BasicBlock *)> &BlockNeedsPredication);
 
-  /// Add a VPCurrentIterationPHIRecipe and related recipes to \p Plan and
-  /// replaces all uses except the canonical IV increment of
-  /// VPCanonicalIVPHIRecipe with a VPCurrentIterationPHIRecipe.
-  /// VPCanonicalIVPHIRecipe is only used to control the loop after
-  /// this transformation.
+  /// Add a VPCurrentIterationPHIRecipe to \p Plan and replaces all uses of
+  /// VPCanonicalIVPHIRecipe with VPCurrentIterationPHIRecipe, except for the
+  /// canonical IV increment.
+  static void addCurrentIterationPhi(VPlan &Plan);
+  /// Convert a tail-folded vector loop to use explicit vector length (EVL)
+  /// instead of VF elements each iteration.
   static void
   addExplicitVectorLength(VPlan &Plan,
                           const std::optional<unsigned> &MaxEVLSafeElements);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
index f5318bb1c6515..093cd741171c7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
@@ -570,13 +570,12 @@ vputils::getRecipesForUncountableExit(VPlan &Plan,
 VPSingleDefRecipe *vputils::findHeaderMask(VPlan &Plan) {
   VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
   SmallVector<VPValue *> WideCanonicalIVs;
-  auto *FoundWidenCanonicalIVUser = find_if(
-      LoopRegion->getCanonicalIV()->users(), IsaPred<VPWidenCanonicalIVRecipe>);
-  assert(count_if(LoopRegion->getCanonicalIV()->users(),
-                  IsaPred<VPWidenCanonicalIVRecipe>) <= 1 &&
+  VPHeaderPHIRecipe *LoopIndex = getLoopIndex(Plan);
+  auto *FoundWidenCanonicalIVUser =
+      find_if(LoopIndex->users(), IsaPred<VPWidenCanonicalIVRecipe>);
+  assert(count_if(LoopIndex->users(), IsaPred<VPWidenCanonicalIVRecipe>) <= 1 &&
          "Must have at most one VPWideCanonicalIVRecipe");
-  if (FoundWidenCanonicalIVUser !=
-      LoopRegion->getCanonicalIV()->users().end()) {
+  if (FoundWidenCanonicalIVUser != LoopIndex->users().end()) {
     auto *WideCanonicalIV =
         cast<VPWidenCanonicalIVRecipe>(*FoundWidenCanonicalIVUser);
     WideCanonicalIVs.push_back(WideCanonicalIV);
@@ -665,3 +664,36 @@ VPInstruction *vputils::findComputeReductionResult(VPReductionPHIRecipe *PhiR) {
   return vputils::findUserOf<VPInstruction::ComputeReductionResult>(
       cast<VPSingleDefRecipe>(SelR));
 }
+
+VPCurrentIterationPHIRecipe *vputils::getCurrentIterationPhi(VPlan &Plan) {
+  VPCurrentIterationPHIRecipe *CurrentIteration = nullptr;
+  VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
+  if (LoopRegion) {
+    VPCanonicalIVPHIRecipe *CanIV = LoopRegion->getCanonicalIV();
+    if (std::next(CanIV->getIterator()) != CanIV->getParent()->end())
+      CurrentIteration = dyn_cast_or_null<VPCurrentIterationPHIRecipe>(
+          std::next(CanIV->getIterator()));
+    return CurrentIteration;
+  }
+
+  // Find the vector loop entry by locating VPCurrentIterationPHIRecipe.
+  // There should be only one VPCurrentIteration in the entire plan.
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_shallow(Plan.getEntry())))
+    for (VPRecipeBase &R : VPBB->phis())
+      if (auto *PhiR = dyn_cast<VPCurrentIterationPHIRecipe>(&R)) {
+        assert(!CurrentIteration &&
+               "Found multiple CurrentIteration. Only one expected");
+        CurrentIteration = PhiR;
+      }
+
+  return CurrentIteration;
+}
+
+VPHeaderPHIRecipe *vputils::getLoopIndex(VPlan &Plan) {
+  VPCurrentIterationPHIRecipe *CurrentIteration =
+      vputils::getCurrentIterationPhi(Plan);
+  if (CurrentIteration)
+    return CurrentIteration;
+  return Plan.getVectorLoopRegion()->getCanonicalIV();
+}
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUtils.h b/llvm/lib/Transforms/Vectorize/VPlanUtils.h
index a5692699d9d76..f322b28f372f1 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUtils.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanUtils.h
@@ -151,6 +151,11 @@ VPInstruction *findComputeReductionResult(VPReductionPHIRecipe *PhiR);
 /// the header-mask pattern manually.
 VPSingleDefRecipe *findHeaderMask(VPlan &Plan);
 
+/// Return VPCurrentIterationPHIRecipe if present, otherwise nullptr.
+VPCurrentIterationPHIRecipe *getCurrentIterationPhi(VPlan &Plan);
+/// Return the loop index PHI: VPCurrentIterationPHIRecipe for variable-length
+/// stepping loops, or VPCanonicalIVPHIRecipe otherwise.
+VPHeaderPHIRecipe *getLoopIndex(VPlan &Plan);
 } // namespace vputils
 
 //===----------------------------------------------------------------------===//

Refactor the creation of VPCurrentIterationPHIRecipe into a separate
transformation pass `addCurrentIterationPhi`. This separates concerns
and allows the CurrentIteration PHI to be created independently of
EVL-based tail folding.
Changes:
- Add new `VPlanTransforms::addCurrentIterationPhi()` that creates
  VPCurrentIterationPHIRecipe and replaces uses of VPCanonicalIVPHIRecipe
- Add `vputils::getCurrentIterationPhi()` helper to locate the
  VPCurrentIterationPHIRecipe in a VPlan
- Add `vputils::getLoopIndex()` helper that returns either
  VPCurrentIterationPHIRecipe or VPCanonicalIVPHIRecipe
@arcbbb arcbbb force-pushed the split-current-iteration branch from aeede48 to e74fa4b Compare February 21, 2026 04:36
@llvm llvm deleted a comment from github-actions bot Feb 21, 2026
Copy link
Copy Markdown
Contributor

@lukel97 lukel97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for extracting this, it's good to see the "variable stepping" part be untangled from the EVL transform. I think this will also help #177599

1. Inline fixupVFUsersForEVL into addCurrentIterationPhi
2. Move getLoopIndex helper into VPRegionBlock as getCurrentIteration
@@ -33,13 +33,13 @@ define void @fadd(ptr noalias %a, ptr noalias %b, i64 %n) {
; ZVFBFMIN-NEXT: [[TMP0:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_EVL_NEXT:%.*]], %[[VECTOR_BODY]] ]
; ZVFBFMIN-NEXT: [[AVL:%.*]] = phi i64 [ [[N]], %[[VECTOR_PH]] ], [ [[AVL_NEXT:%.*]], %[[VECTOR_BODY]] ]
; ZVFBFMIN-NEXT: [[TMP6:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[AVL]], i32 8, i1 true)
; ZVFBFMIN-NEXT: [[TMP13:%.*]] = zext i32 [[TMP6]] to i64
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cast EVL to the canonical IV type early, before replacing VF users.

/// fixup recipes that use VF to use \p NewStep instead. After this
/// transformation, VPCanonicalIVPHIRecipe is used only for loop iterations
/// counting.
static void addCurrentIterationPhi(VPlan &Plan, VPValue *NewStep) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static void addCurrentIterationPhi(VPlan &Plan, VPValue *NewStep) {
static void addCurrentIterationPhi(VPlan &Plan, VPSingleDefRecipe *NewStep) {

Then you don't need the getDefiningRecipe in the FOR handling code

Comment on lines +3192 to +3193
VPValue *StepI32;
VPValue *StepSource;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
VPValue *StepI32;
VPValue *StepSource;
VPValue *StepI32, *StepSource;

if (match(NewStep, m_ZExt(m_VPValue(StepSource))) &&
TypeInfo.inferScalarType(StepSource) ==
Type::getInt32Ty(Plan.getContext())) {
StepI32 = StepSource;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we already have a simplifcation for trunc(zext(x)) -> x in simplifyRecipes? Are we seeing regressions if we always call Builder.createScalarZExtOrTrunc?

@@ -3200,30 +3233,24 @@ static void fixupVFUsersForEVL(VPlan &Plan, VPValue &EVL) {
if (!HeaderMask)
return;

// Replace header masks with a mask equivalent to predicating by EVL:
// Replace header masks with a mask equivalent to predicating by
// variable-length step:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
// variable-length step:
// NewStep:

/// VPCurrentIterationPHIRecipe except for the canonical IV increment, and
/// fixup recipes that use VF to use \p NewStep instead. After this
/// transformation, VPCanonicalIVPHIRecipe is used only for loop iterations
/// counting.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, can we mention somewhere here that NewStep doesn't need to be loop invariant?

@@ -5,7 +5,7 @@ define void @add(ptr noalias nocapture readonly %src1, ptr noalias nocapture rea
; CHECK-LABEL: add
; CHECK: LV(REG): VF = vscale x 4
; CHECK-NEXT: LV(REG): Found max usage: 2 item
; CHECK-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
; CHECK-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 7 registers
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come we end up with an extra register use? Is there an extra scalar cast somewhere?

Copy link
Copy Markdown
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the change is not NFC, there are quite a few movements in tests. Not sure if they can be avoided?

@@ -802,6 +802,14 @@ const VPBasicBlock *VPBasicBlock::getCFGPredecessor(unsigned Idx) const {
return Pred->getExitingBasicBlock();
}

VPHeaderPHIRecipe *VPRegionBlock::getCurrentIteration() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a benefit from exposing the helpers more widely? Looks like the only have a single uses, and I don't think the movement is needed for extracting addCurrentIterationPhi from addExplicitVectorLength? Overall it seems like it would be good to try to keep the logic together?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is for addCurrentIterationPhi to be shared by the transforms of EVL, FFLoad, and partial alias masking. Adding VPRegionBlock::getCurrentIteration is a step toward eventually moving addExplicitVectorLength into tryToBuildVPlanWithVPRecipes.

That said, FFLoad and partial alias masking transform aren’t in place here yet, and there’s only a single user right now.
It probably makes more sense to keep the logic together for the moment, focus on #166164, and revisit this once those transforms can build on it.
@lukel97 does that sound good?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding VPRegionBlock::getCurrentIteration is a step toward eventually moving addExplicitVectorLength into tryToBuildVPlanWithVPRecipes.

Yup, that's my understanding of the plan as well. We'll need to start changing other transforms to use getCurrentIteration if we want to do EVL tail folding/partial alias masking earlier during vplan construction. Since for some passes it won't be correct if they start using the canonical IV again after we've replaced it with the VPCurrentIterationPHI.

@lukel97 does that sound good?

Yup, sounds good if we want leave that to another PR

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

partial alias masking earlier during vplan construction.

I don't think this needs CurrentIterationPHI, as the increment will remain loop-invariant IIUC

Copy link
Copy Markdown
Contributor

@lukel97 lukel97 Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not that the increment needs to be invariant but the fact that the increment is no longer VFxUF, so various recipes need to adapt.

The fixupVFUsersForClampedVF transform in #182457 is doing the same thing as fixupVFUsersForEVL. It would be good to share this logic.

But also CurrentIterationPHI/VPRegionBlock::getCurrentIteration means that transforms and recipes can just directly use the correct increment, instead of relying on being fixed up later.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I think I misunderstood this comment. I thought that we just meant removing the VPRegionBlock::getCurrentIteration() accessor.

For #151300 do we not need to reuse fixupVFUsersForEVL?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think #182457 treats partial masking as a fixed-length stepping case, so it can simply replace VFxUF during VPlan execution.

For #151300, I think we still need the fixupVFUsers logic. My understanding is that #151300 would stack on top of #166164, which in turn depends on the VPRegionBlock::getCurrentIteration() accessor.
Once that is in place, splitting addCurrentIterationPhi out of addExplicitVectorLength would be a more natural next step.
Would you like to move #166164 forward with getCurrentIteration() accessor?
I’m a bit tied up with some downstream issues at the moment, so I may not have much time to work on this over the next few weeks.

@arcbbb arcbbb closed this Mar 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants