
[VPlan] Robustly handle Part0 for VecPointer, CanIVInc #185739

Open

artagnon wants to merge 1 commit into llvm:main from artagnon:vplan-civinc

Conversation

@artagnon
Contributor

@artagnon artagnon commented Mar 10, 2026

Add an extra Offset operand to CanonicalIVIncrement, instead of conflating the VF operand on the non-unrolled recipe with 1 * VF on Part 1 of the unrolled recipe, and create Part0 offsets for the VectorPointer and CanonicalIVIncrement recipes, exactly as we do for VectorEndPointer. This makes the VPlan logic more robust.

The patch is mostly non-functional, with the exception of some wrap-flag changes.
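The new scheme can be sketched with plain arithmetic (an illustrative model, not LLVM code; the struct and function names below are invented for this sketch): every unrolled part carries an explicit Offset operand equal to Part * VF, the recipe computes Start + Offset, and a missing or zero offset lets the simplifier replace the increment with its start value.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Model of the unrolled canonical-IV increment: Start plus an explicit
// per-part offset. Before unrolling, no offset operand exists.
struct CanIVIncModel {
  uint64_t Start;
  std::optional<uint64_t> Offset;

  uint64_t value() const { return Start + Offset.value_or(0); }

  // Mirrors the simplification in the patch: a recipe with no offset,
  // or a zero offset, folds to its start value.
  bool foldsToStart() const { return !Offset || *Offset == 0; }
};

// Unrolling materializes Offset = Part * VF for every part, including
// an explicit zero for Part 0.
CanIVIncModel unrollPart(uint64_t Start, uint64_t VF, uint64_t Part) {
  return {Start, Part * VF};
}
```

With VF = 8 and Start = 0, Part 0 folds back to the start value while Part 2 yields 16; the point is that Part 0 goes through the same explicit-offset rule as every other part, rather than through a special case.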

@llvmbot
Member

llvmbot commented Mar 10, 2026

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Ramkumar Ramachandra (artagnon)

Changes

VPCanonicalIVIncrementRecipe is very similar to VPVectorPointerRecipe. This patch is merely a cleanup; optimizations are left to follow-ups.


Patch is 44.43 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/185739.diff

12 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h (+5)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+53-2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+4-5)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h (+8-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+24-19)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+17-18)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp (+9-22)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanUtils.cpp (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll (+60-60)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-unroll.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/VPlan/AArch64/sve-tail-folding-forced.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/VPlan/AArch64/vplan-printing.ll (+3-3)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index 8368349e63cee..2b919c6b0027f 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -389,6 +389,11 @@ class VPBuilder {
     return tryInsertInstruction(new VPExpandSCEVRecipe(Expr));
   }
 
+  VPCanonicalIVIncrementRecipe *createCanonicalIVIncrement(VPValue *Start,
+                                                           DebugLoc DL) {
+    return tryInsertInstruction(new VPCanonicalIVIncrementRecipe(Start, DL));
+  }
+
   //===--------------------------------------------------------------------===//
   // RAII helpers.
   //===--------------------------------------------------------------------===//
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index da2f6f8c7cd03..5302bee8a5118 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -418,6 +418,7 @@ class LLVM_ABI_FOR_TEST VPRecipeBase
     VPScalarIVStepsSC,
     VPVectorPointerSC,
     VPVectorEndPointerSC,
+    VPCanonicalIVIncrementSC,
     VPWidenCallSC,
     VPWidenCanonicalIVSC,
     VPWidenCastSC,
@@ -608,6 +609,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPRecipeValue {
     case VPRecipeBase::VPScalarIVStepsSC:
     case VPRecipeBase::VPVectorPointerSC:
     case VPRecipeBase::VPVectorEndPointerSC:
+    case VPRecipeBase::VPCanonicalIVIncrementSC:
     case VPRecipeBase::VPWidenCallSC:
     case VPRecipeBase::VPWidenCanonicalIVSC:
     case VPRecipeBase::VPWidenCastSC:
@@ -1209,8 +1211,6 @@ class LLVM_ABI_FOR_TEST VPInstruction : public VPRecipeWithIRFlags,
     ActiveLaneMask,
     ExplicitVectorLength,
     CalculateTripCountMinusVF,
-    // Increment the canonical IV separately for each unrolled part.
-    CanonicalIVIncrementForPart,
     // Abstract instruction that compares two values and branches. This is
     // lowered to ICmp + BranchOnCond during VPlan to VPlan transformation.
     BranchOnCount,
@@ -3926,6 +3926,57 @@ class VPCurrentIterationPHIRecipe : public VPHeaderPHIRecipe {
 #endif
 };
 
+/// A recipe to compute increments to a canonical induction, supplied as \p
+/// Start, that has non-uniform unrolling. Unrolling adds an extra offset
+/// operand for unrolled parts > 0 and it produces `add Start, Offset`. The
+/// offset for unrolled part 0 is 0.
+class VPCanonicalIVIncrementRecipe : public VPSingleDefRecipe {
+public:
+  VPCanonicalIVIncrementRecipe(VPValue *Start, DebugLoc DL)
+      : VPSingleDefRecipe(VPRecipeBase::VPCanonicalIVIncrementSC, Start, DL) {}
+
+  VP_CLASSOF_IMPL(VPRecipeBase::VPCanonicalIVIncrementSC)
+
+  VPValue *getOffset() const {
+    return getNumOperands() == 2 ? getOperand(1) : nullptr;
+  }
+
+  void execute(VPTransformState &State) override;
+
+  bool usesFirstLaneOnly(const VPValue *Op) const override {
+    assert(is_contained(operands(), Op) &&
+           "Op must be an operand of the recipe");
+    return true;
+  }
+
+  bool usesFirstPartOnly(const VPValue *Op) const override {
+    assert(is_contained(operands(), Op) &&
+           "Op must be an operand of the recipe");
+    assert(getNumOperands() <= 2 && "must have at most two operands");
+    return true;
+  }
+
+  VPCanonicalIVIncrementRecipe *clone() override {
+    auto *Clone =
+        new VPCanonicalIVIncrementRecipe(getOperand(0), getDebugLoc());
+    if (auto *Off = getOffset())
+      Clone->addOperand(Off);
+    return Clone;
+  }
+
+  InstructionCost computeCost(ElementCount VF,
+                              VPCostContext &Ctx) const override {
+    // TODO: Compute accurate cost after retiring the legacy cost model.
+    return 0;
+  }
+
+protected:
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  void printRecipe(raw_ostream &O, const Twine &Indent,
+                   VPSlotTracker &SlotTracker) const override;
+#endif
+};
+
 /// A Recipe for widening the canonical induction variable of the vector loop.
 class VPWidenCanonicalIVRecipe : public VPSingleDefRecipe,
                                  public VPUnrollPartAccessor<1> {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 998e48d411f50..f9be5808373e4 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -112,7 +112,6 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPInstruction *R) {
   case VPInstruction::FirstOrderRecurrenceSplice:
   case VPInstruction::Not:
   case VPInstruction::CalculateTripCountMinusVF:
-  case VPInstruction::CanonicalIVIncrementForPart:
   case VPInstruction::AnyOf:
   case VPInstruction::BuildStructVector:
   case VPInstruction::BuildVector:
@@ -301,10 +300,10 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
               [](const auto *R) { return R->getScalarType(); })
           .Case<VPReductionRecipe, VPPredInstPHIRecipe, VPWidenPHIRecipe,
                 VPScalarIVStepsRecipe, VPWidenGEPRecipe, VPVectorPointerRecipe,
-                VPVectorEndPointerRecipe, VPWidenCanonicalIVRecipe>(
-              [this](const VPRecipeBase *R) {
-                return inferScalarType(R->getOperand(0));
-              })
+                VPVectorEndPointerRecipe, VPCanonicalIVIncrementRecipe,
+                VPWidenCanonicalIVRecipe>([this](const VPRecipeBase *R) {
+            return inferScalarType(R->getOperand(0));
+          })
           // VPInstructionWithType must be handled before VPInstruction.
           .Case<VPInstructionWithType, VPWidenIntrinsicRecipe,
                 VPWidenCastRecipe>(
diff --git a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
index 1205f04fb5c29..b574154e21b72 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
@@ -341,7 +341,8 @@ struct Recipe_match {
     if constexpr (std::is_same_v<RecipeTy, VPScalarIVStepsRecipe> ||
                   std::is_same_v<RecipeTy, VPCanonicalIVPHIRecipe> ||
                   std::is_same_v<RecipeTy, VPDerivedIVRecipe> ||
-                  std::is_same_v<RecipeTy, VPVectorEndPointerRecipe>)
+                  std::is_same_v<RecipeTy, VPVectorEndPointerRecipe> ||
+                  std::is_same_v<RecipeTy, VPCanonicalIVIncrementRecipe>)
       return DefR;
     else
       return DefR && DefR->getOpcode() == Opcode;
@@ -868,6 +869,12 @@ inline auto m_c_LogicalOr(const Op0_t &Op0, const Op1_t &Op1) {
 
 inline auto m_CanonicalIV() { return class_match<VPCanonicalIVPHIRecipe>(); }
 
+template <typename Op0_t, typename Op1_t>
+inline auto m_CanonicalIVIncrement(const Op0_t &Op0, const Op1_t &Op1) {
+  return Recipe_match<std::tuple<Op0_t, Op1_t>, 0, false,
+                      VPCanonicalIVIncrementRecipe>({Op0, Op1});
+}
+
 template <typename Op0_t, typename Op1_t, typename Op2_t>
 using VPScalarIVSteps_match = Recipe_match<std::tuple<Op0_t, Op1_t, Op2_t>, 0,
                                            false, VPScalarIVStepsRecipe>;
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index d149723e11fb6..2554730ea5fb9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -82,6 +82,7 @@ bool VPRecipeBase::mayWriteToMemory() const {
   case VPReductionPHISC:
   case VPScalarIVStepsSC:
   case VPPredInstPHISC:
+  case VPCanonicalIVIncrementSC:
     return false;
   case VPBlendSC:
   case VPReductionEVLSC:
@@ -134,6 +135,7 @@ bool VPRecipeBase::mayReadFromMemory() const {
   case VPScalarIVStepsSC:
   case VPWidenStoreEVLSC:
   case VPWidenStoreSC:
+  case VPCanonicalIVIncrementSC:
     return false;
   case VPBlendSC:
   case VPReductionEVLSC:
@@ -169,6 +171,7 @@ bool VPRecipeBase::mayHaveSideEffects() const {
   case VPReductionPHISC:
   case VPPredInstPHISC:
   case VPVectorEndPointerSC:
+  case VPCanonicalIVIncrementSC:
     return false;
   case VPInstructionSC: {
     auto *VPI = cast<VPInstruction>(this);
@@ -497,7 +500,6 @@ unsigned VPInstruction::getNumOperandsForOpcode() const {
   case VPInstruction::AnyOf:
   case VPInstruction::BuildStructVector:
   case VPInstruction::BuildVector:
-  case VPInstruction::CanonicalIVIncrementForPart:
   case VPInstruction::ComputeAnyOfResult:
   case VPInstruction::ComputeReductionResult:
   case VPInstruction::FirstActiveLane:
@@ -530,7 +532,6 @@ bool VPInstruction::canGenerateScalarForFirstLane() const {
   case VPInstruction::BranchOnTwoConds:
   case VPInstruction::BranchOnCount:
   case VPInstruction::CalculateTripCountMinusVF:
-  case VPInstruction::CanonicalIVIncrementForPart:
   case VPInstruction::PtrAdd:
   case VPInstruction::ExplicitVectorLength:
   case VPInstruction::AnyOf:
@@ -655,14 +656,6 @@ Value *VPInstruction::generate(VPTransformState &State) {
         {AVL, VFArg, Builder.getTrue()});
     return EVL;
   }
-  case VPInstruction::CanonicalIVIncrementForPart: {
-    auto *IV = State.get(getOperand(0), VPLane(0));
-    auto *VFxPart = State.get(getOperand(1), VPLane(0));
-    // The canonical IV is incremented by the vectorization factor (num of
-    // SIMD elements) times the unroll part.
-    return Builder.CreateAdd(IV, VFxPart, Name, hasNoUnsignedWrap(),
-                             hasNoSignedWrap());
-  }
   case VPInstruction::BranchOnCond: {
     Value *Cond = State.get(getOperand(0), VPLane(0));
     // Replace the temporary unreachable terminator with a new conditional
@@ -1342,7 +1335,6 @@ bool VPInstruction::opcodeMayReadOrWriteFromMemory() const {
   case VPInstruction::BuildStructVector:
   case VPInstruction::BuildVector:
   case VPInstruction::CalculateTripCountMinusVF:
-  case VPInstruction::CanonicalIVIncrementForPart:
   case VPInstruction::ExtractLane:
   case VPInstruction::ExtractLastLane:
   case VPInstruction::ExtractLastPart:
@@ -1396,7 +1388,6 @@ bool VPInstruction::usesFirstLaneOnly(const VPValue *Op) const {
   case VPInstruction::ActiveLaneMask:
   case VPInstruction::ExplicitVectorLength:
   case VPInstruction::CalculateTripCountMinusVF:
-  case VPInstruction::CanonicalIVIncrementForPart:
   case VPInstruction::BranchOnCount:
   case VPInstruction::BranchOnCond:
   case VPInstruction::Broadcast:
@@ -1437,7 +1428,6 @@ bool VPInstruction::usesFirstPartOnly(const VPValue *Op) const {
   case VPInstruction::BranchOnCount:
   case VPInstruction::BranchOnCond:
   case VPInstruction::BranchOnTwoConds:
-  case VPInstruction::CanonicalIVIncrementForPart:
     return true;
   };
   llvm_unreachable("switch should return");
@@ -1486,9 +1476,6 @@ void VPInstruction::printRecipe(raw_ostream &O, const Twine &Indent,
   case VPInstruction::CalculateTripCountMinusVF:
     O << "TC > VF ? TC - VF : 0";
     break;
-  case VPInstruction::CanonicalIVIncrementForPart:
-    O << "VF * Part +";
-    break;
   case VPInstruction::BranchOnCount:
     O << "branch-on-count";
     break;
@@ -2103,7 +2090,6 @@ VPIRFlags VPIRFlags::getDefaultFlags(unsigned Opcode) {
   case Instruction::Sub:
   case Instruction::Mul:
   case Instruction::Shl:
-  case VPInstruction::CanonicalIVIncrementForPart:
     return WrapFlagsTy(false, false);
   case Instruction::Trunc:
     return TruncFlagsTy(false, false);
@@ -2144,8 +2130,7 @@ bool VPIRFlags::flagsValidForOpcode(unsigned Opcode) const {
   switch (OpType) {
   case OperationType::OverflowingBinOp:
     return Opcode == Instruction::Add || Opcode == Instruction::Sub ||
-           Opcode == Instruction::Mul || Opcode == Instruction::Shl ||
-           Opcode == VPInstruction::VPInstruction::CanonicalIVIncrementForPart;
+           Opcode == Instruction::Mul || Opcode == Instruction::Shl;
   case OperationType::Trunc:
     return Opcode == Instruction::Trunc;
   case OperationType::DisjointOp:
@@ -4522,6 +4507,26 @@ void VPExpandSCEVRecipe::printRecipe(raw_ostream &O, const Twine &Indent,
 }
 #endif
 
+void VPCanonicalIVIncrementRecipe::execute(VPTransformState &State) {
+  auto &Builder = State.Builder;
+  assert(getOffset() &&
+         "Expected prior simplification of recipe without offset");
+  Value *Start = State.get(getOperand(0), true);
+  Value *Offset = State.get(getOffset(), true);
+  Value *Result = Builder.CreateAdd(Start, Offset, "caniv.inc");
+  State.set(this, Result, true);
+}
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPCanonicalIVIncrementRecipe::printRecipe(
+    raw_ostream &O, const Twine &Indent, VPSlotTracker &SlotTracker) const {
+  O << Indent << "EMIT ";
+  printAsOperand(O, SlotTracker);
+  O << " = CANONICAL-INDUCTION-INC ";
+  printOperands(O, SlotTracker);
+}
+#endif
+
 void VPWidenCanonicalIVRecipe::execute(VPTransformState &State) {
   Value *CanonicalIV = State.get(getOperand(0), /*IsScalar*/ true);
   Type *STy = CanonicalIV->getType();
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 33e5b25bd9322..d879917eb3c30 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1627,6 +1627,11 @@ static void simplifyRecipe(VPSingleDefRecipe *Def, VPTypeAnalysis &TypeInfo) {
     if (!VPR->getOffset() || match(VPR->getOffset(), m_ZeroInt()))
       return VPR->replaceAllUsesWith(VPR->getOperand(0));
 
+  // Similarly, simplify unrolled CanonicalIVIncrement.
+  if (auto *CIVInc = dyn_cast<VPCanonicalIVIncrementRecipe>(Def))
+    if (!CIVInc->getOffset() || match(CIVInc->getOffset(), m_ZeroInt()))
+      return CIVInc->replaceAllUsesWith(CIVInc->getOperand(0));
+
   // VPScalarIVSteps after unrolling can be replaced by their start value, if
   // the start index is zero and only the first lane 0 is demanded.
   if (auto *Steps = dyn_cast<VPScalarIVStepsRecipe>(Def)) {
@@ -2065,18 +2070,17 @@ static bool tryToReplaceALMWithWideALM(VPlan &Plan, ElementCount VF,
           m_ActiveLaneMask(m_VPValue(Index), m_VPValue(), m_VPValue()));
     assert(Index && "Expected index from ActiveLaneMask instruction");
 
-    uint64_t Part;
-    if (match(Index,
-              m_VPInstruction<VPInstruction::CanonicalIVIncrementForPart>(
-                  m_VPValue(), m_Mul(m_VPValue(), m_ConstantInt(Part)))))
-      Phis[Part] = Phi;
-    else {
-      // Anything other than a CanonicalIVIncrementForPart is part 0
-      assert(!match(
-          Index,
-          m_VPInstruction<VPInstruction::CanonicalIVIncrementForPart>()));
+    auto *CIVInc = cast<VPCanonicalIVIncrementRecipe>(Index);
+    if (!CIVInc->getOffset()) {
+      // Part 0 CanonicalIVIncrement.
       Phis[0] = Phi;
+      continue;
     }
+    uint64_t Part;
+    assert(match(Index, m_CanonicalIVIncrement(
+                            m_VPValue(), m_c_Mul(m_Specific(&Plan.getVF()),
+                                                 m_ConstantInt(Part)))));
+    Phis[Part] = Phi;
   }
 
   assert(all_of(Phis, [](VPActiveLaneMaskPHIRecipe *Phi) { return Phi; }) &&
@@ -2925,11 +2929,7 @@ addVPLaneMaskPhiAndUpdateExitBranch(VPlan &Plan) {
 
   // Create the ActiveLaneMask instruction using the correct start values.
   VPValue *TC = Plan.getTripCount();
-  VPValue *VF = &Plan.getVF();
-
-  auto *EntryIncrement = Builder.createOverflowingOp(
-      VPInstruction::CanonicalIVIncrementForPart, {StartV, VF}, {false, false},
-      DL, "index.part.next");
+  auto *EntryIncrement = Builder.createCanonicalIVIncrement(StartV, DL);
 
   // Create the active lane mask instruction in the VPlan preheader.
   VPValue *ALMMultiplier =
@@ -2948,9 +2948,8 @@ addVPLaneMaskPhiAndUpdateExitBranch(VPlan &Plan) {
   // original terminator.
   VPRecipeBase *OriginalTerminator = EB->getTerminator();
   Builder.setInsertPoint(OriginalTerminator);
-  auto *InLoopIncrement = Builder.createOverflowingOp(
-      VPInstruction::CanonicalIVIncrementForPart,
-      {CanonicalIVIncrement, &Plan.getVF()}, {false, false}, DL);
+  auto *InLoopIncrement =
+      Builder.createCanonicalIVIncrement(CanonicalIVIncrement, DL);
   auto *ALM = Builder.createNaryOp(VPInstruction::ActiveLaneMask,
                                    {InLoopIncrement, TC, ALMMultiplier}, DL,
                                    "active.lane.mask.next");
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
index 2d961808d3bcd..88816fbc3f7a3 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
@@ -351,6 +351,15 @@ void UnrollState::unrollRecipeByUF(VPRecipeBase &R) {
       Copy->addOperand(VFxPart);
       continue;
     }
+    if (auto *CIVInc = dyn_cast<VPCanonicalIVIncrementRecipe>(&R)) {
+      VPBuilder Builder(CIVInc);
+      VPValue *VFxPart = Builder.createOverflowingOp(
+          Instruction::Mul, {&Plan.getVF(), getConstantInt(Part)},
+          {true, true});
+      Copy->setOperand(0, CIVInc->getOperand(0));
+      Copy->addOperand(VFxPart);
+      continue;
+    }
     if (auto *Red = dyn_cast<VPReductionRecipe>(&R)) {
       auto *Phi = dyn_cast<VPReductionPHIRecipe>(R.getOperand(0));
       if (Phi && Phi->isOrdered()) {
@@ -380,14 +389,6 @@ void UnrollState::unrollRecipeByUF(VPRecipeBase &R) {
     // requiring it.
     if (isa<VPWidenCanonicalIVRecipe>(Copy))
       Copy->addOperand(getConstantInt(Part));
-
-    if (match(Copy,
-              m_VPInstruction<VPInstruction::CanonicalIVIncrementForPart>())) {
-      VPBuilder Builder(Copy);
-      VPValue *ScaledByPart = Builder.createOverflowingOp(
-          Instruction::Mul, {Copy->getOperand(1), getConstantInt(Part)});
-      Copy->setOperand(1, ScaledByPart);
-    }
   }
   if (auto *VEPR = dyn_cast<VPVectorEndPointerRecipe>(&R)) {
     // Materialize Part0 offset for VectorEndPointer.
@@ -489,20 +490,6 @@ void VPlanTransforms::unrollByUF(VPlan &Plan, unsigned UF) {
   assert(UF > 0 && "Unroll factor must be positive");
   Plan.setUF(UF);
   llvm::scope_exit Cleanup([&Plan, UF]() {
-    auto Iter = vp_depth_first_deep(Plan.getEntry());
-    // Remove recipes that are redundant after unrolling.
-    for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(Iter)) {
-      for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
-        auto *VPI = dyn_cast<VPInstruction>(&R);
-        if (VPI &&
-            VPI->getOpcode() == VPInstruction::CanonicalIVIncrementForPart &&
-            VPI->getOperand(1) == &Plan.getVF()) {
-          VPI->replaceAllUsesWith(VPI->getOperand(0));
-          VPI->eraseFromParent();
-        }
-      }
-    }
-
     Type *TCTy = VPTypeAnalysis(Plan).inferScalarType(Plan.getTripCount());
     Plan.getUF().replaceAllUsesWith(Plan.getConstantInt(TCTy, UF));
   });
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
index 821a4f7911bb8..a11b885ca576f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
@@ -404,8 +404,7 @@ bool vputils::isUniformAcrossVFsAndUFs(VPValue *V) {
   VPlan *Plan = VPBB ? VPBB->getPlan() : nullptr;
   if (VPBB &&
       (VPBB == Plan->getVectorPreheader() || VPBB == Plan->getEntry())) {
-    if (match(V->getDefiningRecipe(),
-              m_VPInstruction<VPInstruction::CanonicalIVIncrementForPart>()))
+    if (isa<VPCanonicalIVIncrementRecipe>(V->getDefiningRecipe()))
       return false;
     return all_of(R->operands(), isUniformAcrossVFsAndUFs);
   }
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
index b897982700400..7e7d7ddd85c3a 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
@@ -304,22 +304,22 @@ define float @fadd_strict_unroll(ptr noalias nocapture readonly %a, i64 %n) #0 {
 ; CHECK-ORDERED-TF-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-ORDERED-TF-NEXT:    [[TMP1:%.*]] = shl nuw i64 [[TMP0]], 3
 ; CHECK-ORDERED-TF-NEXT:    [[TMP2:%.*]] = shl nuw i64 [[TMP1]], 2
-; CHECK-ORDERED-TF-NEXT:    [[INDEX_PART_NEXT:%.*]] = add i64 0, [[TMP1]]
-; CHECK-ORDERED-TF-NEXT:    [[TMP3:%.*]] = shl i64 [[TMP1]], 1
-; CHECK-ORDERED-TF-NEXT:    [[INDEX_PART_NEXT1:%.*]] = add i64 0, [[TMP3]]
-; CHECK-ORDERED-TF-NEXT:    [[TMP4:%.*]] ...
[truncated]

@artagnon artagnon changed the title [VPlan] Introduce dedicated canonical IV inc recipe [VPlan] Simplify CanonicalIVIncrement unrolling Mar 11, 2026
Contributor

@fhahn fhahn left a comment

Could you elaborate more on the benefit of this? It seems like explicit unrolling as-is is a good fit here.

The cleanup in unrollByUF for UF = 1 is not nice, but perhaps that could be moved to convertToConcreteRecipes?

@artagnon
Contributor Author

> Could you elaborate more on the benefit of this? It seems like explicit unrolling as-is is a good fit here.
>
> The cleanup in unrollByUF for UF = 1 is not nice, but perhaps that could be moved to convertToConcreteRecipes?

If I understand correctly, we want to minimize uses of VF, so that things like EVL replacement will have an easier time? Besides, the added VF operand is misleading, and I thought we could unroll this exactly as we unroll VectorPointerRecipe for uniformity -- this recipe is identical to VectorPointerRecipe, with the GEP replaced by an add?

Contributor

@fhahn fhahn left a comment

>> Could you elaborate more on the benefit of this? It seems like explicit unrolling as-is is a good fit here.
>> The cleanup in unrollByUF for UF = 1 is not nice, but perhaps that could be moved to convertToConcreteRecipes?
>
> If I understand correctly, we want to minimize uses of VF, so that things like EVL replacement will have an easier time? Besides, the added VF operand is misleading, and I thought we could unroll this exactly as we unroll VectorPointerRecipe for uniformity -- this recipe is identical to VectorPointerRecipe, with the GEP replaced by an add?

But I think as-is this is not necessarily correct; at this late stage, we may not be able to access getVF, e.g. if we narrowed interleave groups, or if other future transforms limit the number of iterations processed. This should be guarded against by #182318.

Similarly, for the EVL transform to be correct, I think we must have all uses of VF explicit, so we can check whether we can update them or convert them to EVL.

Not sure if it is possible to create a test where we narrow interleave groups and have CanonicalIVIncrementForPart.

@artagnon
Contributor Author

>>> Could you elaborate more on the benefit of this? It seems like explicit unrolling as-is is a good fit here.
>>> The cleanup in unrollByUF for UF = 1 is not nice, but perhaps that could be moved to convertToConcreteRecipes?
>>
>> If I understand correctly, we want to minimize uses of VF, so that things like EVL replacement will have an easier time? Besides, the added VF operand is misleading, and I thought we could unroll this exactly as we unroll VectorPointerRecipe for uniformity -- this recipe is identical to VectorPointerRecipe, with the GEP replaced by an add?
>
> But I think as-is this is not necessarily correct; at this late stage, we may not be able to access getVF, e.g. if we narrowed interleave groups or other future transforms that limit the number of iterations processed. Should be guarded against by #182318.
>
> Similarly, for the EVL transform to be correct, we must have all uses of VF explicit I think, so we can check if we can update them/convert them to EVL.
>
> Not sure if it is possible to create a test where we narrow interleave groups and have CanonicalIVIncrementForPart.

Ah, interesting! Thanks for the explanation: this makes me wonder if we made an error with directly unrolling Vector(End)Pointer? Will update this patch to move the scope_exit code to convertToConcreteRecipes.

@fhahn
Contributor

fhahn commented Mar 12, 2026

> Ah, interesting! Thanks for the explanation: this makes me wonder if we made an error with directly unrolling Vector(End)Pointer? Will update this patch to move the scope_exit code to convertToConcreteRecipes.

I don't think so, although if easily done it would also be good to explicitly unroll Vector(End)Pointer, I think, if it keeps things simple.

@artagnon
Contributor Author

>> Ah, interesting! Thanks for the explanation: this makes me wonder if we made an error with directly unrolling Vector(End)Pointer? Will update this patch to move the scope_exit code to convertToConcreteRecipes.
>
> Don't think so, although if easily done it would also be good to explicitly unroll Vector(End)Pointer I think, if it keeps things simple

So the issue is that convertToConcreteRecipes runs after unrolling: how can we match a VF offset (as the scope_exit code that this patch removes currently does)? Unrolling would create VF * 1 for the first part, which would then be simplified to just VF; Part 1 would then be erroneously simplified, when we only want to simplify Part 0.
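The ambiguity described here can be made concrete with a tiny model (illustrative only; the names below are invented for this sketch): represent the old recipe's step operand as an optional coefficient times VF, and let a folder rewrite 1 * VF to plain VF. After folding, the Part-1 copy is indistinguishable from the not-yet-unrolled form, so a cleanup that matches plain VF would remove both.

```cpp
#include <cassert>

// Step operand of the old CanonicalIVIncrementForPart: either plain VF
// (not yet unrolled) or an explicit product Coeff * VF (unrolled part).
struct StepOperand {
  int Coeff;      // the Part multiplier; meaningful only if IsProduct
  bool IsProduct; // still written as an explicit multiply?
};

// A simplifier that folds 1 * VF down to plain VF.
StepOperand fold(StepOperand S) {
  if (S.IsProduct && S.Coeff == 1)
    return {1, /*IsProduct=*/false};
  return S;
}

// A cleanup that matches "operand is plain VF" to drop the redundant
// increment cannot separate Part 1 from the non-unrolled recipe once
// 1 * VF has been folded.
bool matchesPlainVF(const StepOperand &S) { return !S.IsProduct; }
```

This is why the old scheme had to rely on VPBuilder not folding 1 * VF, and why an explicit Offset operand removes the ambiguity.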

@fhahn
Contributor

fhahn commented Mar 12, 2026

>>> Ah, interesting! Thanks for the explanation: this makes me wonder if we made an error with directly unrolling Vector(End)Pointer? Will update this patch to move the scope_exit code to convertToConcreteRecipes.
>>
>> Don't think so, although if easily done it would also be good to explicitly unroll Vector(End)Pointer I think, if it keeps things simple
>
> So the issue is that convertToConcreteRecipes runs after unrolling; how can we match a VF offset (as the scope_exit code that this patch removes currently does), as unrolling would create VF * 1 for the first part, which would then be simplified to just VF? Part 1 would then be erroneously simplified, when we just want to simplify Part 0.

Ah right, perhaps it would be possible to completely remove the opcode during unrolling?

@artagnon
Contributor Author

artagnon commented Mar 13, 2026

>>>> Ah, interesting! Thanks for the explanation: this makes me wonder if we made an error with directly unrolling Vector(End)Pointer? Will update this patch to move the scope_exit code to convertToConcreteRecipes.
>>>
>>> Don't think so, although if easily done it would also be good to explicitly unroll Vector(End)Pointer I think, if it keeps things simple
>>
>> So the issue is that convertToConcreteRecipes runs after unrolling; how can we match a VF offset (as the scope_exit code that this patch removes currently does), as unrolling would create VF * 1 for the first part, which would then be simplified to just VF? Part 1 would then be erroneously simplified, when we just want to simplify Part 0.
>
> ah right, perhaps it would be possible to completely remove the opcode during unrolling?

Not sure what you mean? Are you suggesting that we replace the recipe with an Add recipe or something during unrolling?

Due to the fundamental conflation between the not-unrolled recipe and Part 1 of the unrolled recipe, the best we can do is the current scope_exit code in the unroller, which relies on the fact that VPBuilder doesn't simplify 1 * VF to VF. It is hence impossible to do better, unless we decide to either remove the VF operand or add a third operand. I'm inclined to close this PR and leave this unresolved, unless we can find some kind of solution?

@artagnon
Contributor Author

After some more thought, I think the Right Thing to do is to add an additional operand, which would also have the benefit of making the code in replaceALMWithWideALM less flaky. I was wrong: CanonicalIVIncrement is fundamentally different from VectorPointer. The former depends on the canonical IV, which is touched by the EVL and narrow-interleave-groups transforms; the latter is simply a ptradd that has nothing to do with the canonical IV, which doesn't need to be handled by the EVL transform, and is handled in narrow-interleave-groups by simply looking through it.

I think the line of work in this patch will ultimately clean up:

  auto *CanonicalIVIncrement =
      cast<VPInstruction>(CanonicalIVPHI->getBackedgeValue());
  Builder.setInsertPoint(CanonicalIVIncrement);
  VPValue *OpVPEVL = VPEVL;

  auto *I32Ty = Type::getInt32Ty(Plan.getContext());
  OpVPEVL = Builder.createScalarZExtOrTrunc(
      OpVPEVL, CanIVTy, I32Ty, CanonicalIVIncrement->getDebugLoc());

  auto *NextIter = Builder.createAdd(OpVPEVL, CurrentIteration,
                                     CanonicalIVIncrement->getDebugLoc(),
                                     "current.iteration.next",
                                     {CanonicalIVIncrement->hasNoUnsignedWrap(),
                                      CanonicalIVIncrement->hasNoSignedWrap()});
  CurrentIteration->addOperand(NextIter);

  VPValue *NextAVL =
      Builder.createSub(AVLPhi, OpVPEVL, DebugLoc::getCompilerGenerated(),
                        "avl.next", {/*NUW=*/true, /*NSW=*/false});
  AVLPhi->addOperand(NextAVL);

  fixupVFUsersForEVL(Plan, *VPEVL);
  removeDeadRecipes(Plan);

  // Replace all uses of VPCanonicalIVPHIRecipe by
  // VPCurrentIterationPHIRecipe except for the canonical IV increment.
  CanonicalIVPHI->replaceAllUsesWith(CurrentIteration);
  CanonicalIVIncrement->setOperand(0, CanonicalIVPHI);
  // TODO: support unroll factor > 1.
  Plan.setUF(1);

Will update soon.

@artagnon artagnon changed the title [VPlan] Simplify CanonicalIVIncrement unrolling [VPlan] Robustly handle Part0 for VecPointer, CanIVInc Mar 13, 2026
Add an extra Offset operand to CanonicalIVIncrement instead of
conflating VF operand on the non-unrolled recipe with 1 * VF on Part 1
of the unrolled recipe, and create Part0 offsets for VectorPointer and
CanonicalIVIncrement recipes, exactly as we do it for VectorEndPointer,
to make the VPlan logic more robust.

The patch is mostly non-functional, with the exception of some
wrap-flags.
@artagnon
Contributor Author

Gentle ping. Not sure how I feel about this cleanup -- while it certainly makes things more consistent, it perhaps needs to be motivated by a follow-up?
