[VPlan][NFC] Extract addCurrentIterationPhi from addExplicitVectorLength#182650
[VPlan][NFC] Extract addCurrentIterationPhi from addExplicitVectorLength#182650
Conversation
|
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-llvm-transforms Author: Shih-Po Hung (arcbbb) ChangesRefactor the creation of VPCurrentIterationPHIRecipe into a separate transformation pass
Full diff: https://github.com/llvm/llvm-project/pull/182650.diff 5 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 69d2b9f2c1a28..a9abacf38c4fa 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8145,6 +8145,7 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
RUN_VPLAN_PASS(VPlanTransforms::optimize, *Plan);
// TODO: try to put addExplicitVectorLength close to addActiveLaneMask
if (CM.foldTailWithEVL()) {
+ RUN_VPLAN_PASS(VPlanTransforms::addCurrentIterationPhi, *Plan);
RUN_VPLAN_PASS(VPlanTransforms::addExplicitVectorLength, *Plan,
CM.getMaxSafeElements());
RUN_VPLAN_PASS(VPlanTransforms::optimizeEVLMasks, *Plan);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index b4c74c8776fc3..885aadac21316 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -3207,16 +3207,43 @@ static void fixupVFUsersForEVL(VPlan &Plan, VPValue &EVL) {
HeaderMask->replaceAllUsesWith(EVLMask);
}
+/// Add a VPCurrentIterationPHIRecipe to \p Plan and replaces all uses of
+/// VPCanonicalIVPHIRecipe with VPCurrentIterationPHIRecipe, except for the
+/// canonical IV increment. After this transfomration, VPCanonicalIVPHIRecipe
+/// is used only for loop iterations counting.
+void VPlanTransforms::addCurrentIterationPhi(VPlan &Plan) {
+ if (Plan.hasScalarVFOnly())
+ return;
+ VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
+ VPBasicBlock *Header = LoopRegion->getEntryBasicBlock();
+
+ auto *CanonicalIVPHI = LoopRegion->getCanonicalIV();
+ auto *CanonicalIVIncrement =
+ cast<VPInstruction>(CanonicalIVPHI->getBackedgeValue());
+ auto *CanIVTy = LoopRegion->getCanonicalIVType();
+ VPValue *StartV = CanonicalIVPHI->getStartValue();
+ // Create the CurrentIteration recipe in the vector loop.
+ auto *CurrentIteration =
+ new VPCurrentIterationPHIRecipe(StartV, DebugLoc::getUnknown());
+ CurrentIteration->insertAfter(CanonicalIVPHI);
+ VPBuilder Builder(CanonicalIVIncrement);
+ auto *NextIter = Builder.createAdd(
+ CanonicalIVIncrement->getOperand(1), CurrentIteration,
+ CanonicalIVIncrement->getDebugLoc(), "current.iteration.next",
+ {CanonicalIVIncrement->hasNoUnsignedWrap(),
+ CanonicalIVIncrement->hasNoSignedWrap()});
+ CurrentIteration->addOperand(NextIter);
+
+ // Replace all uses of VPCanonicalIVPHIRecipe by
+ // VPCurrentIterationPHIRecipe except for the canonical IV increment.
+ CanonicalIVPHI->replaceAllUsesWith(CurrentIteration);
+ CanonicalIVIncrement->setOperand(0, CanonicalIVPHI);
+}
+
/// Converts a tail folded vector loop region to step by
/// VPInstruction::ExplicitVectorLength elements instead of VF elements each
/// iteration.
///
-/// - Add a VPCurrentIterationPHIRecipe and related recipes to \p Plan and
-/// replaces all uses except the canonical IV increment of
-/// VPCanonicalIVPHIRecipe with a VPCurrentIterationPHIRecipe.
-/// VPCanonicalIVPHIRecipe is used only for loop iterations counting after
-/// this transformation.
-///
/// - The header mask is replaced with a header mask based on the EVL.
///
/// - Plans with FORs have a new phi added to keep track of the EVL of the
@@ -3272,10 +3299,8 @@ void VPlanTransforms::addExplicitVectorLength(
auto *CanIVTy = LoopRegion->getCanonicalIVType();
VPValue *StartV = CanonicalIVPHI->getStartValue();
- // Create the CurrentIteration recipe in the vector loop.
- auto *CurrentIteration =
- new VPCurrentIterationPHIRecipe(StartV, DebugLoc::getUnknown());
- CurrentIteration->insertAfter(CanonicalIVPHI);
+ auto *CurrentIteration = vputils::getCurrentIterationPhi(Plan);
+ assert(CurrentIteration && "must have CurrentIteration");
VPBuilder Builder(Header, Header->getFirstNonPhi());
// Create the AVL (application vector length), starting from TC -> 0 in steps
// of EVL.
@@ -3293,22 +3318,21 @@ void VPlanTransforms::addExplicitVectorLength(
auto *VPEVL = Builder.createNaryOp(VPInstruction::ExplicitVectorLength, AVL,
DebugLoc::getUnknown(), "evl");
- auto *CanonicalIVIncrement =
- cast<VPInstruction>(CanonicalIVPHI->getBackedgeValue());
- Builder.setInsertPoint(CanonicalIVIncrement);
+ auto *NextIter = cast<VPInstruction>(CurrentIteration->getBackedgeValue());
+ assert(NextIter->getOperand(0) == &Plan.getVFxUF() &&
+ "CurrentIteration increment should be VFxUF");
+ Builder.setInsertPoint(NextIter);
VPValue *OpVPEVL = VPEVL;
auto *I32Ty = Type::getInt32Ty(Plan.getContext());
- OpVPEVL = Builder.createScalarZExtOrTrunc(
- OpVPEVL, CanIVTy, I32Ty, CanonicalIVIncrement->getDebugLoc());
-
- auto *NextIter = Builder.createAdd(OpVPEVL, CurrentIteration,
- CanonicalIVIncrement->getDebugLoc(),
- "current.iteration.next",
- {CanonicalIVIncrement->hasNoUnsignedWrap(),
- CanonicalIVIncrement->hasNoSignedWrap()});
- CurrentIteration->addOperand(NextIter);
+ OpVPEVL = Builder.createScalarZExtOrTrunc(OpVPEVL, CanIVTy, I32Ty,
+ NextIter->getDebugLoc());
+
+ NextIter->setOperand(0, OpVPEVL);
+ auto *CanonicalIVIncrement =
+ cast<VPInstruction>(CanonicalIVPHI->getBackedgeValue());
+ Builder.setInsertPoint(CanonicalIVIncrement);
VPValue *NextAVL =
Builder.createSub(AVLPhi, OpVPEVL, DebugLoc::getCompilerGenerated(),
"avl.next", {/*NUW=*/true, /*NSW=*/false});
@@ -3317,27 +3341,13 @@ void VPlanTransforms::addExplicitVectorLength(
fixupVFUsersForEVL(Plan, *VPEVL);
removeDeadRecipes(Plan);
- // Replace all uses of VPCanonicalIVPHIRecipe by
- // VPCurrentIterationPHIRecipe except for the canonical IV increment.
- CanonicalIVPHI->replaceAllUsesWith(CurrentIteration);
- CanonicalIVIncrement->setOperand(0, CanonicalIVPHI);
// TODO: support unroll factor > 1.
Plan.setUF(1);
}
void VPlanTransforms::convertToVariableLengthStep(VPlan &Plan) {
- // Find the vector loop entry by locating VPCurrentIterationPHIRecipe.
- // There should be only one VPCurrentIteration in the entire plan.
- VPCurrentIterationPHIRecipe *CurrentIteration = nullptr;
-
- for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
- vp_depth_first_shallow(Plan.getEntry())))
- for (VPRecipeBase &R : VPBB->phis())
- if (auto *PhiR = dyn_cast<VPCurrentIterationPHIRecipe>(&R)) {
- assert(!CurrentIteration &&
- "Found multiple CurrentIteration. Only one expected");
- CurrentIteration = PhiR;
- }
+ VPCurrentIterationPHIRecipe *CurrentIteration =
+ vputils::getCurrentIterationPhi(Plan);
// Early return if it is not variable-length stepping.
if (!CurrentIteration)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 972a18ebded63..3ece149b8b9c6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -285,11 +285,12 @@ struct VPlanTransforms {
VPlan &Plan,
const std::function<bool(BasicBlock *)> &BlockNeedsPredication);
- /// Add a VPCurrentIterationPHIRecipe and related recipes to \p Plan and
- /// replaces all uses except the canonical IV increment of
- /// VPCanonicalIVPHIRecipe with a VPCurrentIterationPHIRecipe.
- /// VPCanonicalIVPHIRecipe is only used to control the loop after
- /// this transformation.
+ /// Add a VPCurrentIterationPHIRecipe to \p Plan and replaces all uses of
+ /// VPCanonicalIVPHIRecipe with VPCurrentIterationPHIRecipe, except for the
+ /// canonical IV increment.
+ static void addCurrentIterationPhi(VPlan &Plan);
+ /// Convert a tail-folded vector loop to use explicit vector length (EVL)
+ /// instead of VF elements each iteration.
static void
addExplicitVectorLength(VPlan &Plan,
const std::optional<unsigned> &MaxEVLSafeElements);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
index f5318bb1c6515..093cd741171c7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
@@ -570,13 +570,12 @@ vputils::getRecipesForUncountableExit(VPlan &Plan,
VPSingleDefRecipe *vputils::findHeaderMask(VPlan &Plan) {
VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
SmallVector<VPValue *> WideCanonicalIVs;
- auto *FoundWidenCanonicalIVUser = find_if(
- LoopRegion->getCanonicalIV()->users(), IsaPred<VPWidenCanonicalIVRecipe>);
- assert(count_if(LoopRegion->getCanonicalIV()->users(),
- IsaPred<VPWidenCanonicalIVRecipe>) <= 1 &&
+ VPHeaderPHIRecipe *LoopIndex = getLoopIndex(Plan);
+ auto *FoundWidenCanonicalIVUser =
+ find_if(LoopIndex->users(), IsaPred<VPWidenCanonicalIVRecipe>);
+ assert(count_if(LoopIndex->users(), IsaPred<VPWidenCanonicalIVRecipe>) <= 1 &&
"Must have at most one VPWideCanonicalIVRecipe");
- if (FoundWidenCanonicalIVUser !=
- LoopRegion->getCanonicalIV()->users().end()) {
+ if (FoundWidenCanonicalIVUser != LoopIndex->users().end()) {
auto *WideCanonicalIV =
cast<VPWidenCanonicalIVRecipe>(*FoundWidenCanonicalIVUser);
WideCanonicalIVs.push_back(WideCanonicalIV);
@@ -665,3 +664,36 @@ VPInstruction *vputils::findComputeReductionResult(VPReductionPHIRecipe *PhiR) {
return vputils::findUserOf<VPInstruction::ComputeReductionResult>(
cast<VPSingleDefRecipe>(SelR));
}
+
+VPCurrentIterationPHIRecipe *vputils::getCurrentIterationPhi(VPlan &Plan) {
+ VPCurrentIterationPHIRecipe *CurrentIteration = nullptr;
+ VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
+ if (LoopRegion) {
+ VPCanonicalIVPHIRecipe *CanIV = LoopRegion->getCanonicalIV();
+ if (std::next(CanIV->getIterator()) != CanIV->getParent()->end())
+ CurrentIteration = dyn_cast_or_null<VPCurrentIterationPHIRecipe>(
+ std::next(CanIV->getIterator()));
+ return CurrentIteration;
+ }
+
+ // Find the vector loop entry by locating VPCurrentIterationPHIRecipe.
+ // There should be only one VPCurrentIteration in the entire plan.
+ for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+ vp_depth_first_shallow(Plan.getEntry())))
+ for (VPRecipeBase &R : VPBB->phis())
+ if (auto *PhiR = dyn_cast<VPCurrentIterationPHIRecipe>(&R)) {
+ assert(!CurrentIteration &&
+ "Found multiple CurrentIteration. Only one expected");
+ CurrentIteration = PhiR;
+ }
+
+ return CurrentIteration;
+}
+
+VPHeaderPHIRecipe *vputils::getLoopIndex(VPlan &Plan) {
+ VPCurrentIterationPHIRecipe *CurrentIteration =
+ vputils::getCurrentIterationPhi(Plan);
+ if (CurrentIteration)
+ return CurrentIteration;
+ return Plan.getVectorLoopRegion()->getCanonicalIV();
+}
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUtils.h b/llvm/lib/Transforms/Vectorize/VPlanUtils.h
index a5692699d9d76..f322b28f372f1 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUtils.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanUtils.h
@@ -151,6 +151,11 @@ VPInstruction *findComputeReductionResult(VPReductionPHIRecipe *PhiR);
/// the header-mask pattern manually.
VPSingleDefRecipe *findHeaderMask(VPlan &Plan);
+/// Return VPCurrentIterationPHIRecipe if present, otherwise nullptr.
+VPCurrentIterationPHIRecipe *getCurrentIterationPhi(VPlan &Plan);
+/// Return the loop index PHI: VPCurrentIterationPHIRecipe for variable-length
+/// stepping loops, or VPCanonicalIVPHIRecipe otherwise.
+VPHeaderPHIRecipe *getLoopIndex(VPlan &Plan);
} // namespace vputils
//===----------------------------------------------------------------------===//
|
Refactor the creation of VPCurrentIterationPHIRecipe into a separate transformation pass `addCurrentIterationPhi`. This separates concerns and allows the CurrentIteration PHI to be created independently of EVL-based tail folding. Changes: - Add new `VPlanTransforms::addCurrentIterationPhi()` that creates VPCurrentIterationPHIRecipe and replaces uses of VPCanonicalIVPHIRecipe - Add `vputils::getCurrentIterationPhi()` helper to locate the VPCurrentIterationPHIRecipe in a VPlan - Add `vputils::getLoopIndex()` helper that returns either VPCurrentIterationPHIRecipe or VPCanonicalIVPHIRecipe
aeede48 to
e74fa4b
Compare
1. Inline fixupVFUsersForEVL into addCurrentIterationPhi 2. Move getLoopIndex helper into VPRegionBlock as getCurrentIteration
llvm/test/Transforms/LoopVectorize/VPlan/RISCV/vplan-vp-intrinsics-fixed-order-recurrence.ll
Outdated
Show resolved
Hide resolved
| @@ -33,13 +33,13 @@ define void @fadd(ptr noalias %a, ptr noalias %b, i64 %n) { | |||
| ; ZVFBFMIN-NEXT: [[TMP0:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_EVL_NEXT:%.*]], %[[VECTOR_BODY]] ] | |||
| ; ZVFBFMIN-NEXT: [[AVL:%.*]] = phi i64 [ [[N]], %[[VECTOR_PH]] ], [ [[AVL_NEXT:%.*]], %[[VECTOR_BODY]] ] | |||
| ; ZVFBFMIN-NEXT: [[TMP6:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[AVL]], i32 8, i1 true) | |||
| ; ZVFBFMIN-NEXT: [[TMP13:%.*]] = zext i32 [[TMP6]] to i64 | |||
There was a problem hiding this comment.
We cast EVL to the canonical IV type early, before replacing VF users.
| /// fixup recipes that use VF to use \p NewStep instead. After this | ||
| /// transformation, VPCanonicalIVPHIRecipe is used only for loop iterations | ||
| /// counting. | ||
| static void addCurrentIterationPhi(VPlan &Plan, VPValue *NewStep) { |
There was a problem hiding this comment.
| static void addCurrentIterationPhi(VPlan &Plan, VPValue *NewStep) { | |
| static void addCurrentIterationPhi(VPlan &Plan, VPSingleDefRecipe *NewStep) { |
Then you don't need the getDefiningRecipe in the FOR handling code
| VPValue *StepI32; | ||
| VPValue *StepSource; |
There was a problem hiding this comment.
Nit
| VPValue *StepI32; | |
| VPValue *StepSource; | |
| VPValue *StepI32, *StepSource; |
| if (match(NewStep, m_ZExt(m_VPValue(StepSource))) && | ||
| TypeInfo.inferScalarType(StepSource) == | ||
| Type::getInt32Ty(Plan.getContext())) { | ||
| StepI32 = StepSource; |
There was a problem hiding this comment.
Don't we already have a simplifcation for trunc(zext(x)) -> x in simplifyRecipes? Are we seeing regressions if we always call Builder.createScalarZExtOrTrunc?
| @@ -3200,30 +3233,24 @@ static void fixupVFUsersForEVL(VPlan &Plan, VPValue &EVL) { | |||
| if (!HeaderMask) | |||
| return; | |||
|
|
|||
| // Replace header masks with a mask equivalent to predicating by EVL: | |||
| // Replace header masks with a mask equivalent to predicating by | |||
| // variable-length step: | |||
There was a problem hiding this comment.
Nit
| // variable-length step: | |
| // NewStep: |
| /// VPCurrentIterationPHIRecipe except for the canonical IV increment, and | ||
| /// fixup recipes that use VF to use \p NewStep instead. After this | ||
| /// transformation, VPCanonicalIVPHIRecipe is used only for loop iterations | ||
| /// counting. |
There was a problem hiding this comment.
Nit, can we mention somewhere here that NewStep doesn't need to be loop invariant?
| @@ -5,7 +5,7 @@ define void @add(ptr noalias nocapture readonly %src1, ptr noalias nocapture rea | |||
| ; CHECK-LABEL: add | |||
| ; CHECK: LV(REG): VF = vscale x 4 | |||
| ; CHECK-NEXT: LV(REG): Found max usage: 2 item | |||
| ; CHECK-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers | |||
| ; CHECK-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 7 registers | |||
There was a problem hiding this comment.
How come we end up with an extra register use? Is there an extra scalar cast somewhere?
fhahn
left a comment
There was a problem hiding this comment.
It looks like the change is not NFC, there are quite a few movements in tests. Not sure if they can be avoided?
| @@ -802,6 +802,14 @@ const VPBasicBlock *VPBasicBlock::getCFGPredecessor(unsigned Idx) const { | |||
| return Pred->getExitingBasicBlock(); | |||
| } | |||
|
|
|||
| VPHeaderPHIRecipe *VPRegionBlock::getCurrentIteration() { | |||
There was a problem hiding this comment.
Is there a benefit from exposing the helpers more widely? Looks like the only have a single uses, and I don't think the movement is needed for extracting addCurrentIterationPhi from addExplicitVectorLength? Overall it seems like it would be good to try to keep the logic together?
There was a problem hiding this comment.
The idea is for addCurrentIterationPhi to be shared by the transforms of EVL, FFLoad, and partial alias masking. Adding VPRegionBlock::getCurrentIteration is a step toward eventually moving addExplicitVectorLength into tryToBuildVPlanWithVPRecipes.
That said, FFLoad and partial alias masking transform aren’t in place here yet, and there’s only a single user right now.
It probably makes more sense to keep the logic together for the moment, focus on #166164, and revisit this once those transforms can build on it.
@lukel97 does that sound good?
There was a problem hiding this comment.
Adding VPRegionBlock::getCurrentIteration is a step toward eventually moving addExplicitVectorLength into tryToBuildVPlanWithVPRecipes.
Yup, that's my understanding of the plan as well. We'll need to start changing other transforms to use getCurrentIteration if we want to do EVL tail folding/partial alias masking earlier during vplan construction. Since for some passes it won't be correct if they start using the canonical IV again after we've replaced it with the VPCurrentIterationPHI.
@lukel97 does that sound good?
Yup, sounds good if we want leave that to another PR
There was a problem hiding this comment.
Sounds good!
partial alias masking earlier during vplan construction.
I don't think this needs CurrentIterationPHI, as the increment will remain loop-invariant IIUC
There was a problem hiding this comment.
It's not that the increment needs to be invariant but the fact that the increment is no longer VFxUF, so various recipes need to adapt.
The fixupVFUsersForClampedVF transform in #182457 is doing the same thing as fixupVFUsersForEVL. It would be good to share this logic.
But also CurrentIterationPHI/VPRegionBlock::getCurrentIteration means that transforms and recipes can just directly use the correct increment, instead of relying on being fixed up later.
There was a problem hiding this comment.
Oh I think I misunderstood this comment. I thought that we just meant removing the VPRegionBlock::getCurrentIteration() accessor.
For #151300 do we not need to reuse fixupVFUsersForEVL?
There was a problem hiding this comment.
I think #182457 treats partial masking as a fixed-length stepping case, so it can simply replace VFxUF during VPlan execution.
For #151300, I think we still need the fixupVFUsers logic. My understanding is that #151300 would stack on top of #166164, which in turn depends on the VPRegionBlock::getCurrentIteration() accessor.
Once that is in place, splitting addCurrentIterationPhi out of addExplicitVectorLength would be a more natural next step.
Would you like to move #166164 forward with getCurrentIteration() accessor?
I’m a bit tied up with some downstream issues at the moment, so I may not have much time to work on this over the next few weeks.
Refactor the creation of VPCurrentIterationPHIRecipe into a separate transformation pass
addCurrentIterationPhi. This separates concerns and allows the CurrentIteration PHI to be created independently of EVL-based tail folding.Changes:
VPlanTransforms::addCurrentIterationPhi()that creates VPCurrentIterationPHIRecipe and replaces uses of VPCanonicalIVPHIRecipevputils::getCurrentIterationPhi()helper to locate the VPCurrentIterationPHIRecipe in a VPlanvputils::getLoopIndex()helper that returns either VPCurrentIterationPHIRecipe or VPCanonicalIVPHIRecipe