-
Notifications
You must be signed in to change notification settings - Fork 15.7k
[VPlan] Add VPValue for VF, use it for VPWidenIntOrFpInductionRecipe. #95305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
7ed9803
dbdd4f3
031e6b2
907c19f
77c1e61
7659683
4d495d0
4c0467c
2eaf251
9e95c10
4d5f2e5
4ea3c6f
ee8409c
330dde5
b8bcc2f
2b1bf7d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -8249,10 +8249,12 @@ createWidenInductionRecipes(PHINode *Phi, Instruction *PhiOrTrunc, | |||
| VPValue *Step = | ||||
| vputils::getOrCreateVPValueForSCEVExpr(Plan, IndDesc.getStep(), SE); | ||||
| if (auto *TruncI = dyn_cast<TruncInst>(PhiOrTrunc)) { | ||||
| return new VPWidenIntOrFpInductionRecipe(Phi, Start, Step, IndDesc, TruncI); | ||||
| return new VPWidenIntOrFpInductionRecipe(Phi, Start, Step, Plan.getVF(), | ||||
| IndDesc, TruncI); | ||||
| } | ||||
| assert(isa<PHINode>(PhiOrTrunc) && "must be a phi node here"); | ||||
| return new VPWidenIntOrFpInductionRecipe(Phi, Start, Step, IndDesc); | ||||
| return new VPWidenIntOrFpInductionRecipe(Phi, Start, Step, Plan.getVF(), | ||||
| IndDesc); | ||||
| } | ||||
|
|
||||
| VPHeaderPHIRecipe *VPRecipeBuilder::tryToOptimizeInductionPHI( | ||||
|
|
@@ -8667,6 +8669,7 @@ static void addCanonicalIVRecipes(VPlan &Plan, Type *IdxTy, bool HasNUW, | |||
| VPBasicBlock *Header = TopRegion->getEntryBasicBlock(); | ||||
| Header->insert(CanonicalIVPHI, Header->begin()); | ||||
|
|
||||
| VPBuilder PhBuilder(cast<VPBasicBlock>(TopRegion->getSinglePredecessor())); | ||||
|
||||
| VPBuilder PhBuilder(cast<VPBasicBlock>(TopRegion->getSinglePredecessor())); |
unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, thanks!
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -928,8 +928,19 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV, | |||||||||||||||||
|
|
||||||||||||||||||
| IRBuilder<> Builder(State.CFG.PrevBB->getTerminator()); | ||||||||||||||||||
| // FIXME: Model VF * UF computation completely in VPlan. | ||||||||||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would it be possible to handle this FIXME now that VF is assigned a VPValue?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, would either require introducing a
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok, so FIXME remains until a symbolic UF VPValue placeholder is introduced, as follow-up, along with a VPInstruction multiplying it with VF (introduced here), possibly subject to constant folding?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep |
||||||||||||||||||
| VFxUF.setUnderlyingValue( | ||||||||||||||||||
| createStepForVF(Builder, TripCountV->getType(), State.VF, State.UF)); | ||||||||||||||||||
| Value *RuntimeVF = nullptr; | ||||||||||||||||||
| if (VF.getNumUsers()) { | ||||||||||||||||||
| RuntimeVF = createStepForVF(Builder, TripCountV->getType(), State.VF, 1); | ||||||||||||||||||
|
||||||||||||||||||
| Value *RuntimeVF = nullptr; | |
| if (VF.getNumUsers()) { | |
| RuntimeVF = createStepForVF(Builder, TripCountV->getType(), State.VF, 1); | |
| if (VF.getNumUsers()) { | |
| Value *RuntimeVF = getRuntimeVF(Builder, TripCountV->getType(), State.VF); |
could setting VF and/or VFxUF be done by optimizeForVFAndUF() instead, i.e., as soon as they are fixed for a VPlan?
The term RuntimeVF (here and below) may be confusing, as it relates to the Static (Fixed or Scalable) VF, as indicated in the Type of vector Values, rather than the Dynamic EVL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted, thanks!
could setting VF and/or VFxUF be done by optimizeForVFAndUF() instead, i.e., as soon as they are fixed for a VPlan?
Could do, but may not be ideal to rely on an optimization to set them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could setting VF and/or VFxUF be done by optimizeForVFAndUF() instead, i.e., as soon as they are fixed for a VPlan?
Could do, but may not be ideal to rely on an optimization to set them?
How about renaming the optimizeForVFAndUF() VPlanTransform to setVFAndUF(), or setVF() and setUF() VPlanTransforms, where fixing a constant value triggers its folding optimizations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, can update to setVFAndUF as a start separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given RuntimeVF is only non-null if VF.getNumUsers() != 0 wouldn't it be neater to simply fold this into the if (VF.getNumUsers()) { block? i.e.
if (VF.getNumUsers()) {
RuntimeVF = createStepForVF(Builder, TripCountV->getType(), State.VF, 1);
VF.setUnderlyingValue(RuntimeVF);
VFxUF.setUnderlyingValue(
State.UF > 1 ? Builder.CreateMul(
VF.getLiveInIRValue(),
ConstantInt::get(TripCountV->getType(), State.UF))
: VF.getLiveInIRValue());
} else {
VFxUF.setUnderlyingValue(
createStepForVF(Builder, TripCountV->getType(), State.VF, State.UF));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted, thanks!
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VFxUF can always be set via createStepForVF(), regardless of UF=1, UF>1, VF having users or not. But we want to set it directly instead, using VF - when the latter is built with getRuntimeVF()? Maybe related to @david-arm's comment below about "... why we need a specialised version for the UF=1 case".
| State.UF > 1 ? Builder.CreateMul( | |
| VF.getLiveInIRValue(), | |
| ConstantInt::get(TripCountV->getType(), State.UF)) | |
| : VF.getLiveInIRValue()); | |
| State.UF > 1 ? Builder.CreateMul( | |
| RuntimeVF, | |
| ConstantInt::get(TripCountV->getType(), State.UF)) | |
| : RuntimeVF); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we want to set it directly instead, using VF - when the latter is built with getRuntimeVF(
yep, to reduce calls to @vscale .
Updated to use RuntimeVF variable, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VFxUF is set regardless of having users or not. Be consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see earlier that
... Code for VF is only generated if there are users of VF, to avoid unnecessary test changes.
Code for VF must be generated regardless of direct users, issues is its position and possible repetition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, VFxUF is used to increment the canonical induction, which is present in almost all cases, but could also be done only if users exist.
... Code for VF is only generated if there are users of VF, to avoid unnecessary test changes.
Code for VF must be generated regardless of direct users, issues is its position and possible repetition?
There are cases where VF separately isn't used, only as part of VFxUF. If only VFxUF is used, the constant multiply of VF * UF is folded, hence always generating VF separately would lead to additional test changes. Some alternatives are mentioned in my latest comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, VFxUF is used to increment the canonical induction, which is present in almost all cases, but could also be done only if users exist.
Exceptional use-less cases are loops whose trip count is VFxUF - where optimizeForVFAndUF() discards the canonical IV's increment by VFxUF?
So one way to improve consistency, w/o changing too many tests, would be to (also) set VFxUF only if used? Although logically it should always be used - to set the vector trip count(?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, it is always used (added assert in 1a5a1e9).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this condition be an assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will adjust separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must be honest it's not obvious to me why we need a specialised version for the UF=1 case. Why can't we use VFxUF always given VFx1 is a valid use case? It seems to add a bit of extra complexity. I understand in a loop we may want to calculate both a runtime VF (for a second part or something) and a runtime VF x UF (for a canonical induction variable, etc), but if we've filled out Old2NewVPValues[&VFxUF] with both VFx1 and VFxUF we can just query accordingly, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ut if we've filled out Old2NewVPValues[&VFxUF] with both VFx1 and VFxUF we can just query accordingly, right?
I am not sure what filling them out accordingly means here, perhaps using a pair instead of 2 separate fields?
There are different places that need both VF and VFxUF and to serve them we would need different values I think, but I might be missing something from your suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Old2NewVPValues[&VFxUF] = &NewPlan->VFxUF; | |
| Old2NewVPValues[&VF] = &NewPlan->VF; | |
| Old2NewVPValues[&VF] = &NewPlan->VF; | |
| Old2NewVPValues[&VFxUF] = &NewPlan->VFxUF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this condition be an assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will adjust separately
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -1770,25 +1770,27 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe { | |||||||
|
|
||||||||
| public: | ||||||||
| VPWidenIntOrFpInductionRecipe(PHINode *IV, VPValue *Start, VPValue *Step, | ||||||||
| const InductionDescriptor &IndDesc) | ||||||||
| VPValue *VF, const InductionDescriptor &IndDesc) | ||||||||
| : VPHeaderPHIRecipe(VPDef::VPWidenIntOrFpInductionSC, IV, Start), IV(IV), | ||||||||
| Trunc(nullptr), IndDesc(IndDesc) { | ||||||||
| addOperand(Step); | ||||||||
| addOperand(VF); | ||||||||
| } | ||||||||
|
|
||||||||
| VPWidenIntOrFpInductionRecipe(PHINode *IV, VPValue *Start, VPValue *Step, | ||||||||
| const InductionDescriptor &IndDesc, | ||||||||
| VPValue *VF, const InductionDescriptor &IndDesc, | ||||||||
| TruncInst *Trunc) | ||||||||
| : VPHeaderPHIRecipe(VPDef::VPWidenIntOrFpInductionSC, Trunc, Start), | ||||||||
| IV(IV), Trunc(Trunc), IndDesc(IndDesc) { | ||||||||
| addOperand(Step); | ||||||||
| addOperand(VF); | ||||||||
| } | ||||||||
|
|
||||||||
| ~VPWidenIntOrFpInductionRecipe() override = default; | ||||||||
|
|
||||||||
| VPWidenIntOrFpInductionRecipe *clone() override { | ||||||||
| return new VPWidenIntOrFpInductionRecipe(IV, getStartValue(), | ||||||||
| getStepValue(), IndDesc, Trunc); | ||||||||
| return new VPWidenIntOrFpInductionRecipe( | ||||||||
| IV, getStartValue(), getStepValue(), getOperand(2), IndDesc, Trunc); | ||||||||
|
||||||||
| IV, getStartValue(), getStepValue(), getOperand(2), IndDesc, Trunc); | |
| IV, getStartValue(), getStepValue(), getVFValue(), IndDesc, Trunc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| VPValue VF; | |
| /// Represents the vectorization factor of the loop. | |
| VPValue VF; |
(belongs to the loop Region rather than VPlan itself)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added thanks!
VF may also be referenced outside the loop region, so probably should be defined at the top-level and used by the region?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This is a follow-up thought, hence originally in brackets)
Regions (and blocks in general) currently model the HCFG only, leaving the def-use graph of values to recipes. A loop region has a canonical IV recipe which typically uses VFxUF as the canonical step controlling the loop. The latter could be a Mul recipe whose operands provide VF and UF, and a loop region could provide getVF() and getUF() to ease their retrieval?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I think ideally the loop region would be a user of both VF, UF (and possibly VF x UF).
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| VPValue *getVF() { return &VF; }; | |
| /// Returns the VF of the vector loop region. | |
| VPValue *getVF() { return &VF; }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -18,14 +18,14 @@ define void @induction_i7(ptr %dst) #0 { | |
| ; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 64, [[N_MOD_VF]] | ||
| ; CHECK-NEXT: [[IND_END:%.*]] = trunc i64 [[N_VEC]] to i7 | ||
| ; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64() | ||
| ; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4 | ||
| ; CHECK-NEXT: [[TMP40:%.*]] = mul i64 [[TMP4]], 2 | ||
| ; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP40]], 2 | ||
| ; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 2 x i8> @llvm.experimental.stepvector.nxv2i8() | ||
| ; CHECK-NEXT: [[TMP7:%.*]] = trunc <vscale x 2 x i8> [[TMP6]] to <vscale x 2 x i7> | ||
| ; CHECK-NEXT: [[TMP8:%.*]] = add <vscale x 2 x i7> [[TMP7]], zeroinitializer | ||
| ; CHECK-NEXT: [[TMP9:%.*]] = mul <vscale x 2 x i7> [[TMP8]], shufflevector (<vscale x 2 x i7> insertelement (<vscale x 2 x i7> poison, i7 1, i64 0), <vscale x 2 x i7> poison, <vscale x 2 x i32> zeroinitializer) | ||
| ; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 2 x i7> zeroinitializer, [[TMP9]] | ||
| ; CHECK-NEXT: [[TMP10:%.*]] = call i7 @llvm.vscale.i7() | ||
| ; CHECK-NEXT: [[TMP11:%.*]] = mul i7 [[TMP10]], 2 | ||
| ; CHECK-NEXT: [[TMP11:%.*]] = trunc i64 [[TMP40]] to i7 | ||
| ; CHECK-NEXT: [[TMP12:%.*]] = mul i7 1, [[TMP11]] | ||
| ; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i7> poison, i7 [[TMP12]], i64 0 | ||
| ; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i7> [[DOTSPLATINSERT]], <vscale x 2 x i7> poison, <vscale x 2 x i32> zeroinitializer | ||
|
|
@@ -92,14 +92,14 @@ define void @induction_i3_zext(ptr %dst) #0 { | |
| ; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 64, [[N_MOD_VF]] | ||
| ; CHECK-NEXT: [[IND_END:%.*]] = trunc i64 [[N_VEC]] to i3 | ||
| ; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64() | ||
| ; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4 | ||
| ; CHECK-NEXT: [[TMP40:%.*]] = mul i64 [[TMP4]], 2 | ||
| ; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP40]], 2 | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is interesting. I guess I was expecting that since this patch only creates a runtime VF on demand that total lines of IR would only decrease, rather than increase which is happening here.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The reason this was showing up as net increase was that the test case originally didn't check all instructions in vector.ph; now that it does we safe vscale & mul but need a trunc, so it is neutral overall in this case. |
||
| ; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 2 x i8> @llvm.experimental.stepvector.nxv2i8() | ||
| ; CHECK-NEXT: [[TMP7:%.*]] = trunc <vscale x 2 x i8> [[TMP6]] to <vscale x 2 x i3> | ||
| ; CHECK-NEXT: [[TMP8:%.*]] = add <vscale x 2 x i3> [[TMP7]], zeroinitializer | ||
| ; CHECK-NEXT: [[TMP9:%.*]] = mul <vscale x 2 x i3> [[TMP8]], shufflevector (<vscale x 2 x i3> insertelement (<vscale x 2 x i3> poison, i3 1, i64 0), <vscale x 2 x i3> poison, <vscale x 2 x i32> zeroinitializer) | ||
| ; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 2 x i3> zeroinitializer, [[TMP9]] | ||
| ; CHECK-NEXT: [[TMP10:%.*]] = call i3 @llvm.vscale.i3() | ||
| ; CHECK-NEXT: [[TMP11:%.*]] = mul i3 [[TMP10]], 2 | ||
| ; CHECK-NEXT: [[TMP11:%.*]] = trunc i64 [[TMP40]] to i3 | ||
| ; CHECK-NEXT: [[TMP12:%.*]] = mul i3 1, [[TMP11]] | ||
| ; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i3> poison, i3 [[TMP12]], i64 0 | ||
| ; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i3> [[DOTSPLATINSERT]], <vscale x 2 x i3> poison, <vscale x 2 x i32> zeroinitializer | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VPWidenIntOrFpInductionRecipe could retrieve VF by looking up Plan.getVF() on demand rather than recording it as on operand, but the latter helps in checking if VF has users, i.e., if any VPWidenIntOrFpInductionRecipe exists?
Surely VF is needed to vectorize any loop, including ones free of VPWidenIntOrFpInductionRecipes. Does it need to be cached somehow, to prevent regeneration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, this is used to check whether to generate it or not.
VFxUFis similarly added as operand to the VPInstruction to increment the canonical IV.We could create VF unconditionally, then we would have update all tests with scalable vectors to split up VFxUF computation to
((vscale * VF) * UF)instead of(vscale * (VF * UF))even ifvscale * VFis only used in the multiply byUF.To limit this we could try to fold it back as post-codegen cleanup. Or update all tests, happy to go either way (or leave as is in the current patch for now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree to model usage and dependence of values directly via explicit operands, rather than by retrieving them from plan (or region).
If/when VFxUF becomes a Mul VPInstruction which uses VF (and UF), will the check for no VF users change to check if VF is used only by this Mul?
Could this folding be done by a subsequent VPlan2VPlan pass? Would indeed be good to reduce amount of test changes...