-
Notifications
You must be signed in to change notification settings - Fork 15.5k
[VPlan] Remove ExtractLastLane for plans with scalar VFs. #171145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ExtractLastLane is a no-op for scalar VFs. Update simplifyRecipe to remove them. This also requires adjusting the code in VPlanUnroll.cpp to split off handling of ExtractLastLane/ExtractPenultimateElement for scalar VFs, which now needs to match ExtractLastPart.
|
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) ChangesExtractLastLane is a no-op for scalar VFs. Update simplifyRecipe to remove them. This also requires adjusting the code in VPlanUnroll.cpp to split off handling of ExtractLastLane/ExtractPenultimateElement for scalar VFs, which now needs to match ExtractLastPart. Full diff: https://github.com/llvm/llvm-project/pull/171145.diff 3 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 320baeb454d46..4ad098d748568 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1385,12 +1385,16 @@ static void simplifyRecipe(VPSingleDefRecipe *Def, VPTypeAnalysis &TypeInfo) {
return;
}
- // Look through ExtractLastLane (BuildVector ....).
- if (match(Def, m_ExtractLastLane(m_BuildVector()))) {
- auto *BuildVector = cast<VPInstruction>(Def->getOperand(0));
- Def->replaceAllUsesWith(
- BuildVector->getOperand(BuildVector->getNumOperands() - 1));
- return;
+ // Look through ExtractLastLane.
+ if (match(Def, m_ExtractLastLane(m_VPValue(A)))) {
+ if (match(A, m_BuildVector())) {
+ auto *BuildVector = cast<VPInstruction>(A);
+ Def->replaceAllUsesWith(
+ BuildVector->getOperand(BuildVector->getNumOperands() - 1));
+ return;
+ }
+ if (Plan->hasScalarVFOnly())
+ return Def->replaceAllUsesWith(A);
}
// Look through ExtractPenultimateElement (BuildVector ....).
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
index 6fb706ea7d64b..7b4c524712d9a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
@@ -371,10 +371,9 @@ void UnrollState::unrollBlock(VPBlockBase *VPB) {
continue;
}
- if (match(&R, m_ExtractLastLaneOfLastPart(m_VPValue(Op0))) ||
- match(&R, m_ExtractPenultimateElement(m_VPValue(Op0)))) {
- addUniformForAllParts(cast<VPSingleDefRecipe>(&R));
- if (Plan.hasScalarVFOnly()) {
+ if (Plan.hasScalarVFOnly()) {
+ if (match(&R, m_ExtractLastPart(m_VPValue(Op0))) ||
+ match(&R, m_ExtractPenultimateElement(m_VPValue(Op0)))) {
auto *I = cast<VPInstruction>(&R);
bool IsPenultimatePart =
I->getOpcode() == VPInstruction::ExtractPenultimateElement;
@@ -383,7 +382,10 @@ void UnrollState::unrollBlock(VPBlockBase *VPB) {
I->replaceAllUsesWith(getValueForPart(Op0, PartIdx));
continue;
}
- // For vector VF, always extract from the last part.
+ }
+ if (match(&R, m_ExtractLastLaneOfLastPart(m_VPValue(Op0))) ||
+ match(&R, m_ExtractPenultimateElement(m_VPValue(Op0)))) {
+ addUniformForAllParts(cast<VPSingleDefRecipe>(&R));
R.setOperand(0, getValueForPart(Op0, UF - 1));
continue;
}
diff --git a/llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll b/llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
index bbd596a772c53..c77afa870e2c1 100644
--- a/llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
+++ b/llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
@@ -220,7 +220,6 @@ exit:
; DBG-EMPTY:
; DBG-NEXT: middle.block:
; DBG-NEXT: EMIT vp<[[RESUME_1_PART:%.+]]> = extract-last-part vp<[[SCALAR_STEPS]]>
-; DBG-NEXT: EMIT vp<[[RESUME_1:%.+]]> = extract-last-lane vp<[[RESUME_1_PART]]>
; DBG-NEXT: EMIT vp<[[CMP:%.+]]> = icmp eq vp<[[TC]]>, vp<[[VEC_TC]]>
; DBG-NEXT: EMIT branch-on-cond vp<[[CMP]]>
; DBG-NEXT: Successor(s): ir-bb<exit>, scalar.ph
@@ -230,7 +229,7 @@ exit:
; DBG-EMPTY:
; DBG-NEXT: scalar.ph:
; DBG-NEXT: EMIT-SCALAR vp<[[RESUME_IV:%.+]]> = phi [ vp<[[VTC]]>, middle.block ], [ ir<0>, ir-bb<entry> ]
-; DBG-NEXT: EMIT-SCALAR vp<[[RESUME_P:%.*]]> = phi [ vp<[[RESUME_1]]>, middle.block ], [ ir<0>, ir-bb<entry> ]
+; DBG-NEXT: EMIT-SCALAR vp<[[RESUME_P:%.*]]> = phi [ vp<[[RESUME_1_PART]]>, middle.block ], [ ir<0>, ir-bb<entry> ]
; DBG-NEXT: Successor(s): ir-bb<loop>
; DBG-EMPTY:
; DBG-NEXT: ir-bb<loop>:
|
lukel97
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Mel-Chen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG
…(#171145) ExtractLastLane is a no-op for scalar VFs. Update simplifyRecipe to remove them. This also requires adjusting the code in VPlanUnroll.cpp to split off handling of ExtractLastLane/ExtractPenultimateElement for scalar VFs, which now needs to match ExtractLastPart. PR: llvm/llvm-project#171145
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/95/builds/19379 Here is the relevant piece of the build log for the reference |
ayalz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of post-commits nits.
| return; | ||
| // Look through ExtractLastLane. | ||
| if (match(Def, m_ExtractLastLane(m_VPValue(A)))) { | ||
| if (match(A, m_BuildVector())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such BuildVector operands were visited and processed earlier by the code below folding them into Broadcast if all their operands are equal. Does that folding hold for VF=1 too; if so should such redundant Broadcasts be eliminated too? Worth looking for extracts from broadcast too/instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like currently there are no extracts of broadcasts that could be folded in the tests. For VF=1, the broadcasts itself should be removed, possibly before hitting this pattern, but I might be missing something.
| if (match(&R, m_ExtractLastPart(m_VPValue(Op0))) || | ||
| match(&R, m_ExtractPenultimateElement(m_VPValue(Op0)))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be simpler handle each match separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure, the only difference is the index to extract from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but folding that difference seems to complicate two trivial cases, the first which should hold for any VF, scalar or not:
if (match(&R, m_ExtractLastPart(m_VPValue(Op0)))) {
cast<VPInstruction>(&R)->replaceAllUsesWith(getValueForPart(Op0, UF - 1));
continue;
}
and the second which deals with scalar VF only:
if (Plan.hasScalarVFOnly() && match(&R, m_ExtractPenultimateElement(m_VPValue(Op0)))) {
// For scalar VF, use the penultimate scalar part w/o extracting a lane.
cast<VPInstruction>(&R)->replaceAllUsesWith(getValueForPart(Op0, UF - 2));
continue;
}
?
Are both missing addUniformForAllParts(cast<VPSingleDefRecipe>(&R));?
Can also hoist from below the auto *SingleDef = dyn_cast<VPSingleDefRecipe>(&R) to be used for the RAUW's here and addUniformForAllParts() elsewhere, instead of having to cast<>(&R) after various matches. Perhaps also handle the !SingleDef cases first, to freely access SingleDef afterwards.
| I->replaceAllUsesWith(getValueForPart(Op0, PartIdx)); | ||
| continue; | ||
| } | ||
| // For vector VF, always extract from the last part. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth retaining the comment, perhaps emphasizing that
// For vector VF, the penultimate element is always extracted from the last part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, re-added in e6e3f94
Re-add and emphasize comment regarding extracting from the last part, as suggested post-commit in #171145.
…t. (NFC) Re-add and emphasize comment regarding extracting from the last part, as suggested post-commit in llvm/llvm-project#171145.
ExtractLastLane is a no-op for scalar VFs. Update simplifyRecipe to remove them. This also requires adjusting the code in VPlanUnroll.cpp to split off handling of ExtractLastLane/ExtractPenultimateElement for scalar VFs, which now needs to match ExtractLastPart.