Conversation
This has no effect for now from what I can tell but is needed if we ever want to extend narrowInterleaveGroups to handle EVL tail folded loops.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 80cd112..488470d 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -1259,6 +1259,7 @@ bool VPInstruction::opcodeMayReadOrWriteFromMemory() const {
case VPInstruction::ExtractLastLanePerPart:
case VPInstruction::ExtractPenultimateElement:
case VPInstruction::ActiveLaneMask:
+ case VPInstruction::ExplicitVectorLength:
case VPInstruction::FirstActiveLane:
case VPInstruction::FirstOrderRecurrenceSplice:
case VPInstruction::LogicalAnd:
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-llvm-transforms
Author: Luke Lau (lukel97)
Changes
This has no effect for now from what I can tell but is needed if we ever want to extend narrowInterleaveGroups to handle EVL tail folded loops.
Full diff: https://github.com/llvm/llvm-project/pull/167647.diff
1 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 80cd112dbcd8a..488470d247968 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -1259,6 +1259,7 @@ bool VPInstruction::opcodeMayReadOrWriteFromMemory() const {
case VPInstruction::ExtractLastLanePerPart:
case VPInstruction::ExtractPenultimateElement:
case VPInstruction::ActiveLaneMask:
+ case VPInstruction::ExplicitVectorLength:
case VPInstruction::FirstActiveLane:
case VPInstruction::FirstOrderRecurrenceSplice:
case VPInstruction::LogicalAnd:
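To illustrate why listing the opcode in this switch matters, here is a minimal standalone sketch, not LLVM's actual code. The `Opcode` enum, `mayReadOrWriteMemory`, and `isRemovableWhenDead` are hypothetical names, and `Store` is an invented stand-in for any opcode that touches memory. It assumes that opcodes listed in the switch are reported as not accessing memory, while anything unlisted is conservatively treated as if it might.

```cpp
#include <cassert>

// Hypothetical opcode set; only loosely mirrors the names in the diff.
enum class Opcode { ActiveLaneMask, ExplicitVectorLength, FirstActiveLane, Store };

// Assumed behaviour: listed opcodes are known not to touch memory,
// everything else falls through to the conservative answer.
bool mayReadOrWriteMemory(Opcode Op) {
  switch (Op) {
  case Opcode::ActiveLaneMask:
  case Opcode::ExplicitVectorLength:
  case Opcode::FirstActiveLane:
    return false;
  default:
    return true;
  }
}

// A value with no users can only be deleted if deleting it cannot
// change memory behaviour (simplified; real checks may cover more).
bool isRemovableWhenDead(Opcode Op, unsigned NumUsers) {
  return NumUsers == 0 && !mayReadOrWriteMemory(Op);
}
```

Under this model, an unused `ExplicitVectorLength` only becomes removable once it appears in the switch; before that it would hit the conservative default.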
artagnon
left a comment
Oh, I didn't do this myself due to a missing test case!
I'm not sure if there's any functional change today given that an ExplicitVectorLength isn't a candidate for hoisting/sinking etc., but I didn't want to mark it as NFC since it's not really a refactoring. I split the change off anyway to show that there's no test diff.
artagnon
left a comment
Okay, don't feel strongly about adding something for the future. Weak LGTM, thanks!
fhahn
left a comment
Are there cases where we could/should simplify evl, then we could remove it, for example if we remove the backedge https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/LoopVectorize/RISCV/vector-loop-backedge-elimination-with-evl.ll ?
If so, would be good to combine this with the update here.
I've reworked this PR to simplify the EVL when it's known from AVL <= VF in 6bb2fe0. This was a nice catch, I checked and this seems to remove a small handful of vsetvlis in SPEC CPU 2017 that RISCVInsertVSETVLI can't handle on its own.
artagnon
left a comment
The EVL simplification looks good!
artagnon
left a comment
I think the updated patch LGTM, thanks!
bool MadeChange = tryToReplaceALMWithWideALM(Plan, BestVF, BestUF);
MadeChange |= simplifyBranchConditionForVFAndUF(Plan, BestVF, BestUF, PSE);
MadeChange |= optimizeVectorInductionWidthForTCAndVFUF(Plan, BestVF, BestUF);
MadeChange |= simplifyKnownEVL(Plan, BestVF, PSE);
If we move this to the start, would it be sufficient to do a shallow traversal starting at the region entry?
I think this needs to run after simplifyBranchConditionForVFAndUF so that the AVL PHI feeding into ExplicitVectorLength is replaced with its singular incoming value, and it looks like the region is removed there.
Ah, that's unfortunate, thanks for checking.
if (!match(&R, m_EVL(m_VPValue(AVL))))
  continue;

const SCEV *AVLSCEV = vputils::getSCEVExprForVPValue(AVL, *PSE.getSE());
Can put PSE.getSE() into a variable to avoid repeated lookups.
…ax_lanes
On RISC-V, some loops that the loop vectorizer vectorizes pre-LTO may turn out to have the exact trip count exposed after LTO, see llvm#164762.
If the trip count is small enough we can fold away the @llvm.experimental.get.vector.length intrinsic based on this corollary from the LangRef:
> If %cnt is less than or equal to %max_lanes, the return value is equal to %cnt.
This on its own doesn't remove the @llvm.experimental.get.vector.length in llvm#164762 since we also need to teach computeKnownBits about @llvm.experimental.get.vector.length and the sub recurrence, but this PR is a starting point.
I've added this in InstCombine rather than InstSimplify since we may need to insert a truncation (@llvm.experimental.get.vector.length can take an i64 %cnt argument, but always truncates the result to i32).
Note that there was something similar done in VPlan in llvm#167647 for when the loop vectorizer knows the trip count.
…ax_lanes (#169293)
On RISC-V, some loops that the loop vectorizer vectorizes pre-LTO may turn out to have the exact trip count exposed after LTO, see #164762.
If the trip count is small enough we can fold away the @llvm.experimental.get.vector.length intrinsic based on this corollary from the LangRef:
> If %cnt is less than or equal to %max_lanes, the return value is equal to %cnt.
This on its own doesn't remove the @llvm.experimental.get.vector.length in #164762 since we also need to teach computeKnownBits about @llvm.experimental.get.vector.length and the sub recurrence, but this PR is a starting point.
I've added this in InstCombine rather than InstSimplify since we may need to insert a truncation (@llvm.experimental.get.vector.length can take an i64 %cnt argument, the result is always i32).
Note that there was something similar done in VPlan in #167647 for when the loop vectorizer knows the trip count.
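The fold and the truncation it may need can be modelled on plain integers. This is only a sketch under stated assumptions: `foldGetVectorLength` is a hypothetical helper, not an LLVM function, it treats `%cnt` as a 64-bit value and the result as 32-bit, and it ignores the intrinsic's scalable flag by taking `MaxLanes` as the already-resolved lane count.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical value-level model of the fold quoted from the LangRef:
// if Cnt <= MaxLanes the intrinsic's result is just Cnt.
// Returns std::nullopt when the fold does not apply.
std::optional<uint32_t> foldGetVectorLength(uint64_t Cnt, uint32_t MaxLanes) {
  if (Cnt > MaxLanes)
    return std::nullopt;
  // Cnt may be wider than the i32 result, hence the explicit truncation.
  // It is lossless here because Cnt <= MaxLanes fits in 32 bits.
  return static_cast<uint32_t>(Cnt);
}
```

The explicit narrowing step is the part that corresponds to inserting a truncation, which is the reason given above for placing the fold in InstCombine rather than InstSimplify.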
llvm.experimental.get.vector.length has the property that if the AVL (%cnt) is less than or equal to VF (%max_lanes) then the return value is just AVL.
This patch uses SCEV to simplify this in optimizeForVFAndUF, and adds ExplicitVectorLength to VPInstruction::opcodeMayReadOrWriteFromMemory so it gets removed once dead.
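As a rough illustration of the decision being made, the check can be thought of as comparing an upper bound on the AVL against the VF. The sketch below is a standalone model and makes several assumptions: `UnsignedRange` is a hypothetical stand-in for whatever bound an analysis such as SCEV can prove, `evlIsKnownToEqualAVL` is an invented name, and the VF is taken as a fixed lane count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical inclusive bounds on the AVL, standing in for what an
// analysis might be able to prove about it.
struct UnsignedRange {
  uint64_t Min;
  uint64_t Max;
};

// If every possible AVL fits in one vector iteration (AVL <= VF), the
// EVL is known to equal the AVL and can be replaced by it.
bool evlIsKnownToEqualAVL(UnsignedRange AVL, uint64_t VF) {
  return AVL.Max <= VF;
}
```

When the check fails, nothing is known and the EVL computation has to stay; only the provable case is simplified.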