[VPlan] Only compute reg pressure if considered. NFCI#156923
Merged
[VPlan] Only compute reg pressure if considered. NFCI#156923
Conversation
In llvm#149056 VF pruning was changed so that it only pruned VFs that stemmed from MaxBandwidth being enabled. However we always compute register pressure regardless of whether or not max bandwidth produced any new VFs. This skips the computation if not needed and renames the method for clarity. The diff in reg-usage.ll is due to the scalable VPlan not actually having any maxbandwidth VFs, so I've changed it to check the fixed-length VF instead, which is affected by maxbandwidth.
Member
|
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Luke Lau (lukel97) ChangesIn #149056 VF pruning was changed so that it only pruned VFs that stemmed from MaxBandwidth being enabled. However we always compute register pressure regardless of whether or not max bandwidth produced any new VFs. This skips the computation if not needed and renames the method for clarity. The diff in reg-usage.ll is due to the scalable VPlan not actually having any maxbandwidth VFs, so I've changed it to check the fixed-length VF instead, which is affected by maxbandwidth. Full diff: https://github.com/llvm/llvm-project/pull/156923.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 3fbeef1211954..ab8493919138f 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -937,8 +937,8 @@ class LoopVectorizationCostModel {
/// user options, for the given register kind.
bool useMaxBandwidth(TargetTransformInfo::RegisterKind RegKind);
- /// \return True if register pressure should be calculated for the given VF.
- bool shouldCalculateRegPressureForVF(ElementCount VF);
+ /// \return True if register pressure should be considered for the given VF.
+ bool shouldConsiderRegPressureForVF(ElementCount VF);
/// \return The size (in bits) of the smallest and widest types in the code
/// that needs to be vectorized. We ignore values that remain scalar such as
@@ -3727,7 +3727,7 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
return FixedScalableVFPair::getNone();
}
-bool LoopVectorizationCostModel::shouldCalculateRegPressureForVF(
+bool LoopVectorizationCostModel::shouldConsiderRegPressureForVF(
ElementCount VF) {
if (!useMaxBandwidth(VF.isScalable()
? TargetTransformInfo::RGK_ScalableVector
@@ -4172,8 +4172,9 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {
P->vectorFactors().end());
SmallVector<VPRegisterUsage, 8> RUs;
- if (CM.useMaxBandwidth(TargetTransformInfo::RGK_ScalableVector) ||
- CM.useMaxBandwidth(TargetTransformInfo::RGK_FixedWidthVector))
+ if (any_of(VFs, [this](ElementCount VF) {
+ return CM.shouldConsiderRegPressureForVF(VF);
+ }))
RUs = calculateRegisterUsageForPlan(*P, VFs, TTI, CM.ValuesToIgnore);
for (unsigned I = 0; I < VFs.size(); I++) {
@@ -4185,7 +4186,7 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {
/// If the register pressure needs to be considered for VF,
/// don't consider the VF as valid if it exceeds the number
/// of registers for the target.
- if (CM.shouldCalculateRegPressureForVF(VF) &&
+ if (CM.shouldConsiderRegPressureForVF(VF) &&
RUs[I].exceedsMaxNumRegs(TTI, ForceTargetNumVectorRegs))
continue;
@@ -7020,8 +7021,9 @@ VectorizationFactor LoopVectorizationPlanner::computeBestVF() {
P->vectorFactors().end());
SmallVector<VPRegisterUsage, 8> RUs;
- if (CM.useMaxBandwidth(TargetTransformInfo::RGK_ScalableVector) ||
- CM.useMaxBandwidth(TargetTransformInfo::RGK_FixedWidthVector))
+ if (any_of(VFs, [this](ElementCount VF) {
+ return CM.shouldConsiderRegPressureForVF(VF);
+ }))
RUs = calculateRegisterUsageForPlan(*P, VFs, TTI, CM.ValuesToIgnore);
for (unsigned I = 0; I < VFs.size(); I++) {
@@ -7047,7 +7049,7 @@ VectorizationFactor LoopVectorizationPlanner::computeBestVF() {
InstructionCost Cost = cost(*P, VF);
VectorizationFactor CurrentFactor(VF, Cost, ScalarCost);
- if (CM.shouldCalculateRegPressureForVF(VF) &&
+ if (CM.shouldConsiderRegPressureForVF(VF) &&
RUs[I].exceedsMaxNumRegs(TTI, ForceTargetNumVectorRegs)) {
LLVM_DEBUG(dbgs() << "LV(REG): Not considering vector loop of width "
<< VF << " because it uses too many registers\n");
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll b/llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll
index e51a925040a49..01d103264fafe 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll
@@ -14,7 +14,7 @@
define void @get_invariant_reg_usage(ptr %z) {
; CHECK-LABEL: LV: Checking a loop in 'get_invariant_reg_usage'
-; CHECK: LV(REG): VF = vscale x 16
+; CHECK: LV(REG): VF = 16
; CHECK-NEXT: LV(REG): Found max usage: 2 item
; CHECK-NEXT: LV(REG): RegisterClass: Generic::ScalarRC, 2 registers
; CHECK-NEXT: LV(REG): RegisterClass: Generic::VectorRC, 1 registers
|
lukel97
added a commit
that referenced
this pull request
Sep 12, 2025
Stacked on #156923 In https://godbolt.org/z/8svWaredK, we spill a lot on RISC-V because whilst the largest element type is i8, we generate a bunch of pointer vectors for gathers and scatters. This means the VF chosen is quite high e.g. <vscale x 16 x i8>, but we end up using a bunch of <vscale x 16 x i64> m8 registers for the pointers. This was briefly fixed by #132190 where we computed register pressure in VPlan and used it to prune VFs that were likely to spill. The legacy cost model wasn't able to do this pruning because it didn't have visibility into the pointer vectors that were needed for the gathers/scatters. However VF pruning was restricted again to just the case when max bandwidth was enabled in #141736 to avoid an AArch64 regression, and restricted again in #149056 to only prune VFs that had max bandwidth enabled. On RISC-V we take advantage of register grouping for performance and choose a default of LMUL 2, which means there are 16 registers to work with – half the number as SVE, so we encounter higher register pressure more frequently. As such, we likely want to always consider pruning VFs with high register pressure and not just the VFs from max bandwidth. This adds a TTI hook to opt into this behaviour for RISC-V which fixes the motivating godbolt example above. When last checked this significantly reduces the number of spills on SPEC CPU 2017, up to 80% on 538.imagick_r.
fhahn
pushed a commit
to fhahn/llvm-project
that referenced
this pull request
Jan 28, 2026
In llvm#149056 VF pruning was changed so that it only pruned VFs that stemmed from MaxBandwidth being enabled. However we always compute register pressure regardless of whether or not max bandwidth is permitted for any VFs (via `MaxPermissibleVFWithoutMaxBW`). This skips the computation if not needed and renames the method for clarity. The diff in reg-usage.ll is due to the scalable VPlan not actually having any maxbandwidth VFs, so I've changed it to check the fixed-length VF instead, which is affected by maxbandwidth. (cherry-picked from 33f8ad7b6d9c5c43525576854fc07eebdca00a9f)
fhahn
pushed a commit
to fhahn/llvm-project
that referenced
this pull request
Jan 28, 2026
In llvm#149056 VF pruning was changed so that it only pruned VFs that stemmed from MaxBandwidth being enabled. However we always compute register pressure regardless of whether or not max bandwidth is permitted for any VFs (via `MaxPermissibleVFWithoutMaxBW`). This skips the computation if not needed and renames the method for clarity. The diff in reg-usage.ll is due to the scalable VPlan not actually having any maxbandwidth VFs, so I've changed it to check the fixed-length VF instead, which is affected by maxbandwidth. (cherry-picked from 33f8ad7b6d9c5c43525576854fc07eebdca00a9f)
fhahn
pushed a commit
to fhahn/llvm-project
that referenced
this pull request
Jan 28, 2026
In llvm#149056 VF pruning was changed so that it only pruned VFs that stemmed from MaxBandwidth being enabled. However we always compute register pressure regardless of whether or not max bandwidth is permitted for any VFs (via `MaxPermissibleVFWithoutMaxBW`). This skips the computation if not needed and renames the method for clarity. The diff in reg-usage.ll is due to the scalable VPlan not actually having any maxbandwidth VFs, so I've changed it to check the fixed-length VF instead, which is affected by maxbandwidth. (cherry-picked from 33f8ad7b6d9c5c43525576854fc07eebdca00a9f)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
In #149056 VF pruning was changed so that it only pruned VFs that stemmed from MaxBandwidth being enabled.
However we always compute register pressure regardless of whether or not max bandwidth is permitted for any VFs (via
MaxPermissibleVFWithoutMaxBW).This skips the computation if not needed and renames the method for clarity.
The diff in reg-usage.ll is due to the scalable VPlan not actually having any maxbandwidth VFs, so I've changed it to check the fixed-length VF instead, which is affected by maxbandwidth.