[VPlan] Prevent uses of materialized VPSymbolicValues. (NFC)#182318
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes: After VPSymbolicValues (like VF and VFxUF) are materialized via …

Note that this still allows some uses (e.g. comparing VPSymbolicValue …). Depends on #182146 to fix a …

Patch is 57.58 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/182318.diff

17 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index f233f0dc1b025..e1ab8f6178cb7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -1402,11 +1402,14 @@ bool VPValue::isDefinedOutsideLoopRegions() const {
}
void VPValue::replaceAllUsesWith(VPValue *New) {
replaceUsesWithIf(New, [](VPUser &, unsigned) { return true; });
+ if (auto *SV = dyn_cast<VPSymbolicValue>(this))
+ SV->markMaterialized();
}
void VPValue::replaceUsesWithIf(
VPValue *New,
llvm::function_ref<bool(VPUser &U, unsigned Idx)> ShouldReplace) {
+ assertNotMaterialized();
// Note that this early exit is required for correctness; the implementation
// below relies on the number of users for this VPValue to decrease, which
// isn't the case if this == New.
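The hunk above marks a VPSymbolicValue as materialized once `replaceAllUsesWith` has rewritten its users, so any later attempt to walk the (now stale) user list trips an assertion. A minimal standalone sketch of that pattern — hypothetical stand-in types, not the actual VPlan classes:

```cpp
#include <cassert>
#include <vector>

// Stand-in for a VPSymbolicValue: once its uses have been replaced, the
// value is "materialized" and further user-list queries are invalid,
// except cheap queries on an already-empty list.
struct SymbolicValue {
  std::vector<int> Users;    // stand-in for the list of VPUsers
  bool Materialized = false; // NDEBUG-gated in the real patch

  void replaceAllUsesWith() {
    Users.clear();           // stand-in for rewriting each user in place
    Materialized = true;     // markMaterialized() in the patch
  }

  unsigned getNumUsers() const {
    if (Users.empty())
      return 0;              // empty-list queries stay legal after RAUW
    assert(!Materialized && "accessing materialized symbolic value");
    return Users.size();
  }
};
```

The early `Users.empty()` check mirrors the patched `getNumUsers()`: callers may still ask "does this have users?" after materialization and get 0, but a non-empty user list on a materialized value is a bug.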
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index e05da74125d1c..a12ded1580486 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -4668,14 +4668,14 @@ class VPlan {
VPSymbolicValue &getVectorTripCount() { return VectorTripCount; }
/// Returns the VF of the vector loop region.
- VPValue &getVF() { return VF; };
- const VPValue &getVF() const { return VF; };
+ VPSymbolicValue &getVF() { return VF; };
+ const VPSymbolicValue &getVF() const { return VF; };
/// Returns the UF of the vector loop region.
- VPValue &getUF() { return UF; };
+ VPSymbolicValue &getUF() { return UF; };
/// Returns VF * UF of the vector loop region.
- VPValue &getVFxUF() { return VFxUF; }
+ VPSymbolicValue &getVFxUF() { return VFxUF; }
LLVMContext &getContext() const {
return getScalarHeader()->getIRBasicBlock()->getContext();
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 4b744b9128171..54c02bf3156b9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -433,7 +433,8 @@ SmallVector<VPRegisterUsage, 8> llvm::calculateRegisterUsageForPlan(
// the loop (not including non-recipe values such as arguments and
// constants).
SmallSetVector<VPValue *, 8> LoopInvariants;
- LoopInvariants.insert(&Plan.getVectorTripCount());
+ if (Plan.getVectorTripCount().getNumUsers() > 0)
+ LoopInvariants.insert(&Plan.getVectorTripCount());
// We scan the loop in a topological order in order and assign a number to
// each recipe. We use RPO to ensure that defs are met before their users. We
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index bb1a91ec8c963..52325ee43e8da 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -4887,6 +4887,9 @@ void VPlanTransforms::materializeConstantVectorTripCount(
assert(Plan.hasUF(BestUF) && "BestUF is not available in Plan");
VPValue *TC = Plan.getTripCount();
+ if (TC->getNumUsers() == 0)
+ return;
+
// Skip cases for which the trip count may be non-trivial to materialize.
// I.e., when a scalar tail is absent - due to tail folding, or when a scalar
// tail is required.
@@ -5078,12 +5081,19 @@ void VPlanTransforms::materializeFactors(VPlan &Plan, VPBasicBlock *VectorPH,
ElementCount VFEC) {
VPBuilder Builder(VectorPH, VectorPH->begin());
Type *TCTy = VPTypeAnalysis(Plan).inferScalarType(Plan.getTripCount());
- VPValue &VF = Plan.getVF();
- VPValue &VFxUF = Plan.getVFxUF();
VPValue *UF =
Plan.getOrAddLiveIn(ConstantInt::get(TCTy, Plan.getConcreteUF()));
Plan.getUF().replaceAllUsesWith(UF);
+ // If VF and VFxUF are already materialized, there's nothing more to do.
+ if (Plan.getVF().isMaterialized()) {
+ assert(Plan.getVFxUF().isMaterialized() &&
+ "VF and VFxUF must be materialized together");
+ return;
+ }
+
+ VPValue &VF = Plan.getVF();
+ VPValue &VFxUF = Plan.getVFxUF();
// If there are no users of the runtime VF, compute VFxUF by constant folding
// the multiplication of VF and UF.
if (VF.getNumUsers() == 0) {
@@ -5106,10 +5116,6 @@ void VPlanTransforms::materializeFactors(VPlan &Plan, VPBasicBlock *VectorPH,
VPValue *MulByUF = Builder.createOverflowingOp(
Instruction::Mul, {RuntimeVF, UF}, {true, false});
VFxUF.replaceAllUsesWith(MulByUF);
-
- assert(Plan.getVF().getNumUsers() == 0 && Plan.getUF().getNumUsers() == 0 &&
- Plan.getVFxUF().getNumUsers() == 0 &&
- "VF, UF, and VFxUF not expected to be used");
}
DenseMap<const SCEV *, Value *>
@@ -5427,21 +5433,38 @@ VPlanTransforms::narrowInterleaveGroups(VPlan &Plan,
// original iteration.
auto *CanIV = VectorLoop->getCanonicalIV();
auto *Inc = cast<VPInstruction>(CanIV->getBackedgeValue());
- VPBuilder PHBuilder(Plan.getVectorPreheader());
+ VPBasicBlock *VectorPH = Plan.getVectorPreheader();
+ VPBuilder PHBuilder(VectorPH, VectorPH->begin());
VPValue *UF = &Plan.getUF();
+ VPValue *Step;
if (VFToOptimize->isScalable()) {
VPValue *VScale = PHBuilder.createElementCount(
VectorLoop->getCanonicalIVType(), ElementCount::getScalable(1));
- VPValue *VScaleUF = PHBuilder.createOverflowingOp(
- Instruction::Mul, {VScale, UF}, {true, false});
- Inc->setOperand(1, VScaleUF);
+ Step = PHBuilder.createOverflowingOp(Instruction::Mul, {VScale, UF},
+ {true, false});
Plan.getVF().replaceAllUsesWith(VScale);
} else {
- Inc->setOperand(1, UF);
+ Step = UF;
Plan.getVF().replaceAllUsesWith(
Plan.getConstantInt(CanIV->getScalarType(), 1));
}
+
+ // Materialize vector trip count with the narrowed step: TC - (TC % Step).
+ assert(Plan.getMiddleBlock()->getNumSuccessors() == 2 &&
+ "cannot materialize vector trip count when folding the tail or "
+ "requiring a scalar iteration");
+ VPValue *TC = Plan.getTripCount();
+ VPValue *R =
+ PHBuilder.createNaryOp(Instruction::URem, {TC, Step},
+ DebugLoc::getCompilerGenerated(), "n.mod.vf");
+ VPValue *VectorTC =
+ PHBuilder.createSub(TC, R, DebugLoc::getCompilerGenerated(), "n.vec");
+ Plan.getVectorTripCount().replaceAllUsesWith(VectorTC);
+
+ Inc->setOperand(1, Step);
+ Plan.getVFxUF().replaceAllUsesWith(Step);
+
removeDeadRecipes(Plan);
assert(none_of(*VectorLoop->getEntryBasicBlock(),
IsaPred<VPVectorPointerRecipe>) &&
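The `narrowInterleaveGroups` hunk above re-derives the vector trip count from the original trip count and the narrowed step as TC - (TC % Step) — the `n.mod.vf` / `n.vec` pair emitted in the preheader. A plain-integer sketch of that arithmetic (illustrative only; the patch builds these as VPlan recipes, not scalar C++):

```cpp
#include <cstdint>

// n.vec = TC - n.mod.vf: the largest multiple of Step not exceeding TC,
// i.e. the number of iterations the narrowed vector loop executes; the
// remaining TC % Step iterations are left for the scalar tail.
uint64_t vectorTripCount(uint64_t TC, uint64_t Step) {
  uint64_t Rem = TC % Step; // n.mod.vf: scalar-tail iteration count
  return TC - Rem;          // n.vec
}
```

For example, with TC = 100 and a narrowed step of 3, the vector loop runs 99 iterations and one iteration falls to the scalar tail; with a step that divides TC evenly, `n.mod.vf` is 0 and no tail runs (hence the assertion in the hunk that a two-successor middle block exists).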
diff --git a/llvm/lib/Transforms/Vectorize/VPlanValue.h b/llvm/lib/Transforms/Vectorize/VPlanValue.h
index 4ef78341e0654..b7d923e302369 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanValue.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanValue.h
@@ -101,11 +101,24 @@ class LLVM_ABI_FOR_TEST VPValue {
void dump() const;
#endif
- unsigned getNumUsers() const { return Users.size(); }
- void addUser(VPUser &User) { Users.push_back(&User); }
+ /// Assert that this VPValue has not been materialized, if it is a
+ /// VPSymbolicValue.
+ void assertNotMaterialized() const;
+
+ unsigned getNumUsers() const {
+ if (Users.empty())
+ return 0;
+ assertNotMaterialized();
+ return Users.size();
+ }
+ void addUser(VPUser &User) {
+ assertNotMaterialized();
+ Users.push_back(&User);
+ }
/// Remove a single \p User from the list of users.
void removeUser(VPUser &User) {
+ assertNotMaterialized();
// The same user can be added multiple times, e.g. because the same VPValue
// is used twice by the same VPUser. Remove a single one.
auto *I = find(Users, &User);
@@ -118,10 +131,22 @@ class LLVM_ABI_FOR_TEST VPValue {
typedef iterator_range<user_iterator> user_range;
typedef iterator_range<const_user_iterator> const_user_range;
- user_iterator user_begin() { return Users.begin(); }
- const_user_iterator user_begin() const { return Users.begin(); }
- user_iterator user_end() { return Users.end(); }
- const_user_iterator user_end() const { return Users.end(); }
+ user_iterator user_begin() {
+ assertNotMaterialized();
+ return Users.begin();
+ }
+ const_user_iterator user_begin() const {
+ assertNotMaterialized();
+ return Users.begin();
+ }
+ user_iterator user_end() {
+ assertNotMaterialized();
+ return Users.end();
+ }
+ const_user_iterator user_end() const {
+ assertNotMaterialized();
+ return Users.end();
+ }
user_range users() { return user_range(user_begin(), user_end()); }
const_user_range users() const {
return const_user_range(user_begin(), user_end());
@@ -226,6 +251,19 @@ struct VPSymbolicValue : public VPValue {
static bool classof(const VPValue *V) {
return V->getVPValueID() == VPVSymbolicSC;
}
+
+#if !defined(NDEBUG)
+ /// Returns true if this symbolic value has been materialized.
+ bool isMaterialized() const { return Materialized; }
+
+ /// Mark this symbolic value as materialized.
+ void markMaterialized() { Materialized = true; }
+
+private:
+ /// Track whether this symbolic value has been materialized (replaced).
+ /// After materialization, accessing users should trigger an assertion.
+ bool Materialized = false;
+#endif
};
/// A VPValue defined by a recipe that produces one or more values.
@@ -427,6 +465,12 @@ class VPDef {
unsigned getNumDefinedValues() const { return DefinedValues.size(); }
};
+inline void VPValue::assertNotMaterialized() const {
+ assert((!isa<VPSymbolicValue>(this) ||
+ !cast<VPSymbolicValue>(this)->isMaterialized()) &&
+ "accessing materialized symbolic value");
+}
+
} // namespace llvm
#endif // LLVM_TRANSFORMS_VECTORIZE_VPLAN_VALUE_H
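The VPlanValue.h hunk gates the new `Materialized` flag and its accessors behind `#if !defined(NDEBUG)`, so release builds carry no extra state. A minimal sketch of that gating, using a hypothetical stand-in type rather than the real class:

```cpp
#include <cassert>

// Debug-only bookkeeping: the flag and its accessors compile away entirely
// when NDEBUG is defined, so they may only be referenced from code that is
// itself compiled out in release builds (asserts, other NDEBUG blocks).
struct Symbolic {
#ifndef NDEBUG
  bool isMaterialized() const { return Materialized; }
  void markMaterialized() { Materialized = true; }

private:
  bool Materialized = false;
#endif
};
```

The trade-off is that any call site must also be NDEBUG-gated (or sit inside an `assert`), otherwise release builds fail to compile; keeping the accessors out of unconditional code paths is part of the contract.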
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-constant-ops.ll b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-constant-ops.ll
index 52bd8a0a11e35..6d0e0503c49a5 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-constant-ops.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-constant-ops.ll
@@ -28,9 +28,8 @@ define void @test_add_double_same_const_args_1(ptr %res, ptr noalias %A, ptr noa
; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
; CHECK-NEXT: br i1 [[TMP12]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
-; CHECK-NEXT: br label %[[EXIT:.*]]
+; CHECK-NEXT: br i1 true, [[EXIT1:label %.*]], label %[[EXIT:.*]]
; CHECK: [[EXIT]]:
-; CHECK-NEXT: ret void
;
entry:
br label %loop
@@ -77,11 +76,10 @@ define void @test_add_double_same_const_args_2(ptr %res, ptr noalias %A, ptr noa
; CHECK-NEXT: store <2 x double> [[TMP7]], ptr [[TMP9]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
-; CHECK-NEXT: br i1 [[TMP12]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP12]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
-; CHECK-NEXT: br label %[[EXIT:.*]]
+; CHECK-NEXT: br i1 true, [[EXIT1:label %.*]], label %[[EXIT:.*]]
; CHECK: [[EXIT]]:
-; CHECK-NEXT: ret void
;
entry:
br label %loop
@@ -191,11 +189,10 @@ define void @test_add_double_same_var_args_1(ptr %res, ptr noalias %A, ptr noali
; CHECK-NEXT: store <2 x double> [[TMP6]], ptr [[TMP8]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
-; CHECK-NEXT: br i1 [[TMP11]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP11]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
-; CHECK-NEXT: br label %[[EXIT:.*]]
+; CHECK-NEXT: br i1 true, [[EXIT1:label %.*]], label %[[EXIT:.*]]
; CHECK: [[EXIT]]:
-; CHECK-NEXT: ret void
;
entry:
br label %loop
@@ -244,11 +241,10 @@ define void @test_add_double_same_var_args_2(ptr %res, ptr noalias %A, ptr noali
; CHECK-NEXT: store <2 x double> [[TMP6]], ptr [[TMP8]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
-; CHECK-NEXT: br i1 [[TMP11]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP11]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
-; CHECK-NEXT: br label %[[EXIT:.*]]
+; CHECK-NEXT: br i1 true, [[EXIT1:label %.*]], label %[[EXIT:.*]]
; CHECK: [[EXIT]]:
-; CHECK-NEXT: ret void
;
entry:
br label %loop
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-cost.ll
index 5e37f9eff4ba2..ccbcfff7fda7b 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-cost.ll
@@ -87,7 +87,7 @@ define void @test_complex_add_double(ptr %res, ptr noalias %A, ptr noalias %B, i
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
-; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 2
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
; CHECK: [[VECTOR_BODY]]:
@@ -314,8 +314,8 @@ define void @single_fmul_used_by_each_member(ptr noalias %A, ptr noalias %B, ptr
; CHECK-NEXT: [[MIN_ITERS_CHECK11:%.*]] = icmp ult i64 [[TMP0]], 8
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK11]], label %[[VEC_EPILOG_PH:.*]], label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
-; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], 8
-; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
+; CHECK-NEXT: [[TMP41:%.*]] = urem i64 [[TMP0]], 4
+; CHECK-NEXT: [[N_VEC12:%.*]] = sub i64 [[TMP0]], [[TMP41]]
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
; CHECK: [[VECTOR_BODY]]:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
@@ -359,18 +359,16 @@ define void @single_fmul_used_by_each_member(ptr noalias %A, ptr noalias %B, ptr
; CHECK-NEXT: store <2 x double> [[TMP30]], ptr [[TMP37]], align 8
; CHECK-NEXT: store <2 x double> [[TMP31]], ptr [[TMP43]], align 8
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
-; CHECK-NEXT: [[TMP44:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: [[TMP44:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC12]]
; CHECK-NEXT: br i1 [[TMP44]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
-; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC12]]
; CHECK-NEXT: br i1 [[CMP_N]], label %[[EXIT:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]]
; CHECK: [[VEC_EPILOG_ITER_CHECK]]:
-; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[N_MOD_VF]], 2
+; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP41]], 2
; CHECK-NEXT: br i1 [[MIN_EPILOG_ITERS_CHECK]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF7]]
; CHECK: [[VEC_EPILOG_PH]]:
-; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
-; CHECK-NEXT: [[N_MOD_VF22:%.*]] = urem i64 [[TMP0]], 2
-; CHECK-NEXT: [[N_VEC23:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF22]]
+; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC12]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
; CHECK-NEXT: br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
; CHECK: [[VEC_EPILOG_VECTOR_BODY]]:
; CHECK-NEXT: [[INDEX24:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], %[[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT25:%.*]], %[[VEC_EPILOG_VECTOR_BODY]] ]
@@ -384,13 +382,12 @@ define void @single_fmul_used_by_each_member(ptr noalias %A, ptr noalias %B, ptr
; CHECK-NEXT: [[TMP50:%.*]] = getelementptr { double, double }, ptr [[C]], i64 [[INDEX24]]
; CHECK-NEXT: store <2 x double> [[TMP48]], ptr [[TMP50]], align 8
; CHECK-NEXT: [[INDEX_NEXT25]] = add nuw i64 [[INDEX24]], 1
-; CHECK-NEXT: [[TMP51:%.*]] = icmp eq i64 [[INDEX_NEXT25]], [[N_VEC23]]
-; CHECK-NEXT: br i1 [[TMP51]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
+; CHECK-NEXT: [[TMP46:%.*]] = icmp eq i64 [[INDEX_NEXT25]], [[TMP0]]
+; CHECK-NEXT: br i1 [[TMP46]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
; CHECK: [[VEC_EPILOG_MIDDLE_BLOCK]]:
-; CHECK-NEXT: [[CMP_N26:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC23]]
-; CHECK-NEXT: br i1 [[CMP_N26]], label %[[EXIT]], label %[[VEC_EPILOG_SCALAR_PH]]
+; CHECK-NEXT: br i1 true, label %[[EXIT]], label %[[VEC_EPILOG_SCALAR_PH]]
; CHECK: [[VEC_EPILOG_SCALAR_PH]]:
-; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC23]], %[[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_SCEVCHECK]] ], [ 0, %[[ITER_CHECK]] ]
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[TMP0]], %[[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC12]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_SCEVCHECK]] ], [ 0, %[[ITER_CHECK]] ]
; CHECK-NEXT: br label %[[LOOP:.*]]
; CHECK: [[LOOP]]:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[VEC_EPILOG_SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
@@ -462,9 +459,8 @@ define void @test_interleave_after_narrowing(i32 %n, ptr %x, ptr noalias %y) {
; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
; CHECK-NEXT: br i1 [[TMP3]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
-; CHECK-NEXT: br label %[[EXIT:.*]]
+; CHECK-NEXT: br i1 true, [[EXIT1:label %.*]], label %[[EXIT:.*]]
; CHECK: [[EXIT]]:
-; CHECK-NEXT: ret void
;
entry:
br label %loop
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-derived-ivs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-derived-ivs.ll
index fab0369de8aa0..6428fe6a29445 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-derived-ivs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-derived-ivs.ll
@@ -16,7 +16,7 @@ define void @derived_int_ivs(ptr noalias %a, ptr noalias %b, i64 %end) {
; VF2-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], 2
; VF2-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
; VF2: [[VECTOR_PH]]:
-; VF2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 2
+; VF2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 1
; VF2-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]
; VF2-NEXT: [[TMP3:%.*]] = mul i64 [[N_VEC]], 16
; VF2-NEXT: [[TMP4:%.*]] = add i64 16, [[TMP3]]
@@ -30,8 +30,8 @@ define void @derived_int_ivs(ptr noalias %a, ptr noalias %b, i64 %end) {
; VF2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, ptr [[B]], i64 [[OFFSET_IDX]]
; VF2-NEXT: store <2 x double> [[WIDE_LOAD]], ptr [[TMP7]], align 8
; VF2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 1
-; VF2-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; VF2-NEXT: br i1 [[TMP9]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; VF2-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; VF2-NEXT: br i1 [[TMP8]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
...
[truncated]
@llvm/pr-subscribers-backend-risc-v
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
; CHECK: [[VECTOR_BODY]]:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
@@ -359,18 +359,16 @@ define void @single_fmul_used_by_each_member(ptr noalias %A, ptr noalias %B, ptr
; CHECK-NEXT: store <2 x double> [[TMP30]], ptr [[TMP37]], align 8
; CHECK-NEXT: store <2 x double> [[TMP31]], ptr [[TMP43]], align 8
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
-; CHECK-NEXT: [[TMP44:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: [[TMP44:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC12]]
; CHECK-NEXT: br i1 [[TMP44]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
-; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC12]]
; CHECK-NEXT: br i1 [[CMP_N]], label %[[EXIT:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]]
; CHECK: [[VEC_EPILOG_ITER_CHECK]]:
-; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[N_MOD_VF]], 2
+; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP41]], 2
; CHECK-NEXT: br i1 [[MIN_EPILOG_ITERS_CHECK]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF7]]
; CHECK: [[VEC_EPILOG_PH]]:
-; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
-; CHECK-NEXT: [[N_MOD_VF22:%.*]] = urem i64 [[TMP0]], 2
-; CHECK-NEXT: [[N_VEC23:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF22]]
+; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC12]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
; CHECK-NEXT: br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
; CHECK: [[VEC_EPILOG_VECTOR_BODY]]:
; CHECK-NEXT: [[INDEX24:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], %[[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT25:%.*]], %[[VEC_EPILOG_VECTOR_BODY]] ]
@@ -384,13 +382,12 @@ define void @single_fmul_used_by_each_member(ptr noalias %A, ptr noalias %B, ptr
; CHECK-NEXT: [[TMP50:%.*]] = getelementptr { double, double }, ptr [[C]], i64 [[INDEX24]]
; CHECK-NEXT: store <2 x double> [[TMP48]], ptr [[TMP50]], align 8
; CHECK-NEXT: [[INDEX_NEXT25]] = add nuw i64 [[INDEX24]], 1
-; CHECK-NEXT: [[TMP51:%.*]] = icmp eq i64 [[INDEX_NEXT25]], [[N_VEC23]]
-; CHECK-NEXT: br i1 [[TMP51]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
+; CHECK-NEXT: [[TMP46:%.*]] = icmp eq i64 [[INDEX_NEXT25]], [[TMP0]]
+; CHECK-NEXT: br i1 [[TMP46]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
; CHECK: [[VEC_EPILOG_MIDDLE_BLOCK]]:
-; CHECK-NEXT: [[CMP_N26:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC23]]
-; CHECK-NEXT: br i1 [[CMP_N26]], label %[[EXIT]], label %[[VEC_EPILOG_SCALAR_PH]]
+; CHECK-NEXT: br i1 true, label %[[EXIT]], label %[[VEC_EPILOG_SCALAR_PH]]
; CHECK: [[VEC_EPILOG_SCALAR_PH]]:
-; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC23]], %[[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_SCEVCHECK]] ], [ 0, %[[ITER_CHECK]] ]
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[TMP0]], %[[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC12]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_SCEVCHECK]] ], [ 0, %[[ITER_CHECK]] ]
; CHECK-NEXT: br label %[[LOOP:.*]]
; CHECK: [[LOOP]]:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[VEC_EPILOG_SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
@@ -462,9 +459,8 @@ define void @test_interleave_after_narrowing(i32 %n, ptr %x, ptr noalias %y) {
; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
; CHECK-NEXT: br i1 [[TMP3]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
-; CHECK-NEXT: br label %[[EXIT:.*]]
+; CHECK-NEXT: br i1 true, [[EXIT1:label %.*]], label %[[EXIT:.*]]
; CHECK: [[EXIT]]:
-; CHECK-NEXT: ret void
;
entry:
br label %loop
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-derived-ivs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-derived-ivs.ll
index fab0369de8aa0..6428fe6a29445 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-derived-ivs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-derived-ivs.ll
@@ -16,7 +16,7 @@ define void @derived_int_ivs(ptr noalias %a, ptr noalias %b, i64 %end) {
; VF2-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], 2
; VF2-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
; VF2: [[VECTOR_PH]]:
-; VF2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 2
+; VF2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 1
; VF2-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]
; VF2-NEXT: [[TMP3:%.*]] = mul i64 [[N_VEC]], 16
; VF2-NEXT: [[TMP4:%.*]] = add i64 16, [[TMP3]]
@@ -30,8 +30,8 @@ define void @derived_int_ivs(ptr noalias %a, ptr noalias %b, i64 %end) {
; VF2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, ptr [[B]], i64 [[OFFSET_IDX]]
; VF2-NEXT: store <2 x double> [[WIDE_LOAD]], ptr [[TMP7]], align 8
; VF2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 1
-; VF2-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; VF2-NEXT: br i1 [[TMP9]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; VF2-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; VF2-NEXT: br i1 [[TMP8]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
...
[truncated]
artagnon
left a comment
Looks good, pending rebase.
de680f9 to 58c3f06 (force-pushed)
Ping. This should be ready for review now that #182146 landed.
58c3f06 to af9765a (force-pushed)
af9765a to b6c2030 (force-pushed)
Reviewed code:

    VPValue *Const1 = Plan.getOrAddLiveIn(ConstantInt::get(Int64, 1));
    VF->replaceAllUsesWith(Const1);

Suggested change:

    VF->replaceAllUsesWith(Plan.getConstantInt(Int64, 1));
and similarly below.
Reviewed code:

    bool isMaterialized() const { return Materialized; }

    /// Mark this symbolic value as materialized.
    void markMaterialized() { Materialized = true; }
Should we assert that it's not already materialized?
Reviewed code:

    user_iterator user_end() { return Users.end(); }
    const_user_iterator user_end() const { return Users.end(); }
    user_iterator user_begin() {
      assertNotMaterialized();
Will these asserts ever be hit if we're already asserting in addUser/removeUser? We also allow calling getNumUsers() post-materialization without asserting.
I added them just to have extra guards.
ayalz
left a comment
Post-commit comments.
Symbolic values are placeholders for live-ins or recipes, until their concrete value or code is known, at which time they are RAUW'd with the latter - aka being materialized. They are held and retrievable from VPlan, and are all associated with a loop region.

Region values are also placeholders - for recipes, associated with a loop region, and are RAUW'd upon the region's dissolution, which also discards the placeholders.

(Materializing and) Discarding symbolic values when dissolving their loop region may be too early, because some transforms still need access to them afterwards?

Could VPlan be updated whenever its symbolic value is materialized, to provide the live-in/recipe value instead of the symbolic one?
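The RAUW-based placeholder pattern described in this comment can be sketched as follows. This is a minimal stand-in with hypothetical names (the actual classes are VPValue/VPSymbolicValue in VPlan.h, whose use lists track VPUser operands rather than raw pointer slots): a symbolic value holds a list of uses and, once the concrete value is known, repoints every use at it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified Value: each use is a slot (an operand
// pointer) that currently refers to this value.
struct Value {
  std::vector<Value **> Uses;

  void addUse(Value **Slot) { Uses.push_back(Slot); }

  // Replace-all-uses-with: repoint every recorded slot at New and drop
  // the use list, leaving the placeholder without users.
  void replaceAllUsesWith(Value *New) {
    for (Value **Slot : Uses)
      *Slot = New;
    Uses.clear();
  }
};
```

After `replaceAllUsesWith`, the placeholder has no users left, which is what makes accidental later accesses both harmless-looking and easy to miss — the motivation for the assertion-based guards in this patch.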
Reviewed code:

    if (VF.getNumUsers() == 0) {
      VPValue *RuntimeVFxUF =
          Builder.createElementCount(TCTy, VFEC * Plan.getConcreteUF());
      VFxUF.replaceAllUsesWith(RuntimeVFxUF);
Suggested change:

    VFxUF.replaceAllUsesWith(RuntimeVFxUF);
    VF.markMaterialized();
to prevent future users of VF, as it will not be RAUW'd later?
After VPSymbolicValues (like VF and VFxUF) are materialized via
replaceAllUsesWith, they should not be accessed again. This patch:
1. Tracks materialization state in VPSymbolicValue.
2. Asserts if the materialized VPValue is used again. Currently it adds asserts to various member functions, preventing calling them on materialized symbolic values.
Note that this still allows some uses (e.g. comparing VPSymbolicValue
references or pointers), but this should be relatively harmless given
that it is impossible to (re-)add any users. If we want to further
tighten the checks, we could add asserts to the accessors or override
operator&, but that will require more changes and not add much extra
guards I think.
Depends on #182146 to fix a
current access violation.
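The guard this patch adds can be sketched as below. Names are simplified and the user bookkeeping is reduced to a counter (the real class is VPSymbolicValue, whose users are VPUsers): materializing the value flips a flag, and later attempts to add users or re-materialize trip assertions.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for VPSymbolicValue.
class SymbolicValue {
  bool Materialized = false;
  unsigned NumUsers = 0;

public:
  bool isMaterialized() const { return Materialized; }

  // Mark this symbolic value as materialized; asserting that it is not
  // already materialized follows a review suggestion on this patch.
  void markMaterialized() {
    assert(!Materialized && "already materialized");
    Materialized = true;
  }

  void addUser() {
    assert(!Materialized && "use of materialized symbolic value");
    ++NumUsers;
  }

  unsigned getNumUsers() const { return NumUsers; }

  // Stand-in for replaceAllUsesWith: hand all users over to the
  // concrete replacement, then mark this placeholder materialized so it
  // cannot pick up new users afterwards.
  void replaceAllUsesWith() {
    NumUsers = 0;
    markMaterialized();
  }
};
```

As the description notes, comparing references or pointers to a materialized value still works — only operations that would (re-)register users, or walk the now-empty user list, are guarded.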