[VPlan] Materialize VectorTripCount in narrowInterleaveGroups. #182146
Conversation
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-vectorizers

Author: Florian Hahn (fhahn)

Changes

When narrowInterleaveGroups transforms a plan, VF and VFxUF are materialized (replaced with concrete values). This patch also materializes the VectorTripCount in the same transform. This ensures that the vector trip count is computed from the narrowed step when the narrow interleave transform is applied, instead of from the original VF.

The change also enables, as a follow-up, stricter verification that prevents accesses of UF, VF, VFxUF etc. after materialization. Note that in some cases we now miss branch folding, but that should be addressed separately (#181252). Fixes one of the violations where VectorTripCount is accessed after UF and VF have been materialized.

Patch is 48.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/182146.diff 12 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index bb1a91ec8c963..c78895b53e41d 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -5427,21 +5427,38 @@ VPlanTransforms::narrowInterleaveGroups(VPlan &Plan,
// original iteration.
auto *CanIV = VectorLoop->getCanonicalIV();
auto *Inc = cast<VPInstruction>(CanIV->getBackedgeValue());
- VPBuilder PHBuilder(Plan.getVectorPreheader());
+ VPBasicBlock *VectorPH = Plan.getVectorPreheader();
+ VPBuilder PHBuilder(VectorPH, VectorPH->begin());
VPValue *UF = &Plan.getUF();
+ VPValue *Step;
if (VFToOptimize->isScalable()) {
VPValue *VScale = PHBuilder.createElementCount(
VectorLoop->getCanonicalIVType(), ElementCount::getScalable(1));
- VPValue *VScaleUF = PHBuilder.createOverflowingOp(
- Instruction::Mul, {VScale, UF}, {true, false});
- Inc->setOperand(1, VScaleUF);
+ Step = PHBuilder.createOverflowingOp(Instruction::Mul, {VScale, UF},
+ {true, false});
Plan.getVF().replaceAllUsesWith(VScale);
} else {
- Inc->setOperand(1, UF);
+ Step = UF;
Plan.getVF().replaceAllUsesWith(
Plan.getConstantInt(CanIV->getScalarType(), 1));
}
+
+ // Materialize vector trip count with the narrowed step: TC - (TC % Step).
+ assert(Plan.getMiddleBlock()->getNumSuccessors() == 2 &&
+ "cannot materialize vector trip count when folding the tail or "
+ "requiring a scalar iteration");
+ VPValue *TC = Plan.getTripCount();
+ VPValue *R =
+ PHBuilder.createNaryOp(Instruction::URem, {TC, Step},
+ DebugLoc::getCompilerGenerated(), "n.mod.vf");
+ VPValue *VectorTC =
+ PHBuilder.createSub(TC, R, DebugLoc::getCompilerGenerated(), "n.vec");
+ Plan.getVectorTripCount().replaceAllUsesWith(VectorTC);
+
+ Inc->setOperand(1, Step);
+ Plan.getVFxUF().replaceAllUsesWith(Step);
+
removeDeadRecipes(Plan);
assert(none_of(*VectorLoop->getEntryBasicBlock(),
IsaPred<VPVectorPointerRecipe>) &&
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-constant-ops.ll b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-constant-ops.ll
index 52bd8a0a11e35..6d0e0503c49a5 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-constant-ops.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-constant-ops.ll
@@ -28,9 +28,8 @@ define void @test_add_double_same_const_args_1(ptr %res, ptr noalias %A, ptr noa
; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
; CHECK-NEXT: br i1 [[TMP12]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
-; CHECK-NEXT: br label %[[EXIT:.*]]
+; CHECK-NEXT: br i1 true, [[EXIT1:label %.*]], label %[[EXIT:.*]]
; CHECK: [[EXIT]]:
-; CHECK-NEXT: ret void
;
entry:
br label %loop
@@ -77,11 +76,10 @@ define void @test_add_double_same_const_args_2(ptr %res, ptr noalias %A, ptr noa
; CHECK-NEXT: store <2 x double> [[TMP7]], ptr [[TMP9]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
-; CHECK-NEXT: br i1 [[TMP12]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP12]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
-; CHECK-NEXT: br label %[[EXIT:.*]]
+; CHECK-NEXT: br i1 true, [[EXIT1:label %.*]], label %[[EXIT:.*]]
; CHECK: [[EXIT]]:
-; CHECK-NEXT: ret void
;
entry:
br label %loop
@@ -191,11 +189,10 @@ define void @test_add_double_same_var_args_1(ptr %res, ptr noalias %A, ptr noali
; CHECK-NEXT: store <2 x double> [[TMP6]], ptr [[TMP8]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
-; CHECK-NEXT: br i1 [[TMP11]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP11]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
-; CHECK-NEXT: br label %[[EXIT:.*]]
+; CHECK-NEXT: br i1 true, [[EXIT1:label %.*]], label %[[EXIT:.*]]
; CHECK: [[EXIT]]:
-; CHECK-NEXT: ret void
;
entry:
br label %loop
@@ -244,11 +241,10 @@ define void @test_add_double_same_var_args_2(ptr %res, ptr noalias %A, ptr noali
; CHECK-NEXT: store <2 x double> [[TMP6]], ptr [[TMP8]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
-; CHECK-NEXT: br i1 [[TMP11]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP11]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
-; CHECK-NEXT: br label %[[EXIT:.*]]
+; CHECK-NEXT: br i1 true, [[EXIT1:label %.*]], label %[[EXIT:.*]]
; CHECK: [[EXIT]]:
-; CHECK-NEXT: ret void
;
entry:
br label %loop
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-cost.ll
index 5e37f9eff4ba2..ccbcfff7fda7b 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-cost.ll
@@ -87,7 +87,7 @@ define void @test_complex_add_double(ptr %res, ptr noalias %A, ptr noalias %B, i
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
-; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 2
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
; CHECK: [[VECTOR_BODY]]:
@@ -314,8 +314,8 @@ define void @single_fmul_used_by_each_member(ptr noalias %A, ptr noalias %B, ptr
; CHECK-NEXT: [[MIN_ITERS_CHECK11:%.*]] = icmp ult i64 [[TMP0]], 8
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK11]], label %[[VEC_EPILOG_PH:.*]], label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
-; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], 8
-; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
+; CHECK-NEXT: [[TMP41:%.*]] = urem i64 [[TMP0]], 4
+; CHECK-NEXT: [[N_VEC12:%.*]] = sub i64 [[TMP0]], [[TMP41]]
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
; CHECK: [[VECTOR_BODY]]:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
@@ -359,18 +359,16 @@ define void @single_fmul_used_by_each_member(ptr noalias %A, ptr noalias %B, ptr
; CHECK-NEXT: store <2 x double> [[TMP30]], ptr [[TMP37]], align 8
; CHECK-NEXT: store <2 x double> [[TMP31]], ptr [[TMP43]], align 8
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
-; CHECK-NEXT: [[TMP44:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: [[TMP44:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC12]]
; CHECK-NEXT: br i1 [[TMP44]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
-; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC12]]
; CHECK-NEXT: br i1 [[CMP_N]], label %[[EXIT:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]]
; CHECK: [[VEC_EPILOG_ITER_CHECK]]:
-; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[N_MOD_VF]], 2
+; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP41]], 2
; CHECK-NEXT: br i1 [[MIN_EPILOG_ITERS_CHECK]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF7]]
; CHECK: [[VEC_EPILOG_PH]]:
-; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
-; CHECK-NEXT: [[N_MOD_VF22:%.*]] = urem i64 [[TMP0]], 2
-; CHECK-NEXT: [[N_VEC23:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF22]]
+; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC12]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
; CHECK-NEXT: br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
; CHECK: [[VEC_EPILOG_VECTOR_BODY]]:
; CHECK-NEXT: [[INDEX24:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], %[[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT25:%.*]], %[[VEC_EPILOG_VECTOR_BODY]] ]
@@ -384,13 +382,12 @@ define void @single_fmul_used_by_each_member(ptr noalias %A, ptr noalias %B, ptr
; CHECK-NEXT: [[TMP50:%.*]] = getelementptr { double, double }, ptr [[C]], i64 [[INDEX24]]
; CHECK-NEXT: store <2 x double> [[TMP48]], ptr [[TMP50]], align 8
; CHECK-NEXT: [[INDEX_NEXT25]] = add nuw i64 [[INDEX24]], 1
-; CHECK-NEXT: [[TMP51:%.*]] = icmp eq i64 [[INDEX_NEXT25]], [[N_VEC23]]
-; CHECK-NEXT: br i1 [[TMP51]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
+; CHECK-NEXT: [[TMP46:%.*]] = icmp eq i64 [[INDEX_NEXT25]], [[TMP0]]
+; CHECK-NEXT: br i1 [[TMP46]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
; CHECK: [[VEC_EPILOG_MIDDLE_BLOCK]]:
-; CHECK-NEXT: [[CMP_N26:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC23]]
-; CHECK-NEXT: br i1 [[CMP_N26]], label %[[EXIT]], label %[[VEC_EPILOG_SCALAR_PH]]
+; CHECK-NEXT: br i1 true, label %[[EXIT]], label %[[VEC_EPILOG_SCALAR_PH]]
; CHECK: [[VEC_EPILOG_SCALAR_PH]]:
-; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC23]], %[[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_SCEVCHECK]] ], [ 0, %[[ITER_CHECK]] ]
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[TMP0]], %[[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC12]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_SCEVCHECK]] ], [ 0, %[[ITER_CHECK]] ]
; CHECK-NEXT: br label %[[LOOP:.*]]
; CHECK: [[LOOP]]:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[VEC_EPILOG_SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
@@ -462,9 +459,8 @@ define void @test_interleave_after_narrowing(i32 %n, ptr %x, ptr noalias %y) {
; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
; CHECK-NEXT: br i1 [[TMP3]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
-; CHECK-NEXT: br label %[[EXIT:.*]]
+; CHECK-NEXT: br i1 true, [[EXIT1:label %.*]], label %[[EXIT:.*]]
; CHECK: [[EXIT]]:
-; CHECK-NEXT: ret void
;
entry:
br label %loop
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-derived-ivs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-derived-ivs.ll
index fab0369de8aa0..6428fe6a29445 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-derived-ivs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-derived-ivs.ll
@@ -16,7 +16,7 @@ define void @derived_int_ivs(ptr noalias %a, ptr noalias %b, i64 %end) {
; VF2-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], 2
; VF2-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
; VF2: [[VECTOR_PH]]:
-; VF2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 2
+; VF2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 1
; VF2-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]
; VF2-NEXT: [[TMP3:%.*]] = mul i64 [[N_VEC]], 16
; VF2-NEXT: [[TMP4:%.*]] = add i64 16, [[TMP3]]
@@ -30,8 +30,8 @@ define void @derived_int_ivs(ptr noalias %a, ptr noalias %b, i64 %end) {
; VF2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, ptr [[B]], i64 [[OFFSET_IDX]]
; VF2-NEXT: store <2 x double> [[WIDE_LOAD]], ptr [[TMP7]], align 8
; VF2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 1
-; VF2-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; VF2-NEXT: br i1 [[TMP9]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; VF2-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; VF2-NEXT: br i1 [[TMP8]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; VF2: [[MIDDLE_BLOCK]]:
; VF2-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
; VF2-NEXT: br i1 [[CMP_N]], [[EXIT:label %.*]], label %[[SCALAR_PH]]
@@ -46,7 +46,7 @@ define void @derived_int_ivs(ptr noalias %a, ptr noalias %b, i64 %end) {
; VF2IC2-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], 4
; VF2IC2-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
; VF2IC2: [[VECTOR_PH]]:
-; VF2IC2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 4
+; VF2IC2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 2
; VF2IC2-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]
; VF2IC2-NEXT: [[TMP3:%.*]] = mul i64 [[N_VEC]], 16
; VF2IC2-NEXT: [[TMP4:%.*]] = add i64 16, [[TMP3]]
@@ -154,7 +154,7 @@ define void @derived_pointer_ivs(ptr noalias %a, ptr noalias %b, ptr %end) {
; VF2-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; VF2-NEXT: br i1 [[FOUND_CONFLICT]], label %[[SCALAR_PH]], label %[[VECTOR_PH:.*]]
; VF2: [[VECTOR_PH]]:
-; VF2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 2
+; VF2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 1
; VF2-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP3]], [[N_MOD_VF]]
; VF2-NEXT: [[TMP9:%.*]] = mul i64 [[N_VEC]], 16
; VF2-NEXT: [[TMP10:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP9]]
@@ -170,8 +170,8 @@ define void @derived_pointer_ivs(ptr noalias %a, ptr noalias %b, ptr %end) {
; VF2-NEXT: [[WIDE_LOAD:%.*]] = load <2 x double>, ptr [[NEXT_GEP]], align 8
; VF2-NEXT: store <2 x double> [[WIDE_LOAD]], ptr [[NEXT_GEP7]], align 8
; VF2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 1
-; VF2-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; VF2-NEXT: br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; VF2-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; VF2-NEXT: br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; VF2: [[MIDDLE_BLOCK]]:
; VF2-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]
; VF2-NEXT: br i1 [[CMP_N]], [[EXIT:label %.*]], label %[[SCALAR_PH]]
@@ -203,7 +203,7 @@ define void @derived_pointer_ivs(ptr noalias %a, ptr noalias %b, ptr %end) {
; VF2IC2-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; VF2IC2-NEXT: br i1 [[FOUND_CONFLICT]], label %[[SCALAR_PH]], label %[[VECTOR_PH:.*]]
; VF2IC2: [[VECTOR_PH]]:
-; VF2IC2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 4
+; VF2IC2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 2
; VF2IC2-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP3]], [[N_MOD_VF]]
; VF2IC2-NEXT: [[TMP9:%.*]] = mul i64 [[N_VEC]], 16
; VF2IC2-NEXT: [[TMP10:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP9]]
@@ -324,9 +324,8 @@ define void @narrow_with_uniform_add_and_gep(ptr noalias %p) {
; VF2-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], 512
; VF2-NEXT: br i1 [[TMP5]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
; VF2: [[MIDDLE_BLOCK]]:
-; VF2-NEXT: br label %[[EXIT:.*]]
+; VF2-NEXT: br i1 true, [[EXIT1:label %.*]], label %[[EXIT:.*]]
; VF2: [[EXIT]]:
-; VF2-NEXT: ret void
;
; VF2IC2-LABEL: define void @narrow_with_uniform_add_and_gep(
; VF2IC2-SAME: ptr noalias [[P:%.*]]) {
@@ -350,9 +349,8 @@ define void @narrow_with_uniform_add_and_gep(ptr noalias %p) {
; VF2IC2-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 512
; VF2IC2-NEXT: br i1 [[TMP7]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
; VF2IC2: [[MIDDLE_BLOCK]]:
-; VF2IC2-NEXT: br label %[[EXIT:.*]]
+; VF2IC2-NEXT: br i1 true, [[EXIT1:label %.*]], label %[[EXIT:.*]]
; VF2IC2: [[EXIT]]:
-; VF2IC2-NEXT: ret void
;
; VF4-LABEL: define void @narrow_with_uniform_add_and_gep(
; VF4-SAME: ptr noalias [[P:%.*]]) {
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-metadata.ll b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-metadata.ll
index 442574f298c3f..773e6f0bfc4a5 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-metadata.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-metadata.ll
@@ -23,8 +23,8 @@ define void @load_store_interleave_group_with_metadata(ptr noalias %data) {
; VF2-NEXT: [[TMP2:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
; VF2-NEXT: br i1 [[TMP2]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; VF2: [[MIDDLE_BLOCK]]:
-; VF2-NEXT: br [[EXIT:label %.*]]
-; VF2: [[SCALAR_PH:.*:]]
+; VF2-NEXT: br i1 true, [[EXIT:label %.*]], label %[[SCALAR_PH:.*]]
+; VF2: [[SCALAR_PH]]:
;
entry:
br label %loop
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-remove-loop-region.ll b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-remove-loop-region.ll
index 7d77f2f6b5b9c..d4fb7f7e7685c 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-remove-loop-region.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-remove-loop-region.ll
@@ -22,8 +22,23 @@ define void @load_store_interleave_group_tc_2(ptr noalias %data) {
; VF2-NEXT: [[TMP2:%.*]] = icmp eq i64 [[INDEX_NEXT]], 2
; VF2-NEXT: br i1 [[TMP2]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; VF2: [[MIDDLE_BLOCK]]:
+; VF2-NEXT: br i1 true, label %[[EXIT1:.*]], label %[[SCALAR_PH:.*]]
+; VF2: [[SCALAR_PH]]:
; VF2-NEXT: br label %[[EXIT:.*]]
; VF2: [[EXIT]]:
+; VF2-NEXT: [[IV:%.*]] = phi i64 [ 2, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[EXIT]] ]
+; VF2-NEXT: [[MUL_2:%.*]] = shl nsw i64 [[IV]], 1
+; VF2-NEXT: [[DATA_0:%.*]] = getelementptr inbounds i64, ptr [[DATA]], i64 [[MUL_2]]
+; VF2-NEXT: [[L_0:%.*]] = load i64, ptr [[DATA_0]], align 8
+; VF2-NEXT: store i64 [[L_0]], ptr [[DATA_0]], align 8
+; VF2-NEXT: [[ADD_1:%.*]] = or disjoint i64 [[MUL_2]], 1
+; VF2-NEXT: [[DATA_1:%.*]] = getelementptr inbounds i64, ptr [[DATA]], i64 [[ADD_1]]
+; VF2-NEXT: [[L_1:%.*]] = load i64, ptr [[DATA_1]], align 8
+; VF2-NEXT: store i64 [[L_1]], ptr [[DATA_1]], align 8
+; VF2-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
+; VF2-NEXT: [[EC:%.*]] = icmp eq i64 [[IV_NEXT]], 2
+; VF2-NEXT: br i1 [[EC]], label %[[EXIT1]], label %[[EXIT]], !llvm.loop [[LOOP3:![0-9]+]]
+; VF2: [[EXIT1]]:
; VF2-NEXT: ret void
;
; VF4-LABEL: define void @load_store_interleave_group_tc_2(
@@ -205,7 +220,7 @@ define void @test_complex_add_float_tc_4(ptr %res, ptr noalias %A, ptr noalias %
; VF2-NEXT: store <4 x float> [[INTERLEAVED_VEC]], ptr [[TMP5]], align 4
; VF2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; VF2-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4
-; VF2-NEXT: br i1 [[TMP7]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; VF2-NEXT: br i1 [[TMP7]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; VF2: [[MIDDLE_BLOCK]]:
; VF2-NEXT: br label %[[EXIT:.*]]
; VF2: [[EXIT]]:
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-scalable.ll b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-s...
[truncated]
@llvm/pr-subscribers-llvm-transforms
; VF2IC2-NEXT: [[TMP3:%.*]] = mul i64 [[N_VEC]], 16
; VF2IC2-NEXT: [[TMP4:%.*]] = add i64 16, [[TMP3]]
@@ -154,7 +154,7 @@ define void @derived_pointer_ivs(ptr noalias %a, ptr noalias %b, ptr %end) {
; VF2-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; VF2-NEXT: br i1 [[FOUND_CONFLICT]], label %[[SCALAR_PH]], label %[[VECTOR_PH:.*]]
; VF2: [[VECTOR_PH]]:
-; VF2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 2
+; VF2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 1
; VF2-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP3]], [[N_MOD_VF]]
; VF2-NEXT: [[TMP9:%.*]] = mul i64 [[N_VEC]], 16
; VF2-NEXT: [[TMP10:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP9]]
@@ -170,8 +170,8 @@ define void @derived_pointer_ivs(ptr noalias %a, ptr noalias %b, ptr %end) {
; VF2-NEXT: [[WIDE_LOAD:%.*]] = load <2 x double>, ptr [[NEXT_GEP]], align 8
; VF2-NEXT: store <2 x double> [[WIDE_LOAD]], ptr [[NEXT_GEP7]], align 8
; VF2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 1
-; VF2-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; VF2-NEXT: br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; VF2-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; VF2-NEXT: br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; VF2: [[MIDDLE_BLOCK]]:
; VF2-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]
; VF2-NEXT: br i1 [[CMP_N]], [[EXIT:label %.*]], label %[[SCALAR_PH]]
@@ -203,7 +203,7 @@ define void @derived_pointer_ivs(ptr noalias %a, ptr noalias %b, ptr %end) {
; VF2IC2-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; VF2IC2-NEXT: br i1 [[FOUND_CONFLICT]], label %[[SCALAR_PH]], label %[[VECTOR_PH:.*]]
; VF2IC2: [[VECTOR_PH]]:
-; VF2IC2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 4
+; VF2IC2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 2
; VF2IC2-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP3]], [[N_MOD_VF]]
; VF2IC2-NEXT: [[TMP9:%.*]] = mul i64 [[N_VEC]], 16
; VF2IC2-NEXT: [[TMP10:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP9]]
@@ -324,9 +324,8 @@ define void @narrow_with_uniform_add_and_gep(ptr noalias %p) {
; VF2-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], 512
; VF2-NEXT: br i1 [[TMP5]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
; VF2: [[MIDDLE_BLOCK]]:
-; VF2-NEXT: br label %[[EXIT:.*]]
+; VF2-NEXT: br i1 true, [[EXIT1:label %.*]], label %[[EXIT:.*]]
; VF2: [[EXIT]]:
-; VF2-NEXT: ret void
;
; VF2IC2-LABEL: define void @narrow_with_uniform_add_and_gep(
; VF2IC2-SAME: ptr noalias [[P:%.*]]) {
@@ -350,9 +349,8 @@ define void @narrow_with_uniform_add_and_gep(ptr noalias %p) {
; VF2IC2-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 512
; VF2IC2-NEXT: br i1 [[TMP7]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
; VF2IC2: [[MIDDLE_BLOCK]]:
-; VF2IC2-NEXT: br label %[[EXIT:.*]]
+; VF2IC2-NEXT: br i1 true, [[EXIT1:label %.*]], label %[[EXIT:.*]]
; VF2IC2: [[EXIT]]:
-; VF2IC2-NEXT: ret void
;
; VF4-LABEL: define void @narrow_with_uniform_add_and_gep(
; VF4-SAME: ptr noalias [[P:%.*]]) {
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-metadata.ll b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-metadata.ll
index 442574f298c3f..773e6f0bfc4a5 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-metadata.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-metadata.ll
@@ -23,8 +23,8 @@ define void @load_store_interleave_group_with_metadata(ptr noalias %data) {
; VF2-NEXT: [[TMP2:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
; VF2-NEXT: br i1 [[TMP2]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; VF2: [[MIDDLE_BLOCK]]:
-; VF2-NEXT: br [[EXIT:label %.*]]
-; VF2: [[SCALAR_PH:.*:]]
+; VF2-NEXT: br i1 true, [[EXIT:label %.*]], label %[[SCALAR_PH:.*]]
+; VF2: [[SCALAR_PH]]:
;
entry:
br label %loop
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-remove-loop-region.ll b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-remove-loop-region.ll
index 7d77f2f6b5b9c..d4fb7f7e7685c 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-remove-loop-region.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-remove-loop-region.ll
@@ -22,8 +22,23 @@ define void @load_store_interleave_group_tc_2(ptr noalias %data) {
; VF2-NEXT: [[TMP2:%.*]] = icmp eq i64 [[INDEX_NEXT]], 2
; VF2-NEXT: br i1 [[TMP2]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; VF2: [[MIDDLE_BLOCK]]:
+; VF2-NEXT: br i1 true, label %[[EXIT1:.*]], label %[[SCALAR_PH:.*]]
+; VF2: [[SCALAR_PH]]:
; VF2-NEXT: br label %[[EXIT:.*]]
; VF2: [[EXIT]]:
+; VF2-NEXT: [[IV:%.*]] = phi i64 [ 2, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[EXIT]] ]
+; VF2-NEXT: [[MUL_2:%.*]] = shl nsw i64 [[IV]], 1
+; VF2-NEXT: [[DATA_0:%.*]] = getelementptr inbounds i64, ptr [[DATA]], i64 [[MUL_2]]
+; VF2-NEXT: [[L_0:%.*]] = load i64, ptr [[DATA_0]], align 8
+; VF2-NEXT: store i64 [[L_0]], ptr [[DATA_0]], align 8
+; VF2-NEXT: [[ADD_1:%.*]] = or disjoint i64 [[MUL_2]], 1
+; VF2-NEXT: [[DATA_1:%.*]] = getelementptr inbounds i64, ptr [[DATA]], i64 [[ADD_1]]
+; VF2-NEXT: [[L_1:%.*]] = load i64, ptr [[DATA_1]], align 8
+; VF2-NEXT: store i64 [[L_1]], ptr [[DATA_1]], align 8
+; VF2-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
+; VF2-NEXT: [[EC:%.*]] = icmp eq i64 [[IV_NEXT]], 2
+; VF2-NEXT: br i1 [[EC]], label %[[EXIT1]], label %[[EXIT]], !llvm.loop [[LOOP3:![0-9]+]]
+; VF2: [[EXIT1]]:
; VF2-NEXT: ret void
;
; VF4-LABEL: define void @load_store_interleave_group_tc_2(
@@ -205,7 +220,7 @@ define void @test_complex_add_float_tc_4(ptr %res, ptr noalias %A, ptr noalias %
; VF2-NEXT: store <4 x float> [[INTERLEAVED_VEC]], ptr [[TMP5]], align 4
; VF2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; VF2-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4
-; VF2-NEXT: br i1 [[TMP7]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; VF2-NEXT: br i1 [[TMP7]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; VF2: [[MIDDLE_BLOCK]]:
; VF2-NEXT: br label %[[EXIT:.*]]
; VF2: [[EXIT]]:
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-scalable.ll b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-s...
[truncated]
.../Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-derived-ivs.ll
After VPSymbolicValues (like VF and VFxUF) are materialized via replaceAllUsesWith, they should not be accessed again. This patch: 1. Tracks materialization state in VPSymbolicValue. 2. Asserts if the materialized VPValue is used again. Currently it adds asserts to various member functions, preventing calling them on materialized symbolic values. Note that this still allows some uses (e.g. comparing VPSymbolicValue references or pointers), but this should be relatively harmless given that it is impossible to (re-)add any users. If we want to further tighten the checks, we could add asserts to the accessors or override operator&, but that will require more changes and not add much extra guards I think. Depends on llvm#182146 to fix a current access violation (included in PR).
12df5ab to a5763ec
fhahn left a comment
Ping. Latest version should have unrelated test changes stripped + missed simplifications fixed
a5763ec to fcbf692
🪟 Windows x64 Test Results
✅ The build succeeded and all tests passed.
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed.
fcbf692 to 69c223d
%l.1 = load i64, ptr %data.1, align 8
%mul.1 = mul i64 %l.factor, %l.1
store i64 %mul.1, ptr %data.1, align 8
%data.2 = getelementptr inbounds { i64 , i64, i64, i64 }, ptr %data, i64 %iv, i32 2
nit: some odd spacing in all the structs passed to the geps. I think it should be { i64, i64, i64, i64 }
this should be fixed now, thanks
; CHECK-LABEL: define i64 @test_4xi64_induction_live_out(
; CHECK-SAME: ptr noalias [[DATA:%.*]], ptr noalias [[FACTOR:%.*]], i64 noundef [[N:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: [[ENTRY:.*]]:
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 16
Do you know why this min iters check is still 16? It only needs to be 4 to enter the vector loop I think. I appreciate this is probably outside the scope of this patch to fix though.
Yep, the check is currently still using the original VF. Will check how to also handle this as follow-up
}

; Test with induction live-out to verify exit value is correctly computed.
define i64 @test_4xi64_induction_live_out(ptr noalias %data, ptr noalias %factor, i64 noundef %n) {
Do we also have a test for a live-out where we would also create a vector epilogue? The incoming value for the epilogue middle block also needs to be correct.
Added tests with epilogue vectorization in a new file, one with and one without an IV live-out. IV live-outs are not yet supported by epilogue vectorization, but it should guard against regressions in the future, thanks
When narrowInterleaveGroups transforms a plan, VF and VFxUF are materialized (replaced with concrete values). This patch also materializes the VectorTripCount in the same transform. This ensures that VectorTripCount is properly computed when the narrow interleave transform is applied, instead of using the original VF + UF to compute the vector trip count. The previous behavior generated correct code, but executed fewer iterations in the vector loop. The change also enables stricter verification to prevent accesses of VPSymbolicValues after materialization as a follow-up. Note that in some cases we now miss branch folding, but that should be addressed separately: llvm#181252
69c223d to e748bc5
; CHECK: [[VECTOR_PH]]:
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
nit: Looks like the CHECK lines have gone wrong somehow after updating, but won't hold the patch up for it. Perhaps could be fixed in a follow-up?
ah yes, not sure why the update script renamed this, removed manually, thanks!
…ups. (#182146) When narrowInterleaveGroups transforms a plan, VF and VFxUF are materialized (replaced with concrete values). This patch also materializes the VectorTripCount in the same transform. This ensures that VectorTripCount is properly computed when the narrow interleave transform is applied, instead of using the original VF + UF to compute the vector trip count. The previous behavior generated correct code, but executed fewer iterations in the vector loop. The change also enables stricter verification to prevent accesses of UF, VF, VFxUF etc. after materialization as a follow-up. Note that in some cases we now miss branch folding, but that should be addressed separately: llvm/llvm-project#181252 Fixes one of the violations caused by accessing a VectorTripCount after UF and VF have been materialized. PR: llvm/llvm-project#182146
After VPSymbolicValues (like VF and VFxUF) are materialized via replaceAllUsesWith, they should not be accessed again. This patch: 1. Tracks materialization state in VPSymbolicValue. 2. Asserts if the materialized VPValue is used again. Currently it adds asserts to various member functions, preventing calling them on materialized symbolic values. Note that this still allows some uses (e.g. comparing VPSymbolicValue references or pointers), but this should be relatively harmless given that it is impossible to (re-)add any users. If we want to further tighten the checks, we could add asserts to the accessors or override operator&, but that will require more changes and not add much extra guards I think. Depends on #182146 to fix a current access violation. PR: #182318
Hi @fhahn, this commit (c79a058) is causing a large number of test failures in our codebase. The diversity of tests starting to fail makes me suspect there's a miscompilation (rather than UB in the code). I'm working on a reproducer, but it might take some time. Maybe you have some ideas on what could go wrong?
So far all the failures I've seen are coming from test runs on aarch64. Some of them are coming from this code: https://github.com/dart-lang/sdk/blob/a0faceddd764dc32edbd4265685776e2d885d514/runtime/vm/heap/scavenger.cc#L207
Repro for a crash:

; ModuleID = './reduced.ll'
source_filename = "./reduced.ll"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "aarch64-grtev4-linux-gnu"
define ptr @wombat(ptr %arg) #0 {
bb:
br label %bb1
bb1: ; preds = %bb1, %bb
%phi = phi ptr [ %getelementptr4, %bb1 ], [ %arg, %bb ]
store float 1.000000e+00, ptr %phi, align 4
%getelementptr = getelementptr i8, ptr %phi, i64 4
store float 1.000000e+00, ptr %getelementptr, align 4
%getelementptr2 = getelementptr i8, ptr %phi, i64 8
store float 1.000000e+00, ptr %getelementptr2, align 4
%getelementptr3 = getelementptr i8, ptr %phi, i64 12
store float 1.000000e+00, ptr %getelementptr3, align 4
%getelementptr4 = getelementptr i8, ptr %phi, i64 16
%getelementptr5 = getelementptr i8, ptr %arg, i64 16000
%icmp = icmp eq ptr %getelementptr4, %getelementptr5
br i1 %icmp, label %bb6, label %bb1
bb6: ; preds = %bb1
ret ptr null
}
attributes #0 = { "tune-cpu"="neoverse-v2" }

It doesn't revert cleanly and the follow-up patch that it conflicts with fixes assertion failures/miscompiles. I think we can wait until someone else is able to tease apart the issues here.
The crash should be fixed by #187016. It's a bit late here, so I won't be able to do the manual revert until tomorrow. @AlexHF would it be possible to share the input IR? The diff with/without the commit may be enough to figure out what the issue is. I should be able to do that easily, given the input IR.
The closest I could get so far is the full IR of the translation unit: https://gcc.godbolt.org/z/jM8r6b1he The problem is in the ScavengerVisitor::VisitPointers method (https://github.com/dart-lang/sdk/blob/a0faceddd764dc32edbd4265685776e2d885d514/runtime/vm/heap/scavenger.cc#L207), i.e. when I place |
#187016 also fixes the miscompilations we've seen so far.
When narrowInterleaveGroups transforms a plan, VF and VFxUF are materialized (replaced with concrete values). This patch also materializes the VectorTripCount in the same transform.
This ensures that VectorTripCount is properly computed when the narrow interleave transform is applied, instead of using the original VF + UF to compute the vector trip count. The previous behavior generated correct code, but executed fewer iterations in the vector loop.
The change also enables stricter verification to prevent accesses of UF, VF, VFxUF etc. after materialization as a follow-up.
Note that in some cases we now miss branch folding, but that should be addressed separately: #181252
Fixes one of the violations caused by accessing a VectorTripCount after UF and VF have been materialized.