[LV] Consider whether vscale is a known power of two for iteration check#144963
[LV] Consider whether vscale is a known power of two for iteration check#144963
Conversation
Going mostly by the comment here - but it says "vscale is not necessarily a power-of-2". Both in tree targets have vscale as a power of two, and we have an existing TTI hook for that.
|
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-vectorizers Author: Philip Reames (preames) ChangesGoing mostly by the comment here - but it says "vscale is not necessarily a power-of-2". Both in tree targets have vscale as a power of two, and we have an existing TTI hook for that. Patch is 74.93 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144963.diff 19 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 2f4416d2782e8..fd23e438553ee 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2434,7 +2434,7 @@ Value *InnerLoopVectorizer::createIterationCountCheck(ElementCount VF,
// check is known to be true, or known to be false.
CheckMinIters = Builder.CreateICmp(P, Count, Step, "min.iters.check");
} // else step known to be < trip count, use CheckMinIters preset to false.
- } else if (VF.isScalable() &&
+ } else if (VF.isScalable() && !TTI->isVScaleKnownToBeAPowerOfTwo() &&
!isIndvarOverflowCheckKnownFalse(Cost, VF, UF) &&
Style != TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck) {
// vscale is not necessarily a power-of-2, which means we cannot guarantee
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll b/llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll
index 71d03afa6b6f1..8a28196fd4f7e 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll
@@ -54,11 +54,7 @@ define void @simple_memset_tailfold(i32 %val, ptr %ptr, i64 %n) "target-features
; DATA-LABEL: @simple_memset_tailfold(
; DATA-NEXT: entry:
; DATA-NEXT: [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 [[N:%.*]], i64 1)
-; DATA-NEXT: [[TMP0:%.*]] = sub i64 -1, [[UMAX]]
-; DATA-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; DATA-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; DATA-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; DATA-NEXT: br i1 [[TMP3]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; DATA-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; DATA: vector.ph:
; DATA-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; DATA-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -98,11 +94,7 @@ define void @simple_memset_tailfold(i32 %val, ptr %ptr, i64 %n) "target-features
; DATA_NO_LANEMASK-LABEL: @simple_memset_tailfold(
; DATA_NO_LANEMASK-NEXT: entry:
; DATA_NO_LANEMASK-NEXT: [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 [[N:%.*]], i64 1)
-; DATA_NO_LANEMASK-NEXT: [[TMP0:%.*]] = sub i64 -1, [[UMAX]]
-; DATA_NO_LANEMASK-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; DATA_NO_LANEMASK-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; DATA_NO_LANEMASK-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; DATA_NO_LANEMASK-NEXT: br i1 [[TMP3]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; DATA_NO_LANEMASK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; DATA_NO_LANEMASK: vector.ph:
; DATA_NO_LANEMASK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; DATA_NO_LANEMASK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -150,11 +142,7 @@ define void @simple_memset_tailfold(i32 %val, ptr %ptr, i64 %n) "target-features
; DATA_AND_CONTROL-LABEL: @simple_memset_tailfold(
; DATA_AND_CONTROL-NEXT: entry:
; DATA_AND_CONTROL-NEXT: [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 [[N:%.*]], i64 1)
-; DATA_AND_CONTROL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[UMAX]]
-; DATA_AND_CONTROL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; DATA_AND_CONTROL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; DATA_AND_CONTROL-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; DATA_AND_CONTROL-NEXT: br i1 [[TMP3]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; DATA_AND_CONTROL-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; DATA_AND_CONTROL: vector.ph:
; DATA_AND_CONTROL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; DATA_AND_CONTROL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll b/llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
index 8e90287bac2a2..e85d5579bb332 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
@@ -344,16 +344,12 @@ define i32 @smin(ptr %a, i64 %n, i32 %start) {
;
; IF-EVL-OUTLOOP-LABEL: @smin(
; IF-EVL-OUTLOOP-NEXT: entry:
-; IF-EVL-OUTLOOP-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N:%.*]]
-; IF-EVL-OUTLOOP-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-OUTLOOP-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-OUTLOOP-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; IF-EVL-OUTLOOP-NEXT: br i1 [[TMP3]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; IF-EVL-OUTLOOP-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; IF-EVL-OUTLOOP: vector.ph:
; IF-EVL-OUTLOOP-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-OUTLOOP-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
; IF-EVL-OUTLOOP-NEXT: [[TMP6:%.*]] = sub i64 [[TMP5]], 1
-; IF-EVL-OUTLOOP-NEXT: [[N_RND_UP:%.*]] = add i64 [[N]], [[TMP6]]
+; IF-EVL-OUTLOOP-NEXT: [[N_RND_UP:%.*]] = add i64 [[N:%.*]], [[TMP6]]
; IF-EVL-OUTLOOP-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP5]]
; IF-EVL-OUTLOOP-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
; IF-EVL-OUTLOOP-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
@@ -401,16 +397,12 @@ define i32 @smin(ptr %a, i64 %n, i32 %start) {
;
; IF-EVL-INLOOP-LABEL: @smin(
; IF-EVL-INLOOP-NEXT: entry:
-; IF-EVL-INLOOP-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N:%.*]]
-; IF-EVL-INLOOP-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-INLOOP-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-INLOOP-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; IF-EVL-INLOOP-NEXT: br i1 [[TMP3]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; IF-EVL-INLOOP-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; IF-EVL-INLOOP: vector.ph:
; IF-EVL-INLOOP-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-INLOOP-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
; IF-EVL-INLOOP-NEXT: [[TMP6:%.*]] = sub i64 [[TMP5]], 1
-; IF-EVL-INLOOP-NEXT: [[N_RND_UP:%.*]] = add i64 [[N]], [[TMP6]]
+; IF-EVL-INLOOP-NEXT: [[N_RND_UP:%.*]] = add i64 [[N:%.*]], [[TMP6]]
; IF-EVL-INLOOP-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP5]]
; IF-EVL-INLOOP-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
; IF-EVL-INLOOP-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll b/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll
index 3a929f3e43a73..a5f2f9f5345d6 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll
@@ -10,11 +10,7 @@ define void @type_info_cache_clobber(ptr %dstv, ptr %src, i64 %wide.trip.count)
; CHECK-SAME: ptr [[DSTV:%.*]], ptr [[SRC:%.*]], i64 [[WIDE_TRIP_COUNT:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: [[ENTRY:.*]]:
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[WIDE_TRIP_COUNT]], 1
-; CHECK-NEXT: [[TMP1:%.*]] = sub i64 -1, [[TMP0]]
-; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 8
-; CHECK-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP1]], [[TMP3]]
-; CHECK-NEXT: br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; CHECK: [[VECTOR_MEMCHECK]]:
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[DSTV]], i64 1
; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[WIDE_TRIP_COUNT]], 1
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-call-intrinsics.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-call-intrinsics.ll
index 3128e40144e30..a89c469921fe6 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-call-intrinsics.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-call-intrinsics.ll
@@ -16,12 +16,7 @@ define void @vp_smax(ptr %a, ptr %b, ptr %c, i64 %N) {
; IF-EVL-NEXT: [[C3:%.*]] = ptrtoint ptr [[C]] to i64
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 13, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP22:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP22]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -130,12 +125,7 @@ define void @vp_smin(ptr %a, ptr %b, ptr %c, i64 %N) {
; IF-EVL-NEXT: [[C3:%.*]] = ptrtoint ptr [[C]] to i64
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 13, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP22:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP22]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -244,12 +234,7 @@ define void @vp_umax(ptr %a, ptr %b, ptr %c, i64 %N) {
; IF-EVL-NEXT: [[C3:%.*]] = ptrtoint ptr [[C]] to i64
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 13, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP22:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP22]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -358,12 +343,7 @@ define void @vp_umin(ptr %a, ptr %b, ptr %c, i64 %N) {
; IF-EVL-NEXT: [[C3:%.*]] = ptrtoint ptr [[C]] to i64
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 13, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP22:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP22]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -472,11 +452,7 @@ define void @vp_ctlz(ptr %a, ptr %b, i64 %N) {
; IF-EVL-NEXT: [[ENTRY:.*]]:
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; IF-EVL-NEXT: br i1 [[TMP3]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -571,11 +547,7 @@ define void @vp_cttz(ptr %a, ptr %b, i64 %N) {
; IF-EVL-NEXT: [[ENTRY:.*]]:
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; IF-EVL-NEXT: br i1 [[TMP3]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -670,12 +642,7 @@ define void @vp_lrint(ptr %a, ptr %b, i64 %N) {
; IF-EVL-NEXT: [[ENTRY:.*]]:
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 9, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP22:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP22]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -778,12 +745,7 @@ define void @vp_llrint(ptr %a, ptr %b, i64 %N) {
; IF-EVL-NEXT: [[ENTRY:.*]]:
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 9, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP22:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP22]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -886,12 +848,7 @@ define void @vp_abs(ptr %a, ptr %b, i64 %N) {
; IF-EVL-NEXT: [[ENTRY:.*]]:
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 8, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP19:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP19]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll
index 50b600f8e8bda..0672df2b3d80c 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll
@@ -13,12 +13,7 @@ define void @vp_sext(ptr %a, ptr %b, i64 %N) {
; IF-EVL-LABEL: define void @vp_sext(
; IF-EVL-SAME: ptr [[A:%.*]], ptr [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
; IF-EVL-NEXT: [[ENTRY:.*]]:
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 2
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 18, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP5:%.*]] = shl i64 [[N]], 3
; IF-EVL-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP5]]
@@ -112,12 +107,7 @@ define void @vp_zext(ptr %a, ptr %b, i64 %N) {
; IF-EVL-LABEL: define void @vp_zext(
; IF-EVL-SAME: ptr [[A:%.*]], ptr [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
; IF-EVL-NEXT: [[ENTRY:.*]]:
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 2
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 18, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP5:%.*]] = shl i64 [[N]], 3
; IF-EVL-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP5]]
@@ -211,12 +201,7 @@ define void @vp_trunc(ptr %a, ptr %b, i64 %N) {
; IF-EVL-LABEL: define void @vp_trunc(
; IF-EVL-SAME: ptr [[A:%.*]], ptr [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
; IF-EVL-NEXT: [[ENTRY:.*]]:
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 2
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 18, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP5:%.*]] = shl i64 [[N]], 2
; IF-EVL-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP5]]
@@ -310,12 +295,7 @@ define void @vp_fpext(ptr %a, ptr %b, i64 %N) {
; IF-EVL-LABEL: define void @vp_fpext(
; IF-EVL-SAME: ptr [[A:%.*]], ptr [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
; IF-EVL-NEXT: [[ENTRY:.*]]:
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 2
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 14, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP5:%.*]] = shl i64 [[N]], 3
; IF-EVL-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP5]]
@@ -409,12 +389,7 @@ define void @vp_fptrunc(ptr %a, ptr %b, i64 %N) {
; IF-EVL-LABEL: define void @vp_fptrunc(
; IF-EVL-SAME: ptr [[A:%.*]], ptr [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
; IF-EVL-NEXT: [[ENTRY:.*]]:
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 2
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 14, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:...
[truncated]
|
| CheckMinIters = Builder.CreateICmp(P, Count, Step, "min.iters.check"); | ||
| } // else step known to be < trip count, use CheckMinIters preset to false. | ||
| } else if (VF.isScalable() && | ||
| } else if (VF.isScalable() && !TTI->isVScaleKnownToBeAPowerOfTwo() && |
There was a problem hiding this comment.
Do you know if the code here is now exercised at all in any tests? I understand the rationale for adding the check, but I'm a bit worried that there is no longer anything to defend it.
There was a problem hiding this comment.
I think this should hopefully be covered by tests using -force-target-supports-scalable-vectors=true -scalable-vectorization=on w/o target. @preames would be good to double check that's indeed the case before landing.
There was a problem hiding this comment.
I just did some checking and I couldn't find any tests that cover this, we should probably add one then.
| CheckMinIters = Builder.CreateICmp(P, Count, Step, "min.iters.check"); | ||
| } // else step known to be < trip count, use CheckMinIters preset to false. | ||
| } else if (VF.isScalable() && | ||
| } else if (VF.isScalable() && !TTI->isVScaleKnownToBeAPowerOfTwo() && |
There was a problem hiding this comment.
I think this should hopefully be covered by tests using -force-target-supports-scalable-vectors=true -scalable-vectorization=on w/o target. @preames would be good to double check that's indeed the case before landing.
lukel97
left a comment
There was a problem hiding this comment.
LGTM if we can add a test for that code path
| CheckMinIters = Builder.CreateICmp(P, Count, Step, "min.iters.check"); | ||
| } // else step known to be < trip count, use CheckMinIters preset to false. | ||
| } else if (VF.isScalable() && | ||
| } else if (VF.isScalable() && !TTI->isVScaleKnownToBeAPowerOfTwo() && |
There was a problem hiding this comment.
I just did some checking and I couldn't find any tests that cover this, we should probably add one then.
|
I don't believe this is possible to test on a target without scalable vector support. I tried, but the costs returned are all invalid, and thus we refuse to vectorize entirely. (Which seems like the correct result) Would folks rather I land this with the code path untested, or convert the check to an assertion? |
Does |
Nope, I continue to get "remark: :0:0: UserVF ignored because of invalid costs.". My attempt in case anyone has suggestions: |
Fair enough, I don't have any better ideas. I'm fine to convert it to an assertion or to leave it untested |
|
If it really is impossible to test, then surely we should just delete the entire block of code to avoid having to maintain something that's impossible to test? What do you think @fhahn? |
+1, I'm thinking that the assertion would just be |
Thinking about this now, the problem may be because the target also has to support masked loads and stores. I'm going to see what happens if I can vectorise a loop with memory ops. |
|
I took an existing test from Transforms/LoopVectorize/AArch64/tail-folding-styles.ll and modified it: I then ran opt and it seems to now hit that code path. It might be a little fragile because there are no memory ops in there and in future I can imagine we might decide there are no vector operations in the loop and abandon vectorisation early on. I think the problem is finding a way to force the target also supporting masked loads and stores. |
|
If the load is guaranteed to be dereferenceable it will be treated as a normal load, so this also should work: |
|
I had just pushed a change which turned this into an assert instead, but am happy to go with whatever reviewers prefer. Honestly, I'm wondering if we should just update LangRef to require vscale to be a power of two. If a target comes along for which this property doesn't hold, we could just relax it then. |
|
The last time this was discussed the conclusion was that if we specify vscale must be a power-of-two then realistically we can never walk it back because it'll be too hard to find the places that rely on the assumption. The compromise was to change the definition of |
I am increasingly thinking this was the wrong decision. The only hardware we know of has power-of-two sizes, and keeping complexity for a potential future use case is generally undesirable. Finding the places that assume power of two again (if ever needed) doesn't seem that hard - compared to implementing any reasonable complicated feature at least. |
This reverts commit d4351da.
|
I landed this in the original form (i.e. making the check conditional) with the suggested test. I'm going to post a separate patch to continue the discussion on whether vscale should just be defined as a power of two. |
|
See #145098 for the followup discussion. |
…heck Previously, the canonical IV increment may have overflowed to a non-zero value due to vscale being a non power-of-two. So we used to emit a runtime check for this. If you didn't want the runtime check, DataAndControlFlowWithoutRuntimeCheck skipped it and instead tweaked the trip count so it wouldn't overflow. However llvm#144963 stopped the check from ever being emitted (and in llvm#183292 the code to emit the check was removed), but we never restored the trip count back to normal now that it was no longer needed. This PR restores the trip count since we don't need to adjust it. A follow up NFC can then remove DataAndControlFlowWithoutRuntimeCheck.
Stacked on llvm#183729 After llvm#144963 and llvm#183292 we never emit the runtime check, so DataAndControlFlowWithoutRuntimeCheck is equivalent to DataAndControlFlow. With that we only need to store one tail folding style instead of two, because we don't need to distinguish whether or not the IV update overflows (to a non-zero value)
…heck (#183729) Previously, the canonical IV increment may have overflowed to a non-zero value due to vscale being a non power-of-two. So we used to emit a runtime check for this. If you didn't want the runtime check, DataAndControlFlowWithoutRuntimeCheck skipped it and instead tweaked the trip count so it wouldn't overflow. However #144963 stopped the check from ever being emitted because vscale is always a power-of-two on AArch64 and RISC-V, so it never overflowed to a non-zero value. And in #183292 the code to emit the check was removed. But we never restored the trip count back to normal when the target's vscale was a power-of-two. Now that vscale is always a power-of-two, this PR avoids adjusting it. A follow up NFC can then remove DataAndControlFlowWithoutRuntimeCheck.
Stacked on llvm#183729 After llvm#144963 and llvm#183292 we never emit the runtime check, so DataAndControlFlowWithoutRuntimeCheck is equivalent to DataAndControlFlow. With that we only need to store one tail folding style instead of two, because we don't need to distinguish whether or not the IV update overflows (to a non-zero value)
After #144963 and #183292 we never emit the runtime check, so DataAndControlFlowWithoutRuntimeCheck is equivalent to DataAndControlFlow. With that we only need to store one tail folding style instead of two, because we don't need to distinguish whether or not the IV update overflows (to a non-zero value)
After llvm#144963 and llvm#183292 we never emit the runtime check, so DataAndControlFlowWithoutRuntimeCheck is equivalent to DataAndControlFlow. With that we only need to store one tail folding style instead of two, because we don't need to distinguish whether or not the IV update overflows (to a non-zero value)
…heck (llvm#183729) Previously, the canonical IV increment may have overflowed to a non-zero value due to vscale being a non power-of-two. So we used to emit a runtime check for this. If you didn't want the runtime check, DataAndControlFlowWithoutRuntimeCheck skipped it and instead tweaked the trip count so it wouldn't overflow. However llvm#144963 stopped the check from ever being emitted because vscale is always a power-of-two on AArch64 and RISC-V, so it never overflowed to a non-zero value. And in llvm#183292 the code to emit the check was removed. But we never restored the trip count back to normal when the target's vscale was a power-of-two. Now that vscale is always a power-of-two, this PR avoids adjusting it. A follow up NFC can then remove DataAndControlFlowWithoutRuntimeCheck.
After llvm#144963 and llvm#183292 we never emit the runtime check, so DataAndControlFlowWithoutRuntimeCheck is equivalent to DataAndControlFlow. With that we only need to store one tail folding style instead of two, because we don't need to distinguish whether or not the IV update overflows (to a non-zero value)
…heck (llvm#183729) Previously, the canonical IV increment may have overflowed to a non-zero value due to vscale being a non power-of-two. So we used to emit a runtime check for this. If you didn't want the runtime check, DataAndControlFlowWithoutRuntimeCheck skipped it and instead tweaked the trip count so it wouldn't overflow. However llvm#144963 stopped the check from ever being emitted because vscale is always a power-of-two on AArch64 and RISC-V, so it never overflowed to a non-zero value. And in llvm#183292 the code to emit the check was removed. But we never restored the trip count back to normal when the target's vscale was a power-of-two. Now that vscale is always a power-of-two, this PR avoids adjusting it. A follow up NFC can then remove DataAndControlFlowWithoutRuntimeCheck.
After llvm#144963 and llvm#183292 we never emit the runtime check, so DataAndControlFlowWithoutRuntimeCheck is equivalent to DataAndControlFlow. With that we only need to store one tail folding style instead of two, because we don't need to distinguish whether or not the IV update overflows (to a non-zero value)
Going mostly by the comment here - but it says "vscale is not necessarily a power-of-2". Both in tree targets have vscale as a power of two, and we have an existing TTI hook for that.