-
Notifications
You must be signed in to change notification settings - Fork 16k
[VPlan] Use unsigned integers for lane start indices #175231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VPlan] Use unsigned integers for lane start indices #175231
Conversation
|
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Aiden Grossman (boomanaiden154) Changesa83c894 caused assertion failures here as if we have a single bit induction variable and two lanes (0 and 1), then the second lane index (1) will be out of bounds of what a signed 1-bit integer can hold. Lane indices are always >0 according to VPlanHelpers.h:125, and the lane representation in this code is also unsigned. The test case come from tensorflow/XLA. Full diff: https://github.com/llvm/llvm-project/pull/175231.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index b9cd322d9ec69..e31e24fce5316 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -2346,9 +2346,8 @@ InstructionCost VPHeaderPHIRecipe::computeCost(ElementCount VF,
/// A helper function that returns an integer or floating-point constant with
/// value C.
-static Constant *getSignedIntOrFpConstant(Type *Ty, int64_t C) {
- return Ty->isIntegerTy() ? ConstantInt::getSigned(Ty, C)
- : ConstantFP::get(Ty, C);
+static Constant *getUnsignedIntOrFpConstant(Type *Ty, int64_t C) {
+ return Ty->isIntegerTy() ? ConstantInt::get(Ty, C) : ConstantFP::get(Ty, C);
}
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
@@ -2453,7 +2452,7 @@ void VPScalarIVStepsRecipe::execute(VPTransformState &State) {
for (unsigned Lane = StartLane; Lane < EndLane; ++Lane) {
Value *StartIdx = Builder.CreateBinOp(
- AddOp, StartIdx0, getSignedIntOrFpConstant(BaseIVTy, Lane));
+ AddOp, StartIdx0, getUnsignedIntOrFpConstant(BaseIVTy, Lane));
// The step returned by `createStepForVF` is a runtime-evaluated value
// when VF is scalable. Otherwise, it should be folded into a Constant.
assert((State.VF.isScalable() || isa<Constant>(StartIdx)) &&
diff --git a/llvm/test/Transforms/LoopVectorize/X86/vplan-single-bit-ind-var.ll b/llvm/test/Transforms/LoopVectorize/X86/vplan-single-bit-ind-var.ll
new file mode 100644
index 0000000000000..d501a485ac4db
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/X86/vplan-single-bit-ind-var.ll
@@ -0,0 +1,47 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt -passes=loop-vectorize -mcpu=znver3 -S %s 2>&1 | FileCheck %s
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-grtev4-linux-gnu"
+
+define ptr @copy_bitcast_fusion() {
+; CHECK-LABEL: define ptr @copy_bitcast_fusion(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: [[_PREHEADER7:.*:]]
+; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
+; CHECK: [[VECTOR_PH]]:
+; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
+; CHECK: [[VECTOR_BODY]]:
+; CHECK-NEXT: [[TMP0:%.*]] = select i1 false, i64 1, i64 0
+; CHECK-NEXT: [[TMP1:%.*]] = select i1 true, i64 1, i64 0
+; CHECK-NEXT: [[TMP2:%.*]] = getelementptr float, ptr null, i64 [[TMP0]]
+; CHECK-NEXT: [[TMP3:%.*]] = getelementptr float, ptr null, i64 [[TMP1]]
+; CHECK-NEXT: [[TMP4:%.*]] = load float, ptr [[TMP2]], align 4
+; CHECK-NEXT: [[TMP5:%.*]] = load float, ptr [[TMP3]], align 4
+; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> poison, float [[TMP4]], i32 0
+; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> [[TMP6]], float [[TMP5]], i32 1
+; CHECK-NEXT: store <2 x float> [[TMP7]], ptr null, align 4
+; CHECK-NEXT: br label %[[MIDDLE_BLOCK:.*]]
+; CHECK: [[MIDDLE_BLOCK]]:
+; CHECK-NEXT: br label %[[BB8:.*]]
+; CHECK: [[BB8]]:
+; CHECK-NEXT: ret ptr null
+;
+.preheader7:
+ br label %0
+
+0: ; preds = %0, %.preheader7
+ %1 = phi i64 [ 0, %.preheader7 ], [ %7, %0 ]
+ %2 = trunc i64 %1 to i1
+ %3 = select i1 %2, i64 1, i64 0
+ %4 = getelementptr float, ptr null, i64 %3
+ %5 = load float, ptr %4, align 4
+ %6 = getelementptr float, ptr null, i64 %1
+ store float %5, ptr %6, align 4
+ %7 = add i64 %1, 1
+ %exitcond.not = icmp eq i64 %1, 1
+ br i1 %exitcond.not, label %8, label %0
+
+8: ; preds = %0
+ ret ptr null
+}
|
llvm/test/Transforms/LoopVectorize/X86/vplan-single-bit-ind-var.ll
Outdated
Show resolved
Hide resolved
llvm/test/Transforms/LoopVectorize/X86/vplan-single-bit-ind-var.ll
Outdated
Show resolved
Hide resolved
llvm/test/Transforms/LoopVectorize/X86/vplan-single-bit-ind-var.ll
Outdated
Show resolved
Hide resolved
llvm/test/Transforms/LoopVectorize/X86/vplan-single-bit-ind-var.ll
Outdated
Show resolved
Hide resolved
a83c894 caused assertion failures here as if we have a single bit induction variable and two lanes (0 and 1), then the second lane index (1) will be out of bounds of what a signed 1-bit integer can hold. Lane indices are always >0 according to VPlanHelpers.h:125, and the lane representation in this code is also unsigned. The test case come from tensorflow/XLA.
ae8ddd4 to
c1e7621
Compare
|
(sorry for the force push. I was amending, forgetting I wasn't using |
nikic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Thank you very much for the fast review! Looks you already got the last issue we were seeing (1c07ba6). |
a83c894 caused assertion failures here as if we have a single bit induction variable and two lanes (0 and 1), then the second lane index (1) will be out of bounds of what a signed 1-bit integer can hold. Lane indices are always >0 according to VPlanHelpers.h:125, and the lane representation in this code is also unsigned. The test case come from tensorflow/XLA.
a83c894 caused assertion failures here as if we have a single bit induction variable and two lanes (0 and 1), then the second lane index (1) will be out of bounds of what a signed 1-bit integer can hold. Lane indices are always >0 according to VPlanHelpers.h:125, and the lane representation in this code is also unsigned.
The test case come from tensorflow/XLA.