[InterleavedAccessPass] Get round the unsupported large scalable vectors #88643
Conversation
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-llvm-transforms

Author: Allen (vfdff)

Changes: When building with the option -msve-vector-bits=512, the return value of Subtarget->getMinSVEVectorSizeInBits() is 512. Fix #88247

Full diff: https://github.com/llvm/llvm-project/pull/88643.diff

2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index f552f91929201c..24ce3a8121500d 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -16433,7 +16433,9 @@ bool AArch64TargetLowering::lowerDeinterleaveIntrinsicToLoad(
if (UseScalable && !VTy->isScalableTy())
return false;
- unsigned NumLoads = getNumInterleavedAccesses(VTy, DL, UseScalable);
+ // Work around the missing splitting of large scalable vectors.
+ // TODO: Support custom splitting of large scalable vectors.
+ unsigned NumLoads = getNumInterleavedAccesses(VTy, DL, false);
VectorType *LdTy =
VectorType::get(VTy->getElementType(),
diff --git a/llvm/test/Transforms/InterleavedAccess/AArch64/sve-interleaved-accesses.ll b/llvm/test/Transforms/InterleavedAccess/AArch64/sve-interleaved-accesses.ll
index feb22aa1a37635..d0f773a6174d24 100644
--- a/llvm/test/Transforms/InterleavedAccess/AArch64/sve-interleaved-accesses.ll
+++ b/llvm/test/Transforms/InterleavedAccess/AArch64/sve-interleaved-accesses.ll
@@ -491,6 +491,31 @@ define void @store_bfloat_factor2(ptr %ptr, <16 x bfloat> %v0, <16 x bfloat> %v1
ret void
}
+; Use 4 valid llvm.vector.insert.nxv4f64.nxv2f64
+define { <vscale x 4 x double>, <vscale x 4 x double> } @deinterleave_nxptr_factor2(ptr nocapture readonly %ptr) #2 {
+; CHECK-LABEL: define { <vscale x 4 x double>, <vscale x 4 x double> } @deinterleave_nxptr_factor2(
+; CHECK-SAME: ptr nocapture readonly [[PTR:%.*]]) #[[ATTR2]] {
+; CHECK-NEXT: [[TMP1:%.*]] = getelementptr <vscale x 2 x double>, ptr [[PTR]], i64 0
+; CHECK-NEXT: [[LDN1:%.*]] = call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.aarch64.sve.ld2.sret.nxv2f64(<vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), ptr [[TMP1]])
+; CHECK-NEXT: [[TMP2:%.*]] = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } [[LDN1]], 0
+; CHECK-NEXT: [[TMP3:%.*]] = call <vscale x 4 x double> @llvm.vector.insert.nxv4f64.nxv2f64(<vscale x 4 x double> poison, <vscale x 2 x double> [[TMP2]], i64 0)
+; CHECK-NEXT: [[TMP4:%.*]] = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } [[LDN1]], 1
+; CHECK-NEXT: [[TMP5:%.*]] = call <vscale x 4 x double> @llvm.vector.insert.nxv4f64.nxv2f64(<vscale x 4 x double> poison, <vscale x 2 x double> [[TMP4]], i64 0)
+; CHECK-NEXT: [[TMP6:%.*]] = getelementptr <vscale x 2 x double>, ptr [[PTR]], i64 2
+; CHECK-NEXT: [[LDN2:%.*]] = call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.aarch64.sve.ld2.sret.nxv2f64(<vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), ptr [[TMP6]])
+; CHECK-NEXT: [[TMP7:%.*]] = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } [[LDN2]], 0
+; CHECK-NEXT: [[TMP8:%.*]] = call <vscale x 4 x double> @llvm.vector.insert.nxv4f64.nxv2f64(<vscale x 4 x double> [[TMP3]], <vscale x 2 x double> [[TMP7]], i64 2)
+; CHECK-NEXT: [[TMP9:%.*]] = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } [[LDN2]], 1
+; CHECK-NEXT: [[TMP10:%.*]] = call <vscale x 4 x double> @llvm.vector.insert.nxv4f64.nxv2f64(<vscale x 4 x double> [[TMP5]], <vscale x 2 x double> [[TMP9]], i64 2)
+; CHECK-NEXT: [[TMP11:%.*]] = insertvalue { <vscale x 4 x double>, <vscale x 4 x double> } poison, <vscale x 4 x double> [[TMP8]], 0
+; CHECK-NEXT: [[TMP12:%.*]] = insertvalue { <vscale x 4 x double>, <vscale x 4 x double> } [[TMP11]], <vscale x 4 x double> [[TMP10]], 1
+; CHECK-NEXT: ret { <vscale x 4 x double>, <vscale x 4 x double> } [[TMP12]]
+;
+ %wide.vec = load <vscale x 8 x double>, ptr %ptr, align 8
+ %ldN = tail call { <vscale x 4 x double>, <vscale x 4 x double> } @llvm.experimental.vector.deinterleave2.nxv8f64(<vscale x 8 x double> %wide.vec)
+ ret { <vscale x 4 x double>, <vscale x 4 x double> } %ldN
+}
+
attributes #0 = { vscale_range(2,2) "target-features"="+sve" }
attributes #1 = { vscale_range(2,4) "target-features"="+sve" }
attributes #2 = { vscale_range(4,4) "target-features"="+sve" }
Force-pushed from 86f56f5 to 5fd76f1
unsigned NumLoads = getNumInterleavedAccesses(VTy, DL, UseScalable);
// Work around the missing splitting of large scalable vectors.
// TODO: Support custom splitting of large scalable vectors.
unsigned NumLoads = getNumInterleavedAccesses(VTy, DL, false);
This looks like it would break the SVE fixed-length support. Looking at the code, I think changing

if (UseScalable)

to

if (UseScalable && isa<FixedVectorType>(VecTy))

should do the trick, given getMinSVEVectorSizeInBits() is not relevant for scalable vector types and the default 128 value should be maintained.
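A minimal sketch of how that guard might sit inside getNumInterleavedAccesses(); the surrounding body is an assumption reconstructed from this discussion, with only the isa<FixedVectorType> condition taken from the suggestion above:

```cpp
// Sketch (assumed shape) of AArch64TargetLowering::getNumInterleavedAccesses.
// It returns how many legal ldN/stN operations are needed to cover VecTy.
unsigned AArch64TargetLowering::getNumInterleavedAccesses(
    VectorType *VecTy, const DataLayout &DL, bool UseScalable) const {
  unsigned VecSize = 128; // SVE's architectural 128-bit granule.
  unsigned ElSize = DL.getTypeSizeInBits(VecTy->getElementType());
  unsigned MinElts = VecTy->getElementCount().getKnownMinValue();
  // The suggested fix: -msve-vector-bits (via getMinSVEVectorSizeInBits())
  // only constrains fixed-length vectors, so consult it for those alone and
  // keep the default 128 for scalable types.
  if (UseScalable && isa<FixedVectorType>(VecTy))
    VecSize = std::max(Subtarget->getMinSVEVectorSizeInBits(), 128u);
  return std::max<unsigned>(1, (MinElts * ElSize + VecSize - 1) / VecSize);
}
```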
Applied your comment, thanks @paulwalker-arm
Sorry, I should have been clearer. I meant the code within getNumInterleavedAccesses() needs to be changed, because that is where the bug exists.
Done, thanks
@@ -491,6 +491,30 @@ define void @store_bfloat_factor2(ptr %ptr, <16 x bfloat> %v0, <16 x bfloat> %v1
ret void
}

; Use 4 valid llvm.vector.insert.nxv4f64.nxv2f64
This is effectively a negative test so perhaps something like "Ensure vscale_range property does not affect scalable vector types."?
Updated the comment, thanks
When building with the option -msve-vector-bits=512, the return value of Subtarget->getMinSVEVectorSizeInBits() is 512, while MinElts is still 4 for <vscale x 4 x double> in getNumInterleavedAccesses. This creates an invalid llvm.aarch64.sve.ld2.sret.nxv4f64, which would need to be split, but the related custom splitting is not supported yet.
Fix #88247
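For concreteness, a standalone sketch of the arithmetic described above; the round-up formula is an assumption about getNumInterleavedAccesses(), and the numbers come from the deinterleave_nxptr_factor2 test:

```cpp
#include <algorithm>
#include <cassert>

// Assumed formula: known-minimum vector bits divided by the per-access
// vector size, rounded up, and at least one access.
unsigned numInterleavedAccesses(unsigned MinElts, unsigned ElSize,
                                unsigned VecSize) {
  return std::max(1u, (MinElts * ElSize + VecSize - 1) / VecSize);
}

int main() {
  // Each deinterleaved result is <vscale x 4 x double>:
  // MinElts = 4, ElSize = 64, i.e. 256 known-minimum bits.
  // Buggy path: -msve-vector-bits=512 leaked into the scalable case, so a
  // single ld2 on <vscale x 4 x double> was emitted -- not a legal SVE ld2.
  assert(numInterleavedAccesses(4, 64, /*VecSize=*/512) == 1);
  // Fixed path: scalable types keep the 128-bit granule, splitting the
  // access into the two nxv2f64 ld2 calls the test's CHECK lines expect.
  assert(numInterleavedAccesses(4, 64, /*VecSize=*/128) == 2);
  return 0;
}
```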