[VP][RISCV] Introduce vp.splat and RISC-V. #98731
This patch introduces a vp intrinsic for splat. It's helpful for IR-level passes to create a splat with specific vector length.
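To make the intended lane-wise behaviour concrete, here is a minimal reference model of the semantics described in the LangRef text of this patch. It is only a sketch: `vpSplatModel` and `Lane` are hypothetical names, `std::nullopt` stands in for a poison lane, and the `evl <= W` precondition is assumed from the general rules for VP intrinsics rather than stated in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model type: std::nullopt represents a poison lane.
template <typename T>
using Lane = std::optional<T>;

// Reference model of a predicated splat. Lane i holds `value` only when
// i < evl and mask[i] is set; every other lane stays poison.
template <typename T>
std::vector<Lane<T>> vpSplatModel(T value, const std::vector<bool> &mask,
                                  std::uint32_t evl) {
  // Assumed precondition: evl must not exceed the number of lanes.
  assert(evl <= mask.size());
  std::vector<Lane<T>> result(mask.size());  // all lanes start as poison
  for (std::size_t i = 0; i < mask.size(); ++i)
    if (i < evl && mask[i])
      result[i] = value;
  return result;
}
```

Because the inactive lanes are poison, a backend is free to fill them with anything, including the splat value itself.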
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-llvm-selectiondag
Author: Yeting Kuo (yetingk)
Changes
This patch introduces a vp intrinsic for splat. It's helpful for IR-level passes to create a splat with specific vector length.
Patch is 50.47 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/98731.diff
12 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index ae39217dc8ff8..37d32f3be7a0a 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -22841,6 +22841,51 @@ Examples:
llvm.experimental.vp.splice(<A,B,C,D>, <E,F,G,H>, -2, 3, 2); ==> <B, C, poison, poison> trailing elements
+.. _int_experimental_vp_splat:
+
+
+'``llvm.experimental.vp.splat``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+ declare <2 x double> @llvm.experimental.vp.splat.v2f64(<2 x double> %vec, <2 x i1> %mask, i32 %evl)
+ declare <vscale x 4 x i32> @llvm.experimental.vp.splat.nxv4i32(<vscale x 4 x i32> %vec, <vscale x 4 x i1> %mask, i32 %evl)
+
+Overview:
+"""""""""
+
+The '``llvm.experimental.vp.splat.*``' intrinsic is to create a predicated splat
+with specific effective vector length.
+
+Arguments:
+""""""""""
+
+The result is a vector and it is a splat of the first scalar operand. The
+second argument ``mask`` is a vector mask and has the same number of elements as
+the result. The third argument is the explicit vector length of the operation.
+
+Semantics:
+""""""""""
+
+This intrinsic splats a vector with ``evl`` elements of a scalar operand.
+The lanes in the result vector disabled by ``mask`` are ``poison``. The
+elements past ``evl`` are poison.
+
+Examples:
+"""""""""
+
+.. code-block:: llvm
+
+ %r = call <4 x float> @llvm.vp.splat.v4f32(float %a, <4 x i1> %mask, i32 %evl)
+ ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
+ %also.r = select <4 x i1> %mask, <4 x float> splat(float %a), <4 x float> poison
+
+
.. _int_experimental_vp_reverse:
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 65a9b68b5229d..0be7e963954ef 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2319,6 +2319,13 @@ def int_experimental_vp_reverse:
llvm_i32_ty],
[IntrNoMem]>;
+def int_experimental_vp_splat:
+ DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+ [LLVMVectorElementType<0>,
+ LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
+ llvm_i32_ty],
+ [IntrNoMem]>;
+
def int_vp_is_fpclass:
DefaultAttrsIntrinsic<[ LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>],
[ llvm_anyvector_ty,
diff --git a/llvm/include/llvm/IR/VPIntrinsics.def b/llvm/include/llvm/IR/VPIntrinsics.def
index 8eced073501e8..9d219e27b359e 100644
--- a/llvm/include/llvm/IR/VPIntrinsics.def
+++ b/llvm/include/llvm/IR/VPIntrinsics.def
@@ -777,6 +777,13 @@ END_REGISTER_VP(experimental_vp_reverse, EXPERIMENTAL_VP_REVERSE)
///// } Shuffles
+// llvm.vp.splat(ptr,val,mask,vlen)
+BEGIN_REGISTER_VP_INTRINSIC(experimental_vp_splat, 1, 2)
+BEGIN_REGISTER_VP_SDNODE(EXPERIMENTAL_VP_SPLAT, -1, experimental_vp_splat, 1, 2)
+VP_PROPERTY_NO_FUNCTIONAL
+HELPER_MAP_VPID_TO_VPSD(experimental_vp_splat, EXPERIMENTAL_VP_SPLAT)
+END_REGISTER_VP(experimental_vp_splat, EXPERIMENTAL_VP_SPLAT)
+
#undef BEGIN_REGISTER_VP
#undef BEGIN_REGISTER_VP_INTRINSIC
#undef BEGIN_REGISTER_VP_SDNODE
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index fed5ebcc3c903..f21c2ba98e567 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -137,6 +137,7 @@ void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
break;
case ISD::SPLAT_VECTOR:
case ISD::SCALAR_TO_VECTOR:
+ case ISD::EXPERIMENTAL_VP_SPLAT:
Res = PromoteIntRes_ScalarOp(N);
break;
case ISD::STEP_VECTOR: Res = PromoteIntRes_STEP_VECTOR(N); break;
@@ -1916,6 +1917,7 @@ bool DAGTypeLegalizer::PromoteIntegerOperand(SDNode *N, unsigned OpNo) {
break;
case ISD::SPLAT_VECTOR:
case ISD::SCALAR_TO_VECTOR:
+ case ISD::EXPERIMENTAL_VP_SPLAT:
Res = PromoteIntOp_ScalarOp(N);
break;
case ISD::VSELECT:
@@ -2211,10 +2213,14 @@ SDValue DAGTypeLegalizer::PromoteIntOp_INSERT_VECTOR_ELT(SDNode *N,
}
SDValue DAGTypeLegalizer::PromoteIntOp_ScalarOp(SDNode *N) {
+ SDValue Op = GetPromotedInteger(N->getOperand(0));
+ if (N->getOpcode() == ISD::EXPERIMENTAL_VP_SPLAT)
+ return DAG.getNode(ISD::EXPERIMENTAL_VP_SPLAT, SDLoc(N), N->getValueType(0),
+ Op, N->getOperand(1), N->getOperand(2));
+
// Integer SPLAT_VECTOR/SCALAR_TO_VECTOR operands are implicitly truncated,
// so just promote the operand in place.
- return SDValue(DAG.UpdateNodeOperands(N,
- GetPromotedInteger(N->getOperand(0))), 0);
+ return SDValue(DAG.UpdateNodeOperands(N, Op), 0);
}
SDValue DAGTypeLegalizer::PromoteIntOp_SELECT(SDNode *N, unsigned OpNo) {
@@ -5231,6 +5237,7 @@ bool DAGTypeLegalizer::ExpandIntegerOperand(SDNode *N, unsigned OpNo) {
case ISD::EXTRACT_ELEMENT: Res = ExpandOp_EXTRACT_ELEMENT(N); break;
case ISD::INSERT_VECTOR_ELT: Res = ExpandOp_INSERT_VECTOR_ELT(N); break;
case ISD::SCALAR_TO_VECTOR: Res = ExpandOp_SCALAR_TO_VECTOR(N); break;
+ case ISD::EXPERIMENTAL_VP_SPLAT:
case ISD::SPLAT_VECTOR: Res = ExpandIntOp_SPLAT_VECTOR(N); break;
case ISD::SELECT_CC: Res = ExpandIntOp_SELECT_CC(N); break;
case ISD::SETCC: Res = ExpandIntOp_SETCC(N); break;
@@ -5859,7 +5866,11 @@ SDValue DAGTypeLegalizer::PromoteIntRes_ScalarOp(SDNode *N) {
EVT NOutElemVT = NOutVT.getVectorElementType();
SDValue Op = DAG.getNode(ISD::ANY_EXTEND, dl, NOutElemVT, N->getOperand(0));
-
+ if (N->isVPOpcode()) {
+ SDValue Mask = N->getOperand(1);
+ SDValue VL = N->getOperand(2);
+ return DAG.getNode(N->getOpcode(), dl, NOutVT, Op, Mask, VL);
+ }
return DAG.getNode(N->getOpcode(), dl, NOutVT, Op);
}
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 85f947efe2c75..f20cfe6de60cc 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -915,6 +915,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
void SplitVecRes_Gather(MemSDNode *VPGT, SDValue &Lo, SDValue &Hi,
bool SplitSETCC = false);
void SplitVecRes_ScalarOp(SDNode *N, SDValue &Lo, SDValue &Hi);
+ void SplitVecRes_VP_SPLAT(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_STEP_VECTOR(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_SETCC(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_VECTOR_REVERSE(SDNode *N, SDValue &Lo, SDValue &Hi);
@@ -1052,6 +1053,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue WidenVecOp_MGATHER(SDNode* N, unsigned OpNo);
SDValue WidenVecOp_MSCATTER(SDNode* N, unsigned OpNo);
SDValue WidenVecOp_VP_SCATTER(SDNode* N, unsigned OpNo);
+ SDValue WidenVecOp_VP_SPLAT(SDNode *N, unsigned OpNo);
SDValue WidenVecOp_SETCC(SDNode* N);
SDValue WidenVecOp_STRICT_FSETCC(SDNode* N);
SDValue WidenVecOp_VSELECT(SDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index bbf08e862da12..5015a665b1eb6 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1076,6 +1076,7 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
case ISD::FCOPYSIGN: SplitVecRes_FPOp_MultiType(N, Lo, Hi); break;
case ISD::IS_FPCLASS: SplitVecRes_IS_FPCLASS(N, Lo, Hi); break;
case ISD::INSERT_VECTOR_ELT: SplitVecRes_INSERT_VECTOR_ELT(N, Lo, Hi); break;
+ case ISD::EXPERIMENTAL_VP_SPLAT: SplitVecRes_VP_SPLAT(N, Lo, Hi); break;
case ISD::SPLAT_VECTOR:
case ISD::SCALAR_TO_VECTOR:
SplitVecRes_ScalarOp(N, Lo, Hi);
@@ -1992,6 +1993,16 @@ void DAGTypeLegalizer::SplitVecRes_ScalarOp(SDNode *N, SDValue &Lo,
}
}
+void DAGTypeLegalizer::SplitVecRes_VP_SPLAT(SDNode *N, SDValue &Lo,
+ SDValue &Hi) {
+ SDLoc dl(N);
+ auto [LoVT, HiVT] = DAG.GetSplitDestVTs(N->getValueType(0));
+ auto [MaskLo, MaskHi] = SplitMask(N->getOperand(1));
+ auto [EVLLo, EVLHi] = DAG.SplitEVL(N->getOperand(2), N->getValueType(0), dl);
+ Lo = DAG.getNode(N->getOpcode(), dl, LoVT, N->getOperand(0), MaskLo, EVLLo);
+ Hi = DAG.getNode(N->getOpcode(), dl, HiVT, N->getOperand(0), MaskHi, EVLHi);
+}
+
void DAGTypeLegalizer::SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo,
SDValue &Hi) {
assert(ISD::isUNINDEXEDLoad(LD) && "Indexed load during type legalization!");
@@ -4284,6 +4295,7 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) {
case ISD::STEP_VECTOR:
case ISD::SPLAT_VECTOR:
case ISD::SCALAR_TO_VECTOR:
+ case ISD::EXPERIMENTAL_VP_SPLAT:
Res = WidenVecRes_ScalarOp(N);
break;
case ISD::SIGN_EXTEND_INREG: Res = WidenVecRes_InregOp(N); break;
@@ -5814,6 +5826,9 @@ SDValue DAGTypeLegalizer::WidenVecRes_VP_GATHER(VPGatherSDNode *N) {
SDValue DAGTypeLegalizer::WidenVecRes_ScalarOp(SDNode *N) {
EVT WidenVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
+ if (N->isVPOpcode())
+ return DAG.getNode(N->getOpcode(), SDLoc(N), WidenVT, N->getOperand(0),
+ N->getOperand(1), N->getOperand(2));
return DAG.getNode(N->getOpcode(), SDLoc(N), WidenVT, N->getOperand(0));
}
@@ -6353,6 +6368,10 @@ bool DAGTypeLegalizer::WidenVectorOperand(SDNode *N, unsigned OpNo) {
Res = WidenVecOp_FP_TO_XINT_SAT(N);
break;
+ case ISD::EXPERIMENTAL_VP_SPLAT:
+ Res = WidenVecOp_VP_SPLAT(N, OpNo);
+ break;
+
case ISD::VECREDUCE_FADD:
case ISD::VECREDUCE_FMUL:
case ISD::VECREDUCE_ADD:
@@ -6813,6 +6832,13 @@ SDValue DAGTypeLegalizer::WidenVecOp_STORE(SDNode *N) {
report_fatal_error("Unable to widen vector store");
}
+SDValue DAGTypeLegalizer::WidenVecOp_VP_SPLAT(SDNode *N, unsigned OpNo) {
+ assert(OpNo == 1 && "Can widen only mask operand of vp_splat");
+ return DAG.getNode(N->getOpcode(), SDLoc(N), N->getValueType(0),
+ N->getOperand(0), GetWidenedVector(N->getOperand(1)),
+ N->getOperand(2));
+}
+
SDValue DAGTypeLegalizer::WidenVecOp_VP_STORE(SDNode *N, unsigned OpNo) {
assert((OpNo == 1 || OpNo == 3) &&
"Can widen only data or mask operand of vp_store");
diff --git a/llvm/lib/IR/IntrinsicInst.cpp b/llvm/lib/IR/IntrinsicInst.cpp
index e17755c8ad57b..64a14da55b15e 100644
--- a/llvm/lib/IR/IntrinsicInst.cpp
+++ b/llvm/lib/IR/IntrinsicInst.cpp
@@ -699,6 +699,9 @@ Function *VPIntrinsic::getDeclarationForParams(Module *M, Intrinsic::ID VPID,
VPFunc = Intrinsic::getDeclaration(
M, VPID, {Params[0]->getType(), Params[1]->getType()});
break;
+ case Intrinsic::experimental_vp_splat:
+ VPFunc = Intrinsic::getDeclaration(M, VPID, ReturnType);
+ break;
}
assert(VPFunc && "Could not declare VP intrinsic");
return VPFunc;
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 19f958ccfd2e1..b51b821450e63 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -705,7 +705,8 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
ISD::VP_SMAX, ISD::VP_UMIN, ISD::VP_UMAX,
ISD::VP_ABS, ISD::EXPERIMENTAL_VP_REVERSE, ISD::EXPERIMENTAL_VP_SPLICE,
ISD::VP_SADDSAT, ISD::VP_UADDSAT, ISD::VP_SSUBSAT,
- ISD::VP_USUBSAT, ISD::VP_CTTZ_ELTS, ISD::VP_CTTZ_ELTS_ZERO_UNDEF};
+ ISD::VP_USUBSAT, ISD::VP_CTTZ_ELTS, ISD::VP_CTTZ_ELTS_ZERO_UNDEF,
+ ISD::EXPERIMENTAL_VP_SPLAT};
static const unsigned FloatingPointVPOps[] = {
ISD::VP_FADD, ISD::VP_FSUB, ISD::VP_FMUL,
@@ -721,7 +722,7 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
ISD::VP_FMINIMUM, ISD::VP_FMAXIMUM, ISD::VP_LRINT,
ISD::VP_LLRINT, ISD::EXPERIMENTAL_VP_REVERSE,
ISD::EXPERIMENTAL_VP_SPLICE, ISD::VP_REDUCE_FMINIMUM,
- ISD::VP_REDUCE_FMAXIMUM};
+ ISD::VP_REDUCE_FMAXIMUM, ISD::EXPERIMENTAL_VP_SPLAT};
static const unsigned IntegerVecReduceOps[] = {
ISD::VECREDUCE_ADD, ISD::VECREDUCE_AND, ISD::VECREDUCE_OR,
@@ -7268,6 +7269,8 @@ SDValue RISCVTargetLowering::LowerOperation(SDValue Op,
return lowerVPSpliceExperimental(Op, DAG);
case ISD::EXPERIMENTAL_VP_REVERSE:
return lowerVPReverseExperimental(Op, DAG);
+ case ISD::EXPERIMENTAL_VP_SPLAT:
+ return lowerVPSplatExperimental(Op, DAG);
case ISD::CLEAR_CACHE: {
assert(getTargetMachine().getTargetTriple().isOSLinux() &&
"llvm.clear_cache only needs custom lower on Linux targets");
@@ -11630,6 +11633,30 @@ RISCVTargetLowering::lowerVPSpliceExperimental(SDValue Op,
return convertFromScalableVector(VT, Result, DAG, Subtarget);
}
+SDValue
+RISCVTargetLowering::lowerVPSplatExperimental(SDValue Op,
+ SelectionDAG &DAG) const {
+ SDLoc DL(Op);
+ SDValue Val = Op.getOperand(0);
+ SDValue Mask = Op.getOperand(1);
+ SDValue VL = Op.getOperand(2);
+ MVT VT = Op.getSimpleValueType();
+
+ MVT ContainerVT = VT;
+ if (VT.isFixedLengthVector()) {
+ ContainerVT = getContainerForFixedLengthVector(VT);
+ MVT MaskVT = getMaskTypeFor(ContainerVT);
+ Mask = convertToScalableVector(MaskVT, Mask, DAG, Subtarget);
+ }
+
+ SDValue Result = lowerScalarSplat(SDValue(), Val, VL, ContainerVT, DL,
+ DAG, Subtarget);
+
+ if (!VT.isFixedLengthVector())
+ return Result;
+ return convertFromScalableVector(VT, Result, DAG, Subtarget);
+}
+
SDValue
RISCVTargetLowering::lowerVPReverseExperimental(SDValue Op,
SelectionDAG &DAG) const {
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index 7d8bceb5cb341..449ff24492c69 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -972,6 +972,7 @@ class RISCVTargetLowering : public TargetLowering {
SDValue lowerLogicVPOp(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerVPExtMaskOp(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerVPSetCCMaskOp(SDValue Op, SelectionDAG &DAG) const;
+ SDValue lowerVPSplatExperimental(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerVPSpliceExperimental(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerVPReverseExperimental(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerVPFPIntConvOp(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-splat.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-splat.ll
new file mode 100644
index 0000000000000..2913cbdf0fffd
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-splat.ll
@@ -0,0 +1,452 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv32 -mattr=+v,+d,+zfh,+zvfh -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,RV32
+; RUN: llc -mtriple=riscv64 -mattr=+v,+d,+zfh,+zvfh -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,RV64
+
+define <1 x i8> @vp_splat_v1i8(i8 %val, <1 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vp_splat_v1i8:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-NEXT: vmv.v.x v8, a0
+; CHECK-NEXT: ret
+ %splat = call <1 x i8> @llvm.experimental.vp.splat.v1i8(i8 %val, <1 x i1> %m, i32 %evl)
+ ret <1 x i8> %splat
+}
+
+define <2 x i8> @vp_splat_v2i8(i8 %val, <2 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vp_splat_v2i8:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-NEXT: vmv.v.x v8, a0
+; CHECK-NEXT: ret
+ %splat = call <2 x i8> @llvm.experimental.vp.splat.v2i8(i8 %val, <2 x i1> %m, i32 %evl)
+ ret <2 x i8> %splat
+}
+
+define <4 x i8> @vp_splat_v4i8(i8 %val, <4 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vp_splat_v4i8:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a1, e8, mf4, ta, ma
+; CHECK-NEXT: vmv.v.x v8, a0
+; CHECK-NEXT: ret
+ %splat = call <4 x i8> @llvm.experimental.vp.splat.v4i8(i8 %val, <4 x i1> %m, i32 %evl)
+ ret <4 x i8> %splat
+}
+
+define <8 x i8> @vp_splat_v8i8(i8 %val, <8 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vp_splat_v8i8:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a1, e8, mf2, ta, ma
+; CHECK-NEXT: vmv.v.x v8, a0
+; CHECK-NEXT: ret
+ %splat = call <8 x i8> @llvm.experimental.vp.splat.v8i8(i8 %val, <8 x i1> %m, i32 %evl)
+ ret <8 x i8> %splat
+}
+
+define <16 x i8> @vp_splat_v16i8(i8 %val, <16 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vp_splat_v16i8:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a1, e8, m1, ta, ma
+; CHECK-NEXT: vmv.v.x v8, a0
+; CHECK-NEXT: ret
+ %splat = call <16 x i8> @llvm.experimental.vp.splat.v16i8(i8 %val, <16 x i1> %m, i32 %evl)
+ ret <16 x i8> %splat
+}
+
+define <32 x i8> @vp_splat_v32i8(i8 %val, <32 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vp_splat_v32i8:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a1, e8, m2, ta, ma
+; CHECK-NEXT: vmv.v.x v8, a0
+; CHECK-NEXT: ret
+ %splat = call <32 x i8> @llvm.experimental.vp.splat.v32i8(i8 %val, <32 x i1> %m, i32 %evl)
+ ret <32 x i8> %splat
+}
+
+define <64 x i8> @vp_splat_v64i8(i8 %val, <64 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vp_splat_v64i8:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a1, e8, m4, ta, ma
+; CHECK-NEXT: vmv.v.x v8, a0
+; CHECK-NEXT: ret
+ %splat = call <64 x i8> @llvm.experimental.vp.splat.v64i8(i8 %val, <64 x i1> %m, i32 %evl)
+ ret <64 x i8> %splat
+}
+
+define <1 x i16> @vp_splat_v1i16(i16 %val, <1 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vp_splat_v1i16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a1, e16, mf4, ta, ma
+; CHECK-NEXT: vmv.v.x v8, a0
+; CHECK-NEXT: ret
+ %splat = call <1 x i16> @llvm.experimental.vp.splat.v1i16(i16 %val, <1 x i1> %m, i32 %evl)
+ ret <1 x i16> %splat
+}
+
+define <2 x i16> @vp_splat_v2i16(i16 %val, <2 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vp_splat_v2i16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a1, e16, mf4, ta, ma
+; CHECK-NEXT: vmv.v.x v8, a0
+; CHECK-NEXT: ret
+ %splat = call <2 x i16> @llvm.experimental.vp.splat.v2i16(i16 %val, <2 x i1> %m, i32 %evl)
+ ret <2 x i16> %splat
+}
+
+define <4 x i16> @vp_splat_v4i16(i16 %val, <4 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vp_splat_v4i16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a1, e16, mf2, ta, ma
+; CHECK-NEXT: vmv.v.x v8, a0
+; CHECK-NEXT: ret
+ %splat = call <4 x i16> @llvm.experimental.vp.splat.v4i16(i16 %val, <4 x i1> %m, i32 %evl)
+ ret <4 x i16> %splat
+}
+
+define <8 x i16> @vp_splat_v8i16(i16 %val, <8 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vp_splat_v8i16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a1, e16, m1, ta, ma
+; CHECK-NEXT: vmv.v.x v8, a0
+; CHECK-NEXT: ret
+ %splat = call <8 x i16> @llvm.experimental.vp.splat.v8i16(i16 %val, <8 x i1> %m, i32 %evl)
+ ret <8 x i16> %splat
+}
+
+define <16 x i16> @vp_splat_v16i16(i16 %val, <16 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vp_splat_v16i16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a1, e16, m2, ta, ma
+; CHECK-NEXT: vmv.v.x v8, a0
+; CHECK-NEXT: ret
+ %splat = call <16 x i16> @llvm.experimental.vp.splat.v16i16(i16 %val, <16 x i1> %m, i32 %evl)
+ ret <16 x i16> %splat
+}
+
+define <32 x i16> @vp_splat_v32i16(i16 %val, <32 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vp_splat_v32i16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a1, e16, m4, ta, ma
+; CHECK-NEXT: vmv.v.x v8, a0
+; CHECK-NEXT: ret
+ %splat = call <32 x i16> @llvm.experimental.vp.splat.v32i16(i16 %val, <32 x i1> %m, i32 %evl)
+ ret <32 x i16> %splat
+}
+
+define <1 x i32> @vp_splat_v1i32(i32 %val, <1 x i1> %m, i32 zeroext %evl) {
+; CHECK-LABEL: vp_splat_v1i32:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a1, e32, mf2, ta, ma
+; CHECK-NEXT: vmv.v.x v8, a0
+; CHECK-NEXT: ret
+ %splat = call <1 x i32> @llvm.experimental.vp.sp...
[truncated]
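The `SplitVecRes_VP_SPLAT` hunk above splits the mask and the EVL and then emits one splat per half. The sketch below illustrates how the EVL is expected to be divided. It assumes, as I understand `DAG.SplitEVL`, that the low half gets `umin(evl, half)` and the high half gets `usubsat(evl, half)`, and that the lane count is even. `splitEVL` and `SplitEVLResult` are hypothetical names, not LLVM APIs.

```cpp
#include <cstdint>

// Hypothetical result type: the EVL to use for each half of a split vector.
struct SplitEVLResult {
  std::uint32_t lo;  // EVL for lanes [0, half)
  std::uint32_t hi;  // EVL for lanes [half, numElts)
};

// Divide an EVL across the two halves of a vector with `numElts` lanes
// (assumed even). The low half is filled first; the high half only
// receives whatever part of the EVL extends past the midpoint.
constexpr SplitEVLResult splitEVL(std::uint32_t evl, std::uint32_t numElts) {
  const std::uint32_t half = numElts / 2;
  const std::uint32_t lo = evl < half ? evl : half;      // umin(evl, half)
  const std::uint32_t hi = evl > half ? evl - half : 0;  // usubsat(evl, half)
  return {lo, hi};
}
```

With this split, the two half-width splats together cover exactly the lanes below the original EVL, so the concatenated result matches the unsplit operation on every active lane.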
You can test this locally with the following command:
git-clang-format --diff cb3bc5be9c20d893adf94cdf436092657ab5ab40 83e97d341043e86aa3c495207d80641d49d04d98 --extensions cpp,h -- llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp llvm/lib/IR/IntrinsicInst.cpp llvm/lib/Target/RISCV/RISCVISelLowering.cpp llvm/lib/Target/RISCV/RISCVISelLowering.h llvm/unittests/IR/VPIntrinsicTest.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 5015a665b1..288886e7d2 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1076,7 +1076,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
case ISD::FCOPYSIGN: SplitVecRes_FPOp_MultiType(N, Lo, Hi); break;
case ISD::IS_FPCLASS: SplitVecRes_IS_FPCLASS(N, Lo, Hi); break;
case ISD::INSERT_VECTOR_ELT: SplitVecRes_INSERT_VECTOR_ELT(N, Lo, Hi); break;
- case ISD::EXPERIMENTAL_VP_SPLAT: SplitVecRes_VP_SPLAT(N, Lo, Hi); break;
+ case ISD::EXPERIMENTAL_VP_SPLAT:
+ SplitVecRes_VP_SPLAT(N, Lo, Hi);
+ break;
case ISD::SPLAT_VECTOR:
case ISD::SCALAR_TO_VECTOR:
SplitVecRes_ScalarOp(N, Lo, Hi);
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 98d9b4286b..3a4006cf1b 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -691,38 +691,89 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setOperationAction({ISD::INTRINSIC_W_CHAIN, ISD::INTRINSIC_VOID},
MVT::Other, Custom);
- static const unsigned IntegerVPOps[] = {
- ISD::VP_ADD, ISD::VP_SUB, ISD::VP_MUL,
- ISD::VP_SDIV, ISD::VP_UDIV, ISD::VP_SREM,
- ISD::VP_UREM, ISD::VP_AND, ISD::VP_OR,
- ISD::VP_XOR, ISD::VP_SRA, ISD::VP_SRL,
- ISD::VP_SHL, ISD::VP_REDUCE_ADD, ISD::VP_REDUCE_AND,
- ISD::VP_REDUCE_OR, ISD::VP_REDUCE_XOR, ISD::VP_REDUCE_SMAX,
- ISD::VP_REDUCE_SMIN, ISD::VP_REDUCE_UMAX, ISD::VP_REDUCE_UMIN,
- ISD::VP_MERGE, ISD::VP_SELECT, ISD::VP_FP_TO_SINT,
- ISD::VP_FP_TO_UINT, ISD::VP_SETCC, ISD::VP_SIGN_EXTEND,
- ISD::VP_ZERO_EXTEND, ISD::VP_TRUNCATE, ISD::VP_SMIN,
- ISD::VP_SMAX, ISD::VP_UMIN, ISD::VP_UMAX,
- ISD::VP_ABS, ISD::EXPERIMENTAL_VP_REVERSE, ISD::EXPERIMENTAL_VP_SPLICE,
- ISD::VP_SADDSAT, ISD::VP_UADDSAT, ISD::VP_SSUBSAT,
- ISD::VP_USUBSAT, ISD::VP_CTTZ_ELTS, ISD::VP_CTTZ_ELTS_ZERO_UNDEF,
- ISD::EXPERIMENTAL_VP_SPLAT};
-
- static const unsigned FloatingPointVPOps[] = {
- ISD::VP_FADD, ISD::VP_FSUB, ISD::VP_FMUL,
- ISD::VP_FDIV, ISD::VP_FNEG, ISD::VP_FABS,
- ISD::VP_FMA, ISD::VP_REDUCE_FADD, ISD::VP_REDUCE_SEQ_FADD,
- ISD::VP_REDUCE_FMIN, ISD::VP_REDUCE_FMAX, ISD::VP_MERGE,
- ISD::VP_SELECT, ISD::VP_SINT_TO_FP, ISD::VP_UINT_TO_FP,
- ISD::VP_SETCC, ISD::VP_FP_ROUND, ISD::VP_FP_EXTEND,
- ISD::VP_SQRT, ISD::VP_FMINNUM, ISD::VP_FMAXNUM,
- ISD::VP_FCEIL, ISD::VP_FFLOOR, ISD::VP_FROUND,
- ISD::VP_FROUNDEVEN, ISD::VP_FCOPYSIGN, ISD::VP_FROUNDTOZERO,
- ISD::VP_FRINT, ISD::VP_FNEARBYINT, ISD::VP_IS_FPCLASS,
- ISD::VP_FMINIMUM, ISD::VP_FMAXIMUM, ISD::VP_LRINT,
- ISD::VP_LLRINT, ISD::EXPERIMENTAL_VP_REVERSE,
- ISD::EXPERIMENTAL_VP_SPLICE, ISD::VP_REDUCE_FMINIMUM,
- ISD::VP_REDUCE_FMAXIMUM, ISD::EXPERIMENTAL_VP_SPLAT};
+ static const unsigned IntegerVPOps[] = {ISD::VP_ADD,
+ ISD::VP_SUB,
+ ISD::VP_MUL,
+ ISD::VP_SDIV,
+ ISD::VP_UDIV,
+ ISD::VP_SREM,
+ ISD::VP_UREM,
+ ISD::VP_AND,
+ ISD::VP_OR,
+ ISD::VP_XOR,
+ ISD::VP_SRA,
+ ISD::VP_SRL,
+ ISD::VP_SHL,
+ ISD::VP_REDUCE_ADD,
+ ISD::VP_REDUCE_AND,
+ ISD::VP_REDUCE_OR,
+ ISD::VP_REDUCE_XOR,
+ ISD::VP_REDUCE_SMAX,
+ ISD::VP_REDUCE_SMIN,
+ ISD::VP_REDUCE_UMAX,
+ ISD::VP_REDUCE_UMIN,
+ ISD::VP_MERGE,
+ ISD::VP_SELECT,
+ ISD::VP_FP_TO_SINT,
+ ISD::VP_FP_TO_UINT,
+ ISD::VP_SETCC,
+ ISD::VP_SIGN_EXTEND,
+ ISD::VP_ZERO_EXTEND,
+ ISD::VP_TRUNCATE,
+ ISD::VP_SMIN,
+ ISD::VP_SMAX,
+ ISD::VP_UMIN,
+ ISD::VP_UMAX,
+ ISD::VP_ABS,
+ ISD::EXPERIMENTAL_VP_REVERSE,
+ ISD::EXPERIMENTAL_VP_SPLICE,
+ ISD::VP_SADDSAT,
+ ISD::VP_UADDSAT,
+ ISD::VP_SSUBSAT,
+ ISD::VP_USUBSAT,
+ ISD::VP_CTTZ_ELTS,
+ ISD::VP_CTTZ_ELTS_ZERO_UNDEF,
+ ISD::EXPERIMENTAL_VP_SPLAT};
+
+ static const unsigned FloatingPointVPOps[] = {ISD::VP_FADD,
+ ISD::VP_FSUB,
+ ISD::VP_FMUL,
+ ISD::VP_FDIV,
+ ISD::VP_FNEG,
+ ISD::VP_FABS,
+ ISD::VP_FMA,
+ ISD::VP_REDUCE_FADD,
+ ISD::VP_REDUCE_SEQ_FADD,
+ ISD::VP_REDUCE_FMIN,
+ ISD::VP_REDUCE_FMAX,
+ ISD::VP_MERGE,
+ ISD::VP_SELECT,
+ ISD::VP_SINT_TO_FP,
+ ISD::VP_UINT_TO_FP,
+ ISD::VP_SETCC,
+ ISD::VP_FP_ROUND,
+ ISD::VP_FP_EXTEND,
+ ISD::VP_SQRT,
+ ISD::VP_FMINNUM,
+ ISD::VP_FMAXNUM,
+ ISD::VP_FCEIL,
+ ISD::VP_FFLOOR,
+ ISD::VP_FROUND,
+ ISD::VP_FROUNDEVEN,
+ ISD::VP_FCOPYSIGN,
+ ISD::VP_FROUNDTOZERO,
+ ISD::VP_FRINT,
+ ISD::VP_FNEARBYINT,
+ ISD::VP_IS_FPCLASS,
+ ISD::VP_FMINIMUM,
+ ISD::VP_FMAXIMUM,
+ ISD::VP_LRINT,
+ ISD::VP_LLRINT,
+ ISD::EXPERIMENTAL_VP_REVERSE,
+ ISD::EXPERIMENTAL_VP_SPLICE,
+ ISD::VP_REDUCE_FMINIMUM,
+ ISD::VP_REDUCE_FMAXIMUM,
+ ISD::EXPERIMENTAL_VP_SPLAT};
static const unsigned IntegerVecReduceOps[] = {
ISD::VECREDUCE_ADD, ISD::VECREDUCE_AND, ISD::VECREDUCE_OR,
It would be quite nice to see how this is expected to help codegen where we currently use regular splats. I assume it will likely reduce vsetvlis? Is there a plan to introduce it to the vectorizer?
::

declare <2 x double> @llvm.experimental.vp.splat.v2f64(<2 x double> %vec, <2 x i1> %mask, i32 %evl)
This should probably be something like (double %scalar, <2 x i1> %mask, i32 %evl)
Done. Thank you for the finding.
@@ -777,6 +777,13 @@ END_REGISTER_VP(experimental_vp_reverse, EXPERIMENTAL_VP_REVERSE)

///// } Shuffles

// llvm.vp.splat(ptr,val,mask,vlen)
llvm.experimental.vp.splat(val,mask,vlen) or x or something.
Done.
@@ -2211,10 +2213,14 @@ SDValue DAGTypeLegalizer::PromoteIntOp_INSERT_VECTOR_ELT(SDNode *N,
}

SDValue DAGTypeLegalizer::PromoteIntOp_ScalarOp(SDNode *N) {
  SDValue Op = GetPromotedInteger(N->getOperand(0));
  if (N->getOpcode() == ISD::EXPERIMENTAL_VP_SPLAT)
    return DAG.getNode(ISD::EXPERIMENTAL_VP_SPLAT, SDLoc(N), N->getValueType(0),
Can we use UpdateNodeOperands?
Done.
@@ -5859,7 +5866,11 @@ SDValue DAGTypeLegalizer::PromoteIntRes_ScalarOp(SDNode *N) {
  EVT NOutElemVT = NOutVT.getVectorElementType();

  SDValue Op = DAG.getNode(ISD::ANY_EXTEND, dl, NOutElemVT, N->getOperand(0));

  if (N->isVPOpcode()) {
    SDValue Mask = N->getOperand(1);
Can we pass N->getOperand(1) and N->getOperand(2) directly to getNode without the temporaries?
Done.
In #98140 (comment) we ran into an awkward spot where we wanted to perform a splat with an AVL to avoid vl toggles. I think the canonical form would still be shufflevector (or splat_vector for scalable vectors at the SelectionDAG level).
The main motivation for this patch is codegen and codegen-prepare optimization, but I am sure the patch could also benefit the VLA vectorizer.
LGTM
LGTM with minor comment
Summary: This patch introduces a vp intrinsic for splat. It's helpful for IR-level passes to create a splat with specific vector length.
Differential Revision: https://phabricator.intern.facebook.com/D60250956