[TTI] Provide conservative costs for @llvm.speculative.load.#180036
Open
Conversation
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-backend-aarch64

Author: Florian Hahn (fhahn)

Changes: Add TTI support for @llvm.speculative.load, defaulting to Invalid if not implemented by the target. Provide an implementation for AArch64, which checks that the loaded type is <= 16 bytes.

Depends on #179642 (included in PR)

Patch is 46.07 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/180036.diff

21 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index ddd5087830acc..f4f4a6a3fc9f2 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -27678,6 +27678,119 @@ The '``llvm.masked.compressstore``' intrinsic is designed for compressing data i
Other targets may support this intrinsic differently, for example, by lowering it into a sequence of branches that guard scalar store operations.
+Speculative Load Intrinsics
+---------------------------
+
+LLVM provides intrinsics for speculatively loading memory that may be
+out-of-bounds. These intrinsics enable optimizations like early-exit loop
+vectorization where the vectorized loop may read beyond the end of an array,
+provided the access is guaranteed to not trap by target-specific checks.
+
+.. _int_speculative_load:
+
+'``llvm.speculative.load``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+ declare <4 x float> @llvm.speculative.load.v4f32.p0(ptr <ptr>)
+ declare <8 x i32> @llvm.speculative.load.v8i32.p0(ptr <ptr>)
+ declare i64 @llvm.speculative.load.i64.p0(ptr <ptr>)
+
+Overview:
+"""""""""
+
+The '``llvm.speculative.load``' intrinsic loads a value from memory. Unlike a
+regular load, the memory access may extend beyond the bounds of the allocated
+object, provided the pointer has been verified by
+:ref:`llvm.can.load.speculatively <int_can_load_speculatively>` to ensure the
+access cannot fault.
+
+Arguments:
+""""""""""
+
+The argument is a pointer to the memory location to load from. The return type
+must have a power-of-2 size in bytes.
+
+Semantics:
+""""""""""
+
+The '``llvm.speculative.load``' intrinsic performs a load that may access
+memory beyond the allocated object. It must be used in combination with
+:ref:`llvm.can.load.speculatively <int_can_load_speculatively>` to ensure
+the access cannot fault.
+
+For bytes that are within the bounds of the allocated object, the intrinsic
+returns the stored value. For bytes that are beyond the bounds of the
+allocated object, the intrinsic returns ``poison`` for those bytes. At least the
+first accessed byte must be within the bounds of an allocated object the pointer is
+based on.
+
+The behavior is undefined if this intrinsic is used to load from a pointer
+for which ``llvm.can.load.speculatively`` would return false.
+
+.. _int_can_load_speculatively:
+
+'``llvm.can.load.speculatively``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+ declare i1 @llvm.can.load.speculatively.p0(ptr <ptr>, i64 <num_bytes>)
+ declare i1 @llvm.can.load.speculatively.p1(ptr addrspace(1) <ptr>, i64 <num_bytes>)
+
+Overview:
+"""""""""
+
+The '``llvm.can.load.speculatively``' intrinsic returns true if it is safe
+to speculatively load ``num_bytes`` bytes starting from ``ptr``,
+even if the memory may be beyond the bounds of an allocated object.
+
+Arguments:
+""""""""""
+
+The first argument is a pointer to the memory location.
+
+The second argument is an i64 specifying the size in bytes of the load.
+The size must be a positive power of 2. If the size is not a power-of-2, the
+result is ``poison``.
+
+Semantics:
+""""""""""
+
+This intrinsic has **target-dependent** semantics. It returns ``true`` if
+``num_bytes`` bytes starting at ``ptr`` can be loaded speculatively, even
+if the memory is beyond the bounds of an allocated object. It returns
+``false`` otherwise.
+
+The specific conditions under which this intrinsic returns ``true`` are
+determined by the target. For example, a target may check whether the pointer
+alignment guarantees the load cannot cross a page boundary.
+
+.. code-block:: llvm
+
+ ; Check if we can safely load 16 bytes from %ptr
+ %can_load = call i1 @llvm.can.load.speculatively.p0(ptr %ptr, i64 16)
+ br i1 %can_load, label %speculative_path, label %safe_path
+
+ speculative_path:
+ ; Safe to speculatively load from %ptr
+ %vec = call <4 x i32> @llvm.speculative.load.v4i32.p0(ptr %ptr)
+ ...
+
+ safe_path:
+ ; Fall back to masked load or scalar operations
+ ...
+
+
Memory Use Markers
------------------
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 6d27cabf404f8..b83941da917eb 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -905,6 +905,8 @@ class TargetTransformInfoImplBase {
switch (ICA.getID()) {
default:
break;
+ case Intrinsic::speculative_load:
+ return InstructionCost::getInvalid();
case Intrinsic::allow_runtime_check:
case Intrinsic::allow_ubsan_check:
case Intrinsic::annotation:
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 21afcbefdf719..0de90f7a43a53 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1994,6 +1994,9 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
// The cost of materialising a constant integer vector.
return TargetTransformInfo::TCC_Basic;
}
+ case Intrinsic::speculative_load:
+ // Delegate to base; targets must opt-in with a valid cost.
+ return BaseT::getIntrinsicInstrCost(ICA, CostKind);
case Intrinsic::vector_extract: {
// FIXME: Handle case where a scalable vector is extracted from a scalable
// vector
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index ada4ffd3bcc89..ebc6b64590dea 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -2292,6 +2292,19 @@ class LLVM_ABI TargetLoweringBase {
llvm_unreachable("Store conditional unimplemented on this target");
}
+ /// Emit code to check if a speculative load of the given size from Ptr is
+ /// safe. Returns a Value* representing the check result (i1), or nullptr
+ /// to use the default lowering (which returns false). Targets can override
+ /// to provide their own safety check (e.g., alignment-based page boundary
+ /// check).
+ /// \param Builder IRBuilder positioned at the intrinsic call site
+ /// \param Ptr the pointer operand
+ /// \param Size the size in bytes (constant or runtime value for scalable)
+ virtual Value *emitCanLoadSpeculatively(IRBuilderBase &Builder, Value *Ptr,
+ Value *Size) const {
+ return nullptr;
+ }
+
/// Perform a masked atomicrmw using a target-specific intrinsic. This
/// represents the core LL/SC loop which will be lowered at a late stage by
/// the backend. The target-specific intrinsic returns the loaded value and
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index ab5f3fbbaf860..f2d5f9cd18894 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2601,6 +2601,20 @@ def int_experimental_vector_compress:
[LLVMMatchType<0>, LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, LLVMMatchType<0>],
[IntrNoMem]>;
+// Speculatively load a value from memory; lowers to a regular aligned load.
+// The loaded type must have a power-of-2 size.
+def int_speculative_load:
+ DefaultAttrsIntrinsic<[llvm_any_ty],
+ [llvm_anyptr_ty],
+ [IntrArgMemOnly, IntrWillReturn, NoCapture<ArgIndex<0>>]>;
+
+// Returns true if it's safe to speculatively load 'num_bytes' from 'ptr'.
+// The size can be a runtime value to support scalable vectors.
+def int_can_load_speculatively:
+ DefaultAttrsIntrinsic<[llvm_i1_ty],
+ [llvm_anyptr_ty, llvm_i64_ty],
+ [IntrNoMem, IntrSpeculatable, IntrWillReturn]>;
+
// Test whether a pointer is associated with a type metadata identifier.
def int_type_test : DefaultAttrsIntrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_metadata_ty],
[IntrNoMem, IntrSpeculatable]>;
diff --git a/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp b/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
index 201a7c0f37653..83d61655c9d9e 100644
--- a/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
+++ b/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
@@ -131,6 +131,39 @@ static bool lowerLoadRelative(Function &F) {
return Changed;
}
+/// Lower @llvm.can.load.speculatively using target-specific expansion.
+/// Each target provides its own expansion via
+/// TargetLowering::emitCanLoadSpeculatively.
+/// The default expansion returns false (conservative).
+static bool lowerCanLoadSpeculatively(Function &F, const TargetMachine *TM) {
+ bool Changed = false;
+
+ for (Use &U : llvm::make_early_inc_range(F.uses())) {
+ auto *CI = dyn_cast<CallInst>(U.getUser());
+ if (!CI || CI->getCalledOperand() != &F)
+ continue;
+
+ Function *ParentFunc = CI->getFunction();
+ const TargetLowering *TLI =
+ TM->getSubtargetImpl(*ParentFunc)->getTargetLowering();
+
+ IRBuilder<> Builder(CI);
+ Value *Ptr = CI->getArgOperand(0);
+ Value *Size = CI->getArgOperand(1);
+
+ // Ask target for expansion; nullptr means use default (return false)
+ Value *Result = TLI->emitCanLoadSpeculatively(Builder, Ptr, Size);
+ if (!Result)
+ Result = Builder.getFalse();
+
+ CI->replaceAllUsesWith(Result);
+ CI->eraseFromParent();
+ Changed = true;
+ }
+
+ return Changed;
+}
+
// ObjCARC has knowledge about whether an obj-c runtime function needs to be
// always tail-called or never tail-called.
static CallInst::TailCallKind getOverridingTailCallKind(const Function &F) {
@@ -632,6 +665,9 @@ bool PreISelIntrinsicLowering::lowerIntrinsics(Module &M) const {
case Intrinsic::load_relative:
Changed |= lowerLoadRelative(F);
break;
+ case Intrinsic::can_load_speculatively:
+ Changed |= lowerCanLoadSpeculatively(F, TM);
+ break;
case Intrinsic::is_constant:
case Intrinsic::objectsize:
Changed |= forEachCall(F, [&](CallInst *CI) {
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index e191cc5524a14..196f380b5a4dc 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -5122,6 +5122,33 @@ void SelectionDAGBuilder::visitMaskedLoad(const CallInst &I, bool IsExpanding) {
setValue(&I, Res);
}
+void SelectionDAGBuilder::visitSpeculativeLoad(const CallInst &I) {
+ SDLoc sdl = getCurSDLoc();
+ Value *PtrOperand = I.getArgOperand(0);
+ SDValue Ptr = getValue(PtrOperand);
+
+ const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+ EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());
+ Align Alignment = I.getParamAlign(0).valueOrOne();
+ AAMDNodes AAInfo = I.getAAMetadata();
+ TypeSize StoreSize = VT.getStoreSize();
+
+ SDValue InChain = DAG.getRoot();
+
+ // Use MOLoad but NOT MODereferenceable - the memory may not be
+ // fully dereferenceable.
+ MachineMemOperand::Flags MMOFlags = MachineMemOperand::MOLoad;
+ LocationSize LocSize = StoreSize.isScalable()
+ ? LocationSize::beforeOrAfterPointer()
+ : LocationSize::precise(StoreSize);
+ MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
+ MachinePointerInfo(PtrOperand), MMOFlags, LocSize, Alignment, AAInfo);
+
+ SDValue Load = DAG.getLoad(VT, sdl, InChain, Ptr, MMO);
+ PendingLoads.push_back(Load.getValue(1));
+ setValue(&I, Load);
+}
+
void SelectionDAGBuilder::visitMaskedGather(const CallInst &I) {
SDLoc sdl = getCurSDLoc();
@@ -6873,6 +6900,9 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
case Intrinsic::masked_compressstore:
visitMaskedStore(I, true /* IsCompressing */);
return;
+ case Intrinsic::speculative_load:
+ visitSpeculativeLoad(I);
+ return;
case Intrinsic::powi:
setValue(&I, ExpandPowI(sdl, getValue(I.getArgOperand(0)),
getValue(I.getArgOperand(1)), DAG));
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
index f8aecea25b3d6..dad406f48b77b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
@@ -619,6 +619,7 @@ class SelectionDAGBuilder {
void visitStore(const StoreInst &I);
void visitMaskedLoad(const CallInst &I, bool IsExpanding = false);
void visitMaskedStore(const CallInst &I, bool IsCompressing = false);
+ void visitSpeculativeLoad(const CallInst &I);
void visitMaskedGather(const CallInst &I);
void visitMaskedScatter(const CallInst &I);
void visitAtomicCmpXchg(const AtomicCmpXchgInst &I);
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 120a7daa16f05..5003a6886d67f 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -6744,6 +6744,24 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
&Call);
break;
}
+ case Intrinsic::speculative_load: {
+ Type *LoadTy = Call.getType();
+ TypeSize Size = DL.getTypeStoreSize(LoadTy);
+ // For scalable vectors, check the known minimum size is a power of 2.
+ Check(Size.getKnownMinValue() > 0 && isPowerOf2_64(Size.getKnownMinValue()),
+ "llvm.speculative.load type must have a power-of-2 size", &Call);
+ break;
+ }
+ case Intrinsic::can_load_speculatively: {
+ // If size is a constant, verify it's a positive power of 2.
+ if (auto *SizeCI = dyn_cast<ConstantInt>(Call.getArgOperand(1))) {
+ uint64_t Size = SizeCI->getZExtValue();
+ Check(Size > 0 && isPowerOf2_64(Size),
+ "llvm.can.load.speculatively size must be a positive power of 2",
+ &Call);
+ }
+ break;
+ }
case Intrinsic::vector_insert: {
Value *Vec = Call.getArgOperand(0);
Value *SubVec = Call.getArgOperand(1);
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 88836d6e167b8..76b6871831069 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -30215,6 +30215,56 @@ Value *AArch64TargetLowering::emitStoreConditional(IRBuilderBase &Builder,
return CI;
}
+Value *AArch64TargetLowering::emitCanLoadSpeculatively(IRBuilderBase &Builder,
+ Value *Ptr,
+ Value *Size) const {
+ unsigned AS = cast<PointerType>(Ptr->getType())->getAddressSpace();
+ // Conservatively only allow speculation for address space 0.
+ if (AS != 0)
+ return nullptr;
+ // For power-of-2 sizes <= 16, emit alignment check: (ptr & (size - 1)) == 0.
+ // If the pointer is aligned to at least 'size' bytes, loading 'size' bytes
+ // cannot cross a page boundary, so it's safe to speculate.
+ // The 16-byte limit ensures correctness with MTE (memory tagging), since
+ // MTE uses 16-byte tag granules.
+ //
+ // The alignment check only works for power-of-2 sizes. For non-power-of-2
+ // sizes, we conservatively return false.
+ const DataLayout &DL = Builder.GetInsertBlock()->getModule()->getDataLayout();
+
+ unsigned PtrBits = DL.getPointerSizeInBits(AS);
+ Type *IntPtrTy = Builder.getIntNTy(PtrBits);
+ if (auto *CI = dyn_cast<ConstantInt>(Size)) {
+ uint64_t SizeVal = CI->getZExtValue();
+ assert(isPowerOf2_64(SizeVal) && "size must be power-of-two");
+ // For constant sizes > 16, return nullptr (default false).
+ if (SizeVal > 16)
+ return nullptr;
+
+ // Power-of-2 constant size <= 16: use fast alignment check.
+ Value *PtrInt = Builder.CreatePtrToInt(Ptr, IntPtrTy);
+ Value *Mask = ConstantInt::get(IntPtrTy, SizeVal - 1);
+ Value *Masked = Builder.CreateAnd(PtrInt, Mask);
+ return Builder.CreateICmpEQ(Masked, ConstantInt::get(IntPtrTy, 0));
+ }
+
+ // Check power-of-2 size <= 16 and alignment.
+ Value *PtrInt = Builder.CreatePtrToInt(Ptr, IntPtrTy);
+ Value *SizeExt = Builder.CreateZExtOrTrunc(Size, IntPtrTy);
+
+ Value *SizeLE16 =
+ Builder.CreateICmpULE(SizeExt, ConstantInt::get(IntPtrTy, 16));
+
+ // alignment check: (ptr & (size - 1)) == 0
+ Value *SizeMinusOne =
+ Builder.CreateSub(SizeExt, ConstantInt::get(IntPtrTy, 1));
+ Value *Masked = Builder.CreateAnd(PtrInt, SizeMinusOne);
+ Value *AlignCheck =
+ Builder.CreateICmpEQ(Masked, ConstantInt::get(IntPtrTy, 0));
+
+ return Builder.CreateAnd(SizeLE16, AlignCheck);
+}
+
bool AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters(
Type *Ty, CallingConv::ID CallConv, bool isVarArg,
const DataLayout &DL) const {
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index 89a8858550ca2..884e072eaa925 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -350,6 +350,8 @@ class AArch64TargetLowering : public TargetLowering {
AtomicOrdering Ord) const override;
Value *emitStoreConditional(IRBuilderBase &Builder, Value *Val, Value *Addr,
AtomicOrdering Ord) const override;
+ Value *emitCanLoadSpeculatively(IRBuilderBase &Builder, Value *Ptr,
+ Value *Size) const override;
void emitAtomicCmpXchgNoStoreLLBalance(IRBuilderBase &Builder) const override;
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index bdf06e39d7367..0945c30429dfb 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -628,6 +628,25 @@ AArch64TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
return InstructionCost::getInvalid();
switch (ICA.getID()) {
+ case Intrinsic::speculative_load: {
+ // Speculative loads are only valid for types <= 16 bytes due to MTE
+ // (Memory Tagging Extension) using 16-byte tag granules. Loads larger
+ // than 16 bytes could cross a tag granule boundary.
+ auto LT = getTypeLegalizationCost(RetTy);
+ if (!LT.first.isValid())
+ return InstructionCost::getInvalid();
+ // For scalable vectors, check that we use a single register (which means
+ // <= 16 bytes at minimum vscale). For fixed types, compute the actual size.
+ if (isa<ScalableVectorType>(RetTy)) {
+ if (LT.first.getValue() != 1)
+ return InstructionCost::getInvalid();
+ } else {
+ if (LT.first.getValue() * LT.second.getStoreSize() > 16)
+ return InstructionCost::getInvalid();
+ }
+ // Return cost of a regular load.
+ return getMemoryOpCost(Instruction::Load, RetTy, Align(1), 0, CostKind);
+ }
case Intrinsic::experimental_vector_histogram_add: {
InstructionCost HistCost = getHistogramCost(ST, ICA);
// If the cost isn't valid, we may still be able to scalarize
diff --git a/llvm/test/Analysis/CostModel/AArch64/speculative-load.ll b/llvm/test/Analysis/CostModel/AArch64/speculative-load.ll
new file mode 100644
index 0000000000000..7587c3ec77b40
--- /dev/null
+++ b/llvm/test/Analysis/CostModel/AArch64/speculative-load.ll
@@ -0,0 +1,88 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=aarch64 < %s | FileCheck %s --check-prefixes=COMMON,NEON
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=aarch64 -mattr=+sve < %s | FileCheck %s --check-prefixes=COMMON,SVE
+
+define void @speculative_load_cost_fixed(ptr %p) {
+ ; Scalar types - all valid (<= 16 bytes)
+; COMMON-LABEL: 'speculative_load_cost_fixed'
+; COMMON-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %1 = call i8 @llvm.speculative.load.i8.p0(ptr %p)
+; COMMON-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %2 = call i16 @llvm.speculative.load.i16.p0(ptr %p)
+; COMMON-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %3 = call i32 @llvm.speculative.load.i32.p0(ptr %p)
+; COMMON-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %4 = call i64 @llvm.speculative.load.i64.p0(ptr %p)
+...
[truncated]
|
Introduce two new intrinsics to enable vectorization of loops with early exits that have potentially faulting loads. This has previously been discussed in llvm#120603 and is similar to @nikic's https://hackmd.io/@nikic/S1O4QWYZkx, with the major difference being that there is no `%defined_size` argument; instead the load returns the stored values for the bytes within bounds and undef otherwise. I don't think we can easily compute the defined size because it may depend on the loaded values (i.e. at what lane the early exit has been taken).

1. `@llvm.speculative.load` (name subject to change) - performs a load that may access memory beyond the allocated object. It must be used in combination with `@llvm.can.load.speculatively` to ensure the load is guaranteed not to trap.
2. `@llvm.can.load.speculatively` - returns true if it's safe to speculatively load a given number of bytes from a pointer. The semantics are target-dependent. On some targets, this may check that the access does not cross page boundaries, or apply stricter checks, for example on AArch64 with MTE, which limits the access size to 16 bytes.

`@llvm.speculative.load` is lowered to a regular load in SelectionDAG without MODereferenceable. I am not sure if we need to be more careful than this, i.e. if we could still reason about SelectionDAG loads to infer dereferenceability for the pointer. `@llvm.can.load.speculatively` is lowered to regular IR in PreISel lowering, using a target-lowering hook. By default, it conservatively expands to false.

These intrinsics should allow the loop vectorizer to vectorize early-exit loops with potentially non-dereferenceable loads.
Add TTI support for @llvm.speculative.load, defaulting to Invalid if not implemented by the target. Provide implementation for AArch64, which checks if the loaded type is <= 16 bytes.