124 changes: 124 additions & 0 deletions llvm/docs/LangRef.rst
@@ -27689,6 +27689,130 @@ The '``llvm.masked.compressstore``' intrinsic is designed for compressing data i
Other targets may support this intrinsic differently, for example, by lowering it into a sequence of branches that guard scalar store operations.


Speculative Load Intrinsics
---------------------------

LLVM provides intrinsics for speculatively loading memory that may be
out of bounds. These intrinsics enable optimizations such as early-exit loop
vectorization, where the vectorized loop may read beyond the end of an array
provided target-specific checks guarantee that the access cannot trap.
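
As a concrete (hypothetical) illustration of the kind of loop this targets,
consider a scalar early-exit search. A vectorized version would load a whole
vector per iteration and could therefore read past the element the scalar
loop would have stopped at, which is only legal when such a check succeeds:

```cpp
#include <cstddef>

// Scalar early-exit search: returns the index of the first zero byte,
// or n if there is none. A vectorized version would load e.g. 16 bytes
// per iteration and could therefore read beyond data[n - 1] -- legal
// only if a target-specific check shows those extra bytes cannot fault.
std::size_t findFirstZero(const unsigned char *data, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    if (data[i] == 0)
      return i;
  return n;
}
```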

.. _int_speculative_load:

'``llvm.speculative.load``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""
This is an overloaded intrinsic.

::

declare <4 x float> @llvm.speculative.load.v4f32.p0(ptr <ptr>)
declare <8 x i32> @llvm.speculative.load.v8i32.p0(ptr <ptr>)
declare i64 @llvm.speculative.load.i64.p0(ptr <ptr>)

Overview:
"""""""""

The '``llvm.speculative.load``' intrinsic loads a value from memory. Unlike a
regular load, the memory access may extend beyond the bounds of the allocated
object, provided the pointer has been verified by
:ref:`llvm.can.load.speculatively <int_can_load_speculatively>` to ensure the
access cannot fault.

Arguments:
""""""""""

The argument is a pointer to the memory location to load from. The return type
must have a power-of-2 size in bytes.

Semantics:
""""""""""

The '``llvm.speculative.load``' intrinsic performs a load that may access
memory beyond what is accessible through the pointer. It must be used in
combination with :ref:`llvm.can.load.speculatively <int_can_load_speculatively>`
to ensure the access can be performed speculatively.

A byte at ``ptr + i`` is *accessible through* ``ptr`` if both of the following
hold:

1. The byte lies within the bounds of an allocated object that ``ptr`` is
:ref:`based <pointeraliasing>` on.
2. Accessing the byte through ``ptr`` does not violate any ``noalias``
constraints.

For accessible bytes, the intrinsic returns the value stored in memory. For
inaccessible bytes, the corresponding bytes of the result are ``poison``, and
those bytes are not considered accessed for the purposes of data races or
``noalias`` constraints. At least the first byte must be accessible; otherwise
the behavior is undefined.

The behavior is undefined if program execution depends on any byte in the
result that may not be accessible.

The behavior is undefined if this intrinsic is used to load from a pointer
for which ``llvm.can.load.speculatively`` would return false.

.. _int_can_load_speculatively:

'``llvm.can.load.speculatively``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""
This is an overloaded intrinsic.

::

declare i1 @llvm.can.load.speculatively.p0(ptr <ptr>, i64 <num_bytes>)
declare i1 @llvm.can.load.speculatively.p1(ptr addrspace(1) <ptr>, i64 <num_bytes>)

Overview:
"""""""""

The '``llvm.can.load.speculatively``' intrinsic returns true if it is safe
to speculatively load ``num_bytes`` bytes starting from ``ptr``,
even if the memory may be beyond the bounds of an allocated object.

Arguments:
""""""""""

The first argument is a pointer to the memory location.

The second argument is an ``i64`` specifying the size of the load in bytes.
The size must be a positive power of 2; if it is not, the result is
``poison``.

Semantics:
""""""""""

This intrinsic has **target-dependent** semantics. It returns ``true`` if
``num_bytes`` bytes starting at ``ptr + I * num_bytes``, for all non-negative
integers ``I`` where the computed address does not wrap around the address
space, can be loaded speculatively, even if the memory is beyond the bounds of
an allocated object. It returns ``false`` otherwise.

The specific conditions under which this intrinsic returns ``true`` are
determined by the target. For example, a target may check whether the pointer's
alignment guarantees that none of these loads can cross a page boundary.

.. code-block:: llvm

; Check if we can safely load 16 bytes from %ptr
%can_load = call i1 @llvm.can.load.speculatively.p0(ptr %ptr, i64 16)
br i1 %can_load, label %speculative_path, label %safe_path

speculative_path:
; Safe to speculatively load from %ptr
%vec = call <4 x i32> @llvm.speculative.load.v4i32.p0(ptr %ptr)
...

safe_path:
; Fall back to masked load or scalar operations
...
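
The page-boundary argument behind such an alignment check can be sketched
standalone (assuming a 4096-byte page here; the actual page size is target-
and OS-dependent): an address aligned to a power-of-2 size that divides the
page size cannot straddle a page boundary.

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 4096; // assumed page size

// True if the size-byte access at addr lies within a single page.
bool withinOnePage(std::uint64_t addr, std::uint64_t size) {
  return addr / kPageSize == (addr + size - 1) / kPageSize;
}

// The check described above: addr is aligned to the power-of-2 size.
bool isAlignedTo(std::uint64_t addr, std::uint64_t size) {
  return (addr & (size - 1)) == 0;
}
```

For any power-of-2 size dividing the page size, ``isAlignedTo(addr, size)``
implies ``withinOnePage(addr, size)``, since an aligned chunk is wholly
contained in one aligned page.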


Memory Use Markers
------------------

2 changes: 2 additions & 0 deletions llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -910,6 +910,8 @@ class TargetTransformInfoImplBase {
switch (ICA.getID()) {
default:
break;
case Intrinsic::speculative_load:
return InstructionCost::getInvalid();
case Intrinsic::allow_runtime_check:
case Intrinsic::allow_ubsan_check:
case Intrinsic::annotation:
3 changes: 3 additions & 0 deletions llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1994,6 +1994,9 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
// The cost of materialising a constant integer vector.
return TargetTransformInfo::TCC_Basic;
}
case Intrinsic::speculative_load:
// Delegate to base; targets must opt-in with a valid cost.
return BaseT::getIntrinsicInstrCost(ICA, CostKind);
case Intrinsic::vector_extract: {
// FIXME: Handle case where a scalable vector is extracted from a scalable
// vector
13 changes: 13 additions & 0 deletions llvm/include/llvm/CodeGen/TargetLowering.h
@@ -2292,6 +2292,19 @@ class LLVM_ABI TargetLoweringBase {
llvm_unreachable("Store conditional unimplemented on this target");
}

/// Emit code to check if a speculative load of the given size from Ptr is
/// safe. Returns a Value* representing the check result (i1), or nullptr
/// to use the default lowering (which returns false). Targets can override
/// to provide their own safety check (e.g., alignment-based page boundary
/// check).
/// \param Builder IRBuilder positioned at the intrinsic call site
/// \param Ptr the pointer operand
/// \param Size the size in bytes (constant or runtime value for scalable)
virtual Value *emitCanLoadSpeculatively(IRBuilderBase &Builder, Value *Ptr,
Value *Size) const {
return nullptr;
}

/// Perform a masked atomicrmw using a target-specific intrinsic. This
/// represents the core LL/SC loop which will be lowered at a late stage by
/// the backend. The target-specific intrinsic returns the loaded value and
14 changes: 14 additions & 0 deletions llvm/include/llvm/IR/Intrinsics.td
@@ -2603,6 +2603,20 @@ def int_experimental_vector_compress:
[LLVMMatchType<0>, LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, LLVMMatchType<0>],
[IntrNoMem]>;

// Speculatively load a value from memory; lowers to a regular aligned load.
// The loaded type must have a power-of-2 size.
def int_speculative_load:
DefaultAttrsIntrinsic<[llvm_any_ty],
[llvm_anyptr_ty],
[IntrArgMemOnly, IntrWillReturn, NoCapture<ArgIndex<0>>]>;

// Returns true if it's safe to speculatively load 'num_bytes' from 'ptr'.
// The size can be a runtime value to support scalable vectors.
def int_can_load_speculatively:
DefaultAttrsIntrinsic<[llvm_i1_ty],
[llvm_anyptr_ty, llvm_i64_ty],
[IntrNoMem, IntrSpeculatable, IntrWillReturn]>;

// Test whether a pointer is associated with a type metadata identifier.
def int_type_test : DefaultAttrsIntrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_metadata_ty],
[IntrNoMem, IntrSpeculatable]>;
36 changes: 36 additions & 0 deletions llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
@@ -136,6 +136,39 @@ static bool lowerLoadRelative(Function &F) {
return Changed;
}

/// Lower @llvm.can.load.speculatively using target-specific expansion.
/// Each target provides its own expansion via
/// TargetLowering::emitCanLoadSpeculatively.
/// The default expansion returns false (conservative).
static bool lowerCanLoadSpeculatively(Function &F, const TargetMachine *TM) {
bool Changed = false;

for (Use &U : llvm::make_early_inc_range(F.uses())) {
auto *CI = dyn_cast<CallInst>(U.getUser());
if (!CI || CI->getCalledOperand() != &F)
continue;

Function *ParentFunc = CI->getFunction();
const TargetLowering *TLI =
TM->getSubtargetImpl(*ParentFunc)->getTargetLowering();

IRBuilder<> Builder(CI);
Value *Ptr = CI->getArgOperand(0);
Value *Size = CI->getArgOperand(1);

// Ask target for expansion; nullptr means use default (return false)
Value *Result = TLI->emitCanLoadSpeculatively(Builder, Ptr, Size);
if (!Result)
Result = Builder.getFalse();

CI->replaceAllUsesWith(Result);
CI->eraseFromParent();
Changed = true;
}

return Changed;
}

// ObjCARC has knowledge about whether an obj-c runtime function needs to be
// always tail-called or never tail-called.
static CallInst::TailCallKind getOverridingTailCallKind(const Function &F) {
@@ -692,6 +725,9 @@ bool PreISelIntrinsicLowering::lowerIntrinsics(Module &M) const {
case Intrinsic::load_relative:
Changed |= lowerLoadRelative(F);
break;
case Intrinsic::can_load_speculatively:
Changed |= lowerCanLoadSpeculatively(F, TM);
break;
case Intrinsic::is_constant:
case Intrinsic::objectsize:
Changed |= forEachCall(F, [&](CallInst *CI) {
30 changes: 30 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -5128,6 +5128,33 @@ void SelectionDAGBuilder::visitMaskedLoad(const CallInst &I, bool IsExpanding) {
setValue(&I, Res);
}

void SelectionDAGBuilder::visitSpeculativeLoad(const CallInst &I) {
SDLoc sdl = getCurSDLoc();
Value *PtrOperand = I.getArgOperand(0);
SDValue Ptr = getValue(PtrOperand);

const TargetLowering &TLI = DAG.getTargetLoweringInfo();
EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());
Align Alignment = I.getParamAlign(0).valueOrOne();
AAMDNodes AAInfo = I.getAAMetadata();
TypeSize StoreSize = VT.getStoreSize();

SDValue InChain = DAG.getRoot();

// Use MOLoad but NOT MODereferenceable - the memory may not be
// fully dereferenceable.
MachineMemOperand::Flags MMOFlags = MachineMemOperand::MOLoad;
LocationSize LocSize = StoreSize.isScalable()
? LocationSize::beforeOrAfterPointer()
: LocationSize::precise(StoreSize);
MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
MachinePointerInfo(PtrOperand), MMOFlags, LocSize, Alignment, AAInfo);

SDValue Load = DAG.getLoad(VT, sdl, InChain, Ptr, MMO);
PendingLoads.push_back(Load.getValue(1));
setValue(&I, Load);
}

void SelectionDAGBuilder::visitMaskedGather(const CallInst &I) {
SDLoc sdl = getCurSDLoc();

@@ -6883,6 +6910,9 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
case Intrinsic::masked_compressstore:
visitMaskedStore(I, true /* IsCompressing */);
return;
case Intrinsic::speculative_load:
visitSpeculativeLoad(I);
return;
case Intrinsic::powi:
setValue(&I, ExpandPowI(sdl, getValue(I.getArgOperand(0)),
getValue(I.getArgOperand(1)), DAG));
1 change: 1 addition & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
@@ -619,6 +619,7 @@ class SelectionDAGBuilder {
void visitStore(const StoreInst &I);
void visitMaskedLoad(const CallInst &I, bool IsExpanding = false);
void visitMaskedStore(const CallInst &I, bool IsCompressing = false);
void visitSpeculativeLoad(const CallInst &I);
void visitMaskedGather(const CallInst &I);
void visitMaskedScatter(const CallInst &I);
void visitAtomicCmpXchg(const AtomicCmpXchgInst &I);
18 changes: 18 additions & 0 deletions llvm/lib/IR/Verifier.cpp
@@ -6753,6 +6753,24 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
&Call);
break;
}
case Intrinsic::speculative_load: {
Type *LoadTy = Call.getType();
TypeSize Size = DL.getTypeStoreSize(LoadTy);
// For scalable vectors, check the known minimum size is a power of 2.
Check(Size.getKnownMinValue() > 0 && isPowerOf2_64(Size.getKnownMinValue()),
"llvm.speculative.load type must have a power-of-2 size", &Call);
break;
}
case Intrinsic::can_load_speculatively: {
// If size is a constant, verify it's a positive power of 2.
if (auto *SizeCI = dyn_cast<ConstantInt>(Call.getArgOperand(1))) {
uint64_t Size = SizeCI->getZExtValue();
Check(Size > 0 && isPowerOf2_64(Size),
"llvm.can.load.speculatively size must be a positive power of 2",
&Call);
}
break;
}
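
The power-of-2 test used by both verifier checks is the usual bit trick
(sketched standalone below; LLVM's own helper is ``isPowerOf2_64`` from
``MathExtras.h``):

```cpp
#include <cstdint>

// Nonzero x is a power of 2 iff it has exactly one bit set, i.e.
// clearing its lowest set bit (x & (x - 1)) yields zero.
bool isPow2(std::uint64_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}
```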
case Intrinsic::vector_insert: {
Value *Vec = Call.getArgOperand(0);
Value *SubVec = Call.getArgOperand(1);
50 changes: 50 additions & 0 deletions llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -30507,6 +30507,56 @@ Value *AArch64TargetLowering::emitStoreConditional(IRBuilderBase &Builder,
return CI;
}

Value *AArch64TargetLowering::emitCanLoadSpeculatively(IRBuilderBase &Builder,
Value *Ptr,
Value *Size) const {
unsigned AS = cast<PointerType>(Ptr->getType())->getAddressSpace();
// Conservatively only allow speculation for address space 0.
if (AS != 0)
return nullptr;
// For power-of-2 sizes <= 16, emit alignment check: (ptr & (size - 1)) == 0.
// If the pointer is aligned to at least 'size' bytes, loading 'size' bytes
// cannot cross a page boundary, so it's safe to speculate.
// The 16-byte limit ensures correctness with MTE (memory tagging), since
// MTE uses 16-byte tag granules.
//
// The alignment check only works for power-of-2 sizes. For non-power-of-2
// sizes, we conservatively return false.
const DataLayout &DL = Builder.GetInsertBlock()->getModule()->getDataLayout();

unsigned PtrBits = DL.getPointerSizeInBits(AS);
Type *IntPtrTy = Builder.getIntNTy(PtrBits);
if (auto *CI = dyn_cast<ConstantInt>(Size)) {
uint64_t SizeVal = CI->getZExtValue();
assert(isPowerOf2_64(SizeVal) && "size must be power-of-two");
// For constant sizes > 16, return nullptr (default false).
if (SizeVal > 16)
return nullptr;

// Power-of-2 constant size <= 16: use fast alignment check.
Value *PtrInt = Builder.CreatePtrToInt(Ptr, IntPtrTy);
Value *Mask = ConstantInt::get(IntPtrTy, SizeVal - 1);
Value *Masked = Builder.CreateAnd(PtrInt, Mask);
return Builder.CreateICmpEQ(Masked, ConstantInt::get(IntPtrTy, 0));
}

  // Runtime size: check size <= 16 and emit the alignment test. The size is
  // required to be a power of 2 (a non-power-of-2 size makes the intrinsic's
  // result poison), so no explicit power-of-2 check is emitted here.
Value *PtrInt = Builder.CreatePtrToInt(Ptr, IntPtrTy);
Value *SizeExt = Builder.CreateZExtOrTrunc(Size, IntPtrTy);

Value *SizeLE16 =
Builder.CreateICmpULE(SizeExt, ConstantInt::get(IntPtrTy, 16));

// alignment check: (ptr & (size - 1)) == 0
Value *SizeMinusOne =
Builder.CreateSub(SizeExt, ConstantInt::get(IntPtrTy, 1));
Value *Masked = Builder.CreateAnd(PtrInt, SizeMinusOne);
Value *AlignCheck =
Builder.CreateICmpEQ(Masked, ConstantInt::get(IntPtrTy, 0));

return Builder.CreateAnd(SizeLE16, AlignCheck);
}
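
The MTE reasoning in the comment above can be sketched the same way as the
page-boundary argument (assuming 16-byte tag granules, as the comment states):
an access aligned to a power-of-2 size of at most 16 bytes stays inside a
single granule, so it is covered by exactly one tag.

```cpp
#include <cstdint>

constexpr std::uint64_t kGranule = 16; // assumed MTE tag-granule size

// True if the size-byte access at addr lies within one tag granule.
bool withinOneGranule(std::uint64_t addr, std::uint64_t size) {
  return addr / kGranule == (addr + size - 1) / kGranule;
}

// The emitted check: addr is aligned to the (power-of-2) access size.
bool passesAlignCheck(std::uint64_t addr, std::uint64_t size) {
  return (addr & (size - 1)) == 0;
}
```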

bool AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters(
Type *Ty, CallingConv::ID CallConv, bool isVarArg,
const DataLayout &DL) const {
2 changes: 2 additions & 0 deletions llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -350,6 +350,8 @@
AtomicOrdering Ord) const override;
Value *emitStoreConditional(IRBuilderBase &Builder, Value *Val, Value *Addr,
AtomicOrdering Ord) const override;
Value *emitCanLoadSpeculatively(IRBuilderBase &Builder, Value *Ptr,
Value *Size) const override;

void emitAtomicCmpXchgNoStoreLLBalance(IRBuilderBase &Builder) const override;
