133 changes: 133 additions & 0 deletions llvm/docs/LangRef.rst
@@ -27815,6 +27815,139 @@ The '``llvm.masked.compressstore``' intrinsic is designed for compressing data i
Other targets may support this intrinsic differently, for example, by lowering it into a sequence of branches that guard scalar store operations.


Speculative Load Intrinsics
---------------------------

LLVM provides intrinsics for speculatively loading memory that may be
out-of-bounds. These intrinsics enable optimizations like early-exit loop
vectorization where the vectorized loop may read beyond the end of an array,
provided the access is guaranteed to be valid by target-specific checks.

.. _int_speculative_load:

'``llvm.speculative.load``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""
This is an overloaded intrinsic.

::

; Direct form: number of accessible bytes given as i64
declare <4 x float> @llvm.speculative.load.v4f32.p0(ptr <ptr>, i1 <from_end>, i64 <num_accessible_bytes>)
declare <8 x i32> @llvm.speculative.load.v8i32.p0(ptr <ptr>, i1 <from_end>, i64 <num_accessible_bytes>)

; Oracle form: accessible bytes computed by calling oracle_fn(args...)
declare <4 x float> @llvm.speculative.load.v4f32.p0(ptr <ptr>, i1 <from_end>, ptr <oracle_fn>, ...)

Overview:
"""""""""

The '``llvm.speculative.load``' intrinsic loads a value from memory. Unlike a
regular load, the memory access may extend beyond the bounds of the allocated
object, provided the pointer has been verified by
:ref:`llvm.can.load.speculatively <int_can_load_speculatively>` to ensure the
access is valid.

Arguments:
""""""""""

The first argument is a pointer to the memory location to load from. The return
type must be a vector type with a power-of-2 size in bytes. The second argument
is an ``i1`` constant flag ``from_end`` that specifies whether the ``N``
accessible bytes are counted from the start or the end of the loaded values (see
Semantics). The remaining arguments determine the *number of accessible bytes*,
denoted ``N`` below.

In the **direct form**, the third argument is an ``i64`` specifying ``N``
directly. In the **oracle form**, the third argument must be a direct
reference to a function returning ``i64`` that may only read memory through its
arguments (indirect function pointers are not permitted); the remaining
arguments are forwarded to it, and its return value is ``N``.

Semantics:
""""""""""

Let ``S`` denote the size of the return type in bytes. The intrinsic performs
a load of ``S`` bytes starting from ``ptr``.

When ``from_end`` is ``false``, the first ``N`` bytes (offsets ``[0, N)``)
are the stored values read from memory. Bytes at offsets ``[N, S)`` are
``poison``.

When ``from_end`` is ``true``, the last ``N`` bytes (offsets ``[S - N, S)``)
are the stored values read from memory. Bytes at offsets ``[0, S - N)`` are
``poison``.

In both cases, the ``N`` accessible bytes must lie within the bounds of an
allocated object that ``ptr`` is :ref:`based <pointeraliasing>` on, and
poison bytes are not considered accessed for the purposes of data races or
``noalias`` constraints. The behavior is undefined if ``N`` exceeds ``S``.

The behavior is undefined if the speculative load accesses memory that would
fault (i.e., the oracle or ``llvm.can.load.speculatively`` would indicate the
access is not safe).
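
The byte-level rules above can be illustrated with a small stand-alone C++ model. This is only a sketch under stated assumptions: ``modelSpeculativeLoad`` and ``ModelByte`` are hypothetical names invented for illustration and are not part of LLVM, ``poison`` is modelled as an empty ``std::optional``, and the undefined-behavior case ``N > S`` is signalled by returning an empty vector.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model type (not an LLVM API): a result byte is either a value
// read from memory or poison, modelled here as std::nullopt.
using ModelByte = std::optional<std::uint8_t>;

// Models the result bytes of llvm.speculative.load for a return type of
// S = memory.size() bytes when N bytes are accessible.
// Only the N accessible bytes of `memory` are ever read.
// Returns an empty vector for N > S, which the text above makes undefined.
std::vector<ModelByte> modelSpeculativeLoad(
    const std::vector<std::uint8_t> &memory, bool fromEnd, std::uint64_t n) {
  const std::uint64_t s = memory.size();
  if (n > s)
    return {};

  // Start with every byte poison, then fill in the accessible range.
  std::vector<ModelByte> result(s, std::nullopt);

  // from_end == false: offsets [0, N) are defined.
  // from_end == true:  offsets [S - N, S) are defined.
  const std::uint64_t first = fromEnd ? s - n : 0;
  for (std::uint64_t i = first; i < first + n; ++i)
    result[i] = memory[i];
  return result;
}
```

For example, with ``S = 4`` and ``N = 3``, ``from_end = false`` leaves only the byte at offset 3 as poison, while ``from_end = true`` leaves only the byte at offset 0 as poison.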

.. _int_can_load_speculatively:

'``llvm.can.load.speculatively``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""
This is an overloaded intrinsic.

::

declare i1 @llvm.can.load.speculatively.p0(ptr <ptr>, i64 <num_bytes>)
declare i1 @llvm.can.load.speculatively.p1(ptr addrspace(1) <ptr>, i64 <num_bytes>)

Overview:
"""""""""

The '``llvm.can.load.speculatively``' intrinsic returns true if it is safe
to speculatively load ``num_bytes`` bytes starting from ``ptr``,
even if the memory may be beyond the bounds of an allocated object.

Arguments:
""""""""""

The first argument is a pointer to the memory location.

The second argument is an ``i64`` specifying the size in bytes of the load.
The size must be a positive power of 2; otherwise, the result is ``poison``.

Semantics:
""""""""""

This intrinsic has **target-dependent** semantics. It returns ``true`` if
``num_bytes`` bytes starting at ``ptr + I * num_bytes``, for all non-negative
integers ``I`` where the computed address does not wrap around the address
space, can be loaded speculatively, even if the memory is beyond the bounds of
an allocated object. It returns ``false`` otherwise.

The specific conditions under which this intrinsic returns ``true`` are
determined by the target. For example, a target may check whether the pointer
alignment guarantees all such loads cannot cross a page boundary.

.. code-block:: llvm

; Check if we can safely load 16 bytes from %ptr
%can_load = call i1 @llvm.can.load.speculatively.p0(ptr %ptr, i64 16)
br i1 %can_load, label %speculative_path, label %safe_path

speculative_path:
; Safe to speculatively load from %ptr
%vec = call <4 x i32> @llvm.speculative.load.v4i32.p0(ptr %ptr, i1 false, i64 16)
...

safe_path:
; Fall back to masked load or scalar operations
...
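
The alignment-based page-boundary check mentioned in the Semantics can be sketched as a stand-alone C++ model. This is one possible policy, not the behavior of any particular target: ``modelCanLoadSpeculatively`` and ``kAssumedPageSize`` are hypothetical names, the 4096-byte page size is an assumption, and the model returns ``false`` where the intrinsic would return ``poison`` for a size that is not a power of 2.

```cpp
#include <cstdint>

// Assumed page size for illustration; a real target would use its own
// minimum page size.
constexpr std::uint64_t kAssumedPageSize = 4096;

// Models one possible target policy for llvm.can.load.speculatively.
// If addr is aligned to numBytes, and numBytes is a power of 2 no larger
// than a page, then every block addr + I * numBytes is also aligned to
// numBytes and therefore cannot straddle a page boundary.
bool modelCanLoadSpeculatively(std::uint64_t addr, std::uint64_t numBytes) {
  const bool isPow2 = numBytes != 0 && (numBytes & (numBytes - 1)) == 0;
  // The intrinsic yields poison for non-power-of-2 sizes; this model is
  // simply conservative and reports false.
  if (!isPow2 || numBytes > kAssumedPageSize)
    return false;
  return (addr & (numBytes - 1)) == 0;
}
```

Whether staying within a page is sufficient for a load to be safe depends on the target's fault model, so a real implementation may need additional conditions.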


Memory Use Markers
------------------

13 changes: 13 additions & 0 deletions llvm/include/llvm/CodeGen/TargetLowering.h
@@ -2284,6 +2284,19 @@ class LLVM_ABI TargetLoweringBase {
llvm_unreachable("Store conditional unimplemented on this target");
}

/// Emit code to check if a speculative load of the given size from Ptr is
/// safe. Returns a Value* representing the check result (i1), or nullptr
/// to use the default lowering (which returns false). Targets can override
/// to provide their own safety check (e.g., alignment-based page boundary
/// check).
/// \param Builder IRBuilder positioned at the intrinsic call site
/// \param Ptr the pointer operand
/// \param Size the size in bytes (a constant, or a runtime value for
/// scalable vectors)
virtual Value *emitCanLoadSpeculatively(IRBuilderBase &Builder, Value *Ptr,
Value *Size) const {
return nullptr;
}

/// Perform a masked atomicrmw using a target-specific intrinsic. This
/// represents the core LL/SC loop which will be lowered at a late stage by
/// the backend. The target-specific intrinsic returns the loaded value and
14 changes: 14 additions & 0 deletions llvm/include/llvm/IR/Intrinsics.td
@@ -2604,6 +2604,20 @@ def int_experimental_vector_compress:
[LLVMMatchType<0>, LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, LLVMMatchType<0>],
[IntrNoMem]>;

// Speculatively load a value from memory; lowers to a regular load that uses
// the alignment of the pointer argument.
def int_speculative_load:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_anyptr_ty, llvm_i1_ty, llvm_vararg_ty],
[IntrReadMem, IntrArgMemOnly, IntrWillReturn, NoCapture<ArgIndex<0>>,
ImmArg<ArgIndex<1>>]>;

// Returns true if it's safe to speculatively load 'num_bytes' from 'ptr'.
// The size can be a runtime value to support scalable vectors.
def int_can_load_speculatively:
DefaultAttrsIntrinsic<[llvm_i1_ty],
[llvm_anyptr_ty, llvm_i64_ty],
[IntrNoMem, IntrSpeculatable, IntrWillReturn]>;

// Test whether a pointer is associated with a type metadata identifier.
def int_type_test : DefaultAttrsIntrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_metadata_ty],
[IntrNoMem, IntrSpeculatable]>;
36 changes: 36 additions & 0 deletions llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
@@ -136,6 +136,39 @@ static bool lowerLoadRelative(Function &F) {
return Changed;
}

/// Lower @llvm.can.load.speculatively using target-specific expansion.
/// Each target provides its own expansion via
/// TargetLowering::emitCanLoadSpeculatively.
/// The default expansion returns false (conservative).
static bool lowerCanLoadSpeculatively(Function &F, const TargetMachine *TM) {
bool Changed = false;

for (Use &U : llvm::make_early_inc_range(F.uses())) {
auto *CI = dyn_cast<CallInst>(U.getUser());
if (!CI || CI->getCalledOperand() != &F)
continue;

Function *ParentFunc = CI->getFunction();
const TargetLowering *TLI =
TM->getSubtargetImpl(*ParentFunc)->getTargetLowering();

IRBuilder<> Builder(CI);
Value *Ptr = CI->getArgOperand(0);
Value *Size = CI->getArgOperand(1);

// Ask target for expansion; nullptr means use default (return false)
Value *Result = TLI->emitCanLoadSpeculatively(Builder, Ptr, Size);
if (!Result)
Result = Builder.getFalse();

CI->replaceAllUsesWith(Result);
CI->eraseFromParent();
Changed = true;
}

return Changed;
}

// ObjCARC has knowledge about whether an obj-c runtime function needs to be
// always tail-called or never tail-called.
static CallInst::TailCallKind getOverridingTailCallKind(const Function &F) {
@@ -694,6 +727,9 @@ bool PreISelIntrinsicLowering::lowerIntrinsics(Module &M) const {
case Intrinsic::load_relative:
Changed |= lowerLoadRelative(F);
break;
case Intrinsic::can_load_speculatively:
Changed |= lowerCanLoadSpeculatively(F, TM);
break;
case Intrinsic::is_constant:
case Intrinsic::objectsize:
Changed |= forEachCall(F, [&](CallInst *CI) {
32 changes: 32 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -5144,6 +5144,35 @@ void SelectionDAGBuilder::visitMaskedLoad(const CallInst &I, bool IsExpanding) {
setValue(&I, Res);
}

void SelectionDAGBuilder::visitSpeculativeLoad(const CallInst &I) {
SDLoc sdl = getCurSDLoc();
Value *PtrOperand = I.getArgOperand(0);
// The remaining arguments (num_accessible_bytes or oracle function + args)
// are IR-level semantics only; they are not needed at codegen.
SDValue Ptr = getValue(PtrOperand);

const TargetLowering &TLI = DAG.getTargetLoweringInfo();
EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());
Align Alignment = I.getParamAlign(0).valueOrOne();
AAMDNodes AAInfo = I.getAAMetadata();
TypeSize StoreSize = VT.getStoreSize();

SDValue InChain = DAG.getRoot();

// Use MOLoad but NOT MODereferenceable - the memory may not be
// fully dereferenceable.
MachineMemOperand::Flags MMOFlags = MachineMemOperand::MOLoad;
LocationSize LocSize = StoreSize.isScalable()
? LocationSize::beforeOrAfterPointer()
: LocationSize::precise(StoreSize);
MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
MachinePointerInfo(PtrOperand), MMOFlags, LocSize, Alignment, AAInfo);

SDValue Load = DAG.getLoad(VT, sdl, InChain, Ptr, MMO);
PendingLoads.push_back(Load.getValue(1));
setValue(&I, Load);
}

void SelectionDAGBuilder::visitMaskedGather(const CallInst &I) {
SDLoc sdl = getCurSDLoc();

@@ -6905,6 +6934,9 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
case Intrinsic::masked_compressstore:
visitMaskedStore(I, true /* IsCompressing */);
return;
case Intrinsic::speculative_load:
visitSpeculativeLoad(I);
return;
case Intrinsic::powi:
setValue(&I, ExpandPowI(sdl, getValue(I.getArgOperand(0)),
getValue(I.getArgOperand(1)), DAG));
1 change: 1 addition & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
@@ -620,6 +620,7 @@ class SelectionDAGBuilder {
void visitStore(const StoreInst &I);
void visitMaskedLoad(const CallInst &I, bool IsExpanding = false);
void visitMaskedStore(const CallInst &I, bool IsCompressing = false);
void visitSpeculativeLoad(const CallInst &I);
void visitMaskedGather(const CallInst &I);
void visitMaskedScatter(const CallInst &I);
void visitAtomicCmpXchg(const AtomicCmpXchgInst &I);
64 changes: 64 additions & 0 deletions llvm/lib/IR/Verifier.cpp
@@ -6783,6 +6783,70 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
&Call);
break;
}
case Intrinsic::speculative_load: {
Type *LoadTy = Call.getType();
TypeSize Size = DL.getTypeStoreSize(LoadTy);
// For scalable vectors, check the known minimum size is a power of 2.
Check(Size.getKnownMinValue() > 0 && isPowerOf2_64(Size.getKnownMinValue()),
"llvm.speculative.load type must have a power-of-2 size", &Call);

unsigned NumArgs = Call.arg_size();
Check(NumArgs >= 3, "llvm.speculative.load requires at least 3 arguments",
&Call);

Value *PayloadArg = Call.getArgOperand(2);
if (PayloadArg->getType()->isIntegerTy(64)) {
// Direct form: (ptr, i1 from_end, i64 num_accessible_bytes)
Check(NumArgs == 3,
"llvm.speculative.load direct form has too many arguments", &Call);
if (auto *CI = dyn_cast<ConstantInt>(PayloadArg)) {
Check(Size.isScalable() || CI->getZExtValue() <= Size.getFixedValue(),
"llvm.speculative.load num_accessible_bytes must not exceed "
"the result size in bytes",
&Call);
}
} else {
// Oracle form: (ptr, i1 from_end, oracle_fn_ptr, args...)
auto *OracleFn = dyn_cast<Function>(PayloadArg);
Check(OracleFn,
"llvm.speculative.load third argument must be i64 or a direct "
"reference to an oracle function",
&Call);

Check(OracleFn->onlyReadsMemory() && OracleFn->onlyAccessesArgMemory(),
"llvm.speculative.load oracle function must not have side effects "
"and may only read memory through its arguments",
&Call);

FunctionType *FTy = OracleFn->getFunctionType();
Check(FTy->getReturnType()->isIntegerTy(64),
"llvm.speculative.load oracle function must return i64", &Call);

unsigned OracleArgsStart = 3;
unsigned NumOracleArgs = NumArgs - OracleArgsStart;
Check(FTy->isVarArg() ? NumOracleArgs >= FTy->getNumParams()
: NumOracleArgs == FTy->getNumParams(),
"llvm.speculative.load oracle function argument count mismatch",
&Call);
for (unsigned I = 0, E = FTy->getNumParams(); I < E; ++I) {
Check(FTy->getParamType(I) ==
Call.getArgOperand(I + OracleArgsStart)->getType(),
"llvm.speculative.load oracle function argument type mismatch",
&Call);
}
}
break;
}
case Intrinsic::can_load_speculatively: {
// If size is a constant, verify it's a positive power of 2.
if (auto *SizeCI = dyn_cast<ConstantInt>(Call.getArgOperand(1))) {
uint64_t Size = SizeCI->getZExtValue();
Check(Size > 0 && isPowerOf2_64(Size),
"llvm.can.load.speculatively size must be a positive power of 2",
&Call);
}
break;
}
case Intrinsic::vector_insert: {
Value *Vec = Call.getArgOperand(0);
Value *SubVec = Call.getArgOperand(1);