[TTI] Provide conservative costs for @llvm.speculative.load. #180036

Open
fhahn wants to merge 2 commits into llvm:main from fhahn:tti-speculative-load

Conversation


@fhahn fhahn commented Feb 5, 2026

Add TTI support for @llvm.speculative.load, defaulting to Invalid if not
implemented by the target.

Provide implementation for AArch64, which checks if the loaded type is
<= 16 bytes.

Depends on #179642 (included in PR)

@llvmbot llvmbot added backend:AArch64, backend:X86, llvm:codegen, llvm:SelectionDAG, llvm:ir, and llvm:analysis labels Feb 5, 2026

llvmbot commented Feb 5, 2026

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-backend-aarch64

Author: Florian Hahn (fhahn)

Changes


Patch is 46.07 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/180036.diff

21 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+113)
  • (modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+2)
  • (modified) llvm/include/llvm/CodeGen/BasicTTIImpl.h (+3)
  • (modified) llvm/include/llvm/CodeGen/TargetLowering.h (+13)
  • (modified) llvm/include/llvm/IR/Intrinsics.td (+14)
  • (modified) llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp (+36)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+30)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h (+1)
  • (modified) llvm/lib/IR/Verifier.cpp (+18)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+50)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.h (+2)
  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+19)
  • (added) llvm/test/Analysis/CostModel/AArch64/speculative-load.ll (+88)
  • (added) llvm/test/Analysis/CostModel/X86/speculative-load.ll (+30)
  • (added) llvm/test/CodeGen/AArch64/can-load-speculatively.ll (+75)
  • (added) llvm/test/CodeGen/AArch64/speculative-load-intrinsic-sve.ll (+66)
  • (added) llvm/test/CodeGen/AArch64/speculative-load-intrinsic.ll (+117)
  • (added) llvm/test/CodeGen/X86/can-load-speculatively.ll (+32)
  • (added) llvm/test/CodeGen/X86/speculative-load-intrinsic.ll (+146)
  • (added) llvm/test/Verifier/can-load-speculatively.ll (+19)
  • (added) llvm/test/Verifier/speculative-load.ll (+18)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index ddd5087830acc..f4f4a6a3fc9f2 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -27678,6 +27678,119 @@ The '``llvm.masked.compressstore``' intrinsic is designed for compressing data i
 Other targets may support this intrinsic differently, for example, by lowering it into a sequence of branches that guard scalar store operations.
 
 
+Speculative Load Intrinsics
+---------------------------
+
+LLVM provides intrinsics for speculatively loading memory that may be
+out-of-bounds. These intrinsics enable optimizations like early-exit loop
+vectorization where the vectorized loop may read beyond the end of an array,
+provided the access is guaranteed to not trap by target-specific checks.
+
+.. _int_speculative_load:
+
+'``llvm.speculative.load``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+      declare <4 x float>  @llvm.speculative.load.v4f32.p0(ptr <ptr>)
+      declare <8 x i32>    @llvm.speculative.load.v8i32.p0(ptr <ptr>)
+      declare i64          @llvm.speculative.load.i64.p0(ptr <ptr>)
+
+Overview:
+"""""""""
+
+The '``llvm.speculative.load``' intrinsic loads a value from memory. Unlike a
+regular load, the memory access may extend beyond the bounds of the allocated
+object, provided the pointer has been verified by
+:ref:`llvm.can.load.speculatively <int_can_load_speculatively>` to ensure the
+access cannot fault.
+
+Arguments:
+""""""""""
+
+The argument is a pointer to the memory location to load from. The return type
+must have a power-of-2 size in bytes.
+
+Semantics:
+""""""""""
+
+The '``llvm.speculative.load``' intrinsic performs a load that may access
+memory beyond the allocated object. It must be used in combination with
+:ref:`llvm.can.load.speculatively <int_can_load_speculatively>` to ensure
+the access cannot fault.
+
+For bytes that are within the bounds of the allocated object, the intrinsic
+returns the stored value. For bytes that are beyond the bounds of the
+allocated object, the intrinsic returns ``poison`` for those bytes. At least the
+first accessed byte must be within the bounds of an allocated object the pointer is
+based on.
+
+The behavior is undefined if this intrinsic is used to load from a pointer
+for which ``llvm.can.load.speculatively`` would return false.
+
+.. _int_can_load_speculatively:
+
+'``llvm.can.load.speculatively``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+      declare i1 @llvm.can.load.speculatively.p0(ptr <ptr>, i64 <num_bytes>)
+      declare i1 @llvm.can.load.speculatively.p1(ptr addrspace(1) <ptr>, i64 <num_bytes>)
+
+Overview:
+"""""""""
+
+The '``llvm.can.load.speculatively``' intrinsic returns true if it is safe
+to speculatively load ``num_bytes`` bytes starting from ``ptr``,
+even if the memory may be beyond the bounds of an allocated object.
+
+Arguments:
+""""""""""
+
+The first argument is a pointer to the memory location.
+
+The second argument is an i64 specifying the size in bytes of the load.
+The size must be a positive power of 2.  If the size is not a power-of-2, the
+result is ``poison``.
+
+Semantics:
+""""""""""
+
+This intrinsic has **target-dependent** semantics. It returns ``true`` if
+``num_bytes`` bytes starting at ``ptr`` can be loaded speculatively, even
+if the memory is beyond the bounds of an allocated object. It returns
+``false`` otherwise.
+
+The specific conditions under which this intrinsic returns ``true`` are
+determined by the target. For example, a target may check whether the pointer
+alignment guarantees the load cannot cross a page boundary.
+
+.. code-block:: llvm
+
+    ; Check if we can safely load 16 bytes from %ptr
+    %can_load = call i1 @llvm.can.load.speculatively.p0(ptr %ptr, i64 16)
+    br i1 %can_load, label %speculative_path, label %safe_path
+
+    speculative_path:
+      ; Safe to speculatively load from %ptr
+      %vec = call <4 x i32> @llvm.speculative.load.v4i32.p0(ptr %ptr)
+      ...
+
+    safe_path:
+      ; Fall back to masked load or scalar operations
+      ...
+
+
 Memory Use Markers
 ------------------
 
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 6d27cabf404f8..b83941da917eb 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -905,6 +905,8 @@ class TargetTransformInfoImplBase {
     switch (ICA.getID()) {
     default:
       break;
+    case Intrinsic::speculative_load:
+      return InstructionCost::getInvalid();
     case Intrinsic::allow_runtime_check:
     case Intrinsic::allow_ubsan_check:
     case Intrinsic::annotation:
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 21afcbefdf719..0de90f7a43a53 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1994,6 +1994,9 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
       // The cost of materialising a constant integer vector.
       return TargetTransformInfo::TCC_Basic;
     }
+    case Intrinsic::speculative_load:
+      // Delegate to base; targets must opt-in with a valid cost.
+      return BaseT::getIntrinsicInstrCost(ICA, CostKind);
     case Intrinsic::vector_extract: {
       // FIXME: Handle case where a scalable vector is extracted from a scalable
       // vector
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index ada4ffd3bcc89..ebc6b64590dea 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -2292,6 +2292,19 @@ class LLVM_ABI TargetLoweringBase {
     llvm_unreachable("Store conditional unimplemented on this target");
   }
 
+  /// Emit code to check if a speculative load of the given size from Ptr is
+  /// safe. Returns a Value* representing the check result (i1), or nullptr
+  /// to use the default lowering (which returns false). Targets can override
+  /// to provide their own safety check (e.g., alignment-based page boundary
+  /// check).
+  /// \param Builder IRBuilder positioned at the intrinsic call site
+  /// \param Ptr the pointer operand
+  /// \param Size the size in bytes (constant or runtime value for scalable)
+  virtual Value *emitCanLoadSpeculatively(IRBuilderBase &Builder, Value *Ptr,
+                                          Value *Size) const {
+    return nullptr;
+  }
+
   /// Perform a masked atomicrmw using a target-specific intrinsic. This
   /// represents the core LL/SC loop which will be lowered at a late stage by
   /// the backend. The target-specific intrinsic returns the loaded value and
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index ab5f3fbbaf860..f2d5f9cd18894 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2601,6 +2601,20 @@ def int_experimental_vector_compress:
               [LLVMMatchType<0>, LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, LLVMMatchType<0>],
               [IntrNoMem]>;
 
+// Speculatively load a value from memory; lowers to a regular aligned load.
+// The loaded type must have a power-of-2 size.
+def int_speculative_load:
+  DefaultAttrsIntrinsic<[llvm_any_ty],
+            [llvm_anyptr_ty],
+            [IntrArgMemOnly, IntrWillReturn, NoCapture<ArgIndex<0>>]>;
+
+// Returns true if it's safe to speculatively load 'num_bytes' from 'ptr'.
+// The size can be a runtime value to support scalable vectors.
+def int_can_load_speculatively:
+  DefaultAttrsIntrinsic<[llvm_i1_ty],
+            [llvm_anyptr_ty, llvm_i64_ty],
+            [IntrNoMem, IntrSpeculatable, IntrWillReturn]>;
+
 // Test whether a pointer is associated with a type metadata identifier.
 def int_type_test : DefaultAttrsIntrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_metadata_ty],
                               [IntrNoMem, IntrSpeculatable]>;
diff --git a/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp b/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
index 201a7c0f37653..83d61655c9d9e 100644
--- a/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
+++ b/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
@@ -131,6 +131,39 @@ static bool lowerLoadRelative(Function &F) {
   return Changed;
 }
 
+/// Lower @llvm.can.load.speculatively using target-specific expansion.
+/// Each target provides its own expansion via
+/// TargetLowering::emitCanLoadSpeculatively.
+/// The default expansion returns false (conservative).
+static bool lowerCanLoadSpeculatively(Function &F, const TargetMachine *TM) {
+  bool Changed = false;
+
+  for (Use &U : llvm::make_early_inc_range(F.uses())) {
+    auto *CI = dyn_cast<CallInst>(U.getUser());
+    if (!CI || CI->getCalledOperand() != &F)
+      continue;
+
+    Function *ParentFunc = CI->getFunction();
+    const TargetLowering *TLI =
+        TM->getSubtargetImpl(*ParentFunc)->getTargetLowering();
+
+    IRBuilder<> Builder(CI);
+    Value *Ptr = CI->getArgOperand(0);
+    Value *Size = CI->getArgOperand(1);
+
+    // Ask target for expansion; nullptr means use default (return false)
+    Value *Result = TLI->emitCanLoadSpeculatively(Builder, Ptr, Size);
+    if (!Result)
+      Result = Builder.getFalse();
+
+    CI->replaceAllUsesWith(Result);
+    CI->eraseFromParent();
+    Changed = true;
+  }
+
+  return Changed;
+}
+
 // ObjCARC has knowledge about whether an obj-c runtime function needs to be
 // always tail-called or never tail-called.
 static CallInst::TailCallKind getOverridingTailCallKind(const Function &F) {
@@ -632,6 +665,9 @@ bool PreISelIntrinsicLowering::lowerIntrinsics(Module &M) const {
     case Intrinsic::load_relative:
       Changed |= lowerLoadRelative(F);
       break;
+    case Intrinsic::can_load_speculatively:
+      Changed |= lowerCanLoadSpeculatively(F, TM);
+      break;
     case Intrinsic::is_constant:
     case Intrinsic::objectsize:
       Changed |= forEachCall(F, [&](CallInst *CI) {
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index e191cc5524a14..196f380b5a4dc 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -5122,6 +5122,33 @@ void SelectionDAGBuilder::visitMaskedLoad(const CallInst &I, bool IsExpanding) {
   setValue(&I, Res);
 }
 
+void SelectionDAGBuilder::visitSpeculativeLoad(const CallInst &I) {
+  SDLoc sdl = getCurSDLoc();
+  Value *PtrOperand = I.getArgOperand(0);
+  SDValue Ptr = getValue(PtrOperand);
+
+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+  EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());
+  Align Alignment = I.getParamAlign(0).valueOrOne();
+  AAMDNodes AAInfo = I.getAAMetadata();
+  TypeSize StoreSize = VT.getStoreSize();
+
+  SDValue InChain = DAG.getRoot();
+
+  // Use MOLoad but NOT MODereferenceable - the memory may not be
+  // fully dereferenceable.
+  MachineMemOperand::Flags MMOFlags = MachineMemOperand::MOLoad;
+  LocationSize LocSize = StoreSize.isScalable()
+                             ? LocationSize::beforeOrAfterPointer()
+                             : LocationSize::precise(StoreSize);
+  MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
+      MachinePointerInfo(PtrOperand), MMOFlags, LocSize, Alignment, AAInfo);
+
+  SDValue Load = DAG.getLoad(VT, sdl, InChain, Ptr, MMO);
+  PendingLoads.push_back(Load.getValue(1));
+  setValue(&I, Load);
+}
+
 void SelectionDAGBuilder::visitMaskedGather(const CallInst &I) {
   SDLoc sdl = getCurSDLoc();
 
@@ -6873,6 +6900,9 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
   case Intrinsic::masked_compressstore:
     visitMaskedStore(I, true /* IsCompressing */);
     return;
+  case Intrinsic::speculative_load:
+    visitSpeculativeLoad(I);
+    return;
   case Intrinsic::powi:
     setValue(&I, ExpandPowI(sdl, getValue(I.getArgOperand(0)),
                             getValue(I.getArgOperand(1)), DAG));
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
index f8aecea25b3d6..dad406f48b77b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
@@ -619,6 +619,7 @@ class SelectionDAGBuilder {
   void visitStore(const StoreInst &I);
   void visitMaskedLoad(const CallInst &I, bool IsExpanding = false);
   void visitMaskedStore(const CallInst &I, bool IsCompressing = false);
+  void visitSpeculativeLoad(const CallInst &I);
   void visitMaskedGather(const CallInst &I);
   void visitMaskedScatter(const CallInst &I);
   void visitAtomicCmpXchg(const AtomicCmpXchgInst &I);
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 120a7daa16f05..5003a6886d67f 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -6744,6 +6744,24 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
           &Call);
     break;
   }
+  case Intrinsic::speculative_load: {
+    Type *LoadTy = Call.getType();
+    TypeSize Size = DL.getTypeStoreSize(LoadTy);
+    // For scalable vectors, check the known minimum size is a power of 2.
+    Check(Size.getKnownMinValue() > 0 && isPowerOf2_64(Size.getKnownMinValue()),
+          "llvm.speculative.load type must have a power-of-2 size", &Call);
+    break;
+  }
+  case Intrinsic::can_load_speculatively: {
+    // If size is a constant, verify it's a positive power of 2.
+    if (auto *SizeCI = dyn_cast<ConstantInt>(Call.getArgOperand(1))) {
+      uint64_t Size = SizeCI->getZExtValue();
+      Check(Size > 0 && isPowerOf2_64(Size),
+            "llvm.can.load.speculatively size must be a positive power of 2",
+            &Call);
+    }
+    break;
+  }
   case Intrinsic::vector_insert: {
     Value *Vec = Call.getArgOperand(0);
     Value *SubVec = Call.getArgOperand(1);
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 88836d6e167b8..76b6871831069 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -30215,6 +30215,56 @@ Value *AArch64TargetLowering::emitStoreConditional(IRBuilderBase &Builder,
   return CI;
 }
 
+Value *AArch64TargetLowering::emitCanLoadSpeculatively(IRBuilderBase &Builder,
+                                                       Value *Ptr,
+                                                       Value *Size) const {
+  unsigned AS = cast<PointerType>(Ptr->getType())->getAddressSpace();
+  // Conservatively only allow speculation for address space 0.
+  if (AS != 0)
+    return nullptr;
+  // For power-of-2 sizes <= 16, emit alignment check: (ptr & (size - 1)) == 0.
+  // If the pointer is aligned to at least 'size' bytes, loading 'size' bytes
+  // cannot cross a page boundary, so it's safe to speculate.
+  // The 16-byte limit ensures correctness with MTE (memory tagging), since
+  // MTE uses 16-byte tag granules.
+  //
+  // The alignment check only works for power-of-2 sizes. For non-power-of-2
+  // sizes, we conservatively return false.
+  const DataLayout &DL = Builder.GetInsertBlock()->getModule()->getDataLayout();
+
+  unsigned PtrBits = DL.getPointerSizeInBits(AS);
+  Type *IntPtrTy = Builder.getIntNTy(PtrBits);
+  if (auto *CI = dyn_cast<ConstantInt>(Size)) {
+    uint64_t SizeVal = CI->getZExtValue();
+    assert(isPowerOf2_64(SizeVal) && "size must be power-of-two");
+    // For constant sizes > 16, return nullptr (default false).
+    if (SizeVal > 16)
+      return nullptr;
+
+    // Power-of-2 constant size <= 16: use fast alignment check.
+    Value *PtrInt = Builder.CreatePtrToInt(Ptr, IntPtrTy);
+    Value *Mask = ConstantInt::get(IntPtrTy, SizeVal - 1);
+    Value *Masked = Builder.CreateAnd(PtrInt, Mask);
+    return Builder.CreateICmpEQ(Masked, ConstantInt::get(IntPtrTy, 0));
+  }
+
+  // Check power-of-2 size <= 16 and alignment.
+  Value *PtrInt = Builder.CreatePtrToInt(Ptr, IntPtrTy);
+  Value *SizeExt = Builder.CreateZExtOrTrunc(Size, IntPtrTy);
+
+  Value *SizeLE16 =
+      Builder.CreateICmpULE(SizeExt, ConstantInt::get(IntPtrTy, 16));
+
+  // alignment check: (ptr & (size - 1)) == 0
+  Value *SizeMinusOne =
+      Builder.CreateSub(SizeExt, ConstantInt::get(IntPtrTy, 1));
+  Value *Masked = Builder.CreateAnd(PtrInt, SizeMinusOne);
+  Value *AlignCheck =
+      Builder.CreateICmpEQ(Masked, ConstantInt::get(IntPtrTy, 0));
+
+  return Builder.CreateAnd(SizeLE16, AlignCheck);
+}
+
 bool AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters(
     Type *Ty, CallingConv::ID CallConv, bool isVarArg,
     const DataLayout &DL) const {
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index 89a8858550ca2..884e072eaa925 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -350,6 +350,8 @@ class AArch64TargetLowering : public TargetLowering {
                         AtomicOrdering Ord) const override;
   Value *emitStoreConditional(IRBuilderBase &Builder, Value *Val, Value *Addr,
                               AtomicOrdering Ord) const override;
+  Value *emitCanLoadSpeculatively(IRBuilderBase &Builder, Value *Ptr,
+                                  Value *Size) const override;
 
   void emitAtomicCmpXchgNoStoreLLBalance(IRBuilderBase &Builder) const override;
 
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index bdf06e39d7367..0945c30429dfb 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -628,6 +628,25 @@ AArch64TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
       return InstructionCost::getInvalid();
 
   switch (ICA.getID()) {
+  case Intrinsic::speculative_load: {
+    // Speculative loads are only valid for types <= 16 bytes due to MTE
+    // (Memory Tagging Extension) using 16-byte tag granules. Loads larger
+    // than 16 bytes could cross a tag granule boundary.
+    auto LT = getTypeLegalizationCost(RetTy);
+    if (!LT.first.isValid())
+      return InstructionCost::getInvalid();
+    // For scalable vectors, check that we use a single register (which means
+    // <= 16 bytes at minimum vscale). For fixed types, compute the actual size.
+    if (isa<ScalableVectorType>(RetTy)) {
+      if (LT.first.getValue() != 1)
+        return InstructionCost::getInvalid();
+    } else {
+      if (LT.first.getValue() * LT.second.getStoreSize() > 16)
+        return InstructionCost::getInvalid();
+    }
+    // Return cost of a regular load.
+    return getMemoryOpCost(Instruction::Load, RetTy, Align(1), 0, CostKind);
+  }
   case Intrinsic::experimental_vector_histogram_add: {
     InstructionCost HistCost = getHistogramCost(ST, ICA);
     // If the cost isn't valid, we may still be able to scalarize
diff --git a/llvm/test/Analysis/CostModel/AArch64/speculative-load.ll b/llvm/test/Analysis/CostModel/AArch64/speculative-load.ll
new file mode 100644
index 0000000000000..7587c3ec77b40
--- /dev/null
+++ b/llvm/test/Analysis/CostModel/AArch64/speculative-load.ll
@@ -0,0 +1,88 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=aarch64 < %s | FileCheck %s --check-prefixes=COMMON,NEON
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=aarch64 -mattr=+sve < %s | FileCheck %s --check-prefixes=COMMON,SVE
+
+define void @speculative_load_cost_fixed(ptr %p) {
+  ; Scalar types - all valid (<= 16 bytes)
+; COMMON-LABEL: 'speculative_load_cost_fixed'
+; COMMON-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = call i8 @llvm.speculative.load.i8.p0(ptr %p)
+; COMMON-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %2 = call i16 @llvm.speculative.load.i16.p0(ptr %p)
+; COMMON-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %3 = call i32 @llvm.speculative.load.i32.p0(ptr %p)
+; COMMON-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %4 = call i64 @llvm.speculative.load.i64.p0(ptr %p)
+...
[truncated]

Introduce two new intrinsics to enable vectorization of loops with early
exits that have potentially faulting loads.

This has previously been discussed in
llvm#120603 and is similar to
@nikic's https://hackmd.io/@nikic/S1O4QWYZkx, with the major difference
being that there is no `%defined_size` argument; instead the load
returns the stored values for the bytes within bounds and undef
otherwise. I don't think we can easily compute the defined size because
it may depend on the loaded values (i.e. at which lane the early exit is
taken).

1. `@llvm.speculative.load` (name subject to change) - perform a load that
   may access memory beyond the allocated object. It must be used in
   combination with `@llvm.can.load.speculatively` to ensure the load is
   guaranteed to not trap.

2. `@llvm.can.load.speculatively` - Returns true if it's safe to speculatively
   load a given number of bytes from a pointer. The semantics are
   target-dependent: some targets may check that the access does not
   cross a page boundary, while others may apply stricter checks, for
   example AArch64 with MTE, which limits the access size to 16 bytes.

`@llvm.speculative.load` is lowered to a regular load in SelectionDAG
without MODereferenceable. I am not sure if we need to be more careful
than this, i.e. whether we could still reason about SelectionDAG loads to
infer dereferenceability for the pointer.

`@llvm.can.load.speculatively` is lowered to regular IR in PreISel
intrinsic lowering, using a target-lowering hook. By default, it
conservatively expands to false.

These intrinsics should allow the loop vectorizer to vectorize early-exit
loops with potentially non-dereferenceable loads.
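The page-boundary reasoning behind the AArch64 expansion can be sketched as a small standalone C++ model (the function names here are hypothetical, not the actual TargetLowering code): a power-of-2-sized, size-aligned access nests entirely inside any larger power-of-2 granule, so it can never straddle a page or a 16-byte MTE tag-granule boundary.

```cpp
#include <cassert>
#include <cstdint>

// Standalone model of the check the patch emits:
//   size <= 16  &&  (ptr & (size - 1)) == 0
// 'Size' is assumed to be a power of 2, as the verifier enforces.
constexpr bool canLoadSpeculativelyModel(uint64_t Ptr, uint64_t Size) {
  return Size <= 16 && (Ptr & (Size - 1)) == 0;
}

// True if the byte range [Ptr, Ptr + Size) straddles a boundary between
// two Granule-sized, Granule-aligned blocks (e.g. pages or MTE granules).
constexpr bool crossesGranule(uint64_t Ptr, uint64_t Size, uint64_t Granule) {
  return Ptr / Granule != (Ptr + Size - 1) / Granule;
}
```

Because every power-of-2 granule of at least 16 bytes is a multiple of every power-of-2 size up to 16, a size-aligned access always sits within one granule-sized block; that is the invariant the alignment check relies on.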
Add TTI support for @llvm.speculative.load, defaulting to Invalid if not
implemented by the target.

Provide implementation for AArch64, which checks if the loaded type is
<= 16 bytes.
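The cost rule in this commit can likewise be modeled as a tiny standalone sketch (a hypothetical helper, not the real TTI interface): loads of types larger than one 16-byte MTE tag granule get an invalid cost, everything else is costed like a plain load.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of the AArch64 cost rule: the loaded type must fit
// in one 16-byte MTE tag granule, otherwise the cost is Invalid
// (std::nullopt here stands in for InstructionCost::getInvalid()).
std::optional<unsigned> modelSpeculativeLoadCost(uint64_t StoreSizeBytes) {
  if (StoreSizeBytes == 0 || StoreSizeBytes > 16)
    return std::nullopt;
  return 1u; // stands in for the cost of a regular load
}
```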
@fhahn fhahn force-pushed the tti-speculative-load branch from 22902de to c3c6ae7 Compare February 23, 2026 11:56