[TTI][Vectorize] Migrate masked/gather-scatter/strided/expand-compress costing (NFCI)#165532
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed from 7551383 to 47d0b7a
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-risc-v

Author: Shih-Po Hung (arcbbb)

Changes

In #160470, there was a discussion about exploring a general approach for handling memory intrinsics. This patch adds alignment to IntrinsicCostAttributes for the type-based cost queries that SLP and the Loop Vectorizer issue before IR is materialized. Candidates to adopt this are getMaskedMemoryOpCost, getGatherScatterOpCost, getExpandCompressMemoryOpCost, and getStridedMemoryOpCost.

Full diff: https://github.com/llvm/llvm-project/pull/165532.diff

7 Files Affected:
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 7b7dc1b46dd80..0270a65eac776 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -135,6 +135,9 @@ class IntrinsicCostAttributes {
InstructionCost ScalarizationCost = InstructionCost::getInvalid();
TargetLibraryInfo const *LibInfo = nullptr;
+ MaybeAlign Alignment;
+ bool VariableMask = false;
+
public:
LLVM_ABI IntrinsicCostAttributes(
Intrinsic::ID Id, const CallBase &CI,
@@ -146,6 +149,10 @@ class IntrinsicCostAttributes {
FastMathFlags Flags = FastMathFlags(), const IntrinsicInst *I = nullptr,
InstructionCost ScalarCost = InstructionCost::getInvalid());
+ LLVM_ABI IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
+ ArrayRef<Type *> Tys, Align Alignment,
+ bool VariableMask = false);
+
LLVM_ABI IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
ArrayRef<const Value *> Args);
@@ -160,6 +167,8 @@ class IntrinsicCostAttributes {
const IntrinsicInst *getInst() const { return II; }
Type *getReturnType() const { return RetTy; }
FastMathFlags getFlags() const { return FMF; }
+ MaybeAlign getAlign() const { return Alignment; }
+ bool getVariableMask() const { return VariableMask; }
InstructionCost getScalarizationCost() const { return ScalarizationCost; }
const SmallVectorImpl<const Value *> &getArgs() const { return Arguments; }
const SmallVectorImpl<Type *> &getArgTypes() const { return ParamTys; }
@@ -1586,20 +1595,6 @@ class TargetTransformInfo {
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
const Instruction *I = nullptr) const;
- /// \return The cost of strided memory operations.
- /// \p Opcode - is a type of memory access Load or Store
- /// \p DataTy - a vector type of the data to be loaded or stored
- /// \p Ptr - pointer [or vector of pointers] - address[es] in memory
- /// \p VariableMask - true when the memory access is predicated with a mask
- /// that is not a compile-time constant
- /// \p Alignment - alignment of single element
- /// \p I - the optional original context instruction, if one exists, e.g. the
- /// load/store to transform or the call to the gather/scatter intrinsic
- LLVM_ABI InstructionCost getStridedMemoryOpCost(
- unsigned Opcode, Type *DataTy, const Value *Ptr, bool VariableMask,
- Align Alignment, TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
- const Instruction *I = nullptr) const;
-
/// \return The cost of the interleaved memory operation.
/// \p Opcode is the memory operation code
/// \p VecTy is the vector type of the interleaved access.
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 4cd607c0d0c8d..a9591704c9d14 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -862,14 +862,6 @@ class TargetTransformInfoImplBase {
return 1;
}
- virtual InstructionCost
- getStridedMemoryOpCost(unsigned Opcode, Type *DataTy, const Value *Ptr,
- bool VariableMask, Align Alignment,
- TTI::TargetCostKind CostKind,
- const Instruction *I = nullptr) const {
- return InstructionCost::getInvalid();
- }
-
virtual InstructionCost getInterleavedMemoryOpCost(
unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 76b6c8ec68c72..fbd481d1c794f 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1574,18 +1574,6 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
/*IsGatherScatter*/ true, CostKind);
}
- InstructionCost getStridedMemoryOpCost(unsigned Opcode, Type *DataTy,
- const Value *Ptr, bool VariableMask,
- Align Alignment,
- TTI::TargetCostKind CostKind,
- const Instruction *I) const override {
- // For a target without strided memory operations (or for an illegal
- // operation type on one which does), assume we lower to a gather/scatter
- // operation. (Which may in turn be scalarized.)
- return thisT()->getGatherScatterOpCost(Opcode, DataTy, Ptr, VariableMask,
- Alignment, CostKind, I);
- }
-
InstructionCost getInterleavedMemoryOpCost(
unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
@@ -1958,27 +1946,26 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
}
case Intrinsic::experimental_vp_strided_store: {
const Value *Data = Args[0];
- const Value *Ptr = Args[1];
const Value *Mask = Args[3];
const Value *EVL = Args[4];
bool VarMask = !isa<Constant>(Mask) || !isa<Constant>(EVL);
Type *EltTy = cast<VectorType>(Data->getType())->getElementType();
Align Alignment =
I->getParamAlign(1).value_or(thisT()->DL.getABITypeAlign(EltTy));
- return thisT()->getStridedMemoryOpCost(Instruction::Store,
- Data->getType(), Ptr, VarMask,
- Alignment, CostKind, I);
+ return thisT()->getCommonMaskedMemoryOpCost(
+ Instruction::Store, Data->getType(), Alignment, VarMask,
+ /*IsGatherScatter*/ true, CostKind);
}
case Intrinsic::experimental_vp_strided_load: {
- const Value *Ptr = Args[0];
const Value *Mask = Args[2];
const Value *EVL = Args[3];
bool VarMask = !isa<Constant>(Mask) || !isa<Constant>(EVL);
Type *EltTy = cast<VectorType>(RetTy)->getElementType();
Align Alignment =
I->getParamAlign(0).value_or(thisT()->DL.getABITypeAlign(EltTy));
- return thisT()->getStridedMemoryOpCost(Instruction::Load, RetTy, Ptr,
- VarMask, Alignment, CostKind, I);
+ return thisT()->getCommonMaskedMemoryOpCost(
+ Instruction::Load, RetTy, Alignment, VarMask,
+ /*IsGatherScatter*/ true, CostKind);
}
case Intrinsic::stepvector: {
if (isa<ScalableVectorType>(RetTy))
@@ -2418,17 +2405,21 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
}
case Intrinsic::experimental_vp_strided_store: {
auto *Ty = cast<VectorType>(ICA.getArgTypes()[0]);
- Align Alignment = thisT()->DL.getABITypeAlign(Ty->getElementType());
- return thisT()->getStridedMemoryOpCost(
- Instruction::Store, Ty, /*Ptr=*/nullptr, /*VariableMask=*/true,
- Alignment, CostKind, ICA.getInst());
+ Align Alignment = ICA.getAlign().value_or(
+ thisT()->DL.getABITypeAlign(Ty->getElementType()));
+ return thisT()->getCommonMaskedMemoryOpCost(
+ Instruction::Store, Ty, Alignment,
+ /*VariableMask=*/true,
+ /*IsGatherScatter*/ true, CostKind);
}
case Intrinsic::experimental_vp_strided_load: {
auto *Ty = cast<VectorType>(ICA.getReturnType());
- Align Alignment = thisT()->DL.getABITypeAlign(Ty->getElementType());
- return thisT()->getStridedMemoryOpCost(
- Instruction::Load, Ty, /*Ptr=*/nullptr, /*VariableMask=*/true,
- Alignment, CostKind, ICA.getInst());
+ Align Alignment = ICA.getAlign().value_or(
+ thisT()->DL.getABITypeAlign(Ty->getElementType()));
+ return thisT()->getCommonMaskedMemoryOpCost(
+ Instruction::Load, Ty, Alignment,
+ /*VariableMask=*/true,
+ /*IsGatherScatter*/ true, CostKind);
}
case Intrinsic::vector_reduce_add:
case Intrinsic::vector_reduce_mul:
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index c47a1c1b23a37..821017e985cc2 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -96,6 +96,14 @@ IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
ParamTys.insert(ParamTys.begin(), Tys.begin(), Tys.end());
}
+IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
+ ArrayRef<Type *> Tys,
+ Align Alignment,
+ bool VariableMask)
+ : RetTy(RTy), IID(Id), Alignment(Alignment), VariableMask(VariableMask) {
+ ParamTys.insert(ParamTys.begin(), Tys.begin(), Tys.end());
+}
+
IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id, Type *Ty,
ArrayRef<const Value *> Args)
: RetTy(Ty), IID(Id) {
@@ -1210,15 +1218,6 @@ InstructionCost TargetTransformInfo::getExpandCompressMemoryOpCost(
return Cost;
}
-InstructionCost TargetTransformInfo::getStridedMemoryOpCost(
- unsigned Opcode, Type *DataTy, const Value *Ptr, bool VariableMask,
- Align Alignment, TTI::TargetCostKind CostKind, const Instruction *I) const {
- InstructionCost Cost = TTIImpl->getStridedMemoryOpCost(
- Opcode, DataTy, Ptr, VariableMask, Alignment, CostKind, I);
- assert(Cost >= 0 && "TTI should not produce negative costs!");
- return Cost;
-}
-
InstructionCost TargetTransformInfo::getInterleavedMemoryOpCost(
unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 7bc0b5b394828..da74320af0821 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1172,29 +1172,6 @@ InstructionCost RISCVTTIImpl::getExpandCompressMemoryOpCost(
LT.first * getRISCVInstructionCost(Opcodes, LT.second, CostKind);
}
-InstructionCost RISCVTTIImpl::getStridedMemoryOpCost(
- unsigned Opcode, Type *DataTy, const Value *Ptr, bool VariableMask,
- Align Alignment, TTI::TargetCostKind CostKind, const Instruction *I) const {
- if (((Opcode == Instruction::Load || Opcode == Instruction::Store) &&
- !isLegalStridedLoadStore(DataTy, Alignment)) ||
- (Opcode != Instruction::Load && Opcode != Instruction::Store))
- return BaseT::getStridedMemoryOpCost(Opcode, DataTy, Ptr, VariableMask,
- Alignment, CostKind, I);
-
- if (CostKind == TTI::TCK_CodeSize)
- return TTI::TCC_Basic;
-
- // Cost is proportional to the number of memory operations implied. For
- // scalable vectors, we use an estimate on that number since we don't
- // know exactly what VL will be.
- auto &VTy = *cast<VectorType>(DataTy);
- InstructionCost MemOpCost =
- getMemoryOpCost(Opcode, VTy.getElementType(), Alignment, 0, CostKind,
- {TTI::OK_AnyValue, TTI::OP_None}, I);
- unsigned NumLoads = getEstimatedVLFor(&VTy);
- return NumLoads * MemOpCost;
-}
-
InstructionCost
RISCVTTIImpl::getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) const {
// FIXME: This is a property of the default vector convention, not
@@ -1561,6 +1538,43 @@ RISCVTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
cast<VectorType>(ICA.getArgTypes()[0]), {}, CostKind,
0, cast<VectorType>(ICA.getReturnType()));
}
+ case Intrinsic::experimental_vp_strided_load:
+ case Intrinsic::experimental_vp_strided_store: {
+ if (CostKind == TTI::TCK_CodeSize)
+ return TTI::TCC_Basic;
+
+ auto *DataTy = (ICA.getID() == Intrinsic::experimental_vp_strided_load)
+ ? cast<VectorType>(ICA.getReturnType())
+ : cast<VectorType>(ICA.getArgTypes()[0]);
+ Type *EltTy = DataTy->getElementType();
+
+ Align ABITyAlign = DL.getABITypeAlign(EltTy);
+
+ const IntrinsicInst *I = ICA.getInst();
+ Align Alignment;
+ if (ICA.isTypeBasedOnly())
+ Alignment = ICA.getAlign().value_or(ABITyAlign);
+ else {
+ unsigned Index =
+ (ICA.getID() == Intrinsic::experimental_vp_strided_load) ? 0 : 1;
+ Alignment = I->getParamAlign(Index).value_or(ABITyAlign);
+ }
+
+ if (!isLegalStridedLoadStore(DataTy, Alignment))
+ return BaseT::getIntrinsicInstrCost(ICA, CostKind);
+
+ unsigned Opcode = ICA.getID() == Intrinsic::experimental_vp_strided_load
+ ? Instruction::Load
+ : Instruction::Store;
+ // Cost is proportional to the number of memory operations implied. For
+ // scalable vectors, we use an estimate on that number since we don't
+ // know exactly what VL will be.
+ InstructionCost MemOpCost =
+ getMemoryOpCost(Opcode, EltTy, Alignment, 0, CostKind,
+ {TTI::OK_AnyValue, TTI::OP_None}, I);
+ unsigned NumLoads = getEstimatedVLFor(DataTy);
+ return NumLoads * MemOpCost;
+ }
case Intrinsic::fptoui_sat:
case Intrinsic::fptosi_sat: {
InstructionCost Cost = 0;
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
index 6886e8964e29e..af456cdd41bd7 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
@@ -202,12 +202,6 @@ class RISCVTTIImpl final : public BasicTTIImplBase<RISCVTTIImpl> {
Align Alignment, TTI::TargetCostKind CostKind,
const Instruction *I = nullptr) const override;
- InstructionCost getStridedMemoryOpCost(unsigned Opcode, Type *DataTy,
- const Value *Ptr, bool VariableMask,
- Align Alignment,
- TTI::TargetCostKind CostKind,
- const Instruction *I) const override;
-
InstructionCost
getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) const override;
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 4fcaf6dabb513..262201dac131a 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -7224,10 +7224,13 @@ BoUpSLP::LoadsState BoUpSLP::canVectorizeLoads(
VectorGEPCost;
break;
case LoadsState::StridedVectorize:
- VecLdCost += TTI.getStridedMemoryOpCost(Instruction::Load, SubVecTy,
- LI0->getPointerOperand(),
- /*VariableMask=*/false,
- CommonAlignment, CostKind) +
+ VecLdCost += TTI.getIntrinsicInstrCost(
+ {Intrinsic::experimental_vp_strided_load,
+ SubVecTy,
+ {},
+ CommonAlignment,
+ /*VariableMask=*/false},
+ CostKind) +
VectorGEPCost;
break;
case LoadsState::CompressVectorize:
@@ -13191,9 +13194,13 @@ void BoUpSLP::transformNodes() {
BaseLI->getPointerAddressSpace(), CostKind,
TTI::OperandValueInfo()) +
::getShuffleCost(*TTI, TTI::SK_Reverse, VecTy, Mask, CostKind);
- InstructionCost StridedCost = TTI->getStridedMemoryOpCost(
- Instruction::Load, VecTy, BaseLI->getPointerOperand(),
- /*VariableMask=*/false, CommonAlignment, CostKind, BaseLI);
+ InstructionCost StridedCost =
+ TTI->getIntrinsicInstrCost({Intrinsic::experimental_vp_strided_load,
+ VecTy,
+ {},
+ CommonAlignment,
+ /*VariableMask=*/false},
+ CostKind);
if (StridedCost < OriginalVecCost || ForceStridedLoads) {
// Strided load is more profitable than consecutive load + reverse -
// transform the node to strided load.
@@ -13226,9 +13233,13 @@ void BoUpSLP::transformNodes() {
BaseSI->getPointerAddressSpace(), CostKind,
TTI::OperandValueInfo()) +
::getShuffleCost(*TTI, TTI::SK_Reverse, VecTy, Mask, CostKind);
- InstructionCost StridedCost = TTI->getStridedMemoryOpCost(
- Instruction::Store, VecTy, BaseSI->getPointerOperand(),
- /*VariableMask=*/false, CommonAlignment, CostKind, BaseSI);
+ InstructionCost StridedCost = TTI->getIntrinsicInstrCost(
+ {Intrinsic::experimental_vp_strided_store,
+ Type::getVoidTy(VecTy->getContext()),
+ {VecTy},
+ CommonAlignment,
+ /*VariableMask=*/false},
+ CostKind);
if (StridedCost < OriginalVecCost)
// Strided store is more profitable than reverse + consecutive store -
// transform the node to strided store.
@@ -14991,9 +15002,13 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
case TreeEntry::StridedVectorize: {
Align CommonAlignment =
computeCommonAlignment<LoadInst>(UniqueValues.getArrayRef());
- VecLdCost = TTI->getStridedMemoryOpCost(
- Instruction::Load, VecTy, LI0->getPointerOperand(),
- /*VariableMask=*/false, CommonAlignment, CostKind);
+ VecLdCost =
+ TTI->getIntrinsicInstrCost({Intrinsic::experimental_vp_strided_load,
+ VecTy,
+ {},
+ CommonAlignment,
+ /*VariableMask=*/false},
+ CostKind);
break;
}
case TreeEntry::CompressVectorize: {
@@ -15084,9 +15099,13 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
if (E->State == TreeEntry::StridedVectorize) {
Align CommonAlignment =
computeCommonAlignment<StoreInst>(UniqueValues.getArrayRef());
- VecStCost = TTI->getStridedMemoryOpCost(
- Instruction::Store, VecTy, BaseSI->getPointerOperand(),
- /*VariableMask=*/false, CommonAlignment, CostKind);
+ VecStCost = TTI->getIntrinsicInstrCost(
+ {Intrinsic::experimental_vp_strided_store,
+ Type::getVoidTy(VecTy->getContext()),
+ {VecTy},
+ CommonAlignment,
+ /*VariableMask=*/false},
+ CostKind);
} else {
assert(E->State == TreeEntry::Vectorize &&
"Expected either strided or consecutive stores.");
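A minimal caller-side sketch of the new type-based query, mirroring the SLPVectorizer changes in the diff above. `VecTy`, `CommonAlignment`, and `TTI` stand in for values the caller already has; the cost kind is illustrative.

```cpp
// Strided load: the vector type is the return type, no extra parameter types.
IntrinsicCostAttributes LoadICA(Intrinsic::experimental_vp_strided_load,
                                /*RTy=*/VecTy, /*Tys=*/{}, CommonAlignment,
                                /*VariableMask=*/false);
InstructionCost LoadCost =
    TTI.getIntrinsicInstrCost(LoadICA, TTI::TCK_RecipThroughput);

// Strided store: returns void and carries the stored vector type in Tys.
IntrinsicCostAttributes StoreICA(Intrinsic::experimental_vp_strided_store,
                                 Type::getVoidTy(VecTy->getContext()),
                                 /*Tys=*/{VecTy}, CommonAlignment,
                                 /*VariableMask=*/false);
InstructionCost StoreCost =
    TTI.getIntrinsicInstrCost(StoreICA, TTI::TCK_RecipThroughput);
```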
return thisT()->getCommonMaskedMemoryOpCost(
    Instruction::Store, Data->getType(), Alignment, VarMask,
    /*IsGatherScatter*/ true, CostKind);
I don't think we currently expand vp.strided.load/store if it's not supported by the target. I think we should probably just return an invalid cost in BasicTTIImpl?
Thanks for catching this. I am reworking this to use the new getMemIntrinsicInstrCost and will address it in my next update.
Force-pushed from 2c2e772 to 5481271
fhahn left a comment
This approach doesn’t work for getGatherScatterOpCost: ARMTTIImpl::getGatherScatterOpCost walks the use chain of a provided LoadInst/StoreInst, whereas ICA currently expects an IntrinsicInst. Consider introducing getMemIntrinsicInstrCost to cover this scenario.
Thanks for pushing to unify this. I think the description is currently out-of-date with the implementation and would need updating.
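For orientation, a rough sketch of the shape such an entry point could take; the exact signature here is an assumption (the follow-up commits referenced later in this thread introduce MemIntrinsicCostAttributes and getMemIntrinsicInstrCost):

```cpp
// Hypothetical shape only; the real signature comes from the follow-up
// patches, not from this comment.
InstructionCost
getMemIntrinsicInstrCost(const MemIntrinsicCostAttributes &MICA,
                         TTI::TargetCostKind CostKind) const;
```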
unsigned Opcode = (Id == Intrinsic::experimental_vp_strided_load)
                      ? Instruction::Load
                      : Instruction::Store;
return thisT()->getStridedMemoryOpCost(Opcode, DataTy, Ptr, VariableMask,
should those also be migrated, possibly not in the initial PR but as follow-ups?
Yes, getStridedMemoryOpCost was straightforward (RISCV-only). The others are implemented across several backends, so I’ll stage those as follow-ups with the relevant target reviewers.
lukel97 left a comment
The getMemIntrinsicInstrCost hook is fine by me. But I also wonder if it would be possible to store the Instruction + VariableMask inside IntrinsicCostAttributes and reuse getIntrinsicInstrCost. Is that difficult to do? Not strongly opinionated on this either way.
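For reference, the instruction-based route mentioned here already has plumbing in IntrinsicCostAttributes (the CallBase constructor and getInst() shown in the diff). A minimal sketch of that route, assuming `Call` and `TTI` are in scope:

```cpp
// When the actual intrinsic call is available, the existing constructor
// carries it through ICA, so a target hook could recover alignment and mask
// from ICA.getInst() instead of extra fields.
if (const auto *II = dyn_cast<IntrinsicInst>(&Call)) {
  IntrinsicCostAttributes ICA(II->getIntrinsicID(), *II);
  InstructionCost Cost =
      TTI.getIntrinsicInstrCost(ICA, TTI::TCK_RecipThroughput);
  (void)Cost; // used by the caller's profitability check
}
```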
/// \p I - the optional original context instruction, if one exists, e.g. the
/// load/store to transform or the call to the gather/scatter intrinsic
We should probably find a better way to model the extending/truncating logic that ARM's MVE TTI needs. I don't think plumbing through the instruction is ideal, but for another PR :)
What is the benefit of this approach over having separate functions? It would seem that adding intrinsic-specific information would be simpler to pass to individual cost functions if needed. There is also getInterleavedMemoryOpCost, which is conceptually similar but doesn't have an intrinsic at the moment. (Although I think it might be worth adding one.)
It is tempting to reuse getIntrinsicInstrCost by stuffing the memory extras into IntrinsicCostAttributes, but I have only come up with two imperfect ways to do it, and so far I am not satisfied with either of them.
When I added the vp.load.ff cost hooks (#160470), I noticed vp_load, vp_load_ff, vp_gather, masked_load, and masked_expandload share almost the same interface.
OK, sounds good. I'm not against this, I was just thinking of things like what the stride was in a StridedMemoryOp (constants / small values could be cheaper on some hypothetical architecture). GatherScatters could do with better cost modelling, including possibly whether the addresses are base + vector offset or vector-base + constant offset, etc. This combined interface might make passing that information harder.
I agree that a constant stride helps the cost model.
This way just resembles getIntrinsicInstrCost, and how to fold MemAttr(...) into IntrinsicCostAttributes is an open question, as noted in the reply to @lukel97 above.
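Illustrative only, not part of this patch: with the IR call in hand, a cost model could at least distinguish compile-time-constant strides, which the discussion above suggests may be cheaper on some targets. For llvm.experimental.vp.strided.load (ptr, stride, mask, evl), the stride is the second operand:

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/IntrinsicInst.h"
#include <cassert>
using namespace llvm;

// Returns true when the strided load's stride is a compile-time constant.
static bool hasConstantStride(const IntrinsicInst &II) {
  assert(II.getIntrinsicID() == Intrinsic::experimental_vp_strided_load &&
         "expected llvm.experimental.vp.strided.load");
  return isa<ConstantInt>(II.getArgOperand(1));
}
```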
Added MemIntrinsicCostAttributes for flexible mem‑intrinsic costing.
fhahn left a comment
Another advantage of a single entry point is that it should hopefully be easier for people to discover when new cases are queried/supported.
- Split from #165532. This is a step toward a unified interface for masked/gather-scatter/strided/expand-compress cost modeling.
- Replace the ad-hoc parameter list with a single attributes object.

API change:

```
- InstructionCost getMaskedMemoryOpCost(Opcode, Src, Alignment,
-                                       AddressSpace, CostKind);
+ InstructionCost getMaskedMemoryOpCost(MemIntrinsicCostAttributes,
+                                       CostKind);
```

Notes:

- NFCI intended: callers populate MemIntrinsicCostAttributes with the same information as before.
- Follow-up: migrate gather/scatter, strided, and expand/compress cost queries to the same attributes-based entry point.
Force-pushed from dd1f30b to 9b50fbf
Force-push after rebase.
Force-pushed from 9b50fbf to bcfe3dd
…168029)

- Split from llvm#165532. This is a step toward a unified interface for masked/gather-scatter/strided/expand-compress cost modeling.
- Replace the ad-hoc parameter list with a single attributes object.

API change:

```
- InstructionCost getMaskedMemoryOpCost(Opcode, Src, Alignment,
-                                       AddressSpace, CostKind);
+ InstructionCost getMaskedMemoryOpCost(MemIntrinsicCostAttributes,
+                                       CostKind);
```

Notes:

- NFCI intended: callers populate MemIntrinsicCostAttributes with the same information as before.
- Follow-up: migrate gather/scatter, strided, and expand/compress cost queries to the same attributes-based entry point.
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/95/builds/19251. Here is the relevant piece of the build log for reference.
…s costing (NFCI) (llvm#165532)

In llvm#160470, there was a discussion about exploring a general approach for handling memory intrinsics.

API changes:

- Remove getMaskedMemoryOpCost, getGatherScatterOpCost, getExpandCompressMemoryOpCost, and getStridedMemoryOpCost from Analysis/TargetTransformInfo.
- Add getMemIntrinsicInstrCost.

In BasicTTIImpl, map intrinsic IDs to the existing target implementations until the legacy TTI hooks are retired:

- masked_load/store → getMaskedMemoryOpCost
- masked_/vp_gather/scatter → getGatherScatterOpCost
- masked_expandload/compressstore → getExpandCompressMemoryOpCost
- experimental_vp_strided_{load,store} → getStridedMemoryOpCost

TODO: add support for vp_load_ff.

No functional change intended; costs continue to route to the same target-specific hooks.
…om TTIImpl (#169885) Following #165532, this patch moves scalarization‑cost computation into BaseT::getMemIntrinsicCost and lets backends override it via their getMemIntrinsicCost. It also removes the masked/gather‑scatter/strided/expand‑compress costing interfaces from TTIImpl. Targets may keep them locally if needed. Stacked on #170426 and #170436.
…om TTIImpl (llvm#169885) Following llvm#165532, this patch moves scalarization‑cost computation into BaseT::getMemIntrinsicCost and lets backends override it via their getMemIntrinsicCost. It also removes the masked/gather‑scatter/strided/expand‑compress costing interfaces from TTIImpl. Targets may keep them locally if needed. Stacked on llvm#170426 and llvm#170436.
In #160470, there was a discussion about exploring a general approach for handling memory intrinsics.

API changes:

- Remove getMaskedMemoryOpCost, getGatherScatterOpCost, getExpandCompressMemoryOpCost, and getStridedMemoryOpCost from Analysis/TargetTransformInfo.
- Add getMemIntrinsicInstrCost.

In BasicTTIImpl, map intrinsic IDs to the existing target implementations until the legacy TTI hooks are retired:

- masked_load/store → getMaskedMemoryOpCost
- masked_/vp_gather/scatter → getGatherScatterOpCost
- masked_expandload/compressstore → getExpandCompressMemoryOpCost
- experimental_vp_strided_{load,store} → getStridedMemoryOpCost

TODO: add support for vp_load_ff.

No functional change intended; costs continue to route to the same target-specific hooks.
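As an orientation aid, a hedged sketch of the routing described above; this is not the literal patch code, and the hook argument lists are elided because the exact attribute accessors are not shown in this thread:

```cpp
// Route a memory intrinsic cost query to the pre-existing per-kind TTI hook,
// following the mapping listed above. IID is the intrinsic ID of the query.
switch (IID) {
case Intrinsic::masked_load:
case Intrinsic::masked_store:
  // -> getMaskedMemoryOpCost(...)
  break;
case Intrinsic::masked_gather:
case Intrinsic::masked_scatter:
case Intrinsic::vp_gather:
case Intrinsic::vp_scatter:
  // -> getGatherScatterOpCost(...)
  break;
case Intrinsic::masked_expandload:
case Intrinsic::masked_compressstore:
  // -> getExpandCompressMemoryOpCost(...)
  break;
case Intrinsic::experimental_vp_strided_load:
case Intrinsic::experimental_vp_strided_store:
  // -> getStridedMemoryOpCost(...)
  break;
default:
  break; // not a memory intrinsic handled by this path
}
```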