[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. #76172

alexey-bataev · 2023-12-21T18:12:07Z

This patch introduces generating VP intrinsics in the Loop Vectorizer.

Currently the Loop Vectorizer supports vector predication in a very limited capacity via tail-folding and masked load/store/gather/scatter intrinsics. However, this does not let architectures with active vector length predication support take advantage of their capabilities. Architectures with general masked predication support also can only take advantage of predication on memory operations. Having the Loop Vectorizer generate Vector Predication (VP) intrinsics, which (will) provide a target-independent way to model predicated vector instructions, lets these architectures make better use of their predication capabilities.

Our first approach (implemented in this patch) builds on top of the existing tail-folding mechanism in the LV by adding a new tail-folding mode that uses EVL. Instead of generating masked intrinsics for memory operations, it generates VP intrinsics for load/store instructions. The patch adds a new VPlan transform that replaces the wide header predicate compare with EVL and updates codegen for loads/stores to use VP load/store with EVL.

Another important part of this approach is how the Explicit Vector Length (EVL) is computed; EVL is the name VP intrinsics give to their vector length parameter. We use the experimental intrinsic get_vector_length, which can be lowered to architecture-specific instruction(s) to compute EVL.

The patch also adds a new recipe to emit instructions for computing EVL. Using VPlan in this way will eventually help build and compare VPlans corresponding to different strategies and alternatives.
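
For readers unfamiliar with EVL-based tail folding, the loop shape described above can be modelled in plain scalar C++. This is only an illustrative sketch, not code from the patch: `computeEVL` and `addOneEVL` are hypothetical names, and `computeEVL` assumes the target returns min(remaining, VF), which is one result get_vector_length is allowed to produce.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the value get_vector_length would return:
// the number of lanes to process in this iteration, never more than VF.
static std::size_t computeEVL(std::size_t Remaining, std::size_t VF) {
  return std::min(Remaining, VF);
}

// Scalar model of "A[i] = B[i] + 1" with the tail folded via EVL.
// Returns the number of vector iterations executed.
static std::size_t addOneEVL(const std::vector<int> &B, std::vector<int> &A,
                             std::size_t VF) {
  assert(VF > 0 && A.size() == B.size());
  const std::size_t TripCount = B.size();
  std::size_t Iterations = 0;
  // Index plays the role of the EVL-based IV: it advances by EVL, not by VF.
  for (std::size_t Index = 0; Index < TripCount; ++Iterations) {
    const std::size_t EVL = computeEVL(TripCount - Index, VF);
    // Models vp.load / vp.store: only the first EVL lanes are touched.
    for (std::size_t Lane = 0; Lane < EVL; ++Lane)
      A[Index + Lane] = B[Index + Lane] + 1;
    Index += EVL;
  }
  return Iterations;
}
```

With 10 elements and VF = 4 the model runs three iterations with EVL values 4, 4 and 2, so no scalar epilogue and no lane mask are needed for the tail.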

Differential Revision: https://reviews.llvm.org/D99750

llvmbot · 2023-12-21T18:12:36Z

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Alexey Bataev (alexey-bataev)
Co-Authored-By: Vineet Kumar (vntkmr)
Co-Authored-By: Roger Ferrer Ibáñez (rofirrim)
Co-Authored-By: Simon Moll (simoll)

Changes

This patch introduces generating VP intrinsics in the Loop Vectorizer.

Currently the Loop Vectorizer supports vector predication in a very limited capacity via tail-folding and masked load/store/gather/scatter intrinsics. However, this does not let architectures with active vector length predication support take advantage of their capabilities. Architectures with general masked predication support also can only take advantage of predication on memory operations. Having the Loop Vectorizer generate Vector Predication (VP) intrinsics, which (will) provide a target-independent way to model predicated vector instructions, lets these architectures make better use of their predication capabilities.

Our first approach (implemented in this patch) builds on top of the existing tail-folding mechanism in the LV, but instead of generating masked intrinsics for memory operations it generates VP intrinsics for load/store instructions.

Another important part of this approach is how the Explicit Vector Length is computed. (We use active vector length and explicit vector length interchangeably; VP intrinsics define this vector length parameter as Explicit Vector Length (EVL)). We consider the following three ways to compute the EVL parameter for the VP intrinsics.

1. The simplest way is to use the VF as EVL and rely solely on the mask parameter to control predication. The mask parameter is the same as the one computed for the current tail-folding implementation.
2. The second way is to insert instructions to compute min(VF, trip_count - index) for each vector iteration.
3. For architectures like RISC-V, which have a special instruction to compute/set an explicit vector length, we also introduce an experimental intrinsic get_vector_length, which can be lowered to architecture-specific instruction(s) to compute EVL.
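
The first two ways can be compared with a small scalar model. This is a hedged sketch rather than anything from the patch: `Predication`, `maskOnly`, `minEVL` and `activeLanes` are hypothetical names, and the third way is not modelled because its result is chosen by the target.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Per-iteration predication state passed to a VP intrinsic (model only).
struct Predication {
  std::vector<bool> Mask; // one entry per lane, VF entries in total
  std::uint64_t EVL;      // lanes at positions >= EVL are inactive
};

// Way 1: EVL is simply VF; the tail is handled by the mask alone.
static Predication maskOnly(std::uint64_t VF, std::uint64_t TripCount,
                            std::uint64_t Index) {
  assert(Index < TripCount && "no work left for this iteration");
  Predication P{std::vector<bool>(VF, false), VF};
  for (std::uint64_t Lane = 0; Lane < VF; ++Lane)
    P.Mask[Lane] = Index + Lane < TripCount;
  return P;
}

// Way 2: all-true mask; EVL = min(VF, trip_count - index).
static Predication minEVL(std::uint64_t VF, std::uint64_t TripCount,
                          std::uint64_t Index) {
  assert(Index < TripCount && "no work left for this iteration");
  return Predication{std::vector<bool>(VF, true),
                     std::min(VF, TripCount - Index)};
}

// A lane does work only if it is below EVL and its mask bit is set.
static std::uint64_t activeLanes(const Predication &P) {
  std::uint64_t Count = 0;
  for (std::uint64_t Lane = 0; Lane < P.Mask.size(); ++Lane)
    if (Lane < P.EVL && P.Mask[Lane])
      ++Count;
  return Count;
}
```

Both ways enable the same lanes in every iteration; they differ only in whether the tail is expressed through the mask or through EVL, which matters for targets where one of the two is cheaper in hardware.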

The patch also adds a new recipe to emit instructions for computing EVL. Using VPlan in this way will eventually help build and compare VPlans corresponding to different strategies and alternatives.

===Tentative Development Roadmap===

Use vp-intrinsics for all possible vector operations. That work has 2 possible implementations:
1. Introduce a new pass which transforms emitted vector instructions to vp intrinsics if the loop was transformed to use predication for loads/stores. The advantage of this approach is that it does not require many changes in the loop vectorizer itself. The disadvantage is that it may require copying some existing functionality from the loop vectorizer in a separate patch, keeping similar code in different passes, and performing the same analysis at least twice.
2. Extend the Loop Vectorizer using VectorBuilder and make it emit vp intrinsics automatically in the presence of an EVL value. The advantage is that it does not require a separate pass, so it may reduce compile time, and it avoids code duplication. It requires some extra work in the LoopVectorizer to add VectorBuilder support and smart emission of vector instructions/vp intrinsics. Also, to fully support the Loop Vectorizer it will require adding a new PHI recipe to handle EVL on the previous iteration, plus extending several existing recipes with the new operands (depends on the design).
Switch to vp-intrinsics for memory operations for VLS and VLA vectorizations.

Differential Revision: https://reviews.llvm.org/D99750

Patch is 101.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/76172.diff

24 Files Affected:

(modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+4-1)
(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+4)
(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h (+16)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+151-8)
(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+43)
(modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+8-8)
(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+66)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+98-13)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+7)
(modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+1)
(modified) llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp (+51)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll (+65-1)
(added) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll (+142)
(added) llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics.ll (+125)
(added) llvm/test/Transforms/LoopVectorize/X86/vectorize-vp-intrinsics.ll (+127)
(added) llvm/test/Transforms/LoopVectorize/X86/vplan-vp-intrinsics.ll (+83)
(added) llvm/test/Transforms/LoopVectorize/vectorize-vp-intrinsics-gather-scatter.ll (+64)
(added) llvm/test/Transforms/LoopVectorize/vectorize-vp-intrinsics-interleave.ll (+169)
(added) llvm/test/Transforms/LoopVectorize/vectorize-vp-intrinsics-iv32.ll (+84)
(added) llvm/test/Transforms/LoopVectorize/vectorize-vp-intrinsics-masked-loadstore.ll (+81)
(added) llvm/test/Transforms/LoopVectorize/vectorize-vp-intrinsics-no-masking.ll (+46)
(added) llvm/test/Transforms/LoopVectorize/vectorize-vp-intrinsics-reverse-load-store.ll (+64)
(added) llvm/test/Transforms/LoopVectorize/vectorize-vp-intrinsics.ll (+97)
(added) llvm/test/Transforms/LoopVectorize/vplan-vp-intrinsics.ll (+36)

diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 735be3680aea0d..e2a127ff35be26 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -190,7 +190,10 @@ enum class TailFoldingStyle {
   /// Use predicate to control both data and control flow, but modify
   /// the trip count so that a runtime overflow check can be avoided
   /// and such that the scalar epilogue loop can always be removed.
-  DataAndControlFlowWithoutRuntimeCheck
+  DataAndControlFlowWithoutRuntimeCheck,
+  /// Use predicated EVL instructions for tail-folding.
+  /// Indicates that VP intrinsics should be used if tail-folding is enabled.
+  DataWithEVL,
 };
 
 struct TailFoldingInfo {
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 4614446b2150b7..1a9abaea811159 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -169,6 +169,10 @@ RISCVTTIImpl::getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx,
   return TTI::TCC_Free;
 }
 
+bool RISCVTTIImpl::hasActiveVectorLength(unsigned, Type *DataTy, Align) const {
+  return ST->hasVInstructions();
+}
+
 TargetTransformInfo::PopcntSupportKind
 RISCVTTIImpl::getPopcntSupport(unsigned TyWidth) {
   assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2");
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
index 96ecc771863e56..d2592be75000de 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
@@ -72,6 +72,22 @@ class RISCVTTIImpl : public BasicTTIImplBase<RISCVTTIImpl> {
                                       const APInt &Imm, Type *Ty,
                                       TTI::TargetCostKind CostKind);
 
+  /// \name Vector Predication Information
+  /// Whether the target supports the %evl parameter of VP intrinsic efficiently
+  /// in hardware, for the given opcode and type/alignment. (see LLVM Language
+  /// Reference - "Vector Predication Intrinsics",
+  /// https://llvm.org/docs/LangRef.html#vector-predication-intrinsics and
+  /// "IR-level VP intrinsics",
+  /// https://llvm.org/docs/Proposals/VectorPredication.html#ir-level-vp-intrinsics).
+  /// \param Opcode the opcode of the instruction checked for predicated version
+  /// support.
+  /// \param DataType the type of the instruction with the \p Opcode checked for
+  /// predicated version support.
+  /// \param Alignment the alignment for memory access operation checked for
+  /// predicated version support.
+  bool hasActiveVectorLength(unsigned Opcode, Type *DataType,
+                             Align Alignment) const;
+
   TargetTransformInfo::PopcntSupportKind getPopcntSupport(unsigned TyWidth);
 
   bool shouldExpandReduction(const IntrinsicInst *II) const;
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index f82e161fb846d1..7b0e268877ded3 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -123,6 +123,7 @@
 #include "llvm/IR/User.h"
 #include "llvm/IR/Value.h"
 #include "llvm/IR/ValueHandle.h"
+#include "llvm/IR/VectorBuilder.h"
 #include "llvm/IR/Verifier.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/CommandLine.h"
@@ -247,10 +248,12 @@ static cl::opt<TailFoldingStyle> ForceTailFoldingStyle(
         clEnumValN(TailFoldingStyle::DataAndControlFlow, "data-and-control",
                    "Create lane mask using active.lane.mask intrinsic, and use "
                    "it for both data and control flow"),
-        clEnumValN(
-            TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck,
-            "data-and-control-without-rt-check",
-            "Similar to data-and-control, but remove the runtime check")));
+        clEnumValN(TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck,
+                   "data-and-control-without-rt-check",
+                   "Similar to data-and-control, but remove the runtime check"),
+        clEnumValN(TailFoldingStyle::DataWithEVL, "data-with-evl",
+                   "Use predicated EVL instructions for tail folding if the "
+                   "target supports vector length predication")));
 
 static cl::opt<bool> MaximizeBandwidth(
     "vectorizer-maximize-bandwidth", cl::init(false), cl::Hidden,
@@ -1106,8 +1109,7 @@ void InnerLoopVectorizer::collectPoisonGeneratingRecipes(
       if (isa<VPWidenMemoryInstructionRecipe>(CurRec) ||
           isa<VPInterleaveRecipe>(CurRec) ||
           isa<VPScalarIVStepsRecipe>(CurRec) ||
-          isa<VPCanonicalIVPHIRecipe>(CurRec) ||
-          isa<VPActiveLaneMaskPHIRecipe>(CurRec))
+          isa<VPHeaderPHIRecipe>(CurRec))
         continue;
 
       // This recipe contributes to the address computation of a widen
@@ -1655,6 +1657,23 @@ class LoopVectorizationCostModel {
     return foldTailByMasking() || Legal->blockNeedsPredication(BB);
   }
 
+  /// Returns true if VP intrinsics with explicit vector length support should
+  /// be generated in the tail folded loop.
+  bool useVPIWithVPEVLVectorization() const {
+    return PreferEVL && !EnableVPlanNativePath &&
+           getTailFoldingStyle() == TailFoldingStyle::DataWithEVL &&
+           // FIXME: implement support for max safe dependency distance.
+           Legal->isSafeForAnyVectorWidth() &&
+           // FIXME: remove this once reductions are supported.
+           Legal->getReductionVars().empty() &&
+           // FIXME: remove this once vp_reverse is supported.
+           none_of(
+               WideningDecisions,
+               [](const std::pair<std::pair<Instruction *, ElementCount>,
+                                  std::pair<InstWidening, InstructionCost>>
+                      &Data) { return Data.second.first == CM_Widen_Reverse; });
+  }
+
   /// Returns true if the Phi is part of an inloop reduction.
   bool isInLoopReduction(PHINode *Phi) const {
     return InLoopReductions.contains(Phi);
@@ -1800,6 +1819,10 @@ class LoopVectorizationCostModel {
   /// All blocks of loop are to be masked to fold tail of scalar iterations.
   bool CanFoldTailByMasking = false;
 
+  /// Control whether to generate VP intrinsics with explicit-vector-length
+  /// support in vectorized code.
+  bool PreferEVL = false;
+
   /// A map holding scalar costs for different vectorization factors. The
   /// presence of a cost for an instruction in the mapping indicates that the
   /// instruction will be scalarized when vectorizing with the associated
@@ -4883,6 +4906,39 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
   // FIXME: look for a smaller MaxVF that does divide TC rather than masking.
   if (Legal->prepareToFoldTailByMasking()) {
     CanFoldTailByMasking = true;
+    if (getTailFoldingStyle() == TailFoldingStyle::None)
+      return MaxFactors;
+
+    if (UserIC > 1) {
+      LLVM_DEBUG(dbgs() << "LV: Preference for VP intrinsics indicated. Will "
+                           "not generate VP intrinsics since interleave count "
+                           "specified is greater than 1.\n");
+      return MaxFactors;
+    }
+
+    if (MaxFactors.ScalableVF.isVector()) {
+      assert(MaxFactors.ScalableVF.isScalable() &&
+             "Expected scalable vector factor.");
+      // FIXME: use actual opcode/data type for analysis here.
+      PreferEVL = getTailFoldingStyle() == TailFoldingStyle::DataWithEVL &&
+                  TTI.hasActiveVectorLength(0, nullptr, Align());
+#ifndef NDEBUG
+      if (getTailFoldingStyle() == TailFoldingStyle::DataWithEVL) {
+        if (PreferEVL)
+          dbgs() << "LV: Preference for VP intrinsics indicated. Will "
+                    "try to generate VP Intrinsics.\n";
+        else
+          dbgs() << "LV: Preference for VP intrinsics indicated. Will "
+                    "not try to generate VP Intrinsics since the target "
+                    "does not support vector length predication.\n";
+      }
+#endif // !NDEBUG
+
+      // Tail folded loop using VP intrinsics restricts the VF to be scalable.
+      if (PreferEVL)
+        MaxFactors.FixedVF = ElementCount::getFixed(1);
+    }
+
     return MaxFactors;
   }
 
@@ -5493,6 +5549,10 @@ LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
   if (!isScalarEpilogueAllowed())
     return 1;
 
+  // Do not interleave if EVL is preferred and no User IC is specified.
+  if (useVPIWithVPEVLVectorization())
+    return 1;
+
   // We used the distance for the interleave count.
   if (!Legal->isSafeForAnyVectorWidth())
     return 1;
@@ -8622,6 +8682,8 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
         VPlanTransforms::truncateToMinimalBitwidths(
             *Plan, CM.getMinimalBitwidths(), PSE.getSE()->getContext());
       VPlanTransforms::optimize(*Plan, *PSE.getSE());
+      if (CM.useVPIWithVPEVLVectorization())
+        VPlanTransforms::addExplicitVectorLength(*Plan);
       assert(VPlanVerifier::verifyPlanIsValid(*Plan) && "VPlan is invalid");
       VPlans.push_back(std::move(Plan));
     }
@@ -9454,6 +9516,52 @@ void VPReplicateRecipe::execute(VPTransformState &State) {
       State.ILV->scalarizeInstruction(UI, this, VPIteration(Part, Lane), State);
 }
 
+/// Creates either vp_store or vp_scatter intrinsics calls to represent
+/// predicated store/scatter.
+static Instruction *
+lowerStoreUsingVectorIntrinsics(IRBuilderBase &Builder, Value *Addr,
+                                Value *StoredVal, bool IsScatter, Value *Mask,
+                                Value *EVLPart, const Align &Alignment) {
+  CallInst *Call;
+  if (IsScatter) {
+    Call = Builder.CreateIntrinsic(Type::getVoidTy(EVLPart->getContext()),
+                                   Intrinsic::vp_scatter,
+                                   {StoredVal, Addr, Mask, EVLPart});
+  } else {
+    VectorBuilder VBuilder(Builder);
+    VBuilder.setEVL(EVLPart).setMask(Mask);
+    Call = cast<CallInst>(VBuilder.createVectorInstruction(
+        Instruction::Store, Type::getVoidTy(EVLPart->getContext()),
+        {StoredVal, Addr}));
+  }
+  Call->addParamAttr(
+      1, Attribute::getWithAlignment(Call->getContext(), Alignment));
+  return Call;
+}
+
+/// Creates either vp_load or vp_gather intrinsics calls to represent
+/// predicated load/gather.
+static Instruction *lowerLoadUsingVectorIntrinsics(IRBuilderBase &Builder,
+                                                   VectorType *DataTy,
+                                                   Value *Addr, bool IsGather,
+                                                   Value *Mask, Value *EVLPart,
+                                                   const Align &Alignment) {
+  CallInst *Call;
+  if (IsGather) {
+    Call = Builder.CreateIntrinsic(DataTy, Intrinsic::vp_gather,
+                                   {Addr, Mask, EVLPart}, nullptr,
+                                   "wide.masked.gather");
+  } else {
+    VectorBuilder VBuilder(Builder);
+    VBuilder.setEVL(EVLPart).setMask(Mask);
+    Call = cast<CallInst>(VBuilder.createVectorInstruction(
+        Instruction::Load, DataTy, Addr, "vp.op.load"));
+  }
+  Call->addParamAttr(
+      0, Attribute::getWithAlignment(Call->getContext(), Alignment));
+  return Call;
+}
+
 void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
   VPValue *StoredValue = isStore() ? getStoredValue() : nullptr;
 
@@ -9523,6 +9631,12 @@ void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
     return PartPtr;
   };
 
+  auto MaskValue = [&](unsigned Part) -> Value * {
+    if (isMaskRequired)
+      return BlockInMaskParts[Part];
+    return nullptr;
+  };
+
   // Handle Stores:
   if (SI) {
     State.setDebugLocFrom(SI->getDebugLoc());
@@ -9530,7 +9644,22 @@ void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
     for (unsigned Part = 0; Part < State.UF; ++Part) {
       Instruction *NewSI = nullptr;
       Value *StoredVal = State.get(StoredValue, Part);
-      if (CreateGatherScatter) {
+      if (State.EVL) {
+        Value *EVLPart = State.get(State.EVL, Part);
+        // If EVL is not nullptr, then EVL must be a valid value set during plan
+        // creation, possibly default value = whole vector register length. EVL
+        // is created only if TTI prefers predicated vectorization, thus if EVL
+        // is not nullptr it also implies preference for predicated
+        // vectorization.
+        // FIXME: Support reverse store after vp_reverse is added.
+        NewSI = lowerStoreUsingVectorIntrinsics(
+            Builder,
+            CreateGatherScatter
+                ? State.get(getAddr(), Part)
+                : CreateVecPtr(Part, State.get(getAddr(), VPIteration(0, 0))),
+            StoredVal, CreateGatherScatter, MaskValue(Part), EVLPart,
+            Alignment);
+      } else if (CreateGatherScatter) {
         Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;
         Value *VectorGep = State.get(getAddr(), Part);
         NewSI = Builder.CreateMaskedScatter(StoredVal, VectorGep, Alignment,
@@ -9561,7 +9690,21 @@ void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
   State.setDebugLocFrom(LI->getDebugLoc());
   for (unsigned Part = 0; Part < State.UF; ++Part) {
     Value *NewLI;
-    if (CreateGatherScatter) {
+    if (State.EVL) {
+      Value *EVLPart = State.get(State.EVL, Part);
+      // If EVL is not nullptr, then EVL must be a valid value set during plan
+      // creation, possibly default value = whole vector register length. EVL
+      // is created only if TTI prefers predicated vectorization, thus if EVL
+      // is not nullptr it also implies preference for predicated
+      // vectorization.
+      // FIXME: Support reverse loading after vp_reverse is added.
+      NewLI = lowerLoadUsingVectorIntrinsics(
+          Builder, DataTy,
+          CreateGatherScatter
+              ? State.get(getAddr(), Part)
+              : CreateVecPtr(Part, State.get(getAddr(), VPIteration(0, 0))),
+          CreateGatherScatter, MaskValue(Part), EVLPart, Alignment);
+    } else if (CreateGatherScatter) {
       Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;
       Value *VectorGep = State.get(getAddr(), Part);
       NewLI = Builder.CreateMaskedGather(DataTy, VectorGep, Alignment, MaskPart,
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 94cb7688981361..0ca668abbe60c7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -242,6 +242,12 @@ struct VPTransformState {
   ElementCount VF;
   unsigned UF;
 
+  /// If EVL is not nullptr, then EVL must be a valid value set during plan
+  /// creation, possibly a default value = whole vector register length. EVL is
+  /// created only if TTI prefers predicated vectorization, thus if EVL is
+  /// not nullptr it also implies preference for predicated vectorization.
+  VPValue *EVL = nullptr;
+
   /// Hold the indices to generate specific scalar instructions. Null indicates
   /// that all instances are to be generated, using either scalar or vector
   /// instructions.
@@ -1057,6 +1063,8 @@ class VPInstruction : public VPRecipeWithIRFlags, public VPValue {
     SLPLoad,
     SLPStore,
     ActiveLaneMask,
+    ExplicitVectorLength,
+    ExplicitVectorLengthIVIncrement,
     CalculateTripCountMinusVF,
     // Increment the canonical IV separately for each unrolled part.
     CanonicalIVIncrementForPart,
@@ -1165,6 +1173,8 @@ class VPInstruction : public VPRecipeWithIRFlags, public VPValue {
     default:
       return false;
     case VPInstruction::ActiveLaneMask:
+    case VPInstruction::ExplicitVectorLength:
+    case VPInstruction::ExplicitVectorLengthIVIncrement:
     case VPInstruction::CalculateTripCountMinusVF:
     case VPInstruction::CanonicalIVIncrementForPart:
     case VPInstruction::BranchOnCount:
@@ -2180,6 +2190,39 @@ class VPActiveLaneMaskPHIRecipe : public VPHeaderPHIRecipe {
 #endif
 };
 
+/// A recipe for generating the phi node for the current index of elements,
+/// adjusted in accordance with EVL value. It starts at StartIV value and gets
+/// incremented by EVL in each iteration of the vector loop.
+class VPEVLBasedIVPHIRecipe : public VPHeaderPHIRecipe {
+public:
+  VPEVLBasedIVPHIRecipe(VPValue *StartMask, DebugLoc DL)
+      : VPHeaderPHIRecipe(VPDef::VPEVLBasedIVPHISC, nullptr, StartMask, DL) {}
+
+  ~VPEVLBasedIVPHIRecipe() override = default;
+
+  VP_CLASSOF_IMPL(VPDef::VPEVLBasedIVPHISC)
+
+  static inline bool classof(const VPHeaderPHIRecipe *D) {
+    return D->getVPDefID() == VPDef::VPEVLBasedIVPHISC;
+  }
+
+  /// Generate phi for handling IV based on EVL over iterations correctly.
+  void execute(VPTransformState &State) override;
+
+  /// Returns true if the recipe only uses the first lane of operand \p Op.
+  bool onlyFirstLaneUsed(const VPValue *Op) const override {
+    assert(is_contained(operands(), Op) &&
+           "Op must be an operand of the recipe");
+    return true;
+  }
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+};
+
 /// A Recipe for widening the canonical induction variable of the vector loop.
 class VPWidenCanonicalIVRecipe : public VPRecipeBase, public VPValue {
 public:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 97a8a1803bbf5a..b8ed256d236a4b 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -207,14 +207,14 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
   Type *ResultTy =
       TypeSwitch<const VPRecipeBase *, Type *>(V->getDefiningRecipe())
           .Case<VPCanonicalIVPHIRecipe, VPFirstOrderRecurrencePHIRecipe,
-                VPReductionPHIRecipe, VPWidenPointerInductionRecipe>(
-              [this](const auto *R) {
-                // Handle header phi recipes, except VPWienIntOrFpInduction
-                // which needs special handling due it being possibly truncated.
-                // TODO: consider inferring/caching type of siblings, e.g.,
-                // backedge value, here and in cases below.
-                return inferScalarType(R->getStartValue());
-              })
+                VPReductionPHIRecipe, VPWidenPointerInductionRecipe,
+                VPEVLBasedIVPHIRecipe>([this](const auto *R) {
+            // Handle header phi recipes, except VPWienIntOrFpInduction
+            // which needs special handling due it being possibly truncated.
+            // TODO: consider inferring/caching type of siblings, e.g.,
+            // backedge value, here and in cases below.
+            return inferScalarType(R->getStartValue());
+          })
           .Case<VPWidenIntOrFpInductionRecipe, VPDerivedIVRecipe>(
               [](const auto *R) { return R->getScalarType(); })
           .Case<VPPredInstPHIRecipe, VPWidenPHIRecipe, VPScalarIVStepsRecipe,
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 02e400d590bed4..5e0344a14df5da 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -345,6 +345,44 @@ Value *VPInstruction::generateInstruction(VPTransformState &State,
     Value *Zero = ConstantInt::get(ScalarTC->getType(), 0);
     return Builder.CreateSelect(Cmp, Sub, Zero);
   }
+  case VPInstruction::ExplicitVectorLength: {
+    // Compute EVL
+    auto GetSetVL = [=](VPTransformState &State, Value *EVL) {
+      assert(EVL->getType()->isIntegerTy() &&
+             "Requested vector length should be an integer.");
+
+      // TODO: Add support for MaxSafeDist for correct loop emission.
+      Value *VFArg = State.Builder.getInt32(State.VF.getKnownMinValue());
+
+      Value *GVL = State.Builder.CreateIntrinsic(
+          State.Builder.getInt32Ty(), Intrinsic::experimental_get_vector_length,
+          {EVL, VFArg, State.Builder.getTrue()});
+      return GVL;
+    };
+    // TODO: Restructur...
[truncated]
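
The conditions under which the visible hunks above select the EVL path are spread over `computeMaxVF` and `useVPIWithVPEVLVectorization`. The following standalone model gathers them in one place. It is a simplified sketch under stated assumptions: `StyleModel`, `LoopFacts` and `wouldUseEVL` are hypothetical names, the enum is reduced to three values, and the `EnableVPlanNativePath` and tail-folding legality checks are assumed to have passed already.

```cpp
#include <cassert>

// Reduced, hypothetical stand-in for TailFoldingStyle.
enum class StyleModel { None, Masked, DataWithEVL };

// Facts the cost model consults in the hunks above (model only).
struct LoopFacts {
  bool TargetHasActiveVectorLength; // TTI.hasActiveVectorLength(...)
  bool ScalableVFAvailable;         // MaxFactors.ScalableVF.isVector()
  unsigned UserInterleaveCount;     // UserIC
  bool SafeForAnyVectorWidth;       // Legal->isSafeForAnyVectorWidth()
  bool HasReductions;               // !Legal->getReductionVars().empty()
  bool HasReverseAccess;            // any CM_Widen_Reverse decision
};

// Returns true if the modelled loop would be vectorized with VP intrinsics.
static bool wouldUseEVL(StyleModel Style, const LoopFacts &F) {
  if (Style != StyleModel::DataWithEVL)
    return false;
  // A user-specified interleave count above 1 disables the EVL path.
  if (F.UserInterleaveCount > 1)
    return false;
  // PreferEVL is only set when a scalable VF exists and the target
  // reports support for an active vector length.
  if (!F.ScalableVFAvailable || !F.TargetHasActiveVectorLength)
    return false;
  // Remaining FIXME restrictions from useVPIWithVPEVLVectorization.
  return F.SafeForAnyVectorWidth && !F.HasReductions && !F.HasReverseAccess;
}
```

Reading the checks this way makes the current limitations explicit: reductions, reverse accesses, bounded dependence distances and user-requested interleaving all fall back to the existing tail-folding styles.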

llvmbot · 2023-12-21T18:12:37Z

@llvm/pr-subscribers-backend-risc-v

diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 735be3680aea0d..e2a127ff35be26 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -190,7 +190,10 @@ enum class TailFoldingStyle {
   /// Use predicate to control both data and control flow, but modify
   /// the trip count so that a runtime overflow check can be avoided
   /// and such that the scalar epilogue loop can always be removed.
-  DataAndControlFlowWithoutRuntimeCheck
+  DataAndControlFlowWithoutRuntimeCheck,
+  /// Use predicated EVL instructions for tail-folding.
+  /// Indicates that VP intrinsics should be used if tail-folding is enabled.
+  DataWithEVL,
 };
 
 struct TailFoldingInfo {
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 4614446b2150b7..1a9abaea811159 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -169,6 +169,10 @@ RISCVTTIImpl::getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx,
   return TTI::TCC_Free;
 }
 
+bool RISCVTTIImpl::hasActiveVectorLength(unsigned, Type *DataTy, Align) const {
+  return ST->hasVInstructions();
+}
+
 TargetTransformInfo::PopcntSupportKind
 RISCVTTIImpl::getPopcntSupport(unsigned TyWidth) {
   assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2");
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
index 96ecc771863e56..d2592be75000de 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
@@ -72,6 +72,22 @@ class RISCVTTIImpl : public BasicTTIImplBase<RISCVTTIImpl> {
                                       const APInt &Imm, Type *Ty,
                                       TTI::TargetCostKind CostKind);
 
+  /// \name Vector Predication Information
+  /// Whether the target supports the %evl parameter of VP intrinsic efficiently
+  /// in hardware, for the given opcode and type/alignment. (see LLVM Language
+  /// Reference - "Vector Predication Intrinsics",
+  /// https://llvm.org/docs/LangRef.html#vector-predication-intrinsics and
+  /// "IR-level VP intrinsics",
+  /// https://llvm.org/docs/Proposals/VectorPredication.html#ir-level-vp-intrinsics).
+  /// \param Opcode the opcode of the instruction checked for predicated version
+  /// support.
+  /// \param DataType the type of the instruction with the \p Opcode checked for
+  /// prediction support.
+  /// \param Alignment the alignment for memory access operation checked for
+  /// predicated version support.
+  bool hasActiveVectorLength(unsigned Opcode, Type *DataType,
+                             Align Alignment) const;
+
   TargetTransformInfo::PopcntSupportKind getPopcntSupport(unsigned TyWidth);
 
   bool shouldExpandReduction(const IntrinsicInst *II) const;
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index f82e161fb846d1..7b0e268877ded3 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -123,6 +123,7 @@
 #include "llvm/IR/User.h"
 #include "llvm/IR/Value.h"
 #include "llvm/IR/ValueHandle.h"
+#include "llvm/IR/VectorBuilder.h"
 #include "llvm/IR/Verifier.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/CommandLine.h"
@@ -247,10 +248,12 @@ static cl::opt<TailFoldingStyle> ForceTailFoldingStyle(
         clEnumValN(TailFoldingStyle::DataAndControlFlow, "data-and-control",
                    "Create lane mask using active.lane.mask intrinsic, and use "
                    "it for both data and control flow"),
-        clEnumValN(
-            TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck,
-            "data-and-control-without-rt-check",
-            "Similar to data-and-control, but remove the runtime check")));
+        clEnumValN(TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck,
+                   "data-and-control-without-rt-check",
+                   "Similar to data-and-control, but remove the runtime check"),
+        clEnumValN(TailFoldingStyle::DataWithEVL, "data-with-evl",
+                   "Use predicated EVL instructions for tail folding if the "
+                   "target supports vector length predication")));
 
 static cl::opt<bool> MaximizeBandwidth(
     "vectorizer-maximize-bandwidth", cl::init(false), cl::Hidden,
@@ -1106,8 +1109,7 @@ void InnerLoopVectorizer::collectPoisonGeneratingRecipes(
       if (isa<VPWidenMemoryInstructionRecipe>(CurRec) ||
           isa<VPInterleaveRecipe>(CurRec) ||
           isa<VPScalarIVStepsRecipe>(CurRec) ||
-          isa<VPCanonicalIVPHIRecipe>(CurRec) ||
-          isa<VPActiveLaneMaskPHIRecipe>(CurRec))
+          isa<VPHeaderPHIRecipe>(CurRec))
         continue;
 
       // This recipe contributes to the address computation of a widen
@@ -1655,6 +1657,23 @@ class LoopVectorizationCostModel {
     return foldTailByMasking() || Legal->blockNeedsPredication(BB);
   }
 
+  /// Returns true if VP intrinsics with explicit vector length support should
+  /// be generated in the tail folded loop.
+  bool useVPIWithVPEVLVectorization() const {
+    return PreferEVL && !EnableVPlanNativePath &&
+           getTailFoldingStyle() == TailFoldingStyle::DataWithEVL &&
+           // FIXME: implement support for max safe dependency distance.
+           Legal->isSafeForAnyVectorWidth() &&
+           // FIXME: remove this once reductions are supported.
+           Legal->getReductionVars().empty() &&
+           // FIXME: remove this once vp_reverse is supported.
+           none_of(
+               WideningDecisions,
+               [](const std::pair<std::pair<Instruction *, ElementCount>,
+                                  std::pair<InstWidening, InstructionCost>>
+                      &Data) { return Data.second.first == CM_Widen_Reverse; });
+  }
+
   /// Returns true if the Phi is part of an inloop reduction.
   bool isInLoopReduction(PHINode *Phi) const {
     return InLoopReductions.contains(Phi);
@@ -1800,6 +1819,10 @@ class LoopVectorizationCostModel {
   /// All blocks of loop are to be masked to fold tail of scalar iterations.
   bool CanFoldTailByMasking = false;
 
+  /// Control whether to generate VP intrinsics with explicit-vector-length
+  /// support in vectorized code.
+  bool PreferEVL = false;
+
   /// A map holding scalar costs for different vectorization factors. The
   /// presence of a cost for an instruction in the mapping indicates that the
   /// instruction will be scalarized when vectorizing with the associated
@@ -4883,6 +4906,39 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
   // FIXME: look for a smaller MaxVF that does divide TC rather than masking.
   if (Legal->prepareToFoldTailByMasking()) {
     CanFoldTailByMasking = true;
+    if (getTailFoldingStyle() == TailFoldingStyle::None)
+      return MaxFactors;
+
+    if (UserIC > 1) {
+      LLVM_DEBUG(dbgs() << "LV: Preference for VP intrinsics indicated. Will "
+                           "not generate VP intrinsics since interleave count "
+                           "specified is greater than 1.\n");
+      return MaxFactors;
+    }
+
+    if (MaxFactors.ScalableVF.isVector()) {
+      assert(MaxFactors.ScalableVF.isScalable() &&
+             "Expected scalable vector factor.");
+      // FIXME: use actual opcode/data type for analysis here.
+      PreferEVL = getTailFoldingStyle() == TailFoldingStyle::DataWithEVL &&
+                  TTI.hasActiveVectorLength(0, nullptr, Align());
+#if !NDEBUG
+      if (getTailFoldingStyle() == TailFoldingStyle::DataWithEVL) {
+        if (PreferEVL)
+          dbgs() << "LV: Preference for VP intrinsics indicated. Will "
+                    "try to generate VP Intrinsics.\n";
+        else
+          dbgs() << "LV: Preference for VP intrinsics indicated. Will "
+                    "not try to generate VP Intrinsics since the target "
+                    "does not support vector length predication.\n";
+      }
+#endif // !NDEBUG
+
+      // Tail folded loop using VP intrinsics restricts the VF to be scalable.
+      if (PreferEVL)
+        MaxFactors.FixedVF = ElementCount::getFixed(1);
+    }
+
     return MaxFactors;
   }
 
@@ -5493,6 +5549,10 @@ LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
   if (!isScalarEpilogueAllowed())
     return 1;
 
+  // Do not interleave if EVL is preferred and no User IC is specified.
+  if (useVPIWithVPEVLVectorization())
+    return 1;
+
   // We used the distance for the interleave count.
   if (!Legal->isSafeForAnyVectorWidth())
     return 1;
@@ -8622,6 +8682,8 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
         VPlanTransforms::truncateToMinimalBitwidths(
             *Plan, CM.getMinimalBitwidths(), PSE.getSE()->getContext());
       VPlanTransforms::optimize(*Plan, *PSE.getSE());
+      if (CM.useVPIWithVPEVLVectorization())
+        VPlanTransforms::addExplicitVectorLength(*Plan);
       assert(VPlanVerifier::verifyPlanIsValid(*Plan) && "VPlan is invalid");
       VPlans.push_back(std::move(Plan));
     }
@@ -9454,6 +9516,52 @@ void VPReplicateRecipe::execute(VPTransformState &State) {
       State.ILV->scalarizeInstruction(UI, this, VPIteration(Part, Lane), State);
 }
 
+/// Creates either vp_store or vp_scatter intrinsics calls to represent
+/// predicated store/scatter.
+static Instruction *
+lowerStoreUsingVectorIntrinsics(IRBuilderBase &Builder, Value *Addr,
+                                Value *StoredVal, bool IsScatter, Value *Mask,
+                                Value *EVLPart, const Align &Alignment) {
+  CallInst *Call;
+  if (IsScatter) {
+    Call = Builder.CreateIntrinsic(Type::getVoidTy(EVLPart->getContext()),
+                                   Intrinsic::vp_scatter,
+                                   {StoredVal, Addr, Mask, EVLPart});
+  } else {
+    VectorBuilder VBuilder(Builder);
+    VBuilder.setEVL(EVLPart).setMask(Mask);
+    Call = cast<CallInst>(VBuilder.createVectorInstruction(
+        Instruction::Store, Type::getVoidTy(EVLPart->getContext()),
+        {StoredVal, Addr}));
+  }
+  Call->addParamAttr(
+      1, Attribute::getWithAlignment(Call->getContext(), Alignment));
+  return Call;
+}
+
+/// Creates either vp_load or vp_gather intrinsics calls to represent
+/// predicated load/gather.
+static Instruction *lowerLoadUsingVectorIntrinsics(IRBuilderBase &Builder,
+                                                   VectorType *DataTy,
+                                                   Value *Addr, bool IsGather,
+                                                   Value *Mask, Value *EVLPart,
+                                                   const Align &Alignment) {
+  CallInst *Call;
+  if (IsGather) {
+    Call = Builder.CreateIntrinsic(DataTy, Intrinsic::vp_gather,
+                                   {Addr, Mask, EVLPart}, nullptr,
+                                   "wide.masked.gather");
+  } else {
+    VectorBuilder VBuilder(Builder);
+    VBuilder.setEVL(EVLPart).setMask(Mask);
+    Call = cast<CallInst>(VBuilder.createVectorInstruction(
+        Instruction::Load, DataTy, Addr, "vp.op.load"));
+  }
+  Call->addParamAttr(
+      0, Attribute::getWithAlignment(Call->getContext(), Alignment));
+  return Call;
+}
+
 void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
   VPValue *StoredValue = isStore() ? getStoredValue() : nullptr;
 
@@ -9523,6 +9631,12 @@ void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
     return PartPtr;
   };
 
+  auto MaskValue = [&](unsigned Part) -> Value * {
+    if (isMaskRequired)
+      return BlockInMaskParts[Part];
+    return nullptr;
+  };
+
   // Handle Stores:
   if (SI) {
     State.setDebugLocFrom(SI->getDebugLoc());
@@ -9530,7 +9644,22 @@ void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
     for (unsigned Part = 0; Part < State.UF; ++Part) {
       Instruction *NewSI = nullptr;
       Value *StoredVal = State.get(StoredValue, Part);
-      if (CreateGatherScatter) {
+      if (State.EVL) {
+        Value *EVLPart = State.get(State.EVL, Part);
+        // If EVL is not nullptr, then EVL must be a valid value set during plan
+        // creation, possibly default value = whole vector register length. EVL
+        // is created only if TTI prefers predicated vectorization, thus if EVL
+        // is not nullptr it also implies preference for predicated
+        // vectorization.
+        // FIXME: Support reverse store after vp_reverse is added.
+        NewSI = lowerStoreUsingVectorIntrinsics(
+            Builder,
+            CreateGatherScatter
+                ? State.get(getAddr(), Part)
+                : CreateVecPtr(Part, State.get(getAddr(), VPIteration(0, 0))),
+            StoredVal, CreateGatherScatter, MaskValue(Part), EVLPart,
+            Alignment);
+      } else if (CreateGatherScatter) {
         Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;
         Value *VectorGep = State.get(getAddr(), Part);
         NewSI = Builder.CreateMaskedScatter(StoredVal, VectorGep, Alignment,
@@ -9561,7 +9690,21 @@ void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
   State.setDebugLocFrom(LI->getDebugLoc());
   for (unsigned Part = 0; Part < State.UF; ++Part) {
     Value *NewLI;
-    if (CreateGatherScatter) {
+    if (State.EVL) {
+      Value *EVLPart = State.get(State.EVL, Part);
+      // If EVL is not nullptr, then EVL must be a valid value set during plan
+      // creation, possibly default value = whole vector register length. EVL
+      // is created only if TTI prefers predicated vectorization, thus if EVL
+      // is not nullptr it also implies preference for predicated
+      // vectorization.
+      // FIXME: Support reverse loading after vp_reverse is added.
+      NewLI = lowerLoadUsingVectorIntrinsics(
+          Builder, DataTy,
+          CreateGatherScatter
+              ? State.get(getAddr(), Part)
+              : CreateVecPtr(Part, State.get(getAddr(), VPIteration(0, 0))),
+          CreateGatherScatter, MaskValue(Part), EVLPart, Alignment);
+    } else if (CreateGatherScatter) {
       Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;
       Value *VectorGep = State.get(getAddr(), Part);
       NewLI = Builder.CreateMaskedGather(DataTy, VectorGep, Alignment, MaskPart,
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 94cb7688981361..0ca668abbe60c7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -242,6 +242,12 @@ struct VPTransformState {
   ElementCount VF;
   unsigned UF;
 
+  /// If EVL is not nullptr, then EVL must be a valid value set during plan
+  /// creation, possibly a default value = whole vector register length. EVL is
+  /// created only if TTI prefers predicated vectorization, thus if EVL is
+  /// not nullptr it also implies preference for predicated vectorization.
+  VPValue *EVL = nullptr;
+
   /// Hold the indices to generate specific scalar instructions. Null indicates
   /// that all instances are to be generated, using either scalar or vector
   /// instructions.
@@ -1057,6 +1063,8 @@ class VPInstruction : public VPRecipeWithIRFlags, public VPValue {
     SLPLoad,
     SLPStore,
     ActiveLaneMask,
+    ExplicitVectorLength,
+    ExplicitVectorLengthIVIncrement,
     CalculateTripCountMinusVF,
     // Increment the canonical IV separately for each unrolled part.
     CanonicalIVIncrementForPart,
@@ -1165,6 +1173,8 @@ class VPInstruction : public VPRecipeWithIRFlags, public VPValue {
     default:
       return false;
     case VPInstruction::ActiveLaneMask:
+    case VPInstruction::ExplicitVectorLength:
+    case VPInstruction::ExplicitVectorLengthIVIncrement:
     case VPInstruction::CalculateTripCountMinusVF:
     case VPInstruction::CanonicalIVIncrementForPart:
     case VPInstruction::BranchOnCount:
@@ -2180,6 +2190,39 @@ class VPActiveLaneMaskPHIRecipe : public VPHeaderPHIRecipe {
 #endif
 };
 
+/// A recipe for generating the phi node for the current index of elements,
+/// adjusted in accordance with EVL value. It starts at StartIV value and gets
+/// incremented by EVL in each iteration of the vector loop.
+class VPEVLBasedIVPHIRecipe : public VPHeaderPHIRecipe {
+public:
+  VPEVLBasedIVPHIRecipe(VPValue *StartMask, DebugLoc DL)
+      : VPHeaderPHIRecipe(VPDef::VPEVLBasedIVPHISC, nullptr, StartMask, DL) {}
+
+  ~VPEVLBasedIVPHIRecipe() override = default;
+
+  VP_CLASSOF_IMPL(VPDef::VPEVLBasedIVPHISC)
+
+  static inline bool classof(const VPHeaderPHIRecipe *D) {
+    return D->getVPDefID() == VPDef::VPEVLBasedIVPHISC;
+  }
+
+  /// Generate phi for handling IV based on EVL over iterations correctly.
+  void execute(VPTransformState &State) override;
+
+  /// Returns true if the recipe only uses the first lane of operand \p Op.
+  bool onlyFirstLaneUsed(const VPValue *Op) const override {
+    assert(is_contained(operands(), Op) &&
+           "Op must be an operand of the recipe");
+    return true;
+  }
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+};
+
 /// A Recipe for widening the canonical induction variable of the vector loop.
 class VPWidenCanonicalIVRecipe : public VPRecipeBase, public VPValue {
 public:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 97a8a1803bbf5a..b8ed256d236a4b 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -207,14 +207,14 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
   Type *ResultTy =
       TypeSwitch<const VPRecipeBase *, Type *>(V->getDefiningRecipe())
           .Case<VPCanonicalIVPHIRecipe, VPFirstOrderRecurrencePHIRecipe,
-                VPReductionPHIRecipe, VPWidenPointerInductionRecipe>(
-              [this](const auto *R) {
-                // Handle header phi recipes, except VPWienIntOrFpInduction
-                // which needs special handling due it being possibly truncated.
-                // TODO: consider inferring/caching type of siblings, e.g.,
-                // backedge value, here and in cases below.
-                return inferScalarType(R->getStartValue());
-              })
+                VPReductionPHIRecipe, VPWidenPointerInductionRecipe,
+                VPEVLBasedIVPHIRecipe>([this](const auto *R) {
+            // Handle header phi recipes, except VPWienIntOrFpInduction
+            // which needs special handling due it being possibly truncated.
+            // TODO: consider inferring/caching type of siblings, e.g.,
+            // backedge value, here and in cases below.
+            return inferScalarType(R->getStartValue());
+          })
           .Case<VPWidenIntOrFpInductionRecipe, VPDerivedIVRecipe>(
               [](const auto *R) { return R->getScalarType(); })
           .Case<VPPredInstPHIRecipe, VPWidenPHIRecipe, VPScalarIVStepsRecipe,
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 02e400d590bed4..5e0344a14df5da 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -345,6 +345,44 @@ Value *VPInstruction::generateInstruction(VPTransformState &State,
     Value *Zero = ConstantInt::get(ScalarTC->getType(), 0);
     return Builder.CreateSelect(Cmp, Sub, Zero);
   }
+  case VPInstruction::ExplicitVectorLength: {
+    // Compute EVL
+    auto GetSetVL = [=](VPTransformState &State, Value *EVL) {
+      assert(EVL->getType()->isIntegerTy() &&
+             "Requested vector length should be an integer.");
+
+      // TODO: Add support for MaxSafeDist for correct loop emission.
+      Value *VFArg = State.Builder.getInt32(State.VF.getKnownMinValue());
+
+      Value *GVL = State.Builder.CreateIntrinsic(
+          State.Builder.getInt32Ty(), Intrinsic::experimental_get_vector_length,
+          {EVL, VFArg, State.Builder.getTrue()});
+      return GVL;
+    };
+    // TODO: Restructur...
[truncated]
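
For readers following the load/store lowering in the diff above, here is a rough scalar model of what a VP load or store does with its mask and EVL operands. It is only a sketch: `vpStoreModel`, `vpLoadModel` and the `Passthru` parameter are hypothetical names invented for illustration and are not part of the patch or of LLVM. The one rule it relies on comes from the LangRef description of VP intrinsics: a lane is active only if its index is below `%evl` and its mask bit is set. In real IR the disabled lanes of a `vp.load` result are poison; the model fills them with `Passthru` purely so the behaviour can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of a VP store (hypothetical helper, not an LLVM API).
// Lane I is written only when I < EVL and Mask[I] is set.
static void vpStoreModel(int32_t *Ptr, const std::vector<int32_t> &Val,
                         const std::vector<bool> &Mask, uint32_t EVL) {
  assert(Val.size() == Mask.size() && "one mask bit per lane");
  for (std::size_t I = 0; I < Val.size(); ++I)
    if (I < EVL && Mask[I])
      Ptr[I] = Val[I];
}

// Scalar model of a VP load (hypothetical helper, not an LLVM API).
// Disabled lanes are poison in LLVM IR; this sketch gives them Passthru
// instead so that the result is well defined.
static std::vector<int32_t> vpLoadModel(const int32_t *Ptr,
                                        const std::vector<bool> &Mask,
                                        uint32_t EVL, int32_t Passthru) {
  std::vector<int32_t> Result(Mask.size(), Passthru);
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (I < EVL && Mask[I])
      Result[I] = Ptr[I];
  return Result;
}
```

With an all-true mask the EVL alone decides how many leading lanes are touched, which is the case the new tail-folding style targets.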

github-actions · 2023-12-21T18:14:52Z

✅ With the latest revision this PR passed the C/C++ code formatter.

alexey-bataev · 2023-12-29T14:03:55Z

Ping!

fhahn · 2023-12-29T21:06:35Z

Thanks for moving to Github now that Phabricator has been taken down!

I think @ayalz added some comments shortly before Phabricator was deactivated; unfortunately https://reviews.llvm.org/D99750 isn't accessible at the moment it seems (and it also doesn't seem to be available at http://108.170.204.19/D99750 which is supposed to have a static mirror). I am not sure what's the best way to pick up the recent comments here, perhaps it would be best to share the latest responses here on GH now?

alexey-bataev · 2023-12-29T22:14:55Z

I addressed most of the @ayalz comments in this version

fhahn · 2024-01-02T13:43:55Z

I addressed most of the @ayalz comments in this version

Ok thanks!

It would be helpful to import the recent conversations here, including what has been addressed (and how) in the current iteration and whether anything is still left open. Unfortunately it looks like for some reason D99750 isn't included in the static archive of reviews.llvm.org; I posted https://discourse.llvm.org/t/some-reviews-on-reviews-llvm-org-seem-to-be-missing-from-the-static-archive/76001 to hopefully get back access to the context in Phabricator.

alexey-bataev · 2024-01-05T01:20:09Z

Rebase

fhahn · 2024-01-10T20:42:57Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

+    assert(EVL->getType()->getScalarSizeInBits() <=
+               Phi->getType()->getScalarSizeInBits() &&
+           "EVL type must be smaller than Phi type.");
+    EVL = Builder.CreateIntCast(EVL, Phi->getType(), /*isSigned=*/false);


Would it be possible to use the same type for all users without needing to cast here? Without the cast, would a simple Add VPInstruction suffice (as in a5891fa)?

I tried, but unfortunately it does not work. It would be good to have a Cast VPRecipe to implement this without adding a new instruction.
The type of the EVL (and of many of its users) is i32 (because of https://llvm.org/docs/LangRef.html#llvm-experimental-get-vector-length-intrinsic), so the cast is required.

Ah yes, I remember now again! There's now a recipe for vector casts, but not yet for scalar casts. Let me check if there are other places that would benefit from such a recipe.

Looks like a general recipe for scalar casts would also be helpful in other cases (e.g. truncation of induction steps); I shared a patch for discussion: #78113

#78113 landed so it should be possible now to use Add for the increment. Does that work now?

Still does not work:
opt: lib/Transforms/Vectorize/VPlan.cpp:290: llvm::Value *llvm::VPTransformState::get(llvm::VPValue *, unsigned int): Assertion `(isa(Def->getDefiningRecipe()) || isa(Def->getDefiningRecipe()) || isa(Def->getDefiningRecipe())) && "unexpected recipe found to be invariant"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.

alexey-bataev · 2024-01-22T18:32:58Z

The update regarding AVL/EVL. I missed one point here, when we discussed it before.

AVL refers to the input parameter of the llvm.experimental.get.vector.length intrinsic.
EVL is the result returned by this intrinsic.

So these two things are separate. For this reason, the corresponding parameter of the VP-based intrinsics in the LLVM IR reference manual (https://llvm.org/docs/LangRef.html) is named %evl, and we use EVL here.
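
To make the AVL/EVL distinction concrete, here is a small strip-mining sketch. It is only a model under stated assumptions: `getVectorLength` and `stripMine` are hypothetical helpers, and the model always returns `min(AVL, MaxLanes)`, which is a simplification (the real intrinsic only promises a result that the target can process and that does not exceed the requested count). `MaxLanes` stands in for the lane capacity implied by the VF and vscale.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for llvm.experimental.get.vector.length.
// Input: AVL, the number of elements still to be processed.
// Output: EVL, the number of elements handled in this vector iteration.
// Simplifying assumption: the result is always min(AVL, MaxLanes).
static uint32_t getVectorLength(uint64_t AVL, uint32_t MaxLanes) {
  assert(MaxLanes > 0 && "lane capacity must be non-zero");
  return static_cast<uint32_t>(std::min<uint64_t>(AVL, MaxLanes));
}

// Strip-mine a loop of TripCount scalar iterations and record the EVL
// chosen in each vector iteration.
static std::vector<uint32_t> stripMine(uint64_t TripCount, uint32_t MaxLanes) {
  std::vector<uint32_t> EVLs;
  uint64_t IV = 0; // EVL-based induction variable, wider than the i32 EVL.
  while (IV < TripCount) {
    uint64_t AVL = TripCount - IV;                 // what is left to do
    uint32_t EVL = getVectorLength(AVL, MaxLanes); // what we do this time
    EVLs.push_back(EVL);
    IV += EVL; // the 32-bit EVL is zero-extended before the add
  }
  return EVLs;
}
```

In this model a trip count of 10 with 4 lanes yields EVLs of 4, 4 and 2, so the last iteration handles the tail without a scalar epilogue.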

alexey-bataev · 2024-01-22T18:34:09Z

Rebase + merged the check lines in the tests

fhahn

The review on Phabricator is now available on the static archive: https://reviews.llvm.org/D99750

Went through @ayalz's latest comments and shared the one that still seems pending/open. There was quite a lot to go over, so I might have missed some comments.

Also added some more comments inline. In terms of further refactoring for this patch, it would be good to remove the dedicated EVLIncrement opcode now that #78113 landed, if possible.

The other larger pending suggestion is related to EVL handling in the recipes; I suggest adding a TODO and addressing this as a follow-up, unless @ayalz prefers doing the refactoring first. I am planning to split up/refactor the memory recipe soon now that address computation is already moved out.

fhahn · 2024-01-26T15:42:13Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

+    assert(EVL->getType()->getScalarSizeInBits() <=
+               Phi->getType()->getScalarSizeInBits() &&
+           "EVL type must be smaller than Phi type.");
+    EVL = Builder.CreateIntCast(EVL, Phi->getType(), /*isSigned=*/false);


#78113 landed so it should be possible now to use Add for the increment. Does that work now?

alexey-bataev · 2024-01-26T17:50:19Z

Instruction::Add still does not work; it crashes the compiler because this VPInstruction returns that it "does not use only first lane".

fhahn

Instruction::Add still does not work, crashes the compiler because this VPInstruction returns that it "does not use only first lane".

Can you push that commit somewhere so I can have a look? Looks like only-first-lane-used analysis might need some additional info.

alexey-bataev · 2024-01-26T21:53:10Z

Instruction::Add still does not work, crashes the compiler because this VPInstruction returns that it "does not use only first lane".

Can you push that commit somewhere so I can have a look? Looks like only-first-lane-used analysis might need some additional info.

Just replace VPInstruction::ExplicitVectorLengthIVIncrement with Instruction::Add in lib/Transforms/Vectorize/VPlanTransforms.cpp, line 1273

alexey-bataev · 2024-01-29T19:00:39Z

Oh right, surprised it is already used by PPC. EVL LV won't work on PPC due to them not using scalable vectors? At least I cannot find a test that uses vscale. But would it make sense to support EVL on PPC, if it supports active-vector-length?

Currently it won't work for PPC, since it has some specific checks in TTI. We (or PPC developers) can enable it later.

appujee · 2024-04-04T18:38:55Z

Thanks for approving it!

Introduce new subclasses of VPWidenMemoryRecipe for VP (vector-predicated) loads and stores to address multiple TODOs from llvm#76172.

Note that the introduction of the new recipes also improves code-gen for VP gather/scatters by removing the redundant header mask.

With the new approach, it is not sufficient to look at users of the widened canonical IV to find all uses of the header mask. In some cases, a widened IV is used instead of separately widening the canonical IV. To handle those cases, iterate over all recipes in the vector loop region to make sure all widened memory recipes are processed.

Depends on llvm#87411.

vntkmr · 2024-04-08T22:46:21Z

@alexey-bataev I just noticed that this patch is merged, good work!
As you might remember, I was the original author of this patch back in 2021 while at BSC, before you took over.

I noticed that because of the move from Phabricator to GitHub, some of the history of this patch is now lost: I can't find a way to access the initial commits and discussions on the patch from 2021.
I would appreciate it if you could add an acknowledgement for the initial work on this patch.

CC: @rofirrim @simoll

Vineet Kumar

alexey-bataev · 2024-04-08T22:49:13Z

Hi, sure! Sorry, forgot about that :(

alexey-bataev · 2024-04-08T23:06:58Z

Added co-authors to the commit

vntkmr · 2024-04-08T23:07:48Z

Thank you!

For those interested in the older discussion on this patch, it is preserved on the Internet Archive at https://web.archive.org/web/20230128111909/https://reviews.llvm.org/D99750.
The diff for the initial patch is available in the Phabricator archive at https://reviews.llvm.org/D99750?vs=334753&id=353243.
(Unfortunately, the "Show Older Changes" link on the archived Phabricator page no longer works.)

appujee · 2024-04-09T03:25:50Z

Hi Vineet,
Thanks for your work!

What you can do is create a PR (on top of the correct baseline) with this change and then close it. That way the history is on GitHub as well. It might take a bit, but I feel this would help others keep the historical context.

Introduce new subclasses of VPWidenMemoryRecipe for VP (vector-predicated) loads and stores to address multiple TODOs from llvm#76172. Note that the introduction of the new recipes also improves code-gen for VP gather/scatters by removing the redundant header mask. With the new approach, it is not sufficient to look at users of the widened canonical IV to find all uses of the header mask. In some cases, a widened IV is used instead of separately widening the canonical IV. To handle those cases, iterate over all recipes in the vector loop region to make sure all widened memory recipes are processed. Depends on llvm#87411.

Introduce new subclasses of VPWidenMemoryRecipe for VP (vector-predicated) loads and stores to address multiple TODOs from #76172. Note that the introduction of the new recipes also improves code-gen for VP gather/scatters by removing the redundant header mask. With the new approach, it is not sufficient to look at users of the widened canonical IV to find all uses of the header mask. In some cases, a widened IV is used instead of separately widening the canonical IV. To handle that, first collect all VPValues representing header masks (by looking at users of both the canonical IV and widened inductions that are canonical) and then check all users (recursively) of those header masks. Depends on #87411. PR: #87816

appujee · 2024-04-22T18:30:54Z

Posted on https://lists.riscv.org/g/sig-toolchains/message/678 to notify interested parties.

fhahn · 2024-04-26T09:45:32Z

Are there any plans to add upstream runtime testing for EVL vectorization to guard against regressions?

We really should have upstream end-to-end testing that enables the EVL vectorization path and does a stage2 build + llvm-test-suite run to catch regressions (similar to how SVE-enabled bots were added when scalable vector support was added, IIRC).

cc'ing some additional people who might also be able to help @appujee @Mel-Chen @nikolaypanchenko @arcbbb @preames

nikolaypanchenko · 2024-04-26T13:36:16Z

We don't have them yet, but stability testing should certainly be done as early as possible. We will work on a plan for it!

appujee · 2024-04-26T16:30:02Z

+1. Let me know if I can help with anything here.

asb · 2024-04-27T07:21:33Z

I'm currently working through a project to spin up a range of RISC-V builders. Part of that will involve deciding what configurations to test given the resources we have. It sounds like this could be an interesting config to add to the list. To clarify, is the suggestion basically a build with -mllvm -force-tail-folding-style=data-with-evl?

alexey-bataev · 2024-04-27T10:46:15Z

Hi Alex, yes, at least one builder should enable this option. I think we need to test both configs for now, with and without this option.
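
As a rough illustration of what such a runtime test could exercise, here is a minimal self-checking C++ kernel. It is only a sketch: the names `saxpy_like` and `check_all_trip_counts` are made up for this example, and it assumes the file would be built on the builder with the option named above (`-mllvm -force-tail-folding-style=data-with-evl`) plus whatever target flags enable vector code there. Sweeping every trip count up to a bound means tails of every length get folded, and a guard element catches stores past the end.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Candidate loop for the vectorizer: a plain load/multiply-add/store with no
// loop-carried dependence. The name saxpy_like is illustrative only.
void saxpy_like(std::int32_t *out, const std::int32_t *a,
                const std::int32_t *b, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = 3 * a[i] + b[i];
}

// Runs the kernel for every trip count in [0, max_n] so that remainders of
// every size relative to the (unknown) vector length are exercised.
bool check_all_trip_counts(std::size_t max_n) {
  assert(max_n < 100000 && "keep values small enough to avoid overflow");
  for (std::size_t n = 0; n <= max_n; ++n) {
    // out has one extra guard element initialised to -1.
    std::vector<std::int32_t> a(n), b(n), out(n + 1, -1);
    for (std::size_t i = 0; i < n; ++i) {
      a[i] = static_cast<std::int32_t>(i);
      b[i] = static_cast<std::int32_t>(2 * i + 1);
    }
    saxpy_like(out.data(), a.data(), b.data(), n);
    // Expected value: 3 * i + (2 * i + 1) == 5 * i + 1.
    for (std::size_t i = 0; i < n; ++i)
      if (out[i] != static_cast<std::int32_t>(5 * i + 1))
        return false;
    // The guard must be untouched: an incorrectly folded tail would
    // typically store past element n - 1.
    if (out[n] != -1)
      return false;
  }
  return true;
}
```

A driver would call `check_all_trip_counts(70)` from `main` and fail the test when it returns false. Building the same file with and without the option covers both configurations.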

topperc · 2024-05-08T16:02:47Z

data-with-evl

Do we also need -prefer-predicate-over-epilogue=predicate-dont-vectorize?

Selects the tail-folding style while choosing the max vector factor and stores it in a data member, rather than recalculating it on each getTailFoldingStyle call. Part of llvm#76172.

Reviewers: ayalz, fhahn
Reviewed By: fhahn
Pull Request: llvm#81885

[LV][EVL] Support in-loop reduction using tail folding with EVL. (#90184) Following from #87816, add VPReductionEVLRecipe to describe vector predication reduction. Address one of TODOs from #76172.
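
To make the idea of an EVL-predicated in-loop reduction more concrete, the following scalar C++ model may help. It is a sketch under stated assumptions, not the code the recipe emits: `VF`, `vp_reduce_add_model` and `sum_with_evl` are hypothetical names, and the EVL is modelled as `min(VF, remaining)`, which is only one behaviour a target may choose when computing the vector length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed vectorization factor for this model (hypothetical value).
constexpr std::size_t VF = 4;

// Models one vector-predicated add reduction: only the first `evl` lanes
// contribute to the result; lanes at or beyond evl are never read.
std::int64_t vp_reduce_add_model(std::int64_t start, const std::int32_t *lanes,
                                 std::size_t evl) {
  assert(evl <= VF && "EVL may not exceed the vectorization factor");
  std::int64_t acc = start;
  for (std::size_t l = 0; l < evl; ++l)
    acc += lanes[l];
  return acc;
}

// In-loop reduction with EVL tail folding: there is no scalar epilogue,
// the final iteration simply runs with a shorter EVL.
std::int64_t sum_with_evl(const std::vector<std::int32_t> &data) {
  std::int64_t acc = 0;
  std::size_t i = 0;
  while (i < data.size()) {
    // Assumption: EVL is the smaller of VF and the remaining element count.
    std::size_t evl = std::min(VF, data.size() - i);
    acc = vp_reduce_add_model(acc, data.data() + i, evl);
    i += evl; // the induction advances by EVL, not by VF
  }
  return acc;
}
```

With `VF` set to 4, a seven-element input is processed as one full step of four lanes followed by one step with an EVL of three, and the accumulator is carried between the two steps.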

alexey-bataev requested a review from fhahn December 21, 2023 18:12

llvmbot added backend:RISC-V vectorizers llvm:analysis llvm:transforms labels Dec 21, 2023

alexey-bataev requested a review from ayalz December 21, 2023 18:12

alexey-bataev force-pushed the arcpatch-D99750 branch from 9f0b36c to 3071632 Compare December 21, 2023 18:24

alexey-bataev force-pushed the arcpatch-D99750 branch from 3071632 to 8bb19c6 Compare January 5, 2024 01:15

alexey-bataev force-pushed the arcpatch-D99750 branch from 8bb19c6 to b960fdd Compare January 9, 2024 21:40

fhahn reviewed Jan 10, 2024

alexey-bataev force-pushed the arcpatch-D99750 branch 2 times, most recently from d847073 to 8665929 Compare January 11, 2024 12:46

fhahn mentioned this pull request Jan 22, 2024

[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. #78113

Merged

alexey-bataev force-pushed the arcpatch-D99750 branch from 8665929 to a6c1689 Compare January 22, 2024 18:33

fhahn reviewed Jan 26, 2024

alexey-bataev force-pushed the arcpatch-D99750 branch from a6c1689 to 9a7809b Compare January 26, 2024 18:10

fhahn reviewed Jan 26, 2024

alexey-bataev merged commit 413a66f into llvm:main Apr 4, 2024
4 checks passed

alexey-bataev deleted the arcpatch-D99750 branch April 4, 2024 22:30

fhahn mentioned this pull request Apr 5, 2024

[VPlan] Split VPWidenMemoryInstructionRecipe (NFCI). #87411

Merged

fhahn mentioned this pull request Apr 5, 2024

[VPlan] Introduce recipes for VP loads and stores. #87816

Merged

Mel-Chen mentioned this pull request Apr 26, 2024

[LV][EVL] Support in-loop reduction using tail folding with EVL. #90184

Merged

Mel-Chen added a commit that referenced this pull request Jul 16, 2024

[LV][EVL] Support in-loop reduction using tail folding with EVL. (#90184)

4eb30cf

Following from #87816, add VPReductionEVLRecipe to describe vector predication reduction. Address one of TODOs from #76172.
