[VPlan] Introduce recipes for VP loads and stores. #87816
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-risc-v

Author: Florian Hahn (fhahn)

Changes

Introduce new subclasses of VPWidenMemoryRecipe for VP (vector-predicated) loads and stores to address multiple TODOs from #76172.

Note that the introduction of the new recipes also improves code-gen for VP gather/scatters by removing the redundant header mask. With the new approach, it is not sufficient to look at users of the widened canonical IV to find all uses of the header mask. In some cases, a widened IV is used instead of separately widening the canonical IV. To handle those cases, iterate over all recipes in the vector loop region to make sure all widened memory recipes are processed.

Depends on #87411.

Patch is 77.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/87816.diff

23 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 49bacb5ae6cc4e..10d41e829e88b3 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8095,7 +8095,7 @@ void VPRecipeBuilder::createBlockInMask(BasicBlock *BB) {
BlockMaskCache[BB] = BlockMask;
}
-VPWidenMemoryInstructionRecipe *
+VPWidenMemoryRecipe *
VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
VFRange &Range) {
assert((isa<LoadInst>(I) || isa<StoreInst>(I)) &&
@@ -8140,12 +8140,12 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
Ptr = VectorPtr;
}
if (LoadInst *Load = dyn_cast<LoadInst>(I))
- return new VPWidenMemoryInstructionRecipe(*Load, Ptr, Mask, Consecutive,
- Reverse, I->getDebugLoc());
+ return new VPWidenLoadRecipe(*Load, Ptr, Mask, Consecutive, Reverse,
+ I->getDebugLoc());
StoreInst *Store = cast<StoreInst>(I);
- return new VPWidenMemoryInstructionRecipe(
- *Store, Ptr, Operands[0], Mask, Consecutive, Reverse, I->getDebugLoc());
+ return new VPWidenStoreRecipe(*Store, Operands[0], Ptr, Mask, Consecutive,
+ Reverse, I->getDebugLoc());
}
/// Creates a VPWidenIntOrFpInductionRecpipe for \p Phi. If needed, it will also
@@ -8780,13 +8780,12 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
// for this VPlan, replace the Recipes widening its memory instructions with a
// single VPInterleaveRecipe at its insertion point.
for (const auto *IG : InterleaveGroups) {
- auto *Recipe = cast<VPWidenMemoryInstructionRecipe>(
- RecipeBuilder.getRecipe(IG->getInsertPos()));
+ auto *Recipe =
+ cast<VPWidenMemoryRecipe>(RecipeBuilder.getRecipe(IG->getInsertPos()));
SmallVector<VPValue *, 4> StoredValues;
for (unsigned i = 0; i < IG->getFactor(); ++i)
if (auto *SI = dyn_cast_or_null<StoreInst>(IG->getMember(i))) {
- auto *StoreR =
- cast<VPWidenMemoryInstructionRecipe>(RecipeBuilder.getRecipe(SI));
+ auto *StoreR = cast<VPWidenStoreRecipe>(RecipeBuilder.getRecipe(SI));
StoredValues.push_back(StoreR->getStoredValue());
}
@@ -9418,73 +9417,19 @@ void VPReplicateRecipe::execute(VPTransformState &State) {
State.ILV->scalarizeInstruction(UI, this, VPIteration(Part, Lane), State);
}
-/// Creates either vp_store or vp_scatter intrinsics calls to represent
-/// predicated store/scatter.
-static Instruction *
-lowerStoreUsingVectorIntrinsics(IRBuilderBase &Builder, Value *Addr,
- Value *StoredVal, bool IsScatter, Value *Mask,
- Value *EVL, const Align &Alignment) {
- CallInst *Call;
- if (IsScatter) {
- Call = Builder.CreateIntrinsic(Type::getVoidTy(EVL->getContext()),
- Intrinsic::vp_scatter,
- {StoredVal, Addr, Mask, EVL});
- } else {
- VectorBuilder VBuilder(Builder);
- VBuilder.setEVL(EVL).setMask(Mask);
- Call = cast<CallInst>(VBuilder.createVectorInstruction(
- Instruction::Store, Type::getVoidTy(EVL->getContext()),
- {StoredVal, Addr}));
- }
- Call->addParamAttr(
- 1, Attribute::getWithAlignment(Call->getContext(), Alignment));
- return Call;
-}
-
-/// Creates either vp_load or vp_gather intrinsics calls to represent
-/// predicated load/gather.
-static Instruction *lowerLoadUsingVectorIntrinsics(IRBuilderBase &Builder,
- VectorType *DataTy,
- Value *Addr, bool IsGather,
- Value *Mask, Value *EVL,
- const Align &Alignment) {
- CallInst *Call;
- if (IsGather) {
- Call =
- Builder.CreateIntrinsic(DataTy, Intrinsic::vp_gather, {Addr, Mask, EVL},
- nullptr, "wide.masked.gather");
- } else {
- VectorBuilder VBuilder(Builder);
- VBuilder.setEVL(EVL).setMask(Mask);
- Call = cast<CallInst>(VBuilder.createVectorInstruction(
- Instruction::Load, DataTy, Addr, "vp.op.load"));
- }
- Call->addParamAttr(
- 0, Attribute::getWithAlignment(Call->getContext(), Alignment));
- return Call;
-}
-
-void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
- VPValue *StoredValue = isStore() ? getStoredValue() : nullptr;
-
+void VPWidenLoadRecipe::execute(VPTransformState &State) {
// Attempt to issue a wide load.
- LoadInst *LI = dyn_cast<LoadInst>(&Ingredient);
- StoreInst *SI = dyn_cast<StoreInst>(&Ingredient);
-
- assert((LI || SI) && "Invalid Load/Store instruction");
- assert((!SI || StoredValue) && "No stored value provided for widened store");
- assert((!LI || !StoredValue) && "Stored value provided for widened load");
+ auto *LI = cast<LoadInst>(&Ingredient);
Type *ScalarDataTy = getLoadStoreType(&Ingredient);
-
auto *DataTy = VectorType::get(ScalarDataTy, State.VF);
const Align Alignment = getLoadStoreAlignment(&Ingredient);
- bool CreateGatherScatter = !isConsecutive();
+ bool CreateGather = !isConsecutive();
auto &Builder = State.Builder;
InnerLoopVectorizer::VectorParts BlockInMaskParts(State.UF);
- bool isMaskRequired = getMask();
- if (isMaskRequired) {
+ bool IsMaskRequired = getMask();
+ if (IsMaskRequired) {
// Mask reversal is only needed for non-all-one (null) masks, as reverse of
// a null all-one mask is a null mask.
for (unsigned Part = 0; Part < State.UF; ++Part) {
@@ -9495,88 +9440,20 @@ void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
}
}
- // Handle Stores:
- if (SI) {
- State.setDebugLocFrom(getDebugLoc());
-
- for (unsigned Part = 0; Part < State.UF; ++Part) {
- Instruction *NewSI = nullptr;
- Value *StoredVal = State.get(StoredValue, Part);
- // TODO: split this into several classes for better design.
- if (State.EVL) {
- assert(State.UF == 1 && "Expected only UF == 1 when vectorizing with "
- "explicit vector length.");
- assert(cast<VPInstruction>(State.EVL)->getOpcode() ==
- VPInstruction::ExplicitVectorLength &&
- "EVL must be VPInstruction::ExplicitVectorLength.");
- Value *EVL = State.get(State.EVL, VPIteration(0, 0));
- // If EVL is not nullptr, then EVL must be a valid value set during plan
- // creation, possibly default value = whole vector register length. EVL
- // is created only if TTI prefers predicated vectorization, thus if EVL
- // is not nullptr it also implies preference for predicated
- // vectorization.
- // FIXME: Support reverse store after vp_reverse is added.
- Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;
- NewSI = lowerStoreUsingVectorIntrinsics(
- Builder, State.get(getAddr(), Part, !CreateGatherScatter),
- StoredVal, CreateGatherScatter, MaskPart, EVL, Alignment);
- } else if (CreateGatherScatter) {
- Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;
- Value *VectorGep = State.get(getAddr(), Part);
- NewSI = Builder.CreateMaskedScatter(StoredVal, VectorGep, Alignment,
- MaskPart);
- } else {
- if (isReverse()) {
- // If we store to reverse consecutive memory locations, then we need
- // to reverse the order of elements in the stored value.
- StoredVal = Builder.CreateVectorReverse(StoredVal, "reverse");
- // We don't want to update the value in the map as it might be used in
- // another expression. So don't call resetVectorValue(StoredVal).
- }
- auto *VecPtr = State.get(getAddr(), Part, /*IsScalar*/ true);
- if (isMaskRequired)
- NewSI = Builder.CreateMaskedStore(StoredVal, VecPtr, Alignment,
- BlockInMaskParts[Part]);
- else
- NewSI = Builder.CreateAlignedStore(StoredVal, VecPtr, Alignment);
- }
- State.addMetadata(NewSI, SI);
- }
- return;
- }
-
// Handle loads.
assert(LI && "Must have a load instruction");
State.setDebugLocFrom(getDebugLoc());
for (unsigned Part = 0; Part < State.UF; ++Part) {
Value *NewLI;
- // TODO: split this into several classes for better design.
- if (State.EVL) {
- assert(State.UF == 1 && "Expected only UF == 1 when vectorizing with "
- "explicit vector length.");
- assert(cast<VPInstruction>(State.EVL)->getOpcode() ==
- VPInstruction::ExplicitVectorLength &&
- "EVL must be VPInstruction::ExplicitVectorLength.");
- Value *EVL = State.get(State.EVL, VPIteration(0, 0));
- // If EVL is not nullptr, then EVL must be a valid value set during plan
- // creation, possibly default value = whole vector register length. EVL
- // is created only if TTI prefers predicated vectorization, thus if EVL
- // is not nullptr it also implies preference for predicated
- // vectorization.
- // FIXME: Support reverse loading after vp_reverse is added.
- Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;
- NewLI = lowerLoadUsingVectorIntrinsics(
- Builder, DataTy, State.get(getAddr(), Part, !CreateGatherScatter),
- CreateGatherScatter, MaskPart, EVL, Alignment);
- } else if (CreateGatherScatter) {
- Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;
+ if (CreateGather) {
+ Value *MaskPart = IsMaskRequired ? BlockInMaskParts[Part] : nullptr;
Value *VectorGep = State.get(getAddr(), Part);
NewLI = Builder.CreateMaskedGather(DataTy, VectorGep, Alignment, MaskPart,
nullptr, "wide.masked.gather");
State.addMetadata(NewLI, LI);
} else {
auto *VecPtr = State.get(getAddr(), Part, /*IsScalar*/ true);
- if (isMaskRequired)
+ if (IsMaskRequired)
NewLI = Builder.CreateMaskedLoad(
DataTy, VecPtr, Alignment, BlockInMaskParts[Part],
PoisonValue::get(DataTy), "wide.masked.load");
@@ -9590,10 +9467,148 @@ void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
NewLI = Builder.CreateVectorReverse(NewLI, "reverse");
}
- State.set(getVPSingleValue(), NewLI, Part);
+ State.set(this, NewLI, Part);
+ }
+}
+
+void VPWidenVPLoadRecipe::execute(VPTransformState &State) {
+ assert(State.UF == 1 && "Expected only UF == 1 when vectorizing with "
+ "explicit vector length.");
+ // FIXME: Support reverse loading after vp_reverse is added.
+ assert(!isReverse() && "Reverse loads are not implemented yet.");
+
+ // Attempt to issue a wide load.
+ auto *LI = cast<LoadInst>(&Ingredient);
+
+ Type *ScalarDataTy = getLoadStoreType(&Ingredient);
+ auto *DataTy = VectorType::get(ScalarDataTy, State.VF);
+ const Align Alignment = getLoadStoreAlignment(&Ingredient);
+ bool CreateGather = !isConsecutive();
+
+ auto &Builder = State.Builder;
+ // Handle loads.
+ assert(LI && "Must have a load instruction");
+ State.setDebugLocFrom(getDebugLoc());
+ for (unsigned Part = 0; Part < State.UF; ++Part) {
+ CallInst *NewLI;
+ Value *EVL = State.get(getEVL(), VPIteration(0, 0));
+ Value *Addr = State.get(getAddr(), Part, !CreateGather);
+ Value *Mask =
+ getMask()
+ ? State.get(getMask(), Part)
+ : Mask = Builder.CreateVectorSplat(State.VF, Builder.getTrue());
+ if (CreateGather) {
+ NewLI = Builder.CreateIntrinsic(DataTy, Intrinsic::vp_gather,
+ {Addr, Mask, EVL}, nullptr,
+ "wide.masked.gather");
+ } else {
+ VectorBuilder VBuilder(Builder);
+ VBuilder.setEVL(EVL).setMask(Mask);
+ NewLI = cast<CallInst>(VBuilder.createVectorInstruction(
+ Instruction::Load, DataTy, Addr, "vp.op.load"));
+ }
+ NewLI->addParamAttr(
+ 0, Attribute::getWithAlignment(NewLI->getContext(), Alignment));
+
+ // Add metadata to the load.
+ State.addMetadata(NewLI, LI);
+ State.set(this, NewLI, Part);
+ }
+}
+
+void VPWidenStoreRecipe::execute(VPTransformState &State) {
+ auto *SI = cast<StoreInst>(&Ingredient);
+
+ VPValue *StoredValue = getStoredValue();
+ bool CreateScatter = !isConsecutive();
+ const Align Alignment = getLoadStoreAlignment(&Ingredient);
+
+ auto &Builder = State.Builder;
+ InnerLoopVectorizer::VectorParts BlockInMaskParts(State.UF);
+ bool IsMaskRequired = getMask();
+ if (IsMaskRequired) {
+ // Mask reversal is only needed for non-all-one (null) masks, as reverse of
+ // a null all-one mask is a null mask.
+ for (unsigned Part = 0; Part < State.UF; ++Part) {
+ Value *Mask = State.get(getMask(), Part);
+ if (isReverse())
+ Mask = Builder.CreateVectorReverse(Mask, "reverse");
+ BlockInMaskParts[Part] = Mask;
+ }
+ }
+
+ State.setDebugLocFrom(getDebugLoc());
+
+ for (unsigned Part = 0; Part < State.UF; ++Part) {
+ Instruction *NewSI = nullptr;
+ Value *StoredVal = State.get(StoredValue, Part);
+ // TODO: split this into several classes for better design.
+ if (CreateScatter) {
+ Value *MaskPart = IsMaskRequired ? BlockInMaskParts[Part] : nullptr;
+ Value *VectorGep = State.get(getAddr(), Part);
+ NewSI = Builder.CreateMaskedScatter(StoredVal, VectorGep, Alignment,
+ MaskPart);
+ } else {
+ if (isReverse()) {
+ // If we store to reverse consecutive memory locations, then we need
+ // to reverse the order of elements in the stored value.
+ StoredVal = Builder.CreateVectorReverse(StoredVal, "reverse");
+ // We don't want to update the value in the map as it might be used in
+ // another expression. So don't call resetVectorValue(StoredVal).
+ }
+ auto *VecPtr = State.get(getAddr(), Part, /*IsScalar*/ true);
+ if (IsMaskRequired)
+ NewSI = Builder.CreateMaskedStore(StoredVal, VecPtr, Alignment,
+ BlockInMaskParts[Part]);
+ else
+ NewSI = Builder.CreateAlignedStore(StoredVal, VecPtr, Alignment);
+ }
+ State.addMetadata(NewSI, SI);
}
}
+void VPWidenVPStoreRecipe::execute(VPTransformState &State) {
+ assert(State.UF == 1 && "Expected only UF == 1 when vectorizing with "
+ "explicit vector length.");
+ // FIXME: Support reverse loading after vp_reverse is added.
+ assert(!isReverse() && "Reverse store are not implemented yet.");
+
+ auto *SI = cast<StoreInst>(&Ingredient);
+
+ VPValue *StoredValue = getStoredValue();
+ bool CreateScatter = !isConsecutive();
+ const Align Alignment = getLoadStoreAlignment(&Ingredient);
+
+ auto &Builder = State.Builder;
+ State.setDebugLocFrom(getDebugLoc());
+
+ for (unsigned Part = 0; Part < State.UF; ++Part) {
+ CallInst *NewSI = nullptr;
+ Value *StoredVal = State.get(StoredValue, Part);
+ Value *EVL = State.get(getEVL(), VPIteration(0, 0));
+ // FIXME: Support reverse store after vp_reverse is added.
+ Value *Mask =
+ getMask()
+ ? State.get(getMask(), Part)
+ : Mask = Builder.CreateVectorSplat(State.VF, Builder.getTrue());
+ Value *Addr = State.get(getAddr(), Part, !CreateScatter);
+ if (CreateScatter) {
+ NewSI = Builder.CreateIntrinsic(Type::getVoidTy(EVL->getContext()),
+ Intrinsic::vp_scatter,
+ {StoredVal, Addr, Mask, EVL});
+ } else {
+ VectorBuilder VBuilder(Builder);
+ VBuilder.setEVL(EVL).setMask(Mask);
+ NewSI = cast<CallInst>(VBuilder.createVectorInstruction(
+ Instruction::Store, Type::getVoidTy(EVL->getContext()),
+ {StoredVal, Addr}));
+ }
+ NewSI->addParamAttr(
+ 1, Attribute::getWithAlignment(NewSI->getContext(), Alignment));
+
+ State.addMetadata(NewSI, SI);
+ }
+}
// Determine how to lower the scalar epilogue, which depends on 1) optimising
// for minimum code-size, 2) predicate compiler options, 3) loop hints forcing
// predication, and 4) a TTI hook that analyses whether the loop is suitable
diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
index 605b47fa0a46b8..b4c7ab02f928f0 100644
--- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
@@ -69,9 +69,9 @@ class VPRecipeBuilder {
/// Check if the load or store instruction \p I should widened for \p
/// Range.Start and potentially masked. Such instructions are handled by a
/// recipe that takes an additional VPInstruction for the mask.
- VPWidenMemoryInstructionRecipe *tryToWidenMemory(Instruction *I,
- ArrayRef<VPValue *> Operands,
- VFRange &Range);
+ VPWidenMemoryRecipe *tryToWidenMemory(Instruction *I,
+ ArrayRef<VPValue *> Operands,
+ VFRange &Range);
/// Check if an induction recipe should be constructed for \p Phi. If so build
/// and return it. If not, return null.
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 77577b516ae274..cbe015874d16a3 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -242,15 +242,6 @@ struct VPTransformState {
ElementCount VF;
unsigned UF;
- /// If EVL (Explicit Vector Length) is not nullptr, then EVL must be a valid
- /// value set during plan transformation, possibly a default value = whole
- /// vector register length. EVL is created only if TTI prefers predicated
- /// vectorization, thus if EVL is not nullptr it also implies preference for
- /// predicated vectorization.
- /// TODO: this is a temporarily solution, the EVL must be explicitly used by
- /// the recipes and must be removed here.
- VPValue *EVL = nullptr;
-
/// Hold the indices to generate specific scalar instructions. Null indicates
/// that all instances are to be generated, using either scalar or vector
/// instructions.
@@ -875,7 +866,8 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
return true;
case VPRecipeBase::VPInterleaveSC:
case VPRecipeBase::VPBranchOnMaskSC:
- case VPRecipeBase::VPWidenMemoryInstructionSC:
+ case VPRecipeBase::VPWidenLoadSC:
+ case VPRecipeBase::VPWidenStoreSC:
// TODO: Widened stores don't define a value, but widened loads do. Split
// the recipes to be able to make widened loads VPSingleDefRecipes.
return false;
@@ -2273,19 +2265,16 @@ class VPPredInstPHIRecipe : public VPSingleDefRecipe {
}
};
-/// A Recipe for widening load/store operations.
-/// The recipe uses the following VPValues:
-/// - For load: Address, optional mask
-/// - For store: Address, stored value, optional mask
-/// TODO: We currently execute only per-part unless a specific instance is
-/// provided.
-class VPWidenMemoryInstructionRecipe : public VPRecipeBase {
+/// A common base class for widening memory operations. An optional mask can be
+/// provided the last operand.
+class VPWidenMemoryRecipe : public VPRecipeBase {
+protected:
Instruction &Ingredient;
- // Whether the loaded-from / stored-to addresses are consecutive.
+ /// Whether the loaded-from / stored-to addresses are consecutive.
bool Consecutive;
- // Whether the consecutive loaded/stored addresses are in reverse order.
+ /// Whether the consecutive loaded/stored addresses are in reverse order.
bool Reverse;
void setMask(VPValue *Mask) {
@@ -2294,48 +2283,66 @@ class VPWidenMemoryInstructionRecipe : public VPRecipeBase {
addOperand(Mask);
}
- bool isMasked() const {
- return isStore() ? getNumOperands() == 3 : getNumOperands() == 2;
+ VPWidenMemoryRecipe(const char unsigned ...
[truncated]
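For orientation before the inline review comments below, here is a condensed sketch of the recipe hierarchy the (truncated) diff above introduces. This is an illustrative outline only: member lists, VPDef IDs and constructors are elided, and the EVL-based recipes are renamed later in the review.

// Illustrative outline only, not the literal patch.
// Common base: holds the scalar Ingredient (the original load/store), the
// Consecutive/Reverse flags, and an optional mask as the last operand.
class VPWidenMemoryRecipe /* : public VPRecipeBase */ {};

// Non-EVL recipes, lowered to masked.load/store or masked.gather/scatter:
class VPWidenLoadRecipe  /* : VPWidenMemoryRecipe, VPValue */ {}; // defines the loaded value
class VPWidenStoreRecipe /* : VPWidenMemoryRecipe */ {};          // operands: address, stored value, optional mask

// EVL-based recipes, lowered to vp.load/vp.store or vp.gather/vp.scatter,
// taking an explicit vector length operand:
class VPWidenVPLoadRecipe  /* : VPWidenMemoryRecipe, VPValue */ {};
class VPWidenVPStoreRecipe /* : VPWidenMemoryRecipe */ {};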
// Handle loads.
assert(LI && "Must have a load instruction");
State.setDebugLocFrom(getDebugLoc());
for (unsigned Part = 0; Part < State.UF; ++Part) {

No need for the loop here, since it is expected that State.UF == 1.

Removed loop, thanks!
auto &Builder = State.Builder;
State.setDebugLocFrom(getDebugLoc());

for (unsigned Part = 0; Part < State.UF; ++Part) {

Same here.

Removed, thanks!
return new VPWidenVPLoadRecipe(cast<LoadInst>(Ingredient), getAddr(),
                               getEVL(), getMask(), isConsecutive(),
                               getDebugLoc());

Do we need a clone implementation?

+1
Currently clone() is called only for epilog vectorization, so this is unreachable.
Can define it unreachable at the VPWidenMemoryRecipe parent, instead of pure virtual.

Moved to VPWidenMemoryRecipe::clone, thanks!
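A rough sketch of the resolution described above; the exact signature and placement in the patch are assumptions, and llvm_unreachable comes from llvm/Support/ErrorHandling.h.

class VPWidenMemoryRecipe : public VPRecipeBase {
public:
  // Sketch: clone() is currently only reached via epilogue vectorization,
  // which runs before these recipes are introduced, so a single trapping
  // implementation in the common base class suffices instead of a
  // pure-virtual method that every subclass must override.
  VPRecipeBase *clone() override {
    llvm_unreachable("cloning not implemented yet");
  }
  // ...
};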
VPValue *BTC = Plan.getOrCreateBackedgeTakenCount();
auto IsHeaderMask = [BTC](VPValue *V) {

Suggested change:
auto IsHeaderMask = [BTC = Plan.getOrCreateBackedgeTakenCount()](VPValue *V) {

Reworked, the lambda is gone now, thanks!

Thanks for taking care of this clean-up!
// FIXME: Support reverse loading after vp_reverse is added.
assert(!isReverse() && "Reverse loads are not implemented yet.");

// Attempt to issue a wide load.

Suggested change: drop the "// Attempt to issue a wide load." comment.

Removed, thanks!
// Handle loads.
assert(LI && "Must have a load instruction");

Suggested change: drop these two lines.

Removed, thanks!
NewLI->addParamAttr(
    0, Attribute::getWithAlignment(NewLI->getContext(), Alignment));

// Add metadata to the load.

Does this comment add any information?

Removed, thanks!
@@ -875,7 +866,8 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
      return true;
    case VPRecipeBase::VPInterleaveSC:
    case VPRecipeBase::VPBranchOnMaskSC:
    case VPRecipeBase::VPWidenMemoryInstructionSC:
    case VPRecipeBase::VPWidenLoadSC:
    case VPRecipeBase::VPWidenStoreSC:

Add the two VP variants?

Added, thanks!
}

VP_CLASSOF_IMPL(VPDef::VPWidenMemoryInstructionSC)
/// Returns true if the recipe is masked.
bool isMasked() const {

Further incentive to maintain a bool IsMasked indicator?

Updated to use bool.
return CompareToReplace &&
       CompareToReplace->getOpcode() == Instruction::ICmp &&
       CompareToReplace->getPredicate() == CmpInst::ICMP_ULE &&
       CompareToReplace->getOperand(1) == BTC;

... With the new approach, it is not sufficient to look at users of the widened canonical IV to find all uses of the header mask. In some cases, a widened IV is used instead of separately widening the canonical IV. To handle those cases, iterate over all recipes in the vector loop region to make sure all widened memory recipes are processed.

Must every ICMP_ULE of BTC be the header mask, regardless of its other operand?
Should/can the header mask(s) continue to be identified by looking for compares of VPWidenCanonicalIVRecipe that in turn use the canonical IV, before calling CanonicalIVPHI->replaceAllUsesWith(EVLPhi)?
Tail folding (an integral part of the initial stripmining step according to the roadmap) should arguably introduce a single abstract HeaderMask VPValue, retrievable from the loop region (as with its canonical IV), to be later materialized/legalized/lowered into concrete bump/widening/compare/active-lane-mask/no-mask-with-EVL recipes placed inside blocks?

Must every ICMP_ULE of BTC be the header mask, regardless of its other operand?

At the moment yes, but I reworked this to walk users of header-masks. The current placement of addExplicitVectorLength unfortunately means that VPWidenCanonicalIVRecipe may have been removed, if there's a matching VPWidenIntOrFpInductionRecipe (see TODO from the initial EVL PR; the current placement is to avoid introducing new uses of the canonical IV after introducing EVL). So we also need to walk the users of VPWidenIntOrFpInductionRecipe that are canonical.

Tail folding should arguably introduce a single abstract HeaderMask VPValue...?

Sounds good to me, pushing towards more gradual lowering (as follow-up).
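To make the pattern concrete, here is a condensed sketch of the header-mask check being discussed, adapted from the snippet quoted above; the helper name is illustrative, and the surrounding walk over wide canonical IVs (and canonical VPWidenIntOrFpInductionRecipes) is elided.

// Sketch: a VPValue defined by the compare (ICMP_ULE, WideCanonicalIV,
// backedge-taken-count) is treated as the loop's header mask.
static bool isHeaderMask(VPValue *V, VPValue *BTC) {
  auto *CompareToReplace = dyn_cast<VPInstruction>(V);
  return CompareToReplace &&
         CompareToReplace->getOpcode() == Instruction::ICmp &&
         CompareToReplace->getPredicate() == CmpInst::ICMP_ULE &&
         CompareToReplace->getOperand(1) == BTC;
}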
if (!MemR)
  continue;
VPValue *OrigMask = MemR->getMask();
if (!OrigMask)

Are unmasked loads/stores permitted when tail-folding, moreover with EVL - can they remain unmodified and independent of EVL, alongside EVL VP intrinsics?

Should not be null at this point, updated, thanks!
VPValue *OrigMask = MemR->getMask();
if (!OrigMask)
  continue;
assert(!MemR->isReverse() &&
       "Reversed memory operations not supported yet.");
VPValue *Mask = IsHeaderMask(OrigMask) ? nullptr : OrigMask;

Suggested change:
assert(!MemR->isReverse() &&
       "Reversed memory operations not supported yet.");
VPValue *OrigMask = MemR->getMask();
assert(OrigMask && "Unmasked widen memory recipe when folding tail");
VPValue *Mask = IsHeaderMask(OrigMask) ? nullptr : OrigMask;

Worth slightly reordering, if OrigMask must be non-null.

Reordered, thanks!
"Reversed memory operations not supported yet."); | ||
VPValue *Mask = IsHeaderMask(OrigMask) ? nullptr : OrigMask; | ||
if (auto *L = dyn_cast<VPWidenLoadRecipe>(&R)) { | ||
auto *N = new VPWidenVPLoadRecipe(cast<LoadInst>(L->getIngredient()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: perhaps the constructor of VPWidenVPLoadRecipe should take a VPWidenLoadRecipe, VPEVL, and mask as parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
  L->replaceAllUsesWith(N);
  L->eraseFromParent();
} else if (auto *S = dyn_cast<VPWidenStoreRecipe>(&R)) {
  auto *N = new VPWidenVPStoreRecipe(

nit: perhaps the constructor of VPWidenVPStoreRecipe should take a VPWidenStoreRecipe, VPEVL, and mask as parameters.

Done, thanks!
/// A recipe for widening load operations with vector-predication intrinsics,
/// using the address to load from, the explicit vector length and an optional
/// mask.
struct VPWidenVPLoadRecipe final : public VPWidenMemoryRecipe, public VPValue {

My understanding is that you want to have dedicated recipes that emit vp-intrinsics. Since they're derived from VPRecipe, is it expected they will be fully supported by VPlanTransforms?

Yes, any recipe must be fully supported; all existing code dealing with memory ops should already do this, either by directly dealing with the common base class or various may-read/may-write helpers.
Force-pushed from 6dcd584 to 4533d92 with the following commits:

Factor out logic to collect all users recursively to be re-used in #87816.

Introduce new subclasses of VPWidenMemoryRecipe for VP (vector-predicated) loads and stores to address multiple TODOs from llvm#76172. Note that the introduction of the new recipes also improves code-gen for VP gather/scatters by removing the redundant header mask. With the new approach, it is not sufficient to look at users of the widened canonical IV to find all uses of the header mask. In some cases, a widened IV is used instead of separately widening the canonical IV. To handle those cases, iterate over all recipes in the vector loop region to make sure all widened memory recipes are processed. Depends on llvm#87411.
Rebased after parent commits landed & comments addressed. Apologies for the force-push
StoreInst *Store = cast<StoreInst>(I);
return new VPWidenMemoryInstructionRecipe(
    *Store, Ptr, Operands[0], Mask, Consecutive, Reverse, I->getDebugLoc());
return new VPWidenStoreRecipe(*Store, Operands[0], Ptr, Mask, Consecutive,

nit: better retain parameters in their order as operands?
void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
  VPValue *StoredValue = isStore() ? getStoredValue() : nullptr;

void VPWidenLoadRecipe::execute(VPTransformState &State) {
  // Attempt to issue a wide load.

nit: redundant?
          Builder, DataTy, State.get(getAddr(), Part, !CreateGather),
          CreateGather, Mask, EVL, Alignment);
    } else if (CreateGather) {
    if (CreateGather) {

Suggested change:
    Value *Addr = State.get(getAddr(), Part, !CreateGather);
    if (CreateGather)
      NewLI = Builder.CreateMaskedGather(DataTy, Addr, Alignment, Mask,
                                         nullptr, "wide.masked.gather");
    else if (Mask)
      NewLI = Builder.CreateMaskedLoad(DataTy, Addr, Alignment, Mask,
                                       PoisonValue::get(DataTy),
                                       "wide.masked.load");
    else
      NewLI =
          Builder.CreateAlignedLoad(DataTy, Addr, Alignment, "wide.load");
    // Add metadata to the load, but setVectorValue to possibly reversed shuffle.
    State.addMetadata(NewLI, LI);
    if (Reverse)
      NewLI = Builder.CreateVectorReverse(NewLI, "reverse");
    State.set(this, NewLI, Part);

nit: could be simplified a bit, consistent with VPWidenVPLoadRecipe::execute()?

Updated, thanks!

Thanks! The brackets could be removed.

Kept the braces as there are multi-line statements (for which I think it is recommended to use braces).

ok, but let the store case below be consistent with the load case here?
          Builder, State.get(getAddr(), Part, !CreateScatter), StoredVal,
          CreateScatter, Mask, EVL, Alignment);
    } else if (CreateScatter) {
    if (CreateScatter) {

Suggested change:
    Value *Addr = State.get(getAddr(), Part, !CreateScatter);
    if (CreateScatter)
      NewSI =
          Builder.CreateMaskedScatter(StoredVal, Addr, Alignment, Mask);
    else if (Mask)
      NewSI = Builder.CreateMaskedStore(StoredVal, Addr, Alignment, Mask);
    else
      NewSI = Builder.CreateAlignedStore(StoredVal, Addr, Alignment);

nit: can be simplified a bit, consistent with VPWidenVPStoreRecipe::execute()?

Done, thanks!

Thanks! The brackets can be removed, along with inlining else if.

Done, thanks!
@@ -9448,31 +9423,14 @@ void VPWidenStoreRecipe::execute(VPTransformState &State) {

    Value *StoredVal = State.get(StoredVPValue, Part);
    if (isReverse()) {
      assert(!State.EVL && "reversing not yet implemented with EVL");
      // If we store to reverse consecutive memory locations, then we need
      // to reverse the order of elements in the stored value.
      StoredVal = Builder.CreateVectorReverse(StoredVal, "reverse");
      // We don't want to update the value in the map as it might be used in
      // another expression. So don't call resetVectorValue(StoredVal).
    }
    // TODO: split this into several classes for better design.

Is this TODO still relevant?

Removed, thanks!
// Walk users of wide canonical IVs and replace all compares of the form
// (ICMP_ULE, WideCanonicalIV, backedge-taken-count) with
// the given idiom VPValue.

Suggested change:
// Walk users of wide canonical IVs and apply Fn to all compares of the form
// (ICMP_ULE, WideCanonicalIV, backedge-taken-count).

Updated, thanks!
@@ -27,14 +27,14 @@ define void @foo(ptr noalias %a, ptr noalias %b, ptr noalias %c, i64 %N) {
; IF-EVL-NEXT:   vp<[[ST:%[0-9]+]]> = SCALAR-STEPS vp<[[EVL_PHI]]>, ir<1>
; IF-EVL-NEXT:   CLONE ir<[[GEP1:%.+]]> = getelementptr inbounds ir<%b>, vp<[[ST]]>
; IF-EVL-NEXT:   vp<[[PTR1:%[0-9]+]]> = vector-pointer ir<[[GEP1]]>
; IF-EVL-NEXT:   WIDEN ir<[[LD1:%.+]]> = load vp<[[PTR1]]>, ir<true>
; IF-EVL-NEXT:   WIDEN ir<[[LD1:%.+]]> = vp.load vp<[[PTR1]]>, vp<[[EVL]]>

nit (unrelated to this patch): may be helpful to add the name of the underlying IR LoadInst when printing WIDEN [vp.]load recipes, to clarify what is being widened.

The name of the original IR instruction should be available as the IR name of the recipe (matched using a pattern here: ir<[[LD1:%.+]]>).

Ah, of course.
; IF-EVL-NEXT:    [[TMP20:%.*]] = getelementptr inbounds i32, ptr [[INDEX:%.*]], <vscale x 2 x i64> [[VEC_IND]]
; IF-EVL-NEXT:    [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 2 x i64> @llvm.vp.gather.nxv2i64.nxv2p0(<vscale x 2 x ptr> align 8 [[TMP20]], <vscale x 2 x i1> [[TMP19]], i32 [[TMP18]])
; IF-EVL-NEXT:    [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 2 x i64> @llvm.vp.gather.nxv2i64.nxv2p0(<vscale x 2 x ptr> align 8 [[TMP20]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP18]])

nit: would be clearer if this shufflevector ( ... zeroinitializer) operand representing a full-mask was outlined; as some "oneinitializer".

Hmm, at the moment this creates a constant, but maybe there's an easy way to create an equivalent shuffle vector instruction. Can check separately.

ok
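For context on where that constant comes from: the all-true mask in the recipe code above is built as a splat of i1 true, and for scalable vectors the constant folder prints such a splat as the shufflevector(insertelement(poison, true, 0), poison, zeroinitializer) expression seen in these CHECK lines. A standalone illustration (not part of the patch):

// Standalone illustration: build an all-true mask for a (possibly scalable)
// vector of VF elements, as used for the otherwise unmasked vp.gather above.
#include "llvm/IR/IRBuilder.h"

llvm::Value *buildAllTrueMask(llvm::IRBuilderBase &Builder,
                              llvm::ElementCount VF) {
  return Builder.CreateVectorSplat(VF, Builder.getTrue());
}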
@@ -1336,6 +1332,30 @@ void VPlanTransforms::addExplicitVectorLength(VPlan &Plan) {
  NextEVLIV->insertBefore(CanonicalIVIncrement);
  EVLPhi->addOperand(NextEVLIV);

  forAllHeaderPredicates(Plan, [VPEVL](VPInstruction &Mask) {

An alternative approach would be to scan the recipes inside the loop, replacing every widen memory recipe with its vp counterpart, where header masks are replaced with null. Before doing so, collect all header masks into a set, possibly via forAllHeaderPredicates(), which could be simplified into collectAllHeaderPredicates(), until a designated recipe can be used directly.
Where is collectUsersRecursively() defined?

Updated to use collectAllHeaderPredicates (collectAllHeaderMasks), but kept walking those users for now. Can also revert back to iterating over the whole vector loop region as was done in earlier versions of the patch if preferred.
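Putting this thread together, here is a condensed sketch of the resulting transform, assembled from the snippets quoted in this review. The names and signatures (collectAllHeaderMasks and its return type, the VPWidenVPLoadRecipe constructor, the block traversal) are assumptions based on the discussion, not the literal patch.

// Sketch: walk the widen memory recipes in the vector loop region and replace
// them with their EVL-based counterparts. A header mask is dropped, since the
// EVL operand already limits the active lanes; any other mask is kept.
static void replaceMemoryRecipesWithEVL(VPlan &Plan, VPValue *VPEVL) {
  // Assumed to return a set-like container of the header-mask VPValues.
  auto HeaderMasks = collectAllHeaderMasks(Plan);
  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
           vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry()))) {
    for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
      auto *L = dyn_cast<VPWidenLoadRecipe>(&R);
      if (!L)
        continue; // Stores are handled analogously with VPWidenVPStoreRecipe.
      VPValue *OrigMask = L->getMask();
      assert(OrigMask && "Unmasked widen memory recipe when folding tail");
      VPValue *NewMask = HeaderMasks.contains(OrigMask) ? nullptr : OrigMask;
      auto *N = new VPWidenVPLoadRecipe(L, VPEVL, NewMask);
      N->insertBefore(L);
      L->replaceAllUsesWith(N);
      L->eraseFromParent();
    }
  }
}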
@@ -9425,6 +9362,44 @@ void VPWidenLoadRecipe::execute(VPTransformState &State) {
  }
}

void VPWidenVPLoadRecipe::execute(VPTransformState &State) {

A general naming comment: VPWidenVP... sounds a bit confusing. The original non-EVL/non-VP masked vector loads and stores are also already "vector predicated", and the VP prefix is already prevalently used to denote VPlan. The distinction lies with the additional EVL operand, currently employed to relieve the mask of conveying the tail. How about VPWidenEVLLoadRecipe and VPWidenEVLStoreRecipe?

Updated, thanks!

Thanks for taking a look, latest comments should be addressed.
// Widened, consecutive memory operations only demand the first lane of
// their address, unless the same operand is also stored. That latter can
// happen with opaque pointers.

Done, not sure what happened there.
/// WideCanonicalIV, backedge-taken-count) pattern
static void forAllHeaderPredicates(VPlan &Plan,
                                   function_ref<void(VPInstruction &)> Fn) {
  SmallVector<VPValue *> WideCanonicalIVs;
  auto *FoundWidenCanonicalIVUser =
      find_if(Plan.getCanonicalIV()->users(),

Only at most one WidenCanonicalIVRecipe user expected (worth asserting?), or simply iterate over all users, possibly pushing multiple candidates into the worklist, as with widen int or fp induction recipes below?
Yes, added an assert.

Instead of a pattern matching search for header masks, introduce an abstract HeaderMask recipe, which is later (i.e., here) lowered into an ICMP_ULE, ActiveLaneMask, EVL, whatnot?
Sounds good, OK as follow-up? Added a TODO for now.
case VPWidenStoreSC:
case VPWidenEVLStoreSC:

Suggested change:
case VPWidenEVLStoreSC:
case VPWidenStoreSC:

nit: lex order
There are many such disorders below, where the EVL and non-EVL versions would be listed apart. Perhaps better call them VPWidenLoadEVL[SC] and VPWidenStoreEVL[SC], for better alignment?

Renamed including the recipe names (for consistency), thanks!
case VPWidenIntOrFpInductionSC:
case VPWidenLoadSC:
case VPWidenPHISC:
case VPWidenEVLLoadSC:

Suggested change:
case VPWidenEVLLoadSC:
case VPWidenIntOrFpInductionSC:
case VPWidenLoadSC:
case VPWidenPHISC:

nit: lex order.

Updated with new naming.
case VPWidenLoadSC:
case VPWidenEVLLoadSC:

Suggested change:
case VPWidenEVLLoadSC:
case VPWidenLoadSC:

Updated with new naming.
static void
replaceHeaderPredicateWith(VPlan &Plan, VPValue &Idiom,
                           function_ref<bool(VPUser &, unsigned)> Cond = {}) {
/// Collet all VPValues representing a header mask through the (ICMP_ULE,

Suggested change:
/// Collect all VPValues representing a header mask through the (ICMP_ULE,

Fixed, thanks!
// Walk users of wide canonical IVs and apply Fn to all compares of the form | ||
// (ICMP_ULE, WideCanonicalIV, backedge-taken-count). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested change:
-// Walk users of wide canonical IVs and apply Fn to all compares of the form
-// (ICMP_ULE, WideCanonicalIV, backedge-taken-count).
+// Walk users of wide canonical IVs and collect all compares of the form
+// (ICMP_ULE, WideCanonicalIV, backedge-taken-count).
Done, thanks!
@@ -157,6 +161,8 @@ bool VPRecipeBase::mayHaveSideEffects() const {
    return mayWriteToMemory();
  case VPWidenLoadSC:
  case VPWidenStoreSC:
  case VPWidenEVLLoadSC:
  case VPWidenEVLStoreSC:
lex order.
Updated with new naming.
// TODO: Introduce explicit recipe for header-mask instead of searching
// for the header-mask pattern manually.
This TODO can be placed at the definition of collectAllHeaderMasks(), which would be removed as a whole.
Moved, thanks!
auto *N = new VPWidenEVLLoadRecipe(L, VPEVL, NewMask);
N->insertBefore(L);
L->replaceAllUsesWith(N);
L->eraseFromParent();
Suggested change (delete the line):
-L->eraseFromParent();

nit: could the following recursivelyDeleteDeadRecipes() take care of erasing Load recipes from their parents? The former is needed due to no subsequent VPlan dce. Collecting dead Stores is more challenging.
At the moment recursivelyDeleteDeadRecipes works upwards, recursing from uses to def, so recursivelyDeleteDeadRecipes(HeaderMask) won't clean up the load.
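To make the direction of that recursion concrete, here is a toy sketch over a made-up node type (not the real VPValue/VPRecipeBase classes): a def is erased only once it has no users left, and the walk then continues into its operands, so a dead user of the starting value, such as the replaced load above, is never reached.

#include <algorithm>
#include <vector>

// Toy def/use graph used only to illustrate the use->def direction of the
// deletion; it does not model real VPlan recipes.
struct Node {
  std::vector<Node *> Operands; // defs this node uses
  std::vector<Node *> Users;    // nodes using this def
  bool Erased = false;
};

// Erase Root if it is dead, then revisit its operands (upwards walk). Nodes
// that *use* Root are never visited, which is why the replaced load in the
// snippet above still needs an explicit erase.
static void recursivelyDeleteDead(Node *Root) {
  if (Root->Erased || !Root->Users.empty())
    return;
  Root->Erased = true;
  for (Node *Op : Root->Operands) {
    auto &U = Op->Users;
    U.erase(std::remove(U.begin(), U.end(), Root), U.end());
    recursivelyDeleteDead(Op);
  }
}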
@@ -358,6 +358,8 @@ class VPDef {
  VPWidenGEPSC,
  VPWidenLoadSC,
  VPWidenStoreSC,
  VPWidenEVLLoadSC,
  VPWidenEVLStoreSC,
lex order.
reordered with new naming, thanks!
; IF-EVL-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, ptr [[INDEX:%.*]], <vscale x 2 x i64> [[VEC_IND]]
; IF-EVL-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 2 x i64> @llvm.vp.gather.nxv2i64.nxv2p0(<vscale x 2 x ptr> align 8 [[TMP20]], <vscale x 2 x i1> [[TMP19]], i32 [[TMP18]])
; IF-EVL-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 2 x i64> @llvm.vp.gather.nxv2i64.nxv2p0(<vscale x 2 x ptr> align 8 [[TMP20]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP18]])
ok
Addressed comments, thanks!
struct VPWidenEVLLoadRecipe final : public VPWidenMemoryRecipe, public VPValue {
  VPWidenEVLLoadRecipe(VPWidenLoadRecipe *L, VPValue *EVL, VPValue *Mask)
      : VPWidenMemoryRecipe(
            VPDef::VPWidenEVLLoadSC, *cast<LoadInst>(&L->getIngredient()),
Ah no, removed, thanks!
struct VPWidenEVLStoreRecipe final : public VPWidenMemoryRecipe {
  VPWidenEVLStoreRecipe(VPWidenStoreRecipe *S, VPValue *EVL, VPValue *Mask)
      : VPWidenMemoryRecipe(VPDef::VPWidenEVLStoreSC,
                            *cast<StoreInst>(&S->getIngredient()),
Not needed, removed, thanks!
if (FoundWidenCanonicalIVUser != Plan.getCanonicalIV()->users().end()) {
  auto *WideCanonicalIV =
      cast<VPWidenCanonicalIVRecipe>(*FoundWidenCanonicalIVUser);
  assert(all_of(Plan.getCanonicalIV()->users(),
Done, thanks!
This LGTM, thanks for accommodating!
Adding last minor nits.
        Builder, DataTy, State.get(getAddr(), Part, !CreateGather),
        CreateGather, Mask, EVL, Alignment);
  } else if (CreateGather) {
  if (CreateGather) {
Thanks! The brackets could be removed.
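As a hedged sketch of the brace-free shape being suggested, with the helper and its parameters invented for illustration (the real execute() computes these values per unroll part and also handles reversed accesses):

#include "llvm/IR/Constants.h"
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Illustrative only: emit a widened load as gather / masked load / plain load.
static Value *emitWideLoad(IRBuilderBase &Builder, Type *DataTy, Value *Addr,
                           Align Alignment, Value *Mask, bool CreateGather) {
  if (CreateGather)
    return Builder.CreateMaskedGather(DataTy, Addr, Alignment, Mask,
                                      /*PassThru=*/nullptr,
                                      "wide.masked.gather");
  if (Mask)
    return Builder.CreateMaskedLoad(DataTy, Addr, Alignment, Mask,
                                    PoisonValue::get(DataTy),
                                    "wide.masked.load");
  return Builder.CreateAlignedLoad(DataTy, Addr, Alignment, "wide.load");
}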
        Builder, State.get(getAddr(), Part, !CreateScatter), StoredVal,
        CreateScatter, Mask, EVL, Alignment);
  } else if (CreateScatter) {
  if (CreateScatter) {
Thanks! The brackets can be removed, along with inlining else if.
struct VPWidenLoadEVLRecipe final : public VPWidenMemoryRecipe, public VPValue {
  VPWidenLoadEVLRecipe(VPWidenLoadRecipe *L, VPValue *EVL, VPValue *Mask)
      : VPWidenMemoryRecipe(
            VPDef::VPWidenLoadEVLSC, *cast<LoadInst>(&L->getIngredient()),
Suggested change:
-VPDef::VPWidenLoadEVLSC, *cast<LoadInst>(&L->getIngredient()),
+VPDef::VPWidenLoadEVLSC, L->getIngredient(),
Still a redundant cast?
removed, thanks!
replaceHeaderPredicateWith(Plan, *LaneMask);
for (VPValue *HeaderMask : collectAllHeaderMasks(Plan)) {
  HeaderMask->replaceAllUsesWith(LaneMask);
  recursivelyDeleteDeadRecipes(HeaderMask);
Suggested change (delete the line):
-recursivelyDeleteDeadRecipes(HeaderMask);

Wonder if this is redundant due to subsequent VPlan dce.
Ah yes, removed, thanks! (Originally thought this comment was for addExplicitVectorLength, where it wouldn't apply.)
assert(count_if(Plan.getCanonicalIV()->users(),
                [](VPUser *U) { return isa<VPWidenCanonicalIVRecipe>(U); }) <=
           1 &&
       "Must at most one VPWideCanonicalIVRecipe");
"Must at most one VPWideCanonicalIVRecipe"); | |
"Must be at most one VPWideCanonicalIVRecipe"); |
Updated to use "Must have at most", thanks!
Introduce new subclasses of VPWidenMemoryRecipe for VP (vector-predicated) loads and stores to address multiple TODOs from llvm#76172.

Note that the introduction of the new recipes also improves code-gen for VP gather/scatters by removing the redundant header mask. With the new approach, it is not sufficient to look at users of the widened canonical IV to find all uses of the header mask.

In some cases, a widened IV is used instead of separately widening the canonical IV. To handle that, first collect all VPValues representing header masks (by looking at users of both the canonical IV and widened inductions that are canonical) and then check all users (recursively) of those header masks.

Depends on llvm#87411.

PR: llvm#87816
Just shared a PR to introduce an abstract header mask early on: #89603
…m#90184) Summary: Following from llvm#87816, add VPReductionEVLRecipe to describe vector predication reduction. Address one of TODOs from llvm#76172. Differential Revision: https://phabricator.intern.facebook.com/D59822470
) Summary: Following from #87816, add VPReductionEVLRecipe to describe vector predication reduction. Address one of TODOs from #76172. Differential Revision: https://phabricator.intern.facebook.com/D60251485
Introduce new subclasses of VPWidenMemoryRecipe for VP
(vector-predicated) loads and stores to address multiple TODOs from
#76172
Note that the introduction of the new recipes also improves code-gen for
VP gather/scatters by removing the redundant header mask. With the new
approach, it is not sufficient to look at users of the widened canonical
IV to find all uses of the header mask.
In some cases, a widened IV is used instead of separately widening the
canonical IV. To handle those cases, iterate over all recipes in the
vector loop region to make sure all widened memory recipes are
processed.
Depends on #87411.
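A rough sketch of the traversal described above, assuming the VPlan CFG helpers (vp_depth_first_shallow, VPBlockUtils::blocksOnly) behave as in the surrounding VPlan code; processMemoryRecipe is a hypothetical hook standing in for the EVL rewrite of a single recipe:

#include "VPlan.h"    // private headers that live next to the VPlan sources
#include "VPlanCFG.h"
#include "llvm/ADT/STLExtras.h"
using namespace llvm;

// Hypothetical hook: rewrite one widened memory recipe to its EVL form.
static void processMemoryRecipe(VPWidenMemoryRecipe *MemR);

// Visit every recipe in the vector loop region so widened memory recipes are
// found no matter which IV the header mask was built from.
static void forEachWidenedMemoryRecipe(VPlan &Plan) {
  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
           vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry())))
    for (VPRecipeBase &R : make_early_inc_range(*VPBB))
      if (auto *MemR = dyn_cast<VPWidenMemoryRecipe>(&R))
        processMemoryRecipe(MemR);
}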