
[VPlan] Split VPWidenMemoryInstructionRecipe (NFCI). #87411

Merged: 14 commits, Apr 17, 2024

Conversation

@fhahn (Contributor) commented Apr 2, 2024

This patch introduces a new VPWidenMemoryRecipe abstract base class and distinct sub-classes to model loads and stores.

This is a first step in an effort to simplify and modularize code generation for widened loads and stores, and to enable adding further, more specialized memory recipes.

Note that this adjusts the order of the operands for VPWidenStoreRecipe
to match the order of operands of stores in IR and other recipes (like
VPReplicateRecipe).
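
For orientation, here is a minimal sketch of the resulting class layout, condensed from the diff below (constructors shown, all other members elided):

  // Condensed sketch of the new hierarchy; see the diff below for the
  // authoritative definitions.
  class VPWidenMemoryRecipe : public VPRecipeBase {
  protected:
    Instruction &Ingredient;   // the scalar load/store being widened
    bool Consecutive, Reverse; // address pattern of the widened access
    // ...
  };

  // Loads define a value, so the load recipe is also a VPValue.
  struct VPWidenLoadRecipe final : VPWidenMemoryRecipe, VPValue {
    VPWidenLoadRecipe(LoadInst &Load, VPValue *Addr, VPValue *Mask,
                      bool Consecutive, bool Reverse, DebugLoc DL);
  };

  // Stores take {StoredVal, Addr} to match the operand order of stores in IR.
  struct VPWidenStoreRecipe final : VPWidenMemoryRecipe {
    VPWidenStoreRecipe(StoreInst &Store, VPValue *StoredVal, VPValue *Addr,
                       VPValue *Mask, bool Consecutive, bool Reverse,
                       DebugLoc DL);
  };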
VPValue *getAddr() const {
return getOperand(0); // Address is the 1st, mandatory operand.
}
virtual VPValue *getAddr() const = 0;
Member: Do you really need to make it virtual? I think you can just remove it from the base class.

Contributor Author: There are callers that need to get the address of any WidenMemoryRecipe (e.g. VPlanTransforms::dropPoisonGeneratingRecipes), kept virtual for now.
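
For context, a hedged sketch of that kind of generic caller (the surrounding loop and variable names are illustrative, not the actual implementation of dropPoisonGeneratingRecipes):

  // Sketch: a transform only needs the address of any widened memory
  // recipe, regardless of whether it is a load or a store.
  if (auto *MemR = dyn_cast<VPWidenMemoryRecipe>(&R)) {
    VPValue *Addr = MemR->getAddr(); // works for loads and stores alike
    // ... inspect or rewrite the address computation ...
  }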

Member: Better to use static isa/dyn_cast sequences where possible instead of virtual functions.

Contributor Author: Fair enough, replaced with an implementation using switch and recipe ID.

Instruction &getIngredient() const { return Ingredient; }
};

struct VPWidenLoadRecipe : public VPWidenMemoryRecipe, public VPValue {
Member: final?

Contributor Author: Done, thanks!

};

struct VPWidenStoreRecipe : public VPWidenMemoryRecipe {
Member: final?

Contributor Author: Done, thanks!

@llvmbot (Member) commented Apr 5, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

This patch introduces a new VPWidenMemoryRecipe abstract base class and distinct sub-classes to model loads and stores.

This is a first step in an effort to simplify and modularize code generation for widened loads and stores, and to enable adding further, more specialized memory recipes.

Note that this adjusts the order of the operands for VPWidenStoreRecipe to match the order of operands of stores in IR and other recipes (like VPReplicateRecipe).


Patch is 56.35 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/87411.diff

22 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+83-73)
  • (modified) llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h (+3-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+102-48)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+4-5)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.h (+2-2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+22-20)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+11-12)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+2-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/synthesize-mask-for-call.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/vplan-vp-intrinsics.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains-vplan.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-dot-printing.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-iv-transforms.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-printing-before-execute.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-printing.ll (+9-9)
  • (modified) llvm/unittests/Transforms/Vectorize/VPlanHCFGTest.cpp (+2-2)
  • (modified) llvm/unittests/Transforms/Vectorize/VPlanTest.cpp (+4-5)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 49bacb5ae6cc4e..d6a3365743355f 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8095,7 +8095,7 @@ void VPRecipeBuilder::createBlockInMask(BasicBlock *BB) {
   BlockMaskCache[BB] = BlockMask;
 }
 
-VPWidenMemoryInstructionRecipe *
+VPWidenMemoryRecipe *
 VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
                                   VFRange &Range) {
   assert((isa<LoadInst>(I) || isa<StoreInst>(I)) &&
@@ -8140,12 +8140,12 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
     Ptr = VectorPtr;
   }
   if (LoadInst *Load = dyn_cast<LoadInst>(I))
-    return new VPWidenMemoryInstructionRecipe(*Load, Ptr, Mask, Consecutive,
-                                              Reverse, I->getDebugLoc());
+    return new VPWidenLoadRecipe(*Load, Ptr, Mask, Consecutive, Reverse,
+                                 I->getDebugLoc());
 
   StoreInst *Store = cast<StoreInst>(I);
-  return new VPWidenMemoryInstructionRecipe(
-      *Store, Ptr, Operands[0], Mask, Consecutive, Reverse, I->getDebugLoc());
+  return new VPWidenStoreRecipe(*Store, Operands[0], Ptr, Mask, Consecutive,
+                                Reverse, I->getDebugLoc());
 }
 
 /// Creates a VPWidenIntOrFpInductionRecpipe for \p Phi. If needed, it will also
@@ -8780,13 +8780,12 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
   // for this VPlan, replace the Recipes widening its memory instructions with a
   // single VPInterleaveRecipe at its insertion point.
   for (const auto *IG : InterleaveGroups) {
-    auto *Recipe = cast<VPWidenMemoryInstructionRecipe>(
-        RecipeBuilder.getRecipe(IG->getInsertPos()));
+    auto *Recipe =
+        cast<VPWidenMemoryRecipe>(RecipeBuilder.getRecipe(IG->getInsertPos()));
     SmallVector<VPValue *, 4> StoredValues;
     for (unsigned i = 0; i < IG->getFactor(); ++i)
       if (auto *SI = dyn_cast_or_null<StoreInst>(IG->getMember(i))) {
-        auto *StoreR =
-            cast<VPWidenMemoryInstructionRecipe>(RecipeBuilder.getRecipe(SI));
+        auto *StoreR = cast<VPWidenStoreRecipe>(RecipeBuilder.getRecipe(SI));
         StoredValues.push_back(StoreR->getStoredValue());
       }
 
@@ -9464,22 +9463,15 @@ static Instruction *lowerLoadUsingVectorIntrinsics(IRBuilderBase &Builder,
   return Call;
 }
 
-void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
-  VPValue *StoredValue = isStore() ? getStoredValue() : nullptr;
-
+void VPWidenLoadRecipe::execute(VPTransformState &State) {
   // Attempt to issue a wide load.
-  LoadInst *LI = dyn_cast<LoadInst>(&Ingredient);
-  StoreInst *SI = dyn_cast<StoreInst>(&Ingredient);
-
-  assert((LI || SI) && "Invalid Load/Store instruction");
-  assert((!SI || StoredValue) && "No stored value provided for widened store");
-  assert((!LI || !StoredValue) && "Stored value provided for widened load");
+  LoadInst *LI = cast<LoadInst>(&Ingredient);
 
   Type *ScalarDataTy = getLoadStoreType(&Ingredient);
 
   auto *DataTy = VectorType::get(ScalarDataTy, State.VF);
   const Align Alignment = getLoadStoreAlignment(&Ingredient);
-  bool CreateGatherScatter = !isConsecutive();
+  bool CreateGather = !isConsecutive();
 
   auto &Builder = State.Builder;
   InnerLoopVectorizer::VectorParts BlockInMaskParts(State.UF);
@@ -9495,56 +9487,6 @@ void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
     }
   }
 
-  // Handle Stores:
-  if (SI) {
-    State.setDebugLocFrom(getDebugLoc());
-
-    for (unsigned Part = 0; Part < State.UF; ++Part) {
-      Instruction *NewSI = nullptr;
-      Value *StoredVal = State.get(StoredValue, Part);
-      // TODO: split this into several classes for better design.
-      if (State.EVL) {
-        assert(State.UF == 1 && "Expected only UF == 1 when vectorizing with "
-                                "explicit vector length.");
-        assert(cast<VPInstruction>(State.EVL)->getOpcode() ==
-                   VPInstruction::ExplicitVectorLength &&
-               "EVL must be VPInstruction::ExplicitVectorLength.");
-        Value *EVL = State.get(State.EVL, VPIteration(0, 0));
-        // If EVL is not nullptr, then EVL must be a valid value set during plan
-        // creation, possibly default value = whole vector register length. EVL
-        // is created only if TTI prefers predicated vectorization, thus if EVL
-        // is not nullptr it also implies preference for predicated
-        // vectorization.
-        // FIXME: Support reverse store after vp_reverse is added.
-        Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;
-        NewSI = lowerStoreUsingVectorIntrinsics(
-            Builder, State.get(getAddr(), Part, !CreateGatherScatter),
-            StoredVal, CreateGatherScatter, MaskPart, EVL, Alignment);
-      } else if (CreateGatherScatter) {
-        Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;
-        Value *VectorGep = State.get(getAddr(), Part);
-        NewSI = Builder.CreateMaskedScatter(StoredVal, VectorGep, Alignment,
-                                            MaskPart);
-      } else {
-        if (isReverse()) {
-          // If we store to reverse consecutive memory locations, then we need
-          // to reverse the order of elements in the stored value.
-          StoredVal = Builder.CreateVectorReverse(StoredVal, "reverse");
-          // We don't want to update the value in the map as it might be used in
-          // another expression. So don't call resetVectorValue(StoredVal).
-        }
-        auto *VecPtr = State.get(getAddr(), Part, /*IsScalar*/ true);
-        if (isMaskRequired)
-          NewSI = Builder.CreateMaskedStore(StoredVal, VecPtr, Alignment,
-                                            BlockInMaskParts[Part]);
-        else
-          NewSI = Builder.CreateAlignedStore(StoredVal, VecPtr, Alignment);
-      }
-      State.addMetadata(NewSI, SI);
-    }
-    return;
-  }
-
   // Handle loads.
   assert(LI && "Must have a load instruction");
   State.setDebugLocFrom(getDebugLoc());
@@ -9566,9 +9508,9 @@ void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
       // FIXME: Support reverse loading after vp_reverse is added.
       Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;
       NewLI = lowerLoadUsingVectorIntrinsics(
-          Builder, DataTy, State.get(getAddr(), Part, !CreateGatherScatter),
-          CreateGatherScatter, MaskPart, EVL, Alignment);
-    } else if (CreateGatherScatter) {
+          Builder, DataTy, State.get(getAddr(), Part, !CreateGather),
+          CreateGather, MaskPart, EVL, Alignment);
+    } else if (CreateGather) {
       Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;
       Value *VectorGep = State.get(getAddr(), Part);
       NewLI = Builder.CreateMaskedGather(DataTy, VectorGep, Alignment, MaskPart,
@@ -9590,7 +9532,75 @@ void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
         NewLI = Builder.CreateVectorReverse(NewLI, "reverse");
     }
 
-    State.set(getVPSingleValue(), NewLI, Part);
+    State.set(this, NewLI, Part);
+  }
+}
+
+void VPWidenStoreRecipe::execute(VPTransformState &State) {
+  VPValue *StoredValue = getStoredValue();
+
+  const Align Alignment = getLoadStoreAlignment(&Ingredient);
+  bool CreateScatter = !isConsecutive();
+
+  StoreInst *SI = cast<StoreInst>(&Ingredient);
+  auto &Builder = State.Builder;
+  InnerLoopVectorizer::VectorParts BlockInMaskParts(State.UF);
+  bool isMaskRequired = getMask();
+  if (isMaskRequired) {
+    // Mask reversal is only needed for non-all-one (null) masks, as reverse of
+    // a null all-one mask is a null mask.
+    for (unsigned Part = 0; Part < State.UF; ++Part) {
+      Value *Mask = State.get(getMask(), Part);
+      if (isReverse())
+        Mask = Builder.CreateVectorReverse(Mask, "reverse");
+      BlockInMaskParts[Part] = Mask;
+    }
+  }
+
+  State.setDebugLocFrom(getDebugLoc());
+
+  for (unsigned Part = 0; Part < State.UF; ++Part) {
+    Instruction *NewSI = nullptr;
+    Value *StoredVal = State.get(StoredValue, Part);
+    // TODO: split this into several classes for better design.
+    if (State.EVL) {
+      assert(State.UF == 1 && "Expected only UF == 1 when vectorizing with "
+                              "explicit vector length.");
+      assert(cast<VPInstruction>(State.EVL)->getOpcode() ==
+                 VPInstruction::ExplicitVectorLength &&
+             "EVL must be VPInstruction::ExplicitVectorLength.");
+      Value *EVL = State.get(State.EVL, VPIteration(0, 0));
+      // If EVL is not nullptr, then EVL must be a valid value set during plan
+      // creation, possibly default value = whole vector register length. EVL
+      // is created only if TTI prefers predicated vectorization, thus if EVL
+      // is not nullptr it also implies preference for predicated
+      // vectorization.
+      // FIXME: Support reverse store after vp_reverse is added.
+      Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;
+      NewSI = lowerStoreUsingVectorIntrinsics(
+          Builder, State.get(getAddr(), Part, !CreateScatter), StoredVal,
+          CreateScatter, MaskPart, EVL, Alignment);
+    } else if (CreateScatter) {
+      Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;
+      Value *VectorGep = State.get(getAddr(), Part);
+      NewSI = Builder.CreateMaskedScatter(StoredVal, VectorGep, Alignment,
+                                          MaskPart);
+    } else {
+      if (isReverse()) {
+        // If we store to reverse consecutive memory locations, then we need
+        // to reverse the order of elements in the stored value.
+        StoredVal = Builder.CreateVectorReverse(StoredVal, "reverse");
+        // We don't want to update the value in the map as it might be used in
+        // another expression. So don't call resetVectorValue(StoredVal).
+      }
+      auto *VecPtr = State.get(getAddr(), Part, /*IsScalar*/ true);
+      if (isMaskRequired)
+        NewSI = Builder.CreateMaskedStore(StoredVal, VecPtr, Alignment,
+                                          BlockInMaskParts[Part]);
+      else
+        NewSI = Builder.CreateAlignedStore(StoredVal, VecPtr, Alignment);
+    }
+    State.addMetadata(NewSI, SI);
   }
 }
 
diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
index 605b47fa0a46b8..b4c7ab02f928f0 100644
--- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
@@ -69,9 +69,9 @@ class VPRecipeBuilder {
   /// Check if the load or store instruction \p I should widened for \p
   /// Range.Start and potentially masked. Such instructions are handled by a
   /// recipe that takes an additional VPInstruction for the mask.
-  VPWidenMemoryInstructionRecipe *tryToWidenMemory(Instruction *I,
-                                                   ArrayRef<VPValue *> Operands,
-                                                   VFRange &Range);
+  VPWidenMemoryRecipe *tryToWidenMemory(Instruction *I,
+                                        ArrayRef<VPValue *> Operands,
+                                        VFRange &Range);
 
   /// Check if an induction recipe should be constructed for \p Phi. If so build
   /// and return it. If not, return null.
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 77577b516ae274..3a0800bbb3d45c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -875,7 +875,8 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
       return true;
     case VPRecipeBase::VPInterleaveSC:
     case VPRecipeBase::VPBranchOnMaskSC:
-    case VPRecipeBase::VPWidenMemoryInstructionSC:
+    case VPRecipeBase::VPWidenLoadSC:
+    case VPRecipeBase::VPWidenStoreSC:
       // TODO: Widened stores don't define a value, but widened loads do. Split
       // the recipes to be able to make widened loads VPSingleDefRecipes.
       return false;
@@ -2279,7 +2280,8 @@ class VPPredInstPHIRecipe : public VPSingleDefRecipe {
 /// - For store: Address, stored value, optional mask
 /// TODO: We currently execute only per-part unless a specific instance is
 /// provided.
-class VPWidenMemoryInstructionRecipe : public VPRecipeBase {
+class VPWidenMemoryRecipe : public VPRecipeBase {
+protected:
   Instruction &Ingredient;
 
   // Whether the loaded-from / stored-to addresses are consecutive.
@@ -2294,47 +2296,40 @@ class VPWidenMemoryInstructionRecipe : public VPRecipeBase {
     addOperand(Mask);
   }
 
-  bool isMasked() const {
-    return isStore() ? getNumOperands() == 3 : getNumOperands() == 2;
-  }
-
 public:
-  VPWidenMemoryInstructionRecipe(LoadInst &Load, VPValue *Addr, VPValue *Mask,
-                                 bool Consecutive, bool Reverse, DebugLoc DL)
-      : VPRecipeBase(VPDef::VPWidenMemoryInstructionSC, {Addr}, DL),
-        Ingredient(Load), Consecutive(Consecutive), Reverse(Reverse) {
+  VPWidenMemoryRecipe(const char unsigned SC, Instruction &I,
+                      std::initializer_list<VPValue *> Operands,
+                      bool Consecutive, bool Reverse, DebugLoc DL)
+      : VPRecipeBase(SC, Operands, DL), Ingredient(I), Consecutive(Consecutive),
+        Reverse(Reverse) {
     assert((Consecutive || !Reverse) && "Reverse implies consecutive");
-    new VPValue(this, &Load);
-    setMask(Mask);
   }
 
-  VPWidenMemoryInstructionRecipe(StoreInst &Store, VPValue *Addr,
-                                 VPValue *StoredValue, VPValue *Mask,
-                                 bool Consecutive, bool Reverse, DebugLoc DL)
-      : VPRecipeBase(VPDef::VPWidenMemoryInstructionSC, {Addr, StoredValue},
-                     DL),
-        Ingredient(Store), Consecutive(Consecutive), Reverse(Reverse) {
-    assert((Consecutive || !Reverse) && "Reverse implies consecutive");
-    setMask(Mask);
-  }
+  VPRecipeBase *clone() override = 0;
 
-  VPRecipeBase *clone() override {
-    if (isStore())
-      return new VPWidenMemoryInstructionRecipe(
-          cast<StoreInst>(Ingredient), getAddr(), getStoredValue(), getMask(),
-          Consecutive, Reverse, getDebugLoc());
+  static inline bool classof(const VPRecipeBase *R) {
+    return R->getVPDefID() == VPRecipeBase::VPWidenStoreSC ||
+           R->getVPDefID() == VPRecipeBase::VPWidenLoadSC;
+  }
 
-    return new VPWidenMemoryInstructionRecipe(cast<LoadInst>(Ingredient),
-                                              getAddr(), getMask(), Consecutive,
-                                              Reverse, getDebugLoc());
+  static inline bool classof(const VPUser *U) {
+    auto *R = dyn_cast<VPRecipeBase>(U);
+    return R && classof(R);
   }
 
-  VP_CLASSOF_IMPL(VPDef::VPWidenMemoryInstructionSC)
+  /// Returns true if the recipe is masked.
+  virtual bool isMasked() const = 0;
 
   /// Return the address accessed by this recipe.
-  VPValue *getAddr() const {
-    return getOperand(0); // Address is the 1st, mandatory operand.
-  }
+  virtual VPValue *getAddr() const = 0;
+
+
+  // Return whether the loaded-from / stored-to addresses are consecutive.
+  bool isConsecutive() const { return Consecutive; }
+
+  // Return whether the consecutive loaded/stored addresses are in reverse
+  // order.
+  bool isReverse() const { return Reverse; }
 
   /// Return the mask used by this recipe. Note that a full mask is represented
   /// by a nullptr.
@@ -2343,21 +2338,37 @@ class VPWidenMemoryInstructionRecipe : public VPRecipeBase {
     return isMasked() ? getOperand(getNumOperands() - 1) : nullptr;
   }
 
-  /// Returns true if this recipe is a store.
-  bool isStore() const { return isa<StoreInst>(Ingredient); }
+  /// Generate the wide load/store.
+  void execute(VPTransformState &State) override = 0;
+
+  Instruction &getIngredient() const { return Ingredient; }
+};
 
-  /// Return the address accessed by this recipe.
-  VPValue *getStoredValue() const {
-    assert(isStore() && "Stored value only available for store instructions");
-    return getOperand(1); // Stored value is the 2nd, mandatory operand.
+struct VPWidenLoadRecipe final : public VPWidenMemoryRecipe, public VPValue {
+  VPWidenLoadRecipe(LoadInst &Load, VPValue *Addr, VPValue *Mask,
+                    bool Consecutive, bool Reverse, DebugLoc DL)
+      : VPWidenMemoryRecipe(VPDef::VPWidenLoadSC, Load, {Addr}, Consecutive,
+                            Reverse, DL),
+        VPValue(this, &Load) {
+    assert((Consecutive || !Reverse) && "Reverse implies consecutive");
+    setMask(Mask);
   }
 
-  // Return whether the loaded-from / stored-to addresses are consecutive.
-  bool isConsecutive() const { return Consecutive; }
+  VPRecipeBase *clone() override {
+    return new VPWidenLoadRecipe(cast<LoadInst>(Ingredient), getAddr(),
+                                 getMask(), Consecutive, Reverse,
+                                 getDebugLoc());
+  }
 
-  // Return whether the consecutive loaded/stored addresses are in reverse
-  // order.
-  bool isReverse() const { return Reverse; }
+  VP_CLASSOF_IMPL(VPDef::VPWidenLoadSC);
+
+  /// Returns true if the recipe is masked.
+  bool isMasked() const override { return getNumOperands() == 2; }
+
+  /// Return the address accessed by this recipe.
+  VPValue *getAddr() const override {
+    return getOperand(0); // Address is the 1st, mandatory operand.
+  }
 
   /// Generate the wide load/store.
   void execute(VPTransformState &State) override;
@@ -2376,13 +2387,56 @@ class VPWidenMemoryInstructionRecipe : public VPRecipeBase {
     // Widened, consecutive memory operations only demand the first lane of
     // their address, unless the same operand is also stored. That latter can
     // happen with opaque pointers.
-    return Op == getAddr() && isConsecutive() &&
-           (!isStore() || Op != getStoredValue());
+    return Op == getAddr() && isConsecutive();
   }
-
-  Instruction &getIngredient() const { return Ingredient; }
 };
 
+struct VPWidenStoreRecipe final : public VPWidenMemoryRecipe {
+  VPWidenStoreRecipe(StoreInst &Store, VPValue *StoredVal, VPValue *Addr,
+                     VPValue *Mask, bool Consecutive, bool Reverse, DebugLoc DL)
+      : VPWidenMemoryRecipe(VPDef::VPWidenStoreSC, Store, {StoredVal, Addr},
+                            Consecutive, Reverse, DL) {
+    assert((Consecutive || !Reverse) && "Reverse implies consecutive");
+    setMask(Mask);
+  }
+
+  VPRecipeBase *clone() override {
+    return new VPWidenStoreRecipe(cast<StoreInst>(Ingredient), getStoredValue(),
+                                  getAddr(), getMask(), Consecutive, Reverse,
+                                  getDebugLoc());
+  }
+
+  VP_CLASSOF_IMPL(VPDef::VPWidenStoreSC);
+
+  /// Returns true if the recipe is masked.
+  bool isMasked() const override { return getNumOperands() == 3; }
+
+  /// Return the address accessed by this recipe.
+  VPValue *getAddr() const override { return getOperand(1); }
+
+  /// Return the address accessed by this recipe.
+  VPValue *getStoredValue() const { return getOperand(0); }
+
+  /// Generate the wide load/store.
+  void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+
+  /// Returns true if the recipe only uses the first lane of operand \p Op.
+  bool onlyFirstLaneUsed(const VPValue *Op) const override {
+    assert(is_contained(operands(), Op) &&
+           "Op must be an operand of the recipe");
+
+    // Widened, consecutive memory operations only demand the first lane of
+    // their address, unless the same operand is also stored. That latter can
+    // happen with opaque pointers.
+    return Op == getAddr() && isConsecutive() && Op != getStoredValue();
+  }
+};
 /// Recipe to expand a SCEV expression.
 class VPExpandSCEVRecipe : public VPSingleDefRecipe {
   const SCEV *Expr;
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index c8ae2ee5a30fe5..130fb04f586e75 100644
--- a/llvm/lib/T...
[truncated]

@llvmbot (Member) commented Apr 5, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Florian Hahn (fhahn)

Contributor Author @fhahn left a comment: Updated to latest main, addressed comments, resolved conflicts after #76172 landed.


github-actions bot commented Apr 5, 2024: ✅ With the latest revision this PR passed the C/C++ code formatter.

assert((LI || SI) && "Invalid Load/Store instruction");
assert((!SI || StoredValue) && "No stored value provided for widened store");
assert((!LI || !StoredValue) && "Stored value provided for widened load");
LoadInst *LI = cast<LoadInst>(&Ingredient);
Member suggested change:
-  LoadInst *LI = cast<LoadInst>(&Ingredient);
+  auto *LI = cast<LoadInst>(&Ingredient);

Contributor Author: Done, thanks!


Type *ScalarDataTy = getLoadStoreType(&Ingredient);

auto *DataTy = VectorType::get(ScalarDataTy, State.VF);
const Align Alignment = getLoadStoreAlignment(&Ingredient);
bool CreateGatherScatter = !isConsecutive();
bool CreateGather = !isConsecutive();
Member suggested change:
-  bool CreateGather = !isConsecutive();
+  bool IsConsecutive = isConsecutive();

Contributor Author: Kept as is for now, as CreateGather seems slightly more descriptive w.r.t. how it is used.

Builder, DataTy, State.get(getAddr(), Part, !CreateGatherScatter),
CreateGatherScatter, MaskPart, EVL, Alignment);
} else if (CreateGatherScatter) {
Builder, DataTy, State.get(getAddr(), Part, !CreateGather),
Member suggested change:
-  Builder, DataTy, State.get(getAddr(), Part, !CreateGather),
+  Builder, DataTy, State.get(getAddr(), Part, IsConsecutive),

Contributor Author: Kept as is for now, as CreateGather seems slightly more descriptive w.r.t. how it is used.

} else if (CreateGatherScatter) {
Builder, DataTy, State.get(getAddr(), Part, !CreateGather),
CreateGather, MaskPart, EVL, Alignment);
} else if (CreateGather) {
Member suggested change:
-  } else if (CreateGather) {
+  } else if (!IsConsecutive) {

Contributor Author: Kept as is for now, as CreateGather seems slightly more descriptive w.r.t. how it is used.

const Align Alignment = getLoadStoreAlignment(&Ingredient);
bool CreateScatter = !isConsecutive();

StoreInst *SI = cast<StoreInst>(&Ingredient);
Member suggested change:
-  StoreInst *SI = cast<StoreInst>(&Ingredient);
+  auto *SI = cast<StoreInst>(&Ingredient);

Contributor Author: Done, thanks!

// FIXME: Support reverse store after vp_reverse is added.
Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;
NewSI = lowerStoreUsingVectorIntrinsics(
Builder, State.get(getAddr(), Part, !CreateScatter), StoredVal,
Member suggested change:
-  Builder, State.get(getAddr(), Part, !CreateScatter), StoredVal,
+  Builder, State.get(getAddr(), Part, IsConsecutive), StoredVal,

Contributor Author: Kept as is for now, as CreateScatter seems slightly more descriptive w.r.t. how it is used.

NewSI = lowerStoreUsingVectorIntrinsics(
Builder, State.get(getAddr(), Part, !CreateScatter), StoredVal,
CreateScatter, MaskPart, EVL, Alignment);
} else if (CreateScatter) {
Member suggested change:
-  } else if (CreateScatter) {
+  } else if (!IsConsecutive) {

Contributor Author: Kept as is for now, as CreateScatter seems slightly more descriptive w.r.t. how it is used.

Comment on lines 2320 to 2322
return getNumOperands() == 2;
case VPDef::VPWidenStoreSC:
return getNumOperands() == 3;
Member suggested change:
-    return getNumOperands() == 2;
-  case VPDef::VPWidenStoreSC:
-    return getNumOperands() == 3;
+    cast<VPWidenLoadRecipe>(this)->isMasked();
+  case VPDef::VPWidenStoreSC:
+    cast<VPWidenStoreRecipe>(this)->isMasked();

Contributor Author: Is it worth duplicating isMasked in the subclasses if we dispatch manually here? Having the checks here directly seems slightly more compact. Same for getAddr.

Member: Is it needed at all to keep it in the base class? Maybe just use the recipe classes explicitly rather than relying on the base class? It exposes implementation details in the base class, which is not very good.

Contributor Author: Updated to use IsMasked to track if it is masked, keeping things simpler for the initial version, thanks!
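
A minimal sketch of that IsMasked-based version (assuming the flag is set when a mask operand is appended; details may differ from the final commit):

  // Sketch: track maskedness with a flag on the base class instead of
  // re-deriving it from the sub-class and its operand count.
  bool IsMasked = false;

  void setMask(VPValue *Mask) {
    assert(!IsMasked && "cannot re-set mask");
    if (!Mask)
      return;
    addOperand(Mask);
    IsMasked = true;
  }

  /// Returns true if the recipe is masked.
  bool isMasked() const { return IsMasked; }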

Comment on lines 2331 to 2334
case VPDef::VPWidenLoadSC:
return getOperand(0);
case VPDef::VPWidenStoreSC:
return getOperand(1);
Member suggested change:
-  case VPDef::VPWidenLoadSC:
-    return getOperand(0);
-  case VPDef::VPWidenStoreSC:
-    return getOperand(1);
+  case VPDef::VPWidenLoadSC:
+    return cast<VPWidenLoadRecipe>(this)->getAddr();
+  case VPDef::VPWidenStoreSC:
+    return cast<VPWidenStoreRecipe>(this)->getAddr();

Contributor Author: See the comment above for isMasked.

/// Returns true if this recipe is a store.
bool isStore() const { return isa<StoreInst>(Ingredient); }
/// Generate the wide load/store.
void execute(VPTransformState &State) override = 0;
Member: Do you really need to make it pure virtual, or is it enough to have an execute function in each implementation and just make this one llvm_unreachable?

Contributor Author

Replaced with llvm_unreachable, thanks!
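
A sketch of the resulting non-pure base implementation (the exact message string is illustrative):

  void VPWidenMemoryRecipe::execute(VPTransformState &State) {
    // Only the load/store subclasses are instantiated and executed.
    llvm_unreachable("VPWidenMemoryRecipe should not be instantiated");
  }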

fhahn added a commit to fhahn/llvm-project that referenced this pull request Apr 5, 2024
Contributor Author

@fhahn fhahn left a comment

Addressed latest comments, thanks!

assert((LI || SI) && "Invalid Load/Store instruction");
assert((!SI || StoredValue) && "No stored value provided for widened store");
assert((!LI || !StoredValue) && "Stored value provided for widened load");
LoadInst *LI = cast<LoadInst>(&Ingredient);
Contributor Author

Done, thanks!


Type *ScalarDataTy = getLoadStoreType(&Ingredient);

auto *DataTy = VectorType::get(ScalarDataTy, State.VF);
const Align Alignment = getLoadStoreAlignment(&Ingredient);
bool CreateGatherScatter = !isConsecutive();
bool CreateGather = !isConsecutive();
Contributor Author

Kept as is for now, as CreateGather seems slightly more descriptive w.r.t. how it is used

Builder, DataTy, State.get(getAddr(), Part, !CreateGatherScatter),
CreateGatherScatter, MaskPart, EVL, Alignment);
} else if (CreateGatherScatter) {
Builder, DataTy, State.get(getAddr(), Part, !CreateGather),
Contributor Author

Kept as is for now, as CreateGather seems slightly more descriptive w.r.t. how it is used

} else if (CreateGatherScatter) {
Builder, DataTy, State.get(getAddr(), Part, !CreateGather),
CreateGather, MaskPart, EVL, Alignment);
} else if (CreateGather) {
Contributor Author

Kept as is for now, as CreateGather seems slightly more descriptive w.r.t. how it is used

VPValue *StoredValue = getStoredValue();

const Align Alignment = getLoadStoreAlignment(&Ingredient);
bool CreateScatter = !isConsecutive();
Contributor Author

Kept as is for now, as CreateScatter seems slightly more descriptive w.r.t. how it is used

fhahn added a commit to fhahn/llvm-project that referenced this pull request Apr 5, 2024
Introduce new subclasses of VPWidenMemoryRecipe for VP
(vector-predicated) loads and stores to address multiple TODOs from
llvm#76172

Note that the introduction of the new recipes also improves code-gen for
VP gather/scatters by removing the redundant header mask. With the new
approach, it is not sufficient to look at users of the widened canonical
IV to find all uses of the header mask.

In some cases, a widened IV is used instead of separately widening the
canonical IV. To handle those cases, iterate over all recipes in the
vector loop region to make sure all widened memory recipes are
processed.

Depends on llvm#87411.
Comment on lines 2317 to 2322
bool isMasked() const {
switch (getVPDefID()) {
case VPDef::VPWidenLoadSC:
return getNumOperands() == 2;
case VPDef::VPWidenStoreSC:
return getNumOperands() == 3;
Member

I think it can be fixed this way:

template <typename T>
class Base {
...
  bool isMasked() const {
    return cast<T>(this)->isMasked();
  }
...
};
class Derived1 : public Base<Derived1> {
...
  bool isMasked() const { return ...; }
};
class Derived2 : public Base<Derived2> {
...
  bool isMasked() const { return ...; }
};

Collaborator

@ayalz ayalz left a comment

Good step forward, thanks for following up on this!

Comment on lines 9489 to 9490
// Handle loads.
assert(LI && "Must have a load instruction");
Collaborator

Suggested change
// Handle loads.
assert(LI && "Must have a load instruction");

Only loads are handled, and LI is asserted to be non-null by the non-dynamic cast.

Contributor Author

removed, thanks!


auto &Builder = State.Builder;
InnerLoopVectorizer::VectorParts BlockInMaskParts(State.UF);
bool isMaskRequired = getMask();
if (isMaskRequired) {
bool IsMaskRequired = getMask();
Collaborator

Now that loads and stores are handled separately, it makes sense for each to retrieve its mask while handling each part, instead of preparing BlockInMaskParts up front, and to do so once for all EVL/gather/consecutive cases. I.e.:

  for (unsigned Part = 0; Part < State.UF; ++Part) {
    Value *NewLI;
    Value *Mask = nullptr;
    if (VPValue *VPMask = getMask()) {
      Mask = State.get(VPMask, Part);
      if (isReverse())
        Mask = Builder.CreateVectorReverse(Mask, "reverse");
    }
    // TODO: split this into several classes for better design.
    if (State.EVL) {
    ...
  }

Contributor Author

Updated, thanks!

@@ -875,7 +875,8 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
return true;
case VPRecipeBase::VPInterleaveSC:
case VPRecipeBase::VPBranchOnMaskSC:
case VPRecipeBase::VPWidenMemoryInstructionSC:
case VPRecipeBase::VPWidenLoadSC:
case VPRecipeBase::VPWidenStoreSC:
Collaborator

Hmm, the TODO below suggests that loads should (also) be considered single-def. Should VPWidenLoadRecipe inherit from both VPWidenMemoryRecipe and VPSingleDefRecipe? (Deserves a separate patch, but worth thinking about when introducing the class hierarchy here.)

Contributor Author

Yes, unfortunately it will require some extra work, as at the moment both VPWidenMemoryRecipe and VPSingleDefRecipe inherit from VPRecipeBase so that they can manage operands.

Collaborator

Another alternative may be to also consider VPWidenStoreRecipe as a Single Def recipe, with a singleton "void" Def that has no uses. Akin to LLVM. I.e., VPSingle[OrNo]DefRecipe.

/// provided.
class VPWidenMemoryInstructionRecipe : public VPRecipeBase {
/// A common base class for widening memory operations. An optional mask can be
/// provided the last operand.
Collaborator

Suggested change
/// provided the last operand.
/// provided as the last operand.

Contributor Author

Done, thanks!

VPValue *Mask, bool Consecutive, bool Reverse, DebugLoc DL)
: VPWidenMemoryRecipe(VPDef::VPWidenStoreSC, Store, {StoredVal, Addr},
Consecutive, Reverse, DL) {
assert((Consecutive || !Reverse) && "Reverse implies consecutive");
Collaborator

nit: it suffices to assert that reverse implies consecutive in the WidenMemory base class, where these flags are held.

Contributor Author

Removed here, thanks!

: VPWidenMemoryRecipe(VPDef::VPWidenLoadSC, Load, {Addr}, Consecutive,
Reverse, DL),
VPValue(this, &Load) {
assert((Consecutive || !Reverse) && "Reverse implies consecutive");
Collaborator

nit: it suffices to assert that reverse implies consecutive once, in the WidenMemory base class, where these flags are held.

Contributor Author

Removed here, thanks!
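
With both subclass asserts gone, the invariant is checked once in the base-class constructor, roughly as below; the signature is inferred from the subclass constructor calls shown above, so treat it as a sketch:

  VPWidenMemoryRecipe(const unsigned char SC, Instruction &I,
                      std::initializer_list<VPValue *> Operands,
                      bool Consecutive, bool Reverse, DebugLoc DL)
      : VPRecipeBase(SC, Operands, DL), Ingredient(I),
        Consecutive(Consecutive), Reverse(Reverse) {
    // Checked once here, for loads and stores alike.
    assert((Consecutive || !Reverse) && "Reverse implies consecutive");
  }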

default:
llvm_unreachable("unhandled recipe");
}
}

/// Return the address accessed by this recipe.
VPValue *getAddr() const {
Collaborator

Note that this adjusts the order of the operands for VPWidenStoreRecipe to match the order of operands of stores in IR and other recipes (like VPReplicateRecipe).

Note that the current order, even if distinct from IR and other recipes, would help simplify this base recipe, responsible for elements common to stores/loads/scatters/gathers, by holding the address as the first operand (and mask as last) for all, supporting its simple retrieval:

  VPValue *getAddr() const {
    return getOperand(0); // Address is the 1st, mandatory operand.
  }

In any case, it may be good to swap the order in a follow-up patch.

Contributor Author

Updated to keep the address as first operand for now, to keep the patch simpler initially, thanks!
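
With the address kept as the first operand, the accessors stay trivial across loads and stores; this sketch just combines the snippets discussed in this thread:

  /// Return the address accessed by this recipe.
  VPValue *getAddr() const {
    return getOperand(0); // Address is the 1st, mandatory operand.
  }

  /// Return the value stored by this recipe (VPWidenStoreRecipe only).
  VPValue *getStoredValue() const { return getOperand(1); }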

}

VP_CLASSOF_IMPL(VPDef::VPWidenMemoryInstructionSC)
/// Returns true if the recipe is masked.
bool isMasked() const {
Collaborator

Another option is to have VPWidenMemoryRecipe maintain an IsMasked indicator instead of counting operands (the latter may be done by assert/validation).

Contributor Author

Updated for now, to keep initial version simpler, thanks!

Collaborator

@ayalz ayalz left a comment

LGTM, thanks! Please wait a day or so in case @alexey-bataev has further comments.
Added minor nits.
The commit message is worth updating: the last paragraph regarding operand reordering, and a slight typo in the first paragraph.


StoreInst *Store = cast<StoreInst>(I);
return new VPWidenMemoryInstructionRecipe(
*Store, Ptr, Operands[0], Mask, Consecutive, Reverse, I->getDebugLoc());
return new VPWidenStoreRecipe(*Store, Operands[0], Ptr, Mask, Consecutive,
Collaborator

nit: worth keeping the parameters in the same order as the operands? I.e., Ptr as the first operand, before the stored value.

Contributor Author

Updated, thanks!

}
return;
}

// Handle loads.
Collaborator

nit: remove - redundant?

Contributor Author

Removed, thanks!

Comment on lines 9406 to 9407
void VPWidenLoadRecipe::execute(VPTransformState &State) {
// Attempt to issue a wide load.
Collaborator

Suggested change
void VPWidenLoadRecipe::execute(VPTransformState &State) {
// Attempt to issue a wide load.
void VPWidenLoadRecipe::execute(VPTransformState &State) {

nit: redundant?

Contributor Author

removed, thanks

Mask = Builder.CreateVectorReverse(Mask, "reverse");
}

Value *StoredVal = State.get(StoredValue, Part);
Collaborator

Suggested change
Value *StoredVal = State.get(StoredValue, Part);
Value *StoredVal = State.get(StoredValue, Part);
if (isReverse()) {
// If we store to reverse consecutive memory locations, then we need
// to reverse the order of elements in the stored value.
StoredVal = Builder.CreateVectorReverse(StoredVal, "reverse");
// We don't want to update the value in the map as it might be used in
// another expression. So don't call resetVectorValue(StoredVal).
}

Contributor Author

Hoisted, thanks!

Comment on lines 9515 to 9521
if (isReverse()) {
// If we store to reverse consecutive memory locations, then we need
// to reverse the order of elements in the stored value.
StoredVal = Builder.CreateVectorReverse(StoredVal, "reverse");
// We don't want to update the value in the map as it might be used in
// another expression. So don't call resetVectorValue(StoredVal).
}
Collaborator

Suggested change
if (isReverse()) {
// If we store to reverse consecutive memory locations, then we need
// to reverse the order of elements in the stored value.
StoredVal = Builder.CreateVectorReverse(StoredVal, "reverse");
// We don't want to update the value in the map as it might be used in
// another expression. So don't call resetVectorValue(StoredVal).
}

nit: better to fix StoredVal above, when it is set. Can assert that reverse implies !State.EVL. The assert that reverse implies !CreateScatter (i.e., isConsecutive) is already there.

Contributor Author

Hoisted, thanks!
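
After the hoist, the reversal happens right where StoredVal is materialized, once per part; a sketch with the surrounding unroll loop omitted:

  Value *StoredVal = State.get(StoredValue, Part);
  if (isReverse()) {
    assert(!State.EVL && "reversing not implemented with EVL");
    // If we store to reverse consecutive memory locations, reverse the order
    // of elements in the stored value. Don't update the value in the map, as
    // it might be used in another expression.
    StoredVal = Builder.CreateVectorReverse(StoredVal, "reverse");
  }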

/// Return the value stored by this recipe.
VPValue *getStoredValue() const { return getOperand(1); }

/// Generate the wide load/store.
Collaborator

Suggested change
/// Generate the wide load/store.
/// Generate a wide store or scatter.

Contributor Author

Done, thanks!

Comment on lines 66 to 67
case VPWidenPHISC:
case VPWidenLoadSC:
Collaborator

Suggested change
case VPWidenPHISC:
case VPWidenLoadSC:
case VPWidenLoadSC:
case VPWidenPHISC:

nit: retain lex order.

Contributor Author

reordered, thanks!

Comment on lines 91 to 93
case VPScalarIVStepsSC:
case VPPredInstPHISC:
case VPWidenStoreSC:
Collaborator

Suggested change
case VPScalarIVStepsSC:
case VPPredInstPHISC:
case VPWidenStoreSC:
case VPPredInstPHISC:
case VPScalarIVStepsSC:
case VPWidenStoreSC:

nit: while we're here, can fix lex order.

Contributor Author

Done, thanks!

O << Indent << "WIDEN ";
printAsOperand(O, SlotTracker);
Collaborator

Can this call to printAsOperand() work w/o getVPSingleValue(), given that VPWidenLoadRecipe inherits from RecipeBase rather than SingleDefRecipe?

Contributor Author

VPWidenLoadRecipe inherits directly from VPValue.
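
That is, the load recipe derives from VPValue as well as from the memory-recipe base (its constructor above passes this to VPValue), so printAsOperand() is available on it directly. The shape, with the body elided:

  struct VPWidenLoadRecipe final : public VPWidenMemoryRecipe, public VPValue {
    ...
  };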

Comment on lines 1051 to 1053
if (isa<VPWidenLoadRecipe>(&R)) {
continue;
}
Collaborator

Suggested change
if (isa<VPWidenLoadRecipe>(&R)) {
continue;
}
if (isa<VPWidenLoadRecipe>(&R))
continue;

Contributor Author

Done, thanks!

@fhahn fhahn merged commit a9bafe9 into llvm:main Apr 17, 2024
4 checks passed
@fhahn fhahn deleted the vplan-split-widenmemoryinst branch April 17, 2024 10:38
fhahn added a commit to fhahn/llvm-project that referenced this pull request Apr 17, 2024
fhahn added a commit that referenced this pull request Apr 19, 2024
Introduce new subclasses of VPWidenMemoryRecipe for VP
(vector-predicated) loads and stores to address multiple TODOs from
#76172

Note that the introduction of the new recipes also improves code-gen for
VP gather/scatters by removing the redundant header mask. With the new
approach, it is not sufficient to look at users of the widened canonical
IV to find all uses of the header mask.

In some cases, a widened IV is used instead of separately widening the
canonical IV. To handle that, first collect all VPValues representing header
masks (by looking at users of both the canonical IV and widened inductions
that are canonical) and then checking all users (recursively) of those header
masks.

Depends on #87411.

PR: #87816
aniplcc pushed a commit to aniplcc/llvm-project that referenced this pull request Apr 21, 2024