lukel97 (Contributor) commented Jul 30, 2025

After #144666 we now support vectorizing loops with induction variables under EVL tail folding. The induction updates don't use VP intrinsics, to avoid VL toggles, and instead rely on RISCVVLOptimizer. However, RISCVVLOptimizer can't reason about cycles or recurrences today, which leaves us with a VL toggle to VLMAX:

# %bb.1:                                # %for.body.preheader
	li	a2, 0
	vsetvli	a3, zero, e32, m2, ta, ma
	vid.v	v8
.LBB0_2:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
	sub	a3, a1, a2
	sh2add	a4, a2, a0
	vsetvli	a3, a3, e32, m2, ta, ma
	vle32.v	v10, (a4)
	add	a2, a2, a3
	vadd.vv	v10, v10, v8
	vse32.v	v10, (a4)
	vsetvli	a4, zero, e32, m2, ta, ma
	vadd.vx	v8, v8, a3
	bne	a2, a1, .LBB0_2

This patch teaches RISCVVLOptimizer to reason about recurrences so we can remove the VLMAX toggle:

# %bb.1:                                # %for.body.preheader
	li	a2, 0
	vsetvli	a3, zero, e32, m2, ta, ma
	vid.v	v8
.LBB0_2:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
	sub	a3, a1, a2
	sh2add	a4, a2, a0
	vsetvli	a3, a3, e32, m2, ta, ma
	vle32.v	v10, (a4)
	add	a2, a2, a3
	vadd.vv	v10, v10, v8
	vse32.v	v10, (a4)
	vadd.vx	v8, v8, a3
	bne	a2, a1, .LBB0_2

With this we remove a significant number of VL toggles and vsetvli instructions across llvm-test-suite and SPEC CPU 2017 with tail folding enabled, since it affects every loop with an induction variable.

This builds upon the work in #124530, where we started computing the VL demanded by each instruction, and generalizes it into an optimistic sparse dataflow analysis:

  • We begin by optimistically assuming no VL is used by any instruction, and push instructions onto the worklist starting from the bottom of the function.
  • For each instruction on the worklist we apply the transfer function, which propagates the VL needed by that instruction upwards to the instructions it uses. If a used instruction's demanded VL changes, that instruction is added back to the worklist.
  • Eventually this converges to a fixpoint, once all uses have been processed and every demanded VL has been propagated throughout the entire use-def chain. Only then is the DemandedVL map accurate. (A sketch of this loop follows below.)
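
To make the shape of the iteration concrete, here's a minimal standalone sketch. It is a toy model, not the pass itself: demanded VLs are collapsed to unsigned with 0 as the optimistic bottom and UINT_MAX standing in for VLMAX, and Instr, IsRoot and fixpoint are illustrative names.

#include <algorithm>
#include <climits>
#include <deque>
#include <map>
#include <set>
#include <vector>

// Toy IR: each instruction records the instructions it reads from.
struct Instr {
  std::vector<Instr *> Uses; // defs this instruction reads
  bool IsRoot = false;       // e.g. a store, or a copy to a physical register
  unsigned VLOp = 1;         // the VL operand this instruction executes with
};

void fixpoint(std::vector<Instr *> &Program) {
  std::map<Instr *, unsigned> Demanded; // missing keys read as 0 (bottom)
  std::deque<Instr *> Worklist(Program.begin(), Program.end());
  std::set<Instr *> OnList(Program.begin(), Program.end());

  while (!Worklist.empty()) {
    Instr *MI = Worklist.front();
    Worklist.pop_front();
    OnList.erase(MI);
    if (MI->IsRoot)
      Demanded[MI] = UINT_MAX; // roots demand everything (VLMAX)

    // Transfer: push what MI demands up to each instruction it uses.
    for (Instr *Def : MI->Uses) {
      // The real pass computes this via getMinimumVLForUser; in this toy
      // total order it's just the min of MI's own demand and its VL operand.
      unsigned Need = std::min(Demanded[MI], MI->VLOp);
      unsigned Joined = std::max(Demanded[Def], Need); // join = max
      if (Joined != Demanded[Def]) {
        Demanded[Def] = Joined; // demand grew, so Def must be revisited
        if (OnList.insert(Def).second)
          Worklist.push_back(Def);
      }
    }
  }
}

The actual pass keeps a SetVector worklist and seeds it bottom-up (post-order over blocks, reverse order within each block), so most demands settle in a single sweep.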

Some implementation details:

  • The roots are stores (and other instructions not handled by isSupportedInstr) or copies to physical registers (which fail the any_of(MI.defs(), isPhysical) check).
  • This patch untangles getMinimumVLForUser and checkUsers. getMinimumVLForUser now returns how many lanes of an operand are read by an instruction, whilst checkUsers checks that an instruction and its users have compatible EEWs/EMULs.
  • The DemandedVL struct was added so that DenseMap<const MachineInstr *, DemandedVL> DemandedVLs has a default value of 0, which means we don't need to check whether a key exists when looking things up (see the toy illustration below).
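
A toy illustration of that last point (std::map standing in for DenseMap, Demanded for DemandedVL; the names are just for the example): because a missing key value-initializes to the bottom element, lookups never need an existence check.

#include <cassert>
#include <map>

// Stand-in for DemandedVL: default-constructs to the optimistic bottom,
// mirroring DemandedVL's default constructor of immediate 0.
struct Demanded {
  unsigned VL = 0;
};

int main() {
  int SomeInstr = 0; // stand-in for a MachineInstr
  std::map<const void *, Demanded> DemandedVLs;
  // No existence check needed: an unvisited instruction reads as demanding 0.
  assert(DemandedVLs[&SomeInstr].VL == 0);
  return 0;
}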

There was no measurable compile time impact on llvm-test-suite or SPEC CPU 2017, and the analysis is guaranteed to terminate.

Formally, the set of possible states for DemandedVLs forms a semilattice ordered by $$S_1 \le S_2 \iff \forall x \in \mathrm{defs},\ S_1[x] \le S_2[x]$$ The transfer function is monotonic with respect to this ordering (we only ever increase a demanded VL, because we take the maximum):

$$f(i, S)[x] = \begin{cases} \max\bigl(S[x],\ \mathrm{getMinimumVLForUser}(S, i, x)\bigr) & \text{if } x \in \mathrm{ops}(i) \\ S[x] & \text{otherwise} \end{cases}$$

DemandedVLs is a finite lattice whose height is determined by the number of distinct VL values in the program (in practice usually just one, a call to @llvm.experimental.get.vector.length). Since the lattice is finite and the transfer function monotonic, the analysis must reach a fixpoint; in the worst case that fixpoint is the top of the lattice, where every instruction demands VLMAX.
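
As a loose bound (my framing, not something stated in the patch): if $V$ is the set of distinct VL values in the function, each def's chain from bottom to top has length at most $|V| + 2$ (the immediate 0, the VLs themselves, and VLMAX), so the total number of demanded-VL increases, and hence worklist re-pushes, is bounded by

$$|\mathrm{defs}| \cdot \bigl(|V| + 2\bigr)$$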

There are more details in this EuroLLVM talk.

The proof of monotonicity has also been mechanised in Lean.
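
For flavour, here is the core fact in a toy model (demanded VLs collapsed to Nat with max as the join; the real development works over the DemandedVL lattice, so this shows only the shape of the lemma):

-- One transfer step joins the old demand with a new one via max.
-- Monotonicity in the incoming state: growing the state can only grow the join.
example (s s' d : Nat) (h : s ≤ s') : max s d ≤ max s' d := by
  omega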

Fixes #149354

llvmbot (Member) commented Jul 30, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes
  • [RISCV] Add TSFlag for reading past VL behaviour. NFCI
  • Precommit tests
  • [RISCV] Handle recurrences in RISCVVLOptimizer

Patch is 25.36 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/151285.diff

11 Files Affected:

  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h (+9)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrFormats.td (+6)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoV.td (+5-4)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoXRivos.td (+2-1)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoXSf.td (+2)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoXSfmm.td (+5)
  • (modified) llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp (+87-76)
  • (modified) llvm/test/CodeGen/RISCV/rvv/reproducer-pr146855.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vl-opt.ll (+52)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vl-opt.mir (+70-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vlopt-same-vl.ll (+2-2)
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
index bddea43fbb09c..9d26fc01bf379 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
@@ -139,6 +139,9 @@ enum {
   // 3 -> SEW * 4
   DestEEWShift = ElementsDependOnMaskShift + 1,
   DestEEWMask = 3ULL << DestEEWShift,
+
+  ReadsPastVLShift = DestEEWShift + 2,
+  ReadsPastVLMask = 1ULL << ReadsPastVLShift,
 };
 
 // Helper functions to read TSFlags.
@@ -195,6 +198,12 @@ static inline bool elementsDependOnMask(uint64_t TSFlags) {
   return TSFlags & ElementsDependOnMaskMask;
 }
 
+/// \returns true if the instruction may read elements past VL, e.g.
+/// vslidedown/vrgather
+static inline bool readsPastVL(uint64_t TSFlags) {
+  return TSFlags & ReadsPastVLMask;
+}
+
 static inline unsigned getVLOpNum(const MCInstrDesc &Desc) {
   const uint64_t TSFlags = Desc.TSFlags;
   // This method is only called if we expect to have a VL operand, and all
diff --git a/llvm/lib/Target/RISCV/RISCVInstrFormats.td b/llvm/lib/Target/RISCV/RISCVInstrFormats.td
index d9c6101478064..878a0ec938919 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrFormats.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrFormats.td
@@ -261,6 +261,12 @@ class RVInstCommon<dag outs, dag ins, string opcodestr, string argstr,
   // Indicates the EEW of a vector instruction's destination operand.
   EEW DestEEW = EEWSEWx1;
   let TSFlags{25-24} = DestEEW.Value;
+
+  // Some vector instructions like vslidedown/vrgather will read elements past
+  // VL, and should be marked to make sure RISCVVLOptimizer doesn't reduce its
+  // operands' VLs.
+  bit ReadsPastVL = 0;
+  let TSFlags{26} = ReadsPastVL;
 }
 
 class RVInst<dag outs, dag ins, string opcodestr, string argstr,
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoV.td b/llvm/lib/Target/RISCV/RISCVInstrInfoV.td
index 33c713833d8b9..cebab2112d02d 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoV.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoV.td
@@ -1703,8 +1703,9 @@ let Constraints = "@earlyclobber $vd", RVVConstraint = SlideUp in {
 defm VSLIDEUP_V : VSLD_IV_X_I<"vslideup", 0b001110, /*slidesUp=*/true>;
 defm VSLIDE1UP_V : VSLD1_MV_X<"vslide1up", 0b001110>;
 } // Constraints = "@earlyclobber $vd", RVVConstraint = SlideUp
+let ReadsPastVL = 1 in
 defm VSLIDEDOWN_V : VSLD_IV_X_I<"vslidedown", 0b001111, /*slidesUp=*/false>;
-let ElementsDependOn = EltDepsVL in
+let ElementsDependOn = EltDepsVL, ReadsPastVL = 1 in
 defm VSLIDE1DOWN_V : VSLD1_MV_X<"vslide1down", 0b001111>;
 } // Predicates = [HasVInstructions]
 
@@ -1712,19 +1713,19 @@ let Predicates = [HasVInstructionsAnyF] in {
 let Constraints = "@earlyclobber $vd", RVVConstraint = SlideUp in {
 defm VFSLIDE1UP_V : VSLD1_FV_F<"vfslide1up", 0b001110>;
 } // Constraints = "@earlyclobber $vd", RVVConstraint = SlideUp
-let ElementsDependOn = EltDepsVL in
+let ElementsDependOn = EltDepsVL, ReadsPastVL = 1 in
 defm VFSLIDE1DOWN_V : VSLD1_FV_F<"vfslide1down", 0b001111>;
 } // Predicates = [HasVInstructionsAnyF]
 
 let Predicates = [HasVInstructions] in {
 // Vector Register Gather Instruction
-let Constraints = "@earlyclobber $vd", RVVConstraint = Vrgather in {
+let Constraints = "@earlyclobber $vd", RVVConstraint = Vrgather, ReadsPastVL = 1 in {
 defm VRGATHER_V : VGTR_IV_V_X_I<"vrgather", 0b001100>;
 def VRGATHEREI16_VV : VALUVV<0b001110, OPIVV, "vrgatherei16.vv">,
                       SchedBinaryMC<"WriteVRGatherEI16VV",
                                     "ReadVRGatherEI16VV_data",
                                     "ReadVRGatherEI16VV_index">;
-} // Constraints = "@earlyclobber $vd", RVVConstraint = Vrgather
+} // Constraints = "@earlyclobber $vd", RVVConstraint = Vrgather, ReadsPastVL = 1
 
 // Vector Compress Instruction
 let Constraints = "@earlyclobber $vd", RVVConstraint = Vcompress, ElementsDependOn = EltDepsVLMask in {
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoXRivos.td b/llvm/lib/Target/RISCV/RISCVInstrInfoXRivos.td
index ebcf079f300b3..3a6ce3ce1d469 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoXRivos.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoXRivos.td
@@ -58,7 +58,7 @@ class CustomRivosXVI<bits<6> funct6, RISCVVFormat opv, dag outs, dag ins,
 
 let Predicates = [HasVendorXRivosVizip], DecoderNamespace = "XRivos",
   Constraints = "@earlyclobber $vd", RVVConstraint = Vrgather,
-  Inst<6-0> = OPC_CUSTOM_2.Value in  {
+  Inst<6-0> = OPC_CUSTOM_2.Value, ReadsPastVL = 1 in  {
 defm RI_VZIPEVEN_V : VALU_IV_V<"ri.vzipeven", 0b001100>;
 defm RI_VZIPODD_V : VALU_IV_V<"ri.vzipodd", 0b011100>;
 defm RI_VZIP2A_V : VALU_IV_V<"ri.vzip2a", 0b000100>;
@@ -126,6 +126,7 @@ def RI_VINSERT : CustomRivosVXI<0b010000, OPMVX, (outs VR:$vd_wb),
                                 (ins VR:$vd, GPR:$rs1, uimm5:$imm),
                                 "ri.vinsert.v.x", "$vd, $rs1, $imm">;
 
+let ReadsPastVL = 1 in
 def RI_VEXTRACT : CustomRivosXVI<0b010111, OPMVV, (outs GPR:$rd),
                                 (ins VR:$vs2, uimm5:$imm),
                                 "ri.vextract.x.v", "$rd, $vs2, $imm">;
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoXSf.td b/llvm/lib/Target/RISCV/RISCVInstrInfoXSf.td
index a47dfe363c21e..b546339ce99e2 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoXSf.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoXSf.td
@@ -74,6 +74,7 @@ class RVInstVCCustom2<bits<4> funct6_hi4, bits<3> funct3, dag outs, dag ins,
   let Uses = [VL, VTYPE];
   let RVVConstraint = NoConstraint;
   let ElementsDependOn = EltDepsVLMask;
+  let ReadsPastVL = 1;
 }
 
 class RVInstVCFCustom2<bits<4> funct6_hi4, bits<3> funct3, dag outs, dag ins,
@@ -98,6 +99,7 @@ class RVInstVCFCustom2<bits<4> funct6_hi4, bits<3> funct3, dag outs, dag ins,
   let Uses = [VL, VTYPE];
   let RVVConstraint = NoConstraint;
   let ElementsDependOn = EltDepsVLMask;
+  let ReadsPastVL = 1;
 }
 
 class VCIXInfo<string suffix, VCIXType type, DAGOperand TyRd,
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoXSfmm.td b/llvm/lib/Target/RISCV/RISCVInstrInfoXSfmm.td
index 66cb2d53da960..a5ee701386b6d 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoXSfmm.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoXSfmm.td
@@ -65,6 +65,7 @@ class SFInstTileMemOp<dag outs, dag ins, bits<3> nf, RISCVOpcode opcode,
   let Inst{6-0} = opcode.Value;
 
   let Uses = [VTYPE, VL];
+  let ReadsPastVL = 1;
 }
 
 let hasSideEffects = 0, mayLoad = 1, mayStore = 0 in
@@ -94,6 +95,7 @@ class SFInstTileMoveOp<bits<6> funct6, dag outs, dag ins, string opcodestr,
   let Inst{6-0} = OPC_OP_V.Value;
 
   let Uses = [VTYPE, VL];
+  let ReadsPastVL = 1;
 }
 
 let hasSideEffects = 0, mayLoad = 0, mayStore = 0 in
@@ -113,6 +115,7 @@ class SFInstMatmulF<dag outs, dag ins, string opcodestr, string argstr>
   let Inst{6-0} = OPC_OP_VE.Value;
 
   let Uses = [VTYPE, VL];
+  let ReadsPastVL = 1;
 }
 
 let hasSideEffects = 0, mayLoad = 0, mayStore = 0 in
@@ -135,6 +138,7 @@ class SFInstMatmulF8<bit a, bit b, dag outs, dag ins,
   let Inst{6-0} = OPC_OP_VE.Value;
 
   let Uses = [VTYPE, VL];
+  let ReadsPastVL = 1;
 }
 
 
@@ -167,6 +171,7 @@ class SFInstMatmulI8<bit funct6_1, bit a, bit b, dag outs, dag ins,
   let Inst{6-0} = OPC_OP_VE.Value;
 
   let Uses = [VTYPE, VL];
+  let ReadsPastVL = 1;
 }
 
 class I8Encode<bit encoding, string name> {
diff --git a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
index c9464515d2e56..40af9b04c97b6 100644
--- a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
@@ -30,6 +30,27 @@ using namespace llvm;
 
 namespace {
 
+/// Wrapper around MachineOperand that defaults to immediate 0.
+struct DemandedVL {
+  MachineOperand VL;
+  DemandedVL() : VL(MachineOperand::CreateImm(0)) {}
+  DemandedVL(MachineOperand VL) : VL(VL) {}
+  static DemandedVL vlmax() {
+    return DemandedVL(MachineOperand::CreateImm(RISCV::VLMaxSentinel));
+  }
+  bool operator!=(const DemandedVL &Other) const {
+    return !VL.isIdenticalTo(Other.VL);
+  }
+};
+
+static DemandedVL max(const DemandedVL &LHS, const DemandedVL &RHS) {
+  if (RISCV::isVLKnownLE(LHS.VL, RHS.VL))
+    return RHS;
+  if (RISCV::isVLKnownLE(RHS.VL, LHS.VL))
+    return LHS;
+  return DemandedVL::vlmax();
+}
+
 class RISCVVLOptimizer : public MachineFunctionPass {
   const MachineRegisterInfo *MRI;
   const MachineDominatorTree *MDT;
@@ -51,17 +72,26 @@ class RISCVVLOptimizer : public MachineFunctionPass {
   StringRef getPassName() const override { return PASS_NAME; }
 
 private:
-  std::optional<MachineOperand>
-  getMinimumVLForUser(const MachineOperand &UserOp) const;
-  /// Returns the largest common VL MachineOperand that may be used to optimize
-  /// MI. Returns std::nullopt if it failed to find a suitable VL.
-  std::optional<MachineOperand> checkUsers(const MachineInstr &MI) const;
+  DemandedVL getMinimumVLForUser(const MachineOperand &UserOp) const;
+  /// Returns true if the users of \p MI have compatible EEWs and SEWs.
+  bool checkUsers(const MachineInstr &MI) const;
   bool tryReduceVL(MachineInstr &MI) const;
   bool isCandidate(const MachineInstr &MI) const;
+  void transfer(const MachineInstr &MI);
+
+  /// Returns all uses of vector virtual registers.
+  auto vector_uses(const MachineInstr &MI) const {
+    auto Pred = [this](const MachineOperand &MO) -> bool {
+      return MO.isReg() && MO.getReg().isVirtual() &&
+             RISCVRegisterInfo::isRVVRegClass(MRI->getRegClass(MO.getReg()));
+    };
+    return make_filter_range(MI.uses(), Pred);
+  }
 
   /// For a given instruction, records what elements of it are demanded by
   /// downstream users.
-  DenseMap<const MachineInstr *, std::optional<MachineOperand>> DemandedVLs;
+  DenseMap<const MachineInstr *, DemandedVL> DemandedVLs;
+  SetVector<const MachineInstr *> Worklist;
 };
 
 /// Represents the EMUL and EEW of a MachineOperand.
@@ -787,6 +817,9 @@ getOperandInfo(const MachineOperand &MO, const MachineRegisterInfo *MRI) {
 /// white-list approach simplifies this optimization for instructions that may
 /// have more complex semantics with relation to how it uses VL.
 static bool isSupportedInstr(const MachineInstr &MI) {
+  if (MI.isPHI() || MI.isFullCopy())
+    return true;
+
   const RISCVVPseudosTable::PseudoInfo *RVV =
       RISCVVPseudosTable::getPseudoInfo(MI.getOpcode());
 
@@ -1210,34 +1243,6 @@ static bool isVectorOpUsedAsScalarOp(const MachineOperand &MO) {
   }
 }
 
-/// Return true if MI may read elements past VL.
-static bool mayReadPastVL(const MachineInstr &MI) {
-  const RISCVVPseudosTable::PseudoInfo *RVV =
-      RISCVVPseudosTable::getPseudoInfo(MI.getOpcode());
-  if (!RVV)
-    return true;
-
-  switch (RVV->BaseInstr) {
-  // vslidedown instructions may read elements past VL. They are handled
-  // according to current tail policy.
-  case RISCV::VSLIDEDOWN_VI:
-  case RISCV::VSLIDEDOWN_VX:
-  case RISCV::VSLIDE1DOWN_VX:
-  case RISCV::VFSLIDE1DOWN_VF:
-
-  // vrgather instructions may read the source vector at any index < VLMAX,
-  // regardless of VL.
-  case RISCV::VRGATHER_VI:
-  case RISCV::VRGATHER_VV:
-  case RISCV::VRGATHER_VX:
-  case RISCV::VRGATHEREI16_VV:
-    return true;
-
-  default:
-    return false;
-  }
-}
-
 bool RISCVVLOptimizer::isCandidate(const MachineInstr &MI) const {
   const MCInstrDesc &Desc = MI.getDesc();
   if (!RISCVII::hasVLOp(Desc.TSFlags) || !RISCVII::hasSEWOp(Desc.TSFlags))
@@ -1287,20 +1292,24 @@ bool RISCVVLOptimizer::isCandidate(const MachineInstr &MI) const {
   return true;
 }
 
-std::optional<MachineOperand>
+DemandedVL
 RISCVVLOptimizer::getMinimumVLForUser(const MachineOperand &UserOp) const {
   const MachineInstr &UserMI = *UserOp.getParent();
   const MCInstrDesc &Desc = UserMI.getDesc();
 
+  if (UserMI.isPHI() || UserMI.isFullCopy())
+    return DemandedVLs.lookup(&UserMI);
+
   if (!RISCVII::hasVLOp(Desc.TSFlags) || !RISCVII::hasSEWOp(Desc.TSFlags)) {
     LLVM_DEBUG(dbgs() << "    Abort due to lack of VL, assume that"
                          " use VLMAX\n");
-    return std::nullopt;
+    return DemandedVL::vlmax();
   }
 
-  if (mayReadPastVL(UserMI)) {
+  if (RISCVII::readsPastVL(
+          TII->get(RISCV::getRVVMCOpcode(UserMI.getOpcode())).TSFlags)) {
     LLVM_DEBUG(dbgs() << "    Abort because used by unsafe instruction\n");
-    return std::nullopt;
+    return DemandedVL::vlmax();
   }
 
   unsigned VLOpNum = RISCVII::getVLOpNum(Desc);
@@ -1314,11 +1323,10 @@ RISCVVLOptimizer::getMinimumVLForUser(const MachineOperand &UserOp) const {
   if (UserOp.isTied()) {
     assert(UserOp.getOperandNo() == UserMI.getNumExplicitDefs() &&
            RISCVII::isFirstDefTiedToFirstUse(UserMI.getDesc()));
-    auto DemandedVL = DemandedVLs.lookup(&UserMI);
-    if (!DemandedVL || !RISCV::isVLKnownLE(*DemandedVL, VLOp)) {
+    if (!RISCV::isVLKnownLE(DemandedVLs.lookup(&UserMI).VL, VLOp)) {
       LLVM_DEBUG(dbgs() << "    Abort because user is passthru in "
                            "instruction with demanded tail\n");
-      return std::nullopt;
+      return DemandedVL::vlmax();
     }
   }
 
@@ -1331,18 +1339,16 @@ RISCVVLOptimizer::getMinimumVLForUser(const MachineOperand &UserOp) const {
 
   // If we know the demanded VL of UserMI, then we can reduce the VL it
   // requires.
-  if (auto DemandedVL = DemandedVLs.lookup(&UserMI)) {
-    assert(isCandidate(UserMI));
-    if (RISCV::isVLKnownLE(*DemandedVL, VLOp))
-      return DemandedVL;
-  }
+  if (RISCV::isVLKnownLE(DemandedVLs.lookup(&UserMI).VL, VLOp))
+    return DemandedVLs.lookup(&UserMI);
 
   return VLOp;
 }
 
-std::optional<MachineOperand>
-RISCVVLOptimizer::checkUsers(const MachineInstr &MI) const {
-  std::optional<MachineOperand> CommonVL;
+bool RISCVVLOptimizer::checkUsers(const MachineInstr &MI) const {
+  if (MI.isPHI() || MI.isFullCopy())
+    return true;
+
   SmallSetVector<MachineOperand *, 8> Worklist;
   SmallPtrSet<const MachineInstr *, 4> PHISeen;
   for (auto &UserOp : MRI->use_operands(MI.getOperand(0).getReg()))
@@ -1370,23 +1376,9 @@ RISCVVLOptimizer::checkUsers(const MachineInstr &MI) const {
       continue;
     }
 
-    auto VLOp = getMinimumVLForUser(UserOp);
-    if (!VLOp)
-      return std::nullopt;
-
-    // Use the largest VL among all the users. If we cannot determine this
-    // statically, then we cannot optimize the VL.
-    if (!CommonVL || RISCV::isVLKnownLE(*CommonVL, *VLOp)) {
-      CommonVL = *VLOp;
-      LLVM_DEBUG(dbgs() << "    User VL is: " << VLOp << "\n");
-    } else if (!RISCV::isVLKnownLE(*VLOp, *CommonVL)) {
-      LLVM_DEBUG(dbgs() << "    Abort because cannot determine a common VL\n");
-      return std::nullopt;
-    }
-
     if (!RISCVII::hasSEWOp(UserMI.getDesc().TSFlags)) {
       LLVM_DEBUG(dbgs() << "    Abort due to lack of SEW operand\n");
-      return std::nullopt;
+      return false;
     }
 
     std::optional<OperandInfo> ConsumerInfo = getOperandInfo(UserOp, MRI);
@@ -1396,7 +1388,7 @@ RISCVVLOptimizer::checkUsers(const MachineInstr &MI) const {
       LLVM_DEBUG(dbgs() << "    Abort due to unknown operand information.\n");
       LLVM_DEBUG(dbgs() << "      ConsumerInfo is: " << ConsumerInfo << "\n");
       LLVM_DEBUG(dbgs() << "      ProducerInfo is: " << ProducerInfo << "\n");
-      return std::nullopt;
+      return false;
     }
 
     // If the operand is used as a scalar operand, then the EEW must be
@@ -1411,11 +1403,11 @@ RISCVVLOptimizer::checkUsers(const MachineInstr &MI) const {
           << "    Abort due to incompatible information for EMUL or EEW.\n");
       LLVM_DEBUG(dbgs() << "      ConsumerInfo is: " << ConsumerInfo << "\n");
       LLVM_DEBUG(dbgs() << "      ProducerInfo is: " << ProducerInfo << "\n");
-      return std::nullopt;
+      return false;
     }
   }
 
-  return CommonVL;
+  return true;
 }
 
 bool RISCVVLOptimizer::tryReduceVL(MachineInstr &MI) const {
@@ -1431,9 +1423,7 @@ bool RISCVVLOptimizer::tryReduceVL(MachineInstr &MI) const {
     return false;
   }
 
-  auto CommonVL = DemandedVLs.lookup(&MI);
-  if (!CommonVL)
-    return false;
+  auto *CommonVL = &DemandedVLs.at(&MI).VL;
 
   assert((CommonVL->isImm() || CommonVL->getReg().isVirtual()) &&
          "Expected VL to be an Imm or virtual Reg");
@@ -1468,6 +1458,24 @@ bool RISCVVLOptimizer::tryReduceVL(MachineInstr &MI) const {
   return true;
 }
 
+static bool isPhysical(const MachineOperand &MO) {
+  return MO.isReg() && MO.getReg().isPhysical();
+}
+
+/// Look through \p MI's operands and propagate what it demands to its uses.
+void RISCVVLOptimizer::transfer(const MachineInstr &MI) {
+  if (!isSupportedInstr(MI) || !checkUsers(MI) || any_of(MI.defs(), isPhysical))
+    DemandedVLs[&MI] = DemandedVL::vlmax();
+
+  for (const MachineOperand &MO : vector_uses(MI)) {
+    const MachineInstr *Def = MRI->getVRegDef(MO.getReg());
+    DemandedVL Prev = DemandedVLs[Def];
+    DemandedVLs[Def] = max(DemandedVLs[Def], getMinimumVLForUser(MO));
+    if (DemandedVLs[Def] != Prev)
+      Worklist.insert(Def);
+  }
+}
+
 bool RISCVVLOptimizer::runOnMachineFunction(MachineFunction &MF) {
   if (skipFunction(MF.getFunction()))
     return false;
@@ -1484,14 +1492,17 @@ bool RISCVVLOptimizer::runOnMachineFunction(MachineFunction &MF) {
   assert(DemandedVLs.empty());
 
   // For each instruction that defines a vector, compute what VL its
-  // downstream users demand.
+  // upstream uses demand.
   for (MachineBasicBlock *MBB : post_order(&MF)) {
     assert(MDT->isReachableFromEntry(MBB));
-    for (MachineInstr &MI : reverse(*MBB)) {
-      if (!isCandidate(MI))
-        continue;
-      DemandedVLs.insert({&MI, checkUsers(MI)});
-    }
+    for (MachineInstr &MI : reverse(*MBB))
+      Worklist.insert(&MI);
+  }
+
+  while (!Worklist.empty()) {
+    const MachineInstr *MI = Worklist.front();
+    Worklist.remove(MI);
+    transfer(*MI);
   }
 
   // Then go through and see if we can reduce the VL of any instructions to
diff --git a/llvm/test/CodeGen/RISCV/rvv/reproducer-pr146855.ll b/llvm/test/CodeGen/RISCV/rvv/reproducer-pr146855.ll
index cca00bf58063d..2d64defe8c7b1 100644
--- a/llvm/test/CodeGen/RISCV/rvv/reproducer-pr146855.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/reproducer-pr146855.ll
@@ -6,7 +6,7 @@ target triple = "riscv64-unknown-linux-gnu"
 define i32 @_ZN4Mesh12rezone_countESt6vectorIiSaIiEERiS3_(<vscale x 4 x i32> %wide.load, <vscale x 4 x i1> %0, <vscale x 4 x i1> %1, <vscale x 4 x i1> %2, <vscale x 4 x i1> %3) #0 {
 ; CHECK-LABEL: _ZN4Mesh12rezone_countESt6vectorIiSaIiEERiS3_:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vsetivli zero, 0, e32, m2, ta, ma
 ; CHECK-NEXT:    vmv1r.v v8, v0
 ; CHECK-NEXT:    li a0, 0
 ; CHECK-NEXT:    vmv.v.i v10, 0
@@ -14,7 +14,7 @@ define i32 @_ZN4Mesh12rezone_countESt6vectorIiSaIiEERiS3_(<vscale x 4 x i32> %wi
 ; CHECK-NEXT:    vmv.v.i v14, 0
 ; CHECK-NEXT:  .LBB0_1: # %vector.body
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    vsetvli a1, zero, e32, m2, ta, mu
+; CHECK-NEXT:    vsetivli zero, 0, e32, m2, ta, mu
 ; CHECK-NEXT:    vmv1r.v v0, v8
 ; CHECK-NEXT:    slli a0, a0, 2
 ; CHECK-NEXT:    vmv2r.v v16, v10
diff --git a/llvm/test/CodeGen/RISCV/rvv/vl-opt.ll b/llvm/test/CodeGen/RISCV/rvv/vl-opt.ll
index cd282c265ae47..ecea4efa4e768 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vl-opt.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vl-opt.ll
@@ -198,3 +198,55 @@ define void @fadd_fcmp_select_copy(<vscale x 4 x float> %v, <vscale x 4 x i1> %c
   call void @llvm.riscv.vsm(<vscale x 4 x i1> %select, ptr %p, iXLen %vl)
   ret void
 }
+
+define void @recurrence(<vscale x 4 x i32> %v, ptr %p, iXLen %n, iXLen %vl) {
+; CHECK-LABEL: recurrence:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    vsetvli zero, a2, e32, m2, ta, ma
+; CHECK-NEXT:    vmv.v.i v10, 0
+; CHECK-NEXT:  .LBB13_1: # %loop
+; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    addi a1, a1, -1
+; CHECK-NEXT:    vadd.vv v10, v10, v8
+; CHECK-NEXT:    bnez a1, .LBB13_1
+; CHECK-NEXT:  # %bb.2: # %exit
+; CHECK-NEXT:    vse32.v v10, (a0)
+; CHECK-NEXT:    ret
+entry:
+  br label %loop
+loop:
+  %iv = phi iXLen [ 0, %entry ], [ %iv.next, %loop ]
+  %phi = phi <vscale x 4 x i32> [ zeroinitializer, %entry ], [ %x, %loop ]
+  %x = add <vscale x 4 x i32> %phi, %v
+  %iv.next = add iXLen %iv, 1
+  %done = icmp eq iXLen %iv.next, %n
+  br i1 %done, label %exit, label %loop
+exit:
+  call void @llvm.riscv.vse(<vscale x 4 x i32>...
[truncated]

lukel97 force-pushed the vloptimizer/dataflow-analysis branch from 540f6f9 to 9f24fe7 on August 21, 2025 05:52
lukel97 (Contributor, Author) commented Aug 21, 2025

Ping, rebased now that #149704 is landed.

lukel97 (Contributor, Author) commented Sep 3, 2025

Ping

lukel97 added a commit that referenced this pull request Sep 5, 2025
TIL that MachineFunction actually stores a reference to
MachineRegisterInfo, so use that instead of plumbing it through. This
helps avoid the need to plumb MRI through static functions in #151285
github-actions bot commented Sep 5, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

return OperandInfo(getEMULEqualsEEWDivSEWTimesLMUL(*Log2EEW, MI), *Log2EEW);
}

static bool isTupleInsertInstr(const MachineInstr &MI);
lukel97 (Contributor, Author) commented:

Forward-declared here to keep code motion out of the diff; will be removed in a follow-up commit.

lukel97 (Contributor, Author) commented Sep 10, 2025

Ping

mshockwave (Member) left a comment:

Generally looks good to me, as long as the safe approximation in the dataflow analysis is VLMAX and we never reduce the VL in this process I think we're good.

> where the height is determined by the number of distinct VL values

Related to this: could we add a test for fault-only-first load?

Comment on lines +270 to +285
; CHECK-LABEL: recurrence_vleff:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a2, e32, m2, ta, ma
; CHECK-NEXT: vmv.v.i v8, 0
; CHECK-NEXT: mv a3, a0
; CHECK-NEXT: .LBB17_1: # %loop
; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
; CHECK-NEXT: vsetvli zero, a2, e32, m2, ta, ma
; CHECK-NEXT: vle32ff.v v10, (a3)
; CHECK-NEXT: addi a1, a1, -1
; CHECK-NEXT: vadd.vv v8, v8, v10
; CHECK-NEXT: vse32.v v8, (a0)
; CHECK-NEXT: addi a3, a3, 4
; CHECK-NEXT: bnez a1, .LBB17_1
; CHECK-NEXT: # %bb.2: # %exit
; CHECK-NEXT: ret
lukel97 (Contributor, Author) commented Sep 11, 2025:

@mshockwave for reference the codegen before this patch is:

recurrence_vleff:                       # @recurrence_vleff
	.cfi_startproc
# %bb.0:                                # %entry
	vsetvli	a3, zero, e32, m2, ta, ma
	vmv.v.i	v8, 0
	mv	a3, a0
.LBB0_1:                                # %loop
                                        # =>This Inner Loop Header: Depth=1
	vsetvli	zero, a2, e32, m2, ta, ma
	vle32ff.v	v10, (a3)
	csrr	a4, vl
	addi	a1, a1, -1
	vsetvli	a5, zero, e32, m2, ta, ma
	vadd.vv	v8, v8, v10
	vsetvli	zero, a4, e32, m2, ta, ma
	vse32.v	v8, (a0)
	addi	a3, a3, 4
	bnez	a1, .LBB0_1
# %bb.2:                                # %exit
	ret
.Lfunc_end0:
	.size	recurrence_vleff, .Lfunc_end0-recurrence_vleff
	.cfi_endproc
                                        # -- End function
	.section	".note.GNU-stack","",@progbits

So now we're able to propagate %vleff.vl through to %y and %phi.

mshockwave (Member) left a comment:

LGTM cheers

}
};

static DemandedVL max(const DemandedVL &LHS, const DemandedVL &RHS) {
A collaborator commented:

Move this into the DemandedVL class so that it has to be called as DemandedVL::max?

static bool isVirtualVec(const MachineOperand &MO) {
return MO.isReg() && MO.getReg().isVirtual() &&
RISCVRegisterInfo::isRVVRegClass(
MO.getParent()->getMF()->getRegInfo().getRegClass(MO.getReg()));
A collaborator commented:

This feels like a lot of pointer chasing to do on every operand.

Can we make this a lambda in the one function that calls it and capture MRI from the RISCVVLOptimizer class?

lukel97 (Contributor, Author) replied:

I originally had a helper method in the RISCVVLOptimizer class that captured MRI in ea2b861; I've added it back in 0ef2baf.

if (MI.isPHI() || MI.isFullCopy() || isTupleInsertInstr(MI))
return true;

SmallSetVector<MachineOperand *, 8> Worklist;
A collaborator commented:

This name now shadows the new MachineInstr* Worklist in the class. Should we disambiguate them?

topperc (Collaborator) left a comment:

LGTM

lukel97 enabled auto-merge (squash) September 15, 2025 03:29
lukel97 merged commit 65ad21d into llvm:main Sep 15, 2025
11 checks passed
Successfully merging this pull request may close: [RISCV] Handle recurrences in VLOptimizer (#149354)