Reland [VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr) by artagnon · Pull Request #174581 · llvm/llvm-project

artagnon · 2026-01-06T12:57:13Z

The original patch, landed as a2db31b ([VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr), #172477) had a critical commutative matcher bug, which has now been fixed. An assert has also been strengthened, following a post-commit review.

llvmbot · 2026-01-06T12:57:46Z

@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-risc-v

Author: Ramkumar Ramachandra (artagnon)

Changes

The original patch, landed as a2db31b ([VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr), #172477) had a critical commutative matcher bug, which has now been fixed. An assert has also been strengthened, following a post-commit review.

Patch is 467.49 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/174581.diff

131 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h (+11-2)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+13)
(modified) llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp (+1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/gather-do-not-vectorize-addressing.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/interleaving-load-store.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/interleaving-reduction.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/invalid-costs.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/load-cast-context.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/low_trip_count_predicates.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll (+24-24)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/masked_ldst_sme.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/outer_loop_prefer_scalable.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-constant-ops.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-epilogue.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-mixed.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product.ll (+27-27)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-incomplete-chains.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce.ll (+32-32)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/pr60831-sve-inv-store-crash.ll (+11-11)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/reduction-recurrence-costs-sve.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-reduction-inloop-cond.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll (+40-40)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-struct-return.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/single-early-exit-interleave.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/store-costs-sve.ll (+7-7)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-inloop-reductions.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-reductions.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-strict-reductions.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll (+24-24)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vscale-fixed.ll (+5-5)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-fneg.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inv-store.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-live-out-pointer-induction.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-multiexit.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-predicated-costs.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-runtime-check-size-based-threshold.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-reductions.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-unroll.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll (+14-14)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-vscale-based-trip-counts.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-wide-lane-mask.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-extractvalue.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll (+5-5)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-scalable.ll (+9-9)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops-chained.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-zext-costs.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/widen-gep-all-indices-invariant.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/dead-ops-cost.ll (+5-5)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/evl-compatible-loops.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/first-order-recurrence-scalable-vf1.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/fminimumnum.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/induction-costs.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll (+23-23)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/masked_gather_scatter.ll (+7-7)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/partial-reduce-dot-product.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-prune-vf.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll (+36-36)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-bin-unary-ops-args.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-call-intrinsics.ll (+9-9)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-cast-intrinsics.ll (+11-11)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-cond-reduction.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-div.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-fixed-order-recurrence.ll (+10-10)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-inloop-reduction.ll (+14-14)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-interleave.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-intermediate-store.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-iv32.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-masked-loadstore.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-ordered-reduction.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reduction.ll (+14-14)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reverse-load-store.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-safe-dep-distance.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/transform-narrow-interleave-to-widen-memory.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/SystemZ/scalar-steps-with-users-demanding-all-lanes-and-first-lane-only.ll (+5-10)
(modified) llvm/test/Transforms/LoopVectorize/X86/cost-constant-known-via-scev.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/create-induction-resume.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains-vplan.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/hoist-predicated-loads.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/if-reduction.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/induction-step.ll (+5-5)
(modified) llvm/test/Transforms/LoopVectorize/induction.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/narrow-to-single-scalar-widen-gep-scalable.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/outer_loop_scalable.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/pointer-induction.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/pr30654-phiscev-sext-trunc.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/pr58811-scev-expansion.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/scalable-assume.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll (+17-17)
(modified) llvm/test/Transforms/LoopVectorize/scalable-inductions.ll (+5-5)
(modified) llvm/test/Transforms/LoopVectorize/scalable-iv-outside-user.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/scalable-lifetime.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/scalable-predication.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/scalable-reduction-inloop.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/scalable-trunc-min-bitwidth.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll (+40-40)
(modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_div_urem.ll (+49-49)
(modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll (+47-47)
(modified) llvm/test/Transforms/LoopVectorize/vectorize-force-tail-with-evl.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll (+6-10)
(modified) llvm/test/Transforms/PhaseOrdering/AArch64/sve-interleave-vectorization.ll (+1-1)

diff --git a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
index 70b174509fc47..41ee4aede3122 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
@@ -290,8 +290,11 @@ struct Recipe_match {
     if ((!matchRecipeAndOpcode<RecipeTys>(R) && ...))
       return false;
 
-    if (R->getNumOperands() != std::tuple_size<Ops_t>::value) {
-      assert(Opcode == Instruction::PHI &&
+    if (R->getNumOperands() != std::tuple_size_v<Ops_t>) {
+      auto *RepR = dyn_cast<VPReplicateRecipe>(R);
+      assert((Opcode == Instruction::PHI ||
+              (RepR && std::tuple_size_v<Ops_t> ==
+                           RepR->getNumOperands() - RepR->isPredicated())) &&
              "non-variadic recipe with matched opcode does not have the "
              "expected number of operands");
       return false;
@@ -564,6 +567,12 @@ m_c_Mul(const Op0_t &Op0, const Op1_t &Op1) {
   return m_c_Binary<Instruction::Mul, Op0_t, Op1_t>(Op0, Op1);
 }
 
+template <typename Op0_t, typename Op1_t>
+inline AllRecipe_match<Instruction::UDiv, Op0_t, Op1_t>
+m_UDiv(const Op0_t &Op0, const Op1_t &Op1) {
+  return m_Binary<Instruction::UDiv, Op0_t, Op1_t>(Op0, Op1);
+}
+
 /// Match a binary AND operation.
 template <typename Op0_t, typename Op1_t>
 inline AllRecipe_commutative_match<Instruction::And, Op0_t, Op1_t>
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index b1880517a4199..d26b56d21ef03 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1342,6 +1342,19 @@ static void simplifyRecipe(VPSingleDefRecipe *Def, VPTypeAnalysis &TypeInfo) {
     return Def->replaceAllUsesWith(
         Def->getOperand(0) == A ? Def->getOperand(1) : Def->getOperand(0));
 
+  const APInt *APC;
+  if (match(Def, m_c_Mul(m_VPValue(A), m_APInt(APC))) && APC->isPowerOf2())
+    return Def->replaceAllUsesWith(Builder.createNaryOp(
+        Instruction::Shl,
+        {A, Plan->getConstantInt(APC->getBitWidth(), APC->exactLogBase2())},
+        *cast<VPRecipeWithIRFlags>(Def), Def->getDebugLoc()));
+
+  if (match(Def, m_UDiv(m_VPValue(A), m_APInt(APC))) && APC->isPowerOf2())
+    return Def->replaceAllUsesWith(Builder.createNaryOp(
+        Instruction::LShr,
+        {A, Plan->getConstantInt(APC->getBitWidth(), APC->exactLogBase2())}, {},
+        Def->getDebugLoc()));
+
   if (match(Def, m_Not(m_VPValue(A)))) {
     if (match(A, m_Not(m_VPValue(A))))
       return Def->replaceAllUsesWith(A);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp b/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
index af9262f3ca4d1..32849cfbbd636 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
@@ -188,6 +188,7 @@ bool VPlanVerifier::verifyEVLRecipe(const VPInstruction &EVL) const {
           case Instruction::Trunc:
           case Instruction::ZExt:
           case Instruction::Mul:
+          case Instruction::Shl:
           case Instruction::FMul:
           case VPInstruction::Broadcast:
             // Opcodes above can only use EVL after wide inductions have been
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll b/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
index ac8095ae5c3e7..077bde70a537b 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
@@ -9,7 +9,7 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
 ; CHECK-NEXT:    br label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
 ; CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 8
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i64 [[TMP0]], 3
 ; CHECK-NEXT:    [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 8)
 ; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[VAL]], i64 0
 ; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 8 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
@@ -71,7 +71,7 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
 ; CHECK-NEXT:    br label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
 ; CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 8
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i64 [[TMP0]], 3
 ; CHECK-NEXT:    [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 [[WIDE_TRIP_COUNT]])
 ; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[VAL]], i64 0
 ; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 8 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
index 4f2933ad2f85c..d7d77cb4325d4 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
@@ -528,8 +528,8 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; DEFAULT-NEXT:    br i1 [[MIN_ITERS_CHECK1]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
 ; DEFAULT:       [[VECTOR_PH]]:
 ; DEFAULT-NEXT:    [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
-; DEFAULT-NEXT:    [[TMP11:%.*]] = mul nuw i64 [[TMP4]], 4
-; DEFAULT-NEXT:    [[TMP5:%.*]] = mul nuw i64 [[TMP11]], 4
+; DEFAULT-NEXT:    [[TMP11:%.*]] = shl nuw i64 [[TMP4]], 2
+; DEFAULT-NEXT:    [[TMP5:%.*]] = shl nuw i64 [[TMP11]], 2
 ; DEFAULT-NEXT:    [[N_MOD_VF:%.*]] = urem i64 257, [[TMP5]]
 ; DEFAULT-NEXT:    [[N_VEC:%.*]] = sub i64 257, [[N_MOD_VF]]
 ; DEFAULT-NEXT:    [[OFFSET_IDX:%.*]] = mul i64 [[N_VEC]], 8
@@ -545,7 +545,7 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; DEFAULT-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i16> poison, i16 [[TMP8]], i64 0
 ; DEFAULT-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i16> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer
 ; DEFAULT-NEXT:    [[TMP9:%.*]] = uitofp <vscale x 4 x i16> [[BROADCAST_SPLAT]] to <vscale x 4 x double>
-; DEFAULT-NEXT:    [[TMP14:%.*]] = mul nuw nsw i64 [[TMP11]], 2
+; DEFAULT-NEXT:    [[TMP14:%.*]] = shl nuw nsw i64 [[TMP11]], 1
 ; DEFAULT-NEXT:    [[TMP17:%.*]] = mul nuw nsw i64 [[TMP11]], 3
 ; DEFAULT-NEXT:    [[TMP12:%.*]] = getelementptr double, ptr [[NEXT_GEP1]], i64 [[TMP11]]
 ; DEFAULT-NEXT:    [[TMP15:%.*]] = getelementptr double, ptr [[NEXT_GEP1]], i64 [[TMP14]]
@@ -568,7 +568,7 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; PRED-NEXT:    br label %[[VECTOR_PH:.*]]
 ; PRED:       [[VECTOR_PH]]:
 ; PRED-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; PRED-NEXT:    [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 4
+; PRED-NEXT:    [[TMP1:%.*]] = shl nuw i64 [[TMP0]], 2
 ; PRED-NEXT:    [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
 ; PRED-NEXT:    [[TMP7:%.*]] = shl nuw i64 [[TMP6]], 2
 ; PRED-NEXT:    [[TMP8:%.*]] = sub i64 257, [[TMP7]]
@@ -1219,7 +1219,7 @@ define void @pred_udiv_select_cost(ptr %A, ptr %B, ptr %C, i64 %n, i8 %y) #1 {
 ; DEFAULT-NEXT:    br i1 [[CONFLICT_RDX]], label %[[SCALAR_PH]], label %[[VECTOR_PH:.*]]
 ; DEFAULT:       [[VECTOR_PH]]:
 ; DEFAULT-NEXT:    [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
-; DEFAULT-NEXT:    [[TMP9:%.*]] = mul nuw i64 [[TMP8]], 4
+; DEFAULT-NEXT:    [[TMP9:%.*]] = shl nuw i64 [[TMP8]], 2
 ; DEFAULT-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], [[TMP9]]
 ; DEFAULT-NEXT:    [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
 ; DEFAULT-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i8> poison, i8 [[Y]], i64 0
@@ -1273,7 +1273,7 @@ define void @pred_udiv_select_cost(ptr %A, ptr %B, ptr %C, i64 %n, i8 %y) #1 {
 ; PRED-NEXT:    br i1 [[CONFLICT_RDX]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
 ; PRED:       [[VECTOR_PH]]:
 ; PRED-NEXT:    [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
-; PRED-NEXT:    [[TMP6:%.*]] = mul nuw i64 [[TMP5]], 16
+; PRED-NEXT:    [[TMP6:%.*]] = shl nuw i64 [[TMP5]], 4
 ; PRED-NEXT:    [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
 ; PRED-NEXT:    [[TMP8:%.*]] = shl nuw i64 [[TMP7]], 4
 ; PRED-NEXT:    [[TMP9:%.*]] = sub i64 [[TMP0]], [[TMP8]]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll
index 2b294696ebe89..99bbcad95b6de 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll
@@ -21,8 +21,8 @@ define void @sdiv_feeding_gep(ptr %dst, i32 %x, i64 %M, i64 %conv6, i64 %N) {
 ; CHECK-NEXT:    br i1 [[TMP7]], label %[[SCALAR_PH]], label %[[VECTOR_PH:.*]]
 ; CHECK:       [[VECTOR_PH]]:
 ; CHECK-NEXT:    [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP11:%.*]] = mul nuw i64 [[TMP8]], 2
-; CHECK-NEXT:    [[TMP9:%.*]] = mul nuw i64 [[TMP11]], 2
+; CHECK-NEXT:    [[TMP11:%.*]] = shl nuw i64 [[TMP8]], 1
+; CHECK-NEXT:    [[TMP9:%.*]] = shl nuw i64 [[TMP11]], 1
 ; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N]], [[TMP9]]
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
 ; CHECK-NEXT:    [[TMP18:%.*]] = sdiv i64 [[M]], [[CONV6]]
@@ -106,7 +106,7 @@ define void @sdiv_feeding_gep_predicated(ptr %dst, i32 %x, i64 %M, i64 %conv6, i
 ; CHECK-NEXT:    br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
 ; CHECK:       [[VECTOR_PH]]:
 ; CHECK-NEXT:    [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP6:%.*]] = mul nuw i64 [[TMP5]], 2
+; CHECK-NEXT:    [[TMP6:%.*]] = shl nuw i64 [[TMP5]], 1
 ; CHECK-NEXT:    [[TMP10:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[TMP11:%.*]] = shl nuw i64 [[TMP10]], 1
 ; CHECK-NEXT:    [[TMP12:%.*]] = sub i64 [[N]], [[TMP11]]
@@ -220,7 +220,7 @@ define void @udiv_urem_feeding_gep(i64 %x, ptr %dst, i64 %N) {
 ; CHECK-NEXT:    br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
 ; CHECK:       [[VECTOR_PH]]:
 ; CHECK-NEXT:    [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP6:%.*]] = mul nuw i64 [[TMP5]], 2
+; CHECK-NEXT:    [[TMP6:%.*]] = shl nuw i64 [[TMP5]], 1
 ; CHECK-NEXT:    [[TMP10:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[TMP11:%.*]] = shl nuw i64 [[TMP10]], 1
 ; CHECK-NEXT:    [[TMP12:%.*]] = sub i64 [[TMP0]], [[TMP11]]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll b/llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll
index 14c53cd89c922..74598e2063e48 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll
@@ -12,7 +12,7 @@ define void @f1(ptr %A) #0 {
 ; CHECK-NEXT:    br label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
 ; CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 4
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i64 [[TMP0]], 2
 ; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP1]]
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 1024, [[N_MOD_VF]]
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/gather-do-not-vectorize-addressing.ll b/llvm/test/Transforms/LoopVectorize/AArch64/gather-do-not-vectorize-addressing.ll
index 26a9545764091..d18283f831799 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/gather-do-not-vectorize-addressing.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/gather-do-not-vectorize-addressing.ll
@@ -77,7 +77,7 @@ define dso_local double @test(ptr nocapture noundef readonly %data, ptr nocaptur
 ; SVE-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; SVE:       vector.ph:
 ; SVE-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
-; SVE-NEXT:    [[TMP3:%.*]] = mul nuw i64 [[TMP2]], 2
+; SVE-NEXT:    [[TMP3:%.*]] = shl nuw i64 [[TMP2]], 1
 ; SVE-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[WIDE_TRIP_COUNT]], [[TMP3]]
 ; SVE-NEXT:    [[N_VEC:%.*]] = sub i64 [[WIDE_TRIP_COUNT]], [[N_MOD_VF]]
 ; SVE-NEXT:    br label [[VECTOR_BODY:%.*]]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll b/llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll
index 4cacc30a714b6..aa6e4df26d71b 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll
@@ -30,8 +30,8 @@ define void @iv_casts(ptr %dst, ptr %src, i32 %x, i64 %N) #0 {
 ; DEFAULT-NEXT:    br i1 [[MIN_ITERS_CHECK3]], label %[[VEC_EPILOG_PH:.*]], label %[[VECTOR_PH:.*]]
 ; DEFAULT:       [[VECTOR_PH]]:
 ; DEFAULT-NEXT:    [[TMP9:%.*]] = call i64 @llvm.vscale.i64()
-; DEFAULT-NEXT:    [[TMP13:%.*]] = mul nuw i64 [[TMP9]], 8
-; DEFAULT-NEXT:    [[TMP10:%.*]] = mul nuw i64 [[TMP13]], 2
+; DEFAULT-NEXT:    [[TMP13:%.*]] = shl nuw i64 [[TMP9]], 3
+; DEFAULT-NEXT:    [[TMP10:%.*]] = shl nuw i64 [[TMP13]], 1
 ; DEFAULT-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], [[TMP10]]
 ; DEFAULT-NEXT:    [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
 ; DEFAULT-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 8 x i32> poison, i32 [[X]], i64 0
@@ -70,7 +70,7 @@ define void @iv_casts(ptr %dst, ptr %src, i32 %x, i64 %N) #0 {
 ; DEFAULT:       [[VEC_EPILOG_PH]]:
 ; DEFAULT-NEXT:    [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
 ; DEFAULT-NEXT:    [[TMP33:%.*]] = call i64 @llvm.vscale.i64()
-; DEFAULT-NEXT:    [[TMP34:%.*]] = mul nuw i64 [[TMP33]], 4
+; DEFAULT-NEXT:    [[TMP34:%.*]] = shl nuw i64 [[TMP33]], 2
 ; DEFAULT-NEXT:    [[N_MOD_VF5:%.*]] = urem i64 [[TMP0]], [[TMP34]]
 ; DEFAULT-NEXT:    [[N_VEC6:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF5]]
 ; DEFAULT-NEXT:    [[BROADCAST_SPLATINSERT7:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[X]], i64 0
@@ -130,7 +130,7 @@ define void @iv_casts(ptr %dst, ptr %src, i32 %x, i64 %N) #0 {
 ; PRED-NEXT:    br i1 [[DIFF_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
 ; PRED:       [[VECTOR_PH]]:
 ; PRED-NEXT:    [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
-; PRED-NEXT:    [[TMP5:%.*]] = mul nuw i64 [[TMP4]], 16
+; PRED-NEXT:    [[TMP5:%.*]] = shl nuw i64 [[TMP4]], 4
 ; PRED-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 16 x i32> poison, i32 [[X]], i64 0
 ; PRED-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 16 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 16 x i32> poison, <vscale x 16 x i32> zeroinitializer
 ; PRED-NEXT:    [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/interleaving-load-store.ll b/llvm/test/Transforms/LoopVectorize/AArch64/interleaving-load-store.ll
index c2224858049b7..bdc36ee44559e 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/interleaving-load-store.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/interleaving-load-store.ll
@@ -227,8 +227,8 @@ define void @interleave_single_load_store(ptr %src, ptr %dst, i64 %N, i8 %a, i8
 ; INTERLEAVE-4-VLA-NEXT:    br i1 [[MIN_ITERS_CHECK3]], label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; INTERLEAVE-4-VLA:       vector.ph:
 ; INTERLEAVE-4-VLA-NEXT:    [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
-; INTERLEAVE-4-VLA-NEXT:    [[TMP10:%.*]] = mul nuw i64 [[TMP6]], 16
-; INTERLEAVE-4-VLA-NEXT:    [[TMP7:%.*]] = mul nuw i64 [[TMP10]], 4
+; INTERLEAVE-4-VLA-NEXT:    [[TMP10:%.*]] = shl nuw i64 [[TMP6]], 4
+; INTERLEAVE-4-VLA-NEXT:    [[TMP7:%.*]] = shl nuw i64 [[TMP10]], 2
 ; INTERLEAVE-4-VLA-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N]], [[TMP7]]
 ; INTERLEAVE-4-VLA-NEXT:    [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
 ; INTERLEAVE-4-VLA-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 16 x i8> poison, i8 [[B:%.*]], i64 0
@@ -239,7 +239,7 @@ define void @interleave_single_load_store(ptr %src, ptr %dst, i64 %N, i8 %a, i8
 ; INTERLEAVE-4-VLA:       vector.body:
 ; INTERLEAVE-4-VLA-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; INTERLEAVE-4-VLA-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 [[INDEX]]
-; INTERLEAVE-4-VLA-NEXT:    [[TMP13:%.*]] = mul nuw nsw i64 [[TMP10]], 2
+; INTERLEAVE-4-VLA-NEXT:    [[TMP13:%.*]] = shl nuw nsw i64 [[TMP10]], 1
 ; INTERLEAVE-4-VLA-NEXT:    [[TMP16:%.*]] = mul nuw nsw i64 [[TMP10]], 3
 ; INTERLEAVE-4-VLA-NEXT:    [[TMP11:%.*]] = getelementptr inbounds i8, ptr [[TMP8]], i64 [[TMP10]]
 ; INTERLEAVE-4-VLA-NEXT:    [[TMP14:%.*]] = getelementptr inbounds i8, ptr [[TMP8]], i64 [[TMP13]]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/interleaving-reduction.ll b/llvm/test/Transforms/LoopVectorize/AArch64/interleaving-reduction.ll
index 040acb494a42e..fff9365baccb4 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/interleaving-reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/interleaving-reduction.ll
@@ -144,8 +144,8 @@ define i32 @interleave_integer_reduction(ptr %src, i64 %N) {
 ; INTERLEAVE-4-VLA-NEXT:    br i1 [[MIN_ITERS_CHECK1]], label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; INTERLEAVE-4-VLA:       vector.ph:
 ; INTERLEAVE-4-VLA-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
-; INTERLEAVE-4-VLA-NEXT:    [[TMP5:%.*]] = mul nuw i64 [[TMP2]], 4
-; INTERLEAVE-4-VLA-NEXT:    [[TMP3:%.*]] = mul nuw i64 [[TMP5]], 4
+; INTERLEAVE-4-VLA-NEXT:    [[TMP5:%.*]] = shl nuw i64 [[TMP2]], 2
+; INTERLEAVE-4-VLA-NEXT:    [[TMP3:%.*]] = shl nuw i64 [[TMP5]], 2
 ; INTERLEAVE-4-VLA-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N]], [[TMP3]]
 ; INTERLEAVE-4-VLA-NEXT:    [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
 ; INTERLEAVE-4-VLA-NEXT:    br label [[VECTOR_BODY:%.*]]
@@ -156,7 +156,7 @@ define i32 @interleave_integer_reduction(ptr %src, i64 %N) {
 ; INTERLEAVE-4-VLA-NEXT:    [[VEC_PHI3:%.*]] = phi <vscale x 4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP16:%.*]], [[VECTOR_BODY]] ]
 ; INTERLEAVE-4-VLA-NEXT:    [[VEC_PHI4:%.*]] = phi <vscale x 4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP17:%.*]], [[VECTOR_BODY]] ]
 ; INTERLEAVE-4-VLA-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i32, ptr [[SRC:%.*]], i64 [[INDEX]]
-; INTERLEAVE-4-VLA-NEXT:    [[TMP9:%.*]] = mul nuw nsw i64 [[TMP5]], 2
+; INTERLEAVE-4-VLA-NEXT:    [[TMP9:%.*]] = shl nuw nsw i64 [[TMP5]], 1
 ; INTERLEAVE-4-VLA-NEXT:    [[TMP12:%.*]] = mul nuw nsw i64 [[TMP5]], 3
 ; INTERLEAVE-4-VLA-NEXT:    [[TMP7:%.*]] = getelementptr inbounds i32, ptr [[TMP4]], i64 [[TMP5]]
 ; INTERLEAVE-4-VLA-NEXT:    [[TMP10:%.*]] = getelementptr inbounds i32, ptr [[TMP4]], i64 [[TMP9]]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/invalid-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/invalid-costs.ll
index cc3b1c9c9db8a..de5a24666626c 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/invalid-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/invalid-costs.ll
@@ -16,7 +16,7 @@ define void @replicate_sdiv_conditional(ptr noalias %a, ptr noalias %b, ptr noal
 ; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
 ; CHECK:       [[VECTOR_PH]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP3:%.*]] = mul nuw i64 [[TMP2]], 4
+; CHECK-NEXT:    [[TMP3:%.*]] = shl nuw i64 [[TMP2]], 2
 ; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 64, [[TMP3]]
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 64, [[N_MOD_VF]]
 ; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/load-cast-context.ll b/llvm/test/Transforms/LoopVectorize/AArch64/load-cast-context.ll
index 116c017a4bd73..2b0684f3cda01 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/load-cast-context.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/load-cast-context.ll
@@ -73,8 +73,8 @@ define i32 @sext_of_non_memory_op(ptr %src, ...
[truncated]

asb · 2026-01-06T14:04:02Z

I'll leave an actual review to others who have been iterating on this patch before, but just to note that I've tested this on llvm-test-suite built with rva23 with no problems.

fhahn

lgtm, thanks

The patch is buggy.

The mul-to-shl simplification done by a2db31b ([VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr)) has a major bug in a commutative matcher, and made bad test updates as a result. Fix this immediately.

Follow up on a post-commit review of a2db31b ([VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr)) to strengthen a VPlanPatternMatch assert in the case when number of operands mismatch.

github-actions · 2026-01-06T20:33:39Z

🪟 Windows x64 Test Results

129170 tests passed
2843 tests skipped
1 test failed

Failed Tests

(click on a test name to see its output)

LLVM

LLVM.CodeGen/RISCV/riscv-macho.ll

Exit Code: 3221225477

Command Output (stdout):
--
# RUN: at line 1
c:\_work\llvm-project\llvm-project\build\bin\llc.exe -mtriple=riscv32-apple-macho C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\RISCV\riscv-macho.ll -o - | c:\_work\llvm-project\llvm-project\build\bin\filecheck.exe C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\RISCV\riscv-macho.ll
# executed command: 'c:\_work\llvm-project\llvm-project\build\bin\llc.exe' -mtriple=riscv32-apple-macho 'C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\RISCV\riscv-macho.ll' -o -
# note: command had no output on stdout or stderr
# executed command: 'c:\_work\llvm-project\llvm-project\build\bin\filecheck.exe' 'C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\RISCV\riscv-macho.ll'
# note: command had no output on stdout or stderr
# RUN: at line 2
c:\_work\llvm-project\llvm-project\build\bin\llc.exe -mtriple=riscv32-apple-macho -filetype=obj C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\RISCV\riscv-macho.ll -o C:\_work\llvm-project\llvm-project\build\test\CodeGen\RISCV\Output\riscv-macho.ll.tmp.o
# executed command: 'c:\_work\llvm-project\llvm-project\build\bin\llc.exe' -mtriple=riscv32-apple-macho -filetype=obj 'C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\RISCV\riscv-macho.ll' -o 'C:\_work\llvm-project\llvm-project\build\test\CodeGen\RISCV\Output\riscv-macho.ll.tmp.o'
# .---command stderr------------
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
# | Stack dump:
# | 0.	Program arguments: c:\\_work\\llvm-project\\llvm-project\\build\\bin\\llc.exe -mtriple=riscv32-apple-macho -filetype=obj C:\\_work\\llvm-project\\llvm-project\\llvm\\test\\CodeGen\\RISCV\\riscv-macho.ll -o C:\\_work\\llvm-project\\llvm-project\\build\\test\\CodeGen\\RISCV\\Output\\riscv-macho.ll.tmp.o
# | Exception Code: 0xC0000005
# |  #0 0x00007ff60e36fbbc (c:\_work\llvm-project\llvm-project\build\bin\llc.exe+0x2e4fbbc)
# |  #1 0x00007ff60d3a9ee8 (c:\_work\llvm-project\llvm-project\build\bin\llc.exe+0x1e89ee8)
# |  #2 0x00007ff60bd8e7e5 (c:\_work\llvm-project\llvm-project\build\bin\llc.exe+0x86e7e5)
# |  #3 0x00007ff60b9e19d1 (c:\_work\llvm-project\llvm-project\build\bin\llc.exe+0x4c19d1)
# |  #4 0x00007ff60b9d9fe1 (c:\_work\llvm-project\llvm-project\build\bin\llc.exe+0x4b9fe1)
# |  #5 0x00007ff60b527c0c (c:\_work\llvm-project\llvm-project\build\bin\llc.exe+0x7c0c)
# |  #6 0x00007ff60b524e86 (c:\_work\llvm-project\llvm-project\build\bin\llc.exe+0x4e86)
# |  #7 0x00007ff60fb3b114 (c:\_work\llvm-project\llvm-project\build\bin\llc.exe+0x461b114)
# |  #8 0x00007ff807684cb0 (C:\Windows\System32\KERNEL32.DLL+0x14cb0)
# |  #9 0x00007ff8134fedcb (C:\Windows\SYSTEM32\ntdll.dll+0x7edcb)
# `-----------------------------
# error: command failed with exit status: 0xc0000005

--

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

The UDiv fold added in d12e993 (llvm#174581) is currently also applied to replicate regions, which means we may end up with VPInstructions in replicate regions, which is currently nots supported. Fixes llvm#175295.

The UDiv fold added in d12e993 (#174581) is currently also applied to replicate regions, which means we may end up with VPInstructions in replicate regions, which is currently nots supported. Fixes #175295. PR: #175460

The UDiv fold added in d12e993 (llvm#174581) is currently also applied to replicate regions, which means we may end up with VPInstructions in replicate regions, which is currently nots supported. Fixes llvm#175295. PR: llvm#175460

artagnon requested review from asb and fhahn January 6, 2026 12:57

llvmbot added backend:RISC-V backend:SystemZ vectorizers llvm:transforms labels Jan 6, 2026

fhahn approved these changes Jan 6, 2026

View reviewed changes

artagnon added 4 commits January 6, 2026 19:45

[VPlan] Cherry-pick simplify pow-of-2 (mul|udiv) -> (shl|lshr)

4002227

The patch is buggy.

[VPlan] Fix commutative matcher error in mul-to-shl simpl

0cfc2b8

The mul-to-shl simplification done by a2db31b ([VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr)) has a major bug in a commutative matcher, and made bad test updates as a result. Fix this immediately.

[VPlanPatternMatch] Strengthen assert on num-ops mismatch

803803a

Follow up on a post-commit review of a2db31b ([VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr)) to strengthen a VPlanPatternMatch assert in the case when number of operands mismatch.

[VPlan] Use a FileCheck pattern in a VPlan test

67b19b1

artagnon force-pushed the vplan-simpl-muldiv-reland branch from 2884f21 to 67b19b1 Compare January 6, 2026 19:45

artagnon enabled auto-merge (squash) January 6, 2026 19:46

artagnon disabled auto-merge January 6, 2026 20:36

artagnon merged commit d12e993 into llvm:main Jan 6, 2026
9 of 10 checks passed

artagnon deleted the vplan-simpl-muldiv-reland branch January 6, 2026 20:36

fhahn mentioned this pull request Jan 11, 2026

[VPlan] Don't fold UDiv in replicate regions. #175460

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reland [VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr)#174581

Reland [VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr)#174581
artagnon merged 4 commits intollvm:mainfrom
artagnon:vplan-simpl-muldiv-reland

artagnon commented Jan 6, 2026

Uh oh!

llvmbot commented Jan 6, 2026 •

edited

Loading

Uh oh!

asb commented Jan 6, 2026

Uh oh!

fhahn left a comment

Uh oh!

github-actions bot commented Jan 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

artagnon commented Jan 6, 2026

Uh oh!

llvmbot commented Jan 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

asb commented Jan 6, 2026

Uh oh!

fhahn left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Jan 6, 2026

🪟 Windows x64 Test Results

Failed Tests

LLVM

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

llvmbot commented Jan 6, 2026 •

edited

Loading