[VectorCombine] Compact shuffle operands by eliminating unused elements #176074
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-vectorizers

Author: Philip Ginsbach-Chen (ginsbach)

Changes

This patch implements an optimization in VectorCombine that reduces the size of shuffle operands by eliminating elements that are not referenced by the shuffle mask.

Motivation

I came up with this transformation for #137447, to improve the optimization of the following code:

#include <arm_neon.h>

int8x16_t f(int8_t x)
{
return (int8x16_t) { x, 0, x, 1, x, 2, x, 3,
x, 4, x, 5, x, 6, x, 7 };
}
int8x16_t g(int8_t x)
{
return (int8x16_t) { 0, x, 1, x, 2, x, 3, x,
4, x, 5, x, 6, x, 7, x };
}

On main, this generates two vectors that are loaded from the constant pool and then each interleaved with a splat vector:

<16 x i8> <i8 poison, i8 0, i8 poison, i8 1, i8 poison, i8 2, i8 poison, i8 3, i8 poison, i8 4, i8 poison, i8 5, i8 poison, i8 6, i8 poison, i8 7>
<16 x i8> <i8 0, i8 poison, i8 1, i8 poison, i8 2, i8 poison, i8 3, i8 poison, i8 4, i8 poison, i8 5, i8 poison, i8 6, i8 poison, i8 7, i8 poison>

This PR's transformation compacts both to:

<8 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7>

The first two test cases in shufflevec-compact-operands.ll are based on this motivating example.

The primary benefit should be reducing constant pool usage. In this specific case, the constant pool could even be eliminated entirely, as the new vectors could be generated with the index instruction. Note that reaching this ideal will need follow-up work in the backend. With this PR alone, the output for f (and almost exactly the same for g) becomes:

.LCPI0_0:
.byte 0 // 0x0
.byte 1 // 0x1
.byte 2 // 0x2
.byte 3 // 0x3
.byte 4 // 0x4
.byte 5 // 0x5
.byte 6 // 0x6
.byte 7 // 0x7
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.text
.globl f
.p2align 2
.type f,@function
f: // @f
.cfi_startproc
// %bb.0: // %entry
adrp x8, .LCPI0_0
dup v0.16b, w0
ldr q1, [x8, :lo12:.LCPI0_0]
zip1 v0.16b, v0.16b, v1.16b
ret

Implementation

The optimization identifies which elements are actually used from each shuffle operand. For single-use compactable operands (constants or shuffles), it creates narrower vectors containing only the used elements and updates the shuffle mask accordingly.

The transformation uses TTI::getShuffleCost to ensure it doesn't increase the overall cost. It only applies when at least one operand is a constant. Each operand can be compacted if it is either a constant or a single-use shufflevector instruction; other operands are left unchanged.
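To make the intended rewrite concrete, here is a minimal before/after sketch in LLVM IR. This is a hypothetical example written for this description (the names @sketch_before/@sketch_after, the element type, and the mask are not taken from the patch or its tests), and it assumes the cost model allows the fold:

; Before: the mask reads only lanes 0-1 of the single-use splat and lanes 0 and 2
; of the constant, so both operands can be narrowed to <2 x i16>.
define <4 x i16> @sketch_before(<4 x i16> %v) {
  %splat = shufflevector <4 x i16> %v, <4 x i16> poison, <4 x i32> zeroinitializer
  %r = shufflevector <4 x i16> %splat, <4 x i16> <i16 1, i16 poison, i16 2, i16 poison>, <4 x i32> <i32 4, i32 0, i32 6, i32 1>
  ret <4 x i16> %r
}

; After: the constant keeps only its live lanes, the splat is rebuilt at the
; narrower width, and the second-operand mask indices are remapped (4 -> 2, 6 -> 3).
define <4 x i16> @sketch_after(<4 x i16> %v) {
  %splat = shufflevector <4 x i16> %v, <4 x i16> poison, <2 x i32> zeroinitializer
  %r = shufflevector <2 x i16> %splat, <2 x i16> <i16 1, i16 2>, <4 x i32> <i32 2, i32 0, i32 3, i32 1>
  ret <4 x i16> %r
}

An operand with additional uses is left untouched, so its full width becomes the lower bound that the compacted operand is padded back up to.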
Questions for Reviewers

1. Non-power-of-2 vector sizes: The optimization can create vectors like <3 x i32>, and the heuristic does not consider power-of-2 vector sizes preferable in any way. I believe the backend handles this gracefully, but I'd appreciate confirmation.

2. Code structure: I used std::function with move-captured lambdas to defer operand construction, which feels somewhat inelegant. However, alternatives seem worse: we need to analyze both operands before deciding whether to transform, but ShuffleVectorInst is immutable once created, so we can't speculatively create compacted operands. Creating temporary instructions would be wasteful if we don't apply the transformation. Is this lambda-based approach acceptable given these constraints, or is there a cleaner design I'm missing?

Patch is 42.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/176074.diff

9 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index 3e06f74fa5c65..47ebe2ca24340 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -143,6 +143,7 @@ class VectorCombine {
bool foldShufflesOfLengthChangingShuffles(Instruction &I);
bool foldShuffleOfIntrinsics(Instruction &I);
bool foldShuffleToIdentity(Instruction &I);
+ bool compactShuffleOperands(Instruction &I);
bool foldShuffleFromReductions(Instruction &I);
bool foldShuffleChainsToReduce(Instruction &I);
bool foldCastFromReductions(Instruction &I);
@@ -2762,6 +2763,239 @@ bool VectorCombine::foldShuffleOfCastops(Instruction &I) {
return true;
}
+/// Describes whether and how a shuffle operand can be compacted.
+struct ShuffleOperandCompaction {
+ /// The cost difference between compacted and original operand. Used to avoid
+ /// compactions that increase cost. Zero if compaction cannot be applied, but
+ /// note that valid compactions may also have zero cost.
+ InstructionCost Cost;
+ /// The minimal width required for the compacted vector.
+ unsigned CompactedWidth;
+ /// Function to create the compacted operand, or nullptr if no compaction can
+ /// be applied.
+ std::function<Value *(unsigned, IRBuilder<InstSimplifyFolder> &)> Apply;
+};
+
+/// Attempt to narrow/compact a constant vector used in a shuffle by removing
+/// elements that are not referenced by the shuffle mask.
+static ShuffleOperandCompaction
+compactShuffleOperand(Constant *ShuffleInput,
+ MutableArrayRef<int> UserShuffleMask, int IndexStart) {
+ auto *VecTy = cast<FixedVectorType>(ShuffleInput->getType());
+ unsigned Width = VecTy->getNumElements();
+
+ // Collect only the constant elements that are actually used.
+ SmallVector<Constant *, 16> CompactedElts;
+ // Map from original element index to compacted index.
+ SmallVector<int, 16> IndexRemap(Width, -1);
+
+ // Track whether used elements are already compacted at the front. Even if
+ // true, we may still shrink this operand by not re-adding trailing poison.
+ bool AlreadyCompacted = true;
+
+ // This modifies UserShuffleMask, so we cannot back out of transforming the
+ // operand while proceeding with compactShuffleOperands on the instruction.
+ for (int &MaskElt : UserShuffleMask) {
+ if (MaskElt >= IndexStart && MaskElt < IndexStart + (int)Width) {
+ int RelMaskElt = MaskElt - IndexStart;
+ if (IndexRemap[RelMaskElt] < 0) {
+ IndexRemap[RelMaskElt] = CompactedElts.size() + IndexStart;
+ CompactedElts.push_back(ShuffleInput->getAggregateElement(RelMaskElt));
+ }
+ if (IndexRemap[RelMaskElt] != MaskElt) {
+ AlreadyCompacted = false;
+ MaskElt = IndexRemap[RelMaskElt];
+ }
+ }
+ }
+
+ unsigned CompactedWidth = CompactedElts.size();
+
+ // To determine the eventual width (between CompactedWidth and Width), we have
+ // to consider the other operand. Hence, we return a functor here to delay.
+ return {0, CompactedWidth,
+ [ShuffleInput, AlreadyCompacted, Width, VecTy,
+ CompactedElts = std::move(CompactedElts)](
+ unsigned PaddedWidth,
+ IRBuilder<InstSimplifyFolder> &Builder) -> Value * {
+ // Return original if unchanged to guarantee fixpoint termination.
+ if (AlreadyCompacted && Width == PaddedWidth)
+ return ShuffleInput;
+
+ // Pad with poison to reach the requested width.
+ SmallVector<Constant *, 16> PaddedElts(CompactedElts);
+ while (PaddedElts.size() < PaddedWidth)
+ PaddedElts.push_back(PoisonValue::get(VecTy->getElementType()));
+
+ return ConstantVector::get(PaddedElts);
+ }};
+}
+
+/// Attempt to narrow/compact a shuffle instruction used in a shuffle by
+/// removing elements that are not referenced by the shuffle mask.
+static ShuffleOperandCompaction
+compactShuffleOperand(ShuffleVectorInst *ShuffleInput,
+ MutableArrayRef<int> UserShuffleMask, int IndexStart,
+ const TargetTransformInfo &TTI,
+ TTI::TargetCostKind CostKind) {
+ auto *VecTy = cast<FixedVectorType>(ShuffleInput->getType());
+ unsigned Width = VecTy->getNumElements();
+
+ // Collect only the shuffle mask elements that are actually used.
+ SmallVector<int, 16> CompactedMask;
+ // Map from original element index to compacted index.
+ SmallVector<int, 16> IndexRemap(Width, -1);
+
+ // Track whether used elements are already compacted at the front. Even if
+ // true, we may still shrink this operand by not re-adding trailing poison.
+ bool AlreadyCompacted = true;
+
+ // This modifies UserShuffleMask, so we cannot back out of transforming the
+ // operand while proceeding with compactShuffleOperands on the instruction.
+ for (int &MaskElt : UserShuffleMask) {
+ if (MaskElt >= IndexStart && MaskElt < IndexStart + (int)Width) {
+ int RelMaskElt = MaskElt - IndexStart;
+ if (IndexRemap[RelMaskElt] < 0) {
+ IndexRemap[RelMaskElt] = CompactedMask.size() + IndexStart;
+ CompactedMask.push_back(ShuffleInput->getMaskValue(RelMaskElt));
+ }
+ if (IndexRemap[RelMaskElt] != MaskElt) {
+ AlreadyCompacted = false;
+ MaskElt = IndexRemap[RelMaskElt];
+ }
+ }
+ }
+
+ unsigned CompactedWidth = CompactedMask.size();
+
+ // Check if the compacted shuffle would be more expensive than the original.
+ InstructionCost CompactionCost(0);
+ if (!AlreadyCompacted) {
+ ArrayRef<int> OriginalMask = ShuffleInput->getShuffleMask();
+ auto *OriginalSrcTy =
+ cast<FixedVectorType>(ShuffleInput->getOperand(0)->getType());
+
+ InstructionCost OriginalCost =
+ TTI.getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc, VecTy,
+ OriginalSrcTy, OriginalMask, CostKind);
+
+ // Create a type for the compacted shuffle result.
+ auto *CompactedDstTy =
+ FixedVectorType::get(VecTy->getElementType(), CompactedWidth);
+
+ InstructionCost CompactedCost = TTI.getShuffleCost(
+ TargetTransformInfo::SK_PermuteTwoSrc, CompactedDstTy, OriginalSrcTy,
+ CompactedMask, CostKind);
+
+ CompactionCost = CompactedCost - OriginalCost;
+ }
+
+ // To determine the eventual width (between CompactedWidth and Width), we have
+ // to consider the other operand. Hence, we return a functor here to delay.
+ return {CompactionCost, CompactedWidth,
+ [ShuffleInput, AlreadyCompacted, Width,
+ CompactedMask = std::move(CompactedMask)](
+ unsigned PaddedWidth,
+ IRBuilder<InstSimplifyFolder> &Builder) -> Value * {
+ // Return original if unchanged to guarantee fixpoint termination.
+ if (AlreadyCompacted && Width == PaddedWidth)
+ return ShuffleInput;
+
+ // Pad with poison mask elements to reach the requested width.
+ SmallVector<int, 16> PaddedMask(CompactedMask);
+ while (PaddedMask.size() < PaddedWidth)
+ PaddedMask.push_back(PoisonMaskElem);
+
+ return Builder.CreateShuffleVector(ShuffleInput->getOperand(0),
+ ShuffleInput->getOperand(1),
+ PaddedMask);
+ }};
+}
+
+/// Try to narrow/compact a shuffle operand by eliminating elements that are
+/// not used by the shuffle mask. This updates the shuffle mask in-place to
+/// reflect the compaction. Returns information about whether compaction is
+/// possible and a lambda to apply the compaction if beneficial.
+static ShuffleOperandCompaction
+compactShuffleOperand(Value *ShuffleInput, MutableArrayRef<int> ShuffleMask,
+ int IndexStart, const TargetTransformInfo &TTI,
+ TTI::TargetCostKind CostKind) {
+ auto *VecTy = cast<FixedVectorType>(ShuffleInput->getType());
+ unsigned Width = VecTy->getNumElements();
+ if (ShuffleInput->getNumUses() > 1)
+ return {0, Width, nullptr};
+
+ if (auto *C = dyn_cast<Constant>(ShuffleInput))
+ return compactShuffleOperand(C, ShuffleMask, IndexStart);
+ if (auto *Shuf = dyn_cast<ShuffleVectorInst>(ShuffleInput))
+ return compactShuffleOperand(Shuf, ShuffleMask, IndexStart, TTI, CostKind);
+
+ return {0, Width, nullptr};
+}
+
+/// Try to narrow the shuffle by eliminating unused elements from the operands.
+bool VectorCombine::compactShuffleOperands(Instruction &I) {
+ Value *LHS, *RHS;
+ ArrayRef<int> Mask;
+ if (!match(&I, m_Shuffle(m_Value(LHS), m_Value(RHS), m_Mask(Mask))))
+ return false;
+
+ // Require at least one constant operand to ensure profitability.
+ if (!isa<Constant>(LHS) && !isa<Constant>(RHS))
+ return false;
+
+ auto *LHSTy = dyn_cast<FixedVectorType>(LHS->getType());
+ if (!LHSTy)
+ return false;
+
+ // Analyze both operands. This updates NewMask in-place to reflect compaction.
+ unsigned LHSWidth = LHSTy->getNumElements();
+ SmallVector<int, 16> NewMask(Mask.begin(), Mask.end());
+ ShuffleOperandCompaction LHSCompact =
+ compactShuffleOperand(LHS, NewMask, 0, TTI, CostKind);
+ ShuffleOperandCompaction RHSCompact =
+ compactShuffleOperand(RHS, NewMask, LHSWidth, TTI, CostKind);
+
+ unsigned CompactedWidth =
+ std::max(LHSCompact.CompactedWidth, RHSCompact.CompactedWidth);
+
+ // Check total cost: compacting operands + change to outer shuffle.
+ if (LHSCompact.Apply || RHSCompact.Apply) {
+ auto *ShuffleDstTy = cast<FixedVectorType>(I.getType());
+ InstructionCost CostBefore =
+ TTI.getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc, ShuffleDstTy,
+ LHSTy, Mask, CostKind, 0, nullptr, {LHS, RHS}, &I);
+
+ InstructionCost CostAfter =
+ TTI.getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc, ShuffleDstTy,
+ LHSTy, NewMask, CostKind);
+
+ InstructionCost OuterCost = CostAfter - CostBefore;
+
+ if (OuterCost + LHSCompact.Cost + RHSCompact.Cost > 0)
+ return false;
+ } else if (CompactedWidth == LHSWidth)
+ return false;
+
+ Value *NewLHS =
+ LHSCompact.Apply ? LHSCompact.Apply(CompactedWidth, Builder) : LHS;
+ Value *NewRHS =
+ RHSCompact.Apply ? RHSCompact.Apply(CompactedWidth, Builder) : RHS;
+
+ // Ensure we terminate from the optimization fixpoint loop eventually.
+ if (LHS == NewLHS && RHS == NewRHS)
+ return false;
+
+ // Adjust RHS indices in the mask to account for the new LHS width.
+ for (int &MaskElt : NewMask)
+ if (MaskElt >= (int)LHSWidth)
+ MaskElt = MaskElt - LHSWidth + CompactedWidth;
+
+ Value *NewShuf = Builder.CreateShuffleVector(NewLHS, NewRHS, NewMask);
+ replaceValue(I, *NewShuf);
+ return true;
+}
+
/// Try to convert any of:
/// "shuffle (shuffle x, y), (shuffle y, x)"
/// "shuffle (shuffle x, undef), (shuffle y, undef)"
@@ -5034,6 +5268,8 @@ bool VectorCombine::run() {
return true;
if (foldShuffleToIdentity(I))
return true;
+ if (compactShuffleOperands(I))
+ return true;
break;
case Instruction::Load:
if (shrinkLoadForShuffles(I))
diff --git a/llvm/test/Transforms/PhaseOrdering/X86/addsub.ll b/llvm/test/Transforms/PhaseOrdering/X86/addsub.ll
index de64bf2657f72..e3c1318278d38 100644
--- a/llvm/test/Transforms/PhaseOrdering/X86/addsub.ll
+++ b/llvm/test/Transforms/PhaseOrdering/X86/addsub.ll
@@ -334,8 +334,7 @@ define <4 x float> @test_addsub_v4f32_partial_23(<4 x float> %A, <4 x float> %B)
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
; CHECK-NEXT: [[TMP3:%.*]] = fsub <2 x float> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x float> [[TMP1]], [[TMP2]]
-; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> [[TMP4]], <4 x i32> <i32 0, i32 3, i32 poison, i32 poison>
-; CHECK-NEXT: [[VECINSERT21:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> <float undef, float undef, float poison, float poison>, <4 x i32> <i32 4, i32 5, i32 0, i32 1>
+; CHECK-NEXT: [[VECINSERT21:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> [[TMP4]], <4 x i32> <i32 poison, i32 poison, i32 0, i32 3>
; CHECK-NEXT: ret <4 x float> [[VECINSERT21]]
;
%1 = extractelement <4 x float> %A, i32 2
@@ -344,7 +343,7 @@ define <4 x float> @test_addsub_v4f32_partial_23(<4 x float> %A, <4 x float> %B)
%3 = extractelement <4 x float> %A, i32 3
%4 = extractelement <4 x float> %B, i32 3
%add2 = fadd float %3, %4
- %vecinsert1 = insertelement <4 x float> undef, float %sub2, i32 2
+ %vecinsert1 = insertelement <4 x float> poison, float %sub2, i32 2
%vecinsert2 = insertelement <4 x float> %vecinsert1, float %add2, i32 3
ret <4 x float> %vecinsert2
}
@@ -353,8 +352,7 @@ define <4 x float> @test_addsub_v4f32_partial_03(<4 x float> %A, <4 x float> %B)
; CHECK-LABEL: @test_addsub_v4f32_partial_03(
; CHECK-NEXT: [[FOLDEXTEXTBINOP:%.*]] = fsub <4 x float> [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[FOLDEXTEXTBINOP2:%.*]] = fadd <4 x float> [[A]], [[B]]
-; CHECK-NEXT: [[VECINSERT1:%.*]] = shufflevector <4 x float> [[FOLDEXTEXTBINOP]], <4 x float> <float poison, float undef, float undef, float poison>, <4 x i32> <i32 0, i32 5, i32 6, i32 poison>
-; CHECK-NEXT: [[VECINSERT2:%.*]] = shufflevector <4 x float> [[VECINSERT1]], <4 x float> [[FOLDEXTEXTBINOP2]], <4 x i32> <i32 0, i32 1, i32 2, i32 7>
+; CHECK-NEXT: [[VECINSERT2:%.*]] = shufflevector <4 x float> [[FOLDEXTEXTBINOP]], <4 x float> [[FOLDEXTEXTBINOP2]], <4 x i32> <i32 0, i32 poison, i32 poison, i32 7>
; CHECK-NEXT: ret <4 x float> [[VECINSERT2]]
;
%1 = extractelement <4 x float> %A, i32 0
@@ -363,7 +361,7 @@ define <4 x float> @test_addsub_v4f32_partial_03(<4 x float> %A, <4 x float> %B)
%3 = extractelement <4 x float> %A, i32 3
%4 = extractelement <4 x float> %B, i32 3
%add = fadd float %4, %3
- %vecinsert1 = insertelement <4 x float> undef, float %sub, i32 0
+ %vecinsert1 = insertelement <4 x float> poison, float %sub, i32 0
%vecinsert2 = insertelement <4 x float> %vecinsert1, float %add, i32 3
ret <4 x float> %vecinsert2
}
@@ -374,8 +372,7 @@ define <4 x float> @test_addsub_v4f32_partial_12(<4 x float> %A, <4 x float> %B)
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x float> [[TMP1]], [[TMP2]]
-; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> [[TMP4]], <4 x i32> <i32 0, i32 3, i32 poison, i32 poison>
-; CHECK-NEXT: [[VECINSERT21:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> <float undef, float poison, float poison, float undef>, <4 x i32> <i32 4, i32 0, i32 1, i32 7>
+; CHECK-NEXT: [[VECINSERT21:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> [[TMP4]], <4 x i32> <i32 poison, i32 0, i32 3, i32 poison>
; CHECK-NEXT: ret <4 x float> [[VECINSERT21]]
;
%1 = extractelement <4 x float> %A, i32 2
@@ -384,7 +381,7 @@ define <4 x float> @test_addsub_v4f32_partial_12(<4 x float> %A, <4 x float> %B)
%3 = extractelement <4 x float> %A, i32 1
%4 = extractelement <4 x float> %B, i32 1
%add = fadd float %3, %4
- %vecinsert1 = insertelement <4 x float> undef, float %sub, i32 2
+ %vecinsert1 = insertelement <4 x float> poison, float %sub, i32 2
%vecinsert2 = insertelement <4 x float> %vecinsert1, float %add, i32 1
ret <4 x float> %vecinsert2
}
diff --git a/llvm/test/Transforms/PhaseOrdering/X86/fmaddsub.ll b/llvm/test/Transforms/PhaseOrdering/X86/fmaddsub.ll
index c5f56d3644c5f..6370e9ccb50db 100644
--- a/llvm/test/Transforms/PhaseOrdering/X86/fmaddsub.ll
+++ b/llvm/test/Transforms/PhaseOrdering/X86/fmaddsub.ll
@@ -419,11 +419,11 @@ define <8 x double> @buildvector_mul_addsub_pd512_partial(<8 x double> %C, <8 x
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x double> [[TMP3]], <8 x double> poison, <2 x i32> <i32 1, i32 3>
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[TMP2]], <4 x double> poison, <6 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison>
; SSE-NEXT: [[TMP6:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> poison, <6 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison>
-; SSE-NEXT: [[TMP7:%.*]] = shufflevector <6 x double> [[TMP5]], <6 x double> [[TMP6]], <6 x i32> <i32 0, i32 1, i32 2, i32 3, i32 6, i32 7>
; SSE-NEXT: [[A7:%.*]] = extractelement <8 x double> [[A]], i64 7
; SSE-NEXT: [[B7:%.*]] = extractelement <8 x double> [[B]], i64 7
; SSE-NEXT: [[ADD7:%.*]] = fadd double [[A7]], [[B7]]
-; SSE-NEXT: [[TMP8:%.*]] = shufflevector <6 x double> [[TMP7]], <6 x double> <double undef, double poison, double poison, double poison, double poison, double poison>, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 poison>
+; SSE-NEXT: [[TMP7:%.*]] = shufflevector <6 x double> [[TMP5]], <6 x double> [[TMP6]], <6 x i32> <i32 0, i32 6, i32 1, i32 7, i32 2, i32 3>
+; SSE-NEXT: [[TMP8:%.*]] = shufflevector <6 x double> [[TMP7]], <6 x double> <double undef, double poison, double poison, double poison, double poison, double poison>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 6, i32 5, i32 poison>
; SSE-NEXT: [[VECINSERT8:%.*]] = insertelement <8 x double> [[TMP8]], double [[ADD7]], i64 7
; SSE-NEXT: ret <8 x double> [[VECINSERT8]]
;
@@ -934,11 +934,11 @@ define <8 x double> @buildvector_mul_subadd_pd512_partial(<8 x double> %C, <8 x
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x double> [[TMP3]], <8 x double> poison, <2 x i32> <i32 1, i32 3>
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[TMP2]], <4 x double> poison, <6 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison>
; SSE-NEXT: [[TMP6:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> poison, <6 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison>
-; SSE-NEXT: [[TMP7:%.*]] = shufflevector <6 x double> [[TMP5]], <6 x double> [[TMP6]], <6 x i32> <i32 0, i32 1, i32 2, i32 3, i32 6, i32 7>
; SSE-NEXT: [[A7:%.*]] = extractelement <8 x double> [[A]], i64 7
; SSE-NEXT: [[B7:%.*]] = extractelement <8 x double> [[B]], i64 7
; SSE-NEXT: [[ADD7:%.*]] = fsub double [[A7]], [[B7]]
-; SSE-NEXT: [[TMP8:%.*]] = shufflevector <6 x double> [[TMP7]], <6 x double> <double undef, double poison, double poison, double poison, double poison, double poison>, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 poison>
+; SSE-NEXT: [[TMP7:%.*]] = shufflevector <6 x double> [[TMP5]], <6 x double> [[TMP6]], <6 x i32> <i32 0, i32 6, i32 1, i32 7, i32 2, i32 3>
+; SSE-NEXT: [[TMP8:%.*]] = shufflevector <6 x double> [[TMP7]], <6 x double> <double undef, double poison, double poison, double poison, double poison, double poison>, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 6, i32 5, i32 poison>
; SSE-NEXT: [[VECINSERT8:%.*]] = insertelement <8 x double> [[TMP8]], double [[ADD7]], i64 7
; SSE-NEXT: ret <8 x double> [[VECINSERT8]]
;
diff --git a/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll b/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll
index 7ffd0d29b4f05..5de2bb6515e15 100644
--- a/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll
+++ b/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll
@@ -1026,9 +1026,8 @@ define <4 x i64> @bitcast_smax_v8i32_v4i32(<4 x i64> %a, <4 x i64> %b) {
define void @bitcast_srcty_mismatch() {
; CHECK-LABEL: @bitcast_srcty_mismatch(
; CHECK-NEXT: entry:
-; CHECK-NEXT: [[SHUFFLE_I_I:%.*]] = shufflevector <2 x i64> zeroinitializer, <2 x i64> zeroinitializer, <2 x i32> <i32 1, i32 3>
; CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> zeroinitializer to <4 x float>
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i64> [[SHUFFLE_I_I]] to <4 x float>
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i64> zeroinitializer to <4 x float>
; CHECK-NEXT: [[SHUFP_I196:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> [[TMP1]], <4 x i32> <i32 2, i32 1, i32 4, i32 7>
; CHECK-NEXT: store <4 x float> [[SHUFP_I196]], ptr null, align 16
; CHECK-NEXT: ret void
@@ -1064,8 +1063,8 @@ entry:
define <16 x...
[truncated]