
Conversation

@ginsbach
Contributor

@ginsbach ginsbach commented Jan 9, 2026

Disclaimer: This is my first contribution to InstCombine. This addresses a specific problem (#137447), and I've learned a lot implementing this, but I'm open to suggestions for improvement or even an entirely different approach.


This patch implements an optimization in InstCombine that reduces the size of shuffle operands by eliminating elements that are not referenced by the shuffle mask.

Motivation

I came up with this transformation for #137447, to improve the optimization of the following code:

#include <arm_neon.h>

int8x16_t f(int8_t x)
{
  return (int8x16_t) { x, 0, x, 1, x, 2, x, 3,
                       x, 4, x, 5, x, 6, x, 7 };
}

int8x16_t g(int8_t x)
{
  return (int8x16_t) { 0, x, 1, x, 2, x, 3, x,
                       4, x, 5, x, 6, x, 7, x };
}

On main, this generates two constant vectors that are loaded from the constant pool and then each interleaved with a splat vector:

<16 x i8> <i8 poison, i8 0, i8 poison, i8 1, i8 poison, i8 2, i8 poison, i8 3, i8 poison, i8 4, i8 poison, i8 5, i8 poison, i8 6, i8 poison, i8 7>
<16 x i8> <i8 0, i8 poison, i8 1, i8 poison, i8 2, i8 poison, i8 3, i8 poison, i8 4, i8 poison, i8 5, i8 poison, i8 6, i8 poison, i8 7, i8 poison>

This PR's transformation compacts both to:

<8 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7>

The first two test cases in shufflevec-compact-operands.ll are based on this motivating example.

The primary benefit should be reducing constant pool usage. In this specific case, the constant pool could even be eliminated entirely, as the new vectors could be generated with the index instruction.

Note that reaching this ideal will require follow-up work in the backend. With this PR alone, the output for f (g is almost identical) becomes:

.LCPI0_0:
	.byte	0                               // 0x0
	.byte	1                               // 0x1
	.byte	2                               // 0x2
	.byte	3                               // 0x3
	.byte	4                               // 0x4
	.byte	5                               // 0x5
	.byte	6                               // 0x6
	.byte	7                               // 0x7
	.zero	1
	.zero	1
	.zero	1
	.zero	1
	.zero	1
	.zero	1
	.zero	1
	.zero	1
	.text
	.globl	f
	.p2align	2
	.type	f,@function
f:                                      // @f
	.cfi_startproc
// %bb.0:                               // %entry
	adrp	x8, .LCPI0_0
	dup	v0.16b, w0
	ldr	q1, [x8, :lo12:.LCPI0_0]
	zip1	v0.16b, v0.16b, v1.16b
	ret

Implementation

The optimization identifies which elements are actually used from each shuffle operand. For single-use compactable operands (constants or shuffles), it creates narrower vectors containing only the used elements and updates the shuffle mask accordingly.

Conservative heuristic: The transformation only applies when 1) at least one operand is a constant; 2) all operands are constants or shufflevectors; and 3) both operands have no other uses. The heuristic is target-independent.
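
For concreteness, here is a hand-constructed before/after pair (not taken from the patch; value names are illustrative). The user shuffle reads only element 0 of the splat and elements 0 and 2 of the constant, so both operands shrink from width 4 to width 2 and the mask is remapped:

; before: both operands are <4 x i8>, but only one element of the splat
; and two elements of the constant are referenced
%s = shufflevector <4 x i8> %a, <4 x i8> poison, <4 x i32> zeroinitializer
%r = shufflevector <4 x i8> %s, <4 x i8> <i8 10, i8 20, i8 30, i8 40>, <4 x i32> <i32 0, i32 4, i32 0, i32 6>

; after: both operands are compacted to <2 x i8> (the splat's mask is padded
; with a poison element) and the RHS indices in the user mask are rebased
%s2 = shufflevector <4 x i8> %a, <4 x i8> poison, <2 x i32> <i32 0, i32 poison>
%r2 = shufflevector <2 x i8> %s2, <2 x i8> <i8 10, i8 30>, <4 x i32> <i32 0, i32 2, i32 0, i32 3>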

Questions for Reviewers

  1. Heuristics: The main concern is that the transformation can destroy regularity in the shuffle masks. The current heuristic entirely avoids cases where both operands are themselves shufflevector instructions, which should eliminate the most problematic cases. Nonetheless, I could incorporate TTI::getShuffleCost to avoid regressions. My understanding is that using the cost model is discouraged in InstCombine, but I could move this to VectorCombine or elsewhere if that seems a better fit.

  2. Non-power-of-2 vector sizes: The optimization can create vectors like <3 x i32>, and the heuristic does not prefer power-of-2 vector sizes in any way. I believe the backend handles this gracefully, but I'd appreciate confirmation.

  3. Code structure 1: I used std::function with move-captured lambdas to defer operand construction, which feels somewhat inelegant. However, alternatives seem worse: we need to analyze both operands before deciding whether to transform, but ShuffleVectorInst is immutable once created, so we can't speculatively create compacted operands. Creating temporary instructions would be wasteful if we don't apply the transformation. Is this lambda-based approach acceptable given these constraints, or is there a cleaner design I'm missing?

  4. Code structure 2: It was a bit awkward to insert tryCompactShuffleOperands at the end of visitShuffleVectorInst. I wanted to ensure it never disrupts the existing transformations in InstCombineVectorOps, but I could not rely on the fixpoint iteration to reach my transformation if I inserted it as a single call at the bottom, so I ended up with three call sites. I suppose I could pull the entire existing code below bool MadeChange = false; out into a helper function to avoid this duplication, but that seemed distracting for review.

@github-actions

github-actions bot commented Jan 9, 2026

✅ With the latest revision this PR passed the undef deprecator.

@ginsbach ginsbach marked this pull request as ready for review January 10, 2026 10:07
@ginsbach ginsbach requested a review from nikic as a code owner January 10, 2026 10:07
@llvmbot llvmbot added backend:AMDGPU llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels Jan 10, 2026
@llvmbot
Member

llvmbot commented Jan 10, 2026

@llvm/pr-subscribers-llvm-transforms

Author: Philip Ginsbach-Chen (ginsbach)

Patch is 36.71 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/175255.diff

10 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp (+142-3)
  • (modified) llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll (+6-12)
  • (modified) llvm/test/Transforms/InstCombine/insert-extract-shuffle-inseltpoison.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/insert-extract-shuffle.ll (+2-2)
  • (added) llvm/test/Transforms/InstCombine/shufflevec-compact-operands.ll (+180)
  • (modified) llvm/test/Transforms/InstCombine/vec_shuffle-inseltpoison.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/vec_shuffle.ll (+2-2)
  • (modified) llvm/test/Transforms/PhaseOrdering/X86/addsub.ll (+4-6)
  • (modified) llvm/test/Transforms/PhaseOrdering/X86/hadd.ll (+2-2)
  • (modified) llvm/test/Transforms/PhaseOrdering/X86/hsub.ll (+2-2)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
index 3b034f6c37f66..a5a859ed57445 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
@@ -2893,6 +2893,145 @@ Instruction *InstCombinerImpl::simplifyBinOpSplats(ShuffleVectorInst &SVI) {
   return new ShuffleVectorInst(NewBO, SVI.getShuffleMask());
 }
 
+/// Describes whether and how a shuffle operand can be compacted.
+struct ShuffleOperandCompaction {
+  /// Whether this operand can be compacted (has a single use and is either
+  /// a constant or another shuffle instruction).
+  bool CanCompact;
+  /// Conservative heuristic: whether this operand's compaction justifies
+  /// the overall transformation (true for constants; false for shuffles).
+  bool ShouldCompact;
+  /// The minimal width required for the compacted vector.
+  unsigned CompactedWidth;
+  /// Function to create the compacted operand if the transformation applies.
+  std::function<Value *(unsigned, InstCombiner::BuilderTy &)> Apply;
+};
+
+/// Attempt to narrow/compact a constant vector used in a shuffle by removing
+/// elements that are not referenced by the shuffle mask.
+static ShuffleOperandCompaction
+compactShuffleOperand(Constant *ShuffleInput,
+                      MutableArrayRef<int> UserShuffleMask, int IndexStart) {
+  auto *VecTy = cast<FixedVectorType>(ShuffleInput->getType());
+  unsigned Width = VecTy->getNumElements();
+
+  // Collect only the constant elements that are actually used.
+  SmallVector<Constant *, 16> CompactedElts;
+  // Map from original element index to compacted index.
+  SmallVector<int, 16> IndexRemap(Width, -1);
+
+  for (int &MaskElt : UserShuffleMask) {
+    if (MaskElt >= IndexStart && MaskElt < IndexStart + (int)Width) {
+      int RelMaskElt = MaskElt - IndexStart;
+      if (IndexRemap[RelMaskElt] < 0) {
+        IndexRemap[RelMaskElt] = CompactedElts.size() + IndexStart;
+        CompactedElts.push_back(ShuffleInput->getAggregateElement(RelMaskElt));
+      }
+      MaskElt = IndexRemap[RelMaskElt];
+    }
+  }
+
+  return {true, true, static_cast<unsigned>(CompactedElts.size()),
+          [CompactedElts = std::move(CompactedElts),
+           VecTy](unsigned PaddedWidth,
+                  InstCombiner::BuilderTy &Builder) -> Value * {
+            // Pad with poison to reach the requested width.
+            SmallVector<Constant *, 16> PaddedElts(CompactedElts);
+            while (PaddedElts.size() < PaddedWidth)
+              PaddedElts.push_back(PoisonValue::get(VecTy->getElementType()));
+
+            return ConstantVector::get(PaddedElts);
+          }};
+}
+
+/// Attempt to narrow/compact a shuffle instruction used in a shuffle by
+/// removing elements that are not referenced by the shuffle mask.
+static ShuffleOperandCompaction
+compactShuffleOperand(ShuffleVectorInst *ShuffleInput,
+                      MutableArrayRef<int> UserShuffleMask, int IndexStart) {
+  auto *VecTy = cast<FixedVectorType>(ShuffleInput->getType());
+  unsigned Width = VecTy->getNumElements();
+
+  // Collect only the shuffle mask elements that are actually used.
+  SmallVector<int, 16> CompactedMask;
+  // Map from original element index to compacted index.
+  SmallVector<int, 16> IndexRemap(Width, -1);
+
+  for (int &MaskElt : UserShuffleMask) {
+    if (MaskElt >= IndexStart && MaskElt < IndexStart + (int)Width) {
+      int RelMaskElt = MaskElt - IndexStart;
+      if (IndexRemap[RelMaskElt] < 0) {
+        IndexRemap[RelMaskElt] = CompactedMask.size() + IndexStart;
+        CompactedMask.push_back(ShuffleInput->getMaskValue(RelMaskElt));
+      }
+      MaskElt = IndexRemap[RelMaskElt];
+    }
+  }
+
+  return {true, false, static_cast<unsigned>(CompactedMask.size()),
+          [CompactedMask = std::move(CompactedMask),
+           ShuffleInput](unsigned PaddedWidth,
+                         InstCombiner::BuilderTy &Builder) -> Value * {
+            // Pad with poison mask elements to reach the requested width.
+            SmallVector<int, 16> PaddedMask(CompactedMask);
+            while (PaddedMask.size() < PaddedWidth)
+              PaddedMask.push_back(PoisonMaskElem);
+
+            return Builder.CreateShuffleVector(ShuffleInput->getOperand(0),
+                                               ShuffleInput->getOperand(1),
+                                               PaddedMask);
+          }};
+}
+
+/// Try to narrow/compact a shuffle operand by eliminating elements that are
+/// not used by the shuffle mask. This updates the shuffle mask in-place to
+/// reflect the compaction. Returns information about whether compaction is
+/// possible and a lambda to apply the compaction if beneficial.
+static ShuffleOperandCompaction
+compactShuffleOperand(Value *ShuffleInput, MutableArrayRef<int> ShuffleMask,
+                      int IndexStart) {
+  if (ShuffleInput->getNumUses() > 1)
+    return {false, false, 0, nullptr};
+
+  if (auto *C = dyn_cast<Constant>(ShuffleInput))
+    return compactShuffleOperand(C, ShuffleMask, IndexStart);
+  if (auto *Shuf = dyn_cast<ShuffleVectorInst>(ShuffleInput))
+    return compactShuffleOperand(Shuf, ShuffleMask, IndexStart);
+
+  return {false, false, 0, nullptr};
+}
+
+/// Try to narrow the shuffle by eliminating unused elements from the operands.
+static Instruction *tryCompactShuffleOperands(ShuffleVectorInst &SVI,
+                                              InstCombinerImpl &IC) {
+  Value *LHS = SVI.getOperand(0);
+  Value *RHS = SVI.getOperand(1);
+  ArrayRef<int> Mask = SVI.getShuffleMask();
+  unsigned LHSWidth = cast<FixedVectorType>(LHS->getType())->getNumElements();
+
+  SmallVector<int, 16> NewMask(Mask.begin(), Mask.end());
+  ShuffleOperandCompaction LHSCompact = compactShuffleOperand(LHS, NewMask, 0);
+  ShuffleOperandCompaction RHSCompact =
+      compactShuffleOperand(RHS, NewMask, LHSWidth);
+  if (LHSCompact.CanCompact && RHSCompact.CanCompact &&
+      (LHSCompact.ShouldCompact || RHSCompact.ShouldCompact)) {
+    unsigned CompactWidth =
+        std::max(LHSCompact.CompactedWidth, RHSCompact.CompactedWidth);
+    if (CompactWidth < LHSWidth) {
+      IC.replaceOperand(SVI, 0, LHSCompact.Apply(CompactWidth, IC.Builder));
+      IC.replaceOperand(SVI, 1, RHSCompact.Apply(CompactWidth, IC.Builder));
+      // Adjust RHS indices in the mask to account for the new LHS width.
+      for (int &MaskElt : NewMask)
+        if (MaskElt >= (int)LHSWidth)
+          MaskElt = MaskElt - LHSWidth + CompactWidth;
+      SVI.setShuffleMask(NewMask);
+      return &SVI;
+    }
+  }
+
+  return nullptr;
+}
+
 Instruction *InstCombinerImpl::visitShuffleVectorInst(ShuffleVectorInst &SVI) {
   Value *LHS = SVI.getOperand(0);
   Value *RHS = SVI.getOperand(1);
@@ -3172,7 +3311,7 @@ Instruction *InstCombinerImpl::visitShuffleVectorInst(ShuffleVectorInst &SVI) {
     if (!match(RHSShuffle->getOperand(1), m_Poison()))
       RHSShuffle = nullptr;
   if (!LHSShuffle && !RHSShuffle)
-    return MadeChange ? &SVI : nullptr;
+    return MadeChange ? &SVI : tryCompactShuffleOperands(SVI, *this);
 
   Value* LHSOp0 = nullptr;
   Value* LHSOp1 = nullptr;
@@ -3212,7 +3351,7 @@ Instruction *InstCombinerImpl::visitShuffleVectorInst(ShuffleVectorInst &SVI) {
   }
 
   if (newLHS == LHS && newRHS == RHS)
-    return MadeChange ? &SVI : nullptr;
+    return MadeChange ? &SVI : tryCompactShuffleOperands(SVI, *this);
 
   ArrayRef<int> LHSMask;
   ArrayRef<int> RHSMask;
@@ -3294,5 +3433,5 @@ Instruction *InstCombinerImpl::visitShuffleVectorInst(ShuffleVectorInst &SVI) {
     return new ShuffleVectorInst(newLHS, newRHS, newMask);
   }
 
-  return MadeChange ? &SVI : nullptr;
+  return MadeChange ? &SVI : tryCompactShuffleOperands(SVI, *this);
 }
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll b/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll
index 056caabb6d60a..818a83fcc4103 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll
@@ -115,8 +115,7 @@ define <2 x i16> @extract_elt32_v4i16_readfirstlane(<4 x i16> %src) {
 ; CHECK-SAME: <4 x i16> [[SRC:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <4 x i16> [[SRC]], <4 x i16> poison, <2 x i32> <i32 2, i32 3>
 ; CHECK-NEXT:    [[TMP2:%.*]] = call <2 x i16> @llvm.amdgcn.readfirstlane.v2i16(<2 x i16> [[TMP1]])
-; CHECK-NEXT:    [[VEC:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> poison, <4 x i32> <i32 poison, i32 poison, i32 0, i32 1>
-; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[VEC]], <4 x i16> poison, <2 x i32> <i32 3, i32 2>
+; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> poison, <2 x i32> <i32 1, i32 0>
 ; CHECK-NEXT:    ret <2 x i16> [[SHUFFLE]]
 ;
   %vec = call <4 x i16> @llvm.amdgcn.readfirstlane.v4i16(<4 x i16> %src)
@@ -287,8 +286,7 @@ define <2 x i32> @extract_elt13_v4i32_readfirstlane(<4 x i32> %src) {
 ; CHECK-SAME: <4 x i32> [[SRC:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <4 x i32> [[SRC]], <4 x i32> poison, <3 x i32> <i32 1, i32 poison, i32 3>
 ; CHECK-NEXT:    [[TMP2:%.*]] = call <3 x i32> @llvm.amdgcn.readfirstlane.v3i32(<3 x i32> [[TMP1]])
-; CHECK-NEXT:    [[VEC:%.*]] = shufflevector <3 x i32> [[TMP2]], <3 x i32> poison, <4 x i32> <i32 poison, i32 0, i32 poison, i32 2>
-; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[VEC]], <4 x i32> poison, <2 x i32> <i32 1, i32 3>
+; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <3 x i32> [[TMP2]], <3 x i32> poison, <2 x i32> <i32 0, i32 2>
 ; CHECK-NEXT:    ret <2 x i32> [[SHUFFLE]]
 ;
   %vec = call <4 x i32> @llvm.amdgcn.readfirstlane.v4i32(<4 x i32> %src)
@@ -328,8 +326,7 @@ define < 2 x i32> @extract_elt13_v4i32_readfirstlane_source_simplify1(i32 %src0,
 ; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[SRC0]], i64 0
 ; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <3 x i32> <i32 0, i32 poison, i32 0>
 ; CHECK-NEXT:    [[TMP3:%.*]] = call <3 x i32> @llvm.amdgcn.readfirstlane.v3i32(<3 x i32> [[TMP2]])
-; CHECK-NEXT:    [[VEC:%.*]] = shufflevector <3 x i32> [[TMP3]], <3 x i32> poison, <4 x i32> <i32 poison, i32 0, i32 poison, i32 2>
-; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[VEC]], <4 x i32> poison, <2 x i32> <i32 1, i32 3>
+; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <3 x i32> [[TMP3]], <3 x i32> poison, <2 x i32> <i32 0, i32 2>
 ; CHECK-NEXT:    ret <2 x i32> [[SHUFFLE]]
 ;
   %ins.0 = insertelement <4 x i32> poison, i32 %src0, i32 1
@@ -372,8 +369,7 @@ define < 2 x i32> @extract_elt13_v4i32_readfirstlane_source_simplify1_convergenc
 ; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[SRC0]], i64 0
 ; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <3 x i32> <i32 0, i32 poison, i32 0>
 ; CHECK-NEXT:    [[TMP3:%.*]] = call <3 x i32> @llvm.amdgcn.readfirstlane.v3i32(<3 x i32> [[TMP2]]) [ "convergencectrl"(token [[T]]) ]
-; CHECK-NEXT:    [[VEC:%.*]] = shufflevector <3 x i32> [[TMP3]], <3 x i32> poison, <4 x i32> <i32 poison, i32 0, i32 poison, i32 2>
-; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[VEC]], <4 x i32> poison, <2 x i32> <i32 1, i32 3>
+; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <3 x i32> [[TMP3]], <3 x i32> poison, <2 x i32> <i32 0, i32 2>
 ; CHECK-NEXT:    ret <2 x i32> [[SHUFFLE]]
 ;
   %t = call token @llvm.experimental.convergence.entry()
@@ -413,8 +409,7 @@ define <2 x i32> @extract_elt13_v8i32_readfirstlane(<8 x i32> %src) {
 ; CHECK-SAME: <8 x i32> [[SRC:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <8 x i32> [[SRC]], <8 x i32> poison, <3 x i32> <i32 1, i32 poison, i32 3>
 ; CHECK-NEXT:    [[TMP2:%.*]] = call <3 x i32> @llvm.amdgcn.readfirstlane.v3i32(<3 x i32> [[TMP1]])
-; CHECK-NEXT:    [[VEC:%.*]] = shufflevector <3 x i32> [[TMP2]], <3 x i32> poison, <8 x i32> <i32 poison, i32 0, i32 poison, i32 2, i32 poison, i32 poison, i32 poison, i32 poison>
-; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[VEC]], <8 x i32> poison, <2 x i32> <i32 1, i32 3>
+; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <3 x i32> [[TMP2]], <3 x i32> poison, <2 x i32> <i32 0, i32 2>
 ; CHECK-NEXT:    ret <2 x i32> [[SHUFFLE]]
 ;
   %vec = call <8 x i32> @llvm.amdgcn.readfirstlane.v8i32(<8 x i32> %src)
@@ -439,8 +434,7 @@ define <3 x i32> @extract_elt124_v8i32_readfirstlane(<8 x i32> %src) {
 ; CHECK-SAME: <8 x i32> [[SRC:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <8 x i32> [[SRC]], <8 x i32> poison, <4 x i32> <i32 1, i32 2, i32 poison, i32 4>
 ; CHECK-NEXT:    [[TMP2:%.*]] = call <4 x i32> @llvm.amdgcn.readfirstlane.v4i32(<4 x i32> [[TMP1]])
-; CHECK-NEXT:    [[VEC:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <8 x i32> <i32 poison, i32 0, i32 1, i32 poison, i32 3, i32 poison, i32 poison, i32 poison>
-; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[VEC]], <8 x i32> poison, <3 x i32> <i32 1, i32 2, i32 4>
+; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <3 x i32> <i32 0, i32 1, i32 3>
 ; CHECK-NEXT:    ret <3 x i32> [[SHUFFLE]]
 ;
   %vec = call <8 x i32> @llvm.amdgcn.readfirstlane.v8i32(<8 x i32> %src)
diff --git a/llvm/test/Transforms/InstCombine/insert-extract-shuffle-inseltpoison.ll b/llvm/test/Transforms/InstCombine/insert-extract-shuffle-inseltpoison.ll
index 8bc915e695aa7..5dba85b0b2452 100644
--- a/llvm/test/Transforms/InstCombine/insert-extract-shuffle-inseltpoison.ll
+++ b/llvm/test/Transforms/InstCombine/insert-extract-shuffle-inseltpoison.ll
@@ -86,8 +86,8 @@ define <8 x float> @widen_extract4(<8 x float> %ins, <2 x float> %ext) {
 
 define <8 x i16> @pr26015(<4 x i16> %t0) {
 ; CHECK-LABEL: @pr26015(
-; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <4 x i16> [[T0:%.*]], <4 x i16> poison, <8 x i32> <i32 poison, i32 poison, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
-; CHECK-NEXT:    [[T5:%.*]] = shufflevector <8 x i16> <i16 0, i16 0, i16 0, i16 poison, i16 0, i16 0, i16 0, i16 poison>, <8 x i16> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 10, i32 4, i32 5, i32 6, i32 11>
+; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <4 x i16> [[T0:%.*]], <4 x i16> poison, <6 x i32> <i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[T5:%.*]] = shufflevector <6 x i16> zeroinitializer, <6 x i16> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 6, i32 3, i32 4, i32 5, i32 7>
 ; CHECK-NEXT:    ret <8 x i16> [[T5]]
 ;
   %t1 = extractelement <4 x i16> %t0, i32 2
diff --git a/llvm/test/Transforms/InstCombine/insert-extract-shuffle.ll b/llvm/test/Transforms/InstCombine/insert-extract-shuffle.ll
index 470d6be88672b..5d98e20cb7b01 100644
--- a/llvm/test/Transforms/InstCombine/insert-extract-shuffle.ll
+++ b/llvm/test/Transforms/InstCombine/insert-extract-shuffle.ll
@@ -86,8 +86,8 @@ define <8 x float> @widen_extract4(<8 x float> %ins, <2 x float> %ext) {
 
 define <8 x i16> @pr26015(<4 x i16> %t0) {
 ; CHECK-LABEL: @pr26015(
-; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <4 x i16> [[T0:%.*]], <4 x i16> poison, <8 x i32> <i32 poison, i32 poison, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
-; CHECK-NEXT:    [[T5:%.*]] = shufflevector <8 x i16> <i16 0, i16 0, i16 0, i16 poison, i16 0, i16 0, i16 0, i16 poison>, <8 x i16> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 10, i32 4, i32 5, i32 6, i32 11>
+; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <4 x i16> [[T0:%.*]], <4 x i16> poison, <6 x i32> <i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[T5:%.*]] = shufflevector <6 x i16> zeroinitializer, <6 x i16> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 6, i32 3, i32 4, i32 5, i32 7>
 ; CHECK-NEXT:    ret <8 x i16> [[T5]]
 ;
   %t1 = extractelement <4 x i16> %t0, i32 2
diff --git a/llvm/test/Transforms/InstCombine/shufflevec-compact-operands.ll b/llvm/test/Transforms/InstCombine/shufflevec-compact-operands.ll
new file mode 100644
index 0000000000000..d9c6b31c1f072
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/shufflevec-compact-operands.ll
@@ -0,0 +1,180 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -passes=instcombine -S %s | FileCheck %s
+
+; Interleaving splat shuffle with constant operand - SHOULD compact
+define <8 x i8> @interleave_splat_constant(i8 %x) {
+; CHECK-LABEL: @interleave_splat_constant(
+; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <4 x i8> poison, i8 [[X:%.*]], i64 0
+; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT:    [[TMP3:%.*]] = shufflevector <4 x i8> [[TMP2]], <4 x i8> <i8 0, i8 1, i8 2, i8 3>, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
+; CHECK-NEXT:    ret <8 x i8> [[TMP3]]
+;
+  %1 = insertelement <4 x i8> poison, i8 %x, i32 0
+  %2 = shufflevector <4 x i8> %1, <4 x i8> poison, <4 x i32> zeroinitializer
+  %3 = shufflevector <4 x i8> %2, <4 x i8> poison, <8 x i32> <i32 0, i32 poison, i32 1, i32 poison, i32 2, i32 poison, i32 3, i32 poison>
+  %4 = shufflevector <8 x i8> <i8 poison, i8 0, i8 poison, i8 1, i8 poison, i8 2, i8 poison, i8 3>, <8 x i8> %3, <8 x i32> <i32 8, i32 1, i32 10, i32 3, i32 12, i32 5, i32 14, i32 7>
+  ret <8 x i8> %4
+}
+
+; Interleaving constant with splat shuffle operand - SHOULD compact
+define <8 x i8> @interleave_constant_splat(i8 %x) {
+; CHECK-LABEL: @interleave_constant_splat(
+; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <4 x i8> poison, i8 [[X:%.*]], i64 0
+; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT:    [[TMP3:%.*]] = shufflevector <4 x i8> <i8 0, i8 1, i8 2, i8 3>, <4 x i8> [[TMP2]], <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
+; CHECK-NEXT:    ret <8 x i8> [[TMP3]]
+;
+  %1 = insertelement <4 x i8> poison, i8 %x, i32 0
+  %2 = shufflevector <4 x i8> %1, <4 x i8> poison, <4 x i32> zeroinitializer
+  %3 = shufflevector <4 x i8> %2, <4 x i8> poison, <8 x i32> <i32 0, i32 poison, i32 1, i32 poison, i32 2, i32 poison, i32 3, i32 poison>
+  %4 = shufflevector <8 x i8> <i8 0, i8 poison, i8 1, i8 poison, i8 2, i8 poison, i8 3, i8 poison>, <8 x i8> %3, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
+  ret <8 x i8> %4
+}
+
+; Interleaving random shuffle with constant operand - SHOULD compact
+define <8 x i8> @interleave_shuffle_constant(<4 x i8> %x, <4 x i8> %y) {
+; CHECK-LABEL: @interleave_shuffle_constant(
+; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <4 x i8> [[X:%.*]], <4 x i8> [[Y:%.*]], <4 x i32> <i32 7, i32 1, i32 3, i32 2>
+; CHECK-NEXT:    [[TMP3:%.*]] = shufflevector <4 x i8> [[TMP2]], <4 x i8> <i8 0, i8 1, i8 2, i8 3>, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
+; CHECK-NEXT:    ret <8 x i8> [[TMP3]]
+;
+  %1 = shufflevector <4 x i8> %x, <4 x i8> %y, <8 x i32> <i32 7, i32 4, i32 1, i32 6, i32 3, i32 0, i32 2, i32 5>
+  %2 = shufflevector <8 x i8> %1, <8 x i8> <i8 0, i8 9, i8 1, i8 9, i8 2, i8 9, i8 3, i8 9>, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
+  ret <8 x i8> %2
+}
+
+; Interleaving constant with random shuffle - SHOULD compact
+define <8 x i8> @interleave_constant_shuffle(<4 x i8> %x, <4 x i8> %y) {
+; CHECK-LABEL: @interleave_constant_shuffle(
+; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <4 x i8> [[X:%.*]], <4 x i8> [[Y:%.*]], <4 x i32> <i32 7, i32 1, i32 3, i32 2>
+; CHECK-NEXT:    [[TMP3:%.*]] = shufflevector <4 x i8> <i8 0, i8 1, i8 2, i8 3>, <4 x i8> [[TMP1]], <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
+; CHECK-NEXT:    ret <8 x i8> [[TMP3]]
+;
+  %1 = shufflevector <4 x i8> %x, <4 x i8> %y, <8 x i32> <i32 7, i32 4, i32 1, i32 6, i32 3, i32 0, i32 2, i32 5>
+  %2 = shufflevector <8 x i8> <i8 0, i8 9, i8 1, i8 9, i8 2, i8 9, i8 3, i8 9>, <8 x i8> %1, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
+  ret <8 x i8> %2
+}
+
+; Randomly shuffle random shuffle with constant operand - SHOULD compact
+define <8 x i8> @shuffle_shuffle_constant(<4 x i8> %x, <4 x i8> %y) {
+; CHECK-LABEL: @shuffle_shuffle_constant(
+; CHECK-NE...
[truncated]

@llvmbot
Member

llvmbot commented Jan 10, 2026

@llvm/pr-subscribers-backend-amdgpu


Member

I'd suggest implementing this in VectorCombine, as the vector length is changed. In VectorCombine, you can use the cost model for a better heuristic.
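
For illustration, such a cost gate might look roughly like the sketch below (hand-written here, not part of the patch; OldVecTy, NewVecTy, OldMask, NewMask, and CostKind are placeholder names, and the exact getShuffleCost signature differs between LLVM versions):

// Hypothetical sketch: only narrow the shuffle if the target's cost model
// says the narrowed two-source shuffle is no more expensive than the
// original, wider one.
InstructionCost OldCost = TTI.getShuffleCost(
    TargetTransformInfo::SK_PermuteTwoSrc, OldVecTy, OldMask, CostKind);
InstructionCost NewCost = TTI.getShuffleCost(
    TargetTransformInfo::SK_PermuteTwoSrc, NewVecTy, NewMask, CostKind);
if (NewCost > OldCost)
  return false; // keep the original shuffle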

Collaborator

+1

@nikic nikic requested a review from RKSimon January 10, 2026 14:27
@nikic
Contributor

nikic commented Jan 10, 2026

Pretty sure this needs to go in VectorCombine. Supported shuffles are target-specific, and "simplifying" to a non-native shuffle can be very expensive.

@RKSimon
Collaborator

RKSimon commented Jan 10, 2026

VectorCombine has foldShuffleOfShuffles which might be extendable to handle this.

Collaborator

@RKSimon RKSimon left a comment

This needs to be moved to VectorCombine - anything but the most trivial shuffle combines needs to be cost-driven.

@ginsbach
Contributor Author

Thank you all for the feedback. I have implemented the transformation in VectorCombine as suggested: #176074

@ginsbach ginsbach closed this Jan 15, 2026