Conversation

@ginsbach
Contributor

This patch extends BuildVectorSDNode::isConstantSequence to recognize constant sequences that contain undef elements at any position.

The new implementation finds the first two non-undef constant elements, computes the stride from their difference, then verifies all other defined elements match the sequence. This enables SVE's INDEX instruction to be used in more cases.

This change particularly benefits ZIP1/ZIP2 patterns where one operand is a constant sequence. When a smaller constant vector like <0, 1, 2, 3> is used in a ZIP1 shuffle producing a wider result, it gets expanded with trailing undefs. Similarly, for ZIP2 patterns, the DAG combiner transforms the constant to have leading undefs since ZIP2 only uses the upper half of its operands.

In particular, these patterns arise naturally from VectorCombine's compactShuffleOperands optimization (see #176074), which I am proposing as a fix for #137447.

@github-actions

github-actions bot commented Jan 18, 2026

✅ With the latest revision this PR passed the undef deprecator.

@ginsbach ginsbach force-pushed the improve-isConstantSequence branch from decb831 to 3df3bce on January 18, 2026 at 22:20
@ginsbach ginsbach marked this pull request as ready for review January 18, 2026 23:11
@llvmbot llvmbot added the backend:AArch64 and llvm:SelectionDAG labels Jan 18, 2026
@llvmbot
Member

llvmbot commented Jan 18, 2026

@llvm/pr-subscribers-backend-aarch64

Author: Philip Ginsbach-Chen (ginsbach)

Changes

This patch extends BuildVectorSDNode::isConstantSequence to recognize constant sequences that contain undef elements at any position.

The new implementation finds the first two non-undef constant elements, computes the stride from their difference, then verifies all other defined elements match the sequence. This enables SVE's INDEX instruction to be used in more cases.

This change particularly benefits ZIP1/ZIP2 patterns where one operand is a constant sequence. When a smaller constant vector like <0, 1, 2, 3> is used in a ZIP1 shuffle producing a wider result, it gets expanded with trailing undefs. Similarly, for ZIP2 patterns, the DAG combiner transforms the constant to have leading undefs since ZIP2 only uses the upper half of its operands.

In particular, these patterns arise naturally from VectorCombine's compactShuffleOperands optimization (see #176074), which I am proposing as a fix for #137447.


Full diff: https://github.com/llvm/llvm-project/pull/176671.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+34-14)
  • (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-build-vector.ll (+125)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 35e443b40c41f..74cc8fb2ed366 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -14037,26 +14037,46 @@ BuildVectorSDNode::isConstantSequence() const {
   if (NumOps < 2)
     return std::nullopt;
 
-  if (!isa<ConstantSDNode>(getOperand(0)) ||
-      !isa<ConstantSDNode>(getOperand(1)))
-    return std::nullopt;
-
   unsigned EltSize = getValueType(0).getScalarSizeInBits();
-  APInt Start = getConstantOperandAPInt(0).trunc(EltSize);
-  APInt Stride = getConstantOperandAPInt(1).trunc(EltSize) - Start;
-
-  if (Stride.isZero())
-    return std::nullopt;
+  APInt Start, Stride;
+  int FirstIdx = -1, SecondIdx = -1;
 
-  for (unsigned i = 2; i < NumOps; ++i) {
-    if (!isa<ConstantSDNode>(getOperand(i)))
+  // Find the first two non-undef constant elements to determine Start and
+  // Stride, then verify all remaining elements match the sequence.
+  for (unsigned I = 0; I < NumOps; ++I) {
+    SDValue Op = getOperand(I);
+    if (Op->isUndef())
+      continue;
+    if (!isa<ConstantSDNode>(Op))
       return std::nullopt;
 
-    APInt Val = getConstantOperandAPInt(i).trunc(EltSize);
-    if (Val != (Start + (Stride * i)))
-      return std::nullopt;
+    APInt Val = getConstantOperandAPInt(I).trunc(EltSize);
+    if (FirstIdx < 0) {
+      FirstIdx = I;
+      Start = Val;
+    } else if (SecondIdx < 0) {
+      SecondIdx = I;
+      // Compute stride based on the difference between the two elements.
+      unsigned IdxDiff = I - FirstIdx;
+      APInt ValDiff = Val - Start;
+      if (ValDiff.srem(IdxDiff) != 0)
+        return std::nullopt;
+      Stride = ValDiff.sdiv(IdxDiff);
+      if (Stride.isZero())
+        return std::nullopt;
+      // Adjust Start based on the first defined element's index.
+      Start -= Stride * FirstIdx;
+    } else {
+      // Verify this element matches the sequence.
+      if (Val != Start + Stride * I)
+        return std::nullopt;
+    }
   }
 
+  // Need at least two defined elements.
+  if (SecondIdx < 0)
+    return std::nullopt;
+
   return std::make_pair(Start, Stride);
 }
 
diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-build-vector.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-build-vector.ll
index 47fda39d84001..09f6693373642 100644
--- a/llvm/test/CodeGen/AArch64/sve-fixed-length-build-vector.ll
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-build-vector.ll
@@ -65,4 +65,129 @@ define void @build_vector_no_stride_v4i64(ptr %a) #0 {
   ret void
 }
 
+; Sequence with trailing poison elements.
+define void @build_vector_trailing_poison_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_trailing_poison_v8i32:
+; VBITS_GE_256:       // %bb.0:
+; VBITS_GE_256-NEXT:    index z0.s, #0, #3
+; VBITS_GE_256-NEXT:    ptrue p0.s, vl8
+; VBITS_GE_256-NEXT:    st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT:    ret
+  store <8 x i32> <i32 0, i32 3, i32 6, i32 9, i32 12, i32 15, i32 poison, i32 poison>, ptr %a, align 4
+  ret void
+}
+
+; Sequence with leading poison elements.
+define void @build_vector_leading_poison_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_leading_poison_v8i32:
+; VBITS_GE_256:       // %bb.0:
+; VBITS_GE_256-NEXT:    index z0.s, #0, #3
+; VBITS_GE_256-NEXT:    ptrue p0.s, vl8
+; VBITS_GE_256-NEXT:    st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT:    ret
+  store <8 x i32> <i32 poison, i32 poison, i32 6, i32 9, i32 12, i32 15, i32 18, i32 21>, ptr %a, align 4
+  ret void
+}
+
+; Sequence with poison elements in the middle.
+define void @build_vector_middle_poison_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_middle_poison_v8i32:
+; VBITS_GE_256:       // %bb.0:
+; VBITS_GE_256-NEXT:    index z0.s, #0, #3
+; VBITS_GE_256-NEXT:    ptrue p0.s, vl8
+; VBITS_GE_256-NEXT:    st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT:    ret
+  store <8 x i32> <i32 0, i32 3, i32 poison, i32 poison, i32 12, i32 15, i32 18, i32 21>, ptr %a, align 4
+  ret void
+}
+
+; Sequence with poison elements scattered throughout.
+define void @build_vector_scattered_poison_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_scattered_poison_v8i32:
+; VBITS_GE_256:       // %bb.0:
+; VBITS_GE_256-NEXT:    index z0.s, #0, #3
+; VBITS_GE_256-NEXT:    ptrue p0.s, vl8
+; VBITS_GE_256-NEXT:    st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT:    ret
+  store <8 x i32> <i32 poison, i32 3, i32 poison, i32 9, i32 poison, i32 15, i32 poison, i32 21>, ptr %a, align 4
+  ret void
+}
+
+; Sequence with only two defined elements (minimum required).
+define void @build_vector_two_defined_v4i64(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_two_defined_v4i64:
+; VBITS_GE_256:       // %bb.0:
+; VBITS_GE_256-NEXT:    index z0.d, #5, #7
+; VBITS_GE_256-NEXT:    ptrue p0.d, vl4
+; VBITS_GE_256-NEXT:    st1d { z0.d }, p0, [x0]
+; VBITS_GE_256-NEXT:    ret
+  store <4 x i64> <i64 poison, i64 12, i64 poison, i64 26>, ptr %a, align 8
+  ret void
+}
+
+; Sequence with negative stride and poison elements.
+define void @build_vector_neg_stride_poison_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_neg_stride_poison_v8i32:
+; VBITS_GE_256:       // %bb.0:
+; VBITS_GE_256-NEXT:    index z0.s, #0, #-2
+; VBITS_GE_256-NEXT:    ptrue p0.s, vl8
+; VBITS_GE_256-NEXT:    st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT:    ret
+  store <8 x i32> <i32 poison, i32 -2, i32 -4, i32 poison, i32 -8, i32 -10, i32 poison, i32 -14>, ptr %a, align 4
+  ret void
+}
+
+; Only one defined element - cannot determine stride, so no index instruction.
+define void @build_vector_single_defined_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_single_defined_v8i32:
+; VBITS_GE_256:       // %bb.0:
+; VBITS_GE_256-NEXT:    mov z0.s, #42 // =0x2a
+; VBITS_GE_256-NEXT:    ptrue p0.s, vl8
+; VBITS_GE_256-NEXT:    st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT:    ret
+  store <8 x i32> <i32 poison, i32 poison, i32 poison, i32 42, i32 poison, i32 poison, i32 poison, i32 poison>, ptr %a, align 4
+  ret void
+}
+
+; Fractional stride: elements at indices 1 and 3 differ by 3, so stride would be 3/2.
+define void @build_vector_fractional_stride_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_fractional_stride_v8i32:
+; VBITS_GE_256:       // %bb.0:
+; VBITS_GE_256-NEXT:    ptrue p0.s, vl8
+; VBITS_GE_256-NEXT:    adrp x8, .LCPI12_0
+; VBITS_GE_256-NEXT:    add x8, x8, :lo12:.LCPI12_0
+; VBITS_GE_256-NEXT:    ld1w { z0.s }, p0/z, [x8]
+; VBITS_GE_256-NEXT:    st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT:    ret
+  store <8 x i32> <i32 poison, i32 0, i32 poison, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>, ptr %a, align 4
+  ret void
+}
+
+; zip1 pattern: constant <0, 1, 2, 3> is expanded to <0, 1, 2, 3, poison, poison, poison, poison>
+; to match the shuffle result width. isConstantSequence recognizes this as a sequence.
+define <8 x i8> @zip_const_seq_with_variable(i8 %x) #0 {
+; VBITS_GE_256-LABEL: zip_const_seq_with_variable:
+; VBITS_GE_256:       // %bb.0:
+; VBITS_GE_256-NEXT:    index z0.b, #0, #1
+; VBITS_GE_256-NEXT:    dup v1.8b, w0
+; VBITS_GE_256-NEXT:    zip1 v0.8b, v0.8b, v1.8b
+; VBITS_GE_256-NEXT:    ret
+  %ins = insertelement <4 x i8> poison, i8 %x, i32 0
+  %splat = shufflevector <4 x i8> %ins, <4 x i8> poison, <4 x i32> zeroinitializer
+  %interleave = shufflevector <4 x i8> <i8 0, i8 1, i8 2, i8 3>, <4 x i8> %splat, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
+  ret <8 x i8> %interleave
+}
+
+; zip2 pattern: constant <0, 1, 2, 3, 4, 5, 6, 7> is transformed by the DAG combiner to
+; <poison, poison, poison, poison, 4, 5, 6, 7> since zip2 only uses elements 4-7.
+define <8 x i8> @zip2_const_seq_with_variable(<8 x i8> %x) #0 {
+; VBITS_GE_256-LABEL: zip2_const_seq_with_variable:
+; VBITS_GE_256:       // %bb.0:
+; VBITS_GE_256-NEXT:    index z1.b, #0, #1
+; VBITS_GE_256-NEXT:    zip2 v0.8b, v1.8b, v0.8b
+; VBITS_GE_256-NEXT:    ret
+  %interleave = shufflevector <8 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7>, <8 x i8> %x, <8 x i32> <i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
+  ret <8 x i8> %interleave
+}
+
 attributes #0 = { "target-features"="+sve" }

@llvmbot
Member

llvmbot commented Jan 18, 2026

@llvm/pr-subscribers-llvm-selectiondag

@ginsbach ginsbach force-pushed the improve-isConstantSequence branch from 3df3bce to 90b1b95 on January 18, 2026 at 23:14
Comment on lines 14064 to 14066
Stride = ValDiff.sdiv(IdxDiff);
if (Stride.isZero())
return std::nullopt;
Contributor

APInt has divrem (but can you avoid division here?)
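
For reference, a hedged sketch of what the divrem variant could look like, reusing the names from the patch (illustrative only, not a proposed diff; APInt::sdivrem computes quotient and remainder in one call):

```cpp
// Sketch: replace the separate srem/sdiv pair with one sdivrem call.
// ValDiff, IdxDiff, EltSize, and Stride are the names used in the patch.
APInt Q(EltSize, 0), R(EltSize, 0);
APInt::sdivrem(ValDiff, APInt(EltSize, IdxDiff), Q, R);
if (!R.isZero()) // IdxDiff must divide ValDiff exactly for a stride to exist
  return std::nullopt;
Stride = Q;
```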

Contributor Author

As detailed below, I have changed the implementation to use modular arithmetic with APInt::multiplicativeInverse(), based on @jayfoad's observation.

@RKSimon RKSimon self-requested a review January 19, 2026 07:29
@jayfoad
Contributor

jayfoad commented Jan 19, 2026

This introduces interesting questions about signed or unsigned overflow during an arithmetic sequence. If we want to ignore overflow (assume 2s complement wrap around) then all of these are possible:

i8 <0, undef, undef, 0xFF> -> <0, 0x55>
i8 <0, undef, undef, 0xFE> -> <0, 0xAA>
i8 <0, undef, undef, 0xFD> -> <0, 0xFF>

Your patch uses sdiv/srem so I think it will handle the last case only. If you used udiv/urem you could handle the first case only.
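
For concreteness, the wrap-around claims can be checked directly; this is a standalone illustration (not part of the patch), with all arithmetic mod 2^8:

```cpp
// Each candidate stride reproduces the last element after three steps.
static_assert((3 * 0x55) % 256 == 0xFF, "stride 0x55: found by udiv only");
static_assert((3 * 0xAA) % 256 == 0xFE, "stride 0xAA: found by neither sdiv nor udiv");
static_assert((3 * 0xFF) % 256 == 0xFD, "stride 0xFF (i.e. -1): found by sdiv only");
```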

@github-actions

github-actions bot commented Jan 19, 2026

✅ With the latest revision this PR passed the C/C++ code formatter.

@ginsbach
Contributor Author

> This introduces interesting questions about signed or unsigned overflow during an arithmetic sequence. If we want to ignore overflow (assume 2s complement wrap around) then all of these are possible:
>
> i8 <0, undef, undef, 0xFF> -> <0, 0x55>
> i8 <0, undef, undef, 0xFE> -> <0, 0xAA>
> i8 <0, undef, undef, 0xFD> -> <0, 0xFF>
>
> Your patch uses sdiv/srem so I think it will handle the last case only. If you used udiv/urem you could handle the first case only.

Good point! I've pushed a second commit that uses modular arithmetic with APInt::multiplicativeInverse() to handle all valid cases.

The approach: to solve Stride × IdxDiff ≡ ValDiff (mod 2^n), we factor out common powers of 2 from both sides, then use the multiplicative inverse (which exists for odd numbers mod 2^n). The only rejection case is when ValDiff has fewer trailing zeros than IdxDiff, meaning no modular solution exists.

This is appropriate because the target (SVE's INDEX instruction) uses modular wraparound arithmetic, and modular arithmetic is agnostic to signed/unsigned interpretation. I've updated the docstring for isConstantSequence() to document this behavior.

The modular inverse approach adds some complexity compared to simple division, but it makes the implementation mathematically complete: isConstantSequence now recognizes every valid arithmetic sequence mod 2^n, with no edge cases left on the table. I think that might be worth the extra complexity.
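
A minimal sketch of that solve, assuming the common power of two is factored out of IdxDiff before inverting (solveStride and its signature are illustrative names, not the actual patch):

```cpp
#include "llvm/ADT/APInt.h"
#include <optional>
using llvm::APInt;

// Solve Stride * IdxDiff == ValDiff (mod 2^EltSize) for one valid Stride.
std::optional<APInt> solveStride(const APInt &ValDiff, const APInt &IdxDiff) {
  unsigned IdxZeros = IdxDiff.countr_zero();
  // No modular solution exists if ValDiff has fewer trailing zeros.
  if (ValDiff.countr_zero() < IdxZeros)
    return std::nullopt;
  // Strip the common power of two; the remaining index difference is odd
  // and therefore invertible mod 2^EltSize.
  APInt OddIdx = IdxDiff.lshr(IdxZeros);
  return ValDiff.lshr(IdxZeros) * OddIdx.multiplicativeInverse();
}
```

Note this returns one of 2^IdxZeros valid strides (they differ only in the top IdxZeros bits), which is exactly the multiplicity the later commit-4 discussion deals with.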

@ginsbach
Contributor Author

@jayfoad are you happy with the updated version of this PR using modular arithmetic?

@ginsbach ginsbach requested a review from arsenm January 27, 2026 21:45
Contributor

@jayfoad jayfoad left a comment

Makes sense to me.

For completeness note that there are still more cases that could be handled e.g.:

i8 <0, undef, 0x80, 0x40> -> {0, 0xC0}

When the first two defined elements give an index difference with
trailing zeros, there are multiple valid strides (mod 2^EltSize).
Previously we only tried one; now we try all candidates until one
matches all elements or we exhaust the space.

Example: <0, poison, 0x80, 0x40> has IdxDiff=2, giving 2 candidate
strides. The first (0x40) fails at index 3, but the second (0xC0) works.
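
A sketch of that enumeration under the names quoted in the review below (SmallestStride, CommonPow2Bits, StrideInc); the loop shape and the matchesAllElements helper are assumptions, not the actual commit:

```cpp
// 2^CommonPow2Bits candidate strides, spaced 2^(EltSize - CommonPow2Bits)
// apart, all satisfy Stride * IdxDiff == ValDiff (mod 2^EltSize).
APInt StrideInc = APInt::getOneBitSet(EltSize, EltSize - CommonPow2Bits);
APInt Candidate = SmallestStride;
for (uint64_t K = 0; K != (uint64_t(1) << CommonPow2Bits); ++K) {
  if (matchesAllElements(Candidate)) // hypothetical verification helper
    return Candidate;
  Candidate += StrideInc;
}
return std::nullopt; // no candidate matched every defined element
```
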
@ginsbach
Contributor Author

> For completeness note that there are still more cases that could be handled e.g.:
>
> i8 <0, undef, 0x80, 0x40> -> {0, 0xC0}

Great catch!

I've fixed that in commit 4. The complexity is getting a bit out of hand for what was supposed to be a simple method, but I think it's hard to justify the modular arithmetic if it then isn't even complete. Let me know what you think.

@ginsbach ginsbach requested a review from jayfoad January 29, 2026 10:51
@jayfoad
Contributor

jayfoad commented Jan 29, 2026

> > For completeness note that there are still more cases that could be handled e.g.:
> > i8 <0, undef, 0x80, 0x40> -> {0, 0xC0}
>
> Great catch!
>
> I've fixed that in commit 4. The complexity is getting a bit out of hand for what was supposed to be a simple method, but I think it's hard to justify the modular arithmetic if it then isn't even complete. Let me know what you think.

I have to say I am not too keen on commit 4. I'll add some more specific comments, but one option is just to ditch it and leave a TODO for now.

Stride = SmallestStride;

// Step 3: Adjust Start based on the first defined element's index.
Start -= Stride * FirstIdx;
Contributor

Nit: this is confusing for readers, since it changes the meaning of Start depending on whether SecondIdx < 0 or not.

Start -= StrideInc * FirstIdx;
} while (Val != Start + Stride * I);
// All strides produce the same value at SecondIdx, so skip re-checking.
I = SecondIdx;
Contributor

Redoing all the checks from SecondIdx + 1 feels expensive and I think it can be avoided. Every time you accept a third-or-later value, you could update StrideInc based on the minimum number of trailing zeros of any of the values you have accepted so far.

// Verify this element matches the sequence, trying alternative strides.
if (Val == Start + Stride * I)
continue;
do {
Contributor

Looping over all possible strides feels expensive and messy. As an alternative, could you redo the multiplicativeInverse calculation from above and check that the result is compatible with the previous result (i.e. differs only in high order bits that you don't care about)?

@jayfoad
Contributor

jayfoad commented Jan 29, 2026

Here's another half-baked idea for reimplementing commit 4. Take it with a pinch of salt.

First, find an arbitrary defined element at FirstIdx. Then iterate over all the other elements in a specific order, based on the number of trailing zeros in the "IdxDiff" value vs FirstIdx:

  • First all the ones with IdxDiff odd, (IdxDiff % 2) == 1.
  • Then all the ones with (IdxDiff % 4) == 2.
  • Etc

The idea is that the ones you visit first will always have the most restrictions on the high bits of the resulting Stride.
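
One possible reading of that ordering, sketched with assumed surrounding names (FirstIdx, NumOps, getOperand) and, per the comment above, very much to be taken with a pinch of salt:

```cpp
// Visit defined elements ordered by the trailing zeros of their index
// distance from FirstIdx: odd distances first, then (mod 4) == 2, etc.
llvm::SmallVector<unsigned, 16> Order;
for (unsigned I = 0; I != NumOps; ++I)
  if (I != FirstIdx && !getOperand(I)->isUndef())
    Order.push_back(I);
llvm::stable_sort(Order, [&](unsigned A, unsigned B) {
  return llvm::countr_zero(A - FirstIdx) < llvm::countr_zero(B - FirstIdx);
});
```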


// There are 2^CommonPow2Bits valid strides, evenly spaced by StrideInc.
// Valid strides: {SmallestStride + k*StrideInc | k = 0, 1, ...}
StrideInc = APInt(EltSize, 1) << (EltSize - CommonPow2Bits);
Contributor

I think there's an APInt helper to create masks like this
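
Presumably the helper meant here is APInt::getOneBitSet, since StrideInc has a single set bit rather than a run of mask bits (my assumption of the intended helper):

```cpp
// Equivalent to APInt(EltSize, 1) << (EltSize - CommonPow2Bits):
StrideInc = APInt::getOneBitSet(EltSize, EltSize - CommonPow2Bits);
```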
