[SelectionDAG] Handle undef at any position in isConstantSequence #176671
Conversation
✅ With the latest revision this PR passed the undef deprecator.
decb831 to 3df3bce
@llvm/pr-subscribers-backend-aarch64

Author: Philip Ginsbach-Chen (ginsbach)

Changes

This patch extends `BuildVectorSDNode::isConstantSequence` to recognize constant sequences that contain undef elements at any position.

The new implementation finds the first two non-undef constant elements, computes the stride from their difference, then verifies all other defined elements match the sequence. This enables SVE's INDEX instruction to be used in more cases.

This change particularly benefits ZIP1/ZIP2 patterns where one operand is a constant sequence. When a smaller constant vector like `<0, 1, 2, 3>` is used in a ZIP1 shuffle producing a wider result, it gets expanded with trailing undefs. Similarly, for ZIP2 patterns, the DAG combiner transforms the constant to have leading undefs since ZIP2 only uses the upper half of its operands.

In particular, these patterns arise naturally from `VectorCombine`'s `compactShuffleOperands` optimization (see #176074) that I am suggesting as a fix for #137447.

Full diff: https://github.com/llvm/llvm-project/pull/176671.diff

2 Files Affected:
- llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
- llvm/test/CodeGen/AArch64/sve-fixed-length-build-vector.ll
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 35e443b40c41f..74cc8fb2ed366 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -14037,26 +14037,46 @@ BuildVectorSDNode::isConstantSequence() const {
if (NumOps < 2)
return std::nullopt;
- if (!isa<ConstantSDNode>(getOperand(0)) ||
- !isa<ConstantSDNode>(getOperand(1)))
- return std::nullopt;
-
unsigned EltSize = getValueType(0).getScalarSizeInBits();
- APInt Start = getConstantOperandAPInt(0).trunc(EltSize);
- APInt Stride = getConstantOperandAPInt(1).trunc(EltSize) - Start;
-
- if (Stride.isZero())
- return std::nullopt;
+ APInt Start, Stride;
+ int FirstIdx = -1, SecondIdx = -1;
- for (unsigned i = 2; i < NumOps; ++i) {
- if (!isa<ConstantSDNode>(getOperand(i)))
+ // Find the first two non-undef constant elements to determine Start and
+ // Stride, then verify all remaining elements match the sequence.
+ for (unsigned I = 0; I < NumOps; ++I) {
+ SDValue Op = getOperand(I);
+ if (Op->isUndef())
+ continue;
+ if (!isa<ConstantSDNode>(Op))
return std::nullopt;
- APInt Val = getConstantOperandAPInt(i).trunc(EltSize);
- if (Val != (Start + (Stride * i)))
- return std::nullopt;
+ APInt Val = getConstantOperandAPInt(I).trunc(EltSize);
+ if (FirstIdx < 0) {
+ FirstIdx = I;
+ Start = Val;
+ } else if (SecondIdx < 0) {
+ SecondIdx = I;
+ // Compute stride based on the difference between the two elements.
+ unsigned IdxDiff = I - FirstIdx;
+ APInt ValDiff = Val - Start;
+ if (ValDiff.srem(IdxDiff) != 0)
+ return std::nullopt;
+ Stride = ValDiff.sdiv(IdxDiff);
+ if (Stride.isZero())
+ return std::nullopt;
+ // Adjust Start based on the first defined element's index.
+ Start -= Stride * FirstIdx;
+ } else {
+ // Verify this element matches the sequence.
+ if (Val != Start + Stride * I)
+ return std::nullopt;
+ }
}
+ // Need at least two defined elements.
+ if (SecondIdx < 0)
+ return std::nullopt;
+
return std::make_pair(Start, Stride);
}
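To make the new detection logic concrete, here is a standalone sketch of the same algorithm (my own illustrative code, not part of the patch): `uint8_t` with natural wraparound stands in for `APInt`, and an empty `std::optional` models an undef element.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Illustrative stand-in for BuildVectorSDNode::isConstantSequence.
// Finds the first two defined elements, derives (Start, Stride), and
// verifies every other defined element lies on the sequence.
std::optional<std::pair<uint8_t, uint8_t>>
isConstantSequence(const std::vector<std::optional<uint8_t>> &Ops) {
  if (Ops.size() < 2)
    return std::nullopt;
  uint8_t Start = 0, Stride = 0;
  int FirstIdx = -1, SecondIdx = -1;
  for (unsigned I = 0; I < Ops.size(); ++I) {
    if (!Ops[I])
      continue; // undef elements place no constraint
    uint8_t Val = *Ops[I];
    if (FirstIdx < 0) {
      FirstIdx = I;
      Start = Val;
    } else if (SecondIdx < 0) {
      SecondIdx = I;
      int IdxDiff = I - FirstIdx;
      int8_t ValDiff = (int8_t)(Val - Start);
      if (ValDiff % IdxDiff != 0)
        return std::nullopt; // no integer stride fits (cf. srem above)
      Stride = (uint8_t)(ValDiff / IdxDiff);
      if (Stride == 0)
        return std::nullopt;
      Start -= (uint8_t)(Stride * FirstIdx); // rebase Start to element 0
    } else if (Val != (uint8_t)(Start + Stride * I)) {
      return std::nullopt; // defined element off the sequence
    }
  }
  if (SecondIdx < 0)
    return std::nullopt; // need at least two defined elements
  return std::make_pair(Start, Stride);
}
```

For example, the `<i64 poison, 12, poison, 26>` case from the `build_vector_two_defined_v4i64` test yields Start = 5, Stride = 7, matching the expected `index z0.d, #5, #7`.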
diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-build-vector.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-build-vector.ll
index 47fda39d84001..09f6693373642 100644
--- a/llvm/test/CodeGen/AArch64/sve-fixed-length-build-vector.ll
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-build-vector.ll
@@ -65,4 +65,129 @@ define void @build_vector_no_stride_v4i64(ptr %a) #0 {
ret void
}
+; Sequence with trailing poison elements.
+define void @build_vector_trailing_poison_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_trailing_poison_v8i32:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: index z0.s, #0, #3
+; VBITS_GE_256-NEXT: ptrue p0.s, vl8
+; VBITS_GE_256-NEXT: st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT: ret
+ store <8 x i32> <i32 0, i32 3, i32 6, i32 9, i32 12, i32 15, i32 poison, i32 poison>, ptr %a, align 4
+ ret void
+}
+
+; Sequence with leading poison elements.
+define void @build_vector_leading_poison_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_leading_poison_v8i32:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: index z0.s, #0, #3
+; VBITS_GE_256-NEXT: ptrue p0.s, vl8
+; VBITS_GE_256-NEXT: st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT: ret
+ store <8 x i32> <i32 poison, i32 poison, i32 6, i32 9, i32 12, i32 15, i32 18, i32 21>, ptr %a, align 4
+ ret void
+}
+
+; Sequence with poison elements in the middle.
+define void @build_vector_middle_poison_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_middle_poison_v8i32:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: index z0.s, #0, #3
+; VBITS_GE_256-NEXT: ptrue p0.s, vl8
+; VBITS_GE_256-NEXT: st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT: ret
+ store <8 x i32> <i32 0, i32 3, i32 poison, i32 poison, i32 12, i32 15, i32 18, i32 21>, ptr %a, align 4
+ ret void
+}
+
+; Sequence with poison elements scattered throughout.
+define void @build_vector_scattered_poison_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_scattered_poison_v8i32:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: index z0.s, #0, #3
+; VBITS_GE_256-NEXT: ptrue p0.s, vl8
+; VBITS_GE_256-NEXT: st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT: ret
+ store <8 x i32> <i32 poison, i32 3, i32 poison, i32 9, i32 poison, i32 15, i32 poison, i32 21>, ptr %a, align 4
+ ret void
+}
+
+; Sequence with only two defined elements (minimum required).
+define void @build_vector_two_defined_v4i64(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_two_defined_v4i64:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: index z0.d, #5, #7
+; VBITS_GE_256-NEXT: ptrue p0.d, vl4
+; VBITS_GE_256-NEXT: st1d { z0.d }, p0, [x0]
+; VBITS_GE_256-NEXT: ret
+ store <4 x i64> <i64 poison, i64 12, i64 poison, i64 26>, ptr %a, align 8
+ ret void
+}
+
+; Sequence with negative stride and poison elements.
+define void @build_vector_neg_stride_poison_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_neg_stride_poison_v8i32:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: index z0.s, #0, #-2
+; VBITS_GE_256-NEXT: ptrue p0.s, vl8
+; VBITS_GE_256-NEXT: st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT: ret
+ store <8 x i32> <i32 poison, i32 -2, i32 -4, i32 poison, i32 -8, i32 -10, i32 poison, i32 -14>, ptr %a, align 4
+ ret void
+}
+
+; Only one defined element - cannot determine stride, so no index instruction.
+define void @build_vector_single_defined_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_single_defined_v8i32:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: mov z0.s, #42 // =0x2a
+; VBITS_GE_256-NEXT: ptrue p0.s, vl8
+; VBITS_GE_256-NEXT: st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT: ret
+ store <8 x i32> <i32 poison, i32 poison, i32 poison, i32 42, i32 poison, i32 poison, i32 poison, i32 poison>, ptr %a, align 4
+ ret void
+}
+
+; Fractional stride: elements at indices 1 and 3 differ by 3, so stride would be 3/2.
+define void @build_vector_fractional_stride_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_fractional_stride_v8i32:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: ptrue p0.s, vl8
+; VBITS_GE_256-NEXT: adrp x8, .LCPI12_0
+; VBITS_GE_256-NEXT: add x8, x8, :lo12:.LCPI12_0
+; VBITS_GE_256-NEXT: ld1w { z0.s }, p0/z, [x8]
+; VBITS_GE_256-NEXT: st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT: ret
+ store <8 x i32> <i32 poison, i32 0, i32 poison, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>, ptr %a, align 4
+ ret void
+}
+
+; zip1 pattern: constant <0, 1, 2, 3> is expanded to <0, 1, 2, 3, poison, poison, poison, poison>
+; to match the shuffle result width. isConstantSequence recognizes this as a sequence.
+define <8 x i8> @zip_const_seq_with_variable(i8 %x) #0 {
+; VBITS_GE_256-LABEL: zip_const_seq_with_variable:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: index z0.b, #0, #1
+; VBITS_GE_256-NEXT: dup v1.8b, w0
+; VBITS_GE_256-NEXT: zip1 v0.8b, v0.8b, v1.8b
+; VBITS_GE_256-NEXT: ret
+ %ins = insertelement <4 x i8> poison, i8 %x, i32 0
+ %splat = shufflevector <4 x i8> %ins, <4 x i8> poison, <4 x i32> zeroinitializer
+ %interleave = shufflevector <4 x i8> <i8 0, i8 1, i8 2, i8 3>, <4 x i8> %splat, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
+ ret <8 x i8> %interleave
+}
+
+; zip2 pattern: constant <0, 1, 2, 3, 4, 5, 6, 7> is transformed by the DAG combiner to
+; <poison, poison, poison, poison, 4, 5, 6, 7> since zip2 only uses elements 4-7.
+define <8 x i8> @zip2_const_seq_with_variable(<8 x i8> %x) #0 {
+; VBITS_GE_256-LABEL: zip2_const_seq_with_variable:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: index z1.b, #0, #1
+; VBITS_GE_256-NEXT: zip2 v0.8b, v1.8b, v0.8b
+; VBITS_GE_256-NEXT: ret
+ %interleave = shufflevector <8 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7>, <8 x i8> %x, <8 x i32> <i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
+ ret <8 x i8> %interleave
+}
+
attributes #0 = { "target-features"="+sve" }
3df3bce to 90b1b95
Stride = ValDiff.sdiv(IdxDiff);
if (Stride.isZero())
  return std::nullopt;
APInt has divrem (but can you avoid division here?)
As detailed below, I have changed the implementation to use modular arithmetic with APInt::multiplicativeInverse(), based on @jayfoad's observation.
This introduces interesting questions about signed or unsigned overflow during an arithmetic sequence. If we want to ignore overflow (assume 2s complement wrap around) then all of these are possible:

i8 <0, undef, undef, 0xFF> -> <0, 0x55>

Your patch uses sdiv/srem so I think it will handle the last case only. If you used udiv/urem you could handle the first case only.
✅ With the latest revision this PR passed the C/C++ code formatter.
Good point! I've pushed a second commit that uses modular arithmetic with `APInt::multiplicativeInverse()`. The approach: to solve Stride × IdxDiff ≡ ValDiff (mod 2^n), we factor out common powers of 2 from both sides, then use the multiplicative inverse (which exists for odd numbers mod 2^n). The only rejection case is when ValDiff has fewer trailing zeros than IdxDiff, meaning no modular solution exists. This is appropriate because the target (SVE's INDEX instruction) uses modular wraparound arithmetic, and modular arithmetic is agnostic to signed/unsigned interpretation.

I've updated the docstring accordingly. The modular inverse approach adds some complexity compared to simple division, but it makes the implementation mathematically complete.
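A sketch of this modular-inverse idea (my own assumed code with EltSize fixed at 8 bits; the actual patch works on `APInt`): solve Stride × IdxDiff ≡ ValDiff (mod 2^8) by stripping the common power of two and inverting the odd remainder, which always exists modulo a power of two.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Inverse of an odd A modulo 2^8 via Newton iteration: for odd A,
// A*A == 1 (mod 8), so X = A is correct to 3 low bits, and each
// step X = X * (2 - A*X) doubles the number of correct low bits.
static uint8_t modInverseOdd(uint8_t A) {
  uint8_t X = A;
  for (int It = 0; It < 3; ++It)
    X = (uint8_t)(X * (uint8_t)(2 - A * X));
  return X;
}

// Solve Stride * IdxDiff == ValDiff (mod 2^8), IdxDiff != 0.
// Returns the smallest solution; when IdxDiff has K trailing zeros
// there are 2^K solutions, spaced 2^(8-K) apart.
static std::optional<uint8_t> solveStride(uint8_t IdxDiff, uint8_t ValDiff) {
  if (ValDiff == 0)
    return (uint8_t)0; // stride 0 solves it (rejected later as a splat)
  unsigned K = __builtin_ctz(IdxDiff);
  if ((unsigned)__builtin_ctz(ValDiff) < K)
    return std::nullopt; // ValDiff has fewer trailing zeros: no solution
  uint8_t OddPart = (uint8_t)(IdxDiff >> K);
  return (uint8_t)((ValDiff >> K) * modInverseOdd(OddPart));
}
```

On the wraparound example from the previous comment, `solveStride(3, 0xFF)` recovers stride 0x55 for i8 <0, undef, undef, 0xFF>, since 3 × 0x55 = 0xFF (mod 256); a case like indices 1 and 3 differing by 3 is rejected because 3 has fewer trailing zeros than 2.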
@jayfoad are you happy with the updated version of this PR using modular arithmetic?
jayfoad left a comment:

Makes sense to me.
For completeness note that there are still more cases that could be handled e.g.:
i8 <0, undef, 0x80, 0x40> -> {0, 0xC0}
When the first two defined elements give an index difference with trailing zeros, there are multiple valid strides (mod 2^EltSize). Previously we only tried one; now we try all candidates until one matches all elements or we exhaust the space. Example: <0, poison, 0x80, 0x40> has IdxDiff=2, giving 2 candidate strides. The first (0x40) fails at index 3, but the second (0xC0) works.
Great catch! I've fixed that in commit 4. The complexity is getting a bit out of hand for what was supposed to be a simple method, but I think it's hard to justify the modular arithmetic if it then isn't even complete. Let me know what you think.
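A brute-force rendering of the commit-4 candidate search (again my own assumed sketch, with 8-bit integers in place of `APInt`; the names `SmallestStride` and `StrideInc` mirror the patch):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Try every stride in {SmallestStride + k*StrideInc} (mod 2^8) against
// all defined elements (empty optional = undef) and return the first
// stride that matches them all. Assumes StrideInc != 0, so the walk
// wraps back to SmallestStride once all candidates are exhausted.
static std::optional<uint8_t>
pickStride(const std::vector<std::optional<uint8_t>> &Ops, unsigned FirstIdx,
           uint8_t SmallestStride, uint8_t StrideInc) {
  uint8_t S = SmallestStride;
  do {
    bool Ok = true;
    for (unsigned I = 0; I < Ops.size() && Ok; ++I)
      if (Ops[I] &&
          *Ops[I] != (uint8_t)(*Ops[FirstIdx] + S * (I - FirstIdx)))
        Ok = false;
    if (Ok)
      return S;
    S = (uint8_t)(S + StrideInc);
  } while (S != SmallestStride); // wrapped around: all candidates tried
  return std::nullopt;
}
```

On the <0, poison, 0x80, 0x40> example, IdxDiff = 2 gives candidates 0x40 and 0xC0 (StrideInc = 0x80); 0x40 fails at index 3 and the search lands on 0xC0.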
I have to say I am not too keen on commit 4. I'll add some more specific comments, but one option is just to ditch it and leave a TODO for now.
Stride = SmallestStride;

// Step 3: Adjust Start based on the first defined element's index.
Start -= Stride * FirstIdx;
Nit: this is confusing for readers, since it changes the meaning of Start depending on whether SecondIdx < 0 or not.
Start -= StrideInc * FirstIdx;
} while (Val != Start + Stride * I);
// All strides produce the same value at SecondIdx, so skip re-checking.
I = SecondIdx;
Redoing all the checks from SecondIdx + 1 feels expensive and I think it can be avoided. Every time you accept a third-or-later value, you could update StrideInc based on the minimum number of trailing zeros of any of the values you have accepted so far.
// Verify this element matches the sequence, trying alternative strides.
if (Val == Start + Stride * I)
  continue;
do {
Looping over all possible strides feels expensive and messy. As an alternative, could you redo the multiplicativeInverse calculation from above and check that the result is compatible with the previous result (i.e. differs only in high order bits that you don't care about)?
Here's another half-baked idea for reimplementing commit 4. Take it with a pinch of salt. First, find an arbitrary defined element at FirstIdx. Then iterate over all the other elements in a specific order, based on the number of trailing zeros in the "IdxDiff" value vs FirstIdx:
The idea is that the ones you visit first will always have the most restrictions on the high bits of the resulting Stride.
// There are 2^CommonPow2Bits valid strides, evenly spaced by StrideInc.
// Valid strides: {SmallestStride + k*StrideInc | k = 0, 1, ...}
StrideInc = APInt(EltSize, 1) << (EltSize - CommonPow2Bits);
I think there's an APInt helper to create masks like this