[SelectionDAG] Handle undef at any position in isConstantSequence #176671
Conversation
✅ With the latest revision this PR passed the undef deprecator.
decb831 to 3df3bce
@llvm/pr-subscribers-backend-aarch64

Author: Philip Ginsbach-Chen (ginsbach)

Changes

This patch extends `BuildVectorSDNode::isConstantSequence` to recognize constant sequences that contain undef elements at any position.

The new implementation finds the first two non-undef constant elements, computes the stride from their difference, then verifies all other defined elements match the sequence. This enables SVE's INDEX instruction to be used in more cases.

This change particularly benefits ZIP1/ZIP2 patterns where one operand is a constant sequence. When a smaller constant vector like `<0, 1, 2, 3>` is used in a ZIP1 shuffle producing a wider result, it gets expanded with trailing undefs. Similarly, for ZIP2 patterns, the DAG combiner transforms the constant to have leading undefs since ZIP2 only uses the upper half of its operands.

In particular, these patterns arise naturally from `VectorCombine`'s `compactShuffleOperands` optimization (see #176074) that I am suggesting as a fix for #137447.

Full diff: https://github.com/llvm/llvm-project/pull/176671.diff

2 Files Affected:
- llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
- llvm/test/CodeGen/AArch64/sve-fixed-length-build-vector.ll
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 35e443b40c41f..74cc8fb2ed366 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -14037,26 +14037,46 @@ BuildVectorSDNode::isConstantSequence() const {
if (NumOps < 2)
return std::nullopt;
- if (!isa<ConstantSDNode>(getOperand(0)) ||
- !isa<ConstantSDNode>(getOperand(1)))
- return std::nullopt;
-
unsigned EltSize = getValueType(0).getScalarSizeInBits();
- APInt Start = getConstantOperandAPInt(0).trunc(EltSize);
- APInt Stride = getConstantOperandAPInt(1).trunc(EltSize) - Start;
-
- if (Stride.isZero())
- return std::nullopt;
+ APInt Start, Stride;
+ int FirstIdx = -1, SecondIdx = -1;
- for (unsigned i = 2; i < NumOps; ++i) {
- if (!isa<ConstantSDNode>(getOperand(i)))
+ // Find the first two non-undef constant elements to determine Start and
+ // Stride, then verify all remaining elements match the sequence.
+ for (unsigned I = 0; I < NumOps; ++I) {
+ SDValue Op = getOperand(I);
+ if (Op->isUndef())
+ continue;
+ if (!isa<ConstantSDNode>(Op))
return std::nullopt;
- APInt Val = getConstantOperandAPInt(i).trunc(EltSize);
- if (Val != (Start + (Stride * i)))
- return std::nullopt;
+ APInt Val = getConstantOperandAPInt(I).trunc(EltSize);
+ if (FirstIdx < 0) {
+ FirstIdx = I;
+ Start = Val;
+ } else if (SecondIdx < 0) {
+ SecondIdx = I;
+ // Compute stride based on the difference between the two elements.
+ unsigned IdxDiff = I - FirstIdx;
+ APInt ValDiff = Val - Start;
+ if (ValDiff.srem(IdxDiff) != 0)
+ return std::nullopt;
+ Stride = ValDiff.sdiv(IdxDiff);
+ if (Stride.isZero())
+ return std::nullopt;
+ // Adjust Start based on the first defined element's index.
+ Start -= Stride * FirstIdx;
+ } else {
+ // Verify this element matches the sequence.
+ if (Val != Start + Stride * I)
+ return std::nullopt;
+ }
}
+ // Need at least two defined elements.
+ if (SecondIdx < 0)
+ return std::nullopt;
+
return std::make_pair(Start, Stride);
}
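To make the new detection logic concrete, here is a standalone sketch of the same algorithm (my own illustrative code, not part of the patch): `uint8_t` with natural wraparound stands in for `APInt`, and an empty `std::optional` models an undef element.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Illustrative stand-in for BuildVectorSDNode::isConstantSequence.
// Finds the first two defined elements, derives (Start, Stride), and
// verifies every other defined element lies on the sequence.
std::optional<std::pair<uint8_t, uint8_t>>
isConstantSequence(const std::vector<std::optional<uint8_t>> &Ops) {
  if (Ops.size() < 2)
    return std::nullopt;
  uint8_t Start = 0, Stride = 0;
  int FirstIdx = -1, SecondIdx = -1;
  for (unsigned I = 0; I < Ops.size(); ++I) {
    if (!Ops[I])
      continue; // undef elements place no constraint
    uint8_t Val = *Ops[I];
    if (FirstIdx < 0) {
      FirstIdx = I;
      Start = Val;
    } else if (SecondIdx < 0) {
      SecondIdx = I;
      int IdxDiff = I - FirstIdx;
      int8_t ValDiff = (int8_t)(Val - Start);
      if (ValDiff % IdxDiff != 0)
        return std::nullopt; // no integer stride fits (cf. srem above)
      Stride = (uint8_t)(ValDiff / IdxDiff);
      if (Stride == 0)
        return std::nullopt;
      Start -= (uint8_t)(Stride * FirstIdx); // rebase Start to element 0
    } else if (Val != (uint8_t)(Start + Stride * I)) {
      return std::nullopt; // defined element off the sequence
    }
  }
  if (SecondIdx < 0)
    return std::nullopt; // need at least two defined elements
  return std::make_pair(Start, Stride);
}
```

For example, the `<i64 poison, 12, poison, 26>` case from the `build_vector_two_defined_v4i64` test yields Start = 5, Stride = 7, matching the expected `index z0.d, #5, #7`.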
diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-build-vector.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-build-vector.ll
index 47fda39d84001..09f6693373642 100644
--- a/llvm/test/CodeGen/AArch64/sve-fixed-length-build-vector.ll
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-build-vector.ll
@@ -65,4 +65,129 @@ define void @build_vector_no_stride_v4i64(ptr %a) #0 {
ret void
}
+; Sequence with trailing poison elements.
+define void @build_vector_trailing_poison_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_trailing_poison_v8i32:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: index z0.s, #0, #3
+; VBITS_GE_256-NEXT: ptrue p0.s, vl8
+; VBITS_GE_256-NEXT: st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT: ret
+ store <8 x i32> <i32 0, i32 3, i32 6, i32 9, i32 12, i32 15, i32 poison, i32 poison>, ptr %a, align 4
+ ret void
+}
+
+; Sequence with leading poison elements.
+define void @build_vector_leading_poison_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_leading_poison_v8i32:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: index z0.s, #0, #3
+; VBITS_GE_256-NEXT: ptrue p0.s, vl8
+; VBITS_GE_256-NEXT: st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT: ret
+ store <8 x i32> <i32 poison, i32 poison, i32 6, i32 9, i32 12, i32 15, i32 18, i32 21>, ptr %a, align 4
+ ret void
+}
+
+; Sequence with poison elements in the middle.
+define void @build_vector_middle_poison_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_middle_poison_v8i32:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: index z0.s, #0, #3
+; VBITS_GE_256-NEXT: ptrue p0.s, vl8
+; VBITS_GE_256-NEXT: st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT: ret
+ store <8 x i32> <i32 0, i32 3, i32 poison, i32 poison, i32 12, i32 15, i32 18, i32 21>, ptr %a, align 4
+ ret void
+}
+
+; Sequence with poison elements scattered throughout.
+define void @build_vector_scattered_poison_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_scattered_poison_v8i32:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: index z0.s, #0, #3
+; VBITS_GE_256-NEXT: ptrue p0.s, vl8
+; VBITS_GE_256-NEXT: st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT: ret
+ store <8 x i32> <i32 poison, i32 3, i32 poison, i32 9, i32 poison, i32 15, i32 poison, i32 21>, ptr %a, align 4
+ ret void
+}
+
+; Sequence with only two defined elements (minimum required).
+define void @build_vector_two_defined_v4i64(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_two_defined_v4i64:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: index z0.d, #5, #7
+; VBITS_GE_256-NEXT: ptrue p0.d, vl4
+; VBITS_GE_256-NEXT: st1d { z0.d }, p0, [x0]
+; VBITS_GE_256-NEXT: ret
+ store <4 x i64> <i64 poison, i64 12, i64 poison, i64 26>, ptr %a, align 8
+ ret void
+}
+
+; Sequence with negative stride and poison elements.
+define void @build_vector_neg_stride_poison_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_neg_stride_poison_v8i32:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: index z0.s, #0, #-2
+; VBITS_GE_256-NEXT: ptrue p0.s, vl8
+; VBITS_GE_256-NEXT: st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT: ret
+ store <8 x i32> <i32 poison, i32 -2, i32 -4, i32 poison, i32 -8, i32 -10, i32 poison, i32 -14>, ptr %a, align 4
+ ret void
+}
+
+; Only one defined element - cannot determine stride, so no index instruction.
+define void @build_vector_single_defined_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_single_defined_v8i32:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: mov z0.s, #42 // =0x2a
+; VBITS_GE_256-NEXT: ptrue p0.s, vl8
+; VBITS_GE_256-NEXT: st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT: ret
+ store <8 x i32> <i32 poison, i32 poison, i32 poison, i32 42, i32 poison, i32 poison, i32 poison, i32 poison>, ptr %a, align 4
+ ret void
+}
+
+; Fractional stride: elements at indices 1 and 3 differ by 3, so stride would be 3/2.
+define void @build_vector_fractional_stride_v8i32(ptr %a) #0 {
+; VBITS_GE_256-LABEL: build_vector_fractional_stride_v8i32:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: ptrue p0.s, vl8
+; VBITS_GE_256-NEXT: adrp x8, .LCPI12_0
+; VBITS_GE_256-NEXT: add x8, x8, :lo12:.LCPI12_0
+; VBITS_GE_256-NEXT: ld1w { z0.s }, p0/z, [x8]
+; VBITS_GE_256-NEXT: st1w { z0.s }, p0, [x0]
+; VBITS_GE_256-NEXT: ret
+ store <8 x i32> <i32 poison, i32 0, i32 poison, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>, ptr %a, align 4
+ ret void
+}
+
+; zip1 pattern: constant <0, 1, 2, 3> is expanded to <0, 1, 2, 3, poison, poison, poison, poison>
+; to match the shuffle result width. isConstantSequence recognizes this as a sequence.
+define <8 x i8> @zip_const_seq_with_variable(i8 %x) #0 {
+; VBITS_GE_256-LABEL: zip_const_seq_with_variable:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: index z0.b, #0, #1
+; VBITS_GE_256-NEXT: dup v1.8b, w0
+; VBITS_GE_256-NEXT: zip1 v0.8b, v0.8b, v1.8b
+; VBITS_GE_256-NEXT: ret
+ %ins = insertelement <4 x i8> poison, i8 %x, i32 0
+ %splat = shufflevector <4 x i8> %ins, <4 x i8> poison, <4 x i32> zeroinitializer
+ %interleave = shufflevector <4 x i8> <i8 0, i8 1, i8 2, i8 3>, <4 x i8> %splat, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
+ ret <8 x i8> %interleave
+}
+
+; zip2 pattern: constant <0, 1, 2, 3, 4, 5, 6, 7> is transformed by the DAG combiner to
+; <poison, poison, poison, poison, 4, 5, 6, 7> since zip2 only uses elements 4-7.
+define <8 x i8> @zip2_const_seq_with_variable(<8 x i8> %x) #0 {
+; VBITS_GE_256-LABEL: zip2_const_seq_with_variable:
+; VBITS_GE_256: // %bb.0:
+; VBITS_GE_256-NEXT: index z1.b, #0, #1
+; VBITS_GE_256-NEXT: zip2 v0.8b, v1.8b, v0.8b
+; VBITS_GE_256-NEXT: ret
+ %interleave = shufflevector <8 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7>, <8 x i8> %x, <8 x i32> <i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
+ ret <8 x i8> %interleave
+}
+
attributes #0 = { "target-features"="+sve" }
3df3bce to 90b1b95
Stride = ValDiff.sdiv(IdxDiff);
if (Stride.isZero())
  return std::nullopt;
APInt has divrem (but can you avoid division here?)
As detailed below, I have changed the implementation to use modular arithmetic with APInt::multiplicativeInverse(), based on @jayfoad's observation.
This introduces interesting questions about signed or unsigned overflow during an arithmetic sequence. If we want to ignore overflow (assume 2s complement wrap around) then all of these are possible:

i8 <0, undef, undef, 0xFF> -> <0, 0x55>

Your patch uses sdiv/srem so I think it will handle the last case only. If you used udiv/urem you could handle the first case only.
✅ With the latest revision this PR passed the C/C++ code formatter.
Good point! I've pushed a second commit that uses modular arithmetic with `APInt::multiplicativeInverse()`. The approach: to solve Stride × IdxDiff ≡ ValDiff (mod 2^n), we factor out common powers of 2 from both sides, then use the multiplicative inverse (which exists for odd numbers mod 2^n). The only rejection case is when ValDiff has fewer trailing zeros than IdxDiff, meaning no modular solution exists. This is appropriate because the target (SVE's INDEX instruction) uses modular wraparound arithmetic, and modular arithmetic is agnostic to signed/unsigned interpretation.

I've updated the docstring accordingly. The modular inverse approach adds some complexity compared to simple division, but it makes the implementation mathematically complete.
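A sketch of this modular-inverse idea (my own assumed code with EltSize fixed at 8 bits; the actual patch works on `APInt`): solve Stride × IdxDiff ≡ ValDiff (mod 2^8) by stripping the common power of two and inverting the odd remainder, which always exists modulo a power of two.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Inverse of an odd A modulo 2^8 via Newton iteration: for odd A,
// A*A == 1 (mod 8), so X = A is correct to 3 low bits, and each
// step X = X * (2 - A*X) doubles the number of correct low bits.
static uint8_t modInverseOdd(uint8_t A) {
  uint8_t X = A;
  for (int It = 0; It < 3; ++It)
    X = (uint8_t)(X * (uint8_t)(2 - A * X));
  return X;
}

// Solve Stride * IdxDiff == ValDiff (mod 2^8), IdxDiff != 0.
// Returns the smallest solution; when IdxDiff has K trailing zeros
// there are 2^K solutions, spaced 2^(8-K) apart.
static std::optional<uint8_t> solveStride(uint8_t IdxDiff, uint8_t ValDiff) {
  if (ValDiff == 0)
    return (uint8_t)0; // stride 0 solves it (rejected later as a splat)
  unsigned K = __builtin_ctz(IdxDiff);
  if ((unsigned)__builtin_ctz(ValDiff) < K)
    return std::nullopt; // ValDiff has fewer trailing zeros: no solution
  uint8_t OddPart = (uint8_t)(IdxDiff >> K);
  return (uint8_t)((ValDiff >> K) * modInverseOdd(OddPart));
}
```

On the wraparound example from the previous comment, `solveStride(3, 0xFF)` recovers stride 0x55 for i8 <0, undef, undef, 0xFF>, since 3 × 0x55 = 0xFF (mod 256); a case like indices 1 and 3 differing by 3 is rejected because 3 has fewer trailing zeros than 2.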
@jayfoad are you happy with the updated version of this PR using modular arithmetic?
jayfoad left a comment:

Makes sense to me.
For completeness note that there are still more cases that could be handled e.g.:
i8 <0, undef, 0x80, 0x40> -> {0, 0xC0}
When the first two defined elements give an index difference with trailing zeros, there are multiple valid strides (mod 2^EltSize). Previously we only tried one; now we try all candidates until one matches all elements or we exhaust the space. Example: <0, poison, 0x80, 0x40> has IdxDiff=2, giving 2 candidate strides. The first (0x40) fails at index 3, but the second (0xC0) works.
Great catch! I've fixed that in commit 4. The complexity is getting a bit out of hand for what was supposed to be a simple method, but I think it's hard to justify the modular arithmetic if it then isn't even complete. Let me know what you think.
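A brute-force rendering of the commit-4 candidate search (again my own assumed sketch, with 8-bit integers in place of `APInt`; the names `SmallestStride` and `StrideInc` mirror the patch):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Try every stride in {SmallestStride + k*StrideInc} (mod 2^8) against
// all defined elements (empty optional = undef) and return the first
// stride that matches them all. Assumes StrideInc != 0, so the walk
// wraps back to SmallestStride once all candidates are exhausted.
static std::optional<uint8_t>
pickStride(const std::vector<std::optional<uint8_t>> &Ops, unsigned FirstIdx,
           uint8_t SmallestStride, uint8_t StrideInc) {
  uint8_t S = SmallestStride;
  do {
    bool Ok = true;
    for (unsigned I = 0; I < Ops.size() && Ok; ++I)
      if (Ops[I] &&
          *Ops[I] != (uint8_t)(*Ops[FirstIdx] + S * (I - FirstIdx)))
        Ok = false;
    if (Ok)
      return S;
    S = (uint8_t)(S + StrideInc);
  } while (S != SmallestStride); // wrapped around: all candidates tried
  return std::nullopt;
}
```

On the <0, poison, 0x80, 0x40> example, IdxDiff = 2 gives candidates 0x40 and 0xC0 (StrideInc = 0x80); 0x40 fails at index 3 and the search lands on 0xC0.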
I have to say I am not too keen on commit 4. I'll add some more specific comments, but one option is just to ditch it and leave a TODO for now.
Stride = SmallestStride;

// Step 3: Adjust Start based on the first defined element's index.
Start -= Stride * FirstIdx;
Nit: this is confusing for readers, since it changes the meaning of Start depending on whether SecondIdx < 0 or not.
Start -= StrideInc * FirstIdx;
} while (Val != Start + Stride * I);
// All strides produce the same value at SecondIdx, so skip re-checking.
I = SecondIdx;
Redoing all the checks from SecondIdx + 1 feels expensive and I think it can be avoided. Every time you accept a third-or-later value, you could update StrideInc based on the minimum number of trailing zeros of any of the values you have accepted so far.
// Verify this element matches the sequence, trying alternative strides.
if (Val == Start + Stride * I)
  continue;
do {
Looping over all possible strides feels expensive and messy. As an alternative, could you redo the multiplicativeInverse calculation from above and check that the result is compatible with the previous result (i.e. differs only in high order bits that you don't care about)?
Here's another half-baked idea for reimplementing commit 4. Take it with a pinch of salt. First, find an arbitrary defined element at FirstIdx. Then iterate over all the other elements in a specific order, based on the number of trailing zeros in the "IdxDiff" value vs FirstIdx:
The idea is that the ones you visit first will always have the most restrictions on the high bits of the resulting Stride.
// There are 2^CommonPow2Bits valid strides, evenly spaced by StrideInc.
// Valid strides: {SmallestStride + k*StrideInc | k = 0, 1, ...}
StrideInc = APInt(EltSize, 1) << (EltSize - CommonPow2Bits);
I think there's an APInt helper to create masks like this