[CoreML EP] Support pre-opset-13 Split via 'split' attribute#28270
[CoreML EP] Support pre-opset-13 Split via 'split' attribute#28270yuslepukhin merged 7 commits intomicrosoft:mainfrom
Conversation
The CoreML SplitOpBuilder gated `GetMinSupportedOpSet` at 13 because pre-13 Split carries split sizes via an INTS attribute rather than a second input. Lowers the gate to 1 and reads the attribute in both the MLProgram and NeuralNetwork emitters. Adds validation matching the existing input-form path (>=2 outputs, sum equals axis size, no zero entries, axis size not dynamic). For the no-attribute / no-input case (legacy even split) we now also explicitly require the axis size to be evenly divisible by num_outputs, since CoreML's `num_splits` demands it; this only affects opset 2-12 graphs that were 100% rejected before, so no existing path regresses. Motivated by DWPose `dw-ll_ucoco_384.onnx` (opset 11), which has two Split nodes — one uneven (`split=[512, 512, 128]`), one even (`split=[1, 1]`) — both falling back to CPU and fragmenting the CoreML partition. Without this change: 3 partitions, 301/303 nodes on CoreML EP. With it: 1 partition, 303/303 nodes on CoreML EP. On M3 Max (1299-iter timed run, batch 1, MLProgram): P50 6.81 ms -> 6.55 ms P99 7.33 ms -> 6.93 ms Max 12.62 ms -> 10.36 ms (-17.9%) StdDev 0.24 ms -> 0.17 ms (-29%) CPU EP baseline P50 38.7 ms Adds three tests covering both NeuralNetwork and MLProgram emitters: Split11UnevenAttribute — uneven sizes via attribute Split11EvenAttribute — uniform sizes via attribute path Split11NoAttributeEven — fall-through to num_splits branch Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d650b4d to
c3b381b
Compare
|
Refactored the tests to match the HardSigmoid PR (#28182) pattern:
All three tests pass locally (NN + MLProgram for each scenario). |
The CoreML GatherOpBuilder rejected rank-0 (scalar) indices because CoreML's `gather` requires rank-1+ indices and the workaround would change the output rank. Fixes microsoft#28180 by performing the workaround internally: when indices are scalar, the builder now emits expand_dims(indices, axes=[0]) -> indices_1d : shape [1] gather(data, indices_1d, axis) -> shape data_shape with axis=1 squeeze(., axes=[axis]) -> ONNX gather output shape in both the MLProgram and NeuralNetwork emitters. This requires claiming a static intermediate shape, so we restrict the fast path to fully-static `data` shape; dynamic-shape scalar Gather still falls back to CPU (existing behavior). Motivated by StyleGAN-family generators (e.g. GFPGAN), which select per-layer style codes with a scalar-index Gather. Combined with microsoft#28270 (pre-opset-13 Split), GFPGAN-1024 (input 1x3x512x512) goes from 9 partitions / 534 CoreML nodes to 1 partition / 557 CoreML nodes on M3 Max. 100-iter timed run, MLProgram: partitions mean P99 max origin/main 9 118.0 ms 131.0 131.3 Gather fix only 8 118.1 ms 134.1 134.7 Gather + Split (microsoft#28270) 1 81.8 ms 96.7 98.1 The Gather fix alone trims one graph break (the 16 scalar Gathers collapse into the surrounding CoreML partitions); the full latency win lands when both fixes ship. Adds three tests covering both NeuralNetwork and MLProgram emitters: GatherScalarIndicesAxis1 - axis=1 (squeeze inserted axis) GatherScalarIndicesAxis0 - axis=0 (squeeze leading axis) GatherScalarIndicesNegativeAxis - axis=-1 (HandleNegativeAxis path) Updates the comment on the existing GatherWithScalarIndices test to reflect that scalar Gather is now supported when 'data' is fully static (the test continues to exercise the dynamic-shape fall-back). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CoreML GatherOpBuilder rejected rank-0 (scalar) indices because CoreML's `gather` requires rank-1+ indices and the workaround would change the output rank. Fixes microsoft#28180 by performing the workaround internally: when indices are scalar, the builder now emits reshape(indices, shape=[1]) -> indices_1d gather(data, indices_1d, axis) -> shape data_shape with axis=1 squeeze(., axes=[axis]) -> ONNX gather output shape in both the MLProgram and NeuralNetwork emitters. We use `reshape` rather than `expand_dims` because CoreML internally pads scalars and expand_dims on the padded tensor can push the apparent rank past CoreML's rank-5 limit. Restrictions: - 'data' must have a fully static shape (we claim a static intermediate shape between gather and squeeze). - 'data' rank capped at 4 (the rank-5 case still trips CoreML's compiler with "Invalid rank: 6", so we keep the conservative boundary). Dynamic-shape and rank-5+ scalar Gather still falls back to CPU. Motivated by StyleGAN-family generators (e.g. GFPGAN), which select per-layer style codes with a scalar-index Gather. Combined with microsoft#28270 (pre-opset-13 Split), GFPGAN-1024 (input 1x3x512x512) goes from 9 partitions / 534 CoreML nodes to 1 partition / 557 CoreML nodes on M3 Max. Adds six tests covering both NeuralNetwork and MLProgram emitters: GatherScalarIndicesAxis1 - axis=1, mid-rank squeeze GatherScalarIndicesAxis0 - axis=0, leading-axis squeeze GatherScalarIndicesNegativeAxis - axis=-1, HandleNegativeAxis path GatherScalarIndicesFloat16 - fp16 data (MLProgram only) GatherScalarIndicesInt64Data - int64 data, both formats GatherScalarIndicesRank4Data - rank-4 data, exercises the cap Updates the comment on the existing GatherWithScalarIndices test to reflect that scalar Gather is now supported when 'data' is fully static and rank-4 or below; the test continues to exercise the dynamic-shape fall-back. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds three tests that mirror the opset-11 attribute-form Split tests using the opset-13 input form (split sizes as a constant initializer): - Split13UnevenInput (parity with Split11UnevenAttribute) - Split13EvenInput (parity with Split11EvenAttribute) - Split13NoSplitInputEven (parity with Split11NoAttributeEven) Each test runs both the NeuralNetwork and MLProgram emitters, compares outputs against the CPU EP, and asserts ExpectedEPNodeAssignment::All so any CoreML fallback would surface as a failure. Also drops the redundant "Pre-opset-13 split attribute is also supported" sentence from coreml_supported_mlprogram_ops.md — the table already lists the operator, and the attribute-form support is documented in the PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Added a follow-up commit (b1989fc) addressing two things: 1. Opset-13 input-form parity testsThe original three tests only exercised the new pre-opset-13 attribute path. To prove the post-opset-13 input-form path (the existing, untouched code branch) still behaves identically, this commit adds three mirror tests with the same shapes and split sizes:
Each test runs both the NeuralNetwork and MLProgram emitters via
This locks in the parity argument from the PR description ("no path that used to work regresses") with explicit coverage rather than relying on it being implicit. All 6 tests (3 original + 3 new) pass locally on macOS 26.3 / M3 Max: 2. Ops doc cleanupRemoved the redundant "Pre-opset-13 |
The two-line comment explaining that pre-opset-13 Split uses an attribute form was anchored to GetMinSupportedOpSet, but the relevant context for a reader is in IsOpSupportedImpl, which already documents the attribute path inline. Removing the duplicate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@yuslepukhin greatly appreciate your help and guidance on this and my other Apple Onnxruntime optimizations :) |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds CoreML EP support for legacy ONNX Split nodes that encode split sizes in the pre-opset-13 split attribute, and expands CoreML Split test coverage.
Changes:
- Lowers the CoreML
Splitminimum supported opset from 13 to 1. - Adds pre-opset-13
split-attribute handling in both MLProgram and NeuralNetwork emitters, plus validation for legacy even-split behavior. - Adds new CoreML tests covering Split opset 11 and opset 13 for uneven, even, and implicit-even split forms.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
onnxruntime/core/providers/coreml/builders/impl/split_op_builder.cc |
Enables legacy Split support in the CoreML builder and updates support validation. |
onnxruntime/test/providers/coreml/coreml_basic_test.cc |
Adds Split coverage for opset 11/13 across attribute, input, and implicit-even split forms. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Reject non-positive entries (not just zero) in the 'split' attribute validation; previously [5, -1] would pass since the sum still matched the axis size. - Split the combined dynamic-axis / non-divisible branch in the omitted- split path into two distinct conditions so the verbose log no longer blames divisibility when the actual cause is a dynamic axis. - Add opset-7 attribute test (Split7UnevenAttribute) for the <=10 range the builder now advertises. - Add two negative tests verifying CoreML rejects unsupported configs: Split11ZeroSplitValueNotSupported (zero entry in 'split') and Split11OmittedNonDivisibleAxisNotSupported (omitted split with axis size not divisible by num outputs).
ONNX shape inference for Split-11 with omitted 'split' rejects models whose declared output shapes don't match an even split, so the test fails at graph.Resolve() before the CoreML EP gets a chance to apply its own divisibility rejection. The CoreML check at lines 187-191 of split_op_builder.cc remains as defensive code, but it isn't reachable through a well-formed model. Remove the unreachable test. The other negative case from the Copilot review (Split11ZeroSplitValueNotSupported) covers the new non-positive entry rejection and stays in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
When the splitting axis is dynamic (split_dims_at_axis == -1), the sum of split sizes — always positive after the explicit positivity check — can never equal -1, so the mismatch check fired first with a misleading "sum of split doesn't match axis size" message. The actual cause was that the axis is dynamic, which CoreML doesn't allow. Reorder both the input-form and attribute-form paths so the dynamic-axis check runs before the sum check. The pre-existing latent bug in the input-form path (untouched by this PR's behavior change) is fixed at the same time, per reviewer request. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ONNX schema and the CPU Split kernel both accept a Split node with a single output, so the CoreML rejection (split_attr->size() < 2 in the attribute-form path) is observable via partition assertion and worth locking in. The other negative cases yuslepukhin requested — sum mismatch, non-divisible omitted split, negative split value — are caught earlier by ONNX shape inference and CPU's SplitBase Initialize check, so the malformed model never reaches the partitioner with a CPU fallback path to compare against. The CoreML checks for those cases remain as defensive code but cannot be exercised in isolation through the public graph-builder API. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves conflict in onnxruntime/test/providers/coreml/coreml_basic_test.cc where this branch's FusedConv test helpers + 6 tests landed in the same file region as the Split11/13/7 tests merged via microsoft#28270. Both sets are preserved sequentially. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves conflict in coreml_basic_test.cc where this branch's tile/ceil/identity helpers + tests landed in the same file region as the Split11/13/7 tests merged via microsoft#28270. Both sets are preserved sequentially. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves conflict in coreml_basic_test.cc where this branch's GatherScalarIndices* tests landed in the same file region as the Split11/13/7 tests merged via microsoft#28270. Both sets are preserved sequentially (Gather first, then Split). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Description
The CoreML
SplitOpBuilderpreviously gatedGetMinSupportedOpSetat 13 because pre-13Splitcarries split sizes via an INTS attribute rather than a second input. This PR lowers the gate to 1 and reads the attribute in both the MLProgram and NeuralNetwork emitters, soSplitfrom any opset is supported on the CoreML EP.The validation in
IsOpSupportedImplmirrors the existing input-form rules — ≥2 outputs, sum of sizes equals the axis dim, all sizes positive, axis dim not dynamic. For the no-attribute / no-input case (legacy even-split) we also explicitly require the axis dim to be evenly divisible bynum_outputs, since CoreML'snum_splitsrequires that. This is a behavior change only for opset 2–12 graphs that were 100% rejected before, so no path that used to work regresses.Motivation
DWPose
dw-ll_ucoco_384.onnx(opset 11), a common pose-estimation model, has twoSplitnodes — one uneven (split=[512, 512, 128]) and one even (split=[1, 1]). Both fall back to CPU today, fragmenting the CoreML partition.Benchmark — M3 Max, MLProgram, batch 1, 1299-iter steady state
Removing the two CPU↔CoreML round trips improves the tail far more than the median — useful for real-time perception pipelines where worst-case latency determines the frame budget.
Tests
Eight new tests in
onnxruntime/test/providers/coreml/coreml_basic_test.cc, each exercising both the NeuralNetwork and MLProgram emitters and asserting full CoreML EP node assignment (no CPU fallback).Pre-opset-13 attribute form (the new code path):
Split7UnevenAttribute— opset 7 unevensplit=[4, 3, 2], covering the opset 7–10 range.Split11UnevenAttribute— DWPose's pattern,split=[4, 3, 2].Split11EvenAttribute— uniform sizes via attribute.Split11NoAttributeEven— falls through to thenum_splits = num_outputsbranch.Post-opset-13 input form (parity with the existing, untouched path):
Split13UnevenInput—splitinput[4, 3, 2].Split13EvenInput— uniform sizes via input.Split13NoSplitInputEven— nosplitinput, even-split fallback.Negative coverage:
Split11ZeroSplitValueNotSupported— verifies the attribute-form rejection of a non-positive entry; expects no CoreML assignment.All eight pass locally on macOS 26.3 / M3 Max.
Motivation for upstreaming
Most pre-2023 vision exports (DWPose, MMPose models, original YOLOv5/v7/v8, etc.) target ONNX opset 11/12 and use the
Splitattribute form. They currently lose anySplitto CPU on the CoreML EP. This is a self-contained gap with a clean fix.