[CoreML EP] Support pre-opset-13 Split via 'split' attribute by maxwbuckley · Pull Request #28270 · microsoft/onnxruntime

maxwbuckley · 2026-04-29T15:51:13Z

Description

The CoreML SplitOpBuilder previously gated GetMinSupportedOpSet at 13 because pre-13 Split carries split sizes via an INTS attribute rather than a second input. This PR lowers the gate to 1 and reads the attribute in both the MLProgram and NeuralNetwork emitters, so Split from any opset is supported on the CoreML EP.

The validation in IsOpSupportedImpl mirrors the existing input-form rules — ≥2 outputs, sum of sizes equals the axis dim, all sizes positive, axis dim not dynamic. For the no-attribute / no-input case (legacy even-split) we also explicitly require the axis dim to be evenly divisible by num_outputs, since CoreML's num_splits requires that. This is a behavior change only for opset 2–12 graphs that were 100% rejected before, so no path that used to work regresses.

Motivation

DWPose dw-ll_ucoco_384.onnx (opset 11), a common pose-estimation model, has two Split nodes — one uneven (split=[512, 512, 128]) and one even (split=[1, 1]). Both fall back to CPU today, fragmenting the CoreML partition.

	Without this PR	With this PR
CoreML partitions	3	1
Nodes on CoreML EP	301 / 303	303 / 303

Benchmark — M3 Max, MLProgram, batch 1, 1299-iter steady state

Metric	Without PR	With PR	Δ
Mean	6.838 ms	6.565 ms	−4.0%
StdDev	0.239 ms	0.170 ms	−29%
P50	6.810 ms	6.545 ms	−3.9%
P95	7.070 ms	6.775 ms	−4.2%
P99	7.330 ms	6.928 ms	−5.5%
P99.9	8.917 ms	8.164 ms	−8.4%
Max	12.616 ms	10.360 ms	−17.9%

Removing the two CPU↔CoreML round trips improves the tail far more than the median — useful for real-time perception pipelines where worst-case latency determines the frame budget.

Tests

Eight new tests in onnxruntime/test/providers/coreml/coreml_basic_test.cc, each exercising both the NeuralNetwork and MLProgram emitters and asserting full CoreML EP node assignment (no CPU fallback).

Pre-opset-13 attribute form (the new code path):

Split7UnevenAttribute — opset 7 uneven split=[4, 3, 2], covering the opset 7–10 range.
Split11UnevenAttribute — DWPose's pattern, split=[4, 3, 2].
Split11EvenAttribute — uniform sizes via attribute.
Split11NoAttributeEven — falls through to the num_splits = num_outputs branch.

Post-opset-13 input form (parity with the existing, untouched path):

Split13UnevenInput — split input [4, 3, 2].
Split13EvenInput — uniform sizes via input.
Split13NoSplitInputEven — no split input, even-split fallback.

Negative coverage:

Split11ZeroSplitValueNotSupported — verifies the attribute-form rejection of a non-positive entry; expects no CoreML assignment.

All eight pass locally on macOS 26.3 / M3 Max.

Motivation for upstreaming

Most pre-2023 vision exports (DWPose, MMPose models, original YOLOv5/v7/v8, etc.) target ONNX opset 11/12 and use the Split attribute form. They currently lose any Split to CPU on the CoreML EP. This is a self-contained gap with a clean fix.

The CoreML SplitOpBuilder gated `GetMinSupportedOpSet` at 13 because pre-13 Split carries split sizes via an INTS attribute rather than a second input. Lowers the gate to 1 and reads the attribute in both the MLProgram and NeuralNetwork emitters. Adds validation matching the existing input-form path (>=2 outputs, sum equals axis size, no zero entries, axis size not dynamic). For the no-attribute / no-input case (legacy even split) we now also explicitly require the axis size to be evenly divisible by num_outputs, since CoreML's `num_splits` demands it; this only affects opset 2-12 graphs that were 100% rejected before, so no existing path regresses. Motivated by DWPose `dw-ll_ucoco_384.onnx` (opset 11), which has two Split nodes — one uneven (`split=[512, 512, 128]`), one even (`split=[1, 1]`) — both falling back to CPU and fragmenting the CoreML partition. Without this change: 3 partitions, 301/303 nodes on CoreML EP. With it: 1 partition, 303/303 nodes on CoreML EP. On M3 Max (1299-iter timed run, batch 1, MLProgram): P50 6.81 ms -> 6.55 ms P99 7.33 ms -> 6.93 ms Max 12.62 ms -> 10.36 ms (-17.9%) StdDev 0.24 ms -> 0.17 ms (-29%) CPU EP baseline P50 38.7 ms Adds three tests covering both NeuralNetwork and MLProgram emitters: Split11UnevenAttribute — uneven sizes via attribute Split11EvenAttribute — uniform sizes via attribute path Split11NoAttributeEven — fall-through to num_splits branch Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

maxwbuckley · 2026-04-29T16:00:26Z

Refactored the tests to match the HardSigmoid PR (#28182) pattern:

Use onnxruntime::Model / Graph API instead of raw ModelProto construction.
Pin opset 11 explicitly via domain_to_version so the attribute form is exercised.
Run both NeuralNetwork and MLProgram emitters in each test (the NN code path I changed wasn't covered before).
Inline per-test, no shared helper, matching the style of the surrounding HardSigmoid / Pad tests.
Replaced the synthetic 1152-element DWPose-shape input with smaller (1,9 / 1,6 / 1,8) tensors that still cover the same code paths.

All three tests pass locally (NN + MLProgram for each scenario).

The CoreML GatherOpBuilder rejected rank-0 (scalar) indices because CoreML's `gather` requires rank-1+ indices and the workaround would change the output rank. Fixes microsoft#28180 by performing the workaround internally: when indices are scalar, the builder now emits expand_dims(indices, axes=[0]) -> indices_1d : shape [1] gather(data, indices_1d, axis) -> shape data_shape with axis=1 squeeze(., axes=[axis]) -> ONNX gather output shape in both the MLProgram and NeuralNetwork emitters. This requires claiming a static intermediate shape, so we restrict the fast path to fully-static `data` shape; dynamic-shape scalar Gather still falls back to CPU (existing behavior). Motivated by StyleGAN-family generators (e.g. GFPGAN), which select per-layer style codes with a scalar-index Gather. Combined with microsoft#28270 (pre-opset-13 Split), GFPGAN-1024 (input 1x3x512x512) goes from 9 partitions / 534 CoreML nodes to 1 partition / 557 CoreML nodes on M3 Max. 100-iter timed run, MLProgram: partitions mean P99 max origin/main 9 118.0 ms 131.0 131.3 Gather fix only 8 118.1 ms 134.1 134.7 Gather + Split (microsoft#28270) 1 81.8 ms 96.7 98.1 The Gather fix alone trims one graph break (the 16 scalar Gathers collapse into the surrounding CoreML partitions); the full latency win lands when both fixes ship. Adds three tests covering both NeuralNetwork and MLProgram emitters: GatherScalarIndicesAxis1 - axis=1 (squeeze inserted axis) GatherScalarIndicesAxis0 - axis=0 (squeeze leading axis) GatherScalarIndicesNegativeAxis - axis=-1 (HandleNegativeAxis path) Updates the comment on the existing GatherWithScalarIndices test to reflect that scalar Gather is now supported when 'data' is fully static (the test continues to exercise the dynamic-shape fall-back). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The CoreML GatherOpBuilder rejected rank-0 (scalar) indices because CoreML's `gather` requires rank-1+ indices and the workaround would change the output rank. Fixes microsoft#28180 by performing the workaround internally: when indices are scalar, the builder now emits reshape(indices, shape=[1]) -> indices_1d gather(data, indices_1d, axis) -> shape data_shape with axis=1 squeeze(., axes=[axis]) -> ONNX gather output shape in both the MLProgram and NeuralNetwork emitters. We use `reshape` rather than `expand_dims` because CoreML internally pads scalars and expand_dims on the padded tensor can push the apparent rank past CoreML's rank-5 limit. Restrictions: - 'data' must have a fully static shape (we claim a static intermediate shape between gather and squeeze). - 'data' rank capped at 4 (the rank-5 case still trips CoreML's compiler with "Invalid rank: 6", so we keep the conservative boundary). Dynamic-shape and rank-5+ scalar Gather still falls back to CPU. Motivated by StyleGAN-family generators (e.g. GFPGAN), which select per-layer style codes with a scalar-index Gather. Combined with microsoft#28270 (pre-opset-13 Split), GFPGAN-1024 (input 1x3x512x512) goes from 9 partitions / 534 CoreML nodes to 1 partition / 557 CoreML nodes on M3 Max. Adds six tests covering both NeuralNetwork and MLProgram emitters: GatherScalarIndicesAxis1 - axis=1, mid-rank squeeze GatherScalarIndicesAxis0 - axis=0, leading-axis squeeze GatherScalarIndicesNegativeAxis - axis=-1, HandleNegativeAxis path GatherScalarIndicesFloat16 - fp16 data (MLProgram only) GatherScalarIndicesInt64Data - int64 data, both formats GatherScalarIndicesRank4Data - rank-4 data, exercises the cap Updates the comment on the existing GatherWithScalarIndices test to reflect that scalar Gather is now supported when 'data' is fully static and rank-4 or below; the test continues to exercise the dynamic-shape fall-back. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds three tests that mirror the opset-11 attribute-form Split tests using the opset-13 input form (split sizes as a constant initializer): - Split13UnevenInput (parity with Split11UnevenAttribute) - Split13EvenInput (parity with Split11EvenAttribute) - Split13NoSplitInputEven (parity with Split11NoAttributeEven) Each test runs both the NeuralNetwork and MLProgram emitters, compares outputs against the CPU EP, and asserts ExpectedEPNodeAssignment::All so any CoreML fallback would surface as a failure. Also drops the redundant "Pre-opset-13 split attribute is also supported" sentence from coreml_supported_mlprogram_ops.md — the table already lists the operator, and the attribute-form support is documented in the PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

maxwbuckley · 2026-05-05T12:13:33Z

Added a follow-up commit (b1989fc) addressing two things:

1. Opset-13 input-form parity tests

The original three tests only exercised the new pre-opset-13 attribute path. To prove the post-opset-13 input-form path (the existing, untouched code branch) still behaves identically, this commit adds three mirror tests with the same shapes and split sizes:

New test	Mirrors	Path exercised in `split_op_builder.cc`
`Split13UnevenInput`	`Split11UnevenAttribute`	input-form (`input_defs.size() > 1`) — MLProgram L68–74 / NN L111–119
`Split13EvenInput`	`Split11EvenAttribute`	same path, uniform sizes
`Split13NoSplitInputEven`	`Split11NoAttributeEven`	no `split` input, `SinceVersion() < 18` even-split fallback — MLProgram L79–82 / NN L125–127

Each test runs both the NeuralNetwork and MLProgram emitters via RunAndVerifyOutputsWithEP, which:

Compares outputs element-wise against the CPU EP baseline (1e-5 fp32 tolerance), and
Asserts ExpectedEPNodeAssignment::All, so any CoreML fallback would fail the test rather than silently regress.

This locks in the parity argument from the PR description ("no path that used to work regresses") with explicit coverage rather than relying on it being implicit.

All 6 tests (3 original + 3 new) pass locally on macOS 26.3 / M3 Max:

[ OK ] Split11UnevenAttribute     (64 ms)
[ OK ] Split11EvenAttribute       (33 ms)
[ OK ] Split11NoAttributeEven     (30 ms)
[ OK ] Split13UnevenInput         (43 ms)
[ OK ] Split13EvenInput           (46 ms)
[ OK ] Split13NoSplitInputEven    (42 ms)

2. Ops doc cleanup

Removed the redundant "Pre-opset-13 split attribute is also supported." sentence from coreml_supported_mlprogram_ops.md. The presence of ai.onnx:Split in the table is already the support claim, and tying it to a specific opset boundary in the doc is more noise than signal once both forms are supported.

The two-line comment explaining that pre-opset-13 Split uses an attribute form was anchored to GetMinSupportedOpSet, but the relevant context for a reader is in IsOpSupportedImpl, which already documents the attribute path inline. Removing the duplicate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

maxwbuckley · 2026-05-05T12:21:03Z

@yuslepukhin greatly appreciate your help and guidance on this and my other Apple Onnxruntime optimizations :)

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds CoreML EP support for legacy ONNX Split nodes that encode split sizes in the pre-opset-13 split attribute, and expands CoreML Split test coverage.

Changes:

Lowers the CoreML Split minimum supported opset from 13 to 1.
Adds pre-opset-13 split-attribute handling in both MLProgram and NeuralNetwork emitters, plus validation for legacy even-split behavior.
Adds new CoreML tests covering Split opset 11 and opset 13 for uneven, even, and implicit-even split forms.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File	Description
`onnxruntime/core/providers/coreml/builders/impl/split_op_builder.cc`	Enables legacy Split support in the CoreML builder and updates support validation.
`onnxruntime/test/providers/coreml/coreml_basic_test.cc`	Adds Split coverage for opset 11/13 across attribute, input, and implicit-even split forms.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Reject non-positive entries (not just zero) in the 'split' attribute validation; previously [5, -1] would pass since the sum still matched the axis size. - Split the combined dynamic-axis / non-divisible branch in the omitted- split path into two distinct conditions so the verbose log no longer blames divisibility when the actual cause is a dynamic axis. - Add opset-7 attribute test (Split7UnevenAttribute) for the <=10 range the builder now advertises. - Add two negative tests verifying CoreML rejects unsupported configs: Split11ZeroSplitValueNotSupported (zero entry in 'split') and Split11OmittedNonDivisibleAxisNotSupported (omitted split with axis size not divisible by num outputs).

ONNX shape inference for Split-11 with omitted 'split' rejects models whose declared output shapes don't match an even split, so the test fails at graph.Resolve() before the CoreML EP gets a chance to apply its own divisibility rejection. The CoreML check at lines 187-191 of split_op_builder.cc remains as defensive code, but it isn't reachable through a well-formed model. Remove the unreachable test. The other negative case from the Copilot review (Split11ZeroSplitValueNotSupported) covers the new non-positive entry rejection and stays in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

When the splitting axis is dynamic (split_dims_at_axis == -1), the sum of split sizes — always positive after the explicit positivity check — can never equal -1, so the mismatch check fired first with a misleading "sum of split doesn't match axis size" message. The actual cause was that the axis is dynamic, which CoreML doesn't allow. Reorder both the input-form and attribute-form paths so the dynamic-axis check runs before the sum check. The pre-existing latent bug in the input-form path (untouched by this PR's behavior change) is fixed at the same time, per reviewer request. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ONNX schema and the CPU Split kernel both accept a Split node with a single output, so the CoreML rejection (split_attr->size() < 2 in the attribute-form path) is observable via partition assertion and worth locking in. The other negative cases yuslepukhin requested — sum mismatch, non-divisible omitted split, negative split value — are caught earlier by ONNX shape inference and CPU's SplitBase Initialize check, so the malformed model never reaches the partitioner with a CPU fallback path to compare against. The CoreML checks for those cases remain as defensive code but cannot be exercised in isolation through the public graph-builder API. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Resolves conflict in onnxruntime/test/providers/coreml/coreml_basic_test.cc where this branch's FusedConv test helpers + 6 tests landed in the same file region as the Split11/13/7 tests merged via microsoft#28270. Both sets are preserved sequentially. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Resolves conflict in coreml_basic_test.cc where this branch's tile/ceil/identity helpers + tests landed in the same file region as the Split11/13/7 tests merged via microsoft#28270. Both sets are preserved sequentially. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Resolves conflict in coreml_basic_test.cc where this branch's GatherScalarIndices* tests landed in the same file region as the Split11/13/7 tests merged via microsoft#28270. Both sets are preserved sequentially (Gather first, then Split). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

maxwbuckley force-pushed the coreml-split-attr branch from d650b4d to c3b381b Compare April 29, 2026 16:00

maxwbuckley mentioned this pull request Apr 29, 2026

[CoreML EP] Support Gather with scalar 'indices' #28278

Open

yuslepukhin requested a review from Copilot May 5, 2026 18:12

Copilot AI reviewed May 5, 2026

View reviewed changes

Copilot started reviewing on behalf of yuslepukhin May 5, 2026 18:45 View session

maxwbuckley and others added 2 commits May 5, 2026 22:17

yuslepukhin reviewed May 5, 2026

View reviewed changes

Comment thread onnxruntime/core/providers/coreml/builders/impl/split_op_builder.cc Outdated

Comment thread onnxruntime/test/providers/coreml/coreml_basic_test.cc

yuslepukhin requested a review from Copilot May 5, 2026 20:52

Copilot started reviewing on behalf of yuslepukhin May 5, 2026 20:55 View session

Copilot AI reviewed May 5, 2026

View reviewed changes

Comment thread onnxruntime/core/providers/coreml/builders/impl/split_op_builder.cc

Comment thread onnxruntime/core/providers/coreml/builders/impl/split_op_builder.cc Outdated

Comment thread onnxruntime/core/providers/coreml/builders/impl/split_op_builder.cc

maxwbuckley and others added 2 commits May 6, 2026 09:13

yuslepukhin approved these changes May 6, 2026

View reviewed changes

yuslepukhin merged commit 470977a into microsoft:main May 6, 2026
85 of 87 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CoreML EP] Support pre-opset-13 Split via 'split' attribute#28270

[CoreML EP] Support pre-opset-13 Split via 'split' attribute#28270
yuslepukhin merged 7 commits intomicrosoft:mainfrom
maxwbuckley:coreml-split-attr

maxwbuckley commented Apr 29, 2026 •

edited

Loading

Uh oh!

maxwbuckley commented Apr 29, 2026

Uh oh!

maxwbuckley commented May 5, 2026

Uh oh!

maxwbuckley commented May 5, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

maxwbuckley commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation

Benchmark — M3 Max, MLProgram, batch 1, 1299-iter steady state

Tests

Motivation for upstreaming

Uh oh!

maxwbuckley commented Apr 29, 2026

Uh oh!

maxwbuckley commented May 5, 2026

1. Opset-13 input-form parity tests

2. Ops doc cleanup

Uh oh!

maxwbuckley commented May 5, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

maxwbuckley commented Apr 29, 2026 •

edited

Loading