[CoreML EP] Add FusedConv support #28289
Conversation
Adds support for `com.microsoft:FusedConv` to the CoreML EP's MLProgram
and NeuralNetwork paths. FusedConv is produced by ORT's
`ConvActivationFusion` pass when a model is optimized with the CPU EP
(or any EP in `cpu_acl_js_webgpu_eps`) and saved via
`session.optimized_model_filepath` or the ORT-format conversion tool.
That saved graph contains `com.microsoft:FusedConv` nodes that — before
this patch — the CoreML EP could not claim, fragmenting the partition.
ORT's in-process pipeline does not currently run `ConvActivationFusion`
when CoreML EP is the target (the fusion's compat set excludes CoreML),
so FusedConv typically reaches the CoreML EP only via pre-optimized
graphs. That's a real and common workflow: anyone shipping a
pre-optimized model artifact (mobile pipelines, ORT-format models,
session-cached optimized graphs) that's then loaded with the CoreML EP
hits this path. There's no pre-existing issue tracking this; it was
discovered via DWPose / ResNet50 partitioning analysis on Apple Silicon.
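A minimal sketch of that workflow with the C++ API (paths and option values are illustrative; on older ORT builds the CoreML EP is appended via the C function `OrtSessionOptionsAppendExecutionProvider_CoreML` rather than the provider-options form shown here):

```cpp
// Sketch only: produce a CPU-optimized model containing FusedConv nodes, then
// reload it under the CoreML EP. Paths and option values are illustrative.
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env;

  // Step 1: optimize with the CPU EP at ORT_ENABLE_EXTENDED and save the result.
  // ConvActivationFusion rewrites Conv+Activation into com.microsoft:FusedConv.
  Ort::SessionOptions cpu_opts;
  cpu_opts.SetGraphOptimizationLevel(ORT_ENABLE_EXTENDED);
  cpu_opts.SetOptimizedModelFilePath("resnet50_cpu_opt.onnx");
  Ort::Session optimize_only(env, "resnet50.onnx", cpu_opts);

  // Step 2: reload the saved graph with the CoreML EP. Before this patch the
  // FusedConv nodes could not be claimed and fell back to CPU.
  Ort::SessionOptions coreml_opts;
  coreml_opts.AppendExecutionProvider("CoreML", {{"ModelFormat", "MLProgram"}});
  Ort::Session session(env, "resnet50_cpu_opt.onnx", coreml_opts);
  return 0;
}
```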
## Empirical impact (M3 Max, MLProgram, batch 1)
ResNet50-v2 from the ONNX model zoo, CPU-optimized at ORT_ENABLE_EXTENDED
and reloaded on CoreML EP (108 nodes total, 33 of them FusedConv with
Relu activation):
| | Partitions | Nodes on CoreML | Mean (ms) | StdDev (ms) | P99 (ms) | Max (ms) |
|--------------------|------------|-----------------|-----------|-------------|----------|----------|
| Without this patch | 18 | 75 / 108 | 23.34 | 1.01 | 27.68 | 30.59 |
| With this patch | 1 | 108 / 108 | 2.94 | 0.16 | 3.75 | 4.32 |
7.94× mean speedup; the 33 FusedConv nodes that previously fell back to
CPU now stay on the ANE/GPU. Run-to-run spread also tightens ~6× (stddev
1.01 → 0.16 ms). 597 timed iterations per variant, 3 interleaved rounds.
Partition counts on other Conv-heavy ONNX-zoo models with FusedConv
content (post CPU optimization):
| Model | Without | With | Notes |
|---------------|---------|------|-----------------------------------------------------------------------------------------------------|
| ResNet50-v2 | 18 | 1 | 33 FusedConv (Relu) |
| FCN-ResNet50 | 18 | 1 | 35 FusedConv (Relu); fails to compile on CoreML for unrelated reasons |
| YOLOv3 (full) | 27 | 4 | 72 FusedConv (LeakyRelu); detection post-proc fails on CoreML for unrelated dynamic-shape reasons |
| YOLOv3-tiny | 13 | 7 | 11 FusedConv (LeakyRelu); same |
The partition-count reduction is robust across architectures. ResNet50
is the configuration that runs end-to-end on this exact ONNX-zoo
collection today; the FCN/YOLO failures are orthogonal CoreML-EP
limitations in segmentation upsampling and detection post-processing.
## Implementation
FusedConv is handled by the same `ConvOpBuilder` class, which now branches on `op_type`:
- `Conv`: behavior unchanged.
- `FusedConv`: emit the `conv` MIL op into an intermediate, then chain
the activation MIL op on top. Supports all six activation types
`ConvActivationFusion` produces:
| ONNX activation | MIL op | Parameters |
|-----------------|----------------|----------------------------------------|
| Relu | `relu` | none |
| Sigmoid | `sigmoid` | none |
| Tanh | `tanh` | none |
| LeakyRelu | `leaky_relu` | alpha from `activation_params` |
| Clip | `clip` | min/max from `activation_params` |
| HardSigmoid | `sigmoid_hard` | alpha/beta from `activation_params` |
`IsOpSupportedImpl` rejects FusedConv in NeuralNetwork mode (which would
emit an unfused Conv and lose the activation) and rejects any
unrecognized activation string.
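A condensed, self-contained sketch of that table-driven approach (a later commit in this PR consolidates the support gate and the MIL-op dispatch into exactly such a table; identifiers here are approximate, not the PR's actual code):

```cpp
// Sketch only: approximates the constexpr activation table this PR converges
// on. Identifier names are illustrative, not the PR's exact code.
#include <array>
#include <string_view>

struct ActivationInfo {
  std::string_view onnx_name;   // value of FusedConv's "activation" attribute
  std::string_view mil_op;      // MIL op chained after "conv"
  size_t param_count;           // expected activation_params arity
  std::array<std::string_view, 2> param_ports;  // MIL input port names
};

inline constexpr std::array<ActivationInfo, 6> kActivations{{
    {"Relu", "relu", 0, {}},
    {"Sigmoid", "sigmoid", 0, {}},
    {"Tanh", "tanh", 0, {}},
    {"LeakyRelu", "leaky_relu", 1, {"alpha"}},
    {"Clip", "clip", 2, {"alpha", "beta"}},  // MIL clip: alpha = min, beta = max
    {"HardSigmoid", "sigmoid_hard", 2, {"alpha", "beta"}},
}};

// Consulted by both the support gate (unknown name or wrong arity => reject,
// fall back to CPU) and the MLProgram lowering (which MIL op to chain on conv).
constexpr const ActivationInfo* FindActivation(std::string_view name) {
  for (const auto& a : kActivations) {
    if (a.onnx_name == name) return &a;
  }
  return nullptr;
}
```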
## Tests
Six new tests in `coreml_basic_test.cc`, one per supported activation,
covering every parameter class (param-less, single-param, two-param
positional, two-param named):
- `FusedConvTestRelu` — no `activation_params` attribute
- `FusedConvTestSigmoid` — same shape, exercises sigmoid op-name dispatch
- `FusedConvTestTanh` — same shape, exercises tanh op-name dispatch
- `FusedConvTestLeakyRelu` — single param (alpha)
- `FusedConvTestClip` — two params (min, max)
- `FusedConvTestHardSigmoid` — two params (alpha, beta)
Each verifies CoreML output against the CPU EP reference and asserts
`ExpectedEPNodeAssignment::All`. All pass locally on macOS 26.3 / M3 Max.
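For reference, a rough sketch of the shape of one positive test, assuming the `MakeFusedConvModel` helper this PR adds (its exact signature may differ) and ORT's existing `CreateMLValue` / `RunAndVerifyOutputsWithEP` test utilities:

```cpp
// Sketch only: approximates the positive-test pattern; helper signatures and
// the CoreML EP factory flag may differ from the PR's actual code.
TEST(CoreMLExecutionProviderTest, FusedConvTestLeakyRelu) {
  // Single com.microsoft:FusedConv node: Y = LeakyRelu(Conv(X, W), alpha = 0.1).
  std::string model_data = MakeFusedConvModel("LeakyRelu", /*activation_params=*/{0.1f});

  // Deterministic input; the CPU EP output serves as the reference for CoreML.
  std::vector<int64_t> x_shape = {1, 3, 8, 8};
  std::vector<float> x_values(3 * 8 * 8);
  std::iota(x_values.begin(), x_values.end(), 0.0f);
  OrtValue x;
  CreateMLValue<float>(std::make_shared<CPUAllocator>(), x_shape, x_values, &x);

  EPVerificationParams params;
  params.ep_node_assignment = ExpectedEPNodeAssignment::All;  // whole graph on CoreML
  RunAndVerifyOutputsWithEP(AsByteSpan(model_data.data(), model_data.size()),
                            "FusedConvTestLeakyRelu",
                            MakeCoreMLExecutionProvider(COREML_FLAG_CREATE_MLPROGRAM),
                            {{"X", x}}, params);
}
```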
Also adds the supported-ops doc entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The activation list and the function name already convey what's allowed; the cross-reference to a specific line range in conv_activation_fusion.cc would rot the moment that file gets touched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yuslepukhin this is also a lovely one :) ResNet50-v2 goes Brrrr......!
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Adds CoreML EP support for `com.microsoft:FusedConv` so pre-optimized ORT graphs (via `ConvActivationFusion`) can be fully claimed/compiled on CoreML MLProgram, reducing partition fragmentation and improving performance.
Changes:
- Register `com.microsoft:FusedConv` with the CoreML op builder factory.
- Extend `ConvOpBuilder` to emit `conv` + fused activation MIL ops for MLProgram.
- Add CoreML tests covering six fused activation variants and document the newly supported op.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tools/ci_build/github/apple/coreml_supported_mlprogram_ops.md | Documents com.microsoft:FusedConv as supported in MLProgram. |
| onnxruntime/test/providers/coreml/coreml_basic_test.cc | Adds single-node FusedConv model generator and 6 activation-focused MLProgram tests. |
| onnxruntime/core/providers/coreml/builders/op_builder_factory.cc | Registers FusedConv to use the Conv builder implementation. |
| onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc | Implements MLProgram lowering for FusedConv as conv + activation and adds support checks. |
yuslepukhin left a comment:
The happy-path implementation does match the PR description for pure Conv + activation MLProgram nodes, but it does not safely support the full FusedConv surface that the registration now exposes. Test coverage also misses both problems: the new CoreML tests only build X/W models with supported activations in coreml_basic_test.cc:1168, so there is no B/Z case and no negative coverage for malformed attributes or rejection paths.
Resolves conflict in onnxruntime/test/providers/coreml/coreml_basic_test.cc where this branch's FusedConv test helpers + 6 tests landed in the same file region as the Split11/13/7 tests merged via microsoft#28270. Both sets are preserved sequentially. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the duplicated activation lists in IsSupportedFusedConvActivation and the if/else MIL-op chain in AddToModelBuilderImpl with a single constexpr table mapping each ONNX activation name to its MIL op, expected activation_params arity, and MIL input port names. Both the support gate and the dispatch path now consult that table. Also tightens IsOpSupportedImpl to reject FusedConv nodes whose activation_params arity does not match what the activation expects (0 for Relu/Sigmoid/Tanh, 1 for LeakyRelu, 2 for Clip/HardSigmoid). The CPU EP already rejects mismatches in fused_activation.cc; CoreML now matches that behaviour instead of silently inventing defaults. Addresses review feedback from yuslepukhin and copilot-pull-request-reviewer on microsoft#28289. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two early rejections in IsOpSupportedImpl that the previous implementation was silently letting through: 1. The optional 4th input 'Z' (residual sum) — FusedConv with Z is Y = activation(Conv(X,W,B) + Z), but the MLProgram lowering only emits conv + activation and never reads input[3]. Without this guard a pre-optimized Conv+Add+Act graph would be fully assigned to CoreML and produce the wrong result by dropping the residual add. Reported by yuslepukhin on microsoft#28289. 2. Non-float element types — FusedConv schema's `T` permits double, but the activation-param lambda only handles FLOAT and FLOAT16. CoreML does not support double anyway; reject double explicitly so the fallback to CPU is what actually runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ting Adds two ExpectedEPNodeAssignment::None tests covering the support-gating paths added in the previous commit: - FusedConvNeuralNetworkNotSupported — FusedConv on the NeuralNetwork EP is rejected so the node falls back to CPU rather than emit an unfused Conv that silently drops the activation. - FusedConvWithZInputNotSupported — FusedConv with the optional residual Z input is rejected to prevent the silent drop of Conv+Add+Act semantics that yuslepukhin flagged on microsoft#28289. The unsupported-activation and wrong-arity rejections are also live but not testable end-to-end: the CPU FusedConv kernel rejects those same malformed graphs at kernel construction, so TestModelLoad's Initialize fails before partition assignment can be observed. MakeFusedConvModel grows an `add_z` knob to wire the optional 4th input. A small RunFusedConvNegativeTest helper packages the serialize-then-TestModelLoad pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
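A condensed sketch of the plausible shape of that helper (names taken from the commit message above; assumes ORT's existing `TestModelLoad` utility, and the exact code may differ):

```cpp
// Sketch only: the serialize-then-TestModelLoad pattern the commit describes.
// MakeFusedConvModel's signature and the flag spelling are approximate.
static void RunFusedConvNegativeTest(const std::string& activation,
                                     const std::vector<float>& activation_params,
                                     bool add_z, uint32_t coreml_flags) {
  // Build and serialize the single-node FusedConv model, then assert the CoreML
  // EP claims none of its nodes, i.e. the whole graph falls back to CPU.
  std::string model_data = MakeFusedConvModel(activation, activation_params, add_z);
  TestModelLoad(AsByteSpan(model_data.data(), model_data.size()),
                MakeCoreMLExecutionProvider(coreml_flags),
                ExpectedEPNodeAssignment::None);
}
```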
The previous comment said FusedConv "reuses the existing ConvOpBuilder", which Copilot flagged as misleading because CreateConvOpBuilder registers a new instance under the FusedConv op type rather than literally reusing the Conv-registered instance. Reword to "handled by the same ConvOpBuilder class" so it's clear the reuse is at the class/dispatch level, not the instance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a TODO above the FusedConv Z-input rejection pointing at the straightforward MIL lowering (`add(conv_out, Z)` between conv and activation) and noting which optimizer pass produces the Z form (ConvAddActivationFusion at TransformerLevel::Level3, gated to cpu_ep). This way the next person looking at residual-block coverage on CoreML finds the implementation hint without re-discovering the schema and optimizer pass independently. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
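To make the dropped-residual hazard concrete: with Z, FusedConv computes Y = activation(Conv(X, W, B) + Z). A tiny self-contained reference of the post-conv epilogue (hypothetical helper, illustration only, with Relu standing in for the activation):

```cpp
// Illustration only: the residual Z is added to the convolution result BEFORE
// the activation. A lowering that emits just conv + activation silently drops
// the "+ z" term, which is why IsOpSupportedImpl now rejects the Z form.
#include <algorithm>
#include <cstddef>
#include <vector>

std::vector<float> FusedConvEpilogue(const std::vector<float>& conv_out,
                                     const std::vector<float>& z) {
  std::vector<float> y(conv_out.size());
  for (std::size_t i = 0; i < conv_out.size(); ++i) {
    const float v = conv_out[i] + z[i];  // residual add (what the TODO's MIL "add" would emit)
    y[i] = std::max(v, 0.0f);            // activation (Relu here)
  }
  return y;
}
```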
@yuslepukhin friendly ping — the latest stack (
Merge from main, pls.
Thanks a lot Dmitri. We are making Mac computer vision go Brrrr.....! 🔥