[CoreML EP] Add HardSigmoid support #28182
Conversation
@microsoft-github-policy-service agree
Force-pushed from d2e815c to 72940ee (Compare)
Amended the branch to add a dedicated CoreML-EP test and correct the earlier claim about test coverage. My original PR description asserted that the existing multi-EP test in `activation_op_test.cc` already covered the CoreML path; it does not, since it silently falls back to CPU when the EP rejects the node. The new test in `coreml_basic_test.cc` verifies full CoreML node assignment directly. This pattern is worth keeping in mind more broadly: the recent Softplus/Elu addition (#26462) also relies on the multi-EP CPU test and may not be catching CoreML-side regressions either.
Pull request overview
Adds HardSigmoid operator coverage to the CoreML Execution Provider so models using this activation no longer fall back to CPU (avoiding CoreML↔CPU graph breaks) while maintaining output parity with the CPU reference.
Changes:
- Implement `HardSigmoid` in the CoreML EP activation builder for both MLProgram (`sigmoid_hard`) and NeuralNetwork (`ActivationSigmoidHard`) paths, including `alpha`/`beta` wiring.
- Register `HardSigmoid` in the CoreML op builder factory.
- Add a dedicated CoreML EP test that verifies full-node assignment and output correctness in both NN and MLProgram formats; update the supported-ops doc list.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tools/ci_build/github/apple/coreml_supported_mlprogram_ops.md | Documents ai.onnx:HardSigmoid as supported for MLProgram. |
| onnxruntime/test/providers/coreml/coreml_basic_test.cc | Adds a single-node HardSigmoid model test verifying full CoreML assignment and correct outputs (NN + MLProgram). |
| onnxruntime/core/providers/coreml/builders/op_builder_factory.cc | Registers HardSigmoid with the activation op builder. |
| onnxruntime/core/providers/coreml/builders/impl/activation_op_builder.cc | Implements HardSigmoid conversion for MLProgram and NeuralNetwork model formats and lists it as a supported activation. |
The PR may require a rebase from main once the pipelines are fixed.
Adds `HardSigmoid` to the CoreML Execution Provider's activation op builder. Both MLProgram (`sigmoid_hard`) and NeuralNetwork (`ActivationSigmoidHard`) code paths are implemented; the op's ONNX definition matches CoreML MIL's `sigmoid_hard` exactly, so no decomposition is required (reference sketch below).

Adds a dedicated CoreML-EP test `CoreMLExecutionProviderTest.HardSigmoidTest` that verifies the entire graph is placed on the CoreML EP (both NN and MLProgram formats) via `ExpectedEPNodeAssignment::All`, and that the output matches the CPU reference. The existing multi-EP test in `activation_op_test.cc` silently falls back to CPU for unsupported-on-EP ops, so a dedicated test is required to genuinely verify the CoreML path.

Also updates `coreml_supported_mlprogram_ops.md`.

Fixes microsoft#28181.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
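For reference, a standalone sketch of the shared definition behind that 1:1 mapping (the `alpha`/`beta` defaults are the ONNX spec defaults):

```cpp
#include <algorithm>

// ONNX HardSigmoid: y = max(0, min(1, alpha * x + beta)), with spec
// defaults alpha = 0.2, beta = 0.5. CoreML MIL's sigmoid_hard computes the
// same clamped affine function, which is why the builder can map the op
// 1:1 with no decomposition.
inline float HardSigmoidRef(float x, float alpha = 0.2f, float beta = 0.5f) {
  return std::max(0.0f, std::min(1.0f, alpha * x + beta));
}
```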
Force-pushed from 72940ee to 81c4421 (Compare)
Force-pushing makes it hard to review the changes.
Apologies: I force-pushed after rebasing on main and amending. I won't repeat that pattern; I'll stack follow-up commits instead so the review-since-last diff stays usable. For this round, what changed on top of the original commit (d2e815c) is the new dedicated CoreML-EP test in `coreml_basic_test.cc` and the corrected test-coverage claim in the description.
Range-diff if helpful:
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
/azp run Win_TRT_Minimal_CUDA_Test_CI, Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 2 pipeline(s).
### Description

Adds support for `com.microsoft:FusedConv` to the CoreML EP's MLProgram and NeuralNetwork paths. `FusedConv` is produced by ORT's `ConvActivationFusion` pass when a model is optimized with the CPU EP (or any EP in `cpu_acl_js_webgpu_eps`) and saved via `session.optimized_model_filepath` or the ORT-format conversion tool. The saved graph contains `com.microsoft:FusedConv` nodes that, before this patch, the CoreML EP could not claim, fragmenting the partition.

ORT's in-process pipeline does not currently run `ConvActivationFusion` when CoreML EP is the target (the fusion's compat set excludes CoreML), so `FusedConv` typically reaches the CoreML EP only via pre-optimized graphs. That's a real and common workflow: anyone shipping a pre-optimized model artifact (mobile pipelines, ORT-format models, session-cached optimized graphs) that's then loaded with the CoreML EP hits this path. There's no pre-existing issue tracking this; it was discovered via DWPose / ResNet50 partitioning analysis on Apple Silicon.

### Empirical impact

ResNet50-v2 from the ONNX model zoo, CPU-optimized at `ORT_ENABLE_EXTENDED` and reloaded on the CoreML EP (108 nodes total, 33 of them `FusedConv` with Relu activation). M3 Max, MLProgram, batch 1, 100-iter timed runs, 3 interleaved rounds (n=597 per variant):

| | Partitions | Nodes on CoreML | Mean | StdDev | P99 | Max |
|---|---|---|---|---|---|---|
| Without this patch | 18 | 75 / 108 | 23.34 ms | 1.01 | 27.68 | 30.59 |
| **With this patch** | **1** | **108 / 108** | **2.94 ms** | **0.16** | **3.75** | **4.32** |

**7.94× mean speedup.** The 33 FusedConv nodes that previously fell back to CPU now stay on the ANE/GPU. Variance also tightens 6× (stddev 1.01 → 0.16).

Partition counts on other Conv-heavy ONNX-zoo models post CPU-optimization:

| Model | Without | With | Notes |
|---|---|---|---|
| ResNet50-v2 | 18 | **1** | 33 FusedConv (Relu) |
| FCN-ResNet50 | 18 | **1** | 35 FusedConv (Relu); fails to compile on CoreML for unrelated reasons |
| YOLOv3 (full) | 27 | **4** | 72 FusedConv (LeakyRelu); detection post-proc fails on CoreML for unrelated dynamic-shape reasons |
| YOLOv3-tiny | 13 | **7** | 11 FusedConv (LeakyRelu); same |

Partition reduction is robust across architectures. ResNet50 is the configuration that runs end-to-end on this exact ONNX-zoo collection on the CoreML EP today; the FCN/YOLO failures are orthogonal CoreML-EP limitations on segmentation upsampling and detection post-processing.

### Implementation

Reuses `ConvOpBuilder`, which now branches on `op_type`:

- `Conv`: behaviour unchanged.
- `FusedConv`: emit the `conv` MIL op into an intermediate, then chain the activation MIL op on top.

Supports all six activation types `ConvActivationFusion` produces (see the dispatch sketch below):

| ONNX activation | MIL op | params |
|---|---|---|
| Relu | `relu` | – |
| Sigmoid | `sigmoid` | – |
| Tanh | `tanh` | – |
| LeakyRelu | `leaky_relu` | alpha (from `activation_params`) |
| Clip | `clip` | alpha=min, beta=max (from `activation_params`) |
| HardSigmoid | `sigmoid_hard` | alpha, beta (from `activation_params`) |

`IsOpSupportedImpl` rejects `FusedConv` in NeuralNetwork mode (which would emit an unfused Conv and silently lose the activation) and rejects any unrecognized activation string.
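The activation mapping in the table above can be read as a small dispatch; a minimal standalone sketch follows. The struct and function names here are illustrative only, not the real `ConvOpBuilder` API, which emits MIL ops through the CoreML model builder rather than returning values:

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical illustration of the dispatch described above.
struct MilActivation {
  std::string mil_op;         // MIL op chained after "conv"
  std::vector<float> params;  // values taken from the activation_params attribute
};

// Maps the ONNX activation name stored on a FusedConv node to the MIL op
// chained after the conv output, mirroring the table above. Returning
// nullopt corresponds to IsOpSupportedImpl rejecting the node.
std::optional<MilActivation> MapFusedConvActivation(
    std::string_view activation, const std::vector<float>& p) {
  if (activation == "Relu") return MilActivation{"relu", {}};
  if (activation == "Sigmoid") return MilActivation{"sigmoid", {}};
  if (activation == "Tanh") return MilActivation{"tanh", {}};
  if (activation == "LeakyRelu" && p.size() >= 1)
    return MilActivation{"leaky_relu", {p[0]}};          // alpha
  if (activation == "Clip" && p.size() >= 2)
    return MilActivation{"clip", {p[0], p[1]}};          // min, max
  if (activation == "HardSigmoid" && p.size() >= 2)
    return MilActivation{"sigmoid_hard", {p[0], p[1]}};  // alpha, beta
  return std::nullopt;  // unrecognized activation string
}
```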
### Tests

Six new tests in `onnxruntime/test/providers/coreml/coreml_basic_test.cc`, one per supported activation class (param-less, single-param, two-param-positional, two-param-named):

- `FusedConvTestRelu`: no `activation_params` attribute
- `FusedConvTestSigmoid`: same shape, exercises sigmoid op-name dispatch
- `FusedConvTestTanh`: same shape, exercises tanh op-name dispatch
- `FusedConvTestLeakyRelu`: single param (alpha); the YOLOv3 case
- `FusedConvTestClip`: two params (min, max)
- `FusedConvTestHardSigmoid`: two params (alpha, beta); depends on the HardSigmoid CoreML builder landed in #28182

Each verifies CoreML output against the CPU EP reference (see the toy reference sketch below) and asserts `ExpectedEPNodeAssignment::All`. All pass locally on macOS 26.3 / M3 Max. Also adds the supported-ops doc entry.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
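For intuition about what each test asserts semantically (fused node ≡ conv followed by the activation, matched against the CPU EP), a toy 1-D reference with hypothetical shapes and no padding/strides:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy 1-D valid convolution followed by Relu (assumes x.size() >= w.size()).
// The property the FusedConv tests check on real 4-D tensors is that
// FusedConv(x, w) on the CoreML EP matches Act(Conv(x, w)) computed by the
// CPU EP; shapes and attributes here are hypothetical simplifications.
std::vector<float> ConvThenRelu(const std::vector<float>& x,
                                const std::vector<float>& w) {
  std::vector<float> y(x.size() - w.size() + 1, 0.0f);
  for (std::size_t i = 0; i < y.size(); ++i) {
    for (std::size_t k = 0; k < w.size(); ++k) y[i] += x[i + k] * w[k];
    y[i] = std::max(0.0f, y[i]);  // the fused Relu applied to the conv output
  }
  return y;
}
```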
Description
Adds `HardSigmoid` to the CoreML Execution Provider's activation op builder. Both MLProgram (`sigmoid_hard`) and NeuralNetwork (`ActivationSigmoidHard`) code paths are implemented; the op's ONNX definition matches CoreML MIL's `sigmoid_hard` exactly, so no decomposition is required.

Adds a dedicated CoreML-EP test (`CoreMLExecutionProviderTest.HardSigmoidTest`) that builds a single-node HardSigmoid model with non-default `alpha`/`beta` and uses `RunAndVerifyOutputsWithEP` with `ExpectedEPNodeAssignment::All` to confirm (a) the entire graph is claimed by the CoreML EP in both NN and MLProgram formats, and (b) the output matches the CPU reference. I verified the test is not trivially passing by temporarily unregistering HardSigmoid from the activation builder: the test fails with `VerifyEPNodeAssignment` emitting a fatal failure, proving it genuinely exercises the CoreML path. (The existing multi-EP test in `activation_op_test.cc` silently falls back to CPU when an EP rejects the node, so it does not give CoreML coverage on its own. A paraphrased sketch of the test shape follows below.)
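Paraphrased shape of such a test; only `RunAndVerifyOutputsWithEP` and `ExpectedEPNodeAssignment::All` are taken from the description above, while the other helper names, the params struct, and the model path are assumptions about the surrounding test utilities:

```cpp
#include "gtest/gtest.h"
// Plus the CoreML EP test utility headers. EPVerificationParams,
// MakeCoreMLExecutionProvider(), and GetTestFeeds() are assumed/hypothetical
// names in this sketch, not confirmed signatures.

TEST(CoreMLExecutionProviderTest, HardSigmoidTest) {
  // Require every node to be claimed by the CoreML EP, so a silent CPU
  // fallback fails the test instead of letting it pass vacuously.
  EPVerificationParams params;
  params.ep_node_assignment = ExpectedEPNodeAssignment::All;

  RunAndVerifyOutputsWithEP(
      ORT_TSTR("testdata/hardsigmoid.onnx"),  // hypothetical model path
      "CoreMLEP.HardSigmoidTest",             // log id
      MakeCoreMLExecutionProvider(),          // assumed helper returning the EP
      GetTestFeeds(),                         // hypothetical input feeds
      params);
}
```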
Also updates `coreml_supported_mlprogram_ops.md`.

Motivation and Context
Fixes #28181.
On a DWPose pose-estimation model (`dw-ll_ucoco_384.onnx`), 4 HardSigmoid ops were each forcing a CoreML → CPU → CoreML round-trip, and were also causing downstream ops to be rejected with "unsupported inputs" because their producers had been sent to CPU. Adding HardSigmoid collapses the graph from 5 CoreML subgraphs to 1 and drops inference from 9.22 ms to 6.92 ms (−25%) on Apple Silicon with MLProgram + ComputeUnits=ALL.
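For anyone reproducing that measurement, a minimal sketch of loading the model with those settings through the public C++ API; it assumes an ORT build where the CoreML EP accepts named provider options (`ModelFormat`, `MLComputeUnits`, per the CoreML EP docs):

```cpp
#include <onnxruntime_cxx_api.h>
#include <string>
#include <unordered_map>

int main() {
  Ort::Env env{ORT_LOGGING_LEVEL_WARNING, "coreml-hardsigmoid"};
  Ort::SessionOptions so;
  // Named CoreML provider options; whether this named-provider path is
  // available depends on the ORT version (an assumption in this sketch).
  std::unordered_map<std::string, std::string> coreml_opts{
      {"ModelFormat", "MLProgram"},
      {"MLComputeUnits", "ALL"},
  };
  so.AppendExecutionProvider("CoreML", coreml_opts);
  Ort::Session session{env, "dw-ll_ucoco_384.onnx", so};  // model named above
  return 0;
}
```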