[CoreML EP] Add HardSigmoid support #28182
Conversation
@microsoft-github-policy-service agree
Force-pushed from d2e815c to 72940ee (Compare)
Amended the branch to add a dedicated CoreML-EP test and correct the earlier claim about test coverage. My original PR description asserted that the existing multi-EP test in `activation_op_test.cc` already covered the CoreML path; it does not, since it silently falls back to CPU when the EP rejects the node. The new test in `coreml_basic_test.cc` verifies full CoreML node assignment directly. This pattern is worth keeping in mind more broadly: the recent Softplus/Elu addition (#26462) also relies on the multi-EP CPU test and may not be catching CoreML-side regressions either.
Pull request overview
Adds HardSigmoid operator coverage to the CoreML Execution Provider so models using this activation no longer fall back to CPU (avoiding CoreML↔CPU graph breaks) while maintaining output parity with the CPU reference.
Changes:
- Implement `HardSigmoid` in the CoreML EP activation builder for both MLProgram (`sigmoid_hard`) and NeuralNetwork (`ActivationSigmoidHard`) paths, including `alpha`/`beta` wiring.
- Register `HardSigmoid` in the CoreML op builder factory.
- Add a dedicated CoreML EP test that verifies full-node assignment and output correctness in both NN and MLProgram formats; update the supported-ops doc list.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tools/ci_build/github/apple/coreml_supported_mlprogram_ops.md | Documents ai.onnx:HardSigmoid as supported for MLProgram. |
| onnxruntime/test/providers/coreml/coreml_basic_test.cc | Adds a single-node HardSigmoid model test verifying full CoreML assignment and correct outputs (NN + MLProgram). |
| onnxruntime/core/providers/coreml/builders/op_builder_factory.cc | Registers HardSigmoid with the activation op builder. |
| onnxruntime/core/providers/coreml/builders/impl/activation_op_builder.cc | Implements HardSigmoid conversion for MLProgram and NeuralNetwork model formats and lists it as a supported activation. |
The PR may require a rebase from main once the pipelines are fixed.
Adds `HardSigmoid` to the CoreML Execution Provider's activation op builder. Both MLProgram (`sigmoid_hard`) and NeuralNetwork (`ActivationSigmoidHard`) code paths are implemented; the op's ONNX definition matches CoreML MIL's `sigmoid_hard` exactly, so no decomposition is required (reference sketch below).

Adds a dedicated CoreML-EP test `CoreMLExecutionProviderTest.HardSigmoidTest` that verifies the entire graph is placed on the CoreML EP (both NN and MLProgram formats) via `ExpectedEPNodeAssignment::All`, and that the output matches the CPU reference. The existing multi-EP test in `activation_op_test.cc` silently falls back to CPU for unsupported-on-EP ops, so a dedicated test is required to genuinely verify the CoreML path.

Also updates `coreml_supported_mlprogram_ops.md`.

Fixes microsoft#28181.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
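For reference, a standalone sketch of the shared definition behind that 1:1 mapping (the `alpha`/`beta` defaults are the ONNX spec defaults):

```cpp
#include <algorithm>

// ONNX HardSigmoid: y = max(0, min(1, alpha * x + beta)), with spec
// defaults alpha = 0.2, beta = 0.5. CoreML MIL's sigmoid_hard computes the
// same clamped affine function, which is why the builder can map the op
// 1:1 with no decomposition.
inline float HardSigmoidRef(float x, float alpha = 0.2f, float beta = 0.5f) {
  return std::max(0.0f, std::min(1.0f, alpha * x + beta));
}
```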
Force-pushed from 72940ee to 81c4421 (Compare)
Force-pushing makes it hard to review the changes.
Apologies: I force-pushed after rebasing on main and amending. I won't repeat that pattern; I'll stack follow-up commits instead so the review-since-last diff stays usable. For this round, what changed on top of the original commit (d2e815c) is the new dedicated CoreML-EP test in `coreml_basic_test.cc` and the corrected test-coverage claim in the description.
Range-diff if helpful:
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
/azp run Win_TRT_Minimal_CUDA_Test_CI, Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 2 pipeline(s).
### Description

Adds support for `com.microsoft:FusedConv` to the CoreML EP's MLProgram and NeuralNetwork paths. `FusedConv` is produced by ORT's `ConvActivationFusion` pass when a model is optimized with the CPU EP (or any EP in `cpu_acl_js_webgpu_eps`) and saved via `session.optimized_model_filepath` or the ORT-format conversion tool. The saved graph contains `com.microsoft:FusedConv` nodes that, before this patch, the CoreML EP could not claim, fragmenting the partition.

ORT's in-process pipeline does not currently run `ConvActivationFusion` when CoreML EP is the target (the fusion's compat set excludes CoreML), so `FusedConv` typically reaches the CoreML EP only via pre-optimized graphs. That's a real and common workflow: anyone shipping a pre-optimized model artifact (mobile pipelines, ORT-format models, session-cached optimized graphs) that's then loaded with the CoreML EP hits this path. There's no pre-existing issue tracking this; it was discovered via DWPose / ResNet50 partitioning analysis on Apple Silicon.

### Empirical impact

ResNet50-v2 from the ONNX model zoo, CPU-optimized at `ORT_ENABLE_EXTENDED` and reloaded on the CoreML EP (108 nodes total, 33 of them `FusedConv` with Relu activation). M3 Max, MLProgram, batch 1, 100-iter timed runs, 3 interleaved rounds (n=597 per variant):

| | Partitions | Nodes on CoreML | Mean | StdDev | P99 | Max |
|---|---|---|---|---|---|---|
| Without this patch | 18 | 75 / 108 | 23.34 ms | 1.01 | 27.68 | 30.59 |
| **With this patch** | **1** | **108 / 108** | **2.94 ms** | **0.16** | **3.75** | **4.32** |

**7.94× mean speedup.** The 33 FusedConv nodes that previously fell back to CPU now stay on the ANE/GPU. Variance also tightens 6× (stddev 1.01 → 0.16).

Partition counts on other Conv-heavy ONNX-zoo models post CPU-optimization:

| Model | Without | With | Notes |
|---|---|---|---|
| ResNet50-v2 | 18 | **1** | 33 FusedConv (Relu) |
| FCN-ResNet50 | 18 | **1** | 35 FusedConv (Relu); fails to compile on CoreML for unrelated reasons |
| YOLOv3 (full) | 27 | **4** | 72 FusedConv (LeakyRelu); detection post-proc fails on CoreML for unrelated dynamic-shape reasons |
| YOLOv3-tiny | 13 | **7** | 11 FusedConv (LeakyRelu); same |

Partition reduction is robust across architectures. ResNet50 is the configuration that runs end-to-end on this exact ONNX-zoo collection on the CoreML EP today; the FCN/YOLO failures are orthogonal CoreML-EP limitations on segmentation upsampling and detection post-processing.

### Implementation

Reuses `ConvOpBuilder`, which now branches on `op_type`:

- `Conv`: behaviour unchanged.
- `FusedConv`: emit the `conv` MIL op into an intermediate, then chain the activation MIL op on top.

Supports all six activation types `ConvActivationFusion` produces (see the dispatch sketch below):

| ONNX activation | MIL op | params |
|---|---|---|
| Relu | `relu` | – |
| Sigmoid | `sigmoid` | – |
| Tanh | `tanh` | – |
| LeakyRelu | `leaky_relu` | alpha (from `activation_params`) |
| Clip | `clip` | alpha=min, beta=max (from `activation_params`) |
| HardSigmoid | `sigmoid_hard` | alpha, beta (from `activation_params`) |

`IsOpSupportedImpl` rejects `FusedConv` in NeuralNetwork mode (which would emit an unfused Conv and silently lose the activation) and rejects any unrecognized activation string.
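The activation mapping in the table above can be read as a small dispatch; a minimal standalone sketch follows. The struct and function names here are illustrative only, not the real `ConvOpBuilder` API, which emits MIL ops through the CoreML model builder rather than returning values:

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical illustration of the dispatch described above.
struct MilActivation {
  std::string mil_op;         // MIL op chained after "conv"
  std::vector<float> params;  // values taken from the activation_params attribute
};

// Maps the ONNX activation name stored on a FusedConv node to the MIL op
// chained after the conv output, mirroring the table above. Returning
// nullopt corresponds to IsOpSupportedImpl rejecting the node.
std::optional<MilActivation> MapFusedConvActivation(
    std::string_view activation, const std::vector<float>& p) {
  if (activation == "Relu") return MilActivation{"relu", {}};
  if (activation == "Sigmoid") return MilActivation{"sigmoid", {}};
  if (activation == "Tanh") return MilActivation{"tanh", {}};
  if (activation == "LeakyRelu" && p.size() >= 1)
    return MilActivation{"leaky_relu", {p[0]}};          // alpha
  if (activation == "Clip" && p.size() >= 2)
    return MilActivation{"clip", {p[0], p[1]}};          // min, max
  if (activation == "HardSigmoid" && p.size() >= 2)
    return MilActivation{"sigmoid_hard", {p[0], p[1]}};  // alpha, beta
  return std::nullopt;  // unrecognized activation string
}
```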
### Tests

Six new tests in `onnxruntime/test/providers/coreml/coreml_basic_test.cc`, one per supported activation class (param-less, single-param, two-param-positional, two-param-named):

- `FusedConvTestRelu`: no `activation_params` attribute
- `FusedConvTestSigmoid`: same shape, exercises sigmoid op-name dispatch
- `FusedConvTestTanh`: same shape, exercises tanh op-name dispatch
- `FusedConvTestLeakyRelu`: single param (alpha); the YOLOv3 case
- `FusedConvTestClip`: two params (min, max)
- `FusedConvTestHardSigmoid`: two params (alpha, beta); depends on the HardSigmoid CoreML builder landed in #28182

Each verifies CoreML output against the CPU EP reference (see the toy reference sketch below) and asserts `ExpectedEPNodeAssignment::All`. All pass locally on macOS 26.3 / M3 Max. Also adds the supported-ops doc entry.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
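For intuition about what each test asserts semantically (fused node ≡ conv followed by the activation, matched against the CPU EP), a toy 1-D reference with hypothetical shapes and no padding/strides:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy 1-D valid convolution followed by Relu (assumes x.size() >= w.size()).
// The property the FusedConv tests check on real 4-D tensors is that
// FusedConv(x, w) on the CoreML EP matches Act(Conv(x, w)) computed by the
// CPU EP; shapes and attributes here are hypothetical simplifications.
std::vector<float> ConvThenRelu(const std::vector<float>& x,
                                const std::vector<float>& w) {
  std::vector<float> y(x.size() - w.size() + 1, 0.0f);
  for (std::size_t i = 0; i < y.size(); ++i) {
    for (std::size_t k = 0; k < w.size(); ++k) y[i] += x[i + k] * w[k];
    y[i] = std::max(0.0f, y[i]);  // the fused Relu applied to the conv output
  }
  return y;
}
```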
Description
Adds `HardSigmoid` to the CoreML Execution Provider's activation op builder. Both MLProgram (`sigmoid_hard`) and NeuralNetwork (`ActivationSigmoidHard`) code paths are implemented; the op's ONNX definition matches CoreML MIL's `sigmoid_hard` exactly, so no decomposition is required.

Adds a dedicated CoreML-EP test (`CoreMLExecutionProviderTest.HardSigmoidTest`) that builds a single-node HardSigmoid model with non-default `alpha`/`beta` and uses `RunAndVerifyOutputsWithEP` with `ExpectedEPNodeAssignment::All` to confirm (a) the entire graph is claimed by the CoreML EP in both NN and MLProgram formats, and (b) the output matches the CPU reference. I verified the test is not trivially passing by temporarily unregistering HardSigmoid from the activation builder: the test fails with `VerifyEPNodeAssignment` emitting a fatal failure, proving it genuinely exercises the CoreML path. (The existing multi-EP test in `activation_op_test.cc` silently falls back to CPU when an EP rejects the node, so it does not give CoreML coverage on its own. A paraphrased sketch of the test shape follows below.)
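Paraphrased shape of such a test; only `RunAndVerifyOutputsWithEP` and `ExpectedEPNodeAssignment::All` are taken from the description above, while the other helper names, the params struct, and the model path are assumptions about the surrounding test utilities:

```cpp
#include "gtest/gtest.h"
// Plus the CoreML EP test utility headers. EPVerificationParams,
// MakeCoreMLExecutionProvider(), and GetTestFeeds() are assumed/hypothetical
// names in this sketch, not confirmed signatures.

TEST(CoreMLExecutionProviderTest, HardSigmoidTest) {
  // Require every node to be claimed by the CoreML EP, so a silent CPU
  // fallback fails the test instead of letting it pass vacuously.
  EPVerificationParams params;
  params.ep_node_assignment = ExpectedEPNodeAssignment::All;

  RunAndVerifyOutputsWithEP(
      ORT_TSTR("testdata/hardsigmoid.onnx"),  // hypothetical model path
      "CoreMLEP.HardSigmoidTest",             // log id
      MakeCoreMLExecutionProvider(),          // assumed helper returning the EP
      GetTestFeeds(),                         // hypothetical input feeds
      params);
}
```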
Also updates `coreml_supported_mlprogram_ops.md`.

Motivation and Context
Fixes #28181.
On a DWPose pose-estimation model (`dw-ll_ucoco_384.onnx`), 4 HardSigmoid ops were each forcing a CoreML → CPU → CoreML round-trip, and were also causing downstream ops to be rejected with "unsupported inputs" because their producers had been sent to CPU. Adding HardSigmoid collapses the graph from 5 CoreML subgraphs to 1 and drops inference from 9.22 ms to 6.92 ms (−25%) on Apple Silicon with MLProgram + ComputeUnits=ALL.
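For anyone reproducing that measurement, a minimal sketch of loading the model with those settings through the public C++ API; it assumes an ORT build where the CoreML EP accepts named provider options (`ModelFormat`, `MLComputeUnits`, per the CoreML EP docs):

```cpp
#include <onnxruntime_cxx_api.h>
#include <string>
#include <unordered_map>

int main() {
  Ort::Env env{ORT_LOGGING_LEVEL_WARNING, "coreml-hardsigmoid"};
  Ort::SessionOptions so;
  // Named CoreML provider options; whether this named-provider path is
  // available depends on the ORT version (an assumption in this sketch).
  std::unordered_map<std::string, std::string> coreml_opts{
      {"ModelFormat", "MLProgram"},
      {"MLComputeUnits", "ALL"},
  };
  so.AppendExecutionProvider("CoreML", coreml_opts);
  Ort::Session session{env, "dw-ll_ucoco_384.onnx", so};  // model named above
  return 0;
}
```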