From cf78dce3580b5153032c163e70288350db996428 Mon Sep 17 00:00:00 2001
From: Max Buckley <maxwbuckley@gmail.com>
Date: Fri, 24 Apr 2026 22:55:40 +0200
Subject: [PATCH 1/7] [CoreML EP] Add FusedConv support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds support for `com.microsoft:FusedConv` to the CoreML EP's MLProgram
and NeuralNetwork paths. FusedConv is produced by ORT's
`ConvActivationFusion` pass when a model is optimized with the CPU EP
(or any EP in `cpu_acl_js_webgpu_eps`) and saved via
`session.optimized_model_filepath` or the ORT-format conversion tool.
That saved graph contains `com.microsoft:FusedConv` nodes that — before
this patch — the CoreML EP could not claim, fragmenting the partition.

ORT's in-process pipeline does not currently run `ConvActivationFusion`
when CoreML EP is the target (the fusion's compat set excludes CoreML),
so FusedConv typically reaches the CoreML EP only via pre-optimized
graphs. That's a real and common workflow: anyone shipping a
pre-optimized model artifact (mobile pipelines, ORT-format models,
session-cached optimized graphs) that's then loaded with the CoreML EP
hits this path.

## Empirical impact (M3 Max, MLProgram, batch 1)

ResNet50-v2 from the ONNX model zoo, CPU-optimized at ORT_ENABLE_EXTENDED
and reloaded on CoreML EP (108 nodes total, 33 of them FusedConv with
Relu activation):

| Variant                      | Partitions | Nodes on CoreML | Mean (ms) | StdDev (ms) | P99 (ms) | Max (ms) |
|------------------------------|------------|-----------------|-----------|-------------|----------|----------|
| Without this patch           | 18         | 75 / 108        | 23.34     | 1.01        | 27.68    | 30.59    |
| With this patch              |  1         | 108 / 108       |  2.94     | 0.16        |  3.75    |  4.32    |

7.94× mean speedup (23.34 ms / 2.94 ms); the 33 FusedConv nodes that
previously fell back to CPU now stay on the ANE/GPU. Latency spread also
tightens about 6.3× (stddev 1.01 → 0.16 ms). 597 timed iterations per
variant, 3 interleaved rounds.
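
The quoted ratios can be re-derived from the table. The sketch below is
only a convenience for checking the arithmetic; `Ratio` and `RoundTo2` are
hypothetical helper names, not part of ORT.

```cpp
#include <cassert>
#include <cmath>

// Ratio of a baseline measurement to an improved one (same unit for both).
double Ratio(double baseline, double improved) {
  assert(improved > 0.0);
  return baseline / improved;
}

// Rounds to two decimal places, matching the precision used in the table.
double RoundTo2(double value) { return std::round(value * 100.0) / 100.0; }
```

`RoundTo2(Ratio(23.34, 2.94))` gives the mean speedup and
`RoundTo2(Ratio(1.01, 0.16))` gives the stddev ratio.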

Partition counts on other Conv-heavy ONNX-zoo models with FusedConv
content (post CPU optimization):

| Model              | Without | With | Notes                                  |
|--------------------|---------|------|----------------------------------------|
| ResNet50-v2        |    18   |   1  | 33 FusedConv (Relu)                    |
| FCN-ResNet50       |    18   |   1  | 35 FusedConv (Relu); fails to compile  |
|                    |         |      |   on CoreML for unrelated reasons      |
| YOLOv3 (full)      |    27   |   4  | 72 FusedConv (LeakyRelu); detection    |
|                    |         |      |   post-proc fails on CoreML for        |
|                    |         |      |   unrelated dynamic-shape reasons      |
| YOLOv3-tiny        |    13   |   7  | 11 FusedConv (LeakyRelu); same         |

The partition-count reduction holds across architectures. Of the four
models above, only ResNet50-v2 runs end-to-end on the CoreML EP, which is
why latency numbers are reported for it alone.

## Implementation

Reuses `ConvOpBuilder`, which now branches on `op_type`:

  - `Conv`: behavior unchanged.
  - `FusedConv`: emit the `conv` MIL op into an intermediate, then chain
    the activation MIL op on top. Supports all six activation types
    `ConvActivationFusion` produces:
      Relu        -> relu
      Sigmoid     -> sigmoid
      Tanh        -> tanh
      LeakyRelu   -> leaky_relu      (alpha from activation_params)
      Clip        -> clip            (min/max from activation_params)
      HardSigmoid -> sigmoid_hard    (alpha/beta from activation_params)

`IsOpSupportedImpl` rejects FusedConv in NeuralNetwork mode (which would
emit an unfused Conv and lose the activation) and rejects any
unrecognized activation string.
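
For reference, the per-element result that "conv, then activation" should
produce can be sketched in scalar form. This is a minimal sketch, not the
EP code: `ApplyFusedActivation` is a hypothetical name, the formulas follow
the ONNX operator definitions, and it assumes `activation_params` carries
exactly the values listed above for each activation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string_view>
#include <vector>

// Applies the fused activation to a single conv output element `x`.
// `params` uses the FusedConv ordering: LeakyRelu=[alpha], Clip=[min, max],
// HardSigmoid=[alpha, beta]. Hypothetical helper for illustration only.
float ApplyFusedActivation(std::string_view activation, float x,
                           const std::vector<float>& params) {
  if (activation == "Relu") return std::max(x, 0.0f);
  if (activation == "Sigmoid") return 1.0f / (1.0f + std::exp(-x));
  if (activation == "Tanh") return std::tanh(x);
  if (activation == "LeakyRelu") {
    assert(params.size() == 1);
    return x >= 0.0f ? x : params[0] * x;  // alpha scales negative inputs
  }
  if (activation == "Clip") {
    assert(params.size() == 2);
    return std::min(std::max(x, params[0]), params[1]);  // clamp to [min, max]
  }
  if (activation == "HardSigmoid") {
    assert(params.size() == 2);
    // max(0, min(1, alpha * x + beta))
    return std::max(0.0f, std::min(1.0f, params[0] * x + params[1]));
  }
  assert(false && "unsupported activation");
  return x;
}
```

For example, `ApplyFusedActivation("Clip", 0.9f, {-0.5f, 0.5f})` clamps to
the upper bound, which is the behaviour the MIL `clip` op is expected to
reproduce with `alpha`/`beta` as the bounds.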

## Tests

Six new tests in `coreml_basic_test.cc`, one per supported activation,
spanning four parameter classes (param-less, single-param,
two-param-positional, two-param-named):

  FusedConvTestRelu          — no `activation_params` attribute
  FusedConvTestSigmoid       — same shape, exercises sigmoid op-name dispatch
  FusedConvTestTanh          — same shape, exercises tanh op-name dispatch
  FusedConvTestLeakyRelu     — single param (alpha)
  FusedConvTestClip          — two params (min, max)
  FusedConvTestHardSigmoid   — two params (alpha, beta)

Each verifies CoreML output against the CPU EP reference and asserts
`ExpectedEPNodeAssignment::All`. All pass locally on macOS 26.3 / M3 Max.

Also adds the supported-ops doc entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../coreml/builders/impl/conv_op_builder.cc   | 109 ++++++++++++++-
 .../coreml/builders/op_builder_factory.cc     |   5 +-
 .../providers/coreml/coreml_basic_test.cc     | 132 ++++++++++++++++++
 .../apple/coreml_supported_mlprogram_ops.md   |   1 +
 4 files changed, 244 insertions(+), 3 deletions(-)

diff --git a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
index d534aab1e86b6..877631b4692f5 100644
--- a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
+++ b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
@@ -15,6 +15,19 @@ using namespace CoreML::Specification;
 namespace onnxruntime {
 namespace coreml {
 
+namespace {
+
+// Set of activations that ORT's ConvActivationFusion may fold into a FusedConv
+// and that the CoreML EP has MLProgram equivalents for.
+// See onnxruntime/core/optimizer/conv_activation_fusion.cc:82-99 for the
+// producer side.
+bool IsSupportedFusedConvActivation(const std::string& name) {
+  return name == "Relu" || name == "Sigmoid" || name == "Tanh" ||
+         name == "LeakyRelu" || name == "Clip" || name == "HardSigmoid";
+}
+
+}  // namespace
+
 class ConvOpBuilder : public BaseOpBuilder {
   void AddInitializersToSkip(ModelBuilder& model_builder, const Node& node) const override;
 
@@ -92,9 +105,83 @@ Status ConvOpBuilder::AddToModelBuilderImpl(ModelBuilder& model_builder, const N
 
     AddPadTypeAndPads(*conv_op, model_builder, op_type, helper, num_spatial_dims);
 
-    AddOperationOutput(*conv_op, *node.OutputDefs()[0]);
+    const bool is_fused_conv = node.OpType() == "FusedConv";
+    if (!is_fused_conv) {
+      AddOperationOutput(*conv_op, *node.OutputDefs()[0]);
+      model_builder.AddOperation(std::move(conv_op));
+    } else {
+      // com.microsoft:FusedConv = Conv + activation. Emit conv into an
+      // intermediate, then the activation MIL op on top. Mirrors how
+      // ConvActivationFusion was going to compose them on other EPs.
+      const auto output_elem_type = static_cast<int32_t>(
+          node.OutputDefs()[0]->TypeAsProto()->tensor_type().elem_type());
+      std::vector<int64_t> output_shape;
+      ORT_RETURN_IF_NOT(GetShape(*node.OutputDefs()[0], output_shape, logger),
+                        "Failed to get FusedConv output shape");
+
+      const std::string& conv_out_name = model_builder.GetUniqueName(node, "fused_conv_conv_out");
+      AddIntermediateOperationOutput(*conv_op, conv_out_name, output_elem_type, output_shape);
+      model_builder.AddOperation(std::move(conv_op));
+
+      const std::string activation = helper.Get("activation", std::string(""));
+      const auto activation_params = helper.Get("activation_params", std::vector<float>{});
+
+      std::string_view mil_op;
+      if (activation == "Relu") {
+        mil_op = "relu";
+      } else if (activation == "Sigmoid") {
+        mil_op = "sigmoid";
+      } else if (activation == "Tanh") {
+        mil_op = "tanh";
+      } else if (activation == "LeakyRelu") {
+        mil_op = "leaky_relu";
+      } else if (activation == "Clip") {
+        mil_op = "clip";
+      } else if (activation == "HardSigmoid") {
+        mil_op = "sigmoid_hard";
+      } else {
+        return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
+                               "FusedConv has unsupported activation: ", activation);
+      }
+
+      auto act_op = model_builder.CreateOperation(node, mil_op, "activation");
+      AddOperationInput(*act_op, "x", conv_out_name);
 
-    model_builder.AddOperation(std::move(conv_op));
+      auto add_scalar = [&](std::string_view port_name, float value) {
+        if (output_elem_type == ONNX_NAMESPACE::TensorProto_DataType_FLOAT) {
+          AddOperationInput(*act_op, std::string(port_name),
+                            model_builder.AddScalarConstant(act_op->type(), std::string(port_name), value));
+        } else {
+          AddOperationInput(*act_op, std::string(port_name),
+                            model_builder.AddScalarConstant(act_op->type(), std::string(port_name), MLFloat16(value)));
+        }
+      };
+
+      // Activation-specific params. ConvActivationFusion packs them into
+      // `activation_params` in this order (see conv_activation_fusion.cc:165-184):
+      //   LeakyRelu: [alpha]
+      //   Clip:      [min, max]
+      //   HardSigmoid: [alpha, beta]
+      if (activation == "LeakyRelu") {
+        const float alpha = activation_params.empty() ? 0.01f : activation_params[0];
+        add_scalar("alpha", alpha);
+      } else if (activation == "Clip") {
+        const float min_v = activation_params.size() > 0 ? activation_params[0]
+                                                         : std::numeric_limits<float>::lowest();
+        const float max_v = activation_params.size() > 1 ? activation_params[1]
+                                                         : std::numeric_limits<float>::max();
+        add_scalar("alpha", min_v);
+        add_scalar("beta", max_v);
+      } else if (activation == "HardSigmoid") {
+        const float alpha = activation_params.size() > 0 ? activation_params[0] : 0.2f;
+        const float beta = activation_params.size() > 1 ? activation_params[1] : 0.5f;
+        add_scalar("alpha", alpha);
+        add_scalar("beta", beta);
+      }
+
+      AddOperationOutput(*act_op, *node.OutputDefs()[0]);
+      model_builder.AddOperation(std::move(act_op));
+    }
   } else {
     std::unique_ptr<COREML_SPEC::NeuralNetworkLayer> layer = model_builder.CreateNNLayer(node);
 
@@ -232,6 +319,24 @@ bool ConvOpBuilder::IsOpSupportedImpl(const Node& node, const OpBuilderInputPara
                                       const logging::Logger& logger) const {
   const auto& name = node.Name();
   const auto& input_defs = node.InputDefs();
+  const bool is_fused_conv = node.OpType() == "FusedConv";
+
+  // FusedConv composes Conv with an activation op in a single node. Only
+  // implemented for the MLProgram path; fall back to CPU in NeuralNetwork mode
+  // rather than emitting an unfused Conv and losing the activation.
+  if (is_fused_conv) {
+    if (!input_params.create_mlprogram) {
+      LOGS(logger, VERBOSE) << "FusedConv is only supported in MLProgram format";
+      return false;
+    }
+    NodeAttrHelper fused_helper(node);
+    const std::string activation = fused_helper.Get("activation", std::string(""));
+    if (!IsSupportedFusedConvActivation(activation)) {
+      LOGS(logger, VERBOSE) << "FusedConv activation [" << activation
+                            << "] is not supported by the CoreML EP";
+      return false;
+    }
+  }
 
   const auto& weight_name = input_defs[1]->Name();
   const auto* weight = input_params.graph_viewer.GetConstantInitializer(weight_name);
diff --git a/onnxruntime/core/providers/coreml/builders/op_builder_factory.cc b/onnxruntime/core/providers/coreml/builders/op_builder_factory.cc
index d4f14273eeef5..2d7cee49a2cee 100644
--- a/onnxruntime/core/providers/coreml/builders/op_builder_factory.cc
+++ b/onnxruntime/core/providers/coreml/builders/op_builder_factory.cc
@@ -26,8 +26,11 @@ static OpBuilderRegistrations CreateOpBuilderRegistrations() {
   CreateActivationOpBuilder("Elu", op_registrations);
   CreateActivationOpBuilder("HardSigmoid", op_registrations);
 
-  // Microsoft-domain ops produced by ORT's own optimizer passes
+  // Microsoft-domain ops produced by ORT's own optimizer passes.
   CreateQuickGeluOpBuilder("QuickGelu", op_registrations);
+  // FusedConv (from ConvActivationFusion) reuses the existing ConvOpBuilder
+  // which branches on op_type internally.
+  CreateConvOpBuilder("FusedConv", op_registrations);
 
   // Unary ops
   CreateUnaryOpBuilder("Erf", op_registrations);
diff --git a/onnxruntime/test/providers/coreml/coreml_basic_test.cc b/onnxruntime/test/providers/coreml/coreml_basic_test.cc
index f56c81d2e89de..fbd73af7d6514 100644
--- a/onnxruntime/test/providers/coreml/coreml_basic_test.cc
+++ b/onnxruntime/test/providers/coreml/coreml_basic_test.cc
@@ -1164,6 +1164,138 @@ TEST(CoreMLExecutionProviderTest, QuickGeluTestFp16) {
 #endif
 }
 
+namespace {
+// Build a single-node com.microsoft:FusedConv model for the tests below.
+// Input X is {1, 2, 4, 4}, weight W is {3, 2, 2, 2} (constant initializer, set
+// to a simple pattern), no bias. stride=1, pad=0. Output is {1, 3, 3, 3}.
+ONNX_NAMESPACE::ModelProto MakeFusedConvModel(const std::string& activation,
+                                              const std::vector<float>& activation_params) {
+  ONNX_NAMESPACE::ModelProto model_proto;
+  model_proto.set_ir_version(ONNX_NAMESPACE::IR_VERSION);
+  auto* onnx_opset = model_proto.add_opset_import();
+  onnx_opset->set_domain("");
+  onnx_opset->set_version(13);
+  auto* ms_opset = model_proto.add_opset_import();
+  ms_opset->set_domain("com.microsoft");
+  ms_opset->set_version(1);
+
+  auto* graph_proto = model_proto.mutable_graph();
+  graph_proto->set_name("fused_conv_test");
+
+  auto add_tensor_value = [&](auto* proto, const char* name, const std::vector<int64_t>& shape) {
+    proto->set_name(name);
+    auto* tt = proto->mutable_type()->mutable_tensor_type();
+    tt->set_elem_type(ONNX_NAMESPACE::TensorProto_DataType_FLOAT);
+    for (int64_t d : shape) tt->mutable_shape()->add_dim()->set_dim_value(d);
+  };
+  add_tensor_value(graph_proto->add_input(), "X", {1, 2, 4, 4});
+  add_tensor_value(graph_proto->add_output(), "Y", {1, 3, 3, 3});
+
+  // Weight initializer: {3, 2, 2, 2} = 24 floats, deterministic pattern.
+  auto* w_init = graph_proto->add_initializer();
+  w_init->set_name("W");
+  w_init->set_data_type(ONNX_NAMESPACE::TensorProto_DataType_FLOAT);
+  for (int64_t d : {3, 2, 2, 2}) w_init->add_dims(d);
+  for (int i = 0; i < 3 * 2 * 2 * 2; ++i) {
+    w_init->add_float_data(static_cast<float>(i) * 0.05f - 0.4f);
+  }
+
+  auto* node = graph_proto->add_node();
+  node->set_op_type("FusedConv");
+  node->set_domain("com.microsoft");
+  node->add_input("X");
+  node->add_input("W");
+  node->add_output("Y");
+
+  // Set pads explicitly since the CoreML conv builder's VALID-pad branch
+  // omits the 'pad' input that the MIL op requires. Conv attrs otherwise
+  // default: strides=[1,1].
+  auto* pads_attr = node->add_attribute();
+  pads_attr->set_name("pads");
+  pads_attr->set_type(ONNX_NAMESPACE::AttributeProto_AttributeType_INTS);
+  for (int64_t v : {0, 0, 0, 0}) pads_attr->add_ints(v);
+
+  auto* act_attr = node->add_attribute();
+  act_attr->set_name("activation");
+  act_attr->set_type(ONNX_NAMESPACE::AttributeProto_AttributeType_STRING);
+  act_attr->set_s(activation);
+
+  if (!activation_params.empty()) {
+    auto* act_params_attr = node->add_attribute();
+    act_params_attr->set_name("activation_params");
+    act_params_attr->set_type(ONNX_NAMESPACE::AttributeProto_AttributeType_FLOATS);
+    for (float v : activation_params) act_params_attr->add_floats(v);
+  }
+
+  return model_proto;
+}
+
+void RunFusedConvTest(const std::string& activation,
+                      const std::vector<float>& activation_params,
+                      std::string_view log_id) {
+  auto model_proto = MakeFusedConvModel(activation, activation_params);
+  std::string model_data;
+  ASSERT_TRUE(model_proto.SerializeToString(&model_data));
+  gsl::span<const std::byte> model_span{reinterpret_cast<const std::byte*>(model_data.data()), model_data.size()};
+
+#if defined(__APPLE__)
+  std::vector<float> x_data(1 * 2 * 4 * 4);
+  for (size_t i = 0; i < x_data.size(); ++i) x_data[i] = static_cast<float>(i) * 0.1f - 1.5f;
+  OrtValue ml_value_x;
+  AllocatorPtr allocator = CPUAllocator::DefaultInstance();
+  CreateMLValue<float>(allocator, {1, 2, 4, 4}, x_data, &ml_value_x);
+
+  NameMLValMap feeds;
+  feeds.insert(std::make_pair("X", ml_value_x));
+
+  RunAndVerifyOutputsWithEP(model_span, std::string(log_id),
+                            MakeCoreMLExecutionProvider("MLProgram"),
+                            feeds,
+                            EPVerificationParams{ExpectedEPNodeAssignment::All});
+#else
+  TestModelLoad(model_span, MakeCoreMLExecutionProvider("MLProgram"), ExpectedEPNodeAssignment::All);
+#endif
+}
+}  // namespace
+
+TEST(CoreMLExecutionProviderTest, FusedConvTestRelu) {
+  // Param-less activation. Exercises the Conv → activation wiring with no
+  // `activation_params` attribute.
+  RunFusedConvTest("Relu", {}, "FusedConvTestRelu_MLProgram");
+}
+
+TEST(CoreMLExecutionProviderTest, FusedConvTestHardSigmoid) {
+  // Two-param activation (alpha, beta) with non-default values — catches any
+  // activation_params-wiring bug. Depends on the HardSigmoid CoreML builder
+  // landed in #28182.
+  RunFusedConvTest("HardSigmoid", {0.15f, 0.55f}, "FusedConvTestHardSigmoid_MLProgram");
+}
+
+TEST(CoreMLExecutionProviderTest, FusedConvTestClip) {
+  // Two-param activation where params map to alpha=min, beta=max in CoreML's
+  // clip op. Covers the remaining parametric activation.
+  RunFusedConvTest("Clip", {-0.5f, 0.5f}, "FusedConvTestClip_MLProgram");
+}
+
+TEST(CoreMLExecutionProviderTest, FusedConvTestLeakyRelu) {
+  // Single-param activation (alpha). Heavily used by YOLOv3 — a CPU-optimized
+  // YOLOv3 graph contains 72 Conv→LeakyRelu fusions, all of which would
+  // otherwise fall back to CPU and fragment the CoreML partition.
+  RunFusedConvTest("LeakyRelu", {0.1f}, "FusedConvTestLeakyRelu_MLProgram");
+}
+
+TEST(CoreMLExecutionProviderTest, FusedConvTestSigmoid) {
+  // Param-less Sigmoid activation. Distinct from the Relu test only in the
+  // emitted MIL op (`sigmoid` vs `relu`); guards against regressions in
+  // op-name dispatch.
+  RunFusedConvTest("Sigmoid", {}, "FusedConvTestSigmoid_MLProgram");
+}
+
+TEST(CoreMLExecutionProviderTest, FusedConvTestTanh) {
+  // Param-less Tanh activation; same rationale as the Sigmoid test for the
+  // remaining elementwise activation.
+  RunFusedConvTest("Tanh", {}, "FusedConvTestTanh_MLProgram");
+}
 #endif  // !(ORT_MINIMAL_BUILD)
 }  // namespace test
 }  // namespace onnxruntime
diff --git a/tools/ci_build/github/apple/coreml_supported_mlprogram_ops.md b/tools/ci_build/github/apple/coreml_supported_mlprogram_ops.md
index 5bcdcc2e1ecee..395813844906a 100644
--- a/tools/ci_build/github/apple/coreml_supported_mlprogram_ops.md
+++ b/tools/ci_build/github/apple/coreml_supported_mlprogram_ops.md
@@ -53,3 +53,4 @@ Keep in sync with doco generated from /docs/execution-providers/CoreML-Execution
 |ai.onnx:Transpose||
 |ai.onnx:Unsqueeze||
 |com.microsoft:QuickGelu|Produced by ORT's `QuickGeluFusion` optimizer pass. Decomposed into `mul` / `sigmoid` / `mul`.|
+|com.microsoft:FusedConv|Produced by ORT's `ConvActivationFusion` pass. Decomposed into `conv` + the fused activation (`Relu`, `Sigmoid`, `Tanh`, `LeakyRelu`, `Clip`, `HardSigmoid`).|

From ce199242e6080fc5140416f489007ecb0820bc31 Mon Sep 17 00:00:00 2001
From: Max Buckley <maxwbuckley@gmail.com>
Date: Wed, 6 May 2026 09:58:18 +0200
Subject: [PATCH 2/7] Drop redundant comment above
 IsSupportedFusedConvActivation

The activation list and the function name already convey what's
allowed; the cross-reference to a specific line range in
conv_activation_fusion.cc would rot the moment that file gets touched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../core/providers/coreml/builders/impl/conv_op_builder.cc    | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
index 877631b4692f5..f8c3c4ec2aa0c 100644
--- a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
+++ b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
@@ -17,10 +17,6 @@ namespace coreml {
 
 namespace {
 
-// Set of activations that ORT's ConvActivationFusion may fold into a FusedConv
-// and that the CoreML EP has MLProgram equivalents for.
-// See onnxruntime/core/optimizer/conv_activation_fusion.cc:82-99 for the
-// producer side.
 bool IsSupportedFusedConvActivation(const std::string& name) {
   return name == "Relu" || name == "Sigmoid" || name == "Tanh" ||
          name == "LeakyRelu" || name == "Clip" || name == "HardSigmoid";

From bb7f4b1ad5cd724e5f07304522889b4d6d9cac26 Mon Sep 17 00:00:00 2001
From: Max Buckley <maxwbuckley@gmail.com>
Date: Thu, 7 May 2026 09:58:46 +0200
Subject: [PATCH 3/7] [CoreML EP] Drive FusedConv activation handling from a
 single table

Replaces the duplicated activation lists in IsSupportedFusedConvActivation
and the if/else MIL-op chain in AddToModelBuilderImpl with a single
constexpr table mapping each ONNX activation name to its MIL op, expected
activation_params arity, and MIL input port names. Both the support gate
and the dispatch path now consult that table.

Also tightens IsOpSupportedImpl to reject FusedConv nodes whose
activation_params arity does not match what the activation expects (0 for
Relu/Sigmoid/Tanh, 1 for LeakyRelu, 2 for Clip/HardSigmoid). The CPU EP
already rejects mismatches in fused_activation.cc; CoreML now matches that
behaviour instead of silently inventing defaults.
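
The gating rule can be illustrated in isolation. The sketch below is a
trimmed-down stand-in for the table, not the code in this patch:
`ActivationArity`, `kArity`, and `IsAcceptedFusedConv` are hypothetical
names, and only the name and arity columns are modelled.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// One row per activation: ONNX name and the exact number of
// activation_params it must carry. Hypothetical, for illustration.
struct ActivationArity {
  std::string_view onnx_name;
  std::uint8_t param_count;
};

constexpr ActivationArity kArity[] = {
    {"Relu", 0},      {"Sigmoid", 0}, {"Tanh", 0},
    {"LeakyRelu", 1}, {"Clip", 2},    {"HardSigmoid", 2},
};

// True only when the activation is in the table and the parameter count
// matches exactly; no default values are substituted for missing params.
constexpr bool IsAcceptedFusedConv(std::string_view activation,
                                   std::size_t num_params) {
  for (const auto& entry : kArity) {
    if (entry.onnx_name == activation) return num_params == entry.param_count;
  }
  return false;
}

// Compile-time spot checks of the rule.
static_assert(IsAcceptedFusedConv("LeakyRelu", 1));
static_assert(!IsAcceptedFusedConv("LeakyRelu", 0));
```

Keeping the lookup `constexpr` lets the same table back both a runtime
check and compile-time assertions like the two above.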

Addresses review feedback from yuslepukhin and copilot-pull-request-reviewer
on #28289.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../coreml/builders/impl/conv_op_builder.cc   | 96 ++++++++++---------
 1 file changed, 53 insertions(+), 43 deletions(-)

diff --git a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
index f8c3c4ec2aa0c..1e5d0ca8af319 100644
--- a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
+++ b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
@@ -1,6 +1,9 @@
 // Copyright (c) Microsoft Corporation. All rights reserved.
 // Licensed under the MIT License.
 
+#include <array>
+#include <string_view>
+
 #include "core/providers/common.h"
 #include "core/providers/coreml/builders/helper.h"
 #include "core/providers/coreml/builders/impl/base_op_builder.h"
@@ -17,9 +20,34 @@ namespace coreml {
 
 namespace {
 
-bool IsSupportedFusedConvActivation(const std::string& name) {
-  return name == "Relu" || name == "Sigmoid" || name == "Tanh" ||
-         name == "LeakyRelu" || name == "Clip" || name == "HardSigmoid";
+// Single source of truth for FusedConv activation handling. Drives both the
+// support check in IsOpSupportedImpl and the MIL op dispatch in
+// AddToModelBuilderImpl. `param_ports` lists the MIL op input ports that map
+// positionally to `activation_params`. ConvActivationFusion packs the params
+// in the same order: LeakyRelu=[alpha], Clip=[min,max], HardSigmoid=[alpha,
+// beta] (see conv_activation_fusion.cc:165-184). For MIL's `clip`, alpha/beta
+// are the min/max bounds.
+struct FusedConvActivationSpec {
+  std::string_view onnx_name;
+  std::string_view mil_op;
+  uint8_t param_count;
+  std::array<std::string_view, 2> param_ports;
+};
+
+constexpr FusedConvActivationSpec kFusedConvActivations[] = {
+    {"Relu", "relu", 0, {}},
+    {"Sigmoid", "sigmoid", 0, {}},
+    {"Tanh", "tanh", 0, {}},
+    {"LeakyRelu", "leaky_relu", 1, {{"alpha"}}},
+    {"Clip", "clip", 2, {{"alpha", "beta"}}},
+    {"HardSigmoid", "sigmoid_hard", 2, {{"alpha", "beta"}}},
+};
+
+const FusedConvActivationSpec* FindFusedConvActivationSpec(std::string_view name) {
+  for (const auto& spec : kFusedConvActivations) {
+    if (spec.onnx_name == name) return &spec;
+  }
+  return nullptr;
 }
 
 }  // namespace
@@ -122,25 +150,17 @@ Status ConvOpBuilder::AddToModelBuilderImpl(ModelBuilder& model_builder, const N
       const std::string activation = helper.Get("activation", std::string(""));
       const auto activation_params = helper.Get("activation_params", std::vector<float>{});
 
-      std::string_view mil_op;
-      if (activation == "Relu") {
-        mil_op = "relu";
-      } else if (activation == "Sigmoid") {
-        mil_op = "sigmoid";
-      } else if (activation == "Tanh") {
-        mil_op = "tanh";
-      } else if (activation == "LeakyRelu") {
-        mil_op = "leaky_relu";
-      } else if (activation == "Clip") {
-        mil_op = "clip";
-      } else if (activation == "HardSigmoid") {
-        mil_op = "sigmoid_hard";
-      } else {
-        return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
-                               "FusedConv has unsupported activation: ", activation);
-      }
-
-      auto act_op = model_builder.CreateOperation(node, mil_op, "activation");
+      // IsOpSupportedImpl gates both of these, so the lookup and arity check
+      // serve as a defensive backstop rather than primary validation.
+      const auto* spec = FindFusedConvActivationSpec(activation);
+      ORT_RETURN_IF_NOT(spec != nullptr,
+                        "FusedConv has unsupported activation: ", activation);
+      ORT_RETURN_IF_NOT(activation_params.size() == spec->param_count,
+                        "FusedConv activation '", activation, "' expects ",
+                        static_cast<unsigned>(spec->param_count),
+                        " activation_params, got ", activation_params.size());
+
+      auto act_op = model_builder.CreateOperation(node, std::string(spec->mil_op), "activation");
       AddOperationInput(*act_op, "x", conv_out_name);
 
       auto add_scalar = [&](std::string_view port_name, float value) {
@@ -153,26 +173,8 @@ Status ConvOpBuilder::AddToModelBuilderImpl(ModelBuilder& model_builder, const N
         }
       };
 
-      // Activation-specific params. ConvActivationFusion packs them into
-      // `activation_params` in this order (see conv_activation_fusion.cc:165-184):
-      //   LeakyRelu: [alpha]
-      //   Clip:      [min, max]
-      //   HardSigmoid: [alpha, beta]
-      if (activation == "LeakyRelu") {
-        const float alpha = activation_params.empty() ? 0.01f : activation_params[0];
-        add_scalar("alpha", alpha);
-      } else if (activation == "Clip") {
-        const float min_v = activation_params.size() > 0 ? activation_params[0]
-                                                         : std::numeric_limits<float>::lowest();
-        const float max_v = activation_params.size() > 1 ? activation_params[1]
-                                                         : std::numeric_limits<float>::max();
-        add_scalar("alpha", min_v);
-        add_scalar("beta", max_v);
-      } else if (activation == "HardSigmoid") {
-        const float alpha = activation_params.size() > 0 ? activation_params[0] : 0.2f;
-        const float beta = activation_params.size() > 1 ? activation_params[1] : 0.5f;
-        add_scalar("alpha", alpha);
-        add_scalar("beta", beta);
+      for (uint8_t i = 0; i < spec->param_count; ++i) {
+        add_scalar(spec->param_ports[i], activation_params[i]);
       }
 
       AddOperationOutput(*act_op, *node.OutputDefs()[0]);
@@ -327,11 +329,19 @@ bool ConvOpBuilder::IsOpSupportedImpl(const Node& node, const OpBuilderInputPara
     }
     NodeAttrHelper fused_helper(node);
     const std::string activation = fused_helper.Get("activation", std::string(""));
-    if (!IsSupportedFusedConvActivation(activation)) {
+    const auto* spec = FindFusedConvActivationSpec(activation);
+    if (!spec) {
       LOGS(logger, VERBOSE) << "FusedConv activation [" << activation
                             << "] is not supported by the CoreML EP";
       return false;
     }
+    const auto activation_params = fused_helper.Get("activation_params", std::vector<float>{});
+    if (activation_params.size() != spec->param_count) {
+      LOGS(logger, VERBOSE) << "FusedConv activation [" << activation << "] expects "
+                            << static_cast<unsigned>(spec->param_count)
+                            << " activation_params, got " << activation_params.size();
+      return false;
+    }
   }
 
   const auto& weight_name = input_defs[1]->Name();

From d0f411b435336e2d921bc2bb3d6eb3705a881a6c Mon Sep 17 00:00:00 2001
From: Max Buckley <maxwbuckley@gmail.com>
Date: Thu, 7 May 2026 09:59:37 +0200
Subject: [PATCH 4/7] [CoreML EP] Gate FusedConv against Z residual input and
 non-float dtypes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds two early rejections in IsOpSupportedImpl for cases that the previous
implementation silently accepted:

1. The optional 4th input 'Z' (residual sum) — FusedConv with Z is
   Y = activation(Conv(X,W,B) + Z), but the MLProgram lowering only emits
   conv + activation and never reads input[3]. Without this guard a
   pre-optimized Conv+Add+Act graph would be fully assigned to CoreML and
   produce the wrong result by dropping the residual add. Reported by
   yuslepukhin on #28289.

2. Non-float element types — FusedConv schema's `T` permits double, but
   the activation-param lambda only handles FLOAT and FLOAT16. CoreML does
   not support double anyway; reject double explicitly so the fallback to
   CPU is what actually runs.
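
A scalar example shows why dropping Z changes the result. This is a
minimal sketch using Relu on a single element; all function names are
hypothetical, and `HasResidualInput` only mirrors the input-count rule
described in item 1.

```cpp
#include <algorithm>
#include <cstddef>

// Scalar Relu on one element.
float Relu(float v) { return std::max(v, 0.0f); }

// FusedConv semantics when Z is present: activation(conv + z).
float FusedWithResidual(float conv_out, float z) { return Relu(conv_out + z); }

// What a lowering that never reads input 3 would compute: activation(conv).
float FusedIgnoringResidual(float conv_out) { return Relu(conv_out); }

// Inputs are X, W, optional B, optional Z, so more than three inputs
// means the residual Z is wired in.
bool HasResidualInput(std::size_t num_inputs) { return num_inputs > 3; }
```

With `conv_out = -1` and `z = 2`, the two functions return different
values, so a node for which `HasResidualInput` is true has to be left to
the CPU EP.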

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../coreml/builders/impl/conv_op_builder.cc    | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
index 1e5d0ca8af319..f08ee5eecb4e7 100644
--- a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
+++ b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
@@ -327,6 +327,24 @@ bool ConvOpBuilder::IsOpSupportedImpl(const Node& node, const OpBuilderInputPara
       LOGS(logger, VERBOSE) << "FusedConv is only supported in MLProgram format";
       return false;
     }
+    // FusedConv schema (contrib_defs.cc) has 4 inputs: X, W, B (optional),
+    // Z (optional). Z is a residual sum input — Y = activation(Conv(X,W,B) + Z).
+    // The MLProgram lowering below does not read input 3, so accepting a node
+    // with Z would silently drop the residual and produce wrong results.
+    if (input_defs.size() > 3) {
+      LOGS(logger, VERBOSE) << "FusedConv with the optional 'Z' (residual sum) input "
+                               "is not supported by the CoreML EP";
+      return false;
+    }
+    // Only float/float16 are wired through add_scalar in AddToModelBuilderImpl.
+    // FusedConv schema also allows double, which CoreML does not support.
+    const auto x_elem_type = input_defs[0]->TypeAsProto()->tensor_type().elem_type();
+    if (x_elem_type != ONNX_NAMESPACE::TensorProto_DataType_FLOAT &&
+        x_elem_type != ONNX_NAMESPACE::TensorProto_DataType_FLOAT16) {
+      LOGS(logger, VERBOSE) << "FusedConv element type [" << x_elem_type
+                            << "] is not supported by the CoreML EP (expected FLOAT or FLOAT16)";
+      return false;
+    }
     NodeAttrHelper fused_helper(node);
     const std::string activation = fused_helper.Get("activation", std::string(""));
     const auto* spec = FindFusedConvActivationSpec(activation);

From 3fd535f9e2d46eb60603a5d1d2e409908a629cd2 Mon Sep 17 00:00:00 2001
From: Max Buckley <maxwbuckley@gmail.com>
Date: Thu, 7 May 2026 10:07:35 +0200
Subject: [PATCH 5/7] [CoreML EP] Add FusedConv negative tests for NN-format
 and Z-input gating
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds two ExpectedEPNodeAssignment::None tests covering the support-gating
paths added in the previous commit:

- FusedConvNeuralNetworkNotSupported: FusedConv in NeuralNetwork format
  is rejected so the node falls back to CPU rather than emitting an unfused
  Conv that silently drops the activation.

- FusedConvWithZInputNotSupported: FusedConv with the optional residual
  Z input is rejected to prevent silently dropping the Conv+Add+Act
  semantics, as yuslepukhin flagged on #28289.

The unsupported-activation and wrong-arity rejections are also live but
not testable end-to-end: the CPU FusedConv kernel rejects those same
malformed graphs at kernel construction, so TestModelLoad's Initialize
fails before partition assignment can be observed.

MakeFusedConvModel grows an `add_z` knob to wire the optional 4th input.
A small RunFusedConvNegativeTest helper packages the
serialize-then-TestModelLoad pattern.
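
For orientation, the gating that these tests exercise can be summarised
by the simplified sketch below. It is not the builder code: the enum,
struct and function names are hypothetical stand-ins, and the activation
lookup and the regular Conv checks are left out.

```cpp
#include <cstddef>

// Hypothetical stand-ins for illustration only; none of these names are ORT API.
enum class ElemType { kFloat, kFloat16, kDouble };

struct FusedConvNodeInfo {
  bool mlprogram;          // true when the EP is creating an MLProgram model
  std::size_t num_inputs;  // X, W, optional B, optional Z
  ElemType x_type;         // element type of input X
};

// Follows the order of the FusedConv checks in IsOpSupportedImpl.
bool WouldClaimFusedConv(const FusedConvNodeInfo& n) {
  if (!n.mlprogram) return false;      // NeuralNetwork format: reject
  if (n.num_inputs > 3) return false;  // a 4th input slot means Z: reject
  // The schema also allows double, which is rejected here.
  return n.x_type == ElemType::kFloat || n.x_type == ElemType::kFloat16;
}
```

A node for which this returns false is left to the CPU EP, which is what
ExpectedEPNodeAssignment::None asserts in the two tests above.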

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../providers/coreml/coreml_basic_test.cc     | 45 ++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/onnxruntime/test/providers/coreml/coreml_basic_test.cc b/onnxruntime/test/providers/coreml/coreml_basic_test.cc
index 5fad6c9f7734d..b6e1545d6f319 100644
--- a/onnxruntime/test/providers/coreml/coreml_basic_test.cc
+++ b/onnxruntime/test/providers/coreml/coreml_basic_test.cc
@@ -1168,8 +1168,11 @@ namespace {
 // Build a single-node com.microsoft:FusedConv model for the tests below.
 // Input X is {1, 2, 4, 4}, weight W is {3, 2, 2, 2} (constant initializer, set
 // to a simple pattern), no bias. stride=1, pad=0. Output is {1, 3, 3, 3}.
+// When `add_z` is true, the optional 4th 'Z' (residual sum) input is added —
+// used by the negative test that exercises CoreML's rejection path.
 ONNX_NAMESPACE::ModelProto MakeFusedConvModel(const std::string& activation,
-                                              const std::vector<float>& activation_params) {
+                                              const std::vector<float>& activation_params,
+                                              bool add_z = false) {
   ONNX_NAMESPACE::ModelProto model_proto;
   model_proto.set_ir_version(ONNX_NAMESPACE::IR_VERSION);
   auto* onnx_opset = model_proto.add_opset_import();
@@ -1189,6 +1192,9 @@ ONNX_NAMESPACE::ModelProto MakeFusedConvModel(const std::string& activation,
     for (int64_t d : shape) tt->mutable_shape()->add_dim()->set_dim_value(d);
   };
   add_tensor_value(graph_proto->add_input(), "X", {1, 2, 4, 4});
+  if (add_z) {
+    add_tensor_value(graph_proto->add_input(), "Z", {1, 3, 3, 3});
+  }
   add_tensor_value(graph_proto->add_output(), "Y", {1, 3, 3, 3});
 
   // Weight initializer: {3, 2, 2, 2} = 24 floats, deterministic pattern.
@@ -1205,6 +1211,12 @@ ONNX_NAMESPACE::ModelProto MakeFusedConvModel(const std::string& activation,
   node->set_domain("com.microsoft");
   node->add_input("X");
   node->add_input("W");
+  if (add_z) {
+    // FusedConv schema: X, W, B(optional), Z(optional). Skip B with "" so Z
+    // lands in input slot 3.
+    node->add_input("");
+    node->add_input("Z");
+  }
   node->add_output("Y");
 
   // Set pads explicitly since the CoreML conv builder's VALID-pad branch
@@ -1230,6 +1242,15 @@ ONNX_NAMESPACE::ModelProto MakeFusedConvModel(const std::string& activation,
   return model_proto;
 }
 
+void RunFusedConvNegativeTest(const ONNX_NAMESPACE::ModelProto& model_proto, bool mlprogram) {
+  std::string model_data;
+  ASSERT_TRUE(model_proto.SerializeToString(&model_data));
+  gsl::span<const std::byte> model_span{reinterpret_cast<const std::byte*>(model_data.data()), model_data.size()};
+  auto provider = mlprogram ? MakeCoreMLExecutionProvider("MLProgram")
+                            : MakeCoreMLExecutionProvider();
+  TestModelLoad(model_span, std::move(provider), ExpectedEPNodeAssignment::None);
+}
+
 void RunFusedConvTest(const std::string& activation,
                       const std::vector<float>& activation_params,
                       std::string_view log_id) {
@@ -1297,6 +1318,28 @@ TEST(CoreMLExecutionProviderTest, FusedConvTestTanh) {
   RunFusedConvTest("Tanh", {}, "FusedConvTestTanh_MLProgram");
 }
 
+// Negative tests below cover the two gating cases that have a working CPU
+// fallback (so TestModelLoad's Initialize() succeeds and the EP partition
+// assignment can be verified). The arity-mismatch and unsupported-activation
+// cases are also rejected by IsOpSupportedImpl, but the CPU FusedConv kernel
+// rejects them too, so there's no end-to-end fallback to observe.
+
+TEST(CoreMLExecutionProviderTest, FusedConvNeuralNetworkNotSupported) {
+  // FusedConv is only implemented on the MLProgram path. The NeuralNetwork
+  // builder must reject it so the node falls back to CPU rather than emit an
+  // unfused Conv and silently lose the activation.
+  RunFusedConvNegativeTest(MakeFusedConvModel("Relu", {}), /*mlprogram=*/false);
+}
+
+TEST(CoreMLExecutionProviderTest, FusedConvWithZInputNotSupported) {
+  // The optional Z residual sum input (Y = activation(Conv(X,W,B) + Z)) is
+  // not lowered by the MLProgram builder. Accepting such a node would
+  // silently drop the residual add and produce wrong results, so it must be
+  // rejected and fall back to CPU.
+  RunFusedConvNegativeTest(MakeFusedConvModel("Relu", {}, /*add_z=*/true),
+                           /*mlprogram=*/true);
+}
+
 TEST(CoreMLExecutionProviderTest, Split11UnevenAttribute) {
   // ai.onnx:Split-11 with `split` attribute carrying non-uniform sizes.
   // This is the form used by DWPose (`dw-ll_ucoco_384.onnx`); without

From cb96038238523ccd0508119194cd32bab3fb8e51 Mon Sep 17 00:00:00 2001
From: Max Buckley <maxwbuckley@gmail.com>
Date: Thu, 7 May 2026 10:08:04 +0200
Subject: [PATCH 6/7] [CoreML EP] Reword FusedConv factory comment

The previous comment said FusedConv "reuses the existing ConvOpBuilder",
which Copilot flagged as misleading because CreateConvOpBuilder registers
a new instance under the FusedConv op type rather than literally reusing
the Conv-registered instance. Reword to "handled by the same ConvOpBuilder
class" so it's clear the reuse is at the class/dispatch level, not the
instance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../core/providers/coreml/builders/op_builder_factory.cc      | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/onnxruntime/core/providers/coreml/builders/op_builder_factory.cc b/onnxruntime/core/providers/coreml/builders/op_builder_factory.cc
index 2d7cee49a2cee..6f465774a3c3c 100644
--- a/onnxruntime/core/providers/coreml/builders/op_builder_factory.cc
+++ b/onnxruntime/core/providers/coreml/builders/op_builder_factory.cc
@@ -28,8 +28,8 @@ static OpBuilderRegistrations CreateOpBuilderRegistrations() {
 
   // Microsoft-domain ops produced by ORT's own optimizer passes.
   CreateQuickGeluOpBuilder("QuickGelu", op_registrations);
-  // FusedConv (from ConvActivationFusion) reuses the existing ConvOpBuilder
-  // which branches on op_type internally.
+  // FusedConv (from ConvActivationFusion) is handled by the same ConvOpBuilder
+  // class, which branches on op_type internally.
   CreateConvOpBuilder("FusedConv", op_registrations);
 
   // Unary ops

From 0f851406f738dd48bbbafb6df86bf075cc3242e6 Mon Sep 17 00:00:00 2001
From: Max Buckley <maxwbuckley@gmail.com>
Date: Thu, 7 May 2026 10:24:47 +0200
Subject: [PATCH 7/7] [CoreML EP] Document Z residual input as a TODO

Adds a TODO above the FusedConv Z-input rejection pointing at the
straightforward MIL lowering (`add(conv_out, Z)` between conv and
activation) and noting which optimizer pass produces the Z form
(ConvAddActivationFusion at TransformerLevel::Level3, gated to cpu_ep).
This way the next person looking at residual-block coverage on CoreML
finds the implementation hint without re-discovering the schema and
optimizer pass independently.
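
As a reminder of why the `add` has to sit between the conv and the
activation, here is a small reference sketch on plain vectors. It assumes
Relu as the activation; the helper names are hypothetical and it uses
neither ORT nor MIL APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper: the FusedConv epilogue with Z,
// Y = Relu(conv_out + Z), applied elementwise.
std::vector<float> ReluOfConvPlusZ(const std::vector<float>& conv_out,
                                   const std::vector<float>& z) {
  assert(conv_out.size() == z.size());
  std::vector<float> y(conv_out.size());
  for (std::size_t i = 0; i < y.size(); ++i) {
    y[i] = std::max(0.0f, conv_out[i] + z[i]);  // add first, then activate
  }
  return y;
}

// Wrong ordering, shown only for contrast: Relu(conv_out) + Z.
std::vector<float> ReluOfConvThenAddZ(const std::vector<float>& conv_out,
                                      const std::vector<float>& z) {
  assert(conv_out.size() == z.size());
  std::vector<float> y(conv_out.size());
  for (std::size_t i = 0; i < y.size(); ++i) {
    y[i] = std::max(0.0f, conv_out[i]) + z[i];  // activate first, then add
  }
  return y;
}
```

With conv_out = {-2, 1} and Z = {1, 1}, the first helper returns {0, 2}
while the second returns {1, 2}, so a lowering that adds Z after the
activation (or drops it) changes the result.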

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../core/providers/coreml/builders/impl/conv_op_builder.cc | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
index f08ee5eecb4e7..3c6794b300557 100644
--- a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
+++ b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
@@ -331,6 +331,13 @@ bool ConvOpBuilder::IsOpSupportedImpl(const Node& node, const OpBuilderInputPara
     // Z (optional). Z is a residual sum input — Y = activation(Conv(X,W,B) + Z).
     // The MLProgram lowering below does not read input 3, so accepting a node
     // with Z would silently drop the residual and produce wrong results.
+    //
+    // TODO: support Z by inserting an `add` MIL op between the conv output
+    // and the activation input — `act_in = add(conv_out, Z)` — preserving the
+    // `act(conv + Z)` ordering. This would unlock CoreML coverage for graphs
+    // optimized at TransformerLevel::Level3 (ORT_ENABLE_ALL) where
+    // ConvAddActivationFusion (core/optimizer/conv_add_act_fusion.cc) produces
+    // FusedConv(B, Z, act) for residual blocks (ResNet/EfficientNet etc).
     if (input_defs.size() > 3) {
       LOGS(logger, VERBOSE) << "FusedConv with the optional 'Z' (residual sum) input "
                                "is not supported by the CoreML EP";