From cf78dce3580b5153032c163e70288350db996428 Mon Sep 17 00:00:00 2001
From: Max Buckley
Date: Fri, 24 Apr 2026 22:55:40 +0200
Subject: [PATCH 1/7] [CoreML EP] Add FusedConv support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds support for `com.microsoft:FusedConv` to the CoreML EP's MLProgram path. In NeuralNetwork mode the node is rejected and falls back to CPU.

FusedConv is produced by ORT's `ConvActivationFusion` pass when a model is optimized with the CPU EP (or any EP in `cpu_acl_js_webgpu_eps`) and saved via `session.optimized_model_filepath` or the ORT-format conversion tool. The saved graph contains `com.microsoft:FusedConv` nodes that the CoreML EP could not claim before this patch, so each one fell back to CPU and fragmented the CoreML partition.

ORT's in-process pipeline does not currently run `ConvActivationFusion` when the CoreML EP is the target (the fusion's compatible-EP set excludes CoreML), so FusedConv typically reaches the CoreML EP only through pre-optimized graphs. That is a real and common workflow: anyone who ships a pre-optimized model artifact (mobile pipelines, ORT-format models, session-cached optimized graphs) and then loads it with the CoreML EP hits this path.

## Empirical impact (M3 Max, MLProgram, batch 1)

ResNet50-v2 from the ONNX model zoo, CPU-optimized at ORT_ENABLE_EXTENDED and reloaded on the CoreML EP (108 nodes total, 33 of them FusedConv with Relu activation):

|                    | Partitions | Nodes on CoreML | Mean (ms) | StdDev (ms) | P99 (ms) | Max (ms) |
|--------------------|------------|-----------------|-----------|-------------|----------|----------|
| Without this patch | 18         | 75 / 108        | 23.34     | 1.01        | 27.68    | 30.59    |
| With this patch    | 1          | 108 / 108       | 2.94      | 0.16        | 3.75     | 4.32     |

7.94× mean speedup; the 33 FusedConv nodes that previously fell back to CPU now stay inside the CoreML partition (ANE/GPU). Run-to-run spread also tightens about 6× (stddev 1.01 → 0.16 ms). 597 timed iterations per variant, 3 interleaved rounds.
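The headline ratios can be re-derived from the ResNet50-v2 table. The stdlib-only sketch below does the arithmetic; the helper and constant names are made up for illustration. The mean drops about 7.94× and the standard deviation about 6.3×. Because variance is the square of the standard deviation, variance itself shrinks roughly 40×.

```cpp
#include <cassert>
#include <cmath>

// Illustrative helper (not ORT code): ratio of a "before" measurement to an "after" one.
inline double Ratio(double before, double after) {
  assert(after > 0.0);
  return before / after;
}

// ResNet50-v2 numbers from the benchmark table, in milliseconds.
constexpr double kMeanBefore = 23.34, kMeanAfter = 2.94;
constexpr double kStdBefore = 1.01, kStdAfter = 0.16;

// Mean speedup: 23.34 / 2.94, roughly 7.94x.
inline double MeanSpeedup() { return Ratio(kMeanBefore, kMeanAfter); }

// Standard deviation shrinks by 1.01 / 0.16, roughly 6.3x.
inline double StdDevRatio() { return Ratio(kStdBefore, kStdAfter); }

// Variance is stddev squared, so it shrinks by the square of that ratio (~40x).
inline double VarianceRatio() { return std::pow(StdDevRatio(), 2.0); }
```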
Partition counts on other Conv-heavy ONNX-zoo models with FusedConv content (after CPU optimization):

| Model         | Without | With | Notes |
|---------------|---------|------|-------|
| ResNet50-v2   | 18      | 1    | 33 FusedConv (Relu) |
| FCN-ResNet50  | 18      | 1    | 35 FusedConv (Relu); fails to compile on CoreML for unrelated reasons |
| YOLOv3 (full) | 27      | 4    | 72 FusedConv (LeakyRelu); detection post-processing fails on CoreML for unrelated dynamic-shape reasons |
| YOLOv3-tiny   | 13      | 7    | 11 FusedConv (LeakyRelu); same caveat as YOLOv3 (full) |

The partition count drops for all four models. ResNet50-v2 is the only one in this set that runs end-to-end on CoreML, which is why it is the model used for the latency numbers above.

## Implementation

Reuses `ConvOpBuilder`, which now branches on `op_type`:

- `Conv`: behavior unchanged.
- `FusedConv`: emit the `conv` MIL op into an intermediate output, then chain the activation MIL op on top. All six activation types that `ConvActivationFusion` produces are supported:

      Relu        -> relu
      Sigmoid     -> sigmoid
      Tanh        -> tanh
      LeakyRelu   -> leaky_relu    (alpha from activation_params)
      Clip        -> clip          (min/max from activation_params)
      HardSigmoid -> sigmoid_hard  (alpha/beta from activation_params)

`IsOpSupportedImpl` rejects FusedConv in NeuralNetwork mode (which would otherwise emit an unfused Conv and lose the activation) and rejects any unrecognized activation string.

## Tests

Six new tests in `coreml_basic_test.cc`, one per supported activation, covering the param-less, single-param, two-param-positional, and two-param-named cases:

    FusedConvTestRelu:        no `activation_params` attribute
    FusedConvTestSigmoid:     param-less like Relu; exercises `sigmoid` op-name dispatch
    FusedConvTestTanh:        param-less like Relu; exercises `tanh` op-name dispatch
    FusedConvTestLeakyRelu:   single param (alpha)
    FusedConvTestClip:        two params (min, max)
    FusedConvTestHardSigmoid: two params (alpha, beta)

Each verifies CoreML output against the CPU EP reference and asserts `ExpectedEPNodeAssignment::All`.
All pass locally on macOS 26.3 / M3 Max. Also adds the supported-ops doc entry. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../coreml/builders/impl/conv_op_builder.cc | 109 ++++++++++++++- .../coreml/builders/op_builder_factory.cc | 5 +- .../providers/coreml/coreml_basic_test.cc | 132 ++++++++++++++++++ .../apple/coreml_supported_mlprogram_ops.md | 1 + 4 files changed, 244 insertions(+), 3 deletions(-) diff --git a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc index d534aab1e86b6..877631b4692f5 100644 --- a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc +++ b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc @@ -15,6 +15,19 @@ using namespace CoreML::Specification; namespace onnxruntime { namespace coreml { +namespace { + +// Set of activations that ORT's ConvActivationFusion may fold into a FusedConv +// and that the CoreML EP has MLProgram equivalents for. +// See onnxruntime/core/optimizer/conv_activation_fusion.cc:82-99 for the +// producer side. +bool IsSupportedFusedConvActivation(const std::string& name) { + return name == "Relu" || name == "Sigmoid" || name == "Tanh" || + name == "LeakyRelu" || name == "Clip" || name == "HardSigmoid"; +} + +} // namespace + class ConvOpBuilder : public BaseOpBuilder { void AddInitializersToSkip(ModelBuilder& model_builder, const Node& node) const override; @@ -92,9 +105,83 @@ Status ConvOpBuilder::AddToModelBuilderImpl(ModelBuilder& model_builder, const N AddPadTypeAndPads(*conv_op, model_builder, op_type, helper, num_spatial_dims); - AddOperationOutput(*conv_op, *node.OutputDefs()[0]); + const bool is_fused_conv = node.OpType() == "FusedConv"; + if (!is_fused_conv) { + AddOperationOutput(*conv_op, *node.OutputDefs()[0]); + model_builder.AddOperation(std::move(conv_op)); + } else { + // com.microsoft:FusedConv = Conv + activation. 
Emit conv into an + // intermediate, then the activation MIL op on top. Mirrors how + // ConvActivationFusion was going to compose them on other EPs. + const auto output_elem_type = static_cast( + node.OutputDefs()[0]->TypeAsProto()->tensor_type().elem_type()); + std::vector output_shape; + ORT_RETURN_IF_NOT(GetShape(*node.OutputDefs()[0], output_shape, logger), + "Failed to get FusedConv output shape"); + + const std::string& conv_out_name = model_builder.GetUniqueName(node, "fused_conv_conv_out"); + AddIntermediateOperationOutput(*conv_op, conv_out_name, output_elem_type, output_shape); + model_builder.AddOperation(std::move(conv_op)); + + const std::string activation = helper.Get("activation", std::string("")); + const auto activation_params = helper.Get("activation_params", std::vector{}); + + std::string_view mil_op; + if (activation == "Relu") { + mil_op = "relu"; + } else if (activation == "Sigmoid") { + mil_op = "sigmoid"; + } else if (activation == "Tanh") { + mil_op = "tanh"; + } else if (activation == "LeakyRelu") { + mil_op = "leaky_relu"; + } else if (activation == "Clip") { + mil_op = "clip"; + } else if (activation == "HardSigmoid") { + mil_op = "sigmoid_hard"; + } else { + return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, + "FusedConv has unsupported activation: ", activation); + } + + auto act_op = model_builder.CreateOperation(node, mil_op, "activation"); + AddOperationInput(*act_op, "x", conv_out_name); - model_builder.AddOperation(std::move(conv_op)); + auto add_scalar = [&](std::string_view port_name, float value) { + if (output_elem_type == ONNX_NAMESPACE::TensorProto_DataType_FLOAT) { + AddOperationInput(*act_op, std::string(port_name), + model_builder.AddScalarConstant(act_op->type(), std::string(port_name), value)); + } else { + AddOperationInput(*act_op, std::string(port_name), + model_builder.AddScalarConstant(act_op->type(), std::string(port_name), MLFloat16(value))); + } + }; + + // Activation-specific params. 
ConvActivationFusion packs them into + // `activation_params` in this order (see conv_activation_fusion.cc:165-184): + // LeakyRelu: [alpha] + // Clip: [min, max] + // HardSigmoid: [alpha, beta] + if (activation == "LeakyRelu") { + const float alpha = activation_params.empty() ? 0.01f : activation_params[0]; + add_scalar("alpha", alpha); + } else if (activation == "Clip") { + const float min_v = activation_params.size() > 0 ? activation_params[0] + : std::numeric_limits::lowest(); + const float max_v = activation_params.size() > 1 ? activation_params[1] + : std::numeric_limits::max(); + add_scalar("alpha", min_v); + add_scalar("beta", max_v); + } else if (activation == "HardSigmoid") { + const float alpha = activation_params.size() > 0 ? activation_params[0] : 0.2f; + const float beta = activation_params.size() > 1 ? activation_params[1] : 0.5f; + add_scalar("alpha", alpha); + add_scalar("beta", beta); + } + + AddOperationOutput(*act_op, *node.OutputDefs()[0]); + model_builder.AddOperation(std::move(act_op)); + } } else { std::unique_ptr layer = model_builder.CreateNNLayer(node); @@ -232,6 +319,24 @@ bool ConvOpBuilder::IsOpSupportedImpl(const Node& node, const OpBuilderInputPara const logging::Logger& logger) const { const auto& name = node.Name(); const auto& input_defs = node.InputDefs(); + const bool is_fused_conv = node.OpType() == "FusedConv"; + + // FusedConv composes Conv with an activation op in a single node. Only + // implemented for the MLProgram path; fall back to CPU in NeuralNetwork mode + // rather than emitting an unfused Conv and losing the activation. 
+ if (is_fused_conv) { + if (!input_params.create_mlprogram) { + LOGS(logger, VERBOSE) << "FusedConv is only supported in MLProgram format"; + return false; + } + NodeAttrHelper fused_helper(node); + const std::string activation = fused_helper.Get("activation", std::string("")); + if (!IsSupportedFusedConvActivation(activation)) { + LOGS(logger, VERBOSE) << "FusedConv activation [" << activation + << "] is not supported by the CoreML EP"; + return false; + } + } const auto& weight_name = input_defs[1]->Name(); const auto* weight = input_params.graph_viewer.GetConstantInitializer(weight_name); diff --git a/onnxruntime/core/providers/coreml/builders/op_builder_factory.cc b/onnxruntime/core/providers/coreml/builders/op_builder_factory.cc index d4f14273eeef5..2d7cee49a2cee 100644 --- a/onnxruntime/core/providers/coreml/builders/op_builder_factory.cc +++ b/onnxruntime/core/providers/coreml/builders/op_builder_factory.cc @@ -26,8 +26,11 @@ static OpBuilderRegistrations CreateOpBuilderRegistrations() { CreateActivationOpBuilder("Elu", op_registrations); CreateActivationOpBuilder("HardSigmoid", op_registrations); - // Microsoft-domain ops produced by ORT's own optimizer passes + // Microsoft-domain ops produced by ORT's own optimizer passes. CreateQuickGeluOpBuilder("QuickGelu", op_registrations); + // FusedConv (from ConvActivationFusion) reuses the existing ConvOpBuilder + // which branches on op_type internally. 
+ CreateConvOpBuilder("FusedConv", op_registrations); // Unary ops CreateUnaryOpBuilder("Erf", op_registrations); diff --git a/onnxruntime/test/providers/coreml/coreml_basic_test.cc b/onnxruntime/test/providers/coreml/coreml_basic_test.cc index f56c81d2e89de..fbd73af7d6514 100644 --- a/onnxruntime/test/providers/coreml/coreml_basic_test.cc +++ b/onnxruntime/test/providers/coreml/coreml_basic_test.cc @@ -1164,6 +1164,138 @@ TEST(CoreMLExecutionProviderTest, QuickGeluTestFp16) { #endif } +namespace { +// Build a single-node com.microsoft:FusedConv model for the tests below. +// Input X is {1, 2, 4, 4}, weight W is {3, 2, 2, 2} (constant initializer, set +// to a simple pattern), no bias. stride=1, pad=0. Output is {1, 3, 3, 3}. +ONNX_NAMESPACE::ModelProto MakeFusedConvModel(const std::string& activation, + const std::vector& activation_params) { + ONNX_NAMESPACE::ModelProto model_proto; + model_proto.set_ir_version(ONNX_NAMESPACE::IR_VERSION); + auto* onnx_opset = model_proto.add_opset_import(); + onnx_opset->set_domain(""); + onnx_opset->set_version(13); + auto* ms_opset = model_proto.add_opset_import(); + ms_opset->set_domain("com.microsoft"); + ms_opset->set_version(1); + + auto* graph_proto = model_proto.mutable_graph(); + graph_proto->set_name("fused_conv_test"); + + auto add_tensor_value = [&](auto* proto, const char* name, const std::vector& shape) { + proto->set_name(name); + auto* tt = proto->mutable_type()->mutable_tensor_type(); + tt->set_elem_type(ONNX_NAMESPACE::TensorProto_DataType_FLOAT); + for (int64_t d : shape) tt->mutable_shape()->add_dim()->set_dim_value(d); + }; + add_tensor_value(graph_proto->add_input(), "X", {1, 2, 4, 4}); + add_tensor_value(graph_proto->add_output(), "Y", {1, 3, 3, 3}); + + // Weight initializer: {3, 2, 2, 2} = 24 floats, deterministic pattern. 
+ auto* w_init = graph_proto->add_initializer(); + w_init->set_name("W"); + w_init->set_data_type(ONNX_NAMESPACE::TensorProto_DataType_FLOAT); + for (int64_t d : {3, 2, 2, 2}) w_init->add_dims(d); + for (int i = 0; i < 3 * 2 * 2 * 2; ++i) { + w_init->add_float_data(static_cast(i) * 0.05f - 0.4f); + } + + auto* node = graph_proto->add_node(); + node->set_op_type("FusedConv"); + node->set_domain("com.microsoft"); + node->add_input("X"); + node->add_input("W"); + node->add_output("Y"); + + // Set pads explicitly since the CoreML conv builder's VALID-pad branch + // omits the 'pad' input that the MIL op requires. Conv attrs otherwise + // default: strides=[1,1]. + auto* pads_attr = node->add_attribute(); + pads_attr->set_name("pads"); + pads_attr->set_type(ONNX_NAMESPACE::AttributeProto_AttributeType_INTS); + for (int64_t v : {0, 0, 0, 0}) pads_attr->add_ints(v); + + auto* act_attr = node->add_attribute(); + act_attr->set_name("activation"); + act_attr->set_type(ONNX_NAMESPACE::AttributeProto_AttributeType_STRING); + act_attr->set_s(activation); + + if (!activation_params.empty()) { + auto* act_params_attr = node->add_attribute(); + act_params_attr->set_name("activation_params"); + act_params_attr->set_type(ONNX_NAMESPACE::AttributeProto_AttributeType_FLOATS); + for (float v : activation_params) act_params_attr->add_floats(v); + } + + return model_proto; +} + +void RunFusedConvTest(const std::string& activation, + const std::vector& activation_params, + std::string_view log_id) { + auto model_proto = MakeFusedConvModel(activation, activation_params); + std::string model_data; + ASSERT_TRUE(model_proto.SerializeToString(&model_data)); + gsl::span model_span{reinterpret_cast(model_data.data()), model_data.size()}; + +#if defined(__APPLE__) + std::vector x_data(1 * 2 * 4 * 4); + for (size_t i = 0; i < x_data.size(); ++i) x_data[i] = static_cast(i) * 0.1f - 1.5f; + OrtValue ml_value_x; + AllocatorPtr allocator = CPUAllocator::DefaultInstance(); + CreateMLValue(allocator, 
{1, 2, 4, 4}, x_data, &ml_value_x); + + NameMLValMap feeds; + feeds.insert(std::make_pair("X", ml_value_x)); + + RunAndVerifyOutputsWithEP(model_span, std::string(log_id), + MakeCoreMLExecutionProvider("MLProgram"), + feeds, + EPVerificationParams{ExpectedEPNodeAssignment::All}); +#else + TestModelLoad(model_span, MakeCoreMLExecutionProvider("MLProgram"), ExpectedEPNodeAssignment::All); +#endif +} +} // namespace + +TEST(CoreMLExecutionProviderTest, FusedConvTestRelu) { + // Param-less activation. Exercises the Conv → activation wiring with no + // `activation_params` attribute. + RunFusedConvTest("Relu", {}, "FusedConvTestRelu_MLProgram"); +} + +TEST(CoreMLExecutionProviderTest, FusedConvTestHardSigmoid) { + // Two-param activation (alpha, beta) with non-default values — catches any + // activation_params-wiring bug. Depends on the HardSigmoid CoreML builder + // landed in #28182. + RunFusedConvTest("HardSigmoid", {0.15f, 0.55f}, "FusedConvTestHardSigmoid_MLProgram"); +} + +TEST(CoreMLExecutionProviderTest, FusedConvTestClip) { + // Two-param activation where params map to alpha=min, beta=max in CoreML's + // clip op. Covers the remaining parametric activation. + RunFusedConvTest("Clip", {-0.5f, 0.5f}, "FusedConvTestClip_MLProgram"); +} + +TEST(CoreMLExecutionProviderTest, FusedConvTestLeakyRelu) { + // Single-param activation (alpha). Heavily used by YOLOv3 — a CPU-optimized + // YOLOv3 graph contains 72 Conv→LeakyRelu fusions, all of which would + // otherwise fall back to CPU and fragment the CoreML partition. + RunFusedConvTest("LeakyRelu", {0.1f}, "FusedConvTestLeakyRelu_MLProgram"); +} + +TEST(CoreMLExecutionProviderTest, FusedConvTestSigmoid) { + // Param-less Sigmoid activation. Distinct from the Relu test only in the + // emitted MIL op (`sigmoid` vs `relu`); guards against regressions in + // op-name dispatch. 
+ RunFusedConvTest("Sigmoid", {}, "FusedConvTestSigmoid_MLProgram"); +} + +TEST(CoreMLExecutionProviderTest, FusedConvTestTanh) { + // Param-less Tanh activation; same rationale as the Sigmoid test for the + // remaining elementwise activation. + RunFusedConvTest("Tanh", {}, "FusedConvTestTanh_MLProgram"); +} #endif // !(ORT_MINIMAL_BUILD) } // namespace test } // namespace onnxruntime diff --git a/tools/ci_build/github/apple/coreml_supported_mlprogram_ops.md b/tools/ci_build/github/apple/coreml_supported_mlprogram_ops.md index 5bcdcc2e1ecee..395813844906a 100644 --- a/tools/ci_build/github/apple/coreml_supported_mlprogram_ops.md +++ b/tools/ci_build/github/apple/coreml_supported_mlprogram_ops.md @@ -53,3 +53,4 @@ Keep in sync with doco generated from /docs/execution-providers/CoreML-Execution |ai.onnx:Transpose|| |ai.onnx:Unsqueeze|| |com.microsoft:QuickGelu|Produced by ORT's `QuickGeluFusion` optimizer pass. Decomposed into `mul` / `sigmoid` / `mul`.| +|com.microsoft:FusedConv|Produced by ORT's `ConvActivationFusion` pass. Decomposed into `conv` + the fused activation (`Relu`, `Sigmoid`, `Tanh`, `LeakyRelu`, `Clip`, `HardSigmoid`).| From ce199242e6080fc5140416f489007ecb0820bc31 Mon Sep 17 00:00:00 2001 From: Max Buckley Date: Wed, 6 May 2026 09:58:18 +0200 Subject: [PATCH 2/7] Drop redundant comment above IsSupportedFusedConvActivation The activation list and the function name already convey what's allowed; the cross-reference to a specific line range in conv_activation_fusion.cc would rot the moment that file gets touched. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- .../core/providers/coreml/builders/impl/conv_op_builder.cc | 4 ---- 1 file changed, 4 deletions(-) diff --git a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc index 877631b4692f5..f8c3c4ec2aa0c 100644 --- a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc +++ b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc @@ -17,10 +17,6 @@ namespace coreml { namespace { -// Set of activations that ORT's ConvActivationFusion may fold into a FusedConv -// and that the CoreML EP has MLProgram equivalents for. -// See onnxruntime/core/optimizer/conv_activation_fusion.cc:82-99 for the -// producer side. bool IsSupportedFusedConvActivation(const std::string& name) { return name == "Relu" || name == "Sigmoid" || name == "Tanh" || name == "LeakyRelu" || name == "Clip" || name == "HardSigmoid"; From bb7f4b1ad5cd724e5f07304522889b4d6d9cac26 Mon Sep 17 00:00:00 2001 From: Max Buckley Date: Thu, 7 May 2026 09:58:46 +0200 Subject: [PATCH 3/7] [CoreML EP] Drive FusedConv activation handling from a single table Replaces the duplicated activation lists in IsSupportedFusedConvActivation and the if/else MIL-op chain in AddToModelBuilderImpl with a single constexpr table mapping each ONNX activation name to its MIL op, expected activation_params arity, and MIL input port names. Both the support gate and the dispatch path now consult that table. Also tightens IsOpSupportedImpl to reject FusedConv nodes whose activation_params arity does not match what the activation expects (0 for Relu/Sigmoid/Tanh, 1 for LeakyRelu, 2 for Clip/HardSigmoid). The CPU EP already rejects mismatches in fused_activation.cc; CoreML now matches that behaviour instead of silently inventing defaults. Addresses review feedback from yuslepukhin and copilot-pull-request-reviewer on #28289. 
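For intuition about what each table row means numerically, here is a minimal scalar sketch of the activation applied after the conv. It assumes the usual element-wise definitions: HardSigmoid is clamp(alpha * x + beta, 0, 1), and Clip uses alpha/beta as min/max. `ActivationSpec`, `kSpecs`, `FindSpec`, and `ApplyFusedActivation` are hypothetical names. In the EP, the table only selects the MIL op and its ports, and CoreML does the math. Like the tightened support gate, the sketch refuses an arity mismatch instead of inventing defaults.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical mirror of one row of the FusedConv activation table:
// ONNX activation name plus the expected size of activation_params.
struct ActivationSpec {
  std::string_view onnx_name;
  std::size_t param_count;
};

constexpr ActivationSpec kSpecs[] = {
    {"Relu", 0},      {"Sigmoid", 0}, {"Tanh", 0},
    {"LeakyRelu", 1}, {"Clip", 2},    {"HardSigmoid", 2},
};

const ActivationSpec* FindSpec(std::string_view name) {
  for (const auto& spec : kSpecs) {
    if (spec.onnx_name == name) return &spec;
  }
  return nullptr;
}

// Applies the post-conv activation to a single value. Returns std::nullopt
// for an unknown activation or an activation_params arity mismatch, instead
// of substituting default parameters.
std::optional<float> ApplyFusedActivation(std::string_view name,
                                          const std::vector<float>& params,
                                          float x) {
  const ActivationSpec* spec = FindSpec(name);
  if (spec == nullptr || params.size() != spec->param_count) return std::nullopt;

  if (name == "Relu") return std::max(x, 0.0f);
  if (name == "Sigmoid") return 1.0f / (1.0f + std::exp(-x));
  if (name == "Tanh") return std::tanh(x);
  if (name == "LeakyRelu") return x >= 0.0f ? x : params[0] * x;          // [alpha]
  if (name == "Clip") return std::min(std::max(x, params[0]), params[1]);  // [min, max]
  // HardSigmoid: [alpha, beta] -> clamp(alpha * x + beta, 0, 1).
  return std::min(std::max(params[0] * x + params[1], 0.0f), 1.0f);
}
```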
Co-Authored-By: Claude Opus 4.7 (1M context) --- .../coreml/builders/impl/conv_op_builder.cc | 96 ++++++++++--------- 1 file changed, 53 insertions(+), 43 deletions(-) diff --git a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc index f8c3c4ec2aa0c..1e5d0ca8af319 100644 --- a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc +++ b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc @@ -1,6 +1,9 @@ // Copyright (c) Microsoft Corporation. All rights reserved. // Licensed under the MIT License. +#include +#include + #include "core/providers/common.h" #include "core/providers/coreml/builders/helper.h" #include "core/providers/coreml/builders/impl/base_op_builder.h" @@ -17,9 +20,34 @@ namespace coreml { namespace { -bool IsSupportedFusedConvActivation(const std::string& name) { - return name == "Relu" || name == "Sigmoid" || name == "Tanh" || - name == "LeakyRelu" || name == "Clip" || name == "HardSigmoid"; +// Single source of truth for FusedConv activation handling. Drives both the +// support check in IsOpSupportedImpl and the MIL op dispatch in +// AddToModelBuilderImpl. `param_ports` lists the MIL op input ports that map +// positionally to `activation_params`. ConvActivationFusion packs the params +// in the same order: LeakyRelu=[alpha], Clip=[min,max], HardSigmoid=[alpha, +// beta] (see conv_activation_fusion.cc:165-184). For MIL's `clip`, alpha/beta +// are the min/max bounds. 
+struct FusedConvActivationSpec { + std::string_view onnx_name; + std::string_view mil_op; + uint8_t param_count; + std::array param_ports; +}; + +constexpr FusedConvActivationSpec kFusedConvActivations[] = { + {"Relu", "relu", 0, {}}, + {"Sigmoid", "sigmoid", 0, {}}, + {"Tanh", "tanh", 0, {}}, + {"LeakyRelu", "leaky_relu", 1, {{"alpha"}}}, + {"Clip", "clip", 2, {{"alpha", "beta"}}}, + {"HardSigmoid", "sigmoid_hard", 2, {{"alpha", "beta"}}}, +}; + +const FusedConvActivationSpec* FindFusedConvActivationSpec(std::string_view name) { + for (const auto& spec : kFusedConvActivations) { + if (spec.onnx_name == name) return &spec; + } + return nullptr; } } // namespace @@ -122,25 +150,17 @@ Status ConvOpBuilder::AddToModelBuilderImpl(ModelBuilder& model_builder, const N const std::string activation = helper.Get("activation", std::string("")); const auto activation_params = helper.Get("activation_params", std::vector{}); - std::string_view mil_op; - if (activation == "Relu") { - mil_op = "relu"; - } else if (activation == "Sigmoid") { - mil_op = "sigmoid"; - } else if (activation == "Tanh") { - mil_op = "tanh"; - } else if (activation == "LeakyRelu") { - mil_op = "leaky_relu"; - } else if (activation == "Clip") { - mil_op = "clip"; - } else if (activation == "HardSigmoid") { - mil_op = "sigmoid_hard"; - } else { - return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, - "FusedConv has unsupported activation: ", activation); - } - - auto act_op = model_builder.CreateOperation(node, mil_op, "activation"); + // IsOpSupportedImpl gates both of these, so the lookup and arity check + // serve as a defensive backstop rather than primary validation. 
+ const auto* spec = FindFusedConvActivationSpec(activation); + ORT_RETURN_IF_NOT(spec != nullptr, + "FusedConv has unsupported activation: ", activation); + ORT_RETURN_IF_NOT(activation_params.size() == spec->param_count, + "FusedConv activation '", activation, "' expects ", + static_cast(spec->param_count), + " activation_params, got ", activation_params.size()); + + auto act_op = model_builder.CreateOperation(node, std::string(spec->mil_op), "activation"); AddOperationInput(*act_op, "x", conv_out_name); auto add_scalar = [&](std::string_view port_name, float value) { @@ -153,26 +173,8 @@ Status ConvOpBuilder::AddToModelBuilderImpl(ModelBuilder& model_builder, const N } }; - // Activation-specific params. ConvActivationFusion packs them into - // `activation_params` in this order (see conv_activation_fusion.cc:165-184): - // LeakyRelu: [alpha] - // Clip: [min, max] - // HardSigmoid: [alpha, beta] - if (activation == "LeakyRelu") { - const float alpha = activation_params.empty() ? 0.01f : activation_params[0]; - add_scalar("alpha", alpha); - } else if (activation == "Clip") { - const float min_v = activation_params.size() > 0 ? activation_params[0] - : std::numeric_limits::lowest(); - const float max_v = activation_params.size() > 1 ? activation_params[1] - : std::numeric_limits::max(); - add_scalar("alpha", min_v); - add_scalar("beta", max_v); - } else if (activation == "HardSigmoid") { - const float alpha = activation_params.size() > 0 ? activation_params[0] : 0.2f; - const float beta = activation_params.size() > 1 ? 
activation_params[1] : 0.5f; - add_scalar("alpha", alpha); - add_scalar("beta", beta); + for (uint8_t i = 0; i < spec->param_count; ++i) { + add_scalar(spec->param_ports[i], activation_params[i]); } AddOperationOutput(*act_op, *node.OutputDefs()[0]); @@ -327,11 +329,19 @@ bool ConvOpBuilder::IsOpSupportedImpl(const Node& node, const OpBuilderInputPara } NodeAttrHelper fused_helper(node); const std::string activation = fused_helper.Get("activation", std::string("")); - if (!IsSupportedFusedConvActivation(activation)) { + const auto* spec = FindFusedConvActivationSpec(activation); + if (!spec) { LOGS(logger, VERBOSE) << "FusedConv activation [" << activation << "] is not supported by the CoreML EP"; return false; } + const auto activation_params = fused_helper.Get("activation_params", std::vector{}); + if (activation_params.size() != spec->param_count) { + LOGS(logger, VERBOSE) << "FusedConv activation [" << activation << "] expects " + << static_cast(spec->param_count) + << " activation_params, got " << activation_params.size(); + return false; + } } const auto& weight_name = input_defs[1]->Name(); From d0f411b435336e2d921bc2bb3d6eb3705a881a6c Mon Sep 17 00:00:00 2001 From: Max Buckley Date: Thu, 7 May 2026 09:59:37 +0200 Subject: [PATCH 4/7] [CoreML EP] Gate FusedConv against Z residual input and non-float dtypes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds two early rejections in IsOpSupportedImpl that the previous implementation was silently letting through: 1. The optional 4th input 'Z' (residual sum) — FusedConv with Z is Y = activation(Conv(X,W,B) + Z), but the MLProgram lowering only emits conv + activation and never reads input[3]. Without this guard a pre-optimized Conv+Add+Act graph would be fully assigned to CoreML and produce the wrong result by dropping the residual add. Reported by yuslepukhin on #28289. 2. 
Non-float element types — FusedConv schema's `T` permits double, but the activation-param lambda only handles FLOAT and FLOAT16. CoreML does not support double anyway; reject double explicitly so the fallback to CPU is what actually runs. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../coreml/builders/impl/conv_op_builder.cc | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc index 1e5d0ca8af319..f08ee5eecb4e7 100644 --- a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc +++ b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc @@ -327,6 +327,24 @@ bool ConvOpBuilder::IsOpSupportedImpl(const Node& node, const OpBuilderInputPara LOGS(logger, VERBOSE) << "FusedConv is only supported in MLProgram format"; return false; } + // FusedConv schema (contrib_defs.cc) has 4 inputs: X, W, B (optional), + // Z (optional). Z is a residual sum input — Y = activation(Conv(X,W,B) + Z). + // The MLProgram lowering below does not read input 3, so accepting a node + // with Z would silently drop the residual and produce wrong results. + if (input_defs.size() > 3) { + LOGS(logger, VERBOSE) << "FusedConv with the optional 'Z' (residual sum) input " + "is not supported by the CoreML EP"; + return false; + } + // Only float/float16 are wired through add_scalar in AddToModelBuilderImpl. + // FusedConv schema also allows double, which CoreML does not support. 
+ const auto x_elem_type = input_defs[0]->TypeAsProto()->tensor_type().elem_type(); + if (x_elem_type != ONNX_NAMESPACE::TensorProto_DataType_FLOAT && + x_elem_type != ONNX_NAMESPACE::TensorProto_DataType_FLOAT16) { + LOGS(logger, VERBOSE) << "FusedConv element type [" << x_elem_type + << "] is not supported by the CoreML EP (expected FLOAT or FLOAT16)"; + return false; + } NodeAttrHelper fused_helper(node); const std::string activation = fused_helper.Get("activation", std::string("")); const auto* spec = FindFusedConvActivationSpec(activation); From 3fd535f9e2d46eb60603a5d1d2e409908a629cd2 Mon Sep 17 00:00:00 2001 From: Max Buckley Date: Thu, 7 May 2026 10:07:35 +0200 Subject: [PATCH 5/7] [CoreML EP] Add FusedConv negative tests for NN-format and Z-input gating MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds two ExpectedEPNodeAssignment::None tests covering the support-gating paths added in the previous commit: - FusedConvNeuralNetworkNotSupported — FusedConv on the NeuralNetwork EP is rejected so the node falls back to CPU rather than emit an unfused Conv that silently drops the activation. - FusedConvWithZInputNotSupported — FusedConv with the optional residual Z input is rejected to prevent the silent drop of Conv+Add+Act semantics that yuslepukhin flagged on #28289. The unsupported-activation and wrong-arity rejections are also live but not testable end-to-end: the CPU FusedConv kernel rejects those same malformed graphs at kernel construction, so TestModelLoad's Initialize fails before partition assignment can be observed. MakeFusedConvModel grows an `add_z` knob to wire the optional 4th input. A small RunFusedConvNegativeTest helper packages the serialize-then-TestModelLoad pattern. 
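To make the residual semantics concrete, here is a minimal single-channel reference of Y = Relu(Conv(X, W, B) + Z) with stride 1 and no padding. `Tensor2D`, `RefConv2D`, and `RefFusedConvRelu` are hypothetical helpers, not ORT code. With a negative Z, the output differs from the Z-less result. That is exactly the error a lowering that ignores input 3 would introduce. The sketch also shows the output-size arithmetic behind `MakeFusedConvModel`: a 4x4 input with a 2x2 kernel gives a 3x3 output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical row-major, single-channel 2D tensor.
struct Tensor2D {
  std::size_t h, w;
  std::vector<float> data;
  float at(std::size_t r, std::size_t c) const { return data[r * w + c]; }
};

// ONNX-style Conv (cross-correlation), stride 1, no padding, scalar bias.
// Output size is (H - KH + 1) x (W - KW + 1).
Tensor2D RefConv2D(const Tensor2D& x, const Tensor2D& k, float bias) {
  assert(x.h >= k.h && x.w >= k.w);
  Tensor2D y{x.h - k.h + 1, x.w - k.w + 1, {}};
  y.data.resize(y.h * y.w);
  for (std::size_t r = 0; r < y.h; ++r) {
    for (std::size_t c = 0; c < y.w; ++c) {
      float acc = bias;
      for (std::size_t i = 0; i < k.h; ++i) {
        for (std::size_t j = 0; j < k.w; ++j) acc += x.at(r + i, c + j) * k.at(i, j);
      }
      y.data[r * y.w + c] = acc;
    }
  }
  return y;
}

// FusedConv with Relu: Y = Relu(Conv(X, W, B) + Z). The residual Z is added
// before the activation, preserving the act(conv + Z) ordering.
Tensor2D RefFusedConvRelu(const Tensor2D& x, const Tensor2D& k, float bias,
                          const std::optional<Tensor2D>& z) {
  Tensor2D y = RefConv2D(x, k, bias);
  if (z) {
    assert(z->h == y.h && z->w == y.w);
    for (std::size_t i = 0; i < y.data.size(); ++i) y.data[i] += z->data[i];
  }
  for (float& v : y.data) v = std::max(v, 0.0f);
  return y;
}
```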
Co-Authored-By: Claude Opus 4.7 (1M context) --- .../providers/coreml/coreml_basic_test.cc | 45 ++++++++++++++++++- 1 file changed, 44 insertions(+), 1 deletion(-) diff --git a/onnxruntime/test/providers/coreml/coreml_basic_test.cc b/onnxruntime/test/providers/coreml/coreml_basic_test.cc index 5fad6c9f7734d..b6e1545d6f319 100644 --- a/onnxruntime/test/providers/coreml/coreml_basic_test.cc +++ b/onnxruntime/test/providers/coreml/coreml_basic_test.cc @@ -1168,8 +1168,11 @@ namespace { // Build a single-node com.microsoft:FusedConv model for the tests below. // Input X is {1, 2, 4, 4}, weight W is {3, 2, 2, 2} (constant initializer, set // to a simple pattern), no bias. stride=1, pad=0. Output is {1, 3, 3, 3}. +// When `add_z` is true, the optional 4th 'Z' (residual sum) input is added — +// used by the negative test that exercises CoreML's rejection path. ONNX_NAMESPACE::ModelProto MakeFusedConvModel(const std::string& activation, - const std::vector& activation_params) { + const std::vector& activation_params, + bool add_z = false) { ONNX_NAMESPACE::ModelProto model_proto; model_proto.set_ir_version(ONNX_NAMESPACE::IR_VERSION); auto* onnx_opset = model_proto.add_opset_import(); @@ -1189,6 +1192,9 @@ ONNX_NAMESPACE::ModelProto MakeFusedConvModel(const std::string& activation, for (int64_t d : shape) tt->mutable_shape()->add_dim()->set_dim_value(d); }; add_tensor_value(graph_proto->add_input(), "X", {1, 2, 4, 4}); + if (add_z) { + add_tensor_value(graph_proto->add_input(), "Z", {1, 3, 3, 3}); + } add_tensor_value(graph_proto->add_output(), "Y", {1, 3, 3, 3}); // Weight initializer: {3, 2, 2, 2} = 24 floats, deterministic pattern. @@ -1205,6 +1211,12 @@ ONNX_NAMESPACE::ModelProto MakeFusedConvModel(const std::string& activation, node->set_domain("com.microsoft"); node->add_input("X"); node->add_input("W"); + if (add_z) { + // FusedConv schema: X, W, B(optional), Z(optional). Skip B with "" so Z + // lands in input slot 3. 
+    node->add_input("");
+    node->add_input("Z");
+  }
   node->add_output("Y");
 
   // Set pads explicitly since the CoreML conv builder's VALID-pad branch
@@ -1230,6 +1242,15 @@ ONNX_NAMESPACE::ModelProto MakeFusedConvModel(const std::string& activation,
   return model_proto;
 }
 
+void RunFusedConvNegativeTest(const ONNX_NAMESPACE::ModelProto& model_proto, bool mlprogram) {
+  std::string model_data;
+  ASSERT_TRUE(model_proto.SerializeToString(&model_data));
+  gsl::span<const std::byte> model_span{reinterpret_cast<const std::byte*>(model_data.data()), model_data.size()};
+  auto provider = mlprogram ? MakeCoreMLExecutionProvider("MLProgram")
+                            : MakeCoreMLExecutionProvider();
+  TestModelLoad(model_span, std::move(provider), ExpectedEPNodeAssignment::None);
+}
+
 void RunFusedConvTest(const std::string& activation,
                       const std::vector<float>& activation_params,
                       std::string_view log_id) {
@@ -1297,6 +1318,28 @@ TEST(CoreMLExecutionProviderTest, FusedConvTestTanh) {
   RunFusedConvTest("Tanh", {}, "FusedConvTestTanh_MLProgram");
 }
 
+// Negative tests below cover the two gating cases that have a working CPU
+// fallback (so TestModelLoad's Initialize() succeeds and the EP partition
+// assignment can be verified). The arity-mismatch and unsupported-activation
+// cases are also rejected by IsOpSupportedImpl, but the CPU FusedConv kernel
+// rejects them too, so there's no end-to-end fallback to observe.
+
+TEST(CoreMLExecutionProviderTest, FusedConvNeuralNetworkNotSupported) {
+  // FusedConv is only implemented on the MLProgram path. The NeuralNetwork
+  // builder must reject it so the node falls back to CPU rather than emit an
+  // unfused Conv and silently lose the activation.
+  RunFusedConvNegativeTest(MakeFusedConvModel("Relu", {}), /*mlprogram=*/false);
+}
+
+TEST(CoreMLExecutionProviderTest, FusedConvWithZInputNotSupported) {
+  // The optional Z residual sum input (Y = activation(Conv(X,W,B) + Z)) is
+  // not lowered by the MLProgram builder. Accepting such a node would
+  // silently drop the residual add and produce wrong results, so it must be
+  // rejected and fall back to CPU.
+  RunFusedConvNegativeTest(MakeFusedConvModel("Relu", {}, /*add_z=*/true),
+                           /*mlprogram=*/true);
+}
+
 TEST(CoreMLExecutionProviderTest, Split11UnevenAttribute) {
   // ai.onnx:Split-11 with `split` attribute carrying non-uniform sizes.
   // This is the form used by DWPose (`dw-ll_ucoco_384.onnx`); without

From cb96038238523ccd0508119194cd32bab3fb8e51 Mon Sep 17 00:00:00 2001
From: Max Buckley
Date: Thu, 7 May 2026 10:08:04 +0200
Subject: [PATCH 6/7] [CoreML EP] Reword FusedConv factory comment

The previous comment said FusedConv "reuses the existing ConvOpBuilder", which Copilot flagged as misleading because CreateConvOpBuilder registers a new instance under the FusedConv op type rather than literally reusing the Conv-registered instance. Reword to "handled by the same ConvOpBuilder class" so it's clear the reuse is at the class/dispatch level, not the instance.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../core/providers/coreml/builders/op_builder_factory.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/onnxruntime/core/providers/coreml/builders/op_builder_factory.cc b/onnxruntime/core/providers/coreml/builders/op_builder_factory.cc
index 2d7cee49a2cee..6f465774a3c3c 100644
--- a/onnxruntime/core/providers/coreml/builders/op_builder_factory.cc
+++ b/onnxruntime/core/providers/coreml/builders/op_builder_factory.cc
@@ -28,8 +28,8 @@ static OpBuilderRegistrations CreateOpBuilderRegistrations() {
   // Microsoft-domain ops produced by ORT's own optimizer passes.
   CreateQuickGeluOpBuilder("QuickGelu", op_registrations);
 
-  // FusedConv (from ConvActivationFusion) reuses the existing ConvOpBuilder
-  // which branches on op_type internally.
+  // FusedConv (from ConvActivationFusion) is handled by the same ConvOpBuilder
+  // class, which branches on op_type internally.
   CreateConvOpBuilder("FusedConv", op_registrations);
 
   // Unary ops

From 0f851406f738dd48bbbafb6df86bf075cc3242e6 Mon Sep 17 00:00:00 2001
From: Max Buckley
Date: Thu, 7 May 2026 10:24:47 +0200
Subject: [PATCH 7/7] [CoreML EP] Document Z residual input as a TODO

Adds a TODO above the FusedConv Z-input rejection pointing at the straightforward MIL lowering (`add(conv_out, Z)` between conv and activation) and noting which optimizer pass produces the Z form (ConvAddActivationFusion at TransformerLevel::Level3, gated to cpu_ep). This way the next person looking at residual-block coverage on CoreML finds the implementation hint without re-discovering the schema and optimizer pass independently.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../core/providers/coreml/builders/impl/conv_op_builder.cc | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
index f08ee5eecb4e7..3c6794b300557 100644
--- a/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
+++ b/onnxruntime/core/providers/coreml/builders/impl/conv_op_builder.cc
@@ -331,6 +331,13 @@ bool ConvOpBuilder::IsOpSupportedImpl(const Node& node, const OpBuilderInputPara
     // Z (optional). Z is a residual sum input — Y = activation(Conv(X,W,B) + Z).
     // The MLProgram lowering below does not read input 3, so accepting a node
     // with Z would silently drop the residual and produce wrong results.
+    //
+    // TODO: support Z by inserting an `add` MIL op between the conv output
+    // and the activation input — `act_in = add(conv_out, Z)` — preserving the
+    // `act(conv + Z)` ordering. This would unlock CoreML coverage for graphs
+    // optimized at TransformerLevel::Level3 (ORT_ENABLE_ALL) where
+    // ConvAddActivationFusion (core/optimizer/conv_add_act_fusion.cc) produces
+    // FusedConv(B, Z, act) for residual blocks (ResNet/EfficientNet etc).
     if (input_defs.size() > 3) {
       LOGS(logger, VERBOSE) << "FusedConv with the optional 'Z' (residual sum) input "
                                "is not supported by the CoreML EP";