Contributor

@adrianlizarraga adrianlizarraga commented Sep 30, 2025

Description

This PR adds an initial set of C APIs necessary to support kernel registration for plugin EPs.

Example use

The example plugin EP implementation now registers MemcpyFromHost and MemcpyToHost operator kernels using the new APIs. New utilities in the example implementation make the process of defining operator kernels very similar to the existing process used by provider-bridge EPs.

First, the operator kernel class is defined:

// File: onnxruntime/test/autoep/library/kernels/memcpy.h
struct Memcpy : public OrtKernelImpl {
  static OrtStatus* Create(const OrtKernelInfo* info, void* state, /*out*/ std::unique_ptr<Memcpy>& kernel);

  Memcpy(const OrtKernelInfo* info, void* state);

  static OrtStatus* ORT_API_CALL ComputeImpl(OrtKernelImpl* this_ptr, OrtKernelContext* kernel_ctx) noexcept;
  static void ORT_API_CALL ReleaseImpl(OrtKernelImpl* this_ptr) noexcept;

  OrtStatus* DoCompute(OrtKernelContext* kernel_ctx) noexcept;

 private:
  const OrtKernelInfo* info_;
  void* state_;  // Custom state passed from OrtEp
};

Then, a macro defines a function that can be called to register the operator with the EP's kernel registry:

// File: onnxruntime/test/autoep/library/kernels/memcpy.cc
ONNX_OPERATOR_KERNEL_EX(
    MemcpyFromHost,
    kOnnxDomain,
    1,
    (Ort::KernelDefBuilder()
         .SetInputMemType(0, OrtMemType::OrtMemTypeCPUInput)
         .AddTypeConstraint("T", MLDataTypes::GetTensorType(ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT))),
    Memcpy)

ONNX_OPERATOR_KERNEL_EX(
    MemcpyToHost,
    kOnnxDomain,
    1,
    (Ort::KernelDefBuilder()
         .SetOutputMemType(0, OrtMemType::OrtMemTypeCPUOutput)
         .AddTypeConstraint("T", MLDataTypes::GetTensorType(ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT))),
    Memcpy)

Lastly, the functions defined by the above macro are entered into a table:

// File: onnxruntime/test/autoep/library/ep_kernel_registration.cc

// Include kernel files:
#include "kernels/memcpy.h"

// Forward declarations of kernel classes used as template args for BuildKernelCreateInfo
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kOnnxDomain, 1, MemcpyFromHost);
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kOnnxDomain, 1, MemcpyToHost);

// Table of BuildKernelCreateInfo functions for each operator
static const BuildKernelCreateInfoFn build_kernel_create_info_funcs[] = {
    BuildKernelCreateInfo<void>,  // Dummy to avoid table becoming empty.
    BuildKernelCreateInfo<ONNX_OPERATOR_KERNEL_CLASS_NAME(kOnnxDomain, 1, MemcpyFromHost)>,
    BuildKernelCreateInfo<ONNX_OPERATOR_KERNEL_CLASS_NAME(kOnnxDomain, 1, MemcpyToHost)>,
};

The example EP processes the entries in the above table to add information about the supported operator kernels to the EP's kernel registry (OrtKernelRegistry).
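As a rough illustration of how such a table might be processed, the sketch below walks a table of builder functions and skips the dummy entry. It does not use the ORT headers: `DemoKernelCreateInfo`, `DemoBuild`, and `DemoRegisterAll` are hypothetical stand-ins, and the rule that the dummy entry yields an empty record is an assumption, not something taken from the example EP.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one "kernel creation information" record.
// An empty op_type marks the dummy placeholder entry (an assumption).
struct DemoKernelCreateInfo {
  std::string op_type;
};

using DemoBuildFn = DemoKernelCreateInfo (*)();

// Tag types standing in for the macro-generated kernel classes.
struct DemoMemcpyFromHost;
struct DemoMemcpyToHost;

template <typename T>
DemoKernelCreateInfo DemoBuild();

template <>
inline DemoKernelCreateInfo DemoBuild<void>() { return {}; }
template <>
inline DemoKernelCreateInfo DemoBuild<DemoMemcpyFromHost>() { return {"MemcpyFromHost"}; }
template <>
inline DemoKernelCreateInfo DemoBuild<DemoMemcpyToHost>() { return {"MemcpyToHost"}; }

// Table of builder functions, mirroring the layout shown above.
static const DemoBuildFn demo_build_funcs[] = {
    DemoBuild<void>,  // dummy entry keeps the table non-empty
    DemoBuild<DemoMemcpyFromHost>,
    DemoBuild<DemoMemcpyToHost>,
};

// Calls every builder and records the operator type of each real entry.
inline std::vector<std::string> DemoRegisterAll() {
  std::vector<std::string> registered;
  for (DemoBuildFn build : demo_build_funcs) {
    DemoKernelCreateInfo info = build();
    if (info.op_type.empty()) continue;  // skip the dummy entry
    registered.push_back(info.op_type);
  }
  return registered;
}
```

Keeping the dummy entry in the table means the array is never zero-length, even if every real kernel is compiled out.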

Additionally, during the call to OrtEp::GetCapability, an EP can now look up registered kernel definitions via the new API EpGraphSupportInfo_LookUpKernel. Note that an EP would not normally look up kernels for MemcpyFromHost or MemcpyToHost, because ORT inserts those nodes itself. Instead, the API is meant for looking up other registered operator kernels, such as Conv.

static OrtStatus* ORT_API_CALL GetCapabilityImpl(OrtEp* this_ptr, const OrtGraph* graph,
                                                 OrtEpGraphSupportInfo* graph_support_info) noexcept {
  // ... (elided: obtain `this_ep` from `this_ptr` and collect the graph's `nodes`)

  for (const OrtNode* node : nodes) {
    const OrtKernelDef* kernel_def = nullptr;
    OrtStatus* status = this_ep->ep_api->EpGraphSupportInfo_LookUpKernel(graph_support_info, node, &kernel_def);

    if (status != nullptr) {
      return status;
    }

    if (kernel_def != nullptr) {  // Take node if this EP has a registered kernel for it.
      if (OrtStatus* st = this_ep->ep_api->EpGraphSupportInfo_AddSingleNode(graph_support_info, node);
          st != nullptr) {
        return st;
      }
    }
  }

  return nullptr;
}

EP implementation details

An EP instance (i.e., OrtEp) that needs to register operator kernels with ONNX Runtime must implement the following OrtEp::GetKernelRegistry() function:

| Function | Returns | Parameters | Description |
|----------|---------|------------|-------------|
| GetKernelRegistry | OrtStatus* | • OrtEp* this_ptr: The OrtEp instance. • const OrtKernelRegistry** kernel_registry: Output parameter set to the EP's kernel registry, which must remain valid throughout the lifetime of the EP. | Gets the execution provider's kernel registry, if any. Remarks: A kernel registry contains kernel creation information for operator kernels supported by an EP. Note: Implementation of this function is optional. If set to NULL, ORT assumes the EP compiles nodes. |

If defined by the EP, the OrtEp::GetKernelRegistry() function is called by ONNX Runtime after creating an instance of the OrtEp in order to retrieve the EP's kernel registry.
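The contract described above can be sketched with stand-in types. This is a minimal illustration under stated assumptions, not the ORT implementation: `DemoEpBase`, `DemoEp`, `DemoKernelRegistry`, and `DemoQueryRegistry` are hypothetical names, and the real `OrtEp` struct has many more members. The sketch shows the EP owning its registry (so it stays valid for the EP's lifetime) and the runtime treating a null function pointer as "this EP compiles nodes".

```cpp
#include <cassert>
#include <memory>

struct DemoStatus;  // hypothetical status type; nullptr means success
struct DemoKernelRegistry {
  int num_kernels = 0;
};

// Hypothetical stand-in for the function-pointer struct an EP fills in.
struct DemoEpBase {
  DemoStatus* (*GetKernelRegistry)(DemoEpBase* this_ptr,
                                   const DemoKernelRegistry** kernel_registry) = nullptr;
};

// An EP that owns its registry, so the registry lives as long as the EP.
struct DemoEp : DemoEpBase {
  DemoEp() : registry_(std::make_unique<DemoKernelRegistry>()) {
    GetKernelRegistry = GetKernelRegistryImpl;
  }

  static DemoStatus* GetKernelRegistryImpl(DemoEpBase* this_ptr,
                                           const DemoKernelRegistry** kernel_registry) noexcept {
    auto* ep = static_cast<DemoEp*>(this_ptr);
    *kernel_registry = ep->registry_.get();
    return nullptr;
  }

 private:
  std::unique_ptr<DemoKernelRegistry> registry_;
};

// What a runtime might do right after creating the EP instance.
inline const DemoKernelRegistry* DemoQueryRegistry(DemoEpBase& ep) {
  if (ep.GetKernelRegistry == nullptr) return nullptr;  // EP compiles nodes instead
  const DemoKernelRegistry* registry = nullptr;
  DemoStatus* status = ep.GetKernelRegistry(&ep, &registry);
  return status == nullptr ? registry : nullptr;
}
```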

APIs used by EP to add entries to kernel registry

An EP's kernel registry (OrtKernelRegistry) contains information necessary for the (later) creation of operator kernels supported by an EP. Conceptually, a kernel registry contains an array of "kernel creation information" elements, one per operator. Each such element consists of:

  • A kernel definition (OrtKernelDef), which specifies operator type, supported versions, type constraints, I/O memory types, etc.
  • A function of type OrtKernelCreateFunc that ORT calls to create an instance of the kernel (OrtKernelImpl).
  • Custom opaque state (provided by the OrtEp) that is passed to the OrtKernelCreateFunc.

An EP uses the following OrtEpApi::KernelRegistry_AddKernel() function to add an entry for one supported operator.

| Function | Returns | Parameters | Description |
|----------|---------|------------|-------------|
| KernelRegistry_AddKernel | OrtStatus* | • OrtKernelRegistry* kernel_registry: The OrtKernelRegistry instance. • const OrtKernelDef* kernel_def: The kernel definition, which includes operator type, version, EP name, type constraints, etc. • OrtKernelCreateFunc kernel_create_func: Function that creates an instance of the operator kernel as an OrtKernelImpl instance. • void* kernel_create_func_state: Custom state passed to the kernel creation function. Can be null. | Adds kernel creation information for a supported operator kernel to the given kernel registry. Remarks: Refer to OrtEp::GetKernelRegistry, which returns an EP's kernel registry to ORT. |
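To make the three-part registry entry concrete, here is a small sketch with stand-in types. It is not the ORT data structure: `DemoKernelRegistry`, `DemoKernelDef`, `DemoKernel`, and `DemoCreate` are hypothetical, and matching on operator type alone is a simplification (a real lookup would also consider domain, version, and type constraints).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct DemoKernelDef {
  std::string op_type;
  std::string domain;
  int since_version;
};

struct DemoKernel {
  std::string op_type;
  void* state;  // opaque state supplied at registration time
};

using DemoKernelCreateFunc = std::unique_ptr<DemoKernel> (*)(void* state, const DemoKernelDef& def);

struct DemoKernelRegistry {
  // One entry per operator: definition, creation function, opaque state.
  struct Entry {
    DemoKernelDef def;
    DemoKernelCreateFunc create;
    void* state;
  };
  std::vector<Entry> entries;

  void AddKernel(const DemoKernelDef& def, DemoKernelCreateFunc create, void* state) {
    entries.push_back(Entry{def, create, state});
  }

  // Creates a kernel from the first entry whose operator type matches.
  std::unique_ptr<DemoKernel> CreateKernel(const std::string& op_type) const {
    for (const Entry& entry : entries) {
      if (entry.def.op_type == op_type) return entry.create(entry.state, entry.def);
    }
    return nullptr;
  }
};

inline std::unique_ptr<DemoKernel> DemoCreate(void* state, const DemoKernelDef& def) {
  return std::unique_ptr<DemoKernel>(new DemoKernel{def.op_type, state});
}
```

The registry stores only what is needed to create a kernel later; no kernel instance exists until `CreateKernel` is called.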
Building a kernel definition

An EP uses a kernel definition builder (OrtKernelDefBuilder) to create a kernel definition (OrtKernelDef). The following table lists some of the C APIs related to building a kernel definition. The above ONNX_OPERATOR_KERNEL_EX macro uses these APIs.

| Function | Returns | Parameters | Description |
|----------|---------|------------|-------------|
| KernelDefBuilder_SetOperatorType | OrtStatus* | • OrtKernelDefBuilder* kernel_def_builder: The OrtKernelDefBuilder instance. • const char* op_type: A null-terminated string representing the operator type. | Sets the kernel's operator type. |
| KernelDefBuilder_SetDomain | OrtStatus* | • OrtKernelDefBuilder* kernel_def_builder: The OrtKernelDefBuilder instance. • const char* domain: A null-terminated string representing the operator's domain. | Sets the kernel's domain. |
| ... | ... | ... | ... |
| KernelDefBuilder_Build | OrtStatus* | • OrtKernelDefBuilder* kernel_def_builder: The OrtKernelDefBuilder instance. • OrtKernelDef** kernel_def_out: The new OrtKernelDef instance. | Creates an OrtKernelDef instance from the given kernel definition builder. |
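The builder flow (set fields, then build) might look roughly like the following. This is a sketch with a hypothetical `DemoKernelDefBuilder`, not the `Ort::KernelDefBuilder` wrapper or the C functions listed above; in particular, rejecting a definition without an operator type is an assumed validation rule chosen for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct DemoKernelDef {
  std::string op_type;
  std::string domain;
};

class DemoKernelDefBuilder {
 public:
  DemoKernelDefBuilder& SetOperatorType(std::string op_type) {
    def_.op_type = std::move(op_type);
    return *this;
  }

  DemoKernelDefBuilder& SetDomain(std::string domain) {
    def_.domain = std::move(domain);
    return *this;
  }

  // Returns false and leaves `out` untouched when no operator type was set
  // (assumed rule for this sketch).
  bool Build(DemoKernelDef& out) const {
    if (def_.op_type.empty()) return false;
    out = def_;
    return true;
  }

 private:
  DemoKernelDef def_;
};
```

Returning the builder by reference from each setter is what allows the chained style seen in the `ONNX_OPERATOR_KERNEL_EX` calls earlier.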
Defining a kernel implementation

An EP defines a kernel implementation by initializing an instance of OrtKernelImpl (shown below) with function pointers for computation, release, etc.

struct OrtKernelImpl {
  uint32_t ort_version_supported;  ///< Must be initialized to ORT_API_VERSION

  /** \brief Computation function called to execute the kernel on an EP.
   *
   * \param[in] this_ptr The OrtKernelImpl instance.
   * \param[in] context The OrtKernelContext instance that provides access to the inputs and outputs.
   *
   * \snippet{doc} snippets.dox OrtStatus Return Value
   *
   * \since Version 1.24.
   */
  ORT_API2_STATUS(Compute, _In_ OrtKernelImpl* this_ptr, _In_ OrtKernelContext* context);

  /** \brief Called by ORT to release the OrtKernelImpl instance and its resources.
   *
   * \param[in] this_ptr The OrtKernelImpl instance.
   *
   * \since Version 1.24.
   */
  ORT_API_T(void, Release, _In_ OrtKernelImpl* this_ptr);
};

As shown previously, the example EP creates a Memcpy class that inherits from OrtKernelImpl and implements the above functions.
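The inheritance pattern can be sketched without the ORT headers. Everything below is a stand-in: `DemoKernelImpl` imitates a struct of function pointers, and `DemoCopyKernel` is a hypothetical kernel, not the example EP's `Memcpy`. The point is how static functions recover the derived object from `this_ptr` and forward to ordinary member functions.

```cpp
#include <cassert>
#include <cstdint>

struct DemoStatus;  // hypothetical status type; nullptr means success
struct DemoKernelContext {
  int input;
  int output;
};

// Stand-in for a C-style kernel interface: a version field plus function pointers.
struct DemoKernelImpl {
  std::uint32_t version_supported;
  DemoStatus* (*Compute)(DemoKernelImpl* this_ptr, DemoKernelContext* context);
  void (*Release)(DemoKernelImpl* this_ptr);
};

struct DemoCopyKernel : DemoKernelImpl {
  DemoCopyKernel() {
    version_supported = 1;  // a real kernel would use the API version constant
    Compute = ComputeImpl;
    Release = ReleaseImpl;
  }

  // Static trampolines: cast back to the derived type and forward.
  static DemoStatus* ComputeImpl(DemoKernelImpl* this_ptr, DemoKernelContext* context) noexcept {
    return static_cast<DemoCopyKernel*>(this_ptr)->DoCompute(context);
  }

  static void ReleaseImpl(DemoKernelImpl* this_ptr) noexcept {
    delete static_cast<DemoCopyKernel*>(this_ptr);
  }

  DemoStatus* DoCompute(DemoKernelContext* context) noexcept {
    context->output = context->input;  // trivial "copy" computation
    return nullptr;
  }
};
```

Because the caller only ever sees the base struct, destruction has to go through `Release`, which is the one place that knows the concrete type.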

Defining a kernel creation function

An EP must provide a function of type OrtKernelCreateFunc that ORT can later call to create an instance of a kernel (OrtKernelImpl). The signature of the OrtKernelCreateFunc is shown below.

/** \brief Type definition for a function that creates an OrtKernelImpl instance for an operator kernel.
 *
 * \param[in] ctx Unused/reserved for future use.
 * \param[in] kernel_create_func_state Opaque state initially provided by the EP that registered the kernel.
 *                                     Refer to OrtEpApi::KernelRegistry_AddKernel(). May be null.
 * \param[in] info The OrtKernelInfo instance that provides access to the kernel's input and output characteristics.
 * \param[out] kernel_out Output parameter set to the new OrtKernelImpl instance.
 *
 * \snippet{doc} snippets.dox OrtStatus Return Value
 *
 * \since Version 1.24.
 */
typedef OrtStatus*(ORT_API_CALL* OrtKernelCreateFunc)(_In_ OrtKernelCreateContext* ctx,  // unused/reserved as of 1.24
                                                      _In_ void* kernel_create_func_state,
                                                      _In_ const OrtKernelInfo* info,
                                                      _Outptr_result_maybenull_ OrtKernelImpl** kernel_out);

The example EP declares kernel creation functions with the previously mentioned ONNX_OPERATOR_KERNEL_EX macro. Expanding the macro call for MemcpyFromHost would yield a kernel creation function similar to the following snippet:

OrtStatus* ORT_API_CALL CreateMemcpyKernel(OrtKernelCreateContext* /*ctx*/, void* kernel_create_func_state,
                                           const OrtKernelInfo* info, OrtKernelImpl** kernel_out) {
  *kernel_out = nullptr;

  std::unique_ptr<Memcpy> kernel;
  RETURN_IF_ERROR(Memcpy::Create(info, kernel_create_func_state, kernel));

  *kernel_out = kernel.release();
  return nullptr;
}
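The snippet above relies on two idioms: an early-return macro and ownership transfer through `release()`. The sketch below reproduces both with stand-in types. `DEMO_RETURN_IF_ERROR` is a hypothetical definition (the example EP's `RETURN_IF_ERROR` may be written differently), and the `simulate_failure` flag exists only to exercise the error path.

```cpp
#include <cassert>
#include <memory>

struct DemoStatus {
  const char* message;
};

// Hypothetical early-return helper: propagate any non-null status to the caller.
#define DEMO_RETURN_IF_ERROR(expr)                    \
  do {                                                \
    DemoStatus* demo_status_ = (expr);                \
    if (demo_status_ != nullptr) return demo_status_; \
  } while (0)

struct DemoKernel {
  void* state = nullptr;

  static DemoStatus* Create(bool simulate_failure, void* state,
                            std::unique_ptr<DemoKernel>& kernel) {
    static DemoStatus failure{"simulated creation failure"};
    if (simulate_failure) return &failure;
    kernel = std::make_unique<DemoKernel>();
    kernel->state = state;
    return nullptr;
  }
};

inline DemoStatus* DemoCreateKernel(bool simulate_failure, void* state, DemoKernel** kernel_out) {
  *kernel_out = nullptr;  // the output has a defined value even on failure

  std::unique_ptr<DemoKernel> kernel;
  DEMO_RETURN_IF_ERROR(DemoKernel::Create(simulate_failure, state, kernel));

  *kernel_out = kernel.release();  // the caller now owns the kernel
  return nullptr;
}
```

Holding the kernel in a `std::unique_ptr` until the last statement means any early return frees it automatically.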

Motivation and Context

Contributor

Copilot AI left a comment


Pull Request Overview

This draft PR implements support for kernel-based execution providers (EPs) within the ONNX Runtime EP plugin architecture. The changes enable plugin EPs to register custom kernels directly with the ORT runtime, expanding beyond the current node-based computation model.

  • Adds comprehensive kernel registration infrastructure for plugin EPs
  • Implements memory copy kernels as examples (MemcpyFromHost/MemcpyToHost)
  • Extends the EP API with kernel definition and creation functionality

Reviewed Changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.

| File | Description |
|------|-------------|
| onnxruntime/test/framework/ep_plugin_provider_test.cc | Updates test to pass kernel registry parameter |
| onnxruntime/test/autoep/library/kernels/utils.h | Defines kernel creation utilities and macros |
| onnxruntime/test/autoep/library/kernels/memcpy.h | Declares example Memcpy kernel interface |
| onnxruntime/test/autoep/library/kernels/memcpy.cc | Implements example Memcpy kernel with registration |
| onnxruntime/test/autoep/library/kernels/data_types.h | Declares MLDataTypes singleton for type management |
| onnxruntime/test/autoep/library/kernels/data_types.cc | Implements MLDataTypes for tensor type retrieval |
| onnxruntime/test/autoep/library/ep_kernel_registration.h | Declares kernel registration functions |
| onnxruntime/test/autoep/library/ep_kernel_registration.cc | Implements kernel registration logic |
| onnxruntime/test/autoep/library/ep.h | Adds kernel creation method declarations to EP |
| onnxruntime/test/autoep/library/ep.cc | Implements kernel creation methods in example EP |
| onnxruntime/core/session/utils.h | Declares CopyTensors utility function |
| onnxruntime/core/session/utils.cc | Implements CopyTensors utility function |
| onnxruntime/core/session/provider_policy_context.cc | Updates EP creation to use new factory method |
| onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.h | Extends PluginExecutionProvider with kernel registry support |
| onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc | Implements kernel registry initialization in plugin EP |
| onnxruntime/core/session/plugin_ep/ep_kernel_registration.h | Declares kernel registration infrastructure |
| onnxruntime/core/session/plugin_ep/ep_kernel_registration.cc | Implements plugin EP kernel wrapper and registration |
| onnxruntime/core/session/plugin_ep/ep_api.h | Declares new EP API functions for kernel support |
| onnxruntime/core/session/plugin_ep/ep_api.cc | Implements new EP API functions for kernel support |
| onnxruntime/core/session/onnxruntime_c_api.cc | Refactors CopyTensors to use shared utility |
| include/onnxruntime/core/session/onnxruntime_ep_c_api.h | Adds kernel-related types and API declarations |
| include/onnxruntime/core/session/onnxruntime_cxx_inline.h | Implements C++ wrapper methods for kernel APIs |
| include/onnxruntime/core/session/onnxruntime_cxx_api.h | Declares C++ KernelDefBuilder class |
| cmake/onnxruntime_unittests.cmake | Updates build to include kernel source files |

Contributor

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.


Contributor

@edgchen1 edgchen1 left a comment


looks good, just some minor comments on the example EP code

edgchen1
edgchen1 previously approved these changes Dec 2, 2025
@adrianlizarraga adrianlizarraga merged commit db6d83b into main Dec 5, 2025
91 checks passed
@adrianlizarraga adrianlizarraga deleted the adrianl/ep-abi-kernel-based-eps branch December 5, 2025 23:23
alex-spacemit pushed a commit to spacemit-com/onnxruntime that referenced this pull request Dec 8, 2025
---------

Co-authored-by: Copilot <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
adrianlizarraga added a commit that referenced this pull request Dec 27, 2025
…26754)

### Description
- Adds C APIs to support pre-packing of const weights for
`OrtKernelImpl` implementations.
- APIs optionally support sharing of pre-packed weight data (for
cpu-accessible memory).
- Updates example kernel (Mul) to use new pre-packing API. Tested by
existing unit test:
https://github.com/microsoft/onnxruntime/blob/549d7415e26e2b3f86c42f86e135bb746caa37b4/onnxruntime/test/autoep/test_execution.cc#L242-L256


### Motivation and Context
The [previous PR](#26206)
added the base APIs that support kernel-based plugin EPs. This PR adds
an additional feature that was identified as necessary for the port of
WebGPU EP.

---------

Co-authored-by: Edward Chen <[email protected]>
Sumit2318 pushed a commit that referenced this pull request Jan 6, 2026
### Description
This PR adds an initial set of C APIs necessary to support kernel
registration for plugin EPs.

### Example use
The example plugin EP implementation now registers `MemcpyFromHost` and
`MemcpyToHost` operator kernels using the new APIs. New utilities in the
example implementation make the process of defining operator kernels
very similar to the existing process used by provider-bridge EPs.

First, the operator kernel class is defined:
```c++
// File: onnxruntime/test/autoep/library/kernels/memcpy.h
struct Memcpy : public OrtKernelImpl {
  static OrtStatus* Create(const OrtKernelInfo* info, void* state, /*out*/ std::unique_ptr<Memcpy>& kernel);

  Memcpy(const OrtKernelInfo* info, void* state);

  static OrtStatus* ORT_API_CALL ComputeImpl(OrtKernelImpl* this_ptr, OrtKernelContext* kernel_ctx) noexcept;
  static void ORT_API_CALL ReleaseImpl(OrtKernelImpl* this_ptr) noexcept;

  OrtStatus* DoCompute(OrtKernelContext* kernel_ctx) noexcept;

 private:
  const OrtKernelInfo* info_;
  void* state_;  // Custom state passed from OrtEp
};
```

Then, a macro defines a function that can be called to register the
operator with the EP's kernel registry:
```c++
// File: onnxruntime/test/autoep/library/kernels/memcpy.cc
ONNX_OPERATOR_KERNEL_EX(
    MemcpyFromHost,
    kOnnxDomain,
    1,
    (Ort::KernelDefBuilder()
         .SetInputMemType(0, OrtMemType::OrtMemTypeCPUInput)
         .AddTypeConstraint("T", MLDataTypes::GetTensorType(ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT))),
    Memcpy)

ONNX_OPERATOR_KERNEL_EX(
    MemcpyToHost,
    kOnnxDomain,
    1,
    (Ort::KernelDefBuilder()
         .SetOutputMemType(0, OrtMemType::OrtMemTypeCPUOutput)
         .AddTypeConstraint("T", MLDataTypes::GetTensorType(ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT))),
    Memcpy)
```

Lastly, the functions defined by the above macro are entered into a
table:

```c++
// File: onnxruntime/test/autoep/library/ep_kernel_registration.cc

// Include kernel files:
#include "kernels/memcpy.h"

// Forward declarations of kernel classes used as template args for BuildKernelCreateInfo
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kOnnxDomain, 1, MemcpyFromHost);
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kOnnxDomain, 1, MemcpyToHost);

// Table of BuildKernelCreateInfo functions for each operator
static const BuildKernelCreateInfoFn build_kernel_create_info_funcs[] = {
    BuildKernelCreateInfo<void>,  // Dummy to avoid table becoming empty.
    BuildKernelCreateInfo<ONNX_OPERATOR_KERNEL_CLASS_NAME(kOnnxDomain, 1, MemcpyFromHost)>,
    BuildKernelCreateInfo<ONNX_OPERATOR_KERNEL_CLASS_NAME(kOnnxDomain, 1, MemcpyToHost)>,
};
```
The [example EP processes the entries in the above
table](https://github.com/microsoft/onnxruntime/blob/adrianl/ep-abi-kernel-based-eps/onnxruntime/test/autoep/library/ep_kernel_registration.cc)
to add information about the supported operator kernels to the EP's
kernel registry (`OrtKernelRegistry`).

Additionally, during the call to `OrtEp::GetCapability`, an EP can now
look up registered kernel definitions via the new API
`EpGraphSupportInfo_LookUpKernel`. Note that an EP would not normally
look up kernels for `MemcpyFromHost` or `MemcpyToHost`, because ORT
inserts those nodes itself. Instead, the API is meant for looking up
other registered operator kernels, such as `Conv`.

```c++
static OrtStatus* ORT_API_CALL GetCapabilityImpl(OrtEp* this_ptr, const OrtGraph* graph,
                                                 OrtEpGraphSupportInfo* graph_support_info) noexcept {
  // ... (elided: obtain `this_ep` from `this_ptr` and collect the graph's `nodes`)

  for (const OrtNode* node : nodes) {
    const OrtKernelDef* kernel_def = nullptr;
    OrtStatus* status = this_ep->ep_api->EpGraphSupportInfo_LookUpKernel(graph_support_info, node, &kernel_def);

    if (status != nullptr) {
      return status;
    }

    if (kernel_def != nullptr) {  // Take node if this EP has a registered kernel for it.
      if (OrtStatus* st = this_ep->ep_api->EpGraphSupportInfo_AddSingleNode(graph_support_info, node);
          st != nullptr) {
        return st;
      }
    }
  }

  return nullptr;
}
```
### EP implementation details

An EP instance (i.e., `OrtEp`) that needs to register operator kernels
with ONNX Runtime must implement the following
`OrtEp::GetKernelRegistry()` function:

| Function Signature | Description |
|--------------------|-------------|
|**GetKernelRegistry**<br/><br/>**Returns**:`OrtStatus*`<br/><br/>**Parameters:**<br/><ul><li>`OrtEp* this_ptr`: The OrtEp instance.</li><li>`const OrtKernelRegistry** kernel_registry`: Output parameter set to the EP's kernel registry, which must remain valid throughout the lifetime of the EP.</li></ul>| Gets the execution provider's kernel registry, if any.<br/><br/>**Remarks:** A kernel registry contains kernel creation information for operator kernels supported by an EP.<br/><br/>**Note:** Implementation of this function is optional. If set to NULL, ORT assumes the EP compiles nodes. |

If defined by the EP, the `OrtEp::GetKernelRegistry()` function is
[called by ONNX
Runtime](https://github.com/microsoft/onnxruntime/blob/0f7145f3809103c123de2d281a6b310677e6d56c/onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc#L146-L147)
after creating an instance of the `OrtEp` in order to retrieve the EP's
kernel registry.

#### APIs used by EP to add entries to kernel registry
An EP's kernel registry (`OrtKernelRegistry`) contains **information**
necessary for the (later) creation of operator kernels supported by an
EP. Conceptually, a kernel registry contains an array of "kernel
creation information" elements, one per operator. Each such element
consists of:
- A kernel **definition** (`OrtKernelDef`), which specifies operator
type, supported versions, type constraints, I/O memory types, etc.
- A function of type `OrtKernelCreateFunc` that ORT calls to create an
instance of the kernel (`OrtKernelImpl`).
- Custom opaque state (provided by the `OrtEp`) that is passed to the
`OrtKernelCreateFunc`.
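
The three parts of an entry, and the way the opaque state travels back to the creation function, can be sketched as below. Everything here is a hypothetical stand-in written for illustration (`SketchKernelDef`, `SketchKernel`, `SketchRegistry`, `CreateScaleKernel`); it does not use or reproduce the real `OrtKernelRegistry` types, and the linear lookup by operator type is a simplification.

```c++
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct SketchKernelDef { std::string op_type; };  // Stand-in for a kernel definition.
struct SketchKernel { int scale; };               // Stand-in for a kernel instance.

// Stand-in for the creation function type: receives the state stored at registration.
using SketchCreateFunc = std::unique_ptr<SketchKernel> (*)(void* state);

// One "kernel creation information" element: definition, creation function, opaque state.
struct SketchEntry {
  SketchKernelDef def;
  SketchCreateFunc create;
  void* state;
};

class SketchRegistry {
 public:
  void AddKernel(SketchKernelDef def, SketchCreateFunc create, void* state) {
    entries_.push_back(SketchEntry{std::move(def), create, state});
  }

  // Finds the entry for op_type and invokes its creation function with the stored state.
  // Returns nullptr when no kernel is registered for op_type.
  std::unique_ptr<SketchKernel> CreateKernel(const std::string& op_type) const {
    for (const SketchEntry& entry : entries_) {
      if (entry.def.op_type == op_type) {
        return entry.create(entry.state);
      }
    }
    return nullptr;
  }

 private:
  std::vector<SketchEntry> entries_;
};

// Example creation function: reads an int from the opaque state (which may be null).
std::unique_ptr<SketchKernel> CreateScaleKernel(void* state) {
  const int scale = state != nullptr ? *static_cast<int*>(state) : 1;
  return std::unique_ptr<SketchKernel>(new SketchKernel{scale});
}
```

The registry stores only the recipe for a kernel. No kernel instance exists until the creation function is called later.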

An EP uses the following `OrtEpApi::KernelRegistry_AddKernel()` function
to add an entry for one supported operator.

| Function Signature | Description |
|--------------------|-------------|
|**KernelRegistry_AddKernel**<br/><br/>**Returns**: `OrtStatus*`<br/><br/>**Parameters:**<br/><ul><li>`OrtKernelRegistry* kernel_registry`: The OrtKernelRegistry instance.</li><li>`const OrtKernelDef* kernel_def`: The kernel definition, which includes operator type, version, EP name, type constraints, etc.</li><li>`OrtKernelCreateFunc kernel_create_func`: Function that creates an instance of the operator kernel as an OrtKernelImpl instance.</li><li>`void* kernel_create_func_state`: Custom state passed to the kernel creation function. Can be null.</li></ul>| Adds kernel creation information for a supported operator kernel to the given kernel registry.<br/><br/>**Remarks:** Refer to OrtEp::GetKernelRegistry, which returns an EP's kernel registry to ORT. |

##### Building a kernel definition
An EP uses a kernel definition builder (`OrtKernelDefBuilder`) to create
a kernel definition (`OrtKernelDef`). The following table lists **some**
of the C APIs related to building a kernel definition. The above
`ONNX_OPERATOR_KERNEL_EX` macro [uses these
APIs](https://github.com/microsoft/onnxruntime/blob/adrianl/ep-abi-kernel-based-eps/onnxruntime/test/autoep/library/kernels/utils.h#L42).

| Function Signature | Description |
|--------------------|-------------|
|**KernelDefBuilder_SetOperatorType**<br/><br/>**Returns**: `OrtStatus*`<br/><br/>**Parameters:**<br/><ul><li>`OrtKernelDefBuilder* kernel_def_builder`: The OrtKernelDefBuilder instance.</li><li>`const char* op_type`: A null-terminated string representing the operator type.</li></ul>| Sets the kernel's operator type. |
|**KernelDefBuilder_SetDomain**<br/><br/>**Returns**: `OrtStatus*`<br/><br/>**Parameters:**<br/><ul><li>`OrtKernelDefBuilder* kernel_def_builder`: The OrtKernelDefBuilder instance.</li><li>`const char* domain`: A null-terminated string representing the operator's domain.</li></ul>| Sets the kernel's domain. |
| ... | ... |
|**KernelDefBuilder_Build**<br/><br/>**Returns**: `OrtStatus*`<br/><br/>**Parameters:**<br/><ul><li>`OrtKernelDefBuilder* kernel_def_builder`: The OrtKernelDefBuilder instance.</li><li>`OrtKernelDef** kernel_def_out`: Output parameter set to the new OrtKernelDef instance.</li></ul>| Creates an OrtKernelDef instance from the given kernel definition builder. |
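
The set-then-build flow in the table can be illustrated with a small stand-in builder. This is only a sketch of the pattern: `SketchKernelDef` and `SketchKernelDefBuilder` are hypothetical and are not the C API or the `Ort::KernelDefBuilder` wrapper used by the example EP. Rejecting a definition that has no operator type is a choice made for this sketch, not documented ORT behavior.

```c++
#include <string>
#include <utility>

// Hypothetical stand-in for the immutable result of building.
struct SketchKernelDef {
  std::string op_type;
  std::string domain;
};

// Hypothetical stand-in builder: setters accumulate fields, Build() produces the definition.
class SketchKernelDefBuilder {
 public:
  SketchKernelDefBuilder& SetOperatorType(std::string op_type) {
    op_type_ = std::move(op_type);
    return *this;
  }

  SketchKernelDefBuilder& SetDomain(std::string domain) {
    domain_ = std::move(domain);
    return *this;
  }

  // Writes the definition to the output parameter, mirroring the `kernel_def_out` style.
  // Returns false (sketch choice) when no operator type was set.
  bool Build(SketchKernelDef* kernel_def_out) const {
    if (kernel_def_out == nullptr || op_type_.empty()) {
      return false;
    }
    *kernel_def_out = SketchKernelDef{op_type_, domain_};
    return true;
  }

 private:
  std::string op_type_;
  std::string domain_;
};
```

A call chain such as `SketchKernelDefBuilder().SetOperatorType("MemcpyFromHost").SetDomain("com.example").Build(&def)` then yields one definition per operator. The domain string here is a made-up placeholder.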

##### Defining a kernel implementation
An EP defines a kernel implementation by initializing an instance of
`OrtKernelImpl` (shown below) with function pointers for computation,
release, etc.

```c++
struct OrtKernelImpl {
  uint32_t ort_version_supported;  ///< Must be initialized to ORT_API_VERSION

  /** \brief Computation function called to execute the kernel on an EP.
   *
   * \param[in] this_ptr The OrtKernelImpl instance.
   * \param[in] context The OrtKernelContext instance that provides access to the inputs and outputs.
   *
   * \snippet{doc} snippets.dox OrtStatus Return Value
   *
   * \since Version 1.24.
   */
  ORT_API2_STATUS(Compute, _In_ OrtKernelImpl* this_ptr, _In_ OrtKernelContext* context);

  /** \brief Called by ORT to release the OrtKernelImpl instance and its resources.
   *
   * \param[in] this_ptr The OrtKernelImpl instance.
   *
   * \since Version 1.24.
   */
  ORT_API_T(void, Release, _In_ OrtKernelImpl* this_ptr);
};
```

As shown previously, the example EP creates a `Memcpy` class that
inherits from `OrtKernelImpl` and [implements the above
functions](https://github.com/microsoft/onnxruntime/blob/adrianl/ep-abi-kernel-based-eps/onnxruntime/test/autoep/library/kernels/memcpy.cc).
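
The inheritance pattern described above (a C++ class deriving from a C struct of function pointers, with static trampolines that cast `this_ptr` back to the derived type) can be sketched in isolation as follows. `SketchKernelImpl`, `DoubleKernel`, and `RunThroughCInterface` are hypothetical stand-ins with simplified signatures; the real `OrtKernelImpl` functions take an `OrtKernelContext*` and return `OrtStatus*`.

```c++
#include <cstdint>

// Hypothetical stand-in for a C struct of function pointers such as OrtKernelImpl.
struct SketchKernelImpl {
  uint32_t version;
  int (*Compute)(SketchKernelImpl* this_ptr, int input, int* output);
  void (*Release)(SketchKernelImpl* this_ptr);
};

// A kernel class that derives from the C struct and fills in its function pointers.
struct DoubleKernel : SketchKernelImpl {
  DoubleKernel() {
    version = 1;
    Compute = ComputeImpl;
    Release = ReleaseImpl;
  }

  // Static trampoline: recovers the derived object and forwards to the member function.
  static int ComputeImpl(SketchKernelImpl* this_ptr, int input, int* output) noexcept {
    return static_cast<DoubleKernel*>(this_ptr)->DoCompute(input, output);
  }

  // Static trampoline: destroys the derived object that the caller only knows as the base.
  static void ReleaseImpl(SketchKernelImpl* this_ptr) noexcept {
    delete static_cast<DoubleKernel*>(this_ptr);
  }

  int DoCompute(int input, int* output) noexcept {
    *output = input * 2;
    return 0;  // 0 means success in this sketch.
  }
};

// Plays the role of the runtime: it sees only the base struct and its function pointers.
// Takes ownership of the kernel and releases it after a single Compute call.
int RunThroughCInterface(SketchKernelImpl* kernel, int input) {
  int output = 0;
  kernel->Compute(kernel, input, &output);
  kernel->Release(kernel);
  return output;
}
```

Because the runtime never names the derived type, the kernel must be released through its own `Release` pointer and not with a `delete` on the base pointer.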

##### Defining a kernel creation function
An EP must provide a function of type `OrtKernelCreateFunc` that ORT can
later call to create an instance of a kernel (`OrtKernelImpl`). The
signature of the `OrtKernelCreateFunc` is shown below.

```c++
/** \brief Type definition for a function that creates an OrtKernelImpl instance for an operator kernel.
 *
 * \param[in] ctx Unused/reserved for future use.
 * \param[in] kernel_create_func_state Opaque state initially provided by the EP that registered the kernel.
 *                                     Refer to OrtEpApi::KernelRegistry_AddKernel(). May be null.
 * \param[in] info The OrtKernelInfo instance that provides access to the kernel's input and output characteristics.
 * \param[out] kernel_out Output parameter set to the new OrtKernelImpl instance.
 *
 * \snippet{doc} snippets.dox OrtStatus Return Value
 *
 * \since Version 1.24.
 */
typedef OrtStatus*(ORT_API_CALL* OrtKernelCreateFunc)(_In_ OrtKernelCreateContext* ctx,  // unused/reserved as of 1.24
                                                      _In_ void* kernel_create_func_state,
                                                      _In_ const OrtKernelInfo* info,
                                                      _Outptr_result_maybenull_ OrtKernelImpl** kernel_out);
```

The example EP declares kernel creation functions via use of the
previously mentioned `ONNX_OPERATOR_KERNEL_EX`
[macro](https://github.com/microsoft/onnxruntime/blob/adrianl/ep-abi-kernel-based-eps/onnxruntime/test/autoep/library/kernels/utils.h#L56-L64).
If one were to expand the macro call, the kernel creation function for
`MemcpyFromHost` would look similar to the following snippet:

```c++
OrtStatus* ORT_API_CALL CreateMemcpyKernel(OrtKernelCreateContext* /*ctx*/, void* kernel_create_func_state,
                                           const OrtKernelInfo* info, OrtKernelImpl** kernel_out) {
  *kernel_out = nullptr;

  std::unique_ptr<Memcpy> kernel;
  RETURN_IF_ERROR(Memcpy::Create(info, kernel_create_func_state, kernel));

  *kernel_out = kernel.release();
  return nullptr;
}
```


---------

Co-authored-by: Copilot <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
alex-spacemit pushed a commit to spacemit-com/onnxruntime that referenced this pull request Jan 20, 2026
…icrosoft#26754)

### Description
- Adds C APIs to support pre-packing of const weights for
`OrtKernelImpl` implementations.
- APIs optionally support sharing of pre-packed weight data (for
cpu-accessible memory).
- Updates example kernel (Mul) to use new pre-packing API. Tested by
existing unit test:
https://github.com/microsoft/onnxruntime/blob/549d7415e26e2b3f86c42f86e135bb746caa37b4/onnxruntime/test/autoep/test_execution.cc#L242-L256


### Motivation and Context
The [previous PR](microsoft#26206)
added the base APIs that support kernel-based plugin EPs. This PR adds
an additional feature that was identified as necessary for the port of
WebGPU EP.

---------

Co-authored-by: Edward Chen <[email protected]>