
Zero-copy I/O for plugin EPs with HOST_ACCESSIBLE memory #28037

Merged
yuslepukhin merged 20 commits into microsoft:main from ericcraw:host-accessible-allocator on Jun 2, 2026

@ericcraw (Contributor)

Description

Adds DevicesAreMemoryCompatible() to skip data copies between devices that share memory (CPU <-> HOST_ACCESSIBLE, or HOST_ACCESSIBLE <-> DEFAULT on the same physical device). The check is applied in feed/fetch copy planning and in BatchOrCopyMLValue.
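A minimal C++ sketch of the compatibility rule as this description states it. Device, DeviceType, MemType, SamePhysicalDevice and NeedsCopy are hypothetical stand-ins, not the real OrtDevice API. The sketch also assumes that CPU <-> HOST_ACCESSIBLE is compatible regardless of which accelerator owns the host-accessible memory; the merged implementation may distinguish more cases.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for OrtDevice (not the real class).
enum class DeviceType { CPU, GPU, NPU };
enum class MemType { DEFAULT, HOST_ACCESSIBLE };

struct Device {
  DeviceType type;
  MemType mem;
  uint32_t vendor_id;  // illustrative identity fields
  int device_id;
};

// Same physical device: same type, vendor and device index.
bool SamePhysicalDevice(const Device& a, const Device& b) {
  return a.type == b.type && a.vendor_id == b.vendor_id &&
         a.device_id == b.device_id;
}

bool DevicesAreMemoryCompatible(const Device& a, const Device& b) {
  const bool a_cpu = a.type == DeviceType::CPU;
  const bool b_cpu = b.type == DeviceType::CPU;
  if (a_cpu && b_cpu) return true;  // both plain CPU memory
  // CPU <-> HOST_ACCESSIBLE: the host can address the other side directly
  // (assumption: independent of which accelerator owns that memory).
  if ((a_cpu && b.mem == MemType::HOST_ACCESSIBLE) ||
      (b_cpu && a.mem == MemType::HOST_ACCESSIBLE)) {
    return true;
  }
  // HOST_ACCESSIBLE <-> DEFAULT (or HOST_ACCESSIBLE on both sides) on the
  // same physical accelerator.
  if (!a_cpu && !b_cpu && SamePhysicalDevice(a, b) &&
      (a.mem == MemType::HOST_ACCESSIBLE || b.mem == MemType::HOST_ACCESSIBLE)) {
    return true;
  }
  return false;
}

// Hypothetical planning helper: a copy is scheduled only when the two sides
// are neither identical placements nor memory-compatible.
bool NeedsCopy(const Device& src, const Device& dst) {
  const bool identical = SamePhysicalDevice(src, dst) && src.mem == dst.mem;
  return !identical && !DevicesAreMemoryCompatible(src, dst);
}
```

With this rule, a feed that lives in CPU memory and is consumed by a plugin EP from HOST_ACCESSIBLE memory needs no copy, while a CPU feed bound for the accelerator's DEFAULT memory still does.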

Overrides GetOrtDeviceByMemType() in PluginExecutionProvider so the allocation planner routes CPU-type I/O through the HOST_ACCESSIBLE allocator when the plugin EP has registered one. This enables the planner to place intermediate tensors (CPU EP -> plugin EP boundary) in HOST_ACCESSIBLE memory, avoiding copies at the partition boundary.
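The routing described above can be sketched as follows. PluginEpSketch, IoMemType, RegisterHostAccessible and GetDeviceByMemType are hypothetical names that only mirror the idea; they are not the PluginExecutionProvider API or the real OrtMemType enum. The assumption is that without a registered HOST_ACCESSIBLE allocator, CPU-type I/O falls back to plain CPU memory.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical, simplified device model (not the real OrtDevice).
enum class DeviceType { CPU, GPU, NPU };
enum class MemType { DEFAULT, HOST_ACCESSIBLE };
struct Device {
  DeviceType type;
  MemType mem;
  uint32_t vendor_id;
  int device_id;
};

// Hypothetical mirror of the CPU-input / CPU-output / default mem types.
enum class IoMemType { CPUInput, CPUOutput, Default };

class PluginEpSketch {
 public:
  explicit PluginEpSketch(Device default_device) : default_device_(default_device) {}

  // Called when the plugin EP registers a HOST_ACCESSIBLE allocator.
  void RegisterHostAccessible(Device d) { host_accessible_ = d; }

  Device GetDeviceByMemType(IoMemType mem_type) const {
    if (mem_type == IoMemType::Default) return default_device_;
    // CPU-type I/O: prefer host-accessible memory so the planner can place
    // boundary tensors there and avoid a CPU <-> device copy.
    if (host_accessible_) return *host_accessible_;
    return Device{DeviceType::CPU, MemType::DEFAULT, 0, 0};  // plain CPU fallback
  }

 private:
  Device default_device_;
  std::optional<Device> host_accessible_;
};
```

The key design point is that the override only changes where CPU-type I/O is placed; kernels that ask for the EP's default memory still get device memory.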

Updates the in-place optimization check in the allocation planner to use UsesCpuMemory() so it recognizes HOST_ACCESSIBLE outputs as CPU-memory-compatible.
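A simplified illustration of why switching the check to UsesCpuMemory() matters. It assumes the previous check compared the device type against CPU; the real condition in allocation_planner.cc involves more than this, and Device, OutputIsCpuMemoryOld and OutputIsCpuMemoryNew are hypothetical.

```cpp
#include <cstdint>

// Hypothetical, simplified device model (not the real OrtDevice).
enum class DeviceType { CPU, GPU, NPU };
enum class MemType { DEFAULT, HOST_ACCESSIBLE };
struct Device {
  DeviceType type;
  MemType mem;
  uint32_t vendor_id;
  int device_id;
};

// Stand-in for OrtDevice::UsesCpuMemory(): plain CPU memory, or device
// memory that the host can address directly.
bool UsesCpuMemory(const Device& d) {
  return d.type == DeviceType::CPU || d.mem == MemType::HOST_ACCESSIBLE;
}

// Assumed old behaviour: only outputs on a CPU-typed device qualify.
bool OutputIsCpuMemoryOld(const Device& out) { return out.type == DeviceType::CPU; }

// New behaviour: HOST_ACCESSIBLE outputs on an accelerator also qualify.
bool OutputIsCpuMemoryNew(const Device& out) { return UsesCpuMemory(out); }
```

A HOST_ACCESSIBLE output on an NPU flips from "not CPU memory" to "CPU memory", which is what lets the in-place optimization consider it.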

Motivation and Context

Removes unnecessary copies for HOST_ACCESSIBLE allocations on non-CPU devices.

Comment thread onnxruntime/core/framework/utils.cc Outdated
Comment thread onnxruntime/core/framework/utils.cc Outdated
Comment thread onnxruntime/core/framework/utils.cc
Comment thread onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc Outdated
@yuslepukhin (Member)

There are no unit tests for DevicesAreMemoryCompatible. Given the function has five distinct logical branches (both CPU, CPU + HOST_ACCESSIBLE same device, CPU + HOST_ACCESSIBLE different device, HOST_ACCESSIBLE ↔ DEFAULT same device, incompatible), it should have dedicated unit tests covering each case.
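The requested coverage could take the shape of a table-driven test. This sketch re-declares a hypothetical DevicesAreMemoryCompatible on a simplified Device model (not the real OrtDevice) and checks each row in both argument orders. The expected values are assumptions: the PR text pins down the both-CPU, CPU <-> HOST_ACCESSIBLE and same-device HOST_ACCESSIBLE <-> DEFAULT results, and the two CPU + HOST_ACCESSIBLE rows (devices 0 and 1) are both assumed compatible.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical, simplified device model (not the real OrtDevice).
enum class DeviceType { CPU, GPU, NPU };
enum class MemType { DEFAULT, HOST_ACCESSIBLE };
struct Device { DeviceType type; MemType mem; uint32_t vendor_id; int device_id; };

bool SamePhysicalDevice(const Device& a, const Device& b) {
  return a.type == b.type && a.vendor_id == b.vendor_id && a.device_id == b.device_id;
}

// Hypothetical rule under test (see the PR description for the intent).
bool DevicesAreMemoryCompatible(const Device& a, const Device& b) {
  const bool a_cpu = a.type == DeviceType::CPU;
  const bool b_cpu = b.type == DeviceType::CPU;
  if (a_cpu && b_cpu) return true;
  if ((a_cpu && b.mem == MemType::HOST_ACCESSIBLE) ||
      (b_cpu && a.mem == MemType::HOST_ACCESSIBLE)) return true;
  return !a_cpu && !b_cpu && SamePhysicalDevice(a, b) &&
         (a.mem == MemType::HOST_ACCESSIBLE || b.mem == MemType::HOST_ACCESSIBLE);
}

struct Case { const char* name; Device a; Device b; bool expected; };

// Runs every case in both argument orders; returns the number of failures.
int RunDevicesAreMemoryCompatibleCases() {
  const Device cpu{DeviceType::CPU, MemType::DEFAULT, 0, 0};
  const Device npu0_ha{DeviceType::NPU, MemType::HOST_ACCESSIBLE, 1, 0};
  const Device npu1_ha{DeviceType::NPU, MemType::HOST_ACCESSIBLE, 1, 1};
  const Device npu0_def{DeviceType::NPU, MemType::DEFAULT, 1, 0};
  const Device npu1_def{DeviceType::NPU, MemType::DEFAULT, 1, 1};
  const std::vector<Case> cases = {
      {"both CPU", cpu, cpu, true},
      {"CPU + HOST_ACCESSIBLE (device 0)", cpu, npu0_ha, true},
      {"CPU + HOST_ACCESSIBLE (device 1)", cpu, npu1_ha, true},
      {"HOST_ACCESSIBLE <-> DEFAULT, same device", npu0_ha, npu0_def, true},
      {"HOST_ACCESSIBLE <-> DEFAULT, different device", npu0_ha, npu1_def, false},
      {"CPU <-> device DEFAULT (incompatible)", cpu, npu0_def, false},
  };
  int failures = 0;
  for (const Case& c : cases) {
    // The relation should be symmetric, so check both orders.
    if (DevicesAreMemoryCompatible(c.a, c.b) != c.expected ||
        DevicesAreMemoryCompatible(c.b, c.a) != c.expected) {
      std::printf("FAILED: %s\n", c.name);
      ++failures;
    }
  }
  return failures;
}
```

In the real test suite these rows would more likely become individual test cases in the project's own test framework; the table form just makes the branch coverage easy to audit.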

@yuslepukhin (Member) left a comment

Copilot AI (Contributor) left a comment

Pull request overview

Adds HOST_ACCESSIBLE-aware device selection and copy-planning logic to reduce (or eliminate) unnecessary feed/fetch and boundary copies for plugin execution providers that register host-accessible memory.

Changes:

  • Override PluginExecutionProvider::GetOrtDeviceByMemType to route CPU I/O mem types through a registered HOST_ACCESSIBLE allocator/device.
  • Introduce DevicesAreMemoryCompatible() and apply it to feed/fetch copy planning and BatchOrCopyMLValue to skip transfers when devices are deemed memory-compatible.
  • Update allocation planner CPU-memory checks to use OrtDevice::UsesCpuMemory() (so HOST_ACCESSIBLE is treated as CPU-memory-compatible).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

  • onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.h: declares the PluginExecutionProvider override for mem-type → device mapping.
  • onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc: implements HOST_ACCESSIBLE routing for CPUInput/CPUOutput mem types.
  • onnxruntime/core/framework/utils.cc: adds memory-compatibility logic and uses it to skip copies and reuse fetch buffers.
  • onnxruntime/core/framework/allocation_planner.cc: treats HOST_ACCESSIBLE as CPU-memory-compatible in in-place planning checks.


Comment thread onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc Outdated
Comment thread onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc Outdated
Comment thread onnxruntime/core/framework/utils.cc Outdated
Comment thread onnxruntime/core/framework/allocation_planner.cc Outdated
Comment thread onnxruntime/core/framework/allocation_planner.cc
ericcraw force-pushed the host-accessible-allocator branch from cf5a86b to 204dcd7 on April 17, 2026 00:53

@ericcraw (Contributor, Author)

Thanks for the feedback! I've run out of time today, unfortunately, and I'm going to be out until Wednesday next week. Hopefully you'll hear from me again by then. 😄

ericcraw marked this pull request as ready for review on April 30, 2026 22:56
ericcraw requested a review from Copilot on April 30, 2026 22:57

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.



Comment thread onnxruntime/core/framework/utils.cc Outdated
Comment thread onnxruntime/core/framework/utils.cc Outdated
Comment thread onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc
Comment thread onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc Outdated
Comment thread onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc Outdated
Comment thread include/onnxruntime/core/session/onnxruntime_ep_c_api.h Outdated
Comment thread include/onnxruntime/core/session/onnxruntime_ep_c_api.h
Comment thread onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.h Outdated
yuslepukhin previously approved these changes on Jun 1, 2026

@yuslepukhin (Member) left a comment

LGTM

@ericcraw (Contributor, Author) commented on Jun 2, 2026

@yuslepukhin, just a minor fix to the ort/enforce. Thanks for rerunning the checks; it looks like they're passing now.

@nieubank (Contributor) left a comment

Thanks for putting this together. I just confirmed that this fixes a long-standing issue I've been hitting recently with a certain EP and IoBinding.

yuslepukhin merged commit a827f9c into microsoft:main on Jun 2, 2026 (86 checks passed)
yuslepukhin pushed a commit that referenced this pull request on Jun 4, 2026
### Description
Adds memory_info= parameter to OrtValue.ortvalue_from_shape_and_type(),
backed by two new C-level factory methods that look up the registered
shared allocator via the full OrtMemoryInfo (including mem_type).
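A sketch of why the lookup key must include the memory type. SharedAllocatorRegistry, MemInfoKey, AllocatorSketch, FindByDevice and Find are hypothetical and only model the idea; they are not the C-level factory methods added by the follow-up. A query keyed only on device identity can reach the DEFAULT allocator, while a query keyed on the full memory info can also reach a HOST_ACCESSIBLE one.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <utility>

enum class MemType { DEFAULT, HOST_ACCESSIBLE };

// Hypothetical key mirroring the parts of a memory info that matter here.
struct MemInfoKey {
  std::string device_name;  // illustrative name chosen by the EP
  int device_id;
  MemType mem_type;
  bool operator<(const MemInfoKey& o) const {
    return std::tie(device_name, device_id, mem_type) <
           std::tie(o.device_name, o.device_id, o.mem_type);
  }
};

struct AllocatorSketch { std::string label; };

class SharedAllocatorRegistry {
 public:
  void Register(const MemInfoKey& key, AllocatorSketch a) { allocators_[key] = std::move(a); }

  // Query without a mem type: only the DEFAULT entry is reachable.
  const AllocatorSketch* FindByDevice(const std::string& name, int id) const {
    return Find({name, id, MemType::DEFAULT});
  }

  // Query with the full memory info, so HOST_ACCESSIBLE entries are visible.
  const AllocatorSketch* Find(const MemInfoKey& key) const {
    auto it = allocators_.find(key);
    return it == allocators_.end() ? nullptr : &it->second;
  }

 private:
  std::map<MemInfoKey, AllocatorSketch> allocators_;
};
```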

This is required because the current shared allocator query doesn't
include the memory type, which makes HOST_ACCESSIBLE allocators invisible to Python.
UsesCpuMemory() is used in GetPyObjFromTensor so that tensors in
HOST_ACCESSIBLE memory are returned as zero-copy numpy views.
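The zero-copy decision described above can be sketched in C++ as follows. This is not the pybind11/NumPy code in GetPyObjFromTensor; TensorSketch and HostReadableData are hypothetical, and the tensor's std::vector buffer merely stands in for wherever the allocator placed the data. When the memory is CPU-addressable the original buffer is returned as a view; otherwise the data is copied, standing in for a device-to-host transfer.

```cpp
#include <vector>

// Hypothetical, simplified device model (not the real OrtDevice).
enum class DeviceType { CPU, GPU, NPU };
enum class MemType { DEFAULT, HOST_ACCESSIBLE };
struct Device { DeviceType type; MemType mem; };

// Stand-in for OrtDevice::UsesCpuMemory().
bool UsesCpuMemory(const Device& d) {
  return d.type == DeviceType::CPU || d.mem == MemType::HOST_ACCESSIBLE;
}

// Hypothetical tensor: `buffer` models the allocation, wherever it lives.
struct TensorSketch {
  Device device;
  std::vector<float> buffer;
};

// Returns a host-readable pointer to the tensor data: the original buffer
// (zero-copy view) when the memory is CPU-addressable, otherwise a copy in
// `scratch` that models a device-to-host transfer.
const float* HostReadableData(const TensorSketch& t, std::vector<float>& scratch) {
  if (UsesCpuMemory(t.device)) return t.buffer.data();
  scratch.assign(t.buffer.begin(), t.buffer.end());
  return scratch.data();
}
```

The view case is what makes the NumPy array alias the OrtValue's memory, so writes through either side are visible to the other; the copy case decouples them.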

### Motivation and Context
Enable zero-copy interop between NumPy and OrtValue.

This is a follow-up to #28037.

6 participants