Zero-copy I/O for plugin EPs with HOST_ACCESSIBLE memory #28037
Adds DevicesAreMemoryCompatible() to skip data copies between devices that share memory (CPU <-> HOST_ACCESSIBLE, or HOST_ACCESSIBLE <-> DEFAULT on the same physical device). Applied in feed/fetch copy planning and in BatchOrCopyMLValue. Overrides GetOrtDeviceByMemType() in PluginExecutionProvider so the allocation planner routes CPU-type I/O through the HOST_ACCESSIBLE allocator when the plugin EP has registered one. This enables the planner to place intermediate tensors (CPU EP -> plugin EP boundary) in HOST_ACCESSIBLE memory, avoiding copies at the partition boundary. Updates the in-place optimization check in the allocation planner to use UsesCpuMemory() so it recognises HOST_ACCESSIBLE outputs as CPU-memory-compatible.
There are no unit tests for DevicesAreMemoryCompatible. Given the function has five distinct logical branches (both CPU, CPU + HOST_ACCESSIBLE same device, CPU + HOST_ACCESSIBLE different device, HOST_ACCESSIBLE ↔ DEFAULT same device, incompatible), it should have dedicated unit tests covering each case.
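The branches listed in this comment can be sketched as a small standalone function. This is only an illustration under stated assumptions: `SketchDevice`, `SamePhysicalDevice`, and `DevicesAreMemoryCompatibleSketch` are hypothetical stand-ins, not the real `OrtDevice` or the function added in this PR. In particular, the sketch treats CPU ↔ HOST_ACCESSIBLE as compatible regardless of device identity, which may not match how the real function handles the "different device" case.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for illustration; not the real OrtDevice.
enum class SketchDeviceType { CPU, GPU, NPU };
enum class SketchMemType { DEFAULT, HOST_ACCESSIBLE };

struct SketchDevice {
  SketchDeviceType type;
  SketchMemType mem_type;
  std::uint32_t vendor_id;
  std::int16_t device_id;
};

// Assumed definition of "same physical device": type, vendor, and id all match.
inline bool SamePhysicalDevice(const SketchDevice& a, const SketchDevice& b) {
  return a.type == b.type && a.vendor_id == b.vendor_id && a.device_id == b.device_id;
}

// Returns true when a copy between the two devices could be skipped
// under the rules described in the PR text (as interpreted by this sketch).
inline bool DevicesAreMemoryCompatibleSketch(const SketchDevice& a, const SketchDevice& b) {
  const bool a_is_cpu = a.type == SketchDeviceType::CPU;
  const bool b_is_cpu = b.type == SketchDeviceType::CPU;

  // Both CPU: trivially compatible.
  if (a_is_cpu && b_is_cpu) return true;

  // CPU <-> non-CPU: compatible only if the non-CPU side is HOST_ACCESSIBLE.
  if (a_is_cpu || b_is_cpu) {
    const SketchDevice& other = a_is_cpu ? b : a;
    return other.mem_type == SketchMemType::HOST_ACCESSIBLE;
  }

  // Both non-CPU: HOST_ACCESSIBLE <-> DEFAULT on the same physical device.
  const bool host_and_default =
      (a.mem_type == SketchMemType::HOST_ACCESSIBLE && b.mem_type == SketchMemType::DEFAULT) ||
      (a.mem_type == SketchMemType::DEFAULT && b.mem_type == SketchMemType::HOST_ACCESSIBLE);
  return host_and_default && SamePhysicalDevice(a, b);
}
```

A table of device pairs, one per branch, would be a natural shape for the requested unit tests.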
Pull request overview
Adds HOST_ACCESSIBLE-aware device selection and copy-planning logic to reduce (or eliminate) unnecessary feed/fetch and boundary copies for plugin execution providers that register host-accessible memory.
Changes:
- Override `PluginExecutionProvider::GetOrtDeviceByMemType` to route CPU I/O mem types through a registered HOST_ACCESSIBLE allocator/device.
- Introduce `DevicesAreMemoryCompatible()` and apply it to feed/fetch copy planning and `BatchOrCopyMLValue` to skip transfers when devices are deemed memory-compatible.
- Update allocation planner CPU-memory checks to use `OrtDevice::UsesCpuMemory()` (so HOST_ACCESSIBLE is treated as CPU-memory-compatible).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.h | Declares PluginExecutionProvider override for mem-type → device mapping. |
| onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc | Implements HOST_ACCESSIBLE routing for CPUInput/CPUOutput mem types. |
| onnxruntime/core/framework/utils.cc | Adds memory-compatibility logic and uses it to skip copies + reuse fetch buffers. |
| onnxruntime/core/framework/allocation_planner.cc | Treats HOST_ACCESSIBLE as CPU-memory-compatible in in-place planning checks. |
cf5a86b to 204dcd7
Thanks for the feedback! I've run out of time today, unfortunately, and I'm going to be out until Wednesday next week. Hopefully you'll hear from me again by then. 😄
Co-authored-by: Copilot <copilot@github.com>
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
@yuslepukhin, just a minor fix on the ort/enforce. Thanks for rerunning the checks; it looks like they're passing now.
nieubank left a comment
Thanks for putting this together. I just confirmed that this fixes a long-standing issue I've been hitting recently with a certain EP and IoBinding.
### Description
Adds a memory_info= parameter to OrtValue.ortvalue_from_shape_and_type(), backed by two new C-level factory methods that look up the registered shared allocator via the full OrtMemoryInfo (including mem_type). This is required because the current shared allocator query doesn't include the memory type, which makes HOST_ACCESSIBLE allocators invisible to Python. UsesCpuMemory() is used in GetPyObjFromTensor so that tensors in HOST_ACCESSIBLE memory are returned as zero-copy numpy views.
### Motivation and Context
Enable zero-copy interop between numpy and OrtValue. This is a follow-up to #28037.
Description
Adds DevicesAreMemoryCompatible() to skip data copies between devices that share memory (CPU <-> HOST_ACCESSIBLE, or HOST_ACCESSIBLE <-> DEFAULT on the same physical device). Applied in feed/fetch copy planning and in BatchOrCopyMLValue.
Overrides GetOrtDeviceByMemType() in PluginExecutionProvider so the allocation planner routes CPU-type I/O through the HOST_ACCESSIBLE allocator when the plugin EP has registered one. This enables the planner to place intermediate tensors (CPU EP -> plugin EP boundary) in HOST_ACCESSIBLE memory, avoiding copies at the partition boundary.
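The routing described here can be pictured with a minimal sketch. Everything in it is hypothetical: `SketchIoMemType`, `SketchTarget`, and `SketchPluginProvider` are illustrative names, not the ONNX Runtime types or the actual override. It assumes that without a registered HOST_ACCESSIBLE allocator, CPU-type I/O falls back to plain CPU memory.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical mem-type categories for illustration only.
enum class SketchIoMemType { CPUInput, CPUOutput, Default };

// Hypothetical description of where a tensor would be allocated.
struct SketchTarget {
  std::string name;
};

class SketchPluginProvider {
 public:
  SketchPluginProvider(SketchTarget default_device,
                       std::optional<SketchTarget> host_accessible_device)
      : default_device_(std::move(default_device)),
        host_accessible_device_(std::move(host_accessible_device)) {}

  // Maps a mem type to the device the planner should allocate on.
  SketchTarget GetDeviceByMemType(SketchIoMemType mem_type) const {
    if (mem_type == SketchIoMemType::Default) {
      return default_device_;
    }
    // CPU-type I/O: prefer the HOST_ACCESSIBLE device when one was registered.
    if (host_accessible_device_.has_value()) {
      return *host_accessible_device_;
    }
    // Assumed fallback: ordinary CPU memory.
    return SketchTarget{"cpu"};
  }

 private:
  SketchTarget default_device_;
  std::optional<SketchTarget> host_accessible_device_;
};
```

With this shape, a tensor crossing from the CPU EP into the plugin EP can be placed directly in memory that both sides can read, which is where the copy at the partition boundary would be avoided.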
Updates the in-place optimization check in the allocation planner to use UsesCpuMemory() so it recognizes HOST_ACCESSIBLE outputs as CPU-memory-compatible.
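To illustrate why the choice of check matters, here is a small sketch contrasting a strict device-type comparison with a UsesCpuMemory()-style predicate. The types and both helper functions are hypothetical, and the "strict" form is only an assumption about what a type-only check would look like; it is not taken from the allocation planner source.

```cpp
// Hypothetical stand-ins for illustration; not the real OrtDevice.
enum class PlannerDeviceType { CPU, NPU };
enum class PlannerMemType { DEFAULT, HOST_ACCESSIBLE };

struct PlannerDevice {
  PlannerDeviceType type;
  PlannerMemType mem_type;

  // Assumed to mirror the intent of UsesCpuMemory(): the host can address
  // the buffer directly, either because it is CPU memory or HOST_ACCESSIBLE.
  bool UsesCpuMemory() const {
    return type == PlannerDeviceType::CPU || mem_type == PlannerMemType::HOST_ACCESSIBLE;
  }
};

// Type-only check (assumed earlier form): rejects HOST_ACCESSIBLE outputs.
inline bool IsCpuOutputStrict(const PlannerDevice& d) {
  return d.type == PlannerDeviceType::CPU;
}

// Relaxed check: accepts anything the host can address directly.
inline bool IsCpuOutputRelaxed(const PlannerDevice& d) {
  return d.UsesCpuMemory();
}
```

Under the relaxed check, an output placed in HOST_ACCESSIBLE memory on a non-CPU device still qualifies as CPU-memory-compatible, while a DEFAULT allocation on that device does not.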
Motivation and Context
Remove unnecessary copies for non-CPU HOST_ACCESSIBLE device allocations.