@skottmckay skottmckay commented Jul 2, 2025

Description

Plugin EP data transfer and Stream support.

Adds the ability for a plugin EP to provide an IDataTransfer implementation and an OrtSyncStream implementation, so that data can be copied asynchronously outside of an inference session.

Example usage added for CUDA EP.

Caveat: Passing the OrtSyncStream used for the data copy to Session.Run is not supported yet and will be added in a follow-up PR. Until then, for the CUDA EP, the native cudaStream_t from that OrtSyncStream can be passed to the Run via the CUDA EP provider options.
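
The CUDA workaround in the caveat amounts to handing the session the same native stream handle that the copy used. Below is a minimal sketch of that hand-off. It assumes the handle travels as its address rendered in decimal inside a string-based option map, and it assumes an option key named `user_compute_stream`; neither detail is stated in this PR text. `EncodeStreamHandle`, `DecodeStreamHandle`, and `MakeCudaProviderOptions` are hypothetical helpers, not ONNX Runtime functions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper: render a native stream handle (for example a
// cudaStream_t, which is a pointer type) as a decimal string.
inline std::string EncodeStreamHandle(const void* native_handle) {
  return std::to_string(reinterpret_cast<std::uintptr_t>(native_handle));
}

// Hypothetical helper: recover the handle on the receiving side.
inline void* DecodeStreamHandle(const std::string& value) {
  return reinterpret_cast<void*>(static_cast<std::uintptr_t>(std::stoull(value)));
}

// Builds the option map that would be handed to the EP.
// The key name "user_compute_stream" is an assumption for illustration.
inline std::map<std::string, std::string> MakeCudaProviderOptions(const void* native_handle) {
  return {{"user_compute_stream", EncodeStreamHandle(native_handle)}};
}
```

The point of the round trip is that the copy and the Run end up ordered on the same stream, so no extra synchronization is needed between them.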

Motivation and Context

TODO: Flesh out for usage on ORT API boundary for model inputs/outputs including usage involving interop.
@skottmckay skottmckay requested review from adrianlizarraga and edgchen1 and removed request for adrianlizarraga July 2, 2025 08:37
@skottmckay
Contributor Author

@gedoensmax
@gaugarg-nv

@bil-ash

bil-ash commented Jul 7, 2025

Will this PR allow streaming the models from disk, thereby reducing device memory usage?
Or is it between CPU and GPU only?
If it is between CPU/GPU and disk, please add support for WebGPU or WASM also.

Can be used with CopyTensors, with the cudaStream_t from the OrtSyncStream passed to the session via provider options (existing setup).
Updated CopyTensors to take a single stream. Users can make multiple calls if they need to use multiple streams.
The data copy test with CUDA works, including passing the cudaStream_t to the session via provider options. A follow-up PR will make it possible to provide an OrtSyncStream to the Run instead.
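
The single-stream design can be pictured with a small self-contained model. This is a sketch only: `Tensor`, `SyncStream`, and this `CopyTensors` are hypothetical stand-ins written for illustration, not the ONNX Runtime types or the real function signature, and the copy here runs synchronously while a counter stands in for work queued on a stream.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins; these are NOT the ONNX Runtime types.
struct Tensor {
  std::vector<float> data;
};

struct SyncStream {
  std::size_t copies_enqueued = 0;  // a real stream would queue async work
};

// Copies src[i] into dst[i] for every pair, using one stream (or none)
// for the whole batch.
void CopyTensors(const std::vector<const Tensor*>& src,
                 const std::vector<Tensor*>& dst,
                 SyncStream* stream) {
  if (src.size() != dst.size()) {
    throw std::invalid_argument("src/dst count mismatch");
  }
  for (std::size_t i = 0; i < src.size(); ++i) {
    dst[i]->data = src[i]->data;  // performed synchronously in this sketch
    if (stream != nullptr) ++stream->copies_enqueued;
  }
}
```

A caller that wants two streams splits the tensors into two batches and calls the function once per stream, which keeps the signature simple at the cost of one extra call.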
@skottmckay skottmckay changed the title Initial pieces for plugin EP Stream support. Plugin EP DataCopy and Stream support. Jul 9, 2025
@skottmckay skottmckay marked this pull request as ready for review July 9, 2025 01:32
@github-actions github-actions bot left a comment

You can commit the suggested changes from lintrunner.

@skottmckay skottmckay changed the title Plugin EP DataCopy and Stream support. Plugin EP data transfer and Stream support. Jul 9, 2025
Check return status in data copy unit test.
```
error: The difference between expected[i] and results[i] is 8.8930130004882812e-05, which exceeds 1e-5, where
expected[i] evaluates to -1.2537282705307007,
results[i] evaluates to -1.2536393404006958, and
1e-5 evaluates to 1.0000000000000001e-05.
```
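
The log above reports an absolute difference of about 8.9e-05 against an absolute tolerance of 1e-5. The source does not say how the test was adjusted. As a sketch of one common option, a comparison can accept a value when it is within either an absolute or a relative tolerance; `NearlyEqual` and its parameters are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// True when |expected - actual| <= max(abs_tol, rel_tol * |expected|).
inline bool NearlyEqual(double expected, double actual,
                        double abs_tol, double rel_tol) {
  const double diff = std::fabs(expected - actual);
  return diff <= std::max(abs_tol, rel_tol * std::fabs(expected));
}
```

With the values from the log, an absolute tolerance of 1e-5 alone rejects the result, while adding a relative tolerance of 1e-4 accepts it, since 1e-4 times 1.2537 is larger than the observed difference.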
…reateSyncStreamForDevice. An EP implementation may be dependent on the per-session settings or need to create a per-session allocator (WebGPU does) so we need to either provide the OrtEp this way, or add create allocator/stream functions to OrtEp.

Add new function implementations to QNN OrtEpFactory.
Copilot AI left a comment

Pull Request Overview

Adds support for asynchronous data transfer and stream synchronization for plugin execution providers (EPs), including new C API extensions, environment/allocator updates, and example implementations.

  • Introduce OrtSyncStream, OrtSyncNotification, and CopyTensors APIs in the C API and internal OrtApis.
  • Extend Environment to register/unregister plugin EP data transfer, allocators (including arena wrap), and manage sync streams.
  • Provide example stream/notification implementations for CUDA and the plugin EP test library, plus new tests in test_data_copy.cc.

Reviewed Changes

Copilot reviewed 42 out of 42 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| onnxruntime/test/shared_lib/test_data_copy.cc | Add tests for plugin EP data copy and optional stream use |
| onnxruntime/core/session/ort_apis.h | Extend C API with SyncStream, SyncNotification, and CopyTensors |
| onnxruntime/core/framework/plugin_ep_stream.* | Implement plugin EP stream & notification wrapper classes |
| onnxruntime/core/session/environment.cc | Register plugin EP data transfers and manage shared allocators |
| onnxruntime/core/session/ep_factory_internal.* | Refactor EP factory to support stream creation & data transfer |

Comments suppressed due to low confidence (1)

onnxruntime/test/autoep/library/ep_data_transfer.cc:66

  • [nitpick] There's a typo in the comment ('teh'). Consider correcting the spelling or removing this outdated commented line.
    // the implementation for a 'real' EP would be something along these lines.

Add functions to create an allocator and a sync stream to the OrtEp struct so an EP can create per-session instances if it chooses to. This is needed for CUDA (the sync stream needs to read the session option for graph capture) and WebGPU (the session allocator uses a different memory type from the shared allocator).
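
The per-session override described in this commit can be sketched as an optional hook on the EP. Everything in the block is hypothetical and written for illustration: `SyncStream`, `EpFactory`, `Ep`, and `CreateStreamForSession` are not the ONNX Runtime types, and the fallback to the factory when the EP supplies no hook is an assumption about the intended behavior rather than something this text confirms.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-in for a stream; records who created it.
struct SyncStream {
  std::string origin;
};

// Hypothetical factory that creates streams shared across sessions.
struct EpFactory {
  std::unique_ptr<SyncStream> CreateSyncStream() const {
    return std::make_unique<SyncStream>(SyncStream{"factory"});
  }
};

// Hypothetical per-session EP. The hook is left empty when the EP has
// no per-session requirements.
struct Ep {
  std::function<std::unique_ptr<SyncStream>()> create_sync_stream;
};

// Prefer the per-session hook; otherwise fall back to the factory (assumed).
std::unique_ptr<SyncStream> CreateStreamForSession(const EpFactory& factory,
                                                   const Ep& ep) {
  if (ep.create_sync_stream) return ep.create_sync_stream();
  return factory.CreateSyncStream();
}
```

An EP whose stream depends on session settings fills in the hook and captures whatever session state it needs in the callable, while other EPs leave it empty.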
@skottmckay skottmckay merged commit 9d11ae2 into main Jul 19, 2025
149 of 155 checks passed
@skottmckay skottmckay deleted the skottmckay/OrtSyncStream branch July 19, 2025 02:27
@snnn
Copy link
Contributor

snnn commented Jul 25, 2025

Hi there! We haven't cut the release branch for this version yet, so I'm removing the release:1.23.0 label for now to keep things tidy. Thanks so much for your contribution! We'll make sure this gets included when the release is prepared. 🤖

qti-yuduo pushed a commit to CodeLinaro/onnxruntime that referenced this pull request Aug 8, 2025
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request Aug 11, 2025