-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[WebGPU] Register DataTransfer to Env #26450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
04434ea to
47b3a2d
Compare
fs-eire
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Environment::data_transfer_mgr_ is different from InferenceSession::data_transfer_mgr_. They are the same type, but the one inside Environment should not depend on any session instance. The changes in this PR brings dependency on a specific session for Environment and I believe this is not what we want.
There is an existing method CreateAndRegisterInternalEps in class Environment, which should have already called RegisterExecutionProviderLibrary on WebGPU. Why the data transfer is not correctly registered - I can take a look at it.
|
Please add override
in class |
WebGPU DataTransfer requires a BufferManager. |
Done. Use the context 0's buffer manager. Will create one if not exist. |
This PR enables the graph capture for webgpu. It implements CopyDeviceToCpu\CopyCpuToDevice\CopyFrom\Zero functions using the new `CopyTensors` API. The ort part needs to apply this PR [#26450](microsoft/onnxruntime#26450) to make it work for webgpu. Below things will be implemented in following-up PRs to get the full performance gain for graph capture (The original one is #1720). 1. Support UpdateAttentionMask, UpdatePositionIds, and Cast to keep the whole pipeline on gpu. 2. Optimize CopyFrom with offsets --------- Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adds WebGPU data transfer functionality to the ONNX Runtime core, enabling tensor copying between CPU and GPU devices via the WebGPU execution provider. The implementation provides a C API wrapper with lazy initialization that determines the WebGPU context from the tensors during the first copy operation.
Key Changes:
- Adds
CreateDataTransfermethod toWebGpuEpFactoryfor registering data transfer with the environment - Implements
WebGpuDataTransferImplwrapper that bridges C API and C++ internal data transfer implementation - Introduces lazy initialization of WebGPU context based on tensor device information
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| onnxruntime/core/session/plugin_ep/ep_factory_webgpu.h | Declares CreateDataTransfer override method in WebGpuEpFactory |
| onnxruntime/core/session/plugin_ep/ep_factory_webgpu.cc | Implements CreateDataTransfer by calling WebGPU provider's C API function |
| onnxruntime/core/providers/webgpu/webgpu_provider_factory_creator.h | Declares C API function OrtWebGpuCreateDataTransfer() for creating data transfer instances |
| onnxruntime/core/providers/webgpu/webgpu_provider_factory.cc | Implements data transfer wrapper with lazy context initialization, helper functions, and vendor ID filtering |
| onnxruntime/core/providers/webgpu/webgpu_context.h | Adds HasContext method to WebGpuContextFactory for checking context existence |
| onnxruntime/core/providers/webgpu/webgpu_context.cc | Implements HasContext method with thread-safe context lookup |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This PR enables the graph capture for webgpu. It implements CopyDeviceToCpu\CopyCpuToDevice\CopyFrom\Zero functions using the new `CopyTensors` API. The ort part needs to apply this PR [#26450](microsoft/onnxruntime#26450) to make it work for webgpu. Below things will be implemented in following-up PRs to get the full performance gain for graph capture (The original one is #1720). 1. Support UpdateAttentionMask, UpdatePositionIds, and Cast to keep the whole pipeline on gpu. 2. Optimize CopyFrom with offsets --------- Co-authored-by: Copilot <[email protected]>
This pull request adds a C API for WebGPU data transfer, enabling tensor copying between CPU and GPU devices via the WebGPU execution provider. The main changes introduce a wrapper implementation for data transfer, integrate it with the plugin execution provider factory, and expose a creation function for use by the ONNX Runtime core.
This pull request adds a C API for WebGPU data transfer, enabling tensor copying between CPU and GPU devices via the WebGPU execution provider. The main changes introduce a wrapper implementation for data transfer, integrate it with the plugin execution provider factory, and expose a creation function for use by the ONNX Runtime core.