@qjia7
Contributor

@qjia7 qjia7 commented Oct 30, 2025

This pull request adds a C API for WebGPU data transfer, enabling tensor copying between CPU and GPU devices via the WebGPU execution provider. The main changes introduce a wrapper implementation for data transfer, integrate it with the plugin execution provider factory, and expose a creation function for use by the ONNX Runtime core.

@qjia7 qjia7 force-pushed the data_transfer_mgr branch from 04434ea to 47b3a2d November 11, 2025 07:39
@qjia7 qjia7 marked this pull request as ready for review November 12, 2025 10:12
Contributor

@fs-eire fs-eire left a comment

Environment::data_transfer_mgr_ is different from InferenceSession::data_transfer_mgr_. They are the same type, but the one inside Environment should not depend on any session instance. The changes in this PR make Environment depend on a specific session, and I believe this is not what we want.

There is an existing method, CreateAndRegisterInternalEps, in class Environment, which should already have called RegisterExecutionProviderLibrary on WebGPU. I can take a look at why the data transfer is not correctly registered.

@fs-eire
Contributor

fs-eire commented Nov 13, 2025

Please override CreateDataTransfer:

virtual OrtStatus* CreateDataTransfer(_Outptr_result_maybenull_ OrtDataTransferImpl** data_transfer) noexcept {
  *data_transfer = nullptr;
  return nullptr;  // Default implementation does nothing
}

in class WebGpuEpFactory (see https://github.com/Microsoft/onnxruntime/blob/main/onnxruntime/core/session/plugin_ep/ep_factory_webgpu.h) for the purpose of this change.
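
As a rough illustration of what such an override could look like, here is a minimal sketch. Everything in it is a stand-in: `EpFactoryBaseSketch`, `WebGpuEpFactorySketch`, and the placeholder `OrtStatus` / `OrtDataTransferImpl` structs are hypothetical names invented for this example, and the `_Outptr_result_maybenull_` annotation is omitted so the snippet compiles outside MSVC. It is not the code from this PR.

```cpp
#include <cassert>

// Hypothetical stand-ins for the ONNX Runtime C API types. The real
// definitions live in the ONNX Runtime headers and look different.
struct OrtStatus {};
struct OrtDataTransferImpl {
  int placeholder = 0;  // not the real layout
};

// Mirrors the default quoted above: the factory provides no data transfer.
class EpFactoryBaseSketch {
 public:
  virtual ~EpFactoryBaseSketch() = default;

  virtual OrtStatus* CreateDataTransfer(OrtDataTransferImpl** data_transfer) noexcept {
    *data_transfer = nullptr;
    return nullptr;  // a null status is treated as success in this sketch
  }
};

// Hypothetical override: returns a data transfer object instead of nullptr.
class WebGpuEpFactorySketch : public EpFactoryBaseSketch {
 public:
  OrtStatus* CreateDataTransfer(OrtDataTransferImpl** data_transfer) noexcept override {
    assert(data_transfer != nullptr);
    // Assumption: the sketch hands out a factory-owned object so that it
    // does not have to model how the caller would release it.
    *data_transfer = &impl_;
    return nullptr;
  }

 private:
  OrtDataTransferImpl impl_;
};
```

Calling `CreateDataTransfer` through a base-class reference then yields a non-null object for the WebGPU factory, while the base default still reports "no data transfer" by setting the output to `nullptr`.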

@qjia7
Contributor Author

qjia7 commented Nov 13, 2025

> Please override CreateDataTransfer:
>
> virtual OrtStatus* CreateDataTransfer(_Outptr_result_maybenull_ OrtDataTransferImpl** data_transfer) noexcept {
>   *data_transfer = nullptr;
>   return nullptr;  // Default implementation does nothing
> }
>
> in class WebGpuEpFactory (see https://github.com/Microsoft/onnxruntime/blob/main/onnxruntime/core/session/plugin_ep/ep_factory_webgpu.h) for the purpose of this change.

The WebGPU DataTransfer requires a BufferManager.
For graph capture, the BufferManager is tied to the execution provider instance rather than being session-independent. That is the problem.

@qjia7
Contributor Author

qjia7 commented Nov 17, 2025

> Please override CreateDataTransfer:
>
> virtual OrtStatus* CreateDataTransfer(_Outptr_result_maybenull_ OrtDataTransferImpl** data_transfer) noexcept {
>   *data_transfer = nullptr;
>   return nullptr;  // Default implementation does nothing
> }
>
> in class WebGpuEpFactory (see https://github.com/Microsoft/onnxruntime/blob/main/onnxruntime/core/session/plugin_ep/ep_factory_webgpu.h) for the purpose of this change.

Done. It now uses context 0's buffer manager, creating one if it does not exist.
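
The idea of borrowing the buffer manager from context 0 and creating it on demand can be sketched roughly as follows. This is only a model under stated assumptions: `ContextRegistrySketch`, `ContextSketch`, `BufferManagerSketch`, `DataTransferSketch`, and `CreateDataTransferForDefaultContext` are hypothetical names made up for illustration and do not correspond to the actual classes touched by this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-ins for the WebGPU context and its buffer manager.
struct BufferManagerSketch {};
struct ContextSketch {
  BufferManagerSketch buffer_manager;
};

// Hypothetical registry that creates a context the first time it is requested.
class ContextRegistrySketch {
 public:
  ContextSketch& GetOrCreate(int context_id) {
    assert(context_id >= 0);
    std::lock_guard<std::mutex> lock(mutex_);
    auto& slot = contexts_[context_id];
    if (!slot) {
      slot = std::make_unique<ContextSketch>();
    }
    return *slot;
  }

  std::size_t Size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return contexts_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::map<int, std::unique_ptr<ContextSketch>> contexts_;
};

// The data transfer only borrows the buffer manager. It holds no reference
// to any session or execution provider instance.
class DataTransferSketch {
 public:
  explicit DataTransferSketch(BufferManagerSketch& bm) : buffer_manager_(&bm) {}
  const BufferManagerSketch* buffer_manager() const { return buffer_manager_; }

 private:
  BufferManagerSketch* buffer_manager_;
};

// Always binds to context 0, creating that context if it is missing.
inline DataTransferSketch CreateDataTransferForDefaultContext(ContextRegistrySketch& registry) {
  constexpr int kDefaultContextId = 0;
  return DataTransferSketch(registry.GetOrCreate(kDefaultContextId).buffer_manager);
}
```

In this model, repeated calls share one buffer manager and never create a second context, which is what keeps the environment-level data transfer independent of any particular session.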

@qjia7 qjia7 requested a review from fs-eire November 17, 2025 05:51
@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Nov 21, 2025
qjia7 added a commit to microsoft/onnxruntime-genai that referenced this pull request Nov 25, 2025
This PR enables graph capture for WebGPU. It implements the
CopyDeviceToCpu, CopyCpuToDevice, CopyFrom, and Zero functions using the new
`CopyTensors` API.

On the ONNX Runtime side, PR
[#26450](microsoft/onnxruntime#26450) must be applied for this to
work with WebGPU.

The following will be implemented in follow-up PRs to get the full
performance gain from graph capture (the original PR is
#1720):
1. Support UpdateAttentionMask, UpdatePositionIds, and Cast to keep the
whole pipeline on the GPU.
2. Optimize CopyFrom with offsets.

---------

Co-authored-by: Copilot <[email protected]>
@qjia7
Contributor Author

qjia7 commented Nov 25, 2025

@fs-eire @guschmue The WebGPU-related failures have been fixed. The remaining failures are unrelated to my changes. Please take a look, thanks.

Contributor

Copilot AI left a comment

Pull request overview

This pull request adds WebGPU data transfer functionality to the ONNX Runtime core, enabling tensor copying between CPU and GPU devices via the WebGPU execution provider. The implementation provides a C API wrapper with lazy initialization that determines the WebGPU context from the tensors during the first copy operation.

Key Changes:

  • Adds CreateDataTransfer method to WebGpuEpFactory for registering data transfer with the environment
  • Implements WebGpuDataTransferImpl wrapper that bridges C API and C++ internal data transfer implementation
  • Introduces lazy initialization of WebGPU context based on tensor device information
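
To make the wrapper and lazy-initialization points above more concrete, here is a small sketch of the general pattern: a C-style struct of function pointers, a C++ wrapper that derives from it, and a context id that is chosen from the tensors on the first copy. All names (`DataTransferC`, `TensorSketch`, `WebGpuDataTransferSketch`) and the struct layout are assumptions made for this example, not the real ONNX Runtime types.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical tensor: only the fields this sketch needs.
struct TensorSketch {
  bool on_gpu;
  int device_id;  // meaningful only when on_gpu is true
  int value;      // stands in for the tensor payload
};

// Hypothetical C-style interface: a struct holding a function pointer.
struct DataTransferC {
  int (*CopyTensor)(DataTransferC* this_ptr, const TensorSketch* src, TensorSketch* dst);
};

// C++ wrapper. It derives from the C struct so a pointer to the wrapper can
// travel through the C interface and be cast back inside the callback.
struct WebGpuDataTransferSketch : DataTransferC {
  WebGpuDataTransferSketch() : DataTransferC{&CopyImpl} {}

  static int CopyImpl(DataTransferC* this_ptr, const TensorSketch* src, TensorSketch* dst) {
    assert(this_ptr != nullptr && src != nullptr && dst != nullptr);
    auto* self = static_cast<WebGpuDataTransferSketch*>(this_ptr);
    {
      // Lazy initialization: the first copy picks the context from whichever
      // tensor lives on the GPU. Later copies reuse that choice.
      std::lock_guard<std::mutex> lock(self->mutex);
      if (!self->initialized) {
        self->context_id = src->on_gpu ? src->device_id : dst->device_id;
        self->initialized = true;
        ++self->init_count;
      }
    }
    dst->value = src->value;  // stands in for the real CPU/GPU copy
    return 0;                 // 0 means success in this sketch
  }

  std::mutex mutex;
  bool initialized = false;
  int context_id = -1;
  int init_count = 0;
};
```

A caller that only sees `DataTransferC*` can invoke `CopyTensor` without knowing about the C++ type behind it, and the context is resolved once no matter how many copies follow.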

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| onnxruntime/core/session/plugin_ep/ep_factory_webgpu.h | Declares CreateDataTransfer override method in WebGpuEpFactory |
| onnxruntime/core/session/plugin_ep/ep_factory_webgpu.cc | Implements CreateDataTransfer by calling WebGPU provider's C API function |
| onnxruntime/core/providers/webgpu/webgpu_provider_factory_creator.h | Declares C API function OrtWebGpuCreateDataTransfer() for creating data transfer instances |
| onnxruntime/core/providers/webgpu/webgpu_provider_factory.cc | Implements data transfer wrapper with lazy context initialization, helper functions, and vendor ID filtering |
| onnxruntime/core/providers/webgpu/webgpu_context.h | Adds HasContext method to WebGpuContextFactory for checking context existence |
| onnxruntime/core/providers/webgpu/webgpu_context.cc | Implements HasContext method with thread-safe context lookup |
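
A thread-safe existence check of the kind described for HasContext might look roughly like the sketch below. `ContextFactorySketch` and `WebGpuContextSketch` are hypothetical names for this example. The point it illustrates is that the lookup takes the same lock as creation and never inserts an entry as a side effect.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for a WebGPU context.
struct WebGpuContextSketch {
  int id;
};

class ContextFactorySketch {
 public:
  // Creates the context on first request and returns the existing one afterwards.
  WebGpuContextSketch& CreateContext(int id) {
    assert(id >= 0);
    std::lock_guard<std::mutex> lock(mutex_);
    auto& slot = contexts_[id];
    if (!slot) {
      slot = std::make_unique<WebGpuContextSketch>(WebGpuContextSketch{id});
    }
    return *slot;
  }

  // Read-only check. Uses find() rather than operator[] so that asking
  // about a missing id does not create an empty entry.
  bool HasContext(int id) const {
    std::lock_guard<std::mutex> lock(mutex_);
    return contexts_.find(id) != contexts_.end();
  }

  std::size_t Size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return contexts_.size();
  }

 private:
  mutable std::mutex mutex_;  // guards contexts_ for both readers and writers
  std::unordered_map<int, std::unique_ptr<WebGpuContextSketch>> contexts_;
};
```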

guschmue
guschmue previously approved these changes Dec 1, 2025
kunal-vaishnavi pushed a commit to microsoft/onnxruntime-genai that referenced this pull request Dec 5, 2025
@qjia7 qjia7 requested a review from fs-eire December 9, 2025 12:21
@qjia7 qjia7 merged commit 0aebe82 into main Dec 15, 2025
91 checks passed
@qjia7 qjia7 deleted the data_transfer_mgr branch December 15, 2025 00:45
Sumit2318 pushed a commit that referenced this pull request Jan 6, 2026