Contributor

@fs-eire fs-eire commented Sep 3, 2025

Description

Reduce the time blocked waiting for the shader to be compiled.

Motivation and Context

Try to optimize the responsiveness of the application when running ort-web on the main thread. See #25882


grazder commented Sep 4, 2025

> Try to optimize the responsiveness of the application when running ort-web on the main thread

I actually launch ORT-WEB in a worker, so these GPU blocks appear regardless of whether it is launched in a worker or on the main thread.

@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Sep 4, 2025
Contributor Author

fs-eire commented Sep 4, 2025

> > Try to optimize the responsiveness of the application when running ort-web on the main thread
>
> I actually launch ORT-WEB in a worker, so these GPU blocks appear regardless of whether it is launched in a worker or on the main thread

Do you mean that the UI responsiveness problem mentioned in #25882 is caused by the GPU being exhausted rather than by the UI thread running JavaScript?


grazder commented Sep 5, 2025

> Do you mean that the UI responsiveness problem mentioned in #25882 is caused by the GPU being exhausted rather than by the UI thread running JavaScript?

Yes, the main problem is that when the model is initialized, it triggers large GPU operations (not CPU operations on the main thread) that lock up the GPU and prevent the user interface, which is also rendered using the GPU, from being rendered.

The image shows that during large GPU-based operations, frames were not rendered.
image

Contributor

qjia7 commented Sep 5, 2025

I think the async compilation resolves the CPU-side issue, where the GPU process is occupied for a long time by shader compilation. The UI thread's render commands have to wait on the GPU process until each CreateComputePipeline call finishes. With this change, the CreateComputePipeline work moves to a separate thread in the GPU process and no longer blocks the GPU main thread, so UI commands can be sent to the GPU in time.
GPU busy is a separate issue: a single ORT task is too big, so the UI task has to wait on the GPU. Currently we batch 16 dispatches per submit to minimize submit overhead. Submitting too frequently introduces GPU bubbles and is unfriendly to small operations or models. It's challenging to determine an optimal batch size that suits all models. Maybe we could consider exposing the batch size as a session option, allowing users to customize this value to better fit their needs.
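
The batching trade-off in the comment above can be illustrated with a toy model. This is only a sketch, not ORT's implementation: `DispatchBatcher`, `CountSubmits`, and the `max_batch_size` parameter are hypothetical names, and the model only counts queue submits instead of talking to a GPU.

```cpp
#include <cstddef>

// Hypothetical toy model (not ORT code): groups dispatches into batches and
// counts how many queue submits a given batch size would produce.
class DispatchBatcher {
 public:
  explicit DispatchBatcher(std::size_t max_batch_size)
      : max_batch_size_(max_batch_size == 0 ? 1 : max_batch_size) {}

  // Record one dispatch; submit as soon as the batch is full.
  void Dispatch() {
    if (++pending_ >= max_batch_size_) {
      Flush();
    }
  }

  // Submit whatever is pending, for example at the end of a run.
  void Flush() {
    if (pending_ == 0) return;
    ++submit_count_;
    pending_ = 0;
  }

  std::size_t submit_count() const { return submit_count_; }

 private:
  std::size_t max_batch_size_;
  std::size_t pending_ = 0;
  std::size_t submit_count_ = 0;
};

// Number of submits needed for `dispatches` dispatches at a given batch size.
inline std::size_t CountSubmits(std::size_t dispatches, std::size_t batch_size) {
  DispatchBatcher batcher(batch_size);
  for (std::size_t i = 0; i < dispatches; ++i) {
    batcher.Dispatch();
  }
  batcher.Flush();
  return batcher.submit_count();
}
```

In this model, 40 dispatches need 3 submits at a batch size of 16 and 10 submits at a batch size of 4. A smaller batch gives other GPU work, such as UI rendering, more chances to be scheduled between submits, at the cost of more submit overhead, which is why a user-tunable value could help.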


grazder commented Sep 5, 2025

> Maybe we could consider exposing the batch size as a session option, allowing users to customize this value to better fit their needs.

Yeah, that would be great

@fs-eire fs-eire requested a review from Copilot October 22, 2025 00:38
Contributor

Copilot AI left a comment

Pull Request Overview

This PR refactors the WebGPU shader compilation to use asynchronous pipeline creation, improving application responsiveness when running in the main thread. The change replaces synchronous CreateComputePipeline with CreateComputePipelineAsync to avoid blocking while waiting for shader compilation to complete.

Key Changes

  • ProgramManager constructor now accepts a WebGpuContext reference instead of separate device and limits parameters
  • Shader compilation changed from synchronous to asynchronous using CreateComputePipelineAsync with callback-based completion handling
  • Error handling added for async pipeline creation failures
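
The callback-based pattern summarized in the list above can be sketched in a self-contained way. This is an illustration under stated assumptions, not the PR's code: it does not call Dawn's `CreateComputePipelineAsync`, and `Pipeline`, `CompileResult`, `CompileShader`, and `CreatePipelineAsync` are hypothetical stand-ins that use `std::async` to simulate compilation on another thread.

```cpp
#include <functional>
#include <future>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; these are not Dawn or ORT types.
struct Pipeline {
  std::string name;
};

struct CompileResult {
  bool ok;
  Pipeline pipeline;
  std::string error;
};

// Stand-in for an expensive shader compilation. It fails on empty source so
// that the error path can be exercised.
inline CompileResult CompileShader(const std::string& name,
                                   const std::string& source) {
  if (source.empty()) {
    return {false, {}, "empty shader source for " + name};
  }
  return {true, Pipeline{name}, ""};
}

using CompileCallback = std::function<void(const CompileResult&)>;

// Starts compilation on another thread and returns immediately, so the caller
// is not blocked. The callback runs on that thread once compilation finishes
// and receives either the pipeline or an error message.
inline std::future<void> CreatePipelineAsync(std::string name,
                                             std::string source,
                                             CompileCallback on_done) {
  return std::async(std::launch::async,
                    [name = std::move(name), source = std::move(source),
                     on_done = std::move(on_done)]() {
                      on_done(CompileShader(name, source));
                    });
}
```

A caller would pass a callback that stores the pipeline on success and records the error on failure, then wait on the returned future only at the point where the pipeline is actually needed.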

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| onnxruntime/core/providers/webgpu/webgpu_context.cc | Updated ProgramManager instantiation to pass WebGpuContext reference |
| onnxruntime/core/providers/webgpu/program_manager.h | Modified constructor to accept WebGpuContext reference and updated member variables |
| onnxruntime/core/providers/webgpu/program_manager.cc | Implemented async shader compilation with CreateComputePipelineAsync and callback handling |

@fs-eire fs-eire requested a review from guschmue October 22, 2025 00:43
guschmue previously approved these changes Oct 22, 2025
Contributor Author

fs-eire commented Oct 24, 2025

@microsoft-github-policy-service rerun

@fs-eire fs-eire merged commit 954bb7b into main Oct 27, 2025
93 of 94 checks passed
@fs-eire fs-eire deleted the fs-eire/allow-compile-shader-async branch October 27, 2025 19:10
naomiOvad pushed a commit to naomiOvad/onnxruntime that referenced this pull request Nov 2, 2025


5 participants