[webgpu] Optimize generic 4D Transpose using OIHW2OHWI Program #26942
base: main
Pull request overview
This PR optimizes generic 4D transpose operations by migrating the specialized OIHW2OHWIProgram shader from the Im2ColMatMul operator to the Transpose operator. The migration enables reuse of this optimized shader for any 4D tensor transpose with the {0, 2, 3, 1} permutation pattern, while also fixing a calculation bug in the process.
- Moves OIHW2OHWIProgram class and implementation from im2col_matmul to transpose
- Relocates the WGSL shader template from nn/ to tensor/ directory
- Fixes a bug where `H_W_tiles` was calculated using `kernel_height * kernel_height` instead of `kernel_height * kernel_width`
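The following minimal Python sketch shows why the `H_W_tiles` fix matters for non-square kernels. It assumes the tile count is a ceiling division of the flattened H*W extent by a tile size. `TILE_SIZE = 16`, `hw_tiles`, and `hw_tiles_buggy` are hypothetical names and values, not the shader's actual constants.

```python
# Hypothetical tile size; the actual shader's tile/workgroup size may differ.
TILE_SIZE = 16


def hw_tiles(kernel_height: int, kernel_width: int, tile_size: int = TILE_SIZE) -> int:
    """Tiles needed to cover the flattened H*W extent (ceiling division)."""
    return (kernel_height * kernel_width + tile_size - 1) // tile_size


def hw_tiles_buggy(kernel_height: int, kernel_width: int, tile_size: int = TILE_SIZE) -> int:
    """Pre-fix variant: kernel_height is used in place of kernel_width."""
    return (kernel_height * kernel_height + tile_size - 1) // tile_size


# A 3x7 kernel has 21 spatial positions: the fixed form needs 2 tiles,
# while the buggy form dispatches only 1 tile (16 positions).
print(hw_tiles(3, 7), hw_tiles_buggy(3, 7))  # 2 1
```

For square kernels the two formulas agree. When H < W, the buggy form under-dispatches and leaves some positions untransposed. When H > W, it over-dispatches (for example, 4 tiles instead of 2 for a 7x3 kernel).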
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| onnxruntime/core/providers/webgpu/tensor/transpose.h | Adds OIHW2OHWIProgram class declaration with uniform variable definitions |
| onnxruntime/core/providers/webgpu/tensor/transpose.cc | Implements OIHW2OHWIProgram shader generation and integrates the optimization into DoTranspose with proper threshold checks; fixes bug in H_W_tiles calculation |
| onnxruntime/core/providers/webgpu/tensor/oihw_to_ohwi.wgsl.template | Adds the WGSL shader template for the OIHW to OHWI transpose operation with proper bounds checking and workgroup synchronization |
| onnxruntime/core/providers/webgpu/nn/im2col_matmul.h | Removes OIHW2OHWIProgram class declaration as it's moved to transpose |
| onnxruntime/core/providers/webgpu/nn/im2col_matmul.cc | Replaces local OIHW2OHWI implementation with call to TransposeKernel; adds conv.h include for the function |
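The routing described for `DoTranspose` can be sketched as a shape-and-permutation predicate. This is a hypothetical Python illustration: `use_oihw2ohwi_program` and `transposed_shape` are illustrative names. The size thresholds that the real check applies are not described in this PR overview, so they are omitted.

```python
OIHW_TO_OHWI_PERM = (0, 2, 3, 1)


def transposed_shape(shape, perm):
    """ONNX Transpose semantics: output dim i is input dim perm[i]."""
    return tuple(shape[p] for p in perm)


def use_oihw2ohwi_program(shape, perm) -> bool:
    """Hypothetical predicate: rank-4 input with permutation {0, 2, 3, 1}.

    The threshold checks applied in DoTranspose are intentionally omitted.
    """
    return len(shape) == 4 and tuple(perm) == OIHW_TO_OHWI_PERM


# An OIHW conv weight of shape (O=64, I=3, H=7, W=7) becomes OHWI (64, 7, 7, 3).
print(transposed_shape((64, 3, 7, 7), OIHW_TO_OHWI_PERM))  # (64, 7, 7, 3)
```

The predicate keys only on rank and permutation. Subject to the omitted thresholds, any 4D tensor with this permutation qualifies, not just convolution weights. This is the generalization the PR describes.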
Fix the CI error for span.
Thanks for analyzing the CI failure. The
React Native CI Pipeline / React Native CI Android (pull_request): This failure appears to be environment-specific and is unrelated to the changes in this PR.

Description
This PR migrates the `OIHW2OHWIProgram` from `Im2ColMatMul` to the `Transpose` operator. By centralizing this logic, we reuse the specialized shader to optimize generic 4D transpositions (specifically the {0, 2, 3, 1} permutation pattern) while reducing code duplication.

While this shader is capable of supporting 2D/3D transpositions, those optimizations are reserved for follow-up PRs.
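For reference, the following naive loop shows the element mapping that the {0, 2, 3, 1} permutation performs on a contiguous row-major buffer. This Python sketch is a correctness reference only, not the tiled WGSL shader. `oihw_to_ohwi` is a hypothetical helper name.

```python
def oihw_to_ohwi(data, O, I, H, W):
    """Reference OIHW -> OHWI transpose of a flat row-major buffer."""
    assert len(data) == O * I * H * W
    out = [0] * len(data)
    for o in range(O):
        for i in range(I):
            for h in range(H):
                for w in range(W):
                    # Row-major offset of element (o, i, h, w) in the OIHW input.
                    src = ((o * I + i) * H + h) * W + w
                    # Row-major offset of the same element (o, h, w, i) in the OHWI output.
                    dst = ((o * H + h) * W + w) * I + i
                    out[dst] = data[src]
    return out


# O=1, I=2, H=1, W=3, with input[0, i, 0, w] = 3 * i + w.
print(oihw_to_ohwi(list(range(6)), 1, 2, 1, 3))  # [0, 3, 1, 4, 2, 5]
```

The innermost input axis (W) is contiguous on read, but I becomes the contiguous axis on write. A tiled shader with workgroup memory can coalesce both sides by staging blocks in that memory.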
Motivation and Context
See above.