Add weight layout transformation cache for Conv operator #26595
jchen10 wants to merge 1 commit into microsoft:main
Conversation
Implement lazy weight layout transformation for the WebGPU Conv kernel to avoid redundant GPU transposes on every inference.

Key changes:
- Add WeightLayoutTransformCache to cache transformed weights by name and format
- Implement TransformWeightLayout() helper using the existing TransposeKernel for the OIHW->HWIO transformation
- Store the cache in WebGpuExecutionProvider, shared across all kernels
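A minimal host-side sketch of the caching idea described above. All names here (`WeightLayoutTransformCache`, `GetOrTransform`) are illustrative, not the actual PR code, which runs the transpose on the GPU via TransposeKernel; this sketch just shows the cache keyed by weight name and target format, with the OIHW->HWIO index remapping done on the CPU for clarity:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Transpose a weight tensor from OIHW to HWIO layout (CPU reference version;
// the actual PR dispatches this to the GPU through TransposeKernel).
std::vector<float> TransposeOIHWToHWIO(const std::vector<float>& src,
                                       size_t O, size_t I, size_t H, size_t W) {
  std::vector<float> dst(src.size());
  for (size_t o = 0; o < O; ++o)
    for (size_t i = 0; i < I; ++i)
      for (size_t h = 0; h < H; ++h)
        for (size_t w = 0; w < W; ++w)
          dst[((h * W + w) * I + i) * O + o] =
              src[((o * I + i) * H + h) * W + w];
  return dst;
}

// Cache keyed by "<weight name>|<target format>", so each weight is
// transposed once and reused on every subsequent inference.
class WeightLayoutTransformCache {
 public:
  const std::vector<float>& GetOrTransform(const std::string& name,
                                           const std::string& format,
                                           const std::vector<float>& src,
                                           size_t O, size_t I, size_t H,
                                           size_t W) {
    const std::string key = name + "|" + format;
    auto it = cache_.find(key);
    if (it == cache_.end())
      it = cache_.emplace(key, TransposeOIHWToHWIO(src, O, I, H, W)).first;
    return it->second;  // cache hit: no transpose performed
  }

 private:
  std::unordered_map<std::string, std::vector<float>> cache_;
};
```

One trade-off this sketch shares with the PR: unlike the PrePack approach discussed below, the cache keeps the transformed copy alongside the original tensor rather than releasing it.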
Follow-up for #26554
@fs-eire PTAL
I am still looking into the PrePack approach, which seems more appealing since it releases the original tensors.
Please take a look at #26602. However, I haven't finished all validation yet.
Great, that's exactly what I wanted. One downside of PrePack is that we can't know the runtime input/output shapes, which may affect how we choose the optimal blocked format for the weight. Let's see whether this issue comes up in the future. So far so good.