Add weight layout transformation cache for Conv operator #26595
jchen10 wants to merge 1 commit into microsoft:main
Conversation
Implement lazy weight layout transformation for the WebGPU Conv kernel to avoid redundant GPU transposes on every inference.

Key changes:
- Add WeightLayoutTransformCache to cache transformed weights by name and format
- Implement TransformWeightLayout() helper using the existing TransposeKernel for the OIHW->HWIO transformation
- Store the cache in WebGpuExecutionProvider, shared across all kernels
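A minimal host-side sketch of the caching idea described above. All names here (`WeightLayoutTransformCache`, `GetOrTransform`) are illustrative, not the actual PR code, which runs the transpose on the GPU via TransposeKernel; this sketch just shows the cache keyed by weight name and target format, with the OIHW->HWIO index remapping done on the CPU for clarity:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Transpose a weight tensor from OIHW to HWIO layout (CPU reference version;
// the actual PR dispatches this to the GPU through TransposeKernel).
std::vector<float> TransposeOIHWToHWIO(const std::vector<float>& src,
                                       size_t O, size_t I, size_t H, size_t W) {
  std::vector<float> dst(src.size());
  for (size_t o = 0; o < O; ++o)
    for (size_t i = 0; i < I; ++i)
      for (size_t h = 0; h < H; ++h)
        for (size_t w = 0; w < W; ++w)
          dst[((h * W + w) * I + i) * O + o] =
              src[((o * I + i) * H + h) * W + w];
  return dst;
}

// Cache keyed by "<weight name>|<target format>", so each weight is
// transposed once and reused on every subsequent inference.
class WeightLayoutTransformCache {
 public:
  const std::vector<float>& GetOrTransform(const std::string& name,
                                           const std::string& format,
                                           const std::vector<float>& src,
                                           size_t O, size_t I, size_t H,
                                           size_t W) {
    const std::string key = name + "|" + format;
    auto it = cache_.find(key);
    if (it == cache_.end())
      it = cache_.emplace(key, TransposeOIHWToHWIO(src, O, I, H, W)).first;
    return it->second;  // cache hit: no transpose performed
  }

 private:
  std::unordered_map<std::string, std::vector<float>> cache_;
};
```

One trade-off this sketch shares with the PR: unlike the PrePack approach discussed below, the cache keeps the transformed copy alongside the original tensor rather than releasing it.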
Follow-up for #26554
@fs-eire PTAL
I am still looking into the PrePack approach, which seems more appealing since it releases the original tensors.
Please take a look at #26602. However, I haven't finished all validation yet.
Great, that's exactly what I wanted. One downside of PrePack is that we can't know the runtime input/output shapes, which may affect how we choose the optimal blocked format for the weight. Let's see whether this issue comes up in the future. So far so good.