webgpu support for qwen3.5 #27996

Merged
guschmue merged 5 commits into main from gs/wgpu-qwen35-support on Apr 9, 2026

@guschmue guschmue commented Apr 7, 2026

Adds WebGPU support for Qwen3.5 by adding the LinearAttention and CausalConvWithState ops, based on the proposal in onnx/onnx#7767.

The model can be created with the model builder from https://github.com/microsoft/onnxruntime-genai/blob/main/src/python/py/models/builder.py.

For example, for the text-only flavor:

python builder.py -m Qwen/Qwen3.5-0.8B  -o Qwen3.5-0.8B -e webgpu -p int4 --extra_options int4_accuracy_level=4 exclude_embeds=False
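
If the export needs to be scripted, the same invocation can be assembled as an argument list. This is only a sketch: `builder_args` is a hypothetical helper, not part of onnxruntime-genai, and it uses only the flags that appear in the command above.

```python
import shlex
import sys


def builder_args(model_id, output_dir, extra_options):
    """Return the argv list for the builder.py call quoted above (hypothetical helper)."""
    args = [
        sys.executable, "builder.py",
        "-m", model_id,      # model id, as in the command above
        "-o", output_dir,    # output folder, as in the command above
        "-e", "webgpu",      # execution provider used in this PR
        "-p", "int4",        # precision used in the example
        "--extra_options",
    ]
    # Each extra option is passed as its own key=value token.
    args += [f"{key}={value}" for key, value in extra_options.items()]
    return args


args = builder_args(
    "Qwen/Qwen3.5-0.8B",
    "Qwen3.5-0.8B",
    {"int4_accuracy_level": 4, "exclude_embeds": False},
)
# Print the command without the interpreter path for easy comparison.
print(shlex.join(args[1:]))
# To run it from the directory that contains builder.py:
#   import subprocess; subprocess.run(args, check=True)
```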

Copilot AI left a comment

Pull request overview

Adds WebGPU execution-provider support needed to run Qwen3.5 models by introducing two new contrib ops and wiring them into kernel registration.

Changes:

  • Fixes boolean Expand vec4/bool-pack indexing in the WebGPU Expand shader generation.
  • Adds WebGPU contrib kernels for LinearAttention and CausalConvWithState (C++ + WGSL templates) and registers them.
  • Implements fused recurrent linear attention and causal depthwise conv-with-state shaders for Qwen3.5-style decode/prefill.
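
For readers unfamiliar with the recurrence behind the gated-delta path, here is a minimal reference sketch in plain Python. It assumes one common formulation of the gated delta rule (decay the state, then apply a rank-1 correction toward the new value). The function names, the `d_v x d_k` state layout, and the absence of any scaling or normalization are assumptions for illustration; the WebGPU kernel in this PR may differ on all of these.

```python
def gated_delta_step(state, q, k, v, alpha, beta):
    """One decode step for a single head.

    state: d_v x d_k matrix as a list of rows (assumed layout)
    q, k:  vectors of length d_k; v: vector of length d_v
    alpha: scalar decay gate; beta: scalar write strength
    """
    # 1. Decay the recurrent state.
    decayed = [[alpha * s for s in row] for row in state]
    # 2. Read what the decayed memory currently predicts for key k.
    pred = [sum(s * kj for s, kj in zip(row, k)) for row in decayed]
    # 3. Delta rule: rank-1 update that moves the prediction toward v.
    new_state = [
        [s + beta * (vi - pi) * kj for s, kj in zip(row, k)]
        for row, vi, pi in zip(decayed, v, pred)
    ]
    # 4. Output is the updated state applied to the query.
    out = [sum(s * qj for s, qj in zip(row, q)) for row in new_state]
    return new_state, out


def gated_delta_prefill(state, qs, ks, vs, alphas, betas):
    """Prefill: run the same step token by token, carrying the state."""
    outputs = []
    for q, k, v, a, b in zip(qs, ks, vs, alphas, betas):
        state, out = gated_delta_step(state, q, k, v, a, b)
        outputs.append(out)
    return state, outputs


# Writing two values under the same key: the second overwrites the first.
key = [1.0, 0.0]
zero = [[0.0, 0.0], [0.0, 0.0]]
final_state, outs = gated_delta_prefill(
    zero, [key, key], [key, key], [[3.0, 4.0], [5.0, 6.0]], [1.0, 1.0], [1.0, 1.0]
)
print(outs)
```

With `alpha = beta = 1` and a unit-norm key, the second write replaces the first value stored under that key, which is the behavior that distinguishes the delta rule from a purely additive linear-attention update.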

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 9 comments.

| File | Description |
|------|-------------|
| onnxruntime/core/providers/webgpu/tensor/expand.cc | Corrects boolx4 input offset calculation for vec4-path expand. |
| onnxruntime/contrib_ops/webgpu/webgpu_contrib_kernels.cc | Registers new WebGPU contrib kernels for LinearAttention and CausalConvWithState. |
| onnxruntime/contrib_ops/webgpu/bert/linear_attention.h | Declares LinearAttention kernel/program and update-rule parsing. |
| onnxruntime/contrib_ops/webgpu/bert/linear_attention.cc | Implements LinearAttention kernel setup, validation, dispatch, and shader templating. |
| onnxruntime/contrib_ops/webgpu/bert/linear_attention.wgsl.template | Adds fused linear attention WGSL implementation (incl. delta/gated-delta paths). |
| onnxruntime/contrib_ops/webgpu/bert/causal_conv_with_state.h | Declares CausalConvWithState kernel/program and activation parsing. |
| onnxruntime/contrib_ops/webgpu/bert/causal_conv_with_state.cc | Implements CausalConvWithState kernel setup, validation, and dispatch. |
| onnxruntime/contrib_ops/webgpu/bert/causal_conv_with_state.wgsl.template | Adds WGSL for depthwise causal convolution with optional state/bias/SiLU. |
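
As a rough illustration of what a depthwise causal convolution with carried state computes, the following plain-Python sketch may help. The `[channels][time]` layout, the function name, and the `"silu"` activation string are assumptions made for this example and are not taken from the op definition or the WGSL template.

```python
import math


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def causal_conv_with_state(x, weight, state, bias=None, activation=None):
    """Depthwise causal 1-D convolution with carried state (illustrative only).

    x:      [channels][seq_len] new inputs
    weight: [channels][kernel]  one filter per channel (depthwise)
    state:  [channels][kernel-1] last inputs from the previous call
    Returns (y, new_state) with y shaped like x.
    """
    y, new_state = [], []
    for c, (xc, wc, sc) in enumerate(zip(x, weight, state)):
        k = len(wc)
        padded = list(sc) + list(xc)  # past inputs first, then current ones
        row = []
        for t in range(len(xc)):
            # Output t only sees inputs at positions <= t (causal).
            acc = sum(wc[j] * padded[t + j] for j in range(k))
            if bias is not None:
                acc += bias[c]
            if activation == "silu":
                acc = silu(acc)
            row.append(acc)
        y.append(row)
        # Keep the last kernel-1 inputs for the next call.
        new_state.append(padded[len(padded) - (k - 1):] if k > 1 else [])
    return y, new_state


w = [[1.0, 2.0, 3.0]]
# Prefill on three tokens, then decode one more token using the carried state.
y_prefill, st = causal_conv_with_state([[1.0, 2.0, 3.0]], w, [[0.0, 0.0]])
y_decode, st = causal_conv_with_state([[4.0]], w, st)
# Processing all four tokens at once gives the same outputs.
y_full, _ = causal_conv_with_state([[1.0, 2.0, 3.0, 4.0]], w, [[0.0, 0.0]])
print(y_prefill[0] + y_decode[0] == y_full[0])
```

Carrying the last `kernel - 1` inputs as state is what lets a single-token decode step produce the same result as running the convolution over the whole sequence.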

@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Apr 8, 2026
hariharans29 previously approved these changes Apr 8, 2026
@guschmue guschmue merged commit 3ac6040 into main Apr 9, 2026
101 of 102 checks passed
@guschmue guschmue deleted the gs/wgpu-qwen35-support branch April 9, 2026 22:31
sanaa-hamel-microsoft pushed a commit that referenced this pull request Apr 21, 2026
Adds WebGPU support for Qwen3.5 by adding the LinearAttention and
CausalConvWithState ops, based on the proposal in onnx/onnx#7767.

The model can be created with the model builder from
https://github.com/microsoft/onnxruntime-genai/blob/main/src/python/py/models/builder.py.

For example, for the text-only flavor:
```
python builder.py -m Qwen/Qwen3.5-0.8B  -o Qwen3.5-0.8B -e webgpu -p int4 --extra_options int4_accuracy_level=4 exclude_embeds=False
```
sanaa-hamel-microsoft added a commit that referenced this pull request Apr 24, 2026
Version bump to 1.25.1.

This cherry-picks the following commits for the release:

| Commit ID | PR Number | Commit Title |
|-----------|-----------|-------------|
| e532c21 | #27842 | linear attention signature |
| 410f5a8 | #27752 | +rotemb, +rmsnorm, reshape->opset-25, transpose->opset-24 |
| 0fedb26 | #27907 | Add LinearAttention and CausalConvState ops for Qwen3.5 |
| 3ac6040 | #27996 | webgpu support for qwen3.5 |
| c36c422 | #27998 | [WebGPU EP] Fuse QMoE 1-token decode path to reduce GPU dispatches |
| 94f32ec | #27289 | [CORE]: Improve filesystem error messages during Linux device discovery |
| dce77a3 | #28118 | Fix lack of auth on python packaging |

---------

Co-authored-by: Akshay Sonawane <111780983+apsonawane@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: eserscor <erscor@microsoft.com>
Co-authored-by: Sanaa Hamel <sanaahamel@microsoft.com>
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: Stephan Seitz <sseitz@nvidia.com>
Co-authored-by: Jiajia Qin <jiajiaqin@microsoft.com>

Labels

ep:WebGPU ort-web webgpu provider release:1.25.1

4 participants