webgpu support for qwen3.5 by guschmue · Pull Request #27996 · microsoft/onnxruntime

guschmue · 2026-04-07T01:41:02Z

WebGPU support for Qwen3.5, adding the LinearAttention and CausalConvWithState ops based on the proposal in onnx/onnx#7767.

The model can be created with the model builder from https://github.com/microsoft/onnxruntime-genai/blob/main/src/python/py/models/builder.py.

For example, for the text-only flavor:

python builder.py -m Qwen/Qwen3.5-0.8B -o Qwen3.5-0.8B -e webgpu -p int4 --extra_options int4_accuracy_level=4 exclude_embeds=False

Copilot

Pull request overview

Adds WebGPU execution-provider support needed to run Qwen3.5 models by introducing two new contrib ops and wiring them into kernel registration.

Changes:

Fixes boolean Expand vec4/bool-pack indexing in the WebGPU Expand shader generation.
Adds WebGPU contrib kernels for LinearAttention and CausalConvWithState (C++ + WGSL templates) and registers them.
Implements fused recurrent linear attention and causal depthwise conv-with-state shaders for Qwen3.5-style decode/prefill.
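
As a reading aid for the conv-with-state change listed above, here is a minimal pure-Python sketch of a depthwise causal convolution that carries a rolling state between calls. It is not the WGSL kernel from this PR: the function name, the argument names, and the `[channels][length]` layout are assumptions made for illustration, and the real op's inputs and attributes are whatever the onnx/onnx#7767 proposal defines.

```python
import math


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def causal_conv_with_state(x, weight, state=None, bias=None, activation=None):
    """Depthwise causal 1-D convolution with a carried state (illustrative only).

    x:      [channels][length]     new inputs for this call
    weight: [channels][kernel]     one filter per channel (depthwise)
    state:  [channels][kernel - 1] trailing inputs from the previous call,
            or None to start from zeros
    bias:   [channels] or None
    Returns (output, new_state) with the same layouts as x and state.
    """
    out, new_state = [], []
    for c in range(len(x)):
        taps = weight[c]
        k = len(taps)
        past = list(state[c]) if state is not None else [0.0] * (k - 1)
        # Prepend the carried inputs so each output only sees current and past values.
        padded = past + list(x[c])
        row = []
        for t in range(len(x[c])):
            acc = bias[c] if bias is not None else 0.0
            for j in range(k):
                acc += taps[j] * padded[t + j]
            row.append(silu(acc) if activation == "silu" else acc)
        out.append(row)
        # Keep the last (kernel - 1) inputs for the next call.
        new_state.append(padded[len(padded) - (k - 1):])
    return out, new_state


if __name__ == "__main__":
    w = [[0.5, 0.25, 1.0]]
    # Prefill three tokens, then decode one token using the carried state.
    head, st = causal_conv_with_state([[1.0, 2.0, 3.0]], w)
    tail, st = causal_conv_with_state([[4.0]], w, state=st)
    print(head[0] + tail[0], st)
```

In this sketch, feeding a sequence in one call or splitting it into a prefill call followed by single-token decode calls yields the same outputs, which is the property the carried state is meant to provide.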

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 9 comments.

File	Description
onnxruntime/core/providers/webgpu/tensor/expand.cc	Corrects boolx4 input offset calculation for vec4-path expand.
onnxruntime/contrib_ops/webgpu/webgpu_contrib_kernels.cc	Registers new WebGPU contrib kernels for LinearAttention and CausalConvWithState.
onnxruntime/contrib_ops/webgpu/bert/linear_attention.h	Declares LinearAttention kernel/program and update-rule parsing.
onnxruntime/contrib_ops/webgpu/bert/linear_attention.cc	Implements LinearAttention kernel setup, validation, dispatch, and shader templating.
onnxruntime/contrib_ops/webgpu/bert/linear_attention.wgsl.template	Adds fused linear attention WGSL implementation (incl. delta/gated-delta paths).
onnxruntime/contrib_ops/webgpu/bert/causal_conv_with_state.h	Declares CausalConvWithState kernel/program and activation parsing.
onnxruntime/contrib_ops/webgpu/bert/causal_conv_with_state.cc	Implements CausalConvWithState kernel setup, validation, and dispatch.
onnxruntime/contrib_ops/webgpu/bert/causal_conv_with_state.wgsl.template	Adds WGSL for depthwise causal convolution with optional state/bias/SiLU.
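
For readers unfamiliar with the "delta/gated-delta paths" mentioned for the linear attention template, the following pure-Python sketch shows one common formulation of a gated delta-rule recurrence for a single head. It is an assumption-laden illustration, not the shader from this PR: the function names, the scalar `decay` and `beta` parameters, and the `[d_k][d_v]` state layout are all hypothetical, and the kernel's actual update rules and tensor layouts may differ.

```python
def gated_delta_step(state, q, k, v, decay=1.0, beta=1.0, scale=1.0):
    """One recurrent step for one head (illustrative formulation).

    state: [d_k][d_v] matrix mapping keys to values
    q, k:  length d_k vectors; v: length d_v vector
    decay: scalar gate applied to the old state (1.0 disables gating)
    beta:  scalar strength of the delta-rule correction
    """
    d_k, d_v = len(k), len(v)
    # 1) Gate: decay the previous state.
    state = [[decay * state[i][j] for j in range(d_v)] for i in range(d_k)]
    # 2) Read what the decayed state currently predicts for key k.
    pred = [sum(k[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
    # 3) Delta rule: move the prediction toward v by a rank-1 update.
    delta = [beta * (v[j] - pred[j]) for j in range(d_v)]
    state = [[state[i][j] + k[i] * delta[j] for j in range(d_v)] for i in range(d_k)]
    # 4) Query the updated state.
    out = [scale * sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
    return out, state


def recurrent_linear_attention(qs, ks, vs, decays, betas, state=None, scale=1.0):
    """Run the step over a sequence, returning per-token outputs and the final state."""
    d_k, d_v = len(ks[0]), len(vs[0])
    if state is None:
        state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v, g, b in zip(qs, ks, vs, decays, betas):
        out, state = gated_delta_step(state, q, k, v, g, b, scale)
        outputs.append(out)
    return outputs, state


if __name__ == "__main__":
    qs = [[1.0, 0.0], [1.0, 0.0]]
    ks = [[1.0, 0.0], [0.0, 1.0]]
    vs = [[2.0, 3.0], [1.0, 1.0]]
    outs, final_state = recurrent_linear_attention(qs, ks, vs, [1.0, 0.5], [1.0, 1.0])
    print(outs, final_state)
```

Because the state has a fixed size, the same step serves both prefill (looping over many tokens) and decode (one token with the carried state). With `decay=1.0` the sketch reduces to a plain, ungated delta rule.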

Version bump to 1.25.1. This cherry-picks the following commits for the release:

| Commit ID | PR Number | Commit Title |
|-----------|-----------|-------------|
| e532c21 | #27842 | linear attention signature |
| 410f5a8 | #27752 | +rotemb, +rmsnorm, reshape->opset-25, transpose->opset-24 |
| 0fedb26 | #27907 | Add LinearAttention and CausalConvState ops for Qwen3.5 |
| 3ac6040 | #27996 | webgpu support for qwen3.5 |
| c36c422 | #27998 | [WebGPU EP] Fuse QMoE 1-token decode path to reduce GPU dispatches |
| 94f32ec | #27289 | [CORE]: Improve filesystem error messages during Linux device discovery |
| dce77a3 | #28118 | Fix lack of auth on python packaging |

---------

Co-authored-by: Akshay Sonawane <111780983+apsonawane@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: eserscor <erscor@microsoft.com>
Co-authored-by: Sanaa Hamel <sanaahamel@microsoft.com>
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: Stephan Seitz <sseitz@nvidia.com>
Co-authored-by: Jiajia Qin <jiajiaqin@microsoft.com>

webgpu support for qwen3.5

d0acdd7

guschmue mentioned this pull request Apr 7, 2026

webgpu support for LinearAttention and CausalConvWithState #27896

Closed

add CacheHint to CausalConvWithState

4a5283e

guschmue marked this pull request as ready for review April 7, 2026 19:44

guschmue mentioned this pull request Apr 7, 2026

[Feature Request] ONNX Loop op makes Mamba (SSM) models unusable on CPU and WebGPU #27796

Closed

edgchen1 requested a review from Copilot April 8, 2026 00:00

Copilot started reviewing on behalf of edgchen1 April 8, 2026 00:01

Copilot AI reviewed Apr 8, 2026

edgchen1 reviewed Apr 8, 2026

hariharans29 reviewed Apr 8, 2026

Comment thread onnxruntime/contrib_ops/webgpu/bert/causal_conv_with_state.cc

hariharans29 reviewed Apr 8, 2026

Comment thread onnxruntime/contrib_ops/webgpu/bert/linear_attention.cc Outdated

guschmue added the ep:WebGPU ort-web webgpu provider label Apr 8, 2026

review feedback

68334ab

hariharans29 previously approved these changes Apr 8, 2026

guschmue added 2 commits April 8, 2026 16:36

make clang happy

7d1f37a

Merge branch 'main' into gs/wgpu-qwen35-support

12ff020

guschmue dismissed hariharans29’s stale review via 12ff020 April 8, 2026 23:36

edgchen1 reviewed Apr 9, 2026

Comment thread onnxruntime/contrib_ops/webgpu/bert/causal_conv_with_state.cc

hariharans29 approved these changes Apr 9, 2026

guschmue merged commit 3ac6040 into main Apr 9, 2026
101 of 102 checks passed

guschmue deleted the gs/wgpu-qwen35-support branch April 9, 2026 22:31

guschmue added the release:1.25.1 label Apr 21, 2026

vraspar mentioned this pull request Apr 21, 2026

ORT 1.25.1 release: version bump and cherry-pick #27907 #28149

Merged

BrewTestBot mentioned this pull request Apr 27, 2026

onnxruntime 1.25.1 Homebrew/homebrew-core#279761

Merged

guschmue merged 5 commits into main from gs/wgpu-qwen35-support

4 participants