WebGPU: Support Split-K with batch size > 1#28151
Conversation
This patch adds support for Split-K with batch size > 1 by encoding both the batch index and the Split-K index in dispatch_z and decomposing them in the shader via batch = logical_global_id.z / num_k_splits and split_index = logical_global_id.z % num_k_splits. This patch also adds batch size to the criteria for using Split-K, since increasing the batch size also increases parallelism, which reduces the effectiveness of Split-K.
This reverts commit 0d79450.
Pull request overview
Adds WebGPU Split‑K support for MatMul/Conv|MatMul when batch_size > 1 by packing (batch, split_index) into dispatch_z, and updates Split‑K heuristics to consider batch size (plus a small MSVC workaround in version checking).
Changes:
- Encode both batch index and Split‑K index into `dispatch_z` and decode in WGSL using `splits_per_batch`.
- Add a batch-size threshold and incorporate batch size into Split‑K gating heuristics.
- Add new unit tests intended to exercise Split‑K with batched inputs; replace `consteval` with `constexpr` in version validation helpers.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| onnxruntime/core/providers/webgpu/math/gemm_utils.cc | WGSL Split‑K indexing updated to decode batch/split from logical_global_id.z. |
| onnxruntime/core/providers/webgpu/math/gemm_packed.cc | Minor callsite update (comment/arg annotation) for MakeMatMulPackedVec4Source. |
| onnxruntime/core/providers/webgpu/math/matmul.cc | Compute-side Split‑K dispatch now encodes (batch * splits_per_batch) and passes splits_per_batch as a uniform; batched fill-bias/zero init. |
| onnxruntime/core/providers/webgpu/math/matmul.h | Extend fill-bias/zero program factory to accept batch_size. |
| onnxruntime/core/providers/webgpu/math/matmul_packed.h | Add splits_per_batch and batch_size uniforms to relevant programs. |
| onnxruntime/core/providers/webgpu/math/matmul_packed.cc | Fill-bias/zero shader updated to initialize outputs across all batches. |
| onnxruntime/core/providers/webgpu/webgpu_utils.h | Add max_batch_size_ to Split‑K config. |
| onnxruntime/core/providers/webgpu/webgpu_utils.cc | Split‑K gating updated to include batch size and enforce a max batch threshold. |
| onnxruntime/core/session/ort_version_check.h | Switch consteval to constexpr to work around VS2022/MSVC issues. |
| onnxruntime/test/providers/cpu/math/matmul_test.cc | Add a batched MatMul test intended to hit the Split‑K path. |
| onnxruntime/test/providers/cpu/nn/conv_op_test.cc | Add batched Conv2D “matmul-like” tests for Split‑K (with/without bias). |
Description
This patch adds support for Split-K with batch size > 1 by
encoding both the batch index and the Split-K index in dispatch_z and
decomposing them in the shader via:
batch = logical_global_id.z / num_k_splits
split_index = logical_global_id.z % num_k_splits
This patch also adds batch size to the criteria for using Split-K,
since increasing the batch size also increases parallelism,
which reduces the effectiveness of Split-K.
This patch also replaces consteval with constexpr in
ort_version_check.h to work around a compilation error with VS2022.
Motivation and Context
With this patch we improve the performance of
sam-vit-b-decoder-static-fp16-demo by 7.5% on Intel PTL.