ggml webgpu: Fix bug in dispatching large matrix-vector multiplication by reeselevine · Pull Request #19535 · ggml-org/llama.cpp

reeselevine · 2026-02-12T06:01:03Z

Bug fix for calculating workgroup dispatch counts when a large matrix-vector multiplication batch overflows the per-dimension workgroup limit. Should fix the failures from the new tests in #19519.

This approach isn't ideal because it may over-provision workgroups by quite a bit. A better strategy is the one proposed for Vulkan in #19509, but this will work for now.

reeselevine · 2026-02-18T15:40:58Z

@jeffbolznv can you approve this PR or should I ask someone else to?

jeffbolznv · 2026-02-18T15:51:14Z

I don't quite follow how this is fixing it. But if you explain it, I can try to review.

reeselevine · 2026-02-18T15:59:54Z

Sure, this was just a bug in the logic that handles cases where the number of workgroups to launch exceeds the X-dimension limit. If it overflows, we should launch the maximum number of workgroups in the X dimension and, in the Y dimension, the total number of workgroups needed divided by that maximum, rounded up (ceil(total_wg / maxComputeWorkgroupsPerDimension)). This way enough workgroups are launched in total, and the shader linearizes the workgroup IDs.
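As a rough illustration of the dispatch arithmetic described above, here is a minimal Python sketch. The helper name `compute_dispatch` is hypothetical and this is not the actual ggml-webgpu code; it only assumes the WebGPU default of 65535 for `maxComputeWorkgroupsPerDimension`.

```python
def compute_dispatch(total_wg: int, max_per_dim: int = 65535) -> tuple[int, int]:
    """Split a 1-D workgroup count into (x, y) dispatch dimensions.

    Hypothetical helper sketching the strategy described in the comment;
    not the actual ggml-webgpu implementation. 65535 is assumed to be the
    device's maxComputeWorkgroupsPerDimension (the WebGPU default).
    """
    if total_wg <= max_per_dim:
        # Fits in X alone: launch exactly as many workgroups as needed.
        return total_wg, 1
    # Overflow: fill X up to the limit, then round up in Y so x * y >= total_wg.
    y = -(-total_wg // max_per_dim)  # ceiling division
    return max_per_dim, y


x, y = compute_dispatch(65536)
print(x, y, x * y)  # 65535 2 131070
```

Plain floor division would give Y = 1 here and launch too few workgroups, which is why the sketch rounds up.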

This does over-provision the number of workgroups. For example, if 65536 workgroups are needed, we end up launching 2 * 65535 = 131070, but the extra workgroups exit immediately. The TODO comment in the code is a reminder to improve this in the future.
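To make the linearization and early exit concrete, the following Python sketch simulates a 2-D dispatch on the CPU. It is a hypothetical model, not the real shader. In WGSL, a kernel could compute the same linear ID from `@builtin(workgroup_id)` and `@builtin(num_workgroups)`, but the actual ggml-webgpu shader may do this differently.

```python
def linear_wg_id(wg_x: int, wg_y: int, num_wg_x: int) -> int:
    # Hypothetical: mirrors a shader computing wg_id.y * num_workgroups.x + wg_id.x.
    return wg_y * num_wg_x + wg_x


def run_dispatch(total_wg: int, num_wg_x: int, num_wg_y: int) -> tuple[list[int], int]:
    """Simulate which workgroups do useful work and which exit early."""
    active: list[int] = []
    idle = 0
    for wg_y in range(num_wg_y):
        for wg_x in range(num_wg_x):
            wg = linear_wg_id(wg_x, wg_y, num_wg_x)
            if wg >= total_wg:
                idle += 1  # extra workgroup: would return immediately in the shader
                continue
            active.append(wg)
    return active, idle


active, idle = run_dispatch(65536, 65535, 2)
print(len(active), idle)  # 65536 65534
```

Every needed workgroup ID from 0 to 65535 is covered exactly once, and the remaining 65534 launched workgroups do no work. That is the over-provisioning cost described in the comment. The worst case is a total just above the X limit, which launches nearly twice as many workgroups as needed.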

Fix bug in dispatching large matrix-vector multiplication

36f28fe

reeselevine mentioned this pull request Feb 12, 2026

test: mul_mat tests with huge batch size #19519

Merged

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Feb 12, 2026

reeselevine added 2 commits February 18, 2026 08:15

Merge remote-tracking branch 'upstream/master' into huge_mul_mat_fix

f4348ea

restore fix

bfdbaa2

jeffbolznv approved these changes Feb 18, 2026

reeselevine merged commit e7f2f95 into ggml-org:master Feb 18, 2026
76 of 78 checks passed

liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026

ggml webgpu: Fix bug in dispatching large matrix-vector multiplication (ggml-org#19535)

331b888

bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026

ggml webgpu: Fix bug in dispatching large matrix-vector multiplication (ggml-org#19535)

6d33eb2

ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026

ggml webgpu: Fix bug in dispatching large matrix-vector multiplication (ggml-org#19535)

1875189

reeselevine merged 3 commits into ggml-org:master from reeselevine:huge_mul_mat_fix

2 participants

Conversation

reeselevine commented Feb 12, 2026

Uh oh!

reeselevine commented Feb 18, 2026

Uh oh!

jeffbolznv commented Feb 18, 2026

Uh oh!

reeselevine commented Feb 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants