
ggml webgpu: Fix bug in dispatching large matrix-vector multiplication #19535

Merged
reeselevine merged 3 commits into ggml-org:master from reeselevine:huge_mul_mat_fix
Feb 18, 2026

Conversation

@reeselevine
Contributor

Bug fix for calculating overflowing workgroup sizes for large matrix-vector multiplication batches. Should fix failures from new tests in #19519.

This approach isn't ideal because it may over-provision workgroups by quite a bit; a better strategy is the one proposed for Vulkan in #19509, but this will work for now.

@github-actions github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) on Feb 12, 2026
@reeselevine
Contributor Author

@jeffbolznv can you approve this PR or should I ask someone else to?

@jeffbolznv
Contributor

I don't quite follow how this is fixing it. But if you explain it, I can try to review.

@reeselevine
Contributor Author

Sure, this was actually just a bug in the logic that handles cases where the number of workgroups to launch exceeds the X dimension limit. If it does overflow, we should launch the maximum number of workgroups in the X dimension, and then, in the Y dimension, the ceiling of the total necessary workgroups divided by the per-dimension maximum (ceil(total_wg / maxComputeWorkgroupsPerDimension)). This way enough total workgroups are launched, with the workgroup ids linearized in the shader.

This does over-provision the number of workgroups: for example, if 65536 workgroups are needed, we end up launching 2 * 65535, but the extra workgroups exit immediately. The TODO comment is to improve this in the future.

@reeselevine reeselevine merged commit e7f2f95 into ggml-org:master Feb 18, 2026
76 of 78 checks passed
liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request on Feb 23, 2026: "Fix bug in dispatching large matrix-vector multiplication (ggml-org#19535)"
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request on Mar 2, 2026: "Fix bug in dispatching large matrix-vector multiplication (ggml-org#19535)"
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request on Mar 3, 2026: "Fix bug in dispatching large matrix-vector multiplication (ggml-org#19535)"
2 participants