ggml webgpu: Fix bug in dispatching large matrix-vector multiplication#19535
reeselevine merged 3 commits into ggml-org:master from
@jeffbolznv can you approve this PR, or should I ask someone else to?
I don't quite follow how this is fixing it. But if you explain it, I can try to review.
Sure. This was actually just a bug in the logic that handles cases where the number of workgroups to launch exceeds the X-dimension limit. When it overflows, we should launch the maximum number of workgroups in the X dimension, and in the Y dimension the total number of workgroups needed divided by that X maximum, rounded up. This does over-provision workgroups: for example, if 65536 workgroups are needed, we end up launching 2 * 65535 = 131070, but the extra workgroups exit immediately. The TODO comment is there to improve this in the future.
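A minimal Python sketch of the dispatch logic described above, assuming an X-dimension limit of 65535 (the WebGPU default for `maxComputeWorkgroupsPerDimension`). The helper names `compute_dispatch` and `active_workgroups` are hypothetical and not taken from the ggml code; `active_workgroups` only emulates the shader-side check in which a workgroup whose flattened index lies past the real work returns immediately.

```python
from typing import Tuple


def compute_dispatch(total_wg: int, max_wg_x: int) -> Tuple[int, int]:
    """Hypothetical helper: return (x, y) workgroup counts for total_wg workgroups."""
    if total_wg <= max_wg_x:
        # Everything fits in the X dimension.
        return total_wg, 1
    # Fill X to its limit; Y is the ceiling of total_wg / max_wg_x.
    return max_wg_x, -(-total_wg // max_wg_x)


def active_workgroups(x: int, y: int, total_wg: int) -> int:
    """Count workgroups that do real work; the rest would exit immediately."""
    active = 0
    for wg_y in range(y):
        for wg_x in range(x):
            linear = wg_y * x + wg_x  # flattened workgroup index
            if linear < total_wg:
                active += 1
    return active


MAX_WG_X = 65535  # assumed X limit (WebGPU default)
x, y = compute_dispatch(65536, MAX_WG_X)
print(x, y, x * y)  # 65535 2 131070
print(active_workgroups(x, y, 65536))  # 65536
```

Ceiling division is what makes the Y count cover the remainder: with plain floor division, 65536 // 65535 = 1 would leave one workgroup unlaunched.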
ggml-org#19535) * Fix bug in dispatching large matrix-vector multiplication
Bug fix for calculating workgroup counts that overflow the X-dimension limit when dispatching large matrix-vector multiplication batches. This should fix the failures from the new tests in #19519.
This approach isn't ideal because it may over-provision workgroups by quite a bit; a better strategy is the one proposed for Vulkan in #19509. This will work for now, though.
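To put a number on "quite a bit", here is a small Python sketch comparing the fill-X-first split with one illustrative alternative. The alternative keeps the same Y count and then shrinks X to the ceiling of total / Y. It is only an assumption for illustration, not necessarily what #19509 proposes. Both functions assume the Y count stays within the device's Y limit, and all helper names are hypothetical.

```python
from typing import Tuple


def fill_x_dispatch(total_wg: int, max_wg_x: int) -> Tuple[int, int]:
    """Fill-X-first split: X = min(total, max), Y = ceil(total / max)."""
    if total_wg <= max_wg_x:
        return total_wg, 1
    return max_wg_x, -(-total_wg // max_wg_x)


def balanced_dispatch(total_wg: int, max_wg_x: int) -> Tuple[int, int]:
    """Hypothetical alternative: same Y, then X = ceil(total / Y) <= max_wg_x."""
    y = -(-total_wg // max_wg_x)
    x = -(-total_wg // y)
    return x, y


def wasted(dispatch: Tuple[int, int], total_wg: int) -> int:
    """Number of launched workgroups that have no work and exit immediately."""
    x, y = dispatch
    return x * y - total_wg


MAX_WG_X = 65535  # assumed X limit (WebGPU default)
print(wasted(fill_x_dispatch(65536, MAX_WG_X), 65536))  # 65534
print(wasted(balanced_dispatch(65536, MAX_WG_X), 65536))  # 0
```

With fill-X-first, the waste can approach a full row of `max_wg_x - 1` workgroups, whereas the balanced split always wastes fewer than Y workgroups.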