
vulkan: split mul_mat into multiple dispatches to avoid overflow#19509

Merged
0cc4m merged 2 commits into ggml-org:master from jeffbolznv:mul_mat_batch_overflow on Feb 18, 2026
Conversation

@jeffbolznv (Contributor)

The batch dimensions can be greater than the max workgroup count limit, in which case we need to split into multiple dispatches and pass the base index through a push constant.

Fall back for the less common p021 and nc variants.

Fixes #19471.

@github-actions bot added the labels testing (Everything test related), Vulkan (Issues specific to the Vulkan backend), and ggml (changes relating to the ggml tensor library for machine learning) on Feb 11, 2026
@jeffbolznv (Contributor, Author)

The new tests are failing on multiple backends; I'll move them to a separate PR so this isn't blocked.

    while (base_work_group_z < batch) {
        uint32_t groups_z = std::min(batch - base_work_group_z, ctx->device->properties.limits.maxComputeWorkGroupCount[2]);

        ggml_pipeline_request_descriptor_sets(ctx, pipeline, 1);
Contributor:

Why request the descriptor sets in the loop and not before? It's not gonna retrigger pipeline compile of course, but will ping the descriptor pools more than necessary.

Contributor Author:

ok, moved it out of the loop for now. Eventually, I'd like to not have to explicitly call this anywhere.

Contributor:

It should be possible to automatically request one when grabbing the pipeline and to allocate the sets on demand before dispatch.

Contributor Author:

Yeah, the main catch right now is that some shaders won't get their wg_denoms initialized until after this call.

@0cc4m 0cc4m merged commit d0061be into ggml-org:master Feb 18, 2026
74 of 78 checks passed
liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026

Development

Successfully merging this pull request may close these issues.

Vulkan: GGML_ASSERT failed on Kimi-Linear-48B with large context - maxComputeWorkGroupCount exceeded

2 participants