vulkan: split mul_mat into multiple dispatches to avoid overflow #19509
0cc4m merged 2 commits into ggml-org:master
The batch dimensions can be greater than the max workgroup count limit, in which case we need to split into multiple dispatches and pass the base index through a push constant. Fall back for the less common p021 and nc variants.
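As a rough illustration of the splitting described above, here is a minimal sketch of how the batch (Z) dimension could be carved into chunks that each fit under the device limit. `DispatchChunk` and `split_batch_dispatches` are hypothetical names used only for this example and are not part of ggml. The limit is passed in as a plain integer rather than read from the device, and the base index is simply recorded where the real code would pass it to the shader through a push constant.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One slice of the batch (Z) dimension that fits in a single dispatch.
// Hypothetical type, for illustration only.
struct DispatchChunk {
    uint32_t base_work_group_z; // offset the shader would add to its Z workgroup index
    uint32_t groups_z;          // Z workgroup count for this dispatch
};

// Hypothetical helper: split `batch` workgroups along Z into chunks of at
// most `max_groups_z`, mirroring the loop structure quoted in the review.
std::vector<DispatchChunk> split_batch_dispatches(uint32_t batch, uint32_t max_groups_z) {
    assert(max_groups_z > 0); // a zero limit would never make progress
    std::vector<DispatchChunk> chunks;
    uint32_t base_work_group_z = 0;
    while (base_work_group_z < batch) {
        // Clamp the remaining work to the per-dispatch limit.
        uint32_t groups_z = std::min(batch - base_work_group_z, max_groups_z);
        chunks.push_back({base_work_group_z, groups_z});
        base_work_group_z += groups_z;
    }
    return chunks;
}
```

With a batch of 100000 and a limit of 65535 (the minimum the Vulkan spec guarantees for `maxComputeWorkGroupCount`), this yields two chunks: one starting at 0 with 65535 groups and one starting at 65535 with the remaining 34465.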
The new tests are failing on multiple backends, so I'll move them to a separate PR to keep this one from being blocked.
Force-pushed from ccb4ed3 to 66d7c14.
ggml/src/ggml-vulkan/ggml-vulkan.cpp
Outdated
    while (base_work_group_z < batch) {
        uint32_t groups_z = std::min(batch - base_work_group_z, ctx->device->properties.limits.maxComputeWorkGroupCount[2]);

        ggml_pipeline_request_descriptor_sets(ctx, pipeline, 1);
Why request the descriptor sets in the loop and not before? It's not gonna retrigger pipeline compile of course, but will ping the descriptor pools more than necessary.
ok, moved it out of the loop for now. Eventually, I'd like to not have to explicitly call this anywhere.
It should be possible to automatically request one when grabbing the pipeline and to allocate the sets on demand before dispatch.
Yeah, the main catch right now is that some shaders won't get their wg_denoms initialized until after this call.
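To make the descriptor-set point in this thread concrete, here is a hedged sketch of one way the request could be hoisted out of the loop: count the dispatches up front and request that many sets in a single call. It assumes one descriptor set per dispatch. `MockPipeline`, `request_descriptor_sets`, `count_dispatches` and `dispatch_batched` are hypothetical stand-ins written for this example; they are not the ggml functions and may not match what the PR ended up doing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a pipeline that tracks descriptor set requests.
struct MockPipeline {
    uint32_t request_calls  = 0; // how many times the pool was asked
    uint32_t requested_sets = 0; // total sets requested
};

// Hypothetical stand-in for a "request N descriptor sets" call.
void request_descriptor_sets(MockPipeline & pipeline, uint32_t n) {
    pipeline.request_calls++;
    pipeline.requested_sets += n;
}

// Number of dispatches needed to cover `batch` Z workgroups.
// Written without `batch + max - 1` to avoid 32-bit overflow.
uint32_t count_dispatches(uint32_t batch, uint32_t max_groups_z) {
    assert(max_groups_z > 0);
    return batch / max_groups_z + (batch % max_groups_z != 0 ? 1 : 0);
}

// Requests all sets once, then walks the batch in chunks.
// Returns the number of dispatches that would be recorded.
uint32_t dispatch_batched(MockPipeline & pipeline, uint32_t batch, uint32_t max_groups_z) {
    // Single request before the loop, assuming one set per dispatch.
    request_descriptor_sets(pipeline, count_dispatches(batch, max_groups_z));

    uint32_t dispatches = 0;
    uint32_t base_work_group_z = 0;
    while (base_work_group_z < batch) {
        uint32_t groups_z = std::min(batch - base_work_group_z, max_groups_z);
        // A real backend would record the dispatch here, passing
        // base_work_group_z to the shader.
        dispatches++;
        base_work_group_z += groups_z;
    }
    return dispatches;
}
```

The total number of sets requested is the same as with a per-iteration request; only the number of calls drops from one per dispatch to one overall.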
…l-org#19509) * vulkan: split mul_mat into multiple dispatches to avoid overflow The batch dimensions can be greater than the max workgroup count limit, in which case we need to split into multiple dispatches and pass the base index through a push constant. Fall back for the less common p021 and nc variants. * address feedback
Fixes #19471.