test: mul_mat tests with huge batch size #19519
Conversation
@reeselevine can you address the WebGPU failure? @JohannesGaessler or @am17an can you address the CUDA failure? For context, in #19471 with a larger
#19535 should fix the WebGPU failures.
Are these only for the F16 data type? For large batch sizes the CUDA code falls back to cuBLAS; I think that should be a relatively simple change compared to doing it for quantized data types.
In the failing model, everything was GGML_TYPE_F32. The GGML_TYPE_F16 came from me copy/pasting another test case. We could add both if there's an interesting difference in the code paths.
As long as it's F16, BF16 or F32, I think #19538 will fix it (it passes these tests).
Force-pushed from a3de448 to f6c10e6.
Tests for #19471.
The Vulkan fix is in #19509.