UPSTREAM PR #18958: CUDA: use mmvq for mul-mat-id for small batch sizes #1101

Open
loci-dev wants to merge 4 commits into main from loci/pr-18958-mmid-vec


@loci-dev

Note

Source pull request: ggml-org/llama.cpp#18958

Currently, for batch sizes > 1, the CUDA backend switches to mmq immediately, which is suboptimal for small batch sizes. This PR uses mmvq for mul-mat-id at small batch sizes, which brings batched-bench performance in line (previously there was a dip at n_tokens = 2).
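The change in kernel selection can be sketched roughly as follows. This is a minimal host-side illustration, not the code from the PR: the enum, both function names, and the cutoff `kMaxVecBatch` are hypothetical. The value 8 is an assumption chosen only because the benchmark below reports gains up to n=8; the PR description does not state the actual threshold.

```cpp
#include <cstdint>

// Hypothetical names for illustration; these are not the ggml-cuda symbols.
enum class MulMatIdKernel { MMVQ, MMQ };

// Assumed cutoff (not stated in the PR description).
constexpr int64_t kMaxVecBatch = 8;

// Behaviour described as "currently": any batch larger than 1 goes to mmq.
inline MulMatIdKernel pick_kernel_before(int64_t n_tokens) {
    return n_tokens > 1 ? MulMatIdKernel::MMQ : MulMatIdKernel::MMVQ;
}

// Behaviour described by the PR: small batches stay on the mmvq path.
inline MulMatIdKernel pick_kernel_after(int64_t n_tokens) {
    return n_tokens <= kMaxVecBatch ? MulMatIdKernel::MMVQ : MulMatIdKernel::MMQ;
}
```

Under these assumptions, `pick_kernel_before(2)` selects mmq while `pick_kernel_after(2)` selects mmvq, which is the n_tokens = 2 case where the dip was observed.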

Micro-benchmark for test-backend-ops

| Backend | GGML op | Op parameters | TFLOPS master | TFLOPS mmid-vec | Speedup |
| --- | --- | --- | --- | --- | --- |
| CUDA0 | MUL_MAT_ID | type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 | 4.61 | 4.62 | 1.00 |
| CUDA0 | MUL_MAT_ID | type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 | 2.34 | 6.13 | 2.62 |
| CUDA0 | MUL_MAT_ID | type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 | 4.27 | 6.83 | 1.60 |
| CUDA0 | MUL_MAT_ID | type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 | 5.49 | 5.49 | 1.00 |
| CUDA0 | MUL_MAT_ID | type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 | 3.37 | 6.37 | 1.89 |
| CUDA0 | MUL_MAT_ID | type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 | 6.57 | 7.23 | 1.10 |
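The Speedup column appears to be the ratio of the mmid-vec throughput to the master throughput, rounded to two decimals. A small sketch of that arithmetic, assuming this definition (the helper name `speedup` is hypothetical and not part of test-backend-ops):

```cpp
#include <cmath>

// Ratio of branch throughput to master throughput, rounded to two decimals.
// Assumes both values are in the same unit (TFLOPS here).
inline double speedup(double tflops_master, double tflops_branch) {
    return std::round(tflops_branch / tflops_master * 100.0) / 100.0;
}
```

For the n=4 row with n_mats=128, `speedup(2.34, 6.13)` gives 2.62, matching the table.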

@loci-review

loci-review bot commented Jan 31, 2026

No meaningful performance changes were detected across 113023 analyzed functions in the following binaries: build.bin.libmtmd.so, build.bin.libllama.so, build.bin.llama-tts, build.bin.llama-cvector-generator, build.bin.llama-gemma3-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli, build.bin.llama-quantize, build.bin.llama-qwen2vl-cli, build.bin.llama-tokenize, build.bin.llama-bench, build.bin.libggml-base.so, build.bin.libggml-cpu.so, build.bin.libggml.so.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

@loci-dev loci-dev force-pushed the main branch 27 times, most recently from b128b33 to d613f70 Compare February 1, 2026 12:16
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 7ff3e7f to 99b11e9 Compare February 3, 2026 09:20