metal : fix q5_k mul_mv register spill #20399

Merged
ggerganov merged 1 commit into master from gg/metal-mul-mv-q5_k-spill
Mar 11, 2026
@ggerganov commented Mar 11, 2026

Continuation of #20398.

Noticed excessive register pressure in the q5_k vec kernel when capturing with:

METAL_CAPTURE_ENABLED=1 GGML_METAL_CAPTURE_COMPUTE=8 ./bin/llama-completion -m ~/models/qwen2.5-3b-coder/ggml-model-q5_k.gguf -fa 1 -p "hello" -n 10 -no-cnv --top-k 1

Before:

image

After:

image

Perf:

| Model | Test | t/s master | t/s gg/metal-mul-mv-q5_k-spill | Speedup |
|:------|:-----|-----------:|-------------------------------:|--------:|
| gemma2 9B Q5_K_M | pp1 | 56.27 | 61.67 | 1.10 |
| gemma2 9B Q5_K_M | pp2 | 74.87 | 80.06 | 1.07 |
| gemma2 9B Q5_K_M | pp3 | 83.92 | 89.26 | 1.06 |
| gemma2 9B Q5_K_M | pp4 | 128.29 | 127.65 | 0.99 |
| gemma2 9B Q5_K_M | tg32 | 57.22 | 62.00 | 1.08 |
| gemma3 4B Q5_K_M | pp1 | 99.76 | 109.82 | 1.10 |
| gemma3 4B Q5_K_M | pp2 | 146.22 | 159.29 | 1.09 |
| gemma3 4B Q5_K_M | pp3 | 173.88 | 185.46 | 1.07 |
| gemma3 4B Q5_K_M | pp4 | 236.71 | 236.36 | 1.00 |
| gemma3 4B Q5_K_M | tg32 | 103.52 | 114.42 | 1.11 |
| llama 8B Q5_K_M | pp1 | 78.57 | 84.24 | 1.07 |
| llama 8B Q5_K_M | pp2 | 100.28 | 104.78 | 1.04 |
| llama 8B Q5_K_M | pp3 | 112.15 | 114.89 | 1.02 |
| llama 8B Q5_K_M | pp4 | 166.52 | 166.64 | 1.00 |
| llama 8B Q5_K_M | tg32 | 79.30 | 84.32 | 1.06 |
| qwen2 3B Q5_K_M | pp1 | 127.54 | 136.71 | 1.07 |
| qwen2 3B Q5_K_M | pp2 | 183.02 | 196.62 | 1.07 |
| qwen2 3B Q5_K_M | pp3 | 216.24 | 226.41 | 1.05 |
| qwen2 3B Q5_K_M | pp4 | 276.70 | 276.13 | 1.00 |
| qwen2 3B Q5_K_M | tg32 | 128.72 | 140.52 | 1.09 |
| qwen2 7B Q5_K_M | pp1 | 79.31 | 86.19 | 1.09 |
| qwen2 7B Q5_K_M | pp2 | 100.62 | 107.88 | 1.07 |
| qwen2 7B Q5_K_M | pp3 | 110.58 | 117.67 | 1.06 |
| qwen2 7B Q5_K_M | pp4 | 179.16 | 179.97 | 1.00 |
| qwen2 7B Q5_K_M | tg32 | 79.47 | 86.58 | 1.09 |
| qwen3 0.6B Q5_K_M | pp1 | 266.12 | 292.98 | 1.10 |
| qwen3 0.6B Q5_K_M | pp2 | 483.89 | 503.42 | 1.04 |
| qwen3 0.6B Q5_K_M | pp3 | 634.80 | 667.10 | 1.05 |
| qwen3 0.6B Q5_K_M | pp4 | 733.00 | 733.70 | 1.00 |
| qwen3 0.6B Q5_K_M | tg32 | 280.87 | 300.45 | 1.07 |
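
For readers checking the numbers: the Speedup column appears to be the branch throughput divided by the master throughput. The PR does not state the formula, so that definition is an assumption, and the `speedup` helper below is a hypothetical name used only for illustration. A minimal sketch that recomputes a few rows copied from the table:

```python
# Rows copied from the table above: (model, test, t/s master, t/s branch).
rows = [
    ("gemma2 9B Q5_K_M", "pp1", 56.27, 61.67),
    ("gemma2 9B Q5_K_M", "tg32", 57.22, 62.00),
    ("qwen3 0.6B Q5_K_M", "pp1", 266.12, 292.98),
]


def speedup(master_tps: float, branch_tps: float) -> float:
    """Assumed definition: branch throughput divided by master throughput."""
    return branch_tps / master_tps


for model, test, master, branch in rows:
    # Print the ratio with two decimals, matching the table's precision.
    print(f"{model:18s} {test:5s} {speedup(master, branch):.2f}")
```

Because the table shows throughputs already rounded to two decimals, a ratio recomputed this way can differ from the listed Speedup in the last digit for rows that sit near a rounding boundary.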

@github-actions bot added the `ggml` (changes relating to the ggml tensor library for machine learning) and `Apple Metal` (https://en.wikipedia.org/wiki/Metal_(API)) labels Mar 11, 2026
@ggerganov merged commit b541241 into master Mar 11, 2026
16 of 75 checks passed
@ggerganov deleted the gg/metal-mul-mv-q5_k-spill branch March 11, 2026 14:25
ProgenyAlpha pushed a commit to ProgenyAlpha/llama.cpp that referenced this pull request Mar 12, 2026