
vulkan: Enable topk_moe fusion for GLM-4.7-Flash #18947

Closed
jeffbolznv wants to merge 1 commit into ggml-org:master from jeffbolznv:topk_moe_early_softmax_norm_bias_edges

Conversation

@jeffbolznv
Contributor

Just need to add the fusion detection logic; this is a combination of existing modes (early softmax, bias, norm, scale) and is covered by the existing backend tests.
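To make the combination concrete, here is a minimal, self-contained sketch of the routing math the fused op covers: an early softmax over the router logits, a per-expert bias applied only for top-k selection, renormalization of the selected weights, and a final scale. All names and values here are illustrative assumptions, not the actual ggml-vulkan detection code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
    const int   n_expert      = 8;     // hypothetical expert count
    const int   n_expert_used = 2;     // top-k
    const float scale         = 1.5f;  // hypothetical routed scaling factor

    std::vector<float> logits = {0.1f, 2.0f, -1.0f, 0.5f, 1.2f, -0.3f, 0.8f, 0.0f};
    std::vector<float> bias   = {0.0f, 0.1f, 0.0f, -0.2f, 0.3f, 0.0f, 0.0f, 0.05f};

    // Early softmax: probabilities are computed before top-k selection.
    float mx = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(n_expert);
    float sum = 0.0f;
    for (int i = 0; i < n_expert; ++i) { probs[i] = std::exp(logits[i] - mx); sum += probs[i]; }
    for (float & p : probs) p /= sum;

    // Bias: selection ranks experts by biased score, but the weights
    // come from the unbiased probabilities.
    std::vector<int> idx(n_expert);
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + n_expert_used, idx.end(),
        [&](int a, int b) { return probs[a] + bias[a] > probs[b] + bias[b]; });

    // Norm + scale: renormalize the selected weights to sum to 1, then scale.
    float sel = 0.0f;
    for (int k = 0; k < n_expert_used; ++k) sel += probs[idx[k]];
    for (int k = 0; k < n_expert_used; ++k) {
        printf("expert %d weight %.4f\n", idx[k], probs[idx[k]] / sel * scale);
    }
    return 0;
}
```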

before

Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench -m c:\models\GLM-4.7-Flash-Q4_K_M.gguf -r 10 -fa 1 -p 512 -n 128
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 5090 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
load_backend: loaded Vulkan backend from Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo\ggml-vulkan.dll
load_backend: loaded CPU backend from Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo\ggml-cpu.dll
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| deepseek2 ?B Q4_K - Medium     |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           pp512 |      8434.22 ± 37.67 |
| deepseek2 ?B Q4_K - Medium     |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           tg128 |       185.26 ± 16.12 |

after

Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench -m c:\models\GLM-4.7-Flash-Q4_K_M.gguf -r 10 -fa 1 -p 512 -n 128
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 5090 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
load_backend: loaded Vulkan backend from Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo\ggml-vulkan.dll
load_backend: loaded CPU backend from Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo\ggml-cpu.dll
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| deepseek2 ?B Q4_K - Medium     |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           pp512 |      8504.07 ± 57.02 |
| deepseek2 ?B Q4_K - Medium     |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           tg128 |       206.38 ± 16.00 |

@jeffbolznv requested a review from 0cc4m as a code owner on January 20, 2026 04:39
@github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Jan 20, 2026
@jeffbolznv
Contributor Author

This may not be needed after #18980; I'll check once that lands.

@jeffbolznv
Contributor Author

I verified the model is hitting TOPK_MOE_SIGMOID_NORM_BIAS now, without this change. So we don't really need the change.
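For context, the SIGMOID variant replaces the softmax in the sketch above with independent per-expert sigmoid gates; the bias, top-k, and normalization steps are analogous. A minimal illustration of the gating difference (hypothetical names, not the ggml-vulkan code):

```cpp
#include <cmath>
#include <cstdio>

// Sigmoid gating: each expert's score is an independent logistic of its
// logit, rather than one softmax normalized across all experts.
static float sigmoid_gate(float logit) {
    return 1.0f / (1.0f + std::exp(-logit));
}

int main() {
    const float logits[4] = {0.1f, 2.0f, -1.0f, 0.5f};
    for (float l : logits) printf("%.4f\n", sigmoid_gate(l));
    return 0;
}
```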

@0cc4m
Contributor

0cc4m commented Jan 22, 2026

So this can be closed?

@jeffbolznv
Contributor Author

I'm fine with abandoning it. If another model needs it in the future the code will still be here.

@jeffbolznv closed this Jan 22, 2026