UPSTREAM PR #18947: vulkan: Enable topk_moe fusion for GLM-4.7-Flash#975

Open
loci-dev wants to merge 1 commit into main from upstream-PR18947-branch_jeffbolznv-topk_moe_early_softmax_norm_bias_edges

Conversation

@loci-dev
Mirrored from ggml-org/llama.cpp#18947

Only the fusion detection logic needs to be added: this is a combination of existing modes (early softmax, bias, norm, scale), and it is covered by the existing backend tests.
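For reference, the sequence of ops being fused corresponds roughly to the following computation. This is only an illustrative sketch of topk-moe routing semantics with the modes named above (early softmax, bias for selection, norm, scale); the function name and exact ordering are assumptions, not llama.cpp internals.

```python
import numpy as np

def topk_moe_reference(logits, bias, k, scale):
    """Illustrative reference for the fused topk-moe path (not llama.cpp code):
    early softmax over all expert logits, a bias that affects only expert
    selection, top-k selection, renormalization of the selected weights,
    and a final routing scale."""
    # early softmax: probabilities over all experts, before selection
    z = np.exp(logits - logits.max())
    probs = z / z.sum()
    # bias influences which experts are selected, not their weights
    selected = np.argsort(probs + bias)[-k:]
    # norm: renormalize the selected probabilities to sum to 1
    weights = probs[selected] / probs[selected].sum()
    # scale: apply the routing scale factor
    return selected, weights * scale
```

Fusing this chain into one kernel avoids materializing the intermediate softmax and selection tensors, which is where the token-generation speedup below comes from.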

before

Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench -m c:\models\GLM-4.7-Flash-Q4_K_M.gguf -r 10 -fa 1 -p 512 -n 128
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 5090 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
load_backend: loaded Vulkan backend from Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo\ggml-vulkan.dll
load_backend: loaded CPU backend from Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo\ggml-cpu.dll
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| deepseek2 ?B Q4_K - Medium     |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           pp512 |      8434.22 ± 37.67 |
| deepseek2 ?B Q4_K - Medium     |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           tg128 |       185.26 ± 16.12 |

after

Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench -m c:\models\GLM-4.7-Flash-Q4_K_M.gguf -r 10 -fa 1 -p 512 -n 128
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 5090 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
load_backend: loaded Vulkan backend from Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo\ggml-vulkan.dll
load_backend: loaded CPU backend from Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo\ggml-cpu.dll
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| deepseek2 ?B Q4_K - Medium     |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           pp512 |      8504.07 ± 57.02 |
| deepseek2 ?B Q4_K - Medium     |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           tg128 |       206.38 ± 16.00 |
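The relative change between the two llama-bench runs (using the mean t/s values from the tables above) works out as follows:

```python
# Mean throughput (t/s) taken from the before/after tables above
before = {"pp512": 8434.22, "tg128": 185.26}
after  = {"pp512": 8504.07, "tg128": 206.38}

for test in before:
    pct = (after[test] / before[test] - 1) * 100
    print(f"{test}: {pct:+.1f}%")
# pp512 improves slightly (~+0.8%); tg128 gains ~+11.4%,
# consistent with the fusion mattering most for token generation
```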

@loci-review
loci-review bot commented Jan 20, 2026

Explore the complete analysis inside the Version Insights

@loci-dev loci-dev force-pushed the main branch 27 times, most recently from 0e2fcc8 to 5668a6a Compare January 24, 2026 07:09
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 10471d1 to e11b5e5 Compare January 29, 2026 15:17