UPSTREAM PR #18934: ggml-cuda: enable cuda-graphs for `n-cpu-moe` by loci-dev · Pull Request #971 · auroralabs-loci/llama.cpp

loci-dev · 2026-01-19T16:44:14Z

Add piece-wise CUDA graphs for the multiple-split case. Currently CUDA graphs get disabled when there are splits, because we only keep one CUDA graph per device: repeated updates with differently sized splits/shapes trigger the disable.
This PR adds a CUDA graph per split (a split is keyed by the first node in the split).
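
To illustrate the per-split keying described above, here is a minimal sketch, not the PR's actual code. It assumes a cache that maps the address of a split's first node to one graph entry, and replays an entry only when the split's size is unchanged. All names (`node`, `split_graph`, `split_graph_cache`) are hypothetical stand-ins, and no real CUDA calls are made; a real implementation would hold a captured `cudaGraph_t` per entry and compare far more than the node count.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a compute-graph node (not the real ggml type).
struct node { int op; };

// Hypothetical stand-in for one captured CUDA graph plus bookkeeping.
struct split_graph {
    std::size_t n_nodes    = 0; // size of the split when it was last captured
    int         n_captures = 0; // how many times the split had to be (re)captured
    int         n_replays  = 0; // how many times the cached graph was reused
};

class split_graph_cache {
public:
    // Returns true if the cached graph for this split could be replayed,
    // false if the split had to be captured (first use or size change).
    bool run(const std::vector<node *> & split) {
        assert(!split.empty());
        // Key the split by the address of its first node.
        split_graph & g = graphs_[split.front()];
        if (g.n_captures > 0 && g.n_nodes == split.size()) {
            g.n_replays++;
            return true;
        }
        g.n_nodes = split.size();
        g.n_captures++;
        return false;
    }

    std::size_t size() const { return graphs_.size(); }

private:
    std::unordered_map<const node *, split_graph> graphs_;
};
```

With a single graph per device, alternating between two differently sized splits would force a recapture on every call. With one entry per split, each split is captured once and then replayed, which is the behavior the description is after.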

Tested on 2x 4090 and 1x 5090.

| Model | n_cpu_moe | Test | t/s `3d55846` | t/s n-cpu-moe-piecewise | Speedup |
| --- | --- | --- | --- | --- | --- |
| glm4moe 106B.A12B IQ4_XS - 4.25 bpw | 8 | tg128 | 60.84 | 63.12 | 1.04 |
| glm4moe 106B.A12B IQ4_XS - 4.25 bpw | 16 | tg128 | 48.25 | 50.75 | 1.05 |
| glm4moe 106B.A12B IQ4_XS - 4.25 bpw | 32 | tg128 | 32.84 | 35.03 | 1.07 |
| glm4moe 106B.A12B IQ4_XS - 4.25 bpw | 64 | tg128 | 25.08 | 27.49 | 1.10 |
| gpt-oss 120B MXFP4 MoE | 8 | tg128 | 95.96 | 100.93 | 1.05 |
| gpt-oss 120B MXFP4 MoE | 16 | tg128 | 70.42 | 75.44 | 1.07 |
| gpt-oss 120B MXFP4 MoE | 32 | tg128 | 44.80 | 48.47 | 1.08 |
| gpt-oss 120B MXFP4 MoE | 64 | tg128 | 40.87 | 44.90 | 1.10 |
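
The Speedup column appears to be the branch throughput divided by the baseline throughput, rounded to two decimals. A small sketch for checking a row, with hypothetical helper names `speedup` and `round2`:

```cpp
#include <cmath>

// Speedup as it appears in the table: new throughput over baseline throughput.
double speedup(double baseline_tps, double new_tps) {
    return new_tps / baseline_tps;
}

// Round to two decimals, matching the precision used in the table.
double round2(double x) {
    return std::round(x * 100.0) / 100.0;
}
```

For example, `round2(speedup(60.84, 63.12))` reproduces the 1.04 reported for glm4moe with `n_cpu_moe` set to 8.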

loci-review · 2026-01-19T17:32:02Z

Explore the complete analysis inside the Version Insights

am17an added 2 commits January 19, 2026 16:11

ggml-cuda: add split-wise cuda graph

3693d98

add n-cpu-moe compare_llama_bench.py

af5b6d5

loci-dev temporarily deployed to PROD__AL_DEMO with GitHub Actions (inactive) · January 19, 2026 16:44

loci-dev force-pushed the main branch 26 times, most recently from 5b137d4 to ab9ebfa · January 23, 2026 08:12

loci-dev force-pushed the main branch 30 times, most recently from 706d8e7 to 83ca7a9 · January 29, 2026 04:39

loci-dev wants to merge 2 commits into main from upstream-PR18934-branch_am17an-n-cpu-moe-piecewise
