UPSTREAM PR #16490: graph : reuse SSM graphs by loci-dev · Pull Request #1083 · auroralabs-loci/llama.cpp

loci-dev · 2026-01-30T23:58:46Z

Note

Source pull request: ggml-org/llama.cpp#16490

I'm not sure whether there is a reason not to enable graph reuse for recurrent graphs (Mamba, hybrids, SSM, etc.). I ran a few tests and it seems to work, giving modest performance improvements: roughly 2-8% on tg32 in the runs below, with pp512 essentially unchanged. cc @gabe-l-hart @compilade
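For readers unfamiliar with the idea, here is a minimal conceptual sketch of graph reuse. It is not llama.cpp's code: `GraphParams`, `GraphCache`, and `build_graph` are hypothetical names, and the sketch assumes a graph can be reused whenever the parameters that determine its shape are unchanged since the previous step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphParams:
    """Hypothetical stand-in for whatever determines the graph's shape."""
    n_tokens: int
    n_seqs: int


def build_graph(params: GraphParams) -> dict:
    """Placeholder for the (comparatively expensive) graph construction step."""
    return {"n_tokens": params.n_tokens, "n_seqs": params.n_seqs}


class GraphCache:
    """Keeps the most recent graph and hands it back when the params match."""

    def __init__(self, reuse_enabled: bool = True) -> None:
        self.reuse_enabled = reuse_enabled
        self.params = None
        self.graph = None
        self.n_builds = 0  # how many times the graph was actually rebuilt

    def get(self, params: GraphParams) -> dict:
        if self.reuse_enabled and self.graph is not None and params == self.params:
            return self.graph  # same shape as the last step: skip the rebuild
        self.n_builds += 1
        self.graph = build_graph(params)
        self.params = params
        return self.graph


def count_builds(n_steps: int, reuse_enabled: bool) -> int:
    """Simulate n_steps single-token decode steps and count graph builds."""
    cache = GraphCache(reuse_enabled=reuse_enabled)
    for _ in range(n_steps):
        cache.get(GraphParams(n_tokens=1, n_seqs=1))
    return cache.n_builds
```

Under these assumptions, a run of identically shaped decode steps builds the graph once with reuse enabled and once per step with it disabled, which would be consistent with the benefit showing up mainly in token generation rather than prompt processing.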

Without graph reuse

make -j && LLAMA_GRAPH_REUSE_DISABLE=1 ./bin/llama-bench -m ../models/mamba-130m/ggml-model-f16.gguf -m ../models/granite-4-h-tiny/ggml-model-q8_0.gguf -m ../models/ai21-jamba-mini-1.7/ggml-model-q8_0.gguf -m ../models/liquidai-lfm2-2.6b/ggml-model-q4_k.gguf -fa 1 -t 1 -n 32

model	size	params	backend	ngl	threads	fa	test	t/s
mamba 0.1B F16	256.96 MiB	129.14 M	Metal	99	1	1	pp512	8415.73 ± 46.47
mamba 0.1B F16	256.96 MiB	129.14 M	Metal	99	1	1	tg32	322.74 ± 0.64
granitehybrid ?B Q8_0	6.88 GiB	6.94 B	Metal	99	1	1	pp512	2119.36 ± 3.31
granitehybrid ?B Q8_0	6.88 GiB	6.94 B	Metal	99	1	1	tg32	77.17 ± 0.11
jamba ?B Q8_0	51.05 GiB	51.57 B	Metal	99	1	1	pp512	603.47 ± 1.83
jamba ?B Q8_0	51.05 GiB	51.57 B	Metal	99	1	1	tg32	42.35 ± 0.02
lfm2 2.6B Q4_K - Medium	1.45 GiB	2.57 B	Metal	99	1	1	pp512	2923.41 ± 3.20
lfm2 2.6B Q4_K - Medium	1.45 GiB	2.57 B	Metal	99	1	1	tg32	169.83 ± 0.67
build: `638e2c2` (6725)

With graph reuse

make -j && ./bin/llama-bench -m ../models/mamba-130m/ggml-model-f16.gguf -m ../models/granite-4-h-tiny/ggml-model-q8_0.gguf -m ../models/ai21-jamba-mini-1.7/ggml-model-q8_0.gguf -m ../models/liquidai-lfm2-2.6b/ggml-model-q4_k.gguf -fa 1 -t 1 -n 32

model	size	params	backend	ngl	threads	fa	test	t/s
mamba 0.1B F16	256.96 MiB	129.14 M	Metal	99	1	1	pp512	8453.65 ± 20.10
mamba 0.1B F16	256.96 MiB	129.14 M	Metal	99	1	1	tg32	348.83 ± 1.67
granitehybrid ?B Q8_0	6.88 GiB	6.94 B	Metal	99	1	1	pp512	2126.12 ± 1.90
granitehybrid ?B Q8_0	6.88 GiB	6.94 B	Metal	99	1	1	tg32	82.26 ± 0.13
jamba ?B Q8_0	51.05 GiB	51.57 B	Metal	99	1	1	pp512	604.56 ± 2.08
jamba ?B Q8_0	51.05 GiB	51.57 B	Metal	99	1	1	tg32	43.22 ± 0.02
lfm2 2.6B Q4_K - Medium	1.45 GiB	2.57 B	Metal	99	1	1	pp512	2928.31 ± 1.78
lfm2 2.6B Q4_K - Medium	1.45 GiB	2.57 B	Metal	99	1	1	tg32	179.18 ± 0.47
build: `638e2c2` (6725)
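To put a number on the difference, the relative change in tg32 throughput can be computed directly from the two tables. The short script below is only a convenience sketch; the values are copied from the llama-bench output above, and the helper name `speedup_pct` is made up for this example.

```python
# tg32 throughput in t/s: (without graph reuse, with graph reuse),
# copied from the two llama-bench tables above.
tg32 = {
    "mamba 0.1B F16": (322.74, 348.83),
    "granitehybrid ?B Q8_0": (77.17, 82.26),
    "jamba ?B Q8_0": (42.35, 43.22),
    "lfm2 2.6B Q4_K - Medium": (169.83, 179.18),
}


def speedup_pct(before: float, after: float) -> float:
    """Relative throughput change in percent (positive means faster)."""
    return (after / before - 1.0) * 100.0


for model, (before, after) in tg32.items():
    print(f"{model}: {speedup_pct(before, after):+.2f}%")
```

The same helper applied to the pp512 rows gives changes below 0.5% for every model.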

ggerganov and others added 7 commits December 15, 2025 13:52

graph : reuse hybrid graphs

cc10fab

graph : reuse recurrent graphs

4f71c75

graph : fix reuse check for recurrent inputs

36a95e6

memory : move the recurrent state into the memory context

3aa4e3c

Revert "memory : move the recurrent state into the memory context"

d24eb42

This reverts commit 00f115f.

cont : fix build

454ab90

Merge branch 'main' into loci/pr-16490-gg-graph-mamba-reuse

b3492b6

DajanaV had a problem deploying to PROD__AL_DEMO January 31, 2026 00:09 — with GitHub Actions Error

DajanaV closed this Jan 31, 2026

DajanaV deleted the loci/pr-16490-gg-graph-mamba-reuse branch January 31, 2026 00:10

loci-dev wants to merge 7 commits into main from loci/pr-16490-gg-graph-mamba-reuse
