UPSTREAM PR #16490: graph : reuse SSM graphs #1083

Closed
loci-dev wants to merge 7 commits into main from loci/pr-16490-gg-graph-mamba-reuse

@loci-dev
Note

Source pull request: ggml-org/llama.cpp#16490

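Not sure if there is a reason not to enable graph reuse for recurrent graphs (mamba, hybrids, SSM, etc.). Did a few tests and it seems to work, giving modest perf improvements: in the runs below, token generation (tg32) is roughly 2-8% faster, while prompt processing (pp512) is essentially unchanged. cc @gabe-l-hart @compilade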

Without graph reuse

make -j && LLAMA_GRAPH_REUSE_DISABLE=1 ./bin/llama-bench -m ../models/mamba-130m/ggml-model-f16.gguf -m ../models/granite-4-h-tiny/ggml-model-q8_0.gguf -m ../models/ai21-jamba-mini-1.7/ggml-model-q8_0.gguf -m ../models/liquidai-lfm2-2.6b/ggml-model-q4_k.gguf -fa 1 -t 1 -n 32
| model | size | params | backend | ngl | threads | fa | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: | ---: |
| mamba 0.1B F16 | 256.96 MiB | 129.14 M | Metal | 99 | 1 | 1 | pp512 | 8415.73 ± 46.47 |
| mamba 0.1B F16 | 256.96 MiB | 129.14 M | Metal | 99 | 1 | 1 | tg32 | 322.74 ± 0.64 |
| granitehybrid ?B Q8_0 | 6.88 GiB | 6.94 B | Metal | 99 | 1 | 1 | pp512 | 2119.36 ± 3.31 |
| granitehybrid ?B Q8_0 | 6.88 GiB | 6.94 B | Metal | 99 | 1 | 1 | tg32 | 77.17 ± 0.11 |
| jamba ?B Q8_0 | 51.05 GiB | 51.57 B | Metal | 99 | 1 | 1 | pp512 | 603.47 ± 1.83 |
| jamba ?B Q8_0 | 51.05 GiB | 51.57 B | Metal | 99 | 1 | 1 | tg32 | 42.35 ± 0.02 |
| lfm2 2.6B Q4_K - Medium | 1.45 GiB | 2.57 B | Metal | 99 | 1 | 1 | pp512 | 2923.41 ± 3.20 |
| lfm2 2.6B Q4_K - Medium | 1.45 GiB | 2.57 B | Metal | 99 | 1 | 1 | tg32 | 169.83 ± 0.67 |

build: 638e2c2 (6725)

With graph reuse

make -j && ./bin/llama-bench -m ../models/mamba-130m/ggml-model-f16.gguf -m ../models/granite-4-h-tiny/ggml-model-q8_0.gguf -m ../models/ai21-jamba-mini-1.7/ggml-model-q8_0.gguf -m ../models/liquidai-lfm2-2.6b/ggml-model-q4_k.gguf -fa 1 -t 1 -n 32
| model | size | params | backend | ngl | threads | fa | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: | ---: |
| mamba 0.1B F16 | 256.96 MiB | 129.14 M | Metal | 99 | 1 | 1 | pp512 | 8453.65 ± 20.10 |
| mamba 0.1B F16 | 256.96 MiB | 129.14 M | Metal | 99 | 1 | 1 | tg32 | 348.83 ± 1.67 |
| granitehybrid ?B Q8_0 | 6.88 GiB | 6.94 B | Metal | 99 | 1 | 1 | pp512 | 2126.12 ± 1.90 |
| granitehybrid ?B Q8_0 | 6.88 GiB | 6.94 B | Metal | 99 | 1 | 1 | tg32 | 82.26 ± 0.13 |
| jamba ?B Q8_0 | 51.05 GiB | 51.57 B | Metal | 99 | 1 | 1 | pp512 | 604.56 ± 2.08 |
| jamba ?B Q8_0 | 51.05 GiB | 51.57 B | Metal | 99 | 1 | 1 | tg32 | 43.22 ± 0.02 |
| lfm2 2.6B Q4_K - Medium | 1.45 GiB | 2.57 B | Metal | 99 | 1 | 1 | pp512 | 2928.31 ± 1.78 |
| lfm2 2.6B Q4_K - Medium | 1.45 GiB | 2.57 B | Metal | 99 | 1 | 1 | tg32 | 179.18 ± 0.47 |

build: 638e2c2 (6725)
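
To make the comparison between the two runs easier to read, here is a small sketch that computes the relative change in t/s per model and test. It is not part of the PR: the `results` dictionary and the `speedup_pct` and `summarize` helpers are illustrative names, and the numbers are only the mean t/s values copied from the two tables above (the ± spread is ignored).

```python
# Mean t/s copied from the two llama-bench tables above:
# (without graph reuse, with graph reuse). The ± spread is ignored.
results = {
    ("mamba 0.1B F16", "pp512"): (8415.73, 8453.65),
    ("mamba 0.1B F16", "tg32"): (322.74, 348.83),
    ("granitehybrid ?B Q8_0", "pp512"): (2119.36, 2126.12),
    ("granitehybrid ?B Q8_0", "tg32"): (77.17, 82.26),
    ("jamba ?B Q8_0", "pp512"): (603.47, 604.56),
    ("jamba ?B Q8_0", "tg32"): (42.35, 43.22),
    ("lfm2 2.6B Q4_K - Medium", "pp512"): (2923.41, 2928.31),
    ("lfm2 2.6B Q4_K - Medium", "tg32"): (169.83, 179.18),
}


def speedup_pct(before: float, after: float) -> float:
    """Relative throughput change in percent (positive means faster)."""
    return (after - before) / before * 100.0


def summarize(data: dict) -> dict:
    """Map each (model, test) pair to its percentage change, rounded to 2 places."""
    return {key: round(speedup_pct(b, a), 2) for key, (b, a) in data.items()}


if __name__ == "__main__":
    for (model, test), pct in summarize(results).items():
        print(f"{model:26s} {test:6s} {pct:+.2f}%")
```

Since the gains show up mainly in the tg32 rows, the benefit of graph reuse in these runs appears concentrated in single-token decoding rather than in batched prompt processing.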

@DajanaV DajanaV closed this Jan 31, 2026
@DajanaV DajanaV deleted the loci/pr-16490-gg-graph-mamba-reuse branch January 31, 2026 00:10