[PD][Core] Fix Mamba prefix cache hit rate in PD disaggregation by ZhanqiuHu · Pull Request #44243 · vllm-project/vllm

ZhanqiuHu · 2026-06-01T18:02:20Z

Co-authored with @underfituu.

Fixes the bug described in #42524 by overriding find_longest_cache_hit to bypass truncation of full attention groups.

Note on the expected behavior: the last Mamba state is always transferred, whereas full attention transfers only the part that missed the prefix cache.
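To make the expected behavior concrete, here is a minimal sketch of the transfer decision on the decode side. `TransferPlan` and `plan_transfer` are hypothetical names used only for illustration; they are not part of vLLM or of this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferPlan:
    # Hypothetical container, not a vLLM type.
    fa_token_span: range     # full-attention KV tokens to pull from the prefill side
    pull_mamba_state: bool   # whether the final Mamba state is pulled


def plan_transfer(num_prompt_tokens: int, fa_local_hit_tokens: int) -> TransferPlan:
    """Pull only the full-attention miss part; always pull the last Mamba state."""
    if not 0 <= fa_local_hit_tokens <= num_prompt_tokens:
        raise ValueError("local hit must lie within the prompt")
    return TransferPlan(
        fa_token_span=range(fa_local_hit_tokens, num_prompt_tokens),
        pull_mamba_state=True,  # assumption from the note: independent of local hits
    )


# 768 of 1024 prompt tokens already hit the local full-attention prefix cache.
plan = plan_transfer(num_prompt_tokens=1024, fa_local_hit_tokens=768)
```

In this toy model the Mamba state is pulled even when full attention hits the whole prompt locally, which matches the note but leaves out everything the real scheduler and connector have to track.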

NickLucche

I think this is quite interesting

NickLucche · 2026-06-03T10:41:18Z

+                    ):
+                        new_computed_blocks, per_group_hits = (
+                            self._get_computed_blocks_per_group(request)
+                        )
+                        num_new_local_computed_tokens = min(per_group_hits)
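As a rough picture of what per-group evaluation means in the snippet above, the toy sketch below computes a longest-prefix hit separately for each KV cache group and then takes the minimum. The helper names, the string block hashes, and the set-based caches are assumptions made for illustration; the real `_get_computed_blocks_per_group` is not shown on this page and may work differently.

```python
def longest_prefix_hit(block_hashes: list[str], cached: set[str]) -> int:
    """Count leading blocks whose hash is present in one group's cache."""
    hit = 0
    for block_hash in block_hashes:
        if block_hash not in cached:
            break
        hit += 1
    return hit


def per_group_hit_tokens(
    block_hashes: list[str],
    group_caches: dict[str, set[str]],
    block_size: int,
) -> dict[str, int]:
    """Evaluate the prefix hit independently for every group (hypothetical helper)."""
    return {
        name: longest_prefix_hit(block_hashes, cached) * block_size
        for name, cached in group_caches.items()
    }


# Decode side: full attention has three cached blocks, Mamba has no local state.
hits = per_group_hit_tokens(
    block_hashes=["b0", "b1", "b2", "b3"],
    group_caches={"full_attention": {"b0", "b1", "b2"}, "mamba": set()},
    block_size=16,
)
min_hit = min(hits.values())
```

Here `hits` keeps the 48-token full-attention hit while `min_hit` is 0 because of the Mamba group, so the per-group result carries information that the single minimum does not.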


dumb q: what's the main issue with evaluating per-group for all hybrid models, regardless of mamba?

mm sw probably needs right2left alignment

yeah i was thinking how we can generalize this, let me think more :)
I think num_new_local_computed_tokens = min(per_group_hits) is actually wrong, but technically num_new_local_computed_tokens is not really used later with kv connector.
sw definitely needs different handling
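One way to read the "right2left" remark in the thread above, assuming "sw" means sliding-window attention: a sliding-window group only needs the most recent window of blocks, so a hit can be found by scanning from the end of the prompt for a long enough run of cached blocks. The function below is a toy sketch under that assumption, with hypothetical names; it is not vLLM's implementation.

```python
def sliding_window_hit_blocks(cached_flags: list[bool], window_blocks: int) -> int:
    """Return how many leading blocks count as computed for a sliding-window group.

    Scans right to left for the rightmost run of `window_blocks` contiguous
    cached blocks and returns the (exclusive) end index of that run.
    """
    run = 0
    for i in range(len(cached_flags) - 1, -1, -1):
        if cached_flags[i]:
            run += 1
            if run >= window_blocks:
                return i + run  # end of the run that satisfies the window
        else:
            run = 0
    # No full window found; a shorter run that reaches block 0 is still a prefix hit.
    return run


# Block 1 is missing, but the last two blocks cover a window of 2 blocks.
sw_hit = sliding_window_hit_blocks([True, False, True, True], window_blocks=2)
```

A left-to-right prefix scan over the same flags would stop after the first block, which is why a sliding-window group cannot simply share the full-attention hit logic.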

mergify · 2026-06-04T16:04:42Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ZhanqiuHu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Per-group cache evaluation so FA local hits are preserved even when Mamba has no local state on the D-side. Worker-side prefix caching now handles SSM groups correctly (end-trim instead of strict equality). Fixes 0% D-side prefix cache hit rate for Mamba hybrid models in PD.

Signed-off-by: Zhanqiu Hu <zhu@redhat.com>

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

NickLucche

LGTM as per offline discussion

mergify · 2026-06-08T16:30:36Z

Hi @ZhanqiuHu, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

underfituu · 2026-06-09T04:14:24Z

-                    )
+                    if (
+                        self.connector is not None
+                        and self.has_mamba_layers


Thanks for the non-intrusive changes! Abstracting this into a separate function looks much more elegant. We recently noticed that DeepSeek-V4 also supports partial hits, so would it make sense to generalize this logic to accommodate it?

…-project#44243) Co-authored-by: lHrHenry233 <2381623149@qq.com> Co-authored-by: underfituu <hzhucong@163.com> Signed-off-by: Zhanqiu Hu <zhu@redhat.com>

mergify Bot added v1 kv-connector labels Jun 1, 2026

ZhanqiuHu force-pushed the zhanqiu/fix-pd-mamba-prefix-cache branch from e40d378 to 33cd846 Compare June 1, 2026 18:22

ZhanqiuHu mentioned this pull request Jun 1, 2026

[PD][Feature] Add KV consumer partial-group caching for hybrid Mamba models #42524

Open

NickLucche reviewed Jun 3, 2026


mergify Bot added the needs-rebase label Jun 4, 2026

ZhanqiuHu added 3 commits June 4, 2026 14:39

remove loggings

4129f5d

Signed-off-by: Zhanqiu Hu <zhu@redhat.com>

clean

a45c78a

Signed-off-by: Zhanqiu Hu <zhu@redhat.com>

ZhanqiuHu force-pushed the zhanqiu/fix-pd-mamba-prefix-cache branch from 88da91d to a45c78a Compare June 4, 2026 18:43

mergify Bot removed the needs-rebase label Jun 4, 2026

ZhanqiuHu added 3 commits June 4, 2026 16:23

move per group function

f9d5dfe

Signed-off-by: Zhanqiu Hu <zhu@redhat.com>

add e2e test

739ac9d

Signed-off-by: Zhanqiu Hu <zhu@redhat.com>

clean up

fb795d4

Signed-off-by: Zhanqiu Hu <zhu@redhat.com>

mergify Bot added the ci/build label Jun 5, 2026

ZhanqiuHu marked this pull request as ready for review June 8, 2026 13:31

ZhanqiuHu requested review from ApostaC, Harry-Chen, WoosukKwon, alexm-redhat, heheda12345, khluu, njhill, orozery, robertgshaw2-redhat and ywang96 as code owners June 8, 2026 13:31

claude Bot reviewed Jun 8, 2026


NickLucche approved these changes Jun 8, 2026


NickLucche added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 8, 2026

Merge branch 'main' into zhanqiu/fix-pd-mamba-prefix-cache

ed3c7c2

NickLucche enabled auto-merge (squash) June 8, 2026 16:21

underfituu reviewed Jun 9, 2026


NickLucche added 2 commits June 9, 2026 09:04

Merge branch 'main' into zhanqiu/fix-pd-mamba-prefix-cache

85f3f93

Merge branch 'main' into zhanqiu/fix-pd-mamba-prefix-cache

2ea9e6a

NickLucche merged commit 55911db into vllm-project:main Jun 11, 2026
75 checks passed

NickLucche mentioned this pull request Jun 15, 2026

[Bug]: Prefix caching on Nemotron Ultra breaks gsm8k #45699

Open

1 task

NickLucche merged 9 commits into vllm-project:main from ZhanqiuHu:zhanqiu/fix-pd-mamba-prefix-cache


3 participants
