[Core][KV Transfer] Support PD disagg + speculative decoding KV lifecycle by ZhanqiuHu · Pull Request #34926 · vllm-project/vllm

ZhanqiuHu · 2026-02-19T23:40:36Z

Summary

Defer KV connector finalization for P/D disaggregation + speculative decoding compatibility.
Add E2E tests for MTP (DeepSeek), EAGLE (Llama-3.1-8B), and EAGLE3 (GPT-OSS-20B) with ExampleConnector.

Purpose

The connector metadata (and wait_for_save) was being finalized after the target model forward but before the draft model forward, preventing drafter KV from being saved/loaded correctly. This PR defers both until after the draft forward completes.
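The ordering problem described above can be illustrated with a toy model. This is only a sketch, not vLLM code: `ToyConnector`, `run_step`, and the layer names are hypothetical, and the assumption that a save is silently skipped once metadata is cleared is made purely for illustration.

```python
class ToyConnector:
    """Hypothetical stand-in for a KV connector (not the vLLM API)."""

    def __init__(self):
        self.metadata = None
        self.saved_layers = []

    def bind_metadata(self, metadata):
        self.metadata = metadata

    def save_kv_layer(self, layer_name):
        # Assumption for this sketch: without bound metadata the
        # connector has nothing to act on, so the save is skipped.
        if self.metadata is None:
            return
        self.saved_layers.append(layer_name)

    def clear_metadata(self):
        self.metadata = None


def run_step(connector, clear_before_draft):
    """Simulate one step: target forward, then draft forward."""
    connector.bind_metadata({"step": 0})

    # Target model forward: each attention layer saves its KV.
    for name in ("target.0", "target.1"):
        connector.save_kv_layer(name)

    if clear_before_draft:
        # Old ordering: metadata is cleared before the drafter runs.
        connector.clear_metadata()

    # Draft model forward: the drafter layer also tries to save its KV.
    connector.save_kv_layer("draft.0")

    if not clear_before_draft:
        # Deferred ordering: clear only after the draft forward.
        connector.clear_metadata()

    return connector.saved_layers
```

With `clear_before_draft=True` the toy connector never records `draft.0`; with `clear_before_draft=False` all three layers are recorded.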

Test Plan

Test Result


When spec decode is active, the draft model's forward pass also needs the connector metadata to save its KV cache via @maybe_transfer_kv_layer. Add a delay_clear parameter to _get_kv_connector_output so that the finally block skips clear_connector_metadata(), and clear the metadata explicitly after draft proposals complete in sample_tokens().
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>

When speculative decoding is enabled with KV transfer, the connector metadata was being cleared after the target model's forward pass but before the draft model's forward. This prevented drafter KV layers from being saved/loaded.
Fix: defer both wait_for_save() and clear_connector_metadata() until after the draft model forward completes via finalize_connector_and_clear().
Add E2E tests covering MTP, EAGLE, and EAGLE3 with ExampleConnector.
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
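The deferral pattern in these commit messages can be sketched as a context manager whose `finally` block skips finalization when `delay_clear` is set. This is a minimal sketch under stated assumptions, not the PR's diff: `ToyConnector`, `connector_scope`, and `execute_step` are hypothetical, and only the names `delay_clear`, `wait_for_save`, `clear_connector_metadata`, and `finalize_connector_and_clear` are taken from the text above.

```python
from contextlib import contextmanager


class ToyConnector:
    """Hypothetical connector that records the order of lifecycle calls."""

    def __init__(self):
        self.metadata = None
        self.events = []

    def bind_connector_metadata(self, metadata):
        self.metadata = metadata
        self.events.append("bind")

    def wait_for_save(self):
        self.events.append("wait_for_save")

    def clear_connector_metadata(self):
        self.metadata = None
        self.events.append("clear")


def finalize_connector_and_clear(connector):
    # Wait for pending saves, then drop the metadata.
    connector.wait_for_save()
    connector.clear_connector_metadata()


@contextmanager
def connector_scope(connector, metadata, delay_clear=False):
    """Bind metadata for a forward pass; optionally defer finalization."""
    connector.bind_connector_metadata(metadata)
    try:
        yield connector
    finally:
        # With delay_clear=True the caller is responsible for finalizing.
        if not delay_clear:
            finalize_connector_and_clear(connector)


def execute_step(connector, has_spec_decode):
    # delay_clear is enabled only when a drafter will run afterwards.
    with connector_scope(connector, {"step": 0}, delay_clear=has_spec_decode):
        connector.events.append("target_forward")

    if has_spec_decode:
        connector.events.append("draft_forward")
        # Deferred finalization, after the draft forward has completed.
        finalize_connector_and_clear(connector)

    return connector.events
```

Without speculative decoding the scope finalizes itself on exit, so behavior is unchanged; with it, finalization happens once, after the draft forward.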

gemini-code-assist

Code Review

This pull request correctly defers the KV connector finalization to support prefill/decode disaggregation in combination with speculative decoding. The changes are well-implemented by introducing a delay_clear flag in the KVConnectorModelRunnerMixin, which is enabled when a speculative configuration is present. The deferred finalization is then correctly triggered after the draft model's forward pass. The addition of comprehensive end-to-end tests for MTP, EAGLE, and EAGLE3 methods is excellent and ensures the new functionality is robust and correct. The overall implementation is clean and effectively addresses the issue of saving drafter KV caches in a disaggregated setup.

GPT-OSS uses hybrid sliding/full attention, which is incompatible with ExampleConnector's single-block-table slot mapping. The GPT-OSS tests are skipped for now; GPT-OSS works with NixlConnector (raw block transfer).
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>

mergify · 2026-02-23T13:23:44Z

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ZhanqiuHu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

ZhanqiuHu · 2026-03-05T21:46:10Z

Follow-up in #35158

ZhanqiuHu added 2 commits February 19, 2026 13:14

mergify bot added v1 kv-connector labels Feb 19, 2026

gemini-code-assist bot reviewed Feb 19, 2026

mergify bot added the needs-rebase label Feb 23, 2026

NickLucche mentioned this pull request Feb 23, 2026

[Spec Decode] Defer clearing KV connector metadata for EAGLE3 speculative decode + prefill / decode disagg setup #34529 (Merged, 5 tasks)

ZhanqiuHu closed this Mar 5, 2026

ZhanqiuHu wants to merge 3 commits into vllm-project:main from ZhanqiuHu:pd-sd-delay-clear
