[V1] [P/D] Refactor KV Connector Path #21980
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs will not trigger a full CI run by default; only a small subset of checks runs initially. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Code Review
This pull request introduces a valuable refactoring by encapsulating the KV connector lifecycle within a context manager in GPUModelRunner. This significantly improves code clarity and maintainability. The consolidation of KV-related fields into a single kv_connector_output in ModelRunnerOutput and IntermediateTensors is also a welcome change that enhances readability.
However, a critical issue has been introduced in the TPUModelRunner. The refactoring was not applied to it, and it now calls methods that have been removed from KVConnectorModelRunnerMixin, which will cause runtime failures. This needs to be addressed before merging.
NickLucche
left a comment
Hey thanks for your work!
I am just wondering why can't we have KVConnector.get_finished return a KVConnectorOutput? That would make for easier extensibility as we need to move more stuff from workers to executor.
Thanks @NickLucche! That's a great question. Shaping the output returned from the connector into a general structure is still a work in progress. To avoid locking in a premature design, I believe it's best to construct
Thanks @sdavidbd, this looks great to me.
I also feel it would make sense to return KVConnectorOutput from get_finished(), but since there's not yet agreement on that we could get this merged asap and handle that as a follow-on.
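To illustrate the idea being discussed, here is a minimal sketch of what a consolidated output type could look like and how a raw `get_finished()` tuple could be adapted into it. The field and function names are assumptions for illustration, not vLLM's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class KVConnectorOutput:
    # Hypothetical consolidated output; the field names mirror the
    # kind of ad-hoc KV-transfer fields this PR replaces (assumption).
    finished_sending: set = field(default_factory=set)
    finished_recving: set = field(default_factory=set)

    def is_empty(self) -> bool:
        return not self.finished_sending and not self.finished_recving


# A connector whose get_finished() returns a raw tuple today...
def get_finished():
    return {"req-1"}, set()


# ...can be adapted to the structured form at the call site:
sending, recving = get_finished()
out = KVConnectorOutput(finished_sending=sending, finished_recving=recving)
```

Having `get_finished()` itself return such an object (as suggested above) would push this wrapping into the connector, at the cost of committing to the structure now.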
I think calling an additional API brings the complexity of keeping atomicity. Regarding "third-party impl.", if this refers to the implementations inside the vllm/ repo, I guess we could just change all of them at once? Nevertheless, I'm OK merging this PR. At least in the follow-up PR, we won't need to touch
We have to fix the tests.
There are more failing tests in the V1 test suite; please fix them. The rest should be fixed if you merge from main.
Force-pushed from 5d3ceee to 373df31
@DarkLight1337 Failed checks appear to be caused by known issues unrelated to this PR:
… connector path Signed-off-by: David Ben-David <davidb@pliops.com>
Force-pushed from 373df31 to 25f2873
Thanks, @lk-chen! A significant part of this PR is the introduction of the KV connector context manager, which manages the connector lifecycle over a single model execution (i.e., a scheduling step). This provides a natural boundary for atomicity. Regarding third-party implementations, I was referring to out-of-tree connectors that are dynamically loaded via
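The lifecycle boundary described above can be sketched as a context manager that binds connector metadata on entry and collects results and clears state on exit, so one scheduling step is handled as an atomic unit. This is a minimal illustration; the function, class, and method names here are assumptions, not vLLM's actual implementation.

```python
from contextlib import contextmanager


@contextmanager
def kv_connector_step(connector, scheduler_output):
    # Bind metadata on entry; collect output and clear on exit,
    # so setup/teardown bracket exactly one scheduling step.
    connector.bind_connector_metadata(scheduler_output)
    result = {}
    try:
        yield result  # the model forward(s) run inside this scope
    finally:
        result["finished"] = connector.get_finished()
        connector.clear_connector_metadata()


class DemoConnector:
    # Stand-in connector for demonstration only.
    def __init__(self):
        self.bound = None
        self.cleared = False

    def bind_connector_metadata(self, meta):
        self.bound = meta

    def get_finished(self):
        return {"req-0"}

    def clear_connector_metadata(self):
        self.bound = None
        self.cleared = True


conn = DemoConnector()
with kv_connector_step(conn, scheduler_output="step-1") as out:
    pass  # the execute_model body would run here
```

After the `with` block exits, the connector metadata has been cleared and `out["finished"]` holds the finished request IDs, without the caller having to remember the setup/teardown pairing.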
Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com>
):
        self.maybe_setup_kv_connector(scheduler_output)
), self.maybe_get_kv_connector_output(
        scheduler_output) as kv_connector_output:
With "Always call connector clear_metadata() at end of step", clear_connector_meta was invoked at the end of the step, i.e., after the draft model forward.
With the current change, it is invoked before the draft model forward, which leaves the connector unable to send the draft layers' KV cache.
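The ordering concern raised here can be sketched as follows: when speculative decoding runs a draft model, the connector metadata must be cleared only after all forwards of the step, including the draft model's. The event names below are illustrative placeholders, not vLLM code.

```python
# Sketch of the required per-step ordering with a draft model.
events = []


def run_step(include_draft: bool):
    events.clear()
    events.append("target_forward")
    if include_draft:
        # The draft forward still needs the connector metadata so the
        # draft layers' KV cache can be sent.
        events.append("draft_forward")
    # Clearing here, at the true end of the step, is safe; clearing
    # between the two forwards (as the new code does) is too early.
    events.append("clear_connector_meta")


run_step(include_draft=True)
```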


Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.

Purpose
This PR refactors the KV connector integration in GPUModelRunner.execute_model by introducing a context manager that encapsulates the lifecycle of the KV connector. This clarifies the execution flow and improves modularity.

Additionally, this PR simplifies IntermediateTensors and ModelRunnerOutput by consolidating multiple ad-hoc KV-related fields into a single kv_connector_output field of type KVConnectorOutput, which improves readability and maintainability.

Test Plan
Run all existing tests.
Test Result
All tests pass.
(Optional) Documentation Update