Skip to content

fix multiconnector for multi connector use push kv connector#31144

Open
liziyu179 wants to merge 1 commit intovllm-project:mainfrom
liziyu179:fix_multi_connector
Open

fix multiconnector for multi connector use push kv connector#31144
liziyu179 wants to merge 1 commit intovllm-project:mainfrom
liziyu179:fix_multi_connector

Conversation

@liziyu179
Copy link

@liziyu179 liziyu179 commented Dec 22, 2025

Purpose

a bugfix for multi connector use push kv connector,pass real blocks to kv connector.

Fixes #31145.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a bug in the MultiConnector by correctly passing real KV cache blocks to connectors configured as KV producers with PUT_ASYNC or PUT send types. The change ensures that all relevant connectors receive the necessary data for their operations, improving the correctness of KV transfer in multi-connector setups. The logic appears sound and directly resolves the described issue.

@mergify
Copy link

mergify bot commented Dec 22, 2025

Hi @liziyu179, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@github-actions
Copy link

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors.

You ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

Signed-off-by: liziyu <liziyu16@huawei.com>
@LCAIZJ
Copy link
Contributor

LCAIZJ commented Jan 4, 2026

@NickLucche Could you please take a look at the issue addressed by this PR when you have time?

Copy link
Collaborator

@NickLucche NickLucche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting this @liziyu179 !

Personally, I think this level of detail should be abstracted away from the multiconnector, but more generally I am not sure that prioritizing push connectors (with connector-specific format) is the way to go.
I also understand that given the inherent flexibility of the interface, cases like this where some connectors are not "compatible" may arise again.
Perhaps we should consider again some "priority" level of make the inherent one more clear?

@njhill do you have a better idea on how the MultiConnector behavior should be modified to accomodate this case?

cc @sdavidbd who may also be interested in the discussion

@sdavidbd
Copy link
Contributor

sdavidbd commented Jan 7, 2026

I agree with @NickLucche that the MultiConnector implementation should remain agnostic to any specific connector’s internal semantics.

As I understand it, MultiConnector today supports loading a given request’s KV from a single connector (specifically, the first connector that reports a non-zero match via get_num_new_matched_tokens()), while supporting save/send via multiple connectors.

On the API semantics: I believe update_state_after_alloc() was designed to be tightly coupled to get_num_new_matched_tokens() and intended for the load path. The num_external_tokens parameter is explicitly “the number of tokens that will be loaded from the external KV cache,” (i.e., the value returned by get_num_new_matched_tokens()). In Scheduler::schedule, comments in the code also frame update_state_after_alloc() as load-related (e.g., “used to determine if a load is needed”).

Accordingly, if a connector is using update_state_after_alloc() as the mechanism to derive save operations, that assumption will not hold under MultiConnector’s current contract: when there is no external hit, update_state_after_alloc() will not be invoked at all, by design. (Related: it seems the NixlConnector's CPU save path will also run into the same issue.)

For deriving save/send work, the intended mechanism appears to be build_connector_meta(), which receives SchedulerOutput and therefore has access to the per-request metadata accumulated during scheduling (including any metadata contributed via update_state_after_alloc() when applicable).

Concretely, I think the correct fix here is to derive save/send work in build_connector_meta().

That said, I agree the name update_state_after_alloc() can be misleading, since it reads like a general “post-allocation hook” for both load and save. I would suggest either renaming it to make the intent explicit (e.g., update_state_for_load_after_alloc()), or considering deprecating it altogether - since in principle build_connector_meta() can be sufficient to derive both load and save work.

@njhill
Copy link
Member

njhill commented Jan 9, 2026

I agree with what @sdavidbd wrote, the intention is for get_num_new_matched_tokens and update_state_after_alloc to be used on the loading/receiving side only, and to use other hooks for the sending/saving.

@liziyu179 it would help if you could provide an example of a "push" connector implementation of the kind you're talking about here.

Related: it seems the NixlConnector's CPU save path will also run into the same issue.

@sdavidbd I don't quite understand what you mean by this?

@sdavidbd
Copy link
Contributor

Related: it seems the NixlConnector's CPU save path will also run into the same issue.

@sdavidbd I don't quite understand what you mean by this?

@njhill I was referring to the NixlConnector path that offloads KV to a CPU buffer (added for TPU support in PR #18293). In that implementation, the connector appears to mark/derive the save work in update_state_after_alloc(). Under MultiConnector, update_state_after_alloc() is not invoked at all on do_remote_decode, so relying on it to schedule CPU offload/save work would not be reliable when MultiConnector is used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: [P/D] multi-connector cannot be used together with P2pNcclConnector that uses a put mode(push kv cache from P node to D node)

5 participants