Skip to content

Conversation

@mgoin
Copy link
Member

@mgoin mgoin commented May 1, 2025

Co-authored with @njhill.

We take advantage of the kv_connector_extra_config: dict[str, Any] already present in KVTransferConfig to stash all the connectors we want in an ordered list of kwargs. The new MultiConnector then creates a connector from each item in its list of KVTransferConfig.kv_connector_extra_config["connectors"]:

class MultiConnector(KVConnectorBase_V1):

    def __init__(self, vllm_config: "VllmConfig", role: KVConnectorRole):
        super().__init__(vllm_config=vllm_config, role=role)
        self._connectors = []
        ktcs = vllm_config.kv_transfer_config.kv_connector_extra_config.get("connectors")
        assert ktcs is not None
        for ktc in ktcs:
            temp_config = copy.copy(vllm_config)
            temp_config.kv_transfer_config = KVTransferConfig(**ktc)
            self._connectors.append(KVConnectorFactory.create_connector_v1(temp_config, role))

Example usage:

CUDA_VISIBLE_DEVICES=0 NIXL_ROLE="SENDER" vllm serve meta-llama/Llama-3.2-1B-Instruct \
    --port 8100 \
    --enforce-eager \
    --disable-log-requests \
    --kv-transfer-config '{"kv_connector":"MultiConnector","kv_role":"kv_both","kv_connector_extra_config":{"connectors":[{"kv_connector":"NixlConnector","kv_role":"kv_both"},{"kv_connector":"SharedStorageConnector","kv_role":"kv_both","kv_connector_extra_config":{"shared_storage_path":"local_storage"}}]}}'

@github-actions
Copy link

github-actions bot commented May 1, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@njhill njhill force-pushed the multi-kv-connectors branch from 02b54bd to f6a48c5 Compare May 3, 2025 15:36
@mergify mergify bot added the frontend label May 3, 2025
@njhill njhill force-pushed the multi-kv-connectors branch from f6a48c5 to 8423135 Compare May 3, 2025 15:44
@mgoin mgoin changed the title [WIP] Support multiple kv connectors [V1] Support multiple kv connectors May 6, 2025
@mgoin mgoin marked this pull request as ready for review May 6, 2025 12:11
@mgoin mgoin marked this pull request as draft May 6, 2025 12:12
@mgoin
Copy link
Member Author

mgoin commented May 6, 2025

Dependent on #17686 landing first for KVConnectorBase_V1 API changes/additions made in that PR.

njhill pushed a commit to neuralmagic/vllm that referenced this pull request May 10, 2025
From PR vllm-project#17564

Co-authored-by: Nick Hill <[email protected]>

Signed-off-by: mgoin <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
@mergify
Copy link

mergify bot commented May 12, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @mgoin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 12, 2025
@mergify mergify bot removed the needs-rebase label May 12, 2025
@mgoin mgoin marked this pull request as ready for review May 12, 2025 17:03
Copy link
Member

@tlrmchlsmth tlrmchlsmth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think my main comment is that we should make sure to document the behavior of the loads. I.E. make sure the precedence order stuff is written down somewhere.

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label May 13, 2025
This was there originally but inadvertently dropped

Signed-off-by: Nick Hill <[email protected]>
@maobaolong
Copy link
Contributor

@mgoin @njhill Thanks for this feature! It looks useful. And I want to make connectorA as P/D transfer connector, make connectorB as offload-reuse purpose connector, how to config to make the connectorA and connectorB co-exist?

@njhill
Copy link
Member

njhill commented May 14, 2025

@mgoin @njhill Thanks for this feature! It looks useful. And I want to make connectorA as P/D transfer connector, make connectorB as offload-reuse purpose connector, how to config to make the connectorA and connectorB co-exist?

@maobaolong We'll add better doc/comments for this but the logic initially is to:

  • Load from the first connector that advertises available tokens (in the order specified in the config)
  • Save to all connectors

@maobaolong
Copy link
Contributor

@mgoin @njhill Thanks for this feature! It looks useful. And I want to make connectorA as P/D transfer connector, make connectorB as offload-reuse purpose connector, how to config to make the connectorA and connectorB co-exist?

@maobaolong We'll add better doc/comments for this but the logic initially is to:

  • Load from the first connector that advertises available tokens (in the order specified in the config)
  • Save to all connectors

@njhill Thanks for your explain. And I'd like to know more about the motivation and the scenario, if you can add some more description, it will help. So I guess this PR can not solve the co-exist issue for P/D transfer connector and offload-reuse purpose connector ?

Signed-off-by: Nick Hill <[email protected]>
@njhill
Copy link
Member

njhill commented May 14, 2025

@mgoin @njhill Thanks for this feature! It looks useful. And I want to make connectorA as P/D transfer connector, make connectorB as offload-reuse purpose connector, how to config to make the connectorA and connectorB co-exist?

@maobaolong We'll add better doc/comments for this but the logic initially is to:

  • Load from the first connector that advertises available tokens (in the order specified in the config)
  • Save to all connectors

@njhill Thanks for your explain. And I'd like to know more about the motivation and the scenario, if you can add some more description, it will help. So I guess this PR can not solve the co-exist issue for P/D transfer connector and offload-reuse purpose connector ?

@maobaolong this is intended to be a starting point, it can work with P/D + offload-reuse if the P/D connector uses kv transfer params passed in the request to determine whether it should offer tokens to load. Which is how the current NixlConnector works - it will only return nonzero token count from get_matched_tokens if the kv transfer params in the request contain do_remote_prefill=True, otherwise it will fall back to the second connector which can be the one to reuse offloaded kvcache.

We can extend it with other loading selection logic if you have additional ideas...

@simon-mo simon-mo merged commit 2142035 into vllm-project:main May 14, 2025
62 of 64 checks passed
@njhill njhill deleted the multi-kv-connectors branch May 15, 2025 01:31
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
Signed-off-by: mgoin <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Signed-off-by: Yuqi Zhang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci/build frontend ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants