[Bugfix][KV Transfer] Use kv_transfer_params for P2pNcclConnector coordination by eicherseiji · Pull Request #33947 · vllm-project/vllm

eicherseiji · 2026-02-05T22:25:46Z

After #27987, Prefill and Decode get different internal request_ids, breaking P/D coordination in the P2P NCCL connector. The connector currently encodes the remote address into the request_id string and parses it back out with a regex, which is also fragile.

Implements the design from https://gist.github.com/markmc/0c10179d49bb7fed8b737e1cfa56f912: switch to kv_transfer_params following the NIXL pattern. The proxy injects the decode instance's KV address before sending to prefill, prefill returns its own address and request ID on completion, and the proxy forwards those to decode.

Includes unit tests, updates to both proxy implementations, and design doc changes.
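
The handshake described above can be sketched as two pure functions a proxy might use to build its request bodies. This is a minimal illustration under stated assumptions, not the code from this PR: the key names `remote_kv_addr` and `remote_request_id`, the helper names, and the `max_tokens = 1` setting for the prefill leg are all hypothetical. Only the `kv_transfer_params` field itself comes from the description.

```python
from copy import deepcopy
from typing import Any


def build_prefill_request(
    client_body: dict[str, Any], decode_kv_addr: str
) -> dict[str, Any]:
    """Step 1: tell prefill where the decode instance's KV endpoint is."""
    body = deepcopy(client_body)  # never mutate the client's original body
    body["max_tokens"] = 1  # assumption: prefill only needs to fill the KV cache
    body["kv_transfer_params"] = {"remote_kv_addr": decode_kv_addr}
    return body


def build_decode_request(
    client_body: dict[str, Any], prefill_response: dict[str, Any]
) -> dict[str, Any]:
    """Steps 2 and 3: forward prefill's own address and request ID to decode."""
    returned = prefill_response.get("kv_transfer_params") or {}
    missing = {"remote_kv_addr", "remote_request_id"} - returned.keys()
    if missing:
        raise ValueError(f"prefill response lacks {sorted(missing)}")
    body = deepcopy(client_body)
    body["kv_transfer_params"] = {
        "remote_kv_addr": returned["remote_kv_addr"],
        "remote_request_id": returned["remote_request_id"],
    }
    return body
```

Because the address and request ID travel in a structured field, neither side has to parse them back out of the request_id string.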

Repro steps

pip install quart aiohttp pyzmq msgpack

# Terminal 1: Proxy
python examples/online_serving/disaggregated_serving_p2p_nccl_xpyd/disagg_proxy_p2p_nccl_xpyd.py

# Terminal 2: Prefill (GPU 0)
CUDA_VISIBLE_DEVICES=0 vllm serve facebook/opt-125m \
    --enforce-eager --host 0.0.0.0 --port 20003 \
    --dtype float16 --max-model-len 2048 --gpu-memory-utilization 0.5 \
    --kv-transfer-config \
    '{"kv_connector":"P2pNcclConnector","kv_role":"kv_producer","kv_port":"21001","kv_connector_extra_config":{"proxy_ip":"0.0.0.0","proxy_port":"30001","http_port":"20003","send_type":"PUT_ASYNC"}}'

# Terminal 3: Decode (GPU 1)
CUDA_VISIBLE_DEVICES=1 vllm serve facebook/opt-125m \
    --enforce-eager --host 0.0.0.0 --port 20005 \
    --dtype float16 --max-model-len 2048 --gpu-memory-utilization 0.5 \
    --kv-transfer-config \
    '{"kv_connector":"P2pNcclConnector","kv_role":"kv_consumer","kv_buffer_size":"5e9","kv_port":"22001","kv_connector_extra_config":{"proxy_ip":"0.0.0.0","proxy_port":"30001","http_port":"20005","send_type":"PUT_ASYNC"}}'

# Terminal 4: Test
curl -s http://localhost:11001/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"facebook/opt-125m","prompt":"The capital of France is","max_tokens":20}' \
  | python -m json.tool

Without fix

With fix

gemini-code-assist

Code Review

This pull request adds CI coverage for P2pNcclConnector and LMCacheConnectorV1 by adding new integration test steps and parameterizing the test script. The changes are well-structured and address a gap in test coverage. My feedback focuses on improving the new CI steps for better consistency and robustness by using the shared requirements file for installing dependencies.

.buildkite/test_areas/distributed.yaml

eicherseiji · 2026-02-07T09:22:29Z

Prioritizing bug fix, will follow up with CI test here: #34050

eicherseiji · 2026-02-07T09:35:17Z

/gemini review

gemini-code-assist

Code Review

This pull request addresses a critical bug in the P2P NCCL KV transfer mechanism, which was causing hangs due to the use of a randomized internal request_id. The fix involves replacing this with the consistent external_req_id for coordination between prefill and decode instances. The changes correctly propagate the external_req_id through the Request and NewRequestData structures and apply it within the P2pNcclConnector. The implementation appears correct and effectively resolves the issue. I have included one suggestion to enhance code maintainability by refactoring an indexed tuple into a more robust NamedTuple or dataclass.

gemini-code-assist · 2026-02-07T09:36:57Z

vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_connector.py

        self._requests_need_load: dict[str, Any] = {}
        self.is_producer = self._kv_transfer_config.is_kv_producer
-        self.chunked_prefill: dict[str, tuple[list[int], list[int] | None]] = {}
+        self.chunked_prefill: dict[str, tuple[list[int], list[int] | None, str]] = {}


The chunked_prefill dictionary stores a tuple with three elements, which are then accessed by index (e.g., [0], [1], [2]) in build_connector_meta. This practice is fragile and can lead to bugs if the tuple structure is modified in the future. To improve readability and maintainability, consider using a NamedTuple or a dataclass to provide meaningful names to the fields.

For example:

from typing import NamedTuple


class ChunkedPrefillData(NamedTuple):
    block_ids: list[int]
    prompt_token_ids: list[int] | None
    external_req_id: str


# In __init__
self.chunked_prefill: dict[str, ChunkedPrefillData] = {}

# Then in build_connector_meta, you can access fields by name:
# prompt_token_ids = self.chunked_prefill[req_id].prompt_token_ids
# kv_request_id = self.chunked_prefill[req_id].external_req_id

NickLucche

Hey thanks for the fix @eicherseiji !
Left a comment.
cc @markmc on whether this is the best approach to integrate the request_id.

NickLucche · 2026-02-10T08:40:42Z

vllm/v1/request.py

    ) -> None:
        self.request_id = request_id
+        self.external_req_id = external_req_id
        self.client_index = client_index


I would personally prefer not to edit request.py unless deemed necessary more generally, as we're still trying to figure out how much the nccl one is actually used.
I think this mapping could be stored within the connector itself for the time being.
Let's hear @markmc's thoughts on this global change.

eicherseiji · 2026-02-10

Makes sense. I could also strip the trailing request ID characters, or add a method that does this on InputProcessor. TIA for taking a look @markmc!

markmc · 2026-02-10T11:30:46Z

Hi @eicherseiji

I'm not familiar at all with this connector, and don't really have time right now to dig in deeply. But I did spend some time with Claude, trying to formulate some feedback and a recommendation based on how the NIXL connector works and also #32937 for the moriio connector. Obviously I could be missing something, but I think this is pretty solid: https://gist.github.com/markmc/0c10179d49bb7fed8b737e1cfa56f912

shwgao · 2026-02-10T22:34:50Z

Hi, apologies — I didn't notice that #33947 by @eicherseiji already addresses the same bug before I opened this PR. Sorry for the duplicate!

That said, the two PRs take different approaches:

#33947: Propagates external_req_id through Request and NewRequestData (changes 3 files: request.py, output.py, p2p_nccl_connector.py)
#34278: Strips the assign_request_id() random suffix inside the connector itself via _strip_internal_id_suffix() (changes 1 file: p2p_nccl_connector.py only)
My approach avoids modifying request.py / output.py, which aligns with @NickLucche's comment on #33947 suggesting the mapping be kept within the connector for now.
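
To make the second approach concrete, here is a rough sketch of what a suffix-stripping helper could look like. It assumes the internal ID is the external ID followed by a hyphen and eight lowercase hexadecimal characters. That format, the function name, and the sample IDs are illustrative assumptions, not the code in #34278.

```python
import re

# Assumed suffix shape: "-" followed by exactly 8 lowercase hex characters.
_SUFFIX_RE = re.compile(r"-[0-9a-f]{8}$")


def strip_internal_id_suffix(request_id: str) -> str:
    """Return request_id without a trailing random suffix, if one is present."""
    return _SUFFIX_RE.sub("", request_id, count=1)


# Prefill and decode each append their own suffix to the same external ID,
# so the raw internal IDs differ and cannot be used as a shared key.
prefill_id = "cmpl-42-1a2b3c4d"
decode_id = "cmpl-42-9f8e7d6c"
assert prefill_id != decode_id

# After stripping, both sides derive the same key.
assert strip_internal_id_suffix(prefill_id) == "cmpl-42"
assert strip_internal_id_suffix(decode_id) == "cmpl-42"
```

One limitation of any pattern-based heuristic like this: it cannot distinguish a random suffix from an external ID that happens to end in the same shape.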

Totally fine to close #34278. Just wanted to offer an alternative, happy to defer to whatever the maintainers decide.

orozery · 2026-02-11T06:49:50Z

> Obviously I could be missing something, but I think this is pretty solid: https://gist.github.com/markmc/0c10179d49bb7fed8b737e1cfa56f912

I agree. I think Request.kv_transfer_params is a perfect fit for the solution.

njhill · 2026-02-11T16:58:01Z

Agree we should avoid changing request.py and output.py for this if possible.

shwgao · 2026-02-11T17:05:07Z

> > Obviously I could be missing something, but I think this is pretty solid: https://gist.github.com/markmc/0c10179d49bb7fed8b737e1cfa56f912
>
> I agree. I think Request.kv_transfer_params is a perfect fit for the solution.

Agreed, the proper long-term fix is to follow the NIXL pattern. I could also take a look at this refactor after @eicherseiji updates the PR, since I am currently working closely with the P2pNcclConnector.
However, this design touches multiple components (connector + proxy + protocol), so it goes beyond a simple bugfix. The P2P NCCL connector is currently completely broken on main, so it might be worth landing a minimal hotfix first to unblock users, then following up with the NIXL-pattern refactor as a separate PR.

eicherseiji · 2026-02-11T18:24:06Z

Thanks @markmc and all for the feedback. Will proceed with the kv_transfer_params design here.

In the meantime, maybe we can merge @shwgao's #34278 to recover main? @NickLucche, thoughts?

markmc · 2026-02-11T20:58:09Z

I'd be more inclined to accept a CLI argument to disable the request ID randomization - this would be a temporary feature available to users of the broken connectors as a workaround

The P2P NCCL connector encoded network addresses in request_id strings and parsed them with regex. After PR vllm-project#27987, prefill and decode have different internal request_ids, breaking this scheme.

Follow the NIXL connector pattern: prefill returns its internal request_id and KV address via kv_transfer_params in the API response; the proxy forwards these to decode for coordination. No core engine changes required.

Design: https://gist.github.com/markmc/0c10179d49bb7fed8b737e1cfa56f912

Signed-off-by: Seiji Eicher <seiji@anyscale.com>

mergify · 2026-02-12T09:27:31Z

Documentation preview: https://vllm--33947.org.readthedocs.build/en/33947/

eicherseiji · 2026-02-12T09:44:07Z

@markmc, if it's temporary, thoughts on an environment variable for a more minimal change? #34415

This PR is ready to review.
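
As a sketch of how such an environment-variable workaround could gate the randomization: the variable name below is taken from the title of #34415, but the accepted value (`"1"`), the suffix format, and the function name are assumptions made for illustration and may not match the merged change.

```python
import os
import uuid
from collections.abc import Mapping

ENV_VAR = "VLLM_DISABLE_REQUEST_ID_RANDOMIZATION"


def assign_internal_request_id(
    external_req_id: str, env: Mapping[str, str] = os.environ
) -> str:
    """Return the internal ID, skipping the random suffix when the flag is set."""
    if env.get(ENV_VAR, "0") == "1":
        # Workaround path: prefill and decode see the same ID again.
        return external_req_id
    # Default path (assumed format): append 8 random hex characters.
    return f"{external_req_id}-{uuid.uuid4().hex[:8]}"
```

Passing the environment in as a mapping keeps the helper easy to test without touching the real process environment.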

markmc · 2026-02-12T12:17:23Z

> @markmc, if it's temporary, thoughts on an environment variable for a more minimal change? #34415
>
> This PR is ready to review.

Thanks - I think this is a pragmatic solution 👍

eicherseiji · 2026-02-18T03:37:10Z

@markmc bumping for your review when you have a chance. Thanks!

mergify bot added ci/build v1 kv-connector labels Feb 5, 2026

gemini-code-assist bot reviewed Feb 5, 2026

mergify bot added the tpu Related to Google TPUs label Feb 5, 2026

eicherseiji closed this Feb 5, 2026

eicherseiji reopened this Feb 5, 2026

eicherseiji changed the title ~~[CI] Add integration tests for P2pNccl and LMCache connectors~~ [WIP][CI] Add integration tests for P2pNccl and LMCache connectors Feb 5, 2026

eicherseiji changed the title ~~[WIP][CI] Add integration tests for P2pNccl and LMCache connectors~~ [CI][Bugfix] Fix P2P NCCL KV transfer + add PD connector integration tests Feb 7, 2026

eicherseiji changed the title ~~[CI][Bugfix] Fix P2P NCCL KV transfer + add PD connector integration tests~~ [Bugfix] Fix P2P NCCL KV transfer using external_req_id Feb 7, 2026

mergify bot added the bug Something isn't working label Feb 7, 2026

eicherseiji force-pushed the ci/add-p2p-lmcache-connector-tests branch from da41489 to d9cec5f Compare February 7, 2026 08:49

mergify bot removed the tpu Related to Google TPUs label Feb 7, 2026

eicherseiji closed this Feb 7, 2026

eicherseiji reopened this Feb 7, 2026

eicherseiji force-pushed the ci/add-p2p-lmcache-connector-tests branch from 0c013d9 to ac9f7ef Compare February 7, 2026 09:18

eicherseiji marked this pull request as ready for review February 7, 2026 09:22

eicherseiji requested review from ApostaC, NickLucche, WoosukKwon, alexm-redhat, heheda12345, njhill, orozery, robertgshaw2-redhat and ywang96 as code owners February 7, 2026 09:22

gemini-code-assist bot reviewed Feb 7, 2026

NickLucche reviewed Feb 10, 2026

markmc mentioned this pull request Feb 10, 2026

[Bugfix] Fix P2pNcclConnector NCCL send/recv key mismatch in disaggregated prefill XpYd #34278

Closed

5 tasks

shwgao mentioned this pull request Feb 10, 2026

[Bug]: P2pNcclConnector NCCL send/recv key mismatch in disaggregated prefill XpYd architecture due to assign_request_id() random suffix #34277

Open

4 tasks

tlrmchlsmth added this to Large-Scale Serving Feb 11, 2026

github-project-automation bot moved this to Backlog in Large-Scale Serving Feb 11, 2026

tlrmchlsmth moved this from Backlog to In review in Large-Scale Serving Feb 11, 2026

eicherseiji changed the title ~~[Bugfix] Fix P2P NCCL KV transfer using external_req_id~~ [Bugfix][KV Transfer] Use kv_transfer_params for P2pNcclConnector coordination Feb 12, 2026

eicherseiji force-pushed the ci/add-p2p-lmcache-connector-tests branch from ac9f7ef to 47b2549 Compare February 12, 2026 09:26

mergify bot added documentation Improvements or additions to documentation performance Performance-related issues labels Feb 12, 2026

eicherseiji mentioned this pull request Feb 12, 2026

[KV Connector] Add temporary, off-by-default VLLM_DISABLE_REQUEST_ID_RANDOMIZATION workaround #34415

Merged

eicherseiji requested a review from NickLucche February 13, 2026 19:51

njhill mentioned this pull request Feb 18, 2026

fix: use external_req_id for P2P NCCL keys in disaggregated prefill #34747

Open
