[Bugfix][KV Transfer] Use kv_transfer_params for P2pNcclConnector coordination#33947
[Bugfix][KV Transfer] Use kv_transfer_params for P2pNcclConnector coordination#33947eicherseiji wants to merge 1 commit intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request adds CI coverage for P2pNcclConnector and LMCacheConnectorV1 by adding new integration test steps and parameterizing the test script. The changes are well-structured and address a gap in test coverage. My feedback focuses on improving the new CI steps for better consistency and robustness by using the shared requirements file for installing dependencies.
da41489 to
d9cec5f
Compare
0c013d9 to
ac9f7ef
Compare
|
Prioritizing bug fix, will follow up with CI test here: #34050 |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request addresses a critical bug in the P2P NCCL KV transfer mechanism, which was causing hangs due to the use of a randomized internal request_id. The fix involves replacing this with the consistent external_req_id for coordination between prefill and decode instances. The changes correctly propagate the external_req_id through the Request and NewRequestData structures and apply it within the P2pNcclConnector. The implementation appears correct and effectively resolves the issue. I have included one suggestion to enhance code maintainability by refactoring an indexed tuple into a more robust NamedTuple or dataclass.
| self._requests_need_load: dict[str, Any] = {} | ||
| self.is_producer = self._kv_transfer_config.is_kv_producer | ||
| self.chunked_prefill: dict[str, tuple[list[int], list[int] | None]] = {} | ||
| self.chunked_prefill: dict[str, tuple[list[int], list[int] | None, str]] = {} |
There was a problem hiding this comment.
The chunked_prefill dictionary stores a tuple with three elements, which are then accessed by index (e.g., [0], [1], [2]) in build_connector_meta. This practice is fragile and can lead to bugs if the tuple structure is modified in the future. To improve readability and maintainability, consider using a NamedTuple or a dataclass to provide meaningful names to the fields.
For example:
from typing import NamedTuple
class ChunkedPrefillData(NamedTuple):
block_ids: list[int]
prompt_token_ids: list[int] | None
external_req_id: str
# In __init__
self.chunked_prefill: dict[str, ChunkedPrefillData] = {}
# Then in build_connector_meta, you can access fields by name:
# prompt_token_ids = self.chunked_prefill[req_id].prompt_token_ids
# kv_request_id = self.chunked_prefill[req_id].external_req_id
NickLucche
left a comment
There was a problem hiding this comment.
Hey thanks for the fix @eicherseiji !
Left a comment.
cc @markmc on whether this is the best approach to integrate the request_id.
| ) -> None: | ||
| self.request_id = request_id | ||
| self.external_req_id = external_req_id | ||
| self.client_index = client_index |
There was a problem hiding this comment.
I would personally prefer not to edit request.py unless deemed necessary more generally, as we're still trying to figure out how much the nccl one is actually used.
I think this mapping could be stored within the connector itself for the time being.
Let's hear @markmc thoughts on this global change
There was a problem hiding this comment.
Makes sense. I could also strip the trailing request ID characters, or add a method that does this on InputProcessor. TIA for taking a look @markmc!
|
Hi @eicherseiji I'm not familiar at all with this connector, and don't really have time right now to dig in deeply. But I did spend some time with Claude, trying to formulate some feedback and a recommendation based on how the NIXL connector works and also #32937 for the moriio connector. Obviously I could be missing something, but I think this is pretty solid: https://gist.github.com/markmc/0c10179d49bb7fed8b737e1cfa56f912 |
|
Hi, apologies — I didn't notice that #33947 by @eicherseiji already addresses the same bug before I opened this PR. Sorry for the duplicate! That said, the two PRs take different approaches: #33947: Propagates external_req_id through Request and NewRequestData (changes 3 files: request.py, output.py, p2p_nccl_connector.py) Totally fine to close #34278. Just wanted to offer an alternative, happy to defer to whatever the maintainers decide. |
I agree. I think |
|
Agree we should avoid changing |
Agree, the proper long-term fix is to follow the NIXL pattern. I could also take a look at this factor after @eicherseiji updates the PR, since I am currently working close to the P2pncclconnector. |
|
Thanks @markmc, all for feedback. Will proceed with the In the meantime, maybe we can merge @shwgao's #34278 to recover |
|
I'd be more inclined to accept a CLI argument to disable the request ID randomization - this would be a temporary feature available to users of the broken connectors as a workaround |
The P2P NCCL connector encoded network addresses in request_id strings and parsed them with regex. After PR vllm-project#27987, prefill and decode have different internal request_ids, breaking this scheme. Follow the NIXL connector pattern: prefill returns its internal request_id and KV address via kv_transfer_params in the API response; the proxy forwards these to decode for coordination. No core engine changes required. Design: https://gist.github.com/markmc/0c10179d49bb7fed8b737e1cfa56f912 Signed-off-by: Seiji Eicher <seiji@anyscale.com>
ac9f7ef to
47b2549
Compare
|
Documentation preview: https://vllm--33947.org.readthedocs.build/en/33947/ |
|
@markmc bumping for your review when you have a chance. Thanks! |
After #27987, Prefill and Decode get different internal
request_ids, breaking P/D coordination in the P2P NCCL connector. The connector currently encodes the remote address into the request_id string and parses it back out with a regex, which is also fragile.Implements the design from https://gist.github.com/markmc/0c10179d49bb7fed8b737e1cfa56f912: switch to
kv_transfer_paramsfollowing the NIXL pattern. The proxy injects the decode instance's KV address before sending to prefill, prefill returns its own address and request ID on completion, and the proxy forwards those to decode.Includes unit tests, updates to both proxy implementations, and design doc changes.
Repro steps
Without fix
With fix