[ROCm][P/D][MORI][BugFix] Add transfer_id for moriio_connector so moriio_connector to restore P/D functionality#34907
Conversation
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
Documentation preview: https://vllm--34907.org.readthedocs.build/en/34907/ |
There was a problem hiding this comment.
Code Review
The pull request introduces a transfer_id to the MORI KV connector to maintain consistent request-to-transfer mappings across scheduler and worker processes, addressing issues caused by random suffixes in completion IDs. While the architectural change is appropriate, the current implementation contains several critical issues, including NameErrors in log messages and exception handlers, and potential runtime crashes due to missing keys in mapping dictionaries, particularly for aborted requests or when operating in non-WRITE modes.
vllm/distributed/kv_transfer/kv_connector/v1/moriio/moriio_connector.py
Outdated
Show resolved
Hide resolved
vllm/distributed/kv_transfer/kv_connector/v1/moriio/moriio_connector.py
Outdated
Show resolved
Hide resolved
vllm/distributed/kv_transfer/kv_connector/v1/moriio/moriio_engine.py
Outdated
Show resolved
Hide resolved
vllm/distributed/kv_transfer/kv_connector/v1/moriio/moriio_engine.py
Outdated
Show resolved
Hide resolved
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
Hi @rasmith, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
| # Reqs to send and their expiration time | ||
| self._reqs_need_send: dict[ReqId, float] = {} | ||
| self.paths: dict[str, zmq.Socket] = {} | ||
| self.transfer_id_to_request_id: dict[TransferId, ReqId] = {} |
There was a problem hiding this comment.
Thanks @rasmith a lot! Would it be better to define two mapping functions here?
There was a problem hiding this comment.
Depending on the function name, this could save some horizontal space, but what other benefit could be had?
There was a problem hiding this comment.
For versatility and maintainability, if it's only used within the class, I think using a dictionary is also fine.
There was a problem hiding this comment.
Not sure what you meant actually, but I added a map and unmap functions for request-id-transfer-id pairs. If you mean to swap the dictionaries for functions, its also possible to implement a class with __get_item__ and use that instead. It should be pretty easy, so if the time arises it shouldn't be too hard, but for the time being I don't see a need for it.
|
@orozery @NickLucche @ApostaC Please take a look? |
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
@rasmith |
|
@vllmellm Let's help to verify this along with the environment variable to help land this PR. |
It doesn't need randomization to be disabled. |
|
@rasmith please try to fix the doc CI error. |
|
Verified both RDMA write and read modes on Benchmark config: Both modes complete successfully without needing to set
|
|
Added the |
|
@inkcherry Please take another look? |
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Fixed |
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
…roxy Replace NixlConnector with MoRIIOConnector for KV cache transfer and replace the Rust-based vllm-router with a MoRI-IO-aware Python proxy that handles both HTTP routing and ZMQ-based RDMA endpoint discovery. The key architectural change is that the proxy enriches each request's kv_transfer_params with remote RDMA endpoint info (handshake_port, notify_port, host, port) before dispatching, enabling concurrent prefill+decode in WRITE mode — something vllm-router could not do because it only understands HTTP, not the MoRI-IO registration protocol. Changes: - Add moriio_proxy.py: MoRI-IO-aware proxy with ZMQ service discovery, request enrichment, and /health endpoint (adapted from vLLM upstream moriio_toy_proxy_server.py) - server.sh: switch --kv-transfer-config from NixlConnector to MoRIIOConnector with kv_connector_extra_config (proxy_ip, proxy_ping_port, http_port); launch proxy before prefill on NODE_RANK=0; set VLLM_DISABLE_REQUEST_ID_RANDOMIZATION=1 as workaround for v0.17.1 completion-ID mismatch (upstream fix: vllm-project/vllm#34907) - setup_deps.sh: replace vllm-router/Rust install with lightweight Python deps (quart, aiohttp, msgpack, pyzmq) for the proxy Benchmark (Job 2853 vs 2818 NixlConnector baseline, ISL/OSL=1024): TTFT median: -37% to -55% across C8–C64 (e.g. 384→241ms @C64) TTFT p99: -63% at C64 (6622→2469ms) Throughput: +8% at C64 (2634→2844 tok/s) TPOT: unchanged (~22ms @C64)
|
Issue: lm_eval fails against the disaggregated proxy Issue 1 — endpoint mismatch (400 error) When running The proxy's registered Issue 2 — wrong content-type The proxy response carries Attempted quick fix
def rewrite_url(registered_url, path):
return re.sub(r'/v1/(chat/)?completions$', path, registered_url)
response.headers['Content-Type'] = decode_response.headers.get('Content-Type', 'application/json')GSM8K (5-shot, 50 samples, RDMA write mode): lm_eval --model local-chat-completions \
--model_args model=Qwen/Qwen3-235B-A22B-FP8,base_url=http://localhost:10001/v1/chat/completions,tokenizer=Qwen/Qwen3-235B-A22B-FP8,num_concurrent=1 \
--tasks gsm8k --num_fewshot 5 --apply_chat_template --batch_size 1 \
--gen_kwargs '{"max_tokens": 4096}'
|
@junkang1991 This just disables chat completion if I am correct. Is the issue you describe a problem with this PR? |
@rasmith @inkcherry can you guys help to take a look at this problem? |
|
Is this some kind of test group that fails? |
|
@AndreasKaratzas it seems in this PR, The NIXL tests are not triggered automatically. I have just manually triggered one of the tests and I will come back and check the status again, to know if it captures the failure. |
|
Thanks @junkang1991, @tjtanaa for your efforts. @knitcapcat-amd let's refine the proxy example based on the feedback to ensure better compatibility across more scenarios. Contributions from @rasmith are also very welcome. |
Indeed, this PR is behind the V1 others regression on ROCm too. |
@tjtanaa @inkcherry I'll look into it |
…iio_connector to restore P/D functionality (vllm-project#34907) Signed-off-by: Randall Smith <Randall.Smith@amd.com>
…iio_connector to restore P/D functionality (vllm-project#34907) Signed-off-by: Randall Smith <Randall.Smith@amd.com>
…iio_connector to restore P/D functionality (vllm-project#34907) Signed-off-by: Randall Smith <Randall.Smith@amd.com>
…iio_connector to restore P/D functionality (vllm-project#34907) Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Purpose
This PR introduced random suffixes to completion ids to better differentiate services in logs. That PR was able to update the other KV connector implementations, but the modifications for those KV connectors could not be applied to the MORI KV connector.
The result is that the MORI KV connector hangs indefinitely, because it is waiting for a completion id that will never arrive.
This PR was opened to fix the problem, but the author has ceased working on it. It seems that this was done in favor of this PR. However, the second PR attempts to modify the scheduler which is undesirable.
The first PR introduced a transaction id, which seems the most natural solution, and maintains some consistency with the other KV connector solutions. However, one of the issues with getting the MORI KV connector to work is that the decode worker needs to receive the completion id and furthermore needs to avoid any race conditions when associating a transaction id with a completion id.
This PR stashes the
transfer_id_to_request_idrelationship in theMoRIIOConnectorMetadata, which the scheduler will store in theSchedulerOutput. TheMoRIIOConnectorMetadatastored in theSchedulerOutputwill then be received bystart_load_kvin theMoRIIOConnectorWorker, which finally allows both the scheduler and worker processes to maintain the same relationship of transfer id and completion id pairings.Test Plan
I tested 1P1D on a single machine and with two machines to check that functionality had been restored. I used the following justfile:
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.