[Fix][MoRI] Align MoRI-IO message format with P2pNcclConnector and vllm-router #39565
Similar to the P2pNcclConnector, we embed the non-standard transfer fields into the zmq_address, which is part of the request_id. These fields (remote_host, remote_port, remote_handshake_port, and remote_notify_port) previously had to be sent by the router, which would require special router logic just for this KV connector. Instead, we follow the P2pNcclConnector approach and put any connector-specific metadata inside the request ID.
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
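To make the idea concrete, here is a minimal sketch of packing the four MoRI-IO transfer fields into a single address string and recovering them on the other side. The `MoRIPeerInfo` class, the `encode_zmq_address`/`decode_zmq_address` helpers, and the `host:port:handshake_port:notify_port` layout are all hypothetical and only illustrate the technique; the PR's actual zmq_address format may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoRIPeerInfo:
    """Hypothetical container for the MoRI-IO transfer fields named in the PR."""

    remote_host: str
    remote_port: int
    remote_handshake_port: int
    remote_notify_port: int


def encode_zmq_address(info: MoRIPeerInfo) -> str:
    # Hypothetical layout: "<host>:<port>:<handshake_port>:<notify_port>".
    return (
        f"{info.remote_host}:{info.remote_port}:"
        f"{info.remote_handshake_port}:{info.remote_notify_port}"
    )


def decode_zmq_address(addr: str) -> MoRIPeerInfo:
    # rsplit from the right so a host containing ':' (e.g. IPv6) stays intact.
    # A string with fewer than three ':' separators raises ValueError on unpacking.
    host, port, handshake_port, notify_port = addr.rsplit(":", 3)
    return MoRIPeerInfo(host, int(port), int(handshake_port), int(notify_port))
```

Because everything lives in one string, the router only has to forward an opaque request ID; it never needs to know which fields a given connector requires.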
Documentation preview: https://vllm--39565.org.readthedocs.build/en/39565/
Re-opened from #38813
Code Review
This pull request refactors MoRI-IO disaggregated serving to embed ZMQ connection metadata within the request_id, allowing the connector to derive peer information without explicit routing parameters. Changes include updating the toy proxy registration logic and adding parsing utilities in the common module. Feedback highlights potential service instability, specifically noting that unhandled exceptions in the background discovery thread could terminate the process and that malformed ZMQ addresses could cause engine crashes during integer conversion.
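The two stability concerns raised in this review can be addressed with defensive parsing and a guarded background loop. The sketch below is illustrative only: `parse_host_port` and `run_guarded` are hypothetical helpers, not functions from the PR or from vLLM, and the `host:port` format is an assumption.

```python
import logging
import threading
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def parse_host_port(addr: str) -> Optional[tuple[str, int]]:
    """Hypothetical helper: return (host, port), or None for a malformed address."""
    host, sep, port_str = addr.rpartition(":")
    if not sep or not host:
        logger.warning("Malformed ZMQ address %r: missing host or port", addr)
        return None
    try:
        port = int(port_str)  # the integer conversion the review flags
    except ValueError:
        logger.warning("Malformed ZMQ address %r: non-integer port", addr)
        return None
    if not 0 < port < 65536:
        logger.warning("Malformed ZMQ address %r: port out of range", addr)
        return None
    return host, port


def run_guarded(
    step: Callable[[], None], stop: threading.Event, interval_s: float = 1.0
) -> None:
    """Hypothetical loop body for a discovery thread.

    Exceptions from step() are logged instead of escaping and killing the thread.
    """
    while not stop.is_set():
        try:
            step()
        except Exception:
            logger.exception("Discovery step failed; retrying")
        stop.wait(interval_s)  # sleeps, but wakes early if stop is set
```

A caller would start it with `threading.Thread(target=run_guarded, args=(step, stop), daemon=True)` and set `stop` on shutdown.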
/gemini review
Code Review
This pull request refactors the MoRI-IO disaggregated serving proxy and connector to communicate connection details through the request ID, aligning with vLLM's routing architecture. It introduces ZMQ address parsing utilities and updates the registration protocol to use a more robust format. Feedback identifies a potential KeyError in the toy proxy's WRITE mode due to a missing transfer ID and suggests updating registration logic to handle instance restarts correctly by refreshing existing entries.
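The suggestion to refresh existing entries on re-registration can be sketched as a registry keyed by each instance's HTTP address, so that a restarted instance overwrites its stale ZMQ address instead of being ignored as a duplicate. `InstanceRegistry`, `InstanceEntry`, and the `"prefill"`/`"decode"` role names are hypothetical and do not reflect the toy proxy's actual data structures.

```python
import threading
import time
from dataclasses import dataclass, field


@dataclass
class InstanceEntry:
    http_address: str
    zmq_address: str
    last_seen: float = field(default_factory=time.monotonic)


class InstanceRegistry:
    """Hypothetical proxy-side registry keyed by HTTP address."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tables: dict[str, dict[str, InstanceEntry]] = {
            "prefill": {},
            "decode": {},
        }

    def _table(self, role: str) -> dict[str, InstanceEntry]:
        if role not in self._tables:
            raise ValueError(f"unknown role {role!r}")
        return self._tables[role]

    def register(self, role: str, http_address: str, zmq_address: str) -> bool:
        """Insert or refresh an entry; return True only if it was new."""
        table = self._table(role)
        with self._lock:
            is_new = http_address not in table
            # Always overwrite: a restarted instance may come back with new ports.
            table[http_address] = InstanceEntry(http_address, zmq_address)
            return is_new

    def zmq_address(self, role: str, http_address: str) -> str:
        table = self._table(role)
        with self._lock:
            return table[http_address].zmq_address
```

Overwriting unconditionally keeps registration idempotent, and the returned flag still lets the proxy log first-time registrations separately from refreshes.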
/gemini review
Code Review
This pull request refactors the MoRI-IO disaggregated serving communication by embedding peer connection information (ZMQ addresses) directly into the request_id. This change eliminates the need for the router to explicitly pass host and port details in kv_transfer_params, aligning the implementation with the P2P-NCCL connector's approach. The toy proxy server and the MoRI-IO connector have been updated to support this new registration and address resolution logic. I have no feedback to provide.
This pull request has merge conflicts that must be resolved before it can be merged.
```python
req_data_copy["kv_transfer_params"].update(
    {
        "do_remote_decode": True,
        "do_remote_prefill": False,
        # ... (diff context truncated)
    }
)
```
Explanation: These were MoRI-specific fields. For uniformity, we instead embed them into the zmq_address, which is then injected into the request ID, similar to P2pNccl.
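A sketch of the request-ID approach described in this comment: the proxy embeds both peers' ZMQ addresses in the request ID, and each connector extracts the address of the opposite side. The `___prefill_addr_` / `___decode_addr_` delimiters are modeled loosely on the P2pNcclConnector example proxy; the exact format MoRI-IO uses is not shown here, and `make_request_id` and `peer_addr_from_request_id` are hypothetical names.

```python
import re
import uuid

# Hypothetical delimiters, modeled loosely on the P2pNcclConnector proxy convention.
_PREFILL_RE = re.compile(r"___prefill_addr_(.*?)___decode_addr_")
_DECODE_RE = re.compile(r"___decode_addr_(.*)_[0-9a-f]{32}")


def make_request_id(prefill_zmq_addr: str, decode_zmq_addr: str) -> str:
    # Proxy side: embed both peers' addresses plus a unique 32-hex-digit suffix.
    return (
        f"___prefill_addr_{prefill_zmq_addr}"
        f"___decode_addr_{decode_zmq_addr}_{uuid.uuid4().hex}"
    )


def peer_addr_from_request_id(request_id: str, is_prefill: bool) -> str:
    # Connector side: a prefill instance needs the decode peer's address
    # and vice versa. search() (not match()) tolerates extra prefixes or
    # suffixes that the serving layer may add around the ID.
    pattern = _DECODE_RE if is_prefill else _PREFILL_RE
    found = pattern.search(request_id)
    if found is None:
        raise ValueError(f"no peer address in request_id {request_id!r}")
    return found.group(1)
```

Since the router just forwards the request ID it already generates or receives, no connector-specific fields have to appear in `kv_transfer_params`.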
Purpose
Fixes #38692.
This PR aligns the message formats of the MoRI-IO KV Connector with the P2pNcclConnector, making MoRI-IO itself compatible with vllm-router with minimal changes required on the router side.
The changes made are:
The benefits of this PR are two-fold:
This already works with the toy proxy. To make the MoRI connector work with vllm-router, we also need these two PRs on the router side:
vllm bench serve)

Co-developed with: @mpashkovskii
Test Plan
We'll compare performance using vllm bench serve and accuracy using GSM8K. Reproducer scripts can be found in this temporary branch: mpashkovskii#4
Below is an example of how to run vllm bench serve with 1P1D on 2 nodes using DSR1, with the MoRIIOConnector and vllm-router:
Build vLLM from source on this branch and include the Broadcom NIC drivers, or simply pull these images, which I already built from this branch using this Dockerfile.
Also pull the router image, or build it (see instructions here):
Then check out the branch containing the reproducer scripts and run
This launches one vLLM prefill instance and the vllm-router on the prefill node and a vLLM decode instance on the decode node, then runs vllm bench serve.

Test Result
2 nodes.
1P1D on two 8xMI300X nodes, DeepSeek-R1-0528 TP8EP8, MoRI-IO KV connector, 1k/1k ISL/OSL.
Note: concurrencies 16 and 32 use eager mode; concurrencies >= 64 use PIECEWISE compilation mode on the decode instance.
See full bench results
Concurrency: 16
Concurrency: 32
Concurrency: 64 (PIECEWISE CUDA graphs on the decode instance):
1 node
1P1D on two MI300X devices, Qwen3-8B, MoRI-IO KV connector, 1k/1k ISL/OSL
See full bench results
Concurrency: 16
GSM8K