[ROCm][P/D][MORI][BugFix] Ensure correct api is used when making requests to prefill / decode nodes #39835
Merged
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Documentation preview: https://vllm--39835.org.readthedocs.build/en/39835/
Code Review
This pull request refactors the toy proxy server to support both completions and chat completions by modularizing the request handler and dynamically constructing endpoint URLs. It also improves error messaging and updates the ping address in the Moriio connector. A critical syntax error was introduced in moriio_toy_proxy_server.py where a stray closing parenthesis will prevent the code from running.
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
… into ransmith_fix_moriio_lm_eval
gshtras
approved these changes
Apr 15, 2026
@inkcherry Please take a look?
Copilot AI
pushed a commit
to hongbolv/vllm
that referenced
this pull request
Apr 22, 2026
…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com> Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
baonudesifeizhai
pushed a commit
to baonudesifeizhai/vllm
that referenced
this pull request
Apr 23, 2026
…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com>
yzong-rh
pushed a commit
to yzong-rh/vllm
that referenced
this pull request
Apr 23, 2026
…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: Yifan <yzong@redhat.com>
avinashsingh77
pushed a commit
to avinashsingh77/vllm
that referenced
this pull request
Apr 27, 2026
…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
jaeyoun98
pushed a commit
to jaeyoun98/vllm
that referenced
this pull request
Apr 30, 2026
…V transfer
This PR bundles three independent fixes for the MoRIIO-based P/D disaggregation path (no overlap with open PRs vllm-project#40344, vllm-project#39276, vllm-project#32630, vllm-project#39835).
1. Proxy: handle max_completion_tokens for chat completions API. The OpenAI Chat Completions API uses max_completion_tokens; the toy proxy was only decrementing max_tokens, dropping the field on forward and causing the backend to use its default. When both fields are present, max_completion_tokens takes precedence per the OpenAI spec, so decrement that one first; fall back to max_tokens otherwise.
2. Proxy: add /health endpoint. Benchmark harnesses (e.g. vllm bench serve) perform health probes against the proxy before sending traffic. Without this endpoint the probe fails and the benchmark aborts. Add a minimal 200-OK responder that also reports the current instance counts.
3. Engine: defer write task when remote block_ids is None. When a prefill-side write task arrives before the scheduler has populated the remote block_ids (async handshake race), the previous code silently dropped the task. The fix is in _is_remote_ready: it now returns False when block_ids is still None, so the existing outer deferral path (in _write_worker_loop and _process_deferred_tasks) re-queues the task safely. The inner None-check inside _execute_write_task is removed; appending to self._deferred_tasks from inside the iteration in _process_deferred_tasks would have been clobbered by the still_deferred overwrite at end of loop.
All three are narrow, independent fixes. They do not modify the scheduler's hot path or introduce new control flow. Verified to apply cleanly on upstream/main as of 2026-04-22.
This change was developed with AI assistance; the author has reviewed and tested each hunk and is accountable for the behavior.
Signed-off-by: Chaemin Lim <chaemin.lim@mangoboost.io>
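The field precedence described in item 1 of that commit message can be sketched roughly as below. This is only an illustration under assumptions: the function name decrement_token_budget is hypothetical and not taken from the commit, and edge cases (for example a limit that is already 1) are left out.

```python
# Hypothetical helper; the name and shape are illustrative, not the commit's code.
def decrement_token_budget(payload: dict) -> dict:
    """Return a copy of the request payload with one token limit reduced by one.

    max_completion_tokens is preferred when present; max_tokens is the fallback.
    Only the first field found is changed, and the input dict is not mutated.
    """
    updated = dict(payload)
    for field in ("max_completion_tokens", "max_tokens"):
        value = updated.get(field)
        if value is not None:
            updated[field] = value - 1
            break
    return updated


if __name__ == "__main__":
    request = {"max_completion_tokens": 16, "max_tokens": 32}
    print(decrement_token_budget(request))
```

Returning a copy instead of editing the payload in place keeps the original request available if the proxy needs to forward it unchanged elsewhere.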
Lafunamor
pushed a commit
to Lafunamor/vllm
that referenced
this pull request
May 1, 2026
…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: Adrian <info@zzit.ch>
Copilot AI
pushed a commit
to hongbolv/vllm
that referenced
this pull request
May 7, 2026
…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com> Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
weifang231
pushed a commit
to weifang231/eb-vllm
that referenced
this pull request
May 13, 2026
…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com>
my-other-github-account
pushed a commit
to my-other-github-account/vllm
that referenced
this pull request
May 15, 2026
…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com>
mfylcek
pushed a commit
to mfylcek/vllm
that referenced
this pull request
May 19, 2026
…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com>
jhu960213
pushed a commit
to jhu960213/vllm
that referenced
this pull request
May 20, 2026
…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com>
brian-dellabetta
pushed a commit
to neuralmagic/vllm
that referenced
this pull request
May 29, 2026
…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com>
mvanhorn
pushed a commit
to mvanhorn/vllm
that referenced
this pull request
Jun 4, 2026
…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Purpose
This PR fixes the MORI IO KV connector so it uses the correct API when making requests to prefill and decode instances.
Currently, this is broken because the request URL has a /v1/completions suffix when it should be just /v1. Furthermore, the handler does not differentiate between the routes /v1/completions and /v1/chat/completions, which contributes to the problem. Instead, this PR creates individual routes that each call the main handler function. Finally, the correct URL with the correct API call is used, which fixes the problem.
Test Plan
Please see the Justfile from this PR and the instructions included here. Add the eval recipe to the Justfile and run:
just -f Justfile.mori eval
where Justfile.mori is the name of the Justfile I chose, but you can pick a different one.
Test Result
The lm_eval session ran successfully and produced a result.
Essential Elements of an Effective PR Description Checklist
… supported_models.md and examples for a new model.
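The routing change described under Purpose can be sketched as follows. This is a minimal illustration, not the code from moriio_toy_proxy_server.py: every function and constant name here is hypothetical, and the web framework and HTTP client are left out so that only the URL construction is shown. The assumption is that each node is registered with a base URL ending in /v1 and that each route passes its own API path to one shared handler.

```python
# Hypothetical sketch; names are illustrative and not taken from the PR.
from urllib.parse import urlsplit

COMPLETIONS = "/completions"
CHAT_COMPLETIONS = "/chat/completions"


def build_endpoint_url(base_url: str, api_path: str) -> str:
    """Join a node base URL ending in /v1 with the API path for this request."""
    base = base_url.rstrip("/")
    # Reject bases that already carry an API suffix such as /v1/completions.
    if not urlsplit(base).path.endswith("/v1"):
        raise ValueError(f"expected a base URL ending in /v1, got {base_url!r}")
    return base + api_path


def handle_request(api_path: str, prefill_base: str, decode_base: str) -> dict:
    """Shared handler: resolve both upstream URLs for the API the client called."""
    return {
        "prefill": build_endpoint_url(prefill_base, api_path),
        "decode": build_endpoint_url(decode_base, api_path),
    }


# One thin entry point per route; each forwards its own API path.
def handle_completions(prefill_base: str, decode_base: str) -> dict:
    return handle_request(COMPLETIONS, prefill_base, decode_base)


def handle_chat_completions(prefill_base: str, decode_base: str) -> dict:
    return handle_request(CHAT_COMPLETIONS, prefill_base, decode_base)


if __name__ == "__main__":
    # Example addresses are placeholders.
    print(handle_chat_completions("http://10.0.0.1:8100/v1", "http://10.0.0.2:8200/v1"))
```

Keeping the base URL at /v1 and appending the path per route means a chat request cannot be forwarded to the completions endpoint by accident.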