[ROCm][P/D][MORI][BugFix] Ensure correct api is used when making requests to prefill / decode nodes by rasmith · Pull Request #39835 · vllm-project/vllm

rasmith · 2026-04-14T21:55:09Z

Purpose

This PR fixes the MORI IO KV connector so it uses the correct API when making requests to prefill and decode instances.
Currently, this is broken since the request URL has a /v1/completions suffix which should instead be just /v1. Furthermore, the routes /v1/completions and /v1/chat/completions are not differentiated by the handler, which contributes to the problem. Instead, this PR creates individual routes which then call the main handler function. Finally, the correct URL with the correct API call is used, which fixes the problem.

Test Plan

Please see the Justfile from this PR and the instructions included here. Add this to the Justfile:

eval:
  lm_eval --model local-chat-completions \
    --model_args model={{MODEL}},\
          base_url=http://localhost:10001/v1/chat/completions,\
          num_concurrent=1 \
    --tasks gsm8k \
    --num_fewshot 5 \
    --apply_chat_template \
    --batch_size 1 \
    --gen_kwargs '{"max_tokens": 4096}'

and run:
just -f Justfile.mori eval
where Justfile.mori is the name of the Justfile I chose, but you can pick a different one.

Test Result

The lm_eval session ran successfully and I got the result:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9462|±  |0.0062|
|     |       |strict-match    |     5|exact_match|↑  |0.8893|±  |0.0086|

Essential Elements of an Effective PR Description Checklist

[X ] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
[ X] The test plan, such as providing test command.
[ X] The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

mergify · 2026-04-14T21:55:52Z

Documentation preview: https://vllm--39835.org.readthedocs.build/en/39835/

gemini-code-assist

Code Review

This pull request refactors the toy proxy server to support both completions and chat completions by modularizing the request handler and dynamically constructing endpoint URLs. It also improves error messaging and updates the ping address in the Moriio connector. A critical syntax error was introduced in moriio_toy_proxy_server.py where a stray closing parenthesis will prevent the code from running.

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

… into ransmith_fix_moriio_lm_eval

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

rasmith · 2026-04-17T18:43:15Z

@inkcherry Please take a look?

inkcherry

LGTM, thanks! @rasmith

…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com> Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>

…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com>

…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: Yifan <yzong@redhat.com>

…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>

…V transfer This PR bundles three independent fixes for the MoRIIO-based P/D disaggregation path (no overlap with open PRs vllm-project#40344, vllm-project#39276, vllm-project#32630, vllm-project#39835). 1. Proxy: handle max_completion_tokens for chat completions API. The OpenAI Chat Completions API uses max_completion_tokens; the toy proxy was only decrementing max_tokens, dropping the field on forward and causing the backend to use its default. When both fields are present, max_completion_tokens takes precedence per the OpenAI spec, so decrement that one first; fall back to max_tokens otherwise. 2. Proxy: add /health endpoint. Benchmark harnesses (e.g. vllm bench serve) perform health probes against the proxy before sending traffic. Without this endpoint the probe fails and the benchmark aborts. Add a minimal 200-OK responder that also reports the current instance counts. 3. Engine: defer write task when remote block_ids is None. When a prefill-side write task arrives before the scheduler has populated the remote block_ids (async handshake race), the previous code silently dropped the task. The fix is in _is_remote_ready: it now returns False when block_ids is still None, so the existing outer deferral path (in _write_worker_loop and _process_deferred_tasks) re-queues the task safely. The inner None-check inside _execute_write_task is removed; appending to self._deferred_tasks from inside the iteration in _process_deferred_tasks would have been clobbered by the still_deferred overwrite at end of loop. All three are narrow, independent fixes. They do not modify the scheduler's hot path or introduce new control flow. Verified to apply cleanly on upstream/main as of 2026-04-22. This change was developed with AI assistance; the author has reviewed and tested each hunk and is accountable for the behavior. Signed-off-by: Chaemin Lim <chaemin.lim@mangoboost.io>

…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: Adrian <info@zzit.ch>

…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com> Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>

…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com>

…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

rasmith added 2 commits April 14, 2026 21:37

Ensure correct api used

a40feb1

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

typo

43a8b61

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

rasmith requested review from ApostaC, NickLucche and orozery as code owners April 14, 2026 21:55

Merge branch 'main' into ransmith_fix_moriio_lm_eval

49ea7de

mergify Bot added documentation Improvements or additions to documentation rocm Related to AMD ROCm bug Something isn't working labels Apr 14, 2026

github-project-automation Bot added this to AMD Apr 14, 2026

mergify Bot added the kv-connector label Apr 14, 2026

github-project-automation Bot moved this to Todo in AMD Apr 14, 2026

gemini-code-assist Bot reviewed Apr 14, 2026

View reviewed changes

Comment thread examples/online_serving/disaggregated_serving/moriio_toy_proxy_server.py Outdated

rasmith added 3 commits April 14, 2026 21:58

move error message

8145157

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

Merge branch 'ransmith_fix_moriio_lm_eval' of github.com:rasmith/vllm…

78369fb

… into ransmith_fix_moriio_lm_eval

fix typo

5deb59b

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

gshtras approved these changes Apr 15, 2026

View reviewed changes

gshtras added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 15, 2026

rasmith added 2 commits April 15, 2026 14:01

Merge branch 'main' into ransmith_fix_moriio_lm_eval

527b06e

Merge branch 'main' into ransmith_fix_moriio_lm_eval

73c8f73

rasmith requested a review from xuechendi as a code owner April 17, 2026 18:43

inkcherry approved these changes Apr 20, 2026

View reviewed changes

Merge branch 'main' into ransmith_fix_moriio_lm_eval

ce70344

tjtanaa merged commit cefa528 into vllm-project:main Apr 22, 2026
58 of 59 checks passed

github-project-automation Bot moved this from Todo to Done in AMD Apr 22, 2026

baonudesifeizhai pushed a commit to baonudesifeizhai/vllm that referenced this pull request Apr 23, 2026

[ROCm][P/D][MORI][BugFix] Ensure correct api is used when making requ…

ff31e7a

…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com>

jaeyoun98 mentioned this pull request Apr 29, 2026

[Bugfix][MoRIIO] Proxy robustness and write-task deferral for async KV transfer #41242

Open

4 tasks

weifang231 pushed a commit to weifang231/eb-vllm that referenced this pull request May 13, 2026

[ROCm][P/D][MORI][BugFix] Ensure correct api is used when making requ…

3e17b9d

…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com>

mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026

[ROCm][P/D][MORI][BugFix] Ensure correct api is used when making requ…

d8f1782

…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com>

jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026

[ROCm][P/D][MORI][BugFix] Ensure correct api is used when making requ…

acf1321

…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com>

brian-dellabetta pushed a commit to neuralmagic/vllm that referenced this pull request May 29, 2026

[ROCm][P/D][MORI][BugFix] Ensure correct api is used when making requ…

c1fb908

…ests to prefill / decode nodes (vllm-project#39835) Signed-off-by: Randall Smith <Randall.Smith@amd.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[ROCm][P/D][MORI][BugFix] Ensure correct api is used when making requests to prefill / decode nodes#39835

[ROCm][P/D][MORI][BugFix] Ensure correct api is used when making requests to prefill / decode nodes#39835
tjtanaa merged 9 commits into
vllm-project:mainfrom
rasmith:ransmith_fix_moriio_lm_eval

rasmith commented Apr 14, 2026 •

edited by github-actions Bot

Loading

Uh oh!

mergify Bot commented Apr 14, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

rasmith commented Apr 17, 2026

Uh oh!

inkcherry left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

rasmith commented Apr 14, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

Uh oh!

mergify Bot commented Apr 14, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

rasmith commented Apr 17, 2026

Uh oh!

inkcherry left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

rasmith commented Apr 14, 2026 •

edited by github-actions Bot

Loading