Skip to content

[ROCm][P/D][MORI][BugFix] Ensure correct api is used when making requests to prefill / decode nodes#39835

Merged
tjtanaa merged 9 commits into
vllm-project:mainfrom
rasmith:ransmith_fix_moriio_lm_eval
Apr 22, 2026
Merged

[ROCm][P/D][MORI][BugFix] Ensure correct api is used when making requests to prefill / decode nodes#39835
tjtanaa merged 9 commits into
vllm-project:mainfrom
rasmith:ransmith_fix_moriio_lm_eval

Conversation

@rasmith
Copy link
Copy Markdown
Contributor

@rasmith rasmith commented Apr 14, 2026

Purpose

This PR fixes the MORI IO KV connector so it uses the correct API when making requests to prefill and decode instances.
Currently, this is broken since the request URL has a /v1/completions suffix which should instead be just /v1. Furthermore, the routes /v1/completions and /v1/chat/completions are not differentiated by the handler, which contributes to the problem. Instead, this PR creates individual routes which then call the main handler function. Finally, the correct URL with the correct API call is used, which fixes the problem.

Test Plan

Please see the Justfile from this PR and the instructions included here. Add this to the Justfile:

eval:
  lm_eval --model local-chat-completions \
    --model_args model={{MODEL}},\
          base_url=http://localhost:10001/v1/chat/completions,\
          num_concurrent=1 \
    --tasks gsm8k \
    --num_fewshot 5 \
    --apply_chat_template \
    --batch_size 1 \
    --gen_kwargs '{"max_tokens": 4096}'

and run:
just -f Justfile.mori eval
where Justfile.mori is the name of the Justfile I chose, but you can pick a different one.

Test Result

The lm_eval session ran successfully and I got the result:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9462|±  |0.0062|
|     |       |strict-match    |     5|exact_match|↑  |0.8893|±  |0.0086|

Essential Elements of an Effective PR Description Checklist
  • [X ] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • [ X] The test plan, such as providing test command.
  • [ X] The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

rasmith added 2 commits April 14, 2026 21:37
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Apr 14, 2026

Documentation preview: https://vllm--39835.org.readthedocs.build/en/39835/

@mergify mergify Bot added documentation Improvements or additions to documentation rocm Related to AMD ROCm bug Something isn't working labels Apr 14, 2026
@mergify mergify Bot added the kv-connector label Apr 14, 2026
@github-project-automation github-project-automation Bot moved this to Todo in AMD Apr 14, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the toy proxy server to support both completions and chat completions by modularizing the request handler and dynamically constructing endpoint URLs. It also improves error messaging and updates the ping address in the Moriio connector. A critical syntax error was introduced in moriio_toy_proxy_server.py where a stray closing parenthesis will prevent the code from running.

Comment thread examples/online_serving/disaggregated_serving/moriio_toy_proxy_server.py Outdated
rasmith added 3 commits April 14, 2026 21:58
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
@gshtras gshtras added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 15, 2026
@rasmith rasmith requested a review from xuechendi as a code owner April 17, 2026 18:43
@rasmith
Copy link
Copy Markdown
Contributor Author

rasmith commented Apr 17, 2026

@inkcherry Please take a look?

Copy link
Copy Markdown
Contributor

@inkcherry inkcherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! @rasmith

@tjtanaa tjtanaa merged commit cefa528 into vllm-project:main Apr 22, 2026
58 of 59 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in AMD Apr 22, 2026
Copilot AI pushed a commit to hongbolv/vllm that referenced this pull request Apr 22, 2026
…ests to prefill / decode nodes (vllm-project#39835)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
baonudesifeizhai pushed a commit to baonudesifeizhai/vllm that referenced this pull request Apr 23, 2026
…ests to prefill / decode nodes (vllm-project#39835)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
yzong-rh pushed a commit to yzong-rh/vllm that referenced this pull request Apr 23, 2026
…ests to prefill / decode nodes (vllm-project#39835)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Yifan <yzong@redhat.com>
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
…ests to prefill / decode nodes (vllm-project#39835)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
jaeyoun98 pushed a commit to jaeyoun98/vllm that referenced this pull request Apr 30, 2026
…V transfer

This PR bundles three independent fixes for the MoRIIO-based P/D
disaggregation path (no overlap with open PRs vllm-project#40344, vllm-project#39276, vllm-project#32630,
vllm-project#39835).

1. Proxy: handle max_completion_tokens for chat completions API.
   The OpenAI Chat Completions API uses max_completion_tokens; the toy
   proxy was only decrementing max_tokens, dropping the field on
   forward and causing the backend to use its default. When both
   fields are present, max_completion_tokens takes precedence per the
   OpenAI spec, so decrement that one first; fall back to max_tokens
   otherwise.

2. Proxy: add /health endpoint.
   Benchmark harnesses (e.g. vllm bench serve) perform health probes
   against the proxy before sending traffic. Without this endpoint the
   probe fails and the benchmark aborts. Add a minimal 200-OK
   responder that also reports the current instance counts.

3. Engine: defer write task when remote block_ids is None.
   When a prefill-side write task arrives before the scheduler has
   populated the remote block_ids (async handshake race), the previous
   code silently dropped the task. The fix is in _is_remote_ready:
   it now returns False when block_ids is still None, so the existing
   outer deferral path (in _write_worker_loop and _process_deferred_tasks)
   re-queues the task safely. The inner None-check inside
   _execute_write_task is removed; appending to self._deferred_tasks
   from inside the iteration in _process_deferred_tasks would have
   been clobbered by the still_deferred overwrite at end of loop.

All three are narrow, independent fixes. They do not modify the
scheduler's hot path or introduce new control flow. Verified to apply
cleanly on upstream/main as of 2026-04-22.

This change was developed with AI assistance; the author has reviewed
and tested each hunk and is accountable for the behavior.

Signed-off-by: Chaemin Lim <chaemin.lim@mangoboost.io>
Lafunamor pushed a commit to Lafunamor/vllm that referenced this pull request May 1, 2026
…ests to prefill / decode nodes (vllm-project#39835)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Adrian <info@zzit.ch>
Copilot AI pushed a commit to hongbolv/vllm that referenced this pull request May 7, 2026
…ests to prefill / decode nodes (vllm-project#39835)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
weifang231 pushed a commit to weifang231/eb-vllm that referenced this pull request May 13, 2026
…ests to prefill / decode nodes (vllm-project#39835)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…ests to prefill / decode nodes (vllm-project#39835)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…ests to prefill / decode nodes (vllm-project#39835)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026
…ests to prefill / decode nodes (vllm-project#39835)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026
…ests to prefill / decode nodes (vllm-project#39835)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
brian-dellabetta pushed a commit to neuralmagic/vllm that referenced this pull request May 29, 2026
…ests to prefill / decode nodes (vllm-project#39835)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
…ests to prefill / decode nodes (vllm-project#39835)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working documentation Improvements or additions to documentation kv-connector ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants