[Bugfix] fix DP-aware routing in OpenAI API requests #29002
njhill merged 10 commits into vllm-project:main from
Conversation
Code Review
This pull request aims to fix an issue with Data Parallelism-aware routing by propagating the data_parallel_rank to the engine's processor. The changes correctly pass the rank through serving_completion.py and serving_engine.py.
However, I've identified a critical issue in serving_engine.py where the signature of _process_inputs is changed in a way that breaks other parts of the code and has an incorrect type hint. I've left a comment with a suggested fix to make the new argument optional, which will prevent runtime errors.
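For illustration only, here is a minimal sketch of the kind of backwards-compatible signature the review is asking for; the class name, surrounding parameters, and types are simplified assumptions, not the actual vLLM code:

```python
from typing import Any


class ServingEngineSketch:
    """Illustrative stand-in for the serving layer, not the real vLLM class."""

    async def _process_inputs(
        self,
        request_id: str,
        prompt: Any,
        params: Any,
        *,
        lora_request: Any = None,
        trace_headers: dict[str, str] | None = None,
        priority: int = 0,
        # Assumed shape: defaulting to None keeps existing call sites working,
        # and callers that omit the rank fall back to default DP routing.
        data_parallel_rank: int | None = None,
    ) -> dict[str, Any]:
        # The real method would forward everything, including
        # data_parallel_rank, to the engine processor.
        return {
            "request_id": request_id,
            "prompt": prompt,
            "params": params,
            "data_parallel_rank": data_parallel_rank,
        }
```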
💡 Codex Review
Here are some automated review suggestions for this pull request.
Force-pushed from 7d32d92 to af200e8
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
Force-pushed from af200e8 to 3c79698
    lora_request=lora_request,
    trace_headers=trace_headers,
    priority=request.priority,
    data_parallel_rank=data_parallel_rank,
Looks like a similar fix is required in serving_chat.py?
It would also be good to catch this issue in the test_serving_chat_data_parallel_rank_extraction test.
Even better, it would be great to add a similar test for serving_completion!
@markmc Thanks for the comment. I've added it to serving_chat.py.
I noticed that mock objects won't trigger this error, so I added a test with a real engine for coverage, placed after the test_dp_rank_argument test.
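For context, a rough sketch of the kind of header-extraction check such a test could exercise; the helper function below is a hypothetical stand-in, not the actual serving-layer code or the real test:

```python
# Hypothetical helper mirroring how the serving layer might read the header;
# the real test drives an actual engine instead of calling a helper like this.
def parse_dp_rank_header(headers: dict[str, str]) -> int | None:
    value = headers.get("X-data-parallel-rank")
    return int(value) if value is not None else None


def test_dp_rank_header_extraction() -> None:
    # A supplied header should be parsed into an integer rank...
    assert parse_dp_rank_header({"X-data-parallel-rank": "1"}) == 1
    # ...and an absent header should fall back to None (default DP routing).
    assert parse_dp_rank_header({}) is None
```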
    lora_request=lora_request,
    trace_headers=trace_headers,
    priority=request.priority,
    data_parallel_rank=data_parallel_rank,
Could this be a breaking change for the current API?
Actually no. When unspecified, it defaults to None and the default DP load-balancing algorithm is used.
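As a client-side illustration of that behaviour (the URL, model name, and payload below are placeholders):

```python
import requests

# Requests that set the header ask to be routed to the given DP rank; requests
# that omit it fall back to the default DP load-balancing behaviour.
payload = {"model": "my-model", "prompt": "Hello", "max_tokens": 8}
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json=payload,
    headers={"X-data-parallel-rank": "0"},  # drop this header for default routing
    timeout=30,
)
print(resp.status_code)
```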
@inkcherry could you clarify what failure you saw: a crash or a request failure? Any repro steps would be helpful. I think
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
Thank you for your great work and feedback. I noticed that it does not crash; instead, it fails under certain circumstances (equivalent to the rank not being specified). I have added tests to cover this. I agree with your observation; I did not add support for new endpoints.
njhill left a comment
Thanks @inkcherry!
This doesn't need to be addressed in this PR, but I don't see why we would only want to support this header on the chat and completion endpoints; it could apply similarly to all of the endpoints.
Purpose
fix #24945
In add_request, duplicate initialization is skipped, but during the earlier self.processor.process_inputs call, data_parallel_rank is not initialized. Using -H 'X-data-parallel-rank' to specify the data parallel rank would be invalid in this case. cc @njhill
vllm/v1/engine/async_llm.py, lines 283 to 302 in d69062c
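A simplified sketch of the flow the fix restores; the function and class names below are illustrative, while the real call sites live in serving_completion.py, serving_chat.py, and the engine processor:

```python
from typing import Any


class ProcessorSketch:
    """Stand-in for the engine processor, showing only the relevant argument."""

    def process_inputs(
        self, prompt: Any, *, data_parallel_rank: int | None = None
    ) -> dict[str, Any]:
        # Before the fix, the serving layer never forwarded the rank here, so a
        # rank supplied via the X-data-parallel-rank header was effectively ignored.
        return {"prompt": prompt, "data_parallel_rank": data_parallel_rank}


def handle_request(
    processor: ProcessorSketch, prompt: Any, headers: dict[str, str]
) -> dict[str, Any]:
    # Parse the header if present and propagate it down to process_inputs,
    # which is the behaviour this PR adds for the completion and chat paths.
    raw = headers.get("X-data-parallel-rank")
    rank = int(raw) if raw is not None else None
    return processor.process_inputs(prompt, data_parallel_rank=rank)
```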
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.