
Conversation

@njhill njhill commented Nov 7, 2025

This is a re-apply of #28012, which was reverted in #28289 due to a bug in aggregating KV connector outputs that broke, for example, Nixl P/D for TP > 1.

The second commit fixes that issue. The original PR was itself a fix for a significant performance regression.

As a follow-on, I will likely refactor this a little further and improve the test coverage.
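
For context on the bug that triggered the revert, the snippet below is a rough illustration of why per-rank KV connector outputs need aggregating when TP > 1. The function name and behavior are hypothetical stand-ins, not vLLM's actual KVOutputAggregator, assuming a request only counts as finished once every tensor-parallel rank has reported it.

```python
# Hypothetical illustration of per-rank KV-transfer aggregation for TP > 1;
# not vLLM's KVOutputAggregator. A request is treated as finished only after
# all tensor-parallel ranks have reported it.
from collections import Counter


def aggregate_finished(per_rank_finished: list[set[str]],
                       world_size: int) -> set[str]:
    """Return request IDs that every tensor-parallel rank reported finished."""
    counts: Counter = Counter()
    for finished in per_rank_finished:
        counts.update(finished)
    return {req_id for req_id, n in counts.items() if n == world_size}


# With TP=2, "req-a" finished on both ranks but "req-b" on only one so far:
assert aggregate_finished([{"req-a", "req-b"}, {"req-a"}], world_size=2) == {"req-a"}
```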


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the multiprocessing executor to remove the I/O thread pool, aiming to fix a performance regression. The core change involves a new threadless FutureWrapper and a manual future queue to manage asynchronous RPC calls.

While this is a clever way to avoid thread overhead, the new FutureWrapper implementations in both multiproc_executor.py and ray_utils.py have dropped support for timeouts, which is a functional regression from the standard Future API. Furthermore, the interaction between KVOutputAggregator and the new FutureWrapper in multiproc_executor is complex, tightly coupled, and introduces a critical bug that will cause a crash if a timeout is used. I've provided detailed comments and suggestions to address these issues.
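
To make the review comment above concrete, here is a minimal sketch of the "threadless future" pattern it describes, assuming RPC responses arrive in the order the calls were issued. The names (`FutureWrapper`, `pending`, `recv`) and the `queue.Queue` transport in the demo are illustrative stand-ins, not vLLM's actual multiproc_executor.py code; the sketch also shows one way a `timeout` could be threaded through, since the review flags that the real implementation dropped it.

```python
import queue
from collections import deque
from concurrent.futures import Future
from typing import Any, Callable, Optional


class FutureWrapper(Future):
    """A Future completed lazily by whichever caller asks for its result."""

    def __init__(self, pending: deque, recv: Callable[[Optional[float]], Any]):
        super().__init__()
        self._pending = pending   # futures still awaiting responses, oldest first
        self._recv = recv         # blocking read of the next RPC response
        pending.append(self)

    def result(self, timeout: Optional[float] = None) -> Any:
        # Responses come back in the order the RPCs were issued, so complete
        # earlier pending futures first, then our own. Passing the timeout
        # into recv() is one way to keep the standard Future semantics.
        # (Error/timeout propagation is left out of this sketch.)
        while not self.done():
            fut = self._pending.popleft()
            fut.set_result(self._recv(timeout))
        return super().result(timeout=0)


# Tiny self-contained demo; a queue.Queue stands in for the shm message queue:
responses: queue.Queue = queue.Queue()
pending: deque = deque()
f1 = FutureWrapper(pending, lambda t: responses.get(timeout=t))
f2 = FutureWrapper(pending, lambda t: responses.get(timeout=t))
responses.put("out-1")
responses.put("out-2")
assert f2.result() == "out-2"   # draining completes f1 along the way
assert f1.result() == "out-1"
```

On result(), the caller itself drains earlier pending futures from the shared deque before completing its own, so no dedicated thread has to spin on the shared-memory queue.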


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
@njhill njhill changed the title from "[PerfFix] Avoid separate thread for MP executor shm spin" to "[PerfFix] Avoid separate thread for MP executor shm spin (take 2)" on Nov 7, 2025
@njhill njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 7, 2025
@njhill njhill enabled auto-merge (squash) November 7, 2025 21:04
@njhill njhill merged commit 67a2da8 into vllm-project:main Nov 7, 2025
54 checks passed
@njhill njhill deleted the reapply-mp-perf-fix branch November 7, 2025 22:11
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Nov 13, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

kv-connector, ready (ONLY add when PR is ready to merge/full CI is needed), v1

2 participants