[Bugfix] Filter None compilation times in Executor #36186

Open
842974287 wants to merge 1 commit into vllm-project:main from 842974287:fix-multinode-workerproc-init-ordering

@842974287 (Contributor) commented Mar 5, 2026

Purpose

When workers return None for compilation_time (e.g. when compilation is skipped), max() on the collected results raises a TypeError. This filters out None values before computing the maximum.
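The failure and the filter described above can be reproduced in isolation. This is a minimal sketch, not the patched vLLM code: the helper name `max_compilation_time` is hypothetical, and falling back to `0.0` when every worker reports `None` is an assumption that the PR description does not state.

```python
from typing import Optional, Sequence


def max_compilation_time(times: Sequence[Optional[float]]) -> float:
    """Hypothetical helper: largest non-None compilation time.

    Assumption: 0.0 is returned when no worker reported a time.
    """
    valid = [t for t in times if t is not None]
    return max(valid, default=0.0)


# Without the filter, max() has to compare None with a float and fails.
try:
    max([1.5, None, 2.0])
except TypeError as exc:
    print(f"unfiltered max() failed: {exc}")

# With the filter, None entries are simply ignored.
print(max_compilation_time([1.5, None, 2.0]))
print(max_compilation_time([None, None]))
```

Passing `default=` to `max()` also covers the case where filtering leaves an empty list, which would otherwise raise a `ValueError`.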

Test Plan

  • Verified that max() no longer raises when compilation_times contains None values.
  • Existing CI tests cover the normal (non-None) path.
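A check along the lines of this test plan might look like the sketch below. It is illustrative only: `filtered_max` is a hypothetical stand-in for the patched expression rather than a function in vLLM, and the `0.0` fallback for an all-None input is an assumption.

```python
from typing import Optional, Sequence


def filtered_max(times: Sequence[Optional[float]]) -> float:
    # Hypothetical stand-in for the patched expression; not vLLM code.
    return max((t for t in times if t is not None), default=0.0)


def test_none_entries_are_ignored() -> None:
    assert filtered_max([None, 3.0, None, 1.0]) == 3.0


def test_non_none_path_matches_builtin_max() -> None:
    times = [0.5, 4.25, 2.0]
    assert filtered_max(times) == max(times)


def test_unfiltered_max_raises() -> None:
    # Documents the original failure mode the filter guards against.
    try:
        max([None, 3.0])
    except TypeError:
        return
    raise AssertionError("expected TypeError from unfiltered max()")
```

The functions use bare `assert` statements, so they can be collected by pytest or called directly.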

Test Result

No regression — the fix is a single-line defensive filter.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update.

@842974287 842974287 requested a review from njhill as a code owner March 5, 2026 23:29
@mergify mergify bot added the v1 and bug labels Mar 5, 2026
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request addresses two key issues: it fixes a critical initialization order bug in WorkerProc for multi-node setups by moving _init_message_queues to after init_device, and it adds a filter to remove None values from compilation_times to prevent potential TypeError exceptions. No security vulnerabilities were found in these changes.

@842974287 842974287 changed the title from "[Bugfix] Fix multi-node WorkerProc init ordering broken by Elastic EP" to "[Bugfix] Fix WorkerProc init order for multi-node TP" Mar 5, 2026
@njhill (Member) commented Mar 6, 2026

Thanks @842974287! There is already a PR for this that should hopefully be merged very soon #35892.

Perhaps you can update this one to just cover the secondary fix, or open another PR for that.

@842974287 842974287 changed the title from "[Bugfix] Fix WorkerProc init order for multi-node TP" to "[Bugfix] Filter None compilation times in Executor" Mar 6, 2026
@842974287 842974287 force-pushed the fix-multinode-workerproc-init-ordering branch from 9b0b262 to 3bc708f Compare March 6, 2026 18:10
@842974287 (Contributor, Author) commented

@njhill Thanks, updated PR.

When workers return None for compilation_time (e.g. when compilation
is skipped), max() raises a TypeError. Filter out None values before
computing the maximum.

Signed-off-by: Daosheng Yang <dsy842974287@users.noreply.github.com>
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
@842974287 842974287 force-pushed the fix-multinode-workerproc-init-ordering branch from 3bc708f to 1318d0a Compare March 9, 2026 05:48
@ananyakgarg commented

Could we land this? It's breaking some critical CI jobs on the MTIA side.

Labels

bug, v1
