[Enhancement] Patch AsyncOmniEngine try_get_output[_async] hanging issues #2153

Merged
david6666666 merged 3 commits into vllm-project:main from pi314ever:async-omni-try-get-output-hang-fix
Mar 26, 2026

Contributor

@pi314ever pi314ever commented Mar 24, 2026

Purpose

Adds the functionality from #1560 after the #1908 refactor. Fixes issues during init for #1346.

Test Plan

Added L1 test in tests/engine/test_async_omni_engine_outputs.py.

Test Result

Both passed.


Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
@pi314ever pi314ever changed the title from "Async omni try get output hang fix" to "[Enhancement] Patch AsyncOmniEngine try_get_output[_async] hanging issues" Mar 24, 2026
@pi314ever pi314ever mentioned this pull request Mar 24, 2026
1 task
@david6666666
Collaborator

@yinpeiqi @Bounty-hunter ptal thx

@david6666666
Collaborator

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Bravo.

@david6666666 david6666666 added the ready label (label to trigger buildkite CI) Mar 25, 2026
@david6666666
Collaborator

LGTM

@david6666666 david6666666 enabled auto-merge (squash) March 25, 2026 01:54
Collaborator

@SamitHuang SamitHuang Mar 25, 2026

I don't think it's necessary to add tests for the log info, since the logic is straightforward.

Contributor Author

It was requested by @david6666666 in #1346, but I can remove it if the consensus is that it is not needed.

Collaborator

I think it's necessary to add a test to cover handling out-of-memory (OOM) errors in a manner consistent with vLLM.

@david6666666 david6666666 disabled auto-merge March 25, 2026 02:24
    return self.output_queue.sync_q.get(timeout=timeout)
except queue.Empty:
    if not self.is_alive():
        raise RuntimeError("Orchestrator died unexpectedly. See logs above.")
Collaborator

"See logs above." seems too vague. Does it really help users debug why the Orchestrator died?

Contributor Author

What message would you suggest instead? There is currently no meaningful way to obtain the specific error thrown by the Orchestrator thread. This is meant to be the catch-all for the worst-case scenario, where the orchestrator aborts ungracefully or forcefully in a way that was not caught by the standard paths of sending to the output queue.
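
For readers following the thread, the catch-all described above can be sketched in isolation. This is a minimal illustration, not the merged code: `EngineSketch` is a hypothetical stand-in, it uses a plain `queue.Queue` rather than the `output_queue.sync_q` object from the diff, and returning `None` while the thread is still alive is an assumption.

```python
import queue
import threading


class EngineSketch:
    """Hypothetical stand-in for the engine; only the polling logic matters here."""

    def __init__(self, target):
        # Plain stdlib queue (assumption); the diff uses output_queue.sync_q instead.
        self.output_queue = queue.Queue()
        self._thread = threading.Thread(
            target=target, args=(self.output_queue,), daemon=True
        )
        self._thread.start()

    def is_alive(self):
        return self._thread.is_alive()

    def try_get_output(self, timeout=0.1):
        try:
            return self.output_queue.get(timeout=timeout)
        except queue.Empty:
            # Nothing arrived in time. If the producer thread is gone,
            # nothing ever will, so fail loudly instead of polling forever.
            if not self.is_alive():
                raise RuntimeError("Orchestrator died unexpectedly. See logs above.")
            return None  # assumption: "no output yet" is reported as None
```

Without the liveness check, a caller that polls in a loop would keep seeing "no output yet" after the producer thread has exited, which is the hang this PR targets.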

    return self.output_queue.sync_q.get_nowait()
except queue.Empty:
    if not self.is_alive():
        raise RuntimeError("Orchestrator died unexpectedly. See logs above.")
Collaborator

I guess the error messages might be printed asynchronously to stdout. Is the error message really shown above?
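
The log-ordering concern does not apply to errors that travel through the output queue itself, which the thread above calls the standard path. A rough sketch of that idea follows, under stated assumptions: `run_guarded` and `get_or_raise` are hypothetical helpers for illustration, not functions from vllm-omni, and the PR does not show how the real orchestrator forwards its errors.

```python
import queue
import threading
import traceback


def run_guarded(work, output_queue):
    """Run `work`; on failure, forward the error through the queue (hypothetical helper)."""
    try:
        work(output_queue)
    except Exception:
        # Carry the formatted traceback with the error so the consumer
        # does not rely on where the log lines happened to be printed.
        output_queue.put(
            RuntimeError("Orchestrator failed:\n" + traceback.format_exc())
        )


def get_or_raise(output_queue, timeout=1.0):
    """Return the next item, re-raising it if the producer sent an exception."""
    item = output_queue.get(timeout=timeout)
    if isinstance(item, BaseException):
        raise item
    return item
```

This only covers exceptions raised inside `work`. A thread that is killed or aborts outside the `try` block still needs a liveness check as a fallback.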

@david6666666 david6666666 merged commit ce916f4 into vllm-project:main Mar 26, 2026
8 checks passed
zhangj1an pushed a commit to zhangj1an/vllm-omni that referenced this pull request Mar 26, 2026
…sues (vllm-project#2153)

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

Signed-off-by: Zhang <jianmusings@gmail.com>
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026
…sues (vllm-project#2153)

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>