[Enhancement] Engine runtime errors #2426
Conversation
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
816f281 to a35010f
hsliuustc0106
left a comment
@fake0fan @chickeyton PTAL
Pull request overview
This PR improves how generation-time failures (especially fatal engine/stage failures like worker death/OOM) propagate through the diffusion runtime and OpenAI-compatible serving layer, aiming to prevent hangs and better mirror upstream vLLM error-handling patterns.
Changes:
- Introduces structured per-request error outputs (OmniRequestOutput.error) and propagates stage errors as EngineGenerateError vs fatal EngineDeadError.
- Adds diffusion worker death detection (monitor thread + failure callbacks) and plumbs “engine dead” state through stage clients, orchestrator, and health checks.
- Updates OpenAI API server endpoints and /health behavior, plus adds targeted regression tests for these failure modes.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| vllm_omni/outputs.py | Adds error field + from_error() constructor for failed request outputs. |
| vllm_omni/entrypoints/openai/api_server.py | Re-raises engine exceptions; updates /health to return 503 on EngineDeadError. |
| vllm_omni/entrypoints/omni_base.py | Extends check_health() to query stage clients. |
| vllm_omni/entrypoints/async_omni.py | Differentiates recoverable vs fatal stage errors; updates errored-state detection and output loop behavior. |
| vllm_omni/entrypoints/async_omni_diffusion.py | Wraps diffusion generation exceptions as EngineGenerateError while preserving fatal errors. |
| vllm_omni/engine/orchestrator.py | Detects diffusion error outputs and stage-dead conditions; forwards error outputs and escalates fatal failures. |
| vllm_omni/engine/async_omni_engine.py | Marks orchestrator-thread crashes as fatal messages (fatal: True). |
| vllm_omni/diffusion/stage_diffusion_client.py | Tracks _engine_dead, queues error outputs, and registers worker-death callbacks. |
| vllm_omni/diffusion/executor/multiproc_executor.py | Adds worker monitor thread, failure callbacks, and EngineDeadError propagation. |
| vllm_omni/diffusion/diffusion_engine.py | Tightens error raising semantics and preserves EngineDeadError. |
| vllm_omni/diffusion/offloader/layerwise_backend.py | Adds __future__ annotations import (typing). |
| tests/* | Adds regression tests for fatal/non-fatal error propagation and /health behavior. |
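The worker-death detection added in multiproc_executor.py could look roughly like the following self-contained sketch; `WorkerMonitor`, `FakeWorker`, and the polling details are illustrative stand-ins, not the PR's actual implementation:

```python
import threading
import time


class EngineDeadError(RuntimeError):
    """Fatal error: an engine worker process died and cannot recover."""


class WorkerMonitor:
    # Sketch of a daemon monitor thread: poll worker liveness and fire
    # failure callbacks once, when the first worker death is observed.
    def __init__(self, workers, poll_interval: float = 0.01):
        self.workers = workers          # objects exposing is_alive()
        self.callbacks = []             # each called with the fatal error
        self.poll_interval = poll_interval
        self._dead = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        while not self._dead.is_set():
            if any(not w.is_alive() for w in self.workers):
                self._dead.set()
                for cb in self.callbacks:
                    cb(EngineDeadError("worker process died"))
                return
            time.sleep(self.poll_interval)


class FakeWorker:
    """Stand-in for a worker process handle."""
    def __init__(self):
        self.alive = True

    def is_alive(self) -> bool:
        return self.alive


seen = []
monitor = WorkerMonitor([FakeWorker()])
monitor.callbacks.append(seen.append)
monitor.workers[0].alive = False
monitor.start()
monitor._thread.join(timeout=2)
print(type(seen[0]).__name__)  # EngineDeadError
```

Using `daemon=True` matches the reliability note in the review below: the monitor never blocks interpreter shutdown.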
Comments suppressed due to low confidence (1)
vllm_omni/diffusion/stage_diffusion_client.py:140
- In add_batch_request_async, the triple-quoted string is placed after runtime statements, so it is not treated as the function docstring (it becomes a no-op string literal). Move the docstring so it is the first statement in the function body (before the _engine_dead check), or convert it to a comment, to avoid losing documentation and confusing static doc tools.
async def add_batch_request_async(
    self,
    request_id: str,
    prompts: list[OmniPromptType],
    sampling_params: OmniDiffusionSamplingParams,
) -> None:
    if self._engine_dead:
        raise EngineDeadError()
    """Submit a list of prompts as a single batched engine call.

    All prompts are processed in one ``DiffusionEngine.step()`` call
    and the combined result is placed on the output queue with a single
    *request_id*.
    """
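The docstring issue is easy to demonstrate in isolation: Python only records a string literal as `__doc__` when it is the first statement of the body. The function names below are made up for the demo:

```python
def documented():
    """This docstring is the first statement, so Python keeps it."""
    flag = True
    return flag


def undocumented():
    flag = True
    """Placed after a statement, this is a discarded string literal."""
    return flag


print(documented.__doc__ is not None)  # True
print(undocumented.__doc__)            # None
```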
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
fake0fan
left a comment
left a few comments. One thing I’m curious about is whether the current changes are compatible with the LLM engine’s dead-worker detection.
@pi314ever, I am a little confused about what this PR fixes.
lishunyang12
left a comment
left a few comments, mostly around the worker monitor logic
Co-authored-by: Chenguang Zheng <645327136@qq.com> Signed-off-by: Daniel Huang <pilotflyer824@gmail.com>
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
@fake0fan @lishunyang12 I have resolved the outstanding comments. PTAL!
@bjf-frz are you coordinating with @pi314ever?
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
@david6666666 PR is ready. The AMD CI failure is not related.
I tested this PR: the server side encountered a "prompt too long" error, and the client side received an error status which shows: The
I fixed this bug in the v1/videos API; you may check this commit:
hsliuustc0106
left a comment
PR Overview
This PR addresses runtime error propagation through the diffusion engine, orchestrator, and OpenAI-compatible serving layer — specifically for worker death/OOM/fatal failures that previously caused hanging. It introduces structured error handling with EngineDeadError (fatal) vs EngineGenerateError (recoverable), subprocess death monitoring, and Omni-aware exception handlers.
Scope: 21 files, ~710 LOC production + ~835 LOC tests.
Blocker Scan
| Category | Result |
|---|---|
| Correctness | ISSUE: EngineGenerateError() raised without message at omni_base.py:301 — see inline comment |
| Reliability/Safety | PASS — monitor threads use daemon=True, weakref cleanup, and OSError guards |
| Breaking Changes | PASS — OmniBase.errored is a new property; AsyncOmni.errored semantics broadened (was orchestrator-only, now includes stage clients) but all callers check via .errored which still returns bool |
| Test Coverage | PASS — 18 targeted tests covering worker monitor, death sentinel, error propagation through all layers, and /health endpoint |
| Documentation | PASS — inline docstrings are thorough; no new public API exposed |
| Security | PASS |
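The /health behavior described above can be illustrated with a minimal sketch; `FakeEngine` and the bare `health()` coroutine are hypothetical stand-ins for the real FastAPI endpoint and engine client:

```python
import asyncio


class EngineDeadError(RuntimeError):
    """Fatal: the engine is dead and cannot serve further requests."""


class FakeEngine:
    """Stand-in for an engine exposing check_health()."""
    def __init__(self):
        self.dead = False

    async def check_health(self) -> None:
        if self.dead:
            raise EngineDeadError("engine is dead")


async def health(engine) -> int:
    # Map health-check outcome to an HTTP status code:
    # 200 while healthy, 503 once the engine is dead.
    try:
        await engine.check_health()
    except EngineDeadError:
        return 503
    return 200


engine = FakeEngine()
print(asyncio.run(health(engine)))  # 200
engine.dead = True
print(asyncio.run(health(engine)))  # 503
```

Returning 503 (rather than raising) lets load balancers and orchestrators treat the process as unhealthy without parsing a traceback.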
Blocking Issue
Empty error message for recoverable failures (omni_base.py:301):
EngineGenerateError() is raised without the error text. The app-level handler calls create_error_response(exc) which uses str(exc), producing an empty "message": "" in the client response. This matches the bug reported in the PR thread where a "prompt too long" error became {"code": "EngineGenerateError", "message": ""}.
Fix: raise EngineGenerateError(error_text) instead of raise EngineGenerateError() from RuntimeError(error_text).
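The root cause is easy to reproduce: with `raise X() from Y`, the message lives on the cause (`__cause__`), so `str()` of the raised exception is empty. A minimal demonstration, with `create_error_response` as a simplified stand-in for the serving layer's handler:

```python
class EngineGenerateError(Exception):
    """Recoverable per-request generation failure."""


def create_error_response(exc: Exception) -> dict:
    # Simplified stand-in for the serving layer's error handler.
    return {"code": type(exc).__name__, "message": str(exc)}


error_text = "prompt too long"

# Buggy pattern: the message is on the cause, not the raised error,
# so str(exc) is empty.
try:
    raise EngineGenerateError() from RuntimeError(error_text)
except EngineGenerateError as exc:
    buggy = create_error_response(exc)

# Fixed pattern: attach the text to the exception the handler will see.
try:
    raise EngineGenerateError(error_text)
except EngineGenerateError as exc:
    fixed = create_error_response(exc)

print(buggy)  # {'code': 'EngineGenerateError', 'message': ''}
print(fixed)  # {'code': 'EngineGenerateError', 'message': 'prompt too long'}
```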
Non-Blocking Observations
- Copilot's earlier concern about EngineDeadError at orchestrator line 279 — you've already addressed this by switching to check_health(). Good.
- _wait_for_orchestrator_init polling loop — the 1-second poll interval during startup is reasonable. The early detection of a dead orchestrator thread is a meaningful improvement over the previous single blocking future.result().
- Video background task terminate_if_errored — correctly signals shutdown from background tasks that can't propagate exceptions to FastAPI handlers. The app_state threading is clean.
- O(n_stages) check on every error in _check_engine_output_error — you've documented this with a NOTE. Acceptable for the typical small number of stages.
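The broadened errored semantics noted in the Blocker Scan (orchestrator state plus stage clients) can be sketched as a simple boolean aggregation; `AsyncOmniSketch` and `StageClient` here are illustrative stand-ins, not the PR's classes:

```python
class StageClient:
    """Stand-in for a stage client tracking engine-dead state."""
    def __init__(self):
        self._engine_dead = False

    @property
    def errored(self) -> bool:
        return self._engine_dead


class AsyncOmniSketch:
    """Illustrative aggregation: orchestrator state OR any dead stage."""
    def __init__(self, stage_clients):
        self._orchestrator_errored = False
        self._stage_clients = stage_clients

    @property
    def errored(self) -> bool:
        # Previously orchestrator-only; a dead stage client now also
        # flips the engine-wide errored flag.
        return self._orchestrator_errored or any(
            client.errored for client in self._stage_clients
        )


clients = [StageClient(), StageClient()]
engine = AsyncOmniSketch(clients)
print(engine.errored)  # False
clients[1]._engine_dead = True
print(engine.errored)  # True
```

Because the property still returns a plain bool, callers that only check `.errored` are unaffected by the broadened semantics.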
Verdict
1 blocker (empty error message). Once EngineGenerateError(error_text) is fixed, this is ready to approve.
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Signed-off-by: Daniel Huang <pilotflyer824@gmail.com>
@hsliuustc0106 I have addressed the blocking issue.
please fix CI |
we should consider this @pi314ever, thx |
Signed-off-by: bjf-frz <frz123db@gmail.com> Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
@david6666666 I cherry-picked the commit in; it should be resolved now. I also fixed CI.
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
please resolve conflicts, thx
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
LGTM now
@hsliuustc0106 @Gaohan123 ptal thx
Signed-off-by: Daniel Huang <daniel1.huang@intel.com> Signed-off-by: Daniel Huang <pilotflyer824@gmail.com> Signed-off-by: bjf-frz <frz123db@gmail.com> Co-authored-by: Chenguang Zheng <645327136@qq.com> Co-authored-by: Zhou Taichang <tzhouam@connect.ust.hk> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: bjf-frz <frz123db@gmail.com>
Purpose
This PR fixes multiple issues where runtime errors caused the server to hang, tracked in #1346. PR #2390 exists but does not match the patterns vLLM sets for generation-time exceptions.
Test Plan
Mock testing covering various engine states and injected errors throughout the generation pipeline.
Test Result
All tests pass.