[Bugfix] Fix wan2.2 RuntimeError: no response to user (#2390)
Conversation
Signed-off-by: bjf-frz <frz123db@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cd3679e6f3
```python
        exc_info=True,
    )
    # output = DiffusionOutput(error=str(exc))
    raise normalize_omni_error(exc) from exc
```
Update scheduler state before rethrowing executor failures
In DiffusionEngine.add_req_and_wait_for_response, this rethrow happens before self.scheduler.update_from_output(...) runs, so the failed scheduled request is never transitioned out of the scheduler’s running set. The finally block only calls pop_request_state, which does not clear _running; after one executor error (for example OOM), later requests can spin indefinitely because schedule() sees capacity as full, returns empty work, and has_requests() remains true forever. The failure path needs to mark the scheduled request finished (or otherwise remove it from _running) before propagating the exception.
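A minimal sketch of the suggested shape, assuming the scheduler exposes an `abort_request` hook (a later review comment suggests `scheduler.abort_request(sched_req_id)` "or equivalent"); `add_request` and `execute_model` here are illustrative stand-ins for the engine's real calls:

```python
def add_req_and_wait_for_response(self, request):
    sched_req_id = self.scheduler.add_request(request)  # illustrative enqueue
    try:
        output = self.executor.execute_model(request)   # illustrative executor call
        self.scheduler.update_from_output(output)       # normal path: clears _running
        return output
    except Exception as exc:
        # The failed request never reaches update_from_output, so evict it
        # from the running set here; otherwise schedule() keeps treating the
        # slot as occupied and has_requests() stays true forever.
        self.scheduler.abort_request(sched_req_id)      # assumed hook, per review
        raise normalize_omni_error(exc) from exc
    finally:
        self.scheduler.pop_request_state(sched_req_id)
```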
@pi314ever please help to review, thx.

@fake0fan PTAL, thx.
```python
    self,
    message: str,
    *,
    status_code: int = 500,
```
`status_code` is an HTTP-related field; it should be defined in an HTTP-related component, not in the diffusion layer.
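One conventional way to keep that separation, sketched with hypothetical names (only `OmniRequestError`, the `message` field, and the `DiffusionExecutionError` error type seen in the test output below come from this PR): keep the domain error free of HTTP vocabulary and map it to a status code at the API boundary.

```python
# Domain layer (diffusion): no HTTP concepts on the error itself.
class OmniRequestError(RuntimeError):
    def __init__(self, message: str, *, error_type: str = "DiffusionExecutionError"):
        super().__init__(message)
        self.error_type = error_type


# HTTP layer (entrypoint): translate domain error types into status codes.
# The mapping below is illustrative; "InvalidRequestError" is hypothetical.
_STATUS_BY_ERROR_TYPE = {
    "DiffusionExecutionError": 500,
    "InvalidRequestError": 400,
}


def http_status_for(exc: OmniRequestError) -> int:
    return _STATUS_BY_ERROR_TYPE.get(exc.error_type, 500)
```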
💡 Codex Review
Reviewed commit: cd3679e6f3
```python
            stage_id,
            result["error"],
        if isinstance(result, dict) and result.get("type") == "error":
            raise OmniRequestError(
```
Clean up request state before propagating stage errors
Raising OmniRequestError directly here exits generate() through the exception path, but AsyncOmni.generate only calls _log_summary_and_cleanup(request_id) after the normal completion path, so failed requests remain in self.request_states. With the new diffusion error messages, repeated stage failures (e.g., OOMs) will accumulate stale per-request queues/metrics in memory and can eventually degrade long-running servers.
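A rough sketch of one way to guarantee cleanup on both paths (only `_log_summary_and_cleanup` and `request_states` come from the discussion; the surrounding structure and `_run_request` are illustrative):

```python
async def generate(self, request_id: str, prompt: str):
    try:
        async for output in self._run_request(request_id, prompt):  # illustrative inner loop
            yield output
    finally:
        # Runs whether generate() completes normally or exits via
        # OmniRequestError, so self.request_states cannot accumulate
        # stale per-request queues/metrics after stage failures.
        self._log_summary_and_cleanup(request_id)
```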
Add tests.
Thanks @bjf-frz for creating this PR. The main issue I see with this approach is the diversion from vLLM standards of using …
lishunyang12 left a comment:
left a few comments — the overall approach is reasonable but needs some cleanup before merging.
```diff
@@ -675,6 +675,49 @@ class DiffusionOutput:
     peak_memory_mb: float = 0.0


 class OmniRequestError(RuntimeError):
```

`@dataclass` here is a no-op — you define a custom `__init__`, so `dataclass` won't generate one, and there are no class-level field annotations for it to act on. Just drop the decorator.
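For reference: `@dataclass` will not overwrite an `__init__` defined in the class body, and with no class-level field annotations it has no fields to generate, so the decorator contributes nothing here. A plain class expresses the same intent; a minimal sketch using only the fields visible in this diff:

```python
class OmniRequestError(RuntimeError):
    """Structured error raised when a diffusion request fails."""

    def __init__(self, message: str, *, status_code: int = 500):
        super().__init__(message)
        self.status_code = status_code
```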
```diff
@@ -99,7 +99,12 @@ def step(self, request: OmniDiffusionRequest) -> list[OmniRequestOutput]:
     exec_total_time = time.perf_counter() - exec_start_time

     if output.error:
```
Remove the commented-out code. Same for the other `# raise RuntimeError(...)` / `# output = ...` / `# error_msg = ...` / `# self.scheduler.pop_request_state(...)` lines throughout the PR.
```python
        exc_info=True,
    )
    # output = DiffusionOutput(error=str(exc))
    raise normalize_omni_error(exc) from exc
```
Agreeing with the bot comment above — this raise skips `scheduler.update_from_output(...)`, so the request stays in the scheduler's running set. On a single-slot scheduler (`_max_batch_size=1`), this permanently blocks all future requests. You need to call `scheduler.abort_request(sched_req_id)` (or equivalent) before re-raising.
```diff
 from vllm.logger import init_logger

-from vllm_omni.diffusion.data import DiffusionOutput, OmniDiffusionConfig
+from vllm_omni.diffusion.data import DiffusionOutput, OmniDiffusionConfig, OmniRequestError, normalize_omni_error
```
Nit: line too long. Split the import.
Suggested change:

```python
from vllm_omni.diffusion.data import (
    DiffusionOutput,
    OmniDiffusionConfig,
    OmniRequestError,
    normalize_omni_error,
)
```
@bjf-frz Hello, any updates?
Purpose
When an OOM error occurs in wan2.2, the server raises a RuntimeError but does not return an HTTP error code. As a result, the user receives no response from the server, and the process continues to loop.
Tries to address #2327.
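The gist of the fix, as exercised in the test below: normalize executor failures into a structured error that the HTTP layer can surface, so the client sees `status: failed` instead of polling forever. A hedged sketch of such wiring, assuming a FastAPI entrypoint (the handler shown is illustrative, not this PR's exact code; `OmniRequestError` is the class this PR adds):

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

from vllm_omni.diffusion.data import OmniRequestError  # added by this PR

app = FastAPI()


@app.exception_handler(OmniRequestError)
async def omni_request_error_handler(request: Request, exc: OmniRequestError) -> JSONResponse:
    # Surface the failure as an HTTP error instead of dropping it,
    # so a polling client can transition the task to "failed".
    return JSONResponse(
        status_code=getattr(exc, "status_code", 500),
        content={"error": {"code": type(exc).__name__, "message": str(exc)}},
    )
```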
Test Plan
Test Result
```text
Creating video generation task...
Create response: {"model":"Wan-AI/Wan2.2-I2V-A14B-Diffusers","prompt":"xxx。","id":"video_gen_idxxx","object":"video","status":"queued","size":null,"progress":0,"seconds":"4","quality":"default","completed_at":null,"created_at":1774992048,"remixed_from_video_id":null,"error":null,"media_type":"video/mp4","expires_at":null,"file_name":null,"inference_time_s":null}
Video ID: video_gen_idxxx
Waiting for video generation...
Status: in_progress
Status: in_progress
Status: in_progress
Status: failed
Video generation failed
{
"model": "/home/models/Wan-AI/Wan2.2-I2V-A14B-Diffusers",
"prompt": "xxx。",
"id": "video_gen_idxxx",
"object": "video",
"status": "failed",
"size": null,
"progress": 0,
"seconds": "4",
"quality": "default",
"completed_at": 1774992128,
"created_at": 1774992048,
"remixed_from_video_id": null,
"error": {
"code": "HTTPException",
"message": "500: {'message': 'CUDA out of memory. Tried to allocate 6.90 GiB. GPU 0 has a total capacity of 79.25 GiB of which 1.67 GiB is free. Including non-PyTorch memory, this process has 77.57 GiB memory in use. Of the allocated memory 68.90 GiB is allocated by PyTorch, and 6.49 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)', 'request_id': 'video_gen_idxxx', 'stage_id': 0, 'error_type': 'DiffusionExecutionError', 'detail': {}}"
},
"media_type": "video/mp4",
"expires_at": null,
"file_name": null,
"inference_time_s": 79.77156472019851
}
```