
[Bugfix] fix wan2.2 RuntimeError no response to user #2390

Closed

bjf-frz wants to merge 1 commit into vllm-project:main from bjf-frz:fix_http_error_code


Conversation

@bjf-frz
Contributor

@bjf-frz bjf-frz commented Apr 1, 2026


Purpose

When an OOM error occurs in wan2.2, the server raises a RuntimeError but does not return an HTTP error code. As a result, the user receives no response from the server, and the client keeps polling indefinitely.

Attempts to address #2327.
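
For context, here is a minimal sketch of the error plumbing this PR introduces, reconstructed from the diff hunks quoted in the review comments below; the exact signatures in the PR may differ:

```python
# Sketch only: the names (OmniRequestError, normalize_omni_error, status_code)
# come from the review snippets below; the real implementation may differ.
class OmniRequestError(RuntimeError):
    """Request-level failure that carries an HTTP-style status code."""

    def __init__(self, message: str, *, status_code: int = 500) -> None:
        super().__init__(message)
        self.status_code = status_code


def normalize_omni_error(exc: Exception) -> OmniRequestError:
    """Wrap an arbitrary executor exception (e.g. a CUDA OOM RuntimeError)
    so the API layer can return an HTTP error instead of hanging."""
    if isinstance(exc, OmniRequestError):
        return exc
    return OmniRequestError(str(exc), status_code=500)
```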

Test Plan

Test Result

```
Creating video generation task...
Create response: {"model":"Wan-AI/Wan2.2-I2V-A14B-Diffusers","prompt":"xxx。","id":"video_gen_idxxx","object":"video","status":"queued","size":null,"progress":0,"seconds":"4","quality":"default","completed_at":null,"created_at":1774992048,"remixed_from_video_id":null,"error":null,"media_type":"video/mp4","expires_at":null,"file_name":null,"inference_time_s":null}
Video ID: video_gen_idxxx
Waiting for video generation...
Status: in_progress
Status: in_progress
Status: in_progress
Status: failed
Video generation failed
{
  "model": "/home/models/Wan-AI/Wan2.2-I2V-A14B-Diffusers",
  "prompt": "xxx。",
  "id": "video_gen_idxxx",
  "object": "video",
  "status": "failed",
  "size": null,
  "progress": 0,
  "seconds": "4",
  "quality": "default",
  "completed_at": 1774992128,
  "created_at": 1774992048,
  "remixed_from_video_id": null,
  "error": {
    "code": "HTTPException",
    "message": "500: {'message': 'CUDA out of memory. Tried to allocate 6.90 GiB. GPU 0 has a total capacity of 79.25 GiB of which 1.67 GiB is free. Including non-PyTorch memory, this process has 77.57 GiB memory in use. Of the allocated memory 68.90 GiB is allocated by PyTorch, and 6.49 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)', 'request_id': 'video_gen_idxxx', 'stage_id': 0, 'error_type': 'DiffusionExecutionError', 'detail': {}}"
  },
  "media_type": "video/mp4",
  "expires_at": null,
  "file_name": null,
  "inference_time_s": 79.77156472019851
}
```


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


Signed-off-by: bjf-frz <frz123db@gmail.com>
@bjf-frz bjf-frz requested a review from hsliuustc0106 as a code owner April 1, 2026 02:10
@bjf-frz bjf-frz marked this pull request as draft April 1, 2026 02:10

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd3679e6f3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

```python
    exc_info=True,
)
# output = DiffusionOutput(error=str(exc))
raise normalize_omni_error(exc) from exc
```

[P1] Update scheduler state before rethrowing executor failures

In DiffusionEngine.add_req_and_wait_for_response, this rethrow happens before self.scheduler.update_from_output(...) runs, so the failed scheduled request is never transitioned out of the scheduler’s running set. The finally block only calls pop_request_state, which does not clear _running; after one executor error (for example OOM), later requests can spin indefinitely because schedule() sees capacity as full, returns empty work, and has_requests() remains true forever. The failure path needs to mark the scheduled request finished (or otherwise remove it from _running) before propagating the exception.
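
To make the failure mode concrete, here is a self-contained toy sketch (all names are illustrative stand-ins, not the actual vllm-omni API) of why the slot leaks unless the failed request leaves the running set before the exception propagates:

```python
class TinyScheduler:
    """Toy single-slot scheduler mimicking the behavior described above."""

    def __init__(self) -> None:
        self._running: set[str] = set()

    def schedule(self, req_id: str) -> None:
        self._running.add(req_id)

    def abort_request(self, req_id: str) -> None:
        self._running.discard(req_id)

    def has_requests(self) -> bool:
        return bool(self._running)


def run_step(scheduler: TinyScheduler, req_id: str) -> None:
    scheduler.schedule(req_id)
    try:
        raise RuntimeError("CUDA out of memory")  # simulated executor failure
    except RuntimeError:
        # Without this abort, has_requests() stays True forever and every
        # later request starves on a running set that never drains.
        scheduler.abort_request(req_id)
        raise
```

Patterned this way, one executor failure costs only its own request instead of wedging the scheduler.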


@david6666666 david6666666 mentioned this pull request Apr 1, 2026
@david6666666
Collaborator

@pi314ever please help to review, thx.

@david6666666
Collaborator

@fake0fan ptal thx

```python
    self,
    message: str,
    *,
    status_code: int = 500,
```
Contributor

status_code is an HTTP-related field; it should be defined in an HTTP-related component, not in diffusion.
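
One way to honor that separation, sketched with FastAPI as an assumed serving framework (the handler below is hypothetical, not code from this PR): keep OmniRequestError framework-agnostic and translate it into a status code only at the HTTP boundary.

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse


class OmniRequestError(RuntimeError):
    """Stub matching the sketch in the PR description above."""

    def __init__(self, message: str, *, status_code: int = 500) -> None:
        super().__init__(message)
        self.status_code = status_code


app = FastAPI()


@app.exception_handler(OmniRequestError)
async def omni_request_error_handler(request: Request, exc: OmniRequestError) -> JSONResponse:
    # The diffusion layer raises a framework-agnostic error; only here, in the
    # web layer, is it turned into an HTTP status code.
    return JSONResponse(status_code=exc.status_code, content={"error": str(exc)})
```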

@bjf-frz bjf-frz changed the title [WIP][Bugfix]fix oom no response to user [Bugfix]fix wan2.2 RuntimeError no response to user Apr 1, 2026
@bjf-frz bjf-frz marked this pull request as ready for review April 1, 2026 06:57

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd3679e6f3


```python
if isinstance(result, dict) and result.get("type") == "error":
    raise OmniRequestError(
        result["error"],
        stage_id,
```

[P2] Clean up request state before propagating stage errors

Raising OmniRequestError directly here exits generate() through the exception path, but AsyncOmni.generate only calls _log_summary_and_cleanup(request_id) after the normal completion path, so failed requests remain in self.request_states. With the new diffusion error messages, repeated stage failures (e.g., OOMs) will accumulate stale per-request queues/metrics in memory and can eventually degrade long-running servers.
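
A minimal sketch of the cleanup being asked for, assuming only the names given in the comment (_log_summary_and_cleanup, request_states); _generate_inner is a hypothetical stand-in for the normal generation path:

```python
class AsyncOmniSketch:
    """Illustrative only; mirrors the AsyncOmni flow described above."""

    async def generate(self, request_id: str):
        try:
            async for output in self._generate_inner(request_id):  # hypothetical
                yield output
        finally:
            # Runs on normal completion *and* when OmniRequestError propagates,
            # so failed requests do not accumulate in self.request_states.
            self._log_summary_and_cleanup(request_id)
```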


@hsliuustc0106
Collaborator

add tests

@pi314ever
Contributor

Thanks @bjf-frz for creating this PR. The main issue I see with this approach is that it diverges from the vLLM standard of using EngineDeadError and EngineGenerateError to differentiate between fatal and recoverable errors. My PR #2426 addresses the issue that way. Can you please test whether that implementation also resolves your issue?
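
For readers unfamiliar with the convention being referenced: vLLM separates fatal engine failures from per-request failures via these two exception classes. The stubs and handling policy below are illustrative only, not code from #2426:

```python
# Stubs for illustration; vLLM defines the real classes in its engine
# exception module.
class EngineDeadError(Exception):
    """Fatal: the engine itself is unrecoverable."""


class EngineGenerateError(Exception):
    """Recoverable: only the current request failed."""


def describe_handling(exc: Exception) -> str:
    # Hypothetical policy: a dead engine stops serving entirely, while a
    # generate error fails just the one request with an HTTP error.
    if isinstance(exc, EngineDeadError):
        return "stop serving / surface a fatal error"
    if isinstance(exc, EngineGenerateError):
        return "fail this request only / return an HTTP error"
    return "unknown"
```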

Collaborator

@lishunyang12 lishunyang12 left a comment


left a few comments — the overall approach is reasonable but needs some cleanup before merging.

```diff
@@ -675,6 +675,49 @@ class DiffusionOutput:
     peak_memory_mb: float = 0.0
```


Collaborator

@dataclass here is a no-op — you define a custom __init__, so dataclass won't generate one, and there are no class-level field annotations for it to act on. Just drop the decorator.

Suggested change:

```python
class OmniRequestError(RuntimeError):
```

```diff
@@ -99,7 +99,12 @@ def step(self, request: OmniDiffusionRequest) -> list[OmniRequestOutput]:
     exec_total_time = time.perf_counter() - exec_start_time

     if output.error:
```
Collaborator

Remove the commented-out code. Same for the other # raise RuntimeError(...) / # output = ... / # error_msg = ... / # self.scheduler.pop_request_state(...) lines throughout the PR.

```python
    exc_info=True,
)
# output = DiffusionOutput(error=str(exc))
raise normalize_omni_error(exc) from exc
```
Collaborator

Agreeing with the bot comment above — this raise skips scheduler.update_from_output(...), so the request stays in the scheduler's running set. On a single-slot scheduler (_max_batch_size=1), this permanently blocks all future requests. You need to call scheduler.abort_request(sched_req_id) (or equivalent) before re-raising.

```diff
 from vllm.logger import init_logger

-from vllm_omni.diffusion.data import DiffusionOutput, OmniDiffusionConfig
+from vllm_omni.diffusion.data import DiffusionOutput, OmniDiffusionConfig, OmniRequestError, normalize_omni_error
```
Collaborator

Nit: line too long. Split the import.

Suggested change:

```diff
-from vllm_omni.diffusion.data import DiffusionOutput, OmniDiffusionConfig, OmniRequestError, normalize_omni_error
+from vllm_omni.diffusion.data import (
+    DiffusionOutput,
+    OmniDiffusionConfig,
+    OmniRequestError,
+    normalize_omni_error,
+)
```

@Gaohan123
Collaborator

@bjf-frz Hello, any updates?

@Gaohan123 Gaohan123 added this to the v0.20.0 milestone Apr 16, 2026
@bjf-frz
Contributor Author

bjf-frz commented Apr 16, 2026

This PR conflicts with #2426. The issue is now resolved by #2426, and a new PR with this fix has been submitted and merged into the post1 branch.

@bjf-frz bjf-frz closed this Apr 16, 2026
