[Bugfix] Fix wan2.2 RuntimeError: no response to user (#2390)
Conversation
Signed-off-by: bjf-frz <frz123db@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cd3679e6f3
```python
        exc_info=True,
    )
    # output = DiffusionOutput(error=str(exc))
    raise normalize_omni_error(exc) from exc
```
Update scheduler state before rethrowing executor failures
In DiffusionEngine.add_req_and_wait_for_response, this rethrow happens before self.scheduler.update_from_output(...) runs, so the failed scheduled request is never transitioned out of the scheduler’s running set. The finally block only calls pop_request_state, which does not clear _running; after one executor error (for example OOM), later requests can spin indefinitely because schedule() sees capacity as full, returns empty work, and has_requests() remains true forever. The failure path needs to mark the scheduled request finished (or otherwise remove it from _running) before propagating the exception.
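A minimal sketch of the suggested shape, assuming the scheduler exposes an `abort_request` hook (a later review comment suggests `scheduler.abort_request(sched_req_id)` "or equivalent"); `add_request` and `execute_model` here are illustrative stand-ins for the engine's real calls:

```python
def add_req_and_wait_for_response(self, request):
    sched_req_id = self.scheduler.add_request(request)  # illustrative enqueue
    try:
        output = self.executor.execute_model(request)   # illustrative executor call
        self.scheduler.update_from_output(output)       # normal path: clears _running
        return output
    except Exception as exc:
        # The failed request never reaches update_from_output, so evict it
        # from the running set here; otherwise schedule() keeps treating the
        # slot as occupied and has_requests() stays true forever.
        self.scheduler.abort_request(sched_req_id)      # assumed hook, per review
        raise normalize_omni_error(exc) from exc
    finally:
        self.scheduler.pop_request_state(sched_req_id)
```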
@pi314ever please help to review, thx.

@fake0fan PTAL, thx.
```python
    self,
    message: str,
    *,
    status_code: int = 500,
```
`status_code` is an HTTP-related field; it should be defined in an HTTP-related component, not in the diffusion layer.
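One conventional way to keep that separation, sketched with hypothetical names (only `OmniRequestError`, the `message` field, and the `DiffusionExecutionError` error type seen in the test output below come from this PR): keep the domain error free of HTTP vocabulary and map it to a status code at the API boundary.

```python
# Domain layer (diffusion): no HTTP concepts on the error itself.
class OmniRequestError(RuntimeError):
    def __init__(self, message: str, *, error_type: str = "DiffusionExecutionError"):
        super().__init__(message)
        self.error_type = error_type


# HTTP layer (entrypoint): translate domain error types into status codes.
# The mapping below is illustrative; "InvalidRequestError" is hypothetical.
_STATUS_BY_ERROR_TYPE = {
    "DiffusionExecutionError": 500,
    "InvalidRequestError": 400,
}


def http_status_for(exc: OmniRequestError) -> int:
    return _STATUS_BY_ERROR_TYPE.get(exc.error_type, 500)
```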
💡 Codex Review
Reviewed commit: cd3679e6f3
```python
            stage_id,
            result["error"],
        if isinstance(result, dict) and result.get("type") == "error":
            raise OmniRequestError(
```
Clean up request state before propagating stage errors
Raising OmniRequestError directly here exits generate() through the exception path, but AsyncOmni.generate only calls _log_summary_and_cleanup(request_id) after the normal completion path, so failed requests remain in self.request_states. With the new diffusion error messages, repeated stage failures (e.g., OOMs) will accumulate stale per-request queues/metrics in memory and can eventually degrade long-running servers.
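A rough sketch of one way to guarantee cleanup on both paths (only `_log_summary_and_cleanup` and `request_states` come from the discussion; the surrounding structure and `_run_request` are illustrative):

```python
async def generate(self, request_id: str, prompt: str):
    try:
        async for output in self._run_request(request_id, prompt):  # illustrative inner loop
            yield output
    finally:
        # Runs whether generate() completes normally or exits via
        # OmniRequestError, so self.request_states cannot accumulate
        # stale per-request queues/metrics after stage failures.
        self._log_summary_and_cleanup(request_id)
```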
Add tests.
Thanks @bjf-frz for creating this PR. The main issue I see with this approach is the diversion from vLLM standards of using …
lishunyang12 left a comment:
left a few comments — the overall approach is reasonable but needs some cleanup before merging.
```diff
@@ -675,6 +675,49 @@ class DiffusionOutput:
     peak_memory_mb: float = 0.0


 class OmniRequestError(RuntimeError):
```

`@dataclass` here is a no-op — you define a custom `__init__`, so `dataclass` won't generate one, and there are no class-level field annotations for it to act on. Just drop the decorator.
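For reference: `@dataclass` will not overwrite an `__init__` defined in the class body, and with no class-level field annotations it has no fields to generate, so the decorator contributes nothing here. A plain class expresses the same intent; a minimal sketch using only the fields visible in this diff:

```python
class OmniRequestError(RuntimeError):
    """Structured error raised when a diffusion request fails."""

    def __init__(self, message: str, *, status_code: int = 500):
        super().__init__(message)
        self.status_code = status_code
```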
```diff
@@ -99,7 +99,12 @@ def step(self, request: OmniDiffusionRequest) -> list[OmniRequestOutput]:
     exec_total_time = time.perf_counter() - exec_start_time

     if output.error:
```
Remove the commented-out code. Same for the other `# raise RuntimeError(...)` / `# output = ...` / `# error_msg = ...` / `# self.scheduler.pop_request_state(...)` lines throughout the PR.
```python
        exc_info=True,
    )
    # output = DiffusionOutput(error=str(exc))
    raise normalize_omni_error(exc) from exc
```
Agreeing with the bot comment above — this raise skips `scheduler.update_from_output(...)`, so the request stays in the scheduler's running set. On a single-slot scheduler (`_max_batch_size=1`), this permanently blocks all future requests. You need to call `scheduler.abort_request(sched_req_id)` (or equivalent) before re-raising.
```diff
 from vllm.logger import init_logger

-from vllm_omni.diffusion.data import DiffusionOutput, OmniDiffusionConfig
+from vllm_omni.diffusion.data import DiffusionOutput, OmniDiffusionConfig, OmniRequestError, normalize_omni_error
```
Nit: line too long. Split the import.
Suggested change:

```python
from vllm_omni.diffusion.data import (
    DiffusionOutput,
    OmniDiffusionConfig,
    OmniRequestError,
    normalize_omni_error,
)
```
@bjf-frz Hello, any updates?
Purpose
When an OOM error occurs in wan2.2, the server raises a RuntimeError but does not return an HTTP error code. As a result, the user receives no response from the server, and the process continues to loop.
Tries to address #2327.
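The gist of the fix, as exercised in the test below: normalize executor failures into a structured error that the HTTP layer can surface, so the client sees `status: failed` instead of polling forever. A hedged sketch of such wiring, assuming a FastAPI entrypoint (the handler shown is illustrative, not this PR's exact code; `OmniRequestError` is the class this PR adds):

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

from vllm_omni.diffusion.data import OmniRequestError  # added by this PR

app = FastAPI()


@app.exception_handler(OmniRequestError)
async def omni_request_error_handler(request: Request, exc: OmniRequestError) -> JSONResponse:
    # Surface the failure as an HTTP error instead of dropping it,
    # so a polling client can transition the task to "failed".
    return JSONResponse(
        status_code=getattr(exc, "status_code", 500),
        content={"error": {"code": type(exc).__name__, "message": str(exc)}},
    )
```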
Test Plan
Test Result
```text
Creating video generation task...
Create response: {"model":"Wan-AI/Wan2.2-I2V-A14B-Diffusers","prompt":"xxx。","id":"video_gen_idxxx","object":"video","status":"queued","size":null,"progress":0,"seconds":"4","quality":"default","completed_at":null,"created_at":1774992048,"remixed_from_video_id":null,"error":null,"media_type":"video/mp4","expires_at":null,"file_name":null,"inference_time_s":null}
Video ID: video_gen_idxxx
Waiting for video generation...
Status: in_progress
Status: in_progress
Status: in_progress
Status: failed
Video generation failed
{
"model": "/home/models/Wan-AI/Wan2.2-I2V-A14B-Diffusers",
"prompt": "xxx。",
"id": "video_gen_idxxx",
"object": "video",
"status": "failed",
"size": null,
"progress": 0,
"seconds": "4",
"quality": "default",
"completed_at": 1774992128,
"created_at": 1774992048,
"remixed_from_video_id": null,
"error": {
"code": "HTTPException",
"message": "500: {'message': 'CUDA out of memory. Tried to allocate 6.90 GiB. GPU 0 has a total capacity of 79.25 GiB of which 1.67 GiB is free. Including non-PyTorch memory, this process has 77.57 GiB memory in use. Of the allocated memory 68.90 GiB is allocated by PyTorch, and 6.49 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)', 'request_id': 'video_gen_idxxx', 'stage_id': 0, 'error_type': 'DiffusionExecutionError', 'detail': {}}"
},
"media_type": "video/mp4",
"expires_at": null,
"file_name": null,
"inference_time_s": 79.77156472019851
}
```