[Bugfix] Align Offline and Online Inference #3506
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 81f0e2772a
images = getattr(result.request_output, "images", [])
stage_durations = result.stage_durations
peak_memory_mb = result.peak_memory_mb
logger.info(f"[DEBUG] all_outputs length={len(all_outputs)}")
Guard CoT extraction for non-AsyncOmni engines
In generate_diffusion_images, all_outputs is only initialized inside the if isinstance(engine, AsyncOmni) branch, but it is used unconditionally afterward; when engine is not AsyncOmni, this raises UnboundLocalError before the function can return images. This breaks single-stage/non-Async diffusion serving paths that rely on the else: result = await engine.generate(...) branch.
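A minimal sketch of the failure mode and one possible guard. It uses stand-in classes rather than the real engine types: FakeAsyncOmni, FakeEngine, and collect_outputs are hypothetical names for illustration only, and the actual fix in the PR may look different.

```python
import asyncio


class FakeAsyncOmni:
    """Stand-in for a multi-stage engine that yields several outputs."""

    async def generate(self, prompt):
        return [f"{prompt}-stage0", f"{prompt}-stage1"]


class FakeEngine:
    """Stand-in for a single-stage engine that returns one result."""

    async def generate(self, prompt):
        return f"{prompt}-result"


async def collect_outputs(engine, prompt):
    # Bind the name before branching so it exists on every path.
    all_outputs = []
    if isinstance(engine, FakeAsyncOmni):
        all_outputs = await engine.generate(prompt)
        result = all_outputs[-1]
    else:
        result = await engine.generate(prompt)
    # Safe on both paths: all_outputs is [] for the single-stage engine.
    return result, len(all_outputs)
```

Initializing all_outputs before the isinstance check means code that runs after the branch (logging, CoT extraction) can read it without raising UnboundLocalError.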
flat_images.append(item)

- return flat_images, stage_durations, peak_memory_mb
+ return flat_images, stage_durations, peak_memory_mb, cot_output
Keep return tuple compatible with existing image-edit caller
This function now returns four values (flat_images, stage_durations, peak_memory_mb, cot_output), but /v1/images/edits still unpacks only three (images, _, _ = generation_result in api_server.py), which causes ValueError: too many values to unpack on that multi-stage edit path. Either update all callers or return a backward-compatible shape.
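The mismatch can be reproduced in isolation. The sketch below uses a stub in place of the real function (generate_stub and both unpack helpers are hypothetical names) and shows one tolerant caller shape, star-unpacking, as an alternative to updating every call site by hand.

```python
def generate_stub():
    # Stand-in for the updated function, which now returns four values.
    return ["img0", "img1"], {"stage0": 0.5}, 1024.0, "cot text"


def unpack_images_strict(generation_result):
    # Old caller shape: raises ValueError once a fourth value is returned.
    images, _, _ = generation_result
    return images


def unpack_images_tolerant(generation_result):
    # Star-unpacking absorbs however many trailing values are returned.
    images, *_ = generation_result
    return images
```

Star-unpacking keeps the caller working if the tuple grows again, at the cost of hiding the extra values; a caller that needs cot_output should unpack all four names explicitly.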
/v1/images/edits is now supported.
Online CoT output isn't ready yet. I will clean up the redundant code and all the pictures, and add CoT support later.
hsliuustc0106 left a comment
Merge conflict (CONFLICTING status). Please rebase before review.
dd48197 to 610f591
CoT support has been added.
Resolved merge conflicts.
74b45c3 to c966702
74cd091 to 1939c90
Currently, unit tests are pending and will be added. The bot_task has already been unified following PR #3444.
P1: The use_system_prompt parameter for /v1/images/generations does not actually take effect. api_server.py (line 1538) passes extra_body["use_system_prompt"], while serving_chat.py (line 2251) only reads sys_type. As a result, use_system_prompt="custom" and any other mode passed by users through the Images Generation API is silently ignored, and the multi-stage pipeline does not behave as the declared schema says.
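A minimal sketch of why the value is dropped without any error, assuming the reader looks the key up with a default. The two helper functions and the "default" fallback value are hypothetical; only the key names use_system_prompt and sys_type come from the comment above.

```python
def read_prompt_mode_mismatched(extra_body):
    # The reader looks up a different key than the writer set, so the
    # user's value is silently replaced by the fallback.
    return extra_body.get("sys_type", "default")


def read_prompt_mode_aligned(extra_body):
    # Prefer the documented key, then fall back to the other one.
    if "use_system_prompt" in extra_body:
        return extra_body["use_system_prompt"]
    return extra_body.get("sys_type", "default")
```

With extra_body = {"use_system_prompt": "custom"}, the mismatched reader returns the fallback while the aligned reader returns "custom". Either side can be changed; the point is that writer and reader must agree on one key name.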
P2: Online stop tokens must not be hardcoded; the stop tokens provided by the framework should be used instead.
Will handle all of these in the next commit, along with unit tests. |
Signed-off-by: skf1999 <13234016272@163.com>
Purpose
Align offline and online inference
Test Plan
t2i:
Offline:
Online:
Also added online AR cot_text output support.
The AR stage YAML needs to add final_output: true and final_output_type: text.
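As a rough illustration of the two settings, the check below assumes the AR stage's YAML has already been loaded into a plain dict. The function name stage_emits_text and the flat dict layout are assumptions; only the keys final_output and final_output_type and their values come from the description above.

```python
def stage_emits_text(stage_cfg):
    """Return True if the stage is marked as a final text output."""
    return (
        stage_cfg.get("final_output") is True
        and stage_cfg.get("final_output_type") == "text"
    )


# Hypothetical in-memory form of an AR stage entry with both keys set.
ar_stage = {"final_output": True, "final_output_type": "text"}
# The same entry without the two keys, as it might look before the change.
ar_stage_unmarked = {}
```

A check like this could run at startup to warn when cot_text is requested online but the AR stage is not configured to emit it.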
Online request:
Bot task type. Use full task names or simplified aliases: think, recaption, vanilla.
Test Result