[Bugfix] Align Offline and Online Inference by skf-1999 · Pull Request #3506 · vllm-project/vllm-omni

skf-1999 · 2026-05-11T12:49:44Z

Purpose

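Align offline and online inference so that online (API server) requests for HunyuanImage-3.0 produce the same results as offline inference run with the same settings (prompt, system prompt type, bot task, seed, and number of steps).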

Test Plan

t2i (text-to-image):

PROMPT='A cinematic medium shot captures a single Asian woman seated on a chair within a dimly lit room, creating an intimate and theatrical atmosphere. The composition is focused on the subject, rendered with rich colors and intricate textures that evoke a nostalgic and moody feeling.

The primary subject is a young Asian woman with a thoughtful and expressive countenance, her gaze directed slightly away from the camera. She is seated in a relaxed yet elegant posture on an ornate, vintage armchair. The chair is upholstered in a deep red velvet, its fabric showing detailed, intricate textures and slight signs of wear. She wears a simple, elegant dress in a dark teal hue, the material catching the light in a way that reveals its fine-woven texture. Her skin has a soft, matte quality, and the light delicately models the contours of her face and arms.

The surrounding room is characterized by its vintage decor, which contributes to the historic and evocative mood. In the immediate background, partially blurred due to a shallow depth of field consistent with a f/2.8 aperture, the wall is covered with wallpaper featuring a subtle, damask pattern. The overall color palette is a carefully balanced interplay of deep teal and rich red hues, creating a visually compelling and cohesive environment. The entire scene is detailed, from the fibers of the upholstery to the subtle patterns on the wall.

The lighting is highly dramatic and artistic, defined by high contrast and pronounced shadow play. A single key light source, positioned off-camera, projects gobo lighting patterns onto the scene, casting intricate shapes of light and shadow across the woman and the back wall. These dramatic shadows create a strong scense of depth and a theatrical quality. While some shadows are deep and defined, others remain soft, gently wrapping around the subject and preventing the loss of detail in darker areas. The soft focus on the background enhances the intimate feeling, drawing all attention to the expressive subject. The overall image presents a cinematic, photorealistic photography style.'

Offline:

python3 examples/offline_inference/hunyuan_image3/end2end.py \
  --modality text2img \
  --model /data/HunyuanImage-3.0-Instruct \
  --prompts "$PROMPT" \
  --bot-task think \
  --sys-type en_unified \
  --seed 42 \
  --steps 50 \
  --output ./out_t2i_think_unified \
  --deploy-config vllm_omni/deploy/hunyuan_image3_dit.yaml

Online:

curl -X POST http://localhost:8091/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A cinematic medium shot captures a single Asian woman seated on a chair within a dimly lit room, creating an intimate and theatrical atmosphere. The composition is focused on the subject, rendered with rich colors and intricate textures that evoke a nostalgic and moody feeling. The primary subject is a young Asian woman with a thoughtful and expressive countenance, her gaze directed slightly away from the camera. She is seated in a relaxed yet elegant posture on an ornate, vintage armchair. The chair is upholstered in a deep red velvet, its fabric showing detailed, intricate textures and slight signs of wear. She wears a simple, elegant dress in a dark teal hue, the material catching the light in a way that reveals its fine-woven texture. Her skin has a soft, matte quality, and the light delicately models the contours of her face and arms. The surrounding room is characterized by its vintage decor, which contributes to the historic and evocative mood. In the immediate background, partially blurred due to a shallow depth of field consistent with a f/2.8 aperture, the wall is covered with wallpaper featuring a subtle, damask pattern. The overall color palette is a carefully balanced interplay of deep teal and rich red hues, creating a visually compelling and cohesive environment. The entire scene is detailed, from the fibers of the upholstery to the subtle patterns on the wall. The lighting is highly dramatic and artistic, defined by high contrast and pronounced shadow play. A single key light source, positioned off-camera, projects gobo lighting patterns onto the scene, casting intricate shapes of light and shadow across the woman and the back wall. These dramatic shadows create a strong scense of depth and a theatrical quality. While some shadows are deep and defined, others remain soft, gently wrapping around the subject and preventing the loss of detail in darker areas. The soft focus on the background enhances the intimate feeling, drawing all attention to the expressive subject. The overall image presents a cinematic, photorealistic photography style.",
    "use_system_prompt": "en_unified",
    "bot_task": "think",
    "num_inference_steps": 50,
    "n": 4,
    "seed": 42
  }' | jq -r '.data[0].b64_json' | base64 -d > t2i_think_unified.png
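For reference, a standard-library Python sketch of the same generations request. It is not part of the PR: build_payload, decode_images, and generate are hypothetical helper names, and it assumes the server from the test plan is listening on localhost:8091. Serializing the body with json.dumps guarantees the long prompt is escaped correctly, and decode_images keeps all n images rather than only data[0].

```python
import base64
import json
import urllib.request
from typing import List


def build_payload(prompt: str, seed: int = 42, n: int = 4) -> dict:
    # Same fields as the curl example in the test plan.
    return {
        "prompt": prompt,
        "use_system_prompt": "en_unified",
        "bot_task": "think",
        "num_inference_steps": 50,
        "n": n,
        "seed": seed,
    }


def decode_images(response: dict) -> List[bytes]:
    # Each entry in "data" carries one base64-encoded image.
    return [base64.b64decode(item["b64_json"]) for item in response["data"]]


def generate(
    prompt: str, url: str = "http://localhost:8091/v1/images/generations"
) -> List[bytes]:
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return decode_images(json.load(resp))
```

Usage: after exporting PROMPT, call generate(os.environ["PROMPT"]) and write each returned bytes object to its own .png file.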

This PR also adds online support for returning the AR stage's CoT text (cot_text output).
To enable it, the AR stage in the stage YAML must set final_output: true and final_output_type: text:

stages:
  - stage_id: 0
    final_output: true
    final_output_type: text

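Online request. The prompt is in Chinese; in English it reads roughly: "New Year pet poster, with a cute, rounded chibi-style title "新年快乐汪" ("Happy New Year, woof") and the subtitle "HAPPY NEW YEAR". Fisheye lens, a room doorway in the background, close-up; the uploaded subject tilts its head and smiles, wearing a red scarf and a red knitted hat; high definition, fur detail, facial close-up. Polaroid film, surrealism, realism, film photography, printed grain texture. Texture, hyperrealistic, retro feel."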

curl -X POST http://localhost:8091/v1/images/edits \
  -F "image=@/data/edit_dog.png" \
  -F "prompt=新年宠物海报，Q版圆润的可爱标题\"新年快乐汪\"，副标题\"HAPPY NEW YEAR\"。 鱼眼镜头，背景是房间门口，近景，上传的主体歪头笑，围着红色围巾，戴着红色毛线帽，高清，绒毛细节，面部特写。 宝丽莱相纸，超现实主义，写实主义，胶片摄影，打印颗粒感肌理。肌理，超写实，复古感。" \
  -F "use_system_prompt=en_unified" \
  -F "bot_task=think" \
  -F "n=1" \
  -F "num_inference_steps=50" \
  -F "guidance_scale=2.5" \
  -F "seed=42" \
  | jq '.cot_output'
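The same edit request from Python, as a standard-library sketch. encode_multipart and edit_with_cot are hypothetical helpers, not part of vLLM-Omni; the form fields and the cot_output response key come from the curl command above, and the uploaded image is assumed to be a PNG.

```python
import json
import os
import urllib.request
import uuid
from typing import Dict, Optional, Tuple


def encode_multipart(
    fields: Dict[str, str], files: Dict[str, Tuple[str, bytes, str]]
) -> Tuple[bytes, str]:
    """Build a multipart/form-data body similar to what curl's -F flags send."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    for name, (filename, data, content_type) in files.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
            f"Content-Type: {content_type}".encode(),
            b"",
            data,
        ]
    parts += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def edit_with_cot(
    image_path: str, prompt: str, url: str = "http://localhost:8091/v1/images/edits"
) -> Optional[str]:
    """Send the edit request and return the AR stage's CoT text, if present."""
    with open(image_path, "rb") as f:
        image = f.read()
    fields = {
        "prompt": prompt,
        "use_system_prompt": "en_unified",
        "bot_task": "think",
        "n": "1",
        "num_inference_steps": "50",
        "guidance_scale": "2.5",
        "seed": "42",
    }
    files = {"image": (os.path.basename(image_path), image, "image/png")}
    body, content_type = encode_multipart(fields, files)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        # Per the PR description, cot_output needs the AR stage configured with
        # final_output: true and final_output_type: text.
        return json.load(resp).get("cot_output")
```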

bot_task selects the bot task type. It accepts full task names or the simplified aliases think, recaption, and vanilla.

Test Result

[Result images, side by side: Offline | Online (images not preserved)]


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 81f0e2772a


chatgpt-codex-connector · 2026-05-11T12:51:33Z

        images = getattr(result.request_output, "images", [])
        stage_durations = result.stage_durations
        peak_memory_mb = result.peak_memory_mb
+        logger.info(f"[DEBUG] all_outputs length={len(all_outputs)}")


Guard CoT extraction for non-AsyncOmni engines

In generate_diffusion_images, all_outputs is only initialized inside the if isinstance(engine, AsyncOmni) branch, but it is used unconditionally afterward; when engine is not AsyncOmni, this raises UnboundLocalError before the function can return images. This breaks single-stage/non-Async diffusion serving paths that rely on the else: result = await engine.generate(...) branch.
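A minimal sketch of the fix this comment asks for: bind all_outputs before the branch so the CoT lookup is valid on both paths. AsyncOmniStub, SingleStageEngine, and the dict-shaped outputs are simplified stand-ins, not the real vLLM-Omni classes or output types.

```python
import asyncio
from typing import Any, Dict, List, Optional, Tuple


class AsyncOmniStub:
    """Stand-in for AsyncOmni: streams one output per pipeline stage."""

    async def generate(self, prompt: str):
        for stage in ("ar", "dit"):
            yield {"stage": stage, "text": f"{stage} output for {prompt!r}"}


class SingleStageEngine:
    """Stand-in for a non-AsyncOmni engine that returns a single result."""

    async def generate(self, prompt: str) -> Dict[str, Any]:
        return {"stage": "dit", "images": ["<image>"]}


async def generate_diffusion_images(
    engine, prompt: str
) -> Tuple[Dict[str, Any], Optional[str]]:
    # Bind all_outputs before branching so the CoT lookup below is valid on every path.
    all_outputs: List[Dict[str, Any]] = []
    if isinstance(engine, AsyncOmniStub):
        async for output in engine.generate(prompt):
            all_outputs.append(output)
        result = all_outputs[-1]
    else:
        result = await engine.generate(prompt)
    # Only the AR stage carries CoT text; single-stage engines yield None.
    cot_output = next((o["text"] for o in all_outputs if o["stage"] == "ar"), None)
    return result, cot_output
```

If all_outputs were only assigned inside the if branch, the SingleStageEngine call would raise UnboundLocalError at the cot_output line, which is the failure the review describes.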


chatgpt-codex-connector · 2026-05-11T12:51:33Z

                flat_images.append(item)

-        return flat_images, stage_durations, peak_memory_mb
+        return flat_images, stage_durations, peak_memory_mb, cot_output


Keep return tuple compatible with existing image-edit caller

This function now returns four values (flat_images, stage_durations, peak_memory_mb, cot_output), but /v1/images/edits still unpacks only three (images, _, _ = generation_result in api_server.py), which causes ValueError: too many values to unpack on that multi-stage edit path. Either update all callers or return a backward-compatible shape.
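One way to follow the "update all callers" option so that adding a field cannot break unpacking again: return a named tuple and have callers read fields by name. DiffusionResult and fake_generate are hypothetical names, and the PR's actual fix may differ. The final lines reproduce why the three-value unpack in the edits handler fails.

```python
from typing import Dict, List, NamedTuple, Optional


class DiffusionResult(NamedTuple):
    """Hypothetical return type for generate_diffusion_images."""

    images: List[bytes]
    stage_durations: Dict[str, float]
    peak_memory_mb: float
    cot_output: Optional[str] = None  # None when the AR stage emits no text


def fake_generate(images: List[bytes], cot: Optional[str] = None) -> DiffusionResult:
    # Stand-in for the real function; durations and memory are placeholder values.
    return DiffusionResult(images, {"dit": 1.0}, 512.0, cot)


# Updated caller: attribute access keeps working if more fields are added later.
result = fake_generate([b"png-bytes"])
images = result.images

# The old call-site pattern fails once the tuple grows to four fields.
try:
    images, _, _ = fake_generate([b"png-bytes"])
except ValueError as exc:
    print(f"ValueError: {exc}")
```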


/v1/images/edits is now supported.

skf-1999 · 2026-05-11T12:51:35Z

Online CoT output isn't ready yet. I will clean up the redundant code, add all the result images, and add CoT support later.

hsliuustc0106

Merge conflict (CONFLICTING status). Please rebase before review.

skf-1999 · 2026-05-12T08:09:09Z


CoT support has been added.

skf-1999 · 2026-05-12T12:45:30Z


Resolved merge conflicts.

Gaohan123

Thanks for the contribution. Here are 2 suggestions:

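1. Add unit tests for this change.
2. Please refer to PR #3444 ([Feature] HunyuanImage-3.0 IT2I: multi-image input + prompt API cleanup) to confirm the list of supported bot tasks; the two PRs are currently inconsistent.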

skf-1999 · 2026-05-18T07:42:19Z

The buildkite/vllm-omni-intel-ci failure in test_mix_to_audio is a pre-existing environment issue: the Intel CI image is missing the eSpeak system dependency required by pyttsx3. It is unrelated to the changes in this PR.

skf-1999 · 2026-05-18T09:01:13Z


Unit tests are still pending and will be added. The bot_task values have already been unified with PR #3444.
Conflicts have been resolved.

TaffyOfficial · 2026-05-18T11:36:21Z

P1: The use_system_prompt parameter for /v1/images/generations does not actually take effect. api_server.py (line 1538) passes extra_body["use_system_prompt"], while serving_chat.py (line 2251) only reads sys_type. As a result, use_system_prompt="custom" and the other modes users pass to the Images Generation API are silently ignored, and the multi-stage pipeline does not behave as the declared schema specifies.
P1/P2: The bot_task schema is inconsistent with the actual contract of the Hunyuan prompt builder. images.py (line 116) allows legacy composite tasks such as t2i_think and it2i_recaption, but the builder in prompt_utils.py (line 67) in practice only accepts think, recaption, think_recaption, vanilla, and None. Values the schema allows therefore trigger an "Unknown bot_task" error in the builder once they are passed on in serving_chat.py (line 2279). Meanwhile, the schema omits think_recaption, which the builder does support.
P2: all_outputs is still initialized only in the AsyncOmni branch. It is initialized in serving_chat.py (line 2501), but the code iterates over it unconditionally in serving_chat.py (line 2523). Service initialization almost always takes the AsyncOmni path, so this does not fail immediately on the main path; however, the compatibility branch remains fragile.
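A minimal sketch of how the first two findings could be handled together at the API boundary: forward use_system_prompt under the sys_type key that serving_chat.py reads, and validate bot_task against the builder's accepted values before the request reaches it. build_extra_body and SUPPORTED_BOT_TASKS are hypothetical names; the accepted values are the ones listed in the review.

```python
from typing import Any, Dict, Optional

# Hypothetical single source of truth, mirroring the values the review says the
# Hunyuan prompt builder accepts; None means "no bot task".
SUPPORTED_BOT_TASKS = frozenset({"think", "recaption", "think_recaption", "vanilla"})


def build_extra_body(
    use_system_prompt: Optional[str], bot_task: Optional[str]
) -> Dict[str, Any]:
    """Map Images API fields to the keys the chat serving layer reads."""
    if bot_task is not None and bot_task not in SUPPORTED_BOT_TASKS:
        # Fail fast here instead of letting the prompt builder raise later.
        raise ValueError(
            f"Unknown bot_task {bot_task!r}; "
            f"expected one of {sorted(SUPPORTED_BOT_TASKS)} or null"
        )
    extra_body: Dict[str, Any] = {}
    if use_system_prompt is not None:
        # The reader only looks at "sys_type", so forward the value under that key.
        extra_body["sys_type"] = use_system_prompt
    if bot_task is not None:
        extra_body["bot_task"] = bot_task
    return extra_body
```

Keeping the allowed set in one constant that both the request schema and the builder import is what prevents the two from drifting apart again.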

TaffyOfficial · 2026-05-18T11:37:20Z

P2: Online stop tokens must not be hardcoded; the stop tokens provided by the framework should be used instead.
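A sketch of the direction this comment points to, assuming the model exposes a Hugging Face-style generation config whose eos_token_id may be a single int or a list of ints. resolve_stop_token_ids is a hypothetical helper, and the thread does not say which framework-provided source the PR should use.

```python
from typing import Any, Dict, Iterable, List, Optional


def resolve_stop_token_ids(
    generation_config: Dict[str, Any], request_stop_ids: Optional[Iterable[int]] = None
) -> List[int]:
    """Derive stop token ids from model metadata instead of a hardcoded list."""
    eos = generation_config.get("eos_token_id")
    if eos is None:
        stop_ids: List[int] = []
    elif isinstance(eos, int):
        stop_ids = [eos]
    else:
        stop_ids = list(eos)
    # Append per-request stop ids without duplicating the model's own.
    for token_id in request_stop_ids or ():
        if token_id not in stop_ids:
            stop_ids.append(token_id)
    return stop_ids
```

The resulting list could then be passed as stop_token_ids to vLLM's SamplingParams.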

skf-1999 · 2026-05-18T14:22:27Z


Will handle all of these in the next commit, along with unit tests.

Signed-off-by: skf1999 <13234016272@163.com>

skf-1999 requested review from Gaohan123, hsliuustc0106, tzhouam and ywang96 as code owners May 11, 2026 12:49

chatgpt-codex-connector Bot reviewed May 11, 2026


hsliuustc0106 added the frontend label (code related to entrypoint) May 11, 2026

hsliuustc0106 reviewed May 11, 2026


skf-1999 force-pushed the offline-online branch 2 times, most recently from dd48197 to 610f591 May 12, 2026 08:06

skf-1999 changed the title from "[Bugfix]Aligning Offline and Online Text-to-Image (t2i) Inference" to "[Bugfix] Align offline and online inference" May 12, 2026

skf-1999 changed the title from "[Bugfix] Align offline and online inference" to "[Bugfix] Align Offline and Online Text-to-Image (t2i) Inference" May 12, 2026

skf-1999 force-pushed the offline-online branch 3 times, most recently from 74b45c3 to c966702 May 13, 2026 03:35

skf-1999 changed the title from "[Bugfix] Align Offline and Online Text-to-Image (t2i) Inference" to "[Bugfix] Align Offline and Online Inference" May 13, 2026

hsliuustc0106 added the ready label (label to trigger buildkite CI) May 14, 2026

skf-1999 force-pushed the offline-online branch from c966702 to 388d6b2 May 14, 2026 08:53

Gaohan123 added this to the v0.22.0 milestone May 14, 2026

Gaohan123 reviewed May 14, 2026


skf-1999 force-pushed the offline-online branch 2 times, most recently from 74cd091 to 1939c90 May 18, 2026 07:06

Align Offline and Online Inference

1939c90

Signed-off-by: skf1999 <13234016272@163.com>

skf-1999 added 2 commits May 19, 2026 11:40

Align Offline and Online Inference

36705b1

Signed-off-by: skf1999 <13234016272@163.com>

Align Offline and Online Inference

ce2a843

Signed-off-by: skf1999 <13234016272@163.com>

4 participants