[Bugfix]Fix bug of online server can not return mutli images by Hu1Lcode · Pull Request #2007 · vllm-project/vllm-omni

Hu1Lcode · 2026-03-19T06:45:35Z

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

When i use vllm-omni to infer Qwen-Image-Layered, i found this model return mutli pictures but vllm-omni report an error.

Test Plan

Start a Qwen-Image-Layered online service to conduct reasoning tests.

Our start command

vllm serve ./Qwen-Image-Layered
--omni
--port 8091
--log-stats
--vae-use-slicing
--vae-use-tiling

curl command

#!/bin/bash

INPUT_IMG="./rabbit.png"
CURRENT_TIME=$(date +%Y%m%d%H%M%S)
OUTPUT="${3:-image_edit_${CURRENT_TIME}.png}"
#OUTPUT="./output.png"
REQUEST_JSON_FILE="./test.json"
IMAGE_BASE64=$(base64 -w0 "$INPUT_IMG")

base64 -w0 "$INPUT_IMG"
| jq -Rs '{
messages: [{
role: "user",
content: [
{"type": "image_url", "image_url": {"url": ("data:image/png;base64," + .)}}
]
}],
extra_body: {
num_inference_steps: 50,
cfg_scale: 4.0,
output: "layered",
layers: 4,
color_format: "RGBA"
}
}' > "$REQUEST_JSON_FILE"

curl -s "http://localhost:8091/v1/chat/completions"
-H "Content-Type: application/json"
-d @"$REQUEST_JSON_FILE" |
jq -r '.choices[0].message.content[] | .image_url.url | split(",")[1]' |
while IFS= read -r base64_data; do
((i++))
echo "$base64_data" | base64 -d > "${OUTPUT_PREFIX}${i}.png"
echo "Saved ${OUTPUT_PREFIX}${i}.png"
done

Test result

output 4 pics

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
The test results. Please paste the results comparison before and after, or the e2e results.
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
(Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d5e5be9fcd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-19T06:47:53Z

            for img in images:
-                with BytesIO() as buffer:
-                    img.save(buffer, format="PNG")
-                    img_bytes = buffer.getvalue()
-                img_base64 = base64.b64encode(img_bytes).decode("utf-8")
-                image_contents.append(
-                    {
-                        "type": "image_url",
-                        "image_url": {
-                            "url": f"data:image/png;base64,{img_base64}",
-                        },
-                    }
-                )
+                if isinstance(img, list):
+                    for sub_idx, sub_img in enumerate(img):


Preserve flat image lists when encoding diffusion results

If result.request_output.images is the existing flat list[PIL.Image.Image] shape, this new if isinstance(img, list) guard drops every image and falls through to the "no images were produced" response. DiffusionEngine.generate() still passes images=outputs for normal diffusion requests (vllm_omni/diffusion/diffusion_engine.py:180-188), so this regresses the chat endpoint for all single-image / non-layered models while fixing the layered case.

Useful? React with 👍 / 👎.

hsliuustc0106 · 2026-03-19T07:37:35Z

@bjf-frz PTAL

hsliuustc0106 · 2026-03-19T07:38:34Z

fix precommits please

david6666666 · 2026-03-19T08:04:43Z

        choices: list[ChatCompletionResponseChoice] = []
        final_res = omni_outputs.request_output

-        # Handle profiling data


Why remove this

david6666666 · 2026-03-19T08:06:59Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c015313b89

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-19T08:12:49Z

                    "image_url": {
                        "url": f"data:image/png;base64,{img_base64}",
                    },
-                    "stage_durations": stage_durations,
                }


Keep stage_durations on chat image content

Removing stage_durations here regresses the non-pure-diffusion /v1/chat/completions path for image outputs. OmniBase._process_single_result() still attaches per-stage timings to OmniRequestOutput, and benchmarks/diffusion/backends.py:116-122 reads them back from choices[0].message.content[0].stage_durations. After this change, multi-stage image models served through _create_image_choice() lose their stage breakdowns entirely, so benchmark/profiling runs can no longer report per-stage latency.

Useful? React with 👍 / 👎.

david6666666 · 2026-03-19T09:31:43Z

LGTM now

Signed-off-by: Hui <1779066624@qq.com>

…oject#2007) Signed-off-by: Hui <1779066624@qq.com> Co-authored-by: AlvisGong <gwly0401@163.com> Signed-off-by: yiliu30 <yi4.liu@intel.com>

### vllm-omni-audio-tts - Source: [PR #2059](vllm-project/vllm-omni#2059) - [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool - Changes: - Bug fix: [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool ### vllm-omni-perf - Source: [PR #2059](vllm-project/vllm-omni#2059) - [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool - Changes: - Bug fix: [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool ### vllm-omni-api - Source: [PR #2058](vllm-project/vllm-omni#2058) - [Bugfix] Fix Fish Speech and CosyVoice3 online serving - missing is_comprehension and broken model detection - Changes: - Bug fix: [Bugfix] Fix Fish Speech and CosyVoice3 online serving - missing is_comprehension and broken model detection ### vllm-omni-contrib - Source: [PR #2045](vllm-project/vllm-omni#2045) - [Voxtral] Improve example ### vllm-omni-cicd - Source: [PR #2045](vllm-project/vllm-omni#2045) - [Voxtral] Improve example ### vllm-omni-api - Source: [PR #2042](vllm-project/vllm-omni#2042) - [bugfix] /chat/completion doesn't read extra_body for diffusion model - Changes: - Bug fix: [bugfix] /chat/completion doesn't read extra_body for diffusion model ### vllm-omni-perf - Source: [PR #2042](vllm-project/vllm-omni#2042) - [bugfix] /chat/completion doesn't read extra_body for diffusion model - Changes: - Bug fix: [bugfix] /chat/completion doesn't read extra_body for diffusion model ### vllm-omni-contrib - Source: [PR #2038](vllm-project/vllm-omni#2038) - [Doc] Update docs and dockerfiles for rebase of vllm v0.18.0 ### vllm-omni-serving - Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0 ### vllm-omni-contrib - Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0 ### vllm-omni-api - Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0 ### vllm-omni-cicd - Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0 ### vllm-omni-cicd - Source: [PR #2032](vllm-project/vllm-omni#2032) - [CI] Change Bagel online test environment variable `VLLM_TEST_CLEAN_GPU_MEMORY` to `0` ### vllm-omni-cicd - Source: [PR #2031](vllm-project/vllm-omni#2031) - [CI] Fix test. - Changes: - Bug fix: [CI] Fix test. ### vllm-omni-cicd - Source: [PR #2017](vllm-project/vllm-omni#2017) - [CI] [ROCm] Setup `test-ready.yml` and `test-merge.yml` ### vllm-omni-cicd - Source: [PR #2014](vllm-project/vllm-omni#2014) - [Test] Implement mock HTTP request handling in benchmark CLI tests ### vllm-omni-perf - Source: [PR #2014](vllm-project/vllm-omni#2014) - [Test] Implement mock HTTP request handling in benchmark CLI tests ### vllm-omni-serving - Source: [PR #2012](vllm-project/vllm-omni#2012) - [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips - Changes: - Bug fix: [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips ### vllm-omni-image-gen - Source: [PR #2012](vllm-project/vllm-omni#2012) - [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips - Changes: - Bug fix: [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips ### vllm-omni-perf - Source: [PR #2012](vllm-project/vllm-omni#2012) - [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips - Changes: - Bug fix: [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips ### vllm-omni-serving - Source: [PR #2009](vllm-project/vllm-omni#2009) - [Bugfix] revert PR#1758 which introduced the accuracy problem of qwen3-omni - Changes: - Bug fix: [Bugfix] revert PR#1758 which introduced the accuracy problem of qwen3-omni ### vllm-omni-image-gen - Source: [PR #2007](vllm-project/vllm-omni#2007) - [Bugfix]Fix bug of online server can not return mutli images - Changes: - Bug fix: [Bugfix]Fix bug of online server can not return mutli images - Additions: - Qwen-Image-Layered - Qwen-Image-Layered - Qwen-Image-Layered ### vllm-omni-api - Source: [PR #2007](vllm-project/vllm-omni#2007) - [Bugfix]Fix bug of online server can not return mutli images - Changes: - Bug fix: [Bugfix]Fix bug of online server can not return mutli images ### vllm-omni-cicd - Source: [PR #1998](vllm-project/vllm-omni#1998) - [CI] Split BAGEL tests into dummy/real weight tiers (L2/L3) ### vllm-omni-serving - Source: [PR #1985](vllm-project/vllm-omni#1985) - [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls - Changes: - Performance improvement: [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls ### vllm-omni-audio-tts - Source: [PR #1985](vllm-project/vllm-omni#1985) - [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls - Changes: - Performance improvement: [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls ### vllm-omni-perf - Source: [PR #1985](vllm-project/vllm-omni#1985) - [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls - Changes: - Performance improvement: [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls ### vllm-omni-serving - Source: [PR #1984](vllm-project/vllm-omni#1984) - [CI] [ROCm] Bugfix device environment issue - Changes: - Bug fix: [CI] [ROCm] Bugfix device environment issue ### vllm-omni-api - Source: [PR #1984](vllm-project/vllm-omni#1984) - [CI] [ROCm] Bugfix device environment issue - Changes: - Bug fix: [CI] [ROCm] Bugfix device environment issue ### vllm-omni-serving - Source: [PR #1982](vllm-project/vllm-omni#1982) - [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__ - Changes: - Bug fix: [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__ ### vllm-omni-cicd - Source: [PR #1982](vllm-project/vllm-omni#1982) - [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__ - Changes: - Bug fix: [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__ ### vllm-omni-api - Source: [PR #1979](vllm-project/vllm-omni#1979) - [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series) - Changes: - Bug fix: [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series) - Additions: - `/v1/chat/completions` ### vllm-omni-perf - Source: [PR #1979](vllm-project/vllm-omni#1979) - [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series) - Changes: - Bug fix: [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series) ### vllm-omni-contrib - Source: [PR #1976](vllm-project/vllm-omni#1976) - [skip ci][Docs] Update WeChat QR code (fix filename case) - Changes: - Bug fix: [skip ci][Docs] Update WeChat QR code (fix filename case) ### vllm-omni-contrib - Source: [PR #1974](vllm-project/vllm-omni#1974) - [Docs] Update WeChat QR code for community support ### vllm-omni-cicd - Source: [PR #1945](vllm-project/vllm-omni#1945) - Fix Base voice clone streaming quality and stop-token crash - Changes: - Bug fix: Fix Base voice clone streaming quality and stop-token crash ### vllm-omni-cicd - Source: [PR #1938](vllm-project/vllm-omni#1938) - [Test] L4 complete diffusion feature test for Bagel models - Changes: - New feature: [Test] L4 complete diffusion feature test for Bagel models ### vllm-omni-perf - Source: [PR #1938](vllm-project/vllm-omni#1938) - [Test] L4 complete diffusion feature test for Bagel models - Changes: - New feature: [Test] L4 complete diffusion feature test for Bagel models ### vllm-omni-perf - Source: [PR #1934](vllm-project/vllm-omni#1934) - Fix OmniGen2 transformer config loading for HF models - Changes: - Bug fix: Fix OmniGen2 transformer config loading for HF models ### vllm-omni-audio-tts - Source: [PR #1930](vllm-project/vllm-omni#1930) - [Bug][Qwen3TTS][Streaming] remove dynamic initial chunk and only compute on initial request ### vllm-omni-perf - Source: [PR #1930](vllm-project/vllm-omni#1930) - [Bug][Qwen3TTS][Streaming] remove dynamic initial chunk and only compute on initial request ### vllm-omni-audio-tts - Source: [PR #1926](vllm-project/vllm-omni#1926) - [Misc] removed qwen3_tts.py as it is out-dated ### vllm-omni-contrib - Source: [PR #1920](vllm-project/vllm-omni#1920) - [Docs] Add Wan2.1-T2V as supported video generation models - Changes: - New feature: [Docs] Add Wan2.1-T2V as supported video generation models ### vllm-omni-video-gen - Source: [PR #1915](vllm-project/vllm-omni#1915) - [Bugfix] fix helios video generate use cpu device - Changes: - Bug fix: [Bugfix] fix helios video generate use cpu device ### vllm-omni-perf - Source: [PR #1915](vllm-project/vllm-omni#1915) - [Bugfix] fix helios video generate use cpu device - Changes: - Bug fix: [Bugfix] fix helios video generate use cpu device ### vllm-omni-audio-tts - Source: [PR #1913](vllm-project/vllm-omni#1913) - [Optim][Qwen3TTS][CodePredictor] support torch.compile with reduce-overhead and dynamic False ### vllm-omni-perf - Source: [PR #1913](vllm-project/vllm-omni#1913) - [Optim][Qwen3TTS][CodePredictor] support torch.compile with reduce-overhead and dynamic False ### vllm-omni-api - Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring ### vllm-omni-perf - Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring ### vllm-omni-contrib - Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring ### vllm-omni-serving - Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring ### vllm-omni-cicd - Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring ### vllm-omni-image-gen - Source: [PR #1900](vllm-project/vllm-omni#1900) - [Feat] support HSDP for Flux family - Changes: - New feature: [Feat] support HSDP for Flux family ### vllm-omni-contrib - Source: [PR #1900](vllm-project/vllm-omni#1900) - [Feat] support HSDP for Flux family - Changes: - New feature: [Feat] support HSDP for Flux family ### vllm-omni-distributed - Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml - Changes: - New feature: [Feature]: Remove some useless `hf_overrides` in yaml ### vllm-omni-quantization - Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml - Changes: - New feature: [Feature]: Remove some useless `hf_overrides` in yaml ### vllm-omni-cicd - Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml - Changes: - New feature: [Feature]: Remove some useless `hf_overrides` in yaml ### vllm-omni-perf - Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml - Changes: - New feature: [Feature]: Remove some useless `hf_overrides` in yaml ### vllm-omni-contrib - Source: [PR #1890](vllm-project/vllm-omni#1890) - [NPU] Upgrade to v0.17.0 ### vllm-omni-contrib - Source: [PR #1889](vllm-project/vllm-omni#1889) - Add `Governance` section - Changes: - New feature: Add `Governance` section ### vllm-omni-distributed - Source: [PR #1881](vllm-project/vllm-omni#1881) - [Feat] Support T5 Tensor Parallelism - Changes: - New feature: [Feat] Support T5 Tensor Parallelism ### vllm-omni-cicd - Source: [PR #1881](vllm-project/vllm-omni#1881) - [Feat] Support T5 Tensor Parallelism - Changes: - New feature: [Feat] Support T5 Tensor Parallelism

Hu1Lcode requested a review from hsliuustc0106 as a code owner March 19, 2026 06:45

chatgpt-codex-connector bot reviewed Mar 19, 2026

View reviewed changes

FrosterHan mentioned this pull request Mar 19, 2026

[Bug]: Image-to-image model's multi-image output in the API response retains only the first image #1969

Closed

1 task

Hu1Lcode force-pushed the bug-fix/mutli-images-output branch from d5e5be9 to fccb029 Compare March 19, 2026 06:59

david6666666 linked an issue Mar 19, 2026 that may be closed by this pull request

[Bug]: Image-to-image model's multi-image output in the API response retains only the first image #1969

Closed

1 task

david6666666 mentioned this pull request Mar 19, 2026

[RFC]: Qwen-Image、Qwen-Image-Layered、Qwen-Image-Edit-Plus、Wan2.2 Production-grade Feature Monitoring JiusiServe/vllm-omni#167

Closed

26 tasks

david6666666 reviewed Mar 19, 2026

View reviewed changes

chatgpt-codex-connector bot reviewed Mar 19, 2026

View reviewed changes

Hu1Lcode force-pushed the bug-fix/mutli-images-output branch 5 times, most recently from 74d6db8 to 4b170c6 Compare March 19, 2026 09:20

david6666666 approved these changes Mar 19, 2026

View reviewed changes

david6666666 added the ready label to trigger buildkite CI label Mar 19, 2026

AlvisGong and others added 6 commits March 19, 2026 17:44

[Bugfix]Fix bug of online server can not return mutli images

2e4666c

Signed-off-by: Hui <1779066624@qq.com>

fix version error

c92f46b

Signed-off-by: Hui <1779066624@qq.com>

solve conflict

cf3e60a

Signed-off-by: Hui <1779066624@qq.com>

format prob

8b9202d

Signed-off-by: Hui <1779066624@qq.com>

fix precommit

34edd37

Signed-off-by: Hui <1779066624@qq.com>

rebase main

e690866

Signed-off-by: Hui <1779066624@qq.com>

Hu1Lcode force-pushed the bug-fix/mutli-images-output branch from 6c0b92e to e690866 Compare March 19, 2026 09:44

david6666666 enabled auto-merge (squash) March 19, 2026 10:31

david6666666 merged commit 99cc766 into vllm-project:main Mar 19, 2026
6 of 7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bugfix]Fix bug of online server can not return mutli images#2007

[Bugfix]Fix bug of online server can not return mutli images#2007
david6666666 merged 6 commits intovllm-project:mainfrom
Hu1Lcode:bug-fix/mutli-images-output

Hu1Lcode commented Mar 19, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 19, 2026

Uh oh!

hsliuustc0106 commented Mar 19, 2026

Uh oh!

hsliuustc0106 commented Mar 19, 2026

Uh oh!

david6666666 Mar 19, 2026

Uh oh!

david6666666 commented Mar 19, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 19, 2026

Uh oh!

david6666666 commented Mar 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Hu1Lcode commented Mar 19, 2026

Purpose

Test Plan

Our start command

curl command

Test result

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

hsliuustc0106 commented Mar 19, 2026

Uh oh!

hsliuustc0106 commented Mar 19, 2026

Uh oh!

david6666666 Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

david6666666 commented Mar 19, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

david6666666 commented Mar 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants