[Bugfix]Fix bug of online server can not return mutli images#2007
[Bugfix]Fix bug of online server can not return mutli images#2007david6666666 merged 6 commits intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d5e5be9fcd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| for img in images: | ||
| with BytesIO() as buffer: | ||
| img.save(buffer, format="PNG") | ||
| img_bytes = buffer.getvalue() | ||
| img_base64 = base64.b64encode(img_bytes).decode("utf-8") | ||
| image_contents.append( | ||
| { | ||
| "type": "image_url", | ||
| "image_url": { | ||
| "url": f"data:image/png;base64,{img_base64}", | ||
| }, | ||
| } | ||
| ) | ||
| if isinstance(img, list): | ||
| for sub_idx, sub_img in enumerate(img): |
There was a problem hiding this comment.
Preserve flat image lists when encoding diffusion results
If result.request_output.images is the existing flat list[PIL.Image.Image] shape, this new if isinstance(img, list) guard drops every image and falls through to the "no images were produced" response. DiffusionEngine.generate() still passes images=outputs for normal diffusion requests (vllm_omni/diffusion/diffusion_engine.py:180-188), so this regresses the chat endpoint for all single-image / non-layered models while fixing the layered case.
Useful? React with 👍 / 👎.
d5e5be9 to
fccb029
Compare
|
@bjf-frz PTAL |
|
fix precommits please |
| choices: list[ChatCompletionResponseChoice] = [] | ||
| final_res = omni_outputs.request_output | ||
|
|
||
| # Handle profiling data |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c015313b89
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "image_url": { | ||
| "url": f"data:image/png;base64,{img_base64}", | ||
| }, | ||
| "stage_durations": stage_durations, | ||
| } |
There was a problem hiding this comment.
Keep
stage_durations on chat image content
Removing stage_durations here regresses the non-pure-diffusion /v1/chat/completions path for image outputs. OmniBase._process_single_result() still attaches per-stage timings to OmniRequestOutput, and benchmarks/diffusion/backends.py:116-122 reads them back from choices[0].message.content[0].stage_durations. After this change, multi-stage image models served through _create_image_choice() lose their stage breakdowns entirely, so benchmark/profiling runs can no longer report per-stage latency.
Useful? React with 👍 / 👎.
74d6db8 to
4b170c6
Compare
|
LGTM now |
Signed-off-by: Hui <1779066624@qq.com>
Signed-off-by: Hui <1779066624@qq.com>
Signed-off-by: Hui <1779066624@qq.com>
Signed-off-by: Hui <1779066624@qq.com>
Signed-off-by: Hui <1779066624@qq.com>
Signed-off-by: Hui <1779066624@qq.com>
6c0b92e to
e690866
Compare
…oject#2007) Signed-off-by: Hui <1779066624@qq.com> Co-authored-by: AlvisGong <gwly0401@163.com> Signed-off-by: yiliu30 <yi4.liu@intel.com>
### vllm-omni-audio-tts - Source: [PR #2059](vllm-project/vllm-omni#2059) - [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool - Changes: - Bug fix: [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool ### vllm-omni-perf - Source: [PR #2059](vllm-project/vllm-omni#2059) - [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool - Changes: - Bug fix: [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool ### vllm-omni-api - Source: [PR #2058](vllm-project/vllm-omni#2058) - [Bugfix] Fix Fish Speech and CosyVoice3 online serving - missing is_comprehension and broken model detection - Changes: - Bug fix: [Bugfix] Fix Fish Speech and CosyVoice3 online serving - missing is_comprehension and broken model detection ### vllm-omni-contrib - Source: [PR #2045](vllm-project/vllm-omni#2045) - [Voxtral] Improve example ### vllm-omni-cicd - Source: [PR #2045](vllm-project/vllm-omni#2045) - [Voxtral] Improve example ### vllm-omni-api - Source: [PR #2042](vllm-project/vllm-omni#2042) - [bugfix] /chat/completion doesn't read extra_body for diffusion model - Changes: - Bug fix: [bugfix] /chat/completion doesn't read extra_body for diffusion model ### vllm-omni-perf - Source: [PR #2042](vllm-project/vllm-omni#2042) - [bugfix] /chat/completion doesn't read extra_body for diffusion model - Changes: - Bug fix: [bugfix] /chat/completion doesn't read extra_body for diffusion model ### vllm-omni-contrib - Source: [PR #2038](vllm-project/vllm-omni#2038) - [Doc] Update docs and dockerfiles for rebase of vllm v0.18.0 ### vllm-omni-serving - Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0 ### vllm-omni-contrib - Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0 ### vllm-omni-api - Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0 ### vllm-omni-cicd - Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0 ### vllm-omni-cicd - Source: [PR #2032](vllm-project/vllm-omni#2032) - [CI] Change Bagel online test environment variable `VLLM_TEST_CLEAN_GPU_MEMORY` to `0` ### vllm-omni-cicd - Source: [PR #2031](vllm-project/vllm-omni#2031) - [CI] Fix test. - Changes: - Bug fix: [CI] Fix test. ### vllm-omni-cicd - Source: [PR #2017](vllm-project/vllm-omni#2017) - [CI] [ROCm] Setup `test-ready.yml` and `test-merge.yml` ### vllm-omni-cicd - Source: [PR #2014](vllm-project/vllm-omni#2014) - [Test] Implement mock HTTP request handling in benchmark CLI tests ### vllm-omni-perf - Source: [PR #2014](vllm-project/vllm-omni#2014) - [Test] Implement mock HTTP request handling in benchmark CLI tests ### vllm-omni-serving - Source: [PR #2012](vllm-project/vllm-omni#2012) - [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips - Changes: - Bug fix: [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips ### vllm-omni-image-gen - Source: [PR #2012](vllm-project/vllm-omni#2012) - [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips - Changes: - Bug fix: [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips ### vllm-omni-perf - Source: [PR #2012](vllm-project/vllm-omni#2012) - [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips - Changes: - Bug fix: [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips ### vllm-omni-serving - Source: [PR #2009](vllm-project/vllm-omni#2009) - [Bugfix] revert PR#1758 which introduced the accuracy problem of qwen3-omni - Changes: - Bug fix: [Bugfix] revert PR#1758 which introduced the accuracy problem of qwen3-omni ### vllm-omni-image-gen - Source: [PR #2007](vllm-project/vllm-omni#2007) - [Bugfix]Fix bug of online server can not return mutli images - Changes: - Bug fix: [Bugfix]Fix bug of online server can not return mutli images - Additions: - Qwen-Image-Layered - Qwen-Image-Layered - Qwen-Image-Layered ### vllm-omni-api - Source: [PR #2007](vllm-project/vllm-omni#2007) - [Bugfix]Fix bug of online server can not return mutli images - Changes: - Bug fix: [Bugfix]Fix bug of online server can not return mutli images ### vllm-omni-cicd - Source: [PR #1998](vllm-project/vllm-omni#1998) - [CI] Split BAGEL tests into dummy/real weight tiers (L2/L3) ### vllm-omni-serving - Source: [PR #1985](vllm-project/vllm-omni#1985) - [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls - Changes: - Performance improvement: [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls ### vllm-omni-audio-tts - Source: [PR #1985](vllm-project/vllm-omni#1985) - [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls - Changes: - Performance improvement: [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls ### vllm-omni-perf - Source: [PR #1985](vllm-project/vllm-omni#1985) - [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls - Changes: - Performance improvement: [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls ### vllm-omni-serving - Source: [PR #1984](vllm-project/vllm-omni#1984) - [CI] [ROCm] Bugfix device environment issue - Changes: - Bug fix: [CI] [ROCm] Bugfix device environment issue ### vllm-omni-api - Source: [PR #1984](vllm-project/vllm-omni#1984) - [CI] [ROCm] Bugfix device environment issue - Changes: - Bug fix: [CI] [ROCm] Bugfix device environment issue ### vllm-omni-serving - Source: [PR #1982](vllm-project/vllm-omni#1982) - [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__ - Changes: - Bug fix: [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__ ### vllm-omni-cicd - Source: [PR #1982](vllm-project/vllm-omni#1982) - [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__ - Changes: - Bug fix: [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__ ### vllm-omni-api - Source: [PR #1979](vllm-project/vllm-omni#1979) - [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series) - Changes: - Bug fix: [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series) - Additions: - `/v1/chat/completions` ### vllm-omni-perf - Source: [PR #1979](vllm-project/vllm-omni#1979) - [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series) - Changes: - Bug fix: [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series) ### vllm-omni-contrib - Source: [PR #1976](vllm-project/vllm-omni#1976) - [skip ci][Docs] Update WeChat QR code (fix filename case) - Changes: - Bug fix: [skip ci][Docs] Update WeChat QR code (fix filename case) ### vllm-omni-contrib - Source: [PR #1974](vllm-project/vllm-omni#1974) - [Docs] Update WeChat QR code for community support ### vllm-omni-cicd - Source: [PR #1945](vllm-project/vllm-omni#1945) - Fix Base voice clone streaming quality and stop-token crash - Changes: - Bug fix: Fix Base voice clone streaming quality and stop-token crash ### vllm-omni-cicd - Source: [PR #1938](vllm-project/vllm-omni#1938) - [Test] L4 complete diffusion feature test for Bagel models - Changes: - New feature: [Test] L4 complete diffusion feature test for Bagel models ### vllm-omni-perf - Source: [PR #1938](vllm-project/vllm-omni#1938) - [Test] L4 complete diffusion feature test for Bagel models - Changes: - New feature: [Test] L4 complete diffusion feature test for Bagel models ### vllm-omni-perf - Source: [PR #1934](vllm-project/vllm-omni#1934) - Fix OmniGen2 transformer config loading for HF models - Changes: - Bug fix: Fix OmniGen2 transformer config loading for HF models ### vllm-omni-audio-tts - Source: [PR #1930](vllm-project/vllm-omni#1930) - [Bug][Qwen3TTS][Streaming] remove dynamic initial chunk and only compute on initial request ### vllm-omni-perf - Source: [PR #1930](vllm-project/vllm-omni#1930) - [Bug][Qwen3TTS][Streaming] remove dynamic initial chunk and only compute on initial request ### vllm-omni-audio-tts - Source: [PR #1926](vllm-project/vllm-omni#1926) - [Misc] removed qwen3_tts.py as it is out-dated ### vllm-omni-contrib - Source: [PR #1920](vllm-project/vllm-omni#1920) - [Docs] Add Wan2.1-T2V as supported video generation models - Changes: - New feature: [Docs] Add Wan2.1-T2V as supported video generation models ### vllm-omni-video-gen - Source: [PR #1915](vllm-project/vllm-omni#1915) - [Bugfix] fix helios video generate use cpu device - Changes: - Bug fix: [Bugfix] fix helios video generate use cpu device ### vllm-omni-perf - Source: [PR #1915](vllm-project/vllm-omni#1915) - [Bugfix] fix helios video generate use cpu device - Changes: - Bug fix: [Bugfix] fix helios video generate use cpu device ### vllm-omni-audio-tts - Source: [PR #1913](vllm-project/vllm-omni#1913) - [Optim][Qwen3TTS][CodePredictor] support torch.compile with reduce-overhead and dynamic False ### vllm-omni-perf - Source: [PR #1913](vllm-project/vllm-omni#1913) - [Optim][Qwen3TTS][CodePredictor] support torch.compile with reduce-overhead and dynamic False ### vllm-omni-api - Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring ### vllm-omni-perf - Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring ### vllm-omni-contrib - Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring ### vllm-omni-serving - Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring ### vllm-omni-cicd - Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring ### vllm-omni-image-gen - Source: [PR #1900](vllm-project/vllm-omni#1900) - [Feat] support HSDP for Flux family - Changes: - New feature: [Feat] support HSDP for Flux family ### vllm-omni-contrib - Source: [PR #1900](vllm-project/vllm-omni#1900) - [Feat] support HSDP for Flux family - Changes: - New feature: [Feat] support HSDP for Flux family ### vllm-omni-distributed - Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml - Changes: - New feature: [Feature]: Remove some useless `hf_overrides` in yaml ### vllm-omni-quantization - Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml - Changes: - New feature: [Feature]: Remove some useless `hf_overrides` in yaml ### vllm-omni-cicd - Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml - Changes: - New feature: [Feature]: Remove some useless `hf_overrides` in yaml ### vllm-omni-perf - Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml - Changes: - New feature: [Feature]: Remove some useless `hf_overrides` in yaml ### vllm-omni-contrib - Source: [PR #1890](vllm-project/vllm-omni#1890) - [NPU] Upgrade to v0.17.0 ### vllm-omni-contrib - Source: [PR #1889](vllm-project/vllm-omni#1889) - Add `Governance` section - Changes: - New feature: Add `Governance` section ### vllm-omni-distributed - Source: [PR #1881](vllm-project/vllm-omni#1881) - [Feat] Support T5 Tensor Parallelism - Changes: - New feature: [Feat] Support T5 Tensor Parallelism ### vllm-omni-cicd - Source: [PR #1881](vllm-project/vllm-omni#1881) - [Feat] Support T5 Tensor Parallelism - Changes: - New feature: [Feat] Support T5 Tensor Parallelism
PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.
Purpose
When i use vllm-omni to infer Qwen-Image-Layered, i found this model return mutli pictures but vllm-omni report an error.
Test Plan
Start a Qwen-Image-Layered online service to conduct reasoning tests.
Our start command
vllm serve ./Qwen-Image-Layered
--omni
--port 8091
--log-stats
--vae-use-slicing
--vae-use-tiling
curl command
#!/bin/bash
INPUT_IMG="./rabbit.png"
CURRENT_TIME=$(date +%Y%m%d%H%M%S)
OUTPUT="${3:-image_edit_${CURRENT_TIME}.png}"
#OUTPUT="./output.png"
REQUEST_JSON_FILE="./test.json"
IMAGE_BASE64=$(base64 -w0 "$INPUT_IMG")
base64 -w0 "$INPUT_IMG"
| jq -Rs '{
messages: [{
role: "user",
content: [
{"type": "image_url", "image_url": {"url": ("data:image/png;base64," + .)}}
]
}],
extra_body: {
num_inference_steps: 50,
cfg_scale: 4.0,
output: "layered",
layers: 4,
color_format: "RGBA"
}
}' > "$REQUEST_JSON_FILE"
curl -s "http://localhost:8091/v1/chat/completions"
-H "Content-Type: application/json"
-d @"$REQUEST_JSON_FILE" |
jq -r '.choices[0].message.content[] | .image_url.url | split(",")[1]' |
while IFS= read -r base64_data; do
((i++))
echo "$base64_data" | base64 -d > "${OUTPUT_PREFIX}${i}.png"
echo "Saved ${OUTPUT_PREFIX}${i}.png"
done
Test result
output 4 pics

Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model. Please runmkdocs serveto sync the documentation editions to./docs.BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)