[BugFix] Surface diffusion metrics in chat completions; sanitize Bagel img2img mm kwargs (#2932)
Merged
Gaohan123 merged 4 commits into vllm-project:main · Apr 22, 2026
Conversation
…lback Signed-off-by: NumberWan <wantszkin2003@gmail.com>
Collaborator
hsliuustc0106
left a comment
BLOCKING:
- Test Coverage — Missing regression test. Please add an automated test that verifies:
- Diffusion profiler metrics (stage_durations, peak_memory_mb) are exposed in chat completions responses for image outputs
- The benchmark fallback logic correctly populates metrics from top-level when message-level metrics are missing
- Bagel img2img mode works correctly with height/width parameters without HF processor errors
The current test plan only provides manual verification. Please add automated tests with assertions.
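The requested regression test could be sketched roughly as follows (the helper name, response shape, and values here are illustrative assumptions based on the fields named above, not the actual vllm-omni test code):

```python
# Hypothetical shape of an image chat-completions response body carrying
# diffusion profiler metrics; field names follow the review comment above.
def assert_diffusion_metrics_exposed(response_body: dict) -> None:
    """Fail loudly if profiler metrics are missing from the image choice."""
    metrics = response_body["choices"][0]["message"]["metrics"]
    assert "stage_durations" in metrics
    assert metrics["peak_memory_mb"] > 0

sample = {
    "choices": [
        {"message": {"metrics": {"stage_durations": {"denoise": 1.2},
                                 "peak_memory_mb": 2048.0}}}
    ]
}
assert_diffusion_metrics_exposed(sample)  # passes for a well-formed response
```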
Contributor
Author
The 3 tests have been added and all pass. Please refer to the "Test" and "Test Result" parts in the description.
Contributor
Author
Collaborator
I think you need to paste the results without this commit; otherwise I cannot see the benefits of this PR.
Contributor
Author
The description has been updated; before & after results are shown in the description.
qinganrice
pushed a commit
to qinganrice/vllm-omni
that referenced
this pull request
Apr 23, 2026
…l img2img mm kwargs (vllm-project#2932) Signed-off-by: NumberWan <wantszkin2003@gmail.com>
Purpose
Fixes #2931
Bug Description
Run Code:

```shell
pytest -s -v /home/w00917303/vllm-omni/tests/dfx/perf/scripts/run_diffusion_benchmark.py -- --config-file /home/w00917303/vllm-omni/tests/dfx/perf/tests/test_bagel_vllm_omni.json
```

In the t2i test, memory-related information cannot be read correctly because metric merging has a problem in the multi-stage model.
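A minimal sketch of the benchmark-side fallback this PR describes for `benchmarks/diffusion/backends.py` (the helper name `merge_metrics` and the exact response shape are illustrative assumptions): when message-level metrics lack `stage_durations` or `peak_memory_mb`, fall back to the top-level `metrics` in the JSON body.

```python
FALLBACK_KEYS = ("stage_durations", "peak_memory_mb")

def merge_metrics(body: dict) -> dict:
    """Prefer message-level metrics; fill missing keys from top-level metrics."""
    message = body.get("choices", [{}])[0].get("message", {})
    merged = dict(message.get("metrics") or {})
    top = body.get("metrics") or {}
    for key in FALLBACK_KEYS:
        if key not in merged and key in top:
            merged[key] = top[key]
    return merged

# Message-level metrics are empty, so both keys come from the top level.
body = {"choices": [{"message": {"metrics": {}}}],
        "metrics": {"stage_durations": {"denoise": 0.8}, "peak_memory_mb": 1024.0}}
print(merge_metrics(body))
# → {'stage_durations': {'denoise': 0.8}, 'peak_memory_mb': 1024.0}
```

When a key is present at the message level it wins, matching the precedence the tests below assert.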

In the i2i test, the server returned 400 Bad Request. Root cause clarification: in the img2img path, `OmniBagelProcessor` forwards `target_h`/`target_w` (derived from `width`/`height` in the benchmark `extra_body`) into `tokenizer()`, which causes the failure. The SigLIP warning only indicates that image preprocessing ignores these two arguments.
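The fix described above can be sketched as stripping the offending keys before the HF processor call (the helper name and key tuple are assumptions based on this description, not the actual `bagel.py` code):

```python
# Keys derived from width/height that Bagel's tokenizer() call rejects.
UNSUPPORTED_TOKENIZER_KEYS = ("target_h", "target_w")

def sanitize_bagel_img2img_kwargs(mm_kwargs: dict) -> dict:
    """Drop dimension kwargs before forwarding to the Bagel HF processor."""
    return {k: v for k, v in mm_kwargs.items()
            if k not in UNSUPPORTED_TOKENIZER_KEYS}

print(sanitize_bagel_img2img_kwargs(
    {"target_h": 512, "target_w": 512, "prompt": "edit"}))
# → {'prompt': 'edit'}
```

Sanitizing at the model-processor layer keeps the HTTP layer free of Bagel-specific branching.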
Summary
- Diffusion profiler metrics (`stage_durations`, `peak_memory_mb`) are visible to clients and benchmarks on the OpenAI-compatible chat completions path for image outputs.
- When `height`/`width` are provided, always pass `target_h`/`target_w` via `mm_processor_kwargs` for both t2i and i2i so GLM-style processors keep consistent behavior.
- Sanitize `target_h`/`target_w` inside `OmniBagelMultiModalProcessor` before Bagel img2img HF calls so Bagel stays compatible without model-specific branching in the HTTP layer.

Changes
- `vllm_omni/entrypoints/openai/serving_chat.py`: merge / default the image response `metrics` with omni profiler fields; keep unified `mm_processor_kwargs` for dimensions on the omni multistage image path.
- `benchmarks/diffusion/backends.py`: if message-level metrics lack `stage_durations` or `peak_memory_mb`, fall back to top-level `metrics` in the JSON body.
- `vllm_omni/model_executor/models/bagel/bagel.py`: add `_mm_kwargs_for_bagel_img2img_hf()` and use it on the img2img `super()._call_hf_processor(...)` paths.

Testing
- Added `test_create_image_choice_exposes_diffusion_metrics` in `tests/entrypoints/openai_api/test_serving_chat_metrics.py` to verify `stage_durations`/`peak_memory_mb` are exposed in image chat completions.
- Added `tests/benchmarks/test_diffusion_backends_metrics.py` with:
  - `test_chat_completions_metrics_fallback_to_top_level`
  - `test_chat_completions_metrics_message_level_takes_precedence`
- Updated `test_bagel_img2img_online` in `tests/e2e/online_serving/test_bagel_online.py` to include explicit `height`/`width` in `extra_body`.

Test Results
In the t2i test, memory-related information is read correctly.

In the i2i test, all 30 requests were answered successfully.

- `python -m pytest tests/entrypoints/openai_api/test_serving_chat_metrics.py -vv`
  - `test_create_image_choice_exposes_diffusion_metrics` ✅ PASSED
- `python -m pytest tests/benchmarks/test_diffusion_backends_metrics.py -vv`
  - `test_chat_completions_metrics_fallback_to_top_level` ✅ PASSED
  - `test_chat_completions_metrics_message_level_takes_precedence` ✅ PASSED
- `test_bagel_img2img_online[omni_server0]` ✅ PASSED

Notes
- The fix avoids model-specific `if model == ...` branching in `serving_chat`.