[BugFix] Surface diffusion metrics in chat completions; sanitize Bagel img2img mm kwargs #2932

Merged
Gaohan123 merged 4 commits into vllm-project:main from NumberWan:bugfix/benchmark_metric
Apr 22, 2026

Contributor

@NumberWan NumberWan commented Apr 20, 2026

Purpose

Fixes #2931

Bug Description

Run Code:
pytest -s -v /home/w00917303/vllm-omni/tests/dfx/perf/scripts/run_diffusion_benchmark.py -- --config-file /home/w00917303/vllm-omni/tests/dfx/perf/tests/test_bagel_vllm_omni.json

In the t2i test, memory-related information cannot be read correctly because metric merging is broken for multi-stage models.
Screenshot 2026-04-22 113359

In the i2i test, requests failed with 400 Bad Request. Root cause: in the img2img path, OmniBagelProcessor forwards target_h / target_w (derived from width / height in the benchmark extra_body) into tokenizer(), which causes the failure. The SigLIP warning only indicates that image preprocessing ignores these two arguments.

Screenshot 2026-04-22 114207

Summary

  • Ensure diffusion profiler fields (stage_durations, peak_memory_mb) are visible to clients and benchmarks on the OpenAI-compatible chat completions path for image outputs.
  • When height / width are provided, always pass target_h / target_w via mm_processor_kwargs for both t2i and i2i so GLM-style processors keep consistent behavior.
  • Strip target_h / target_w inside OmniBagelMultiModalProcessor before Bagel img2img HF calls so Bagel stays compatible without model-specific branching in the HTTP layer.
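The first bullet above can be pictured with a minimal sketch. It assumes the image response metrics and the profiler output are both plain dicts, and that "merge / default" means profiler values only fill fields the response does not already carry. The helper name `merge_image_metrics` is hypothetical and is not taken from serving_chat.py.

```python
from typing import Any, Mapping, Optional

# Field names come from the PR description.
PROFILER_FIELDS = ("stage_durations", "peak_memory_mb")


def merge_image_metrics(
    response_metrics: Optional[Mapping[str, Any]],
    profiler_metrics: Optional[Mapping[str, Any]],
) -> dict:
    """Return a copy of response_metrics with missing profiler fields filled in.

    Hypothetical helper: the real merge logic in serving_chat.py may differ.
    """
    merged = dict(response_metrics or {})
    profiler = profiler_metrics or {}
    for field in PROFILER_FIELDS:
        value = profiler.get(field)
        # Only fill a field the response does not already provide.
        if merged.get(field) is None and value is not None:
            merged[field] = value
    return merged
```

Under this assumption, a value already present in the response metrics is never overwritten by the profiler value.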

Changes

  • vllm_omni/entrypoints/openai/serving_chat.py — merge / default image response metrics with omni profiler fields; keep unified mm_processor_kwargs for dimensions on the omni multistage image path.
  • benchmarks/diffusion/backends.py — if message-level metrics lack stage_durations or peak_memory_mb, fall back to top-level metrics in the JSON body.
  • vllm_omni/model_executor/models/bagel/bagel.py — add _mm_kwargs_for_bagel_img2img_hf() and use it on img2img super()._call_hf_processor(...) paths.
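The benchmark fallback described for benchmarks/diffusion/backends.py could look roughly like the sketch below. The JSON layout (a `metrics` dict on `choices[0].message` and another at the top level of the body) and the function name `extract_diffusion_metrics` are assumptions made for illustration; only the two field names and the precedence rule come from the description.

```python
from typing import Any, Mapping

METRIC_KEYS = ("stage_durations", "peak_memory_mb")


def extract_diffusion_metrics(body: Mapping[str, Any]) -> dict:
    """Prefer message-level metrics; fall back to top-level metrics per field.

    Hypothetical helper with an assumed response layout.
    """
    top_level = body.get("metrics") or {}
    choices = body.get("choices") or []
    message = (choices[0].get("message") or {}) if choices else {}
    message_level = message.get("metrics") or {}

    result = {}
    for key in METRIC_KEYS:
        value = message_level.get(key)
        if value is None:
            # Message-level value is missing, so use the top-level one.
            value = top_level.get(key)
        result[key] = value
    return result
```

The fallback is applied per field, so a response that carries peak_memory_mb at message level but stage_durations only at top level still yields both values.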

Testing

  • Added unit test: test_create_image_choice_exposes_diffusion_metrics in tests/entrypoints/openai_api/test_serving_chat_metrics.py to verify stage_durations / peak_memory_mb are exposed in image chat completions.
  • Added benchmark regression tests in tests/benchmarks/test_diffusion_backends_metrics.py:
    • test_chat_completions_metrics_fallback_to_top_level
    • test_chat_completions_metrics_message_level_takes_precedence
  • Updated e2e test test_bagel_img2img_online in tests/e2e/online_serving/test_bagel_online.py to include explicit height / width in extra_body.

Test Results

In the t2i test, memory-related information is now read correctly.
image

In the i2i test, all 30 requests completed successfully.
Screenshot 2026-04-22 142027

  • python -m pytest tests/entrypoints/openai_api/test_serving_chat_metrics.py -vv
    • test_create_image_choice_exposes_diffusion_metrics ✅ PASSED
  • python -m pytest tests/benchmarks/test_diffusion_backends_metrics.py -vv
    • test_chat_completions_metrics_fallback_to_top_level ✅ PASSED
    • test_chat_completions_metrics_message_level_takes_precedence ✅ PASSED
  • python -m pytest tests/e2e/online_serving/test_bagel_online.py -vv
    • test_bagel_img2img_online[omni_server0] ✅ PASSED

Notes

  • Intentionally keeps Bagel-specific behavior in the Bagel multimodal processor rather than if model == ... in serving_chat.
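As a rough illustration of keeping the sanitization inside the processor, the sketch below drops the two dimension keys before kwargs would be handed to the HF call. The function name `strip_target_dims` is hypothetical; the actual helper in the PR is _mm_kwargs_for_bagel_img2img_hf(), whose body is not shown here and may do more than this.

```python
from typing import Any, Mapping, Optional

# Keys named in the PR description as the cause of the img2img failure.
DROPPED_KEYS = frozenset({"target_h", "target_w"})


def strip_target_dims(mm_kwargs: Optional[Mapping[str, Any]]) -> dict:
    """Return a new dict without target_h / target_w; the input is not mutated."""
    return {k: v for k, v in (mm_kwargs or {}).items() if k not in DROPPED_KEYS}
```

Because the filter returns a copy, the HTTP layer can keep passing the same mm_processor_kwargs to every model, and only the Bagel img2img path sees the reduced set.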

…lback

Signed-off-by: NumberWan <wantszkin2003@gmail.com>
Signed-off-by: NumberWan <wantszkin2003@gmail.com>
Signed-off-by: NumberWan <wantszkin2003@gmail.com>
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

BLOCKING:

  • Test Coverage — Missing regression test. Please add an automated test that verifies:
    1. Diffusion profiler metrics (stage_durations, peak_memory_mb) are exposed in chat completions responses for image outputs
    2. The benchmark fallback logic correctly populates metrics from top-level when message-level metrics are missing
    3. Bagel img2img mode works correctly with height/width parameters without HF processor errors

The current test plan only provides manual verification. Please add automated tests with assertions.

@NumberWan NumberWan changed the title [WIP][BugFix] Surface diffusion metrics in chat completions; sanitize Bagel img2img mm kwargs [BugFix] Surface diffusion metrics in chat completions; sanitize Bagel img2img mm kwargs Apr 21, 2026
Signed-off-by: NumberWan <wantszkin2003@gmail.com>
@NumberWan
Contributor Author

BLOCKING:

  • Test Coverage — Missing regression test. Please add an automated test that verifies:

    1. Diffusion profiler metrics (stage_durations, peak_memory_mb) are exposed in chat completions responses for image outputs
    2. The benchmark fallback logic correctly populates metrics from top-level when message-level metrics are missing
    3. Bagel img2img mode works correctly with height/width parameters without HF processor errors

The current test plan only provides manual verification. Please add automated tests with assertions.

The three tests have been added and all of them pass. Please refer to the "Testing" and "Test Results" sections of the description.

@NumberWan
Contributor Author

NumberWan commented Apr 21, 2026

@natureofnature

@hsliuustc0106
Collaborator

I think you need to paste the results without this commit; I cannot see the benefits of this PR.

@NumberWan
Contributor Author

I think you need to paste the results without this commit; I cannot see the benefits of this PR.

The description has been updated; the before and after results are now shown there.

@Gaohan123 Gaohan123 added the ready label to trigger buildkite CI label Apr 22, 2026
Collaborator

@Gaohan123 Gaohan123 left a comment

LGTM. Thanks

@Gaohan123 Gaohan123 enabled auto-merge (squash) April 22, 2026 16:23
@Gaohan123 Gaohan123 merged commit ead87aa into vllm-project:main Apr 22, 2026
7 of 8 checks passed
qinganrice pushed a commit to qinganrice/vllm-omni that referenced this pull request Apr 23, 2026
…l img2img mm kwargs (vllm-project#2932)

Signed-off-by: NumberWan <wantszkin2003@gmail.com>

Labels

ready label to trigger buildkite CI

Development

Successfully merging this pull request may close these issues.

[Bug]: Chat completions omit diffusion profiler metrics; BAGEL img2img breaks on GLM-style mm_processor_kwargs

3 participants