[CI] Qwen image edit performance benchmark #2216

Merged: wtomin merged 6 commits into vllm-project:main from fhfuih:qwen-image-edit-ci on Apr 14, 2026

Conversation

@fhfuih fhfuih commented Mar 26, 2026

Purpose

Add benchmark test for Qwen Image Edit and Qwen Image Edit 2509 (multi-image input), similar to #1805 and #2111 .

The second commit of this PR also incorporates the utility script from https://github.com/wtomin/vllm-omni/tree/read-benchmark and extends the report to fit the recent CI kanban template.

Note: this PR is based on #2179, which needs to merge first. It also needs at least one run on the CI machine to finalize all the thresholds before merging.

Test Plan

The same benchmark config as Qwen Image:

2 sampling parameter groups:

  • 512x512_steps20_i2i
  • 1536x1536_steps35_i2i

combined with

3 diffusion feature groups (same as Qwen Image):

  • base
  • Ulysses=2+CFG=2+VAE=4
  • Ulysses=2+CFG=2+CacheDiT
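
The test plan above is a cross product: each of the two sampling parameter groups runs under each of the three diffusion feature groups, giving six cases per model. A minimal sketch of that enumeration follows. The group labels follow the lists above; the variable and function names are hypothetical and are not taken from the benchmark scripts.

```python
from itertools import product

# Labels follow the test plan; all identifiers below are illustrative only.
SAMPLING_GROUPS = ["512x512_steps20_i2i", "1536x1536_steps35_i2i"]
FEATURE_GROUPS = ["base", "Ulysses=2+CFG=2+VAE=4", "Ulysses=2+CFG=2+CacheDiT"]


def benchmark_matrix():
    """Return every (feature group, sampling group) pair to benchmark."""
    return list(product(FEATURE_GROUPS, SAMPLING_GROUPS))


if __name__ == "__main__":
    for features, sampling in benchmark_matrix():
        print(f"{features:28s} {sampling}")
```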

Test Result

Passed on my side. Benchmark figures on 4*A100 are as follows:

| backend | benchmark_params | test_name (server_params) | throughput_qps | latency_mean | latency_median | latency_p99 | latency_p95 | latency_p50 | peak_memory_mb_max | peak_memory_mb_mean | peak_memory_mb_median |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| vllm_omni | 512x512_steps20_i2i | test_qwen_image_edit_ulysses2_cfg2_vae_patch4 | 0.106573026 | 9.382967346 | 9.411954862 | 9.494944628 | 9.480373901 | 9.411954862 | 56660 | 56660 | 56660 |
| vllm_omni | 1536x1536_steps35_i2i | test_qwen_image_edit_ulysses2_cfg2_vae_patch4 | 0.026411221 | 37.86242981 | 37.87738652 | 38.01178825 | 37.98244949 | 37.87738652 | 56670 | 56670 | 56670 |
| vllm_omni | 512x512_steps20_i2i | test_qwen_image_edit_ulysses2_cfg2_cache_dit | 0.139649285 | 7.160598632 | 6.961157049 | 7.749325271 | 7.733595765 | 6.961157049 | 60468 | 60468 | 60468 |
| vllm_omni | 1536x1536_steps35_i2i | test_qwen_image_edit_ulysses2_cfg2_cache_dit | 0.045629346 | 21.91545818 | 21.58149048 | 24.18900405 | 23.91975841 | 21.58149048 | 67394 | 67394 | 67394 |
| vllm_omni | 512x512_steps20_i2i | test_qwen_image_edit_single_device | 0.040145847 | 24.90893269 | 24.84716635 | 25.23590528 | 25.17896626 | 24.84716635 | 60356 | 60356 | 60356 |
| vllm_omni | 1536x1536_steps35_i2i | test_qwen_image_edit_single_device | 0.00822772 | 121.5400661 | 121.61316 | 121.8717732 | 121.8502488 | 121.61316 | 67282 | 67282 | 67282 |
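
One way to read these figures is as a speedup over the single-device run for the same sampling group: divide the single-device mean latency by the mean latency of the parallel configuration. The sketch below does that under the assumption that latency_mean is directly comparable across rows. The numbers are copied from the results above; the dictionary layout and function name are hypothetical.

```python
# latency_mean values copied from the results above (presumably seconds,
# since throughput_qps * latency_mean is close to 1 in every row).
LATENCY_MEAN = {
    ("single_device", "512x512_steps20_i2i"): 24.90893269,
    ("single_device", "1536x1536_steps35_i2i"): 121.5400661,
    ("ulysses2_cfg2_vae_patch4", "512x512_steps20_i2i"): 9.382967346,
    ("ulysses2_cfg2_vae_patch4", "1536x1536_steps35_i2i"): 37.86242981,
    ("ulysses2_cfg2_cache_dit", "512x512_steps20_i2i"): 7.160598632,
    ("ulysses2_cfg2_cache_dit", "1536x1536_steps35_i2i"): 21.91545818,
}


def speedup_over_single_device(config, sampling):
    """Return single-device mean latency divided by the config's mean latency."""
    baseline = LATENCY_MEAN[("single_device", sampling)]
    return baseline / LATENCY_MEAN[(config, sampling)]


if __name__ == "__main__":
    for (config, sampling) in LATENCY_MEAN:
        if config != "single_device":
            ratio = speedup_over_single_device(config, sampling)
            print(f"{config:28s} {sampling:24s} {ratio:.2f}x")
```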

@fhfuih fhfuih force-pushed the qwen-image-edit-ci branch 3 times, most recently from 303b0cc to aff539a Compare March 27, 2026 09:39

wtomin commented Mar 30, 2026

Is this PR ready for review? @fhfuih

@fhfuih fhfuih force-pushed the qwen-image-edit-ci branch 3 times, most recently from 945fc1e to 7b9836a Compare March 30, 2026 08:39
@fhfuih fhfuih marked this pull request as ready for review March 30, 2026 08:40
@fhfuih fhfuih requested a review from hsliuustc0106 as a code owner March 30, 2026 08:40
Copilot AI review requested due to automatic review settings March 30, 2026 08:40

fhfuih commented Mar 30, 2026

> Is this PR ready for review? @fhfuih

Ready now! And it also needs a nightly CI tag

Copilot AI left a comment

Pull request overview

Adds nightly diffusion performance benchmark coverage for the Qwen-Image-Edit model family (including the multi-image 2509 variant) and extends the existing diffusion benchmark runner/serving scripts to better support/report these runs.

Changes:

  • Add new perf config JSONs for Qwen/Qwen-Image-Edit and Qwen/Qwen-Image-Edit-2509, and wire them into the nightly Buildkite diffusion perf step.
  • Extend the diffusion perf runner to emit richer, flattened reporting fields (resolution/parallelism/cache/etc.) and provenance metadata in the aggregated JSON output.
  • Update diffusion serving benchmark to support generating multiple synthetic input images via --num-input-images for random i2i-style tasks.
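
For illustration, the multi-image input described in the last item could be produced along these lines. This is a sketch under stated assumptions, not the code in diffusion_benchmark_serving.py: only the --num-input-images flag comes from the PR, while the --width and --height flags, the function names, and the choice of binary PPM encoding (used here to stay within the standard library) are hypothetical.

```python
import argparse
import random


def make_synthetic_images(num_images, width, height, seed=0):
    """Return `num_images` random RGB images encoded as binary PPM (P6) bytes."""
    rng = random.Random(seed)  # seeded so repeated runs send identical inputs
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return [header + rng.randbytes(width * height * 3) for _ in range(num_images)]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Synthetic i2i input sketch")
    # Flag name taken from the PR; the default value is an assumption.
    parser.add_argument("--num-input-images", type=int, default=1,
                        help="number of synthetic input images per request")
    # Hypothetical flags, present only to keep the sketch self-contained.
    parser.add_argument("--width", type=int, default=64)
    parser.add_argument("--height", type=int, default=64)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    images = make_synthetic_images(args.num_input_images, args.width, args.height)
    print(f"generated {len(images)} synthetic image(s)")
```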

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/dfx/perf/tests/test_qwen_image_vllm_omni.json | Minor formatting cleanup in existing Qwen-Image perf config. |
| tests/dfx/perf/tests/test_qwen_image_edit_vllm_omni.json | New Qwen-Image-Edit perf config (single device / ulysses+cfg+vae / cache_dit). |
| tests/dfx/perf/tests/test_qwen_image_edit_2509_vllm_omni.json | New Qwen-Image-Edit-2509 perf config with 2-input-image benchmarks. |
| tests/dfx/perf/scripts/run_diffusion_benchmark.py | Add flattened reporting fields + commit/build provenance; refactor server config handling. |
| benchmarks/diffusion/diffusion_benchmark_serving.py | Add --num-input-images and generate multiple synthetic input images for random dataset. |
| tests/dfx/benchmark_results_to_excel.py | New local utility to convert benchmark JSONs into an Excel summary. |
| pyproject.toml | Add pandas to dev dependencies. |
| .gitignore | Unignore perf config JSONs under tests/dfx/perf/tests/. |
| .buildkite/test-nightly.yml | Run the new Qwen-Image-Edit benchmarks in nightly diffusion perf step; set env vars; mark step soft-fail. |
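
The "flattened reporting fields" added to run_diffusion_benchmark.py could look roughly like the following, which splits a benchmark_params label such as 512x512_steps20_i2i into separate columns. This is an illustrative sketch only: the function name, the output field names, and the regular expression are assumptions, not the code in the PR.

```python
import re

# Matches labels of the form <width>x<height>_steps<N>_<task>.
_PARAMS_RE = re.compile(
    r"^(?P<width>\d+)x(?P<height>\d+)_steps(?P<steps>\d+)_(?P<task>\w+)$"
)


def flatten_benchmark_params(name):
    """Split a benchmark_params label into flat, individually reportable fields."""
    match = _PARAMS_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized benchmark_params label: {name!r}")
    return {
        "width": int(match["width"]),
        "height": int(match["height"]),
        "num_inference_steps": int(match["steps"]),
        "task": match["task"],
    }


if __name__ == "__main__":
    print(flatten_benchmark_params("1536x1536_steps35_i2i"))
```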

Comment thread tests/dfx/perf/scripts/run_diffusion_benchmark.py Outdated
Comment thread .buildkite/test-nightly.yml Outdated
Comment thread tests/dfx/perf/tests/test_qwen_image_edit_vllm_omni.json
Comment thread tests/dfx/perf/tests/test_qwen_image_edit_2509_vllm_omni.json

os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
os.environ["VLLM_TEST_CLEAN_GPU_MEMORY"] = "0"
os.environ.setdefault("DIFFUSION_ATTENTION_BACKEND", "FLASH_ATTN")

Copilot AI Mar 30, 2026

DIFFUSION_ATTENTION_BACKEND is being defaulted to FLASH_ATTN at import time. This can force the FlashAttention backend even in environments where flash-attn isn’t installed (the backend raises ImportError and suggests using TORCH_SDPA), causing local runs to fail unexpectedly. Prefer leaving this env var unset by default (let platform selection decide), or set it conditionally only when FlashAttention is available / in CI where it’s guaranteed.

Suggested change (remove this line):
os.environ.setdefault("DIFFUSION_ATTENTION_BACKEND", "FLASH_ATTN")

Contributor Author

Intended to use flash attention in benchmark
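
For reference, the conditional alternative raised in this thread (set the backend only when FlashAttention is known to be usable, and never override an explicit choice) could be sketched as below. The environment variable name and the FLASH_ATTN value come from the snippet under review; the function names, the use of importlib to probe for a flash_attn package, and the BUILDKITE=true check for CI are assumptions made for illustration. The PR itself keeps the unconditional default on purpose.

```python
import importlib.util
import os


def flash_attn_available():
    """Return True if a top-level flash_attn package can be located."""
    return importlib.util.find_spec("flash_attn") is not None


def default_attention_backend(env, in_ci, has_flash_attn):
    """Set DIFFUSION_ATTENTION_BACKEND only when FlashAttention is usable.

    A value already present in `env` is returned unchanged.
    """
    if "DIFFUSION_ATTENTION_BACKEND" in env:
        return env["DIFFUSION_ATTENTION_BACKEND"]
    if in_ci or has_flash_attn:
        env["DIFFUSION_ATTENTION_BACKEND"] = "FLASH_ATTN"
        return "FLASH_ATTN"
    return None  # left unset so that platform selection decides


if __name__ == "__main__":
    chosen = default_attention_backend(
        os.environ,
        in_ci=os.environ.get("BUILDKITE") == "true",  # assumed CI marker
        has_flash_attn=flash_attn_available(),
    )
    print(chosen)
```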

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b9836a790

Comment thread .buildkite/test-nightly.yml Outdated
Comment thread tests/dfx/perf/scripts/run_diffusion_benchmark.py Outdated

fhfuih commented Mar 30, 2026

@congw729 in this PR, apart from adding the Qwen Image Edit benchmark test, I also adjusted the test report's format and added the Excel export script from @wtomin's https://github.com/wtomin/vllm-omni/tree/read-benchmark to match the WIP perf kanban format per our internal discussion. Please advise whether it is OK to include this script in this PR. Thanks. (Removed)

Comment thread .gitignore Outdated
Comment thread tests/dfx/benchmark_results_to_excel.py Outdated
@congw729
Collaborator

> @congw729 in this PR, apart from adding the Qwen Image Edit benchmark test, I also adjusted the test report's format and added the Excel export script from @wtomin 's https://github.com/wtomin/vllm-omni/tree/read-benchmark to match the WIP perf kanban format per our internal discussion. Plz suggest if it is OK to include this script here in this PR. Thanks

Can't we reuse the original scripts?

@fhfuih fhfuih force-pushed the qwen-image-edit-ci branch from 8aebd40 to c45a3bb Compare March 31, 2026 08:21
@wtomin wtomin added the nightly-test label to trigger buildkite nightly test CI label Mar 31, 2026

congw729 commented Apr 1, 2026

Do you also need to modify the _DIFFUSION_JSON_PREFIX in tools/nightly/generate_nightly_perf_excel.py?


fhfuih commented Apr 1, 2026

> Do you also need to modify the _DIFFUSION_JSON_PREFIX in tools/nightly/generate_nightly_perf_excel.py?

Yes, this is done


The latest CI result is at https://buildkite.com/vllm/vllm-omni/builds/5593/steps/canvas

Extra manual test: I downloaded all the JSON files from Qwen Image Series Perf Test -> Artifacts and ran the generate_nightly_perf_excel.py script, which produced the following output:

nightly_perf_20260401-082845.xlsx

commit_sha is empty because it is read from the Buildkite environment. The major fields were read successfully.
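
The empty commit_sha in a local run follows from reading the value out of the CI environment. A hedged sketch of that lookup is shown below, with an optional local fallback. BUILDKITE_COMMIT is the variable Buildkite exports for the commit being built; the function name and the `git rev-parse HEAD` fallback are assumptions for illustration and are not what the PR's script does.

```python
import os
import subprocess


def resolve_commit_sha(env=None):
    """Return the commit SHA from Buildkite, else from git, else an empty string."""
    env = os.environ if env is None else env
    sha = env.get("BUILDKITE_COMMIT", "")
    if sha:
        return sha
    try:
        # Hypothetical fallback for local runs outside Buildkite.
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return ""  # no CI variable and no usable git checkout


if __name__ == "__main__":
    print(resolve_commit_sha() or "<empty commit_sha>")
```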


congw729 commented Apr 2, 2026

Is this ready to merge?

"enable-negative-prompt": true,
"baseline": {
"throughput_qps": 0.008,
"latency_p99": 150.0,
Collaborator

latency_mean is a better metric than latency_p99

@congw729 congw729 Apr 2, 2026

For the baseline metrics, maybe keep mean, median, and p99 together?

@fhfuih fhfuih Apr 2, 2026

> For the baseline metrics, maybe keep mean, median, and p99 together?

Once we have a performance kanban, asserting this many metrics here may no longer be needed. They fluctuate and sometimes block CI unexpectedly.

> latency_mean is a better metric than latency_p99

Agreed, it is less strict. I will change that.
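
To make the baseline discussion concrete, here is a minimal sketch of a threshold check that asserts only throughput_qps and latency_mean. It assumes throughput is treated as a floor and latency as a ceiling, which is an assumption about the runner rather than its confirmed behavior. The 0.008 floor comes from the config snippet above; the 150.0 ceiling is the value shown there for latency_p99, reused for latency_mean purely for illustration. The measured values are the single-device 1536x1536 figures from the PR description, and the function name is hypothetical.

```python
def check_baseline(measured, baseline):
    """Return a list of violation messages; an empty list means the run passes."""
    violations = []
    for metric, limit in baseline.items():
        value = measured[metric]
        # Assumed convention: throughput is a floor, latency metrics are ceilings.
        if metric.startswith("throughput"):
            passed, relation = value >= limit, ">="
        else:
            passed, relation = value <= limit, "<="
        if not passed:
            violations.append(f"{metric}: expected {relation} {limit}, got {value}")
    return violations


if __name__ == "__main__":
    baseline = {"throughput_qps": 0.008, "latency_mean": 150.0}
    measured = {"throughput_qps": 0.00822772, "latency_mean": 121.5400661}
    print(check_baseline(measured, baseline) or "all baseline checks passed")
```

Checking fewer metrics narrows the ways a noisy run can fail, which matches the concern about fluctuating numbers blocking CI.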


fhfuih commented Apr 2, 2026

> Is this ready to merge?

It was pending #2415 yesterday. It should be mergeable today; I'll run CI again.


wtomin commented Apr 2, 2026

I resolved the conflicts. I think the previous CI run passed cleanly. Let's wait until this CI run finishes.

Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
fhfuih added 2 commits April 13, 2026 11:00
Add benchmark_results_to_excel.py for aggregating benchmark JSON into Excel.
Adapt the excel generator to the new diffusion JSON format

Made-with: Cursor

Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
@fhfuih fhfuih force-pushed the qwen-image-edit-ci branch from b225d4e to 109713b Compare April 13, 2026 03:04
Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
@fhfuih fhfuih force-pushed the qwen-image-edit-ci branch from aabbde3 to 25b8f51 Compare April 13, 2026 03:25

fhfuih commented Apr 13, 2026

Rebased on the main branch. Here is the CI result up to the commits above:

https://buildkite.com/vllm/vllm-omni/builds/6446/steps/canvas?sid=019d84e8-b7b5-45c7-91cd-99ac69d4a3df&tab=output

Note that there is one assertion error caused by my updated threshold. I will loosen the threshold once more (in the commits that follow).

The logic is unchanged and the PR is ready to merge.

Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
@fhfuih fhfuih force-pushed the qwen-image-edit-ci branch from cdc360f to a4647bd Compare April 13, 2026 07:57

fhfuih commented Apr 14, 2026

Note: all tests have passed in https://buildkite.com/vllm/vllm-omni/builds/6482/steps/canvas except for two unrelated ones. I will push a minor fix for a pre-commit issue below. I think we can ignore the latest CI run.

Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
@fhfuih fhfuih force-pushed the qwen-image-edit-ci branch from 410376a to 9edda19 Compare April 14, 2026 01:41

@wtomin wtomin left a comment

LGTM

@wtomin wtomin added ready label to trigger buildkite CI and removed nightly-test label to trigger buildkite nightly test CI labels Apr 14, 2026
@wtomin wtomin merged commit 8d23549 into vllm-project:main Apr 14, 2026
7 of 8 checks passed
@fhfuih fhfuih deleted the qwen-image-edit-ci branch April 14, 2026 06:39
Labels

ready label to trigger buildkite CI
