
[Perf] Qwen-Image Performance Nightly CI test#1805

Merged
Gaohan123 merged 96 commits into vllm-project:main from wtomin:perf-diff-qwen-image on Mar 23, 2026

Conversation

@wtomin
Collaborator

@wtomin wtomin commented Mar 11, 2026

Purpose

This is one of the PRs addressing RFC #1606.

This PR adds a nightly performance benchmark for Qwen-Image text-to-image tasks and generates performance records in JSON files.

The current acceleration combinations include:

  • Single device, no acceleration
  • USP=2, CFG=2, VAE-Patch-Parallel=4
  • USP=2, CFG=2, CacheDiT

Notes:

  1. This benchmark test requires 4 H100 GPUs with 80 GB VRAM each.
  2. CPU offloading (module-wise) is too slow, so it is excluded from the configurations.
  3. tp2+usp2 was removed because it is not as good as cfg+usp2.
  4. ring2+usp2 was removed because it is slower than usp2 alone.

The acceleration combinations are defined as "server_params" in tests/perf/tests/test_qwen_image_vllm_omni.json or tests/perf/tests/test_qwen_image_sglang_diffusion.json.
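For reference, a server_params entry might look like the following (a hypothetical sketch; the field names here are illustrative, not copied from the actual config files):

```json
{
  "server_params": [
    {"name": "single_device"},
    {"name": "ulysses2_cfg2_vae_patch4", "usp_degree": 2, "cfg_parallel": 2, "vae_patch_parallel": 4},
    {"name": "ulysses2_cfg2_cache_dit", "usp_degree": 2, "cfg_parallel": 2, "cache_dit": true}
  ]
}
```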

The current dataset configuration:

  dataset: random
  task: t2i
  resolution: 512x512, 1536x1536, or mixed_resolution
  num-inference-steps: 20, 35, or mixed_steps
  num-prompts: 10
  max-concurrency: 1
  negative-prompt: blurry, low quality, distorted
  cfg-scale: 4.0

Notes:

  1. Concurrency > 1 is not covered yet;
  2. The configuration mimics datasets A, B, and C from [Benchmark] [Diffusion] [Enhancement] Random dataset #1657

The dataset params are defined in tests/perf/tests/benchmark_params.json.
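For reference, a single benchmark_params entry could look like this (an illustrative sketch based on the table above; the actual field names in benchmark_params.json may differ):

```json
{
  "name": "512x512_steps20",
  "dataset": "random",
  "task": "t2i",
  "width": 512,
  "height": 512,
  "num-inference-steps": 20,
  "num-prompts": 10,
  "max-concurrency": 1,
  "negative-prompt": "blurry, low quality, distorted",
  "cfg-scale": 4.0
}
```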

The number of test cases equals:

(the number of server_params) x (the number of benchmark_params) = 9
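The 3 x 3 grid can be sanity-checked with a few lines of Python (the config names below are hypothetical shorthand for the entries described above, not the real names in the JSON files):

```python
from itertools import product

# Hypothetical shorthand names for the three acceleration configs
# and the three dataset configs described above.
server_params = ["single_device", "ulysses2_cfg2_vae_patch4", "ulysses2_cfg2_cache_dit"]
benchmark_params = ["512x512_steps20", "1536x1536_steps35", "mixed_resolution_mixed_steps"]

# Every acceleration config is benchmarked against every dataset config.
test_cases = list(product(server_params, benchmark_params))
print(len(test_cases))  # 3 x 3 = 9
```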

Test Plan

Only the vllm-omni benchmark is defined in the nightly CI for now; the sglang benchmark was verified locally.

If running it locally:

export DIFFUSION_BENCHMARK_DIR=tests/perf/results
pytest -s -v tests/perf/scripts/run_diffusion_benchmark.py --config-file tests/perf/tests/test_qwen_image_vllm_omni.json

# I only run sglang benchmark locally
pytest -s -v tests/perf/scripts/run_diffusion_benchmark.py --config-file tests/perf/tests/test_qwen_image_sglang_diffusion.json

It took about 12 minutes on an H800:


================================================ 9 passed in 649.58s (0:12:49) =================================================

Afterwards, JSON files are generated, for example:

$ ls tests/perf/results/
benchmark_results_test_qwen_image_vllm_omni_20260323-095953.json  logs
$ ls tests/perf/results/logs/
test_qwen_image_ulysses2_cfg2_cache_dit_vllm-omni_20260323-101210.log  test_qwen_image_ulysses2_cfg2_vae_patch4_vllm-omni_20260323-100801.log  test_qwen_image_ulysses2_cfg2_vllm-omni_20260323-100309.log
test_qwen_image_ulysses2_cfg2_cache_dit_vllm-omni_20260323-101226.log  test_qwen_image_ulysses2_cfg2_vae_patch4_vllm-omni_20260323-100819.log  test_qwen_image_ulysses2_cfg2_vllm-omni_20260323-100327.log
test_qwen_image_ulysses2_cfg2_cache_dit_vllm-omni_20260323-101319.log  test_qwen_image_ulysses2_cfg2_vae_patch4_vllm-omni_20260323-100943.log  test_qwen_image_ulysses2_cfg2_vllm-omni_20260323-100449.lo

Test Result

Notes:

  1. Measured on H800 (4 cards);
  2. My FlashAttention installation is broken, so this experiment uses SDPA;

All results for both vllm-omni and sglang are shared at this URL.

An example of json file of vllm-omni:

  {
    "test_name": "test_qwen_image_ulysses2_cfg2",
    "backend": "vllm-omni",
    "timestamp": "20260323-100309",
    "benchmark_params": {
      "name": "512x512_steps20",
      "dataset": "random",
      "task": "t2i",
      "width": 512,
      "height": 512,
      "num-inference-steps": 20,
      "num-prompts": 10,
      "max-concurrency": 1,
      "enable-negative-prompt": true,
      "baseline": {
        "throughput_qps": 0.001,
        "latency_p99": 1000.0,
        "peak_memory_mb_max": 400000,
        "peak_memory_mb_mean": 400000
      }
    },
    "result": {
      "duration": 13.625132665038109,
      "completed_requests": 10,
      "failed_requests": 0,
      "throughput_qps": 0.7339378078614863,
      "latency_mean": 1.3623275712132454,
      "latency_median": 1.364011375233531,
      "latency_p99": 1.3710215020179748,
      "latency_p95": 1.3701544940471648,
      "latency_p50": 1.364011375233531,
      "peak_memory_mb_max": 60202.0,
      "peak_memory_mb_mean": 60202.0,
      "peak_memory_mb_median": 60202.0,
      "stage_durations_mean": {
        "QwenImagePipeline.text_encoder.forward": 0.012416277453303337,
        "QwenImagePipeline.diffuse": 1.2124435044825077,
        "QwenImagePipeline.vae.decode": 0.018585569970309735
      },
      "stage_durations_p50": {
        "QwenImagePipeline.text_encoder.forward": 0.012366048991680145,
        "QwenImagePipeline.diffuse": 1.2126157889142632,
        "QwenImagePipeline.vae.decode": 0.01858174055814743
      },
      "stage_durations_p99": {
        "QwenImagePipeline.text_encoder.forward": 0.012640358190983534,
        "QwenImagePipeline.diffuse": 1.2175082486867905,
        "QwenImagePipeline.vae.decode": 0.018637520931661128
      },
      "backend": "vllm-omni",
      "model": "Qwen/Qwen-Image",
      "dataset": "random",
      "task": "t2i"
    },
    "log_file": "/home/fq9hpsac/fq9hpsacuser04/workspace/vllm-omni-ddd/tests/perf/results/logs/test_qwen_image_ulysses2_cfg2_vllm-omni_20260323-100309.log"
  },
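The derived metrics in such a file can be cross-checked by hand: throughput_qps should equal completed_requests / duration. A minimal check using the numbers from the example above:

```python
# Values copied from the example result JSON above.
duration = 13.625132665038109
completed_requests = 10
throughput_qps = 0.7339378078614863

# Throughput is requests completed per second of wall-clock duration.
assert abs(completed_requests / duration - throughput_qps) < 1e-6
print("throughput_qps is consistent with duration and completed_requests")
```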

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

@wtomin wtomin changed the title [Perf][WIP] Qwen-Image Night Performance CI test [Perf][WIP] Qwen-Image Performance Nightly CI test Mar 11, 2026
@wtomin wtomin force-pushed the perf-diff-qwen-image branch from 350a928 to 2ea794b Compare March 11, 2026 12:58
@wtomin wtomin marked this pull request as ready for review March 12, 2026 01:10
@wtomin wtomin requested a review from hsliuustc0106 as a code owner March 12, 2026 01:10
@wtomin
Collaborator Author

wtomin commented Mar 12, 2026

@congw729 Alicia, how do I solve this email-sending problem?

python tools/nightly/send_nightly_perf_email.py --report-file tests
Missing required env vars: SMTP_HOST, SMTP_PORT, SMTP_USERNAME, SMTP_PASSWORD, DAILY_EMAIL_LIST. Set them (e.g. in Buildkite secrets).

Besides, can you check whether the .xlsx file is correct?

@wtomin wtomin changed the title [Perf][WIP] Qwen-Image Performance Nightly CI test [Perf] Qwen-Image Performance Nightly CI test Mar 12, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ea794b290


Comment thread tests/perf/scripts/run_qwen_image_benchmark.py Outdated
Comment thread tests/perf/scripts/run_qwen_image_benchmark.py Outdated
@wtomin
Collaborator Author

wtomin commented Mar 12, 2026

@hsliuustc0106 Can you check if the dataset configuration and test case coverage are in line with your expectations?

@congw729
Collaborator

@congw729 Alicia, how do I solve this email-sending problem?

python tools/nightly/send_nightly_perf_email.py --report-file tests
Missing required env vars: SMTP_HOST, SMTP_PORT, SMTP_USERNAME, SMTP_PASSWORD, DAILY_EMAIL_LIST. Set them (e.g. in Buildkite secrets).

Besides, can you check whether the .xlsx file is correct?

Hi, you can set those using your personal email account token (like Gmail) for local testing (which will actually send an email). Or you can set those env vars to None and use --dry-run to test (this only tests most of the functionality).

@congw729
Collaborator

Can you send me the performance results JSON file? I need to make sure the combination of all models' performance results is correctly shown.

@congw729
Collaborator

Have you checked #1321? I'm not sure whether you are using the same key for the same metric.

Comment thread tools/nightly/generate_diffusion_nightly_perf_excel.py Outdated
@hsliuustc0106
Collaborator

@hsliuustc0106 Can you check if the dataset configuration and test case coverage are in line with your expectations?

If we only check performance here, we should use the diffusion profiler for each stage and make the number of inference steps smaller; maybe even 1 or 2 steps is enough.

Comment thread .buildkite/test-nightly.yml Outdated
@hsliuustc0106
Collaborator

Please check #1657 for datasets A, B, and C.

@wtomin
Collaborator Author

wtomin commented Mar 12, 2026

Can you send me the performance results JSON file? I need to make sure the combination of all models' performance results is correctly shown.

Sure. Sent to you.

Have you checked the #1321? I'm not sure if you are using the same key for the same meaning metric.

Many metric names of omni models and diffusion models do not overlap. An example of a diffusion performance JSON file:

{
  "duration": 55.729680337011814,
  "completed_requests": 10,
  "failed_requests": 0,
  "throughput_qps": 0.17943759841304327,
  "latency_mean": 5.572730791615323,
  "latency_median": 5.546993801603094,
  "latency_p99": 5.834732530997135,
  "latency_p50": 5.546993801603094,
  "peak_memory_mb_max": 0,
  "peak_memory_mb_mean": 0,
  "peak_memory_mb_median": 0,
  "backend": "vllm-omni",
  "model": "Qwen/Qwen-Image",
  "dataset": "random",
  "task": "t2i"
}

I think we cannot use the same key names as the omni models' JSON file.

@congw729
Collaborator

congw729 commented Mar 12, 2026

Can you send me the performance results JSON file? I need to make sure the combination of all models' performance results is correctly shown.

Sure. Sent to you.

Have you checked #1321? I'm not sure whether you are using the same key for the same metric.

Many metric names of omni models and diffusion models do not overlap. [...] I think we cannot use the same key names as the omni models' JSON file.

I see, in that case, we could have two sub-sheets for different types of models.
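The sub-sheet idea could be sketched as grouping result records by their metric key set before writing the spreadsheet (illustrative Python only; the actual exporter, tools/nightly/generate_diffusion_nightly_perf_excel.py, may work differently):

```python
from collections import defaultdict

def group_by_schema(records):
    """Group result dicts by their (sorted) key set, so that each
    distinct schema can be written to its own sub-sheet."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(sorted(rec))].append(rec)
    return dict(groups)

# Hypothetical records: one omni-style and one diffusion-style result.
omni_result = {"ttft_mean": 0.21, "throughput_qps": 1.5}
diffusion_result = {"latency_mean": 5.57, "throughput_qps": 0.18}

sheets = group_by_schema([omni_result, diffusion_result])
print(len(sheets))  # two distinct key sets -> two sub-sheets
```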

@wtomin
Collaborator Author

wtomin commented Mar 12, 2026

@congw729 As requested, the email sending label is removed from test-nightly.yml. Related edits were reverted.

@Gaohan123 Gaohan123 added this to the v0.18.0 milestone Mar 12, 2026
@congw729
Collaborator

I have another question: in Omni benchmark tests, we have max_concurrency and num_prompts to control the request frequency. Do we need those configurations in the diffusion models' benchmark tests?

@wtomin
Collaborator Author

wtomin commented Mar 13, 2026

I have another question, in Omni benchmark tests, we will have max_concurrency and num_prompts to control the request frequency. Do we need those configurations in diffusion models' benchmark tests?

I think the max_concurrency value is only meaningful when batched requests are supported for diffusion models. Since most diffusion models do not support batched requests (@fhfuih does only Qwen-Image support it? I think wan2.2 does not), we should always use max_concurrency = 1. Even if we increase max_concurrency, requests are still fed to the pipeline one by one.

Please correct me if I am wrong. @Bounty-hunter Have you tested max_concurrency >1 in your PR #1657?

@fhfuih
Contributor

fhfuih commented Mar 13, 2026

I think the max_concurrency value is only meaningful when batched requests are supported for diffusion models. Since most diffusion models do not support batched requests (@fhfuih does only Qwen-Image support it? I think wan2.2 does not),

Correct, not all models support batched requests. Among the common Alibaba ones, Wan and Qwen-Image-Edit-* do not; Qwen-Image does. Z-Image can run batched, but the internal padding and related handling may not be truly optimized for batched requests.

@wtomin wtomin changed the title [Perf] Qwen-Image Performance Nightly CI test [WIP][Perf] Qwen-Image Performance Nightly CI test Mar 17, 2026
@wtomin wtomin force-pushed the perf-diff-qwen-image branch from 37569a7 to 6806ee8 Compare March 19, 2026 09:25
Comment thread tests/perf/tests/test_qwen_image_vllm_omni.json Outdated
Comment thread tests/perf/tests/test_qwen_image_vllm_omni.json Outdated
@hsliuustc0106
Collaborator

I suggest testing different resolutions; for each resolution, we only need to select a few parallel strategies (maybe only the lowest-latency one) to monitor performance.

wtomin added 18 commits March 23, 2026 14:59
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
@wtomin wtomin force-pushed the perf-diff-qwen-image branch from 5860bf4 to 5d1d3c5 Compare March 23, 2026 07:21
wtomin added 4 commits March 23, 2026 15:23
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
@congw729
Collaborator

Hi, as we discussed earlier, could you attach the baseline results to the *.json files for future comparison? Ref: #2011

@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Mar 23, 2026
Collaborator

@Gaohan123 Gaohan123 left a comment


LGTM. Thanks

@Gaohan123 Gaohan123 merged commit 1f5eca5 into vllm-project:main Mar 23, 2026
7 of 8 checks passed
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>

Labels

ready label to trigger buildkite CI


6 participants