
[Perf] Qwen-Image Performance Nightly CI test#1805

Merged
Gaohan123 merged 96 commits into vllm-project:main from wtomin:perf-diff-qwen-image on Mar 23, 2026

Conversation

@wtomin
Collaborator

@wtomin wtomin commented Mar 11, 2026

Purpose

This is one of the PRs addressing RFC #1606.

This PR adds a nightly performance benchmark for Qwen-Image text-to-image tasks and generates performance records in JSON files.

The current acceleration combinations include:

  • Single device, no acceleration
  • USP=2, CFG=2, VAE-Patch-Parallel=4
  • USP=2, CFG=2, CacheDiT

Notes:

  1. This benchmark test requires 4 H100 GPUs with 80 GB VRAM each.
  2. CPU offloading (module-wise) is too slow, so it is excluded from the configurations.
  3. tp2+usp2 was removed because it is not as good as cfg+usp2.
  4. ring2+usp2 was removed because it is slower than usp2 alone.

The acceleration combinations are defined as "server_params" in tests/perf/tests/test_qwen_image_vllm_omni.json or tests/perf/tests/test_qwen_image_sglang_diffusion.json.
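For reference, a server_params entry might look like the following (a hypothetical sketch; the field names here are illustrative, not copied from the actual config files):

```json
{
  "server_params": [
    {"name": "single_device"},
    {"name": "ulysses2_cfg2_vae_patch4", "usp_degree": 2, "cfg_parallel": 2, "vae_patch_parallel": 4},
    {"name": "ulysses2_cfg2_cache_dit", "usp_degree": 2, "cfg_parallel": 2, "cache_dit": true}
  ]
}
```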

The current dataset configuration:

  dataset: random
  task: t2i
  resolution: 512x512, 1536x1536, or mixed_resolution
  num-inference-steps: 20, 35, or mixed_steps
  num-prompts: 10
  max-concurrency: 1
  negative-prompt: blurry, low quality, distorted
  cfg-scale: 4.0

Notes:

  1. Concurrency > 1 is not covered yet;
  2. The configuration mimics datasets A, B, and C from [Benchmark] [Diffusion] [Enhancement] Random dataset #1657

The dataset params are defined in tests/perf/tests/benchmark_params.json.
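For reference, a single benchmark_params entry could look like this (an illustrative sketch based on the table above; the actual field names in benchmark_params.json may differ):

```json
{
  "name": "512x512_steps20",
  "dataset": "random",
  "task": "t2i",
  "width": 512,
  "height": 512,
  "num-inference-steps": 20,
  "num-prompts": 10,
  "max-concurrency": 1,
  "negative-prompt": "blurry, low quality, distorted",
  "cfg-scale": 4.0
}
```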

The number of test cases equals:

(the number of server_params) x (the number of benchmark_params) = 9
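The 3 x 3 grid can be sanity-checked with a few lines of Python (the config names below are hypothetical shorthand for the entries described above, not the real names in the JSON files):

```python
from itertools import product

# Hypothetical shorthand names for the three acceleration configs
# and the three dataset configs described above.
server_params = ["single_device", "ulysses2_cfg2_vae_patch4", "ulysses2_cfg2_cache_dit"]
benchmark_params = ["512x512_steps20", "1536x1536_steps35", "mixed_resolution_mixed_steps"]

# Every acceleration config is benchmarked against every dataset config.
test_cases = list(product(server_params, benchmark_params))
print(len(test_cases))  # 3 x 3 = 9
```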

Test Plan

Only the vllm-omni benchmark is defined in the nightly CI for now; the sglang benchmark was verified locally.

If running it locally:

export DIFFUSION_BENCHMARK_DIR=tests/perf/results
pytest -s -v tests/perf/scripts/run_diffusion_benchmark.py --config-file tests/perf/tests/test_qwen_image_vllm_omni.json

# I only run sglang benchmark locally
pytest -s -v tests/perf/scripts/run_diffusion_benchmark.py --config-file tests/perf/tests/test_qwen_image_sglang_diffusion.json

It took about 12 minutes on an H800:


================================================ 9 passed in 649.58s (0:12:49) =================================================

Afterwards, JSON files are generated, for example:

$ ls tests/perf/results/
benchmark_results_test_qwen_image_vllm_omni_20260323-095953.json  logs
$ ls tests/perf/results/logs/
test_qwen_image_ulysses2_cfg2_cache_dit_vllm-omni_20260323-101210.log  test_qwen_image_ulysses2_cfg2_vae_patch4_vllm-omni_20260323-100801.log  test_qwen_image_ulysses2_cfg2_vllm-omni_20260323-100309.log
test_qwen_image_ulysses2_cfg2_cache_dit_vllm-omni_20260323-101226.log  test_qwen_image_ulysses2_cfg2_vae_patch4_vllm-omni_20260323-100819.log  test_qwen_image_ulysses2_cfg2_vllm-omni_20260323-100327.log
test_qwen_image_ulysses2_cfg2_cache_dit_vllm-omni_20260323-101319.log  test_qwen_image_ulysses2_cfg2_vae_patch4_vllm-omni_20260323-100943.log  test_qwen_image_ulysses2_cfg2_vllm-omni_20260323-100449.lo

Test Result

Notes:

  1. Measured on H800 (4 cards);
  2. My FlashAttention installation is broken, so this experiment uses SDPA;

All results for both vllm-omni and sglang are shared at this URL.

An example of json file of vllm-omni:

  {
    "test_name": "test_qwen_image_ulysses2_cfg2",
    "backend": "vllm-omni",
    "timestamp": "20260323-100309",
    "benchmark_params": {
      "name": "512x512_steps20",
      "dataset": "random",
      "task": "t2i",
      "width": 512,
      "height": 512,
      "num-inference-steps": 20,
      "num-prompts": 10,
      "max-concurrency": 1,
      "enable-negative-prompt": true,
      "baseline": {
        "throughput_qps": 0.001,
        "latency_p99": 1000.0,
        "peak_memory_mb_max": 400000,
        "peak_memory_mb_mean": 400000
      }
    },
    "result": {
      "duration": 13.625132665038109,
      "completed_requests": 10,
      "failed_requests": 0,
      "throughput_qps": 0.7339378078614863,
      "latency_mean": 1.3623275712132454,
      "latency_median": 1.364011375233531,
      "latency_p99": 1.3710215020179748,
      "latency_p95": 1.3701544940471648,
      "latency_p50": 1.364011375233531,
      "peak_memory_mb_max": 60202.0,
      "peak_memory_mb_mean": 60202.0,
      "peak_memory_mb_median": 60202.0,
      "stage_durations_mean": {
        "QwenImagePipeline.text_encoder.forward": 0.012416277453303337,
        "QwenImagePipeline.diffuse": 1.2124435044825077,
        "QwenImagePipeline.vae.decode": 0.018585569970309735
      },
      "stage_durations_p50": {
        "QwenImagePipeline.text_encoder.forward": 0.012366048991680145,
        "QwenImagePipeline.diffuse": 1.2126157889142632,
        "QwenImagePipeline.vae.decode": 0.01858174055814743
      },
      "stage_durations_p99": {
        "QwenImagePipeline.text_encoder.forward": 0.012640358190983534,
        "QwenImagePipeline.diffuse": 1.2175082486867905,
        "QwenImagePipeline.vae.decode": 0.018637520931661128
      },
      "backend": "vllm-omni",
      "model": "Qwen/Qwen-Image",
      "dataset": "random",
      "task": "t2i"
    },
    "log_file": "/home/fq9hpsac/fq9hpsacuser04/workspace/vllm-omni-ddd/tests/perf/results/logs/test_qwen_image_ulysses2_cfg2_vllm-omni_20260323-100309.log"
  },
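The derived metrics in such a file can be cross-checked by hand: throughput_qps should equal completed_requests / duration. A minimal check using the numbers from the example above:

```python
# Values copied from the example result JSON above.
duration = 13.625132665038109
completed_requests = 10
throughput_qps = 0.7339378078614863

# Throughput is requests completed per second of wall-clock duration.
assert abs(completed_requests / duration - throughput_qps) < 1e-6
print("throughput_qps is consistent with duration and completed_requests")
```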

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

@wtomin wtomin changed the title [Perf][WIP] Qwen-Image Night Performance CI test [Perf][WIP] Qwen-Image Performance Nightly CI test Mar 11, 2026
@wtomin wtomin force-pushed the perf-diff-qwen-image branch from 350a928 to 2ea794b Compare March 11, 2026 12:58
@wtomin wtomin marked this pull request as ready for review March 12, 2026 01:10
@wtomin wtomin requested a review from hsliuustc0106 as a code owner March 12, 2026 01:10
@wtomin
Collaborator Author

wtomin commented Mar 12, 2026

@congw729 Alicia, how do I solve this email-sending problem?

python tools/nightly/send_nightly_perf_email.py --report-file tests
Missing required env vars: SMTP_HOST, SMTP_PORT, SMTP_USERNAME, SMTP_PASSWORD, DAILY_EMAIL_LIST. Set them (e.g. in Buildkite secrets).

Besides, can you check whether the .xlsx file is correct?

@wtomin wtomin changed the title [Perf][WIP] Qwen-Image Performance Nightly CI test [Perf] Qwen-Image Performance Nightly CI test Mar 12, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ea794b290


Comment thread tests/perf/scripts/run_qwen_image_benchmark.py Outdated
Comment thread tests/perf/scripts/run_qwen_image_benchmark.py Outdated
@wtomin
Collaborator Author

wtomin commented Mar 12, 2026

@hsliuustc0106 Can you check if the dataset configuration and test case coverage are in line with your expectations?

@congw729
Collaborator

@congw729 Alicia, how do I solve this email-sending problem?

python tools/nightly/send_nightly_perf_email.py --report-file tests
Missing required env vars: SMTP_HOST, SMTP_PORT, SMTP_USERNAME, SMTP_PASSWORD, DAILY_EMAIL_LIST. Set them (e.g. in Buildkite secrets).

Besides, can you check whether the .xlsx file is correct?

Hi, you can set those using your personal email account token (like Gmail) for local testing (which will actually send an email). Or you can set those env vars to None and use --dry-run to test (this only tests most of the functionality).

@congw729
Collaborator

Can you send me the performance results JSON file? I need to make sure the combination of all models' performance results is correctly shown.

@congw729
Collaborator

Have you checked #1321? I'm not sure whether you are using the same key for the same metric.

Comment thread tools/nightly/generate_diffusion_nightly_perf_excel.py Outdated
@hsliuustc0106
Collaborator

@hsliuustc0106 Can you check if the dataset configuration and test case coverage are in line with your expectations?

If we only check performance here, we should use the diffusion profiler for each stage and make the number of inference steps smaller; maybe even 1 or 2 steps is enough.

Comment thread .buildkite/test-nightly.yml Outdated
@hsliuustc0106
Collaborator

Please check #1657 for datasets A, B, and C.

@wtomin
Collaborator Author

wtomin commented Mar 12, 2026

Can you send me the performance results JSON file? I need to make sure the combination of all models' performance results is correctly shown.

Sure. Sent to you.

Have you checked the #1321? I'm not sure if you are using the same key for the same meaning metric.

Many metric names of omni models and diffusion models do not overlap. An example of a diffusion performance JSON file:

{
  "duration": 55.729680337011814,
  "completed_requests": 10,
  "failed_requests": 0,
  "throughput_qps": 0.17943759841304327,
  "latency_mean": 5.572730791615323,
  "latency_median": 5.546993801603094,
  "latency_p99": 5.834732530997135,
  "latency_p50": 5.546993801603094,
  "peak_memory_mb_max": 0,
  "peak_memory_mb_mean": 0,
  "peak_memory_mb_median": 0,
  "backend": "vllm-omni",
  "model": "Qwen/Qwen-Image",
  "dataset": "random",
  "task": "t2i"
}

I think we cannot use the same key names as the omni models' JSON file.

@congw729
Collaborator

congw729 commented Mar 12, 2026

Can you send me the performance results JSON file? I need to make sure the combination of all models' performance results is correctly shown.

Sure. Sent to you.

Have you checked #1321? I'm not sure whether you are using the same key for the same metric.

Many metric names of omni models and diffusion models do not overlap. [...] I think we cannot use the same key names as the omni models' JSON file.

I see, in that case, we could have two sub-sheets for different types of models.
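The sub-sheet idea could be sketched as grouping result records by their metric key set before writing the spreadsheet (illustrative Python only; the actual exporter, tools/nightly/generate_diffusion_nightly_perf_excel.py, may work differently):

```python
from collections import defaultdict

def group_by_schema(records):
    """Group result dicts by their (sorted) key set, so that each
    distinct schema can be written to its own sub-sheet."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(sorted(rec))].append(rec)
    return dict(groups)

# Hypothetical records: one omni-style and one diffusion-style result.
omni_result = {"ttft_mean": 0.21, "throughput_qps": 1.5}
diffusion_result = {"latency_mean": 5.57, "throughput_qps": 0.18}

sheets = group_by_schema([omni_result, diffusion_result])
print(len(sheets))  # two distinct key sets -> two sub-sheets
```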

@wtomin
Collaborator Author

wtomin commented Mar 12, 2026

@congw729 As requested, the email sending label is removed from test-nightly.yml. Related edits were reverted.

@Gaohan123 Gaohan123 added this to the v0.18.0 milestone Mar 12, 2026
@congw729
Collaborator

I have another question: in Omni benchmark tests, we have max_concurrency and num_prompts to control the request frequency. Do we need those configurations in the diffusion models' benchmark tests?

@wtomin
Collaborator Author

wtomin commented Mar 13, 2026

I have another question, in Omni benchmark tests, we will have max_concurrency and num_prompts to control the request frequency. Do we need those configurations in diffusion models' benchmark tests?

I think the max_concurrency value is only meaningful when batched requests are supported for diffusion models. Since most diffusion models do not support batched requests (@fhfuih does only Qwen-Image support it? I think wan2.2 does not), we should always use max_concurrency = 1. Even if we increase max_concurrency, requests are still fed to the pipeline one by one.

Please correct me if I am wrong. @Bounty-hunter Have you tested max_concurrency >1 in your PR #1657?

@fhfuih
Contributor

fhfuih commented Mar 13, 2026

I think the max_concurrency value is only meaningful when batched requests are supported for diffusion models. Since most diffusion models do not support batched requests (@fhfuih does only Qwen-Image support it? I think wan2.2 does not),

Correct, not all models support batched requests. Among the common Alibaba ones, Wan and Qwen-Image-Edit-* do not; Qwen-Image does. Z-Image can run batched, but the internal padding and related handling may not be truly optimized for batched requests.

@wtomin wtomin changed the title [Perf] Qwen-Image Performance Nightly CI test [WIP][Perf] Qwen-Image Performance Nightly CI test Mar 17, 2026
@wtomin wtomin force-pushed the perf-diff-qwen-image branch from 37569a7 to 6806ee8 Compare March 19, 2026 09:25
Comment thread tests/perf/tests/test_qwen_image_vllm_omni.json Outdated
Comment thread tests/perf/tests/test_qwen_image_vllm_omni.json Outdated
@hsliuustc0106
Collaborator

I suggest testing different resolutions; for each resolution, we only need to select a few parallel strategies (maybe only the lowest-latency one) to monitor performance.

wtomin added 18 commits March 23, 2026 14:59
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
@wtomin wtomin force-pushed the perf-diff-qwen-image branch from 5860bf4 to 5d1d3c5 Compare March 23, 2026 07:21
wtomin added 4 commits March 23, 2026 15:23
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
@congw729
Collaborator

Hi, as we discussed earlier, could you attach the baseline results to the *.json files for future comparison? Ref: #2011

@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Mar 23, 2026
Collaborator

@Gaohan123 Gaohan123 left a comment


LGTM. Thanks

@Gaohan123 Gaohan123 merged commit 1f5eca5 into vllm-project:main Mar 23, 2026
7 of 8 checks passed
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>

Labels

ready label to trigger buildkite CI


6 participants