
[Test] Add performance tests for Qwen-Image-Layered model #2807

Merged

hsliuustc0106 merged 5 commits into vllm-project:main from kechengliu97:perf on Apr 16, 2026

Conversation

@kechengliu97 (Contributor)

This pull request adds a new performance test configuration for the Qwen Image Layered model using the vLLM-Omni server. The test includes single-device baselines for two different image sizes and step counts, with detailed benchmark parameters and expected baseline metrics.

Performance testing:

  • Added test_qwen_image_layered_single_device to test_qwen_image_layered_vllm_omni.json, providing single-device performance baselines for the Qwen Image Layered model on the vLLM-Omni server, including two benchmark scenarios with different image resolutions and inference steps (sketched below).
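
A minimal sketch of what this configuration might look like, assembled from the scenario list in the commit message (640x640 at 20 steps; 1024x1024 at 35 steps) and the baseline values quoted in the review comments below. The top-level structure, the pairing of baselines to scenarios, and all field names other than the ones visible in those snippets are assumptions, not the file's actual schema:

```json
{
  "test_qwen_image_layered_single_device": {
    "model": "Qwen/Qwen-Image-Layered",
    "backend": "vllm-omni",
    "benchmarks": [
      {
        "width": 640,
        "height": 640,
        "num-inference-steps": 20,
        "enable-negative-prompt": true,
        "baseline": {
          "throughput_qps": 0.02,
          "latency_mean": 40.0,
          "peak_memory_mb_max": 82000
        }
      },
      {
        "width": 1024,
        "height": 1024,
        "num-inference-steps": 35,
        "enable-negative-prompt": true,
        "baseline": {
          "throughput_qps": 0.005,
          "latency_mean": 150.0,
          "peak_memory_mb_max": 90000
        }
      }
    ]
  }
}
```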

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo about increasing the limits by adding credits; credits must be used to enable repository-wide code reviews.

@congw729 added the "nightly-test label to trigger buildkite nightly test CI" label on Apr 15, 2026
"baseline": {
"throughput_qps": 0.02,
"latency_mean": 40.0,
"peak_memory_mb_max": 82000,
Collaborator:

82 GB? A single H100 device only has 80 GB; how could this become the baseline?

"baseline": {
"throughput_qps": 0.005,
"latency_mean": 150.0,
"peak_memory_mb_max": 90000,
Collaborator:

same here

"enable-negative-prompt": true,
"baseline": {
"throughput_qps": 0.02,
"latency_mean": 40.0,
Collaborator:

What are the latency results from local testing?

@hsliuustc0106 (Collaborator) commented Apr 15, 2026


 /usr/bin/python3 -u /workspace/build/buildkite/benchmarks/diffusion/diffusion_benchmark_serving.py --host 127.0.0.1 --port 55451 --model Qwen/Qwen-Image-Layered --backend vllm-omni --dataset random --task i2i --output-file /tmp/diffusion_bench_tmp_8d2oggzq.json --width 1024 --height 1024 --num-inference-steps 35 --num-prompts 10 --max-concurrency 1 --enable-negative-prompt
================= Serving Benchmark Result =================
Backend:                                 vllm-omni
Model:                                   Qwen/Qwen-Image-Layered
Dataset:                                 random
Task:                                    i2i
--------------------------------------------------
Benchmark duration (s):                  249.65
Request rate:                            inf
Max request concurrency:                 1
Successful requests:                     10/10
--------------------------------------------------
Request throughput (req/s):              0.04
Latency Mean (s):                        24.9649
Latency Median (s):                      24.9634
Latency P99 (s):                         25.0931
Latency P95 (s):                         25.0926
--------------------------------------------------
Peak Memory Max (MB):                    62910.00
Peak Memory Mean (MB):                   62910.00
Peak Memory Median (MB):                 62910.00
--------------------------------------------------
Stage Durations Mean (s):
QwenImageLayeredPipeline.text_encoder.forward: 0.0374
QwenImageLayeredPipeline.vae.encode:   0.0179
QwenImageLayeredPipeline.diffuse:      24.2250
QwenImageLayeredPipeline.vae.decode:   0.1088
 
============================================================
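
A quick consistency check on these numbers: with a max request concurrency of 1, the server handles one request at a time, so throughput is roughly the reciprocal of mean latency,

$$\text{throughput} \approx \frac{1}{\text{latency}_{\text{mean}}} = \frac{1}{24.9649\,\text{s}} \approx 0.04\ \text{req/s},$$

which matches the reported request throughput. The diffuse stage (24.2250 s) accounts for nearly all of the per-request latency.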

@Gaohan123 (Collaborator)

What is the relationship between this and PR #2772?

@kechengliu97 (Contributor, Author)

> What is the relationship between this and PR #2772?

PR #2772 focuses on accuracy, while this one is targeted at performance.

@kechengliu97 (Contributor, Author) commented Apr 16, 2026


================= Serving Benchmark Result =================
Backend:                                 vllm-omni      
Model:                                   /home/models/Qwen/Qwen-Image-Layered
Dataset:                                 random         
Task:                                    i2i            
--------------------------------------------------
Benchmark duration (s):                  683.43         
Request rate:                            inf            
Max request concurrency:                 1              
Successful requests:                     10/10             
--------------------------------------------------
Request throughput (req/s):              0.01           
Latency Mean (s):                        68.3425        
Latency Median (s):                      68.3621        
Latency P99 (s):                         68.4734        
Latency P95 (s):                         68.4632        
--------------------------------------------------
Peak Memory Max (MB):                    62856.00       
Peak Memory Mean (MB):                   62856.00       
Peak Memory Median (MB):                 62856.00       
--------------------------------------------------
Stage Durations Mean (s):
  QwenImageLayeredPipeline.text_encoder.forward: 0.0511         
  QwenImageLayeredPipeline.vae.encode:   0.0325         
  QwenImageLayeredPipeline.diffuse:      67.3730        
  QwenImageLayeredPipeline.vae.decode:   0.1994         

============================================================

@kechengliu97 force-pushed the perf branch 2 times, most recently from f5e20dc to 2275bc3, on April 16, 2026 01:34
@Gaohan123 added the "ready label to trigger buildkite CI" label on Apr 16, 2026
Add a new perf test JSON for the Qwen/Qwen-Image-Layered model using the vllm-omni server. Defines a single-device baseline with diffusion pipeline profiler enabled and two benchmark scenarios (640x640, 20 steps; 1024x1024, 35 steps) including baseline throughput, latency, and peak memory targets.

Signed-off-by: John Liu BUAA <liukecheng97@gmail.com>
Run an additional pytest using tests/dfx/perf/tests/test_qwen_image_layered_vllm_omni.json in the nightly diffusion pipeline. Capture its exit code as EXIT4 and include it in the success conditional and final exit bitwise OR so the step considers this new test when uploading artifacts and determining overall status.

Signed-off-by: John Liu BUAA <liukecheng97@gmail.com>
Run an additional diffusion benchmark (tests/dfx/perf/tests/test_qwen_image_layered_vllm_omni.json) in the nightly pipeline, capture its exit code (EXIT4), and include it in the success condition so artifacts (results and logs) are uploaded if any benchmark succeeds.

Signed-off-by: John Liu BUAA <liukecheng97@gmail.com>
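
A minimal sketch of the exit-code bookkeeping these two commits describe. run_perf_benchmark is a hypothetical stand-in for the step's real invocation, and EXIT1 through EXIT3 are assumed to come from the pre-existing benchmarks in the nightly script; only EXIT4 and the JSON path are named in the commits:

```bash
#!/usr/bin/env bash
# Hypothetical sketch; run_perf_benchmark and EXIT1..EXIT3 are stand-ins
# for the nightly step's real invocations, not the repository's actual code.

run_perf_benchmark tests/dfx/perf/tests/test_qwen_image_layered_vllm_omni.json
EXIT4=$?

# Upload artifacts (results and logs) if any benchmark succeeded.
if [ "$EXIT1" -eq 0 ] || [ "$EXIT2" -eq 0 ] || [ "$EXIT3" -eq 0 ] || [ "$EXIT4" -eq 0 ]; then
  buildkite-agent artifact upload "results/**/*" || true
fi

# Overall status: bitwise OR of all exit codes, so any failure is non-zero.
exit $(( EXIT1 | EXIT2 | EXIT3 | EXIT4 ))
```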
@fhfuih (Contributor) left a comment:

LGTM. The errors in the latest CI pipeline seem unrelated: https://buildkite.com/vllm/vllm-omni/builds/6894/steps/canvas

  • Output (Accuracy?) assertion in Omni model
  • Timeout in diffusion functionality test

@yenuo26 removed the "nightly-test label to trigger buildkite nightly test CI" label on Apr 16, 2026
@hsliuustc0106 merged commit e8658b5 into vllm-project:main on Apr 16, 2026
8 checks passed
lvliang-intel pushed a commit to lvliang-intel/vllm-omni that referenced this pull request Apr 20, 2026
…ct#2807)

Signed-off-by: John Liu BUAA <liukecheng97@gmail.com>