
[Test] Add performance tests for Qwen-Image-Layered model #2807

Merged

hsliuustc0106 merged 5 commits into vllm-project:main from kechengliu97:perf on Apr 16, 2026

Conversation

@kechengliu97 (Contributor)

This pull request adds a new performance test configuration for the Qwen Image Layered model using the vLLM-Omni server. The test includes single-device baselines for two different image sizes and step counts, with detailed benchmark parameters and expected baseline metrics.

Performance testing:

  • Added test_qwen_image_layered_single_device to test_qwen_image_layered_vllm_omni.json, providing single-device performance baselines for the Qwen Image Layered model on the vLLM-Omni server, including two benchmark scenarios with different image resolutions and inference steps (sketched below).
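
A minimal sketch of what this configuration might look like, assembled from the scenario list in the commit message (640x640 at 20 steps; 1024x1024 at 35 steps) and the baseline values quoted in the review comments below. The top-level structure, the pairing of baselines to scenarios, and all field names other than the ones visible in those snippets are assumptions, not the file's actual schema:

```json
{
  "test_qwen_image_layered_single_device": {
    "model": "Qwen/Qwen-Image-Layered",
    "backend": "vllm-omni",
    "benchmarks": [
      {
        "width": 640,
        "height": 640,
        "num-inference-steps": 20,
        "enable-negative-prompt": true,
        "baseline": {
          "throughput_qps": 0.02,
          "latency_mean": 40.0,
          "peak_memory_mb_max": 82000
        }
      },
      {
        "width": 1024,
        "height": 1024,
        "num-inference-steps": 35,
        "enable-negative-prompt": true,
        "baseline": {
          "throughput_qps": 0.005,
          "latency_mean": 150.0,
          "peak_memory_mb_max": 90000
        }
      }
    ]
  }
}
```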

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo about increasing the limits by adding credits; credits must be used to enable repository-wide code reviews.

@congw729 added the "nightly-test label to trigger buildkite nightly test CI" label on Apr 15, 2026
"baseline": {
"throughput_qps": 0.02,
"latency_mean": 40.0,
"peak_memory_mb_max": 82000,
Collaborator:

82 GB? A single H100 device only has 80 GB; how could this become the baseline?

"baseline": {
"throughput_qps": 0.005,
"latency_mean": 150.0,
"peak_memory_mb_max": 90000,
Collaborator:

same here

"enable-negative-prompt": true,
"baseline": {
"throughput_qps": 0.02,
"latency_mean": 40.0,
Collaborator:

What are the latency results from local testing?

@hsliuustc0106 (Collaborator) commented Apr 15, 2026


 /usr/bin/python3 -u /workspace/build/buildkite/benchmarks/diffusion/diffusion_benchmark_serving.py --host 127.0.0.1 --port 55451 --model Qwen/Qwen-Image-Layered --backend vllm-omni --dataset random --task i2i --output-file /tmp/diffusion_bench_tmp_8d2oggzq.json --width 1024 --height 1024 --num-inference-steps 35 --num-prompts 10 --max-concurrency 1 --enable-negative-prompt
================= Serving Benchmark Result =================
Backend:                                 vllm-omni
Model:                                   Qwen/Qwen-Image-Layered
Dataset:                                 random
Task:                                    i2i
--------------------------------------------------
Benchmark duration (s):                  249.65
Request rate:                            inf
Max request concurrency:                 1
Successful requests:                     10/10
--------------------------------------------------
Request throughput (req/s):              0.04
Latency Mean (s):                        24.9649
Latency Median (s):                      24.9634
Latency P99 (s):                         25.0931
Latency P95 (s):                         25.0926
--------------------------------------------------
Peak Memory Max (MB):                    62910.00
Peak Memory Mean (MB):                   62910.00
Peak Memory Median (MB):                 62910.00
--------------------------------------------------
Stage Durations Mean (s):
QwenImageLayeredPipeline.text_encoder.forward: 0.0374
QwenImageLayeredPipeline.vae.encode:   0.0179
QwenImageLayeredPipeline.diffuse:      24.2250
QwenImageLayeredPipeline.vae.decode:   0.1088
 
============================================================
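
A quick consistency check on these numbers: with a max request concurrency of 1, the server handles one request at a time, so throughput is roughly the reciprocal of mean latency,

$$\text{throughput} \approx \frac{1}{\text{latency}_{\text{mean}}} = \frac{1}{24.9649\,\text{s}} \approx 0.04\ \text{req/s},$$

which matches the reported request throughput. The diffuse stage (24.2250 s) accounts for nearly all of the per-request latency.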

@Gaohan123 (Collaborator)

What is the relationship between this and PR #2772?

@kechengliu97 (Contributor, Author)

> What is the relationship between this and PR #2772?

PR #2772 focuses on accuracy, while this one is targeted at performance.

@kechengliu97 (Contributor, Author) commented Apr 16, 2026


================= Serving Benchmark Result =================
Backend:                                 vllm-omni      
Model:                                   /home/models/Qwen/Qwen-Image-Layered
Dataset:                                 random         
Task:                                    i2i            
--------------------------------------------------
Benchmark duration (s):                  683.43         
Request rate:                            inf            
Max request concurrency:                 1              
Successful requests:                     10/10             
--------------------------------------------------
Request throughput (req/s):              0.01           
Latency Mean (s):                        68.3425        
Latency Median (s):                      68.3621        
Latency P99 (s):                         68.4734        
Latency P95 (s):                         68.4632        
--------------------------------------------------
Peak Memory Max (MB):                    62856.00       
Peak Memory Mean (MB):                   62856.00       
Peak Memory Median (MB):                 62856.00       
--------------------------------------------------
Stage Durations Mean (s):
  QwenImageLayeredPipeline.text_encoder.forward: 0.0511         
  QwenImageLayeredPipeline.vae.encode:   0.0325         
  QwenImageLayeredPipeline.diffuse:      67.3730        
  QwenImageLayeredPipeline.vae.decode:   0.1994         

============================================================

@kechengliu97 force-pushed the perf branch 2 times, most recently from f5e20dc to 2275bc3, on April 16, 2026 01:34
@Gaohan123 added the "ready label to trigger buildkite CI" label on Apr 16, 2026
Add a new perf test JSON for the Qwen/Qwen-Image-Layered model using the vllm-omni server. Defines a single-device baseline with diffusion pipeline profiler enabled and two benchmark scenarios (640x640, 20 steps; 1024x1024, 35 steps) including baseline throughput, latency, and peak memory targets.

Signed-off-by: John Liu BUAA <liukecheng97@gmail.com>
Run an additional pytest using tests/dfx/perf/tests/test_qwen_image_layered_vllm_omni.json in the nightly diffusion pipeline. Capture its exit code as EXIT4 and include it in the success conditional and final exit bitwise OR so the step considers this new test when uploading artifacts and determining overall status.

Signed-off-by: John Liu BUAA <liukecheng97@gmail.com>
Run an additional diffusion benchmark (tests/dfx/perf/tests/test_qwen_image_layered_vllm_omni.json) in the nightly pipeline, capture its exit code (EXIT4), and include it in the success condition so artifacts (results and logs) are uploaded if any benchmark succeeds.

Signed-off-by: John Liu BUAA <liukecheng97@gmail.com>
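
A minimal sketch of the exit-code bookkeeping these two commits describe. run_perf_benchmark is a hypothetical stand-in for the step's real invocation, and EXIT1 through EXIT3 are assumed to come from the pre-existing benchmarks in the nightly script; only EXIT4 and the JSON path are named in the commits:

```bash
#!/usr/bin/env bash
# Hypothetical sketch; run_perf_benchmark and EXIT1..EXIT3 are stand-ins
# for the nightly step's real invocations, not the repository's actual code.

run_perf_benchmark tests/dfx/perf/tests/test_qwen_image_layered_vllm_omni.json
EXIT4=$?

# Upload artifacts (results and logs) if any benchmark succeeded.
if [ "$EXIT1" -eq 0 ] || [ "$EXIT2" -eq 0 ] || [ "$EXIT3" -eq 0 ] || [ "$EXIT4" -eq 0 ]; then
  buildkite-agent artifact upload "results/**/*" || true
fi

# Overall status: bitwise OR of all exit codes, so any failure is non-zero.
exit $(( EXIT1 | EXIT2 | EXIT3 | EXIT4 ))
```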
@fhfuih (Contributor) left a comment:

LGTM. The errors in the latest CI pipeline seem unrelated: https://buildkite.com/vllm/vllm-omni/builds/6894/steps/canvas

  • Output (Accuracy?) assertion in Omni model
  • Timeout in diffusion functionality test

@yenuo26 removed the "nightly-test label to trigger buildkite nightly test CI" label on Apr 16, 2026
@hsliuustc0106 merged commit e8658b5 into vllm-project:main on Apr 16, 2026
8 checks passed
lvliang-intel pushed a commit to lvliang-intel/vllm-omni that referenced this pull request Apr 20, 2026
…ct#2807)

Signed-off-by: John Liu BUAA <liukecheng97@gmail.com>