[Test] add stability test case for wan2.2, qwen-tts, qwen3-omni and qwen-image model and modified conftest.py in test/dfx/ by zhumingjue138 · Pull Request #2817 · vllm-project/vllm-omni

zhumingjue138 · 2026-04-15T08:05:53Z

Purpose

Add a stability test case for the Wan2.2 model and modify conftest.py in tests/dfx/.

Test Plan

1. Modify conftest.py in tests/dfx/.

2. Split "tests/dfx/stability/scripts/test_benchmark_stability.py" by model name into "tests/dfx/stability/scripts/test_stability_qwen3_omni.py" and "tests/dfx/stability/scripts/test_stability_wan22.py".

pytest -s -v tests/dfx/perf/scripts/run_benchmark.py
pytest -s -v tests/dfx/stability/scripts/test_stability_qwen3_omni.py
pytest -s -v tests/dfx/stability/scripts/test_stability_wan22.py
pytest -s -v tests/dfx/stability/scripts/test_stability_qwen_image.py
pytest -s -v tests/dfx/stability/scripts/test_stability_qwen3_tts.py

Test Result

pytest -s -v tests/dfx/perf/scripts/run_benchmark.py

pytest -s -v tests/dfx/stability/scripts/test_stability_qwen3_omni.py

[
    {
        "test_name": "test_qwen3_omni_stability_async_chunk",
        "server_params": {
            "model": "/home/models/Qwen/Qwen3-Omni-30B-A3B-Instruct",
            "stage_overrides": {
                "2": {
                    "max_num_batched_tokens": 1000000
                }
            },
            "extra_cli_args": ["--async-chunk"]
        },
        "benchmark_params": [
            {
                "dataset_name": "random-mm",
                "backend": "openai-chat-omni",
                "endpoint": "/v1/chat/completions",
                "duration_sec": 200,
                "request_rate": 0.3,
                "num_prompts_per_batch": 10,
                "random_input_len": {
                    "min": 0,
                    "max": 8000
                },
                "random_output_len": {
                    "min": 0,
                    "max": 1000
                },
                "random_range_ratio": 0.0,
                "random_mm_base_items_per_request": {
                    "min": 0,
                    "max": 6
                },
                "random_mm_num_mm_items_range_ratio": 0.0,
                "random_mm_limit_mm_per_prompt": {
                    "image": 2,
                    "video": 2,
                    "audio": 2
                },
                "random_mm_bucket_config": {
                    "(128-1024, 128-1024, 1)": 0.34,
                    "(256-1080, 256-1920, 2-16)": 0.33,
                    "(0, 1-60, 1-3)": 0.33
                },
                "ignore_eos": true,
                "percentile-metrics": "ttft,tpot,itl,e2el,audio_rtf,audio_ttfp,audio_duration"
            }
        ]
    }
]

pytest -s -v tests/dfx/stability/scripts/test_stability_qwen_image.py

[
    {
        "test_name": "test_qwen_image_stability",
        "server_params": {
            "model": "/nvme1n1p1/models/Qwen/Qwen-Image"
        },
        "benchmark_params": [
            {
                "dataset": "random",
                "task": "t2i",
                "backend": "vllm-omni",
                "duration_sec": 200,
                "max_concurrency": 1,
                "num_prompts_per_batch": 2,
                "width": {
                    "min": 512,
                    "max": 2048
                },
                "height": {
                    "min": 512,
                    "max": 2048
                },
                "num_inference_steps": 50,
                "enable_negative_prompt": true
            }
        ]
    }
]

pytest -s -v tests/dfx/stability/scripts/test_stability_qwen3_tts.py

[
    {
        "test_name": "test_qwen3_tts_stability",
        "server_params": {
            "model": "/nvme1n1p1/models/Qwen3-TTS-12Hz-1.7B-CustomVoice"
        },
        "benchmark_params": [
            {
                "dataset_name": "random",
                "backend": "openai-audio-speech",
                "endpoint": "/v1/audio/speech",
                "duration_sec": 120,
                "request_rate": 0.3,
                "num_prompts_per_batch": 3,
                "random_input_len": {
                    "min": 0,
                    "max": 1000
                },
                "random_output_len": {
                    "min": 0,
                    "max": 1000
                },
                "random_range_ratio": 0.0,
                "extra_body": {
                    "voice": "Vivian",
                    "language": "English"
                },
                "ignore_eos": true,
                "percentile-metrics": "ttft,e2el,audio_rtf,audio_ttfp,audio_duration"
            }
        ]
    }
]

pytest -s -v tests/dfx/stability/scripts/test_stability_wan22.py

[
    {
        "test_name": "test_wan22_t2v_stability_v1_videos",
        "server_params": {
            "model": "/home/models/Wan-AI/Wan2.2-T2V-A14B-Diffusers",
            "serve_args": {
                "ulysses-degree": 1,
                "vae-patch-parallel-size": 2,
                "cfg-parallel-size": 1,
                "tensor-parallel-size": 1,
                "use-hsdp": true,
                "hsdp_shard_size": 2,
                "hsdp_replicate_size": 1,
                "vae-use-slicing": true,
                "vae-use-tiling": true
            }
        },
        "benchmark_params": [
            {
                "dataset": "random",
                "task": "t2v",
                "backend": "v1/videos",
                "duration_sec": 300,
                "max_concurrency": 1,
                "num_prompts_per_batch": 3,
                "enable_negative_prompt": true,
                "random_request_config": [
                    {"width": 854, "height": 480, "num_inference_steps": 3, "num_frames": 80, "fps": 16, "weight": 0.65},
                    {"width": 854, "height": 480, "num_inference_steps": 4, "num_frames": 120, "fps": 24, "weight": 0.25},
                    {"width": 1280, "height": 720, "num_inference_steps": 6, "num_frames": 80, "fps": 16, "weight": 0.1}
                ]
            }
        ]
    }
]
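
As an illustration of how the `random_request_config` weights above might be used, here is a minimal sketch that draws one request shape per request with `random.choices`. The function name `pick_request_config` is hypothetical and not taken from this PR; the actual sampling logic in the test helpers may differ.

```python
import random

# Shapes copied from the test_wan22_t2v_stability_v1_videos config above.
RANDOM_REQUEST_CONFIG = [
    {"width": 854, "height": 480, "num_inference_steps": 3, "num_frames": 80, "fps": 16, "weight": 0.65},
    {"width": 854, "height": 480, "num_inference_steps": 4, "num_frames": 120, "fps": 24, "weight": 0.25},
    {"width": 1280, "height": 720, "num_inference_steps": 6, "num_frames": 80, "fps": 16, "weight": 0.1},
]


def pick_request_config(configs, rng=random):
    """Hypothetical helper: draw one entry with probability proportional to its weight."""
    weights = [c["weight"] for c in configs]
    chosen = rng.choices(configs, weights=weights, k=1)[0]
    # Drop the sampling weight so only request fields remain.
    return {k: v for k, v in chosen.items() if k != "weight"}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs draw the same sequence
    for _ in range(3):
        print(pick_request_config(RANDOM_REQUEST_CONFIG, rng))
```

Passing a seeded `random.Random` keeps a failing long run reproducible, which matters when a single request shape triggers the failure.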

================= Serving Benchmark Result =================
Backend:                                 v1/videos
Model:                                   /home/models/Wan-AI/Wan2.2-T2V-A14B-Diffusers
Dataset:                                 random
Task:                                    t2v
--------------------------------------------------
Benchmark duration (s):                  170.36
Request rate:                            inf
Max request concurrency:                 1
Successful requests:                     3/3
--------------------------------------------------
Request throughput (req/s):              0.02
Latency Mean (s):                        56.7875
Latency Median (s):                      56.1269
Latency P99 (s):                         58.0920
Latency P95 (s):                         57.9316
--------------------------------------------------
Peak Memory Max (MB):                    51132.00
Peak Memory Mean (MB):                   50554.00
Peak Memory Median (MB):                 51040.00

============================================================
Metrics saved to /tmp/stability_diffusion_q30pi7rf.json
================= Serving Benchmark Result =================
Backend:                                 v1/videos
Model:                                   /home/models/Wan-AI/Wan2.2-T2V-A14B-Diffusers
Dataset:                                 random
Task:                                    t2v
--------------------------------------------------
Benchmark duration (s):                  173.02
Request rate:                            inf
Max request concurrency:                 1
Successful requests:                     3/3
--------------------------------------------------
Request throughput (req/s):              0.02
Latency Mean (s):                        57.6723
Latency Median (s):                      58.1305
Latency P99 (s):                         58.2935
Latency P95 (s):                         58.2802
--------------------------------------------------
Peak Memory Max (MB):                    51186.00
Peak Memory Mean (MB):                   50572.00
Peak Memory Median (MB):                 51040.00

============================================================
Metrics saved to /tmp/stability_diffusion_wj55pjs7.json
============ Stability Benchmark Summary ============
Successful requests:                     6
Failed requests:                         0
Total duration (s):                      352.24
==================================================
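
The summary above (6 requests in two batches, 352.24 s against `duration_sec: 300`) is consistent with a loop that starts a new batch whenever the elapsed time is still below `duration_sec`, so the last batch may overrun. A minimal sketch under that assumption; `run_until_duration` and `run_batch` are hypothetical names, not the PR's API:

```python
import time
from typing import Callable


def run_until_duration(run_batch: Callable[[], int], duration_sec: float,
                       clock: Callable[[], float] = time.monotonic) -> dict:
    """Start batches while elapsed time is below duration_sec (assumed behavior)."""
    start = clock()
    batches = 0
    completed = 0
    while clock() - start < duration_sec:
        completed += run_batch()  # run_batch returns the number of finished requests
        batches += 1
    return {"batches": batches, "requests": completed, "elapsed": clock() - start}


if __name__ == "__main__":
    # Simulated clock: replay the two batch durations reported above
    # instead of sleeping.
    batch_durations = iter([170.36, 173.02])
    now = [0.0]

    def fake_batch() -> int:
        now[0] += next(batch_durations)
        return 3  # num_prompts_per_batch

    print(run_until_duration(fake_batch, 300, clock=lambda: now[0]))
```

With roughly 57 s per request, a 300 s window fits only two batches of three, which is the coverage concern raised in review.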

nightly CI

24h test:

============ Stability Benchmark Summary ============
Successful requests:                     1752      
Failed requests:                         68        
Total duration (s):                      93786.44  
==================================================

related issue: #2928

12h test:

[
    {
        "test_name": "test_wan22_t2v_stability_v1_videos",
        "server_params": {
            "model": "/nvme1n1p1/models/Wan2.2-I2V-A14B-Diffusers/snapshots/596658fd9ca6b7b71d5057529bbf319ecbc61d74",
            "serve_args": {
                "ulysses-degree": 2,
                "vae-patch-parallel-size": 2,
                "tensor-parallel-size": 1,
                "use-hsdp": true,
                "vae-use-slicing": true,
                "vae-use-tiling": true
            }
        },
        "benchmark_params": [
            {
                "dataset": "random",
                "task": "i2v",
                "backend": "v1/videos",
                "duration_sec": 43200,
                "max_concurrency": 1,
                "num_prompts_per_batch": 10,
                "enable_negative_prompt": true,
                "random_request_config": [
                    {"width": 832, "height": 480, "num_inference_steps": 2, "num_frames": 81, "fps": 16, "weight": 0.5},
                    {"width": 1280, "height": 720, "num_inference_steps": 2, "num_frames": 121, "fps": 16, "weight": 0.5}
                ]
            }
        ]
    }
]

Result:
============ Stability Benchmark Summary ============
Successful requests:                     490
Failed requests:                         0
Total duration (s):                      43387.18
==================================================

Another 12h test:

============ Stability Benchmark Summary ============
Successful requests:                     440
Failed requests:                         0
Total duration (s):                      40191.27
==================================================
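
For comparing runs, the failure rate can be computed directly from a summary block like the ones above. A minimal sketch; `parse_stability_summary` is a hypothetical helper, not part of this PR, and it assumes the `label: value` layout shown in these logs.

```python
import re

# The 24h summary block from this PR description.
SUMMARY_24H = """\
============ Stability Benchmark Summary ============
Successful requests:                     1752
Failed requests:                         68
Total duration (s):                      93786.44
==================================================
"""

_PATTERNS = {
    "successful": r"Successful requests:\s+(\d+)",
    "failed": r"Failed requests:\s+(\d+)",
    "duration_s": r"Total duration \(s\):\s+([\d.]+)",
}


def parse_stability_summary(text: str) -> dict:
    """Extract counts and duration, then derive the failure rate."""
    out = {}
    for key, pattern in _PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"missing field: {key}")
        out[key] = float(match.group(1))
    total = out["successful"] + out["failed"]
    out["failure_rate"] = out["failed"] / total if total else 0.0
    return out


if __name__ == "__main__":
    stats = parse_stability_summary(SUMMARY_24H)
    print(f"failure rate: {stats['failure_rate']:.2%}")  # 68 of 1820 requests
```

Applied to the logs above, the 24h run has 68 failures out of 1820 requests, while both 12h runs have none.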

…ions and scripts

- Introduced new stability test scripts for Qwen3-Omni and Wan2.2 models, including `test_stability_qwen3_omni.py` and `test_stability_wan22.py`.
- Added corresponding JSON configuration files for both models to define benchmark parameters.
- Updated existing documentation to reflect changes in stability testing configurations and methods.
- Enhanced the `conftest.py` files to support new test structures and parameters.

These additions aim to improve the stability testing framework and provide comprehensive benchmarks for the new models.

Signed-off-by: zhumingjue <zhumingjue@huawei.com>

…mands and benchmark execution

- Added L5 stability testing commands for Qwen3-Omni and Wan2.2 models in the test guide.
- Introduced a new `run_benchmark` function in `conftest.py` to streamline benchmark execution and result handling.
- Refactored existing stability test scripts to utilize the new benchmark execution method, improving code organization and maintainability.

These updates aim to enhance the stability testing capabilities and provide clearer guidance for executing benchmarks.

Signed-off-by: zhumingjue <zhumingjue@huawei.com>

chatgpt-codex-connector · 2026-04-15T08:05:59Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

hsliuustc0106 · 2026-04-15T09:26:12Z

The test_wan22.json config only runs 3 prompts (num_prompts_per_batch=3) over 300 seconds with max_concurrency=1. Is this sufficient for a stability test? Consider increasing num_prompts_per_batch to get better coverage.

…t.py and update test configurations

- Introduced functions to sample integer values from specified ranges and to handle bucket key sampling.
- Updated benchmark parameters in test_qwen3_omni.json to use range specifications for input and output lengths, and adjusted request rates.
- Changed dataset names from "random" to "random-mm" for clarity.

Signed-off-by: zhumingjue <zhumingjue@huawei.com>

…pecifications for bucket keys - Modified the bucket configuration from "(0, 60, 3)" to "(0, 1-60, 1-3)" for improved clarity and consistency with recent changes in sampling functionality. Signed-off-by: zhumingjue <zhumingjue@huawei.com>
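
To make the range notation concrete, here is a minimal sketch of how a bucket key such as "(0, 1-60, 1-3)" could be resolved into concrete integers. The function names are hypothetical, and both the inclusive bounds and the restriction to non-negative values are assumptions; the PR's own sampling helpers may behave differently.

```python
import random


def sample_range_spec(spec: str, rng: random.Random) -> int:
    """Resolve "N" to N and "A-B" to a random integer in [A, B] (assumed inclusive)."""
    spec = spec.strip()
    if "-" in spec:
        low, high = (int(part) for part in spec.split("-", 1))
        return rng.randint(low, high)
    return int(spec)


def sample_bucket_key(key: str, rng: random.Random) -> tuple:
    """Turn "(0, 1-60, 1-3)" into a tuple of sampled integers."""
    fields = key.strip().strip("()").split(",")
    return tuple(sample_range_spec(field, rng) for field in fields)


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_bucket_key("(0, 1-60, 1-3)", rng))
    print(sample_bucket_key("(128-1024, 128-1024, 1)", rng))
```

A fixed field such as the leading `0` passes through unchanged, so the older "(0, 60, 3)" form remains a valid special case of the range form.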

…event ffmpeg encoding failures - Added logic to ensure that height and width are even numbers when the number of frames is greater than one, addressing potential encoding/decoding issues. Signed-off-by: zhumingjue <zhumingjue@huawei.com>
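
A minimal sketch of the even-dimension adjustment described above. The helper name and the choice to round down are assumptions; the commit message only says that height and width are made even when there is more than one frame.

```python
def make_even_for_video(width: int, height: int, num_frames: int) -> tuple:
    """Hypothetical helper: force even width/height for multi-frame requests."""
    if num_frames > 1:
        # Many video encoders (e.g. H.264 with yuv420p) reject odd dimensions.
        width -= width % 2
        height -= height % 2
    return width, height


if __name__ == "__main__":
    print(make_even_for_video(853, 481, 80))  # multi-frame: both rounded down to even
    print(make_even_for_video(853, 481, 1))   # single frame: left unchanged
```

This matters mainly when width and height are sampled from a min/max range, where odd values can occur.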

…onfigurations

- Introduced new test scripts for Qwen-Image and Qwen3-TTS stability benchmarks, utilizing parameterized test cases to handle various server configurations and benchmark parameters.
- Updated the `_sample_stability_batch_params` function in `conftest.py` to include additional fields for width and height, enhancing the sampling capabilities for stability tests.

Signed-off-by: zhumingjue <zhumingjue@huawei.com>

- Introduced JSON test files for Qwen-Image and Qwen3-TTS, defining server and benchmark parameters for stability testing.
- Each test includes detailed configurations such as model specifications, dataset names, and various performance metrics.

Signed-off-by: zhumingjue <zhumingjue@huawei.com>

- Removed redundant parameters and adjusted the random_range_ratio to 0.0 for improved stability testing.
- Updated random_mm_bucket_config to use range specifications for clarity.
- Cleaned up the JSON structure by eliminating unnecessary dataset entries.

Signed-off-by: zhumingjue <zhumingjue@huawei.com>

yenuo26 · 2026-04-21T01:29:37Z

+            os.environ.pop("BENCHMARK_DIR")
+
+
+def _run_one_diffusion_batch(


I think these can be moved to helpers.py.

- Moved benchmark helper functions from `conftest.py` to `helpers.py` for better organization and clarity.
- Updated test scripts to import benchmark functions from `helpers.py`, ensuring a cleaner structure.
- Enhanced the documentation in `conftest.py` to reflect the new organization of helper functions.

Signed-off-by: zhumingjue <zhumingjue@huawei.com>

zhumingjue138 · 2026-04-22T07:39:57Z

@Gaohan123 @hsliuustc0106 This PR is ready. Can it be merged?

…wen-image model and modified conftest.py in test/dfx/ (vllm-project#2817) Signed-off-by: zhumingjue <zhumingjue@huawei.com> Signed-off-by: zhumingjue138 <zhumingjue@huawei.com> Signed-off-by: hongzhigao <761417898@qq.com>

…wen-image model and modified conftest.py in test/dfx/ (vllm-project#2817) Signed-off-by: zhumingjue <zhumingjue@huawei.com> Signed-off-by: zhumingjue138 <zhumingjue@huawei.com>

zhumingjue138 added 8 commits April 14, 2026 11:33

fix json

4435b98

Signed-off-by: zhumingjue <zhumingjue@huawei.com>

Merge branch 'vllm-project:main' into main-longterm-wan22

e7f6b5b

Merge branch 'main-longterm-wan22' of https://github.com/zhumingjue13…

b6e3d54

…8/vllm-omni into main-longterm-wan22

add new import

3700dda

Signed-off-by: zhumingjue <zhumingjue@huawei.com>

Merge remote-tracking branch 'upstream/main' into main-longterm-wan22

4cddb90

Signed-off-by: zhumingjue <zhumingjue@huawei.com>

Merge branch 'vllm-project:main' into main-longterm-wan22

439a15f

Merge branch 'main' into main-longterm-wan22

6400ba7

yenuo26 added the nightly-test label (triggers the Buildkite nightly test CI) Apr 15, 2026

zhumingjue138 added 5 commits April 16, 2026 14:25

Merge branch 'main' into main-longterm-wan22

595bc77

Signed-off-by: zhumingjue138 <zhumingjue@huawei.com>

Refactor parameter handling in benchmark scripts; remove outdated doc…

e262c2c

…string in conftest.py Signed-off-by: zhumingjue <zhumingjue@huawei.com>

Refactor server parameter handling in conftest.py; streamline unique …

893a58f

…server params creation and update OmniServer fixture to accommodate stage config paths. Signed-off-by: zhumingjue <zhumingjue@huawei.com>

Update OmniServer stage initialization timeout in benchmark script to…

7e08f49

… improve performance stability. Signed-off-by: zhumingjue <zhumingjue@huawei.com>

Merge branch 'main' into main-longterm-wan22

603a1bc

yenuo26 added the omni-test label (triggers the Buildkite omni model test in nightly CI) and the ready label (triggers Buildkite CI), and removed the nightly-test and omni-test labels Apr 16, 2026

Enhance server parameter handling in conftest.py and benchmark script…

4b5765a

…s; introduce serve_args support for OmniServer fixture and streamline unique server params creation. Signed-off-by: zhumingjue <zhumingjue@huawei.com>
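
One plausible way a `serve_args` mapping like the one in the Wan2.2 config could be turned into command-line tokens is sketched below. This is an assumption for illustration only: `serve_args_to_cli` is a hypothetical name, and the PR does not show how the OmniServer fixture actually consumes `serve_args`.

```python
def serve_args_to_cli(serve_args: dict) -> list:
    """Hypothetical conversion of a serve_args mapping into CLI tokens."""
    cli = []
    for name, value in serve_args.items():
        flag = f"--{name}"
        if isinstance(value, bool):
            # Booleans are treated as store-true switches; False is omitted.
            if value:
                cli.append(flag)
        else:
            cli.extend([flag, str(value)])
    return cli


if __name__ == "__main__":
    # Keys taken from the serve_args block in the Wan2.2 test config.
    print(serve_args_to_cli({"ulysses-degree": 1, "use-hsdp": True, "vae-use-tiling": True}))
```

Key names are passed through verbatim here, so a config that mixes hyphens and underscores (for example `hsdp_shard_size`) would produce flags with the same mix.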

zhumingjue138 mentioned this pull request Apr 20, 2026

[Bug]: "No available shared memory broadcast block found in 60 seconds" for wan2.2 model after 14h stability test #2928

Closed

1 task

zhumingjue138 added 7 commits April 20, 2026 14:21

Merge remote-tracking branch 'upstream/main' into main-longterm-wan22

12ef429

Signed-off-by: zhumingjue <zhumingjue@huawei.com>

fix precommit

118de59

Signed-off-by: zhumingjue <zhumingjue@huawei.com>

fix error

0cf9d8f

Signed-off-by: zhumingjue <zhumingjue@huawei.com>

Update comments in run_benchmark.py to use en dashes for metric descr…

975d9a9

…iptions Signed-off-by: zhumingjue <zhumingjue@huawei.com>

change duration_sec

6a3cc49

Signed-off-by: zhumingjue <zhumingjue@huawei.com>

Refactor stability test scripts to handle missing configuration files…

926c042

… gracefully. Updated paths to use 'deploy' directory instead of 'stage_configs'. Signed-off-by: zhumingjue <zhumingjue@huawei.com>

Increase stability server timeout from 120 to 600 seconds in test_sta…

0b6a339

…bility_qwen3_omni.py to allow for longer initialization periods. Signed-off-by: zhumingjue <zhumingjue@huawei.com>

zhumingjue138 added 10 commits April 20, 2026 15:19

Merge remote-tracking branch 'upstream/main' into main-longterm-wan22

2303521

Signed-off-by: zhumingjue <zhumingjue@huawei.com>

Import OmniServer from tests.helpers.runtime in conftest.py to enhanc…

eb6f000

…e stability test functionality. Signed-off-by: zhumingjue <zhumingjue@huawei.com>

Update random_range_ratio in test_qwen3_omni.json to 0.0 for stabilit…

72955e4

…y testing adjustments Signed-off-by: zhumingjue <zhumingjue@huawei.com>

Merge branch 'main' into main-longterm-wan22

29e641e

zhumingjue138 changed the title ~~[Test] add stability test case for wan2.2 model and modified conftest.py in test/dfx/~~ [Test] add stability test case for wan2.2, qwen-tts, qwen3-omni and qwen-image model and modified conftest.py in test/dfx/ Apr 20, 2026

yenuo26 reviewed Apr 21, 2026

zhumingjue138 added 2 commits April 21, 2026 10:14

Merge branch 'main' into main-longterm-wan22

fedecdd

zhumingjue138 added 5 commits April 23, 2026 12:03

Merge branch 'main' into main-longterm-wan22

6f5d9c0

Merge branch 'main' into main-longterm-wan22

ec77c44

update wan22json

a729c64

Signed-off-by: zhumingjue <zhumingjue@huawei.com>

Merge branch 'main-longterm-wan22' of https://github.com/zhumingjue13…

6868a3c

…8/vllm-omni into main-longterm-wan22

Merge branch 'main' into main-longterm-wan22

ad139e7

hsliuustc0106 merged commit 47edee1 into vllm-project:main Apr 23, 2026
7 of 8 checks passed

yenuo26 mentioned this pull request Apr 24, 2026

[RFC]: CI optimization and supplementary task tracking JiusiServe/vllm-omni#177

Open

12 tasks

zhumingjue138 mentioned this pull request Apr 25, 2026

[RFC]: L5 Long-Term Stability Test and GPU Memory Monitoring #1590

Open

1 task

hsliuustc0106 merged 39 commits into vllm-project:main from zhumingjue138:main-longterm-wan22
