
[CI][Perf] Add nightly PR labels, consolidate pipeline, and switch benchmark flag to --test-config-file #2816

Merged
hsliuustc0106 merged 9 commits into vllm-project:main from yenuo26:pr-label on Apr 16, 2026

yenuo26 (Collaborator) commented Apr 15, 2026


Purpose

fix #2410

Summary

  • Refactored Nightly CI orchestration: merged the contents of the deprecated test-nightly-diffusion.yml into test-nightly.yml and refined the trigger logic based on PR labels (omni-test / tts-test / diffusion-x2iat-test / diffusion-x2v-test).
  • Updated the perf test entry point and configuration: migrated the Omni configuration from test_omni.json to test_qwen_omni.json, and renamed the benchmark option to --test-config-file to avoid conflicts with pytest's built-in --config-file.
  • Stabilized perf option parsing and collection: moved registration of the --test-config-file option into tests/dfx/conftest.py so that pytest recognizes it during argument parsing, and removed the duplicate registrations from the scripts.
  • Synchronized documentation and tooling: updated the CI documentation examples, the L4 performance test instructions, and the naming references in the nightly report tooling so that commands, file names, and configuration names are consistent.
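A minimal Python sketch of the label-based gating described above, assuming each label enables exactly one nightly test group and that the umbrella nightly-test label runs all of them. The actual gating presumably lives in the Buildkite YAML (.buildkite/pipeline.yml and .buildkite/test-nightly.yml) and may differ; the group names and the select_nightly_groups helper are hypothetical.

```python
from typing import Iterable, Set

# Hypothetical mapping: PR label -> nightly test group it enables.
LABEL_TO_GROUP = {
    "omni-test": "omni",
    "tts-test": "tts",
    "diffusion-x2iat-test": "diffusion-x2iat",  # x2image + x2audio + x2text
    "diffusion-x2v-test": "diffusion-x2v",      # x2video
}
ALL_GROUPS = frozenset(LABEL_TO_GROUP.values())


def select_nightly_groups(labels: Iterable[str]) -> Set[str]:
    """Return the nightly test groups that a PR's labels should trigger."""
    label_set = set(labels)
    # Assumption: the umbrella nightly-test label runs every group.
    if "nightly-test" in label_set:
        return set(ALL_GROUPS)
    # Otherwise run only the groups whose labels are present; unrelated
    # labels such as "ready" are ignored.
    return {group for label, group in LABEL_TO_GROUP.items() if label in label_set}


if __name__ == "__main__":
    print(sorted(select_nightly_groups(["ready", "tts-test", "omni-test"])))
```

Keeping one label per model family is what allows a PR to trigger only the affected nightly path, which is how the test plan validates each path individually.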

Key Changes

CI:
  • .buildkite/pipeline.yml
  • .buildkite/test-nightly.yml
  • Deleted .buildkite/test-nightly-diffusion.yml

Perf tests:
  • tests/dfx/perf/scripts/run_benchmark.py
  • tests/dfx/perf/scripts/run_diffusion_benchmark.py
  • tests/dfx/conftest.py
  • Deleted tests/dfx/perf/tests/test.json
  • Added and now used: tests/dfx/perf/tests/test_qwen_omni.json

Docs & tooling:
  • docs/contributing/ci/CI_5levels.md
  • docs/contributing/ci/test_guide.md
  • docs/contributing/ci/test_examples/l4_performance_tests.inc.md
  • tools/nightly/generate_nightly_perf_excel.py

Why

  • To prevent --config-file from being interpreted as pytest's own -c/--config-file option under pytest 9. pytest uses the directory of that file as its rootdir, so passing a benchmark JSON through it could shift the rootdir and cause collection exceptions such as ModuleNotFoundError: No module named 'tests'.
  • To reduce Nightly pipeline maintenance costs, along with the drift and duplication caused by scattered YAML files.
  • To ensure consistency in L4 perf paths, configuration naming, and documentation descriptions for Omni/TTS/Diffusion, thereby lowering troubleshooting costs.
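A minimal sketch of what registering the renamed option in tests/dfx/conftest.py might look like. parser.addoption() and config.getoption() are standard pytest APIs; the help text and the get_test_config_path helper are hypothetical, and the conftest in this PR may differ. pytest only honors pytest_addoption in plugins and in the conftest.py files it loads at startup (for example, those on the path to the test files given on the command line), which is presumably why the option lives in tests/dfx/conftest.py rather than in the individual scripts.

```python
from pathlib import Path
from typing import Optional


def pytest_addoption(parser):
    # A distinct name never collides with pytest's own -c/--config-file,
    # which selects pytest's configuration file and hence its rootdir.
    parser.addoption(
        "--test-config-file",
        action="store",
        default=None,
        help="Path to a perf test JSON config, e.g. tests/dfx/perf/tests/test_tts.json",
    )


def get_test_config_path(config) -> Optional[Path]:
    # Hypothetical helper. `config` is the pytest Config object
    # (request.config inside a fixture). getoption() accepts the literal
    # option string as well as its dest name.
    value = config.getoption("--test-config-file")
    return Path(value) if value else None
```

Inside a fixture this would be called as get_test_config_path(request.config), so the local test-plan command with --test-config-file tests/dfx/perf/tests/test_tts.json resolves to that JSON path.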

Test Plan

1. Run the perf test locally: pytest -s -v tests/dfx/perf/scripts/run_benchmark.py --test-config-file tests/dfx/perf/tests/test_tts.json
2. CI Nightly: trigger each label path (omni/tts/diffusion) individually and validate each one once.

Test Result

227c2925-2a07-4db4-8f3f-845abde21aff

tts-test:
47c6b3d3-2694-4103-96f0-277a06c4a8b8

omni-test:
1a91073c-a520-4f04-938f-b24d6e6ee332

diffusion-x2v-test:
533548ee-27ea-40af-8e1c-208785242050

diffusion-x2iat-test:
fc4740ec-4b1e-4f6e-ab95-9287b0a35772

nightly-test:
0eb3238a-acfd-46ba-9d42-a48d2977be23



yenuo26 requested a review from hsliuustc0106 as a code owner April 15, 2026 08:01
chatgpt-codex-connector commented:

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

yenuo26 added the tts-test label (triggers the Buildkite TTS model tests in nightly CI) Apr 15, 2026
hsliuustc0106 (Collaborator) commented:

BLOCKER scan:

  • Correctness: Minor. Removed explicit error checks in _resolve_baseline_value(). The old code provided clear ValueError/IndexError messages; the new code relies on Python's default IndexError, which is less informative. Consider keeping the explicit checks for better debugging.
  • Reliability/Safety: PASS
  • Breaking Changes: PASS
  • Test Coverage: PASS. Test results provided in the PR description.
  • Documentation: PASS. Documentation updated with the new --config-file usage.
  • Security: PASS

OVERALL: NO BLOCKERS

VERDICT: COMMENT


This is a straightforward CI infrastructure update. The changes to support --config-file for performance tests are useful, and the documentation is updated accordingly.

Minor suggestion: Consider keeping the explicit error checks in _resolve_baseline_value() for sweep_index validation. The old code provided clearer error messages that would help users debug configuration issues more quickly.

Overall, the PR is ready to merge once the blocked checks pass.
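The reviewer's suggestion about explicit checks can be sketched as follows. The real _resolve_baseline_value() signature and baseline format are not shown in this PR, so the function below is hypothetical: it assumes a baseline is either a single number or a list with one value per sweep point, selected by sweep_index.

```python
from typing import Sequence, Union

Number = Union[int, float]


def resolve_baseline_value(
    baseline: Union[Number, Sequence[Number]],
    sweep_index: int,
    metric: str = "metric",
) -> Number:
    """Pick the baseline for one sweep point, failing with actionable messages."""
    # A scalar baseline applies to every point of the sweep.
    if isinstance(baseline, (int, float)):
        return baseline
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(sweep_index, bool) or not isinstance(sweep_index, int):
        raise ValueError(
            f"sweep_index for {metric!r} must be an int, got {type(sweep_index).__name__}"
        )
    # Unlike plain indexing, also reject negative indices, which would
    # otherwise silently pick an entry from the end of the list.
    if not 0 <= sweep_index < len(baseline):
        raise IndexError(
            f"sweep_index {sweep_index} is out of range for {metric!r}: "
            f"the baseline list has {len(baseline)} entries"
        )
    return baseline[sweep_index]
```

Naming the metric and the list length in the message is what makes a misconfigured JSON baseline quick to locate, compared with a bare "list index out of range".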

yenuo26 removed the tts-test label Apr 15, 2026
…models

- Enhanced the nightly pipeline to include additional labels for triggering tests.
- Removed the obsolete `test-nightly-diffusion.yml` file.
- Updated `test-nightly.yml` to include new performance tests for Omni and TTS models.
- Introduced new performance test configurations in `test_qwen_omni.json` and `test_tts.json`.
- Added new benchmark scripts for Omni and diffusion models.
- Updated documentation to reflect changes in performance test configurations.

Signed-off-by: wangyu <410167048@qq.com>
Co-authored-by: inaniloquentee <inani_@stu.xjtu.edu.cn>
Fishermanykx and others added 3 commits April 15, 2026 18:16
…rmance testing of Omni and TTS models.

- Updated the nightly pipeline configuration to reflect changes in test script names and parameters.
- Introduced `run_diffusion_benchmark.py` for benchmarking diffusion models.
- Adjusted documentation to align with new test script usage and configuration options.

Signed-off-by: wangyu <410167048@qq.com>
Co-authored-by: inaniloquentee <inani_@stu.xjtu.edu.cn>
Signed-off-by: wangyu <410167048@qq.com>
yenuo26 added the tts-test label Apr 15, 2026
yenuo26 changed the title from "[CI] Update nightly pipeline configuration and remove deprecated testfiles" to "[CI][Perf] Add nightly PR labels, consolidate pipeline, and switch benchmark flag to --test-config-file" Apr 15, 2026
yenuo26 (Collaborator, Author) commented Apr 15, 2026

@amy-why-3459 @fhfuih PTAL: I have modified some performance test scripts for the following reasons:

  • To separate the performance test scripts for omni and tts.
  • To prevent --config-file from being interpreted as pytest's own configuration parameter under pytest 9, which could cause rootdir shifts and collection exceptions such as ModuleNotFoundError: No module named 'tests'.

yenuo26 added the omni-test label (triggers the Buildkite Omni model test in nightly CI) and removed the tts-test label Apr 15, 2026
Signed-off-by: wangyu <410167048@qq.com>
yenuo26 added the diffusion-x2v-test (triggers the Buildkite x2video diffusion model tests in nightly CI) and diffusion-x2iat-test (triggers the Buildkite x2image + x2audio + x2text diffusion model tests in nightly CI) labels and removed the omni-test and diffusion-x2v-test labels Apr 15, 2026
yenuo26 added the nightly-test (triggers the Buildkite nightly test CI) and ready (triggers Buildkite CI) labels and removed the diffusion-x2iat-test and nightly-test labels Apr 15, 2026
fhfuih (Contributor) left a comment

For the diffusion part, this generally looks good to me. I left some comments on the documentation.

Comment thread docs/contributing/ci/CI_5levels.md Outdated
/tests/e2e/offline_inference/test_{model_name}_expansion.py<br>
<strong>Performance:</strong><br>
/tests/dfx/perf/tests/test.json<br>
/tests/dfx/perf/tests/test_qwen_omni.json (Omni) and test_tts.json (TTS)<br>
Contributor

There are also /tests/dfx/perf/tests/test_{some diffusion models}_vllm_omni.json files. Maybe you would like to mention them in the doc.

Contributor

It is also related to your change in docs/contributing/ci/test_examples/l4_performance_tests.inc.md and some changes below within this file

yenuo26 (Collaborator, Author)

fixed

Comment thread docs/contributing/ci/CI_5levels.md Outdated
├── test_cache_dit.py
├── test_teacache.py
├── test_stable_audio_expansion.py
├── test_stable_audio_model.py
Contributor

Is this unintentional? There isn't a test_stable_audio_model anymore since the L4 test for Stable Audio was merged.

yenuo26 (Collaborator, Author)

fixed

Signed-off-by: wangyu <410167048@qq.com>
…configurations

Signed-off-by: wangyu <410167048@qq.com>
hsliuustc0106 merged commit c83f664 into vllm-project:main Apr 16, 2026
8 checks passed
path: /mnt/hf-cache
type: DirectoryOrCreate

- label: ":full_moon: Diffusion · Qwen-Image · Accuracy Test"
Collaborator

You mistakenly deleted this nightly test. Please add it back.

lvliang-intel pushed a commit to lvliang-intel/vllm-omni that referenced this pull request Apr 20, 2026
…nchmark flag to --test-config-file (vllm-project#2816)

Signed-off-by: wangyu <410167048@qq.com>
Co-authored-by: Y. Fisher <yukexiong1@huawei.com>
Co-authored-by: inaniloquentee <inani_@stu.xjtu.edu.cn>
yenuo26 deleted the pr-label branch April 21, 2026 01:38

Labels

ready label to trigger buildkite CI


Development

Successfully merging this pull request may close these issues.

[Feature]: Trigger Model-Specific Performance Tests via Tags in vLLM-Omni

5 participants