[CI] Update test markers and configurations to use 'full_model' for L4 nightly tests #2641

Merged
hsliuustc0106 merged 15 commits into vllm-project:main from yenuo26:L4-mark on Apr 22, 2026

Conversation

@yenuo26
Collaborator

@yenuo26 yenuo26 commented Apr 9, 2026


Purpose

Update test markers and configurations to use 'full_model' for L4 nightly tests

  • Changed test markers from 'advanced_model' to 'full_model' across various test files to align with the new testing structure.
  • Updated the 'pyproject.toml' to reflect the new marker definitions.
  • Adjusted Buildkite configurations to run full model tests in nightly pipelines.
  • Enhanced documentation to clarify the use of 'full_model' for nightly tests and 'advanced_model' for merge tests.
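As a minimal sketch of what the marker change looks like inside a single test module: the marker names `full_model`, `diffusion`, and `H100` come from this PR, while the module contents and the test name below are hypothetical placeholders, not code from the repository.

```python
import pytest

# Module-level markers: every test in this file is selected by
#   -m "full_model and diffusion and H100"
# Before this PR the first entry would have been pytest.mark.advanced_model.
pytestmark = [pytest.mark.full_model, pytest.mark.diffusion, pytest.mark.H100]


def test_example_expansion():
    # Hypothetical placeholder body; a real nightly test would start a
    # server and send requests to it.
    assert True
```

Each marker also has to be registered (here, in 'pyproject.toml') so that pytest does not report it as unknown.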

Test Plan

Run in CI.

Test Result

[Image attachment: 51acfa85-c65d-4f28-9c5b-edfaab262e5d]

…4 nightly tests

Signed-off-by: wangyu <410167048@qq.com>
@yenuo26 yenuo26 requested a review from hsliuustc0106 as a code owner April 9, 2026 08:56
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@yenuo26 yenuo26 added the "nightly-test" label (label to trigger buildkite nightly test CI) Apr 9, 2026
Comment thread .buildkite/test-nightly-diffusion.yml Outdated
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -s -v tests/e2e/online_serving/test_*_expansion.py -k "not test_wan22_expansion and not test_wan_2_1_vace_expansion and not test_qwen_image" -m "advanced_model and diffusion and H100" --run-level "advanced_model"
- pytest -s -v tests/e2e/ -k "not test_wan22_expansion and not test_wan_2_1_vace_expansion and not test_qwen_image" -m "full_model and diffusion and H100" --run-level "full_model"
Collaborator

Broadening from tests/e2e/online_serving/test_*_expansion.py to tests/e2e/ now sweeps in tests/e2e/accuracy/test_gebench_h100_smoke.py, test_gedit_bench_h100_smoke.py, and tests/e2e/accuracy/wan22_i2v/* — they all match full_model and diffusion and H100 after this PR, and the -k filter only excludes test_wan22_expansion/test_wan_2_1_vace_expansion/test_qwen_image, not the accuracy files. Those have dedicated steps below and need --gebench-model/--gedit-model CLI args, so they'll either double-run or fail here. Please tighten the path back to tests/e2e/online_serving/ or extend the -k exclusion.

Collaborator Author

fixed
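
The concern raised in this thread can be illustrated with a rough model of how the path argument, `-k`, and `-m` combine. This is only a sketch under stated assumptions: real `-k` matching in pytest is more involved than a substring check on the test id, the test function names are hypothetical, and `test_example_expansion.py` is an invented file. The accuracy file path and the three `-k` exclusions are taken from the comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    path: str
    name: str
    markers: frozenset


def is_selected(case, root, k_excluded, m_required):
    """Rough model of `pytest <root> -k "not a and not b" -m "x and y"`."""
    in_root = case.path.startswith(root)
    test_id = f"{case.path}::{case.name}"
    # -k "not a and not b": drop the test if any excluded word matches.
    passes_k = not any(word in test_id for word in k_excluded)
    # -m "x and y": every required marker must be present on the test.
    passes_m = m_required <= case.markers
    return in_root and passes_k and passes_m


NIGHTLY = frozenset({"full_model", "diffusion", "H100"})
CASES = [
    # Hypothetical file and test names, for illustration only.
    Case("tests/e2e/online_serving/test_example_expansion.py", "test_example", NIGHTLY),
    Case("tests/e2e/online_serving/test_wan22_expansion.py", "test_wan22", NIGHTLY),
    # Path named in the review comment; test name is hypothetical.
    Case("tests/e2e/accuracy/test_gebench_h100_smoke.py", "test_gebench", NIGHTLY),
]
K_EXCLUDED = ("test_wan22_expansion", "test_wan_2_1_vace_expansion", "test_qwen_image")


def collect(root):
    return [c.path for c in CASES if is_selected(c, root, K_EXCLUDED, NIGHTLY)]


wide = collect("tests/e2e/")
narrow = collect("tests/e2e/online_serving/")
```

In this model `wide` includes the accuracy smoke test because nothing in the `-k` filter or the marker expression rules it out, while `narrow` does not, which is why tightening the path or extending the exclusion list both address the problem.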

Collaborator

@lishunyang12 lishunyang12 left a comment

Review Summary

This PR cleanly separates the L3 (merge) and L4 (nightly) test tiers by introducing a new full_model marker for L4 nightly tests, keeping advanced_model exclusively for L3 merge tests. The change is well-structured and consistently applied.

What looks good

  1. _is_deep_run_level() helper in conftest.py -- Good abstraction. Centralizing the run_level in ("advanced_model", "full_model") check avoids scattering the logic and makes future level additions trivial.

  2. Marker registration in pyproject.toml -- Correctly adds full_model and updates the --run-level choices to include it.

  3. Buildkite pipeline updates -- All nightly YAMLs consistently switch from advanced_model to full_model in both -m and --run-level flags.

  4. Test path widening (e.g. tests/e2e/online_serving/test_*_expansion.py -> tests/e2e/) -- This is safe because the marker filter (-m "full_model and diffusion and H100") still constrains collection. It also picks up the accuracy tests that were migrated to full_model.

  5. Documentation updates -- CI level docs, marker docs, and READMEs are all updated consistently.

Minor observations (non-blocking)

  • test_qwen3_tts_base_expansion.py has both @pytest.mark.full_model and @pytest.mark.core_model on the same tests. This appears intentional (tests that run at both PR and nightly levels), but it might be worth a brief comment in the file explaining why they carry dual markers, for future contributors.

  • Benchmark tests (run_benchmark.py, run_diffusion_benchmark.py) gained full_model + benchmark markers. Previously they had no level marker at all. This is a good addition that brings them into the marker system properly.

  • The GPU cleanup log message change ("GPU cleanup disabled" -> "\nPost-test GPU cleanup skipped...") is unrelated to the marker refactor. Not a problem, just noting it's bundled in.

LGTM. The separation between merge-level and nightly-level test markers is clear and consistently applied across all 36 changed files.

@lishunyang12
Collaborator

Please resolve the conflicts, thanks.

yenuo26 and others added 8 commits April 20, 2026 20:48
Signed-off-by: wangyu <410167048@qq.com>
…istent pytestmark usage across various test modules.

Signed-off-by: wangyu <410167048@qq.com>
Signed-off-by: wangyu <410167048@qq.com>
…racy tests; enhance run_args.py to include 'full_model' in run-level options.

Signed-off-by: wangyu <410167048@qq.com>
…s_base_expansion.py to streamline test definitions.

Signed-off-by: wangyu <410167048@qq.com>
…line CI pipeline.

Signed-off-by: wangyu <410167048@qq.com>
@yenuo26 yenuo26 removed the "nightly-test" label (label to trigger buildkite nightly test CI) Apr 21, 2026
yenuo26 and others added 2 commits April 21, 2026 15:24
…mpts, update request configurations, and streamline audio transcription process. Adjust pytestmark for diffusion tests.

Signed-off-by: wangyu <410167048@qq.com>
…T2S prompts, adjust pytestmark for omni tests, and enhance audio validation logic in assertions.

Signed-off-by: wangyu <410167048@qq.com>
@yenuo26 yenuo26 added the "nightly-test" label (label to trigger buildkite nightly test CI) Apr 21, 2026
…attribute with direct cosine similarity calculation, ensuring more accurate audio-text comparison. Clean up unused similarity variable in runtime handling.

Signed-off-by: wangyu <410167048@qq.com>
@yenuo26 yenuo26 added the "ready" label (label to trigger buildkite CI) and removed the "nightly-test" label (label to trigger buildkite nightly test CI) Apr 22, 2026
@yenuo26
Collaborator Author

yenuo26 commented Apr 22, 2026

> Solve conflict thanks.

fixed

@yenuo26
Collaborator Author

yenuo26 commented Apr 22, 2026

@hsliuustc0106 @Gaohan123 @gcanlin This PR is ready. Could you please check if it can be merged?

@gcanlin
Collaborator

gcanlin commented Apr 22, 2026

From the semantics, does full_model mean nightly will run all models or run all tests of one model?

@yenuo26
Collaborator Author

yenuo26 commented Apr 22, 2026

> From the semantics, does full_model mean nightly will run all models or run all tests of one model?

By design, L4 runs the full set of test cases for all high-, medium-, and low-priority models, excluding the simple test cases already run by L2 and L3. So full_model refers to both: running all models and running all tests.
core_model -> L1&L2
advanced_model -> L3
full_model -> L4
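
The mapping above can be written down as a small lookup. This is an illustrative sketch only; the dictionary and function names are hypothetical and do not appear in the repository.

```python
# Marker -> CI levels, exactly as listed in the comment above.
MARKER_TO_LEVELS = {
    "core_model": ("L1", "L2"),
    "advanced_model": ("L3",),
    "full_model": ("L4",),
}


def marker_for_level(level: str) -> str:
    """Return the pytest marker that a given CI level selects."""
    for marker, levels in MARKER_TO_LEVELS.items():
        if level in levels:
            return marker
    raise ValueError(f"unknown CI level: {level!r}")


nightly_marker = marker_for_level("L4")
merge_marker = marker_for_level("L3")
```

With this mapping, a nightly (L4) pipeline would pass `full_model` to both `-m` and `--run-level`, and a merge (L3) pipeline would pass `advanced_model`.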

@hsliuustc0106 hsliuustc0106 merged commit e18cb89 into vllm-project:main Apr 22, 2026
8 checks passed
qinganrice pushed a commit to qinganrice/vllm-omni that referenced this pull request Apr 23, 2026
…4 nightly tests (vllm-project#2641)

Signed-off-by: wangyu <410167048@qq.com>
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
…4 nightly tests (vllm-project#2641)

Signed-off-by: wangyu <410167048@qq.com>
@yenuo26 yenuo26 deleted the L4-mark branch May 9, 2026 01:07
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
…4 nightly tests (vllm-project#2641)

Signed-off-by: wangyu <410167048@qq.com>

Labels

ready label to trigger buildkite CI
