[CI] Update test markers and configurations to use 'full_model' for L4 nightly tests #2641
…4 nightly tests
- Changed test markers from 'advanced_model' to 'full_model' across various test files to align with the new testing structure.
- Updated 'pyproject.toml' to reflect the new marker definitions.
- Adjusted Buildkite configurations to run full model tests in nightly pipelines.
- Enhanced documentation to clarify the use of 'full_model' for nightly tests and 'advanced_model' for merge tests.

Signed-off-by: wangyu <410167048@qq.com>
  commands:
    - export VLLM_WORKER_MULTIPROC_METHOD=spawn
-   - pytest -s -v tests/e2e/online_serving/test_*_expansion.py -k "not test_wan22_expansion and not test_wan_2_1_vace_expansion and not test_qwen_image" -m "advanced_model and diffusion and H100" --run-level "advanced_model"
+   - pytest -s -v tests/e2e/ -k "not test_wan22_expansion and not test_wan_2_1_vace_expansion and not test_qwen_image" -m "full_model and diffusion and H100" --run-level "full_model"
Broadening from tests/e2e/online_serving/test_*_expansion.py to tests/e2e/ now sweeps in tests/e2e/accuracy/test_gebench_h100_smoke.py, test_gedit_bench_h100_smoke.py, and tests/e2e/accuracy/wan22_i2v/* — they all match full_model and diffusion and H100 after this PR, and the -k filter only excludes test_wan22_expansion/test_wan_2_1_vace_expansion/test_qwen_image, not the accuracy files. Those have dedicated steps below and need --gebench-model/--gedit-model CLI args, so they'll either double-run or fail here. Please tighten the path back to tests/e2e/online_serving/ or extend the -k exclusion.
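The interaction described above can be illustrated with a toy model of pytest selection. This is only a sketch, not pytest's real collection logic: `FakeTest`, `select`, the test name `test_case`, and the file `test_example_expansion.py` are hypothetical, and the `-k` filter is approximated as a substring check on the node id. The marker set and the excluded keywords are taken from the command in the diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTest:
    """Hypothetical stand-in for a collected pytest item."""
    path: str
    name: str
    markers: frozenset


def select(tests, root, required_markers, excluded_keywords):
    """Roughly mimic `pytest <root> -m "a and b" -k "not x and not y"`."""
    chosen = []
    for t in tests:
        node_id = f"{t.path}::{t.name}"
        under_root = t.path.startswith(root)
        # -m "a and b and c": the test must carry every required marker.
        has_markers = required_markers <= t.markers
        # -k "not x and not y": approximated as a substring test on the node id.
        excluded = any(k in node_id for k in excluded_keywords)
        if under_root and has_markers and not excluded:
            chosen.append(t.path)
    return chosen


NIGHTLY = frozenset({"full_model", "diffusion", "H100"})
EXCLUDED = ("test_wan22_expansion", "test_wan_2_1_vace_expansion", "test_qwen_image")

TESTS = [
    FakeTest("tests/e2e/online_serving/test_wan22_expansion.py", "test_case", NIGHTLY),
    # Hypothetical expansion test that the step is meant to run.
    FakeTest("tests/e2e/online_serving/test_example_expansion.py", "test_case", NIGHTLY),
    FakeTest("tests/e2e/accuracy/test_gebench_h100_smoke.py", "test_case", NIGHTLY),
]

wide = select(TESTS, "tests/e2e/", NIGHTLY, EXCLUDED)
narrow = select(TESTS, "tests/e2e/online_serving/", NIGHTLY, EXCLUDED)
```

In this model, `wide` includes the accuracy smoke test while `narrow` does not, which is the double-run risk the comment points at.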
lishunyang12 left a comment
Review Summary
This PR cleanly separates the L3 (merge) and L4 (nightly) test tiers by introducing a new full_model marker for L4 nightly tests, keeping advanced_model exclusively for L3 merge tests. The change is well-structured and consistently applied.
What looks good
- `_is_deep_run_level()` helper in `conftest.py` -- Good abstraction. Centralizing the `run_level in ("advanced_model", "full_model")` check avoids scattering the logic and makes future level additions trivial.
- Marker registration in `pyproject.toml` -- Correctly adds `full_model` and updates the `--run-level` choices to include it.
- Buildkite pipeline updates -- All nightly YAMLs consistently switch from `advanced_model` to `full_model` in both `-m` and `--run-level` flags.
- Test path widening (e.g. `tests/e2e/online_serving/test_*_expansion.py` -> `tests/e2e/`) -- This is safe because the marker filter (`-m "full_model and diffusion and H100"`) still constrains collection. It also picks up the accuracy tests that were migrated to `full_model`.
- Documentation updates -- CI level docs, marker docs, and READMEs are all updated consistently.
Minor observations (non-blocking)
- `test_qwen3_tts_base_expansion.py` has both `@pytest.mark.full_model` and `@pytest.mark.core_model` on the same tests. This appears intentional (tests that run at both PR and nightly levels), but it might be worth a brief comment in the file explaining why they carry dual markers, for future contributors.
- Benchmark tests (`run_benchmark.py`, `run_diffusion_benchmark.py`) gained `full_model` + `benchmark` markers. Previously they had no level marker at all. This is a good addition that brings them into the marker system properly.
- The GPU cleanup log message change (`"GPU cleanup disabled"` -> `"\nPost-test GPU cleanup skipped..."`) is unrelated to the marker refactor. Not a problem, just noting it's bundled in.
LGTM. The separation between merge-level and nightly-level test markers is clear and consistently applied across all 36 changed files.
Please resolve the conflict, thanks.
Signed-off-by: wangyu <410167048@qq.com>
…istent pytestmark usage across various test modules. Signed-off-by: wangyu <410167048@qq.com>
Signed-off-by: wangyu <410167048@qq.com>
…racy tests; enhance run_args.py to include 'full_model' in run-level options. Signed-off-by: wangyu <410167048@qq.com>
…s_base_expansion.py to streamline test definitions. Signed-off-by: wangyu <410167048@qq.com>
…line CI pipeline. Signed-off-by: wangyu <410167048@qq.com>
…mpts, update request configurations, and streamline audio transcription process. Adjust pytestmark for diffusion tests. Signed-off-by: wangyu <410167048@qq.com>
…T2S prompts, adjust pytestmark for omni tests, and enhance audio validation logic in assertions. Signed-off-by: wangyu <410167048@qq.com>
…attribute with direct cosine similarity calculation, ensuring more accurate audio-text comparison. Clean up unused similarity variable in runtime handling. Signed-off-by: wangyu <410167048@qq.com>
Fixed.
@hsliuustc0106 @Gaohan123 @gcanlin This PR is ready. Could you please check if it can be merged?
From the semantics, does …
Our design is that L4 runs the full set of test cases for all high-, medium-, and low-priority models, excluding the simple test cases that L2 and L3 already run. So the name refers both to running all models and to running all tests.
…4 nightly tests (vllm-project#2641) Signed-off-by: wangyu <410167048@qq.com>
Purpose
Update test markers and configurations to use 'full_model' for L4 nightly tests
Test Plan
Run in CI.
Test Result