[POC] [CI] Setup test suite infrastructure for staged migration #13653
alisonshao wants to merge 31 commits into main
This PR organizes tests by moving all manual tests (previously in the __not_in_ci__ section) to a dedicated test/manual/ directory.

Changes:
- Moved 77 manual tests from test/srt/ to test/manual/
- Removed the __not_in_ci__ section from test/srt/run_suite.py
- Updated .github/workflows/pr-test.yml
- Added REORGANIZATION_PLAN.md documenting the changes

Benefits:
- Clearer separation between CI and manual tests
- Easier to identify which tests are not run in CI
- Cleaner run_suite.py without the __not_in_ci__ section
…pport
- Added all stage labels to LABEL_MAPPING (stage-a-test-2, stage-b-*, stage-c-*)
- Added an auto-partition function and command-line arguments
- Stages with no registered tests now pass with 0 tests instead of failing
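The auto-partition function itself is not shown in this PR. The sketch below is one common way to split a stage across several CI jobs, assuming each test carries an est_time and each job receives its partition index and the partition count as command-line arguments. CIFile and auto_partition are hypothetical names, and the greedy longest-first heuristic is an assumption, not necessarily what run_suite.py does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CIFile:
    # Hypothetical record: a test path plus its registered estimated runtime (seconds).
    name: str
    est_time: float


def auto_partition(files, part_id, num_parts):
    """Return the files assigned to partition `part_id` out of `num_parts`.

    Greedy longest-processing-time heuristic: visit files from slowest to
    fastest and give each one to the partition with the smallest running total.
    """
    if num_parts <= 0 or not 0 <= part_id < num_parts:
        raise ValueError("part_id must be in [0, num_parts)")
    partitions = [[] for _ in range(num_parts)]
    totals = [0.0] * num_parts
    # Sort by descending est_time, breaking ties by name, so every CI job
    # computes exactly the same split without coordinating with the others.
    for f in sorted(files, key=lambda f: (-f.est_time, f.name)):
        target = min(range(num_parts), key=lambda i: totals[i])
        partitions[target].append(f)
        totals[target] += f.est_time
    return partitions[part_id]
```

Each job keeps only its own slice. A stage with no registered tests yields an empty list for every partition, which is consistent with stages passing with 0 tests instead of failing.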
Performance benchmark tests (test_bench_one_batch and test_bench_serving) were moved to test/manual/, but the workflow still referenced test/srt/. Updated all performance-test-1-gpu-part-* jobs to use test/manual/.
- Moved test_bench_one_batch.py and test_bench_serving.py back to test/srt/
- Reverted all performance test workflow paths from test/manual/ to test/srt/
- These tests should stay in test/srt/ to avoid breaking sanity checks
test_bench_one_batch.py and test_bench_serving.py are run via performance-test-1-gpu-part-* jobs directly, not through run_suite.py, so they need to be in __not_in_ci__ to pass the sanity check.
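The sanity check referenced here is not shown in the PR. As a rough illustration, assuming it compares the test_*.py files on disk against every file named in a suite or in __not_in_ci__, it could look like the following. find_unlisted_tests and its arguments are hypothetical.

```python
from pathlib import Path


def find_unlisted_tests(test_dir, suites, not_in_ci):
    """Return test files under `test_dir` that no suite and no exemption list mentions.

    `suites` maps suite name -> list of relative file paths; `not_in_ci` lists
    files that are run some other way (e.g. directly by a workflow job).
    """
    listed = {name for files in suites.values() for name in files}
    listed.update(not_in_ci)
    on_disk = {
        p.relative_to(test_dir).as_posix()
        for p in Path(test_dir).rglob("test_*.py")
    }
    # Anything left over would fail the check and needs a suite or an exemption.
    return sorted(on_disk - listed)
```

Under this assumption, helper modules such as test_gpt_oss_common.py also match the test_*.py pattern even though they are only imported by other tests, so they too must be listed in __not_in_ci__.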
This test is run directly via accuracy-test-1-gpu jobs in the workflow, not through run_suite.py. Added to __not_in_ci__ section.
This test is run directly via CI jobs, not through run_suite.py. Added to __not_in_ci__ section.
Moved back:
- test_gpt_oss_common.py (imported by test_gpt_oss_1gpu.py and test_gpt_oss_4gpu.py)
- test_vision_openai_server_common.py (imported by test_vision_openai_server_a.py)

These are helper/base-class files that must stay in test/srt/ so that tests running in CI can import them. Added them to the __not_in_ci__ section.
Related to #13610

Migrates the rotary_embedding feature tests as part of the test reorganization effort to move from hardware-based to feature-based test organization.

Changes:
- Moved test/srt/rotary_embedding/test_mrope.py to test/per_commit/rotary_embedding/
- Added the register_cuda_ci decorator (est_time=10s, suite=stage-b-test-small-1-gpu)
- Removed the test from the per-commit-1-gpu suite in test/srt/run_suite.py

This test will now be discovered via the registration decorator system introduced in #13610.
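A minimal sketch of how per-file registration can work, assuming the registry simply records the calling file when the test module is imported. register_ci, CIRegistration, and REGISTRY are hypothetical stand-ins that mirror the est_time/suite arguments mentioned here; they are not the implementation in sglang.test.ci.ci_register.

```python
import inspect
import os
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CIRegistration:
    file: str        # basename of the test module that registered itself
    backend: str     # e.g. "cuda" or "cpu"
    est_time: float  # estimated runtime in seconds, used for partitioning
    suite: str       # e.g. "stage-b-test-small-1-gpu"


REGISTRY: List[CIRegistration] = []


def register_ci(backend: str, *, est_time: float, suite: str) -> None:
    """Record the *calling* module in REGISTRY (hypothetical sketch).

    Called once at module level in a test file, so importing that file is
    enough to make a runner aware of it; no central list has to be edited.
    """
    caller_frame = inspect.currentframe().f_back
    caller_file = os.path.basename(caller_frame.f_code.co_filename)
    REGISTRY.append(CIRegistration(caller_file, backend, est_time, suite))
```

The design choice this illustrates is that the test file, not test/srt/run_suite.py, owns its suite and timing metadata, which is why removing the test from the per-commit-1-gpu list is part of the migration.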
This PR now only sets up the test suite infrastructure without migrating any tests yet. The actual test migration will be done in follow-up PRs.

Changes:
- Keep suite definitions in test/run_suite.py
- Keep CI workflow stages in .github/workflows/pr-test.yml
- Revert the test_mrope.py migration back to test/srt/
- Keep the migration plan documentation
Focus on 1-GPU migration plan. Multi-GPU tests (2-GPU and 4-GPU) are commented out and will be enabled later when tests are migrated to those suites.
/tag-and-rerun-ci
- Comment out stage-b-test-large-2-gpu, stage-c-test-large-2-gpu, and stage-c-test-large-4-gpu from the pr-test-finish needs
- Remove the corresponding suite labels from LABEL_MAPPING in test/run_suite.py
- These will be uncommented when multi-GPU tests are migrated
Add all target directories for the 28 test categories in the migration plan:
- rotary_embedding, debug_utils, cache, utils, hicache, rl
- runtime, moe, scheduler, tokenization, vision, performance
- layers/attention/mamba, openai_server/{basic,features,function_call,validation}
- sampling, quant, attention/{backends,mla}, quantization
- observability, other, lora, speculative_decoding, models
Each directory contains a .gitkeep to track empty folders.
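Git does not track empty directories, which is why each folder needs a placeholder file. A small sketch of scaffolding them, assuming the category paths are relative to a root such as test/per_commit/ (later renamed test/registered/). create_category_dirs is a hypothetical helper, and the list is only a subset of the directories above.

```python
from pathlib import Path

# Subset of the target directories listed above.
CATEGORIES = [
    "rotary_embedding",
    "debug_utils",
    "cache",
    "layers/attention/mamba",
    "openai_server/basic",
    "openai_server/features",
    "attention/backends",
    "attention/mla",
]


def create_category_dirs(root, categories):
    """Create each category directory under `root` with an empty .gitkeep.

    Safe to re-run: existing directories and placeholder files are left as-is.
    """
    created = []
    for rel in categories:
        d = Path(root) / rel
        d.mkdir(parents=True, exist_ok=True)
        (d / ".gitkeep").touch(exist_ok=True)
        created.append(d)
    return created
```

For example, create_category_dirs("test/registered", CATEGORIES) would lay out the folders; once a real test lands in a directory, its .gitkeep can be deleted.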
Complete Test Migration Plan

Overview

This document tracks the migration of ALL 240 tests to …

Test Sources:
Complete Test List by Feature

1. vision/ (5 tests)
2. attention/backends/ (7 tests)
3. attention/mla/ (6 tests)
4. attention/mamba/ (5 tests)
5. attention/nsa/ (1 test)
6. attention/flashinfer_trtllm/ (2 tests)
7. cache/ (10 tests)
8. models/ (22 tests)
9. lora/ (12 tests)
10. openai_server/ (20 tests)
    - basic/ (6)
    - features/ (5)
    - function_call/ (5)
    - validation/ (4)
11. speculative/ (9 tests)
12. quant/ (12 tests)
13. quantization/ (10 tests)
14. sampling/ (7 tests)
15. scheduler/ (7 tests)
16. runtime/ (14 tests)
17. rl/ (4 tests)
18. observability/ (6 tests)
19. ep/ (4 tests)
20. moe/ (7 tests)
21. disaggregation/ (5 tests)
22. deepseek/ (12 tests)
23. tokenization/ (3 tests)
24. utils/ (5 tests)
25. debug_utils/ (1 test)
26. rotary_embedding/ (2 tests)
27. ops/ (1 test)
28. cuda_graph/ (4 tests)
29. swa/ (2 tests)
30. eval/ (4 tests)
31. perf/ (8 tests)
32. cpu/ (17 tests)
33. ascend/ (11 tests)
34. xpu/ (1 test)
Migration Progress Summary
Total: 240 tests (5 already migrated)

Notes
Migration plan details have been added as a PR comment for reference. The README files are kept locally but removed from version control to keep this PR focused on just the infrastructure setup.
- Rename test/per_commit/ to test/registered/ per issue #13808
- Add nightly and disabled params to CI registry functions
- Create feature-based subdirectories in test/registered/
- Update run_suite.py to scan registered/ with --nightly flag support
- Update README.md with new test structure documentation

The new structure:
- test/manual/: unofficially maintained tests (code references)
- test/registered/: officially maintained tests (per-commit + nightly)

Tests are now organized by feature (models, lora, quant, etc.) rather than by behavior (per-commit vs nightly). The nightly flag in the registry determines when tests run, not the directory location.
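One way a runner could apply the nightly and disabled fields when picking tests for a suite, sketched under the assumption that disabled holds a reason string (None when the test is enabled) and that --nightly adds nightly-only tests on top of the per-commit ones. Registration and select_tests are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Registration:
    path: str                       # test file, relative to test/registered/
    suite: str                      # e.g. "stage-b-test-small-1-gpu"
    est_time: float                 # estimated runtime in seconds
    nightly: bool = False           # True: only run when the runner gets --nightly
    disabled: Optional[str] = None  # reason string; None means the test is enabled


def select_tests(
    entries: List[Registration], suite: str, include_nightly: bool
) -> List[Registration]:
    """Pick the entries a runner should execute for `suite`.

    Disabled entries are always skipped (their metadata stays in the registry);
    nightly-only entries are included only when include_nightly is True.
    """
    return [
        e
        for e in entries
        if e.suite == suite
        and e.disabled is None
        and (include_nightly or not e.nightly)
    ]
```

Because the decision lives in these fields rather than in the directory, a test under test/registered/lora/ can move between per-commit and nightly runs without being moved on disk.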
Note (#13944): when migrating lora:
Test Categorization for per-commit-1-gpu Migration

Total: 120 tests
Summary by Suite
Detailed Test Categorization
Register Import Lines

For CPU tests:

from sglang.test.ci.ci_register import register_cpu_ci
register_cpu_ci(est_time=<TIME>, suite="stage-a-cpu")

For CUDA tests:

from sglang.test.ci.ci_register import register_cuda_ci
register_cuda_ci(est_time=<TIME>, suite="<SUITE>")
Update scripts/ci/slash_command_handler.py once new stages are added.
cd sglang/test/srt

# Run a single file
python3 test_srt_endpoint.py
Hi @alisonshao,
There is no test_srt_endpoint.py under the folder sglang/test/srt.
Can you update this part of the document? Thank you.
Hi @alisonshao, I was wondering if there’s any plan for this PR #13653 to be merged. The unittest README updates are really helpful for contributors, so if this PR won’t be merged, perhaps a separate PR just for the README could be considered. I previously opened my own PR #18919, but it seems the changes here could fully cover mine. Thanks!
Related: #13610, #13808
Overview
This is a POC branch for the new CI test infrastructure. It establishes the framework for organizing tests into test/registered/ with a registry-based system.

Important: This PR should NOT be merged directly. Instead, tests should be migrated incrementally, one feature at a time in separate PRs.
Changes
CI Registry (python/sglang/test/ci/ci_register.py)
- nightly flag on register functions: tests can now be marked as nightly-only
- disabled flag with a reason string: temporarily disables tests while preserving their metadata

Test Runner (test/run_suite.py)
- Scans the test/registered/ directory
- --nightly flag to include nightly tests

Directory Structure
- test/per_commit/ → test/registered/
- The nightly flag in the registry determines when tests run, not the directory

Documentation (test/README.md)
- New Structure (per #13808)
Next Steps
Each feature migration should be done in a separate PR:
- Move tests from test/srt/ to test/registered/<feature>/
- Update test/srt/run_suite.py to remove migrated tests

This POC demonstrates the infrastructure; the actual migration happens incrementally.
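How the runner finds registrations under test/registered/ is not spelled out here. One option, assuming registrations are module-level register_cuda_ci(...) / register_cpu_ci(...) calls with literal keyword arguments, is to read them statically with Python's ast module so that collecting metadata never imports GPU-dependent test code. discover_registrations is a hypothetical helper, not the PR's implementation.

```python
import ast
from pathlib import Path

# Call names taken from the "Register Import Lines" comment in this PR.
REGISTER_FUNCS = {"register_cuda_ci", "register_cpu_ci"}


def discover_registrations(root):
    """Statically find module-level register_*_ci(...) calls under `root`.

    Returns (relative_path, function_name, kwargs) tuples without importing
    any test module. Assumes keyword arguments are Python literals.
    """
    found = []
    for path in sorted(Path(root).rglob("test_*.py")):
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in tree.body:  # top-level statements only
            if not (isinstance(node, ast.Expr) and isinstance(node.value, ast.Call)):
                continue
            call = node.value
            if isinstance(call.func, ast.Name) and call.func.id in REGISTER_FUNCS:
                kwargs = {
                    kw.arg: ast.literal_eval(kw.value)
                    for kw in call.keywords
                    if kw.arg is not None  # skip **kwargs unpacking
                }
                found.append((path.relative_to(root).as_posix(), call.func.id, kwargs))
    return found
```

The trade-off is that a non-literal argument (for example, a computed est_time) makes ast.literal_eval raise ValueError, so this approach only works if registration calls stay simple.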