
[POC] [CI] Setup test suite infrastructure for staged migration #13653

Open
alisonshao wants to merge 31 commits into main from migrate-rotary-embedding

alisonshao (Collaborator) commented Nov 20, 2025

Related: #13610, #13808

Overview

This is a POC branch for the new CI test infrastructure. It establishes the framework for organizing tests into test/registered/ with a registry-based system.

Important: This PR should NOT be merged directly. Instead, tests should be migrated incrementally, one feature at a time, in separate PRs.

Changes

CI Registry (python/sglang/test/ci/ci_register.py)

  • Added a nightly flag to the register functions: tests can now be marked as nightly-only
  • Added a disabled flag that takes a reason string: temporarily disables a test while preserving its metadata
  • Example usage:
    from sglang.test.ci.ci_register import register_cuda_ci

    # Per-commit test
    register_cuda_ci(est_time=80, suite="stage-a-test-1")

    # Nightly-only test
    register_cuda_ci(est_time=200, suite="nightly-1-gpu", nightly=True)

    # Temporarily disabled test
    register_cuda_ci(est_time=80, suite="stage-a-test-1", disabled="flaky - see #12345")
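A minimal sketch of how a runner could interpret the nightly and disabled flags when choosing which tests a suite runs. The Registration record and the select_tests helper are hypothetical stand-ins for illustration; they are not the actual contents of ci_register.py or run_suite.py.

```python
import warnings
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Registration:
    """Hypothetical record of one register_*_ci(...) call."""
    filename: str
    est_time: int
    suite: str
    nightly: bool = False
    disabled: Optional[str] = None  # reason string, or None when the test is enabled


def select_tests(regs: List[Registration], suite: str, include_nightly: bool) -> List[Registration]:
    """Return the tests to run for `suite`, warning about disabled ones."""
    selected = []
    for reg in regs:
        if reg.suite != suite:
            continue
        if reg.nightly and not include_nightly:
            continue  # nightly-only tests are picked up only when nightly runs are requested
        if reg.disabled is not None:
            warnings.warn(f"Skipping {reg.filename}: disabled ({reg.disabled})")
            continue  # metadata stays in the file; the test just does not run
        selected.append(reg)
    return selected


regs = [
    Registration("test_a.py", 80, "stage-a-test-1"),
    Registration("test_b.py", 200, "stage-a-test-1", nightly=True),
    Registration("test_c.py", 80, "stage-a-test-1", disabled="flaky - see #12345"),
]
print([r.filename for r in select_tests(regs, "stage-a-test-1", include_nightly=False)])  # ['test_a.py']
```

Keeping disabled tests registered (rather than deleting the call) means the suite, estimated time, and reason remain visible in the file until the test is fixed.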

Test Runner (test/run_suite.py)

  • Updated to scan test/registered/ directory
  • Added --nightly flag to include nightly tests
  • Filters out disabled tests with warning messages
  • Supports auto-partitioning for load balancing
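The PR does not spell out the partitioning algorithm. As an assumption, the sketch below uses a greedy longest-processing-time heuristic over est_time: sort tests from longest to shortest and always hand the next one to the partition with the smallest running total. The auto_partition name and signature are hypothetical.

```python
import heapq
from typing import List, Tuple


def auto_partition(tests: List[Tuple[str, int]], size: int) -> List[List[str]]:
    """Split (filename, est_time) pairs into `size` groups with similar total time."""
    if size < 1:
        raise ValueError("size must be at least 1")
    heap = [(0, i) for i in range(size)]  # (running total est_time, partition index)
    heapq.heapify(heap)
    partitions: List[List[str]] = [[] for _ in range(size)]
    # Longest tests first, each assigned to the currently lightest partition.
    for name, est in sorted(tests, key=lambda t: t[1], reverse=True):
        total, idx = heapq.heappop(heap)
        partitions[idx].append(name)
        heapq.heappush(heap, (total + est, idx))
    return partitions


tests = [("test_a.py", 300), ("test_b.py", 200), ("test_c.py", 150), ("test_d.py", 100), ("test_e.py", 50)]
parts = auto_partition(tests, size=2)
print(parts)  # [['test_a.py', 'test_d.py'], ['test_b.py', 'test_c.py', 'test_e.py']]
```

Because the result depends only on the test list and the partition count, each CI job can recompute it independently and run only its own slice (for example parts[job_index], where job_index is a hypothetical per-job argument), with no coordination between jobs.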

Directory Structure

  • Renamed test/per_commit/ to test/registered/
  • Tests are organized by feature (not by behavior such as per-commit vs. nightly)
  • The nightly flag in the registry, not the directory, determines when a test runs
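Because the directory now encodes only the feature, a runner walking test/registered/ can recover each test's feature from its parent path, while suite and nightly status come from the registry call inside the file. A minimal sketch, assuming test files follow the test_*.py naming used throughout this PR; group_by_feature is a hypothetical helper.

```python
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def group_by_feature(root: Path) -> Dict[str, List[str]]:
    """Map each feature directory under `root` (e.g. "attention/mla") to its test files."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(root.rglob("test_*.py")):
        feature = path.parent.relative_to(root).as_posix()  # files directly in root map to "."
        groups[feature].append(path.name)
    return dict(groups)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "registered"
    for rel in ["lora/test_lora.py", "lora/lora_utils.py", "attention/mla/test_mla.py"]:
        (root / rel).parent.mkdir(parents=True, exist_ok=True)
        (root / rel).touch()
    features = group_by_feature(root)

print(features)  # {'attention/mla': ['test_mla.py'], 'lora': ['test_lora.py']}
```

Helper modules such as lora_utils.py are skipped by the name pattern, so they can live next to the tests they support.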

Documentation (test/README.md)

  • Updated with new directory structure
  • Documented CI registry parameters
  • Added examples for running tests

New Structure (per #13808)

test/
├── manual/         # Unofficially maintained (code references)
├── registered/     # Officially maintained (per-commit + nightly)
│   └── <feature>/  # Organized by feature
├── srt/            # Legacy (being migrated)
└── nightly/        # Legacy (being migrated)

Next Steps

Each feature migration should be done in a separate PR:

  1. Move tests from test/srt/ to test/registered/<feature>/
  2. Add CI registry decorators to test files
  3. Update test/srt/run_suite.py to remove migrated tests

This POC demonstrates the infrastructure - actual migration happens incrementally.

alisonshao and others added 11 commits November 19, 2025 20:21
This PR organizes tests by moving all manual tests (previously in
__not_in_ci__ section) to a dedicated test/manual/ directory.

Changes:
- Moved 77 manual tests from test/srt/ to test/manual/
- Removed __not_in_ci__ section from test/srt/run_suite.py
- Updated .github/workflows/pr-test.yml
- Added REORGANIZATION_PLAN.md documenting the changes

Benefits:
- Clearer separation between CI and manual tests
- Easier to identify which tests are not run in CI
- Cleaner run_suite.py without __not_in_ci__ section
…pport

- Added all stage labels to LABEL_MAPPING (stage-a-test-2, stage-b-*, stage-c-*)
- Added auto-partition function and command-line arguments
- Stages with no registered tests will now pass with 0 tests instead of failing
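Treating an empty stage as a pass is what lets every stage be wired into the workflow before any test has been migrated into it. A minimal sketch of that exit-code rule; run_stage, the registry dict, and the run_file callback are hypothetical, not the actual run_suite.py code.

```python
from typing import Callable, Dict, List


def run_stage(suite: str, registry: Dict[str, List[str]], run_file: Callable[[str], bool]) -> int:
    """Return a process exit code for one CI stage."""
    files = registry.get(suite, [])
    if not files:
        # Nothing migrated into this stage yet: report success instead of failing the job.
        print(f"[{suite}] 0 tests registered, skipping")
        return 0
    failed = [f for f in files if not run_file(f)]
    for f in failed:
        print(f"[{suite}] FAILED: {f}")
    return 1 if failed else 0


registry = {"stage-a-test-1": ["test_srt_backend.py"]}
print(run_stage("stage-b-test-small-1-gpu", registry, run_file=lambda f: True))
```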
Performance benchmark tests (test_bench_one_batch and test_bench_serving)
were moved to test/manual/, but the workflow still referenced test/srt/.

Updated all performance-test-1-gpu-part-* jobs to use test/manual/.
- Moved test_bench_one_batch.py and test_bench_serving.py back to test/srt/
- Reverted all performance test workflow paths from test/manual to test/srt
- These tests should stay in srt to avoid breaking sanity checks
test_bench_one_batch.py and test_bench_serving.py are run via
performance-test-1-gpu-part-* jobs directly, not through run_suite.py,
so they need to be in __not_in_ci__ to pass the sanity check.
This test is run directly via accuracy-test-1-gpu jobs in the workflow,
not through run_suite.py. Added to __not_in_ci__ section.
This test is run directly via CI jobs, not through run_suite.py.
Added to __not_in_ci__ section.
Moved back:
- test_gpt_oss_common.py (imported by test_gpt_oss_1gpu.py, test_gpt_oss_4gpu.py)
- test_vision_openai_server_common.py (imported by test_vision_openai_server_a.py)

These are helper/base class files that need to be in test/srt/ so they can
be imported by tests running in CI. Added to __not_in_ci__ section.
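The commits above all work around the same sanity check: any test file left in test/srt/ must be listed either in a real suite or in __not_in_ci__. A minimal sketch of that kind of check, assuming suites are plain dicts of filenames; find_unlisted is hypothetical, and the real check in run_suite.py may be structured differently.

```python
from typing import Dict, Iterable, List, Set


def find_unlisted(test_files: Iterable[str], suites: Dict[str, List[str]]) -> Set[str]:
    """Return files on disk that no suite (including __not_in_ci__) mentions."""
    listed = {name for files in suites.values() for name in files}
    return set(test_files) - listed


files_on_disk = ["test_abort.py", "test_bench_serving.py", "test_gpt_oss_common.py"]
suites = {
    "per-commit-1-gpu": ["test_abort.py"],
    "__not_in_ci__": ["test_bench_serving.py"],
}
print(find_unlisted(files_on_disk, suites))  # {'test_gpt_oss_common.py'}
```

Adding a helper such as test_gpt_oss_common.py to __not_in_ci__ makes the returned set empty, which is the fix the commits above apply.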
Related to #13610

Migrates the rotary_embedding feature tests as part of the test
reorganization effort to move from hardware-based to feature-based
test organization.

Changes:
- Moved test/srt/rotary_embedding/test_mrope.py to test/per_commit/rotary_embedding/
- Added register_cuda_ci decorator (est_time=10s, suite=stage-b-test-small-1-gpu)
- Removed from test/srt/run_suite.py per-commit-1-gpu suite

This test will now be discovered via the registration decorator system
introduced in #13610.
Base automatically changed from test-reorganization-plan to main November 20, 2025 21:42
github-actions bot added the documentation label Nov 20, 2025
This PR now only sets up the test suite infrastructure without migrating any tests yet. The actual test migration will be done in follow-up PRs.

Changes:
- Keep suite definitions in test/run_suite.py
- Keep CI workflow stages in .github/workflows/pr-test.yml
- Revert test_mrope.py migration back to test/srt/
- Keep migration plan documentation
alisonshao changed the title from "[CI] Migrate rotary_embedding tests to per_commit (1/28)" to "[CI] Setup test suite infrastructure for staged migration" Nov 20, 2025
Focus on 1-GPU migration plan. Multi-GPU tests (2-GPU and 4-GPU) are commented out and will be enabled later when tests are migrated to those suites.
alisonshao (Collaborator, Author) commented:

/tag-and-rerun-ci

- Comment out stage-b-test-large-2-gpu, stage-c-test-large-2-gpu, and stage-c-test-large-4-gpu from pr-test-finish needs
- Remove corresponding suite labels from LABEL_MAPPING in test/run_suite.py
- These will be uncommented when multi-GPU tests are migrated
Add all target directories for the 28 test categories in the migration plan:
- rotary_embedding, debug_utils, cache, utils, hicache, rl
- runtime, moe, scheduler, tokenization, vision, performance
- layers/attention/mamba, openai_server/{basic,features,function_call,validation}
- sampling, quant, attention/{backends,mla}, quantization
- observability, other, lora, speculative_decoding, models

Each directory contains a .gitkeep to track empty folders.
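Git tracks files, not directories, so an empty folder disappears from a commit unless it holds a placeholder file. A minimal sketch of scaffolding the feature directories this way; create_feature_dirs is a hypothetical helper, and the feature names are taken from the list above.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def create_feature_dirs(root: Path, features: Iterable[str]) -> List[Path]:
    """Create each feature directory with a .gitkeep so Git keeps it while empty."""
    created = []
    for feature in features:
        directory = root / feature
        directory.mkdir(parents=True, exist_ok=True)
        keep = directory / ".gitkeep"
        keep.touch(exist_ok=True)
        created.append(keep)
    return created


with tempfile.TemporaryDirectory() as tmp:
    keeps = create_feature_dirs(Path(tmp), ["lora", "attention/mla", "openai_server/basic"])
    exists = [k.is_file() for k in keeps]
    names = [k.relative_to(tmp).as_posix() for k in keeps]

print(names)  # ['lora/.gitkeep', 'attention/mla/.gitkeep', 'openai_server/basic/.gitkeep']
```

Once a real test lands in a directory, its .gitkeep can be deleted.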
alisonshao (Collaborator, Author) commented Nov 20, 2025

Complete Test Migration Plan

Overview

This document tracks the migration of ALL 240 tests to test/registered/ with the new CI registry system.

Test Sources:

Source Count Status
test/srt/ 208 To migrate
test/nightly/ 28 To migrate
test/registered/ 4 Already migrated
Total 240

Complete Test List by Feature

1. vision/ (5 tests)

Test Source Est Time Suite Notes
test_vision_chunked_prefill.py srt 170s per-commit-1-gpu
test_vision_openai_server_a.py srt 900s per-commit-1-gpu
test_vision_openai_server_common.py srt - - Common utils
test_vlm_input_format.py srt 300s per-commit-1-gpu
test_vlms_perf.py nightly - nightly Nightly perf

2. attention/backends/ (7 tests)

Test Source Est Time Suite
test_fa3.py srt 420s per-commit-1-gpu
test_flash_attention_4.py srt 300s per-commit-4-gpu-b200
test_hybrid_attn_backend.py srt 379s per-commit-1-gpu
test_torch_native_attention_backend.py srt 123s per-commit-1-gpu, AMD
test_triton_attention_backend.py srt 150s per-commit-1-gpu, AMD
test_triton_attention_kernels.py srt 4s per-commit-1-gpu
test_wave_attention_kernels.py srt 2s per-commit-amd

3. attention/mla/ (6 tests)

Test Source Est Time Suite
test_flashmla.py srt 230s per-commit-1-gpu
test_mla.py srt 180s per-commit-1-gpu
test_mla_deepseek_v3.py srt 500s per-commit-1-gpu
test_mla_flashinfer.py srt 302s per-commit-1-gpu
test_mla_fp8.py srt 93s per-commit-1-gpu
test_mla_int8_deepseek_v3.py srt 300s per-commit-1-gpu

4. attention/mamba/ (5 tests)

Test Source Est Time Suite
test_causal_conv1d.py srt/layers 25s per-commit-1-gpu
test_mamba_ssm.py srt/layers 50s per-commit-1-gpu
test_mamba_ssm_ssd.py srt/layers 20s per-commit-1-gpu
test_mamba_unittest.py srt 4s per-commit-1-gpu
test_mamba2_mixer.py srt/layers 50s per-commit-2-gpu

5. attention/nsa/ (1 test)

Test Source Est Time Suite Nightly
test_nsa_indexer.py nightly 2s nightly-1-gpu

6. attention/flashinfer_trtllm/ (2 tests)

Test Source Est Time Suite Nightly
test_flashinfer_trtllm_gen_attn_backend.py nightly 300s nightly-4-gpu-b200
test_flashinfer_trtllm_gen_moe_backend.py nightly 300s nightly-4-gpu-b200

7. cache/ (10 tests)

Test Source Est Time Suite Nightly
test_page_size.py srt 60s per-commit-1-gpu
test_radix_attention.py srt 105s per-commit-1-gpu
test_radix_cache_unit.py srt 5s per-commit-1-gpu
test_cpp_radix_cache.py nightly 60s nightly-1-gpu
test_hicache_storage.py srt/hicache 127s per-commit-1-gpu
test_hicache_variants.py srt/hicache 393s per-commit-1-gpu
test_hicache_storage_3fs_backend.py srt/hicache 200s per-commit-2-gpu
test_hicache_storage_file_backend.py srt/hicache 200s per-commit-2-gpu
test_hicache_storage_mooncake_backend.py srt/hicache 300s per-commit-2-gpu

8. models/ (22 tests)

Test Source Est Time Suite Notes
test_compressed_tensors_models.py srt/models 42s per-commit-1-gpu
test_cross_encoder_models.py srt/models 100s per-commit-1-gpu
test_dummy_grok_models.py srt/models - - Manual test
test_embedding_models.py srt/models 73s per-commit-1-gpu
test_encoder_embedding_models.py srt/models 460s per-commit-1-gpu
test_generation_models.py srt/models 103s per-commit-1-gpu
test_glm4_moe_models.py srt/models 100s per-commit-2-gpu
test_kimi_k2_models.py srt/models 200s per-commit-8-gpu-h200
test_kimi_linear_models.py srt/models 90s per-commit-2-gpu
test_nvidia_nemotron_nano_v2.py srt/models 160s per-commit-1-gpu
test_qwen_models.py srt/models 150s per-commit-1-gpu
test_qwen3_next_models.py srt/models 291s per-commit-4-gpu
test_reward_models.py srt/models 132s per-commit-1-gpu
test_transformers_models.py srt/models 320s per-commit-1-gpu
test_vlm_models.py srt/models 741s per-commit-1-gpu
test_external_models.py srt 155s per-commit-1-gpu
test_gpt_oss_1gpu.py srt 750s per-commit-1-gpu
test_gpt_oss_4gpu.py srt 300s per-commit-4-gpu
test_gpt_oss_common.py srt - - Common utils
test_model_hooks.py srt 1s per-commit-1-gpu
test_encoder_dp.py nightly 500s nightly-4-gpu Nightly
test_qwen3_next_deterministic.py nightly 200s nightly-4-gpu Nightly

9. lora/ (12 tests)

Test Source Est Time Suite Nightly
test_lora.py srt/lora 150s per-commit-1-gpu
test_lora_backend.py srt/lora 99s per-commit-1-gpu
test_lora_eviction.py srt/lora 240s per-commit-1-gpu
test_lora_spec_decoding.py srt/lora 150s per-commit-1-gpu
test_lora_update.py srt/lora 600s per-commit-1-gpu
test_multi_lora_backend.py srt/lora 60s per-commit-1-gpu
test_lora_tp.py srt/lora 116s per-commit-2-gpu
test_lora_qwen3.py nightly 97s nightly-1-gpu
test_lora_radix_cache.py nightly 200s nightly-1-gpu
test_lora_eviction_policy.py nightly 200s nightly-1-gpu
test_lora_openai_api.py nightly 30s nightly-1-gpu
test_lora_openai_compatible.py nightly 150s nightly-1-gpu

10. openai_server/ (20 tests)

basic/ (6)

Test Source Est Time Suite
test_openai_embedding.py srt 79s per-commit-1-gpu
test_openai_server.py srt 270s per-commit-1-gpu
test_protocol.py srt 10s per-commit-1-gpu
test_serving_chat.py srt 10s per-commit-1-gpu
test_serving_completions.py srt 10s per-commit-1-gpu
test_serving_embedding.py srt 10s per-commit-1-gpu

features/ (5)

Test Source Est Time Suite
test_enable_thinking.py srt 70s per-commit-1-gpu
test_json_mode.py srt 120s per-commit-1-gpu
test_openai_server_ebnf.py srt 20s per-commit-1-gpu
test_openai_server_hidden_states.py srt 240s per-commit-1-gpu
test_reasoning_content.py srt 89s per-commit-1-gpu

function_call/ (5)

Test Source Est Time Suite
test_openai_function_calling.py srt 60s per-commit-1-gpu
test_tool_choice.py srt 120s per-commit-1-gpu
test_function_call_parser.py registered - CPU
test_json_schema_constraint.py registered - CPU
test_unknown_tool_name.py registered 1s CPU

validation/ (4)

Test Source Est Time Suite
test_large_max_new_tokens.py srt 41s per-commit-1-gpu
test_matched_stop.py srt 60s per-commit-1-gpu
test_openai_server_ignore_eos.py srt 85s per-commit-1-gpu
test_request_length_validation.py srt 31s per-commit-1-gpu

11. speculative/ (9 tests)

Test Source Est Time Suite
test_build_eagle_tree.py srt 8s per-commit-1-gpu
test_eagle_infer_a.py srt 750s per-commit-1-gpu
test_eagle_infer_b.py srt 750s per-commit-1-gpu
test_eagle_infer_beta.py srt 90s per-commit-1-gpu
test_ngram_speculative_decoding.py srt 290s per-commit-1-gpu
test_speculative_registry.py srt 1s per-commit-1-gpu
test_standalone_speculative_decoding.py srt 150s per-commit-1-gpu
test_eagle_dp_attention.py srt 200s per-commit-2-gpu

12. quant/ (12 tests)

Test Source Est Time Suite
test_autoround.py srt/quant 60s per-commit-1-gpu
test_awq_dequant.py srt/quant 2s per-commit-amd
test_awq.py srt/quant 163s quantization_test
test_block_int8.py srt/quant 22s per-commit-1-gpu
test_fp8_kernel.py srt/quant 8s per-commit-1-gpu
test_fused_rms_fp8_group_quant.py srt/quant 10s per-commit-amd
test_int8_kernel.py srt/quant 8s per-commit-1-gpu
test_triton_scaled_mm.py srt/quant 8s per-commit-1-gpu
test_w4a8_deepseek_v3.py srt/quant 520s per-commit-8-gpu-h20
test_w8a8_quantization.py srt/quant 160s per-commit-1-gpu

13. quantization/ (10 tests)

Test Source Est Time Suite
test_eval_fp8_accuracy.py srt 303s per-commit-1-gpu
test_fp8_utils.py srt 5s per-commit-1-gpu
test_modelopt_export.py srt 30s per-commit-1-gpu
test_modelopt_loader.py srt 30s per-commit-1-gpu
test_torchao.py srt 70s per-commit-1-gpu
test_bnb.py srt 5s quantization_test
test_gptqmodel_dynamic.py srt 102s quantization_test
test_quantization.py srt 185s quantization_test
test_gguf.py srt 96s quantization_test

14. sampling/ (7 tests)

Test Source Est Time Suite
test_constrained_decoding.py srt 150s per-commit-1-gpu
test_harmony_parser.py srt 20s per-commit-1-gpu
test_jinja_template_utils.py srt 1s per-commit-1-gpu
test_penalty.py srt 82s per-commit-1-gpu
test_pytorch_sampling_backend.py srt 66s per-commit-1-gpu
test_reasoning_parser.py srt 5s per-commit-1-gpu

15. scheduler/ (7 tests)

Test Source Est Time Suite
test_no_overlap_scheduler.py srt 234s per-commit-1-gpu
test_priority_scheduling.py srt 130s per-commit-1-gpu
test_request_queue_validation.py srt 30s per-commit-1-gpu
test_retract_decode.py srt 450s per-commit-1-gpu
test_local_attn.py srt 411s per-commit-4-gpu
test_type_based_dispatcher.py srt 10s per-commit-amd
test_bench_typebaseddispatcher.py srt 10s per-commit-amd

16. runtime/ (14 tests)

Test Source Est Time Suite Nightly
test_abort.py srt 190s per-commit-1-gpu
test_deterministic.py srt 400s per-commit-1-gpu
test_srt_endpoint.py srt 130s per-commit-1-gpu
test_srt_engine.py srt 450s per-commit-1-gpu
test_server_args.py srt 1s per-commit-1-gpu
test_srt_backend.py registered 80s stage-a-test-1
test_data_parallelism.py srt 73s per-commit-2-gpu
test_dp_attention.py srt 350s per-commit-2-gpu
test_load_weights_from_remote_instance.py srt 72s per-commit-2-gpu
test_patch_torch.py srt 19s per-commit-2-gpu
test_release_memory_occupation.py srt 200s per-commit-2-gpu
test_multi_instance_release_memory_occupation.py srt 64s per-commit-4-gpu
test_pp_single_node.py srt 481s per-commit-4-gpu
test_batch_invariant_ops.py nightly 10s nightly-1-gpu

17. rl/ (4 tests)

Test Source Est Time Suite
test_fp32_lm_head.py srt/rl 30s per-commit-1-gpu
test_update_weights_from_disk.py srt/rl 210s per-commit-1-gpu
test_update_weights_from_tensor.py srt/rl 80s per-commit-1-gpu
test_update_weights_from_distributed.py srt/rl 103s per-commit-2-gpu

18. observability/ (6 tests)

Test Source Est Time Suite
test_hidden_states.py srt 55s per-commit-1-gpu
test_metrics.py srt 32s per-commit-1-gpu
test_metrics_utils.py srt 1s per-commit-1-gpu
test_profile_merger.py srt 60s per-commit-1-gpu
test_profile_merger_http_api.py srt 15s per-commit-1-gpu
test_start_profile.py srt 180s per-commit-1-gpu

19. ep/ (4 tests)

Test Source Est Time Suite
test_moe_ep.py srt/ep 140s per-commit-2-gpu
test_deepep_small.py srt/ep 531s per-commit-4-gpu-deepep
test_mooncake_ep_small.py srt/ep 450s per-commit-4-gpu-deepep
test_deepep_large.py srt/ep 338s per-commit-8-gpu-h200-deepep

20. moe/ (7 tests)

Test Source Est Time Suite Nightly
test_fused_moe.py srt 80s per-commit-1-gpu
test_torch_compile_moe.py srt 210s per-commit-1-gpu
test_triton_moe_channel_fp8_kernel.py srt 25s per-commit-1-gpu
test_triton_fused_moe.py srt 80s per-commit-1-gpu
test_cutedsl_moe.py srt 300s per-commit-4-gpu-gb200
test_deepseek_v3_fp4_cutlass_moe.py nightly 900s nightly-4-gpu-b200
test_fp4_moe.py nightly 300s nightly-4-gpu-b200

21. disaggregation/ (5 tests)

Test Source Est Time Suite
test_disaggregation_basic.py srt 400s per-commit-2-gpu
test_disaggregation_different_tp.py srt 600s per-commit-8-gpu-h20
test_disaggregation_pp.py srt 140s per-commit-8-gpu-h20
test_disaggregation_dp_attention.py srt 155s per-commit-8-gpu-h20
test_disaggregation_hybrid_attention.py srt 200s per-commit-8-gpu-h200

22. deepseek/ (12 tests)

Test Source Est Time Suite Nightly
test_deepseek_v3_basic.py srt 275s per-commit-8-gpu-h200
test_deepseek_v3_mtp.py srt 275s per-commit-8-gpu-h200
test_deepseek_v32_basic.py srt 275s per-commit-8-gpu-h200
test_deepseek_v32_mtp.py srt 275s per-commit-8-gpu-h200
test_deepseek_v3_fp4_4gpu.py srt 1800s per-commit-4-gpu-b200
test_deepseek_v3_cutedsl_4gpu.py srt 590s per-commit-4-gpu-gb200
test_llama31_fp4.py srt 300s per-commit-4-gpu-b200
test_deepseek_v3_deterministic.py nightly 240s nightly-1-gpu
test_deepseek_v32_nsabackend.py nightly 600s nightly-8-gpu-h200
test_deepseek_r1_fp8_trtllm_backend.py nightly 3600s nightly-8-gpu-b200
test_deepseek_v31_perf.py nightly - nightly
test_deepseek_v32_perf.py nightly - nightly

23. tokenization/ (3 tests)

Test Source Est Time Suite
test_input_embeddings.py srt 38s per-commit-1-gpu
test_multi_tokenizer.py srt 230s per-commit-1-gpu
test_skip_tokenizer_init.py srt 117s per-commit-1-gpu

24. utils/ (5 tests)

Test Source Est Time Suite
test_io_struct.py srt 8s per-commit-1-gpu
test_utils_update_weights.py srt 48s per-commit-1-gpu
test_create_kvindices.py srt 2s per-commit-1-gpu
test_original_logprobs.py srt 41s per-commit-1-gpu
test_score_api.py srt 310s per-commit-1-gpu

25. debug_utils/ (1 test)

Test Source Est Time Suite
test_tensor_dump_forward_hook.py srt/debug_utils 15s per-commit-1-gpu

26. rotary_embedding/ (2 tests)

Test Source Est Time Suite
test_mrope.py srt/rotary_embedding 10s per-commit-1-gpu
test_rope_rocm.py srt 3s per-commit-amd

27. ops/ (1 test)

Test Source Est Time Suite
test_repeat_interleave.py srt/ops 60s per-commit-1-gpu

28. cuda_graph/ (4 tests)

Test Source Est Time Suite
test_chunked_prefill.py srt 410s per-commit-1-gpu
test_no_chunked_prefill.py srt 108s per-commit-1-gpu
test_piecewise_cuda_graph.py srt 850s per-commit-1-gpu
test_torch_compile.py srt 76s per-commit-1-gpu

29. swa/ (2 tests)

Test Source Est Time Suite
test_swa_unittest.py srt 1s per-commit-1-gpu
test_triton_sliding_window.py srt 100s per-commit-1-gpu

30. eval/ (5 tests)

Test Source Est Time Suite Nightly
test_text_models_gsm8k_eval.py nightly - nightly
test_vlms_mmmu_eval.py nightly - nightly
test_gsm8k_eval_amd.py srt/nightly - nightly-amd
test_eval_accuracy_large.py srt - - Manual
test_moe_eval_accuracy_large.py srt - - Manual

31. perf/ (8 tests)

Test Source Est Time Suite Nightly
test_text_models_perf.py nightly - nightly
test_glm_4_6_perf.py nightly - nightly
test_kimi_k2_thinking_perf.py nightly - nightly
test_minimax_m2_perf.py nightly - nightly
test_qwen3_235b_perf.py nightly - nightly
test_gpt_oss_4gpu_perf.py nightly 600s nightly-4-gpu-b200
test_bench_one_batch.py srt - - Manual
test_bench_serving.py srt - - Manual

32. cpu/ (17 tests)

Test Source Est Time Suite
test_activation.py srt/cpu - per-commit-cpu
test_binding.py srt/cpu - per-commit-cpu
test_cpu_graph.py srt/cpu - per-commit-cpu
test_decode.py srt/cpu - per-commit-cpu
test_extend.py srt/cpu - per-commit-cpu
test_gemm.py srt/cpu - per-commit-cpu
test_intel_amx_attention_backend_a.py srt/cpu - per-commit-cpu
test_intel_amx_attention_backend_b.py srt/cpu - per-commit-cpu
test_intel_amx_attention_backend_c.py srt/cpu - per-commit-cpu
test_mla.py srt/cpu - per-commit-cpu
test_moe.py srt/cpu - per-commit-cpu
test_norm.py srt/cpu - per-commit-cpu
test_qkv_proj_with_rope.py srt/cpu - per-commit-cpu
test_rope.py srt/cpu - per-commit-cpu
test_shared_expert.py srt/cpu - per-commit-cpu
test_topk.py srt/cpu - per-commit-cpu

33. ascend/ (11 tests)

Test Source Est Time Suite
test_ascend_deepep.py srt/ascend 400s per-commit-16-npu-a3
test_ascend_graph_tp1_bf16.py srt/ascend 400s per-commit-1-npu-a2
test_ascend_graph_tp2_bf16.py srt/ascend 400s per-commit-2-npu-a2
test_ascend_hicache_mha.py srt/ascend 400s per-commit-1-npu-a2
test_ascend_mla_fia_w8a8int8.py srt/ascend 400s per-commit-2-npu-a2
test_ascend_mla_w8a8int8.py srt/ascend 400s per-commit-4-npu-a2
test_ascend_sampling_backend.py srt/ascend 400s per-commit-1-npu-a2
test_ascend_tp1_bf16.py srt/ascend 400s per-commit-1-npu-a2
test_ascend_tp2_bf16.py srt/ascend 400s per-commit-2-npu-a2
test_ascend_tp2_fia_bf16.py srt/ascend 400s per-commit-2-npu-a2
test_ascend_tp4_bf16.py srt/ascend 400s per-commit-4-npu-a2

34. xpu/ (1 test)

Test Source Est Time Suite
test_intel_xpu_backend.py srt/xpu - per-commit-xpu

Migration Progress Summary

Category Tests Migrated Status
vision 5 0
attention/backends 7 0
attention/mla 6 0
attention/mamba 5 0
attention/nsa 1 0
attention/flashinfer_trtllm 2 0
cache 10 0
models 22 0
lora 12 0
openai_server 20 4 🔄
speculative 9 0
quant 12 0
quantization 10 0
sampling 7 0
scheduler 7 0
runtime 14 1 🔄
rl 4 0
observability 6 0
ep 4 0
moe 7 0
disaggregation 5 0
deepseek 12 0
tokenization 3 0
utils 5 0
debug_utils 1 0
rotary_embedding 2 0
ops 1 0
cuda_graph 4 0
swa 2 0
eval 5 0
perf 8 0
cpu 17 0
ascend 11 0
xpu 1 0

Total: 240 tests (5 already migrated)


Notes

  1. This is a POC branch - do NOT merge directly
  2. Each feature should be migrated in a separate PR
  3. Tests organized by feature, not by behavior
  4. nightly=True flag determines nightly-only tests
  5. disabled="reason" for temporarily disabled tests
  6. Tests marked "Manual" are for local development/benchmarking only

Migration plan details have been added as a PR comment for reference. The README files are kept locally but removed from version control to keep this PR focused on just the infrastructure setup.
hnyls2002 mentioned this pull request Nov 24, 2025
alisonshao and others added 4 commits November 24, 2025 22:43
- Rename test/per_commit/ to test/registered/ per issue #13808
- Add nightly and disabled params to CI registry functions
- Create feature-based subdirectories in test/registered/
- Update run_suite.py to scan registered/ with --nightly flag support
- Update README.md with new test structure documentation

The new structure:
- test/manual/: unofficially maintained tests (code references)
- test/registered/: officially maintained tests (per-commit + nightly)

Tests are now organized by feature (models, lora, quant, etc.) rather
than by behavior (per-commit vs nightly). The nightly flag in the
registry determines when tests run, not the directory location.
@hnyls2002 hnyls2002 added the format Auto Format Code label Nov 25, 2025
@github-actions github-actions bot removed the format Auto Format Code label Nov 25, 2025
hnyls2002 changed the title from "[CI] Setup test suite infrastructure for staged migration" to "[POC] [CI] Setup test suite infrastructure for staged migration" Nov 25, 2025
alisonshao changed the title from "[POC] [CI] Setup test suite infrastructure for staged migration" to "[POC] CI test infrastructure: test/registered with nightly/disabled flags" Nov 25, 2025
alisonshao changed the title from "[POC] CI test infrastructure: test/registered with nightly/disabled flags" to "[POC] [CI] Setup test suite infrastructure for staged migration" Nov 25, 2025
alisonshao (Collaborator, Author) commented Nov 26, 2025

Note (#13944): when migrating lora, move lora_utils.py into test/registered/lora/ alongside the tests:

test/
└── registered/
    └── lora/
        ├── lora_utils.py       # moves here with the tests
        └── test_lora_qwen3.py

  • No PYTHONPATH change is needed if the tests are run from the same directory
  • Otherwise, add test/registered/lora/ to PYTHONPATH
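A quick way to check the first option: when a test file is executed by its path, Python inserts that file's directory at sys.path[0], so a sibling import such as `from lora_utils import ...` resolves without touching PYTHONPATH. Runners that import tests as modules from elsewhere may not get this for free, which is when the second option applies. The sketch builds a throwaway copy of the layout; the file contents are placeholders, not the real lora_utils.py.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway copy of the proposed layout: helper module next to the test.
    lora_dir = Path(tmp) / "test" / "registered" / "lora"
    lora_dir.mkdir(parents=True)
    (lora_dir / "lora_utils.py").write_text("GREETING = 'hello from lora_utils'\n")
    (lora_dir / "test_lora_qwen3.py").write_text(
        "from lora_utils import GREETING\nprint(GREETING)\n"
    )

    # PYTHONSAFEPATH (Python 3.11+) disables the sys.path[0] insertion; drop it for the demo.
    env = dict(os.environ)
    env.pop("PYTHONSAFEPATH", None)

    # Run the test by path from a different working directory (the temp root).
    result = subprocess.run(
        [sys.executable, str(lora_dir / "test_lora_qwen3.py")],
        cwd=tmp,
        env=env,
        capture_output=True,
        text=True,
    )

print(result.returncode, result.stdout.strip())
```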

alisonshao (Collaborator, Author) commented:

Test Categorization for per-commit-1-gpu Migration

Total: 120 tests

  • stage-a-cpu: CPU-only tests, no GPU required
  • stage-a-test-2: Unit tests, no model loading, < 60s
  • stage-b-test-small-1-gpu: Tests using DEFAULT_SMALL_MODEL_NAME_FOR_TEST or similar small models, 60-150s
  • stage-b-test-large-1-gpu: Tests with larger models or complex operations, 150-500s
  • stage-c-test-large-1-gpu: Very long tests, > 500s
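The time thresholds above can be read as a simple decision rule. The boundaries in this sketch are inferred from the detailed table below (55s lands in stage-a-test-2, 60s is the first stage-b-test-small-1-gpu entry, and 150s is the first stage-b-test-large-1-gpu entry). suggest_suite is hypothetical and uses time only; the actual categorization also weighed model size and whether a test loads a model at all.

```python
def suggest_suite(est_time: int, needs_gpu: bool = True) -> str:
    """Map a per-commit-1-gpu test to a stage using only its estimated time."""
    if not needs_gpu:
        return "stage-a-cpu"
    if est_time < 60:
        return "stage-a-test-2"
    if est_time < 150:
        return "stage-b-test-small-1-gpu"
    if est_time <= 500:
        return "stage-b-test-large-1-gpu"
    return "stage-c-test-large-1-gpu"


print(suggest_suite(132))  # stage-b-test-small-1-gpu
```

Tests that break the rule (for example a short test that loads a large model) should be placed by hand, so treat the output as a starting point for review.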

Summary by Suite

Suite Tests
stage-a-cpu 8
stage-a-test-2 35
stage-b-test-small-1-gpu 33
stage-b-test-large-1-gpu 36
stage-c-test-large-1-gpu 8

Detailed Test Categorization

Filename Time (s) Register Suite Category
test_jinja_template_utils.py 1 register_cpu_ci stage-a-cpu sampling
test_metrics_utils.py 1 register_cpu_ci stage-a-cpu observability
test_model_hooks.py 1 register_cpu_ci stage-a-cpu models
test_server_args.py 1 register_cpu_ci stage-a-cpu openai_server
test_speculative_registry.py 1 register_cpu_ci stage-a-cpu speculative
test_swa_unittest.py 1 register_cpu_ci stage-a-cpu other
test_create_kvindices.py 2 register_cpu_ci stage-a-cpu other
openai_server/basic/test_protocol.py 10 register_cpu_ci stage-a-cpu basic
test_mamba_unittest.py 4 register_cuda_ci stage-a-test-2 mamba
test_triton_attention_kernels.py 4 register_cuda_ci stage-a-test-2 attention_backends
test_fp8_utils.py 5 register_cuda_ci stage-a-test-2 quantization
test_radix_cache_unit.py 5 register_cuda_ci stage-a-test-2 cache
test_reasoning_parser.py 5 register_cuda_ci stage-a-test-2 sampling
quant/test_fp8_kernel.py 8 register_cuda_ci stage-a-test-2 quant
quant/test_int8_kernel.py 8 register_cuda_ci stage-a-test-2 quant
quant/test_triton_scaled_mm.py 8 register_cuda_ci stage-a-test-2 quant
test_build_eagle_tree.py 8 register_cuda_ci stage-a-test-2 speculative
test_io_struct.py 8 register_cuda_ci stage-a-test-2 utils
openai_server/basic/test_serving_chat.py 10 register_cuda_ci stage-a-test-2 basic
openai_server/basic/test_serving_completions.py 10 register_cuda_ci stage-a-test-2 basic
openai_server/basic/test_serving_embedding.py 10 register_cuda_ci stage-a-test-2 basic
rotary_embedding/test_mrope.py 10 register_cuda_ci stage-a-test-2 rotary_embedding
debug_utils/test_tensor_dump_forward_hook.py 15 register_cuda_ci stage-a-test-2 debug_utils
test_profile_merger_http_api.py 15 register_cuda_ci stage-a-test-2 observability
layers/attention/mamba/test_mamba_ssm_ssd.py 20 register_cuda_ci stage-a-test-2 mamba
openai_server/features/test_openai_server_ebnf.py 20 register_cuda_ci stage-a-test-2 features
test_harmony_parser.py 20 register_cuda_ci stage-a-test-2 sampling
quant/test_block_int8.py 22 register_cuda_ci stage-a-test-2 quant
layers/attention/mamba/test_causal_conv1d.py 25 register_cuda_ci stage-a-test-2 mamba
test_triton_moe_channel_fp8_kernel.py 25 register_cuda_ci stage-a-test-2 moe
rl/test_fp32_lm_head.py 30 register_cuda_ci stage-a-test-2 rl
test_modelopt_export.py 30 register_cuda_ci stage-a-test-2 quantization
test_modelopt_loader.py 30 register_cuda_ci stage-a-test-2 quantization
test_request_queue_validation.py 30 register_cuda_ci stage-a-test-2 scheduler
openai_server/validation/test_request_length_validation.py 31 register_cuda_ci stage-a-test-2 validation
test_metrics.py 32 register_cuda_ci stage-a-test-2 observability
test_input_embeddings.py 38 register_cuda_ci stage-a-test-2 tokenization
openai_server/validation/test_large_max_new_tokens.py 41 register_cuda_ci stage-a-test-2 validation
test_original_logprobs.py 41 register_cuda_ci stage-a-test-2 other
models/test_compressed_tensors_models.py 42 register_cuda_ci stage-a-test-2 models
test_utils_update_weights.py 48 register_cuda_ci stage-a-test-2 utils
layers/attention/mamba/test_mamba_ssm.py 50 register_cuda_ci stage-a-test-2 mamba
test_hidden_states.py 55 register_cuda_ci stage-a-test-2 observability
lora/test_multi_lora_backend.py 60 register_cuda_ci stage-b-test-small-1-gpu lora
openai_server/function_call/test_openai_function_calling.py 60 register_cuda_ci stage-b-test-small-1-gpu function_call
openai_server/validation/test_matched_stop.py 60 register_cuda_ci stage-b-test-small-1-gpu validation
quant/test_autoround.py 60 register_cuda_ci stage-b-test-small-1-gpu quant
test_page_size.py 60 register_cuda_ci stage-b-test-small-1-gpu cache
test_profile_merger.py 60 register_cuda_ci stage-b-test-small-1-gpu observability
test_pytorch_sampling_backend.py 66 register_cuda_ci stage-b-test-small-1-gpu sampling
openai_server/features/test_enable_thinking.py 70 register_cuda_ci stage-b-test-small-1-gpu features
test_torchao.py 70 register_cuda_ci stage-b-test-small-1-gpu quantization
models/test_embedding_models.py 73 register_cuda_ci stage-b-test-small-1-gpu models
test_torch_compile.py 76 register_cuda_ci stage-b-test-small-1-gpu performance
openai_server/basic/test_openai_embedding.py 79 register_cuda_ci stage-b-test-small-1-gpu basic
rl/test_update_weights_from_tensor.py 80 register_cuda_ci stage-b-test-small-1-gpu rl
test_fused_moe.py 80 register_cuda_ci stage-b-test-small-1-gpu moe
test_penalty.py 82 register_cuda_ci stage-b-test-small-1-gpu sampling
openai_server/validation/test_openai_server_ignore_eos.py 85 register_cuda_ci stage-b-test-small-1-gpu validation
openai_server/features/test_reasoning_content.py 89 register_cuda_ci stage-b-test-small-1-gpu features
test_eagle_infer_beta.py 90 register_cuda_ci stage-b-test-small-1-gpu speculative
test_mla_fp8.py 93 register_cuda_ci stage-b-test-small-1-gpu attention_mla
lora/test_lora_backend.py 99 register_cuda_ci stage-b-test-small-1-gpu lora
models/test_cross_encoder_models.py 100 register_cuda_ci stage-b-test-small-1-gpu models
test_triton_sliding_window.py 100 register_cuda_ci stage-b-test-small-1-gpu other
models/test_generation_models.py 103 register_cuda_ci stage-b-test-small-1-gpu models
test_radix_attention.py 105 register_cuda_ci stage-b-test-small-1-gpu cache
test_no_chunked_prefill.py 108 register_cuda_ci stage-b-test-small-1-gpu performance
test_skip_tokenizer_init.py 117 register_cuda_ci stage-b-test-small-1-gpu tokenization
openai_server/features/test_json_mode.py 120 register_cuda_ci stage-b-test-small-1-gpu features
openai_server/function_call/test_tool_choice.py 120 register_cuda_ci stage-b-test-small-1-gpu function_call
test_torch_native_attention_backend.py 123 register_cuda_ci stage-b-test-small-1-gpu attention_backends
hicache/test_hicache_storage.py 127 register_cuda_ci stage-b-test-small-1-gpu hicache
test_priority_scheduling.py 130 register_cuda_ci stage-b-test-small-1-gpu scheduler
test_srt_endpoint.py 130 register_cuda_ci stage-b-test-small-1-gpu openai_server
models/test_reward_models.py 132 register_cuda_ci stage-b-test-small-1-gpu models
lora/test_lora.py 150 register_cuda_ci stage-b-test-large-1-gpu lora
lora/test_lora_spec_decoding.py 150 register_cuda_ci stage-b-test-large-1-gpu lora
models/test_qwen_models.py 150 register_cuda_ci stage-b-test-large-1-gpu models
test_constrained_decoding.py 150 register_cuda_ci stage-b-test-large-1-gpu sampling
test_standalone_speculative_decoding.py 150 register_cuda_ci stage-b-test-large-1-gpu speculative
test_triton_attention_backend.py 150 register_cuda_ci stage-b-test-large-1-gpu attention_backends
test_external_models.py 155 register_cuda_ci stage-b-test-large-1-gpu models
models/test_nvidia_nemotron_nano_v2.py 160 register_cuda_ci stage-b-test-large-1-gpu models
quant/test_w8a8_quantization.py 160 register_cuda_ci stage-b-test-large-1-gpu quant
test_vision_chunked_prefill.py 170 register_cuda_ci stage-b-test-large-1-gpu vision
test_mla.py 180 register_cuda_ci stage-b-test-large-1-gpu attention_mla
test_start_profile.py 180 register_cuda_ci stage-b-test-large-1-gpu observability
test_abort.py 190 register_cuda_ci stage-b-test-large-1-gpu runtime
rl/test_update_weights_from_disk.py 210 register_cuda_ci stage-b-test-large-1-gpu rl
test_torch_compile_moe.py 210 register_cuda_ci stage-b-test-large-1-gpu moe
test_flashmla.py 230 register_cuda_ci stage-b-test-large-1-gpu attention_mla
test_multi_tokenizer.py 230 register_cuda_ci stage-b-test-large-1-gpu tokenization
test_no_overlap_scheduler.py 234 register_cuda_ci stage-b-test-large-1-gpu scheduler
lora/test_lora_eviction.py 240 register_cuda_ci stage-b-test-large-1-gpu lora
openai_server/features/test_openai_server_hidden_states.py 240 register_cuda_ci stage-b-test-large-1-gpu features
openai_server/basic/test_openai_server.py 270 register_cuda_ci stage-b-test-large-1-gpu basic
test_ngram_speculative_decoding.py 290 register_cuda_ci stage-b-test-large-1-gpu speculative
test_mla_int8_deepseek_v3.py 300 register_cuda_ci stage-b-test-large-1-gpu attention_mla
test_vlm_input_format.py 300 register_cuda_ci stage-b-test-large-1-gpu vision
test_mla_flashinfer.py 302 register_cuda_ci stage-b-test-large-1-gpu attention_mla
test_eval_fp8_accuracy.py 303 register_cuda_ci stage-b-test-large-1-gpu quantization
test_score_api.py 310 register_cuda_ci stage-b-test-large-1-gpu other
models/test_transformers_models.py 320 register_cuda_ci stage-b-test-large-1-gpu models
test_hybrid_attn_backend.py 379 register_cuda_ci stage-b-test-large-1-gpu attention_backends
hicache/test_hicache_variants.py 393 register_cuda_ci stage-b-test-large-1-gpu hicache
test_deterministic.py 400 register_cuda_ci stage-b-test-large-1-gpu runtime
test_chunked_prefill.py 410 register_cuda_ci stage-b-test-large-1-gpu performance
test_fa3.py 420 register_cuda_ci stage-b-test-large-1-gpu attention_backends
test_retract_decode.py 450 register_cuda_ci stage-b-test-large-1-gpu runtime
test_srt_engine.py 450 register_cuda_ci stage-b-test-large-1-gpu other
models/test_encoder_embedding_models.py 460 register_cuda_ci stage-b-test-large-1-gpu models
test_mla_deepseek_v3.py 500 register_cuda_ci stage-c-test-large-1-gpu attention_mla
lora/test_lora_update.py 600 register_cuda_ci stage-c-test-large-1-gpu lora
models/test_vlm_models.py 741 register_cuda_ci stage-c-test-large-1-gpu models
test_eagle_infer_a.py 750 register_cuda_ci stage-c-test-large-1-gpu speculative
test_eagle_infer_b.py 750 register_cuda_ci stage-c-test-large-1-gpu speculative
test_gpt_oss_1gpu.py 750 register_cuda_ci stage-c-test-large-1-gpu models
test_piecewise_cuda_graph.py 750 register_cuda_ci stage-c-test-large-1-gpu performance
test_vision_openai_server_a.py 900 register_cuda_ci stage-c-test-large-1-gpu vision
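
The est_time column is what lets a runner balance the tests of one suite across several CI jobs. The PR describes auto-partitioning in run_suite.py but does not show the algorithm. The sketch below therefore assumes a greedy longest-processing-time heuristic, which may differ from the real implementation. It parses rows in the whitespace-separated format above, totals est_time per suite, and splits one suite into N partitions. The helper names parse_rows, suite_totals and partition are hypothetical.

```python
import heapq
from collections import defaultdict

# A few rows copied from the table above: file, est_time, register function, suite, category.
ROWS = """\
test_mla.py 180 register_cuda_ci stage-b-test-large-1-gpu attention_mla
test_start_profile.py 180 register_cuda_ci stage-b-test-large-1-gpu observability
test_abort.py 190 register_cuda_ci stage-b-test-large-1-gpu runtime
test_deterministic.py 400 register_cuda_ci stage-b-test-large-1-gpu runtime
test_chunked_prefill.py 410 register_cuda_ci stage-b-test-large-1-gpu performance
test_fa3.py 420 register_cuda_ci stage-b-test-large-1-gpu attention_backends
"""


def parse_rows(text):
    """Parse whitespace-separated table rows into (file, est_time, suite) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        file, est_time, _register_fn, suite, _category = line.split()
        rows.append((file, int(est_time), suite))
    return rows


def suite_totals(rows):
    """Sum est_time per suite (hypothetical helper)."""
    totals = defaultdict(int)
    for _file, est_time, suite in rows:
        totals[suite] += est_time
    return dict(totals)


def partition(tests, num_partitions):
    """Greedy LPT (assumed heuristic): give the longest remaining test to the least-loaded partition."""
    heap = [(0, i) for i in range(num_partitions)]  # (accumulated est_time, partition index)
    buckets = [[] for _ in range(num_partitions)]
    for file, est_time in sorted(tests, key=lambda t: t[1], reverse=True):
        load, idx = heapq.heappop(heap)
        buckets[idx].append(file)
        heapq.heappush(heap, (load + est_time, idx))
    # Report the final load of each partition in partition-index order.
    loads = [load for load, _ in sorted(heap, key=lambda x: x[1])]
    return buckets, loads
```

For example, splitting the six stage-b-test-large-1-gpu rows above into two partitions yields loads of 970 and 810. LPT is a simple choice that usually keeps partitions close in wall-clock time without needing an exact solver.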

Register Import Lines

For CPU tests:

    from sglang.test.ci.ci_register import register_cpu_ci

    register_cpu_ci(est_time=<TIME>, suite="stage-a-cpu")

For CUDA tests:

    from sglang.test.ci.ci_register import register_cuda_ci

    register_cuda_ci(est_time=<TIME>, suite="<SUITE>")
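
To illustrate how a runner might treat the nightly and disabled arguments described for the registry, here is a self-contained sketch. It does not import sglang. Registration and select_tests are hypothetical stand-ins, not the real ci_register or run_suite.py API. The filtering rules are an assumption based on the PR description: disabled tests are skipped with a warning, and nightly tests run only when nightly mode is requested.

```python
import warnings
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Registration:
    """Hypothetical stand-in for what one register_*_ci(...) call records."""
    filename: str
    est_time: int
    suite: str
    nightly: bool = False
    disabled: Optional[str] = None  # reason string; None means the test is enabled


def select_tests(regs: List[Registration], suite: str,
                 include_nightly: bool = False) -> List[Registration]:
    """Return the registrations that should run for `suite` (assumed rules)."""
    selected = []
    for reg in regs:
        if reg.suite != suite:
            continue
        if reg.disabled is not None:
            # Keep the metadata, but skip the test and surface the reason.
            warnings.warn(f"Skipping {reg.filename}: disabled ({reg.disabled})")
            continue
        if reg.nightly and not include_nightly:
            continue
        selected.append(reg)
    return selected


# Hypothetical registrations mirroring the per-commit, nightly and disabled examples.
EXAMPLE_REGS = [
    Registration("test_a.py", 80, "stage-a-test-1"),
    Registration("test_b.py", 80, "stage-a-test-1", disabled="flaky - see #12345"),
    Registration("test_c.py", 200, "nightly-1-gpu", nightly=True),
]
```

With these example registrations, selecting stage-a-test-1 returns only test_a.py (test_b.py is skipped with a warning), and test_c.py is returned only when include_nightly=True.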

@alisonshao (Collaborator, Author)

Update scripts/ci/slash_command_handler.py once the new stages are added.

cd sglang/test/srt

# Run a single file
python3 test_srt_endpoint.py
@SoluMilken (Contributor), Feb 27, 2026

Hi @alisonshao ,
There is no test_srt_endpoint.py under the folder sglang/test/srt.
Could you update this part of the document? Thank you.

@SoluMilken (Contributor)

Hi @alisonshao ,

I was wondering if there’s any plan for this PR #13653 to be merged. The unittest README updates are really helpful for contributors, so if this PR won’t be merged, perhaps a separate PR just for the README could be considered.

I previously opened my own PR #18919, but it seems the changes here could fully cover mine.

Thanks!


Labels: documentation, run-ci
