sgl-project · SoluMilken · Feb 17, 2026
diff --git a/test/README.md b/test/README.md
@@ -4,33 +4,34 @@ SGLang uses the built-in library [unittest](https://docs.python.org/3/library/un
 
 ## Test Backend Runtime
 ```bash
-cd sglang/test/srt
-
 # Run a single file
-python3 test_srt_endpoint.py
+> cd test/registered
+> python3 core/test_srt_endpoint.py
 
 # Run a single test
-python3 test_srt_endpoint.py TestSRTEndpoint.test_simple_decode
+> cd test/registered
+> python3 core/test_srt_endpoint.py TestSRTEndpoint.test_simple_decode
 
 # Run a suite with multiple files
-python3 run_suite.py --suite per-commit
+> cd test
+> python run_suite.py --hw cuda --suite stage-b-test-small-1-gpu
 ```
 
 ## Test Frontend Language
 ```bash
-cd sglang/test/lang
+> cd test/manual/lang_frontend
 
 # Run a single file
-python3 test_choices.py
+> python3 test_choices.py
 ```
 
 ## Adding or Updating Tests in CI
 
-- Create new test files under `test/srt` or `test/lang` depending on the type of test.
-- For nightly tests, place them in `test/srt/nightly/`. Use the `NightlyBenchmarkRunner` helper class in `nightly_utils.py` for performance benchmarking tests.
-- Ensure they are referenced in the respective `run_suite.py` (e.g., `test/srt/run_suite.py`) so they are picked up in CI. For most small test cases, they can be added to the `per-commit-1-gpu` suite. Sort the test cases alphabetically by name.
-- Ensure you added `unittest.main()` for unittest and `sys.exit(pytest.main([__file__]))` for pytest in the scripts. The CI run them via `python3 test_file.py`.
-- The CI will run some suites such as `per-commit-1-gpu`, `per-commit-2-gpu`, and `nightly-1-gpu` automatically. If you need special setup or custom test groups, you may modify the workflows in [`.github/workflows/`](https://github.com/sgl-project/sglang/tree/main/.github/workflows).
+- Create new test files under `test/registered/` (organized by category) for CI tests, or `test/manual/` for manual tests.
+- For nightly tests, use the CI registry with `nightly=True`. For performance benchmarking tests, use the `NightlyBenchmarkRunner` helper class in `python/sglang/test/nightly_utils.py`.
+- Register tests using the CI registry system (see below). For most small test cases, use the `stage-b-test-small-1-gpu` suite. Sort the test cases alphabetically by name.
+- Ensure each script calls `unittest.main()` (for unittest) or `sys.exit(pytest.main([__file__]))` (for pytest) at the end, because the CI runs each file directly via `python3 test_file.py`.
+- The CI automatically runs suites such as `stage-b-test-small-1-gpu`, `stage-b-test-large-2-gpu`, and `nightly-1-gpu`. If you need special setup or custom test groups, you may modify the workflows in [`.github/workflows/`](https://github.com/sgl-project/sglang/tree/main/.github/workflows).
 
 ## CI Registry System
 
@@ -60,7 +61,7 @@ register_cuda_ci(est_time=200, suite="nightly-1-gpu", nightly=True)
 
 # Multi-backend test
 register_cuda_ci(est_time=80, suite="stage-b-test-small-1-gpu")
-register_amd_ci(est_time=120, suite="stage-a-test-1")
+register_amd_ci(est_time=120, suite="stage-b-test-small-1-gpu-amd")
 
 # Temporarily disabled test
 register_cuda_ci(est_time=80, suite="stage-b-test-small-1-gpu", disabled="flaky - see #12345")
@@ -98,16 +99,24 @@ If a test cannot run on 5090 due to any of the above, use `stage-b-test-large-1-
 ### Available Suites
 
 **Per-Commit (CUDA)**:
-- Stage A: `stage-a-test-1` (locked), `stage-a-test-2`, `stage-a-test-cpu`
+- Stage A: `stage-a-test-1` (locked), `stage-a-cpu-only`
 - Stage B: `stage-b-test-small-1-gpu` (5090), `stage-b-test-large-1-gpu` (H100), `stage-b-test-large-2-gpu`
 - Stage C (4-GPU): `stage-c-test-4-gpu-h100`, `stage-c-test-4-gpu-b200`, `stage-c-test-4-gpu-gb200`, `stage-c-test-deepep-4-gpu`
 - Stage C (8-GPU): `stage-c-test-8-gpu-h20`, `stage-c-test-8-gpu-h200`, `stage-c-test-8-gpu-b200`, `stage-c-test-deepep-8-gpu-h200`
 
 **Per-Commit (AMD)**:
-- `stage-a-test-1`, `stage-b-test-small-1-gpu-amd`, `stage-b-test-large-2-gpu-amd`
+- `stage-a-test-1-amd`, `stage-b-test-small-1-gpu-amd`, `stage-b-test-large-1-gpu-amd`, `stage-b-test-large-2-gpu-amd`
+
+**Per-Commit (NPU)**:
+- `stage-a-test-1`, `stage-b-test-1-npu-a2`, `stage-b-test-2-npu-a2`, `stage-b-test-4-npu-a3`, `stage-b-test-16-npu-a3`
 
-**Nightly**:
+**Nightly (CUDA)**:
 - `nightly-1-gpu`, `nightly-2-gpu`, `nightly-4-gpu`, `nightly-8-gpu`, etc.
+- Eval: `nightly-eval-text-2-gpu`, `nightly-eval-vlm-2-gpu`
+- Perf: `nightly-perf-text-2-gpu`, `nightly-perf-vlm-2-gpu`
+
+**Nightly (AMD)**:
+- `nightly-amd`, `nightly-amd-1-gpu`, `nightly-amd-8-gpu`, `nightly-amd-vlm`
 
 ### Running Tests with run_suite.py
 
@@ -125,17 +134,16 @@ python test/run_suite.py --hw cuda --suite stage-b-test-small-1-gpu \
 
 ## Writing Elegant Test Cases
 
-- Learn from existing examples in [sglang/test/srt](https://github.com/sgl-project/sglang/tree/main/test/srt).
+- Learn from existing examples in [sglang/test/registered](https://github.com/sgl-project/sglang/tree/main/test/registered).
 - Reduce the test time by using smaller models and reusing the server for multiple test cases. Launching a server takes a lot of time.
 - Use as few GPUs as possible. Do not run long tests with 8-gpu runners.
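-- If the test cases take too long, considering adding them to nightly tests instead of per-commit tests.
+- If the test cases take too long, consider adding them to nightly tests instead of per-commit tests.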
 - Keep each test function focused on a single scenario or piece of functionality.
 - Give tests descriptive names reflecting their purpose.
 - Use robust assertions (e.g., assert, unittest methods) to validate outcomes.
 - Clean up resources to avoid side effects and preserve test independence.
-- Reduce the test time by using smaller models and reusing the server for multiple test cases.
 
 
 ## Adding New Models to Nightly CI
-- **For text models**: extend [global model lists variables](https://github.com/sgl-project/sglang/blob/85c1f7937781199203b38bb46325a2840f353a04/python/sglang/test/test_utils.py#L104) in `test_utils.py`, or add more model lists
-- **For vlms**: extend the `MODEL_THRESHOLDS` global dictionary in `test/srt/nightly/test_vlms_mmmu_eval.py`
+- **For text models**: extend the `DEFAULT_MODEL_NAME_FOR_NIGHTLY_EVAL_*` variables in `python/sglang/test/test_utils.py`, or add new model constants.
+- **For VLMs**: extend the `MODEL_THRESHOLDS` dictionary in `test/registered/eval/test_vlms_mmmu_eval.py`.
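
The README guidance on reusing one server across test cases, cleaning up resources, and calling `unittest.main()` so the CI can run a file with `python3 test_file.py` can be sketched as a single test file. This is a minimal sketch under stated assumptions: a tiny `http.server` echo server stands in for a real SGLang server launch (which is far slower), the handler and class names are hypothetical, and `register_cuda_ci(...)` appears only in a comment because its import path is not shown in this README.

```python
import json
import threading
import unittest
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# A real CI test would also register itself, e.g.
#   register_cuda_ci(est_time=80, suite="stage-b-test-small-1-gpu")
# (import omitted here; see the CI Registry System section).

# Opener that ignores any HTTP proxy settings when talking to localhost.
_NO_PROXY_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class _EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: replies with the requested path as JSON."""

    def do_GET(self):
        body = json.dumps({"path": self.path}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


class TestSharedServer(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Launch the (expensive) server once and share it across all tests.
        cls.server = ThreadingHTTPServer(("127.0.0.1", 0), _EchoHandler)
        cls.base_url = f"http://127.0.0.1:{cls.server.server_address[1]}"
        cls.thread = threading.Thread(target=cls.server.serve_forever, daemon=True)
        cls.thread.start()

    @classmethod
    def tearDownClass(cls):
        # Stop the server and release its socket so later tests start clean.
        cls.server.shutdown()
        cls.server.server_close()
        cls.thread.join()

    def _get(self, path):
        with _NO_PROXY_OPENER.open(self.base_url + path, timeout=5) as resp:
            return resp.status, json.loads(resp.read())

    def test_health(self):
        status, payload = self._get("/health")
        self.assertEqual(status, 200)
        self.assertEqual(payload["path"], "/health")

    def test_generate(self):
        status, payload = self._get("/generate")
        self.assertEqual(status, 200)
        self.assertEqual(payload["path"], "/generate")


if __name__ == "__main__":
    unittest.main()
```

Saved under an illustrative name such as `test_shared_server.py`, the file runs with `python3 test_shared_server.py`, and both tests reuse the single server started in `setUpClass` instead of paying the launch cost per test.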