vllm-project · wtomin · Mar 23, 2026 · Mar 12, 2026 · Mar 23, 2026 · fhfuih
@@ -111,6 +111,48 @@ steps:
                   path: /mnt/hf-cache
                   type: DirectoryOrCreate
 
+  - label: ":full_moon: Documentation Example Code Test with H100"
+    timeout_in_minutes: 60
+    depends_on: upload-nightly-pipeline
+    if: build.env("NIGHTLY") == "1"
+    commands:
+      - export VLLM_WORKER_MULTIPROC_METHOD=spawn
+      - export VLLM_TEST_CLEAN_GPU_MEMORY="1"
+      - pytest -s -v tests/examples/online_serving/test_text_to_image.py tests/examples/offline_inference/test_text_to_image.py -m "advanced_model and example and H100" --run-level "advanced_model"
+    agents:
+      queue: "mithril-h100-pool"
+    plugins:
+      - kubernetes:
+          podSpec:
+            containers:
+              - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
+                resources:
+                  limits:
+                    nvidia.com/gpu: 2
+                volumeMounts:
+                  - name: devshm
+                    mountPath: /dev/shm
+                  - name: hf-cache
+                    mountPath: /root/.cache/huggingface
+                env:
+                  - name: HF_HOME
+                    value: /root/.cache/huggingface
+                  - name: HF_TOKEN
+                    valueFrom:
+                      secretKeyRef:
+                        name: hf-token-secret
+                        key: token
+            nodeSelector:
+              node.kubernetes.io/instance-type: gpu-h100-sxm
+            volumes:
+              - name: devshm
+                emptyDir:
+                  medium: Memory
+              - name: hf-cache
+                hostPath:
+                  path: /mnt/hf-cache
+                  type: DirectoryOrCreate
+
   - label: ":full_moon: Qwen3-TTS Non-Async-Chunk E2E Test"
     timeout_in_minutes: 30
     depends_on: upload-nightly-pipeline

@@ -0,0 +1,6 @@
+nav:
+  - CI_5levels.md
+  - failures.md
+  - test_guide.md
+  - test_markers.md
+  - test_style.md
@@ -545,97 +545,104 @@ L4 level testing is a comprehensive quality audit before a version release. It e
 -   ***Trigger Timing***: **`Nightly`**, automatically executed every night.
 -   ***Execution Environment***: ***GPU*** server clusters to meet the resource demands of performance testing.
 -   ***Script Example***:
-<details>
-<summary> Test Examples</summary>
-When you want to add L4-level performance test cases, you can refer to the following format for case addition in tests/perf/tests/test.json:
-
-```JSON
-{
-    "test_name": "test_qwen3_omni",
-    "server_params": {
-        "model": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
-        "stage_config_name": "qwen3_omni.yaml"
-    },
-    "benchmark_params": [
-        {
-            "dataset_name": "random",
-            "num_prompts": [10, 20],
-            "request_rate": [0.5, 1],
-            "random_input_len": 2500,
-            "random_output_len": 900,
-            "ignore_eos": true,
-            "percentile-metrics": "ttft,tpot,itl,e2el,audio_rtf,audio_ttfp,audio_duration",
-            "baseline": {
-                "mean_ttft_ms": 100000,
-                "mean_audio_ttfp_ms": 100000,
-                "mean_audio_rtf": 100000
+
+???+ example "Test Examples"
+
+    When adding L4-level ***documentation example tests***, follow the guidelines below.
+
+    --8<-- "docs/contributing/ci/test_examples/doc_example_tests.inc.md"
+
+    When adding L4-level ***performance test*** cases, use the following format in `tests/perf/tests/test.json`:
+
+    ```JSON
+    {
+        "test_name": "test_qwen3_omni",
+        "server_params": {
+            "model": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
+            "stage_config_name": "qwen3_omni.yaml"
+        },
+        "benchmark_params": [
+            {
+                "dataset_name": "random",
+                "num_prompts": [10, 20],
+                "request_rate": [0.5, 1],
+                "random_input_len": 2500,
+                "random_output_len": 900,
+                "ignore_eos": true,
+                "percentile-metrics": "ttft,tpot,itl,e2el,audio_rtf,audio_ttfp,audio_duration",
+                "baseline": {
+                    "mean_ttft_ms": 100000,
+                    "mean_audio_ttfp_ms": 100000,
+                    "mean_audio_rtf": 100000
+                }
             }
-        }
-    ]
-}
-```
+        ]
+    }
+    ```
 
-#### Parameter Explanation
+    **Parameter Explanation**
 
-***Overview***
+    *Overview*
 
-| Field            | Required | Description                                                     |
-| ---------------- | -------- | --------------------------------------------------------------- |
-| test_name        | Yes      | Unique identifier for the test case                             |
-| server_params    | Yes      | Server-side configuration parameters                            |
-| benchmark_params | Yes      | Benchmark running parameters (supports multiple configurations) |
+    | Field            | Required | Description                                                     |
+    | ---------------- | -------- | --------------------------------------------------------------- |
+    | test_name        | Yes      | Unique identifier for the test case                             |
+    | server_params    | Yes      | Server-side configuration parameters                            |
+    | benchmark_params | Yes      | Benchmark running parameters (supports multiple configurations) |
 
-#### server_params Configuration
+    **server_params Configuration**
 
-##### Basic Parameters
+    *Basic Parameters*
 
-| Parameter         | Required | Example                            | Description                   |
-| ----------------- | -------- | ---------------------------------- | ----------------------------- |
-| model             | Yes      | "Qwen/Qwen3-Omni-30B-A3B-Instruct" | Model name or path            |
-| stage_config_name | Yes      | "qwen3_omni.yaml"                  | Stage configuration file name |
+    | Parameter         | Required | Example                            | Description                   |
+    | ----------------- | -------- | ---------------------------------- | ----------------------------- |
+    | model             | Yes      | "Qwen/Qwen3-Omni-30B-A3B-Instruct" | Model name or path            |
+    | stage_config_name | Yes      | "qwen3_omni.yaml"                  | Stage configuration file name |
 
-##### Dynamic Configuration (update/delete)
+    *Dynamic Configuration (update/delete)*
 
-Supports incremental modifications based on the basic configuration:
+    Supports incremental modifications based on the basic configuration:
 
-| Operation | Description                          |
-| --------- | ------------------------------------ |
-| update    | Update or add configuration items    |
-| delete    | Delete specified configuration items |
+    | Operation | Description                          |
+    | --------- | ------------------------------------ |
+    | update    | Update or add configuration items    |
+    | delete    | Delete specified configuration items |
 
-***Example***:
-```
-"update": {
-    "async_chunk": true,  // Enable asynchronous chunk processing
-    "stage_args": {
-        "0": {
-            "engine_args.custom_process_next_stage_input_func": "vllm_omni.model_executor.stage_input_processors.qwen3_omni.thinker2talker_async_chunk"
+    ***Example***:
+
+    ```
+    "update": {
+        "async_chunk": true,  // Enable asynchronous chunk processing
+        "stage_args": {
+            "0": {
+                "engine_args.custom_process_next_stage_input_func": "vllm_omni.model_executor.stage_input_processors.qwen3_omni.thinker2talker_async_chunk"
+            }
+        }
+    },
+    "delete": {
+        "stage_args": {
+            "2": ["custom_process_input_func"]  // Delete this configuration for stage 2
         }
     }
-},
-"delete": {
-    "stage_args": {
-        "2": ["custom_process_input_func"]  // Delete this configuration for stage 2
-    }
-}
-```
-#### benchmark_params Configuration
+    ```
 
-You can add any benchmark running parameters you need here. For all optional parameters, refer to the [benchmark documentation](https://github.com/vllm-project/vllm-omni/blob/main/docs/cli/bench/serve.md). General modifications are as follows:
+    **benchmark_params Configuration**
 
-1.  Change the ---xxx-xx-xx running parameters to xxx_xx_xx format and fill them as keys in the JSON file.
-2.  For boolean variables in the running parameters, modify them to forms such as ignore_eos: true/false and fill them into the JSON file.
-3.  Add the baseline parameter to specify the required validation values, ensuring the validation metric names match those in the result.json generated by the benchmark.
-4.  The qps and concurrency modes are mutually exclusive. For detailed explanations, see the table below:
+    You can add any benchmark running parameters you need here. For all optional parameters, refer to the [benchmark documentation](https://github.com/vllm-project/vllm-omni/blob/main/docs/cli/bench/serve.md). General modifications are as follows:
 
-| Parameter       | Type        | Required | Example/Values  | Description                                                                                                                                                                                                                                                          |
-| --------------- | ----------- | -------- | --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| num_prompts     | int / array | Yes      | 10,[10, 20, 30] | Number of requests. Supports single values or arrays. If a single value is used, it will be automatically expanded to match the number of qps or max_concurrency, e.g., [10,10,10]. If an array is used, its length must match the number of qps or max_concurrency. |
-| request_rate    | int / array | No       | 1, [1, 2, 3]    | Queries per second. Supports single values or arrays. If a single value is used, it will be automatically expanded to match the number of num_prompts, e.g., [1,1,1]. If an array is used, its length must match the number of num_prompts.                          |
-| max_concurrency | int / array | No       | 1, [1, 2, 3]    | Queries per second. Supports single values or arrays. If a single value is used, it will be automatically expanded to match the number of num_prompts, e.g., [1,1,1]. If an array is used, its length must match the number of num_prompts.                          |
-</details>
+    1.  Convert each `--xxx-xx-xx` command-line parameter to the `xxx_xx_xx` form and use it as a key in the JSON file.
+    2.  Write boolean parameters as JSON booleans, for example `"ignore_eos": true` or `"ignore_eos": false`.
+    3.  Add the `baseline` parameter to specify the required validation values; the metric names must match those in the `result.json` generated by the benchmark.
+    4.  The qps (`request_rate`) and concurrency (`max_concurrency`) modes are mutually exclusive. See the table below for details:
+
+    | Parameter       | Type        | Required | Example/Values  | Description                                                                                                                                                                                                                                                          |
+    | --------------- | ----------- | -------- | --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+    | num_prompts     | int / array | Yes      | 10,[10, 20, 30] | Number of requests. Supports single values or arrays. If a single value is used, it will be automatically expanded to match the number of qps or max_concurrency, e.g., [10,10,10]. If an array is used, its length must match the number of qps or max_concurrency. |
+    | request_rate    | int / array | No       | 1, [1, 2, 3]    | Queries per second. Supports single values or arrays. If a single value is used, it will be automatically expanded to match the number of num_prompts, e.g., [1,1,1]. If an array is used, its length must match the number of num_prompts.                          |
+    | max_concurrency | int / array | No       | 1, [1, 2, 3]    | Maximum number of concurrent requests. Supports single values or arrays. If a single value is used, it will be automatically expanded to match the number of num_prompts, e.g., [1,1,1]. If an array is used, its length must match the number of num_prompts.       |
+    </details>
 
--   -   ***Run Command***: (Specific commands would depend on the performance testing tool and configuration defined in `nightly.json`).
+    -   -   ***Run Command***: (Specific commands would depend on the performance testing tool and configuration defined in `nightly.json`).
 
 ## Chapter 4: L5 Level Testing - Stability and Reliability Testing
 

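
The `benchmark_params` rules in the hunk above (snake_case keys, JSON booleans, single values broadcast to arrays, and the mutually exclusive qps/concurrency modes) can be illustrated with a small sketch. This is not the project's runner: `expand_runs` and `to_cli_args` are hypothetical helper names, and the way booleans and the `baseline` block are handled here is an assumption made for illustration only.

```python
from typing import Any


def _as_list(value: Any) -> list:
    """Wrap a scalar in a list; copy lists and tuples into a list."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def expand_runs(params: dict[str, Any]) -> list[dict[str, Any]]:
    """Expand one benchmark_params entry into one dict per benchmark run."""
    if "request_rate" in params and "max_concurrency" in params:
        raise ValueError("request_rate and max_concurrency are mutually exclusive")

    sweep_key = "request_rate" if "request_rate" in params else "max_concurrency"
    num_prompts = _as_list(params["num_prompts"])
    sweep = _as_list(params[sweep_key]) if sweep_key in params else [None]

    # A single value is broadcast to the length of the other list.
    n = max(len(num_prompts), len(sweep))
    if len(num_prompts) == 1:
        num_prompts = num_prompts * n
    if len(sweep) == 1:
        sweep = sweep * n
    if len(num_prompts) != len(sweep):
        raise ValueError("array lengths of num_prompts and the sweep parameter differ")

    # Everything else is shared by all runs (baseline is assumed to be
    # validation data, not a benchmark argument).
    excluded = ("num_prompts", "request_rate", "max_concurrency", "baseline")
    shared = {k: v for k, v in params.items() if k not in excluded}

    runs = []
    for prompts, rate in zip(num_prompts, sweep):
        run = dict(shared, num_prompts=prompts)
        if rate is not None:
            run[sweep_key] = rate
        runs.append(run)
    return runs


def to_cli_args(run: dict[str, Any]) -> list[str]:
    """Turn snake_case JSON keys back into --kebab-case flags (assumed mapping)."""
    args: list[str] = []
    for key, value in run.items():
        flag = "--" + key.replace("_", "-")
        if isinstance(value, bool):
            if value:  # assumption: a true boolean becomes a bare flag
                args.append(flag)
        else:
            args.extend([flag, str(value)])
    return args
```

With the JSON example from the hunk, `num_prompts: [10, 20]` and `request_rate: [0.5, 1]` would yield two runs, paired index by index.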
@@ -0,0 +1,49 @@
+**Preferred Test Strategy**
+
+Use one of the following patterns depending on page type:
+
+- **Dynamic code-block extraction (preferred for offline docs)**
+    - Extract Python/Bash code blocks from the markdown with an AST analyzer, then execute them directly in tests.
+    - Benefit: test logic stays automatically aligned with docs.
+    - Basic idea: call `ReadmeSnippet.extract_readme_snippets` to collect the code blocks into a module-level list in the test file,
+    use that list as the `pytest.mark.parametrize` parameters, and pass each snippet to `example_runner.run` inside the parametrized test.
+    Also pass an `output_subfolder` argument for the second-level output folder described in **Output Directory Structure** below.
+    If a test needs an extra environment variable (e.g., one the example script reads), `example_runner.run` also accepts a third parameter, `env`.
+    - See [tests/examples/offline_inference/test_text_to_image.py](https://github.com/vllm-project/vllm-omni/blob/main/tests/examples/offline_inference/test_text_to_image.py) for a reference implementation.
+
+- **Explicit copied scripts (used by online docs for now until further update)**
+    - For online serving pages, it is acceptable to copy code from docs into dedicated test functions, because only client-side, request-sending scripts are tested.
+    - Rationale: dynamic extraction would be overly complex here, because it would have to tell server-launch scripts apart from client-request scripts.
+    - Requirement: copied test code must be kept in sync with doc updates.
+
+**Test Case Naming Convention**
+
+- Dynamic code extraction (auto-generated internally):
+    - `test_{single_function_name_matching_file_name}[{h2_heading_00X}]`
+    - Example: `test_text_to_image[basic_usage_001]`
+- Explicit copied scripts:
+    - `test_{h2_heading_00X}[{dummy_param_id_for_omni_server}]`
+    - Example: `test_api_calls_001[omni_server0]`
+
+**Runtime Configuration**
+
+In the example code tests, do **not** reduce `num_inference_steps` merely to speed up the tests unless there is a strong CI reliability reason to do so.
+
+**Skipping Rules**
+
+You may skip examples falling in the following categories using `pytest.mark.skip` or `pytest.skip`:
+
+- Gradio UI scripts
+- Scenarios that significantly overlap with existing tests and add little new coverage.
+
+**Output Directory Structure**
+
+Use a three-layer output structure to store output artifacts:
+
+1. Root output directory
+    - Auto-detected from `OUTPUT_DIR` env var or auto-generated under `/tmp`.
+2. Doc-page directory
+    - Define a clear page-level folder name yourself in each `test_*.py` (abbreviations are acceptable, e.g., `example_offline_t2i`).
+3. Test-case directory
+    - Must match the case identifier (e.g., `basic_usage_001`).
+    - Auto-generated for dynamically extracted tests.
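
The dynamic extraction pattern, the `{h2_heading}_{00X}` case identifiers, and the three-layer output layout described in this file can be sketched together as follows. This is a simplified stand-in, not the project's `ReadmeSnippet` or `example_runner`: it uses a line-based regex scan instead of a markdown AST, assumes the counter restarts under each H2 heading, and the names `Snippet`, `extract_snippets`, `case_output_dir`, and the `example_tests` fallback folder are all hypothetical.

```python
import os
import re
import tempfile
from dataclasses import dataclass
from pathlib import Path

_H2 = re.compile(r"^## +(.+?)\s*$")
_OPEN = re.compile(r"^`{3}(python|bash)\s*$")  # opening fence with a language tag
_CLOSE = re.compile(r"^`{3}\s*$")              # bare closing fence


@dataclass(frozen=True)
class Snippet:
    case_id: str  # e.g. "basic_usage_001"
    lang: str     # "python" or "bash"
    code: str


def extract_snippets(markdown: str) -> list[Snippet]:
    """Collect Python/Bash fenced blocks, named after the enclosing H2 heading."""
    snippets: list[Snippet] = []
    counts: dict[str, int] = {}
    heading = "untitled"
    lang = None
    buf: list[str] = []
    for line in markdown.splitlines():
        if lang is None:
            if (m := _H2.match(line)):
                # Slugify the heading: "Basic Usage" becomes "basic_usage".
                heading = re.sub(r"[^a-z0-9]+", "_", m.group(1).lower()).strip("_")
            elif (m := _OPEN.match(line)):
                lang, buf = m.group(1), []
        elif _CLOSE.match(line):
            counts[heading] = counts.get(heading, 0) + 1
            case_id = f"{heading}_{counts[heading]:03d}"
            snippets.append(Snippet(case_id, lang, "\n".join(buf)))
            lang = None
        else:
            buf.append(line)
    return snippets


def case_output_dir(page_dir: str, case_id: str) -> Path:
    """Build <root>/<doc page>/<test case> without creating any directory."""
    root = os.environ.get("OUTPUT_DIR") or os.path.join(tempfile.gettempdir(), "example_tests")
    return Path(root) / page_dir / case_id
```

In a test module, such a list could be built once at import time and passed to `pytest.mark.parametrize("snippet", SNIPPETS, ids=[s.case_id for s in SNIPPETS])`, so that the test ID and the third-level output folder share the same identifier.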
@@ -157,6 +157,12 @@ vllm_omni/                                    tests/
                                                        ├── qwen3_omni_ci.yaml
                                                        ├── bagel_*.yaml
                                                        └── npu/, rocm/, etc.
+examples/                                     tests
+│                                             └── examples
+├── online_serving/                     →         ├── online_serving/
+│   └── {doc_page_title}/README.md                │   └── test_{doc_page_title}.py  ⬜
+└── offline_inference/                  →         └── offline_inference/
+    └── {doc_page_title}/README.md                    └── test_{doc_page_title}.py  ⬜
 ```
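
The mapping added to the tree above (one `test_{doc_page_title}.py` per example `README.md`) can be checked mechanically. The sketch below is only an illustration of that naming rule; `expected_test_path` and `missing_example_tests` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path


def expected_test_path(readme: Path, repo_root: Path) -> Path:
    """Map examples/<kind>/<page>/README.md to tests/examples/<kind>/test_<page>.py."""
    rel = readme.relative_to(repo_root / "examples")
    kind, page = rel.parts[0], rel.parts[1]
    return repo_root / "tests" / "examples" / kind / f"test_{page}.py"


def missing_example_tests(repo_root: Path) -> list[Path]:
    """List the expected test files that do not exist yet."""
    missing = []
    for readme in sorted((repo_root / "examples").glob("*/*/README.md")):
        test_file = expected_test_path(readme, repo_root)
        if not test_file.exists():
            missing.append(test_file)
    return missing
```

Running `missing_example_tests` from the repository root would give a list of doc pages that still lack a test file, which matches the ⬜ entries in the tree.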