Merged
42 changes: 42 additions & 0 deletions .buildkite/test-nightly.yml
@@ -111,6 +111,48 @@ steps:
                  path: /mnt/hf-cache
                  type: DirectoryOrCreate

  - label: ":full_moon: Documentation Example Code Test with H100"
    timeout_in_minutes: 60
    depends_on: upload-nightly-pipeline
    if: build.env("NIGHTLY") == "1"
    commands:
      - export VLLM_WORKER_MULTIPROC_METHOD=spawn
      - export VLLM_TEST_CLEAN_GPU_MEMORY="1"
      - pytest -s -v tests/examples/online_serving/test_text_to_image.py tests/examples/offline_inference/test_text_to_image.py -m "advanced_model and example and H100" --run-level "advanced_model"
    agents:
      queue: "mithril-h100-pool"
    plugins:
      - kubernetes:
          podSpec:
            containers:
              - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
                resources:
                  limits:
                    nvidia.com/gpu: 2
                volumeMounts:
                  - name: devshm
                    mountPath: /dev/shm
                  - name: hf-cache
                    mountPath: /root/.cache/huggingface
                env:
                  - name: HF_HOME
                    value: /root/.cache/huggingface
                  - name: HF_TOKEN
                    valueFrom:
                      secretKeyRef:
                        name: hf-token-secret
                        key: token
            nodeSelector:
              node.kubernetes.io/instance-type: gpu-h100-sxm
            volumes:
              - name: devshm
                emptyDir:
                  medium: Memory
              - name: hf-cache
                hostPath:
                  path: /mnt/hf-cache
                  type: DirectoryOrCreate

  - label: ":full_moon: Qwen3-TTS Non-Async-Chunk E2E Test"
    timeout_in_minutes: 30
    depends_on: upload-nightly-pipeline
6 changes: 6 additions & 0 deletions docs/contributing/ci/.nav.yaml
Contributor Author

This file explicitly excludes the newly added .inc.md file.

Previously, without this file, all *.md files in this folder were implicitly added to the docs. Now that L4 tests have additional guides, I put them in separate sub-files to avoid cluttering CI_5levels.md. This is the same design pattern as docs/getting_started/installation/...

@@ -0,0 +1,6 @@
nav:
- CI_5levels.md
- failures.md
- test_guide.md
- test_markers.md
- test_style.md
155 changes: 81 additions & 74 deletions docs/contributing/ci/CI_5levels.md
Contributor Author

Apart from adding doc test guides to the L4 test documentation, I also changed the previous <details><summary>... fold block to the MkDocs-native ???+ example ... fold block syntax. Everything inside the former block is expected to be HTML, not markdown, so all formatting was lost. The original content in this block is only indented.

@@ -545,97 +545,104 @@ L4 level testing is a comprehensive quality audit before a version release. It e
- ***Trigger Timing***: **`Nightly`**, automatically executed every night.
- ***Execution Environment***: ***GPU*** server clusters to meet the resource demands of performance testing.
- ***Script Example***:
<details>
<summary> Test Examples</summary>
When you want to add L4-level performance test cases, you can refer to the following format for case addition in tests/perf/tests/test.json:

```JSON
{
"test_name": "test_qwen3_omni",
"server_params": {
"model": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
"stage_config_name": "qwen3_omni.yaml"
},
"benchmark_params": [
{
"dataset_name": "random",
"num_prompts": [10, 20],
"request_rate": [0.5, 1],
"random_input_len": 2500,
"random_output_len": 900,
"ignore_eos": true,
"percentile-metrics": "ttft,tpot,itl,e2el,audio_rtf,audio_ttfp,audio_duration",
"baseline": {
"mean_ttft_ms": 100000,
"mean_audio_ttfp_ms": 100000,
"mean_audio_rtf": 100000

???+ example "Test Examples"

When adding L4-level ***documentation example tests***, follow the guidelines below.

--8<-- "docs/contributing/ci/test_examples/doc_example_tests.inc.md"

When you want to add L4-level ***performance test*** cases, you can refer to the following format for case addition in tests/perf/tests/test.json:

```JSON
{
"test_name": "test_qwen3_omni",
"server_params": {
"model": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
"stage_config_name": "qwen3_omni.yaml"
},
"benchmark_params": [
{
"dataset_name": "random",
"num_prompts": [10, 20],
"request_rate": [0.5, 1],
"random_input_len": 2500,
"random_output_len": 900,
"ignore_eos": true,
"percentile-metrics": "ttft,tpot,itl,e2el,audio_rtf,audio_ttfp,audio_duration",
"baseline": {
"mean_ttft_ms": 100000,
"mean_audio_ttfp_ms": 100000,
"mean_audio_rtf": 100000
}
}
}
]
}
```
]
}
```

#### Parameter Explanation
**Parameter Explanation**

***Overview***
*Overview*

| Field | Required | Description |
| ---------------- | -------- | --------------------------------------------------------------- |
| test_name | Yes | Unique identifier for the test case |
| server_params | Yes | Server-side configuration parameters |
| benchmark_params | Yes | Benchmark running parameters (supports multiple configurations) |
| Field | Required | Description |
| ---------------- | -------- | --------------------------------------------------------------- |
| test_name | Yes | Unique identifier for the test case |
| server_params | Yes | Server-side configuration parameters |
| benchmark_params | Yes | Benchmark running parameters (supports multiple configurations) |

#### server_params Configuration
**server_params Configuration**

##### Basic Parameters
*Basic Parameters*

| Parameter | Required | Example | Description |
| ----------------- | -------- | ---------------------------------- | ----------------------------- |
| model | Yes | "Qwen/Qwen3-Omni-30B-A3B-Instruct" | Model name or path |
| stage_config_name | Yes | "qwen3_omni.yaml" | Stage configuration file name |
| Parameter | Required | Example | Description |
| ----------------- | -------- | ---------------------------------- | ----------------------------- |
| model | Yes | "Qwen/Qwen3-Omni-30B-A3B-Instruct" | Model name or path |
| stage_config_name | Yes | "qwen3_omni.yaml" | Stage configuration file name |

##### Dynamic Configuration (update/delete)
*Dynamic Configuration (update/delete)*

Supports incremental modifications based on the basic configuration:
Supports incremental modifications based on the basic configuration:

| Operation | Description |
| --------- | ------------------------------------ |
| update | Update or add configuration items |
| delete | Delete specified configuration items |
| Operation | Description |
| --------- | ------------------------------------ |
| update | Update or add configuration items |
| delete | Delete specified configuration items |

***Example***:
```
"update": {
"async_chunk": true, // Enable asynchronous chunk processing
"stage_args": {
"0": {
"engine_args.custom_process_next_stage_input_func": "vllm_omni.model_executor.stage_input_processors.qwen3_omni.thinker2talker_async_chunk"
***Example***:

```
"update": {
"async_chunk": true, // Enable asynchronous chunk processing
"stage_args": {
"0": {
"engine_args.custom_process_next_stage_input_func": "vllm_omni.model_executor.stage_input_processors.qwen3_omni.thinker2talker_async_chunk"
}
}
},
"delete": {
"stage_args": {
"2": ["custom_process_input_func"] // Delete this configuration for stage 2
}
}
},
"delete": {
"stage_args": {
"2": ["custom_process_input_func"] // Delete this configuration for stage 2
}
}
```
#### benchmark_params Configuration
```

You can add any benchmark running parameters you need here. For all optional parameters, refer to the [benchmark documentation](https://github.com/vllm-project/vllm-omni/blob/main/docs/cli/bench/serve.md). General modifications are as follows:
**benchmark_params Configuration**

1. Change the ---xxx-xx-xx running parameters to xxx_xx_xx format and fill them as keys in the JSON file.
2. For boolean variables in the running parameters, modify them to forms such as ignore_eos: true/false and fill them into the JSON file.
3. Add the baseline parameter to specify the required validation values, ensuring the validation metric names match those in the result.json generated by the benchmark.
4. The qps and concurrency modes are mutually exclusive. For detailed explanations, see the table below:
You can add any benchmark running parameters you need here. For all optional parameters, refer to the [benchmark documentation](https://github.com/vllm-project/vllm-omni/blob/main/docs/cli/bench/serve.md). General modifications are as follows:

| Parameter | Type | Required | Example/Values | Description |
| --------------- | ----------- | -------- | --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| num_prompts | int / array | Yes | 10,[10, 20, 30] | Number of requests. Supports single values or arrays. If a single value is used, it will be automatically expanded to match the number of qps or max_concurrency, e.g., [10,10,10]. If an array is used, its length must match the number of qps or max_concurrency. |
| request_rate | int / array | No | 1, [1, 2, 3] | Queries per second. Supports single values or arrays. If a single value is used, it will be automatically expanded to match the number of num_prompts, e.g., [1,1,1]. If an array is used, its length must match the number of num_prompts. |
| max_concurrency | int / array | No | 1, [1, 2, 3] | Queries per second. Supports single values or arrays. If a single value is used, it will be automatically expanded to match the number of num_prompts, e.g., [1,1,1]. If an array is used, its length must match the number of num_prompts. |
</details>
1. Convert the --xxx-xx-xx running parameters to the xxx_xx_xx format and use them as keys in the JSON file.
2. For boolean flags in the running parameters, write them in forms such as ignore_eos: true/false in the JSON file.
3. Add the baseline parameter to specify the required validation values, and make sure the validation metric names match those in the result.json generated by the benchmark.
4. The qps and concurrency modes are mutually exclusive. For detailed explanations, see the table below:

| Parameter | Type | Required | Example/Values | Description |
| --------------- | ----------- | -------- | --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| num_prompts | int / array | Yes | 10,[10, 20, 30] | Number of requests. Supports single values or arrays. If a single value is used, it will be automatically expanded to match the number of qps or max_concurrency, e.g., [10,10,10]. If an array is used, its length must match the number of qps or max_concurrency. |
| request_rate | int / array | No | 1, [1, 2, 3] | Queries per second. Supports single values or arrays. If a single value is used, it will be automatically expanded to match the number of num_prompts, e.g., [1,1,1]. If an array is used, its length must match the number of num_prompts. |
| max_concurrency | int / array | No | 1, [1, 2, 3] | Maximum number of concurrent requests. Supports single values or arrays. If a single value is used, it will be automatically expanded to match the number of num_prompts, e.g., [1,1,1]. If an array is used, its length must match the number of num_prompts. |
</details>

- - ***Run Command***: (Specific commands would depend on the performance testing tool and configuration defined in `nightly.json`).
- - ***Run Command***: (Specific commands would depend on the performance testing tool and configuration defined in `nightly.json`).
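
The flag-to-key conversion and the single-value expansion rules described for `benchmark_params` above can be sketched as follows. This is only an illustration of the documented rules, not the benchmark runner's actual code: `flag_to_key`, `expand_to_match`, and `expand_benchmark_params` are hypothetical helper names, and the assumption that the run count equals the longest list among the swept fields is inferred from the table.

```python
def flag_to_key(flag: str) -> str:
    """Convert a CLI flag such as '--random-input-len' to a JSON key."""
    return flag.lstrip("-").replace("-", "_")


def expand_to_match(value, target_len: int, name: str) -> list:
    """Broadcast a single value to target_len entries, or validate a list's length."""
    if isinstance(value, list):
        if len(value) != target_len:
            raise ValueError(f"{name} has {len(value)} entries, expected {target_len}")
        return value
    return [value] * target_len


def expand_benchmark_params(params: dict) -> list:
    """Return one {field: value} dict per benchmark run (swept fields only)."""
    modes = [k for k in ("request_rate", "max_concurrency") if k in params]
    if len(modes) > 1:
        # Hypothetical check: the doc states the two modes are mutually exclusive.
        raise ValueError("request_rate and max_concurrency are mutually exclusive")
    fields = ["num_prompts"] + modes
    # Assumed rule: the run count is the length of the longest list among the fields.
    n = max(len(params[f]) if isinstance(params[f], list) else 1 for f in fields)
    columns = {f: expand_to_match(params[f], n, f) for f in fields}
    return [{f: columns[f][i] for f in fields} for i in range(n)]
```

With the example from this page, `num_prompts: [10, 20]` and `request_rate: [0.5, 1]` would yield two runs, pairing 10 prompts with 0.5 qps and 20 prompts with 1 qps.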

## Chapter 4: L5 Level Testing - Stability and Reliability Testing

49 changes: 49 additions & 0 deletions docs/contributing/ci/test_examples/doc_example_tests.inc.md
@@ -0,0 +1,49 @@
**Preferred Test Strategy**

Use one of the following patterns depending on page type:

- **Dynamic code-block extraction (preferred for offline docs)**
- Extract Python/Bash code blocks with a markdown AST analyzer, then execute them directly in tests.
- Benefit: test logic stays automatically aligned with docs.
- Basic idea: use `ReadmeSnippet.extract_readme_snippets` to extract a list of code blocks into a module-level variable in the test file,
use this list as the `pytest.mark.parametrize` parameters, and pass each snippet to `example_runner.run` inside the parametrized test.
Also pass an `output_subfolder` argument for the second-level output folder explained in **Output Directory Structure** below.
If a test needs an extra environment variable (e.g., one that the example script reads), `example_runner.run` also accepts a third `env` parameter.
- See [tests/examples/offline_inference/test_text_to_image.py](https://github.com/vllm-project/vllm-omni/blob/main/tests/examples/offline_inference/test_text_to_image.py) for reference implementation.

- **Explicitly copied scripts (used by online docs for now, pending further updates)**
- For online serving pages, it is acceptable to copy code from docs into dedicated test functions, because only client-side, request-sending scripts are tested.
- Rationale: dynamic extraction would be overly complex here, because it would need to tell server-launch scripts apart from client-request scripts.
- Requirement: copied test code must be kept in sync with doc updates.
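
The extraction step of the dynamic pattern can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions: `Snippet` and `extract_code_blocks` are hypothetical stand-ins for the project's `ReadmeSnippet.extract_readme_snippets` (whose real signature is not shown here), and a simple line scanner is used in place of a markdown AST analyzer.

```python
from dataclasses import dataclass

FENCE = "`" * 3  # a markdown code fence marker


@dataclass
class Snippet:
    heading: str  # nearest preceding H2 heading
    lang: str     # fence info string, e.g. "python" or "bash"
    code: str     # snippet body without the fence lines


def extract_code_blocks(markdown: str, langs=("python", "bash")) -> list:
    """Collect fenced code blocks in the given languages, tagged with their H2 heading."""
    snippets = []
    heading = ""
    lang = None  # None means the scanner is outside any fence
    body = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if lang is None:
            if stripped.startswith("## "):
                heading = stripped[3:].strip()
            elif stripped.startswith(FENCE):
                lang = stripped[len(FENCE):].strip().lower()
                body = []
        elif stripped == FENCE:
            # Closing fence: keep the block only if its language is wanted.
            if lang in langs:
                snippets.append(Snippet(heading, lang, "\n".join(body)))
            lang = None
        else:
            body.append(line)
    return snippets
```

In a test module, a list like this would typically be computed once at import time and handed to `pytest.mark.parametrize`, with each item then passed to `example_runner.run` as described above.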

**Test Case Naming Convention**

- Dynamic code extraction (auto-generated internally):
- `test_{single_function_name_matching_file_name}[h2_heading_00X]`
- Example: `test_text_to_image[basic_usage_001]`
- Explicitly copied scripts:
- `test_{h2_heading_00X}[{dummy_param_id_for_omni_server}]`
- Example: `test_api_calls_001[omni_server0]`
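
As an illustration of how identifiers such as `basic_usage_001` could be derived from H2 headings, here is a small sketch. `make_case_ids` is a hypothetical helper, and numbering snippets separately per heading is an assumption; the internal generator may number them differently.

```python
import re
from collections import Counter


def make_case_ids(headings: list) -> list:
    """Turn H2 headings into ids like 'basic_usage_001', counting repeats per heading."""
    seen = Counter()
    ids = []
    for heading in headings:
        # Lowercase the heading and collapse non-alphanumeric runs into underscores.
        slug = re.sub(r"[^a-z0-9]+", "_", heading.lower()).strip("_")
        seen[slug] += 1
        ids.append(f"{slug}_{seen[slug]:03d}")
    return ids
```

Ids built this way can be passed to `pytest.mark.parametrize` through its `ids` argument, which is what makes them appear in square brackets after the test function name.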

**Runtime Configuration**

In the example code tests, do **not** reduce `num_inference_steps` just to speed up the tests unless there is a strong CI reliability reason to do so.

**Skipping Rules**

You may skip examples that fall into the following categories using `pytest.mark.skip` or `pytest.skip`:

- Gradio UI scripts
- Scenarios that significantly overlap with existing tests and add little new coverage.

**Output Directory Structure**

Use a three-layer output structure to store output artifacts:

1. Root output directory
- Read from the `OUTPUT_DIR` environment variable, or auto-generated under `/tmp` if it is not set.
2. Doc-page directory
- Define and use a clear page-level folder name in each `test_*.py` yourself (abbreviations are acceptable, e.g., `example_offline_t2i`).
3. Test-case directory
- Must match the case identifier (e.g., `basic_usage_001`).
- Auto-generated for dynamically extracted tests.
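
The three-layer layout above can be sketched as a small path helper. This is an assumption-laden illustration rather than the project's implementation: `resolve_output_dir` is a hypothetical name, and the temporary-directory prefix is made up for the example.

```python
import os
import tempfile
from pathlib import Path


def resolve_output_dir(page_dir: str, case_id: str) -> Path:
    """Return <root>/<page_dir>/<case_id>, creating the directories if needed."""
    root = os.environ.get("OUTPUT_DIR")
    if not root:
        # Fallback: a fresh temporary directory (under /tmp on typical Linux hosts).
        root = tempfile.mkdtemp(prefix="doc_example_tests_")
    path = Path(root) / page_dir / case_id
    path.mkdir(parents=True, exist_ok=True)
    return path
```

For example, with `OUTPUT_DIR=/data/out`, the call `resolve_output_dir("example_offline_t2i", "basic_usage_001")` would resolve to `/data/out/example_offline_t2i/basic_usage_001`.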
6 changes: 6 additions & 0 deletions docs/contributing/ci/tests_style.md
@@ -157,6 +157,12 @@ vllm_omni/ tests/
├── qwen3_omni_ci.yaml
├── bagel_*.yaml
└── npu/, rocm/, etc.
examples/ tests
│ └── examples
├── online_serving/ → ├── online_serving/
│ └── {doc_page_title}/README.md │ └── test_{doc_page_title}.py ⬜
└── offline_inference/ → └── offline_inference/
└── {doc_page_title}/README.md └── test_{doc_page_title}.py ⬜
```

