12 changes: 10 additions & 2 deletions .buildkite/pipeline.yml
@@ -44,11 +44,19 @@ steps:
agents:
queue: "cpu_queue_premerge"

# L4 Test — main+NIGHTLY=1 (scheduled), or PR with label nightly-test (e.g. add label then Rebuild)
# L4 Test — main+NIGHTLY=1 (scheduled), or PR with specific label (e.g. add label then Rebuild)
- label: "Upload Nightly Pipeline"
depends_on: image-build
key: upload-nightly-pipeline
if: '(build.branch == "main" && build.env("NIGHTLY") == "1") || (build.branch != "main" && build.pull_request.labels includes "nightly-test")'
if: >-
(build.branch == "main" && build.env("NIGHTLY") == "1") ||
(build.branch != "main" && (
build.pull_request.labels includes "nightly-test" ||
build.pull_request.labels includes "omni-test" ||
build.pull_request.labels includes "tts-test" ||
build.pull_request.labels includes "diffusion-x2iat-test" ||
build.pull_request.labels includes "diffusion-x2v-test"
))
commands:
- buildkite-agent pipeline upload .buildkite/test-nightly.yml
agents:
417 changes: 0 additions & 417 deletions .buildkite/test-nightly-diffusion.yml

This file was deleted.

432 changes: 393 additions & 39 deletions .buildkite/test-nightly.yml

Large diffs are not rendered by default.

7 changes: 4 additions & 3 deletions docs/contributing/ci/CI_5levels.md
@@ -86,7 +86,8 @@ Through five levels (L1-L5) and common (Common) specifications, the system clari
/tests/e2e/online_serving/test_{model_name}_expansion.py<br>
/tests/e2e/offline_inference/test_{model_name}_expansion.py<br>
<strong>Performance:</strong><br>
/tests/dfx/perf/tests/test.json<br>
/tests/dfx/perf/tests/test_qwen_omni.json (Omni), test_tts.json (TTS),<br>
and /tests/dfx/perf/tests/test_{diffusion_model}_vllm_omni.json (Diffusion)<br>
<strong>Doc Test:</strong><br>
tests/example/online_serving/test_{model_name}.py<br>
tests/example/offline_inference/test_{model_name}.py
@@ -530,13 +531,13 @@ L4 level testing is a comprehensive quality audit before a version release. It e
### 3.2 Testing Content and Scope

- ***Full Functionality Testing***: Executes all test cases defined in `test_{model_name}_expansion.py`, covering all implemented features, positive flows, boundary conditions, and exception handling.
- ***Performance Testing***: Uses the `tests/dfx/perf/tests/test.json` configuration file to drive performance testing tools for stress, load, and endurance tests, collecting metrics like throughput, response time, and resource utilization.
- ***Performance Testing***: Uses `tests/dfx/perf/tests/test_qwen_omni.json`, `tests/dfx/perf/tests/test_tts.json`, and diffusion configs in the form `tests/dfx/perf/tests/test_*_vllm_omni.json` (passed to `run_benchmark.py` via `--test-config-file`) to drive performance testing tools for stress, load, and endurance tests, collecting metrics like throughput, response time, and resource utilization.
- ***Documentation Testing***: Verifies whether the example code provided to users is runnable and its results match the description.

### 3.3 Test Directory and Execution Files

- ***Functional Testing***: Same directories as L3.
- ***Performance Test Configuration***: `tests/dfx/perf/tests/test.json`
- ***Performance Test Configuration***: `tests/dfx/perf/tests/test_qwen_omni.json`, `tests/dfx/perf/tests/test_tts.json`, and diffusion configs `tests/dfx/perf/tests/test_*_vllm_omni.json` (e.g. `test_qwen_image_vllm_omni.json`)
- ***Documentation Example Tests***:
- - `tests/example/online_serving/test_{model_name}.py`
- `tests/example/offline_inference/test_{model_name}.py`
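Since section 3.3 refers to the diffusion configs by a glob-style pattern rather than a fixed filename, here is a small illustrative sketch (not part of the PR) of enumerating the matching configs with Python's `pathlib`:

```python
from pathlib import Path

# List the diffusion perf configs matching the documented pattern.
for cfg in sorted(Path("tests/dfx/perf/tests").glob("test_*_vllm_omni.json")):
    print(cfg)  # e.g. tests/dfx/perf/tests/test_qwen_image_vllm_omni.json
```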
@@ -1,4 +1,4 @@
When you want to add L4-level ***performance test*** cases, you can refer to the following format for case addition in tests/dfx/perf/tests/test.json:
When you want to add L4-level ***performance test*** cases, you can refer to the following format for case addition in `tests/dfx/perf/tests/test_qwen_omni.json`, `tests/dfx/perf/tests/test_tts.json`, or diffusion configs such as `tests/dfx/perf/tests/test_*_vllm_omni.json` (selected via `pytest ... run_benchmark.py --test-config-file <path>`):

```JSON
{
  ...
}
```
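For orientation, here is a minimal sketch of one config entry, written as a Python literal mirroring the JSON schema that `test_tts.json` (added below) follows; the names and values are illustrative only:

```python
# Sketch of one perf-test entry; mirrors the JSON schema of test_tts.json.
# All names and values below are illustrative, not real suites.
entry = {
    "test_name": "test_my_model",              # hypothetical suite name
    "server_params": {"model": "org/model"},   # model served by OmniServer
    "benchmark_params": [
        {
            "dataset_name": "random",
            "num_prompts": [10, 40],      # one value per sweep step
            "max_concurrency": [1, 4],
            # per-metric thresholds; a list supplies one value per sweep step
            "baseline": {"mean_ttft_ms": [500, 800]},
        }
    ],
}
```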
5 changes: 2 additions & 3 deletions docs/contributing/ci/test_guide.md
@@ -45,7 +45,6 @@ Our test scripts use the pytest framework. First, please use `git clone https://
=== "L3 level & L4 level"

```bash
cd tests
pytest -s -v -m "advanced_model" --run-level=advanced_model
```
If you only want to run L3 test cases, you can use:
@@ -60,9 +59,9 @@ Our test scripts use the pytest framework. First, please use `git clone https://
```bash
pytest -s -v -m "core_model and distributed_cuda and L4" --run-level=core_model
```
Note: To run performance tests, use:
Note: To run performance tests (defaults to ``test_qwen_omni.json``; use ``--test-config-file tests/dfx/perf/tests/test_tts.json`` for TTS):
```bash
pytest -s -v perf/scripts/run_benchmark.py
pytest -s -v tests/dfx/perf/scripts/run_benchmark.py
```

The latest L3 test commands for various test suites can be found in the [pipeline](https://github.com/vllm-project/vllm-omni/blob/main/.buildkite/test-merge.yml).
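With the option now registered in `tests/dfx/conftest.py`, the TTS suite can be selected from the repo root via `pytest -s -v tests/dfx/perf/scripts/run_benchmark.py --test-config-file tests/dfx/perf/tests/test_tts.json`; omitting the flag falls back to `test_qwen_omni.json`, as the note above states.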
12 changes: 12 additions & 0 deletions tests/dfx/conftest.py
@@ -2,6 +2,8 @@
from pathlib import Path
from typing import Any

import pytest

from tests.conftest import modify_stage_config


@@ -95,3 +97,13 @@ def create_benchmark_indices(
indices.append((test_name, idx))

return indices


def pytest_addoption(parser: pytest.Parser) -> None:
"""Register shared CLI options for DFX benchmark suites."""
parser.addoption(
"--test-config-file",
action="store",
default=None,
help=("Path to benchmark config JSON. Example: --test-config-file tests/dfx/perf/tests/test_tts.json"),
)
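Inside fixtures and tests the shared flag is then available through pytest's option API; a minimal sketch, assuming a fixture that wants the resolved path and reusing the default that `run_benchmark.py` falls back to:

```python
import pytest


@pytest.fixture
def test_config_file(request: pytest.FixtureRequest) -> str:
    """Resolve the shared benchmark config path (sketch only)."""
    path = request.config.getoption("--test-config-file")
    return path or "tests/dfx/perf/tests/test_qwen_omni.json"
```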
49 changes: 31 additions & 18 deletions tests/dfx/perf/scripts/run_benchmark.py
@@ -21,10 +21,30 @@
os.environ["VLLM_TEST_CLEAN_GPU_MEMORY"] = "0"


CONFIG_FILE_PATH = str(Path(__file__).parent.parent / "tests" / "test.json")
BENCHMARK_CONFIGS = load_configs(CONFIG_FILE_PATH)
STAGE_INIT_TIMEOUT = 600
def _get_config_file_from_argv() -> str | None:
"""Read ``--test-config-file`` from ``sys.argv`` at import time so parametrization can use it."""
import sys

for i, arg in enumerate(sys.argv):
if arg == "--test-config-file" and i + 1 < len(sys.argv):
return sys.argv[i + 1]
if arg.startswith("--test-config-file="):
return arg.split("=", 1)[1]
return None


_PERF_TESTS_DIR = Path(__file__).resolve().parent.parent / "tests"
_DEFAULT_CONFIG_FILE = str(_PERF_TESTS_DIR / "test_qwen_omni.json")

CONFIG_FILE_PATH = _get_config_file_from_argv()
if CONFIG_FILE_PATH is None:
print(
"No --test-config-file in argv, using default: tests/dfx/perf/tests/test_qwen_omni.json "
"(override with e.g. --test-config-file tests/dfx/perf/tests/test_tts.json)"
)
CONFIG_FILE_PATH = _DEFAULT_CONFIG_FILE

BENCHMARK_CONFIGS = load_configs(CONFIG_FILE_PATH)

STAGE_CONFIGS_DIR = Path(__file__).parent.parent / "stage_configs"
test_params = create_unique_server_params(BENCHMARK_CONFIGS, STAGE_CONFIGS_DIR)
@@ -44,7 +64,7 @@ def omni_server(request):

print(f"Starting OmniServer with test: {test_name}, model: {model}")

server_args = ["--stage-init-timeout", str(STAGE_INIT_TIMEOUT), "--init-timeout", "900"]
server_args = ["--stage-init-timeout", "300", "--init-timeout", "900"]
if stage_config_path:
server_args = ["--stage-configs-path", stage_config_path] + server_args
with OmniServer(model, server_args) as server:
@@ -97,8 +117,6 @@ def run_benchmark(
["vllm", "bench", "serve", "--omni"]
+ args
+ [
"--num-warmups",
"2",
"--save-result",
"--result-dir",
os.environ.get("BENCHMARK_DIR", "tests"),
@@ -141,7 +159,6 @@ def run_benchmark(
result["random_output_len"] = random_output_len
with open(result_path, "w", encoding="utf-8") as f:
json.dump(result, f, ensure_ascii=False, indent=2)

return result


@@ -207,10 +224,6 @@ def _resolve_baseline_value(
f"or request_rate={request_rate!r}; keys={list(baseline_raw.keys())!r}"
)
if isinstance(baseline_raw, (list, tuple)):
if sweep_index is None:
raise ValueError("list baseline requires sweep_index")
if not (0 <= sweep_index < len(baseline_raw)):
raise IndexError(f"baseline list len={len(baseline_raw)} has no index {sweep_index}")
return baseline_raw[sweep_index]
return baseline_raw

@@ -245,14 +258,14 @@ def assert_result(
) -> None:
assert result["completed"] == num_prompt, "Request failures exist"
baseline_data = params.get("baseline", {})
thresholds = _baseline_thresholds_for_step(
baseline_data,
sweep_index=sweep_index,
max_concurrency=max_concurrency,
request_rate=request_rate,
)
for metric_name, baseline_value in thresholds.items():
for metric_name, baseline_raw in baseline_data.items():
current_value = result[metric_name]
baseline_value = _resolve_baseline_value(
baseline_raw,
sweep_index=sweep_index,
max_concurrency=max_concurrency,
request_rate=request_rate,
)
if "throughput" in metric_name:
if current_value <= baseline_value:
print(
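The import-time `sys.argv` scan exists because `BENCHMARK_CONFIGS` feeds collection-time parametrization, which runs before pytest has parsed its own options. A condensed, self-contained sketch of the pattern, using `json.load` as a stand-in for the module's `load_configs` helper:

```python
import json
import sys

import pytest


def _config_from_argv(default: str) -> str:
    # Parametrize is evaluated at collection time, before pytest option
    # parsing, so the flag must be read straight from sys.argv.
    for i, arg in enumerate(sys.argv):
        if arg == "--test-config-file" and i + 1 < len(sys.argv):
            return sys.argv[i + 1]
        if arg.startswith("--test-config-file="):
            return arg.split("=", 1)[1]
    return default


with open(_config_from_argv("tests/dfx/perf/tests/test_qwen_omni.json")) as f:
    CONFIGS = json.load(f)


@pytest.mark.parametrize("cfg", CONFIGS, ids=lambda c: c["test_name"])
def test_benchmark_smoke(cfg):  # parameters are built at import, hence the argv scan
    assert "benchmark_params" in cfg
```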
25 changes: 6 additions & 19 deletions tests/dfx/perf/scripts/run_diffusion_benchmark.py
@@ -5,8 +5,8 @@
- vllm-omni (default): starts DiffusionServer via vllm_omni.entrypoints.cli.main,
benchmarks with diffusion_benchmark_serving.py --backend vllm-omni

A config JSON file is REQUIRED via --config-file:
pytest run_diffusion_benchmark.py --config-file tests/dfx/perf/tests/test_qwen_image_vllm_omni.json
A config JSON file is REQUIRED via --test-config-file:
pytest run_diffusion_benchmark.py --test-config-file tests/dfx/perf/tests/test_qwen_image_vllm_omni.json

JSON config entries use a "server_type" field, and this runner executes
the vllm-omni path.
@@ -55,16 +55,16 @@


def _get_config_file_from_argv() -> str | None:
"""Read --config-file from sys.argv at import time so pytest parametrize can use it.
"""Read --test-config-file from sys.argv at import time so pytest parametrize can use it.

pytest_addoption (below) registers the same flag so pytest does not reject it.
Supports both ``--config-file path`` and ``--config-file=path`` forms.
Supports both ``--test-config-file path`` and ``--test-config-file=path`` forms.
Returns None if the flag is not present; callers must handle the missing case.
"""
for i, arg in enumerate(sys.argv):
if arg == "--config-file" and i + 1 < len(sys.argv):
if arg == "--test-config-file" and i + 1 < len(sys.argv):
return sys.argv[i + 1]
if arg.startswith("--config-file="):
if arg.startswith("--test-config-file="):
return arg.split("=", 1)[1]
return None

@@ -133,19 +133,6 @@ def _append_to_aggregated_file(record: dict[str, Any]) -> None:
json.dump(records, f, indent=2, ensure_ascii=False)


# Register --config-file with pytest so it does not reject the argument.
def pytest_addoption(parser: pytest.Parser) -> None:
parser.addoption(
"--config-file",
action="store",
default=None,
help=(
"Path to the benchmark config JSON file (required). "
"Example: --config-file tests/dfx/perf/tests/test_qwen_image_vllm_omni.json"
),
)


_server_lock = threading.Lock()

# ---------------------------------------------------------------------------
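Worth noting on the relocation: pytest generally honors `pytest_addoption` only when it is defined in a `conftest.py` file or a plugin, not in a test module, so hoisting the registration into the shared `tests/dfx/conftest.py` both makes the flag reliably recognized and lets `run_benchmark.py` and `run_diffusion_benchmark.py` consume a single, consistently named `--test-config-file` option.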
@@ -329,37 +329,5 @@
}
}
]
},
{
"test_name": "test_qwen3_tts",
"server_params": {
"model": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice"
},
"benchmark_params": [
{
"dataset_name": "random",
"backend": "openai-audio-speech",
"endpoint": "/v1/audio/speech",
"num_prompts": [
10,
40
],
"max_concurrency": [
1,
4
],
"random_input_len": 100,
"random_output_len": 100,
"extra_body": {
"voice": "Vivian",
"language": "English"
},
"percentile-metrics": "ttft,e2el,audio_rtf,audio_ttfp,audio_duration",
"baseline": {
"mean_audio_ttfp_ms": [6000, 6000],
"mean_audio_rtf": [0.3, 0.3]
}
}
]
}
]
34 changes: 34 additions & 0 deletions tests/dfx/perf/tests/test_tts.json
@@ -0,0 +1,34 @@
[
{
"test_name": "test_qwen3_tts",
"server_params": {
"model": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice"
},
"benchmark_params": [
{
"dataset_name": "random",
"backend": "openai-audio-speech",
"endpoint": "/v1/audio/speech",
"num_prompts": [
10,
40
],
"max_concurrency": [
1,
4
],
"random_input_len": 100,
"random_output_len": 100,
"extra_body": {
"voice": "Vivian",
"language": "English"
},
"percentile-metrics": "ttft,e2el,audio_rtf,audio_ttfp,audio_duration",
"baseline": {
"mean_audio_ttfp_ms": [6000, 6000],
"mean_audio_rtf": [0.3, 0.3]
}
}
]
}
]
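As a reading aid for the baseline lists: `_resolve_baseline_value` indexes them by sweep step, so the two-element lists pair up with the two `num_prompts`/`max_concurrency` steps above. A small illustrative sketch (the comparison direction per metric is inferred from `assert_result`, where only throughput-style metrics are treated as lower bounds):

```python
# Sketch: how per-step thresholds line up with the sweep in test_tts.json.
num_prompts = [10, 40]
max_concurrency = [1, 4]
baseline = {
    "mean_audio_ttfp_ms": [6000, 6000],
    "mean_audio_rtf": [0.3, 0.3],
}

for step in range(len(num_prompts)):
    for metric, per_step in baseline.items():
        print(
            f"step {step} (num_prompts={num_prompts[step]}, "
            f"max_concurrency={max_concurrency[step]}): "
            f"{metric} checked against {per_step[step]}"
        )
```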