diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index 2466bc71ea8..c67a9c7f40c 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -12,8 +12,8 @@ PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTT
 Essential Elements of an Effective PR Description Checklist
 - [ ] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
-- [ ] The test plan, such as providing test command.
-- [ ] The test results, such as pasting the results comparison before and after, or e2e results
+- [ ] The test plan. Please provide the test scripts and test commands. If your code does not require additional test scripts, please state the reason. For test file guidelines, please check the [test style doc](https://docs.vllm.ai/projects/vllm-omni/en/latest/contributing/ci/tests_style/).
+- [ ] The test results. Please paste a comparison of the results before and after, or the e2e results.
 - [ ] (Optional) The necessary documentation update, such as updating `supported_models.md` and `examples` for a new model. **Please run `mkdocs serve` to sync the documentation editions to `./docs`.**
 - [ ] (Optional) Release notes update. If your change is user facing, please update the release notes draft.
diff --git a/docs/contributing/ci/CI_5levels.md b/docs/contributing/ci/CI_5levels.md
new file mode 100644
index 00000000000..dbf87d35289
--- /dev/null
+++ b/docs/contributing/ci/CI_5levels.md
@@ -0,0 +1,678 @@
+# Multi-Level Automated Testing System Documentation
+
+## Document Overview
+
+This testing system aims to build a complete, efficient, and well-structured quality assurance framework for the development, integration, and release of model services.
It draws on the concept of the test pyramid from modern software engineering, progressively expanding testing activities from basic code logic verification to complex end-to-end (E2E) functionality, performance, accuracy, and even long-term stability validation.
+
+Through five levels (L1-L5) and the Common specifications, the system clarifies the testing objectives, scope, execution frequency, and required resources for different development stages (e.g., each commit, PR merge, daily build, pre-release). This ensures that models meet high standards for functionality, performance, and reliability across various deployment scenarios (online serving and offline inference).
+
+| Level | Scope & Focus | Time Cost | Test Dir | Doc | Frequency | Hardware |
+| --- | --- | --- | --- | --- | --- | --- |
+| Common | Contribution Guideline & PR checklist | / | `/.github/PULL_REQUEST_TEMPLATE.md` | Test Style (PR Checklist) | / | / |
+| | CI Failure Description | / | / | CI Failures | / | / |
+| L1<br>(Unit & Logic) | Unit tests for components like entrypoints, models | <15min | `/tests/{component_name}/test_xxx` | Chapter 1, Section 1<br>L1&L2: Purpose, Test Content, Directory Location, Example | PR with ready label (also can run locally) | CPU |
+| L2<br>(E2E across models & GPU-required UT) | Online & Offline (basic deployment scenarios):<br>dummy, normal inference function (output format, stream), some instance startup UT | | `/tests/e2e/online_serving/test_{model_name}.py`<br>`/tests/e2e/offline_inference/test_{model_name}.py` | Chapter 1<br>L1&L2: Purpose, Test Content, Directory Location, Example | PR with ready label | GPU |
+| L3<br>(Important Perf & Integration & Accuracy) | Online & Offline (multiple deployment scenarios):<br>real model, normal inference function, normal accuracy | <30min | `/tests/e2e/online_serving/test_{model_name}_expansion.py`<br>`/tests/e2e/offline_inference/test_{model_name}_expansion.py` | Chapter 2<br>L3: Purpose, Test Content, Directory Location, Example | PR Merged (Also run L1&L2 Tests) | GPU |
+| L4<br>(Perf & Integration & Accuracy) | Online & Offline: full functional scenarios + performance test + doc test | <3 hour | Full Function:<br>`/tests/e2e/online_serving/test_{model_name}_expansion.py`<br>`/tests/e2e/offline_inference/test_{model_name}_expansion.py`<br>Performance:<br>`/tests/e2e/perf/nightly.json`<br>Doc Test:<br>`tests/example/online_serving/test_{model_name}.py`<br>`tests/example/offline_inference/test_{model_name}.py` | Chapter 3<br>L4: Purpose, Test Content, Directory Location, Example | Nightly | GPU |
+| L5<br>(Stability & Reliability) | Online & Offline: long-term stability test + reliability test | Depends on reality | Stability:<br>`tests/e2e/stability/weekly.json`<br>Reliability:<br>`tests/e2e/reliability/test_{model_name}.py` | Chapter 4<br>L5: Purpose, Test Content, Directory Location, Example | Weekly / Days before Release | GPU |
+ + +--- +
+ The folder structure for tests file based on the 5 levels design +Legend: `✅` = test exists, `⬜` = suggested to add. +``` +vllm_omni/ tests/ +├── config/ → ├── config/ +│ ├── model.py │ └── test_model.py ⬜ +│ └── lora.py │ └── test_lora.py ⬜ +│ +├── core/ → ├── core/ +│ └── sched/ │ └── sched/ +│ ├── omni_ar_scheduler.py │ ├── test_omni_ar_scheduler.py ⬜ +│ ├── omni_generation_scheduler.py │ ├── test_omni_generation_scheduler.py ⬜ +│ └── output.py │ └── test_output.py ✅ currently in entrypoints/test_omni_new_request_data.py (tests output.OmniNewRequestData) +│ +├── diffusion/ → ├── diffusion/ +│ ├── diffusion_engine.py │ ├── test_diffusion_engine.py ⬜ +│ ├── attention/ │ ├── attention/ +│ │ ├── layer.py │ │ ├── test_attention_sp.py ✅ +│ │ └── backends/ │ │ └── test_flash_attn.py ✅ +│ ├── distributed/ │ ├── distributed/ +│ │ └── ... │ │ ├── test_comm.py ✅ +│ │ │ │ ├── test_cfg_parallel.py ✅ +│ │ │ │ └── test_sp_plan_hooks.py ✅ +│ ├── lora/ │ ├── lora/ +│ │ └── ... │ │ ├── test_base_linear.py ✅ +│ │ │ │ └── test_lora_manager.py ✅ +│ ├── models/ │ ├── models/ +│ │ ├── qwen_image/ │ │ ├── qwen_image/ (e2e coverage) +│ │ ├── z_image/ │ │ └── z_image/ +│ │ └── ... 
│ │ └── test_zimage_tp_constraints.py ✅ +│ └── worker/ │ └── worker/ +│ ├── diffusion_worker.py │ └── test_diffusion_worker.py ✅ file at diffusion/test_diffusion_worker.py +│ └── diffusion_model_runner.py │ +│ +├── distributed/ → ├── distributed/ +│ └── omni_connectors/ │ └── omni_connectors/ +│ ├── adapter.py │ ├── test_adapter_and_flow.py ✅ +│ ├── kv_transfer_manager.py │ ├── test_basic_connectors.py ✅ +│ ├── connectors/ │ ├── test_kv_flow.py ✅ +│ └── utils/ │ └── test_omni_connector_configs.py ✅ +│ +├── engine/ → ├── engine/ +│ ├── input_processor.py │ ├── test_input_processor.py ⬜ (no processor.py in source) +│ ├── output_processor.py │ └── test_output_processor.py ⬜ +│ └── arg_utils.py │ └── test_arg_utils.py ⬜ +│ +├── entrypoints/ → ├── entrypoints/ +│ ├── omni.py │ ├── test_omni.py ⬜ (E2E covered by e2e/offline, e2e/online) +│ ├── omni_llm.py │ ├── test_omni_llm.py ✅ +│ ├── omni_stage.py │ ├── test_omni_stage.py ⬜ (partial in test_omni_stage_diffusion_config.py) +│ ├── omni_diffusion.py │ ├── test_omni_diffusion.py ✅ +│ ├── async_omni.py │ ├── test_async_omni.py ✅ actually in e2e/online_serving/test_async_omni.py +│ ├── async_omni_diffusion.py │ ├── test_async_omni_diffusion_config.py ✅ +│ ├── stage_utils.py │ ├── test_stage_utils.py ✅ +│ ├── cli/ │ ├── cli/ (benchmarks/test_serve_cli.py covers CLI serve) +│ │ └── ... 
│ │ └── test_*.py ⬜ +│ └── openai/ │ └── openai_api/ # maps to entrypoints/openai/ +│ ├── api_server.py │ ├── test_api_server.py ⬜ (e2e indirect coverage) +│ ├── serving_chat.py │ ├── test_serving_chat_sampling_params.py ✅ +│ ├── serving_speech.py │ ├── test_serving_speech.py ✅ +│ └── image_api_utils.py │ └── test_image_server.py ✅ +│ +├── inputs/ → ├── inputs/ +│ ├── data.py │ ├── test_data.py ⬜ +│ ├── parse.py │ ├── test_parse.py ⬜ +│ └── preprocess.py │ └── test_preprocess.py ✅ currently in entrypoints/test_omni_input_preprocessor.py +│ +├── model_executor/ → ├── model_executor/ +│ ├── layers/ │ ├── layers/ +│ │ └── mrope.py │ │ └── test_mrope.py ⬜ +│ ├── model_loader/ │ ├── model_loader/ +│ │ └── weight_utils.py │ │ └── test_weight_utils.py ⬜ +│ ├── models/ │ ├── models/ +│ │ ├── qwen2_5_omni/ │ │ ├── qwen2_5_omni/ +│ │ │ ├── qwen2_5_omni_thinker.py │ │ │ ├── test_audio_length.py ✅ +│ │ │ ├── qwen2_5_omni_talker.py │ │ │ ├── test_qwen2_5_omni_thinker.py ⬜ +│ │ │ └── qwen2_5_omni_token2wav.py │ │ │ ├── test_qwen2_5_omni_talker.py ⬜ +│ │ └── qwen3_omni/ │ │ │ └── test_qwen2_5_omni_token2wav.py ⬜ +│ │ └── ... │ │ └── qwen3_omni/ +│ ├── stage_configs/ │ │ └── test_*.py ⬜ +│ │ └── *.yaml │ └── stage_configs/ (used by e2e, test_*.py can be added) ⬜ +│ └── stage_input_processors/ │ └── stage_input_processors/ +│ └── ... 
│ └── test_*.py ⬜ +│ +├── sample/ → ├── sample/ +│ └── __init__.py │ └── test_*.py ⬜ +│ +├── utils/ → ├── utils/ +│ └── __init__.py │ └── test_*.py ⬜ (no platform_utils.py currently) +│ +├── worker/ → ├── worker/ +│ ├── gpu_ar_model_runner.py │ ├── test_gpu_ar_model_runner.py ⬜ +│ ├── gpu_ar_worker.py │ ├── test_gpu_ar_worker.py ⬜ +│ ├── gpu_generation_model_runner.py │ ├── test_gpu_generation_model_runner.py ✅ +│ ├── gpu_generation_worker.py │ ├── test_gpu_generation_worker.py ⬜ +│ ├── gpu_model_runner.py │ ├── test_omni_gpu_model_runner.py ✅ +│ └── mixins.py │ └── (npu under platforms/npu/worker/) # not worker/npu/ +│ +├── platforms/ → (no tests/platforms/, e2e and stage_configs provide indirect coverage) +│ ├── cuda/ +│ ├── npu/worker/ # NPU worker here, not vllm_omni/worker/npu/ +│ ├── rocm/ +│ └── xpu/worker/ +│ +├── outputs.py → test_outputs.py ✅ (at tests root) +├── (logger, patch, request, version) → (no corresponding unit test) +│ +└── e2e (tests side only) → ├── e2e/ + ├── online_serving/ ✅ non-empty + │ ├── test_async_omni.py + │ ├── test_qwen3_omni.py + │ ├── test_qwen3_omni_expansion.py + │ ├── test_image_gen_edit.py + │ ├── test_images_generations_lora.py + │ └── stage_configs/ + └── offline_inference/ ✅ + ├── test_qwen2_5_omni.py + ├── test_qwen3_omni.py + ├── test_bagel_text2img.py + ├── test_t2i_model.py + ├── test_t2v_model.py + ├── test_ovis_image.py + ├── test_zimage_tensor_parallel.py + ├── test_cache_dit.py + ├── test_teacache.py + ├── test_stable_audio_model.py + ├── test_diffusion_cpu_offload.py + ├── test_diffusion_layerwise_offload.py + ├── test_diffusion_lora.py + ├── test_sequence_parallel.py + └── stage_configs/ + ├── qwen2_5_omni_ci.yaml + ├── qwen3_omni_ci.yaml + ├── bagel_*.yaml + └── npu/, rocm/, etc. +``` + + +
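The source-to-test mapping in the tree can be read as a simple path rule. The sketch below is illustrative only: `suggested_test_path` is a hypothetical helper, not part of the repository, and it assumes the plain `tests/{component}/test_{module}.py` convention. The tree lists several exceptions (for example `entrypoints/openai/` maps to `openai_api/`) that this sketch does not handle.

```python
from pathlib import PurePosixPath


def suggested_test_path(source_path: str) -> str:
    """Map a vllm_omni source file to its suggested unit-test file.

    Hypothetical helper for illustration; it applies only the plain
    tests/{component}/test_{module}.py rule and ignores the exceptions
    noted in the folder tree.
    """
    path = PurePosixPath(source_path)
    if not path.parts or path.parts[0] != "vllm_omni" or path.suffix != ".py":
        raise ValueError(f"not a vllm_omni Python source file: {source_path}")
    # Replace the package root with "tests", keep the sub-directories,
    # and prefix the module file name with "test_".
    return str(PurePosixPath("tests", *path.parts[1:-1], f"test_{path.name}"))


print(suggested_test_path("vllm_omni/core/sched/output.py"))
```

Under this rule a top-level module such as `vllm_omni/outputs.py` maps to the tests root, which matches the `test_outputs.py` entry in the tree.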
+ + +## Common Specifications + +Before entering specific testing levels, the project establishes two common specifications aimed at standardizing the development process and quickly locating issues. + +1. ***PR Checklist ([Tests Style](../ci/tests_style.md))***: This template defines the self-check items that must be completed before submitting a code review (Pull Request). It ensures that each code change meets basic requirements such as code style, dependency updates, and documentation synchronization before entering the automated testing pipeline, serving as the first manual line of defense for quality assurance. +2. ***CI Failure Explanation ([CI Failures](../ci/failures.md))***: This document archives and explains common failure patterns in the Continuous Integration (CI) pipeline, error log interpretation, and preliminary troubleshooting steps. It helps developers and testers quickly diagnose the causes of automated test failures, improving problem-solving efficiency. + +## Chapter 1: L1 & L2 Level Testing - Unit Testing and Basic End-to-End Verification + +### 1.1 Testing Purpose + +L1 and L2 level testing form the foundation of the quality assurance system. L1 level testing focuses on verifying the internal logic correctness of code units (e.g., functions, classes), ensuring each independent component behaves as designed. + +L2 level testing builds upon L1 by introducing GPU resources and verifying that the end-to-end (E2E) process of the model in basic deployment scenarios is smooth. For example, it uses dummy models to confirm that core interfaces like the inference pipeline, output format, and streaming response work properly. The common goal of these two levels is to provide developers with rapid feedback, discovering and fixing issues early in the development cycle . + + + +### 1.2 Testing Content and Scope + +- ***L1 (Unit & Logic Testing)***: +- - ***Scope***: Tests internal functions and methods of core components such as `entrypoints`, `models`. 
+ - ***Focus***: Branch coverage, exception handling, algorithm logic correctness. Does not involve external dependencies or the complete service stack. + - ***Time Cost***: Execution time is controlled within ***15 minutes*** to ensure fast feedback. +- ***L2 (Basic End-to-End Testing)***: +- - ***Scope***: Covers two basic deployment scenarios: `online` (serving) and `offline` (inference). + - ***Focus***: Uses `dummy` models or lightweight real models to verify that the entire chain from request input to result output works normally, including output data structure, streaming (stream) support, etc. Also includes some unit tests that require launching independent service instances. + - ***Characteristic***: Requires ***GPU*** resources to perform model computations. + +### 1.3 Test Directory and Execution Files + +A clear directory structure is key to managing test cases efficiently. + +- ***L1 Test Directory***: `/tests/{component_name}/test_xxx.py` +- - Here, `{component_name}` corresponds to modules in the source code, such as `distributed`, `entrypoints`, etc., and `test_xxx.py` is the specific test file. +- ***L2 Test Directory***: +- - Online Serving: `/tests/e2e/online_serving/test_{model_name}.py` + - Offline Inference: `/tests/e2e/offline_inference/test_{model_name}.py` + +### 1.4 Execution Method and Example + +- ***Trigger Timing***: **`PR with ready label`**. That is, when a developer adds a "ready for review" or similar label to a PR on platforms like GitHub, L1 and L2 tests are automatically triggered. +- ***Execution Environment***: L1 uses ***CPU*** environment; L2 requires ***GPU*** environment. +- ***Script Example***: + +
+ L1 Test Examples + +Examples from `tests/model_executor/models/qwen2_5_omni/test_audio_length.py` +```python +# SPDX-License-Identifier: Apache-2.0 +# SPDX-FileCopyrightText: Copyright contributors to the vLLM project + +import pytest + +pytestmark = [pytest.mark.core_model, pytest.mark.cpu] + +def test_resolve_max_mel_frames_default(): + from vllm_omni.model_executor.models.qwen2_5_omni.audio_length import resolve_max_mel_frames + + assert resolve_max_mel_frames(None, default=30000) == 30000 + assert resolve_max_mel_frames(None, default=6000) == 6000 + + +def test_resolve_max_mel_frames_explicit(): + from vllm_omni.model_executor.models.qwen2_5_omni.audio_length import resolve_max_mel_frames + + # Explicit argument always wins over default + assert resolve_max_mel_frames(123, default=30000) == 123 + assert resolve_max_mel_frames(6000, default=30000) == 6000 + assert resolve_max_mel_frames(0, default=30000) == 0 + + +@pytest.mark.parametrize("repeats", [2, 4]) +@pytest.mark.parametrize("code_len", [0, 1, 32768]) +@pytest.mark.parametrize("max_mel_frames", [None, -1, 0, 1, 6000, 30000]) +def test_cap_and_align_mel_length_no_mismatch(repeats, code_len, max_mel_frames): + """Guard that any max_mel_frames yields a mel length aligned to repeats, and + consistent with the truncated code length (prevents concat mismatch). 
+ """ + from vllm_omni.model_executor.models.qwen2_5_omni.audio_length import cap_and_align_mel_length + + target_code_len, target_mel_len = cap_and_align_mel_length( + code_len=code_len, + repeats=repeats, + max_mel_frames=max_mel_frames, + ) + + assert isinstance(target_code_len, int) + assert isinstance(target_mel_len, int) + + if code_len == 0: + assert target_code_len == 0 + assert target_mel_len == 0 + return + + assert target_code_len >= 1 + assert target_mel_len >= repeats + assert target_mel_len % repeats == 0 + assert target_mel_len == target_code_len * repeats + assert target_code_len <= code_len + + if max_mel_frames is not None and int(max_mel_frames) > 0 and int(max_mel_frames) >= repeats: + assert target_mel_len <= int(max_mel_frames) +``` +
+ +
+ L2 Test Examples +You can refer to Test Examples in Chapter 2 to see example test cases that incorporate both L2 and L3 testing logic. +
+
+- - ***Run Command***:
+
+  `pytest -s -v /tests/e2e/online_serving/test_{model_name}.py`
+  `pytest -s -v -m 'core_model and cpu' --run-level=core_model`
+
+## Chapter 2: L3 Level Testing - Core Integration, Performance, and Accuracy Verification
+
+### 2.1 Testing Purpose
+
+L3 level testing executes after code is merged into the main branch. Its core purpose is to verify the integration effect, key performance indicators, and output accuracy of ***real models*** in ***multiple deployment scenarios***. It acts as the "quality gatekeeper" for the main branch, ensuring that no merge breaks the core capabilities of the model service. Testing needs to provide clear conclusions within a relatively short time (<30min), balancing test depth with feedback speed.
+
+### 2.2 Testing Content and Scope
+
+- ***Deployment Scenarios***: Covers richer `online` and `offline` deployment configurations, which may include different hardware configurations, batch sizes, concurrency levels, etc.
+- ***Core Verification***:
+- 1. ***Inference Functionality***: Ensures real models can perform forward computation normally and return results.
+  2. ***Accuracy Compliance***: Verifies that the model's evaluation metrics (e.g., accuracy) meet the expected baseline, preventing code changes from introducing accuracy issues.
+  3. ***Important Performance***: Verifies whether performance (e.g., P99 latency, throughput) in core scenarios meets preset thresholds.
+
+### 2.3 Test Directory and Execution Files
+
+- ***Functional Testing***:
+- - Online Serving: `/tests/e2e/online_serving/test_{model_name}_expansion.py`
+  - Offline Inference: `/tests/e2e/offline_inference/test_{model_name}_expansion.py`
+  - (Note: `_expansion.py` likely means it contains more comprehensive scenario cases compared to L2 tests).
+
+### 2.4 Execution Method and Example
+
+- ***Trigger Timing***: **`PR Merged`**. Automatically triggered after code review is approved and merged into the main branch.
+- ***Execution Environment***: ***GPU*** servers. +- ***Script Example***: + +???+ example "Test Examples" + + **2.4.1 Mark Declaration Section** + + ```python + @pytest.mark.advanced_model + @pytest.mark.core_model + @pytest.mark.parametrize("omni_server", test_params, indirect=True) + ``` + + **Explanation**: + + @pytest.mark.advanced_model: Marks the test as L3 or L4 level, indicating that this test case performs deep validation, using real models for performance, integration, and accuracy testing. This forms a "basic-advanced" correspondence with the core_model mark at the L2 level. + + @pytest.mark.core_model: Marks the test as L1 or L2 level, indicating that this test case validates the basic functionality of the core model. It uses mock weights and only checks if the relevant interface functions correctly. + + @pytest.mark.parametrize: A parameterization decorator that allows abstracting test data into parameters, enabling reuse of the same test logic across different data configurations. indirect=True indicates that parameters will be passed to the fixture for processing. + + **Notes**: If you believe the test case only needs to execute basic run logic at the PR-level CI, you can mark it only with @pytest.mark.core_model. If you believe it only needs to execute deep validation run logic at the merge or nightly level, you can mark it only with @pytest.mark.advanced_model. If you believe the test case needs to accommodate both basic run and deep validation test logic, you should mark it with both @pytest.mark.core_model and @pytest.mark.advanced_model. + + **2.4.2 Test Function Definition and Documentation** + + ```python + def test_mix_to_text_audio_001(omni_server, openai_client) -> None: + """ + Test multi-modal input processing and text/audio output generation via OpenAI API. 
+ Deploy Setting: default yaml + Input Modal: text + audio + video + image + Output Modal: text + audio + Input Setting: stream=True + Datasets: single request + """ + ``` + + **Explanation**: + + **Function Naming Convention**: Uses the test_ prefix, describes the test scenario mix_to_text_audio, and the number 001 indicates the first test case for this scenario. + + **Parameter Explanation**: + + omni_server: Omni server instance obtained via fixture, containing model information and configuration. + + openai_client: Unified OpenAI client processing instance, encapsulating request sending and response validation logic. + + Docstring: Describes the test purpose, deployment settings, input/output modalities, streaming settings, and dataset type in detail, providing clear context for test maintenance. + + **2.4.3 Multimodal Data Preparation** + + ```python + video_data_url = f"data:video/mp4;base64,{generate_synthetic_video(224, 224, 300)['base64']}" + image_data_url = f"data:image/jpeg;base64,{generate_synthetic_image(224, 224)['base64']}" + audio_data_url = f"data:audio/wav;base64,{generate_synthetic_audio(5, 1)['base64']}" + ``` + + **Explanation**: + + **Data Generation Functions**: Use the generate_synthetic_* series of functions to generate synthetic test data, avoiding reliance on external resources and ensuring test reproducibility and stability. + + **Parameter Explanation**: + + Video: width, height, duration_frames + + Image: width, height + + Audio: duration_seconds, channels + + **2.4.4 Request Configuration and Keyword Validation** + + ```python + request_config = { + "model": omni_server.model, + "messages": messages, + "stream": True, + "key_words": { + "audio": ["water", "cricket"], + "video": ["sphere", "globe", "circle", "round"], + "image": ["square", "quadrate"], + "text": ["beijing"] + }, + } + ``` + + **Explanation**: + + **Model Specification**: Uses omni_server.model to ensure the test aligns with the model configured on the server. 
+ + **Keyword Validation Mechanism**: This is an innovative design of the template to address the specific needs of multimodal testing: + + Audio Keywords: Validate whether the generated text's description of audio content contains expected elements (e.g., "water" for water sounds, "cricket" for cricket sounds). If you provide multiple keywords, the validation is considered successful if at least one keyword is present. + + **Video Keywords**: Validate whether the generated text's description of video content contains expected elements. If you provide multiple keywords, the validation is considered successful if at least one keyword is present. + + Image Keywords: Validate whether the generated text's description of image content contains expected elements. If you provide multiple keywords, the validation is considered successful if at least one keyword is present. + + Text Keywords: Validate whether the generated text contains expected elements. If you provide multiple keywords, the validation is considered successful if at least one keyword is present. + + **2.4.5 Request Execution** + + ```python + openai_client.send_request(request_config, request_num=1) + ``` + + **Explanation**: + + **Unified Client**: Uses the OpenAIClientHandler instance to send requests. This client encapsulates error handling, retry mechanisms, and response validation logic. + + **Single Request**: The comment clearly states this is a single-request completion test. For concurrent testing, it can be extended to multiple requests using request_num = n. + + **Implicit Validation**: The send_request method internally includes validation logic dynamically selected based on the --run-level parameter: core_model performs basic validation, while advanced_model performs deep validation. 
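The "at least one keyword per modality" rule described in 2.4.4 can be sketched as follows. This is a minimal illustration, not the actual `OpenAIClientHandler` logic: `keywords_satisfied` is a hypothetical name, and case-insensitive substring matching is an assumption.

```python
def keywords_satisfied(generated_text: str, key_words: dict[str, list[str]]) -> dict[str, bool]:
    """Report, per modality, whether at least one expected keyword appears.

    Hypothetical sketch: matching is a case-insensitive substring check,
    which is an assumption rather than the project's confirmed behavior.
    """
    lowered = generated_text.lower()
    return {
        modality: any(keyword.lower() in lowered for keyword in keywords)
        for modality, keywords in key_words.items()
    }


result = keywords_satisfied(
    "A round globe spins while water flows; the capital is Beijing.",
    {
        "audio": ["water", "cricket"],
        "video": ["sphere", "globe", "circle", "round"],
        "image": ["square", "quadrate"],
        "text": ["beijing"],
    },
)
print(result)
```

In this sample the `image` entry is the only one that fails, because neither "square" nor "quadrate" occurs in the text. A test would typically assert that every modality present in the request is satisfied.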
+ +- ***Run Command***: `pytest -s -v /tests/e2e/online_serving/test_{model_name}.py -m advanced_model --run-level=advanced_model` + +## Chapter 3: L4 Level Testing - Full Functionality, Performance, and Documentation Testing + +### 3.1 Testing Purpose + +L4 level testing is a comprehensive quality audit before a version release. It expands upon L3, executing ***full*** functional scenarios, conducting systematic ***performance stress tests***, and simultaneously verifying the correctness of accompanying ***example documentation***. Its purpose is to perform deep validation of the system during off-peak nighttime hours, providing quality trend reports for daytime development and data support for release decisions. + + + +### 3.2 Testing Content and Scope + +- ***Full Functionality Testing***: Executes all test cases defined in `test_{model_name}_expansion.py`, covering all implemented features, positive flows, boundary conditions, and exception handling. +- ***Performance Testing***: Uses the `/tests/e2e/perf/nightly.json` configuration file to drive performance testing tools for stress, load, and endurance tests, collecting metrics like throughput, response time, and resource utilization. +- ***Documentation Testing***: Verifies whether the example code provided to users is runnable and its results match the description. + +### 3.3 Test Directory and Execution Files + +- ***Functional Testing***: Same directories as L3. +- ***Performance Test Configuration***: `/tests/e2e/perf/nightly.json` +- ***Documentation Example Tests***: +- - `tests/example/online_serving/test_{model_name}.py` + - `tests/example/offline_inference/test_{model_name}.py` + +### 3.4 Execution Method and Example + +- ***Trigger Timing***: **`Nightly`**, automatically executed every night. +- ***Execution Environment***: ***GPU*** server clusters to meet the resource demands of performance testing. +- ***Script Example***: +
+ Test Examples +When you want to add L4-level performance test cases, you can refer to the following format for case addition in tests/perf/tests/test.json: + +```JSON +{ + "test_name": "test_qwen3_omni", + "server_params": { + "model": "Qwen/Qwen3-Omni-30B-A3B-Instruct", + "stage_config_name": "qwen3_omni.yaml" + }, + "benchmark_params": [ + { + "dataset_name": "random", + "num_prompts": [10, 20], + "request_rate": [0.5, 1], + "random_input_len": 2500, + "random_output_len": 900, + "ignore_eos": true, + "percentile-metrics": "ttft,tpot,itl,e2el,audio_rtf,audio_ttfp,audio_duration", + "baseline": { + "mean_ttft_ms": 100000, + "mean_audio_ttfp_ms": 100000, + "mean_audio_rtf": 100000 + } + } + ] +} +``` + +#### Parameter Explanation + +***Overview*** + +| Field | Required | Description | +| ---------------- | -------- | --------------------------------------------------------------- | +| test_name | Yes | Unique identifier for the test case | +| server_params | Yes | Server-side configuration parameters | +| benchmark_params | Yes | Benchmark running parameters (supports multiple configurations) | + +#### server_params Configuration + +##### Basic Parameters + +| Parameter | Required | Example | Description | +| ----------------- | -------- | ---------------------------------- | ----------------------------- | +| model | Yes | "Qwen/Qwen3-Omni-30B-A3B-Instruct" | Model name or path | +| stage_config_name | Yes | "qwen3_omni.yaml" | Stage configuration file name | + +##### Dynamic Configuration (update/delete) + +Supports incremental modifications based on the basic configuration: + +| Operation | Description | +| --------- | ------------------------------------ | +| update | Update or add configuration items | +| delete | Delete specified configuration items | + +***Example***: +``` +"update": { + "async_chunk": true, // Enable asynchronous chunk processing + "stage_args": { + "0": { + "engine_args.custom_process_next_stage_input_func": 
"vllm_omni.model_executor.stage_input_processors.qwen3_omni.thinker2talker_async_chunk" + } + } +}, +"delete": { + "stage_args": { + "2": ["custom_process_input_func"] // Delete this configuration for stage 2 + } +} +``` +#### benchmark_params Configuration + +You can add any benchmark running parameters you need here. For all optional parameters, refer to the [benchmark documentation](https://github.com/vllm-project/vllm-omni/blob/main/docs/cli/bench/serve.md). General modifications are as follows: + +1. Change the ---xxx-xx-xx running parameters to xxx_xx_xx format and fill them as keys in the JSON file. +2. For boolean variables in the running parameters, modify them to forms such as ignore_eos: true/false and fill them into the JSON file. +3. Add the baseline parameter to specify the required validation values, ensuring the validation metric names match those in the result.json generated by the benchmark. +4. The qps and concurrency modes are mutually exclusive. For detailed explanations, see the table below: + +| Parameter | Type | Required | Example/Values | Description | +| --------------- | ----------- | -------- | --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| num_prompts | int / array | Yes | 10,[10, 20, 30] | Number of requests. Supports single values or arrays. If a single value is used, it will be automatically expanded to match the number of qps or max_concurrency, e.g., [10,10,10]. If an array is used, its length must match the number of qps or max_concurrency. | +| request_rate | int / array | No | 1, [1, 2, 3] | Queries per second. Supports single values or arrays. If a single value is used, it will be automatically expanded to match the number of num_prompts, e.g., [1,1,1]. 
If an array is used, its length must match the number of num_prompts. |
+| max_concurrency | int / array | No | 1, [1, 2, 3] | Maximum number of concurrent requests. Supports single values or arrays. If a single value is used, it will be automatically expanded to match the number of num_prompts, e.g., [1,1,1]. If an array is used, its length must match the number of num_prompts. |
+
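The key-naming and expansion rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the benchmark runner's implementation: `to_json_key` and `align_sweep` are hypothetical names, and pairing the values position by position is an assumption about how the runs are formed.

```python
def to_json_key(cli_flag: str) -> str:
    """Turn a CLI flag such as '--random-input-len' into 'random_input_len'."""
    return cli_flag.lstrip("-").replace("-", "_")


def align_sweep(num_prompts, load):
    """Pair each num_prompts value with one load value.

    `load` stands for either request_rate or max_concurrency (the two
    modes are mutually exclusive). A scalar is repeated to match the
    array length; two arrays must have the same length.
    """
    lengths = {len(v) for v in (num_prompts, load) if isinstance(v, list)}
    if len(lengths) > 1:
        raise ValueError("num_prompts and load arrays must have the same length")
    runs = lengths.pop() if lengths else 1

    def expand(value):
        return list(value) if isinstance(value, list) else [value] * runs

    return list(zip(expand(num_prompts), expand(load)))


print(to_json_key("--random-input-len"))
print(align_sweep(10, [1, 2, 3]))
print(align_sweep([10, 20], [0.5, 1]))
```

With the sample configuration shown earlier (`num_prompts: [10, 20]`, `request_rate: [0.5, 1]`), this reading yields two runs, one per position in the arrays.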
+ +- - ***Run Command***: (Specific commands would depend on the performance testing tool and configuration defined in `nightly.json`). + +## Chapter 4: L5 Level Testing - Stability and Reliability Testing + +### 4.1 Testing Purpose + +L5 level testing focuses on the performance of model services under ***long-running*** and ***abnormal fault*** scenarios. It aims to uncover deep-seated issues that only manifest under sustained pressure or extreme conditions, such as memory leaks, resource contention, gradual performance degradation, and lack of fault tolerance mechanisms. This is the final, yet crucial, line of defense for ensuring service high availability and production environment robustness. + + + +### 4.2 Testing Content and Scope + +- ***Long-term Stability (Stability) Testing***: Uses the `tests/e2e/stability/weekly.json` configuration to run the service under moderate load for an extended period (e.g., over 12 hours), monitoring whether metrics like memory/VRAM usage, response time, and throughput degrade over time, and whether the service process remains stable. +- ***Reliability Testing***: Uses `tests/e2e/reliability/test_{model_name}.py` to actively simulate various fault and abnormal scenarios, such as: dependent service interruption, abnormal input data, network flicker, hardware resource preemption, etc., to verify the system's fault tolerance, self-healing, and graceful degradation capabilities. + +### 4.3 Test Directory and Execution Files + +- ***Stability Test Configuration***: `tests/e2e/stability/weekly.json` +- ***Reliability Test Suite***: `tests/e2e/reliability/test_{model_name}.py` + +### 4.4 Execution Method and Example + +- ***Trigger Timing***: **`Weekly`** (weekly) or **`Days before Release`** (several days before a major release). Due to long execution times, the frequency is lower. +- ***Execution Environment***: ***GPU*** servers, requiring a stable and exclusive testing environment. +- ***Script Example***: +
+ Test Examples +```python +# WIP +``` +
+
+- - ***Stability***: (Execution would be driven by the configuration in `weekly.json`).
+  - ***Reliability***: `pytest -s -v tests/e2e/reliability/test_{model_name}.py`
+
+## Summary
+
+This multi-level testing system achieves continuous, progressive validation of model service quality by tightly integrating testing activities with the development workflow (commit, review, merge, release). From rapid unit testing to comprehensive end-to-end testing, and further to in-depth performance, stability, and reliability verification, each level has clear objectives, collectively building a robust quality safety net. By following this system, teams can deliver high-quality, highly reliable model services more efficiently.
diff --git a/docs/contributing/ci/failures.md b/docs/contributing/ci/failures.md
index d64d98a605e..6e3c10ac8e0 100644
--- a/docs/contributing/ci/failures.md
+++ b/docs/contributing/ci/failures.md
@@ -1,4 +1,10 @@
 # CI Failures

-What should I do when a CI job fails on my PR, but I don't think my PR caused
-the failure?
+What should I do when a CI job fails on my PR, but I don't think my PR caused the failure?
+
+
+## Common Case 1:
+
+🚨 Error: The command was interrupted by a signal: signal: terminated
+
+Reason: The test was terminated because it exceeded the time limit. Rebuilding (re-running) the job sometimes lets it pass.
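To make the "signal: terminated" error concrete: a runner that enforces a time limit commonly stops the job by sending SIGTERM, and the job then exits because of that signal rather than because of a test failure. The sketch below reproduces this locally. It assumes a POSIX system; the helper name `run_with_time_limit`, the sleeping child process, and the 0.5 second limit are illustrative and are not taken from the project's CI configuration.

```python
import signal
import subprocess
import sys


def run_with_time_limit(cmd: list[str], limit_s: float) -> int:
    """Run cmd and send SIGTERM if it is still running after limit_s seconds."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=limit_s)
    except subprocess.TimeoutExpired:
        proc.terminate()  # sends SIGTERM on POSIX
        return proc.wait()


# A stand-in for a slow test job: a child interpreter that sleeps for 30 s.
slow_job = [sys.executable, "-c", "import time; time.sleep(30)"]
rc = run_with_time_limit(slow_job, limit_s=0.5)

# On POSIX, a negative return code -N means "killed by signal N".
if rc == -signal.SIGTERM:
    print("job was terminated by SIGTERM (time limit exceeded)")
```

If a job is killed this way while the tests themselves were passing, the failure points at the time budget or machine load, which is why a rebuild can succeed.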
diff --git a/docs/contributing/ci/tests_markers.md b/docs/contributing/ci/tests_markers.md
index 2afff846028..c9e5d18e602 100644
--- a/docs/contributing/ci/tests_markers.md
+++ b/docs/contributing/ci/tests_markers.md
@@ -7,7 +7,8 @@ Defined in `pyproject.toml`:

 | Marker | Description |
 | ------------------ | --------------------------------------------------------- |
-| `core_model` | Core model tests (run in each PR) |
+| `core_model` | L1 & L2 level tests (run in each PR) |
+| `advanced_model` | L3 & L4 level tests (run on each merge or nightly) |
 | `diffusion` | Diffusion model tests |
 | `omni` | Omni model tests |
 | `cache` | Cache backend tests |
diff --git a/docs/contributing/ci/tests_style.md b/docs/contributing/ci/tests_style.md
index 65c2b044346..ab94f2f5655 100644
--- a/docs/contributing/ci/tests_style.md
+++ b/docs/contributing/ci/tests_style.md
@@ -2,8 +2,18 @@

 To ensure project maintainability and sustainable development, we encourage contributors to submit test code (unit tests, system tests, or end-to-end tests) alongside their code changes. This document outlines the guidelines for organizing and naming test files.

+## Checklist before submitting your test files
+
+1. The file is saved in an appropriate place and the file name is clear.
+2. The coding style follows the requirements outlined below.
+3. All test functions have appropriate pytest markers.
+4. For tests that need to run in CI, please ensure they are labeled with `@pytest.mark.core_model` and configured under the `./buildkite/` folder.
+
+
 ## Test Types

+For more details, see our [five-level test design](../ci/CI_5levels.md).
+
 ### Unit Tests and System Tests

 For unit tests and system tests, we strongly recommend placing test files in the same directory structure as the source code being tested, using the naming convention `test_*.py`.
@@ -18,95 +28,141 @@ End-to-end tests verify the complete functionality of a system or component.
For ## Test Directory Structure -The ideal directory structure mirrors the source code organization: +The ideal directory structure mirrors the source code organization. Legend: `✅` = test exists, `⬜` = suggested to add. ``` -vllm_omni/ tests/ -├── config/ → ├── config/ -│ └── model.py │ └── test_model.py +vllm_omni/ tests/ +├── config/ → ├── config/ +│ ├── model.py │ └── test_model.py ⬜ +│ └── lora.py │ └── test_lora.py ⬜ +│ +├── core/ → ├── core/ +│ └── sched/ │ └── sched/ +│ ├── omni_ar_scheduler.py │ ├── test_omni_ar_scheduler.py ⬜ +│ ├── omni_generation_scheduler.py │ ├── test_omni_generation_scheduler.py ⬜ +│ └── output.py │ └── test_output.py ✅ currently in entrypoints/test_omni_new_request_data.py (tests output.OmniNewRequestData) +│ +├── diffusion/ → ├── diffusion/ +│ ├── diffusion_engine.py │ ├── test_diffusion_engine.py ⬜ +│ ├── attention/ │ ├── attention/ +│ │ ├── layer.py │ │ ├── test_attention_sp.py ✅ +│ │ └── backends/ │ │ └── test_flash_attn.py ✅ +│ ├── distributed/ │ ├── distributed/ +│ │ └── ... │ │ ├── test_comm.py ✅ +│ │ │ │ ├── test_cfg_parallel.py ✅ +│ │ │ │ └── test_sp_plan_hooks.py ✅ +│ ├── lora/ │ ├── lora/ +│ │ └── ... │ │ ├── test_base_linear.py ✅ +│ │ │ │ └── test_lora_manager.py ✅ +│ ├── models/ │ ├── models/ +│ │ ├── qwen_image/ │ │ ├── qwen_image/ (e2e coverage) +│ │ ├── z_image/ │ │ └── z_image/ +│ │ └── ... 
│ │ └── test_zimage_tp_constraints.py ✅ +│ └── worker/ │ └── worker/ +│ ├── diffusion_worker.py │ └── test_diffusion_worker.py ✅ file at diffusion/test_diffusion_worker.py +│ └── diffusion_model_runner.py │ │ -├── core/ → ├── core/ -│ └── sched/ │ └── sched/ # Maps to core/sched/ -│ ├── omni_ar_scheduler.py │ ├── test_omni_ar_scheduler.py -│ ├── omni_generation_scheduler.py │ ├── test_omni_generation_scheduler.py -│ └── output.py │ └── test_output.py +├── distributed/ → ├── distributed/ +│ └── omni_connectors/ │ └── omni_connectors/ +│ ├── adapter.py │ ├── test_adapter_and_flow.py ✅ +│ ├── kv_transfer_manager.py │ ├── test_basic_connectors.py ✅ +│ ├── connectors/ │ ├── test_kv_flow.py ✅ +│ └── utils/ │ └── test_omni_connector_configs.py ✅ │ -├── diffusion/ → ├── diffusion/ -│ ├── diffusion_engine.py │ ├── test_diffusion_engine.py -│ ├── omni_diffusion.py │ ├── test_omni_diffusion.py -│ ├── attention/ │ ├── attention/ # Maps to diffusion/attention/ -│ │ └── backends/ │ │ └── test_*.py -│ ├── models/ │ ├── models/ # Maps to diffusion/models/ -│ │ ├── qwen_image/ │ │ ├── qwen_image/ -│ │ │ └── ... │ │ │ └── test_*.py -│ │ └── z_image/ │ │ └── z_image/ -│ │ └── ... │ │ └── test_*.py -│ └── worker/ │ └── worker/ # Maps to diffusion/worker/ -│ └── ... │ └── test_*.py +├── engine/ → ├── engine/ +│ ├── input_processor.py │ ├── test_input_processor.py ⬜ (no processor.py in source) +│ ├── output_processor.py │ └── test_output_processor.py ⬜ +│ └── arg_utils.py │ └── test_arg_utils.py ⬜ │ -├── distributed/ → ├── distributed/ -│ └── ... 
│ └── test_*.py +├── entrypoints/ → ├── entrypoints/ +│ ├── omni.py │ ├── test_omni.py ⬜ (E2E covered by e2e/offline, e2e/online) +│ ├── omni_llm.py │ ├── test_omni_llm.py ✅ +│ ├── omni_stage.py │ ├── test_omni_stage.py ⬜ (partial in test_omni_stage_diffusion_config.py) +│ ├── omni_diffusion.py │ ├── test_omni_diffusion.py ✅ +│ ├── async_omni.py │ ├── test_async_omni.py ✅ actually in e2e/online_serving/test_async_omni.py +│ ├── async_omni_diffusion.py │ ├── test_async_omni_diffusion_config.py ✅ +│ ├── stage_utils.py │ ├── test_stage_utils.py ✅ +│ ├── cli/ │ ├── cli/ (benchmarks/test_serve_cli.py covers CLI serve) +│ │ └── ... │ │ └── test_*.py ⬜ +│ └── openai/ │ └── openai_api/ # maps to entrypoints/openai/ +│ ├── api_server.py │ ├── test_api_server.py ⬜ (e2e indirect coverage) +│ ├── serving_chat.py │ ├── test_serving_chat_sampling_params.py ✅ +│ ├── serving_speech.py │ ├── test_serving_speech.py ✅ +│ └── image_api_utils.py │ └── test_image_server.py ✅ │ -├── engine/ → ├── engine/ -│ ├── processor.py │ ├── test_processor.py -│ └── output_processor.py │ └── test_output_processor.py +├── inputs/ → ├── inputs/ +│ ├── data.py │ ├── test_data.py ⬜ +│ ├── parse.py │ ├── test_parse.py ⬜ +│ └── preprocess.py │ └── test_preprocess.py ✅ currently in entrypoints/test_omni_input_preprocessor.py │ -├── entrypoints/ → ├── entrypoints/ -│ ├── omni_llm.py │ ├── test_omni_llm.py # UT: OmniLLM core logic (mocked) -│ ├── omni_stage.py │ ├── test_omni_stage.py # UT: OmniStage logic -│ ├── omni.py │ ├── test_omni.py # E2E: Omni class (offline inference) -│ ├── async_omni.py │ ├── test_async_omni.py # E2E: AsyncOmni class -│ ├── cli/ │ ├── cli/ # Maps to entrypoints/cli/ -│ │ └── ... 
│ │ └── test_*.py -│ └── openai/ │ └── openai/ # Maps to entrypoints/openai/ -│ ├── api_server.py │ ├── test_api_server.py # E2E: API server (online serving) -│ └── serving_chat.py │ └── test_serving_chat.py +├── model_executor/ → ├── model_executor/ +│ ├── layers/ │ ├── layers/ +│ │ └── mrope.py │ │ └── test_mrope.py ⬜ +│ ├── model_loader/ │ ├── model_loader/ +│ │ └── weight_utils.py │ │ └── test_weight_utils.py ⬜ +│ ├── models/ │ ├── models/ +│ │ ├── qwen2_5_omni/ │ │ ├── qwen2_5_omni/ +│ │ │ ├── qwen2_5_omni_thinker.py │ │ │ ├── test_audio_length.py ✅ +│ │ │ ├── qwen2_5_omni_talker.py │ │ │ ├── test_qwen2_5_omni_thinker.py ⬜ +│ │ │ └── qwen2_5_omni_token2wav.py │ │ │ ├── test_qwen2_5_omni_talker.py ⬜ +│ │ └── qwen3_omni/ │ │ │ └── test_qwen2_5_omni_token2wav.py ⬜ +│ │ └── ... │ │ └── qwen3_omni/ +│ ├── stage_configs/ │ │ └── test_*.py ⬜ +│ │ └── *.yaml │ └── stage_configs/ (used by e2e, test_*.py can be added) ⬜ +│ └── stage_input_processors/ │ └── stage_input_processors/ +│ └── ... │ └── test_*.py ⬜ │ -├── inputs/ → ├── inputs/ -│ ├── data.py │ ├── test_data.py -│ ├── parse.py │ ├── test_parse.py -│ └── preprocess.py │ └── test_preprocess.py +├── sample/ → ├── sample/ +│ └── __init__.py │ └── test_*.py ⬜ │ -├── model_executor/ → ├── model_executor/ -│ ├── layers/ │ ├── layers/ -│ │ └── mrope.py │ │ └── test_mrope.py -│ ├── model_loader/ │ ├── model_loader/ -│ │ └── weight_utils.py │ │ └── test_weight_utils.py -│ ├── models/ │ ├── models/ -│ │ ├── qwen2_5_omni/ │ │ ├── qwen2_5_omni/ -│ │ │ ├── qwen2_5_omni_thinker.py │ │ │ ├── test_qwen2_5_omni_thinker.py # UT -│ │ │ ├── qwen2_5_omni_talker.py │ │ │ ├── test_qwen2_5_omni_talker.py # UT -│ │ │ └── qwen2_5_omni_token2wav.py │ │ │ └── test_qwen2_5_omni_token2wav.py # UT -│ │ └── qwen3_omni/ │ │ └── qwen3_omni/ -│ │ └── ... │ │ └── test_*.py -│ ├── stage_configs/ │ └── stage_configs/ # Configuration tests (if needed) -│ │ └── ... │ └── test_*.py -│ └── stage_input_processors/ │ └── stage_input_processors/ -│ └── ... 
│ └── test_*.py +├── utils/ → ├── utils/ +│ └── __init__.py │ └── test_*.py ⬜ (no platform_utils.py currently) │ -├── sample/ → ├── sample/ -│ └── ... │ └── test_*.py +├── worker/ → ├── worker/ +│ ├── gpu_ar_model_runner.py │ ├── test_gpu_ar_model_runner.py ⬜ +│ ├── gpu_ar_worker.py │ ├── test_gpu_ar_worker.py ⬜ +│ ├── gpu_generation_model_runner.py │ ├── test_gpu_generation_model_runner.py ✅ +│ ├── gpu_generation_worker.py │ ├── test_gpu_generation_worker.py ⬜ +│ ├── gpu_model_runner.py │ ├── test_omni_gpu_model_runner.py ✅ +│ └── mixins.py │ └── (npu under platforms/npu/worker/) # not worker/npu/ │ -├── utils/ → ├── utils/ -│ └── platform_utils.py │ └── test_platform_utils.py +├── platforms/ → (no tests/platforms/, e2e and stage_configs provide indirect coverage) +│ ├── cuda/ +│ ├── npu/worker/ # NPU worker here, not vllm_omni/worker/npu/ +│ ├── rocm/ +│ └── xpu/worker/ │ -├── worker/ → ├── worker/ - ├── gpu_ar_worker.py │ ├── test_gpu_ar_worker.py - ├── gpu_generation_worker.py │ ├── test_gpu_generation_worker.py - ├── gpu_model_runner.py │ ├── test_gpu_model_runner.py - └── npu/ │ └── npu/ # Maps to worker/npu/ - └── ... 
│ └── test_*.py +├── outputs.py → test_outputs.py ✅ (at tests root) +├── (logger, patch, request, version) → (no corresponding unit test) │ -└── e2e/ → ├── e2e/ # End-to-end scenarios (no 1:1 source mirror) - ├── online_serving/ # Full-stack online serving flows - │ └── (empty for now) - └── offline_inference/ # Full offline inference flows - ├── test_qwen2_5_omni.py # Moved from multi_stages/ - ├── test_qwen3_omni.py # Moved from multi_stages_h100/ - ├── test_t2i_model.py # Moved from single_stage/ - └── stage_configs/ # Shared stage configs - ├── qwen2_5_omni_ci.yaml - └── qwen3_omni_ci.yaml +└── e2e (tests side only) → ├── e2e/ + ├── online_serving/ ✅ non-empty + │ ├── test_async_omni.py + │ ├── test_qwen3_omni.py + │ ├── test_qwen3_omni_expansion.py + │ ├── test_image_gen_edit.py + │ ├── test_images_generations_lora.py + │ └── stage_configs/ + └── offline_inference/ ✅ + ├── test_qwen2_5_omni.py + ├── test_qwen3_omni.py + ├── test_bagel_text2img.py + ├── test_t2i_model.py + ├── test_t2v_model.py + ├── test_ovis_image.py + ├── test_zimage_tensor_parallel.py + ├── test_cache_dit.py + ├── test_teacache.py + ├── test_stable_audio_model.py + ├── test_diffusion_cpu_offload.py + ├── test_diffusion_layerwise_offload.py + ├── test_diffusion_lora.py + ├── test_sequence_parallel.py + └── stage_configs/ + ├── qwen2_5_omni_ci.yaml + ├── qwen3_omni_ci.yaml + ├── bagel_*.yaml + └── npu/, rocm/, etc. ``` @@ -117,6 +173,8 @@ vllm_omni/ tests/ - **E2E Tests**: Place in `tests/e2e/offline_inference/` or `tests/e2e/online_serving/` with descriptive names. Example: `tests/e2e/offline_inference/test_qwen3_omni.py`, `tests/e2e/offline_inference/test_diffusion_model.py` +- **Expansion Tests** + ### Best Practices 1. **Mirror Source Structure**: Test directories should mirror the source code structure @@ -144,81 +202,217 @@ vllm_omni/ tests/ ### Template #### E2E - Online serving +E2E Online tests for Qwen3-Omni model with mix input and audio+text output. 
Based on `tests/e2e/online_serving/test_qwen3_omni.py`. + ```python -# SPDX-License-Identifier: Apache-2.0 -# SPDX-FileCopyrightText: Copyright contributors to the vLLM project """ -Online E2E smoke test for an omni model (video,text,audio → audio). +E2E Online tests for Qwen3-Omni model with mix input and audio+text output. """ + +import os + +os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn" +os.environ["VLLM_TEST_CLEAN_GPU_MEMORY"] = "0" + +import threading from pathlib import Path -import pytest import openai +import pytest -from tests.utils import hardware_test +from tests.conftest import ( + OmniServer, + convert_audio_to_text, + cosine_similarity_text, + dummy_messages_from_mix_data, + generate_synthetic_video, + merge_base64_and_convert_to_text, +) +from vllm_omni.platforms import current_omni_platform + +# Edit: model name and stage config path +models = ["Qwen/Qwen3-Omni-30B-A3B-Instruct"] + +#If you use the default configuration file, you can directly use the following address. +def get_default_config(): + return str(Path(__file__).parent.parent / "stage_configs" / "qwen3_omni_ci.yaml") + +#If you need to modify the configuration file, you can use modify_stage_config. 
+def get_chunk_config():
+    path = modify_stage_config(
+        get_default_config(),
+        updates={
+            "async_chunk": True,
+            "stage_args": {
+                0: {
+                    "engine_args.custom_process_next_stage_input_func": "vllm_omni.model_executor.stage_input_processors.qwen3_omni.thinker2talker_async_chunk"
+                },
+                1: {
+                    "engine_args.custom_process_next_stage_input_func": "vllm_omni.model_executor.stage_input_processors.qwen3_omni.talker2code2wav_async_chunk"
+                },
+            },
+        },
+        deletes={"stage_args": {2: ["custom_process_input_func"]}},
+    )
+    return path
-# Optional: set process start method for workers
-os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
+stage_configs = [get_default_config(), get_chunk_config()]
-models = ["{your model name}"] #Edit here to load your model
-stage_configs = [str(Path(__file__).parent / "stage_configs" / {your model yaml})] #Edit here to load your model yaml
 test_params = [(model, stage_config) for model in models for stage_config in stage_configs]
-#OmniServer,Used to start the vllm-omni server
-class OmniServer:
-    xxx
+# Please use this fixture to launch the online server instance.
+_omni_server_lock = threading.Lock()
-@pytest.fixture
+@pytest.fixture(scope="module")
 def omni_server(request):
-    model, stage_config_path = request.param
-    with OmniServer(model, ["--stage-configs-path", stage_config_path]) as server:
-        yield server
-
+    """Start vLLM-Omni server as a subprocess.
Use module scope for multi-stage init (10-20+ min).""" + with _omni_server_lock: + model, stage_config_path = request.param + with OmniServer( + model, + ["--stage-configs-path", stage_config_path, "--stage-init-timeout", "120"], + ) as server: + yield server -#handle request message -@pytest.fixture(scope="session") -def base64_encoded_video() -> str: - xxx -@pytest.fixture(scope="session") -def dummy_messages_from_video_data(video_data_url: str, content_text: str) -> str: - xxx +@pytest.fixture +def client(omni_server): + """OpenAI client for the running vLLM-Omni server.""" + return openai.OpenAI( + base_url=f"http://{omni_server.host}:{omni_server.port}/v1", + api_key="EMPTY", + ) +#Please use function definitions above the test function to define the prompts and other parameters you need. +def get_system_prompt(): + return { + "role": "system", + "content": [ + { + "type": "text", + "text": ( + "You are Qwen, a virtual human developed by the Qwen Team, " + "Alibaba Group, capable of perceiving auditory and visual inputs, " + "as well as generating text and speech." + ), + } + ], + } + +... + +#Please define test case tags according to the instructions in the marker documentation. 
@pytest.mark.core_model @pytest.mark.omni -@hardware_test( - res={"cuda": "L4", "rocm": "MI325", "npu": "A2"}, - num_cards={"cuda": 2, "rocm": 2, "npu": 4}, -) @pytest.mark.parametrize("omni_server", test_params, indirect=True) -def test_video_to_audio( - client: openai.OpenAI, - omni_server, - base64_encoded_video: str, -) -> None: - #set message - video_data_url = f"data:video/mp4;base64, {base64_encoded_video}" - messages = dummy_messages_from_video_data(video_data_url) - - #send request - chat_completion = client.chat.completions.create( - model=omni_server.model, - messages=messages, +def test_mix_to_text_audio_001(client: openai.OpenAI, omni_server, request) -> None: + # PLEASE FOLLOW THESE TEMPLATE INSTRUCTIONS: + # ============================================================================ + # TEMPLATE USAGE GUIDE: + # 1. Copy this entire function as a starting point for multi-modal tests + # 2. Update the test name to reflect your specific test scenario + # 3. Modify input/output modalities as needed (see OPTIONS section below) + # 4. Adjust assertions based on your expected outcomes + # 5. Add custom validation logic for your specific use case + # ============================================================================ + + #Please list the relevant test points. + """ + Test multi-modal input processing and text/audio output generation via OpenAI API. 
+ Deploy Setting: default yaml + Input Modal: text + audio + video + image + Output Modal: text + audio + Input Setting: stream=True + Datasets: single request + """ + # SECTION 1: TEST SETUP AND INITIALIZATION + # ========================================= + # INSTRUCTIONS: Initialize test variables and prepare test environment + # MODIFY: Add any additional test setup required for your scenario + e2e_list = list() + # SECTION 2: TEST DATA GENERATION + # ================================ + # INSTRUCTIONS: Generate or load test data for each input modality + # MODIFY: Replace synthetic generators with your actual data sources + # VIDEO DATA - Generate synthetic video for testing + # FORMAT: data:video/mp4;base64,{base64_encoded_video} + # PARAMETERS: width, height, duration_frames + video_data_url = f"data:video/mp4;base64,{generate_synthetic_video(224, 224, 300)['base64']}" + # IMAGE DATA - Generate synthetic image for testing + # FORMAT: data:image/jpeg;base64,{base64_encoded_image} + # PARAMETERS: width, height + image_data_url = f"data:image/jpeg;base64,{generate_synthetic_image(224, 224)['base64']}" + # AUDIO DATA - Generate synthetic audio for testing + # FORMAT: data:audio/wav;base64,{base64_encoded_audio} + # PARAMETERS: duration_seconds, channels + audio_data_url = f"data:audio/wav;base64,{generate_synthetic_audio(5, 1)['base64']}" + + # SECTION 3: MESSAGE CONSTRUCTION + # ================================ + # INSTRUCTIONS: Assemble the complete message payload for API request + # MODIFY: Add/remove modalities or change prompt structure as needed + + # USAGE: Construct a message containing all input modalities + # IMPORTANT: Ensure the message structure matches OpenAI API expectations + # CUSTOMIZATION POINTS: + # - system_prompt: Controls the assistant's behavior + # - content_text: The user's text prompt/question + # - *_data_url: URLs for media content (video/image/audio) + messages = dummy_messages_from_mix_data( + system_prompt=get_system_prompt(), + 
video_data_url=video_data_url, + image_data_url=image_data_url, + audio_data_url=audio_data_url, + content_text=get_prompt("mix"), ) - #verify text output - text_choice = chat_completion.choices[0] - assert text_choice.finish_reason == "length" - - #verify audio output - audio_choice = chat_completion.choices[1] - audio_message = audio_choice.message - if hasattr(audio_message, "audio") and audio_message.audio: - assert audio_message.audio.data is not None - assert len(audio_message.audio.data) > 0 + # SECTION 4: API REQUEST EXECUTION + # ================================= + # INSTRUCTIONS: Make the API call and measure performance + # MODIFY: Add timeout, retry logic, or additional parameters + start_time = time.perf_counter() + chat_completion = client.chat.completions.create(model=omni_server.model, messages=messages, stream=True) + + #Call using your preferred method and obtain the final audio and text outputs. + ... + + # SECTION 5: OUTPUT VALIDATION + # ============================= + # INSTRUCTIONS: Verify that outputs meet expected criteria + # MODIFY: Adjust validation logic for your specific requirements + + # ASSERTION 1: E2E Validation + # PURPOSE: Verify that the E2E latency is less than the baseline. + current_e2e = time.perf_counter() - start_time + print(f"the request e2e is: {current_e2e}") + e2e_list.append(current_e2e) + + print(f"the avg e2e is: {sum(e2e_list) / len(e2e_list)}") + + + + # ASSERTION 2: Text Output Validation + # PURPOSE: Verify that text output was generated with keyword content + assert text_content is not None and len(text_content) >= 2, "No text output is generated" + assert any( + keyword in text_content.lower() for keyword in ["square", "quadrate", "sphere", "globe", "circle", "round"] + ), "The output does not contain any of the keywords." 
+ + + # ASSERTION 3: Cross-Modal Consistency + # PURPOSE: Verify text and audio outputs convey the same information + # CUSTOMIZATION: Adjust similarity threshold (0.9) based on accuracy requirements + assert audio_data is not None, "No audio output is generated" + audio_content = merge_base64_and_convert_to_text(audio_data) + print(f"text content is: {text_content}") + print(f"audio content is: {audio_content}") + similarity = cosine_similarity_text(audio_content.lower(), text_content.lower()) + print(f"similarity is: {similarity}") + assert similarity > 0.9, "The audio content is not same as the text" ``` + #### E2E - Offline inference ```python # SPDX-License-Identifier: Apache-2.0 @@ -271,10 +465,3 @@ def test_video_to_audio(omni_runner: type[OmniRunner], model: str) -> None: has_audio = any(o.final_output_type == "audio" for o in outputs) assert has_audio ``` - -## Checklist before submitting your test files - -1. The file is saved in an appropriate place and the file name is clear. -2. The coding style follows the requirements outlined above. -3. **All test functions have appropriate pytest markers** -4. For tests that need run in CI, please ensure the test is configured under the `./buildkite/` folder.
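The cross-modal consistency check in the online-serving template (ASSERTION 3) compares the transcribed audio with the text output through `cosine_similarity_text`, imported from `tests.conftest`. That helper's implementation is not shown in this change, so the sketch below is only a hypothetical stand-in that illustrates the idea with a plain bag-of-words cosine similarity. The names `bag_of_words` and `cosine_similarity_bow` and the sample sentences are invented for illustration; the project's helper may tokenize or weight words differently.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its word tokens, ignoring punctuation."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity_bow(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(count * vb[token] for token, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one side has no tokens
    return dot / (norm_a * norm_b)


# Made-up outputs: a text reply, a transcript of the audio reply, and noise.
text_output = "The image shows a red square and a blue circle."
audio_transcript = "the image shows a red square and a blue circle"
unrelated = "Tomorrow will be rainy."

print("matching pair:", cosine_similarity_bow(text_output, audio_transcript))
print("unrelated pair:", cosine_similarity_bow(text_output, unrelated))
```

A word-count comparison like this tolerates differences in casing and punctuation between the text and the speech transcript, but it ignores word order, so a threshold such as the template's 0.9 should be tuned against real transcripts.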