16 changes: 2 additions & 14 deletions .buildkite/pipeline.yml
@@ -20,19 +20,7 @@ steps:
- label: "Simple Unit Test"
depends_on: image-build
commands:
- |
pytest -v -s \
tests/entrypoints/ \
tests/diffusion/cache/ \
tests/diffusion/lora/ \
tests/model_executor/models/qwen2_5_omni/test_audio_length.py \
tests/worker/ \
tests/distributed/omni_connectors/test_kv_flow.py \
--cov=vllm_omni \
--cov-branch \
--cov-report=term-missing \
--cov-report=html \
--cov-report=xml
- "pytest -v -s -m 'core_model and cpu' --cov=vllm_omni --cov-branch --cov-report=term-missing --cov-report=html --cov-report=xml"
agents:
queue: "gpu_1_queue"
plugins:
@@ -118,7 +106,7 @@ steps:
timeout_in_minutes: 15
depends_on: image-build
commands:
- pytest -s -v tests/e2e/offline_inference/test_cache_dit.py tests/e2e/offline_inference/test_teacache.py
- pytest -s -v -m 'core_model and cache and diffusion and not distributed_cuda and L4'
agents:
queue: "gpu_1_queue" # g6.4xlarge instance on AWS, has 1 L4 GPU
plugins:
2 changes: 1 addition & 1 deletion .buildkite/test-amd.yaml
@@ -44,7 +44,7 @@ steps:
- export GPU_ARCHS=gfx942
- export VLLM_LOGGING_LEVEL=DEBUG
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -s -v tests/e2e/offline_inference/test_cache_dit.py tests/e2e/offline_inference/test_teacache.py
- pytest -s -v -m 'core_model and cache and diffusion and not distributed_rocm and MI325'

- label: "Diffusion Sequence Parallelism Test"
timeout_in_minutes: 20
60 changes: 28 additions & 32 deletions docs/contributing/ci/tests_markers.md
@@ -5,33 +5,33 @@ By adding markers before test functions, tests can later be executed uniformly b
## Current Markers
Defined in `pyproject.toml`:

| Marker | Description |
| ------------------ | ------------------------------------------------------- |
| `core_model` | Core model tests (run in each PR) |
| `diffusion` | Diffusion model tests |
| `omni` | Omni model tests |
| `cache` | Cache backend tests |
| `parallel` | Parallelism/distributed tests |
| `cpu` | Tests that run on CPU |
| `gpu` | Tests that run on GPU (auto-added) |
| `cuda` | Tests that run on CUDA (auto-added) |
| `rocm` | Tests that run on AMD/ROCm (auto-added) |
| `npu` | Tests that run on NPU/Ascend (auto-added) |
| `H100` | Tests that require H100 GPU |
| `L4` | Tests that require L4 GPU |
| `MI325` | Tests that require MI325 GPU (AMD/ROCm) |
| `A2` | Tests that require A2 NPU |
| `A3` | Tests that require A3 NPU |
| `distributed_cuda` | Tests that require multi cards on CUDA platform |
| `distributed_rocm` | Tests that require multi cards on ROCm platform |
| `distributed_npu` | Tests that require multi cards on NPU platform |
| `skipif_cuda` | Skip if the num of CUDA cards is less than the required |
| `skipif_rocm` | Skip if the num of ROCm cards is less than the required |
| `skipif_npu` | Skip if the num of NPU cards is less than the required |
| `slow` | Slow tests (may skip in quick CI) |
| `benchmark` | Benchmark tests |

For those markers shown as auto-added, they will be added by the `@hardware_test` decorator.
| Marker | Description |
| ------------------ | --------------------------------------------------------- |
| `core_model` | Core model tests (run in each PR) |
| `diffusion` | Diffusion model tests |
| `omni` | Omni model tests |
| `cache` | Cache backend tests |
| `parallel` | Parallelism/distributed tests |
| `cpu` | Tests that run on CPU |
| `gpu` | Tests that run on GPU * |
| `cuda` | Tests that run on CUDA * |
| `rocm` | Tests that run on AMD/ROCm * |
| `npu` | Tests that run on NPU/Ascend * |
| `H100` | Tests that require H100 GPU * |
| `L4` | Tests that require L4 GPU * |
| `MI325` | Tests that require MI325 GPU (AMD/ROCm) * |
| `A2` | Tests that require A2 NPU * |
| `A3` | Tests that require A3 NPU * |
| `distributed_cuda` | Tests that require multi cards on CUDA platform * |
| `distributed_rocm` | Tests that require multi cards on ROCm platform * |
| `distributed_npu` | Tests that require multi cards on NPU platform * |
| `skipif_cuda` | Skip if the num of CUDA cards is less than the required * |
| `skipif_rocm` | Skip if the num of ROCm cards is less than the required * |
| `skipif_npu` | Skip if the num of NPU cards is less than the required * |
| `slow` | Slow tests (may skip in quick CI) |
| `benchmark` | Benchmark tests |

\* These markers are added automatically by the `@hardware_test` decorator.

### Example usage for markers

@@ -71,10 +71,7 @@ This decorator is intended to make hardware-aware, cross-platform test authoring
Support for `skipif_rocm` and `skipif_npu` will be implemented later.


5. **Runs each test in a new process**
Automatically wraps the distributed test with a decorator (`@create_new_process_for_each_test`) to ensure isolation and compatibility with multi-process hardware backends.

6. **Works with pytest filtering**
5. **Works with pytest filtering**
Allows tests to be filtered and selected at runtime using standard pytest marker expressions (e.g., `-m "distributed_cuda and L4"`).
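
To illustrate how such marker expressions select tests, here is a minimal sketch that models `-m` filtering in plain Python. It is not pytest's actual parser; the `matches` helper and the two marker sets are hypothetical and only assume that a multi-card CUDA test carries `distributed_cuda` while a single-card one does not.

```python
import re


def matches(expression: str, markers: set[str]) -> bool:
    """Evaluate a simple `-m`-style boolean expression against marker names."""

    def to_bool(token: re.Match) -> str:
        word = token.group(0)
        if word in {"and", "or", "not"}:
            return word  # keep boolean operators untouched
        return str(word in markers)  # marker name -> "True" / "False"

    # Replace each identifier with True/False, then evaluate the boolean formula.
    python_expr = re.sub(r"[A-Za-z_][A-Za-z0-9_]*", to_bool, expression)
    return eval(python_expr, {"__builtins__": {}})


# Hypothetical marker sets; the real ones depend on each test's decorators.
single_card = {"core_model", "diffusion", "cache", "cuda", "L4"}
multi_card = single_card | {"distributed_cuda"}

ci_expr = "core_model and cache and diffusion and not distributed_cuda and L4"
print(matches(ci_expr, single_card))                   # True
print(matches(ci_expr, multi_card))                    # False
print(matches("distributed_cuda and L4", multi_card))  # True
```

The `not distributed_cuda` term is what keeps multi-card tests out of a single-GPU CI step in this model.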

#### Example usage for decorator
@@ -94,7 +91,6 @@ This decorator is intended to make hardware-aware, cross-platform test authoring
```
- `res` must be a dict; supported resources: CUDA (L4/H100), ROCm (MI325), NPU (A2/A3)
- `num_cards` can be int (all platforms) or dict (per platform); defaults to 1 when missing
- `hardware_test` automatically applies `@create_new_process_for_each_test` for distributed tests.
- Distributed markers (`distributed_cuda`, `distributed_rocm`, `distributed_npu`) are auto-added for multi-card cases
- Filtering examples:
- CUDA only: `pytest -m "distributed_cuda and L4"`
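
The `res` and `num_cards` rules above can be sketched as a small pure-Python function. This is an illustrative model, not the project's `hardware_test` implementation: the `derive_markers` name and the `SUPPORTED` table are assumptions, and the `gpu` and `skipif_*` markers are deliberately left out.

```python
# Supported hardware per platform, as listed in the notes above.
SUPPORTED = {"cuda": {"L4", "H100"}, "rocm": {"MI325"}, "npu": {"A2", "A3"}}


def derive_markers(res, num_cards=1):
    """Return the marker names a (res, num_cards) pair would imply in this model."""
    if not isinstance(res, dict):
        raise TypeError("res must be a dict")
    markers = []
    for platform, hardware in res.items():
        if hardware not in SUPPORTED.get(platform, set()):
            raise ValueError(f"unsupported resource: {platform}={hardware}")
        # num_cards may be one int for all platforms or a per-platform dict;
        # a platform missing from the dict falls back to 1 card.
        if isinstance(num_cards, dict):
            cards = num_cards.get(platform, 1)
        else:
            cards = num_cards
        markers += [platform, hardware]
        if cards > 1:
            markers.append(f"distributed_{platform}")  # multi-card case
    return markers


print(derive_markers({"cuda": "L4", "rocm": "MI325"}))
# ['cuda', 'L4', 'rocm', 'MI325']
print(derive_markers({"cuda": "L4", "rocm": "MI325"}, {"cuda": 4, "rocm": 2}))
# ['cuda', 'L4', 'distributed_cuda', 'rocm', 'MI325', 'distributed_rocm']
```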
4 changes: 4 additions & 0 deletions pyproject.toml
@@ -175,6 +175,10 @@ markers = [
"slow: Slow tests (may skip in quick CI)",
"benchmark: Benchmark tests",
]
filterwarnings = [
"ignore:.*does not have '__test__' attribute.*:UserWarning",
"ignore:.*does not have '__bases__' attribute.*:UserWarning",
]
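
For reference, each `filterwarnings` entry has the form `action:message:category`, where pytest treats `message` as a regular expression matched against the start of the warning text. A minimal sketch of the same filter using Python's standard `warnings` module follows; the `count_reported` helper and the sample messages are hypothetical.

```python
import warnings

# Regex copied from the first filterwarnings entry above.
PATTERN = r".*does not have '__test__' attribute.*"


def count_reported(messages):
    """Return how many of the given UserWarning messages survive the ignore filter."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # report everything by default
        # Inserted at the front of the filter list, so it takes precedence.
        warnings.filterwarnings("ignore", message=PATTERN, category=UserWarning)
        for text in messages:
            warnings.warn(text, UserWarning)
    return len(caught)


noisy = "Mock object does not have '__test__' attribute"  # hypothetical message
other = "something else went wrong"
print(count_reported([noisy]))         # 0
print(count_reported([noisy, other]))  # 1
```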

[tool.typos.default]
extend-ignore-identifiers-re = [
4 changes: 4 additions & 0 deletions tests/benchmarks/test_serve_cli.py
@@ -4,6 +4,7 @@
import pytest

from tests.conftest import OmniServer
from tests.utils import hardware_test

models = ["Qwen/Qwen3-Omni-30B-A3B-Instruct"]
stage_configs = [str(Path(__file__).parent.parent / "e2e" / "stage_configs" / "qwen3_omni_ci.yaml")]
@@ -29,6 +30,9 @@ def omni_server(request):
print("OmniServer stopped")


@pytest.mark.core_model
@pytest.mark.benchmark
@hardware_test(res={"cuda": "H100"}, num_cards=2)
@pytest.mark.parametrize("omni_server", test_params, indirect=True)
def test_bench_serve_chat(omni_server):
command = [
2 changes: 2 additions & 0 deletions tests/diffusion/cache/test_cache_backends.py
@@ -22,6 +22,8 @@
from vllm_omni.diffusion.cache.teacache.backend import TeaCacheBackend
from vllm_omni.diffusion.data import DiffusionCacheConfig

pytestmark = [pytest.mark.core_model, pytest.mark.cpu]


class TestCacheDiTBackend:
"""Test CacheDiTBackend implementation."""
3 changes: 3 additions & 0 deletions tests/diffusion/lora/test_base_linear.py
@@ -5,10 +5,13 @@

from dataclasses import dataclass

import pytest
import torch

from vllm_omni.diffusion.lora.layers.base_linear import DiffusionBaseLinearLayerWithLoRA

pytestmark = [pytest.mark.core_model, pytest.mark.cpu]


@dataclass
class _DummyLoRAConfig:
3 changes: 3 additions & 0 deletions tests/diffusion/lora/test_lora_manager.py
@@ -3,6 +3,7 @@

from __future__ import annotations

import pytest
import torch
from vllm.lora.lora_weights import LoRALayerWeights
from vllm.lora.utils import get_supported_lora_modules
@@ -11,6 +12,8 @@
from vllm_omni.diffusion.lora.manager import DiffusionLoRAManager
from vllm_omni.lora.request import LoRARequest

pytestmark = [pytest.mark.core_model, pytest.mark.cpu]


class _DummyLoRALayer:
def __init__(self, n_slices: int, output_slices: tuple[int, ...]):
2 changes: 2 additions & 0 deletions tests/diffusion/test_diffusion_worker.py
@@ -17,6 +17,8 @@

from vllm_omni.diffusion.worker.diffusion_worker import DiffusionWorker

pytestmark = [pytest.mark.core_model, pytest.mark.diffusion, pytest.mark.cpu]


@pytest.fixture
def mock_od_config():
26 changes: 4 additions & 22 deletions tests/distributed/omni_connectors/test_kv_flow.py
@@ -1,14 +1,15 @@
import pytest
import torch

from tests.utils import hardware_test
from vllm_omni.diffusion.request import OmniDiffusionRequest
from vllm_omni.distributed.omni_connectors.kv_transfer_manager import (
OmniKVCacheConfig,
OmniKVTransferManager,
)
from vllm_omni.inputs.data import OmniDiffusionSamplingParams

pytestmark = [pytest.mark.core_model, pytest.mark.cpu, pytest.mark.cache]


class MockConnector:
def __init__(self):
@@ -58,11 +59,6 @@ def common_constants():
}


@pytest.mark.cache
@hardware_test(
res={"cuda": "L4"},
num_cards=2,
)
def test_manager_extraction(kv_config, mock_connector, common_constants):
"""Test extraction and sending logic in OmniKVTransferManager."""
num_layers = common_constants["num_layers"]
@@ -109,11 +105,6 @@ def test_manager_extraction(kv_config, mock_connector, common_constants):
assert data["layer_blocks"]["key_cache"][0].shape == expected_shape


@pytest.mark.cache
@hardware_test(
res={"cuda": "L4"},
num_cards=2,
)
def test_manager_reception(kv_config, mock_connector, common_constants):
"""Test reception and injection logic in OmniKVTransferManager."""
num_layers = common_constants["num_layers"]
@@ -171,11 +162,6 @@ def test_manager_reception(kv_config, mock_connector, common_constants):
assert req.kv_metadata["seq_len"] == seq_len


@pytest.mark.cache
@hardware_test(
res={"cuda": "L4"},
num_cards=2,
)
def test_integration_flow(common_constants):
"""Simulate extraction -> connector -> reception."""
num_layers = common_constants["num_layers"]
@@ -211,7 +197,8 @@ def test_integration_flow(common_constants):
recv_timeout=1.0,
)
receiver_manager = OmniKVTransferManager(receiver_config)
receiver_manager._connector = connector # Share the same mock connector instance
# Share the same mock connector instance
receiver_manager._connector = connector

req = OmniDiffusionRequest(
prompts=["test_integ"],
@@ -228,11 +215,6 @@ def test_integration_flow(common_constants):
assert req.kv_metadata["seq_len"] == 10


@pytest.mark.cache
@hardware_test(
res={"cuda": "L4"},
num_cards=2,
)
def test_manager_extraction_no_connector(kv_config, common_constants):
"""Test extraction when connector is unavailable (should still return IDs)."""
block_size = common_constants["block_size"]
5 changes: 5 additions & 0 deletions tests/e2e/offline_inference/test_cache_dit.py
@@ -15,6 +15,7 @@
import pytest
import torch

from tests.utils import hardware_test
from vllm_omni.inputs.data import OmniDiffusionSamplingParams

# ruff: noqa: E402
@@ -32,6 +33,10 @@
models = ["riverclouds/qwen_image_random"]


@pytest.mark.core_model
@pytest.mark.diffusion
@pytest.mark.cache
@hardware_test(res={"cuda": "L4", "rocm": "MI325"})
@pytest.mark.parametrize("model_name", models)
def test_cache_dit(model_name: str):
"""Test cache-dit backend with diffusion model."""
5 changes: 4 additions & 1 deletion tests/e2e/offline_inference/test_diffusion_cpu_offload.py
@@ -5,7 +5,7 @@
import torch
from vllm.distributed.parallel_state import cleanup_dist_env_and_memory

from tests.utils import GPUMemoryMonitor
from tests.utils import GPUMemoryMonitor, hardware_test
from vllm_omni.inputs.data import OmniDiffusionSamplingParams
from vllm_omni.platforms import current_omni_platform

@@ -45,6 +45,9 @@ def inference(model_name: str, offload: bool = True):
return peak


@pytest.mark.core_model
@pytest.mark.diffusion
@hardware_test(res={"cuda": "L4", "rocm": "MI325"})
@pytest.mark.skipif(current_omni_platform.is_npu() or current_omni_platform.is_rocm(), reason="Hardware not supported")
@pytest.mark.parametrize("model_name", models)
def test_cpu_offload_diffusion_model(model_name: str):
10 changes: 7 additions & 3 deletions tests/e2e/offline_inference/test_qwen2_5_omni.py
@@ -13,10 +13,10 @@
from vllm.envs import VLLM_USE_MODELSCOPE
from vllm.multimodal.image import convert_image_mode

from tests.utils import create_new_process_for_each_test, hardware_test
from vllm_omni.platforms import current_omni_platform

from .conftest import OmniRunner
from .utils import create_new_process_for_each_test

models = ["Qwen/Qwen2.5-Omni-3B"]

@@ -34,8 +34,10 @@


@pytest.mark.core_model
@pytest.mark.parametrize("test_config", test_params)
@pytest.mark.omni
@hardware_test(res={"cuda": "L4", "rocm": "MI325"}, num_cards={"cuda": 4, "rocm": 2})
@create_new_process_for_each_test("spawn")
@pytest.mark.parametrize("test_config", test_params)
def test_mixed_modalities_to_audio(omni_runner: type[OmniRunner], test_config: tuple[str, str]) -> None:
"""Test processing audio, image, and video together, generating audio output."""
model, stage_config_path = test_config
@@ -94,8 +96,10 @@ def test_mixed_modalities_to_audio(omni_runner: type[OmniRunner], test_config: t


@pytest.mark.core_model
@pytest.mark.parametrize("test_config", test_params)
@pytest.mark.omni
@hardware_test(res={"cuda": "L4", "rocm": "MI325"}, num_cards={"cuda": 4, "rocm": 2})
@create_new_process_for_each_test("spawn")
@pytest.mark.parametrize("test_config", test_params)
def test_mixed_modalities_to_text_only(omni_runner: type[OmniRunner], test_config: tuple[str, str]) -> None:
"""Test processing audio, image, and video together, generating text-only output."""
model, stage_config_path = test_config
4 changes: 4 additions & 0 deletions tests/e2e/offline_inference/test_qwen3_omni.py
@@ -14,6 +14,7 @@
import pytest
from vllm.assets.video import VideoAsset

from tests.utils import hardware_test
from vllm_omni.platforms import current_omni_platform

from .conftest import OmniRunner
@@ -31,6 +32,9 @@
test_params = [(model, stage_config) for model in models for stage_config in stage_configs]


@pytest.mark.core_model
@pytest.mark.omni
@hardware_test(res={"cuda": "H100", "rocm": "MI325"}, num_cards=2)
@pytest.mark.parametrize("test_config", test_params)
def test_video_to_audio(omni_runner: type[OmniRunner], test_config) -> None:
"""Test processing video, generating audio output."""