Merged

Changes from all commits (28 commits):
- b4add5b: [CI] Skip test_bagel[parallel_tp_2] and test_wan22_i2v_online_serving… (yenuo26, Apr 17, 2026)
- 64d368d: [Bugfix] fix CI failure (#2884) (RuixiangMa, Apr 17, 2026)
- f2edb81: [Cleanup] Remove dead runtime.defaults config parameters (#2343) (NickCao, Apr 17, 2026)
- 1637dba: [skip CI][Docs] Add Qwen3-Omni and Qwen3-TTS performance blog and fig… (Shirley125, Apr 17, 2026)
- b5ddff7: Nextstep online e2e (#2107) (Joshna-Medisetty, Apr 17, 2026)
- f346f2f: Add Teacache Support for LongCat Image (#1487) (alex-jw-brooks, Apr 17, 2026)
- 5a68c21: [skip ci][recipe] draft vllm-omni recipes (#2646) (hsliuustc0106, Apr 18, 2026)
- 4f71f73: [Docs] Update WeChat QR code for community support (#2895) (david6666666, Apr 18, 2026)
- d2c23d7: [Refactor] Remove resampy dependency (#2891) (NickCao, Apr 18, 2026)
- 4124a1f: [Feature]Support audio streaming input and output-phase2 (#2581) (Shirley125, Apr 18, 2026)
- 768931e: [BugFix]: Fix multi-stage cfg bug (#2801) (princepride, Apr 18, 2026)
- fe6cec6: [doc][skip ci] remove redundant content in readme (#2901) (Shirley125, Apr 18, 2026)
- 9cf1fe7: [Feat] cache-dit for GLM-Image (#1399) (RuixiangMa, Apr 18, 2026)
- 9313f37: [Agent] Add NPU main2main skill (#2858) (gcanlin, Apr 18, 2026)
- a683b1d: [Bugfix][VoxCPM2] Fix voice-clone decode loop by padding prefill prom… (Sy0307, Apr 18, 2026)
- a390381: [Config Refactor][2/N] Pipeline + Deploy Config Schema (#2383) (lishunyang12, Apr 19, 2026)
- 26edc7f: [Bugfix][VoxCPM2]: Fix vectorized_gather OOB under concurrent prefill… (Sy0307, Apr 19, 2026)
- 1568451: perf(helios): replace strided RoPE with stack+flatten for contiguous … (willamhou, Apr 19, 2026)
- 93beef1: [Bugfix] diffusion end points allow model mismatch (#2805) (xiaohajiayou, Apr 19, 2026)
- 68f28f9: [Feat] Support layerwise CPU offloading for more videogen models (#2018) (yuanheng-zhao, Apr 19, 2026)
- cd384d9: [Config Refactor 2.5/N] Centralize pipeline registry (#2915) (lishunyang12, Apr 19, 2026)
- 78f237e: [Perf] Optimize Wan2.2 device free on image preprocess (#2852) (fan2956, Apr 20, 2026)
- d435fe0: [Docs] update documents (#2921) (R2-Y, Apr 20, 2026)
- 0393c58: [BugFix] Fixed the issue where --no-async-chunk was not working. (#2934) (amy-why-3459, Apr 20, 2026)
- 8a9add1: [CI] Restructure vLLM-Omni Test Layout, Fixture Scope, and Support Mo… (yenuo26, Apr 20, 2026)
- 2d7a64e: Merge origin/main into dev/migrate-MR-v2 with semantic-safe conflict … (Sy0307, Apr 20, 2026)
- fd91ad9: Fix merge review issues on semantic-safe sync branch (Sy0307, Apr 20, 2026)
- 013005a: Fix scheduler finished cleanup on semantic-safe sync branch (Sy0307, Apr 20, 2026)
2 changes: 1 addition & 1 deletion .buildkite/test-nightly.yml
@@ -552,7 +552,7 @@ steps:
   - label: ":full_moon: Diffusion X2V · Accuracy Test"
     timeout_in_minutes: 180
     commands:
-      - pytest -s -v tests/e2e/accuracy/wan22_i2v/test_wan22_i2v_video_similarity.py --run-level advanced_model
+      - pytest -s -v tests/e2e/accuracy/wan22_i2v/test_wan22_i2v_video_similarity.py -m advanced_model --run-level advanced_model
     agents:
       queue: "mithril-h100-pool"
     plugins:
2 changes: 1 addition & 1 deletion docs/.nav.yml
@@ -107,7 +107,7 @@ nav:
       - design/feature/hsdp.md
       - design/feature/cache_dit.md
       - design/feature/teacache.md
-      - design/feature/async_chunk_design.md
+      - design/feature/async_chunk.md
       - design/feature/vae_parallel.md
       - design/feature/diffusion_step_execution.md
   - Module Design:
@@ -40,7 +40,7 @@ Currently all the features are available in online serving mode. Hence, only need
- Test marks: always add `advanced_model` and `diffusion`. Add GPU-related marks if needed. Ref: [Markers for Tests](https://docs.vllm.ai/projects/vllm-omni/en/latest/contributing/ci/tests_markers/).
- To maximize code reuse, you may refer to
- `tests/conftest.py` for `omni_server` (running server in subprocess) and `openai_client` fixtures (sending requests and validating output), `generate_synthetic_image` and `assert_XXX_valid` helper.
-  - `tests/utils.py` for `@hardware_test(...)` and `hardware_marks`.
+  - `tests/helpers/mark.py` for `@hardware_test(...)` and `hardware_marks`.
- [Parametrizing tests (pytest doc)](https://docs.pytest.org/en/stable/example/parametrize.html) to reuse test function implementation for different cases.
- Doc: add a concise docstring for each test function.
- Reference L4 test implementation: [tests/e2e/online_serving/test_qwen_image_edit_expansion.py](https://github.com/vllm-project/vllm-omni/blob/main/tests/e2e/online_serving/test_qwen_image_edit_expansion.py).
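To see how these conventions fit together, here is a minimal sketch of an online-serving test (not part of this PR). The fixture and helper names (`omni_server`, `openai_client`, `generate_synthetic_image`) come from the guidance above; the marker arguments, request payload, and exact signatures are assumptions.

```python
# Hypothetical sketch following the guidance above; fixture signatures and the
# request payload are assumptions, not code from the repository.
import pytest

from tests.helpers.mark import hardware_test


@pytest.mark.advanced_model
@pytest.mark.diffusion
@hardware_test(res={"cuda": "L4"}, num_cards=1)
def test_image_edit_smoke(omni_server, openai_client, generate_synthetic_image):
    """Send one synthetic image-edit request and check that output comes back."""
    image = generate_synthetic_image(width=256, height=256)  # assumed helper signature
    response = openai_client.images.edit(
        model=omni_server.model,  # assumed attribute on the server fixture
        image=image,
        prompt="add a red hat to the person",
    )
    assert response.data, "server returned no image data"
```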
12 changes: 6 additions & 6 deletions docs/contributing/ci/tests_markers.md
@@ -38,7 +38,7 @@ Defined in `pyproject.toml`:
### Example usage for markers

```python
-from tests.utils import hardware_test
+from tests.helpers.mark import hardware_test

@pytest.mark.core_model
@pytest.mark.omni
@@ -53,7 +53,7 @@ def test_video_to_audio()

### Decorator: `@hardware_test`

-This decorator is intended to make hardware-aware, cross-platform test authoring easier and more robust for CI/CD environments. The `hardware_test` decorator in `vllm-omni/tests/utils.py` performs the following actions:
+This decorator is intended to make hardware-aware, cross-platform test authoring easier and more robust for CI/CD environments. The `hardware_test` decorator in `vllm-omni/tests/helpers/mark.py` performs the following actions:

1. **Applies platform and resource markers**
Adds the appropriate pytest markers for each specified hardware platform (e.g., `cuda`, `rocm`, `xpu`, `npu`) and resource type (e.g., `L4`, `H100`, `MI325`, `B60`, `A2`, `A3`).
@@ -105,7 +105,7 @@ This decorator is intended to make hardware-aware, cross-platform test authoring
`hardware_marks` returns a list of pytest mark objects with the same signature as `@hardware_test`. Use it when you need more flexibility, such as attaching hardware marks to individual `pytest.param` entries rather than an entire test function.

```python
-from tests.utils import hardware_marks
+from tests.helpers.mark import hardware_marks

MULTI_CARD_MARKS = hardware_marks(
    res={"cuda": "H100", "rocm": "MI325", "npu": "A2"}, num_cards=2
)
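# Continuation sketch, not from the diff: attaching the marks built above to a
# single pytest.param entry, as the preceding paragraph describes. The test
# name and parameter are purely illustrative.
@pytest.mark.parametrize(
    "tp_size",
    [pytest.param(2, marks=MULTI_CARD_MARKS)],
)
def test_tp_multi_card(tp_size):
    ...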
@@ -133,9 +133,9 @@ If you want to add support for a new platform (e.g., "tpu" for a new accelerator
"distributed_tpu: Tests that require multiple TPU devices",
]
```
-2. **Implement a marker construction function for your platform** in `vllm-omni/tests/utils.py`:
+2. **Implement a marker construction function for your platform** in `vllm-omni/tests/helpers/mark.py`:
```python
-# In vllm-omni/tests/utils.py
+# In vllm-omni/tests/helpers/mark.py

def tpu_marks(*, res: str, num_cards: int):
test_platform = pytest.mark.tpu
@@ -175,4 +175,4 @@ If you want to add support for a new platform (e.g., "tpu" for a new accelerator
- Plug into `hardware_marks`
- You're done: tests using `@hardware_test` or `hardware_marks` with your platform now automatically get the correct markers, distribution, and isolation!

-See code in `vllm-omni/tests/utils.py` for existing examples (`cuda_marks`, `rocm_marks`, `npu_marks`).
+See code in `vllm-omni/tests/helpers/mark.py` for existing examples (`cuda_marks`, `rocm_marks`, `npu_marks`).
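For orientation, here is one possible shape for the `tpu_marks` function sketched in step 2. Only the marker names from the `pyproject.toml` snippet above are taken from this page; the rest is an assumption about how the existing `*_marks` helpers look.

```python
# Hypothetical completion of the tpu_marks sketch above; marker names follow
# the pyproject.toml snippet, everything else is assumed.
import pytest


def tpu_marks(*, res: str, num_cards: int) -> list[pytest.MarkDecorator]:
    """Build platform, resource, and distribution markers for a TPU test."""
    marks = [pytest.mark.tpu, getattr(pytest.mark, res)]  # e.g. res="v5e"
    if num_cards > 1:
        # Multi-device runs get the distributed_tpu marker registered above.
        marks.append(pytest.mark.distributed_tpu)
    return marks
```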
14 changes: 6 additions & 8 deletions docs/contributing/ci/tests_style.md
@@ -221,15 +221,13 @@ from pathlib import Path
import openai
import pytest

-from tests.conftest import (
-    OmniServer,
-    convert_audio_to_text,
+from tests.helpers.media import (
+    convert_audio_bytes_to_text,
     cosine_similarity_text,
-    dummy_messages_from_mix_data,
     generate_synthetic_video,
-    merge_base64_and_convert_to_text,
 )
-from tests.utils import get_deploy_config_path
+from tests.helpers.runtime import OmniServer, dummy_messages_from_mix_data
+from tests.helpers.stage_config import get_deploy_config_path, modify_stage_config
from vllm_omni.platforms import current_omni_platform

# Edit: model name and stage config path
@@ -406,7 +404,7 @@ def test_mix_to_text_audio_001(client: openai.OpenAI, omni_server, request) -> N
# PURPOSE: Verify text and audio outputs convey the same information
# CUSTOMIZATION: Adjust similarity threshold (0.9) based on accuracy requirements
assert audio_data is not None, "No audio output is generated"
-audio_content = merge_base64_and_convert_to_text(audio_data)
+audio_content = convert_audio_bytes_to_text(audio_data)
print(f"text content is: {text_content}")
print(f"audio content is: {audio_content}")
similarity = cosine_similarity_text(audio_content.lower(), text_content.lower())
@@ -429,7 +427,7 @@ from pathlib import Path
import pytest
from vllm.assets.video import VideoAsset

from tests.utils import hardware_test
from tests.helpers.mark import hardware_test
from ..multi_stages.conftest import OmniRunner

# Optional: set process start method for workers
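The `cosine_similarity_text` helper used in `test_mix_to_text_audio_001` above is not shown in this diff. As a rough illustration of the kind of check it performs, here is a minimal bag-of-words sketch; the real implementation in `tests/helpers/media.py` may differ substantially (for example, it may use embeddings).

```python
# Illustrative sketch only; not the actual tests.helpers.media implementation.
import math
from collections import Counter


def cosine_similarity_text(a: str, b: str) -> float:
    """Cosine similarity between two strings over word-count vectors."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```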
17 changes: 8 additions & 9 deletions docs/contributing/model/adding_omni_model.md
@@ -408,18 +408,17 @@ Understanding the data structures is crucial for implementing stage transitions:

**Input to your function:**
 - `stage_list[source_stage_id].engine_outputs`: List of `EngineCoreOutput` objects
-  - Each contains `outputs`: List of `RequestOutput` objects
-    - Each `RequestOutput` has:
-      - `token_ids`: Generated token IDs
-      - `multimodal_output`: Dict with keys like `"code_predictor_codes"`, etc.
-        - These are the hidden states or intermediate outputs from the model's forward pass
-      - `prompt_token_ids`: Original prompt token IDs
+  - Each contains `outputs`: List of `RequestOutput` objects
+    - Each `RequestOutput` has:
+      - `token_ids`: Generated token IDs
+      - `multimodal_output`: Dict with keys like `"code_predictor_codes"`, etc. These are the hidden states or intermediate outputs from the model's forward pass
+      - `prompt_token_ids`: Original prompt token IDs

**Output from your function:**
 - Must return `list[OmniTokensPrompt]` where each `OmniTokensPrompt` contains:
-  - `prompt_token_ids`: List[int] - Token IDs for the next stage
-  - `additional_information`: Dict[str, Any] - Optional metadata (e.g., embeddings, hidden states)
-  - `multi_modal_data`: Optional multimodal data if needed
+  - `prompt_token_ids`: List[int] - Token IDs for the next stage
+  - `additional_information`: Dict[str, Any] - Optional metadata (e.g., embeddings, hidden states)
+  - `multi_modal_data`: Optional multimodal data if needed
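
To make these shapes concrete, here is a hedged sketch of a stage-transition function assembled from the fields listed above; the import path and the exact constructor of `OmniTokensPrompt` are assumptions, not the repository's actual API.

```python
# Hypothetical sketch; import path and field layout of OmniTokensPrompt are
# assumptions based on the bullet lists above.
from vllm_omni.inputs import OmniTokensPrompt  # assumed import path


def transform_stage_outputs(stage_list, source_stage_id: int) -> list[OmniTokensPrompt]:
    """Convert one stage's engine outputs into prompts for the next stage."""
    prompts: list[OmniTokensPrompt] = []
    for engine_output in stage_list[source_stage_id].engine_outputs:
        for request_output in engine_output.outputs:
            prompts.append(
                OmniTokensPrompt(
                    prompt_token_ids=list(request_output.token_ids),
                    # Carry hidden states / codec codes forward as metadata.
                    additional_information=request_output.multimodal_output,
                )
            )
    return prompts
```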

### How Model Outputs Are Stored

Expand Down
4 changes: 2 additions & 2 deletions docs/contributing/model/adding_tts_model.md
@@ -28,7 +28,7 @@ and can be placed on different devices. Qwen3-TTS has two stages:

Each stage is a separate model class configured independently via YAML. The two stages
are connected by the `async_chunk` framework, which enables inter-stage streaming for
-low first-packet latency (see [Async Chunk Design](../../design/feature/async_chunk_design.md)).
+low first-packet latency (see [Async Chunk Design](../../design/feature/async_chunk.md)).

### Without async_chunk (batch mode)

@@ -591,5 +591,5 @@ Adding a TTS model to vLLM-Omni involves:
For more information, see:

- [Architecture Overview](../../design/architecture_overview.md)
-- [Async Chunk Design](../../design/feature/async_chunk_design.md)
+- [Async Chunk Design](../../design/feature/async_chunk.md)
- [Stage Configuration Guide](../../configuration/stage_configs.md)
@@ -1,4 +1,4 @@
-# Async Chunk Design
+# Async Chunk

## Table of Contents

@@ -88,8 +88,9 @@ The following diagram illustrates the **Async Chunk Architecture** for multi-sta
</p>

 **Diagram Legend:**
+
 | Step | Stage Type | Description |
-|:------:|:-----------:|:------------|
+|------|-----------|------------|
 | `prefill` | Initialization | Context processing, KV cache initialization |
 | `decode` | Autoregressive | Token-by-token generation in AR stages |
 | `codes` | Audio Encoding | RVQ codec codes from Talker stage |
Binary file modified docs/source/architecture/async-chunk-architecture.png