Merged
1 change: 1 addition & 0 deletions README.md
@@ -17,6 +17,7 @@ Easy, fast, and cheap omni-modality model serving for everyone

*Latest News* 🔥

- [2026/02] We released [0.16.0](https://github.com/vllm-project/vllm-omni/releases/tag/v0.16.0) - A major alignment + capability release that rebases onto **upstream vLLM v0.16.0** and significantly expands performance, distributed execution, and production readiness across **Qwen3-Omni / Qwen3-TTS**, **Bagel**, **MiMo-Audio**, **GLM-Image** and the **Diffusion (DiT) image/video stack**—while also improving platform coverage (CUDA / ROCm / NPU / XPU), CI quality, and documentation.
- [2026/02] We released [0.14.0](https://github.com/vllm-project/vllm-omni/releases/tag/v0.14.0) - This is the first **stable release** of vLLM-Omni that expands Omni’s diffusion / image-video generation and audio / TTS stack, improves distributed execution and memory efficiency, and broadens platform/backend coverage (GPU/ROCm/NPU/XPU). It also brings meaningful upgrades to serving APIs, profiling & benchmarking, and overall stability. Please check our latest [paper](https://arxiv.org/abs/2602.02204) for architecture design and performance results.
- [2026/01] We released [0.12.0rc1](https://github.com/vllm-project/vllm-omni/releases/tag/v0.12.0rc1) - a major RC milestone focused on maturing the diffusion stack, strengthening OpenAI-compatible serving, expanding omni-model coverage, and improving stability across platforms (GPU/NPU/ROCm).
- [2025/11] vLLM community officially released [vllm-project/vllm-omni](https://github.com/vllm-project/vllm-omni) in order to support omni-modality models serving.
2 changes: 1 addition & 1 deletion docs/getting_started/installation/gpu.md
@@ -30,7 +30,7 @@ vLLM-Omni is a Python library that supports the following GPU variants. The libr

### Pre-built wheels

-Note: Pre-built wheels are currently only available for vLLM-Omni 0.11.0rc1, 0.12.0rc1, 0.14.0rc1, 0.14.0. For the latest version, please [build from source](https://docs.vllm.ai/projects/vllm-omni/en/latest/getting_started/installation/gpu/#build-wheel-from-source).
+Note: Pre-built wheels are currently only available for vLLM-Omni 0.11.0rc1, 0.12.0rc1, 0.14.0rc1, 0.14.0, 0.16.0. For the latest version, please [build from source](https://docs.vllm.ai/projects/vllm-omni/en/latest/getting_started/installation/gpu/#build-wheel-from-source).

=== "NVIDIA CUDA"

4 changes: 2 additions & 2 deletions docs/getting_started/installation/gpu/cuda.inc.md
@@ -40,7 +40,7 @@ If you do not need to modify source code of vLLM, you can directly install the s
uv pip install vllm==0.16.0 --torch-backend=auto
```

-The release 0.14.0 of vLLM is based on PyTorch 2.9.0 which requires CUDA 12.9 environment.
+The release 0.16.0 of vLLM is based on PyTorch 2.9.0 which requires CUDA 12.9 environment.
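
The version requirement in the changed line can be checked before installing vLLM-Omni. The following is a minimal sketch, not part of this PR: the helper names `major_minor` and `environment_mismatches` are hypothetical, the expected versions are simply the ones quoted in the note above, and comparing only the major and minor components is an assumption about how strict the match needs to be.

```python
import re
from typing import List, Optional, Tuple

# Versions quoted in the installation note; adjust them if your setup differs.
EXPECTED_TORCH = "2.9.0"
EXPECTED_CUDA = "12.9"


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from strings such as '2.9.0+cu129' or '12.9'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def environment_mismatches(torch_version: str, cuda_version: Optional[str]) -> List[str]:
    """List differences between the reported and the expected versions.

    Hypothetical helper: only major.minor is compared, which is an assumption.
    """
    problems = []
    if major_minor(torch_version) != major_minor(EXPECTED_TORCH):
        problems.append(f"PyTorch {torch_version} found, {EXPECTED_TORCH} expected")
    if cuda_version is None:
        # torch.version.cuda is None for CPU-only PyTorch builds.
        problems.append("PyTorch build has no CUDA support")
    elif major_minor(cuda_version) != major_minor(EXPECTED_CUDA):
        problems.append(f"CUDA {cuda_version} found, {EXPECTED_CUDA} expected")
    return problems
```

With PyTorch installed, calling `environment_mismatches(torch.__version__, torch.version.cuda)` after `import torch` should return an empty list when the environment matches the note.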

#### Installation of vLLM-Omni
Since vllm-omni is rapidly evolving, it's recommended to install it from source
@@ -91,7 +91,7 @@ docker run --runtime nvidia --gpus 2 \
--env "HF_TOKEN=$HF_TOKEN" \
-p 8091:8091 \
--ipc=host \
-vllm/vllm-omni:v0.14.0 \
+vllm/vllm-omni:v0.16.0 \
--model Qwen/Qwen3-Omni-30B-A3B-Instruct --port 8091
```
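
Once the container from the command above is running, a quick request can confirm that the server on port 8091 is answering. This is a sketch under assumptions, not something this PR adds: it assumes the server exposes an OpenAI-style `/v1/models` listing, and the helper names `models_url`, `extract_model_ids`, and `list_model_ids` are hypothetical.

```python
import json
import urllib.request


def models_url(host: str = "localhost", port: int = 8091) -> str:
    """Build the model-listing URL; the /v1/models path is an assumption."""
    return f"http://{host}:{port}/v1/models"


def extract_model_ids(payload: dict) -> list:
    """Pull model ids out of an OpenAI-style listing ({"data": [{"id": ...}]})."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(host: str = "localhost", port: int = 8091, timeout: float = 5.0) -> list:
    """Query the running server and return the model ids it reports."""
    with urllib.request.urlopen(models_url(host, port), timeout=timeout) as response:
        return extract_model_ids(json.load(response))
```

If the assumption holds, `list_model_ids()` should include the model passed via `--model`; a connection error usually means the container is still loading weights or the port mapping differs.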

4 changes: 2 additions & 2 deletions docs/getting_started/installation/gpu/rocm.inc.md
@@ -130,7 +130,7 @@ docker run --rm \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=$HF_TOKEN" \
-p 8091:8091 \
-vllm/vllm-omni-rocm:v0.14.0 \
+vllm/vllm-omni-rocm:v0.16.0 \
--model Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091
```

@@ -149,7 +149,7 @@ docker run --rm -it \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=$HF_TOKEN" \
--entrypoint bash \
-vllm/vllm-omni-rocm:v0.14.0
+vllm/vllm-omni-rocm:v0.16.0
```

# --8<-- [end:pre-built-images]