diff --git a/README.md b/README.md
index f3dab1c811..35b0594b7a 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,7 @@ Easy, fast, and cheap omni-modality model serving for everyone

 *Latest News* 🔥

+- [2026/02] We released [0.16.0](https://github.com/vllm-project/vllm-omni/releases/tag/v0.16.0) - A major alignment + capability release that rebases onto **upstream vLLM v0.16.0** and significantly expands performance, distributed execution, and production readiness across **Qwen3-Omni / Qwen3-TTS**, **Bagel**, **MiMo-Audio**, **GLM-Image** and the **Diffusion (DiT) image/video stack**—while also improving platform coverage (CUDA / ROCm / NPU / XPU), CI quality, and documentation.
 - [2026/02] We released [0.14.0](https://github.com/vllm-project/vllm-omni/releases/tag/v0.14.0) - This is the first **stable release** of vLLM-Omni that expands Omni’s diffusion / image-video generation and audio / TTS stack, improves distributed execution and memory efficiency, and broadens platform/backend coverage (GPU/ROCm/NPU/XPU). It also brings meaningful upgrades to serving APIs, profiling & benchmarking, and overall stability. Please check our latest [paper](https://arxiv.org/abs/2602.02204) for architecture design and performance results.
 - [2026/01] We released [0.12.0rc1](https://github.com/vllm-project/vllm-omni/releases/tag/v0.12.0rc1) - a major RC milestone focused on maturing the diffusion stack, strengthening OpenAI-compatible serving, expanding omni-model coverage, and improving stability across platforms (GPU/NPU/ROCm).
 - [2025/11] vLLM community officially released [vllm-project/vllm-omni](https://github.com/vllm-project/vllm-omni) in order to support omni-modality models serving.
diff --git a/docs/getting_started/installation/gpu.md b/docs/getting_started/installation/gpu.md
index 963e38de49..508ea307da 100644
--- a/docs/getting_started/installation/gpu.md
+++ b/docs/getting_started/installation/gpu.md
@@ -30,7 +30,7 @@ vLLM-Omni is a Python library that supports the following GPU variants. The libr

 ### Pre-built wheels

-Note: Pre-built wheels are currently only available for vLLM-Omni 0.11.0rc1, 0.12.0rc1, 0.14.0rc1, 0.14.0. For the latest version, please [build from source](https://docs.vllm.ai/projects/vllm-omni/en/latest/getting_started/installation/gpu/#build-wheel-from-source).
+Note: Pre-built wheels are currently only available for vLLM-Omni 0.11.0rc1, 0.12.0rc1, 0.14.0rc1, 0.14.0, 0.16.0. For the latest version, please [build from source](https://docs.vllm.ai/projects/vllm-omni/en/latest/getting_started/installation/gpu/#build-wheel-from-source).

 === "NVIDIA CUDA"

diff --git a/docs/getting_started/installation/gpu/cuda.inc.md b/docs/getting_started/installation/gpu/cuda.inc.md
index c261e0d7ac..1e1737e434 100644
--- a/docs/getting_started/installation/gpu/cuda.inc.md
+++ b/docs/getting_started/installation/gpu/cuda.inc.md
@@ -40,7 +40,7 @@ If you do not need to modify source code of vLLM, you can directly install the s
 uv pip install vllm==0.16.0 --torch-backend=auto
 ```

-The release 0.14.0 of vLLM is based on PyTorch 2.9.0 which requires CUDA 12.9 environment.
+The release 0.16.0 of vLLM is based on PyTorch 2.9.0 which requires CUDA 12.9 environment.

 #### Installation of vLLM-Omni
 Since vllm-omni is rapidly evolving, it's recommended to install it from source
@@ -91,7 +91,7 @@ docker run --runtime nvidia --gpus 2 \
     --env "HF_TOKEN=$HF_TOKEN" \
     -p 8091:8091 \
     --ipc=host \
-    vllm/vllm-omni:v0.14.0 \
+    vllm/vllm-omni:v0.16.0 \
     --model Qwen/Qwen3-Omni-30B-A3B-Instruct --port 8091
 ```

diff --git a/docs/getting_started/installation/gpu/rocm.inc.md b/docs/getting_started/installation/gpu/rocm.inc.md
index 634969506a..dc280bb946 100644
--- a/docs/getting_started/installation/gpu/rocm.inc.md
+++ b/docs/getting_started/installation/gpu/rocm.inc.md
@@ -130,7 +130,7 @@ docker run --rm \
     -v ~/.cache/huggingface:/root/.cache/huggingface \
     --env "HF_TOKEN=$HF_TOKEN" \
     -p 8091:8091 \
-    vllm/vllm-omni-rocm:v0.14.0 \
+    vllm/vllm-omni-rocm:v0.16.0 \
     --model Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091
 ```

@@ -149,7 +149,7 @@ docker run --rm -it \
     -v ~/.cache/huggingface:/root/.cache/huggingface \
     --env "HF_TOKEN=$HF_TOKEN" \
     --entrypoint bash \
-    vllm/vllm-omni-rocm:v0.14.0
+    vllm/vllm-omni-rocm:v0.16.0
 ```

 # --8<-- [end:pre-built-images]