[Doc] Update docs and dockerfiles for rebase of vllm v0.18.0 #2038
…tion
- Updated VLLM_VERSION in pipeline-intel.yaml and Dockerfiles for CI, ROCm, and XPU to 0.18.0.
- Modified installation instructions in quickstart.md, gpu.md, cuda.inc.md, and rocm.inc.md to reflect the new version.
- Adjusted pre-built wheel availability note in gpu.md to include version 0.18.0.
- Updated Docker image tags in documentation to use v0.18.0.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
@tjtanaa Please help me review whether the AMD part is correct.
@xuechendi Please help me check whether the Intel part is correct.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c1f8001643
ARG VLLM_VERSION=v0.18.0
RUN git clone -b ${VLLM_VERSION} https://github.com/vllm-project/vllm
Keep XPU builds on a vLLM ref that exists
Changing VLLM_VERSION to v0.18.0 breaks every Intel/XPU image build on this commit. .buildkite/scripts/hardware_ci/run-xpu-test.sh forwards that value into docker build ... --build-arg VLLM_VERSION=${VLLM_VERSION}, and this Dockerfile then runs git clone -b ${VLLM_VERSION} https://github.com/vllm-project/vllm; today https://github.com/vllm-project/vllm/tree/v0.18.0 404s and upstream PyPI still tops out at 0.17.1, so the clone step fails before CI or local XPU builds can even start.
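A pre-flight check along these lines could catch a missing ref before the clone step runs. This is only a sketch: the helper names `parse_remote_tags` and `remote_tag_exists` are hypothetical and not part of this repository. It relies on `git ls-remote --tags`, which lists remote tags without cloning.

```python
import subprocess
from typing import Set


def parse_remote_tags(ls_remote_output: str) -> Set[str]:
    """Extract tag names from `git ls-remote --tags` output.

    Each line looks like "<sha>\trefs/tags/<name>"; annotated tags also
    appear a second time with a "^{}" suffix, which is stripped here.
    """
    prefix = "refs/tags/"
    tags = set()
    for line in ls_remote_output.splitlines():
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        ref = parts[1].strip()
        if not ref.startswith(prefix):
            continue
        name = ref[len(prefix):]
        if name.endswith("^{}"):
            name = name[:-3]
        tags.add(name)
    return tags


def remote_tag_exists(repo_url: str, tag: str) -> bool:
    """Hypothetical helper: ask the remote for its tags without cloning."""
    result = subprocess.run(
        ["git", "ls-remote", "--tags", repo_url],
        capture_output=True,
        text=True,
        check=True,  # raise if git itself fails (network error, bad URL)
    )
    return tag in parse_remote_tags(result.stdout)
```

A CI script could call `remote_tag_exists("https://github.com/vllm-project/vllm", "v0.18.0")` and fail fast with a clear message instead of failing inside `docker build`.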
@@ -1,4 +1,4 @@
-ARG BASE_IMAGE=vllm/vllm-openai-rocm:v0.17.0
+ARG BASE_IMAGE=vllm/vllm-openai-rocm:v0.18.0
Point the ROCm Dockerfile at a published base image
Using vllm/vllm-openai-rocm:v0.18.0 as the default base image makes the ROCm Dockerfile unbuildable. Docker Hub's current tag list for vllm/vllm-openai-rocm includes v0.17.1, v0.17.0, etc., but no v0.18.0, so docker build -f docker/Dockerfile.rocm now fails immediately on the initial FROM unless callers override BASE_IMAGE.
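One way to guard against an unpublished base image is to probe the registry before building and fall back to the newest tag that exists. The sketch below assumes the Docker CLI is installed and uses `docker manifest inspect`, which exits non-zero when a tag cannot be found; `image_is_published` and `pick_base_image` are hypothetical names, not part of this repository.

```python
import subprocess
from typing import Callable, Iterable, Optional


def image_is_published(image_ref: str) -> bool:
    """Return True if the registry can resolve image_ref.

    Assumes the docker CLI is on PATH; FileNotFoundError propagates if not.
    """
    result = subprocess.run(
        ["docker", "manifest", "inspect", image_ref],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def pick_base_image(
    candidates: Iterable[str],
    is_published: Callable[[str], bool] = image_is_published,
) -> Optional[str]:
    """Return the first candidate that is published, or None if none are."""
    for ref in candidates:
        if is_published(ref):
            return ref
    return None
```

The chosen reference could then be passed as `--build-arg BASE_IMAGE=...` to `docker build -f docker/Dockerfile.rocm`. The checker is injectable so the selection logic can be exercised without a registry.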
### Pre-built wheels

-Note: Pre-built wheels are currently only available for vLLM-Omni 0.11.0rc1, 0.12.0rc1, 0.14.0rc1, 0.14.0, 0.16.0. For the latest version, please [build from source](https://docs.vllm.ai/projects/vllm-omni/en/latest/getting_started/installation/gpu/#build-wheel-from-source).
+Note: Pre-built wheels are currently available for vLLM-Omni 0.11.0rc1, 0.12.0rc1, 0.14.0rc1, 0.14.0, 0.16.0, and 0.18.0. If you need a newer unreleased revision, please [build from source](https://docs.vllm.ai/projects/vllm-omni/en/latest/getting_started/installation/gpu/#build-wheel-from-source).
Stop documenting 0.18.0 artifacts before they ship
This diff updates the GPU install/deploy docs to vllm==0.18.0, git checkout v0.18.0, and vllm/vllm-omni*:v0.18.0, but those artifacts are not published yet: upstream vLLM still lists 0.17.1 on PyPI, vllm-omni still lists 0.16.0 on PyPI/Docker Hub, and the new v0.18.0 release/tag/image URLs 404. Anyone following the updated CUDA/ROCm install or deployment snippets will now hit a hard download/pull failure instead of a working setup.
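A docs change like this could be gated on a check that the pinned version is actually downloadable. The sketch below queries the public PyPI JSON endpoint (`https://pypi.org/pypi/<package>/json`) and looks the version up in its `releases` mapping; the function names `fetch_pypi_metadata` and `is_released` are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_pypi_metadata(package: str) -> dict:
    """Download the PyPI JSON metadata for a package (needs network access)."""
    with urlopen(f"https://pypi.org/pypi/{package}/json", timeout=10) as resp:
        return json.load(resp)


def is_released(metadata: dict, version: str) -> bool:
    """Return True if `version` has at least one uploaded file on PyPI.

    Accepts git-style tags such as "v0.18.0" by dropping the leading "v".
    """
    wanted = version[1:] if version.startswith("v") else version
    files = metadata.get("releases", {}).get(wanted)
    return bool(files)  # a missing key or an empty file list both count as unreleased
```

For example, `is_released(fetch_pypi_metadata("vllm"), "v0.18.0")` would tell a docs CI job whether `pip install vllm==0.18.0` can succeed yet.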
Thanks, I have gone through the doc; the Intel-related part looks good to me.
Removed installation of vLLM and related dependencies from Dockerfile. Signed-off-by: Gao Han <hgaoaf@connect.ust.hk>
…oject#2038) Signed-off-by: tzhouam <tzhouam@connect.ust.hk> Signed-off-by: Gao Han <hgaoaf@connect.ust.hk> Co-authored-by: Gao Han <hgaoaf@connect.ust.hk>
Purpose
This PR updates the Dockerfiles and installation docs for the rebase onto vLLM v0.18.0.
Test Plan
\
Test Result
\