Merged
15 changes: 1 addition & 14 deletions docs/getting_started/installation/gpu/cuda.inc.md
@@ -20,20 +20,7 @@ Therefore, it is recommended to install vLLM and vLLM-Omni with a **fresh new**

vLLM-Omni is built based on vLLM. Please install it with command below.
```bash
-# vllm 0.16.0 is still under prerelease
-uv pip install --prerelease=allow vllm --extra-index-url https://wheels.vllm.ai/2d5be1dd5ce2e44dfea53ea03ff61143da5137eb
-
-# vllm 0.16.0 may have some bugs for cuda 12.9, here is how we solve them:
-export FLASHINFER_CUDA_TAG="$(python3 -c 'import torch; print((torch.version.cuda or "12.4").replace(".", ""))')"
-
-uv pip install --upgrade --force-reinstall \
-"flashinfer-python==0.6.3" \
-"flashinfer-cubin==0.6.3" \
-"flashinfer-jit-cache==0.6.3" \
---extra-index-url "https://flashinfer.ai/whl/cu${FLASHINFER_CUDA_TAG}"
-
-uv pip install --upgrade --force-reinstall "nvidia-cublas-cu12==12.9.1.4"
-uv pip install --upgrade --force-reinstall "numpy==2.2.6"
+uv pip install vllm --torch-backend=auto

[P2] Pin vLLM CUDA install to 0.16.0 in prebuilt docs

This command now resolves to the latest vllm instead of the tested 0.16.0 release, so users can silently install a newer major/minor vLLM that is incompatible with current vllm-omni behavior (the project intentionally does not pin vllm as a dependency in pyproject.toml). In practice this makes the documented CUDA setup non-reproducible and can break installs/runtime as soon as a newer vLLM is published, even though adjacent docs sections still target 0.16.0.

```
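
One way to address the review comment is to pin the CUDA install to the tested release. The sketch below assumes the pin should be `0.16.0` with the same `--torch-backend=auto` flag that the quickstart's CUDA command uses; `install_pinned_vllm` and `is_pinned_version` are hypothetical helper names for illustration, not part of vLLM or vLLM-Omni.

```bash
# Tested release named in the review comment and in the quickstart (assumption:
# this is the version the docs intend to target).
VLLM_VERSION="0.16.0"

# Install the pinned release instead of whatever "latest" resolves to.
install_pinned_vllm() {
  uv pip install "vllm==${VLLM_VERSION}" --torch-backend=auto
}

# Hypothetical check: succeeds only if the given version string equals the pin.
is_pinned_version() {
  [ "$1" = "${VLLM_VERSION}" ]
}
```

After running `install_pinned_vllm`, the check could be fed the installed version, for example `is_pinned_version "$(python3 -c 'import vllm; print(vllm.__version__)')"`, so a drifted environment fails loudly rather than silently.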

#### Installation of vLLM-Omni
13 changes: 1 addition & 12 deletions docs/getting_started/quickstart.md
@@ -19,18 +19,7 @@ uv venv --python 3.12 --seed
source .venv/bin/activate

# On CUDA
-# vllm 0.16.0 is still under prerelease
-uv pip install --prerelease=allow vllm --extra-index-url https://wheels.vllm.ai/2d5be1dd5ce2e44dfea53ea03ff61143da5137eb
-# vllm 0.16.0 may have some bugs for cuda 12.9, here is how we solve them:
-export FLASHINFER_CUDA_TAG="$(python3 -c 'import torch; print((torch.version.cuda or "12.4").replace(".", ""))')"
-uv pip install --upgrade --force-reinstall \
-"flashinfer-python==0.6.3" \
-"flashinfer-cubin==0.6.3" \
-"flashinfer-jit-cache==0.6.3" \
---extra-index-url "https://flashinfer.ai/whl/cu${FLASHINFER_CUDA_TAG}"
-uv pip install --upgrade --force-reinstall "nvidia-cublas-cu12==12.9.1.4"
-uv pip install --upgrade --force-reinstall "numpy==2.2.6"
-
+uv pip install vllm==0.16.0 --torch-backend=auto

# On ROCm
uv pip install vllm==0.16.0 --extra-index-url https://wheels.vllm.ai/rocm/0.16.0/rocm700