84 changes: 44 additions & 40 deletions docs/getting_started/installation/gpu/cuda.inc.md
@@ -20,16 +20,16 @@ Therefore, it is recommended to install vLLM with a **fresh new** environment. I
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]

You can install vLLM using either `pip` or `uv pip`:

```bash
# Install vLLM with CUDA 12.8.
# If you are using pip.
pip install vllm --extra-index-url https://download.pytorch.org/whl/cu128
# If you are using uv.
uv pip install vllm --torch-backend=auto
```

??? console "pip"
```bash
# Install vLLM with CUDA 12.8.
pip install vllm --extra-index-url https://download.pytorch.org/whl/cu128
```

We recommend leveraging `uv` to [automatically select the appropriate PyTorch index at runtime](https://docs.astral.sh/uv/guides/integration/pytorch/#automatic-backend-selection) by inspecting the installed CUDA driver version via `--torch-backend=auto` (or `UV_TORCH_BACKEND=auto`). To select a specific backend (e.g., `cu126`), set `--torch-backend=cu126` (or `UV_TORCH_BACKEND=cu126`). If this doesn't work, try running `uv self update` to update `uv` first.
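
For example, a quick sketch of pinning a specific backend instead of relying on auto-detection (`cu126` here is just the example backend mentioned above; substitute the one that matches your driver):

```bash
# Pin the CUDA backend explicitly via the flag...
uv pip install vllm --torch-backend=cu126

# ...or via the equivalent environment variable.
UV_TORCH_BACKEND=cu126 uv pip install vllm
```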

!!! note
@@ -50,36 +50,22 @@ uv pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VE

LLM inference is a fast-evolving field, and the latest code may contain bug fixes, performance improvements, and new features that are not released yet. To allow users to try the latest code without waiting for the next release, vLLM provides wheels for Linux running on an x86 platform with CUDA 12 for every commit since `v0.5.3`.

##### Install the latest code using `pip`

```bash
pip install -U vllm \
--pre \
--extra-index-url https://wheels.vllm.ai/nightly
```

`--pre` is required for `pip` to consider pre-release versions.

Another way to install the latest code is to use `uv`:

```bash
uv pip install -U vllm \
--torch-backend=auto \
--extra-index-url https://wheels.vllm.ai/nightly
```

##### Install specific revisions using `pip`
??? console "pip"
```bash
pip install -U vllm \
--pre \
--extra-index-url https://wheels.vllm.ai/nightly
```

If you want to access the wheels for previous commits (e.g. to bisect a behavior change or performance regression), due to a limitation of `pip`, you have to specify the full URL of the wheel file by embedding the commit hash in the URL:
`--pre` is required for `pip` to consider pre-release versions.

```bash
export VLLM_COMMIT=33f460b17a54acb3b6cc0b03f4a17876cff5eafd # use full commit hash from the main branch
pip install https://wheels.vllm.ai/${VLLM_COMMIT}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
```

Note that the wheels are built with Python 3.8 ABI (see [PEP 425](https://peps.python.org/pep-0425/) for more details about ABI), so **they are compatible with Python 3.8 and later**. The version string in the wheel file name (`1.0.0.dev`) is just a placeholder to have a unified URL for the wheels; the actual versions are contained in the wheel metadata (the wheels listed in the extra index URL have correct versions). Although we don't support Python 3.8 any more (because PyTorch 2.5 dropped support for Python 3.8), the wheels are still built with Python 3.8 ABI to keep the same wheel name as before.

##### Install specific revisions using `uv`
##### Install specific revisions

If you want to access the wheels for previous commits (e.g. to bisect a behavior change or performance regression), you can specify the commit hash in the URL:

@@ -92,17 +78,35 @@ uv pip install vllm \

The `uv` approach works for vLLM `v0.6.6` and later and offers an easy-to-remember command. A unique feature of `uv` is that packages in `--extra-index-url` have [higher priority than the default index](https://docs.astral.sh/uv/pip/compatibility/#packages-that-exist-on-multiple-indexes). If the latest public release is `v0.6.6.post1`, `uv`'s behavior allows installing a commit before `v0.6.6.post1` by specifying the `--extra-index-url`. In contrast, `pip` combines packages from `--extra-index-url` and the default index, choosing only the latest version, which makes it difficult to install a development version prior to the released version.
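
For example, a minimal sketch of this, reusing the same illustrative commit hash as the `pip` instructions on this page (it assumes the per-commit index at `https://wheels.vllm.ai/${VLLM_COMMIT}` serves the wheel, as described above):

```bash
export VLLM_COMMIT=33f460b17a54acb3b6cc0b03f4a17876cff5eafd # use full commit hash from the main branch
uv pip install vllm --extra-index-url https://wheels.vllm.ai/${VLLM_COMMIT}
```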

??? note "pip"
If you want to access the wheels for previous commits (e.g. to bisect the behavior change,
performance regression), due to the limitation of `pip`, you have to specify the full URL of the
wheel file by embedding the commit hash in the URL:

```bash
export VLLM_COMMIT=33f460b17a54acb3b6cc0b03f4a17876cff5eafd # use full commit hash from the main branch
pip install https://wheels.vllm.ai/${VLLM_COMMIT}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
```

Note that the wheels are built with Python 3.8 ABI (see [PEP
425](https://peps.python.org/pep-0425/) for more details about ABI), so **they are compatible
with Python 3.8 and later**. The version string in the wheel file name (`1.0.0.dev`) is just a
placeholder to have a unified URL for the wheels, the actual versions of wheels are contained in
the wheel metadata (the wheels listed in the extra index url have correct versions). Although we
don't support Python 3.8 any more (because PyTorch 2.5 dropped support for Python 3.8), the
wheels are still built with Python 3.8 ABI to keep the same wheel name as before.
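
As a quick sanity check after installing, you can read the real version from the wheel metadata rather than the placeholder file name (assuming the package exposes `vllm.__version__`, as recent releases do):

```bash
# The installed version comes from the wheel metadata, not the 1.0.0.dev file name.
uv pip show vllm          # or: pip show vllm
python -c "import vllm; print(vllm.__version__)"
```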

# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]

#### Set up using Python-only build (without compilation)

If you only need to change Python code, you can build and install vLLM without compilation. Using `pip`'s [`--editable` flag](https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs), changes you make to the code will be reflected when you run vLLM:
If you only need to change Python code, you can build and install vLLM without compilation. Using `uv pip`'s [`--editable` flag](https://docs.astral.sh/uv/pip/packages/#editable-packages), changes you make to the code will be reflected when you run vLLM:

```bash
git clone https://github.com/vllm-project/vllm.git
cd vllm
VLLM_USE_PRECOMPILED=1 pip install --editable .
VLLM_USE_PRECOMPILED=1 uv pip install --editable .
```

This command will do the following:
@@ -121,7 +125,7 @@ In case you see an error about wheel not found when running the above command, i
```bash
export VLLM_COMMIT=72d9c316d3f6ede485146fe5aabd4e61dbc59069 # use full commit hash from the main branch
export VLLM_PRECOMPILED_WHEEL_LOCATION=https://wheels.vllm.ai/${VLLM_COMMIT}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
pip install --editable .
uv pip install --editable .
```

You can find more information about vLLM's wheels in [install-the-latest-code][install-the-latest-code].
@@ -137,7 +141,7 @@ If you want to modify C++ or CUDA code, you'll need to build vLLM from source. T
```bash
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .
uv pip install -e .
```

!!! tip
@@ -152,23 +156,23 @@ pip install -e .
The following environment variables can be set to configure the vLLM `sccache` remote: `SCCACHE_BUCKET=vllm-build-sccache SCCACHE_REGION=us-west-2 SCCACHE_S3_NO_CREDENTIALS=1`. We also recommend setting `SCCACHE_IDLE_TIMEOUT=0`.
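
For example, a minimal sketch of exporting these before the editable install (values copied from the settings above; point `SCCACHE_BUCKET` at your own bucket if you are not using vLLM's):

```bash
export SCCACHE_BUCKET=vllm-build-sccache
export SCCACHE_REGION=us-west-2
export SCCACHE_S3_NO_CREDENTIALS=1
export SCCACHE_IDLE_TIMEOUT=0
uv pip install -e .
```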

!!! note "Faster Kernel Development"
For frequent C++/CUDA kernel changes, after the initial `pip install -e .` setup, consider using the [Incremental Compilation Workflow](../../contributing/incremental_build.md) for significantly faster rebuilds of only the modified kernel code.
For frequent C++/CUDA kernel changes, after the initial `uv pip install -e .` setup, consider using the [Incremental Compilation Workflow](../../contributing/incremental_build.md) for significantly faster rebuilds of only the modified kernel code.

##### Use an existing PyTorch installation

There are scenarios where the PyTorch dependency cannot be easily installed via pip, e.g.:
There are scenarios where the PyTorch dependency cannot be easily installed with `uv`, e.g.:

- Building vLLM with PyTorch nightly or a custom PyTorch build.
- Building vLLM with aarch64 and CUDA (GH200), where the PyTorch wheels are not available on PyPI. Currently, only the PyTorch nightly has wheels for aarch64 with CUDA. You can run `pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124` to [install PyTorch nightly](https://pytorch.org/get-started/locally/), and then build vLLM on top of it.
- Building vLLM with aarch64 and CUDA (GH200), where the PyTorch wheels are not available on PyPI. Currently, only the PyTorch nightly has wheels for aarch64 with CUDA. You can run `uv pip install --index-url https://download.pytorch.org/whl/nightly/cu128 torch torchvision torchaudio` to [install PyTorch nightly](https://pytorch.org/get-started/locally/) and then build vLLM on top of it.

To build vLLM using an existing PyTorch installation:

```bash
git clone https://github.com/vllm-project/vllm.git
cd vllm
python use_existing_torch.py
pip install -r requirements/build.txt
pip install --no-build-isolation -e .
uv pip install -r requirements/build.txt
uv pip install --no-build-isolation -e .
```

##### Use the local cutlass for compilation
@@ -179,7 +183,7 @@ To achieve this, you can set the environment variable VLLM_CUTLASS_SRC_DIR to po
```bash
git clone https://github.com/vllm-project/vllm.git
cd vllm
VLLM_CUTLASS_SRC_DIR=/path/to/cutlass pip install -e .
VLLM_CUTLASS_SRC_DIR=/path/to/cutlass uv pip install -e .
```

##### Troubleshooting
@@ -189,7 +193,7 @@ to be run simultaneously, via the environment variable `MAX_JOBS`. For example:

```bash
export MAX_JOBS=6
pip install -e .
uv pip install -e .
```

This is especially useful when you are building on less powerful machines. For example, when you use WSL it only [assigns 50% of the total memory by default](https://learn.microsoft.com/en-us/windows/wsl/wsl-config#main-wsl-settings), so using `export MAX_JOBS=1` can avoid compiling multiple files simultaneously and running out of memory.
Expand Down Expand Up @@ -228,7 +232,7 @@ Simply disable the `VLLM_TARGET_DEVICE` environment variable before installing:

```bash
export VLLM_TARGET_DEVICE=empty
pip install -e .
uv pip install -e .
```

# --8<-- [end:build-wheel-from-source]