18 commits
cab1097
[Doc] add custom cuda build guides
loveysuby Feb 10, 2026
995049e
Merge remote-tracking branch 'upstream/main' into docs/add-custom-doc…
loveysuby Apr 6, 2026
2ef5846
docs: revise custom image build guide based on Dockerfile.cuda (#1439)
loveysuby Apr 6, 2026
2565171
Merge branch 'main' into docs/add-custom-docker-build-on-nvidia-cuda
loveysuby Apr 6, 2026
34f5bfe
Merge branch 'main' into docs/add-custom-docker-build-on-nvidia-cuda
loveysuby Apr 7, 2026
f74ba81
Update base image version
loveysuby Apr 10, 2026
8d4211a
Merge branch 'main' into docs/add-custom-docker-build-on-nvidia-cuda
loveysuby Apr 12, 2026
c396c07
Merge branch 'main' into docs/add-custom-docker-build-on-nvidia-cuda
loveysuby Apr 27, 2026
0dff3b8
docs: Use HF_HOME env for model cache path
loveysuby Apr 27, 2026
528e4d7
Merge branch 'main' into docs/add-custom-docker-build-on-nvidia-cuda
loveysuby Apr 27, 2026
7795a0c
Merge branch 'main' into docs/add-custom-docker-build-on-nvidia-cuda
tzhouam Apr 28, 2026
8ca92d2
Merge branch 'main' into docs/add-custom-docker-build-on-nvidia-cuda
loveysuby May 13, 2026
835b259
Update CUDA base image version to v0.20.0
loveysuby May 13, 2026
8faed3f
Merge branch 'main' into docs/add-custom-docker-build-on-nvidia-cuda
loveysuby May 14, 2026
4c70f80
Merge branch 'main' into docs/add-custom-docker-build-on-nvidia-cuda
loveysuby May 26, 2026
97f04c0
Update CUDA base image version to v0.21.0
loveysuby May 26, 2026
650448a
Merge branch 'main' into docs/add-custom-docker-build-on-nvidia-cuda
loveysuby May 26, 2026
df3a542
Merge branch 'main' into docs/add-custom-docker-build-on-nvidia-cuda
congw729 May 28, 2026
4 changes: 4 additions & 0 deletions docs/getting_started/installation/gpu.md
@@ -94,6 +94,10 @@ Note: Pre-built wheels are currently available for vLLM-Omni 0.11.0rc1, 0.12.0rc

### Build your own docker image

=== "NVIDIA CUDA"

    --8<-- "docs/getting_started/installation/gpu/cuda.inc.md:build-docker"

=== "AMD ROCm"

    --8<-- "docs/getting_started/installation/gpu/rocm.inc.md:build-docker"
50 changes: 50 additions & 0 deletions docs/getting_started/installation/gpu/cuda.inc.md
@@ -110,3 +110,53 @@ docker run --runtime nvidia --gpus 2 \
The CUDA image does not define a default entrypoint, so include `vllm serve ... --omni` after the image name.

# --8<-- [end:pre-built-images]

# --8<-- [start:build-docker]

#### Build docker image

```bash
DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.cuda -t vllm-omni-cuda .
```

To build against a specific vLLM base image version, set the `BASE_IMAGE` build argument:

```bash
DOCKER_BUILDKIT=1 docker build \
-f docker/Dockerfile.cuda \
--build-arg BASE_IMAGE=vllm/vllm-openai:v0.21.0 \
-t vllm-omni-cuda .
```

Review comment (Contributor Author) on the `BASE_IMAGE` line: PTAL @Gaohan123: Followed vllm 0.21.0 at docs.
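
If the base version changes often, the `BASE_IMAGE` value can be derived from a single variable. The following is a minimal sketch: `VLLM_TAG` is a hypothetical variable name chosen for illustration, not something `docker/Dockerfile.cuda` reads, and the build command itself is left commented out so the snippet can be run safely without Docker.

```bash
# VLLM_TAG is a hypothetical variable name used only in this sketch.
# It falls back to v0.21.0 when unset or empty.
VLLM_TAG="${VLLM_TAG:-v0.21.0}"
BASE_IMAGE="vllm/vllm-openai:${VLLM_TAG}"

echo "Building vllm-omni-cuda on top of ${BASE_IMAGE}"

# Uncomment to run the build from the repository root:
# DOCKER_BUILDKIT=1 docker build \
#     -f docker/Dockerfile.cuda \
#     --build-arg "BASE_IMAGE=${BASE_IMAGE}" \
#     -t vllm-omni-cuda .
```

Running it as `VLLM_TAG=v0.20.0 sh build.sh` (with `build.sh` being whatever file holds the snippet) would select a different base tag without editing the command.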

#### Launch the docker image

##### Launch with OpenAI API Server

!!! note
    The model `Qwen/Qwen3-Omni-30B-A3B-Instruct` requires significant GPU memory. The example below has been verified on 2 x H100 GPUs.

```bash
docker run --runtime nvidia --gpus 2 \
-v ${HF_HOME:-$HOME/.cache/huggingface}:/root/.cache/huggingface \
--env "HF_TOKEN=$HF_TOKEN" \
-p 8091:8091 \
--ipc=host \
vllm-omni-cuda \
vllm serve --omni --model Qwen/Qwen3-Omni-30B-A3B-Instruct --port 8091
```

By default, this mounts the host directory `$HOME/.cache/huggingface` into the container at `/root/.cache/huggingface` as the model cache. To use a custom host location, set the `HF_HOME` environment variable before running the command (e.g., `export HF_HOME=/data/models`).
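
Loading a large model can take a while, so it may help to wait for the server before sending requests. The sketch below polls a URL with `curl` until it responds. `wait_for_server` is a hypothetical helper, and the `/health` and `/v1/models` paths are assumptions based on the upstream vLLM OpenAI-compatible server; they are not confirmed here for vLLM-Omni.

```bash
# Poll a URL until it responds successfully or the attempts run out.
# Usage: wait_for_server URL [ATTEMPTS] [DELAY_SECONDS]
# (hypothetical helper, not part of vLLM-Omni)
wait_for_server() {
    url="$1"
    attempts="${2:-60}"
    delay="${3:-5}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the body
        if curl -fs -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example (assumes the container started above is running and that the
# endpoints below exist, as they do in upstream vLLM):
#   wait_for_server http://localhost:8091/health && \
#       curl -s http://localhost:8091/v1/models
```

The function returns 0 as soon as one request succeeds and 1 after the last failed attempt, so it can gate follow-up commands with `&&`.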

##### Launch with interactive session for development

```bash
docker run --runtime nvidia --gpus all -it --rm \
-v ${HF_HOME:-$HOME/.cache/huggingface}:/root/.cache/huggingface \
--env "HF_TOKEN=$HF_TOKEN" \
-p 8091:8091 \
--ipc=host \
--entrypoint bash \
vllm-omni-cuda
```
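
Both launch commands pick the host cache directory with the shell expansion `${HF_HOME:-$HOME/.cache/huggingface}`. As a small illustration of how that expansion resolves, the sketch below evaluates it with and without `HF_HOME` set. `resolve_hf_cache` is a hypothetical helper written for this example, and each case runs in a subshell so the current environment is left untouched.

```bash
# Hypothetical helper: print the host path that the docker run commands
# would mount, using the same ${VAR:-default} expansion.
resolve_hf_cache() {
    # Expands to the default when HF_HOME is unset or empty.
    printf '%s\n' "${HF_HOME:-$HOME/.cache/huggingface}"
}

# HF_HOME unset: the default under $HOME is used.
default_cache="$(unset HF_HOME; resolve_hf_cache)"

# HF_HOME set: the custom location is used instead.
custom_cache="$(HF_HOME=/data/models; resolve_hf_cache)"

# HF_HOME set but empty: ":-" treats this like unset.
empty_cache="$(HF_HOME=""; resolve_hf_cache)"

echo "default: ${default_cache}"
echo "custom:  ${custom_cache}"
echo "empty:   ${empty_cache}"
```

Because the expansion is unquoted in the `docker run` commands, a cache path containing spaces would be split into separate arguments; wrapping the whole `-v` value in double quotes avoids that.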

# --8<-- [end:build-docker]