[CI] Add CUDA 13 nightly containers #31822
csahithi wants to merge 2 commits into vllm-project:main
Conversation
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
Code Review
This pull request introduces support for building and publishing nightly Docker containers for CUDA 13. This is achieved by adding new steps to the Buildkite release pipeline for building CUDA 13 images for x86 and arm64, creating a multi-arch manifest, and publishing them to DockerHub. The cleanup-nightly-builds.sh script has also been updated to accept a tag prefix, making it reusable for cleaning up different sets of nightly images.
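The tag-prefix parameterization might be wired up roughly as follows. This is a hypothetical sketch — the real cleanup-nightly-builds.sh talks to the DockerHub API, which is elided here, and the function and tag names below are made up for the demo:

```shell
#!/usr/bin/env bash
set -euo pipefail

# filter_old_tags PREFIX KEEP: reads tag names on stdin (one per line) and
# prints those that match PREFIX but are older than the newest KEEP entries.
# Hypothetical helper; the real script would fetch tags from the registry.
filter_old_tags() {
  local prefix="$1" keep="$2"
  grep "^${prefix}" | sort -r | tail -n "+$((keep + 1))"
}

# With a keep-count of 2, only the oldest matching tag falls out; the
# non-matching "nightly-" tag is left alone:
printf '%s\n' \
  cuda13-nightly-20260103 nightly-20260102 \
  cuda13-nightly-20260101 cuda13-nightly-20260102 \
  | filter_old_tags "cuda13-nightly-" 2
# prints: cuda13-nightly-20260101
```

Passing the prefix as a positional argument is what lets one script serve both the existing `nightly-` tags and the new `cuda13-nightly-` tags.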
The changes are logical and follow the existing structure of the pipeline. However, I have identified two high-severity issues in the pipeline configuration file:
- An invalid CUDA compute capability is specified in the torch_cuda_arch_list for the arm64 build.
- There is significant code duplication in the pipeline steps, which harms maintainability. I've suggested using YAML anchors to refactor this.
Please see the detailed comments for suggestions on how to address these points.
.buildkite/release-pipeline.yaml
Outdated
    queue: arm64_cpu_queue_postmerge
  commands:
    - "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7"
    - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=13.0.1 --build-arg BUILD_BASE_IMAGE=nvidia/cuda:13.0.1-devel-ubuntu22.04 --build-arg FLASHINFER_AOT_COMPILE=true --build-arg torch_cuda_arch_list='8.7 8.9 9.0 10.0+PTX 12.0' --build-arg INSTALL_KV_CONNECTORS=true --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-cuda13-$(uname -m) --target vllm-openai --progress plain -f docker/Dockerfile ."
The torch_cuda_arch_list contains 12.0, which is not a valid CUDA compute capability. The latest defined architecture is 9.0 for Hopper. While 10.0+PTX can be used for forward compatibility with upcoming architectures like Blackwell, 12.0 is likely a mistake. At best, it will be ignored by the build system, but it should be removed to avoid confusion and potential issues.
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=13.0.1 --build-arg BUILD_BASE_IMAGE=nvidia/cuda:13.0.1-devel-ubuntu22.04 --build-arg FLASHINFER_AOT_COMPILE=true --build-arg torch_cuda_arch_list='8.7 8.9 9.0 10.0+PTX' --build-arg INSTALL_KV_CONNECTORS=true --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-cuda13-$(uname -m) --target vllm-openai --progress plain -f docker/Dockerfile ."
12.0 is for RTX PRO 6000 Blackwell, no? There's also 12.1 for DGX Spark.
Speaking of which, @csahithi, I think it probably makes sense to add 12.1 here. I remember I had a late-night discussion with @suhara and he was having trouble building an image on DGX Spark. Therefore, it probably makes sense to include 12.1 here so that our Nemotron folks will be able to use the nightly-built container directly on DGX Spark.
added 12.1 for DGX Spark, thanks!
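Since a bad entry in the arch list would only surface partway through a long Docker build, a tiny pre-flight check on the torch_cuda_arch_list value could catch typos early. This is a hypothetical helper, not part of this PR:

```shell
#!/usr/bin/env bash
# validate_arch_list: succeeds iff every space-separated entry looks like
# "<major>.<minor>" with an optional "+PTX" suffix (e.g. "9.0", "12.1+PTX").
# Hypothetical pre-flight guard; it does not check entries against real GPUs.
validate_arch_list() {
  local entry
  for entry in $1; do
    if [[ ! "$entry" =~ ^[0-9]+\.[0-9]+(\+PTX)?$ ]]; then
      echo "invalid arch entry: $entry" >&2
      return 1
    fi
  done
}

validate_arch_list "8.7 8.9 9.0 10.0+PTX 12.0 12.1" && echo "arch list OK"
# prints: arch list OK
```

This only validates the format; whether a given capability is meaningful for a target GPU (e.g. 12.1 for DGX Spark) still has to be decided by a human, as in the discussion above.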
- label: "Build and publish nightly CUDA 13.0 multi-arch image to DockerHub"
  depends_on:
    - create-multi-arch-manifest-cuda13
  if: build.env("NIGHTLY") == "1"
  agents:
    queue: cpu_queue_postmerge
  commands:
    - "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7"
    - "docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-cuda13-x86_64"
    - "docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-cuda13-aarch64"
    - "docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-cuda13-x86_64 vllm/vllm-openai:cuda13-nightly-x86_64"
    - "docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-cuda13-aarch64 vllm/vllm-openai:cuda13-nightly-aarch64"
    - "docker push vllm/vllm-openai:cuda13-nightly-x86_64"
    - "docker push vllm/vllm-openai:cuda13-nightly-aarch64"
    - "docker manifest create vllm/vllm-openai:cuda13-nightly vllm/vllm-openai:cuda13-nightly-x86_64 vllm/vllm-openai:cuda13-nightly-aarch64 --amend"
    - "docker manifest create vllm/vllm-openai:cuda13-nightly-$BUILDKITE_COMMIT vllm/vllm-openai:cuda13-nightly-x86_64 vllm/vllm-openai:cuda13-nightly-aarch64 --amend"
    - "docker manifest push vllm/vllm-openai:cuda13-nightly"
    - "docker manifest push vllm/vllm-openai:cuda13-nightly-$BUILDKITE_COMMIT"
    # Clean up old CUDA 13.0 nightly builds (keep only last 14)
    - "bash .buildkite/scripts/cleanup-nightly-builds.sh cuda13-nightly-"
  plugins:
    - docker-login#v3.0.0:
        username: vllmbot
This new step to publish nightly CUDA 13.0 images is almost a complete duplicate of the existing step for CUDA 12.9 images. This level of duplication makes the pipeline configuration difficult to maintain and prone to errors, as changes need to be manually synchronized across multiple blocks.
To improve maintainability, I strongly recommend refactoring this using YAML anchors and aliases. You can define a template for the common parts of the job and then reuse it for each CUDA version, only overriding the specific parts like dependencies and tag prefixes.
Here is a conceptual example:
.publish_nightly_template: &publish_nightly_template
if: build.env("NIGHTLY") == "1"
agents:
queue: cpu_queue_postmerge
plugins:
- docker-login#v3.0.0:
username: vllmbot
password-env: DOCKERHUB_TOKEN
# ... other common properties
- label: "Build and publish nightly CUDA 12.9 ..."
<<: *publish_nightly_template
depends_on:
- create-multi-arch-manifest
commands:
# ... commands with version-specific tags
- "bash .buildkite/scripts/cleanup-nightly-builds.sh nightly-"
- label: "Build and publish nightly CUDA 13.0 ..."
<<: *publish_nightly_template
depends_on:
- create-multi-arch-manifest-cuda13
commands:
# ... commands with version-specific tags
    - "bash .buildkite/scripts/cleanup-nightly-builds.sh cuda13-nightly-"

Even the commands list could be further parameterized using variables to reduce duplication. Adopting this pattern will make the pipeline much cleaner and easier to manage.
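That further parameterization could also happen outside of YAML, e.g. by generating the version-specific command list from the tag prefix. This is a hypothetical sketch of the idea, not how the pipeline is currently assembled:

```shell
#!/usr/bin/env bash
# publish_commands PREFIX: emits the arch-specific push commands plus the
# matching cleanup call for a given nightly tag prefix. The image names
# mirror the pipeline steps above; the generator itself is hypothetical.
publish_commands() {
  local prefix="$1"
  for arch in x86_64 aarch64; do
    echo "docker push vllm/vllm-openai:${prefix}${arch}"
  done
  echo "bash .buildkite/scripts/cleanup-nightly-builds.sh ${prefix}"
}

publish_commands "cuda13-nightly-"
# last line printed: bash .buildkite/scripts/cleanup-nightly-builds.sh cuda13-nightly-
```

Calling it with `"nightly-"` versus `"cuda13-nightly-"` would then yield the two command lists that are currently duplicated by hand.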
This is a fair comment and I think it'd be a good idea to refactor them, if @csahithi wanna take on this challenge.
wangshangsam
left a comment
Nicely done, @csahithi ! If we can merge this before Feb 13, maybe we could use a nightly image directly in our MLPerf submission :)
The pipeline should be tested by someone who has access to vLLM's Buildkite and ECR accounts, since this is very infra-setup-dependent. @mgoin, do you know who would be able to help test this out?
.buildkite/release-pipeline.yaml
Outdated
    queue: cpu_queue_postmerge
  commands:
    - "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7"
    - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=13.0.1 --build-arg BUILD_BASE_IMAGE=nvidia/cuda:13.0.1-devel-ubuntu22.04 --build-arg FLASHINFER_AOT_COMPILE=true --build-arg INSTALL_KV_CONNECTORS=true --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-cuda13-$(uname -m) --target vllm-openai --progress plain -f docker/Dockerfile ."
Suggested change:
- - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=13.0.1 --build-arg BUILD_BASE_IMAGE=nvidia/cuda:13.0.1-devel-ubuntu22.04 --build-arg FLASHINFER_AOT_COMPILE=true --build-arg INSTALL_KV_CONNECTORS=true --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-cuda13-$(uname -m) --target vllm-openai --progress plain -f docker/Dockerfile ."
+ - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=13.0.1 --build-arg BUILD_BASE_IMAGE=nvidia/cuda:13.0.1-devel-ubuntu22.04 --build-arg INSTALL_KV_CONNECTORS=true --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-cuda13-$(uname -m) --target vllm-openai --progress plain -f docker/Dockerfile ."
I think FLASHINFER_AOT_COMPILE no longer exists.
removed FLASHINFER_AOT_COMPILE
.buildkite/release-pipeline.yaml
Outdated
    queue: arm64_cpu_queue_postmerge
  commands:
    - "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7"
    - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=13.0.1 --build-arg BUILD_BASE_IMAGE=nvidia/cuda:13.0.1-devel-ubuntu22.04 --build-arg FLASHINFER_AOT_COMPILE=true --build-arg torch_cuda_arch_list='8.7 8.9 9.0 10.0+PTX 12.0' --build-arg INSTALL_KV_CONNECTORS=true --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-cuda13-$(uname -m) --target vllm-openai --progress plain -f docker/Dockerfile ."
Suggested change:
- - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=13.0.1 --build-arg BUILD_BASE_IMAGE=nvidia/cuda:13.0.1-devel-ubuntu22.04 --build-arg FLASHINFER_AOT_COMPILE=true --build-arg torch_cuda_arch_list='8.7 8.9 9.0 10.0+PTX 12.0' --build-arg INSTALL_KV_CONNECTORS=true --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-cuda13-$(uname -m) --target vllm-openai --progress plain -f docker/Dockerfile ."
+ - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=13.0.1 --build-arg BUILD_BASE_IMAGE=nvidia/cuda:13.0.1-devel-ubuntu22.04 --build-arg torch_cuda_arch_list='8.7 8.9 9.0 10.0+PTX 12.0' --build-arg INSTALL_KV_CONNECTORS=true --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-cuda13-$(uname -m) --target vllm-openai --progress plain -f docker/Dockerfile ."
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
My scope stops at the ability to trigger normal CI jobs. For the release pipeline, it would be best to consult with @khluu.
The release pipeline can only be launched automatically per-commit on the main branch, or manually by a group of people with permission (for security purposes). I've manually launched a release job here to test the PR out: https://buildkite.com/vllm/release/builds/11971
This pull request has merge conflicts that must be resolved before it can be merged.
Purpose

Add nightly CUDA 13 container builds to the release pipeline (release-pipeline.yaml).

Essential Elements of an Effective PR Description Checklist

- Update supported_models.md and examples for a new model.