
[build] fix cu130 related release pipeline steps and publish as nightly image #32522

Merged

khluu merged 2 commits into vllm-project:main from Harry-Chen:cuda13-image-fix on Jan 18, 2026

Conversation

@Harry-Chen (Member) commented Jan 17, 2026

Purpose

After #31032 was merged, we received some feedback from NVIDIA, mainly #31032 (review):

> @Harry-Chen if you are going to take #31822 and merge it as your own, could you at least check out the differences and consult us (NVIDIA) about why those differences are there?

This was indeed my oversight, so I have cherry-picked some fixes from that comment and #31822. I have also extracted push-nightly-builds.sh to avoid duplicating commands when uploading to Docker Hub.

Credit: @csahithi (script modifications), @wangshangsam.
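The extracted script is not shown in this thread; as an illustration only, here is a minimal sketch of what a shared push script might look like. The script name comes from the PR, but the `nightly_tag`/`push_nightly` helper names and the exact tag scheme are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a shared push-nightly-builds.sh -- helper names
# and the tag scheme are assumptions, not the real script's contents.
set -euo pipefail

# Compose a nightly tag for a commit, with an optional variant infix
# such as "cu130" (pure string logic, testable without Docker).
nightly_tag() {
  local commit="$1" variant="${2:-}"
  if [ -n "$variant" ]; then
    printf 'nightly-%s-%s\n' "$variant" "$commit"
  else
    printf 'nightly-%s\n' "$commit"
  fi
}

# Print the docker commands that would tag and push the image; they are
# echoed rather than executed so the sketch is safe outside the pipeline.
push_nightly() {
  local repo="$1" commit="$2" variant="${3:-}"
  local tag
  tag="$(nightly_tag "$commit" "$variant")"
  echo "docker tag ${repo}:${commit} ${repo}:${tag}"
  echo "docker push ${repo}:${tag}"
}
```

With this shape, the default nightly step and a cu130 step can share one script and differ only in the variant argument, which is the duplication the PR description says it removes.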

Test Plan

I will trigger a release pipeline run to see if everything works.

Test Result



…s.sh

Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Copilot AI review requested due to automatic review settings January 17, 2026 14:33
@mergify mergify bot added the ci/build label Jan 17, 2026
gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request addresses feedback on the CUDA 13.0 release pipeline by updating the CUDA version and torch compute capabilities. It also refactors the nightly image publishing logic into a reusable shell script, which improves maintainability and adds a dedicated nightly build for CUDA 13.0. The changes are well-structured and align with the stated purpose. I've added a couple of suggestions to enhance the robustness of the new and modified shell scripts.

Copilot AI (Contributor) left a comment

Pull request overview

This PR fixes CUDA 13.0-related release pipeline steps based on feedback from NVIDIA. It addresses incorrect CUDA versions (13.0.2 → 13.0.1), extracts common Docker push logic into a reusable script, and adds support for publishing CUDA 13.0 as a separate nightly image variant.

Changes:

  • Corrected CUDA version from 13.0.2 to 13.0.1 for cu130 builds (aligning with existing codebase standards)
  • Removed FLASHINFER_AOT_COMPILE build argument for CUDA 13.0 builds
  • Added compute capability 12.1 support for DGX Spark in arm64 CUDA 13.0 builds
  • Extracted nightly build push logic into reusable push-nightly-builds.sh script
  • Made cleanup-nightly-builds.sh parameterizable to support different tag prefixes
  • Added new pipeline step to publish CUDA 13.0 variant as separate nightly image
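The parameterized cleanup described above could look roughly like this. This is a sketch only: the real cleanup-nightly-builds.sh talks to the Docker Hub registry, whereas this helper only filters an already-fetched tag list, and the function name is an assumption:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: decide which tags a cleanup run should delete,
# given a tag prefix passed as a parameter (e.g. "nightly" or
# "nightly-cu130"). The real script would fetch tags from Docker Hub
# and issue delete requests; here we only do the prefix filtering.
set -euo pipefail

# Read tag names on stdin, print only those matching "<prefix>-*".
select_tags_for_cleanup() {
  local prefix="$1"
  grep -E "^${prefix}-" || true   # no matches is not an error
}
```

Passing the prefix as a parameter is what lets one script clean up both the default nightly tags and variant-specific ones such as cu130.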

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
.buildkite/scripts/push-nightly-builds.sh New script that consolidates Docker image tagging and pushing logic for nightly builds, supporting optional tag variants like "cu130"
.buildkite/scripts/cleanup-nightly-builds.sh Enhanced to accept tag prefix parameter for cleaning up variant-specific nightly builds
.buildkite/release-pipeline.yaml Updated CUDA 13.0 build configurations, refactored nightly build steps to use new script, added CUDA 13.0 nightly build step


Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
@khluu khluu enabled auto-merge (squash) January 17, 2026 17:01
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 17, 2026
@khluu khluu merged commit 965765a into vllm-project:main Jan 18, 2026
18 checks passed
khluu pushed a commit that referenced this pull request Jan 18, 2026
…ly image (#32522)

Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
(cherry picked from commit 965765a)
@wangshangsam (Collaborator) commented

Thanks a lot, @Harry-Chen! I apologize for the intense expression of my frustration.

I see that the CI hasn't been uploading the nightly images to Docker Hub for a few days: https://hub.docker.com/r/vllm/vllm-openai/tags I'm wondering if you are aware of what's happening?

@Harry-Chen (Member, Author) commented Jan 20, 2026

> Thanks a lot, @Harry-Chen! I apologize for the intense expression of my frustration.
>
> I see that the CI hasn't been uploading the nightly images to Docker Hub for a few days: https://hub.docker.com/r/vllm/vllm-openai/tags I'm wondering if you are aware of what's happening?

Yes, this was due to a permissions issue, which @khluu has now fixed. Thanks for the reminder.

@wangshangsam (Collaborator) commented Jan 20, 2026

> Yes, this was due to a permissions issue, which @khluu has now fixed. Thanks for the reminder.

Thanks! I see that there are v0.14.0-aarch64-cu130 and v0.14.0-x86_64-cu130 tags now for the v0.14 release, but where are the nightly cu130- images?

@Harry-Chen (Member, Author) commented

> Thanks! I see that there are v0.14.0-aarch64-cu130 and v0.14.0-x86_64-cu130 tags now for the v0.14 release, but where are the nightly cu130- images?

The cu130 nightly pipelines need a manual trigger to run; I triggered one on yesterday's nightly run. @khluu do you think we should remove the block and let it run automatically?
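For context, the manual trigger being discussed here is a Buildkite `block` step, which pauses the pipeline until someone unblocks it; removing it would let the cu130 nightly publish run unconditionally. A rough sketch of the shape, purely illustrative (the keys, labels, and script argument are assumptions, not the actual release-pipeline.yaml contents):

```yaml
# Hypothetical fragment: a block step gating the cu130 nightly publish.
- block: "Publish cu130 nightly image"   # manual trigger; deleting this
  key: block-cu130-nightly               # step makes the publish automatic

- label: "Publish nightly image (cu130)"
  depends_on: block-cu130-nightly        # drop alongside the block step
  commands:
    - bash .buildkite/scripts/push-nightly-builds.sh cu130
```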

@Harry-Chen Harry-Chen deleted the cuda13-image-fix branch January 21, 2026 03:53
@wangshangsam (Collaborator) commented
It would be very nice if we could remove the block and let it run automatically, which makes our lives easier when testing the latest changes on all the Blackwell platforms.

dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…ly image (vllm-project#32522)

Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
@simon-mo (Collaborator) commented

Running it automatically makes sense to me.

ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…ly image (vllm-project#32522)

Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
dtrifiro added a commit to dtrifiro/vllm that referenced this pull request Mar 9, 2026
- [build] fix cu130 related release pipeline steps and publish as
nightly image (vllm-project#32522)
- [Misc] Replace urllib's `urlparse` with urllib3's `parse_url`
(vllm-project#32746)
- [Misc] Bump opencv-python dependency version to 4.13
(vllm-project#32668)
- [Bugfix] Fix Whisper/encoder-decoder GPU memory leak
(vllm-project#32789)
- [CI] fix version comparison and exclusion patterns in
upload-release-wheels.sh (vllm-project#32971)
- tokenizers: mistral: fix merge conflict
- `Dockerfile.tpu.ubi`: add `git` to allow `pip install git+https`

Labels: ci/build, ready

5 participants