[build] fix cu130 related release pipeline steps and publish as nightly image #32522
khluu merged 2 commits into vllm-project:main
…s.sh Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Code Review
This pull request addresses feedback on the CUDA 13.0 release pipeline by updating the CUDA version and torch compute capabilities. It also refactors the nightly image publishing logic into a reusable shell script, which improves maintainability and adds a dedicated nightly build for CUDA 13.0. The changes are well-structured and align with the stated purpose. I've added a couple of suggestions to enhance the robustness of the new and modified shell scripts.
Pull request overview
This PR fixes CUDA 13.0-related release pipeline steps based on feedback from NVIDIA. It addresses incorrect CUDA versions (13.0.2 → 13.0.1), extracts common Docker push logic into a reusable script, and adds support for publishing CUDA 13.0 as a separate nightly image variant.
Changes:
- Corrected CUDA version from 13.0.2 to 13.0.1 for cu130 builds (aligning with existing codebase standards)
- Removed FLASHINFER_AOT_COMPILE build argument for CUDA 13.0 builds
- Added compute capability 12.1 support for DGX Spark in arm64 CUDA 13.0 builds
- Extracted nightly build push logic into reusable `push-nightly-builds.sh` script
- Made `cleanup-nightly-builds.sh` parameterizable to support different tag prefixes
- Added new pipeline step to publish CUDA 13.0 variant as separate nightly image
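The idea behind a shared push script with an optional tag variant can be sketched roughly as follows. This is not the content of `push-nightly-builds.sh`: the function names `nightly_tag` and `print_push_plan`, the `<variant>-nightly` tag layout, and the `-<commit>` suffix are all assumptions made for illustration, and the sketch only prints the `docker` commands instead of running them.

```shell
# Hypothetical sketch; names and tag layout are assumptions, not the real script.
set -eu

# Build the moving nightly tag from an optional variant such as "cu130".
# With no variant the tag is "nightly"; otherwise "<variant>-nightly".
nightly_tag() {
  variant="${1:-}"
  if [ -n "$variant" ]; then
    printf '%s-nightly\n' "$variant"
  else
    printf 'nightly\n'
  fi
}

# Print (dry run) the docker commands that would publish one image.
# Arguments: source image, target repository, commit SHA, optional variant.
print_push_plan() {
  src="$1"
  repo="$2"
  commit="$3"
  tag="$(nightly_tag "${4:-}")"
  # One moving tag plus one commit-pinned tag (assumed convention).
  echo "docker tag ${src} ${repo}:${tag}"
  echo "docker tag ${src} ${repo}:${tag}-${commit}"
  echo "docker push ${repo}:${tag}"
  echo "docker push ${repo}:${tag}-${commit}"
}
```

A pipeline step could then call the same helper twice, once without a variant and once with `cu130`, for example `print_push_plan "$SRC_IMAGE" vllm/vllm-openai "$BUILDKITE_COMMIT" cu130`, where `SRC_IMAGE` is a placeholder for wherever the built image lives.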
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `.buildkite/scripts/push-nightly-builds.sh` | New script that consolidates Docker image tagging and pushing logic for nightly builds, supporting optional tag variants like "cu130" |
| `.buildkite/scripts/cleanup-nightly-builds.sh` | Enhanced to accept tag prefix parameter for cleaning up variant-specific nightly builds |
| `.buildkite/release-pipeline.yaml` | Updated CUDA 13.0 build configurations, refactored nightly build steps to use new script, added CUDA 13.0 nightly build step |
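To illustrate why a tag prefix parameter is enough to clean up each variant separately, here is a minimal sketch of prefix-based selection. It is not the logic of `cleanup-nightly-builds.sh`: the helper name `select_stale_tags`, the newest-first input order, and the "keep the N most recent" policy are assumptions, and the sketch only selects tag names without deleting anything.

```shell
# Hypothetical sketch; the helper name and retention policy are assumptions.
# Usage: select_stale_tags PREFIX KEEP
# Reads tag names on stdin, newest first, and prints every tag that starts
# with PREFIX except the first KEEP matches.
select_stale_tags() {
  awk -v prefix="$1" -v keep="$2" '
    index($0, prefix) == 1 {   # literal match at the start of the tag only
      seen++
      if (seen > keep) print
    }
  '
}
```

Matching only at the start of the tag matters here: with an assumed layout of `nightly-<commit>` and `cu130-nightly-<commit>`, a cleanup run for the prefix `nightly-` leaves the `cu130-nightly-` tags alone, and vice versa.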
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Thanks a lot, @Harry-Chen! I apologize for the intense expression of my frustration. I see that the CI hasn't been uploading the nightly images to Docker Hub for a few days: https://hub.docker.com/r/vllm/vllm-openai/tags Are you aware of what's happening?
Yes, this was due to a permission issue, and @khluu has fixed it now. Thanks for reminding us.
Thanks! I see that there are
The cu130 nightly pipelines need a manual trigger to run. I have triggered one on yesterday's nightly run. @khluu do you think we should remove the block and let it run automatically?
It would be very nice if we could remove the block and let it run automatically, which makes our lives easier when testing the latest changes on all the Blackwell platforms. |
…ly image (vllm-project#32522) Signed-off-by: Shengqi Chen <harry-chen@outlook.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Running it automatically makes sense to me. |
…ly image (vllm-project#32522) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
- [build] fix cu130 related release pipeline steps and publish as nightly image (vllm-project#32522)
- [Misc] Replace urllib's `urlparse` with urllib3's `parse_url` (vllm-project#32746)
- [Misc] Bump opencv-python dependency version to 4.13 (vllm-project#32668)
- [Bugfix] Fix Whisper/encoder-decoder GPU memory leak (vllm-project#32789)
- [CI] fix version comparsion and exclusion patterns in upload-release-wheels.sh (vllm-project#32971)
- tokenizers: mistral: fix merge conflict
- `Dockerfile.tpu.ubi`: add `git` to allow `pip install git+https`
Purpose
After #31032 was merged, we received some feedback from NVIDIA, mainly #31032 (review):
This was indeed my oversight, so I cherry-picked some fixes from this comment and #31822. I have also extracted `push-nightly-builds.sh` to avoid duplicating commands when uploading to Docker Hub.
Credit: @csahithi (modification of scripts), @wangshangsam.
Test Plan
I will trigger a release pipeline run to see if everything works.
Test Result