[CI] Automate Docker Hub release image publishing#40415
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Code Review
This pull request updates the release script to include support for Ubuntu 24.04 Docker images. It adds commands to pull, tag, push, and create multi-architecture manifests for both standard and CUDA 13.0 variants across x86_64 and aarch64 architectures. I have no feedback to provide.
Replaces manual docker publish instructions in annotate-release.sh with a unified publish-release-images.sh script and corresponding Buildkite pipeline steps. The new script handles all image variants (CUDA 12.9, CUDA 13.0, Ubuntu 24.04, ROCm, and CPU), automating the pull, tag, push, and multi-arch manifest creation for Docker Hub releases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: khluu <khluu000@gmail.com>
Force-pushed from e444006 to 8f4b74c.
Move docker tag/push/manifest logic from the annotation script into a new publish-release-images.sh that runs as an automated pipeline step. Adds all image variants including Ubuntu 24.04, CUDA 12.9, ROCm, and CPU with graceful fallback for CPU images behind their own block steps. Co-authored-by: Claude Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: khluu <khluu000@gmail.com>
Force-pushed from 8f4b74c to 8e520ee.
@claude review
- Use `docker manifest inspect` to distinguish a missing CPU image (skip with warning) from a real pull failure (fail loudly under `set -e`), instead of `if docker pull ... 2>/dev/null; then`, which masked all non-zero exits as "image not built".
- Add the CPU build steps to `block-publish-release-images.depends_on` with `allow_failure: true`, so publish doesn't race an in-progress CPU build but still proceeds when the operator legitimately leaves the CPU block steps unblocked.
- Normalize the `v` prefix on the CPU build-side tags too, so the round-trip with the publish script is robust regardless of whether the operator types `1.2.3` or `v1.2.3` into the input field.
- Use `buildkite-agent meta-data get --default ""` so the explicit empty-check is reachable under `set -euo pipefail` (otherwise pipefail aborts before the check).

Signed-off-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: khluu <khluu000@gmail.com>
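The last bullet above can be reproduced without a Buildkite agent. The following is a minimal sketch, assuming the real `buildkite-agent meta-data get` exits non-zero when the key is unset and no default is given (as the commit message implies). `fake_meta_get` is a hypothetical stand-in, not the real CLI.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for `buildkite-agent meta-data get release-version`
# when the key is unset: fails unless a default is supplied.
fake_meta_get() {
  if [ "${1:-}" = "--default" ]; then
    printf '%s\n' "${2:-}"   # with a default, the lookup succeeds
    return 0
  fi
  return 1
}

# Without a default: pipefail makes the whole pipeline fail, so the
# assignment itself is the failing command. Outside an `if`, `set -e`
# would abort the script right here, before any empty-check.
if version=$(fake_meta_get | sed 's/^v//'); then
  without_default="reached check"
else
  without_default="aborted before check"
fi

# With a default: the pipeline succeeds and the explicit check can run.
version=$(fake_meta_get --default "" | sed 's/^v//')
if [ -z "$version" ]; then
  with_default="empty-check fired"
else
  with_default="version=$version"
fi

echo "$without_default / $with_default"
```

The `if` wrapper is only there so the demonstration can observe the failure instead of dying on it; the publish script itself would not need it once `--default ""` is in place.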
- publish-release-images.sh: restructure CPU section to inspect both arches up front, then either publish everything (per-arch tags + multi-arch manifest) or fail loudly. The previous structure pushed per-arch tags independently of the multi-arch gate, so a partial state (one arch fails under publish gate's `allow_failure: true`) would leave `:latest-x86_64` pointing at the new release while the `:latest` multi-arch manifest still resolved to the previous release.
- release-pipeline.yaml: revert the build-side `| sed 's/^v//'` normalization. `buildkite-agent meta-data get release-version` always returns the value with the `v` prefix already, so the strip-then-re-prepend was a no-op and added noise.

Signed-off-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: khluu <khluu000@gmail.com>
All prior review feedback addressed: all-or-nothing CPU publish with explicit fail-loud on partial state, manifest-inspect probe instead of silent pull-skip, race window closed via CPU build deps with allow_failure, and metadata fetch hardened with --default.
Extended reasoning...
Overview
This PR is purely CI/build infrastructure: it replaces the manual copy-paste docker tag/push commands previously printed by annotate-release.sh with an automated Buildkite step that runs a new publish-release-images.sh. It also adds key: IDs to two CPU build steps and adds a new block-gated publish step. No runtime/production code paths are touched.
Security risks
None. The script runs in the existing release pipeline with the same DockerHub credentials and ECR auth that the manual flow relied on. Inputs are operator-supplied via Buildkite metadata; tag construction uses shell variables in trusted-context strings (no untrusted user input is interpolated into shell evaluation contexts that didn't already exist in the prior manual workflow).
Level of scrutiny
Lower than production code: this is release-time tooling guarded by manual block steps and is not in the request-serving path. The blast radius of bugs is limited to the Docker Hub release artifacts, and operators can re-run the publish step idempotently. That said, I scrutinized the prior revisions because silent failures here can ship stale images — and those concerns have all been resolved in the current commits.
Other factors
All three prior review items are now visibly addressed in the diff: (1) line 11 uses buildkite-agent meta-data get release-version --default "" so the explicit empty-check on line 12 fires as intended; (2) the CPU section uses docker manifest inspect first to distinguish 'not built' from 'pull failed', then enforces all-or-nothing — both arches publish together or neither does, with a non-zero exit on the partial state to prevent split-tag drift between :latest-x86_64 and the multi-arch :latest; (3) the publish block now lists the CPU build steps in depends_on with allow_failure: true, closing the race window where publish could run before the in-progress CPU build pushed to ECR. Recent commits 8f214e0 and 2b83d0c match this exactly. Straightforward CI change worth approving.
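The all-or-nothing rule described in item (2) reduces to a three-way decision on the two availability flags. A minimal sketch of that decision follows; `cpu_publish_action` is a hypothetical helper name used for illustration, and the actual script inlines this logic in an `if`/`elif`/`else` chain.

```bash
# Hypothetical helper: maps the two availability flags to an action.
#   both present -> publish per-arch tags and the multi-arch manifest
#   both missing -> skip CPU publishing with a warning
#   exactly one  -> fail, to avoid split-tag drift between arches
cpu_publish_action() {
  x86_available=$1
  arm_available=$2
  if [ "$x86_available" = "true" ] && [ "$arm_available" = "true" ]; then
    echo "publish"
  elif [ "$x86_available" = "false" ] && [ "$arm_available" = "false" ]; then
    echo "skip"
  else
    echo "fail"
  fi
}

cpu_publish_action true false
```

Keeping the decision separate from the `docker` calls makes the partial-state branch easy to exercise without registry access.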
    - block: "Publish release images to DockerHub"
      key: block-publish-release-images
      depends_on:
        - annotate-release-workflow
it doesn't need to wait for release workflow annotation
Done — dropped annotate-release-workflow from the depends_on. The annotate step posts wheel download info and isn't a real publish prerequisite.
    - create-multi-arch-manifest-cuda-12-9
    - create-multi-arch-manifest-ubuntu2404
    - create-multi-arch-manifest-cuda-12-9-ubuntu2404
does it not need to wait on create-multi-arch-manifest?
Good catch — added create-multi-arch-manifest to the depends_on. It wasn't strictly needed because the publish script pulls per-arch images directly (not the multi-arch manifest), but with annotate-release-workflow removed we lose the transitive wait on the CUDA 13.0 builds. Adding it explicitly also makes the depends_on symmetric with the other variants (-cuda-12-9, -ubuntu2404, -cuda-12-9-ubuntu2404).
    - label: "Publish release images to DockerHub"
      depends_on:
        - block-publish-release-images
      id: publish-release-images-dockerhub
The PyPI release script invokes `python setup.py sdist` in a fresh venv that only had `twine` installed, but setup.py imports torch, setuptools_scm, packaging, and setuptools at parse time. Install requirements/build/cuda.txt so those imports resolve. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: khluu <khluu000@gmail.com>
- block-publish-release-images: replace `annotate-release-workflow` with `create-multi-arch-manifest`. The annotate step posts wheel download info and isn't a real prerequisite for image publishing; swapping in the CUDA 13.0 multi-arch manifest step preserves the transitive wait on the CUDA 13.0 builds and matches the other variants already listed. - Use `key:` instead of `id:` on the publish step to match the canonical Buildkite identifier and the new CPU build step keys. Signed-off-by: Kevin H. Luu <kevin@anyscale.com> Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: khluu <khluu000@gmail.com>
    CPU_X86_TAG=public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v${RELEASE_VERSION}
    CPU_ARM_TAG=public.ecr.aws/q9t5s3a7/vllm-arm64-cpu-release-repo:v${RELEASE_VERSION}

    CPU_X86_AVAILABLE=false
    CPU_ARM_AVAILABLE=false
    docker manifest inspect "${CPU_X86_TAG}" >/dev/null 2>&1 && CPU_X86_AVAILABLE=true
    docker manifest inspect "${CPU_ARM_TAG}" >/dev/null 2>&1 && CPU_ARM_AVAILABLE=true

    if [ "$CPU_X86_AVAILABLE" = "true" ] && [ "$CPU_ARM_AVAILABLE" = "true" ]; then
      docker pull "${CPU_X86_TAG}"
      docker tag "${CPU_X86_TAG}" vllm/vllm-openai-cpu:latest-x86_64
      docker tag "${CPU_X86_TAG}" vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64
      docker push vllm/vllm-openai-cpu:latest-x86_64
      docker push vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64

      docker pull "${CPU_ARM_TAG}"
      docker tag "${CPU_ARM_TAG}" vllm/vllm-openai-cpu:latest-arm64
      docker tag "${CPU_ARM_TAG}" vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64
      docker push vllm/vllm-openai-cpu:latest-arm64
      docker push vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64

      docker manifest rm vllm/vllm-openai-cpu:latest || true
      docker manifest rm vllm/vllm-openai-cpu:v${RELEASE_VERSION} || true
      docker manifest create vllm/vllm-openai-cpu:latest vllm/vllm-openai-cpu:latest-x86_64 vllm/vllm-openai-cpu:latest-arm64
      docker manifest create vllm/vllm-openai-cpu:v${RELEASE_VERSION} vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64 vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64
      docker manifest push vllm/vllm-openai-cpu:latest
      docker manifest push vllm/vllm-openai-cpu:v${RELEASE_VERSION}
    elif [ "$CPU_X86_AVAILABLE" = "false" ] && [ "$CPU_ARM_AVAILABLE" = "false" ]; then
🔴 On the build side (release-pipeline.yaml lines 320, 322, 339, 341), CPU images are pushed to ECR using the raw $(buildkite-agent meta-data get release-version), but publish-release-images.sh strips a leading v at line 11 and re-prepends v at lines 140-141 — so if an operator types 1.2.3 (no v) into the unvalidated input field, the build pushes :1.2.3 while publish looks for :v1.2.3. Both docker manifest inspect calls 404, the script falls into the elif at line 167 ("Neither CPU image found"), prints a misleading WARNING ("ensure block step was unblocked" — they did) and exits 0 green, leaving vllm/vllm-openai-cpu:latest and :v${RELEASE_VERSION} stale from the previous release. The PR added docker manifest inspect (closing the partial-build observability gap raised in the resolved review comment) but did NOT close this v-prefix root cause; fix is one line — mirror the sed s/^v// on the build side (or strip on both, or validate the input field).
Extended reasoning...
What goes wrong
publish-release-images.sh and release-pipeline.yaml disagree on whether the operator-supplied release version carries a leading v:
- Publish side (`publish-release-images.sh`):
  - Line 11: `RELEASE_VERSION=$(buildkite-agent meta-data get release-version --default "" | sed 's/^v//')` strips a leading `v`.
  - Lines 140-141: `CPU_X86_TAG=public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v${RELEASE_VERSION}` and `CPU_ARM_TAG=...:v${RELEASE_VERSION}` re-prepend `v`.
  - The publish script is therefore tolerant of either input format (`1.2.3` or `v1.2.3`) and always inspects `:v1.2.3`.
- Build side (`release-pipeline.yaml`):
  - Line 320 (x86 build): `--tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:$(buildkite-agent meta-data get release-version)`
  - Line 322 (x86 push): `docker push public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:$(buildkite-agent meta-data get release-version)`
  - Line 339 (arm64 build) / 341 (arm64 push): identical pattern with `vllm-arm64-cpu-release-repo`.
  - Builds use the raw operator input, with no `v` normalization. Builds push `:1.2.3` if the operator typed `1.2.3`, or `:v1.2.3` if they typed `v1.2.3`.
The "Provide Release version here" input step has no validation or hint about the expected format, and the publish script's own sed signals that mixed input is acceptable. So the formats round-trip inconsistently.
Step-by-step proof
1. Release manager unblocks `block-cpu-release-image-build` and `block-arm64-cpu-release-image-build`, and types `1.2.3` (no `v`) into the unvalidated input field.
2. CPU build steps push `public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:1.2.3` and `vllm-arm64-cpu-release-repo:1.2.3` to ECR (line 322 / 341).
3. `publish-release-images.sh` runs. Line 11 sets `RELEASE_VERSION=1.2.3` (sed strips nothing because there's no leading `v`). Lines 140-141 set `CPU_X86_TAG=...:v1.2.3` and `CPU_ARM_TAG=...:v1.2.3`.
4. Lines 145-146: `docker manifest inspect "${CPU_X86_TAG}"` returns "manifest unknown" (the tag pushed was `:1.2.3`, not `:v1.2.3`). `CPU_X86_AVAILABLE` stays `false`. Same for arm64.
5. Line 148 guard `CPU_X86_AVAILABLE=true && CPU_ARM_AVAILABLE=true` evaluates false; line 167 `elif` matches (both false); script prints `WARNING: Neither CPU image found in ECR, skipping CPU publish (ensure block-cpu-release-image-build and block-arm64-cpu-release-image-build were unblocked and the builds finished pushing)`, even though the operator DID unblock both gates and the builds DID succeed.
6. Script falls through to `echo "Successfully published release images for v1.2.3"` and exits 0.
7. Buildkite reports the publish step green.
8. `vllm/vllm-openai-cpu:latest` and `vllm/vllm-openai-cpu:v${RELEASE_VERSION}` on Docker Hub remain pointing at whatever the previous release published. `docker pull vllm/vllm-openai-cpu:latest` silently returns the prior release's image.
Why existing protections don't catch it
- The PR's `docker manifest inspect` rewrite addresses the original review comment's partial-build observability concern (distinguishing "not built" from "pull failed"), and the new partial-build `else` branch (lines 169-176) correctly fails loudly when exactly one arch is available. But when both arches are present in ECR under a different tag (the v-prefix mismatch case), inspect returns "manifest unknown" for both, the `elif` at line 167 matches, and the script falls into the silent-skip path with a misleading WARNING.
- `upload-release-wheels-pypi.sh` has a `GIT_VERSION` sanity check that would catch a wrong format for the wheel path (`PURE_VERSION=${RELEASE_VERSION#v}`, line 31), but that runs in a separate step on a separate gate; the docker publish path has no equivalent check and produces no error signal.
- Operator convention is `vX.Y.Z` per past releases, but conventions are not validation. The publish script's own `sed 's/^v//'` invites mixed input, and a one-character omission yields a silent stale release.
Impact
Bounded but real:
- Only `vllm/vllm-openai-cpu:latest` and `vllm/vllm-openai-cpu:v${RELEASE_VERSION}` (multi-arch + per-arch tags) go silently stale.
- Build is reported green, no paging signal, no diagnostic for the operator other than a WARNING that misdirects them ("ensure block step was unblocked", but they did).
- Users who `docker pull vllm/vllm-openai-cpu:latest` get the previous release.
- The previous manual flow had the same mismatch, but a human running the commands would have seen the 404; the new automation makes it silent, a regression in observability vs. the manual flow it replaces.
How to fix
One-line, two viable options:
- Mirror the `sed` on the build side, e.g. replace `$(buildkite-agent meta-data get release-version)` with `$(buildkite-agent meta-data get release-version | sed 's/^v//')` at lines 320, 322, 339, 341, and update the publish script to inspect `:${RELEASE_VERSION}` (no `v` prefix). Or strip the v on both sides and add it consistently in tag generation.
- Validate the input format on the "Provide Release version here" step (e.g. require a `v` prefix or normalize via a hint), so the round-trip is well-defined.
Either fix closes the silent-skip path; verifiers were unanimous.
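The "strip on both sides, then add it consistently" option can be sketched as one shared helper that the build steps and the publish script would both call. This is an illustrative assumption, not code from the PR: `normalize_release_version` is a hypothetical name, and it only guards against an empty value rather than validating the full version format.

```bash
# Hypothetical shared helper: accepts "1.2.3" or "v1.2.3" and always
# prints the canonical "v1.2.3" form used in the ECR and Docker Hub tags.
normalize_release_version() {
  raw=$1
  bare=${raw#v}          # drop one leading "v" if present
  if [ -z "$bare" ]; then
    echo "release version is empty" >&2
    return 1
  fi
  printf 'v%s\n' "$bare"
}

# Both operator spellings resolve to the same tag suffix.
normalize_release_version 1.2.3
normalize_release_version v1.2.3
```

If both sides derive tags from the same normalized value, the build-side push and the publish-side `docker manifest inspect` cannot disagree on the prefix.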
    - label: "Publish release images to DockerHub"
      depends_on:
        - block-publish-release-images
      id: publish-release-images-dockerhub
🟡 The new publish-release-images-dockerhub label step (lines 460-474) depends on block-publish-release-images, which is gated with if: build.env("NIGHTLY") != "1" (line 458) and so is filtered out of nightly pipeline uploads. The label step itself carries neither a matching if: clause nor an allow_dependency_failure: true on its dependency, so its presence in nightly builds relies on Buildkite's transitive-filter semantics rather than the explicit pattern used 350 lines earlier in this same file at lines 107-115. Suggested fix (one line): add if: build.env("NIGHTLY") != "1" to the label step to mirror the block step's guard, making the intent self-documenting and not dependent on undocumented edge cases of depends_on referencing a filtered step.
Extended reasoning...
What the bug is
block-publish-release-images (line 441) is gated with if: build.env("NIGHTLY") != "1" on line 458. The dependent publish-release-images-dockerhub label step (lines 460-474) lists depends_on: - block-publish-release-images but has neither a matching if: guard nor allow_dependency_failure: true on the dependency. So during a nightly pipeline upload, the label step survives the upload but its only dependency does not.
Why this is a consistency gap
The same author/file has already encoded the right pattern for the analogous case 350 lines earlier:
- `block-build-release-images` (line 107) is gated with `if: build.env("NIGHTLY") != "1"` (line 110).
- The dependent `build-release-images` group (line 112) explicitly sets `allow_dependency_failure: true` (line 115).
That pattern fits that case (the build group should run in BOTH modes — in nightly the upstream block is missing, so allow_dependency_failure: true lets the group proceed). For the new publish step the intent is different: it should NOT run in nightly. The two valid encodings of that intent in this file are:
1. Add `if: build.env("NIGHTLY") != "1"` to the label step itself. This mirrors the explicit guard used for the existing nightly-only `Publish nightly multi-arch image to DockerHub` steps (which use `if: build.env("NIGHTLY") == "1"`).
2. Leave it to Buildkite's transitive filter on the missing dependency, which is what the PR currently does.
Step-by-step trace of nightly behavior
1. Nightly pipeline upload runs with `NIGHTLY=1`.
2. `block-publish-release-images` is filtered out by its `if:` clause and is not present in the uploaded pipeline.
3. `publish-release-images-dockerhub` is uploaded (no `if:` on it). Its sole dependency `block-publish-release-images` does not exist in this build.
4. Buildkite's resolution: a step whose `depends_on` references a non-existent step is marked broken/skipped (without `allow_dependency_failure`). The publish step is silently dropped.
In the happy path this is the intended outcome. The risk is that step (4) relies on Buildkite's behavior being stable across versions and matching the author's mental model — and unlike approach (1), there is nothing in the YAML that tells a future reader 'this step is intentionally nightly-gated'.
Addressing the refutation
A fellow verifier argues this is intentional behavior leveraging Buildkite's transitive-filter semantics, and that the lines 113-115 pattern is not analogous because the build group must run in both modes whereas the publish step must not. That refutation is correct on intent — the two cases ARE different, and the refutation is right that simply pasting allow_dependency_failure: true here would be wrong (it would make publish run in nightly without any block gate).
But the refutation does not address the explicit-vs-implicit gap. The two existing patterns in this file (build-release-images group with allow_dependency_failure: true, and Publish nightly steps with if: NIGHTLY == "1") both make the gating visible at the step that runs. The new publish step is the first place in this file where the gating is only on the upstream block. That is the consistency gap, and it is also why a one-line if: build.env("NIGHTLY") != "1" on the label step is the cleanest fix — it doesn't change runtime behavior, it just documents the intent in the right place.
Impact
Bounded — in the most likely case the existing pattern works (Buildkite skips the publish step in nightly as intended). The downside is reduced readability and a small risk if Buildkite's filtered-dependency semantics ever change or differ across pipeline-upload code paths. Severity: nit.
Fix
Add the matching guard on the label step:
    - label: "Publish release images to DockerHub"
      depends_on:
        - block-publish-release-images
      id: publish-release-images-dockerhub
      if: build.env("NIGHTLY") != "1"  # <-- add this
      agents:
        queue: small_cpu_queue_release
      ...

This makes the nightly-skip intent explicit at the step that runs and matches the file's own convention of guarding nightly-conditional steps with their own `if:` clause.
    python3 -m venv /tmp/vllm-release-env
    source /tmp/vllm-release-env/bin/activate
    pip install twine
    pip install -r requirements/build/cuda.txt
🔴 Installing requirements/build/cuda.txt (line 46) is intended to make python setup.py sdist (line 59) succeed, but the script still cannot reach a green PyPI upload: (1) cuda.txt pins CUDA-built torch==2.11.0, so setup.py auto-detects VLLM_TARGET_DEVICE=cuda and get_vllm_version() calls get_nvcc_cuda_version() at setup.py:916, which asserts CUDA_HOME is not None (setup.py:888) — small_cpu_queue_release has no CUDA toolkit, so sdist exits before producing the .tar.gz; the sdist-skip guard at setup.py:920 sits after the assertion, so it does not help. (2) Even if sdist succeeded, lines 66/72-73 are broken: PYPI_WHEEL_FILES=$(find …) returns the two default-variant wheels (x86_64 + aarch64) newline-separated, and twine check "$PYPI_WHEEL_FILES" … quotes them as a single argument with an embedded newline, which twine treats as one nonexistent path. Fix: set VLLM_USE_PRECOMPILED=1 (or VLLM_TARGET_DEVICE=empty) on the sdist invocation, and switch the twine call to a bash array (mapfile -t PYPI_WHEEL_FILES < <(find …); twine check "${PYPI_WHEEL_FILES[@]}" "$SDIST_FILE").
Extended reasoning...
What the bug is
This PR adds pip install -r requirements/build/cuda.txt at line 46 specifically to make the python setup.py sdist invocation on line 59 succeed (previously it failed at module-level import torch in setup.py and aborted under set -e). The intended outcome — a working PyPI upload — is still not achieved, because that one-line change activates two independent downstream failures, both of which would fire the first time block-upload-release-wheels is unblocked.
Failure mode A — setup.py sdist aborts on a CPU-only agent
After the PR, sdist gets past import torch and begins module-level execution. With requirements/build/cuda.txt pinning the default PyPI torch==2.11.0 (CUDA-built wheel, torch.version.cuda is set), setup.py auto-detection at lines 53-64 sets VLLM_TARGET_DEVICE = "cuda". The module-level setup(version=get_vllm_version(), ...) call then runs get_vllm_version() (setup.py:898). _is_cuda() is true; VLLM_USE_PRECOMPILED is unset; the function falls through to setup.py:916 → str(get_nvcc_cuda_version()), whose first line (setup.py:888) is assert CUDA_HOME is not None, "CUDA_HOME is not set".
The agent that runs upload-release-wheels-pypi.sh is small_cpu_queue_release (release-pipeline.yaml). The queue name and the absence of any CUDA-toolkit installation step mean CUDA_HOME resolves to None (no env var, no /usr/local/cuda, no nvcc). The assertion raises, sdist exits non-zero, and set -e aborts the script at line 59 — before the twine call ever runs. Critically, the sdist-skip guard at setup.py:920 (if "sdist" not in sys.argv: ...) sits after the failing call at line 916, so it does not protect this path.
Failure mode B — newline-quoted wheel paths
If failure mode A is fixed, the script reaches lines 66-73:
    PYPI_WHEEL_FILES=$(find $DIST_DIR -name "vllm-${PURE_VERSION}*.whl" -not -name "*+*")
    ...
    python3 -m twine check "$PYPI_WHEEL_FILES" "$SDIST_FILE"
    python3 -m twine upload --non-interactive --verbose "$PYPI_WHEEL_FILES" "$SDIST_FILE"

The vLLM release pipeline produces two default-variant wheels (x86_64 + aarch64; the `+cu129`/`+cpu` variants are filtered by `-not -name "*+*"`). `find` outputs them newline-separated. Quoting `"$PYPI_WHEEL_FILES"` passes both paths to twine as a single argument with an embedded newline, which twine treats as one filename and fails opening.
Step-by-step proof of failure mode A
1. Operator unblocks `block-upload-release-wheels`. The script runs on `small_cpu_queue_release`.
2. Line 46 installs torch==2.11.0 (CUDA-built); `torch.version.cuda` is set.
3. Line 59 runs `python setup.py sdist`. setup.py:53-64 sets `VLLM_TARGET_DEVICE = "cuda"`.
4. Module-level `setup(version=get_vllm_version(), ...)` at setup.py:1085 runs `get_vllm_version()`.
5. setup.py:912 `_is_cuda()` is True; `VLLM_USE_PRECOMPILED` is unset → falls to else at setup.py:916.
6. `get_nvcc_cuda_version()` runs; setup.py:888 `assert CUDA_HOME is not None` raises AssertionError on the CPU-only agent.
7. sdist exits non-zero, `set -e` aborts the script. PyPI upload is blocked. The script never reaches the twine call, so failure mode B does not even fire; it is shadowed.
Step-by-step proof of failure mode B (assuming A is fixed)
1. `find` returns two paths: `/tmp/.../vllm-${VER}-cp38-abi3-manylinux_2_35_x86_64.whl` and `..._aarch64.whl`.
2. `PYPI_WHEEL_FILES` is set to those two paths joined by `\n`.
3. `twine check "$PYPI_WHEEL_FILES" "$SDIST_FILE"` invokes twine with `argv[1]` containing both paths separated by a literal newline.
4. twine treats `argv[1]` as a single filename, calls `open()`, and fails with no-such-file.
Verified empirically: X=$(printf "a\nb"); set -- "$X"; echo $# prints 1 — quoting flattens the newline-separated list to one argument. Without quoting, IFS-based word-splitting yields two arguments.
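The same argument-count effect can be shown with wheel-like paths and a bash array. This is a small sketch for illustration only: the file names are made up, and `count_args` is a hypothetical helper standing in for the twine invocation.

```bash
#!/usr/bin/env bash

# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Two made-up wheel paths, newline-separated as `find` would print them.
wheel_list=$(printf '%s\n' "dist/pkg_x86_64.whl" "dist/pkg_aarch64.whl")

# Quoting the scalar passes ONE argument containing an embedded newline.
quoted_count=$(count_args "$wheel_list")

# Reading the lines into an array and expanding it with [@] passes one
# argument per path, and stays safe for paths containing spaces.
mapfile -t wheels <<< "$wheel_list"
array_count=$(count_args "${wheels[@]}")

echo "quoted: $quoted_count, array: $array_count"
```

Leaving the variable unquoted would also yield two arguments here, but only through word-splitting, which breaks on paths with spaces; the array form avoids that.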
Why existing code does not prevent this
- The sdist-skip guard at setup.py:920 only suppresses the `+cuXXX` suffix on the version string; it appears after the `get_nvcc_cuda_version()` call on line 916, so the assertion has already triggered. The guard cannot help.
- The PR test plan exercises the docker publish path (`Verify the publish script…`, `Verify the block step appears…`) but does not exercise `upload-release-wheels-pypi.sh` end to end. The wheel upload is gated behind a manual block (`block-upload-release-wheels`), so the PR can land green without either failure being observed.
- The script was added in commit ae3b4de on 2026-05-02 with the broken twine quoting already present, but pre-PR the script aborted at line 59 (`ImportError`) under `set -e`, so the twine call has never executed. This PR is the first commit that brings either failure into the live code path.
How to fix
Two minimal changes:
1. Set `VLLM_USE_PRECOMPILED=1` (or `VLLM_TARGET_DEVICE=empty`) on the sdist invocation at line 59, so `_is_cuda()` takes the precompiled branch at setup.py:913 and skips `get_nvcc_cuda_version()`. Equivalent: install a CPU-only torch (`pip install torch --index-url https://download.pytorch.org/whl/cpu`) so `torch.version.cuda` is `None` and `_is_cuda()` returns False during version detection.
2. Use a bash array for the twine call:

        mapfile -t PYPI_WHEEL_FILES < <(find "$DIST_DIR" -name "vllm-${PURE_VERSION}*.whl" -not -name "*+*")
        if [[ ${#PYPI_WHEEL_FILES[@]} -eq 0 ]]; then
          echo "No default variant wheels found, quitting..."; exit 1
        fi
        python3 -m twine check "${PYPI_WHEEL_FILES[@]}" "$SDIST_FILE"
        python3 -m twine upload --non-interactive --verbose "${PYPI_WHEEL_FILES[@]}" "$SDIST_FILE"

Either fix alone leaves the other failure live, so both must land for the PyPI upload step to succeed.
Signed-off-by: khluu <khluu000@gmail.com>
Signed-off-by: khluu <khluu000@gmail.com> Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
Signed-off-by: khluu <khluu000@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: khluu <khluu000@gmail.com> (cherry picked from commit 1333864)
Signed-off-by: khluu <khluu000@gmail.com> Signed-off-by: Libin Tang <libin.tang@intel.com>
Summary
- `publish-release-images.sh` script that handles all Docker Hub publishing: CUDA 13.0 (default), CUDA 12.9, Ubuntu 24.04 (both CUDA versions), ROCm, and CPU
- `|| true` on `docker manifest rm` and missing versioned manifest cleanup

Details

Previously, the `annotate-release.sh` script printed manual docker commands that a human had to copy-paste to publish release images to Docker Hub. This was error-prone: Ubuntu 24.04 images were missing entirely, and `docker manifest rm` commands would fail on first-time releases without `|| true`.

This PR:

- `.buildkite/scripts/publish-release-images.sh`: pulls images from ECR, tags with `latest` and `v$VERSION`, pushes individual arch tags, creates multi-arch manifests for all variants (CUDA 13.0, cu129, ubuntu2404, cu129-ubuntu2404, ROCm, CPU)
- `release-pipeline.yaml`: requires manual unblock before publishing, depends on all build/manifest steps completing, only runs for non-nightly builds

Rebased on current main (was previously based on older codebase with `cu130` naming).

No other open PR addresses this. AI assistance (Claude) was used in preparing this PR.
Test plan
🤖 Generated with Claude Code