[CI] Automate Docker Hub release image publishing by khluu · Pull Request #40415 · vllm-project/vllm

khluu · 2026-04-20T23:43:29Z

Summary

Replaces manual docker tag/push/manifest commands from the release annotation script with an automated Buildkite pipeline step
Adds a new publish-release-images.sh script that handles all Docker Hub publishing: CUDA 13.0 (default), CUDA 12.9, Ubuntu 24.04 (both CUDA versions), ROCm, and CPU
Adds the previously missing Ubuntu 24.04 images to the release flow
Fixes missing || true on docker manifest rm and missing versioned manifest cleanup

Details

Previously, the annotate-release.sh script printed manual docker commands that a human had to copy-paste to publish release images to Docker Hub. This was error-prone — Ubuntu 24.04 images were missing entirely, and docker manifest rm commands would fail on first-time releases without || true.

This PR:

New script .buildkite/scripts/publish-release-images.sh — pulls images from ECR, tags with latest and v$VERSION, pushes individual arch tags, creates multi-arch manifests for all variants (CUDA 13.0, cu129, ubuntu2404, cu129-ubuntu2404, ROCm, CPU)
New pipeline step with a block gate in release-pipeline.yaml — requires manual unblock before publishing, depends on all build/manifest steps completing, only runs for non-nightly builds
Simplified annotate script — now only shows wheel download commands, with a note that docker images are published automatically
CPU images handled gracefully — skipped with a warning if their separate block steps weren't unblocked

Rebased on current main (was previously based on older codebase with cu130 naming).

No other open PR addresses this. AI assistance (Claude) was used in preparing this PR.

Test plan

Verify the publish script correctly tags and pushes all image variants by triggering a release pipeline run
Verify the block step appears in Buildkite with correct dependencies
Verify the annotate script still renders wheel download info correctly
Confirm CPU skip logic works when CPU block steps are not unblocked

🤖 Generated with Claude Code

claude

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

_{Tip: disable this comment in your organization's Code Review settings.}

gemini-code-assist

Code Review

This pull request updates the release script to include support for Ubuntu 24.04 Docker images. It adds commands to pull, tag, push, and create multi-architecture manifests for both standard and CUDA 13.0 variants across x86_64 and aarch64 architectures. I have no feedback to provide.

Replaces manual docker publish instructions in annotate-release.sh with a unified publish-release-images.sh script and corresponding Buildkite pipeline steps. The new script handles all image variants (CUDA 12.9, CUDA 13.0, Ubuntu 24.04, ROCm, and CPU), automating the pull, tag, push, and multi-arch manifest creation for Docker Hub releases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: khluu <khluu000@gmail.com>

This reverts commit 579602a.

Move docker tag/push/manifest logic from the annotation script into a new publish-release-images.sh that runs as an automated pipeline step. Adds all image variants including Ubuntu 24.04, CUDA 12.9, ROCm, and CPU with graceful fallback for CPU images behind their own block steps. Co-authored-by: Claude Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: khluu <khluu000@gmail.com>

simon-mo · 2026-05-05T16:16:49Z

@claude review

- Use `docker manifest inspect` to distinguish a missing CPU image (skip with warning) from a real pull failure (fail loudly under set -e), instead of `if docker pull ... 2>/dev/null; then` which masked all non-zero exits as "image not built". - Add the CPU build steps to `block-publish-release-images.depends_on` with `allow_failure: true`, so publish doesn't race an in-progress CPU build but still proceeds when the operator legitimately leaves the CPU block steps unblocked. - Normalize the `v` prefix on the CPU build-side tags too, so the round-trip with the publish script is robust regardless of whether the operator types `1.2.3` or `v1.2.3` into the input field. - Use `buildkite-agent meta-data get --default ""` so the explicit empty-check is reachable under `set -euo pipefail` (otherwise pipefail aborts before the check). Signed-off-by: Kevin H. Luu <kevin@anyscale.com> Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: khluu <khluu000@gmail.com>

- publish-release-images.sh: restructure CPU section to inspect both arches up front, then either publish everything (per-arch tags + multi-arch manifest) or fail loudly. The previous structure pushed per-arch tags independently of the multi-arch gate, so a partial state (one arch fails under publish gate's `allow_failure: true`) would leave `:latest-x86_64` pointing at the new release while the `:latest` multi-arch manifest still resolved to the previous release. - release-pipeline.yaml: revert the build-side `| sed 's/^v//'` normalization. `buildkite-agent meta-data get release-version` always returns the value with the `v` prefix already, so the strip-then- re-prepend was a no-op and added noise. Signed-off-by: Kevin H. Luu <kevin@anyscale.com> Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: khluu <khluu000@gmail.com>

claude

All prior review feedback addressed: all-or-nothing CPU publish with explicit fail-loud on partial state, manifest-inspect probe instead of silent pull-skip, race window closed via CPU build deps with allow_failure, and metadata fetch hardened with --default.

Extended reasoning...

Overview

This PR is purely CI/build infrastructure: it replaces the manual copy-paste docker tag/push commands previously printed by annotate-release.sh with an automated Buildkite step that runs a new publish-release-images.sh. It also adds key: IDs to two CPU build steps and adds a new block-gated publish step. No runtime/production code paths are touched.

Security risks

None. The script runs in the existing release pipeline with the same DockerHub credentials and ECR auth that the manual flow relied on. Inputs are operator-supplied via Buildkite metadata; tag construction uses shell variables in trusted-context strings (no untrusted user input is interpolated into shell evaluation contexts that didn't already exist in the prior manual workflow).

Level of scrutiny

Lower than production code: this is release-time tooling guarded by manual block steps and is not in the request-serving path. The blast radius of bugs is limited to the Docker Hub release artifacts, and operators can re-run the publish step idempotently. That said, I scrutinized the prior revisions because silent failures here can ship stale images — and those concerns have all been resolved in the current commits.

Other factors

All three prior review items are now visibly addressed in the diff: (1) line 11 uses buildkite-agent meta-data get release-version --default "" so the explicit empty-check on line 12 fires as intended; (2) the CPU section uses docker manifest inspect first to distinguish 'not built' from 'pull failed', then enforces all-or-nothing — both arches publish together or neither does, with a non-zero exit on the partial state to prevent split-tag drift between :latest-x86_64 and the multi-arch :latest; (3) the publish block now lists the CPU build steps in depends_on with allow_failure: true, closing the race window where publish could run before the in-progress CPU build pushed to ECR. Recent commits 8f214e0 and 2b83d0c match this exactly. Straightforward CI change worth approving.

khluu · 2026-05-05T23:41:12Z

+      - block: "Publish release images to DockerHub"
+        key: block-publish-release-images
+        depends_on:
+          - annotate-release-workflow


it doesn't need to wait for release workflow annotation

Done — dropped annotate-release-workflow from the depends_on. The annotate step posts wheel download info and isn't a real publish prerequisite.

khluu · 2026-05-05T23:42:30Z

+          - create-multi-arch-manifest-cuda-12-9
+          - create-multi-arch-manifest-ubuntu2404
+          - create-multi-arch-manifest-cuda-12-9-ubuntu2404


does it not need to wait on create-multi-arch-manifest?

Good catch — added create-multi-arch-manifest to the depends_on. It wasn't strictly needed because the publish script pulls per-arch images directly (not the multi-arch manifest), but with annotate-release-workflow removed we lose the transitive wait on the CUDA 13.0 builds. Adding it explicitly also makes the depends_on symmetric with the other variants (-cuda-12-9, -ubuntu2404, -cuda-12-9-ubuntu2404).

khluu · 2026-05-05T23:43:05Z

+      - label: "Publish release images to DockerHub"
+        depends_on:
+          - block-publish-release-images
+        id: publish-release-images-dockerhub


use key instead of id

Done — switched to key:.

The PyPI release script invokes `python setup.py sdist` in a fresh venv that only had `twine` installed, but setup.py imports torch, setuptools_scm, packaging, and setuptools at parse time. Install requirements/build/cuda.txt so those imports resolve. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: khluu <khluu000@gmail.com>

- block-publish-release-images: replace `annotate-release-workflow` with `create-multi-arch-manifest`. The annotate step posts wheel download info and isn't a real prerequisite for image publishing; swapping in the CUDA 13.0 multi-arch manifest step preserves the transitive wait on the CUDA 13.0 builds and matches the other variants already listed. - Use `key:` instead of `id:` on the publish step to match the canonical Buildkite identifier and the new CPU build step keys. Signed-off-by: Kevin H. Luu <kevin@anyscale.com> Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: khluu <khluu000@gmail.com>

claude · 2026-05-05T23:57:34Z

+CPU_X86_TAG=public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v${RELEASE_VERSION}
+CPU_ARM_TAG=public.ecr.aws/q9t5s3a7/vllm-arm64-cpu-release-repo:v${RELEASE_VERSION}
+
+CPU_X86_AVAILABLE=false
+CPU_ARM_AVAILABLE=false
+docker manifest inspect "${CPU_X86_TAG}" >/dev/null 2>&1 && CPU_X86_AVAILABLE=true
+docker manifest inspect "${CPU_ARM_TAG}" >/dev/null 2>&1 && CPU_ARM_AVAILABLE=true
+
+if [ "$CPU_X86_AVAILABLE" = "true" ] && [ "$CPU_ARM_AVAILABLE" = "true" ]; then
+  docker pull "${CPU_X86_TAG}"
+  docker tag "${CPU_X86_TAG}" vllm/vllm-openai-cpu:latest-x86_64
+  docker tag "${CPU_X86_TAG}" vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64
+  docker push vllm/vllm-openai-cpu:latest-x86_64
+  docker push vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64
+
+  docker pull "${CPU_ARM_TAG}"
+  docker tag "${CPU_ARM_TAG}" vllm/vllm-openai-cpu:latest-arm64
+  docker tag "${CPU_ARM_TAG}" vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64
+  docker push vllm/vllm-openai-cpu:latest-arm64
+  docker push vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64
+
+  docker manifest rm vllm/vllm-openai-cpu:latest || true
+  docker manifest rm vllm/vllm-openai-cpu:v${RELEASE_VERSION} || true
+  docker manifest create vllm/vllm-openai-cpu:latest vllm/vllm-openai-cpu:latest-x86_64 vllm/vllm-openai-cpu:latest-arm64
+  docker manifest create vllm/vllm-openai-cpu:v${RELEASE_VERSION} vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64 vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64
+  docker manifest push vllm/vllm-openai-cpu:latest
+  docker manifest push vllm/vllm-openai-cpu:v${RELEASE_VERSION}
+elif [ "$CPU_X86_AVAILABLE" = "false" ] && [ "$CPU_ARM_AVAILABLE" = "false" ]; then


🔴 On the build side (release-pipeline.yaml lines 320, 322, 339, 341), CPU images are pushed to ECR using the raw $(buildkite-agent meta-data get release-version), but publish-release-images.sh strips a leading v at line 11 and re-prepends v at lines 140-141 — so if an operator types 1.2.3 (no v) into the unvalidated input field, the build pushes :1.2.3 while publish looks for :v1.2.3. Both docker manifest inspect calls 404, the script falls into the elif at line 167 ("Neither CPU image found"), prints a misleading WARNING ("ensure block step was unblocked" — they did) and exits 0 green, leaving vllm/vllm-openai-cpu:latest and :v${RELEASE_VERSION} stale from the previous release. The PR added docker manifest inspect (closing the partial-build observability gap raised in the resolved review comment) but did NOT close this v-prefix root cause; fix is one line — mirror the sed s/^v// on the build side (or strip on both, or validate the input field).

Extended reasoning...

What goes wrong

publish-release-images.sh and release-pipeline.yaml disagree on whether the operator-supplied release version carries a leading v:

Publish side (publish-release-images.sh):

Line 11: RELEASE_VERSION=$(buildkite-agent meta-data get release-version --default "" | sed 's/^v//') — strips a leading v.

Lines 140-141: CPU_X86_TAG=public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v${RELEASE_VERSION} and CPU_ARM_TAG=...:v${RELEASE_VERSION} — re-prepends v.

The publish script is therefore tolerant of either input format (1.2.3 or v1.2.3) and always inspects :v1.2.3.

Build side (release-pipeline.yaml):

Line 320 (x86 build): --tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:$(buildkite-agent meta-data get release-version)

Line 322 (x86 push): docker push public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:$(buildkite-agent meta-data get release-version)

Line 339 (arm64 build) / 341 (arm64 push): identical pattern with vllm-arm64-cpu-release-repo.

Builds use the raw operator input — no v normalization. Builds push :1.2.3 if the operator typed 1.2.3, or :v1.2.3 if they typed v1.2.3.

The "Provide Release version here" input step has no validation or hint about the expected format, and the publish script's own sed signals that mixed input is acceptable. So the formats round-trip inconsistently.

Step-by-step proof

Release manager unblocks block-cpu-release-image-build and block-arm64-cpu-release-image-build, and types 1.2.3 (no v) into the unvalidated input field.

CPU build steps push public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:1.2.3 and vllm-arm64-cpu-release-repo:1.2.3 to ECR (line 322 / 341).

publish-release-images.sh runs. Line 11 sets RELEASE_VERSION=1.2.3 (sed strips nothing because there's no leading v). Lines 140-141 set CPU_X86_TAG=...:v1.2.3 and CPU_ARM_TAG=...:v1.2.3.

Lines 145-146: docker manifest inspect "${CPU_X86_TAG}" returns "manifest unknown" (the tag pushed was :1.2.3, not :v1.2.3). CPU_X86_AVAILABLE stays false. Same for arm64.

Line 148 guard CPU_X86_AVAILABLE=true && CPU_ARM_AVAILABLE=true evaluates false; line 167 elif matches (both false); script prints WARNING: Neither CPU image found in ECR, skipping CPU publish (ensure block-cpu-release-image-build and block-arm64-cpu-release-image-build were unblocked and the builds finished pushing) — even though the operator DID unblock both gates and the builds DID succeed.

Script falls through to echo "Successfully published release images for v1.2.3" and exits 0.

Buildkite reports the publish step green. vllm/vllm-openai-cpu:latest and vllm/vllm-openai-cpu:v${RELEASE_VERSION} on Docker Hub remain pointing at whatever the previous release published. docker pull vllm/vllm-openai-cpu:latest silently returns the prior release's image.

Why existing protections don't catch it

The PR's docker manifest inspect rewrite addresses the original review comment's partial-build observability concern (distinguishing "not built" from "pull failed"), and the new partial-build else branch (lines 169-176) correctly fails loudly when exactly one arch is available. But when both arches are present in ECR under a different tag (the v-prefix mismatch case), inspect returns "manifest unknown" for both, the elif at line 167 matches, and the script falls into the silent-skip path with a misleading WARNING.

upload-release-wheels-pypi.sh has a GIT_VERSION sanity check that would catch a wrong format for the wheel path (PURE_VERSION=${RELEASE_VERSION#v} line 31), but that runs in a separate step on a separate gate; the docker publish path has no equivalent check and produces no error signal.

Operator convention is vX.Y.Z per past releases, but conventions are not validation. The publish script's own sed 's/^v//' invites mixed input, and a one-character omission yields a silent stale release.

Impact

Bounded but real:

Only vllm/vllm-openai-cpu:latest and vllm/vllm-openai-cpu:v${RELEASE_VERSION} (multi-arch + per-arch tags) go silently stale.

Build is reported green, no paging signal, no diagnostic for the operator other than a WARNING that misdirects them ("ensure block step was unblocked" — but they did).

Users docker pull vllm/vllm-openai-cpu:latest get the previous release.

The previous manual flow had the same mismatch but a human running the commands would have seen the 404; the new automation makes it silent — a regression in observability vs. the manual flow it replaces.

How to fix

One-line, two viable options:

Mirror the sed on the build side, e.g. replace $(buildkite-agent meta-data get release-version) with $(buildkite-agent meta-data get release-version | sed 's/^v//') at lines 320, 322, 339, 341, and update the publish script to inspect :${RELEASE_VERSION} (no v prefix). Or strip the v on both sides and add it consistently in tag generation.

Validate the input format on the "Provide Release version here" step (e.g. require a v prefix or normalize via a hint), so the round-trip is well-defined.

Either fix closes the silent-skip path; verifiers were unanimous.

claude · 2026-05-05T23:57:34Z

+      - label: "Publish release images to DockerHub"
+        depends_on:
+          - block-publish-release-images
+        id: publish-release-images-dockerhub


🟡 The new publish-release-images-dockerhub label step (lines 460-474) depends on block-publish-release-images, which is gated with if: build.env("NIGHTLY") != "1" (line 458) and so is filtered out of nightly pipeline uploads. The label step itself carries neither a matching if: clause nor an allow_dependency_failure: true on its dependency, so its presence in nightly builds relies on Buildkite's transitive-filter semantics rather than the explicit pattern used 350 lines earlier in this same file at lines 107-115. Suggested fix (one line): add if: build.env("NIGHTLY") != "1" to the label step to mirror the block step's guard, making the intent self-documenting and not dependent on undocumented edge cases of depends_on referencing a filtered step.

Extended reasoning...

What the bug is

block-publish-release-images (line 441) is gated with if: build.env("NIGHTLY") != "1" on line 458. The dependent publish-release-images-dockerhub label step (lines 460-474) lists depends_on: - block-publish-release-images but has neither a matching if: guard nor allow_dependency_failure: true on the dependency. So during a nightly pipeline upload, the label step survives the upload but its only dependency does not.

Why this is a consistency gap

The same author/file has already encoded the right pattern for the analogous case 350 lines earlier:

block-build-release-images (line 107) is gated with if: build.env("NIGHTLY") != "1" (line 110).

The dependent build-release-images group (line 112) explicitly sets allow_dependency_failure: true (line 115).

That pattern fits that case (the build group should run in BOTH modes — in nightly the upstream block is missing, so allow_dependency_failure: true lets the group proceed). For the new publish step the intent is different: it should NOT run in nightly. The two valid encodings of that intent in this file are:

Add if: build.env("NIGHTLY") != "1" to the label step itself — mirrors the explicit guard used for the existing nightly-only Publish nightly multi-arch image to DockerHub steps (which use if: build.env("NIGHTLY") == "1").

Leave it to Buildkite's transitive filter on the missing dependency — what the PR currently does.

Step-by-step trace of nightly behavior

Nightly pipeline upload runs with NIGHTLY=1.

block-publish-release-images is filtered out by its if: clause and is not present in the uploaded pipeline.

publish-release-images-dockerhub is uploaded (no if: on it). Its sole dependency block-publish-release-images does not exist in this build.

Buildkite's resolution: a step whose depends_on references a non-existent step is marked broken/skipped (without allow_dependency_failure). The publish step is silently dropped.

In the happy path this is the intended outcome. The risk is that step (4) relies on Buildkite's behavior being stable across versions and matching the author's mental model — and unlike approach (1), there is nothing in the YAML that tells a future reader 'this step is intentionally nightly-gated'.

Addressing the refutation

A fellow verifier argues this is intentional behavior leveraging Buildkite's transitive-filter semantics, and that the lines 113-115 pattern is not analogous because the build group must run in both modes whereas the publish step must not. That refutation is correct on intent — the two cases ARE different, and the refutation is right that simply pasting allow_dependency_failure: true here would be wrong (it would make publish run in nightly without any block gate).

But the refutation does not address the explicit-vs-implicit gap. The two existing patterns in this file (build-release-images group with allow_dependency_failure: true, and Publish nightly steps with if: NIGHTLY == "1") both make the gating visible at the step that runs. The new publish step is the first place in this file where the gating is only on the upstream block. That is the consistency gap, and it is also why a one-line if: build.env("NIGHTLY") != "1" on the label step is the cleanest fix — it doesn't change runtime behavior, it just documents the intent in the right place.

Impact

Bounded — in the most likely case the existing pattern works (Buildkite skips the publish step in nightly as intended). The downside is reduced readability and a small risk if Buildkite's filtered-dependency semantics ever change or differ across pipeline-upload code paths. Severity: nit.

Fix

Add the matching guard on the label step:

- label: "Publish release images to DockerHub" depends_on: - block-publish-release-images id: publish-release-images-dockerhub if: build.env("NIGHTLY") != "1" # <-- add this agents: queue: small_cpu_queue_release ...

This makes the nightly-skip intent explicit at the step that runs and matches the file's own convention of guarding nightly-conditional steps with their own if: clause.

claude · 2026-05-06T00:30:25Z

 python3 -m venv /tmp/vllm-release-env
 source /tmp/vllm-release-env/bin/activate
 pip install twine
+pip install -r requirements/build/cuda.txt


🔴 Installing requirements/build/cuda.txt (line 46) is intended to make python setup.py sdist (line 59) succeed, but the script still cannot reach a green PyPI upload: (1) cuda.txt pins CUDA-built torch==2.11.0, so setup.py auto-detects VLLM_TARGET_DEVICE=cuda and get_vllm_version() calls get_nvcc_cuda_version() at setup.py:916, which asserts CUDA_HOME is not None (setup.py:888) — small_cpu_queue_release has no CUDA toolkit, so sdist exits before producing the .tar.gz; the sdist-skip guard at setup.py:920 sits after the assertion, so it does not help. (2) Even if sdist succeeded, lines 66/72-73 are broken: PYPI_WHEEL_FILES=$(find …) returns the two default-variant wheels (x86_64 + aarch64) newline-separated, and twine check "$PYPI_WHEEL_FILES" … quotes them as a single argument with an embedded newline, which twine treats as one nonexistent path. Fix: set VLLM_USE_PRECOMPILED=1 (or VLLM_TARGET_DEVICE=empty) on the sdist invocation, and switch the twine call to a bash array (mapfile -t PYPI_WHEEL_FILES < <(find …); twine check "${PYPI_WHEEL_FILES[@]}" "$SDIST_FILE").

Extended reasoning...

What the bug is

This PR adds pip install -r requirements/build/cuda.txt at line 46 specifically to make the python setup.py sdist invocation on line 59 succeed (previously it failed at module-level import torch in setup.py and aborted under set -e). The intended outcome — a working PyPI upload — is still not achieved, because that one-line change activates two independent downstream failures, both of which would fire the first time block-upload-release-wheels is unblocked.

Failure mode A — setup.py sdist aborts on a CPU-only agent

After the PR, sdist gets past import torch and begins module-level execution. With requirements/build/cuda.txt pinning the default PyPI torch==2.11.0 (CUDA-built wheel, torch.version.cuda is set), setup.py auto-detection at lines 53-64 sets VLLM_TARGET_DEVICE = "cuda". The module-level setup(version=get_vllm_version(), ...) call then runs get_vllm_version() (setup.py:898). _is_cuda() is true; VLLM_USE_PRECOMPILED is unset; the function falls through to setup.py:916 → str(get_nvcc_cuda_version()), whose first line (setup.py:888) is assert CUDA_HOME is not None, "CUDA_HOME is not set".

The agent that runs upload-release-wheels-pypi.sh is small_cpu_queue_release (release-pipeline.yaml). The queue name and the absence of any CUDA-toolkit installation step mean CUDA_HOME resolves to None (no env var, no /usr/local/cuda, no nvcc). The assertion raises, sdist exits non-zero, and set -e aborts the script at line 59 — before the twine call ever runs. Critically, the sdist-skip guard at setup.py:920 (if "sdist" not in sys.argv: ...) sits after the failing call at line 916, so it does not protect this path.

Failure mode B — newline-quoted wheel paths

If failure mode A is fixed, the script reaches lines 66-73:

PYPI_WHEEL_FILES=$(find $DIST_DIR -name "vllm-${PURE_VERSION}*.whl" -not -name "*+*") ... python3 -m twine check "$PYPI_WHEEL_FILES" "$SDIST_FILE" python3 -m twine upload --non-interactive --verbose "$PYPI_WHEEL_FILES" "$SDIST_FILE"

The vLLM release pipeline produces two default-variant wheels (x86_64 + aarch64; the +cu129/+cpu variants are filtered by -not -name "*+*"). find outputs them newline-separated. Quoting "$PYPI_WHEEL_FILES" passes both paths to twine as a single argument with an embedded newline, which twine treats as one filename and fails opening.

Step-by-step proof of failure mode A

Operator unblocks block-upload-release-wheels. The script runs on small_cpu_queue_release.

Line 46 installs torch==2.11.0 (CUDA-built); torch.version.cuda is set.

Line 59 runs python setup.py sdist. setup.py:53-64 sets VLLM_TARGET_DEVICE = "cuda".

Module-level setup(version=get_vllm_version(), ...) at setup.py:1085 runs get_vllm_version().

setup.py:912 _is_cuda() is True; VLLM_USE_PRECOMPILED is unset → falls to else at setup.py:916.

get_nvcc_cuda_version() runs; setup.py:888 assert CUDA_HOME is not None raises AssertionError on the CPU-only agent.

sdist exits non-zero, set -e aborts the script. PyPI upload is blocked. The script never reaches the twine call, so failure mode B does not even fire — it is shadowed.

Step-by-step proof of failure mode B (assuming A is fixed)

find returns two paths: /tmp/.../vllm-${VER}-cp38-abi3-manylinux_2_35_x86_64.whl and ..._aarch64.whl.

PYPI_WHEEL_FILES is set to those two paths joined by \n.

twine check "$PYPI_WHEEL_FILES" "$SDIST_FILE" invokes twine with argv[1] containing both paths separated by a literal newline.

twine treats argv[1] as a single filename, calls open(), and fails with no-such-file.

Verified empirically: X=$(printf "a\nb"); set -- "$X"; echo $# prints 1 — quoting flattens the newline-separated list to one argument. Without quoting, IFS-based word-splitting yields two arguments.

Why existing code does not prevent this

The sdist-skip guard at setup.py:920 only suppresses the +cuXXX suffix on the version string; it appears after the get_nvcc_cuda_version() call on line 916, so the assertion has already triggered. The guard cannot help.

The PR test plan exercises the docker publish path (Verify the publish script…, Verify the block step appears…) but does not exercise upload-release-wheels-pypi.sh end to end. The wheel upload is gated behind a manual block (block-upload-release-wheels), so the PR can land green without either failure being observed.

The script was added in commit ae3b4de on 2026-05-02 with the broken twine quoting already present, but pre-PR the script aborted at line 59 (ImportError) under set -e, so the twine call has never executed. This PR is the first commit that brings either failure into the live code path.

How to fix

Two minimal changes:

Set VLLM_USE_PRECOMPILED=1 (or VLLM_TARGET_DEVICE=empty) on the sdist invocation at line 59, so _is_cuda() takes the precompiled branch at setup.py:913 and skips get_nvcc_cuda_version(). Equivalent: install a CPU-only torch (pip install torch --index-url https://download.pytorch.org/whl/cpu) so torch.version.cuda is None and _is_cuda() returns False during version detection.

Use a bash array for the twine call:

mapfile -t PYPI_WHEEL_FILES < <(find "$DIST_DIR" -name "vllm-${PURE_VERSION}*.whl" -not -name "*+*") if [[ ${#PYPI_WHEEL_FILES[@]} -eq 0 ]]; then echo "No default variant wheels found, quitting..."; exit 1 fi python3 -m twine check "${PYPI_WHEEL_FILES[@]}" "$SDIST_FILE" python3 -m twine upload --non-interactive --verbose "${PYPI_WHEEL_FILES[@]}" "$SDIST_FILE"

Either fix alone leaves the other failure live, so both must land for the PyPI upload step to succeed.

Signed-off-by: khluu <khluu000@gmail.com>

Signed-off-by: khluu <khluu000@gmail.com> Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>

Signed-off-by: khluu <khluu000@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

Signed-off-by: khluu <khluu000@gmail.com> (cherry picked from commit 1333864)

Signed-off-by: khluu <khluu000@gmail.com> Signed-off-by: Libin Tang <libin.tang@intel.com>

Replaces manual docker publish instructions in annotate-release.sh with a unified publish-release-images.sh script and corresponding Buildkite pipeline steps. The new script handles all image variants (CUDA 12.9, CUDA 13.0, Ubuntu 24.04, ROCm, and CPU), automating the pull, tag, push, and multi-arch manifest creation for Docker Hub releases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: khluu <khluu000@gmail.com>

Signed-off-by: khluu <khluu000@gmail.com>

Replaces manual docker publish instructions in annotate-release.sh with a unified publish-release-images.sh script and corresponding Buildkite pipeline steps. The new script handles all image variants (CUDA 12.9, CUDA 13.0, Ubuntu 24.04, ROCm, and CPU), automating the pull, tag, push, and multi-arch manifest creation for Docker Hub releases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: khluu <khluu000@gmail.com>

Signed-off-by: khluu <khluu000@gmail.com>

claude Bot reviewed Apr 20, 2026

View reviewed changes

mergify Bot added the ci/build label Apr 20, 2026

gemini-code-assist Bot reviewed Apr 20, 2026

View reviewed changes

khluu changed the title ~~[CI] Add ubuntu2404 images to release annotate script~~ [CI] Automate Docker Hub release image publishing Apr 21, 2026

khluu force-pushed the ci/add-ubuntu2404-annotate-release branch from e444006 to 8f4b74c Compare April 22, 2026 23:16

khluu added a commit that referenced this pull request Apr 23, 2026

Revert "[CI] Automate Docker Hub release image publishing (#40415)"

f732441

This reverts commit 579602a.

khluu force-pushed the ci/add-ubuntu2404-annotate-release branch from 8f4b74c to 8e520ee Compare May 5, 2026 05:18

simon-mo approved these changes May 5, 2026

View reviewed changes

simon-mo added this to the v0.20.2 milestone May 5, 2026

claude Bot reviewed May 5, 2026

View reviewed changes

Comment thread .buildkite/scripts/publish-release-images.sh Outdated

Comment thread .buildkite/release-pipeline.yaml

Comment thread .buildkite/scripts/publish-release-images.sh

claude Bot reviewed May 5, 2026

View reviewed changes

Comment thread .buildkite/scripts/publish-release-images.sh Outdated

claude Bot reviewed May 5, 2026

View reviewed changes

khluu commented May 5, 2026

View reviewed changes

khluu added 2 commits May 5, 2026 16:43

khluu enabled auto-merge (squash) May 5, 2026 23:46

github-actions Bot added the ready ONLY add when PR is ready to merge/full CI is needed label May 5, 2026

Merge branch 'main' into ci/add-ubuntu2404-annotate-release

d48a7c4

claude Bot reviewed May 5, 2026

View reviewed changes

khluu merged commit 1333864 into main May 6, 2026
17 checks passed

khluu deleted the ci/add-ubuntu2404-annotate-release branch May 6, 2026 00:15

claude Bot reviewed May 6, 2026

View reviewed changes

chaojun-zhang pushed a commit to chaojun-zhang/vllm that referenced this pull request May 6, 2026

[CI] Automate Docker Hub release image publishing (vllm-project#40415)

25e704b

Signed-off-by: khluu <khluu000@gmail.com>

ikaadil pushed a commit to ikaadil/vllm that referenced this pull request May 7, 2026

[CI] Automate Docker Hub release image publishing (vllm-project#40415)

7efaab9

Signed-off-by: khluu <khluu000@gmail.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

khluu added a commit that referenced this pull request May 8, 2026

[CI] Automate Docker Hub release image publishing (#40415)

bc150f5

Signed-off-by: khluu <khluu000@gmail.com> (cherry picked from commit 1333864)

libinta pushed a commit to libinta/vllm that referenced this pull request May 8, 2026

[CI] Automate Docker Hub release image publishing (vllm-project#40415)

13fa3c2

Signed-off-by: khluu <khluu000@gmail.com> Signed-off-by: Libin Tang <libin.tang@intel.com>

weifang231 pushed a commit to weifang231/eb-vllm that referenced this pull request May 13, 2026

[CI] Automate Docker Hub release image publishing (vllm-project#40415)

08dabba

Signed-off-by: khluu <khluu000@gmail.com>

mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026

[CI] Automate Docker Hub release image publishing (vllm-project#40415)

c9fb85a

Signed-off-by: khluu <khluu000@gmail.com>

jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026

[CI] Automate Docker Hub release image publishing (vllm-project#40415)

099ea5a

Signed-off-by: khluu <khluu000@gmail.com>

Uh oh!

Conversation

khluu commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Details

Test plan

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

simon-mo commented May 5, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

khluu May 5, 2026

Choose a reason for hiding this comment

Uh oh!

khluu May 5, 2026

Choose a reason for hiding this comment

Uh oh!

khluu May 5, 2026

Choose a reason for hiding this comment

Uh oh!

khluu May 5, 2026

Choose a reason for hiding this comment

Uh oh!

khluu May 5, 2026

Choose a reason for hiding this comment

Uh oh!

khluu May 5, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 5, 2026

Choose a reason for hiding this comment

What goes wrong

Step-by-step proof

Why existing protections don't catch it

Impact

How to fix

Uh oh!

claude Bot May 5, 2026

Choose a reason for hiding this comment

What the bug is

Why this is a consistency gap

Step-by-step trace of nightly behavior

Addressing the refutation

Impact

Fix

Uh oh!

Uh oh!

claude Bot May 6, 2026

Choose a reason for hiding this comment

What the bug is

Failure mode A — setup.py sdist aborts on a CPU-only agent

Failure mode B — newline-quoted wheel paths

Step-by-step proof of failure mode A

Step-by-step proof of failure mode B (assuming A is fixed)

Why existing code does not prevent this

How to fix

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

khluu commented Apr 20, 2026 •

edited

Loading

Failure mode A — `setup.py sdist` aborts on a CPU-only agent