Skip to content

[CI] Automate Docker Hub release image publishing#40415

Merged
khluu merged 6 commits into
mainfrom
ci/add-ubuntu2404-annotate-release
May 6, 2026
Merged

[CI] Automate Docker Hub release image publishing#40415
khluu merged 6 commits into
mainfrom
ci/add-ubuntu2404-annotate-release

Conversation

@khluu
Copy link
Copy Markdown
Member

@khluu khluu commented Apr 20, 2026

Summary

  • Replaces manual docker tag/push/manifest commands from the release annotation script with an automated Buildkite pipeline step
  • Adds a new publish-release-images.sh script that handles all Docker Hub publishing: CUDA 13.0 (default), CUDA 12.9, Ubuntu 24.04 (both CUDA versions), ROCm, and CPU
  • Adds the previously missing Ubuntu 24.04 images to the release flow
  • Fixes missing || true on docker manifest rm and missing versioned manifest cleanup

Details

Previously, the annotate-release.sh script printed manual docker commands that a human had to copy-paste to publish release images to Docker Hub. This was error-prone — Ubuntu 24.04 images were missing entirely, and docker manifest rm commands would fail on first-time releases without || true.

This PR:

  1. New script .buildkite/scripts/publish-release-images.sh — pulls images from ECR, tags with latest and v$VERSION, pushes individual arch tags, creates multi-arch manifests for all variants (CUDA 13.0, cu129, ubuntu2404, cu129-ubuntu2404, ROCm, CPU)
  2. New pipeline step with a block gate in release-pipeline.yaml — requires manual unblock before publishing, depends on all build/manifest steps completing, only runs for non-nightly builds
  3. Simplified annotate script — now only shows wheel download commands, with a note that docker images are published automatically
  4. CPU images handled gracefully — skipped with a warning if their separate block steps weren't unblocked

Rebased on current main (was previously based on older codebase with cu130 naming).

No other open PR addresses this. AI assistance (Claude) was used in preparing this PR.

Test plan

  • Verify the publish script correctly tags and pushes all image variants by triggering a release pipeline run
  • Verify the block step appears in Buildkite with correct dependencies
  • Verify the annotate script still renders wheel download info correctly
  • Confirm CPU skip logic works when CPU block steps are not unblocked

🤖 Generated with Claude Code

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@mergify mergify Bot added the ci/build label Apr 20, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the release script to include support for Ubuntu 24.04 Docker images. It adds commands to pull, tag, push, and create multi-architecture manifests for both standard and CUDA 13.0 variants across x86_64 and aarch64 architectures. I have no feedback to provide.

@khluu khluu changed the title [CI] Add ubuntu2404 images to release annotate script [CI] Automate Docker Hub release image publishing Apr 21, 2026
khluu added a commit that referenced this pull request Apr 22, 2026
Replaces manual docker publish instructions in annotate-release.sh with
a unified publish-release-images.sh script and corresponding Buildkite
pipeline steps. The new script handles all image variants (CUDA 12.9,
CUDA 13.0, Ubuntu 24.04, ROCm, and CPU), automating the pull, tag,
push, and multi-arch manifest creation for Docker Hub releases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
@khluu khluu force-pushed the ci/add-ubuntu2404-annotate-release branch from e444006 to 8f4b74c Compare April 22, 2026 23:16
khluu added a commit that referenced this pull request Apr 22, 2026
Replaces manual docker publish instructions in annotate-release.sh with
a unified publish-release-images.sh script and corresponding Buildkite
pipeline steps. The new script handles all image variants (CUDA 12.9,
CUDA 13.0, Ubuntu 24.04, ROCm, and CPU), automating the pull, tag,
push, and multi-arch manifest creation for Docker Hub releases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
khluu added a commit that referenced this pull request Apr 23, 2026
Move docker tag/push/manifest logic from the annotation script into a
new publish-release-images.sh that runs as an automated pipeline step.
Adds all image variants including Ubuntu 24.04, CUDA 12.9, ROCm, and
CPU with graceful fallback for CPU images behind their own block steps.

Co-authored-by: Claude
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
@khluu khluu force-pushed the ci/add-ubuntu2404-annotate-release branch from 8f4b74c to 8e520ee Compare May 5, 2026 05:18
@simon-mo
Copy link
Copy Markdown
Collaborator

simon-mo commented May 5, 2026

@claude review

@simon-mo simon-mo added this to the v0.20.2 milestone May 5, 2026
Comment thread .buildkite/scripts/publish-release-images.sh Outdated
Comment thread .buildkite/release-pipeline.yaml
Comment thread .buildkite/scripts/publish-release-images.sh
- Use `docker manifest inspect` to distinguish a missing CPU image (skip
  with warning) from a real pull failure (fail loudly under set -e),
  instead of `if docker pull ... 2>/dev/null; then` which masked all
  non-zero exits as "image not built".
- Add the CPU build steps to `block-publish-release-images.depends_on`
  with `allow_failure: true`, so publish doesn't race an in-progress
  CPU build but still proceeds when the operator legitimately leaves
  the CPU block steps unblocked.
- Normalize the `v` prefix on the CPU build-side tags too, so the
  round-trip with the publish script is robust regardless of whether
  the operator types `1.2.3` or `v1.2.3` into the input field.
- Use `buildkite-agent meta-data get --default ""` so the explicit
  empty-check is reachable under `set -euo pipefail` (otherwise
  pipefail aborts before the check).

Signed-off-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Claude <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
Comment thread .buildkite/scripts/publish-release-images.sh Outdated
- publish-release-images.sh: restructure CPU section to inspect both
  arches up front, then either publish everything (per-arch tags +
  multi-arch manifest) or fail loudly. The previous structure pushed
  per-arch tags independently of the multi-arch gate, so a partial
  state (one arch fails under publish gate's `allow_failure: true`)
  would leave `:latest-x86_64` pointing at the new release while the
  `:latest` multi-arch manifest still resolved to the previous release.
- release-pipeline.yaml: revert the build-side `| sed 's/^v//'`
  normalization. `buildkite-agent meta-data get release-version` always
  returns the value with the `v` prefix already, so the strip-then-
  re-prepend was a no-op and added noise.

Signed-off-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Claude <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All prior review feedback addressed: all-or-nothing CPU publish with explicit fail-loud on partial state, manifest-inspect probe instead of silent pull-skip, race window closed via CPU build deps with allow_failure, and metadata fetch hardened with --default.

Extended reasoning...

Overview

This PR is purely CI/build infrastructure: it replaces the manual copy-paste docker tag/push commands previously printed by annotate-release.sh with an automated Buildkite step that runs a new publish-release-images.sh. It also adds key: IDs to two CPU build steps and adds a new block-gated publish step. No runtime/production code paths are touched.

Security risks

None. The script runs in the existing release pipeline with the same DockerHub credentials and ECR auth that the manual flow relied on. Inputs are operator-supplied via Buildkite metadata; tag construction uses shell variables in trusted-context strings (no untrusted user input is interpolated into shell evaluation contexts that didn't already exist in the prior manual workflow).

Level of scrutiny

Lower than production code: this is release-time tooling guarded by manual block steps and is not in the request-serving path. The blast radius of bugs is limited to the Docker Hub release artifacts, and operators can re-run the publish step idempotently. That said, I scrutinized the prior revisions because silent failures here can ship stale images — and those concerns have all been resolved in the current commits.

Other factors

All three prior review items are now visibly addressed in the diff: (1) line 11 uses buildkite-agent meta-data get release-version --default "" so the explicit empty-check on line 12 fires as intended; (2) the CPU section uses docker manifest inspect first to distinguish 'not built' from 'pull failed', then enforces all-or-nothing — both arches publish together or neither does, with a non-zero exit on the partial state to prevent split-tag drift between :latest-x86_64 and the multi-arch :latest; (3) the publish block now lists the CPU build steps in depends_on with allow_failure: true, closing the race window where publish could run before the in-progress CPU build pushed to ECR. Recent commits 8f214e0 and 2b83d0c match this exactly. Straightforward CI change worth approving.

Comment thread .buildkite/release-pipeline.yaml Outdated
- block: "Publish release images to DockerHub"
key: block-publish-release-images
depends_on:
- annotate-release-workflow
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't need to wait for release workflow annotation

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — dropped annotate-release-workflow from the depends_on. The annotate step posts wheel download info and isn't a real publish prerequisite.

Comment on lines +445 to +447
- create-multi-arch-manifest-cuda-12-9
- create-multi-arch-manifest-ubuntu2404
- create-multi-arch-manifest-cuda-12-9-ubuntu2404
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it not need to wait on create-multi-arch-manifest?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — added create-multi-arch-manifest to the depends_on. It wasn't strictly needed because the publish script pulls per-arch images directly (not the multi-arch manifest), but with annotate-release-workflow removed we lose the transitive wait on the CUDA 13.0 builds. Adding it explicitly also makes the depends_on symmetric with the other variants (-cuda-12-9, -ubuntu2404, -cuda-12-9-ubuntu2404).

Comment thread .buildkite/release-pipeline.yaml Outdated
- label: "Publish release images to DockerHub"
depends_on:
- block-publish-release-images
id: publish-release-images-dockerhub
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use key instead of id

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — switched to key:.

khluu added 2 commits May 5, 2026 16:43
The PyPI release script invokes `python setup.py sdist` in a fresh venv
that only had `twine` installed, but setup.py imports torch,
setuptools_scm, packaging, and setuptools at parse time. Install
requirements/build/cuda.txt so those imports resolve.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
- block-publish-release-images: replace `annotate-release-workflow`
  with `create-multi-arch-manifest`. The annotate step posts wheel
  download info and isn't a real prerequisite for image publishing;
  swapping in the CUDA 13.0 multi-arch manifest step preserves the
  transitive wait on the CUDA 13.0 builds and matches the other
  variants already listed.
- Use `key:` instead of `id:` on the publish step to match the
  canonical Buildkite identifier and the new CPU build step keys.

Signed-off-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Claude <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
@khluu khluu enabled auto-merge (squash) May 5, 2026 23:46
@github-actions github-actions Bot added the ready ONLY add when PR is ready to merge/full CI is needed label May 5, 2026
Comment on lines +140 to +167
CPU_X86_TAG=public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v${RELEASE_VERSION}
CPU_ARM_TAG=public.ecr.aws/q9t5s3a7/vllm-arm64-cpu-release-repo:v${RELEASE_VERSION}

CPU_X86_AVAILABLE=false
CPU_ARM_AVAILABLE=false
docker manifest inspect "${CPU_X86_TAG}" >/dev/null 2>&1 && CPU_X86_AVAILABLE=true
docker manifest inspect "${CPU_ARM_TAG}" >/dev/null 2>&1 && CPU_ARM_AVAILABLE=true

if [ "$CPU_X86_AVAILABLE" = "true" ] && [ "$CPU_ARM_AVAILABLE" = "true" ]; then
docker pull "${CPU_X86_TAG}"
docker tag "${CPU_X86_TAG}" vllm/vllm-openai-cpu:latest-x86_64
docker tag "${CPU_X86_TAG}" vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64
docker push vllm/vllm-openai-cpu:latest-x86_64
docker push vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64

docker pull "${CPU_ARM_TAG}"
docker tag "${CPU_ARM_TAG}" vllm/vllm-openai-cpu:latest-arm64
docker tag "${CPU_ARM_TAG}" vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64
docker push vllm/vllm-openai-cpu:latest-arm64
docker push vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64

docker manifest rm vllm/vllm-openai-cpu:latest || true
docker manifest rm vllm/vllm-openai-cpu:v${RELEASE_VERSION} || true
docker manifest create vllm/vllm-openai-cpu:latest vllm/vllm-openai-cpu:latest-x86_64 vllm/vllm-openai-cpu:latest-arm64
docker manifest create vllm/vllm-openai-cpu:v${RELEASE_VERSION} vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64 vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64
docker manifest push vllm/vllm-openai-cpu:latest
docker manifest push vllm/vllm-openai-cpu:v${RELEASE_VERSION}
elif [ "$CPU_X86_AVAILABLE" = "false" ] && [ "$CPU_ARM_AVAILABLE" = "false" ]; then
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 On the build side (release-pipeline.yaml lines 320, 322, 339, 341), CPU images are pushed to ECR using the raw $(buildkite-agent meta-data get release-version), but publish-release-images.sh strips a leading v at line 11 and re-prepends v at lines 140-141 — so if an operator types 1.2.3 (no v) into the unvalidated input field, the build pushes :1.2.3 while publish looks for :v1.2.3. Both docker manifest inspect calls 404, the script falls into the elif at line 167 ("Neither CPU image found"), prints a misleading WARNING ("ensure block step was unblocked" — they did) and exits 0 green, leaving vllm/vllm-openai-cpu:latest and :v${RELEASE_VERSION} stale from the previous release. The PR added docker manifest inspect (closing the partial-build observability gap raised in the resolved review comment) but did NOT close this v-prefix root cause; fix is one line — mirror the sed s/^v// on the build side (or strip on both, or validate the input field).

Extended reasoning...

What goes wrong

publish-release-images.sh and release-pipeline.yaml disagree on whether the operator-supplied release version carries a leading v:

  • Publish side (publish-release-images.sh):
    • Line 11: RELEASE_VERSION=$(buildkite-agent meta-data get release-version --default "" | sed 's/^v//') — strips a leading v.
    • Lines 140-141: CPU_X86_TAG=public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v${RELEASE_VERSION} and CPU_ARM_TAG=...:v${RELEASE_VERSION} — re-prepends v.
    • The publish script is therefore tolerant of either input format (1.2.3 or v1.2.3) and always inspects :v1.2.3.
  • Build side (release-pipeline.yaml):
    • Line 320 (x86 build): --tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:$(buildkite-agent meta-data get release-version)
    • Line 322 (x86 push): docker push public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:$(buildkite-agent meta-data get release-version)
    • Line 339 (arm64 build) / 341 (arm64 push): identical pattern with vllm-arm64-cpu-release-repo.
    • Builds use the raw operator input — no v normalization. Builds push :1.2.3 if the operator typed 1.2.3, or :v1.2.3 if they typed v1.2.3.

The "Provide Release version here" input step has no validation or hint about the expected format, and the publish script's own sed signals that mixed input is acceptable. So the formats round-trip inconsistently.

Step-by-step proof

  1. Release manager unblocks block-cpu-release-image-build and block-arm64-cpu-release-image-build, and types 1.2.3 (no v) into the unvalidated input field.
  2. CPU build steps push public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:1.2.3 and vllm-arm64-cpu-release-repo:1.2.3 to ECR (line 322 / 341).
  3. publish-release-images.sh runs. Line 11 sets RELEASE_VERSION=1.2.3 (sed strips nothing because there's no leading v). Lines 140-141 set CPU_X86_TAG=...:v1.2.3 and CPU_ARM_TAG=...:v1.2.3.
  4. Lines 145-146: docker manifest inspect "${CPU_X86_TAG}" returns "manifest unknown" (the tag pushed was :1.2.3, not :v1.2.3). CPU_X86_AVAILABLE stays false. Same for arm64.
  5. Line 148 guard CPU_X86_AVAILABLE=true && CPU_ARM_AVAILABLE=true evaluates false; line 167 elif matches (both false); script prints WARNING: Neither CPU image found in ECR, skipping CPU publish (ensure block-cpu-release-image-build and block-arm64-cpu-release-image-build were unblocked and the builds finished pushing) — even though the operator DID unblock both gates and the builds DID succeed.
  6. Script falls through to echo "Successfully published release images for v1.2.3" and exits 0.
  7. Buildkite reports the publish step green. vllm/vllm-openai-cpu:latest and vllm/vllm-openai-cpu:v${RELEASE_VERSION} on Docker Hub remain pointing at whatever the previous release published. docker pull vllm/vllm-openai-cpu:latest silently returns the prior release's image.

Why existing protections don't catch it

  • The PR's docker manifest inspect rewrite addresses the original review comment's partial-build observability concern (distinguishing "not built" from "pull failed"), and the new partial-build else branch (lines 169-176) correctly fails loudly when exactly one arch is available. But when both arches are present in ECR under a different tag (the v-prefix mismatch case), inspect returns "manifest unknown" for both, the elif at line 167 matches, and the script falls into the silent-skip path with a misleading WARNING.
  • upload-release-wheels-pypi.sh has a GIT_VERSION sanity check that would catch a wrong format for the wheel path (PURE_VERSION=${RELEASE_VERSION#v} line 31), but that runs in a separate step on a separate gate; the docker publish path has no equivalent check and produces no error signal.
  • Operator convention is vX.Y.Z per past releases, but conventions are not validation. The publish script's own sed 's/^v//' invites mixed input, and a one-character omission yields a silent stale release.

Impact

Bounded but real:

  • Only vllm/vllm-openai-cpu:latest and vllm/vllm-openai-cpu:v${RELEASE_VERSION} (multi-arch + per-arch tags) go silently stale.
  • Build is reported green, no paging signal, no diagnostic for the operator other than a WARNING that misdirects them ("ensure block step was unblocked" — but they did).
  • Users docker pull vllm/vllm-openai-cpu:latest get the previous release.
  • The previous manual flow had the same mismatch but a human running the commands would have seen the 404; the new automation makes it silent — a regression in observability vs. the manual flow it replaces.

How to fix

One-line, two viable options:

  1. Mirror the sed on the build side, e.g. replace $(buildkite-agent meta-data get release-version) with $(buildkite-agent meta-data get release-version | sed 's/^v//') at lines 320, 322, 339, 341, and update the publish script to inspect :${RELEASE_VERSION} (no v prefix). Or strip the v on both sides and add it consistently in tag generation.
  2. Validate the input format on the "Provide Release version here" step (e.g. require a v prefix or normalize via a hint), so the round-trip is well-defined.

Either fix closes the silent-skip path; verifiers were unanimous.

Comment thread .buildkite/release-pipeline.yaml Outdated
Comment on lines +460 to +463
- label: "Publish release images to DockerHub"
depends_on:
- block-publish-release-images
id: publish-release-images-dockerhub
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The new publish-release-images-dockerhub label step (lines 460-474) depends on block-publish-release-images, which is gated with if: build.env("NIGHTLY") != "1" (line 458) and so is filtered out of nightly pipeline uploads. The label step itself carries neither a matching if: clause nor an allow_dependency_failure: true on its dependency, so its presence in nightly builds relies on Buildkite's transitive-filter semantics rather than the explicit pattern used 350 lines earlier in this same file at lines 107-115. Suggested fix (one line): add if: build.env("NIGHTLY") != "1" to the label step to mirror the block step's guard, making the intent self-documenting and not dependent on undocumented edge cases of depends_on referencing a filtered step.

Extended reasoning...

What the bug is

block-publish-release-images (line 441) is gated with if: build.env("NIGHTLY") != "1" on line 458. The dependent publish-release-images-dockerhub label step (lines 460-474) lists depends_on: - block-publish-release-images but has neither a matching if: guard nor allow_dependency_failure: true on the dependency. So during a nightly pipeline upload, the label step survives the upload but its only dependency does not.

Why this is a consistency gap

The same author/file has already encoded the right pattern for the analogous case 350 lines earlier:

  • block-build-release-images (line 107) is gated with if: build.env("NIGHTLY") != "1" (line 110).
  • The dependent build-release-images group (line 112) explicitly sets allow_dependency_failure: true (line 115).

That pattern fits that case (the build group should run in BOTH modes — in nightly the upstream block is missing, so allow_dependency_failure: true lets the group proceed). For the new publish step the intent is different: it should NOT run in nightly. The two valid encodings of that intent in this file are:

  1. Add if: build.env("NIGHTLY") != "1" to the label step itself — mirrors the explicit guard used for the existing nightly-only Publish nightly multi-arch image to DockerHub steps (which use if: build.env("NIGHTLY") == "1").
  2. Leave it to Buildkite's transitive filter on the missing dependency — what the PR currently does.

Step-by-step trace of nightly behavior

  1. Nightly pipeline upload runs with NIGHTLY=1.
  2. block-publish-release-images is filtered out by its if: clause and is not present in the uploaded pipeline.
  3. publish-release-images-dockerhub is uploaded (no if: on it). Its sole dependency block-publish-release-images does not exist in this build.
  4. Buildkite's resolution: a step whose depends_on references a non-existent step is marked broken/skipped (without allow_dependency_failure). The publish step is silently dropped.

In the happy path this is the intended outcome. The risk is that step (4) relies on Buildkite's behavior being stable across versions and matching the author's mental model — and unlike approach (1), there is nothing in the YAML that tells a future reader 'this step is intentionally nightly-gated'.

Addressing the refutation

A fellow verifier argues this is intentional behavior leveraging Buildkite's transitive-filter semantics, and that the lines 113-115 pattern is not analogous because the build group must run in both modes whereas the publish step must not. That refutation is correct on intent — the two cases ARE different, and the refutation is right that simply pasting allow_dependency_failure: true here would be wrong (it would make publish run in nightly without any block gate).

But the refutation does not address the explicit-vs-implicit gap. The two existing patterns in this file (build-release-images group with allow_dependency_failure: true, and Publish nightly steps with if: NIGHTLY == "1") both make the gating visible at the step that runs. The new publish step is the first place in this file where the gating is only on the upstream block. That is the consistency gap, and it is also why a one-line if: build.env("NIGHTLY") != "1" on the label step is the cleanest fix — it doesn't change runtime behavior, it just documents the intent in the right place.

Impact

Bounded — in the most likely case the existing pattern works (Buildkite skips the publish step in nightly as intended). The downside is reduced readability and a small risk if Buildkite's filtered-dependency semantics ever change or differ across pipeline-upload code paths. Severity: nit.

Fix

Add the matching guard on the label step:

- label: "Publish release images to DockerHub"
  depends_on:
    - block-publish-release-images
  id: publish-release-images-dockerhub
  if: build.env("NIGHTLY") != "1"   # <-- add this
  agents:
    queue: small_cpu_queue_release
  ...

This makes the nightly-skip intent explicit at the step that runs and matches the file's own convention of guarding nightly-conditional steps with their own if: clause.

@khluu khluu merged commit 1333864 into main May 6, 2026
17 checks passed
@khluu khluu deleted the ci/add-ubuntu2404-annotate-release branch May 6, 2026 00:15
python3 -m venv /tmp/vllm-release-env
source /tmp/vllm-release-env/bin/activate
pip install twine
pip install -r requirements/build/cuda.txt
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Installing requirements/build/cuda.txt (line 46) is intended to make python setup.py sdist (line 59) succeed, but the script still cannot reach a green PyPI upload: (1) cuda.txt pins CUDA-built torch==2.11.0, so setup.py auto-detects VLLM_TARGET_DEVICE=cuda and get_vllm_version() calls get_nvcc_cuda_version() at setup.py:916, which asserts CUDA_HOME is not None (setup.py:888) — small_cpu_queue_release has no CUDA toolkit, so sdist exits before producing the .tar.gz; the sdist-skip guard at setup.py:920 sits after the assertion, so it does not help. (2) Even if sdist succeeded, lines 66/72-73 are broken: PYPI_WHEEL_FILES=$(find …) returns the two default-variant wheels (x86_64 + aarch64) newline-separated, and twine check "$PYPI_WHEEL_FILES" … quotes them as a single argument with an embedded newline, which twine treats as one nonexistent path. Fix: set VLLM_USE_PRECOMPILED=1 (or VLLM_TARGET_DEVICE=empty) on the sdist invocation, and switch the twine call to a bash array (mapfile -t PYPI_WHEEL_FILES < <(find …); twine check "${PYPI_WHEEL_FILES[@]}" "$SDIST_FILE").

Extended reasoning...

What the bug is

This PR adds pip install -r requirements/build/cuda.txt at line 46 specifically to make the python setup.py sdist invocation on line 59 succeed (previously it failed at module-level import torch in setup.py and aborted under set -e). The intended outcome — a working PyPI upload — is still not achieved, because that one-line change activates two independent downstream failures, both of which would fire the first time block-upload-release-wheels is unblocked.

Failure mode A — setup.py sdist aborts on a CPU-only agent

After the PR, sdist gets past import torch and begins module-level execution. With requirements/build/cuda.txt pinning the default PyPI torch==2.11.0 (CUDA-built wheel, torch.version.cuda is set), setup.py auto-detection at lines 53-64 sets VLLM_TARGET_DEVICE = "cuda". The module-level setup(version=get_vllm_version(), ...) call then runs get_vllm_version() (setup.py:898). _is_cuda() is true; VLLM_USE_PRECOMPILED is unset; the function falls through to setup.py:916 → str(get_nvcc_cuda_version()), whose first line (setup.py:888) is assert CUDA_HOME is not None, "CUDA_HOME is not set".

The agent that runs upload-release-wheels-pypi.sh is small_cpu_queue_release (release-pipeline.yaml). The queue name and the absence of any CUDA-toolkit installation step mean CUDA_HOME resolves to None (no env var, no /usr/local/cuda, no nvcc). The assertion raises, sdist exits non-zero, and set -e aborts the script at line 59 — before the twine call ever runs. Critically, the sdist-skip guard at setup.py:920 (if "sdist" not in sys.argv: ...) sits after the failing call at line 916, so it does not protect this path.

Failure mode B — newline-quoted wheel paths

If failure mode A is fixed, the script reaches lines 66-73:

PYPI_WHEEL_FILES=$(find $DIST_DIR -name "vllm-${PURE_VERSION}*.whl" -not -name "*+*")
...
python3 -m twine check "$PYPI_WHEEL_FILES" "$SDIST_FILE"
python3 -m twine upload --non-interactive --verbose "$PYPI_WHEEL_FILES" "$SDIST_FILE"

The vLLM release pipeline produces two default-variant wheels (x86_64 + aarch64; the +cu129/+cpu variants are filtered by -not -name "*+*"). find outputs them newline-separated. Quoting "$PYPI_WHEEL_FILES" passes both paths to twine as a single argument with an embedded newline, which twine treats as one filename and fails opening.

Step-by-step proof of failure mode A

  1. Operator unblocks block-upload-release-wheels. The script runs on small_cpu_queue_release.
  2. Line 46 installs torch==2.11.0 (CUDA-built); torch.version.cuda is set.
  3. Line 59 runs python setup.py sdist. setup.py:53-64 sets VLLM_TARGET_DEVICE = "cuda".
  4. Module-level setup(version=get_vllm_version(), ...) at setup.py:1085 runs get_vllm_version().
  5. setup.py:912 _is_cuda() is True; VLLM_USE_PRECOMPILED is unset → falls to else at setup.py:916.
  6. get_nvcc_cuda_version() runs; setup.py:888 assert CUDA_HOME is not None raises AssertionError on the CPU-only agent.
  7. sdist exits non-zero, set -e aborts the script. PyPI upload is blocked. The script never reaches the twine call, so failure mode B does not even fire — it is shadowed.

Step-by-step proof of failure mode B (assuming A is fixed)

  1. find returns two paths: /tmp/.../vllm-${VER}-cp38-abi3-manylinux_2_35_x86_64.whl and ..._aarch64.whl.
  2. PYPI_WHEEL_FILES is set to those two paths joined by \n.
  3. twine check "$PYPI_WHEEL_FILES" "$SDIST_FILE" invokes twine with argv[1] containing both paths separated by a literal newline.
  4. twine treats argv[1] as a single filename, calls open(), and fails with no-such-file.

Verified empirically: X=$(printf "a\nb"); set -- "$X"; echo $# prints 1 — quoting flattens the newline-separated list to one argument. Without quoting, IFS-based word-splitting yields two arguments.

Why existing code does not prevent this

  • The sdist-skip guard at setup.py:920 only suppresses the +cuXXX suffix on the version string; it appears after the get_nvcc_cuda_version() call on line 916, so the assertion has already triggered. The guard cannot help.
  • The PR test plan exercises the docker publish path (Verify the publish script…, Verify the block step appears…) but does not exercise upload-release-wheels-pypi.sh end to end. The wheel upload is gated behind a manual block (block-upload-release-wheels), so the PR can land green without either failure being observed.
  • The script was added in commit ae3b4de on 2026-05-02 with the broken twine quoting already present, but pre-PR the script aborted at line 59 (ImportError) under set -e, so the twine call has never executed. This PR is the first commit that brings either failure into the live code path.

How to fix

Two minimal changes:

  1. Set VLLM_USE_PRECOMPILED=1 (or VLLM_TARGET_DEVICE=empty) on the sdist invocation at line 59, so _is_cuda() takes the precompiled branch at setup.py:913 and skips get_nvcc_cuda_version(). Equivalent: install a CPU-only torch (pip install torch --index-url https://download.pytorch.org/whl/cpu) so torch.version.cuda is None and _is_cuda() returns False during version detection.
  2. Use a bash array for the twine call:
mapfile -t PYPI_WHEEL_FILES < <(find "$DIST_DIR" -name "vllm-${PURE_VERSION}*.whl" -not -name "*+*")
if [[ ${#PYPI_WHEEL_FILES[@]} -eq 0 ]]; then
  echo "No default variant wheels found, quitting..."; exit 1
fi
python3 -m twine check "${PYPI_WHEEL_FILES[@]}" "$SDIST_FILE"
python3 -m twine upload --non-interactive --verbose "${PYPI_WHEEL_FILES[@]}" "$SDIST_FILE"

Either fix alone leaves the other failure live, so both must land for the PyPI upload step to succeed.

chaojun-zhang pushed a commit to chaojun-zhang/vllm that referenced this pull request May 6, 2026
Copilot AI pushed a commit to hongbolv/vllm that referenced this pull request May 7, 2026
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
ikaadil pushed a commit to ikaadil/vllm that referenced this pull request May 7, 2026
Signed-off-by: khluu <khluu000@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
khluu added a commit that referenced this pull request May 8, 2026
Signed-off-by: khluu <khluu000@gmail.com>
(cherry picked from commit 1333864)
libinta pushed a commit to libinta/vllm that referenced this pull request May 8, 2026
Signed-off-by: khluu <khluu000@gmail.com>
Signed-off-by: Libin Tang <libin.tang@intel.com>
mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026
Replaces manual docker publish instructions in annotate-release.sh with
a unified publish-release-images.sh script and corresponding Buildkite
pipeline steps. The new script handles all image variants (CUDA 12.9,
CUDA 13.0, Ubuntu 24.04, ROCm, and CPU), automating the pull, tag,
push, and multi-arch manifest creation for Docker Hub releases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
weifang231 pushed a commit to weifang231/eb-vllm that referenced this pull request May 13, 2026
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
Replaces manual docker publish instructions in annotate-release.sh with
a unified publish-release-images.sh script and corresponding Buildkite
pipeline steps. The new script handles all image variants (CUDA 12.9,
CUDA 13.0, Ubuntu 24.04, ROCm, and CPU), automating the pull, tag,
push, and multi-arch manifest creation for Docker Hub releases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
Replaces manual docker publish instructions in annotate-release.sh with
a unified publish-release-images.sh script and corresponding Buildkite
pipeline steps. The new script handles all image variants (CUDA 12.9,
CUDA 13.0, Ubuntu 24.04, ROCm, and CPU), automating the pull, tag,
push, and multi-arch manifest creation for Docker Hub releases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026
jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci/build ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants