Merged
39 changes: 38 additions & 1 deletion .buildkite/release-pipeline.yaml
@@ -309,6 +309,7 @@ steps:
    depends_on: ~

  - label: "Build release image - x86_64 - CPU"
    key: build-cpu-release-image-x86
    depends_on:
      - block-cpu-release-image-build
      - input-release-version
@@ -327,7 +328,8 @@ steps:
    depends_on: ~

  - label: "Build release image - arm64 - CPU"
    key: build-cpu-release-image-arm64
    depends_on:
      - block-arm64-cpu-release-image-build
      - input-release-version
    agents:
@@ -436,6 +438,41 @@ steps:
      DOCKER_BUILDKIT: "1"
      DOCKERHUB_USERNAME: "vllmbot"

  - block: "Publish release images to DockerHub"
    key: block-publish-release-images
    depends_on:
      - create-multi-arch-manifest
      - create-multi-arch-manifest-cuda-12-9
      - create-multi-arch-manifest-ubuntu2404
      - create-multi-arch-manifest-cuda-12-9-ubuntu2404
Comment on lines +445 to +447
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it not need to wait on create-multi-arch-manifest?

Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — added create-multi-arch-manifest to the depends_on. It wasn't strictly needed because the publish script pulls per-arch images directly (not the multi-arch manifest), but with annotate-release-workflow removed we lose the transitive wait on the CUDA 13.0 builds. Adding it explicitly also makes the depends_on symmetric with the other variants (-cuda-12-9, -ubuntu2404, -cuda-12-9-ubuntu2404).

      - build-rocm-release-image
      - input-release-version
      # Wait for CPU builds if their block steps were unblocked, so publish
      # doesn't race the in-progress CPU build. allow_failure lets publish
      # proceed when the operator legitimately leaves the CPU block steps
      # unblocked or the CPU build fails.
      - step: build-cpu-release-image-x86
        allow_failure: true
      - step: build-cpu-release-image-arm64
        allow_failure: true
    if: build.env("NIGHTLY") != "1"

claude[bot] marked this conversation as resolved.
  - label: "Publish release images to DockerHub"
    depends_on:
      - block-publish-release-images
    key: publish-release-images-dockerhub
    agents:
      queue: small_cpu_queue_release
    commands:
      - "bash .buildkite/scripts/publish-release-images.sh"
    plugins:
      - docker-login#v3.0.0:
          username: vllmbot
          password-env: DOCKERHUB_TOKEN
    env:
      DOCKER_BUILDKIT: "1"
      DOCKERHUB_USERNAME: "vllmbot"

  - group: "Publish wheels"
    key: "publish-wheels"
    steps:
94 changes: 1 addition & 93 deletions .buildkite/scripts/annotate-release.sh
@@ -8,8 +8,6 @@ if [ -z "${RELEASE_VERSION}" ]; then
  RELEASE_VERSION="1.0.0.dev"
fi

ROCM_BASE_CACHE_KEY=$(.buildkite/scripts/cache-rocm-base-wheels.sh key)

buildkite-agent annotate --style 'info' --context 'release-workflow' << EOF
To download the wheel (by commit):
\`\`\`
@@ -25,95 +23,5 @@ aws s3 cp s3://vllm-wheels/${BUILDKITE_COMMIT}/vllm-${RELEASE_VERSION}+cpu-cp38-
aws s3 cp s3://vllm-wheels/${BUILDKITE_COMMIT}/vllm-${RELEASE_VERSION}+cpu-cp38-abi3-manylinux_2_35_aarch64.whl .
\`\`\`


To download and upload the image:

\`\`\`
# Download images:

docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-x86_64
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-aarch64
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-x86_64-cu129
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-aarch64-cu129
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${ROCM_BASE_CACHE_KEY}-rocm-base
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-rocm
docker pull public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v${RELEASE_VERSION}
docker pull public.ecr.aws/q9t5s3a7/vllm-arm64-cpu-release-repo:v${RELEASE_VERSION}

# Tag and push images:

## CUDA

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-x86_64 vllm/vllm-openai:x86_64
docker tag vllm/vllm-openai:x86_64 vllm/vllm-openai:latest-x86_64
docker tag vllm/vllm-openai:x86_64 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64
docker push vllm/vllm-openai:latest-x86_64
docker push vllm/vllm-openai:v${RELEASE_VERSION}-x86_64

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-x86_64-cu129 vllm/vllm-openai:x86_64-cu129
docker tag vllm/vllm-openai:x86_64-cu129 vllm/vllm-openai:latest-x86_64-cu129
docker tag vllm/vllm-openai:x86_64-cu129 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129
docker push vllm/vllm-openai:latest-x86_64-cu129
docker push vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-aarch64 vllm/vllm-openai:aarch64
docker tag vllm/vllm-openai:aarch64 vllm/vllm-openai:latest-aarch64
docker tag vllm/vllm-openai:aarch64 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64
docker push vllm/vllm-openai:latest-aarch64
docker push vllm/vllm-openai:v${RELEASE_VERSION}-aarch64

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-aarch64-cu129 vllm/vllm-openai:aarch64-cu129
docker tag vllm/vllm-openai:aarch64-cu129 vllm/vllm-openai:latest-aarch64-cu129
docker tag vllm/vllm-openai:aarch64-cu129 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129
docker push vllm/vllm-openai:latest-aarch64-cu129
docker push vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129

## ROCm

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-rocm vllm/vllm-openai-rocm:${BUILDKITE_COMMIT}
docker tag vllm/vllm-openai-rocm:${BUILDKITE_COMMIT} vllm/vllm-openai-rocm:latest
docker tag vllm/vllm-openai-rocm:${BUILDKITE_COMMIT} vllm/vllm-openai-rocm:v${RELEASE_VERSION}
docker push vllm/vllm-openai-rocm:latest
docker push vllm/vllm-openai-rocm:v${RELEASE_VERSION}

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${ROCM_BASE_CACHE_KEY}-rocm-base vllm/vllm-openai-rocm:${BUILDKITE_COMMIT}-base
docker tag vllm/vllm-openai-rocm:${BUILDKITE_COMMIT}-base vllm/vllm-openai-rocm:latest-base
docker tag vllm/vllm-openai-rocm:${BUILDKITE_COMMIT}-base vllm/vllm-openai-rocm:v${RELEASE_VERSION}-base
docker push vllm/vllm-openai-rocm:latest-base
docker push vllm/vllm-openai-rocm:v${RELEASE_VERSION}-base

## CPU

docker tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v${RELEASE_VERSION} vllm/vllm-openai-cpu:x86_64
docker tag vllm/vllm-openai-cpu:x86_64 vllm/vllm-openai-cpu:latest-x86_64
docker tag vllm/vllm-openai-cpu:x86_64 vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64
docker push vllm/vllm-openai-cpu:latest-x86_64
docker push vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64

docker tag public.ecr.aws/q9t5s3a7/vllm-arm64-cpu-release-repo:v${RELEASE_VERSION} vllm/vllm-openai-cpu:arm64
docker tag vllm/vllm-openai-cpu:arm64 vllm/vllm-openai-cpu:latest-arm64
docker tag vllm/vllm-openai-cpu:arm64 vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64
docker push vllm/vllm-openai-cpu:latest-arm64
docker push vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64

# Create multi-arch manifest:

docker manifest rm vllm/vllm-openai:latest
docker manifest create vllm/vllm-openai:latest vllm/vllm-openai:latest-x86_64 vllm/vllm-openai:latest-aarch64
docker manifest create vllm/vllm-openai:v${RELEASE_VERSION} vllm/vllm-openai:v${RELEASE_VERSION}-x86_64 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64
docker manifest push vllm/vllm-openai:latest
docker manifest push vllm/vllm-openai:v${RELEASE_VERSION}

docker manifest rm vllm/vllm-openai:latest-cu129
docker manifest create vllm/vllm-openai:latest-cu129 vllm/vllm-openai:latest-x86_64-cu129 vllm/vllm-openai:latest-aarch64-cu129
docker manifest create vllm/vllm-openai:v${RELEASE_VERSION}-cu129 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129
docker manifest push vllm/vllm-openai:latest-cu129
docker manifest push vllm/vllm-openai:v${RELEASE_VERSION}-cu129

docker manifest rm vllm/vllm-openai-cpu:latest || true
docker manifest create vllm/vllm-openai-cpu:latest vllm/vllm-openai-cpu:latest-x86_64 vllm/vllm-openai-cpu:latest-arm64
docker manifest create vllm/vllm-openai-cpu:v${RELEASE_VERSION} vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64 vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64
docker manifest push vllm/vllm-openai-cpu:latest
docker manifest push vllm/vllm-openai-cpu:v${RELEASE_VERSION}
\`\`\`
Docker images are published automatically by the "Publish release images to DockerHub" pipeline step.
EOF
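Both the manual sequence removed above and its automated replacement in publish-release-images.sh repeat the same pattern for each image variant. The pattern is to pull both arches, add a `latest-` tag and a versioned tag, push both, and then rebuild the two multi-arch manifests. The minimal sketch below expresses that pattern once, assuming the new script's tag layout. `publish_variant`, `run` and `DRY_RUN` are hypothetical names that do not appear in the PR. With the default `DRY_RUN=1`, the helper only prints the docker commands it would run.

```bash
#!/bin/bash
# Hypothetical helper: the per-variant "pull, tag, push, manifest" sequence
# written once. With DRY_RUN=1 (the default) commands are printed, not run.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
SRC_REPO="public.ecr.aws/q9t5s3a7/vllm-release-repo"
DST_REPO="vllm/vllm-openai"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# publish_variant COMMIT VERSION SUFFIX
# SUFFIX is "" for the default variant, or e.g. "-cu129", "-ubuntu2404",
# "-cu129-ubuntu2404", matching the tag names used in the publish script.
publish_variant() {
  local commit="$1" version="$2" suffix="$3" arch tag
  for arch in x86_64 aarch64; do
    local src="${SRC_REPO}:${commit}-${arch}${suffix}"
    run docker pull "$src"
    run docker tag "$src" "${DST_REPO}:latest-${arch}${suffix}"
    run docker tag "$src" "${DST_REPO}:v${version}-${arch}${suffix}"
    run docker push "${DST_REPO}:latest-${arch}${suffix}"
    run docker push "${DST_REPO}:v${version}-${arch}${suffix}"
  done
  # Rebuild the multi-arch manifest for both the moving and the versioned tag.
  for tag in "latest" "v${version}"; do
    run docker manifest rm "${DST_REPO}:${tag}${suffix}" || true
    run docker manifest create "${DST_REPO}:${tag}${suffix}" \
      "${DST_REPO}:${tag}-x86_64${suffix}" "${DST_REPO}:${tag}-aarch64${suffix}"
    run docker manifest push "${DST_REPO}:${tag}${suffix}"
  done
}
```

For instance, `publish_variant "$COMMIT" "$RELEASE_VERSION" "-cu129-ubuntu2404"` should produce the same set of commands as the Ubuntu 24.04 (CUDA 12.9) section of the script, in a slightly different order. ROCm and CPU are not covered because their source tags follow a different naming scheme.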
180 changes: 180 additions & 0 deletions .buildkite/scripts/publish-release-images.sh
@@ -0,0 +1,180 @@
#!/bin/bash
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
#
# Publish release Docker images from ECR to DockerHub.
# Pulls per-arch images, tags with latest and versioned tags, pushes them,
# then creates and pushes multi-arch manifests.

set -euo pipefail

RELEASE_VERSION=$(buildkite-agent meta-data get release-version --default "" | sed 's/^v//')
if [ -z "${RELEASE_VERSION}" ]; then
  echo "ERROR: release-version metadata not set"
  exit 1
fi
claude[bot] marked this conversation as resolved.

COMMIT="$BUILDKITE_COMMIT"
ROCM_BASE_CACHE_KEY=$(.buildkite/scripts/cache-rocm-base-wheels.sh key)

echo "========================================"
echo "Publishing release images v${RELEASE_VERSION}"
echo " Commit: ${COMMIT}"
echo " ROCm base cache key: ${ROCM_BASE_CACHE_KEY}"
echo "========================================"

# Login to ECR to pull staging images
aws ecr-public get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7

# ---- CUDA (default: 13.0) ----

docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64 vllm/vllm-openai:latest-x86_64
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64
docker push vllm/vllm-openai:latest-x86_64
docker push vllm/vllm-openai:v${RELEASE_VERSION}-x86_64

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64 vllm/vllm-openai:latest-aarch64
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64
docker push vllm/vllm-openai:latest-aarch64
docker push vllm/vllm-openai:v${RELEASE_VERSION}-aarch64

docker manifest rm vllm/vllm-openai:latest || true
docker manifest rm vllm/vllm-openai:v${RELEASE_VERSION} || true
docker manifest create vllm/vllm-openai:latest vllm/vllm-openai:latest-x86_64 vllm/vllm-openai:latest-aarch64
docker manifest create vllm/vllm-openai:v${RELEASE_VERSION} vllm/vllm-openai:v${RELEASE_VERSION}-x86_64 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64
docker manifest push vllm/vllm-openai:latest
docker manifest push vllm/vllm-openai:v${RELEASE_VERSION}

# ---- CUDA 12.9 ----

docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-cu129
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-cu129

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-cu129 vllm/vllm-openai:latest-x86_64-cu129
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-cu129 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129
docker push vllm/vllm-openai:latest-x86_64-cu129
docker push vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-cu129 vllm/vllm-openai:latest-aarch64-cu129
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-cu129 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129
docker push vllm/vllm-openai:latest-aarch64-cu129
docker push vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129

docker manifest rm vllm/vllm-openai:latest-cu129 || true
docker manifest rm vllm/vllm-openai:v${RELEASE_VERSION}-cu129 || true
docker manifest create vllm/vllm-openai:latest-cu129 vllm/vllm-openai:latest-x86_64-cu129 vllm/vllm-openai:latest-aarch64-cu129
docker manifest create vllm/vllm-openai:v${RELEASE_VERSION}-cu129 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129
docker manifest push vllm/vllm-openai:latest-cu129
docker manifest push vllm/vllm-openai:v${RELEASE_VERSION}-cu129

# ---- Ubuntu 24.04 (CUDA 13.0) ----

docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-ubuntu2404
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-ubuntu2404

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-ubuntu2404 vllm/vllm-openai:latest-x86_64-ubuntu2404
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-ubuntu2404 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-ubuntu2404
docker push vllm/vllm-openai:latest-x86_64-ubuntu2404
docker push vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-ubuntu2404

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-ubuntu2404 vllm/vllm-openai:latest-aarch64-ubuntu2404
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-ubuntu2404 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-ubuntu2404
docker push vllm/vllm-openai:latest-aarch64-ubuntu2404
docker push vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-ubuntu2404

docker manifest rm vllm/vllm-openai:latest-ubuntu2404 || true
docker manifest rm vllm/vllm-openai:v${RELEASE_VERSION}-ubuntu2404 || true
docker manifest create vllm/vllm-openai:latest-ubuntu2404 vllm/vllm-openai:latest-x86_64-ubuntu2404 vllm/vllm-openai:latest-aarch64-ubuntu2404
docker manifest create vllm/vllm-openai:v${RELEASE_VERSION}-ubuntu2404 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-ubuntu2404 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-ubuntu2404
docker manifest push vllm/vllm-openai:latest-ubuntu2404
docker manifest push vllm/vllm-openai:v${RELEASE_VERSION}-ubuntu2404

# ---- Ubuntu 24.04 (CUDA 12.9) ----

docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-cu129-ubuntu2404
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-cu129-ubuntu2404

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-cu129-ubuntu2404 vllm/vllm-openai:latest-x86_64-cu129-ubuntu2404
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-cu129-ubuntu2404 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129-ubuntu2404
docker push vllm/vllm-openai:latest-x86_64-cu129-ubuntu2404
docker push vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129-ubuntu2404

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-cu129-ubuntu2404 vllm/vllm-openai:latest-aarch64-cu129-ubuntu2404
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-cu129-ubuntu2404 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129-ubuntu2404
docker push vllm/vllm-openai:latest-aarch64-cu129-ubuntu2404
docker push vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129-ubuntu2404

docker manifest rm vllm/vllm-openai:latest-cu129-ubuntu2404 || true
docker manifest rm vllm/vllm-openai:v${RELEASE_VERSION}-cu129-ubuntu2404 || true
docker manifest create vllm/vllm-openai:latest-cu129-ubuntu2404 vllm/vllm-openai:latest-x86_64-cu129-ubuntu2404 vllm/vllm-openai:latest-aarch64-cu129-ubuntu2404
docker manifest create vllm/vllm-openai:v${RELEASE_VERSION}-cu129-ubuntu2404 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129-ubuntu2404 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129-ubuntu2404
docker manifest push vllm/vllm-openai:latest-cu129-ubuntu2404
docker manifest push vllm/vllm-openai:v${RELEASE_VERSION}-cu129-ubuntu2404

# ---- ROCm ----

docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-rocm
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${ROCM_BASE_CACHE_KEY}-rocm-base

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-rocm vllm/vllm-openai-rocm:latest
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-rocm vllm/vllm-openai-rocm:v${RELEASE_VERSION}
docker push vllm/vllm-openai-rocm:latest
docker push vllm/vllm-openai-rocm:v${RELEASE_VERSION}

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${ROCM_BASE_CACHE_KEY}-rocm-base vllm/vllm-openai-rocm:latest-base
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${ROCM_BASE_CACHE_KEY}-rocm-base vllm/vllm-openai-rocm:v${RELEASE_VERSION}-base
docker push vllm/vllm-openai-rocm:latest-base
docker push vllm/vllm-openai-rocm:v${RELEASE_VERSION}-base

# ---- CPU ----
# CPU images are behind separate block steps and may not have been built.
# All-or-nothing: inspect both arches first, then either publish everything
# (per-arch + multi-arch manifest) or skip everything. Publishing only one
# arch would leave `:latest-x86_64` pointing at the new release while the
# `:latest` multi-arch manifest still resolves to the previous release.

CPU_X86_TAG=public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v${RELEASE_VERSION}
CPU_ARM_TAG=public.ecr.aws/q9t5s3a7/vllm-arm64-cpu-release-repo:v${RELEASE_VERSION}

CPU_X86_AVAILABLE=false
CPU_ARM_AVAILABLE=false
docker manifest inspect "${CPU_X86_TAG}" >/dev/null 2>&1 && CPU_X86_AVAILABLE=true
docker manifest inspect "${CPU_ARM_TAG}" >/dev/null 2>&1 && CPU_ARM_AVAILABLE=true

if [ "$CPU_X86_AVAILABLE" = "true" ] && [ "$CPU_ARM_AVAILABLE" = "true" ]; then
  docker pull "${CPU_X86_TAG}"
  docker tag "${CPU_X86_TAG}" vllm/vllm-openai-cpu:latest-x86_64
  docker tag "${CPU_X86_TAG}" vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64
  docker push vllm/vllm-openai-cpu:latest-x86_64
  docker push vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64

  docker pull "${CPU_ARM_TAG}"
  docker tag "${CPU_ARM_TAG}" vllm/vllm-openai-cpu:latest-arm64
  docker tag "${CPU_ARM_TAG}" vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64
  docker push vllm/vllm-openai-cpu:latest-arm64
  docker push vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64

  docker manifest rm vllm/vllm-openai-cpu:latest || true
  docker manifest rm vllm/vllm-openai-cpu:v${RELEASE_VERSION} || true
  docker manifest create vllm/vllm-openai-cpu:latest vllm/vllm-openai-cpu:latest-x86_64 vllm/vllm-openai-cpu:latest-arm64
  docker manifest create vllm/vllm-openai-cpu:v${RELEASE_VERSION} vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64 vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64
  docker manifest push vllm/vllm-openai-cpu:latest
  docker manifest push vllm/vllm-openai-cpu:v${RELEASE_VERSION}
elif [ "$CPU_X86_AVAILABLE" = "false" ] && [ "$CPU_ARM_AVAILABLE" = "false" ]; then
Comment on lines +140 to +167

🔴 On the build side (release-pipeline.yaml lines 320, 322, 339, 341), CPU images are pushed to ECR using the raw $(buildkite-agent meta-data get release-version), but publish-release-images.sh strips a leading v at line 11 and re-prepends v at lines 140-141 — so if an operator types 1.2.3 (no v) into the unvalidated input field, the build pushes :1.2.3 while publish looks for :v1.2.3. Both docker manifest inspect calls 404, the script falls into the elif at line 167 ("Neither CPU image found"), prints a misleading WARNING ("ensure block step was unblocked" — they did) and exits 0 green, leaving vllm/vllm-openai-cpu:latest and :v${RELEASE_VERSION} stale from the previous release. The PR added docker manifest inspect (closing the partial-build observability gap raised in the resolved review comment) but did NOT close this v-prefix root cause; fix is one line — mirror the sed s/^v// on the build side (or strip on both, or validate the input field).

Extended reasoning...

What goes wrong

publish-release-images.sh and release-pipeline.yaml disagree on whether the operator-supplied release version carries a leading v:

  • Publish side (publish-release-images.sh):
    • Line 11: RELEASE_VERSION=$(buildkite-agent meta-data get release-version --default "" | sed 's/^v//') — strips a leading v.
    • Lines 140-141: CPU_X86_TAG=public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v${RELEASE_VERSION} and CPU_ARM_TAG=...:v${RELEASE_VERSION} — re-prepends v.
    • The publish script is therefore tolerant of either input format (1.2.3 or v1.2.3) and always inspects :v1.2.3.
  • Build side (release-pipeline.yaml):
    • Line 320 (x86 build): --tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:$(buildkite-agent meta-data get release-version)
    • Line 322 (x86 push): docker push public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:$(buildkite-agent meta-data get release-version)
    • Line 339 (arm64 build) / 341 (arm64 push): identical pattern with vllm-arm64-cpu-release-repo.
    • Builds use the raw operator input — no v normalization. Builds push :1.2.3 if the operator typed 1.2.3, or :v1.2.3 if they typed v1.2.3.

The "Provide Release version here" input step has no validation or hint about the expected format, and the publish script's own sed signals that mixed input is acceptable. So the formats round-trip inconsistently.
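The minimal sketch below makes the disagreement concrete by deriving both tags from the same operator input. `build_side_tag` and `publish_side_tag` are hypothetical helpers that only mirror the expressions quoted above: the raw meta-data value on the build side, and `sed 's/^v//'` plus a re-added `v` on the publish side. Nothing is pulled or pushed.

```bash
#!/bin/bash
# Hypothetical helpers mirroring the two tag derivations described above.
# Nothing is pulled or pushed; the script only compares strings.
set -euo pipefail

REPO="public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo"

# Build side: the raw operator input is used verbatim as the tag.
build_side_tag() {
  echo "${REPO}:$1"
}

# Publish side: strip one leading "v" with sed, then re-prepend "v".
publish_side_tag() {
  local version
  version=$(printf '%s\n' "$1" | sed 's/^v//')
  echo "${REPO}:v${version}"
}

for input in "v1.2.3" "1.2.3"; do
  built=$(build_side_tag "$input")
  wanted=$(publish_side_tag "$input")
  if [ "$built" = "$wanted" ]; then
    echo "${input}: match (${built})"
  else
    echo "${input}: MISMATCH, build pushed ${built}, publish inspects ${wanted}"
  fi
done
```

The sketch reports a match for `v1.2.3` and a mismatch for `1.2.3`. The `1.2.3` input is the case that the step-by-step proof walks through.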

Step-by-step proof

  1. Release manager unblocks block-cpu-release-image-build and block-arm64-cpu-release-image-build, and types 1.2.3 (no v) into the unvalidated input field.
  2. CPU build steps push public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:1.2.3 and vllm-arm64-cpu-release-repo:1.2.3 to ECR (line 322 / 341).
  3. publish-release-images.sh runs. Line 11 sets RELEASE_VERSION=1.2.3 (sed strips nothing because there's no leading v). Lines 140-141 set CPU_X86_TAG=...:v1.2.3 and CPU_ARM_TAG=...:v1.2.3.
  4. Lines 145-146: docker manifest inspect "${CPU_X86_TAG}" returns "manifest unknown" (the tag pushed was :1.2.3, not :v1.2.3). CPU_X86_AVAILABLE stays false. Same for arm64.
  5. Line 148 guard CPU_X86_AVAILABLE=true && CPU_ARM_AVAILABLE=true evaluates false; line 167 elif matches (both false); script prints WARNING: Neither CPU image found in ECR, skipping CPU publish (ensure block-cpu-release-image-build and block-arm64-cpu-release-image-build were unblocked and the builds finished pushing) — even though the operator DID unblock both gates and the builds DID succeed.
  6. Script falls through to echo "Successfully published release images for v1.2.3" and exits 0.
  7. Buildkite reports the publish step green. vllm/vllm-openai-cpu:latest and vllm/vllm-openai-cpu:v${RELEASE_VERSION} on Docker Hub remain pointing at whatever the previous release published. docker pull vllm/vllm-openai-cpu:latest silently returns the prior release's image.

Why existing protections don't catch it

  • The PR's docker manifest inspect rewrite addresses the original review comment's partial-build observability concern (distinguishing "not built" from "pull failed"), and the new partial-build else branch (lines 169-176) correctly fails loudly when exactly one arch is available. But when both arches are present in ECR under a different tag (the v-prefix mismatch case), inspect returns "manifest unknown" for both, the elif at line 167 matches, and the script falls into the silent-skip path with a misleading WARNING.
  • upload-release-wheels-pypi.sh has a GIT_VERSION sanity check that would catch a wrong format for the wheel path (PURE_VERSION=${RELEASE_VERSION#v} line 31), but that runs in a separate step on a separate gate; the docker publish path has no equivalent check and produces no error signal.
  • Operator convention is vX.Y.Z per past releases, but conventions are not validation. The publish script's own sed 's/^v//' invites mixed input, and a one-character omission yields a silent stale release.

Impact

Bounded but real:

  • Only vllm/vllm-openai-cpu:latest and vllm/vllm-openai-cpu:v${RELEASE_VERSION} (multi-arch + per-arch tags) go silently stale.
  • Build is reported green, no paging signal, no diagnostic for the operator other than a WARNING that misdirects them ("ensure block step was unblocked" — but they did).
  • Users docker pull vllm/vllm-openai-cpu:latest get the previous release.
  • The previous manual flow had the same mismatch but a human running the commands would have seen the 404; the new automation makes it silent — a regression in observability vs. the manual flow it replaces.

How to fix

One-line, two viable options:

  1. Mirror the sed on the build side, e.g. replace $(buildkite-agent meta-data get release-version) with $(buildkite-agent meta-data get release-version | sed 's/^v//') at lines 320, 322, 339, 341, and update the publish script to inspect :${RELEASE_VERSION} (no v prefix). Or strip the v on both sides and add it consistently in tag generation.
  2. Validate the input format on the "Provide Release version here" step (e.g. require a v prefix or normalize via a hint), so the round-trip is well-defined.

Either fix closes the silent-skip path; verifiers were unanimous.
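The minimal sketch below follows the "normalize once" direction. It assumes the `vX.Y.Z` tag convention is kept and that release versions look like `X.Y.Z` with an optional suffix; that regex is an assumption, not a documented vLLM rule. `normalize_release_version` and `cpu_tag` are hypothetical helpers. The idea is that the build steps and the publish script would both call them, so the two sides cannot drift, and malformed input fails loudly instead of silently skipping.

```bash
#!/bin/bash
# Sketch: normalize the operator input once, validate it, and derive every
# ECR tag from the normalized value. normalize_release_version and cpu_tag
# are hypothetical helpers; the accepted version pattern is an assumption.
set -euo pipefail

normalize_release_version() {
  local raw="$1"
  local version="${raw#v}"   # drop at most one leading "v", like sed 's/^v//'
  # Assumed shape: X.Y.Z, optionally followed by a suffix such as rc1 or .dev
  local re='^[0-9]+\.[0-9]+\.[0-9]+[.a-zA-Z0-9]*$'
  if ! [[ "$version" =~ $re ]]; then
    echo "ERROR: unexpected release version format: '${raw}'" >&2
    return 1
  fi
  echo "$version"
}

# The "v" is re-added in exactly one place, so build and publish agree.
cpu_tag() {
  echo "public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v$1"
}

for input in "v1.2.3" "1.2.3" "1.2"; do
  if version=$(normalize_release_version "$input"); then
    echo "${input} -> $(cpu_tag "$version")"
  else
    echo "${input} -> rejected"
  fi
done
```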

  echo "WARNING: Neither CPU image found in ECR, skipping CPU publish (ensure block-cpu-release-image-build and block-arm64-cpu-release-image-build were unblocked and the builds finished pushing)"
else
  # Partial state: one arch built, the other did not. Fail loudly rather than
  # ship a Docker Hub state where `:latest-${arch}` and `:latest` (multi-arch)
  # disagree on which release they point at.
  echo "ERROR: Partial CPU build detected (x86_64=${CPU_X86_AVAILABLE}, arm64=${CPU_ARM_AVAILABLE})."
  echo " Refusing to publish to avoid split-tag drift between per-arch and multi-arch tags."
  echo " Re-run the missing CPU build and retry, or manually publish if a single-arch release is intended."
  exit 1
fi

echo ""
echo "Successfully published release images for v${RELEASE_VERSION}"
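The three-way CPU gate in the script is easier to reason about and test as a pure function of the two availability flags. In the minimal sketch below, `cpu_publish_decision` is a hypothetical name, and the function only mirrors the `if` / `elif` / `else` branches of the script. The sketch also shows the limitation raised in the review: when both arches are missing under the expected tag, the result is `skip`, whether the images were never built or were pushed under a different tag.

```bash
#!/bin/bash
# Sketch: the CPU all-or-nothing gate from publish-release-images.sh,
# isolated as a pure function so each branch can be checked without a
# registry. cpu_publish_decision is a hypothetical name.
set -euo pipefail

# Prints "publish", "skip", or "fail" for the given availability flags
# ("true"/"false"), matching the if / elif / else branches of the script.
cpu_publish_decision() {
  local x86_available="$1" arm_available="$2"
  if [ "$x86_available" = "true" ] && [ "$arm_available" = "true" ]; then
    echo "publish"   # push per-arch tags, then the multi-arch manifests
  elif [ "$x86_available" = "false" ] && [ "$arm_available" = "false" ]; then
    echo "skip"      # warn and leave the Docker Hub CPU tags untouched
  else
    echo "fail"      # exactly one arch present: refuse to publish
  fi
}

for x86 in true false; do
  for arm in true false; do
    echo "x86_64=${x86} arm64=${arm} -> $(cpu_publish_decision "$x86" "$arm")"
  done
done
```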
3 changes: 2 additions & 1 deletion .buildkite/scripts/upload-release-wheels-pypi.sh
@@ -39,10 +39,11 @@

set -x # avoid printing secrets above

# install twine from pypi
# install twine and sdist build prerequisites from pypi
python3 -m venv /tmp/vllm-release-env
source /tmp/vllm-release-env/bin/activate
pip install twine
pip install -r requirements/build/cuda.txt

Check failure on line 46 in .buildkite/scripts/upload-release-wheels-pypi.sh

Claude / Claude Code Review

PyPI upload path still broken after this PR

🔴 Installing requirements/build/cuda.txt (line 46) is intended to make python setup.py sdist (line 59) succeed, but the script still cannot reach a green PyPI upload: (1) cuda.txt pins CUDA-built torch==2.11.0, so setup.py auto-detects VLLM_TARGET_DEVICE=cuda and get_vllm_version() calls get_nvcc_cuda_version() at setup.py:916, which asserts CUDA_HOME is not None (setup.py:888) — small_cpu_queue_release has no CUDA toolkit, so sdist exits before producing the .tar.gz; the sdist-skip guard at setup.py:920 sits after the assertion, so it does not help. (2) Even if sdist succeeded, lines 66/72-73 are broken: PYPI_WHEEL_FILES=$(find …) returns the two default-variant wheels (x86_64 + aarch64) newline-separated, and twine check "$PYPI_WHEEL_FILES" … quotes them as a single argument with an embedded newline, which twine treats as one nonexistent path. Fix: set VLLM_USE_PRECOMPILED=1 (or VLLM_TARGET_DEVICE=empty) on the sdist invocation, and switch the twine call to a bash array (mapfile -t PYPI_WHEEL_FILES < <(find …); twine check "${PYPI_WHEEL_FILES[@]}" "$SDIST_FILE").

Extended reasoning...

What the bug is

This PR adds pip install -r requirements/build/cuda.txt at line 46 specifically to make the python setup.py sdist invocation on line 59 succeed (previously it failed at module-level import torch in setup.py and aborted under set -e). The intended outcome — a working PyPI upload — is still not achieved, because that one-line change activates two independent downstream failures, both of which would fire the first time block-upload-release-wheels is unblocked.

Failure mode A — setup.py sdist aborts on a CPU-only agent

After the PR, sdist gets past import torch and begins module-level execution. With requirements/build/cuda.txt pinning the default PyPI torch==2.11.0 (CUDA-built wheel, torch.version.cuda is set), setup.py auto-detection at lines 53-64 sets VLLM_TARGET_DEVICE = "cuda". The module-level setup(version=get_vllm_version(), ...) call then runs get_vllm_version() (setup.py:898). _is_cuda() is true; VLLM_USE_PRECOMPILED is unset; the function falls through to setup.py:916 → str(get_nvcc_cuda_version()), whose first line (setup.py:888) is assert CUDA_HOME is not None, "CUDA_HOME is not set".

The agent that runs upload-release-wheels-pypi.sh is small_cpu_queue_release (release-pipeline.yaml). The queue name and the absence of any CUDA-toolkit installation step mean CUDA_HOME resolves to None (no env var, no /usr/local/cuda, no nvcc). The assertion raises, sdist exits non-zero, and set -e aborts the script at line 59 — before the twine call ever runs. Critically, the sdist-skip guard at setup.py:920 (if "sdist" not in sys.argv: ...) sits after the failing call at line 916, so it does not protect this path.

Failure mode B — newline-quoted wheel paths

If failure mode A is fixed, the script reaches lines 66-73:

PYPI_WHEEL_FILES=$(find $DIST_DIR -name "vllm-${PURE_VERSION}*.whl" -not -name "*+*")
...
python3 -m twine check "$PYPI_WHEEL_FILES" "$SDIST_FILE"
python3 -m twine upload --non-interactive --verbose "$PYPI_WHEEL_FILES" "$SDIST_FILE"

The vLLM release pipeline produces two default-variant wheels (x86_64 + aarch64; the +cu129/+cpu variants are filtered by -not -name "*+*"). find outputs them newline-separated. Quoting "$PYPI_WHEEL_FILES" passes both paths to twine as a single argument with an embedded newline, which twine treats as one filename and fails opening.

Step-by-step proof of failure mode A

  1. Operator unblocks block-upload-release-wheels. The script runs on small_cpu_queue_release.
  2. Line 46 installs torch==2.11.0 (CUDA-built); torch.version.cuda is set.
  3. Line 59 runs python setup.py sdist. setup.py:53-64 sets VLLM_TARGET_DEVICE = "cuda".
  4. Module-level setup(version=get_vllm_version(), ...) at setup.py:1085 runs get_vllm_version().
  5. setup.py:912 _is_cuda() is True; VLLM_USE_PRECOMPILED is unset → falls to else at setup.py:916.
  6. get_nvcc_cuda_version() runs; setup.py:888 assert CUDA_HOME is not None raises AssertionError on the CPU-only agent.
  7. sdist exits non-zero, set -e aborts the script. PyPI upload is blocked. The script never reaches the twine call, so failure mode B does not even fire — it is shadowed.

Step-by-step proof of failure mode B (assuming A is fixed)

  1. find returns two paths: /tmp/.../vllm-${VER}-cp38-abi3-manylinux_2_35_x86_64.whl and ..._aarch64.whl.
  2. PYPI_WHEEL_FILES is set to those two paths joined by \n.
  3. twine check "$PYPI_WHEEL_FILES" "$SDIST_FILE" invokes twine with argv[1] containing both paths separated by a literal newline.
  4. twine treats argv[1] as a single filename, calls open(), and fails with no-such-file.

Verified empirically: X=$(printf "a\nb"); set -- "$X"; echo $# prints 1 — quoting flattens the newline-separated list to one argument. Without quoting, IFS-based word-splitting yields two arguments.
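The same behavior can be reproduced without twine, as in the minimal bash sketch below. The wheel paths are placeholders, and `count_args` is a hypothetical helper that only reports how many arguments it received. The array form mirrors the `mapfile` fix proposed under "How to fix".

```bash
#!/bin/bash
# Reproduces failure mode B without twine: a quoted scalar holding
# newline-separated paths is one argument; a mapfile array is one per path.
# count_args is a hypothetical helper and the wheel names are placeholders.
set -euo pipefail

count_args() { echo "$#"; }

# Simulated find output: two wheel paths separated by a newline.
FOUND=$(printf '%s\n' \
  "/tmp/dist/vllm-1.2.3-cp38-abi3-manylinux_2_35_x86_64.whl" \
  "/tmp/dist/vllm-1.2.3-cp38-abi3-manylinux_2_35_aarch64.whl")

# Quoted scalar, as in twine check "$PYPI_WHEEL_FILES": a single argument.
echo "quoted string: $(count_args "$FOUND") argument(s)"

# One array element per line, as in the mapfile fix: one argument per path.
mapfile -t WHEELS < <(printf '%s\n' "$FOUND")
echo "bash array:    $(count_args "${WHEELS[@]}") argument(s)"
```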

Why existing code does not prevent this

  • The sdist-skip guard at setup.py:920 only suppresses the +cuXXX suffix on the version string; it appears after the get_nvcc_cuda_version() call on line 916, so the assertion has already triggered. The guard cannot help.
  • The PR test plan exercises the docker publish path (Verify the publish script…, Verify the block step appears…) but does not exercise upload-release-wheels-pypi.sh end to end. The wheel upload is gated behind a manual block (block-upload-release-wheels), so the PR can land green without either failure being observed.
  • The script was added in commit ae3b4de on 2026-05-02 with the broken twine quoting already present, but pre-PR the script aborted at line 59 (ImportError) under set -e, so the twine call has never executed. This PR is the first commit that brings either failure into the live code path.

How to fix

Two minimal changes:

  1. Set VLLM_USE_PRECOMPILED=1 (or VLLM_TARGET_DEVICE=empty) on the sdist invocation at line 59, so _is_cuda() takes the precompiled branch at setup.py:913 and skips get_nvcc_cuda_version(). Equivalent: install a CPU-only torch (pip install torch --index-url https://download.pytorch.org/whl/cpu) so torch.version.cuda is None and _is_cuda() returns False during version detection.
  2. Use a bash array for the twine call:
mapfile -t PYPI_WHEEL_FILES < <(find "$DIST_DIR" -name "vllm-${PURE_VERSION}*.whl" -not -name "*+*")
if [[ ${#PYPI_WHEEL_FILES[@]} -eq 0 ]]; then
  echo "No default variant wheels found, quitting..."; exit 1
fi
python3 -m twine check "${PYPI_WHEEL_FILES[@]}" "$SDIST_FILE"
python3 -m twine upload --non-interactive --verbose "${PYPI_WHEEL_FILES[@]}" "$SDIST_FILE"

Either fix alone leaves the other failure live, so both must land for the PyPI upload step to succeed.

python3 -m twine --version

# copy release wheels to local directory