Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 27 additions & 0 deletions .buildkite/release-pipeline.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -436,6 +436,33 @@
DOCKER_BUILDKIT: "1"
DOCKERHUB_USERNAME: "vllmbot"

- block: "Publish release images to DockerHub"
key: block-publish-release-images
depends_on:
- annotate-release-workflow

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't need to wait for release workflow annotation

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — dropped annotate-release-workflow from the depends_on. The annotate step posts wheel download info and isn't a real publish prerequisite.

- create-multi-arch-manifest-cuda-12-9
- create-multi-arch-manifest-ubuntu2404
- create-multi-arch-manifest-cuda-12-9-ubuntu2404
Comment on lines +445 to +447

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it not need to wait on create-multi-arch-manifest?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — added create-multi-arch-manifest to the depends_on. It wasn't strictly needed because the publish script pulls per-arch images directly (not the multi-arch manifest), but with annotate-release-workflow removed we lose the transitive wait on the CUDA 13.0 builds. Adding it explicitly also makes the depends_on symmetric with the other variants (-cuda-12-9, -ubuntu2404, -cuda-12-9-ubuntu2404).

- build-rocm-release-image
- input-release-version
if: build.env("NIGHTLY") != "1"

Check warning on line 449 in .buildkite/release-pipeline.yaml

View check run for this annotation

Claude / Claude Code Review

Publish block missing depends_on for CPU builds creates a silent race

The `block-publish-release-images` step depends on the CUDA/ROCm builds and manifests but not on the two CPU build steps (or their `block-cpu-release-image-build` / `block-arm64-cpu-release-image-build` gates). If a release operator unblocks both the CPU build and the publish gate close together, publish will reach `docker pull vllm-cpu-release-repo:v${RELEASE_VERSION}` before the slow CPU Docker build has finished pushing to ECR; combined with the silent-skip in `publish-release-images.sh`, the
Comment thread
claude[bot] marked this conversation as resolved.
- label: "Publish release images to DockerHub"
depends_on:
- block-publish-release-images
id: publish-release-images-dockerhub

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use key instead of id

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — switched to key:.

Comment on lines +460 to +463

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The new publish-release-images-dockerhub label step (lines 460-474) depends on block-publish-release-images, which is gated with if: build.env("NIGHTLY") != "1" (line 458) and so is filtered out of nightly pipeline uploads. The label step itself carries neither a matching if: clause nor an allow_dependency_failure: true on its dependency, so its presence in nightly builds relies on Buildkite's transitive-filter semantics rather than the explicit pattern used 350 lines earlier in this same file at lines 107-115. Suggested fix (one line): add if: build.env("NIGHTLY") != "1" to the label step to mirror the block step's guard, making the intent self-documenting and not dependent on undocumented edge cases of depends_on referencing a filtered step.

Extended reasoning...

What the bug is

block-publish-release-images (line 441) is gated with if: build.env("NIGHTLY") != "1" on line 458. The dependent publish-release-images-dockerhub label step (lines 460-474) lists depends_on: - block-publish-release-images but has neither a matching if: guard nor allow_dependency_failure: true on the dependency. So during a nightly pipeline upload, the label step survives the upload but its only dependency does not.

Why this is a consistency gap

The same author/file has already encoded the right pattern for the analogous case 350 lines earlier:

  • block-build-release-images (line 107) is gated with if: build.env("NIGHTLY") != "1" (line 110).
  • The dependent build-release-images group (line 112) explicitly sets allow_dependency_failure: true (line 115).

That pattern fits that case (the build group should run in BOTH modes — in nightly the upstream block is missing, so allow_dependency_failure: true lets the group proceed). For the new publish step the intent is different: it should NOT run in nightly. The two valid encodings of that intent in this file are:

  1. Add if: build.env("NIGHTLY") != "1" to the label step itself — mirrors the explicit guard used for the existing nightly-only Publish nightly multi-arch image to DockerHub steps (which use if: build.env("NIGHTLY") == "1").
  2. Leave it to Buildkite's transitive filter on the missing dependency — what the PR currently does.

Step-by-step trace of nightly behavior

  1. Nightly pipeline upload runs with NIGHTLY=1.
  2. block-publish-release-images is filtered out by its if: clause and is not present in the uploaded pipeline.
  3. publish-release-images-dockerhub is uploaded (no if: on it). Its sole dependency block-publish-release-images does not exist in this build.
  4. Buildkite's resolution: a step whose depends_on references a non-existent step is marked broken/skipped (without allow_dependency_failure). The publish step is silently dropped.

In the happy path this is the intended outcome. The risk is that step (4) relies on Buildkite's behavior being stable across versions and matching the author's mental model — and unlike approach (1), there is nothing in the YAML that tells a future reader 'this step is intentionally nightly-gated'.

Addressing the refutation

A fellow verifier argues this is intentional behavior leveraging Buildkite's transitive-filter semantics, and that the lines 113-115 pattern is not analogous because the build group must run in both modes whereas the publish step must not. That refutation is correct on intent — the two cases ARE different, and the refutation is right that simply pasting allow_dependency_failure: true here would be wrong (it would make publish run in nightly without any block gate).

But the refutation does not address the explicit-vs-implicit gap. The two existing patterns in this file (build-release-images group with allow_dependency_failure: true, and Publish nightly steps with if: NIGHTLY == "1") both make the gating visible at the step that runs. The new publish step is the first place in this file where the gating is only on the upstream block. That is the consistency gap, and it is also why a one-line if: build.env("NIGHTLY") != "1" on the label step is the cleanest fix — it doesn't change runtime behavior, it just documents the intent in the right place.

Impact

Bounded — in the most likely case the existing pattern works (Buildkite skips the publish step in nightly as intended). The downside is reduced readability and a small risk if Buildkite's filtered-dependency semantics ever change or differ across pipeline-upload code paths. Severity: nit.

Fix

Add the matching guard on the label step:

- label: "Publish release images to DockerHub"
  depends_on:
    - block-publish-release-images
  id: publish-release-images-dockerhub
  if: build.env("NIGHTLY") != "1"   # <-- add this
  agents:
    queue: small_cpu_queue_release
  ...

This makes the nightly-skip intent explicit at the step that runs and matches the file's own convention of guarding nightly-conditional steps with their own if: clause.

agents:
queue: small_cpu_queue_release
commands:
- "bash .buildkite/scripts/publish-release-images.sh"
plugins:
- docker-login#v3.0.0:
username: vllmbot
password-env: DOCKERHUB_TOKEN
env:
DOCKER_BUILDKIT: "1"
DOCKERHUB_USERNAME: "vllmbot"

- group: "Publish wheels"
key: "publish-wheels"
steps:
Expand Down
94 changes: 1 addition & 93 deletions .buildkite/scripts/annotate-release.sh
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,6 @@ if [ -z "${RELEASE_VERSION}" ]; then
RELEASE_VERSION="1.0.0.dev"
fi

ROCM_BASE_CACHE_KEY=$(.buildkite/scripts/cache-rocm-base-wheels.sh key)

buildkite-agent annotate --style 'info' --context 'release-workflow' << EOF
To download the wheel (by commit):
\`\`\`
Expand All @@ -25,95 +23,5 @@ aws s3 cp s3://vllm-wheels/${BUILDKITE_COMMIT}/vllm-${RELEASE_VERSION}+cpu-cp38-
aws s3 cp s3://vllm-wheels/${BUILDKITE_COMMIT}/vllm-${RELEASE_VERSION}+cpu-cp38-abi3-manylinux_2_35_aarch64.whl .
\`\`\`


To download and upload the image:

\`\`\`
# Download images:

docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-x86_64
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-aarch64
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-x86_64-cu129
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-aarch64-cu129
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${ROCM_BASE_CACHE_KEY}-rocm-base
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-rocm
docker pull public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v${RELEASE_VERSION}
docker pull public.ecr.aws/q9t5s3a7/vllm-arm64-cpu-release-repo:v${RELEASE_VERSION}

# Tag and push images:

## CUDA

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-x86_64 vllm/vllm-openai:x86_64
docker tag vllm/vllm-openai:x86_64 vllm/vllm-openai:latest-x86_64
docker tag vllm/vllm-openai:x86_64 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64
docker push vllm/vllm-openai:latest-x86_64
docker push vllm/vllm-openai:v${RELEASE_VERSION}-x86_64

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-x86_64-cu129 vllm/vllm-openai:x86_64-cu129
docker tag vllm/vllm-openai:x86_64-cu129 vllm/vllm-openai:latest-x86_64-cu129
docker tag vllm/vllm-openai:x86_64-cu129 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129
docker push vllm/vllm-openai:latest-x86_64-cu129
docker push vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-aarch64 vllm/vllm-openai:aarch64
docker tag vllm/vllm-openai:aarch64 vllm/vllm-openai:latest-aarch64
docker tag vllm/vllm-openai:aarch64 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64
docker push vllm/vllm-openai:latest-aarch64
docker push vllm/vllm-openai:v${RELEASE_VERSION}-aarch64

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-aarch64-cu129 vllm/vllm-openai:aarch64-cu129
docker tag vllm/vllm-openai:aarch64-cu129 vllm/vllm-openai:latest-aarch64-cu129
docker tag vllm/vllm-openai:aarch64-cu129 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129
docker push vllm/vllm-openai:latest-aarch64-cu129
docker push vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129

## ROCm

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${BUILDKITE_COMMIT}-rocm vllm/vllm-openai-rocm:${BUILDKITE_COMMIT}
docker tag vllm/vllm-openai-rocm:${BUILDKITE_COMMIT} vllm/vllm-openai-rocm:latest
docker tag vllm/vllm-openai-rocm:${BUILDKITE_COMMIT} vllm/vllm-openai-rocm:v${RELEASE_VERSION}
docker push vllm/vllm-openai-rocm:latest
docker push vllm/vllm-openai-rocm:v${RELEASE_VERSION}

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${ROCM_BASE_CACHE_KEY}-rocm-base vllm/vllm-openai-rocm:${BUILDKITE_COMMIT}-base
docker tag vllm/vllm-openai-rocm:${BUILDKITE_COMMIT}-base vllm/vllm-openai-rocm:latest-base
docker tag vllm/vllm-openai-rocm:${BUILDKITE_COMMIT}-base vllm/vllm-openai-rocm:v${RELEASE_VERSION}-base
docker push vllm/vllm-openai-rocm:latest-base
docker push vllm/vllm-openai-rocm:v${RELEASE_VERSION}-base

## CPU

docker tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v${RELEASE_VERSION} vllm/vllm-openai-cpu:x86_64
docker tag vllm/vllm-openai-cpu:x86_64 vllm/vllm-openai-cpu:latest-x86_64
docker tag vllm/vllm-openai-cpu:x86_64 vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64
docker push vllm/vllm-openai-cpu:latest-x86_64
docker push vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64

docker tag public.ecr.aws/q9t5s3a7/vllm-arm64-cpu-release-repo:v${RELEASE_VERSION} vllm/vllm-openai-cpu:arm64
docker tag vllm/vllm-openai-cpu:arm64 vllm/vllm-openai-cpu:latest-arm64
docker tag vllm/vllm-openai-cpu:arm64 vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64
docker push vllm/vllm-openai-cpu:latest-arm64
docker push vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64

# Create multi-arch manifest:

docker manifest rm vllm/vllm-openai:latest
docker manifest create vllm/vllm-openai:latest vllm/vllm-openai:latest-x86_64 vllm/vllm-openai:latest-aarch64
docker manifest create vllm/vllm-openai:v${RELEASE_VERSION} vllm/vllm-openai:v${RELEASE_VERSION}-x86_64 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64
docker manifest push vllm/vllm-openai:latest
docker manifest push vllm/vllm-openai:v${RELEASE_VERSION}

docker manifest rm vllm/vllm-openai:latest-cu129
docker manifest create vllm/vllm-openai:latest-cu129 vllm/vllm-openai:latest-x86_64-cu129 vllm/vllm-openai:latest-aarch64-cu129
docker manifest create vllm/vllm-openai:v${RELEASE_VERSION}-cu129 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129
docker manifest push vllm/vllm-openai:latest-cu129
docker manifest push vllm/vllm-openai:v${RELEASE_VERSION}-cu129

docker manifest rm vllm/vllm-openai-cpu:latest || true
docker manifest create vllm/vllm-openai-cpu:latest vllm/vllm-openai-cpu:latest-x86_64 vllm/vllm-openai-cpu:latest-arm64
docker manifest create vllm/vllm-openai-cpu:v${RELEASE_VERSION} vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64 vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64
docker manifest push vllm/vllm-openai-cpu:latest
docker manifest push vllm/vllm-openai-cpu:v${RELEASE_VERSION}
\`\`\`
Docker images are published automatically by the "Publish release images to DockerHub" pipeline step.
EOF
172 changes: 172 additions & 0 deletions .buildkite/scripts/publish-release-images.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,172 @@
#!/bin/bash
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
#
# Publish release Docker images from ECR to DockerHub.
# Pulls per-arch images, tags with latest and versioned tags, pushes them,
# then creates and pushes multi-arch manifests.

set -euo pipefail

RELEASE_VERSION=$(buildkite-agent meta-data get release-version 2>/dev/null | sed 's/^v//')
if [ -z "${RELEASE_VERSION}" ]; then
echo "ERROR: release-version metadata not set"
exit 1
fi

Check warning on line 15 in .buildkite/scripts/publish-release-images.sh

View check run for this annotation

Claude / Claude Code Review

Custom missing-version error message is unreachable under set -euo pipefail

Under `set -euo pipefail` (line 9), the assignment `RELEASE_VERSION=$(buildkite-agent meta-data get release-version 2>/dev/null | sed 's/^v//')` will abort the script before the `if [ -z "${RELEASE_VERSION}" ]` check on line 12 ever runs — `buildkite-agent meta-data get` exits non-zero when the key is unset, `pipefail` propagates that through the pipeline, and `set -e` kills the script on the failed command substitution. Combined with the `2>/dev/null` redirect, an operator whose metadata is mis
Comment thread
claude[bot] marked this conversation as resolved.

COMMIT="$BUILDKITE_COMMIT"
ROCM_BASE_CACHE_KEY=$(.buildkite/scripts/cache-rocm-base-wheels.sh key)

echo "========================================"
echo "Publishing release images v${RELEASE_VERSION}"
echo " Commit: ${COMMIT}"
echo " ROCm base cache key: ${ROCM_BASE_CACHE_KEY}"
echo "========================================"

# Login to ECR to pull staging images
aws ecr-public get-login-password --region us-east-1 | \
docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7

# ---- CUDA (default: 13.0) ----

docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64 vllm/vllm-openai:latest-x86_64
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64
docker push vllm/vllm-openai:latest-x86_64
docker push vllm/vllm-openai:v${RELEASE_VERSION}-x86_64

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64 vllm/vllm-openai:latest-aarch64
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64
docker push vllm/vllm-openai:latest-aarch64
docker push vllm/vllm-openai:v${RELEASE_VERSION}-aarch64

docker manifest rm vllm/vllm-openai:latest || true
docker manifest rm vllm/vllm-openai:v${RELEASE_VERSION} || true
docker manifest create vllm/vllm-openai:latest vllm/vllm-openai:latest-x86_64 vllm/vllm-openai:latest-aarch64
docker manifest create vllm/vllm-openai:v${RELEASE_VERSION} vllm/vllm-openai:v${RELEASE_VERSION}-x86_64 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64
docker manifest push vllm/vllm-openai:latest
docker manifest push vllm/vllm-openai:v${RELEASE_VERSION}

# ---- CUDA 12.9 ----

docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-cu129
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-cu129

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-cu129 vllm/vllm-openai:latest-x86_64-cu129
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-cu129 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129
docker push vllm/vllm-openai:latest-x86_64-cu129
docker push vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-cu129 vllm/vllm-openai:latest-aarch64-cu129
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-cu129 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129
docker push vllm/vllm-openai:latest-aarch64-cu129
docker push vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129

docker manifest rm vllm/vllm-openai:latest-cu129 || true
docker manifest rm vllm/vllm-openai:v${RELEASE_VERSION}-cu129 || true
docker manifest create vllm/vllm-openai:latest-cu129 vllm/vllm-openai:latest-x86_64-cu129 vllm/vllm-openai:latest-aarch64-cu129
docker manifest create vllm/vllm-openai:v${RELEASE_VERSION}-cu129 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129
docker manifest push vllm/vllm-openai:latest-cu129
docker manifest push vllm/vllm-openai:v${RELEASE_VERSION}-cu129

# ---- Ubuntu 24.04 (CUDA 13.0) ----

docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-ubuntu2404
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-ubuntu2404

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-ubuntu2404 vllm/vllm-openai:latest-x86_64-ubuntu2404
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-ubuntu2404 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-ubuntu2404
docker push vllm/vllm-openai:latest-x86_64-ubuntu2404
docker push vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-ubuntu2404

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-ubuntu2404 vllm/vllm-openai:latest-aarch64-ubuntu2404
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-ubuntu2404 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-ubuntu2404
docker push vllm/vllm-openai:latest-aarch64-ubuntu2404
docker push vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-ubuntu2404

docker manifest rm vllm/vllm-openai:latest-ubuntu2404 || true
docker manifest rm vllm/vllm-openai:v${RELEASE_VERSION}-ubuntu2404 || true
docker manifest create vllm/vllm-openai:latest-ubuntu2404 vllm/vllm-openai:latest-x86_64-ubuntu2404 vllm/vllm-openai:latest-aarch64-ubuntu2404
docker manifest create vllm/vllm-openai:v${RELEASE_VERSION}-ubuntu2404 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-ubuntu2404 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-ubuntu2404
docker manifest push vllm/vllm-openai:latest-ubuntu2404
docker manifest push vllm/vllm-openai:v${RELEASE_VERSION}-ubuntu2404

# ---- Ubuntu 24.04 (CUDA 12.9) ----

docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-cu129-ubuntu2404
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-cu129-ubuntu2404

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-cu129-ubuntu2404 vllm/vllm-openai:latest-x86_64-cu129-ubuntu2404
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-x86_64-cu129-ubuntu2404 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129-ubuntu2404
docker push vllm/vllm-openai:latest-x86_64-cu129-ubuntu2404
docker push vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129-ubuntu2404

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-cu129-ubuntu2404 vllm/vllm-openai:latest-aarch64-cu129-ubuntu2404
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-aarch64-cu129-ubuntu2404 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129-ubuntu2404
docker push vllm/vllm-openai:latest-aarch64-cu129-ubuntu2404
docker push vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129-ubuntu2404

docker manifest rm vllm/vllm-openai:latest-cu129-ubuntu2404 || true
docker manifest rm vllm/vllm-openai:v${RELEASE_VERSION}-cu129-ubuntu2404 || true
docker manifest create vllm/vllm-openai:latest-cu129-ubuntu2404 vllm/vllm-openai:latest-x86_64-cu129-ubuntu2404 vllm/vllm-openai:latest-aarch64-cu129-ubuntu2404
docker manifest create vllm/vllm-openai:v${RELEASE_VERSION}-cu129-ubuntu2404 vllm/vllm-openai:v${RELEASE_VERSION}-x86_64-cu129-ubuntu2404 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64-cu129-ubuntu2404
docker manifest push vllm/vllm-openai:latest-cu129-ubuntu2404
docker manifest push vllm/vllm-openai:v${RELEASE_VERSION}-cu129-ubuntu2404

# ---- ROCm ----

docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-rocm
docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:${ROCM_BASE_CACHE_KEY}-rocm-base

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-rocm vllm/vllm-openai-rocm:latest
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${COMMIT}-rocm vllm/vllm-openai-rocm:v${RELEASE_VERSION}
docker push vllm/vllm-openai-rocm:latest
docker push vllm/vllm-openai-rocm:v${RELEASE_VERSION}

docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${ROCM_BASE_CACHE_KEY}-rocm-base vllm/vllm-openai-rocm:latest-base
docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:${ROCM_BASE_CACHE_KEY}-rocm-base vllm/vllm-openai-rocm:v${RELEASE_VERSION}-base
docker push vllm/vllm-openai-rocm:latest-base
docker push vllm/vllm-openai-rocm:v${RELEASE_VERSION}-base

# ---- CPU ----
# CPU images are behind separate block steps and may not have been built.
# Attempt to pull and publish; skip gracefully if images are not available.

CPU_X86=false
CPU_ARM=false

if docker pull public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v${RELEASE_VERSION} 2>/dev/null; then
docker tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v${RELEASE_VERSION} vllm/vllm-openai-cpu:latest-x86_64
docker tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v${RELEASE_VERSION} vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64
docker push vllm/vllm-openai-cpu:latest-x86_64
docker push vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64
CPU_X86=true
else
echo "WARNING: x86_64 CPU image not found, skipping (ensure block-cpu-release-image-build was unblocked)"
fi

if docker pull public.ecr.aws/q9t5s3a7/vllm-arm64-cpu-release-repo:v${RELEASE_VERSION} 2>/dev/null; then
docker tag public.ecr.aws/q9t5s3a7/vllm-arm64-cpu-release-repo:v${RELEASE_VERSION} vllm/vllm-openai-cpu:latest-arm64
docker tag public.ecr.aws/q9t5s3a7/vllm-arm64-cpu-release-repo:v${RELEASE_VERSION} vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64
docker push vllm/vllm-openai-cpu:latest-arm64
docker push vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64
CPU_ARM=true
else
echo "WARNING: arm64 CPU image not found, skipping (ensure block-arm64-cpu-release-image-build was unblocked)"
fi

Check failure on line 158 in .buildkite/scripts/publish-release-images.sh

View check run for this annotation

Claude / Claude Code Review

CPU publish path silently swallows errors and tag-format mismatches

The CPU pull at lines 140 and 150 uses `if docker pull ... 2>/dev/null; then` — the `if` construct disables `set -e` and the stderr redirect hides the actual error, so any non-zero exit (transient ECR error, expired auth, daemon timeout, IAM regression, **or a tag-format mismatch**) is treated identically to 'image not built' and produces only a misleading WARNING. A concrete trigger you can hit today: the CPU build steps in release-pipeline.yaml (lines 319-321 / 337-339) tag ECR with the **raw*
Comment thread
claude[bot] marked this conversation as resolved.
Outdated

if [ "$CPU_X86" = "true" ] && [ "$CPU_ARM" = "true" ]; then
docker manifest rm vllm/vllm-openai-cpu:latest || true
docker manifest rm vllm/vllm-openai-cpu:v${RELEASE_VERSION} || true
docker manifest create vllm/vllm-openai-cpu:latest vllm/vllm-openai-cpu:latest-x86_64 vllm/vllm-openai-cpu:latest-arm64
docker manifest create vllm/vllm-openai-cpu:v${RELEASE_VERSION} vllm/vllm-openai-cpu:v${RELEASE_VERSION}-x86_64 vllm/vllm-openai-cpu:v${RELEASE_VERSION}-arm64
docker manifest push vllm/vllm-openai-cpu:latest
docker manifest push vllm/vllm-openai-cpu:v${RELEASE_VERSION}
else
echo "WARNING: Skipping CPU multi-arch manifest (both x86_64 and arm64 images required)"
fi
Comment thread
claude[bot] marked this conversation as resolved.
Outdated

echo ""
echo "Successfully published release images for v${RELEASE_VERSION}"
Loading