2 changes: 1 addition & 1 deletion container/Dockerfile
@@ -12,7 +12,7 @@ ARG BASE_IMAGE="nvcr.io/nvidia/cuda-dl-base"
# Please check https://github.com/ai-dynamo/dynamo/pull/1065
# for details and reproducer to manually test if the image
# can be updated to later versions.
-ARG BASE_IMAGE_TAG="25.01-cuda12.8-devel-ubuntu24.04"
+ARG BASE_IMAGE_TAG="25.01-cuda12.9-devel-ubuntu24.04"

⚠️ Potential issue

🧩 Analysis chain

Tag likely invalid: cuda-dl-base 25.01 + CUDA 12.9 combo probably doesn’t exist

25.01 predates CUDA 12.9; nvcr.io/nvidia/cuda-dl-base is unlikely to publish 25.01-cuda12.9-*. This will 404 at build time. Align the HPC track to the first release that actually ships CUDA 12.9 (e.g., 25.09+), or drop the 25.01 constraint if you’re unblocking tests.

Run the manifest check below to confirm the exact valid tag (may require NGC auth).

Optional example fix (adjust to the tag you find):

-ARG BASE_IMAGE_TAG="25.01-cuda12.9-devel-ubuntu24.04"
+ARG BASE_IMAGE_TAG="25.09-cuda12.9-devel-ubuntu24.04"
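
Because BASE_IMAGE_TAG is declared as a build ARG, a candidate tag can be tried without editing the Dockerfile by passing --build-arg. The sketch below only assembles and prints the command. The helper name `print_build_cmd` and the build context `.` are assumptions made for illustration, and the real build may need further arguments.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a docker build command that
# overrides BASE_IMAGE_TAG for one Dockerfile.
# The build context "." is an assumption about where the build is run from.
print_build_cmd() {
  local dockerfile="$1" tag="$2"
  printf 'docker build -f %s --build-arg BASE_IMAGE_TAG=%s .\n' "$dockerfile" "$tag"
}

# Preview the command for the example tag suggested in this comment.
print_build_cmd container/Dockerfile 25.09-cuda12.9-devel-ubuntu24.04
```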

🏁 Script executed:

#!/bin/bash
set -euo pipefail
REPO="nvcr.io/nvidia/cuda-dl-base"
for track in 25.01 25.05 25.07 25.08 25.09 25.10 25.11 25.12; do
  tag="${track}-cuda12.9-devel-ubuntu24.04"
  echo "Checking $REPO:$tag ..."
  if docker manifest inspect "$REPO:$tag" >/dev/null 2>&1; then
    echo "FOUND: $REPO:$tag"
  fi
done

Length of output: 591


Invalid base image tag — fix before merge

container/Dockerfile:15 — ARG BASE_IMAGE_TAG="25.01-cuda12.9-devel-ubuntu24.04"

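The supplied manifest check printed no FOUND line for any of the tested nvcr.io/nvidia/cuda-dl-base:*-cuda12.9-devel-ubuntu24.04 tags. The script discards docker's error output, so a missing docker binary or an NGC auth failure would look the same as a missing tag; if the tags really are absent, the image pull will fail at build time.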

Action: replace the ARG with a published nvcr.io/nvidia/cuda-dl-base tag that actually includes CUDA 12.9 (or remove the 25.01 track constraint) and verify with docker manifest inspect or the NGC registry before merging.

🤖 Prompt for AI Agents
In container/Dockerfile around line 15, ARG
BASE_IMAGE_TAG="25.01-cuda12.9-devel-ubuntu24.04" references a non-existent
nvcr.io/nvidia/cuda-dl-base tag and will 404 at build time; replace that ARG
with a published nvcr.io/nvidia/cuda-dl-base tag that includes CUDA 12.9 (or
remove the "25.01" track constraint) and then verify the chosen tag exists by
running docker manifest inspect nvcr.io/nvidia/cuda-dl-base:<TAG> (or check the
NGC registry) before merging.
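
A minimal sketch of a manifest check that reports a missing docker binary separately from a failed lookup, so that a skipped check is not read as a missing tag. `check_tag` and the `DOCKER_BIN` override are hypothetical names introduced here; only `docker manifest inspect` comes from the script in this comment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify the outcome of a manifest lookup.
# DOCKER_BIN is an assumed override so the function can be exercised with a stub.
check_tag() {
  local ref="$1" bin="${DOCKER_BIN:-docker}" err
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "SKIPPED (docker not available): $ref"
    return 2
  fi
  # Capture stderr only; the manifest JSON on stdout is not needed.
  if err="$("$bin" manifest inspect "$ref" 2>&1 >/dev/null)"; then
    echo "FOUND: $ref"
  else
    # Show docker's own message so auth errors are distinguishable from absent tags.
    echo "MISSING or inaccessible: $ref${err:+ ($err)}"
    return 1
  fi
}
```

Calling `check_tag nvcr.io/nvidia/cuda-dl-base:<TAG>` for each candidate returns 0 when the manifest is found, 1 when the lookup fails, and 2 when the check could not run at all.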


# Build configuration
ARG RELEASE_BUILD=false
4 changes: 2 additions & 2 deletions container/Dockerfile.sglang
@@ -6,10 +6,10 @@ ARG BASE_IMAGE="nvcr.io/nvidia/cuda-dl-base"
# Please check https://github.com/ai-dynamo/dynamo/pull/1065
# for details and reproducer to manually test if the image
# can be updated to later versions.
-ARG BASE_IMAGE_TAG="25.01-cuda12.8-devel-ubuntu24.04"
+ARG BASE_IMAGE_TAG="25.01-cuda12.9-devel-ubuntu24.04"
ARG RELEASE_BUILD
ARG RUNTIME_IMAGE="nvcr.io/nvidia/cuda"
-ARG RUNTIME_IMAGE_TAG="12.8.1-runtime-ubuntu24.04"
+ARG RUNTIME_IMAGE_TAG="12.9.1-runtime-ubuntu24.04"
Comment on lines +9 to +12

⚠️ Potential issue

🧩 Analysis chain

Same base/runtime tag check as vLLM; 25.01 + CUDA 12.9 likely invalid

Ensure ${BASE_IMAGE}:${BASE_IMAGE_TAG} actually exists; 25.01 probably doesn’t publish CUDA 12.9. Runtime tag 12.9.1-runtime-ubuntu24.04 is plausible but verify.

If you confirm a newer HPC track (e.g., 25.09) is required for 12.9, update both this file and vLLM to the same track to keep builds consistent.


🏁 Script executed:

#!/bin/bash
set -euo pipefail
docker manifest inspect nvcr.io/nvidia/cuda-dl-base:25.01-cuda12.9-devel-ubuntu24.04 >/dev/null \
  && echo "FOUND base tag" || echo "MISSING base tag"
docker manifest inspect nvcr.io/nvidia/cuda:12.9.1-runtime-ubuntu24.04 >/dev/null \
  && echo "FOUND runtime tag" || echo "MISSING runtime tag"

Length of output: 126


Track/CUDA mismatch: 25.01 doesn't ship CUDA 12.9; move the base image to a track that does

Local manifest check couldn't run here (docker not installed); NVIDIA release notes show 25.01 images are built with CUDA 12.8, while CUDA 12.9.1 is provided in later CUDA‑DL releases (25.06). (docs.nvidia.com)

nvcr.io/nvidia/cuda-dl-base:25.01-cuda12.9-devel-ubuntu24.04 is therefore likely invalid. Change BASE_IMAGE_TAG to 25.06-cuda12.9-devel-ubuntu24.04 (or revert CUDA to 12.8 for 25.01) and keep RUNTIME_IMAGE_TAG on the same CUDA version as the base image; 12.9.1 CUDA images are available on Docker Hub / NGC. (docker.aityp.com)

File: container/Dockerfile.sglang (lines 9–12).
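
The pairing between the two tags (the base tag's cudaX.Y should name the same CUDA release as the runtime tag's X.Y.Z) can be checked mechanically. A rough sketch, assuming tags always follow the two shapes seen in this diff; all function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the tag shapes are assumed from the tags in this diff.

# 25.01-cuda12.9-devel-ubuntu24.04 -> 12.9
cuda_from_base_tag() {
  printf '%s\n' "$1" | sed -n 's/^.*-cuda\([0-9][0-9]*\.[0-9][0-9]*\)-.*$/\1/p'
}

# 12.9.1-runtime-ubuntu24.04 -> 12.9
cuda_from_runtime_tag() {
  printf '%s\n' "$1" | sed -n 's/^\([0-9][0-9]*\.[0-9][0-9]*\)[.-].*$/\1/p'
}

# Succeeds only when both tags parse and name the same CUDA major.minor.
tags_agree() {
  local base runtime
  base="$(cuda_from_base_tag "$1")"
  runtime="$(cuda_from_runtime_tag "$2")"
  [ -n "$base" ] && [ "$base" = "$runtime" ]
}
```

A check like this would catch a half-applied bump (for example, a base tag left at cuda12.8 next to a 12.9.1 runtime tag). It cannot tell whether a consistent pair is actually published, so it complements the registry lookup rather than replacing it.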


# Make sure to update the dependency version in pyproject.toml when updating this
ARG SGLANG_VERSION="0.5.0rc2"
6 changes: 3 additions & 3 deletions container/Dockerfile.vllm
@@ -7,11 +7,11 @@ ARG BASE_IMAGE="nvcr.io/nvidia/cuda-dl-base"
# Please check https://github.com/ai-dynamo/dynamo/pull/1065
# for details and reproducer to manually test if the image
# can be updated to later versions.
-ARG BASE_IMAGE_TAG="25.01-cuda12.8-devel-ubuntu24.04"
+ARG BASE_IMAGE_TAG="25.01-cuda12.9-devel-ubuntu24.04"
ARG RELEASE_BUILD
ARG ENABLE_KVBM=false
ARG RUNTIME_IMAGE="nvcr.io/nvidia/cuda"
-ARG RUNTIME_IMAGE_TAG="12.8.1-runtime-ubuntu24.04"
+ARG RUNTIME_IMAGE_TAG="12.9.1-runtime-ubuntu24.04"

# Make sure to update the dependency version in pyproject.toml when updating this
ARG VLLM_REF="1da94e673c257373280026f75ceb4effac80e892" # from v0.10.1.1
@@ -200,7 +200,7 @@ RUN apt-get update && \
# prometheus dependencies
ca-certificates \
# DeepGemm uses 'cuobjdump' which does not come with CUDA image
-cuda-command-line-tools-12-8 && \
+cuda-command-line-tools-12-9 && \
rm -rf /var/lib/apt/lists/*

# Copy CUDA development tools (nvcc, headers, dependencies, etc.) from base devel image
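
The apt package name in this hunk encodes the CUDA major and minor version with a dash, so it has to move in step with RUNTIME_IMAGE_TAG. A small sketch of deriving one from the other; `cli_tools_pkg` is a hypothetical helper, and it assumes the runtime tag starts with X.Y as the tags in this diff do.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a runtime image tag to the matching apt package name.
# 12.9.1-runtime-ubuntu24.04 -> cuda-command-line-tools-12-9
cli_tools_pkg() {
  local ver
  # Keep only major and minor, joined with a dash as in the package name.
  ver="$(printf '%s\n' "$1" | sed -n 's/^\([0-9][0-9]*\)\.\([0-9][0-9]*\).*$/\1-\2/p')"
  [ -n "$ver" ] || return 1
  printf 'cuda-command-line-tools-%s\n' "$ver"
}
```

Comparing the output of `cli_tools_pkg "$RUNTIME_IMAGE_TAG"` against the package listed in the RUN step is one way to notice when a version bump touched the tag but not the package.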