diff --git a/.devcontainer/README.md b/.devcontainer/README.md
index 5209a483d00..8854410980b 100644
--- a/.devcontainer/README.md
+++ b/.devcontainer/README.md
@@ -91,13 +91,21 @@ Follow these steps to get your NVIDIA Dynamo development environment up and runn

### Step 1: Build the Development Container Image

-Build `dynamo:latest-vllm` from scratch from the source:
+Build `dynamo:latest-vllm-local-dev` from source:

```bash
-./container/build.sh --target dev --framework VLLM
+# Single-command approach (recommended)
+./container/build.sh --framework VLLM --target local-dev
+# Creates both dynamo:latest-vllm and dynamo:latest-vllm-local-dev
+
+# Alternatively, build the dev image first, then build the local-dev image from it
+./container/build.sh --framework VLLM
+# Now you have a development image dynamo:latest-vllm
+./container/build.sh --dev-image dynamo:latest-vllm --framework VLLM
+# Now you have a local-dev image dynamo:latest-vllm-local-dev
```

-The container will be built and give certain file permissions to your local uid and gid.
+The local-dev image gives you local user permissions matching your host UID/GID and includes extra developer utilities (debugging tools, text editors, system monitors, etc.).

### Step 2: Install Dev Containers Extension

@@ -235,30 +243,6 @@ cp .devcontainer/devcontainer.json .devcontainer/jensen_dev/devcontainer.json

Common customizations include additional mounts, environment variables, IDE extensions, and build arguments. When you open a new Dev Container, you can pick from any of the `.devcontainer/<dir>/devcontainer.json` files available.

-### SGLANG Custom devcontainer.json Configuration (EXPERIMENTAL)
-
-This is experimental. Please update/fix if you encounter problems. For sglang Dev Container, you first need to build `dynamo:latest-sglang-local-dev` image like this (wait about half an hour):
-
-```bash
-./container/build.sh --framework SGLANG --target local-dev
-```
-
-Then, make a copy of the `devcontainer.json file` to a directory of your choice. For this example, we'll just call it `sglang`:
-
-```bash
-mkdir .devcontainer/sglang/
-cp -a .devcontainer/devcontainer.json .devcontainer/sglang/
-```
-
-Afterwards, edit your `.devcontainer/sglang/devcontainer.json` so that the name and image correspond to SGLANG. Example:
-```json
-    "name": "[sglang] This is my amazing custom Dev Container Development",
-    ...
-    "image": "dynamo:latest-sglang-local-dev",
-```
-
-Now, go to **Dev Containers: Open Folder in Container** and select `[sglang] This is my amazing custom Dev Container Development`. The post-create.sh script should be running.
-

### SSH Keys for Git Operations

@@ -372,13 +356,15 @@ If you see errors like "container is not running" or "An error occurred setting

**Common Causes and Solutions:**

-1. **Missing base image:**
+1. **Missing a local-dev image:**
   ```bash
-   # Check if the required image exists
+   # Check if the required local-dev image exists
   docker images | grep dynamo

-   # If missing, build the dev image first
-   ./container/build.sh --target local-dev
+   # If missing, build the dev image first, then build local-dev
+   ./container/build.sh --framework vllm
+   ./container/build.sh --dev-image dynamo:latest-vllm --framework vllm
+   # Output: dynamo:latest-vllm-local-dev
   ```

2.
**Container startup failure:**

diff --git a/container/Dockerfile.local_dev b/container/Dockerfile.local_dev
new file mode 100644
index 00000000000..19a6a1e5a00
--- /dev/null
+++ b/container/Dockerfile.local_dev
@@ -0,0 +1,125 @@
+# syntax=docker/dockerfile:1.10.0
+# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+# This Dockerfile creates a local development environment for Dev Container plug-in use.
+# It takes a BASE image (typically the dev target) and adds local-dev specific configurations,
+# including additional developer utilities and tools.
+#
+# Usage:
+# - Dev Container IDE Extension: Use directly with VS Code/Cursor Dev Container extension
+# - Command line: run.sh --image <local-dev image> --mount-workspace ...
+#   where the ubuntu user inside the container is mapped to your local user login
+
+ARG DEV_BASE=""
+FROM ${DEV_BASE} AS local-dev
+
+# Keep the default ubuntu user name as-is; just remap its UID and GID.
+ENV USERNAME=ubuntu
+ARG USER_UID
+ARG USER_GID
+ARG WORKSPACE_DIR=/workspace
+
+ARG ARCH
+
+# Update package lists and install developer utilities. Some of these may already exist in the
+# base image, but to ensure consistency across all dev images, we explicitly list all required
+# dev tools here.
+RUN apt-get update && apt-get install -y \
+    # Development utilities
+    curl wget git vim nano \
+    # System utilities
+    htop nvtop tmux screen \
+    # Network utilities
+    net-tools iproute2 iputils-ping \
+    # Archive utilities
+    zip unzip rsync \
+    # Build tools
+    build-essential cmake autoconf automake libtool \
+    # Debug and analysis tools
+    gdb valgrind strace ltrace \
+    # Text processing
+    jq yq grep sed \
+    # File utilities
+    tree fd-find ripgrep \
+    # Shell utilities
+    zsh fish bash-completion
+
+# https://code.visualstudio.com/remote/advancedcontainers/add-nonroot-user
+# Configure user with sudo access for Dev Container workflows
+RUN apt-get install -y sudo gnupg2 gnupg1 \
+    && echo "$USERNAME ALL=(root) NOPASSWD:ALL" > /etc/sudoers.d/$USERNAME \
+    && chmod 0440 /etc/sudoers.d/$USERNAME \
+    && mkdir -p /home/$USERNAME \
+    && groupmod -g $USER_GID $USERNAME \
+    && usermod -u $USER_UID -g $USER_GID $USERNAME \
+    && chown -R $USERNAME:$USERNAME /home/$USERNAME \
+    && chsh -s /bin/bash $USERNAME
+
+# Install awk separately with fault tolerance.
+# awk is a virtual package with multiple implementations (gawk, mawk, original-awk).
+# Separated because TensorRT-LLM builds failed on awk package conflicts.
+# This prevents main package installation failures due to awk availability issues.
+RUN (apt-get install -y gawk || \ + apt-get install -y mawk || \ + apt-get install -y original-awk || \ + echo "Warning: Could not install any awk implementation") && \ + (which awk && echo "awk successfully installed: $(which awk)" || echo "awk not available") + +# Add NVIDIA devtools repository and install development tools +RUN wget -qO - https://developer.download.nvidia.com/devtools/repos/ubuntu2404/${ARCH}/nvidia.pub | gpg --dearmor -o /etc/apt/keyrings/nvidia-devtools.gpg && \ + echo "deb [signed-by=/etc/apt/keyrings/nvidia-devtools.gpg] https://developer.download.nvidia.com/devtools/repos/ubuntu2404/${ARCH} /" | tee /etc/apt/sources.list.d/nvidia-devtools.list && \ + apt-get update && \ + apt-get install -y nsight-systems-2025.5.1 + +# Clean up package lists at the end +RUN rm -rf /var/lib/apt/lists/* + +# Set workspace directory variable +ENV WORKSPACE_DIR=${WORKSPACE_DIR} + +# Development environment variables for the local-dev target +# Path configuration notes: +# - DYNAMO_HOME: Main project directory (workspace mount point) +# - CARGO_TARGET_DIR: Build artifacts in workspace/target for persistence +# - CARGO_HOME: Must be in $HOME/.cargo (not workspace) because: +# * Workspace gets mounted to different paths where cargo binaries may not exist +# * Contains critical cargo binaries and registry that need consistent paths +# - RUSTUP_HOME: Must be in $HOME/.rustup (not workspace) because: +# * Contains rust toolchain binaries that must be at expected system paths +# * Workspace mount point would break rustup's toolchain resolution +# - PATH: Includes cargo binaries for rust tool access +ENV DYNAMO_HOME=${WORKSPACE_DIR} +ENV CARGO_TARGET_DIR=${WORKSPACE_DIR}/target +ENV CARGO_HOME=${HOME}/.cargo +ENV RUSTUP_HOME=${HOME}/.rustup +ENV PATH=${CARGO_HOME}/bin:$PATH + +# Copy Rust toolchain from system directories to user home directories with proper ownership +RUN rsync -a --chown=$USER_UID:$USER_GID /usr/local/rustup/ $RUSTUP_HOME/ + +RUN rsync -a --chown=$USER_UID:$USER_GID /usr/local/cargo/ $CARGO_HOME/ + +# Copy virtual environment with proper ownership using rsync instead of chown. +# Why rsync instead of chown -R: +# chown -R is extremely slow in Docker containers, especially on large directory trees +# like Python virtual environments with thousands of files. This is a well-documented +# Docker performance issue. rsync --chown is 3-4x faster as it sets ownership during copy. 
+RUN rsync -a --chown=$USER_UID:$USER_GID ${VIRTUAL_ENV}/ /tmp/venv-temp/ && \ + rm -rf ${VIRTUAL_ENV} && \ + mv /tmp/venv-temp ${VIRTUAL_ENV} + +# At this point, we are executing as the ubuntu user +USER $USERNAME +ENV HOME=/home/$USERNAME +WORKDIR $HOME + +# https://code.visualstudio.com/remote/advancedcontainers/persist-bash-history +RUN SNIPPET="export PROMPT_COMMAND='history -a' && export HISTFILE=$HOME/.commandhistory/.bash_history" \ + && mkdir -p $HOME/.commandhistory \ + && touch $HOME/.commandhistory/.bash_history \ + && echo "$SNIPPET" >> "$HOME/.bashrc" + +RUN mkdir -p /home/$USERNAME/.cache/ + +ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"] +CMD [] diff --git a/container/Dockerfile.vllm b/container/Dockerfile.vllm index 4d267ec7287..a4628a0d298 100644 --- a/container/Dockerfile.vllm +++ b/container/Dockerfile.vllm @@ -289,125 +289,6 @@ RUN --mount=type=bind,source=./container/launch_message.txt,target=/workspace/la ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"] CMD [] -####################################################################### -########## Development (Dev Container only) ########################### -####################################################################### -# -# This stage is for Dev Container plug-in use only. -# It provides a local development environment with extra tools and dependencies -# not present in the base runtime image. - -FROM runtime AS local-dev - -# Don't want ubuntu to be editable, just change uid and gid. -ENV USERNAME=ubuntu -ARG USER_UID -ARG USER_GID -ARG WORKSPACE_DIR=/workspace - -# Install utilities as root -RUN apt-get update -y && \ - apt-get install -y --no-install-recommends \ - # Install utilities - nvtop \ - wget \ - tmux \ - vim \ - git \ - openssh-client \ - iproute2 \ - rsync \ - zip \ - unzip \ - htop \ - # Build Dependencies - autoconf \ - automake \ - cmake \ - libtool \ - meson \ - net-tools \ - pybind11-dev \ - # Rust build dependencies - clang \ - libclang-dev \ - protobuf-compiler && \ - rm -rf /var/lib/apt/lists/* - - -ARG ARCH -RUN wget -qO - https://developer.download.nvidia.com/devtools/repos/ubuntu2404/${ARCH}/nvidia.pub | gpg --dearmor -o /etc/apt/keyrings/nvidia-devtools.gpg && \ - echo "deb [signed-by=/etc/apt/keyrings/nvidia-devtools.gpg] https://developer.download.nvidia.com/devtools/repos/ubuntu2404/${ARCH} /" | tee /etc/apt/sources.list.d/nvidia-devtools.list && \ - apt-get update && \ - apt-get install -y nsight-systems-2025.5.1 && \ - rm -rf /var/lib/apt/lists/* - -COPY --from=runtime /usr/local/bin /usr/local/bin - -# https://code.visualstudio.com/remote/advancedcontainers/add-nonroot-user -# Will use the default ubuntu user, but give sudo access -# Needed so files permissions aren't set to root ownership when writing from inside container -RUN apt-get update && apt-get install -y sudo gnupg2 gnupg1 \ - && echo "$USERNAME ALL=(root) NOPASSWD:ALL" > /etc/sudoers.d/$USERNAME \ - && chmod 0440 /etc/sudoers.d/$USERNAME \ - && mkdir -p /home/$USERNAME \ - && groupmod -g $USER_GID $USERNAME \ - && usermod -u $USER_UID -g $USER_GID $USERNAME \ - && chown -R $USERNAME:$USERNAME /home/$USERNAME \ - && rm -rf /var/lib/apt/lists/* \ - && chsh -s /bin/bash $USERNAME - -# At this point, we are executing as the ubuntu user -USER $USERNAME -ENV HOME=/home/$USERNAME -WORKDIR $HOME - -# Set workspace directory variable -ENV WORKSPACE_DIR=${WORKSPACE_DIR} - -# Development environment variables for the dev target -# Path configuration notes: -# - DYNAMO_HOME: Main project directory (workspace mount point) -# - 
CARGO_TARGET_DIR: Build artifacts in workspace/target for persistence
-# - CARGO_HOME: Must be in $HOME/.cargo (not workspace) because:
-#   * Workspace gets mounted to different paths where cargo binaries may not exist
-#   * Contains critical cargo binaries and registry that need consistent paths
-# - RUSTUP_HOME: Must be in $HOME/.rustup (not workspace) because:
-#   * Contains rust toolchain binaries that must be at expected system paths
-#   * Workspace mount point would break rustup's toolchain resolution
-# - PATH: Includes cargo binaries for rust tool access
-ENV DYNAMO_HOME=${WORKSPACE_DIR}
-ENV CARGO_TARGET_DIR=${WORKSPACE_DIR}/target
-ENV CARGO_HOME=${HOME}/.cargo
-ENV RUSTUP_HOME=${HOME}/.rustup
-ENV PATH=${CARGO_HOME}/bin:$PATH
-
-COPY --from=dynamo_base --chown=$USER_UID:$USER_GID /usr/local/rustup $RUSTUP_HOME
-COPY --from=dynamo_base --chown=$USER_UID:$USER_GID /usr/local/cargo $CARGO_HOME
-
-# This is a slow operation (~40s on my cpu)
-# Much better than chown -R $USERNAME:$USERNAME /opt/dynamo/venv (~10min on my cpu)
-COPY --from=runtime --chown=$USER_UID:$USER_GID ${VIRTUAL_ENV} ${VIRTUAL_ENV}
-
-# so we can use maturin develop
-RUN uv pip install maturin[patchelf]
-
-# Make sure to sync this with the one specified on README.md.
-# This is a generic PYTHONPATH which works for all the frameworks, so some paths may not be relevant for this particular framework.
-ENV PYTHONPATH=${WORKSPACE_DIR}/components/metrics/src:${WORKSPACE_DIR}/components/frontend/src:${WORKSPACE_DIR}/components/planner/src:${WORKSPACE_DIR}/components/backends/mocker/src:${WORKSPACE_DIR}/components/backends/trtllm/src:${WORKSPACE_DIR}/components/backends/vllm/src:${WORKSPACE_DIR}/components/backends/sglang/src:${WORKSPACE_DIR}/components/backends/llama_cpp/src
-
-# https://code.visualstudio.com/remote/advancedcontainers/persist-bash-history
-RUN SNIPPET="export PROMPT_COMMAND='history -a' && export HISTFILE=$HOME/.commandhistory/.bash_history" \
-    && mkdir -p $HOME/.commandhistory \
-    && touch $HOME/.commandhistory/.bash_history \
-    && echo "$SNIPPET" >> "$HOME/.bashrc"
-
-RUN mkdir -p /home/$USERNAME/.cache/
-
-ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
-CMD []
-
-
###########################################################
########## Development (run.sh, runs as root user) ########
###########################################################

diff --git a/container/README.md b/container/README.md
index 3aeb91e3e7b..2acd6152ac0 100644
--- a/container/README.md
+++ b/container/README.md
@@ -38,8 +38,8 @@ The `build.sh` and `run.sh` scripts are convenience wrappers that simplify commo

These targets are specified with `build.sh --target <target>` and correspond to Docker multi-stage build targets defined in the Dockerfiles (e.g., `FROM somebase AS <target>`). Some commonly used targets include:

- `runtime` - For running pre-built containers without development tools (minimal size)
-- `dev` - For development with full toolchain (git, vim, build tools, etc.)
-- `local-dev` - For development with user-based permissions matching host UID/GID
+- `dev` - For development (inference, benchmarking, etc.); runs as the root user
+- `local-dev` - For development with local user permissions matching your host UID/GID. This is useful when mounting host directories (owned by your local user) into the container.

Additional targets are available in the Dockerfiles for specific build stages and use cases.
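As a quick illustration of the targets above, a minimal sketch (assuming the default vLLM framework and the tag scheme described in this README):

```bash
# Sketch only: one build per commonly used target.
./build.sh --framework vllm --target runtime    # minimal image for running pre-built containers
./build.sh --framework vllm                     # dev image (dynamo:latest-vllm), runs as root
./build.sh --framework vllm --target local-dev  # also produces dynamo:latest-vllm-local-dev with host UID/GID
```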
@@ -73,24 +73,24 @@ Compatibility │ Legacy workflows, │ workspace writable on NFS│work

## Usage Guidelines

-- **Use dev + `run.sh`**: `run.sh` script for command-line development by root user
-- **Use local-dev + `run.sh`**: `run.sh` script for command-line development using your local user ID
+- **Use dev + `run.sh`**: for command-line testing; runs as the root user
+- **Use local-dev + `run.sh`**: for command-line development with mounted host directories, using your local user ID
- **Use local-dev + Dev Container**: VS Code/Cursor Dev Container Plugin, using your local user ID

## Example Commands

-### 1. dev + `run.sh`:
+### 1. dev + `run.sh` (runs as root):
```bash
-run.sh --mount-workspace ...
+run.sh ...
```

-### 2. local-dev + `run.sh`:
+### 2. local-dev + `run.sh` (runs as the local user):
```bash
run.sh --mount-workspace --image dynamo:latest-vllm-local-dev ...
```

-### 3. local-dev + Dev Container:
-Use VS Code/Cursor Dev Container plugin with devcontainer.json configuration
+### 3. local-dev + Dev Container Extension:
+Use the VS Code/Cursor Dev Container Extension with a devcontainer.json configuration

## Build and Run Scripts Overview

@@ -107,22 +107,22 @@ The `build.sh` script is responsible for building Docker images for different AI

**Key Features:**
- **Framework Support**: vLLM (default when --framework not specified), TensorRT-LLM, SGLang, or NONE
- **Multi-stage Builds**: Build process with base images
-- **Development Targets**: Supports `dev` and `local-dev` targets
+- **Development Targets**: Supports `dev` target and `local-dev` target
- **Build Caching**: Docker layer caching and sccache support
- **GPU Optimization**: CUDA, EFA, and NIXL support

**Common Usage Examples:**

```bash
-# Build vLLM image (default)
+# Build the vLLM dev image, dynamo:latest-vllm (default). It runs as root and is fine for inference, benchmarking, etc.
./build.sh

-# Build with specific framework
-./build.sh --framework trtllm
-
-# Build local development image
+# Build both the dev and local-dev images in one step. The dev image runs as root; the local-dev image runs as your local user (useful when mounting host directories) and includes extra development tools.
./build.sh --framework vllm --target local-dev

+# Build the TensorRT-LLM dev image, dynamo:latest-trtllm
+./build.sh --framework trtllm
+
# Build with custom tag
./build.sh --framework sglang --tag my-custom-tag

@@ -136,6 +136,23 @@ The `build.sh` script is responsible for building Docker images for different AI
./build.sh --build-arg CUSTOM_ARG=value
```

+### build.sh --dev-image - Local Development Image Builder
+
+The `build.sh --dev-image` option takes an existing dev image and builds a local-dev image from it, with local user permissions matching your host UID/GID. The result also includes extra developer utilities (debugging tools, text editors, system monitors, etc.).
+
+**Common Usage Examples:**
+
+```bash
+# Build local-dev image from dev image dynamo:latest-vllm
+./build.sh --dev-image dynamo:latest-vllm --framework vllm
+
+# Build with custom tag from dev image dynamo:latest-vllm
+./build.sh --dev-image dynamo:latest-vllm --framework vllm --tag my-local:dev
+
+# Dry run to see what would be built
+./build.sh --dev-image dynamo:latest-vllm --framework vllm --dry-run
+```
+
### run.sh - Container Runtime Manager

The `run.sh` script launches Docker containers with the appropriate configuration for development and inference workloads.
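To sanity-check the user mapping in a freshly built local-dev image, one option is a quick sketch like the following (assumes the default `dynamo:latest-vllm-local-dev` tag and that the image's entrypoint passes the command through):

```bash
# The ubuntu user inside the local-dev image should report your host UID/GID.
id -u; id -g                                     # host values
docker run --rm dynamo:latest-vllm-local-dev id  # expect matching uid/gid inside the container
```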
@@ -156,42 +173,39 @@ The `run.sh` script launches Docker containers with the appropriate configuratio

**Common Usage Examples:**

```bash
-# Basic container launch
-./run.sh
+# Basic container launch (inference/production)
+./run.sh --image dynamo:latest-vllm

-# Mount workspace for development
-./run.sh --mount-workspace
+# Mount workspace for development (use local-dev image for local user permissions)
+./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace

-# Use specific image and framework
-./run.sh --image dynamo:latest-vllm --framework vllm
+# Use specific image and framework for development
+./run.sh --image dynamo:v0.1.0.dev.08cc44965-vllm-local-dev --framework vllm --mount-workspace

-# Interactive shell with workspace mounted
-./run.sh --mount-workspace -it -- bash
+# Interactive development shell with workspace mounted
+./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -it -- bash

-# Run with custom environment variables
-./run.sh -e CUDA_VISIBLE_DEVICES=0,1 --mount-workspace
+# Development with custom environment variables
+./run.sh --image dynamo:latest-vllm-local-dev -e CUDA_VISIBLE_DEVICES=0,1 --mount-workspace

-# Run without GPU access
-./run.sh --gpus none
+# Production inference without GPU access
+./run.sh --image dynamo:latest-vllm --gpus none

# Dry run to see docker command
./run.sh --dry-run

-# Run with custom volume mounts
-./run.sh -v /host/path:/container/path --mount-workspace
-
-# Launch with specific container name
-./run.sh --name my-dynamo-container --mount-workspace
+# Development with custom volume mounts
+./run.sh --image dynamo:latest-vllm-local-dev -v /host/path:/container/path --mount-workspace
```

## Workflow Examples

### Development Workflow
```bash
-# 1. Build development image
+# 1. Build local-dev image (creates both dynamo:latest-vllm and dynamo:latest-vllm-local-dev)
./build.sh --framework vllm --target local-dev

-# 2. Run development container
+# 2. Run development container using the local-dev image
./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -it

# 3. Inside container, run inference (requires both frontend and backend)
@@ -208,7 +222,7 @@ python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.50 &

./build.sh --framework vllm --release-build

# 2. Run production container (use the root-based dev image here, not local-dev)
./run.sh --image dynamo:latest-vllm --gpus all
```

### Testing Workflow
@@ -216,6 +230,6 @@
# 1. Build with no cache for clean build
./build.sh --framework vllm --no-cache

-# 2. Test container functionality
+# 2.
Test container functionality (--image defaults to dynamo:latest-vllm) ./run.sh --mount-workspace -it -- python -m pytest tests/ ``` diff --git a/container/build.sh b/container/build.sh index a095db4c9aa..2ec001bfb9b 100755 --- a/container/build.sh +++ b/container/build.sh @@ -204,6 +204,7 @@ get_options() { fi ;; --base-image) + # Note: --base-image cannot be used with --dev-image if [ "$2" ]; then BASE_IMAGE=$2 shift @@ -227,6 +228,30 @@ get_options() { missing_requirement "$1" fi ;; + --dev-image) + if [ "$2" ]; then + DEV_IMAGE_INPUT=$2 + shift + else + missing_requirement "$1" + fi + ;; + --uid) + if [ "$2" ]; then + CUSTOM_UID=$2 + shift + else + missing_requirement "$1" + fi + ;; + --gid) + if [ "$2" ]; then + CUSTOM_GID=$2 + shift + else + missing_requirement "$1" + fi + ;; --build-arg) if [ "$2" ]; then BUILD_ARGS+="--build-arg $2 " @@ -320,6 +345,23 @@ get_options() { shift done + # Validate argument combinations + if [[ -n "${DEV_IMAGE_INPUT:-}" && -n "${BASE_IMAGE:-}" ]]; then + error "ERROR: --dev-image cannot be used with --base-image. Use --dev-image to build from existing images or --base-image to build new images." + fi + + # Validate that --target and --dev-image cannot be used together + if [[ -n "${DEV_IMAGE_INPUT:-}" && -n "${TARGET:-}" ]]; then + error "ERROR: --target cannot be used with --dev-image. Use --target to build from scratch or --dev-image to build from existing images." + fi + + # Validate that --uid and --gid are only used with local-dev related options + if [[ -n "${CUSTOM_UID:-}" || -n "${CUSTOM_GID:-}" ]]; then + if [[ -z "${DEV_IMAGE_INPUT:-}" && "${TARGET:-}" != "local-dev" ]]; then + error "ERROR: --uid and --gid can only be used with --dev-image or --target local-dev" + fi + fi + if [ -z "$FRAMEWORK" ]; then FRAMEWORK=$DEFAULT_FRAMEWORK fi @@ -352,7 +394,7 @@ get_options() { if [ -z "$TAG" ]; then TAG="--tag dynamo:${VERSION}-${FRAMEWORK,,}" - if [ -n "${TARGET}" ]; then + if [ -n "${TARGET}" ] && [ "${TARGET}" != "local-dev" ]; then TAG="${TAG}-${TARGET}" fi fi @@ -419,6 +461,9 @@ show_help() { echo " [--cache-from cache location to start from]" echo " [--cache-to location where to cache the build output]" echo " [--tag tag for image]" + echo " [--dev-image dev image to build local-dev from]" + echo " [--uid user ID for local-dev images (only with --dev-image or --target local-dev)]" + echo " [--gid group ID for local-dev images (only with --dev-image or --target local-dev)]" echo " [--no-cache disable docker build cache]" echo " [--dry-run print docker commands without running]" echo " [--build-context name=path to add build context]" @@ -468,8 +513,73 @@ fi # Add NIXL_REF as a build argument BUILD_ARGS+=" --build-arg NIXL_REF=${NIXL_REF} " +# Function to build local-dev image with header +build_local_dev_with_header() { + local dev_base_image="$1" + local tags="$2" + local success_msg="$3" + local header_title="$4" + + echo "======================================" + echo "$header_title" + echo "======================================" + + # Get user info right before using it + USER_UID=${CUSTOM_UID:-$(id -u)} + USER_GID=${CUSTOM_GID:-$(id -g)} + + # Set up dockerfile path + DOCKERFILE_LOCAL_DEV="${SOURCE_DIR}/Dockerfile.local_dev" + + if [[ ! 
-f "$DOCKERFILE_LOCAL_DEV" ]]; then + echo "ERROR: Dockerfile.local_dev not found at: $DOCKERFILE_LOCAL_DEV" + exit 1 + fi + + echo "Building new local-dev image from: $dev_base_image" + echo "User 'ubuntu' will have UID: $USER_UID, GID: $USER_GID" + + # Show the docker command being executed if not in dry-run mode + if [ -z "$RUN_PREFIX" ]; then + set -x + fi + + $RUN_PREFIX docker build \ + --build-arg DEV_BASE="$dev_base_image" \ + --build-arg USER_UID="$USER_UID" \ + --build-arg USER_GID="$USER_GID" \ + --build-arg ARCH="$ARCH" \ + --file "$DOCKERFILE_LOCAL_DEV" \ + $tags \ + "$SOURCE_DIR" || { + { set +x; } 2>/dev/null + echo "ERROR: Failed to build local_dev image" + exit 1 + } + + { set +x; } 2>/dev/null + echo "$success_msg" + + # Show usage instructions + echo "" + echo "To run the local-dev image as the local user ($USER_UID/$USER_GID):" + # Extract the last tag from the tags string + last_tag=$(echo "$tags" | grep -o -- '--tag [^ ]*' | tail -1 | cut -d' ' -f2) + # Calculate relative path to run.sh from current working directory + # Get the directory where build.sh is located + build_dir="$(dirname "${BASH_SOURCE[0]}")" + # Get the absolute path to run.sh (in the same directory as build.sh) + run_abs_path="$(realpath "$build_dir/run.sh")" + # Calculate relative path from current PWD to run.sh + run_path="$(python3 -c "import os; print(os.path.relpath('$run_abs_path', '$PWD'))")" + echo " $run_path --image $last_tag --mount-workspace ..." +} + + +# Handle local-dev target if [[ $TARGET == "local-dev" ]]; then - BUILD_ARGS+=" --build-arg USER_UID=$(id -u) --build-arg USER_GID=$(id -g) " + LOCAL_DEV_BUILD=true + TARGET_STR="--target dev" fi # BUILD DEV IMAGE @@ -607,7 +717,7 @@ if [ "$USE_SCCACHE" = true ]; then fi LATEST_TAG="--tag dynamo:latest-${FRAMEWORK,,}" -if [ -n "${TARGET}" ]; then +if [ -n "${TARGET}" ] && [ "${TARGET}" != "local-dev" ]; then LATEST_TAG="${LATEST_TAG}-${TARGET}" fi @@ -617,24 +727,69 @@ if [ -z "$RUN_PREFIX" ]; then set -x fi -# TODO: Follow 2-step build process for all frameworks once necessary changes are made to the sglang and TRT-LLM backend Dockerfiles. -if [[ $FRAMEWORK == "VLLM" ]] || [[ $FRAMEWORK == "SGLANG" ]]; then - # Define base image tag before using it - DYNAMO_BASE_IMAGE="dynamo-base:${VERSION}" - # Start base image build - echo "======================================" - echo "Starting Build 1: Base Image" - echo "======================================" - $RUN_PREFIX docker build -f "${SOURCE_DIR}/Dockerfile" --target dev $PLATFORM $BUILD_ARGS $CACHE_FROM $CACHE_TO --tag $DYNAMO_BASE_IMAGE $BUILD_CONTEXT_ARG $BUILD_CONTEXT $NO_CACHE - # Start framework build - echo "======================================" - echo "Starting Build 2: Framework Image" - echo "======================================" - BUILD_ARGS+=" --build-arg DYNAMO_BASE_IMAGE=${DYNAMO_BASE_IMAGE}" - $RUN_PREFIX docker build -f $DOCKERFILE $TARGET_STR $PLATFORM $BUILD_ARGS $CACHE_FROM $CACHE_TO $TAG $LATEST_TAG $BUILD_CONTEXT_ARG $BUILD_CONTEXT $NO_CACHE -else - $RUN_PREFIX docker build -f $DOCKERFILE $TARGET_STR $PLATFORM $BUILD_ARGS $CACHE_FROM $CACHE_TO $TAG $LATEST_TAG $BUILD_CONTEXT_ARG $BUILD_CONTEXT $NO_CACHE + +# Skip Build 1 and Build 2 if DEV_IMAGE_INPUT is set (we'll handle it at the bottom) +if [[ -z "${DEV_IMAGE_INPUT:-}" ]]; then + # TODO: Follow 2-step build process for all frameworks once necessary changes are made to the sglang and TRT-LLM backend Dockerfiles. 
+ if [[ $FRAMEWORK == "VLLM" ]] || [[ $FRAMEWORK == "SGLANG" ]]; then + # Define base image tag before using it + DYNAMO_BASE_IMAGE="dynamo-base:${VERSION}" + # Start base image build + echo "======================================" + echo "Starting Build 1: Base Image" + echo "======================================" + $RUN_PREFIX docker build -f "${SOURCE_DIR}/Dockerfile" --target dev $PLATFORM $BUILD_ARGS $CACHE_FROM $CACHE_TO --tag $DYNAMO_BASE_IMAGE $BUILD_CONTEXT_ARG $BUILD_CONTEXT $NO_CACHE + # Start framework build + echo "======================================" + echo "Starting Build 2: Framework Image" + echo "======================================" + BUILD_ARGS+=" --build-arg DYNAMO_BASE_IMAGE=${DYNAMO_BASE_IMAGE}" + $RUN_PREFIX docker build -f $DOCKERFILE $TARGET_STR $PLATFORM $BUILD_ARGS $CACHE_FROM $CACHE_TO $TAG $LATEST_TAG $BUILD_CONTEXT_ARG $BUILD_CONTEXT $NO_CACHE + else + $RUN_PREFIX docker build -f $DOCKERFILE $TARGET_STR $PLATFORM $BUILD_ARGS $CACHE_FROM $CACHE_TO $TAG $LATEST_TAG $BUILD_CONTEXT_ARG $BUILD_CONTEXT $NO_CACHE + fi fi +# Handle --dev-image option (build local-dev from existing dev image) +if [[ -n "${DEV_IMAGE_INPUT:-}" ]]; then + # Validate that the dev image is not already a local-dev image + if [[ "$DEV_IMAGE_INPUT" == *"-local-dev" ]]; then + echo "ERROR: Cannot use local-dev image as dev image input: '$DEV_IMAGE_INPUT'" + exit 1 + fi + + # Build tag arguments - always add -local-dev suffix for --dev-image + # Generate local-dev tag from input image + if [[ "$DEV_IMAGE_INPUT" == *:* ]]; then + LOCAL_DEV_TAG="--tag ${DEV_IMAGE_INPUT}-local-dev" + else + LOCAL_DEV_TAG="--tag ${DEV_IMAGE_INPUT}:latest-local-dev" + fi + + build_local_dev_with_header "$DEV_IMAGE_INPUT" "$LOCAL_DEV_TAG" "Successfully built local-dev image: ${LOCAL_DEV_TAG#--tag }" "Building Local-Dev Image" +elif [[ "${LOCAL_DEV_BUILD:-}" == "true" ]]; then + # Use the first tag name (TAG) if available, otherwise use latest + if [[ -n "$TAG" ]]; then + DEV_IMAGE=$(echo "$TAG" | sed 's/--tag //' | sed 's/-local-dev$//') + else + DEV_IMAGE="dynamo:latest-${FRAMEWORK,,}" + fi + + # Build local-dev tags from existing tags + LOCAL_DEV_TAGS="" + if [[ -n "$TAG" ]]; then + # Extract tag name, remove any existing -local-dev suffix, then add -local-dev + TAG_NAME=$(echo "$TAG" | sed 's/--tag //' | sed 's/-local-dev$//') + LOCAL_DEV_TAGS+=" --tag ${TAG_NAME}-local-dev" + fi + + if [[ -n "$LATEST_TAG" ]]; then + # Extract tag name, remove any existing -local-dev suffix, then add -local-dev + LATEST_TAG_NAME=$(echo "$LATEST_TAG" | sed 's/--tag //' | sed 's/-local-dev$//') + LOCAL_DEV_TAGS+=" --tag ${LATEST_TAG_NAME}-local-dev" + fi + + build_local_dev_with_header "$DEV_IMAGE" "$LOCAL_DEV_TAGS" "Successfully built local-dev images" "Starting Build 3: Local-Dev Image" +fi { set +x; } 2>/dev/null
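For reference, the tag derivation in the `--dev-image` path reduces to a small rule. A standalone sketch of the same logic (hypothetical helper name, not part of build.sh):

```bash
# Mirrors the logic above: reject images already tagged -local-dev, append
# -local-dev to tagged images, and use :latest-local-dev for untagged ones.
derive_local_dev_tag() {
    local input="$1"
    if [[ "$input" == *"-local-dev" ]]; then
        echo "ERROR: '$input' is already a local-dev image" >&2
        return 1
    elif [[ "$input" == *:* ]]; then
        echo "${input}-local-dev"
    else
        echo "${input}:latest-local-dev"
    fi
}

derive_local_dev_tag dynamo:latest-vllm   # -> dynamo:latest-vllm-local-dev
derive_local_dev_tag my-dev-image         # -> my-dev-image:latest-local-dev
```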