
docker: create a unified container with llama-server, whisper.cpp and stable-diffusion.cpp#589

Closed
mostlygeek wants to merge 4 commits into `main` from `llama-swap-containers`

Conversation

@mostlygeek
Owner

@mostlygeek mostlygeek commented Mar 18, 2026

The current llama-swap container is essentially ggml-org/llama.cpp's container with the llama-swap binary injected. This PR creates an experimental image for CUDA and Vulkan with llama-server, whisper.cpp and stable-diffusion.cpp built in.

Build llama-swap binary from local source code instead of downloading
from GitHub releases. This ensures the container uses the exact code
in the repository.

- Add golang:1.25-alpine builder stage to compile llama-swap
- Generate version from git hash with +dirty suffix for unstaged changes
- Update build-image.sh to use repository root as build context
- Remove LLAMA_SWAP_VERSION environment variable and related code
- Add test-binaries.sh to Dockerfile.vulkan for consistency

Both CUDA and Vulkan Dockerfiles now build llama-swap from source.
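The "+dirty suffix" versioning described above can be sketched as a small shell helper. This is an illustrative sketch, not code from the PR; `make_version` is a hypothetical function name, and the sample hash is made up:

```shell
#!/bin/sh
# Sketch of the version derivation in the commit message: start from the
# short git hash and append "+dirty" when the working tree has
# uncommitted (unstaged or staged) changes.
make_version() {
    # $1: short commit hash; $2: output of `git status --porcelain`
    ver="$1"
    if [ -n "$2" ]; then
        ver="${ver}+dirty"
    fi
    printf '%s\n' "$ver"
}

# In a Docker build stage this would be driven by real git output, e.g.:
#   make_version "$(git rev-parse --short HEAD)" "$(git status --porcelain)"
make_version "74c625b" ""             # clean tree
make_version "74c625b" " M Makefile"  # tree with local changes
```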
@coderabbitai

coderabbitai bot commented Mar 18, 2026

Walkthrough

Modified Makefile to disable Linux arm64 builds. Added Docker infrastructure including multi-stage Dockerfiles for CUDA and Vulkan backends, build automation scripts, operational guidelines, and configuration file updates for Docker artifact management.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Build Configuration**<br>`Makefile` | Disabled the Linux arm64 build step by converting the build command into a comment, leaving the mac arm64 and linux amd64 builds unchanged. |
| **Docker Configuration**<br>`docker/.gitignore`, `docker/AGENTS.md` | Added BuildKit configuration ignore entries and operational documentation covering DOCKER_BUILDKIT caching, notification procedures via pushover.sh, and a minimum 2-hour timeout requirement for CUDA-based builds. |
| **Multi-Stage Dockerfiles**<br>`docker/Dockerfile.cuda`, `docker/Dockerfile.vulkan` | Introduced multi-stage Dockerfiles orchestrating independent builds of llama.cpp, whisper.cpp, stable-diffusion.cpp, and llama-swap with per-project cache mounts, CUDA/Vulkan SDK integration, and runtime verification checks. |
| **Build and Test Infrastructure**<br>`docker/build-image.sh`, `docker/test-binaries.sh` | Added a build orchestration script supporting CUDA/Vulkan backend selection, dynamic commit hash detection, custom BuildKit configuration, and binary validation, plus a complementary test script for CUDA runtime verification with GPU detection and library path management. |
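The per-project cache mounts mentioned above use BuildKit's `RUN --mount=type=cache` feature. A minimal sketch of the pattern follows; the stage name, paths, and cache id are illustrative, not copied from the actual Dockerfiles:

```dockerfile
# syntax=docker/dockerfile:1
# Hypothetical excerpt illustrating the cache-mount pattern; requires
# BuildKit (DOCKER_BUILDKIT=1). Stage names and paths are made up.
FROM ubuntu:22.04 AS llama-build
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake git ca-certificates
RUN git clone --depth 1 https://github.com/ggml-org/llama.cpp /src/llama.cpp
WORKDIR /src/llama.cpp
# A distinct id= per project keeps the llama.cpp, whisper.cpp and
# stable-diffusion.cpp build caches from evicting one another.
RUN --mount=type=cache,target=/src/llama.cpp/build,id=llama-cpp-build \
    cmake -B build && \
    cmake --build build -j"$(nproc)" --target llama-server && \
    mkdir -p /install/bin && \
    cp build/bin/llama-server /install/bin/
```

Because the cache mount only exists during the `RUN`, artifacts that must survive into later stages are copied out to `/install` within the same instruction.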

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main objective: creating a unified container with three components (llama-server, whisper.cpp, and stable-diffusion.cpp), which matches the PR's core changes. |
| Description check | ✅ Passed | The description directly relates to the changeset, explaining the motivation for replacing the previous approach with a new unified image for CUDA and Vulkan backends. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch llama-swap-containers
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can make CodeRabbit's review stricter and more nitpicky using the `assertive` profile, if that's what you prefer.

Change the `reviews.profile` setting to `assertive` to make CodeRabbit nitpick more issues in your PRs.
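For reference, that setting lives in the repository's `.coderabbit.yaml` (the config file named in this run's review info). A minimal sketch, assuming the standard CodeRabbit config layout:

```yaml
# .coderabbit.yaml — only the relevant key shown
reviews:
  profile: assertive  # this run used the default "chill" profile
```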


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (5)
docker/test-binaries.sh (1)

62-102: Consider testing all binaries that build-image.sh verifies.

The script tests only 3 server binaries, while build-image.sh (lines 240-244) verifies 7 binaries including llama-cli, whisper-cli, sd-cli, and llama-swap. For consistency, consider extending this script to cover all binaries, or document why the scope differs.

♻️ Proposed addition for cli binaries and llama-swap
 # Test sd-server (stable-diffusion)
 print_info "Testing sd-server..."
 if command -v sd-server &> /dev/null; then
     if sd-server --help > /dev/null 2>&1 || sd-server -h > /dev/null 2>&1; then
         print_info "✓ sd-server: OK"
     else
         print_error "✗ sd-server: Failed to run"
         exit 1
     fi
 else
     print_error "✗ sd-server: Not found in PATH"
     exit 1
 fi

+# Test CLI binaries (quick existence check)
+for cli_binary in llama-cli whisper-cli sd-cli llama-swap; do
+    print_info "Testing ${cli_binary}..."
+    if command -v "${cli_binary}" &> /dev/null; then
+        if "${cli_binary}" --help > /dev/null 2>&1 || "${cli_binary}" -h > /dev/null 2>&1 || "${cli_binary}" --version > /dev/null 2>&1; then
+            print_info "✓ ${cli_binary}: OK"
+        else
+            print_error "✗ ${cli_binary}: Failed to run"
+            exit 1
+        fi
+    else
+        print_error "✗ ${cli_binary}: Not found in PATH"
+        exit 1
+    fi
+done
+
 print_info "All binary tests passed!"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/test-binaries.sh` around lines 62 - 102, The test script currently
checks only llama-server, whisper-server, and sd-server; extend
docker/test-binaries.sh to also verify the CLI binaries and swap tool that
build-image.sh checks by adding the same presence-and-help invocation pattern
for llama-cli, whisper-cli, sd-cli, and llama-swap (use command -v <binary> and
run "<binary> --help" or "<binary> -h"), following the existing blocks for
llama-server/whisper-server/sd-server so failures print the same print_error
messages and exit 1; reference the existing test blocks for llama-server,
whisper-server, and sd-server as templates to implement the new checks.
Makefile (1)

54-54: Commented-out arm64 build is acceptable but consider adding a comment explaining why.

The arm64 build is still supported via GoReleaser (.goreleaser.yaml includes linux+arm64), so automated releases remain unaffected. Consider adding a brief comment explaining the reasoning (e.g., Docker focus on amd64) to prevent future confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` at line 54, Add a short explanatory comment above the commented-out
arm64 build line (the line starting with "#GOOS=linux GOARCH=arm64 go build
...") clarifying why it's disabled locally—for example noting that releases
still include linux/arm64 via GoReleaser (.goreleaser.yaml) and local
builds/Docker focus on amd64—to avoid future confusion; update the Makefile near
that commented GOOS/GOARCH line to include this one-line rationale.
docker/Dockerfile.vulkan (2)

183-189: Consider adding a non-root user for improved security posture.

The container runs as root by default. While GPU device access often requires specific permissions, adding an optional non-root user would improve security for deployments that can accommodate it.

🛡️ Proposed addition for non-root user support
 FROM ubuntu:22.04 AS runtime

 ARG LLAMA_COMMIT_HASH=unknown
 ARG WHISPER_COMMIT_HASH=unknown
 ARG SD_COMMIT_HASH=unknown
+ARG UID=0
+ARG GID=0
 ENV DEBIAN_FRONTEND=noninteractive
 ENV PATH="/usr/local/bin:${PATH}"

Then near the end, before WORKDIR /models:

+# Create non-root user if UID != 0
+RUN if [ "$UID" != "0" ]; then \
+        groupadd -g "$GID" llama 2>/dev/null || true && \
+        useradd -u "$UID" -g "$GID" -m llama && \
+        chown -R llama:llama /models; \
+    fi
+USER ${UID}:${GID}
+
 WORKDIR /models
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/Dockerfile.vulkan` around lines 183 - 189, Add an optional non-root
user in the runtime stage by creating a user/group (e.g., llamauser), setting
its home, chowning any runtime-owned dirs (notably /models) and switching to
that user with USER before the WORKDIR /models line; make this conditional via a
build ARG (e.g., CREATE_NONROOT=true) so GPU/permission-sensitive deployments
can opt out, and ensure any files, binaries in /usr/local/bin and installed
runtime assets are owned or readable by that user (adjust RUN steps that install
runtime dependencies to chown where necessary).

27-28: Consider embedding version info via ldflags for consistency with the Makefile build.

The Makefile embeds version info using -ldflags="-X main.commit=... -X main.version=...", but this build uses a plain go build. While the version file approach works, llama-swap --version (line 252) may not display useful info.

♻️ Proposed fix to embed version
 ENV CGO_ENABLED=0 GOOS=linux GOARCH=amd64
-RUN go build -o llama-swap .
+
+# Get version info first
+RUN VERSION=$(git rev-parse --short HEAD 2>/dev/null || echo "local") && \
+    if [ -n "$(git status --porcelain 2>/dev/null)" ]; then VERSION="${VERSION}+dirty"; fi && \
+    BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ") && \
+    go build -ldflags="-X main.commit=${VERSION} -X main.version=local_${VERSION} -X main.date=${BUILD_DATE}" -o llama-swap .
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/Dockerfile.vulkan` around lines 27 - 28, The Dockerfile.vulkan build
step currently runs RUN go build -o llama-swap . without embedding version info;
change the build invocation used in the RUN step (the RUN go build -o llama-swap
. command) to pass the same ldflags used in the Makefile (set -ldflags with -X
main.version=... and -X main.commit=... and any other version variables) so the
binary exposes correct values consumed by the --version code paths
(main.version/main.commit); ensure the values are populated from the same
build-time variables used elsewhere (e.g., environment or ARG) so Docker build
mirrors the Makefile behavior.
docker/Dockerfile.cuda (1)

182-186: Consider adding a non-root user for improved security posture.

Same concern as Dockerfile.vulkan—the container runs as root by default. Adding optional non-root user support would improve security.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/Dockerfile.cuda` around lines 182 - 186, The Dockerfile.runtime stage
(FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04 AS runtime) runs as root; add an
optional non-root user and switch to it: introduce build ARGs (e.g., USER_ID,
GROUP_ID, USER_NAME), create a group and user, set HOME, create/apply ownership
to runtime directories used by the image (e.g., /usr/local/bin, any app or
workspace dir), and then use USER to switch from root to that user before the
final image. Ensure you preserve the existing ENV lines (DEBIAN_FRONTEND, PATH,
LD_LIBRARY_PATH) and make chown operations deterministic so the container can
still run with GPU access while improving security.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

In `@docker/Dockerfile.vulkan`:
- Line 223: The Dockerfile uses COPY --from=llama-build /install/examples/
/opt/llama.cpp/examples/ but the llama-build stage only creates /install/bin and
/install/lib, so the COPY will fail; either remove this COPY or ensure the
llama-build stage creates /install/examples (e.g., create the directory and copy
examples into it) before the final COPY; locate the instruction in the
Dockerfile referencing the llama-build stage and update the stage (or the COPY
line) accordingly so /install/examples exists when COPY --from=llama-build
/install/examples/ is executed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 728340de-811d-43fc-8af7-ab070844d98c

📥 Commits

Reviewing files that changed from the base of the PR and between c3c258a and 74c625b.

📒 Files selected for processing (7)
  • Makefile
  • docker/.gitignore
  • docker/AGENTS.md
  • docker/Dockerfile.cuda
  • docker/Dockerfile.vulkan
  • docker/build-image.sh
  • docker/test-binaries.sh

Comment on lines +3 to +4
- use DOCKER_BUILDKIT=1 docker build -t llama-swap:optimized
- DOCKER_BUILDKIT=1 is important to use the caching

⚠️ Potential issue | 🟡 Minor

Fix indentation and clarify the Docker build command.

The sub-item indentation is inconsistent (flagged by markdownlint). Also, the example command is incomplete—it should include a backend tag (:cuda or :vulkan) and the Dockerfile path.

📝 Proposed fix
-- use DOCKER_BUILDKIT=1 docker build -t llama-swap:optimized
-  - DOCKER_BUILDKIT=1 is important to use the caching
+- use `DOCKER_BUILDKIT=1 docker build -t llama-swap:cuda -f docker/Dockerfile.cuda .`
+  - `DOCKER_BUILDKIT=1` is important for caching
+  - Or use `./docker/build-image.sh --cuda` which handles this automatically
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 4-4: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1

(MD005, list-indent)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/AGENTS.md` around lines 3 - 4, Fix the Markdown list indentation and
update the example Docker build command so it is complete and clear: correct the
sub-item indentation under the list entry and replace the incomplete command
with a full example that includes DOCKER_BUILDKIT=1, the image name with a
backend tag (e.g., :cuda or :vulkan) and the Dockerfile path (e.g., using -f
Dockerfile.<backend> or specifying the Dockerfile location). Ensure the line
shows the full command form (setting DOCKER_BUILDKIT=1, docker build -t
llama-swap:<backend> -f <DockerfilePath> .) and update the explanatory text to
mention the required backend tag and Dockerfile path.

Comment on lines +117 to +128
get_default_branch() {
    local repo_url="$1"

    # Check for master first
    if git ls-remote --heads "${repo_url}" master &>/dev/null; then
        echo "master"
    elif git ls-remote --heads "${repo_url}" main &>/dev/null; then
        echo "main"
    else
        echo "master"  # fallback
    fi
}

⚠️ Potential issue | 🟡 Minor

get_default_branch always succeeds even when no branches are found.

The function checks if git ls-remote succeeds but doesn't verify that output is non-empty. If the repo URL is valid but has neither master nor main, it silently falls back to master, which may cause confusing failures later.

🔍 Proposed fix: validate branch existence
 get_default_branch() {
     local repo_url="$1"

-    # Check for master first
-    if git ls-remote --heads "${repo_url}" master &>/dev/null; then
+    # Check for master first (verify output is non-empty)
+    if git ls-remote --heads "${repo_url}" master 2>/dev/null | grep -q .; then
         echo "master"
-    elif git ls-remote --heads "${repo_url}" main &>/dev/null; then
+    elif git ls-remote --heads "${repo_url}" main 2>/dev/null | grep -q .; then
         echo "main"
     else
-        echo "master"  # fallback
+        echo "ERROR: Could not detect default branch for ${repo_url}" >&2
+        return 1
     fi
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/build-image.sh` around lines 117 - 128, get_default_branch currently
treats a successful git command as proof the branch exists and blindly falls
back to "master" even when neither branch exists; update get_default_branch to
capture the output of git ls-remote --heads "${repo_url}" <branch> for both
"master" and "main", check that the captured output is non-empty before echoing
the branch, and if neither check returns output, emit an error to stderr (or
return a non-zero status) instead of silently echoing "master"; refer to the
get_default_branch function and replace the existing if/elif/else checks with
output-validated checks and a proper failure path.

Comment on lines +204 to +221
if ! docker buildx inspect "$BUILDER_NAME" >/dev/null 2>&1; then
    echo "Creating custom buildx builder with max-parallelism=1..."

    # Create buildkitd.toml config file
    cat > buildkitd.toml << 'BUILDKIT_EOF'
[worker.oci]
  max-parallelism = 1
BUILDKIT_EOF

    # Create the builder with the config
    docker buildx create --name "$BUILDER_NAME" \
        --driver docker-container \
        --buildkitd-config buildkitd.toml \
        --use
else
    # Switch to our builder
    docker buildx use "$BUILDER_NAME"
fi

⚠️ Potential issue | 🟡 Minor

buildkitd.toml is created in the current working directory, not the script directory.

If the script is invoked from a different directory (e.g., ./docker/build-image.sh --cuda from repo root), the config file will be created in that directory rather than alongside the script. This could lead to orphaned files.

🔧 Proposed fix: use script directory for config
+    BUILDKIT_CONFIG="${SCRIPT_DIR}/buildkitd.toml"
+    
     # Create buildkitd.toml config file
-    cat > buildkitd.toml << 'BUILDKIT_EOF'
+    cat > "${BUILDKIT_CONFIG}" << 'BUILDKIT_EOF'
 [worker.oci]
   max-parallelism = 1
 BUILDKIT_EOF
     
     # Create the builder with the config
     docker buildx create --name "$BUILDER_NAME" \
         --driver docker-container \
-        --buildkitd-config buildkitd.toml \
+        --buildkitd-config "${BUILDKIT_CONFIG}" \
         --use
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/build-image.sh` around lines 204 - 221, The script creates
buildkitd.toml in the current working directory which can orphan files when the
script is run from elsewhere; change creation and use of buildkitd.toml to the
script's directory. Modify the block that writes buildkitd.toml (the here-doc
marker BUILDKIT_EOF and the filename buildkitd.toml) to write into the directory
of the running script (e.g., using the script dir variable like DIR="$(cd
"$(dirname "${BASH_SOURCE[0]}")" && pwd)"), then reference that path when
calling docker buildx create --buildkitd-config and when cleaning up; ensure
BUILDER_NAME and docker buildx use logic remain unchanged but operate against
the config file located next to the script.

Comment on lines +252 to +254
echo "This usually indicates a build stage failure. Try running with --no-cache flag:"
echo " ./build-image.sh --vulkan --no-cache"
exit 1

⚠️ Potential issue | 🟡 Minor

Error message suggests --vulkan regardless of actual backend.

If a CUDA build fails verification, the message incorrectly suggests --vulkan --no-cache.

🐛 Proposed fix
     echo ""
     echo "This usually indicates a build stage failure. Try running with --no-cache flag:"
-    echo "  ./build-image.sh --vulkan --no-cache"
+    echo "  ./build-image.sh --${BACKEND} --no-cache"
     exit 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/build-image.sh` around lines 252 - 254, The failure message always
suggests "./build-image.sh --vulkan --no-cache" regardless of selected backend;
update the echo lines that print the recovery suggestion so they reference the
actual backend flag instead of hardcoding "--vulkan" (e.g., build the suggestion
from the script's backend variable/flag such as BACKEND or BUILD_BACKEND and
print "./build-image.sh --${BACKEND} --no-cache" or conditionally choose
"--cuda" vs "--vulkan"), and ensure the existing "--no-cache" advice remains.

Comment on lines +127 to +128
cp -v build/bin/whisper-cli build/bin/whisper-server /install/bin/ 2>/dev/null || true && \
find build -name "*.so*" -type f -exec cp -v {} /install/lib/ \;

⚠️ Potential issue | 🟡 Minor

Silent failure on binary copy may mask build issues.

The 2>/dev/null || true pattern suppresses errors if binaries don't exist. This could mask build failures, leading to a runtime image without required binaries.

🔍 Proposed fix: fail explicitly if binaries missing
-    cp -v build/bin/whisper-cli build/bin/whisper-server /install/bin/ 2>/dev/null || true && \
+    cp -v build/bin/whisper-cli build/bin/whisper-server /install/bin/ && \

Alternatively, add explicit verification:

+    test -f /install/bin/whisper-cli && test -f /install/bin/whisper-server && \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/Dockerfile.cuda` around lines 127 - 128, The Dockerfile currently
silences failures when copying binaries by using the "cp ... 2>/dev/null ||
true" pattern which can hide missing build artifacts; update the copy step that
references build/bin/whisper-cli and build/bin/whisper-server to remove the
"2>/dev/null || true" suppression and instead let the COPY/CP fail the build, or
add an explicit verification after the copy (check for existence of
build/bin/whisper-cli and build/bin/whisper-server or test -f) and exit non‑zero
if they are missing so the image build fails fast when required binaries are
absent.

Comment on lines +230 to +235
# Write version information to /versions.txt
#RUN echo "llama.cpp: $(/usr/local/bin/llama-server --version 2>&1 | grep version) > /versions.txt && \
# echo "whisper.cpp: ${WHISPER_COMMIT_HASH}" >> /versions.txt && \
# echo "stable-diffusion.cpp: $(/usr/local/bin/sd-server --version)" >> /versions.txt && \
# echo "llama-swap: $(/usr/local/bin/llama-swap -version)" >> /versions.txt && \
# echo "build_timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> /versions.txt

⚠️ Potential issue | 🟠 Major

Uncomment version info to match Vulkan Dockerfile and fix build-image.sh integration.

The build-image.sh script (line 270) reads /versions.txt to display the llama-swap version. With this commented out, CUDA builds will fail at that step.

🐛 Proposed fix: enable version info
-# Write version information to /versions.txt
-#RUN echo "llama.cpp: $(/usr/local/bin/llama-server --version 2>&1 | grep version) > /versions.txt && \
-#    echo "whisper.cpp: ${WHISPER_COMMIT_HASH}" >> /versions.txt && \
-#    echo "stable-diffusion.cpp: $(/usr/local/bin/sd-server --version)" >> /versions.txt && \
-#    echo "llama-swap: $(/usr/local/bin/llama-swap -version)" >> /versions.txt && \
-#    echo "build_timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> /versions.txt
+# Write version information to /versions.txt
+ARG LLAMA_COMMIT_HASH=unknown
+ARG WHISPER_COMMIT_HASH=unknown
+ARG SD_COMMIT_HASH=unknown
+RUN echo "llama.cpp: ${LLAMA_COMMIT_HASH}" > /versions.txt && \
+    echo "whisper.cpp: ${WHISPER_COMMIT_HASH}" >> /versions.txt && \
+    echo "stable-diffusion.cpp: ${SD_COMMIT_HASH}" >> /versions.txt && \
+    echo "llama-swap: local" >> /versions.txt && \
+    echo "backend: CUDA" >> /versions.txt && \
+    echo "build_timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> /versions.txt

Note: ARG values do not carry across build stages, so these ARGs must be re-declared in the runtime stage to be available there.
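A shell sketch of the versions.txt generation, with plain variables standing in for the build ARGs (the `unknown` defaults mirror the suggested `ARG ...=unknown` declarations; no real commit hashes are used):

```shell
# Simulate the RUN step: each variable defaults to "unknown" when no
# value is passed, matching the suggested Dockerfile ARG defaults.
LLAMA_COMMIT_HASH=${LLAMA_COMMIT_HASH:-unknown}
WHISPER_COMMIT_HASH=${WHISPER_COMMIT_HASH:-unknown}
SD_COMMIT_HASH=${SD_COMMIT_HASH:-unknown}
{
    echo "llama.cpp: ${LLAMA_COMMIT_HASH}"
    echo "whisper.cpp: ${WHISPER_COMMIT_HASH}"
    echo "stable-diffusion.cpp: ${SD_COMMIT_HASH}"
    echo "llama-swap: local"
    echo "backend: CUDA"
    echo "build_timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
} > /tmp/versions.txt
cat /tmp/versions.txt
```

Writing the hashes from ARGs rather than invoking `--version` on each binary also avoids running GPU-linked executables during the build.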

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/Dockerfile.cuda` around lines 230 - 235, Uncomment and restore the
block that writes version information to /versions.txt so build-image.sh can
read llama-swap version; ensure the echo/subshell commands that call
/usr/local/bin/llama-server, sd-server, llama-swap and use
${WHISPER_COMMIT_HASH} are present in the runtime stage (not just build stage)
and that any ARGs (e.g., WHISPER_COMMIT_HASH) are declared in the runtime stage
so they are available there; re-enable the multi-echo block, preserving proper
quoting/escaping for subshell output and include build_timestamp via date -u to
match the Vulkan Dockerfile behavior.

COPY --from=llama-build /install/bin/llama-server /usr/local/bin/
COPY --from=llama-build /install/bin/llama-cli /usr/local/bin/
COPY --from=llama-build /install/lib/ /usr/local/lib/
COPY --from=llama-build /install/examples/ /opt/llama.cpp/examples/

⚠️ Potential issue | 🟠 Major

COPY will fail: /install/examples/ is not created in the llama-build stage.

The llama-build stage (lines 174-178) only creates /install/bin and /install/lib, but not /install/examples/. This COPY instruction will cause the build to fail.

🐛 Proposed fix: remove or conditionally copy examples
 COPY --from=llama-build /install/bin/llama-server /usr/local/bin/
 COPY --from=llama-build /install/bin/llama-cli /usr/local/bin/
 COPY --from=llama-build /install/lib/ /usr/local/lib/
-COPY --from=llama-build /install/examples/ /opt/llama.cpp/examples/

If examples are needed, add this to the llama-build stage before the COPY:

mkdir -p /install/examples && \
cp -r examples/* /install/examples/ 2>/dev/null || true
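A minimal demo of the guarded copy above, with directories under `/tmp/llama-demo` standing in for the llama.cpp checkout and the `/install` prefix used in the build stage:

```shell
# /tmp/llama-demo/src mimics the llama.cpp source tree;
# /tmp/llama-demo/install mimics the /install prefix of the build stage.
set -e
src=/tmp/llama-demo/src
install=/tmp/llama-demo/install
mkdir -p "$src/examples" "$install"
echo 'placeholder example' > "$src/examples/demo.txt"
cd "$src"
# The guarded pattern from the comment: tolerate an empty or missing
# examples directory without failing the build.
mkdir -p "$install/examples" && \
    cp -r examples/* "$install/examples/" 2>/dev/null || true
ls "$install/examples"
```

With the directory guaranteed to exist, the later `COPY --from=llama-build /install/examples/ ...` succeeds even when no examples were copied.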
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
COPY --from=llama-build /install/examples/ /opt/llama.cpp/examples/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/Dockerfile.vulkan` at line 223, The Dockerfile uses COPY
--from=llama-build /install/examples/ /opt/llama.cpp/examples/ but the
llama-build stage only creates /install/bin and /install/lib, so the COPY will
fail; either remove this COPY or ensure the llama-build stage creates
/install/examples (e.g., create the directory and copy examples into it) before
the final COPY; locate the instruction in the Dockerfile referencing the
llama-build stage and update the stage (or the COPY line) accordingly so
/install/examples exists when COPY --from=llama-build /install/examples/ is
executed.

@mostlygeek mostlygeek closed this Mar 21, 2026
@mostlygeek mostlygeek deleted the llama-swap-containers branch March 22, 2026 23:55