
docker: create a unified container with llama-server, whisper.cpp and stable-diffusion.cpp#589

Closed
mostlygeek wants to merge 4 commits into `main` from `llama-swap-containers`

Conversation

@mostlygeek
Owner

@mostlygeek mostlygeek commented Mar 18, 2026

The current llama-swap container is essentially ggml-org/llama.cpp's container with the llama-swap binary injected. This PR creates an experimental image for CUDA and Vulkan with llama-server, whisper.cpp and stable-diffusion.cpp built in.

Build llama-swap binary from local source code instead of downloading
from GitHub releases. This ensures the container uses the exact code
in the repository.

- Add golang:1.25-alpine builder stage to compile llama-swap
- Generate version from git hash with +dirty suffix for unstaged changes
- Update build-image.sh to use repository root as build context
- Remove LLAMA_SWAP_VERSION environment variable and related code
- Add test-binaries.sh to Dockerfile.vulkan for consistency

Both CUDA and Vulkan Dockerfiles now build llama-swap from source.
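The "+dirty suffix" versioning described above can be sketched as a small shell helper. This is an illustrative sketch, not code from the PR; `make_version` is a hypothetical function name, and the sample hash is made up:

```shell
#!/bin/sh
# Sketch of the version derivation in the commit message: start from the
# short git hash and append "+dirty" when the working tree has
# uncommitted (unstaged or staged) changes.
make_version() {
    # $1: short commit hash; $2: output of `git status --porcelain`
    ver="$1"
    if [ -n "$2" ]; then
        ver="${ver}+dirty"
    fi
    printf '%s\n' "$ver"
}

# In a Docker build stage this would be driven by real git output, e.g.:
#   make_version "$(git rev-parse --short HEAD)" "$(git status --porcelain)"
make_version "74c625b" ""             # clean tree
make_version "74c625b" " M Makefile"  # tree with local changes
```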
@coderabbitai

coderabbitai bot commented Mar 18, 2026

Walkthrough

Modified Makefile to disable Linux arm64 builds. Added Docker infrastructure including multi-stage Dockerfiles for CUDA and Vulkan backends, build automation scripts, operational guidelines, and configuration file updates for Docker artifact management.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Build Configuration**<br>`Makefile` | Disabled the Linux arm64 build step by converting the build command into a comment, leaving the mac arm64 and linux amd64 builds unchanged. |
| **Docker Configuration**<br>`docker/.gitignore`, `docker/AGENTS.md` | Added BuildKit configuration ignore entries and operational documentation covering DOCKER_BUILDKIT caching, notification procedures via pushover.sh, and a minimum 2-hour timeout requirement for CUDA-based builds. |
| **Multi-Stage Dockerfiles**<br>`docker/Dockerfile.cuda`, `docker/Dockerfile.vulkan` | Introduced multi-stage Dockerfiles orchestrating independent builds of llama.cpp, whisper.cpp, stable-diffusion.cpp, and llama-swap with per-project cache mounts, CUDA/Vulkan SDK integration, and runtime verification checks. |
| **Build and Test Infrastructure**<br>`docker/build-image.sh`, `docker/test-binaries.sh` | Added a build orchestration script supporting CUDA/Vulkan backend selection, dynamic commit hash detection, custom BuildKit configuration, and binary validation, plus a complementary test script for CUDA runtime verification with GPU detection and library path management. |
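The per-project cache mounts mentioned above use BuildKit's `RUN --mount=type=cache` feature. A minimal sketch of the pattern follows; the stage name, paths, and cache id are illustrative, not copied from the actual Dockerfiles:

```dockerfile
# syntax=docker/dockerfile:1
# Hypothetical excerpt illustrating the cache-mount pattern; requires
# BuildKit (DOCKER_BUILDKIT=1). Stage names and paths are made up.
FROM ubuntu:22.04 AS llama-build
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake git ca-certificates
RUN git clone --depth 1 https://github.com/ggml-org/llama.cpp /src/llama.cpp
WORKDIR /src/llama.cpp
# A distinct id= per project keeps the llama.cpp, whisper.cpp and
# stable-diffusion.cpp build caches from evicting one another.
RUN --mount=type=cache,target=/src/llama.cpp/build,id=llama-cpp-build \
    cmake -B build && \
    cmake --build build -j"$(nproc)" --target llama-server && \
    mkdir -p /install/bin && \
    cp build/bin/llama-server /install/bin/
```

Because the cache mount only exists during the `RUN`, artifacts that must survive into later stages are copied out to `/install` within the same instruction.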

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main objective: creating a unified container with three components (llama-server, whisper.cpp, and stable-diffusion.cpp), which matches the PR's core changes. |
| Description check | ✅ Passed | The description directly relates to the changeset, explaining the motivation for replacing the previous approach with a new unified image for CUDA and Vulkan backends. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch llama-swap-containers
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can make CodeRabbit's review stricter and more nitpicky using the `assertive` profile, if that's what you prefer.

Change the `reviews.profile` setting to `assertive` to make CodeRabbit nitpick more issues in your PRs.
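For reference, that setting lives in the repository's `.coderabbit.yaml` (the config file named in this run's review info). A minimal sketch, assuming the standard CodeRabbit config layout:

```yaml
# .coderabbit.yaml — only the relevant key shown
reviews:
  profile: assertive  # this run used the default "chill" profile
```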


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (5)
docker/test-binaries.sh (1)

62-102: Consider testing all binaries that build-image.sh verifies.

The script tests only 3 server binaries, while build-image.sh (lines 240-244) verifies 7 binaries including llama-cli, whisper-cli, sd-cli, and llama-swap. For consistency, consider extending this script to cover all binaries, or document why the scope differs.

♻️ Proposed addition for cli binaries and llama-swap
 # Test sd-server (stable-diffusion)
 print_info "Testing sd-server..."
 if command -v sd-server &> /dev/null; then
     if sd-server --help > /dev/null 2>&1 || sd-server -h > /dev/null 2>&1; then
         print_info "✓ sd-server: OK"
     else
         print_error "✗ sd-server: Failed to run"
         exit 1
     fi
 else
     print_error "✗ sd-server: Not found in PATH"
     exit 1
 fi

+# Test CLI binaries (quick existence check)
+for cli_binary in llama-cli whisper-cli sd-cli llama-swap; do
+    print_info "Testing ${cli_binary}..."
+    if command -v "${cli_binary}" &> /dev/null; then
+        if "${cli_binary}" --help > /dev/null 2>&1 || "${cli_binary}" -h > /dev/null 2>&1 || "${cli_binary}" --version > /dev/null 2>&1; then
+            print_info "✓ ${cli_binary}: OK"
+        else
+            print_error "✗ ${cli_binary}: Failed to run"
+            exit 1
+        fi
+    else
+        print_error "✗ ${cli_binary}: Not found in PATH"
+        exit 1
+    fi
+done
+
 print_info "All binary tests passed!"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/test-binaries.sh` around lines 62 - 102, The test script currently
checks only llama-server, whisper-server, and sd-server; extend
docker/test-binaries.sh to also verify the CLI binaries and swap tool that
build-image.sh checks by adding the same presence-and-help invocation pattern
for llama-cli, whisper-cli, sd-cli, and llama-swap (use command -v <binary> and
run "<binary> --help" or "<binary> -h"), following the existing blocks for
llama-server/whisper-server/sd-server so failures print the same print_error
messages and exit 1; reference the existing test blocks for llama-server,
whisper-server, and sd-server as templates to implement the new checks.
Makefile (1)

54-54: Commented-out arm64 build is acceptable but consider adding a comment explaining why.

The arm64 build is still supported via GoReleaser (.goreleaser.yaml includes linux+arm64), so automated releases remain unaffected. Consider adding a brief comment explaining the reasoning (e.g., Docker focus on amd64) to prevent future confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` at line 54, Add a short explanatory comment above the commented-out
arm64 build line (the line starting with "#GOOS=linux GOARCH=arm64 go build
...") clarifying why it's disabled locally—for example noting that releases
still include linux/arm64 via GoReleaser (.goreleaser.yaml) and local
builds/Docker focus on amd64—to avoid future confusion; update the Makefile near
that commented GOOS/GOARCH line to include this one-line rationale.
docker/Dockerfile.vulkan (2)

183-189: Consider adding a non-root user for improved security posture.

The container runs as root by default. While GPU device access often requires specific permissions, adding an optional non-root user would improve security for deployments that can accommodate it.

🛡️ Proposed addition for non-root user support
 FROM ubuntu:22.04 AS runtime

 ARG LLAMA_COMMIT_HASH=unknown
 ARG WHISPER_COMMIT_HASH=unknown
 ARG SD_COMMIT_HASH=unknown
+ARG UID=0
+ARG GID=0
 ENV DEBIAN_FRONTEND=noninteractive
 ENV PATH="/usr/local/bin:${PATH}"

Then near the end, before WORKDIR /models:

+# Create non-root user if UID != 0
+RUN if [ "$UID" != "0" ]; then \
+        groupadd -g "$GID" llama 2>/dev/null || true && \
+        useradd -u "$UID" -g "$GID" -m llama && \
+        chown -R llama:llama /models; \
+    fi
+USER ${UID}:${GID}
+
 WORKDIR /models
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/Dockerfile.vulkan` around lines 183 - 189, Add an optional non-root
user in the runtime stage by creating a user/group (e.g., llamauser), setting
its home, chowning any runtime-owned dirs (notably /models) and switching to
that user with USER before the WORKDIR /models line; make this conditional via a
build ARG (e.g., CREATE_NONROOT=true) so GPU/permission-sensitive deployments
can opt out, and ensure any files, binaries in /usr/local/bin and installed
runtime assets are owned or readable by that user (adjust RUN steps that install
runtime dependencies to chown where necessary).

27-28: Consider embedding version info via ldflags for consistency with the Makefile build.

The Makefile embeds version info using -ldflags="-X main.commit=... -X main.version=...", but this build uses a plain go build. While the version file approach works, llama-swap --version (line 252) may not display useful info.

♻️ Proposed fix to embed version
 ENV CGO_ENABLED=0 GOOS=linux GOARCH=amd64
-RUN go build -o llama-swap .
+
+# Get version info first
+RUN VERSION=$(git rev-parse --short HEAD 2>/dev/null || echo "local") && \
+    if [ -n "$(git status --porcelain 2>/dev/null)" ]; then VERSION="${VERSION}+dirty"; fi && \
+    BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ") && \
+    go build -ldflags="-X main.commit=${VERSION} -X main.version=local_${VERSION} -X main.date=${BUILD_DATE}" -o llama-swap .
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/Dockerfile.vulkan` around lines 27 - 28, The Dockerfile.vulkan build
step currently runs RUN go build -o llama-swap . without embedding version info;
change the build invocation used in the RUN step (the RUN go build -o llama-swap
. command) to pass the same ldflags used in the Makefile (set -ldflags with -X
main.version=... and -X main.commit=... and any other version variables) so the
binary exposes correct values consumed by the --version code paths
(main.version/main.commit); ensure the values are populated from the same
build-time variables used elsewhere (e.g., environment or ARG) so Docker build
mirrors the Makefile behavior.
docker/Dockerfile.cuda (1)

182-186: Consider adding a non-root user for improved security posture.

Same concern as Dockerfile.vulkan—the container runs as root by default. Adding optional non-root user support would improve security.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/Dockerfile.cuda` around lines 182 - 186, The Dockerfile.runtime stage
(FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04 AS runtime) runs as root; add an
optional non-root user and switch to it: introduce build ARGs (e.g., USER_ID,
GROUP_ID, USER_NAME), create a group and user, set HOME, create/apply ownership
to runtime directories used by the image (e.g., /usr/local/bin, any app or
workspace dir), and then use USER to switch from root to that user before the
final image. Ensure you preserve the existing ENV lines (DEBIAN_FRONTEND, PATH,
LD_LIBRARY_PATH) and make chown operations deterministic so the container can
still run with GPU access while improving security.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

In `@docker/Dockerfile.vulkan`:
- Line 223: The Dockerfile uses COPY --from=llama-build /install/examples/
/opt/llama.cpp/examples/ but the llama-build stage only creates /install/bin and
/install/lib, so the COPY will fail; either remove this COPY or ensure the
llama-build stage creates /install/examples (e.g., create the directory and copy
examples into it) before the final COPY; locate the instruction in the
Dockerfile referencing the llama-build stage and update the stage (or the COPY
line) accordingly so /install/examples exists when COPY --from=llama-build
/install/examples/ is executed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 728340de-811d-43fc-8af7-ab070844d98c

📥 Commits

Reviewing files that changed from the base of the PR and between c3c258a and 74c625b.

📒 Files selected for processing (7)
  • Makefile
  • docker/.gitignore
  • docker/AGENTS.md
  • docker/Dockerfile.cuda
  • docker/Dockerfile.vulkan
  • docker/build-image.sh
  • docker/test-binaries.sh

Comment on lines +3 to +4
- use DOCKER_BUILDKIT=1 docker build -t llama-swap:optimized
- DOCKER_BUILDKIT=1 is important to use the caching

⚠️ Potential issue | 🟡 Minor

Fix indentation and clarify the Docker build command.

The sub-item indentation is inconsistent (flagged by markdownlint). Also, the example command is incomplete—it should include a backend tag (:cuda or :vulkan) and the Dockerfile path.

📝 Proposed fix
-- use DOCKER_BUILDKIT=1 docker build -t llama-swap:optimized
-  - DOCKER_BUILDKIT=1 is important to use the caching
+- use `DOCKER_BUILDKIT=1 docker build -t llama-swap:cuda -f docker/Dockerfile.cuda .`
+  - `DOCKER_BUILDKIT=1` is important for caching
+  - Or use `./docker/build-image.sh --cuda` which handles this automatically
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 4-4: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1

(MD005, list-indent)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/AGENTS.md` around lines 3 - 4, Fix the Markdown list indentation and
update the example Docker build command so it is complete and clear: correct the
sub-item indentation under the list entry and replace the incomplete command
with a full example that includes DOCKER_BUILDKIT=1, the image name with a
backend tag (e.g., :cuda or :vulkan) and the Dockerfile path (e.g., using -f
Dockerfile.<backend> or specifying the Dockerfile location). Ensure the line
shows the full command form (setting DOCKER_BUILDKIT=1, docker build -t
llama-swap:<backend> -f <DockerfilePath> .) and update the explanatory text to
mention the required backend tag and Dockerfile path.

Comment on lines +117 to +128
get_default_branch() {
    local repo_url="$1"

    # Check for master first
    if git ls-remote --heads "${repo_url}" master &>/dev/null; then
        echo "master"
    elif git ls-remote --heads "${repo_url}" main &>/dev/null; then
        echo "main"
    else
        echo "master"  # fallback
    fi
}

⚠️ Potential issue | 🟡 Minor

get_default_branch always succeeds even when no branches are found.

The function checks if git ls-remote succeeds but doesn't verify that output is non-empty. If the repo URL is valid but has neither master nor main, it silently falls back to master, which may cause confusing failures later.

🔍 Proposed fix: validate branch existence
 get_default_branch() {
     local repo_url="$1"

-    # Check for master first
-    if git ls-remote --heads "${repo_url}" master &>/dev/null; then
+    # Check for master first (verify output is non-empty)
+    if git ls-remote --heads "${repo_url}" master 2>/dev/null | grep -q .; then
         echo "master"
-    elif git ls-remote --heads "${repo_url}" main &>/dev/null; then
+    elif git ls-remote --heads "${repo_url}" main 2>/dev/null | grep -q .; then
         echo "main"
     else
-        echo "master"  # fallback
+        echo "ERROR: Could not detect default branch for ${repo_url}" >&2
+        return 1
     fi
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/build-image.sh` around lines 117 - 128, get_default_branch currently
treats a successful git command as proof the branch exists and blindly falls
back to "master" even when neither branch exists; update get_default_branch to
capture the output of git ls-remote --heads "${repo_url}" <branch> for both
"master" and "main", check that the captured output is non-empty before echoing
the branch, and if neither check returns output, emit an error to stderr (or
return a non-zero status) instead of silently echoing "master"; refer to the
get_default_branch function and replace the existing if/elif/else checks with
output-validated checks and a proper failure path.

Comment on lines +204 to +221
if ! docker buildx inspect "$BUILDER_NAME" >/dev/null 2>&1; then
    echo "Creating custom buildx builder with max-parallelism=1..."

    # Create buildkitd.toml config file
    cat > buildkitd.toml << 'BUILDKIT_EOF'
[worker.oci]
  max-parallelism = 1
BUILDKIT_EOF

    # Create the builder with the config
    docker buildx create --name "$BUILDER_NAME" \
        --driver docker-container \
        --buildkitd-config buildkitd.toml \
        --use
else
    # Switch to our builder
    docker buildx use "$BUILDER_NAME"
fi

⚠️ Potential issue | 🟡 Minor

buildkitd.toml is created in the current working directory, not the script directory.

If the script is invoked from a different directory (e.g., ./docker/build-image.sh --cuda from repo root), the config file will be created in that directory rather than alongside the script. This could lead to orphaned files.

🔧 Proposed fix: use script directory for config
+    BUILDKIT_CONFIG="${SCRIPT_DIR}/buildkitd.toml"
+    
     # Create buildkitd.toml config file
-    cat > buildkitd.toml << 'BUILDKIT_EOF'
+    cat > "${BUILDKIT_CONFIG}" << 'BUILDKIT_EOF'
 [worker.oci]
   max-parallelism = 1
 BUILDKIT_EOF
     
     # Create the builder with the config
     docker buildx create --name "$BUILDER_NAME" \
         --driver docker-container \
-        --buildkitd-config buildkitd.toml \
+        --buildkitd-config "${BUILDKIT_CONFIG}" \
         --use
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/build-image.sh` around lines 204 - 221, The script creates
buildkitd.toml in the current working directory which can orphan files when the
script is run from elsewhere; change creation and use of buildkitd.toml to the
script's directory. Modify the block that writes buildkitd.toml (the here-doc
marker BUILDKIT_EOF and the filename buildkitd.toml) to write into the directory
of the running script (e.g., using the script dir variable like DIR="$(cd
"$(dirname "${BASH_SOURCE[0]}")" && pwd)"), then reference that path when
calling docker buildx create --buildkitd-config and when cleaning up; ensure
BUILDER_NAME and docker buildx use logic remain unchanged but operate against
the config file located next to the script.

Comment on lines +252 to +254
echo "This usually indicates a build stage failure. Try running with --no-cache flag:"
echo " ./build-image.sh --vulkan --no-cache"
exit 1

⚠️ Potential issue | 🟡 Minor

Error message suggests --vulkan regardless of actual backend.

If a CUDA build fails verification, the message incorrectly suggests --vulkan --no-cache.

🐛 Proposed fix
     echo ""
     echo "This usually indicates a build stage failure. Try running with --no-cache flag:"
-    echo "  ./build-image.sh --vulkan --no-cache"
+    echo "  ./build-image.sh --${BACKEND} --no-cache"
     exit 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/build-image.sh` around lines 252 - 254, The failure message always
suggests "./build-image.sh --vulkan --no-cache" regardless of selected backend;
update the echo lines that print the recovery suggestion so they reference the
actual backend flag instead of hardcoding "--vulkan" (e.g., build the suggestion
from the script's backend variable/flag such as BACKEND or BUILD_BACKEND and
print "./build-image.sh --${BACKEND} --no-cache" or conditionally choose
"--cuda" vs "--vulkan"), and ensure the existing "--no-cache" advice remains.

Comment on lines +127 to +128
cp -v build/bin/whisper-cli build/bin/whisper-server /install/bin/ 2>/dev/null || true && \
find build -name "*.so*" -type f -exec cp -v {} /install/lib/ \;

⚠️ Potential issue | 🟡 Minor

Silent failure on binary copy may mask build issues.

The 2>/dev/null || true pattern suppresses errors if binaries don't exist. This could mask build failures, leading to a runtime image without required binaries.

🔍 Proposed fix: fail explicitly if binaries missing
-    cp -v build/bin/whisper-cli build/bin/whisper-server /install/bin/ 2>/dev/null || true && \
+    cp -v build/bin/whisper-cli build/bin/whisper-server /install/bin/ && \

Alternatively, add explicit verification:

+    test -f /install/bin/whisper-cli && test -f /install/bin/whisper-server && \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/Dockerfile.cuda` around lines 127 - 128, The Dockerfile currently
silences failures when copying binaries by using the "cp ... 2>/dev/null ||
true" pattern which can hide missing build artifacts; update the copy step that
references build/bin/whisper-cli and build/bin/whisper-server to remove the
"2>/dev/null || true" suppression and instead let the COPY/CP fail the build, or
add an explicit verification after the copy (check for existence of
build/bin/whisper-cli and build/bin/whisper-server or test -f) and exit non‑zero
if they are missing so the image build fails fast when required binaries are
absent.

Comment on lines +230 to +235
# Write version information to /versions.txt
#RUN echo "llama.cpp: $(/usr/local/bin/llama-server --version 2>&1 | grep version) > /versions.txt && \
# echo "whisper.cpp: ${WHISPER_COMMIT_HASH}" >> /versions.txt && \
# echo "stable-diffusion.cpp: $(/usr/local/bin/sd-server --version)" >> /versions.txt && \
# echo "llama-swap: $(/usr/local/bin/llama-swap -version)" >> /versions.txt && \
# echo "build_timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> /versions.txt

⚠️ Potential issue | 🟠 Major

Uncomment version info to match Vulkan Dockerfile and fix build-image.sh integration.

The build-image.sh script (line 270) reads /versions.txt to display the llama-swap version. With this commented out, CUDA builds will fail at that step.

🐛 Proposed fix: enable version info
-# Write version information to /versions.txt
-#RUN echo "llama.cpp: $(/usr/local/bin/llama-server --version 2>&1 | grep version) > /versions.txt && \
-#    echo "whisper.cpp: ${WHISPER_COMMIT_HASH}" >> /versions.txt && \
-#    echo "stable-diffusion.cpp: $(/usr/local/bin/sd-server --version)" >> /versions.txt && \
-#    echo "llama-swap: $(/usr/local/bin/llama-swap -version)" >> /versions.txt && \
-#    echo "build_timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> /versions.txt
+# Write version information to /versions.txt
+ARG LLAMA_COMMIT_HASH=unknown
+ARG WHISPER_COMMIT_HASH=unknown
+ARG SD_COMMIT_HASH=unknown
+RUN echo "llama.cpp: ${LLAMA_COMMIT_HASH}" > /versions.txt && \
+    echo "whisper.cpp: ${WHISPER_COMMIT_HASH}" >> /versions.txt && \
+    echo "stable-diffusion.cpp: ${SD_COMMIT_HASH}" >> /versions.txt && \
+    echo "llama-swap: local" >> /versions.txt && \
+    echo "backend: CUDA" >> /versions.txt && \
+    echo "build_timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> /versions.txt

Note: ARG values do not carry across build stages, so these ARGs must be re-declared in the runtime stage to be available there.
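A shell sketch of the versions.txt generation, with plain variables standing in for the build ARGs (the `unknown` defaults mirror the suggested `ARG ...=unknown` declarations; no real commit hashes are used):

```shell
# Simulate the RUN step: each variable defaults to "unknown" when no
# value is passed, matching the suggested Dockerfile ARG defaults.
LLAMA_COMMIT_HASH=${LLAMA_COMMIT_HASH:-unknown}
WHISPER_COMMIT_HASH=${WHISPER_COMMIT_HASH:-unknown}
SD_COMMIT_HASH=${SD_COMMIT_HASH:-unknown}
{
    echo "llama.cpp: ${LLAMA_COMMIT_HASH}"
    echo "whisper.cpp: ${WHISPER_COMMIT_HASH}"
    echo "stable-diffusion.cpp: ${SD_COMMIT_HASH}"
    echo "llama-swap: local"
    echo "backend: CUDA"
    echo "build_timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
} > /tmp/versions.txt
cat /tmp/versions.txt
```

Writing the hashes from ARGs rather than invoking `--version` on each binary also avoids running GPU-linked executables during the build.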

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/Dockerfile.cuda` around lines 230 - 235, Uncomment and restore the
block that writes version information to /versions.txt so build-image.sh can
read llama-swap version; ensure the echo/subshell commands that call
/usr/local/bin/llama-server, sd-server, llama-swap and use
${WHISPER_COMMIT_HASH} are present in the runtime stage (not just build stage)
and that any ARGs (e.g., WHISPER_COMMIT_HASH) are declared in the runtime stage
so they are available there; re-enable the multi-echo block, preserving proper
quoting/escaping for subshell output and include build_timestamp via date -u to
match the Vulkan Dockerfile behavior.

COPY --from=llama-build /install/bin/llama-server /usr/local/bin/
COPY --from=llama-build /install/bin/llama-cli /usr/local/bin/
COPY --from=llama-build /install/lib/ /usr/local/lib/
COPY --from=llama-build /install/examples/ /opt/llama.cpp/examples/

⚠️ Potential issue | 🟠 Major

COPY will fail: /install/examples/ is not created in the llama-build stage.

The llama-build stage (lines 174-178) only creates /install/bin and /install/lib, but not /install/examples/. This COPY instruction will cause the build to fail.

🐛 Proposed fix: remove or conditionally copy examples
 COPY --from=llama-build /install/bin/llama-server /usr/local/bin/
 COPY --from=llama-build /install/bin/llama-cli /usr/local/bin/
 COPY --from=llama-build /install/lib/ /usr/local/lib/
-COPY --from=llama-build /install/examples/ /opt/llama.cpp/examples/

If examples are needed, add this to the llama-build stage before the COPY:

mkdir -p /install/examples && \
cp -r examples/* /install/examples/ 2>/dev/null || true
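A minimal demo of the guarded copy above, with directories under `/tmp/llama-demo` standing in for the llama.cpp checkout and the `/install` prefix used in the build stage:

```shell
# /tmp/llama-demo/src mimics the llama.cpp source tree;
# /tmp/llama-demo/install mimics the /install prefix of the build stage.
set -e
src=/tmp/llama-demo/src
install=/tmp/llama-demo/install
mkdir -p "$src/examples" "$install"
echo 'placeholder example' > "$src/examples/demo.txt"
cd "$src"
# The guarded pattern from the comment: tolerate an empty or missing
# examples directory without failing the build.
mkdir -p "$install/examples" && \
    cp -r examples/* "$install/examples/" 2>/dev/null || true
ls "$install/examples"
```

With the directory guaranteed to exist, the later `COPY --from=llama-build /install/examples/ ...` succeeds even when no examples were copied.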
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
COPY --from=llama-build /install/examples/ /opt/llama.cpp/examples/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/Dockerfile.vulkan` at line 223, The Dockerfile uses COPY
--from=llama-build /install/examples/ /opt/llama.cpp/examples/ but the
llama-build stage only creates /install/bin and /install/lib, so the COPY will
fail; either remove this COPY or ensure the llama-build stage creates
/install/examples (e.g., create the directory and copy examples into it) before
the final COPY; locate the instruction in the Dockerfile referencing the
llama-build stage and update the stage (or the COPY line) accordingly so
/install/examples exists when COPY --from=llama-build /install/examples/ is
executed.

@mostlygeek mostlygeek closed this Mar 21, 2026
@mostlygeek mostlygeek deleted the llama-swap-containers branch March 22, 2026 23:55