docker: create a unified container with llama-server, whisper.cpp and stable-diffusion.cpp #589

mostlygeek wants to merge 4 commits into main

Conversation
Build the llama-swap binary from local source code instead of downloading it from GitHub releases. This ensures the container uses the exact code in the repository.

- Add a golang:1.25-alpine builder stage to compile llama-swap
- Generate the version from the git hash, with a +dirty suffix for unstaged changes
- Update build-image.sh to use the repository root as the build context
- Remove the LLAMA_SWAP_VERSION environment variable and related code
- Add test-binaries.sh to Dockerfile.vulkan for consistency

Both CUDA and Vulkan Dockerfiles now build llama-swap from source.
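The version-derivation step described above can be sketched in plain shell; `derive_version` is a hypothetical helper name, but the `git rev-parse` / `git status --porcelain` combination is the standard way to detect a dirty working tree:

```shell
#!/bin/sh
# Sketch of deriving a version string from the git hash with a +dirty
# suffix for unstaged changes. derive_version is a hypothetical helper.
set -e

derive_version() {
    # Short commit hash, or "local" when not inside a git repository
    v=$(git rev-parse --short HEAD 2>/dev/null || echo "local")
    # Append +dirty when the working tree has uncommitted changes
    if [ -n "$(git status --porcelain 2>/dev/null)" ]; then
        v="${v}+dirty"
    fi
    echo "$v"
}

# Demo in a throwaway repository
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.email=a@b.c -c user.name=demo commit -q --allow-empty -m init
clean=$(derive_version)
touch newfile
dirty=$(derive_version)
echo "clean=$clean dirty=$dirty"
```

The same logic applies whether the string feeds `-ldflags` or a version file baked into the image.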
Walkthrough

Modified the Makefile to disable Linux arm64 builds. Added Docker infrastructure including multi-stage Dockerfiles for CUDA and Vulkan backends, build automation scripts, operational guidelines, and configuration file updates for Docker artifact management.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 7
🧹 Nitpick comments (5)
docker/test-binaries.sh (1)
62-102: Consider testing all binaries that build-image.sh verifies.

The script tests only 3 server binaries, while build-image.sh (lines 240-244) verifies 7 binaries, including llama-cli, whisper-cli, sd-cli, and llama-swap. For consistency, consider extending this script to cover all binaries, or document why the scope differs.

♻️ Proposed addition for cli binaries and llama-swap
```diff
 # Test sd-server (stable-diffusion)
 print_info "Testing sd-server..."
 if command -v sd-server &> /dev/null; then
     if sd-server --help > /dev/null 2>&1 || sd-server -h > /dev/null 2>&1; then
         print_info "✓ sd-server: OK"
     else
         print_error "✗ sd-server: Failed to run"
         exit 1
     fi
 else
     print_error "✗ sd-server: Not found in PATH"
     exit 1
 fi

+# Test CLI binaries (quick existence check)
+for cli_binary in llama-cli whisper-cli sd-cli llama-swap; do
+    print_info "Testing ${cli_binary}..."
+    if command -v "${cli_binary}" &> /dev/null; then
+        if "${cli_binary}" --help > /dev/null 2>&1 || "${cli_binary}" -h > /dev/null 2>&1 || "${cli_binary}" --version > /dev/null 2>&1; then
+            print_info "✓ ${cli_binary}: OK"
+        else
+            print_error "✗ ${cli_binary}: Failed to run"
+            exit 1
+        fi
+    else
+        print_error "✗ ${cli_binary}: Not found in PATH"
+        exit 1
+    fi
+done
+
 print_info "All binary tests passed!"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `docker/test-binaries.sh` around lines 62-102: The test script currently checks only llama-server, whisper-server, and sd-server; extend docker/test-binaries.sh to also verify the CLI binaries and swap tool that build-image.sh checks by adding the same presence-and-help invocation pattern for llama-cli, whisper-cli, sd-cli, and llama-swap (use `command -v <binary>` and run `<binary> --help` or `<binary> -h`), following the existing blocks for llama-server/whisper-server/sd-server so failures print the same print_error messages and exit 1; reference the existing test blocks for llama-server, whisper-server, and sd-server as templates to implement the new checks.

Makefile (1)
54-54: The commented-out arm64 build is acceptable, but consider adding a comment explaining why.

The arm64 build is still supported via GoReleaser (.goreleaser.yaml includes linux+arm64), so automated releases remain unaffected. Consider adding a brief comment explaining the reasoning (e.g., Docker focus on amd64) to prevent future confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In the `Makefile` at line 54: Add a short explanatory comment above the commented-out arm64 build line (the line starting with "#GOOS=linux GOARCH=arm64 go build ...") clarifying why it's disabled locally—for example noting that releases still include linux/arm64 via GoReleaser (.goreleaser.yaml) and local builds/Docker focus on amd64—to avoid future confusion; update the Makefile near that commented GOOS/GOARCH line to include this one-line rationale.

docker/Dockerfile.vulkan (2)
183-189: Consider adding a non-root user for improved security posture.The container runs as root by default. While GPU device access often requires specific permissions, adding an optional non-root user would improve security for deployments that can accommodate it.
🛡️ Proposed addition for non-root user support
```diff
 FROM ubuntu:22.04 AS runtime

 ARG LLAMA_COMMIT_HASH=unknown
 ARG WHISPER_COMMIT_HASH=unknown
 ARG SD_COMMIT_HASH=unknown
+ARG UID=0
+ARG GID=0

 ENV DEBIAN_FRONTEND=noninteractive
 ENV PATH="/usr/local/bin:${PATH}"
```

Then near the end, before `WORKDIR /models`:

```diff
+# Create non-root user if UID != 0
+RUN if [ "$UID" != "0" ]; then \
+        groupadd -g "$GID" llama 2>/dev/null || true && \
+        useradd -u "$UID" -g "$GID" -m llama && \
+        chown -R llama:llama /models; \
+    fi
+USER ${UID}:${GID}
+
 WORKDIR /models
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker/Dockerfile.vulkan` around lines 183 - 189, Add an optional non-root user in the runtime stage by creating a user/group (e.g., llamauser), setting its home, chowning any runtime-owned dirs (notably /models) and switching to that user with USER before the WORKDIR /models line; make this conditional via a build ARG (e.g., CREATE_NONROOT=true) so GPU/permission-sensitive deployments can opt out, and ensure any files, binaries in /usr/local/bin and installed runtime assets are owned or readable by that user (adjust RUN steps that install runtime dependencies to chown where necessary).
27-28: Consider embedding version info via ldflags for consistency with the Makefile build.

The Makefile embeds version info using `-ldflags="-X main.commit=... -X main.version=..."`, but this build uses a plain `go build`. While the version file approach works, `llama-swap --version` (line 252) may not display useful info.

♻️ Proposed fix to embed version
```diff
 ENV CGO_ENABLED=0 GOOS=linux GOARCH=amd64
-RUN go build -o llama-swap .
+
+# Get version info first
+RUN VERSION=$(git rev-parse --short HEAD 2>/dev/null || echo "local") && \
+    if [ -n "$(git status --porcelain 2>/dev/null)" ]; then VERSION="${VERSION}+dirty"; fi && \
+    BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ") && \
+    go build -ldflags="-X main.commit=${VERSION} -X main.version=local_${VERSION} -X main.date=${BUILD_DATE}" -o llama-swap .
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `docker/Dockerfile.vulkan` around lines 27-28: The Dockerfile.vulkan build step currently runs `RUN go build -o llama-swap .` without embedding version info; change the build invocation to pass the same ldflags used in the Makefile (set -ldflags with -X main.version=... and -X main.commit=... and any other version variables) so the binary exposes correct values consumed by the --version code paths (main.version/main.commit); ensure the values are populated from the same build-time variables used elsewhere (e.g., environment or ARG) so the Docker build mirrors the Makefile behavior.

docker/Dockerfile.cuda (1)
182-186: Consider adding a non-root user for improved security posture.Same concern as Dockerfile.vulkan—the container runs as root by default. Adding optional non-root user support would improve security.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker/Dockerfile.cuda` around lines 182 - 186, The Dockerfile.runtime stage (FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04 AS runtime) runs as root; add an optional non-root user and switch to it: introduce build ARGs (e.g., USER_ID, GROUP_ID, USER_NAME), create a group and user, set HOME, create/apply ownership to runtime directories used by the image (e.g., /usr/local/bin, any app or workspace dir), and then use USER to switch from root to that user before the final image. Ensure you preserve the existing ENV lines (DEBIAN_FRONTEND, PATH, LD_LIBRARY_PATH) and make chown operations deterministic so the container can still run with GPU access while improving security.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 728340de-811d-43fc-8af7-ab070844d98c
📒 Files selected for processing (7)
- Makefile
- docker/.gitignore
- docker/AGENTS.md
- docker/Dockerfile.cuda
- docker/Dockerfile.vulkan
- docker/build-image.sh
- docker/test-binaries.sh
```text
- use DOCKER_BUILDKIT=1 docker build -t llama-swap:optimized
 - DOCKER_BUILDKIT=1 is important to use the caching
```
Fix indentation and clarify the Docker build command.
The sub-item indentation is inconsistent (flagged by markdownlint). Also, the example command is incomplete—it should include a backend tag (:cuda or :vulkan) and the Dockerfile path.
📝 Proposed fix
```diff
-- use DOCKER_BUILDKIT=1 docker build -t llama-swap:optimized
- - DOCKER_BUILDKIT=1 is important to use the caching
+- use `DOCKER_BUILDKIT=1 docker build -t llama-swap:cuda -f docker/Dockerfile.cuda .`
+  - `DOCKER_BUILDKIT=1` is important for caching
+  - Or use `./docker/build-image.sh --cuda` which handles this automatically
```

🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 4-4: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1
(MD005, list-indent)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/AGENTS.md` around lines 3 - 4, Fix the Markdown list indentation and
update the example Docker build command so it is complete and clear: correct the
sub-item indentation under the list entry and replace the incomplete command
with a full example that includes DOCKER_BUILDKIT=1, the image name with a
backend tag (e.g., :cuda or :vulkan) and the Dockerfile path (e.g., using -f
Dockerfile.<backend> or specifying the Dockerfile location). Ensure the line
shows the full command form (setting DOCKER_BUILDKIT=1, docker build -t
llama-swap:<backend> -f <DockerfilePath> .) and update the explanatory text to
mention the required backend tag and Dockerfile path.
```shell
get_default_branch() {
    local repo_url="$1"

    # Check for master first
    if git ls-remote --heads "${repo_url}" master &>/dev/null; then
        echo "master"
    elif git ls-remote --heads "${repo_url}" main &>/dev/null; then
        echo "main"
    else
        echo "master"  # fallback
    fi
}
```
get_default_branch always succeeds even when no branches are found.
The function checks if git ls-remote succeeds but doesn't verify that output is non-empty. If the repo URL is valid but has neither master nor main, it silently falls back to master, which may cause confusing failures later.
🔍 Proposed fix: validate branch existence
```diff
 get_default_branch() {
     local repo_url="$1"

-    # Check for master first
-    if git ls-remote --heads "${repo_url}" master &>/dev/null; then
+    # Check for master first (verify output is non-empty)
+    if git ls-remote --heads "${repo_url}" master 2>/dev/null | grep -q .; then
         echo "master"
-    elif git ls-remote --heads "${repo_url}" main &>/dev/null; then
+    elif git ls-remote --heads "${repo_url}" main 2>/dev/null | grep -q .; then
         echo "main"
     else
-        echo "master" # fallback
+        echo "ERROR: Could not detect default branch for ${repo_url}" >&2
+        return 1
     fi
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/build-image.sh` around lines 117 - 128, get_default_branch currently
treats a successful git command as proof the branch exists and blindly falls
back to "master" even when neither branch exists; update get_default_branch to
capture the output of git ls-remote --heads "${repo_url}" <branch> for both
"master" and "main", check that the captured output is non-empty before echoing
the branch, and if neither check returns output, emit an error to stderr (or
return a non-zero status) instead of silently echoing "master"; refer to the
get_default_branch function and replace the existing if/elif/else checks with
output-validated checks and a proper failure path.
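The output-validated check can be demonstrated without a network: `git ls-remote` accepts local paths, and a branch should count as present only if the command prints a matching ref, not merely because it exits zero. `branch_exists` below is a hypothetical helper, exercised against a throwaway repository:

```shell
#!/bin/sh
# Sketch: a branch exists only if ls-remote prints a matching ref line.
set -e

branch_exists() {
    repo_url="$1"
    branch="$2"
    # ls-remote exits 0 even when nothing matches, so test the output
    [ -n "$(git ls-remote --heads "$repo_url" "$branch" 2>/dev/null)" ]
}

# Demo repository whose only branch is "main"
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/main
git -c user.email=a@b.c -c user.name=demo commit -q --allow-empty -m init

branch_exists "$tmp" main && has_main=yes || has_main=no
branch_exists "$tmp" master && has_master=yes || has_master=no
echo "main=$has_main master=$has_master"
```

Note that a plain `&>/dev/null` success check would have reported both branches as present here, since `ls-remote` succeeds either way.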
```shell
if ! docker buildx inspect "$BUILDER_NAME" >/dev/null 2>&1; then
    echo "Creating custom buildx builder with max-parallelism=1..."

    # Create buildkitd.toml config file
    cat > buildkitd.toml << 'BUILDKIT_EOF'
[worker.oci]
max-parallelism = 1
BUILDKIT_EOF

    # Create the builder with the config
    docker buildx create --name "$BUILDER_NAME" \
        --driver docker-container \
        --buildkitd-config buildkitd.toml \
        --use
else
    # Switch to our builder
    docker buildx use "$BUILDER_NAME"
fi
```
buildkitd.toml is created in the current working directory, not the script directory.
If the script is invoked from a different directory (e.g., ./docker/build-image.sh --cuda from repo root), the config file will be created in that directory rather than alongside the script. This could lead to orphaned files.
🔧 Proposed fix: use script directory for config
```diff
+    BUILDKIT_CONFIG="${SCRIPT_DIR}/buildkitd.toml"
+
     # Create buildkitd.toml config file
-    cat > buildkitd.toml << 'BUILDKIT_EOF'
+    cat > "${BUILDKIT_CONFIG}" << 'BUILDKIT_EOF'
 [worker.oci]
 max-parallelism = 1
 BUILDKIT_EOF

     # Create the builder with the config
     docker buildx create --name "$BUILDER_NAME" \
         --driver docker-container \
-        --buildkitd-config buildkitd.toml \
+        --buildkitd-config "${BUILDKIT_CONFIG}" \
         --use
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/build-image.sh` around lines 204 - 221, The script creates
buildkitd.toml in the current working directory which can orphan files when the
script is run from elsewhere; change creation and use of buildkitd.toml to the
script's directory. Modify the block that writes buildkitd.toml (the here-doc
marker BUILDKIT_EOF and the filename buildkitd.toml) to write into the directory
of the running script (e.g., using the script dir variable like DIR="$(cd
"$(dirname "${BASH_SOURCE[0]}")" && pwd)"), then reference that path when
calling docker buildx create --buildkitd-config and when cleaning up; ensure
BUILDER_NAME and docker buildx use logic remain unchanged but operate against
the config file located next to the script.
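The script-directory resolution the prompt describes can be sketched as follows; the `$0`-based form is an assumption for plain `sh` (where `BASH_SOURCE` is unavailable), and the config contents mirror the review's example:

```shell
#!/bin/sh
# Sketch: write generated config next to the script, not the caller's CWD.
set -e

# Resolve the directory this script lives in, regardless of where it
# was invoked from.
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
BUILDKIT_CONFIG="${SCRIPT_DIR}/buildkitd.toml"

cat > "$BUILDKIT_CONFIG" << 'BUILDKIT_EOF'
[worker.oci]
max-parallelism = 1
BUILDKIT_EOF

echo "wrote $BUILDKIT_CONFIG"
```

Referencing `$BUILDKIT_CONFIG` everywhere (creation, `--buildkitd-config`, and cleanup) keeps the file's lifecycle anchored to one location.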
```shell
echo "This usually indicates a build stage failure. Try running with --no-cache flag:"
echo "  ./build-image.sh --vulkan --no-cache"
exit 1
```
Error message suggests --vulkan regardless of actual backend.
If a CUDA build fails verification, the message incorrectly suggests --vulkan --no-cache.
🐛 Proposed fix
```diff
 echo ""
 echo "This usually indicates a build stage failure. Try running with --no-cache flag:"
-echo "  ./build-image.sh --vulkan --no-cache"
+echo "  ./build-image.sh --${BACKEND} --no-cache"
 exit 1
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/build-image.sh` around lines 252 - 254, The failure message always
suggests "./build-image.sh --vulkan --no-cache" regardless of selected backend;
update the echo lines that print the recovery suggestion so they reference the
actual backend flag instead of hardcoding "--vulkan" (e.g., build the suggestion
from the script's backend variable/flag such as BACKEND or BUILD_BACKEND and
print "./build-image.sh --${BACKEND} --no-cache" or conditionally choose
"--cuda" vs "--vulkan"), and ensure the existing "--no-cache" advice remains.
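A minimal sketch of building the hint from a backend variable; `BACKEND` here is a stand-in for whatever the script's argument parsing sets:

```shell
#!/bin/sh
# Sketch: derive the recovery suggestion from the selected backend flag.
# BACKEND is assumed to be set during argument parsing; "cuda" is a demo value.
BACKEND="cuda"

hint="./build-image.sh --${BACKEND} --no-cache"
echo "This usually indicates a build stage failure. Try running with --no-cache flag:"
echo "  $hint"
```

With the variable interpolated, a failed CUDA build now suggests `--cuda` and a failed Vulkan build suggests `--vulkan`, with no hardcoding.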
```shell
cp -v build/bin/whisper-cli build/bin/whisper-server /install/bin/ 2>/dev/null || true && \
find build -name "*.so*" -type f -exec cp -v {} /install/lib/ \;
```
Silent failure on binary copy may mask build issues.
The 2>/dev/null || true pattern suppresses errors if binaries don't exist. This could mask build failures, leading to a runtime image without required binaries.
🔍 Proposed fix: fail explicitly if binaries missing
```diff
-    cp -v build/bin/whisper-cli build/bin/whisper-server /install/bin/ 2>/dev/null || true && \
+    cp -v build/bin/whisper-cli build/bin/whisper-server /install/bin/ && \
```

Alternatively, add explicit verification:

```diff
+    test -f /install/bin/whisper-cli && test -f /install/bin/whisper-server && \
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/Dockerfile.cuda` around lines 127 - 128, The Dockerfile currently
silences failures when copying binaries by using the "cp ... 2>/dev/null ||
true" pattern which can hide missing build artifacts; update the copy step that
references build/bin/whisper-cli and build/bin/whisper-server to remove the
"2>/dev/null || true" suppression and instead let the COPY/CP fail the build, or
add an explicit verification after the copy (check for existence of
build/bin/whisper-cli and build/bin/whisper-server or test -f) and exit non‑zero
if they are missing so the image build fails fast when required binaries are
absent.
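The fail-fast verification can be generalized into one loop; `verify_artifacts` is a hypothetical helper, demonstrated here with temp files in place of the real build outputs:

```shell
#!/bin/sh
# Sketch: explicit existence checks so a missing build artifact fails loudly
# instead of being swallowed by a "|| true".
verify_artifacts() {
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing artifact: $f" >&2
            return 1
        fi
    done
    return 0
}

tmp=$(mktemp -d)
touch "$tmp/whisper-cli"              # present
# "$tmp/whisper-server" deliberately absent

verify_artifacts "$tmp/whisper-cli" && one=ok || one=fail
verify_artifacts "$tmp/whisper-cli" "$tmp/whisper-server" 2>/dev/null && both=ok || both=fail
echo "one=$one both=$both"
```

In a Dockerfile RUN step, a non-zero return from such a check aborts the image build at the layer where the artifact went missing.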
```dockerfile
# Write version information to /versions.txt
#RUN echo "llama.cpp: $(/usr/local/bin/llama-server --version 2>&1 | grep version) > /versions.txt && \
#    echo "whisper.cpp: ${WHISPER_COMMIT_HASH}" >> /versions.txt && \
#    echo "stable-diffusion.cpp: $(/usr/local/bin/sd-server --version)" >> /versions.txt && \
#    echo "llama-swap: $(/usr/local/bin/llama-swap -version)" >> /versions.txt && \
#    echo "build_timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> /versions.txt
```
Uncomment version info to match Vulkan Dockerfile and fix build-image.sh integration.
The build-image.sh script (line 270) reads /versions.txt to display the llama-swap version. With this commented out, CUDA builds will fail at that step.
🐛 Proposed fix: enable version info
```diff
-# Write version information to /versions.txt
-#RUN echo "llama.cpp: $(/usr/local/bin/llama-server --version 2>&1 | grep version) > /versions.txt && \
-#    echo "whisper.cpp: ${WHISPER_COMMIT_HASH}" >> /versions.txt && \
-#    echo "stable-diffusion.cpp: $(/usr/local/bin/sd-server --version)" >> /versions.txt && \
-#    echo "llama-swap: $(/usr/local/bin/llama-swap -version)" >> /versions.txt && \
-#    echo "build_timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> /versions.txt
+# Write version information to /versions.txt
+ARG LLAMA_COMMIT_HASH=unknown
+ARG WHISPER_COMMIT_HASH=unknown
+ARG SD_COMMIT_HASH=unknown
+RUN echo "llama.cpp: ${LLAMA_COMMIT_HASH}" > /versions.txt && \
+    echo "whisper.cpp: ${WHISPER_COMMIT_HASH}" >> /versions.txt && \
+    echo "stable-diffusion.cpp: ${SD_COMMIT_HASH}" >> /versions.txt && \
+    echo "llama-swap: local" >> /versions.txt && \
+    echo "backend: CUDA" >> /versions.txt && \
+    echo "build_timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> /versions.txt
```

Note: The ARGs need to be declared in the runtime stage to be available.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/Dockerfile.cuda` around lines 230 - 235, Uncomment and restore the
block that writes version information to /versions.txt so build-image.sh can
read llama-swap version; ensure the echo/subshell commands that call
/usr/local/bin/llama-server, sd-server, llama-swap and use
${WHISPER_COMMIT_HASH} are present in the runtime stage (not just build stage)
and that any ARGs (e.g., WHISPER_COMMIT_HASH) are declared in the runtime stage
so they are available there; re-enable the multi-echo block, preserving proper
quoting/escaping for subshell output and include build_timestamp via date -u to
match the Vulkan Dockerfile behavior.
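The version-file assembly can be sketched outside Docker; the commit-hash values are placeholders and the output goes to a temp file rather than /versions.txt:

```shell
#!/bin/sh
# Sketch: assemble versions.txt-style metadata from build arguments.
# The hash values are placeholders; a real build would pass them as ARGs.
set -e

LLAMA_COMMIT_HASH="abc123"
WHISPER_COMMIT_HASH="def456"
SD_COMMIT_HASH="0a1b2c"
versions_file=$(mktemp)

{
    echo "llama.cpp: ${LLAMA_COMMIT_HASH}"
    echo "whisper.cpp: ${WHISPER_COMMIT_HASH}"
    echo "stable-diffusion.cpp: ${SD_COMMIT_HASH}"
    echo "build_timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
} > "$versions_file"

cat "$versions_file"
```

Writing from ARGs rather than invoking each binary's `--version` keeps the runtime stage free of the subshell quoting issues that got the original block commented out.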
```dockerfile
COPY --from=llama-build /install/bin/llama-server /usr/local/bin/
COPY --from=llama-build /install/bin/llama-cli /usr/local/bin/
COPY --from=llama-build /install/lib/ /usr/local/lib/
COPY --from=llama-build /install/examples/ /opt/llama.cpp/examples/
```
COPY will fail: /install/examples/ is not created in the llama-build stage.
The llama-build stage (lines 174-178) only creates /install/bin and /install/lib, but not /install/examples/. This COPY instruction will cause the build to fail.
🐛 Proposed fix: remove or conditionally copy examples
```diff
 COPY --from=llama-build /install/bin/llama-server /usr/local/bin/
 COPY --from=llama-build /install/bin/llama-cli /usr/local/bin/
 COPY --from=llama-build /install/lib/ /usr/local/lib/
-COPY --from=llama-build /install/examples/ /opt/llama.cpp/examples/
```

If examples are needed, add this to the llama-build stage before the COPY:

```shell
mkdir -p /install/examples && \
cp -r examples/* /install/examples/ 2>/dev/null || true
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/Dockerfile.vulkan` at line 223, The Dockerfile uses COPY
--from=llama-build /install/examples/ /opt/llama.cpp/examples/ but the
llama-build stage only creates /install/bin and /install/lib, so the COPY will
fail; either remove this COPY or ensure the llama-build stage creates
/install/examples (e.g., create the directory and copy examples into it) before
the final COPY; locate the instruction in the Dockerfile referencing the
llama-build stage and update the stage (or the COPY line) accordingly so
/install/examples exists when COPY --from=llama-build /install/examples/ is
executed.
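The stage-side preparation the comment describes can be sketched with a stand-in source tree; the directory names mirror the review's /install layout, but everything runs under a temp dir:

```shell
#!/bin/sh
# Sketch: ensure install/examples exists (and is populated when possible)
# so a later COPY --from=<stage> of that path cannot fail.
set -e

tmp=$(mktemp -d)
cd "$tmp"

mkdir -p examples
echo "demo" > examples/sample.txt        # stand-in for real example files

mkdir -p install/examples
cp -r examples/* install/examples/ 2>/dev/null || true

ls install/examples
```

Because `mkdir -p` always succeeds, the target directory exists even when the source tree ships no examples, which is what makes the final COPY safe either way.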
The current llama-swap container is basically ggml-org/llama.cpp's containers with the llama-swap binary injected. This PR creates experimental images for CUDA and Vulkan with llama-server, whisper.cpp and stable-diffusion.cpp built in.