
docker/unified: add ik_llama.cpp to CUDA container #620

Open

mostlygeek wants to merge 4 commits into main from claude/add-ik-llama-docker-Dlzie

Conversation

@mostlygeek
Owner

Add ik-llama-server (ikawrakow/ik_llama.cpp) to the unified CUDA image.
The binary is built statically with CUDA support and installed alongside
the existing llama-server to avoid naming collisions.

  • add install-ik-llama.sh build script (mirrors install-llama.sh pattern)
  • add ik-llama-cuda build stage in Dockerfile; ik-llama-vulkan is a no-op
    so BuildKit skips the CUDA stage entirely on vulkan builds
  • resolve IK_LLAMA_REF in build-image.sh (cuda only, "n/a" for vulkan); see the sketch below
  • verify ik-llama-server binary in post-build check (cuda only)
  • add ik_llama_ref input to unified-docker.yml workflow
  • track ik_llama.cpp commit hash in /versions.txt

https://claude.ai/code/session_01RRofFLwAiAPfk4NgcNegb9

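A minimal sketch of the ref-resolution step described above, assuming the variable names that appear later in the review thread; the actual logic lives in docker/unified/build-image.sh and may differ:

```bash
#!/usr/bin/env bash
# Hedged sketch: resolve IK_LLAMA_REF to a commit hash for cuda builds
# only; vulkan builds record "n/a" and never touch the ik-llama stage.
IK_LLAMA_REPO="https://github.com/ikawrakow/ik_llama.cpp"
IK_LLAMA_REF="${IK_LLAMA_REF:-main}"

if [ "${BACKEND}" = "cuda" ]; then
    IK_LLAMA_HASH=$(git ls-remote "${IK_LLAMA_REPO}" \
        "refs/tags/${IK_LLAMA_REF}" "refs/heads/${IK_LLAMA_REF}" \
        2>/dev/null | head -1 | cut -f1)
else
    IK_LLAMA_HASH="n/a"
fi
```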
@coderabbitai

coderabbitai bot commented Apr 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 959f0cac-2ccc-4097-85cf-2fe94c02c927

📥 Commits

Reviewing files that changed from the base of the PR and between 4a8af43 and bc73e6d.

📒 Files selected for processing (2)
  • docker/unified/Dockerfile
  • docker/unified/install-ik-llama.sh
✅ Files skipped from review due to trivial changes (1)
  • docker/unified/install-ik-llama.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • docker/unified/Dockerfile

Walkthrough

Adds optional ik_llama.cpp support to the unified Docker build: new workflow input, build-script logic to resolve CUDA-only commit hashes, a CUDA-focused installer script, and multi-stage Dockerfile changes that conditionally include ik-llama binaries and report its commit in versions.txt.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI Workflow: .github/workflows/unified-docker.yml | Added ik_llama_ref workflow_dispatch input and mapped it to IK_LLAMA_REF in the build job environment. |
| Build orchestration script: docker/unified/build-image.sh | Added IK_LLAMA_REF env/help text and IK_LLAMA_REPO; resolve IK_LLAMA_HASH only for --cuda (set "n/a" for --vulkan); pass the IK_LLAMA_COMMIT_HASH build-arg; conditionally verify and report ik-llama-server and its hash for CUDA builds. |
| Dockerfile: docker/unified/Dockerfile | Updated base images; added multi-stage ik-llama builds (ik-llama-vulkan stub, ik-llama-cuda using IK_LLAMA_COMMIT_HASH) with ARG BACKEND selecting ik-llama-build; always copy /install/bin/ into the runtime image; append "ik_llama.cpp: ${IK_LLAMA_COMMIT_HASH}" to /versions.txt. |
| Installer script: docker/unified/install-ik-llama.sh | New executable script that clones/fetches ikawrakow/ik_llama.cpp at a specified ref, configures CUDA-focused CMake flags, builds llama-server, validates the output, and installs it as ik-llama-server in /install/bin/ (a hedged sketch follows below). |
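A hedged outline of the installer's flow per the summary above: the clone/checkout/install structure follows the description, the linker flags are quoted from the review comment further down this page (post-fix, without --allow-shlib-undefined), and the remaining CMake flags are assumptions:

```bash
#!/usr/bin/env bash
# Sketch of install-ik-llama.sh's described flow; -DGGML_CUDA=ON is an
# assumed flag name, not confirmed by the PR.
set -euo pipefail

REF="${1:?usage: install-ik-llama.sh <ref-or-commit>}"

git clone https://github.com/ikawrakow/ik_llama.cpp /tmp/ik_llama.cpp
cd /tmp/ik_llama.cpp
git checkout "${REF}"

cmake -B build \
    -DGGML_CUDA=ON \
    "-DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler" \
    "-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda"

cmake --build build --target llama-server -j"$(nproc)"

# Install under a distinct name so it can live alongside llama-server.
install -D build/bin/llama-server /install/bin/ik-llama-server
```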

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding ik_llama.cpp to the CUDA container, which is reflected in all modified files. |
| Description check | ✅ Passed | The description clearly explains the purpose and details of the changeset, including the new script, build stages, and workflow changes. |
| Docstring coverage | ✅ Passed | No functions found in the changed files, so the docstring coverage check was skipped. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
docker/unified/build-image.sh (1)

160-162: Tag resolution pattern in resolve_ref() could be made more robust.

The resolve_ref function on line 161 queries refs/tags/<ref> without peeling annotated tags. While the ik_llama.cpp repository currently contains no annotated tags that would cause SHA mismatches, the code pattern could be hardened against future tag types by prioritizing peeled refs.

Consider this optional improvement to resolve_ref():

Defensive fix in resolve_ref()
-    hash=$(git ls-remote "${repo_url}" "refs/tags/${ref}" "refs/heads/${ref}" 2>/dev/null | head -1 | cut -f1)
+    hash=$(git ls-remote "${repo_url}" "refs/tags/${ref}^{}" "refs/tags/${ref}" "refs/heads/${ref}" 2>/dev/null | head -1 | cut -f1)
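For context, a quick illustration of why peeling matters (the tag name here is hypothetical):

```bash
# For an annotated tag, the plain ref resolves to the tag object's SHA,
# while the "^{}" suffix peels it to the commit the tag points at.
git ls-remote https://github.com/ikawrakow/ik_llama.cpp \
    "refs/tags/sometag" "refs/tags/sometag^{}"
# <tag-object-sha>  refs/tags/sometag
# <commit-sha>      refs/tags/sometag^{}
```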
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/unified/build-image.sh` around lines 160 - 162, The resolve_ref()
helper currently resolves tags by checking refs/tags/<ref> which can return an
annotated tag object rather than the peeled commit; update resolve_ref() so it
first tries the peeled tag ref (refs/tags/<ref>^{}) to get the underlying commit
SHA, then falls back to refs/tags/<ref>, then refs/heads/<ref>, and finally any
plain ref lookup—ensure this behavior is used when computing IK_LLAMA_HASH from
resolve_ref("${IK_LLAMA_REPO}" "${IK_LLAMA_REF}") so annotated tags in
ik_llama.cpp resolve to the commit SHA rather than the tag object.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8d88c815-b744-4ad2-a07c-22d3183dd3be

📥 Commits

Reviewing files that changed from the base of the PR and between 9559009 and 9b1b54c.

📒 Files selected for processing (4)
  • .github/workflows/unified-docker.yml
  • docker/unified/Dockerfile
  • docker/unified/build-image.sh
  • docker/unified/install-ik-llama.sh

Comment on lines +31 to +33
"-DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler"
"-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda -Wl,--allow-shlib-undefined"
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

What does GNU ld flag --allow-shlib-undefined do for executable linking, and what runtime risks does it introduce for CUDA-linked binaries?

💡 Result:

The GNU ld flag --allow-shlib-undefined permits the linker to accept undefined symbols in shared libraries (.so files) linked into the output, rather than failing the link. It does not affect undefined symbols in regular object files (.o); those are controlled separately (e.g., by --no-undefined or -z defs).

For executable linking: when creating an executable, ld by default enforces that all symbols, including those from shared libraries, must be resolved at link time, treating unresolved symbols from shared libraries as errors. Specifying --allow-shlib-undefined explicitly relaxes this default, allowing unresolved symbols from shared libraries to pass link-time checks and deferring resolution to the dynamic loader at runtime. Without it (or with --no-allow-shlib-undefined), the linker errors on unresolved symbols in any input shared libraries.

Runtime risks for CUDA-linked binaries: CUDA applications link against shared libraries like libcudart.so and libcublas.so, which may have internal undefined symbols (e.g., GLIBC_PRIVATE symbols) or complex dependencies. Using --allow-shlib-undefined lets the executable link despite these, but risks runtime failures if:

- Required CUDA libraries are missing or version-mismatched at runtime, causing undefined-symbol errors during loading or lazy PLT resolution.
- Multiple CUDA runtime instances (static vs. shared) lead to ABI mismatches, duplicate registrations, or crashes during kernel launches or managed memory access; NVCC changes since CUDA 13.0 aim to prevent these via hidden visibility.
- Subtle failures such as incorrect kernel behavior or segfaults occur without build-time detection, as seen in shared-library scenarios where global functions or device variables are exported unexpectedly.

Prefer --no-allow-shlib-undefined (or -Wl,--no-allow-shlib-undefined) for stricter link-time checks that catch dependency issues early.
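A suggested post-build check (not part of the PR) that surfaces such deferred failures early, before the image ships:

```bash
# ldd -r performs function and data relocation checks, reporting any
# symbols that --allow-shlib-undefined let slip past the linker.
if ldd -r /install/bin/ik-llama-server 2>&1 | grep -qi "undefined symbol"; then
    echo "unresolved symbols found" >&2
    exit 1
fi
```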


🏁 Script executed:

# First, let's check if these files exist and examine their content
fd -t f "install.*llama.*" docker/

# Check the specific lines in install-ik-llama.sh
echo "=== install-ik-llama.sh (lines 25-35) ==="
sed -n '25,35p' docker/unified/install-ik-llama.sh

# Check the corresponding section in install-llama.sh for comparison
echo -e "\n=== install-llama.sh (check similar CMake flags section) ==="
if [ -f docker/unified/install-llama.sh ]; then
  grep -n "CMAKE_EXE_LINKER_FLAGS" docker/unified/install-llama.sh -A 1 -B 1
fi



🏁 Script executed:

# Check git history for this file to understand why the flag was added
git log --oneline docker/unified/install-ik-llama.sh 2>/dev/null | head -5

# Look for any comments or documentation around this script
echo "=== Full install-ik-llama.sh script ==="
wc -l docker/unified/install-ik-llama.sh

# Look at the full script to understand context
cat -n docker/unified/install-ik-llama.sh | head -50

# Check if there are any comments about CUDA linking issues
grep -n "allow-shlib-undefined\|undefined\|CUDA\|link" docker/unified/install-ik-llama.sh



Remove --allow-shlib-undefined from line 32.

The flag -Wl,--allow-shlib-undefined permits unresolved symbols from shared libraries to pass link-time checks, deferring resolution to runtime. This weakens build safety and can hide missing or mismatched CUDA libraries until runtime failures occur. The baseline install-llama.sh builds successfully without this flag using the same CUDA linker configuration, so it is not required.

Suggested change
-    "-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda -Wl,--allow-shlib-undefined"
+    "-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/unified/install-ik-llama.sh` around lines 31 - 33, The linker flag
"-Wl,--allow-shlib-undefined" in the CMAKE_EXE_LINKER_FLAGS entry should be
removed because it permits unresolved shared-library symbols to pass link-time
checks; edit the array element containing
"-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda
-Wl,--allow-shlib-undefined" and delete the trailing
"-Wl,--allow-shlib-undefined" portion so the CMAKE_EXE_LINKER_FLAGS only
includes the rpath-link and -lcuda entries.

claude and others added 3 commits on April 1, 2026 at 08:33
Three bugs prevented the ik_llama.cpp stage from building:

- Remove -DLLAMA_CURL=ON: requires libcurl4-openssl-dev which is not
  installed in the builder; consistent with how llama.cpp is built
- Upgrade cmake via pip3: Ubuntu 22.04 ships cmake 3.22 which lacks
  the NVCC compiler feature mapping for CUDA C++20 (required by
  ik_llama.cpp main branch)
- Fix 4GB binary link failure: compiling for 5 CUDA architectures with
  hundreds of quantization kernels produces a ~4GB static binary that
  overflows x86-64 32-bit PC-relative addressing (PLT entries and
  exception tables); fixed by building with -mcmodel=large for all
  host C/C++ and CUDA host compilation, and using the mold linker

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
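A hedged reconstruction of the workaround that commit describes; the commit text names the flags and linker, while the exact CMake spellings below are assumptions (the following commit removes them again):

```bash
# -mcmodel=large lifts the x86-64 code model out of 32-bit PC-relative
# addressing limits, and mold links the ~4GB static binary without
# the PLT/exception-table overflow described above.
cmake -B build \
    "-DCMAKE_C_FLAGS=-mcmodel=large" \
    "-DCMAKE_CXX_FLAGS=-mcmodel=large" \
    "-DCMAKE_CUDA_FLAGS=-Xcompiler -mcmodel=large" \
    "-DCMAKE_EXE_LINKER_FLAGS=-fuse-ld=mold"
```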
Upgrade all base images from Ubuntu 22.04 / CUDA 12.4.0 to Ubuntu 24.04
/ CUDA 12.9.1, and remove the Ubuntu 22.04 workarounds added in the
previous commit.

- Remove pip3 cmake upgrade (Ubuntu 24.04 ships cmake 3.28+ with CUDA
  C++20 support)
- Remove -mcmodel=large C/CXX/CUDA flags (not needed with newer toolchain)
- Remove -fuse-ld=mold and mold package (linker overflow no longer occurs)
- Fix ubuntu:26.04 → ubuntu:24.04 for vulkan builder and runtime stages
- Remove git from runtime images (not needed at runtime, reduces image size)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
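With those build fixes in place, a hedged sketch of how the two variants would be invoked; the --cuda/--vulkan flags and IK_LLAMA_REF come from the build-script summary above, while the exact invocation is an assumption:

```bash
# cuda: resolves IK_LLAMA_REF, builds and verifies ik-llama-server,
# and records its commit hash in /versions.txt.
IK_LLAMA_REF=main ./docker/unified/build-image.sh --cuda

# vulkan: the ik-llama stage is a no-op, so BuildKit skips the CUDA
# build stage entirely and the hash is recorded as "n/a".
./docker/unified/build-image.sh --vulkan
```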
@candrews

candrews commented Apr 2, 2026

If ik_llama.cpp is being added to the cuda image, it really should also be added to the other images (or at least cpu and vulkan).

@mostlygeek
Owner Author

@candrews ik_llama only has cuda acceleration so it only makes sense to add it to that container. I’m not planning on creating a unified CPU container but people should be able to take what’s in the repo and create their own.

@candrews

candrews commented Apr 2, 2026

> ik_llama only has cuda acceleration

ik_llama also has vulkan support and it currently works (although to your point it's not as good as their cuda support). And they're planning on improving it: ikawrakow/ik_llama.cpp#590

So I think including ik_llama in the vulkan image makes sense now and for the future.
