
docker/unified: add ik_llama.cpp to CUDA container #620

Open

mostlygeek wants to merge 4 commits into main from claude/add-ik-llama-docker-Dlzie

Conversation

@mostlygeek
Owner

Add ik-llama-server (ikawrakow/ik_llama.cpp) to the unified CUDA image.
The binary is built statically with CUDA support and installed alongside
the existing llama-server to avoid naming collisions.

  • add install-ik-llama.sh build script (mirrors install-llama.sh pattern)
  • add ik-llama-cuda build stage in Dockerfile; ik-llama-vulkan is a no-op
    so BuildKit skips the CUDA stage entirely on vulkan builds
  • resolve IK_LLAMA_REF in build-image.sh (cuda only, "n/a" for vulkan); see the sketch below
  • verify ik-llama-server binary in post-build check (cuda only)
  • add ik_llama_ref input to unified-docker.yml workflow
  • track ik_llama.cpp commit hash in /versions.txt

https://claude.ai/code/session_01RRofFLwAiAPfk4NgcNegb9

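A minimal sketch of the ref-resolution step described above, assuming the variable names that appear later in the review thread; the actual logic lives in docker/unified/build-image.sh and may differ:

```bash
#!/usr/bin/env bash
# Hedged sketch: resolve IK_LLAMA_REF to a commit hash for cuda builds
# only; vulkan builds record "n/a" and never touch the ik-llama stage.
IK_LLAMA_REPO="https://github.com/ikawrakow/ik_llama.cpp"
IK_LLAMA_REF="${IK_LLAMA_REF:-main}"

if [ "${BACKEND}" = "cuda" ]; then
    IK_LLAMA_HASH=$(git ls-remote "${IK_LLAMA_REPO}" \
        "refs/tags/${IK_LLAMA_REF}" "refs/heads/${IK_LLAMA_REF}" \
        2>/dev/null | head -1 | cut -f1)
else
    IK_LLAMA_HASH="n/a"
fi
```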
@coderabbitai

coderabbitai bot commented Apr 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 959f0cac-2ccc-4097-85cf-2fe94c02c927

📥 Commits

Reviewing files that changed from the base of the PR and between 4a8af43 and bc73e6d.

📒 Files selected for processing (2)
  • docker/unified/Dockerfile
  • docker/unified/install-ik-llama.sh
✅ Files skipped from review due to trivial changes (1)
  • docker/unified/install-ik-llama.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • docker/unified/Dockerfile

Walkthrough

Adds optional ik_llama.cpp support to the unified Docker build: new workflow input, build-script logic to resolve CUDA-only commit hashes, a CUDA-focused installer script, and multi-stage Dockerfile changes that conditionally include ik-llama binaries and report its commit in versions.txt.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI Workflow: .github/workflows/unified-docker.yml | Added ik_llama_ref workflow_dispatch input and mapped it to IK_LLAMA_REF in the build job environment. |
| Build orchestration script: docker/unified/build-image.sh | Added IK_LLAMA_REF env/help text and IK_LLAMA_REPO; resolve IK_LLAMA_HASH only for --cuda (set "n/a" for --vulkan); pass the IK_LLAMA_COMMIT_HASH build-arg; conditionally verify and report ik-llama-server and its hash for CUDA builds. |
| Dockerfile: docker/unified/Dockerfile | Updated base images; added multi-stage ik-llama builds (ik-llama-vulkan stub, ik-llama-cuda using IK_LLAMA_COMMIT_HASH) with ARG BACKEND selecting ik-llama-build; always copy /install/bin/ into the runtime image; append "ik_llama.cpp: ${IK_LLAMA_COMMIT_HASH}" to /versions.txt. |
| Installer script: docker/unified/install-ik-llama.sh | New executable script that clones/fetches ikawrakow/ik_llama.cpp at a specified ref, configures CUDA-focused CMake flags, builds llama-server, validates the output, and installs it as ik-llama-server in /install/bin/ (a hedged sketch follows below). |
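A hedged outline of the installer's flow per the summary above: the clone/checkout/install structure follows the description, the linker flags are quoted from the review comment further down this page (post-fix, without --allow-shlib-undefined), and the remaining CMake flags are assumptions:

```bash
#!/usr/bin/env bash
# Sketch of install-ik-llama.sh's described flow; -DGGML_CUDA=ON is an
# assumed flag name, not confirmed by the PR.
set -euo pipefail

REF="${1:?usage: install-ik-llama.sh <ref-or-commit>}"

git clone https://github.com/ikawrakow/ik_llama.cpp /tmp/ik_llama.cpp
cd /tmp/ik_llama.cpp
git checkout "${REF}"

cmake -B build \
    -DGGML_CUDA=ON \
    "-DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler" \
    "-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda"

cmake --build build --target llama-server -j"$(nproc)"

# Install under a distinct name so it can live alongside llama-server.
install -D build/bin/llama-server /install/bin/ik-llama-server
```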

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding ik_llama.cpp to the CUDA container, which is reflected in all modified files. |
| Description check | ✅ Passed | The description clearly explains the purpose and details of the changeset, including the new script, build stages, and workflow changes. |
| Docstring coverage | ✅ Passed | No functions found in the changed files, so the docstring coverage check was skipped. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
docker/unified/build-image.sh (1)

160-162: Tag resolution pattern in resolve_ref() could be made more robust.

The resolve_ref function on line 161 queries refs/tags/<ref> without peeling annotated tags. While the ik_llama.cpp repository currently contains no annotated tags that would cause SHA mismatches, the code pattern could be hardened against future tag types by prioritizing peeled refs.

Consider this optional improvement to resolve_ref():

Defensive fix in resolve_ref()
-    hash=$(git ls-remote "${repo_url}" "refs/tags/${ref}" "refs/heads/${ref}" 2>/dev/null | head -1 | cut -f1)
+    hash=$(git ls-remote "${repo_url}" "refs/tags/${ref}^{}" "refs/tags/${ref}" "refs/heads/${ref}" 2>/dev/null | head -1 | cut -f1)
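For context, a quick illustration of why peeling matters (the tag name here is hypothetical):

```bash
# For an annotated tag, the plain ref resolves to the tag object's SHA,
# while the "^{}" suffix peels it to the commit the tag points at.
git ls-remote https://github.com/ikawrakow/ik_llama.cpp \
    "refs/tags/sometag" "refs/tags/sometag^{}"
# <tag-object-sha>  refs/tags/sometag
# <commit-sha>      refs/tags/sometag^{}
```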
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/unified/build-image.sh` around lines 160 - 162, The resolve_ref()
helper currently resolves tags by checking refs/tags/<ref> which can return an
annotated tag object rather than the peeled commit; update resolve_ref() so it
first tries the peeled tag ref (refs/tags/<ref>^{}) to get the underlying commit
SHA, then falls back to refs/tags/<ref>, then refs/heads/<ref>, and finally any
plain ref lookup—ensure this behavior is used when computing IK_LLAMA_HASH from
resolve_ref("${IK_LLAMA_REPO}" "${IK_LLAMA_REF}") so annotated tags in
ik_llama.cpp resolve to the commit SHA rather than the tag object.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8d88c815-b744-4ad2-a07c-22d3183dd3be

📥 Commits

Reviewing files that changed from the base of the PR and between 9559009 and 9b1b54c.

📒 Files selected for processing (4)
  • .github/workflows/unified-docker.yml
  • docker/unified/Dockerfile
  • docker/unified/build-image.sh
  • docker/unified/install-ik-llama.sh

Comment on lines +31 to +33
"-DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler"
"-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda -Wl,--allow-shlib-undefined"
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

What does GNU ld flag --allow-shlib-undefined do for executable linking, and what runtime risks does it introduce for CUDA-linked binaries?

💡 Result:

The GNU ld flag --allow-shlib-undefined permits the linker to accept undefined symbols in shared libraries (.so files) linked into the output, rather than failing the link. It does not affect undefined symbols in regular object files (.o); those are controlled separately (e.g., by --no-undefined or -z defs).

For executable linking: when creating an executable, ld by default enforces that all symbols, including those from shared libraries, must be resolved at link time, treating unresolved symbols from shared libraries as errors. Specifying --allow-shlib-undefined explicitly relaxes this default, allowing unresolved symbols from shared libraries to pass link-time checks and deferring resolution to the dynamic loader at runtime. Without it (or with --no-allow-shlib-undefined), the linker errors on unresolved symbols in any input shared libraries.

Runtime risks for CUDA-linked binaries: CUDA applications link against shared libraries like libcudart.so and libcublas.so, which may have internal undefined symbols (e.g., GLIBC_PRIVATE symbols) or complex dependencies. Using --allow-shlib-undefined lets the executable link despite these, but risks runtime failures if:

- Required CUDA libraries are missing or version-mismatched at runtime, causing undefined-symbol errors during loading or lazy PLT resolution.
- Multiple CUDA runtime instances (static vs. shared) lead to ABI mismatches, duplicate registrations, or crashes during kernel launches or managed memory access; NVCC changes since CUDA 13.0 aim to prevent these via hidden visibility.
- Subtle failures such as incorrect kernel behavior or segfaults occur without build-time detection, as seen in shared-library scenarios where global functions or device variables are exported unexpectedly.

Prefer --no-allow-shlib-undefined (or -Wl,--no-allow-shlib-undefined) for stricter link-time checks that catch dependency issues early.
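A suggested post-build check (not part of the PR) that surfaces such deferred failures early, before the image ships:

```bash
# ldd -r performs function and data relocation checks, reporting any
# symbols that --allow-shlib-undefined let slip past the linker.
if ldd -r /install/bin/ik-llama-server 2>&1 | grep -qi "undefined symbol"; then
    echo "unresolved symbols found" >&2
    exit 1
fi
```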


🏁 Script executed:

# First, let's check if these files exist and examine their content
fd -t f "install.*llama.*" docker/

# Check the specific lines in install-ik-llama.sh
echo "=== install-ik-llama.sh (lines 25-35) ==="
sed -n '25,35p' docker/unified/install-ik-llama.sh

# Check the corresponding section in install-llama.sh for comparison
echo -e "\n=== install-llama.sh (check similar CMake flags section) ==="
if [ -f docker/unified/install-llama.sh ]; then
  grep -n "CMAKE_EXE_LINKER_FLAGS" docker/unified/install-llama.sh -A 1 -B 1
fi



🏁 Script executed:

# Check git history for this file to understand why the flag was added
git log --oneline docker/unified/install-ik-llama.sh 2>/dev/null | head -5

# Look for any comments or documentation around this script
echo "=== Full install-ik-llama.sh script ==="
wc -l docker/unified/install-ik-llama.sh

# Look at the full script to understand context
cat -n docker/unified/install-ik-llama.sh | head -50

# Check if there are any comments about CUDA linking issues
grep -n "allow-shlib-undefined\|undefined\|CUDA\|link" docker/unified/install-ik-llama.sh



Remove --allow-shlib-undefined from line 32.

The flag -Wl,--allow-shlib-undefined permits unresolved symbols from shared libraries to pass link-time checks, deferring resolution to runtime. This weakens build safety and can hide missing or mismatched CUDA libraries until runtime failures occur. The baseline install-llama.sh builds successfully without this flag using the same CUDA linker configuration, so it is not required.

Suggested change
-    "-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda -Wl,--allow-shlib-undefined"
+    "-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/unified/install-ik-llama.sh` around lines 31 - 33, The linker flag
"-Wl,--allow-shlib-undefined" in the CMAKE_EXE_LINKER_FLAGS entry should be
removed because it permits unresolved shared-library symbols to pass link-time
checks; edit the array element containing
"-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda
-Wl,--allow-shlib-undefined" and delete the trailing
"-Wl,--allow-shlib-undefined" portion so the CMAKE_EXE_LINKER_FLAGS only
includes the rpath-link and -lcuda entries.

claude and others added 3 commits on April 1, 2026 at 08:33
Three bugs prevented the ik_llama.cpp stage from building:

- Remove -DLLAMA_CURL=ON: requires libcurl4-openssl-dev which is not
  installed in the builder; consistent with how llama.cpp is built
- Upgrade cmake via pip3: Ubuntu 22.04 ships cmake 3.22 which lacks
  the NVCC compiler feature mapping for CUDA C++20 (required by
  ik_llama.cpp main branch)
- Fix 4GB binary link failure: compiling for 5 CUDA architectures with
  hundreds of quantization kernels produces a ~4GB static binary that
  overflows x86-64 32-bit PC-relative addressing (PLT entries and
  exception tables); fixed by building with -mcmodel=large for all
  host C/C++ and CUDA host compilation, and using the mold linker

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
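A hedged reconstruction of the workaround that commit describes; the commit text names the flags and linker, while the exact CMake spellings below are assumptions (the following commit removes them again):

```bash
# -mcmodel=large lifts the x86-64 code model out of 32-bit PC-relative
# addressing limits, and mold links the ~4GB static binary without
# the PLT/exception-table overflow described above.
cmake -B build \
    "-DCMAKE_C_FLAGS=-mcmodel=large" \
    "-DCMAKE_CXX_FLAGS=-mcmodel=large" \
    "-DCMAKE_CUDA_FLAGS=-Xcompiler -mcmodel=large" \
    "-DCMAKE_EXE_LINKER_FLAGS=-fuse-ld=mold"
```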
Upgrade all base images from Ubuntu 22.04 / CUDA 12.4.0 to Ubuntu 24.04
/ CUDA 12.9.1, and remove the Ubuntu 22.04 workarounds added in the
previous commit.

- Remove pip3 cmake upgrade (Ubuntu 24.04 ships cmake 3.28+ with CUDA
  C++20 support)
- Remove -mcmodel=large C/CXX/CUDA flags (not needed with newer toolchain)
- Remove -fuse-ld=mold and mold package (linker overflow no longer occurs)
- Fix ubuntu:26.04 → ubuntu:24.04 for vulkan builder and runtime stages
- Remove git from runtime images (not needed at runtime, reduces image size)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
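With those build fixes in place, a hedged sketch of how the two variants would be invoked; the --cuda/--vulkan flags and IK_LLAMA_REF come from the build-script summary above, while the exact invocation is an assumption:

```bash
# cuda: resolves IK_LLAMA_REF, builds and verifies ik-llama-server,
# and records its commit hash in /versions.txt.
IK_LLAMA_REF=main ./docker/unified/build-image.sh --cuda

# vulkan: the ik-llama stage is a no-op, so BuildKit skips the CUDA
# build stage entirely and the hash is recorded as "n/a".
./docker/unified/build-image.sh --vulkan
```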
@candrews

candrews commented Apr 2, 2026

If ik_llama.cpp is being added to the cuda image, it really should also be added to the other images (or at least cpu and vulkan).

@mostlygeek
Owner Author

@candrews ik_llama only has cuda acceleration so it only makes sense to add it to that container. I’m not planning on creating a unified CPU container but people should be able to take what’s in the repo and create their own.

@candrews

candrews commented Apr 2, 2026

> ik_llama only has cuda acceleration

ik_llama also has vulkan support and it currently works (although to your point it's not as good as their cuda support). And they're planning on improving it: ikawrakow/ik_llama.cpp#590

So I think including ik_llama in the vulkan image makes sense now and for the future.
