docker/unified: add ik_llama.cpp to CUDA container #620
mostlygeek wants to merge 4 commits into main
Conversation
Add ik-llama-server (ikawrakow/ik_llama.cpp) to the unified CUDA image. The binary is built statically with CUDA support and installed alongside the existing llama-server to avoid naming collisions.

- add install-ik-llama.sh build script (mirrors the install-llama.sh pattern)
- add an ik-llama-cuda build stage to the Dockerfile; ik-llama-vulkan is a no-op, so BuildKit skips the CUDA stage entirely on vulkan builds
- resolve IK_LLAMA_REF in build-image.sh (cuda only, "n/a" for vulkan)
- verify the ik-llama-server binary in the post-build check (cuda only)
- add an ik_llama_ref input to the unified-docker.yml workflow
- track the ik_llama.cpp commit hash in /versions.txt

https://claude.ai/code/session_01RRofFLwAiAPfk4NgcNegb9
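The cuda-only ref resolution described above can be sketched as a small shell helper. This is a hypothetical outline (function name and output format are assumptions for illustration), not the actual build-image.sh code:

```shell
# Hypothetical sketch: resolve IK_LLAMA_REF only for CUDA builds and
# record "n/a" otherwise, so versions.txt stays consistent across backends.
resolve_ik_llama_hash() {
  local backend="$1" ref="$2"
  if [ "$backend" = "cuda" ]; then
    # The real script would resolve the ref remotely, e.g.:
    #   git ls-remote "$repo" "refs/tags/$ref" "refs/heads/$ref" | head -1 | cut -f1
    printf 'resolved(%s)\n' "$ref"
  else
    printf 'n/a\n'   # vulkan: the ik-llama stage is a no-op, nothing to pin
  fi
}
```

Because the vulkan branch never touches the network, vulkan builds stay fast and the CUDA-only stage is the only one that depends on the upstream repo.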
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Walkthrough

Adds optional ik_llama.cpp support to the unified Docker build: a new workflow input, build-script logic to resolve CUDA-only commit hashes, a CUDA-focused installer script, and multi-stage Dockerfile changes that conditionally include ik-llama binaries and report its commit in versions.txt.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🧹 Nitpick comments (1)
docker/unified/build-image.sh (1)
160-162: Tag resolution pattern in `resolve_ref()` could be made more robust.

The `resolve_ref` function on line 161 queries `refs/tags/<ref>` without peeling annotated tags. While the ik_llama.cpp repository currently contains no annotated tags that would cause SHA mismatches, the code pattern could be hardened against future tag types by prioritizing peeled refs. Consider this optional, defensive improvement to `resolve_ref()`:

```diff
-hash=$(git ls-remote "${repo_url}" "refs/tags/${ref}" "refs/heads/${ref}" 2>/dev/null | head -1 | cut -f1)
+hash=$(git ls-remote "${repo_url}" "refs/tags/${ref}^{}" "refs/tags/${ref}" "refs/heads/${ref}" 2>/dev/null | head -1 | cut -f1)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker/unified/build-image.sh` around lines 160 - 162, The resolve_ref() helper currently resolves tags by checking refs/tags/<ref> which can return an annotated tag object rather than the peeled commit; update resolve_ref() so it first tries the peeled tag ref (refs/tags/<ref>^{}) to get the underlying commit SHA, then falls back to refs/tags/<ref>, then refs/heads/<ref>, and finally any plain ref lookup—ensure this behavior is used when computing IK_LLAMA_HASH from resolve_ref("${IK_LLAMA_REPO}" "${IK_LLAMA_REF}") so annotated tags in ik_llama.cpp resolve to the commit SHA rather than the tag object.
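The peeling priority described in this prompt can be sketched offline. The helper below is hypothetical (function name and structure assumed, and it parses prepared `git ls-remote`-style output rather than hitting the network): it prefers the peeled `^{}` entry of an annotated tag, falling back to the first SHA otherwise.

```shell
# Hypothetical sketch of the peeling priority (names assumed).
# Input: `git ls-remote`-style output, one "<sha><TAB><ref>" pair per line.
# Annotated tags appear twice: refs/tags/<ref> (the tag object) and
# refs/tags/<ref>^{} (the peeled commit); we prefer the latter.
pick_commit_sha() {
  local ls_remote_output="$1" peeled
  peeled=$(printf '%s\n' "$ls_remote_output" | grep -F '^{}' | head -n1 | cut -f1)
  if [ -n "$peeled" ]; then
    printf '%s\n' "$peeled"                                  # commit behind an annotated tag
  else
    printf '%s\n' "$ls_remote_output" | head -n1 | cut -f1   # branch or lightweight tag
  fi
}
```

For a branch or lightweight tag there is no `^{}` entry, so the helper falls back to the first SHA, matching the original behavior.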
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docker/unified/install-ik-llama.sh`:
- Around line 31-33: The linker flag "-Wl,--allow-shlib-undefined" in the
CMAKE_EXE_LINKER_FLAGS entry should be removed because it permits unresolved
shared-library symbols to pass link-time checks; edit the array element
containing "-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs
-lcuda -Wl,--allow-shlib-undefined" and delete the trailing
"-Wl,--allow-shlib-undefined" portion so the CMAKE_EXE_LINKER_FLAGS only
includes the rpath-link and -lcuda entries.
ℹ️ Review info
⚙️ Run configuration
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8d88c815-b744-4ad2-a07c-22d3183dd3be
📒 Files selected for processing (4)
.github/workflows/unified-docker.yml
docker/unified/Dockerfile
docker/unified/build-image.sh
docker/unified/install-ik-llama.sh
```shell
  "-DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler"
  "-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda -Wl,--allow-shlib-undefined"
)
```
🧩 Analysis chain
🌐 Web query:
What does GNU ld flag --allow-shlib-undefined do for executable linking, and what runtime risks does it introduce for CUDA-linked binaries?
💡 Result:
The GNU ld flag --allow-shlib-undefined permits the linker to accept undefined symbols specifically in shared libraries (.so files) that are linked into the output, rather than failing the link; it is the default behavior only when creating shared libraries. The flag does not directly affect undefined symbols in regular object files (.o); those are controlled separately (e.g., by --no-undefined or -z defs).

For executable linking: when creating an executable, ld by default enforces that all symbols (including those from shared libraries) must be resolved at link time, treating unresolved symbols from shared libraries as errors. Specifying --allow-shlib-undefined explicitly changes this default for executables, allowing unresolved symbols from shared libraries to pass link-time checks and deferring resolution to runtime by the dynamic loader. Without it (or with --no-allow-shlib-undefined), the linker errors on unresolved symbols in any input shared libraries.

Runtime risks for CUDA-linked binaries: CUDA applications link against shared libraries like libcudart.so and libcublas.so, which may have internal undefined symbols (e.g., GLIBC_PRIVATE symbols) or complex dependencies. Using --allow-shlib-undefined lets the executable link despite these, but risks runtime failures if:

- required CUDA libraries are missing or version-mismatched at runtime, causing undefined-symbol errors during loading or lazy PLT resolution;
- multiple CUDA runtime instances (static vs. shared) lead to ABI mismatches, duplicate registrations, or crashes during kernel launches or managed memory access (issues that NVCC changes since CUDA 13.0 aim to prevent via hidden visibility);
- subtle failures such as incorrect kernel behavior or segfaults occur without build-time detection, as seen in shared-library scenarios where global functions or device variables are exported unexpectedly.

Prefer --no-allow-shlib-undefined (or -Wl,--no-allow-shlib-undefined) for stricter link-time checks that catch dependency issues early.
Citations:
- 1: https://sourceware.org/binutils/docs/ld/Options.html
- 2: https://wiki.gentoo.org/wiki/Project:Quality_Assurance/-Wl,-z,defs_and_-Wl,--no-allow-shlib-undefined
- 3: http://sourceware.org/binutils/docs-2.16/ld/Options.html
- 4: https://stackoverflow.com/questions/2356168/force-gcc-to-notify-about-undefined-references-in-shared-libraries
- 5: https://manpages.ubuntu.com/manpages/jammy/man1/alpha-linux-gnu-ld.bfd.1.html
- 6: https://stackoverflow.com/questions/31371867/what-is-the-g-flag-that-permits-undefined-references-in-shared-libraries
- 7: https://stackoverflow.com/questions/55928308/whats-the-difference-between-unresolved-symbols-ignore-in-shared-libs-and-a
- 8: https://developer.nvidia.com/blog/cuda-c-compiler-updates-impacting-elf-visibility-and-linkage/
- 9: https://stackoverflow.com/questions/65150791/undefined-symbol-despite-being-defined-in-linked-library-cuda-10-1
🏁 Script executed:
# First, let's check if these files exist and examine their content
fd -t f "install.*llama.*" docker/
# Check the specific lines in install-ik-llama.sh
echo "=== install-ik-llama.sh (lines 25-35) ==="
sed -n '25,35p' docker/unified/install-ik-llama.sh
# Check the corresponding section in install-llama.sh for comparison
echo -e "\n=== install-llama.sh (check similar CMake flags section) ==="
if [ -f docker/unified/install-llama.sh ]; then
grep -n "CMAKE_EXE_LINKER_FLAGS" docker/unified/install-llama.sh -A 1 -B 1
fiRepository: mostlygeek/llama-swap
Length of output: 888
🏁 Script executed:
# Check git history for this file to understand why the flag was added
git log --oneline docker/unified/install-ik-llama.sh 2>/dev/null | head -5
# Look for any comments or documentation around this script
echo "=== Full install-ik-llama.sh script ==="
wc -l docker/unified/install-ik-llama.sh
# Look at the full script to understand context
cat -n docker/unified/install-ik-llama.sh | head -50
# Check if there are any comments about CUDA linking issues
grep -n "allow-shlib-undefined\|undefined\|CUDA\|link" docker/unified/install-ik-llama.shRepository: mostlygeek/llama-swap
Length of output: 2458
Remove --allow-shlib-undefined from line 32.
The flag -Wl,--allow-shlib-undefined permits unresolved symbols from shared libraries to pass link-time checks, deferring resolution to runtime. This weakens build safety and can hide missing or mismatched CUDA libraries until runtime failures occur. The baseline install-llama.sh builds successfully without this flag using the same CUDA linker configuration, so it is not required.
Suggested change:

```diff
-  "-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda -Wl,--allow-shlib-undefined"
+  "-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda"
```
The ikawrakow/ik_llama.cpp repo uses "main", not "master".

https://claude.ai/code/session_01RRofFLwAiAPfk4NgcNegb9
Three bugs prevented the ik_llama.cpp stage from building:

- Remove -DLLAMA_CURL=ON: it requires libcurl4-openssl-dev, which is not installed in the builder; consistent with how llama.cpp is built
- Upgrade cmake via pip3: Ubuntu 22.04 ships cmake 3.22, which lacks the NVCC compiler feature mapping for CUDA C++20 (required by the ik_llama.cpp main branch)
- Fix the 4GB binary link failure: compiling for 5 CUDA architectures with hundreds of quantization kernels produces a ~4GB static binary that overflows x86-64 32-bit PC-relative addressing (PLT entries and exception tables); fixed by building with -mcmodel=large for all host C/C++ and CUDA host compilation, and using the mold linker

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
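The large-code-model workaround from this commit can be sketched as a CMake flag set. The exact flag names and variables below are assumptions reconstructed from the commit message, not the actual install script:

```shell
# Hypothetical excerpt: host C/C++ and CUDA host compilation all get
# -mcmodel=large so a >2GB static binary no longer overflows 32-bit
# PC-relative relocations; mold replaces the default linker for the final link.
cmake_args=(
  "-DGGML_CUDA=ON"
  "-DCMAKE_C_FLAGS=-mcmodel=large"
  "-DCMAKE_CXX_FLAGS=-mcmodel=large"
  "-DCMAKE_CUDA_FLAGS=-Xcompiler -mcmodel=large"   # nvcc forwards to the host compiler
  "-DCMAKE_EXE_LINKER_FLAGS=-fuse-ld=mold"
)
printf '%s\n' "${cmake_args[@]}"
```

Note how CUDA compilation needs `-Xcompiler` so nvcc passes the code-model flag through to the host compiler for the host-side portions of each translation unit; device code is unaffected.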
Upgrade all base images from Ubuntu 22.04 / CUDA 12.4.0 to Ubuntu 24.04 / CUDA 12.9.1, and remove the Ubuntu 22.04 workarounds added in the previous commit.

- Remove the pip3 cmake upgrade (Ubuntu 24.04 ships cmake 3.28+, which supports CUDA C++20)
- Remove the -mcmodel=large C/CXX/CUDA flags (not needed with the newer toolchain)
- Remove -fuse-ld=mold and the mold package (the linker overflow no longer occurs)
- Fix ubuntu:26.04 → ubuntu:24.04 for the vulkan builder and runtime stages
- Remove git from the runtime images (not needed at runtime; reduces image size)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@candrews ik_llama only has CUDA acceleration, so it only makes sense to add it to that container. I'm not planning on creating a unified CPU container, but people should be able to take what's in the repo and create their own.
ik_llama also has Vulkan support, and it currently works (although, to your point, it's not as good as their CUDA support). They're also planning on improving it: ikawrakow/ik_llama.cpp#590. So I think including ik_llama in the vulkan image makes sense, now and for the future.