build: add cuda13 architecture support #551
Conversation
Walkthrough

The pull request adds "cuda13" to the container build matrix in the GitHub Actions workflow and extends the allowed architectures list in the build script to permit the new platform during validation.
Pull request overview
Adds a new Docker build “architecture” option (cuda13) so this repo can build and publish llama-swap images on top of the upstream ghcr.io/ggml-org/llama.cpp CUDA 13 server images.
Changes:
- Allow `cuda13` as a valid `ARCH` in `docker/build-container.sh`.
- Add `cuda13` to the GitHub Actions container build matrix so CI builds it automatically.
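A minimal sketch of what such an `ARCH` validation could look like in `docker/build-container.sh` (the variable names and the full allowed list here are assumptions for illustration, not the script's verified contents):

```shell
#!/bin/sh
# Illustrative sketch only; the real build-container.sh may name things differently.
ARCH="${1:-cuda}"
ALLOWED_ARCHS="cpu cuda cuda13 vulkan intel"  # assumed list, now including cuda13

# Match the requested arch against the space-separated allow list.
case " $ALLOWED_ARCHS " in
  *" $ARCH "*)
    echo "building architecture: $ARCH"
    ;;
  *)
    echo "error: unknown architecture '$ARCH' (allowed: $ALLOWED_ARCHS)" >&2
    exit 1
    ;;
esac
```

The `case`-based whitelist keeps the script POSIX-portable (no bashisms), so adding a new arch like `cuda13` is a one-word change.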
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `docker/build-container.sh` | Extends the allowed architecture list to include `cuda13` so the script will construct and build tags for it. |
| `.github/workflows/containers.yml` | Extends the CI matrix to run the build for `cuda13` on schedule/manual runs (and build-only on workflow-change pushes). |
🧹 Nitpick comments (1)
.github/workflows/containers.yml (1)
Line 32: Consider centralizing supported platform definitions to avoid drift.
`matrix.platform` and `ALLOWED_ARCHS` now require manual sync. A shared source (e.g., a checked-in arch list consumed by both workflow and script) would reduce future mismatch risk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/containers.yml at line 32, Centralize the supported platform list by creating a single checked-in source (e.g., platforms.json or SUPPORTED_ARCHS.txt) and update both matrix.platform in the GitHub Actions workflow and the ALLOWED_ARCHS variable used by scripts to read from that file; modify the workflow to load the list (via fromJSON or matrix generation step) and change the script entrypoint to parse the same file (referencing ALLOWED_ARCHS and matrix.platform) so both consumers derive the values from the single source to prevent drift.
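One way the single-source suggestion could be realized is sketched below; the file name `allowed-archs.txt`, the `jq` matrix generation step, and the workflow wiring are assumptions for illustration, not existing repo files:

```shell
#!/bin/sh
# Hypothetical single source of truth for supported architectures; the file
# name and helper names are illustrative, not the repo's actual layout.
ARCH_FILE="$(mktemp)"
printf '%s\n' cpu cuda cuda13 vulkan > "$ARCH_FILE"   # stand-in for a checked-in list

ARCH="${1:-cuda13}"

# build-container.sh side: validate against the shared list (exact line match).
if grep -qx "$ARCH" "$ARCH_FILE"; then
  echo "ok: $ARCH"
else
  echo "unsupported architecture: $ARCH" >&2
  exit 1
fi

# Workflow side: a generator step could emit the same list as a JSON matrix,
# e.g. with jq:  jq -Rnc '[inputs]' < allowed-archs.txt
# and the job would consume it via: ${{ fromJSON(needs.gen.outputs.platforms) }}
MATRIX=$(sed 's/^/"/; s/$/"/' "$ARCH_FILE" | paste -sd, -)
echo "matrix: [$MATRIX]"
```

Both consumers then read one file, so adding a future arch (e.g., a CUDA 14 image) is a single-line change that cannot drift between script and workflow.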
Thanks!
Add `cuda13` as a supported build architecture, targeting the `ghcr.io/ggml-org/llama.cpp:server-cuda13` upstream base image. The `server-cuda13` image ships with CUDA 13 libraries, providing improved performance on recent NVIDIA hardware compared to the existing `server-cuda` (CUDA 12) image. Users with newer GPUs (e.g., RTX 50-series) benefit from reduced model load latency and higher token throughput.

- Add `cuda13` to the allowed architectures list in `docker/build-container.sh`
- Add `cuda13` to the CI matrix in `.github/workflows/containers.yml` so the container is built and pushed automatically
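For illustration, the mapping from the new arch to the upstream base image might look like the sketch below; the `BASE_IMAGE` build-arg, Dockerfile path, and tag naming are assumptions about the script's interface, not verified details:

```shell
#!/bin/sh
# Sketch only: derive the upstream llama.cpp server image from the arch and
# show the docker build invocation. The command is echoed rather than
# executed so the sketch is safe to run anywhere.
ARCH="cuda13"
BASE_IMAGE="ghcr.io/ggml-org/llama.cpp:server-${ARCH}"
TAG="llama-swap:${ARCH}"

echo docker build \
  --build-arg "BASE_IMAGE=${BASE_IMAGE}" \
  -t "$TAG" \
  -f docker/Dockerfile .
```

Because the base image tag is derived mechanically from `$ARCH`, supporting `cuda13` only requires the upstream `server-cuda13` tag to exist, which it does on `ghcr.io/ggml-org/llama.cpp`.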