build: add cuda13 architecture support#551

Merged
mostlygeek merged 1 commit into mostlygeek:main from pdscomp:cuda13
Mar 1, 2026

Conversation

Contributor

@pdscomp pdscomp commented Mar 1, 2026

Add cuda13 as a supported build architecture, targeting the ghcr.io/ggml-org/llama.cpp:server-cuda13 upstream base image.

The server-cuda13 image ships with CUDA 13 libraries, providing improved performance on recent NVIDIA hardware compared to the existing server-cuda (CUDA 12) image. Users with newer GPUs (e.g., RTX 50-series) benefit from reduced model load latency and higher token throughput.

  • Add cuda13 to the allowed architectures list in docker/build-container.sh
  • Add cuda13 to the CI matrix in .github/workflows/containers.yml so the container is built and pushed automatically
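The allowed-architectures check described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual contents of docker/build-container.sh: the ALLOWED_ARCHS name appears in the review comments on this PR, but the list contents and control flow here are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the architecture validation in docker/build-container.sh.
# The list contents are illustrative; the PR adds "cuda13" to whatever the real
# list contains.
ALLOWED_ARCHS="cuda cuda13 vulkan cpu"

validate_arch() {
    # Accept the argument only if it appears in the allowed list.
    for a in $ALLOWED_ARCHS; do
        [ "$a" = "$1" ] && return 0
    done
    echo "error: unsupported architecture '$1'" >&2
    return 1
}

validate_arch cuda13 && echo "cuda13 accepted"
```

With a check like this, forgetting to extend the list would make CI fail fast on the new matrix entry rather than building a wrong tag.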

…-swap container against the ghcr.io/ggml-org/llama.cpp:server-cuda13 target vice server-cuda
Copilot AI review requested due to automatic review settings March 1, 2026 17:10

coderabbitai bot commented Mar 1, 2026

Walkthrough

The pull request adds "cuda13" to the container build matrix in the GitHub Actions workflow and extends the allowed architectures list in the build script to permit the new platform during validation.

Changes

Cohort: Add cuda13 platform support
Files: .github/workflows/containers.yml, docker/build-container.sh
Summary: Extended the build matrix and the allowed architectures list to include the cuda13 platform.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks: ✅ 3 of 3 passed
  • Description check — Passed. The description provides relevant context for the cuda13 addition, including the upstream base image, the performance rationale, and the specific file changes made.
  • Docstring Coverage — Passed. No functions found in the changed files; docstring coverage check skipped.
  • Title check — Passed. The title "build: add cuda13 architecture support" directly summarizes the main change: adding CUDA 13 as a supported architecture across the build configuration and CI matrix.




Copilot AI left a comment


Pull request overview

Adds a new Docker build “architecture” option (cuda13) so this repo can build and publish llama-swap images on top of the upstream ghcr.io/ggml-org/llama.cpp CUDA 13 server images.

Changes:

  • Allow cuda13 as a valid ARCH in docker/build-container.sh.
  • Add cuda13 to the GitHub Actions container build matrix so CI builds it automatically.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

  • docker/build-container.sh — Extends the allowed architecture list to include cuda13 so the script will construct and build tags for it.
  • .github/workflows/containers.yml — Extends the CI matrix to run the build for cuda13 on scheduled and manual runs (and build-only on workflow-change pushes).



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
.github/workflows/containers.yml (1)

32-32: Consider centralizing supported platform definitions to avoid drift.

matrix.platform and ALLOWED_ARCHS now require manual sync. A shared source (e.g., a checked-in arch list consumed by both workflow and script) would reduce future mismatch risk.
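One way the suggested single source could look is a checked-in list file consumed by both the script and the workflow. Everything here is a hypothetical sketch: the file name (docker/ARCHS.txt in the comments below) and its contents are illustrative, not from this repository; the sketch uses a temp file so it is self-contained.

```shell
#!/bin/sh
# Hypothetical: in the repo this would be a checked-in docker/ARCHS.txt,
# one architecture per line, serving as the single source of truth.
archs_file="$(mktemp)"
printf 'cuda\ncuda13\nvulkan\n' > "$archs_file"

# The build script would derive ALLOWED_ARCHS from the file
# instead of hard-coding the list:
ALLOWED_ARCHS="$(tr '\n' ' ' < "$archs_file")"
echo "allowed: $ALLOWED_ARCHS"

# A workflow setup step would emit the same file as a JSON array written to
# $GITHUB_OUTPUT, which the matrix could then consume via fromJSON():
platforms_json="["
sep=""
while IFS= read -r a; do
    [ -n "$a" ] || continue
    platforms_json="${platforms_json}${sep}\"${a}\""
    sep=","
done < "$archs_file"
platforms_json="${platforms_json}]"
echo "$platforms_json"
```

Adding a new architecture would then be a one-line change to the list file, with both consumers picking it up automatically.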

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/containers.yml at line 32, Centralize the supported
platform list by creating a single checked-in source (e.g., platforms.json or
SUPPORTED_ARCHS.txt) and update both matrix.platform in the GitHub Actions
workflow and the ALLOWED_ARCHS variable used by scripts to read from that file;
modify the workflow to load the list (via fromJSON or matrix generation step)
and change the script entrypoint to parse the same file (referencing
ALLOWED_ARCHS and matrix.platform) so both consumers derive the values from the
single source to prevent drift.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49546e2 and 1607f87.

📒 Files selected for processing (2)
  • .github/workflows/containers.yml
  • docker/build-container.sh

@pdscomp pdscomp changed the title from "docker: add cuda13 architecture support" to "build: add cuda13 architecture support" on Mar 1, 2026
@mostlygeek mostlygeek merged commit 181f71c into mostlygeek:main Mar 1, 2026
4 of 5 checks passed
@mostlygeek
Owner

Thanks!

pontostroy pushed a commit to pontostroy/llama-swap that referenced this pull request Mar 4, 2026
rohitpaul pushed a commit to rohitpaul/llama-swap that referenced this pull request Mar 29, 2026