
CI: Enable CUDA and Vulkan ARM64 runners and fix CI/CD #21122

Merged: CISC merged 3 commits into ggml-org:master from ehfd:armcuda on Mar 30, 2026

Conversation

@ehfd (Contributor) commented Mar 28, 2026

Overview

Fixes #21123

Follow-up to #20929: enables CUDA/Vulkan ARM64 runners, fixes the ARM64 CUDA Dockerfiles (for GH200, DGX Spark, etc.), and vastly accelerates ARM64 build times by switching to native runners.

AMD ROCm is only built for x86_64, as AMD doesn't provide ARM64 base images.

CC @CISC @taronaeo

Additional information

  • Enables gcc/g++ 14 in the CUDA Dockerfile images, updates the CUDA 12 image to Ubuntu 24.04 and CUDA 12.5.1, and builds ARM64 Vulkan Dockerfile images.

  • Makes a few aesthetic fixes (libgomp1 curl\ -> libgomp1 curl \) in other Dockerfiles.

  • Makes corresponding changes to docker.yml, similar to CI: fix ARM64 image build error & enable compilation #20929.

  • Enables ubuntu-24.04-arm native-architecture runners for ARM64, vastly accelerating build times.

  • Properly handles multi-architecture container image manifests through digest-based upload, which had been broken since CI: fix ARM64 image build error & enable compilation #20929; a minimal sketch of the build-and-merge flow follows this list.
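
For illustration, here is a minimal sketch of the per-architecture build pattern (the job and step names and matrix entries are placeholders, not the literal contents of docker.yml; the image name is the one published by this repo): each platform builds on its native runner and pushes by digest, and a separate job later merges the digests into one multi-arch tag.

jobs:
  build:
    strategy:
      matrix:
        include:
          - { platform: linux/amd64, runs_on: ubuntu-24.04 }
          - { platform: linux/arm64, runs_on: ubuntu-24.04-arm }
    runs-on: ${{ matrix.runs_on }}
    steps:
      - uses: actions/checkout@v4
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push by digest (no tag yet)
        uses: docker/build-push-action@v6
        with:
          file: .devops/cuda.Dockerfile
          platforms: ${{ matrix.platform }}
          # Push the per-architecture image by digest only; the multi-arch
          # tag is assembled afterwards from these digests.
          outputs: type=image,name=ghcr.io/ggml-org/llama.cpp,push-by-digest=true,name-canonical=true,push=true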

Requirements

  • I have read and agree with the contributing guidelines: YES
  • AI usage disclosure: Gemini 3.1 Pro, GPT-5.3-Codex, purely for review and minor assistance.

@ehfd ehfd requested review from a team and ngxson as code owners March 28, 2026 15:26
Copilot AI review requested due to automatic review settings March 28, 2026 15:26
@github-actions github-actions Bot added the devops (improvements to build systems and github actions) label Mar 28, 2026
@ehfd ehfd changed the title from "CI: Enable CUDA arm64 runners" to "CI: Enable CUDA ARM64 runners" Mar 28, 2026
Copilot AI left a comment

Pull request overview

Updates the CI Docker image publishing workflow and associated Dockerfiles to support ARM64 CUDA builds and refresh CUDA/Ubuntu toolchain versions for published images.

Changes:

  • Add ARM64 matrix builds for CUDA 12 (cuda/cuda12) and CUDA 13 (cuda13) images in .github/workflows/docker.yml.
  • Update CUDA 12 Dockerfile defaults to Ubuntu 24.04 + CUDA 12.5.1 and use GCC/G++ 14 in CUDA build stages.
  • Minor Dockerfile formatting fixes (spacing before line-continuation \) across multiple image definitions.

Reviewed changes

Copilot reviewed 3 out of 8 changed files in this pull request and generated 2 comments.

Summary per file:

  • .github/workflows/docker.yml: Adds ARM64 CUDA build matrix entries and updates the CUDA 12 version used by published images.
  • .devops/cuda.Dockerfile: Bumps Ubuntu/CUDA defaults, switches the build toolchain to GCC 14, and updates Python packaging install behavior for Ubuntu 24.04.
  • .devops/cuda-new.Dockerfile: Uses the GCC 14 toolchain and minor formatting alignment for the CUDA 13 image.
  • .devops/cpu.Dockerfile: Formatting-only change in the base-stage apt install command.
  • .devops/rocm.Dockerfile: Formatting-only change in the base-stage apt install command.
  • .devops/openvino.Dockerfile: Formatting-only change in the base-stage apt install command.
  • .devops/musa.Dockerfile: Formatting-only change in the base-stage apt install command.
  • .devops/intel.Dockerfile: Formatting-only change in the base-stage apt install command.


@ehfd ehfd force-pushed the armcuda branch 4 times, most recently from 5601477 to 74b70f2 on March 28, 2026 16:49
@ehfd ehfd requested a review from Copilot March 28, 2026 16:54
Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 8 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

.github/workflows/docker.yml:112

  • This step name says “Log in to Docker Hub”, but it actually authenticates to ghcr.io. Renaming the step (e.g., “Log in to GHCR”) will make the workflow’s behavior clearer.
      - name: Log in to Docker Hub
        uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}


@ehfd ehfd force-pushed the armcuda branch 3 times, most recently from d0d33ee to 9a56f11 on March 29, 2026 07:57
@github-actions github-actions Bot added the documentation (Improvements or additions to documentation) label Mar 29, 2026
@ehfd ehfd force-pushed the armcuda branch 3 times, most recently from b9ac3f8 to c1376ee on March 29, 2026 09:55
@ehfd ehfd changed the title from "CI: Enable CUDA ARM64 runners" to "CI: Enable CUDA and AMD ARM64 runners" Mar 29, 2026
@ehfd ehfd changed the title from "CI: Enable CUDA and AMD ARM64 runners" to "CI: Enable CUDA and ROCm ARM64 runners" Mar 29, 2026
@ehfd (Contributor, Author) commented Mar 29, 2026

https://github.com/ehfd/llama.cpp/actions/runs/23704816053

@CISC @taronaeo Works perfectly (s390x was excluded because the IBM runner is not available in the fork).

@ehfd ehfd force-pushed the armcuda branch 2 times, most recently from b0b92a6 to b220e6d on March 29, 2026 10:06
@ehfd ehfd changed the title from "CI: Enable CUDA and ROCm ARM64 runners" to "CI: Enable CUDA and ROCm ARM64 runners and fix CI/CD" Mar 29, 2026
@ehfd (Contributor, Author) commented Mar 30, 2026

https://github.com/ehfd/llama.cpp/actions/runs/23731725922

@CISC @taronaeo Had to fix the Vulkan image because Ubuntu 26.04 Beta removed Python 3.13 overnight (replaced with uv, in line with #20530). The Ubuntu packages page dropped Python 3.13 for 26.04 between yesterday and today; the minimum is now 3.14.

Edit: Vulkan now passes.

@CISC (Member) commented Mar 30, 2026

> @CISC @taronaeo Had to fix the Vulkan image because Ubuntu 26.04 Beta removed Python 3.13 overnight (replaced with uv, in line with #20530). The Ubuntu packages page dropped Python 3.13 for 26.04 between yesterday and today; the minimum is now 3.14.

This image is becoming annoying. :)

@taronaeo (Member) left a comment

I'm not familiar with the multi-architecture, multi-platform setup that you've proposed here and will need to read the Medium article linked before I can give an approval.

Same as @CISC's comment here, I think this is very elaborate and IMO hard to maintain (especially the jq commands).

If another maintainer understands this well enough, they can give the approval in the meantime :)

@ehfd (Contributor, Author) commented Mar 30, 2026

Well, the CI/CD passed.

@taronaeo (Member) commented Mar 30, 2026

I'm reading this blog from Docker themselves: https://www.docker.com/blog/multi-arch-build-and-images-the-simple-way/

It appears that the solution proposed here is slightly outdated and, as I thought, harder than it actually should be.

Can't we just use docker buildx to simplify this process? Am I missing something? https://www.docker.com/blog/multi-arch-build-and-images-the-simple-way/#:~:text=The%20simple%20way%20with%20docker%20buildx

Edit: More documentation: https://docs.docker.com/build/building/multi-platform/#build-multi-platform-images
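
For reference, the "simple way" in those links amounts to a single build step that targets both platforms at once, with QEMU emulating whichever architecture is not native to the runner (step layout and tag below are illustrative placeholders, not a proposed patch):

      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v6
        with:
          file: .devops/cuda.Dockerfile
          # Both platforms in one step: the non-native one is built under QEMU.
          platforms: linux/amd64,linux/arm64
          push: true
          tags: ghcr.io/ggml-org/llama.cpp:latest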

@ehfd (Contributor, Author) commented Mar 30, 2026

@taronaeo The simple way doesn't work because we are running in GitHub Actions and pushing to the GitHub Container Registry; that method would easily lead to 6-hour build times (which time out) versus about 1 hour, because it emulates arm64 and s390x on x86_64 instead of using separate native runners.

There is one other way, not covered in that article, to use the syntax you proposed while still running on native build runners, but only when we control the build-node configuration for each architecture. That is not the case in GitHub Actions, where we have no way to dispatch builds from a single builder to various remote runners.

In this situation, using docker manifest create, or its more modern equivalent docker buildx imagetools create, is the standard approach.
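
For illustration, the merge step amounts to roughly the following (the digest values are placeholders; in the real workflow the per-architecture digests are passed between jobs via artifacts rather than hard-coded):

  merge_arch_tags:
    needs: build
    runs-on: ubuntu-24.04
    steps:
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Stitch per-architecture digests into one multi-arch tag
        run: |
          docker buildx imagetools create \
            -t ghcr.io/ggml-org/llama.cpp:latest \
            ghcr.io/ggml-org/llama.cpp@sha256:<amd64-digest> \
            ghcr.io/ggml-org/llama.cpp@sha256:<arm64-digest>

The older docker manifest create / docker manifest push pair does the same thing; imagetools create just handles both steps in a single command without building anything.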

@ehfd (Contributor, Author) commented Mar 30, 2026

A more to-the-point AI explanation:

The short answer is: No, you cannot specify multiple platforms in a single build step and expect them all to run natively on different runners.

When you set platforms: "linux/amd64,linux/arm64" in a single docker/build-push-action step, that specific runner tries to build both. Since a runner only has one native architecture, it will use QEMU to emulate the other one, which is significantly slower.

However, your current workflow is actually already using the "industry standard" best practice for achieving what you want. Here is the breakdown of why your current approach is the right one and how to think about the platforms field.

Why your current "Matrix + Merge" is better
Your workflow uses a distributed build pattern. This is the most efficient way to handle multi-arch builds in GitHub Actions.

Native Speed: By splitting the matrix so linux/amd64 runs on ubuntu-24.04 and linux/arm64 runs on ubuntu-24.04-arm, every instruction in your Dockerfile runs at native hardware speed.

Avoids QEMU: Emulating ARM on AMD64 (or vice versa) can be 10x–20x slower, especially for compiled languages (C++, Rust, Go).

The Manifest Merge: The merge_arch_tags job using imagetools create is the "glue." It doesn't rebuild anything; it just points a single tag (like my-image:latest) to the different architecture-specific digests you already pushed.

What happens if you change "platforms" to both?
If you modified your JSON to look like this:
{ "platforms": "linux/amd64,linux/arm64", "runs_on": "ubuntu-24.04" }

The ubuntu-24.04 (AMD64) runner would start.

It would build the AMD64 version natively.

It would see it also needs an ARM64 version and start a QEMU emulator.

The ARM64 build would take much longer and likely time out or exhaust resources.

The "Native" Alternative: Remote Build Nodes
If you truly want to use a single platforms: "linux/amd64,linux/arm64" line in one job but have it run natively, you would need to set up a remote Buildx driver.

This involves:

Starting an ARM64 runner and an AMD64 runner.

Connecting them via SSH.

Configuring Buildx to use both as "nodes" in a single builder instance.

Why you shouldn't do this: It is significantly harder to maintain in GitHub Actions because of networking/SSH complexities between ephemeral runners. Your current "Push by Digest & Merge" strategy is much more robust.

@taronaeo (Member) commented

Okay I am starting to get the full picture of this approach now, and can understand why. In this case, this PR is perfectly acceptable.

@ehfd (Contributor, Author) commented Mar 30, 2026

@taronaeo Okay, then! Thank you. Can't wait to get it merged.

@ehfd (Contributor, Author) commented Mar 30, 2026

@CISC Two approvals!

@CISC CISC merged commit 84ae843 into ggml-org:master Mar 30, 2026
2 checks passed
@ehfd ehfd deleted the armcuda branch March 31, 2026 02:51
@ehfd (Contributor, Author) commented Mar 31, 2026

Confirming everything is as intended.

$ docker buildx imagetools inspect ghcr.io/ggml-org/llama.cpp:full
Name:      ghcr.io/ggml-org/llama.cpp:full
MediaType: application/vnd.docker.distribution.manifest.list.v2+json
Digest:    sha256:1afcc00e4a686ce0119e1d352fc98ad3cb065fdd78fa08497eb8a5e0bf287b44

Manifests:
  Name:      ghcr.io/ggml-org/llama.cpp:full@sha256:95dc6931d877f9236b887daedf84044dd40936538c09067a1dccd668dc6dc1ef
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/amd64

  Name:      ghcr.io/ggml-org/llama.cpp:full@sha256:0b7b7b417c76991273c27fcb7a6d10dbc4b08e4cdf2816d82b13a05a1f1000f5
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/arm64

  Name:      ghcr.io/ggml-org/llama.cpp:full@sha256:556813c8a30d6443b94e0914fc719e081afbb9309abf0201a2c8f5189f83e44a
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/s390x

$ docker buildx imagetools inspect ghcr.io/ggml-org/llama.cpp:server
Name:      ghcr.io/ggml-org/llama.cpp:server
MediaType: application/vnd.docker.distribution.manifest.list.v2+json
Digest:    sha256:80910e898e5d9a6b46ca9d1b4674d3e15faf6d32b9692eb6011ccd34b2cb8a06

Manifests:
  Name:      ghcr.io/ggml-org/llama.cpp:server@sha256:044d63114bfdb99d7300d8e490c4ccdc67b19d750d62cca603a3763b41113939
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/amd64

  Name:      ghcr.io/ggml-org/llama.cpp:server@sha256:0afa179519c5416cf4e3e5e31b8dd11a8d1bb5640db937f7341d827633fcf5eb
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/arm64

  Name:      ghcr.io/ggml-org/llama.cpp:server@sha256:89791926fcf8e6470523d9c1f25607eef059671c93334e259f99f355d7cfa05f
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/s390x

$ docker buildx imagetools inspect ghcr.io/ggml-org/llama.cpp:full-s390x
Name:      ghcr.io/ggml-org/llama.cpp:full-s390x
MediaType: application/vnd.docker.distribution.manifest.list.v2+json
Digest:    sha256:ab5b45d2cbfc625a8379fb437ec6fc84e32f6d1181123beaf35c115d433f6823

Manifests:
  Name:      ghcr.io/ggml-org/llama.cpp:full-s390x@sha256:556813c8a30d6443b94e0914fc719e081afbb9309abf0201a2c8f5189f83e44a
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/s390x

$ docker buildx imagetools inspect ghcr.io/ggml-org/llama.cpp:server-s390x
Name:      ghcr.io/ggml-org/llama.cpp:server-s390x
MediaType: application/vnd.docker.distribution.manifest.list.v2+json
Digest:    sha256:58f444edfe9cfce870801ce7485283685e35d80e813ef52782b09c72ce57a6d2

Manifests:
  Name:      ghcr.io/ggml-org/llama.cpp:server-s390x@sha256:89791926fcf8e6470523d9c1f25607eef059671c93334e259f99f355d7cfa05f
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/s390x

slartibardfast pushed a commit to slartibardfast/llama.cpp that referenced this pull request Apr 12, 2026
* CI: Enable CUDA and Vulkan ARM64 runners and fix CI/CD

Co-authored-by: Ts-sound <44093942+Ts-sound@users.noreply.github.com>

* Obtain source tag name from git tag

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Ts-sound <44093942+Ts-sound@users.noreply.github.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@nalf3in commented Apr 17, 2026

Hi, this broke the llama.cpp install (using a container) on my Debian 13 server, as the latest NVIDIA driver is 555.58.02-3, which only supports up to CUDA 12.4, and it looks like this PR brings the minimum up to 12.8.1 (possibly 12.8, from what I understand from the bug mentioned earlier).

Would it be possible to bring the minimum back to 12.4, or to make it obvious to users that CUDA 12.4 is no longer supported when using containers?

@ehfd (Contributor, Author) commented Apr 18, 2026

@nalf3in

Here is where the currently supported CUDA version is located: https://github.com/ggml-org/llama.cpp/blob/master/docs/docker.md#building-docker-locally

Unfortunately, Debian is frequently problematic in combination with NVIDIA GPUs because it does not provide adequately recent driver versions. After discussion with the maintainer of this code, we decided to retain CUDA 12.8 because of its support for Blackwell GPUs.

That said, I suggest following this guide to install the latest driver version available for Debian: https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/debian.html

@mediouni-m (Contributor) commented Apr 18, 2026

@nalf3in CUDA 12.9 binaries run under older CUDA 12.x releases if they do not go through the runtime PTX path but have prebuilt SASS binaries for the given architecture. But that might come with a binary size penalty...
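
As a sketch of what that could look like (assuming the CUDA Dockerfile's CUDA_DOCKER_ARCH build argument is forwarded to CMAKE_CUDA_ARCHITECTURES; the architecture list is illustrative), the published image could bake real SASS for the GPUs it targets instead of relying on the PTX JIT path:

      # 80=Ampere (A100), 86=consumer Ampere, 89=Ada, 90=Hopper: prebuilt SASS
      # for each, so no driver-side PTX JIT is needed; image size grows accordingly.
      - name: Build CUDA image with prebuilt SASS for selected GPU architectures
        uses: docker/build-push-action@v6
        with:
          file: .devops/cuda.Dockerfile
          platforms: linux/amd64
          build-args: |
            CUDA_DOCKER_ARCH=80;86;89;90

That matches the pattern described above: no runtime PTX path for the listed architectures, at the cost of the binary size penalty mentioned.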

@nalf3in commented Apr 19, 2026

@ehfd Thank you for the clarifications. In my case I will stick with building my own containers but it's nice to know there's an alternative. This is a bit pedantic, but the CUDA version is only mentioned in the "Building Docker locally" section, not in the "Usage" part where the prebuilt image is discussed. Please let me know if you think a PR to address this would be a good idea.

@mediouni-m This is interesting. I personally wouldn’t mind a larger binary size for the sake of wider compatibility, but I know that’s not everyone’s view. What do you all think? Is that a good trade-off?

@ehfd (Contributor, Author) commented Apr 19, 2026

@nalf3in A PR is always appreciated. And I do think SASS is a good tradeoff, especially for llama.cpp.

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026

Labels

devops (improvements to build systems and github actions), documentation (Improvements or additions to documentation)

Development

Successfully merging this pull request may close these issues.

Misc. bug: Docker image
