
CI: Enable CUDA and Vulkan ARM64 runners and fix CI/CD #21122

Merged: CISC merged 3 commits into ggml-org:master from ehfd:armcuda on Mar 30, 2026

Conversation

@ehfd (Contributor) commented Mar 28, 2026

Overview

Fixes #21123

Follow-up to #20929: enables CUDA/Vulkan ARM64 runners, fixes the ARM64 CUDA Dockerfiles (for GH200, DGX Spark, etc.), and vastly accelerates ARM64 build times by switching to native runners.

AMD ROCm is only built for x86_64, as AMD doesn't provide ARM64 base images.

CC @CISC @taronaeo

Additional information

  • Enables gcc/g++ 14 in the CUDA Dockerfile images, updates the CUDA 12 image to Ubuntu 24.04 and CUDA 12.5.1, and builds ARM64 Vulkan Dockerfile images.

  • Makes a few aesthetic fixes (libgomp1 curl\ -> libgomp1 curl \) in other Dockerfiles.

  • Makes corresponding changes to docker.yml, similar to CI: fix ARM64 image build error & enable compilation #20929.

  • Enables ubuntu-24.04-arm native-architecture runners for ARM64, vastly accelerating build times.

  • Properly handles multi-architecture container image manifests through digest-based upload, which had been broken since CI: fix ARM64 image build error & enable compilation #20929; a minimal sketch of the build-and-merge flow follows this list.
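
For illustration, here is a minimal sketch of the per-architecture build pattern (the job and step names and matrix entries are placeholders, not the literal contents of docker.yml; the image name is the one published by this repo): each platform builds on its native runner and pushes by digest, and a separate job later merges the digests into one multi-arch tag.

jobs:
  build:
    strategy:
      matrix:
        include:
          - { platform: linux/amd64, runs_on: ubuntu-24.04 }
          - { platform: linux/arm64, runs_on: ubuntu-24.04-arm }
    runs-on: ${{ matrix.runs_on }}
    steps:
      - uses: actions/checkout@v4
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push by digest (no tag yet)
        uses: docker/build-push-action@v6
        with:
          file: .devops/cuda.Dockerfile
          platforms: ${{ matrix.platform }}
          # Push the per-architecture image by digest only; the multi-arch
          # tag is assembled afterwards from these digests.
          outputs: type=image,name=ghcr.io/ggml-org/llama.cpp,push-by-digest=true,name-canonical=true,push=true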

Requirements

  • I have read and agree with the contributing guidelines: YES
  • AI usage disclosure: Gemini 3.1 Pro, GPT-5.3-Codex, purely for review and minor assistance.

@ehfd ehfd requested review from a team and ngxson as code owners March 28, 2026 15:26
Copilot AI review requested due to automatic review settings March 28, 2026 15:26
@github-actions github-actions Bot added the devops (improvements to build systems and github actions) label Mar 28, 2026
@ehfd ehfd changed the title from "CI: Enable CUDA arm64 runners" to "CI: Enable CUDA ARM64 runners" Mar 28, 2026
Copilot AI left a comment

Pull request overview

Updates the CI Docker image publishing workflow and associated Dockerfiles to support ARM64 CUDA builds and refresh CUDA/Ubuntu toolchain versions for published images.

Changes:

  • Add ARM64 matrix builds for CUDA 12 (cuda/cuda12) and CUDA 13 (cuda13) images in .github/workflows/docker.yml.
  • Update CUDA 12 Dockerfile defaults to Ubuntu 24.04 + CUDA 12.5.1 and use GCC/G++ 14 in CUDA build stages.
  • Minor Dockerfile formatting fixes (spacing before line-continuation \) across multiple image definitions.

Reviewed changes

Copilot reviewed 3 out of 8 changed files in this pull request and generated 2 comments.

Summary per file:

  • .github/workflows/docker.yml: Adds ARM64 CUDA build matrix entries and updates the CUDA 12 version used by published images.
  • .devops/cuda.Dockerfile: Bumps Ubuntu/CUDA defaults, switches the build toolchain to GCC 14, and updates Python packaging install behavior for Ubuntu 24.04.
  • .devops/cuda-new.Dockerfile: Uses the GCC 14 toolchain and minor formatting alignment for the CUDA 13 image.
  • .devops/cpu.Dockerfile: Formatting-only change in the base-stage apt install command.
  • .devops/rocm.Dockerfile: Formatting-only change in the base-stage apt install command.
  • .devops/openvino.Dockerfile: Formatting-only change in the base-stage apt install command.
  • .devops/musa.Dockerfile: Formatting-only change in the base-stage apt install command.
  • .devops/intel.Dockerfile: Formatting-only change in the base-stage apt install command.


@ehfd ehfd force-pushed the armcuda branch 4 times, most recently from 5601477 to 74b70f2 on March 28, 2026 16:49
@ehfd ehfd requested a review from Copilot March 28, 2026 16:54
Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 8 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

.github/workflows/docker.yml:112

  • This step name says “Log in to Docker Hub”, but it actually authenticates to ghcr.io. Renaming the step (e.g., “Log in to GHCR”) will make the workflow’s behavior clearer.
      - name: Log in to Docker Hub
        uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}


@ehfd ehfd force-pushed the armcuda branch 3 times, most recently from d0d33ee to 9a56f11 on March 29, 2026 07:57
@github-actions github-actions Bot added the documentation (Improvements or additions to documentation) label Mar 29, 2026
@ehfd ehfd force-pushed the armcuda branch 3 times, most recently from b9ac3f8 to c1376ee on March 29, 2026 09:55
@ehfd ehfd changed the title from "CI: Enable CUDA ARM64 runners" to "CI: Enable CUDA and AMD ARM64 runners" Mar 29, 2026
@ehfd ehfd changed the title from "CI: Enable CUDA and AMD ARM64 runners" to "CI: Enable CUDA and ROCm ARM64 runners" Mar 29, 2026
@ehfd (Contributor, Author) commented Mar 29, 2026

https://github.com/ehfd/llama.cpp/actions/runs/23704816053

@CISC @taronaeo Works perfectly (s390x was excluded because the IBM runner is not available in the fork).

@ehfd ehfd force-pushed the armcuda branch 2 times, most recently from b0b92a6 to b220e6d on March 29, 2026 10:06
@ehfd ehfd changed the title from "CI: Enable CUDA and ROCm ARM64 runners" to "CI: Enable CUDA and ROCm ARM64 runners and fix CI/CD" Mar 29, 2026
@ehfd (Contributor, Author) commented Mar 30, 2026

https://github.com/ehfd/llama.cpp/actions/runs/23731725922

@CISC @taronaeo Had to fix the Vulkan image because Ubuntu 26.04 Beta removed Python 3.13 overnight (replaced with uv, in line with #20530). The Ubuntu packages page dropped Python 3.13 for 26.04 between yesterday and today; the minimum is now 3.14.

Edit: Vulkan now passes.

@CISC (Member) commented Mar 30, 2026

> @CISC @taronaeo Had to fix the Vulkan image because Ubuntu 26.04 Beta removed Python 3.13 overnight (replaced with uv, in line with #20530). The Ubuntu packages page dropped Python 3.13 for 26.04 between yesterday and today; the minimum is now 3.14.

This image is becoming annoying. :)

@taronaeo (Member) left a comment

I'm not familiar with the multi-architecture, multi-platform setup that you've proposed here and will need to read the Medium article linked before I can give an approval.

Same as @CISC's comment here, I think this is very elaborate and IMO hard to maintain (especially the jq commands).

If another maintainer understands this well enough, they can give the approval in the meantime :)

@ehfd (Contributor, Author) commented Mar 30, 2026

Well, the CI/CD passed.

@taronaeo (Member) commented Mar 30, 2026

I'm reading this blog from Docker themselves: https://www.docker.com/blog/multi-arch-build-and-images-the-simple-way/

It appears that the solution proposed here is slightly outdated and, as I thought, harder than it actually should be.

Can't we just use docker buildx to simplify this process? Am I missing something? https://www.docker.com/blog/multi-arch-build-and-images-the-simple-way/#:~:text=The%20simple%20way%20with%20docker%20buildx

Edit: More documentation: https://docs.docker.com/build/building/multi-platform/#build-multi-platform-images
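
For reference, the "simple way" in those links amounts to a single build step that targets both platforms at once, with QEMU emulating whichever architecture is not native to the runner (step layout and tag below are illustrative placeholders, not a proposed patch):

      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v6
        with:
          file: .devops/cuda.Dockerfile
          # Both platforms in one step: the non-native one is built under QEMU.
          platforms: linux/amd64,linux/arm64
          push: true
          tags: ghcr.io/ggml-org/llama.cpp:latest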

@ehfd (Contributor, Author) commented Mar 30, 2026

@taronaeo The simple way doesn't work because we are running in GitHub Actions and pushing to the GitHub Container Registry; that method would easily lead to 6-hour build times (which time out) versus about 1 hour, because it emulates arm64 and s390x on x86_64 instead of using separate native runners.

There is one other way, not covered in that article, to use the syntax you proposed while still running on native build runners, but only when we control the build-node configuration for each architecture. That is not the case in GitHub Actions, where we have no way to dispatch builds from a single builder to various remote runners.

In this situation, using docker manifest create, or its more modern equivalent docker buildx imagetools create, is the standard approach.
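
For illustration, the merge step amounts to roughly the following (the digest values are placeholders; in the real workflow the per-architecture digests are passed between jobs via artifacts rather than hard-coded):

  merge_arch_tags:
    needs: build
    runs-on: ubuntu-24.04
    steps:
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Stitch per-architecture digests into one multi-arch tag
        run: |
          docker buildx imagetools create \
            -t ghcr.io/ggml-org/llama.cpp:latest \
            ghcr.io/ggml-org/llama.cpp@sha256:<amd64-digest> \
            ghcr.io/ggml-org/llama.cpp@sha256:<arm64-digest>

The older docker manifest create / docker manifest push pair does the same thing; imagetools create just handles both steps in a single command without building anything.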

@ehfd (Contributor, Author) commented Mar 30, 2026

A more to-the-point AI explanation:

The short answer is: No, you cannot specify multiple platforms in a single build step and expect them all to run natively on different runners.

When you set platforms: "linux/amd64,linux/arm64" in a single docker/build-push-action step, that specific runner tries to build both. Since a runner only has one native architecture, it will use QEMU to emulate the other one, which is significantly slower.

However, your current workflow is actually already using the "industry standard" best practice for achieving what you want. Here is the breakdown of why your current approach is the right one and how to think about the platforms field.

Why your current "Matrix + Merge" is better
Your workflow uses a distributed build pattern. This is the most efficient way to handle multi-arch builds in GitHub Actions.

Native Speed: By splitting the matrix so linux/amd64 runs on ubuntu-24.04 and linux/arm64 runs on ubuntu-24.04-arm, every instruction in your Dockerfile runs at native hardware speed.

Avoids QEMU: Emulating ARM on AMD64 (or vice versa) can be 10x–20x slower, especially for compiled languages (C++, Rust, Go).

The Manifest Merge: The merge_arch_tags job using imagetools create is the "glue." It doesn't rebuild anything; it just points a single tag (like my-image:latest) to the different architecture-specific digests you already pushed.

What happens if you change "platforms" to both?
If you modified your JSON to look like this:
{ "platforms": "linux/amd64,linux/arm64", "runs_on": "ubuntu-24.04" }

The ubuntu-24.04 (AMD64) runner would start.

It would build the AMD64 version natively.

It would see it also needs an ARM64 version and start a QEMU emulator.

The ARM64 build would take much longer and likely time out or exhaust resources.

The "Native" Alternative: Remote Build Nodes
If you truly want to use a single platforms: "linux/amd64,linux/arm64" line in one job but have it run natively, you would need to set up a remote Buildx driver.

This involves:

Starting an ARM64 runner and an AMD64 runner.

Connecting them via SSH.

Configuring Buildx to use both as "nodes" in a single builder instance.

Why you shouldn't do this: It is significantly harder to maintain in GitHub Actions because of networking/SSH complexities between ephemeral runners. Your current "Push by Digest & Merge" strategy is much more robust.

@taronaeo (Member) commented

Okay I am starting to get the full picture of this approach now, and can understand why. In this case, this PR is perfectly acceptable.

@ehfd (Contributor, Author) commented Mar 30, 2026

@taronaeo Okay, then! Thank you. Can't wait to get it merged.

@ehfd (Contributor, Author) commented Mar 30, 2026

@CISC Two approvals!

@CISC CISC merged commit 84ae843 into ggml-org:master Mar 30, 2026
2 checks passed
@ehfd ehfd deleted the armcuda branch March 31, 2026 02:51
@ehfd (Contributor, Author) commented Mar 31, 2026

Confirming everything is as intended.

$ docker buildx imagetools inspect ghcr.io/ggml-org/llama.cpp:full
Name:      ghcr.io/ggml-org/llama.cpp:full
MediaType: application/vnd.docker.distribution.manifest.list.v2+json
Digest:    sha256:1afcc00e4a686ce0119e1d352fc98ad3cb065fdd78fa08497eb8a5e0bf287b44

Manifests:
  Name:      ghcr.io/ggml-org/llama.cpp:full@sha256:95dc6931d877f9236b887daedf84044dd40936538c09067a1dccd668dc6dc1ef
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/amd64

  Name:      ghcr.io/ggml-org/llama.cpp:full@sha256:0b7b7b417c76991273c27fcb7a6d10dbc4b08e4cdf2816d82b13a05a1f1000f5
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/arm64

  Name:      ghcr.io/ggml-org/llama.cpp:full@sha256:556813c8a30d6443b94e0914fc719e081afbb9309abf0201a2c8f5189f83e44a
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/s390x

$ docker buildx imagetools inspect ghcr.io/ggml-org/llama.cpp:server
Name:      ghcr.io/ggml-org/llama.cpp:server
MediaType: application/vnd.docker.distribution.manifest.list.v2+json
Digest:    sha256:80910e898e5d9a6b46ca9d1b4674d3e15faf6d32b9692eb6011ccd34b2cb8a06

Manifests:
  Name:      ghcr.io/ggml-org/llama.cpp:server@sha256:044d63114bfdb99d7300d8e490c4ccdc67b19d750d62cca603a3763b41113939
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/amd64

  Name:      ghcr.io/ggml-org/llama.cpp:server@sha256:0afa179519c5416cf4e3e5e31b8dd11a8d1bb5640db937f7341d827633fcf5eb
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/arm64

  Name:      ghcr.io/ggml-org/llama.cpp:server@sha256:89791926fcf8e6470523d9c1f25607eef059671c93334e259f99f355d7cfa05f
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/s390x

$ docker buildx imagetools inspect ghcr.io/ggml-org/llama.cpp:full-s390x
Name:      ghcr.io/ggml-org/llama.cpp:full-s390x
MediaType: application/vnd.docker.distribution.manifest.list.v2+json
Digest:    sha256:ab5b45d2cbfc625a8379fb437ec6fc84e32f6d1181123beaf35c115d433f6823

Manifests:
  Name:      ghcr.io/ggml-org/llama.cpp:full-s390x@sha256:556813c8a30d6443b94e0914fc719e081afbb9309abf0201a2c8f5189f83e44a
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/s390x

$ docker buildx imagetools inspect ghcr.io/ggml-org/llama.cpp:server-s390x
Name:      ghcr.io/ggml-org/llama.cpp:server-s390x
MediaType: application/vnd.docker.distribution.manifest.list.v2+json
Digest:    sha256:58f444edfe9cfce870801ce7485283685e35d80e813ef52782b09c72ce57a6d2

Manifests:
  Name:      ghcr.io/ggml-org/llama.cpp:server-s390x@sha256:89791926fcf8e6470523d9c1f25607eef059671c93334e259f99f355d7cfa05f
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/s390x

slartibardfast pushed a commit to slartibardfast/llama.cpp that referenced this pull request Apr 12, 2026
* CI: Enable CUDA and Vulkan ARM64 runners and fix CI/CD

Co-authored-by: Ts-sound <44093942+Ts-sound@users.noreply.github.com>

* Obtain source tag name from git tag

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Ts-sound <44093942+Ts-sound@users.noreply.github.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@nalf3in commented Apr 17, 2026

Hi, this broke the llama.cpp install (using a container) on my Debian 13 server, as the latest NVIDIA driver is 555.58.02-3, which only supports up to CUDA 12.4, and it looks like this PR brings the minimum up to 12.8.1 (possibly 12.8, from what I understand from the bug mentioned earlier).

Would it be possible to bring the minimum back to 12.4, or to make it obvious to users that CUDA 12.4 is no longer supported when using containers?

@ehfd (Contributor, Author) commented Apr 18, 2026

@nalf3in

Here is where the currently supported CUDA version is located: https://github.com/ggml-org/llama.cpp/blob/master/docs/docker.md#building-docker-locally

Unfortunately, Debian is frequently problematic in combination with NVIDIA GPUs because it does not provide adequately recent driver versions. After discussion with the maintainer of this code, we decided to retain CUDA 12.8 because of its support for Blackwell GPUs.

That said, I suggest following this guide to install the latest driver version available for Debian: https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/debian.html

@mediouni-m (Contributor) commented Apr 18, 2026

@nalf3in CUDA 12.9 binaries run under older CUDA 12.x releases if they do not go through the runtime PTX path but have prebuilt SASS binaries for the given architecture. But that might come with a binary size penalty...
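
As a sketch of what that could look like (assuming the CUDA Dockerfile's CUDA_DOCKER_ARCH build argument is forwarded to CMAKE_CUDA_ARCHITECTURES; the architecture list is illustrative), the published image could bake real SASS for the GPUs it targets instead of relying on the PTX JIT path:

      # 80=Ampere (A100), 86=consumer Ampere, 89=Ada, 90=Hopper: prebuilt SASS
      # for each, so no driver-side PTX JIT is needed; image size grows accordingly.
      - name: Build CUDA image with prebuilt SASS for selected GPU architectures
        uses: docker/build-push-action@v6
        with:
          file: .devops/cuda.Dockerfile
          platforms: linux/amd64
          build-args: |
            CUDA_DOCKER_ARCH=80;86;89;90

That matches the pattern described above: no runtime PTX path for the listed architectures, at the cost of the binary size penalty mentioned.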

@nalf3in commented Apr 19, 2026

@ehfd Thank you for the clarifications. In my case I will stick with building my own containers but it's nice to know there's an alternative. This is a bit pedantic, but the CUDA version is only mentioned in the "Building Docker locally" section, not in the "Usage" part where the prebuilt image is discussed. Please let me know if you think a PR to address this would be a good idea.

@mediouni-m This is interesting. I personally wouldn’t mind a larger binary size for the sake of wider compatibility, but I know that’s not everyone’s view. What do you all think? Is that a good trade-off?

@ehfd (Contributor, Author) commented Apr 19, 2026

@nalf3in A PR is always appreciated. And I do think SASS is a good tradeoff, especially for llama.cpp.

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026

Labels

devops (improvements to build systems and github actions), documentation (Improvements or additions to documentation)

Development

Successfully merging this pull request may close these issues.

Misc. bug: Docker image
