CI: Enable CUDA and Vulkan ARM64 runners and fix CI/CD #21122
Conversation
Pull request overview
Updates the CI Docker image publishing workflow and associated Dockerfiles to support ARM64 CUDA builds and refresh CUDA/Ubuntu toolchain versions for published images.
Changes:
- Add ARM64 matrix builds for CUDA 12 (`cuda`/`cuda12`) and CUDA 13 (`cuda13`) images in `.github/workflows/docker.yml` (sketched below).
- Update CUDA 12 Dockerfile defaults to Ubuntu 24.04 + CUDA 12.5.1 and use GCC/G++ 14 in CUDA build stages.
- Minor Dockerfile formatting fixes (spacing before the line-continuation `\`) across multiple image definitions.
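For illustration, a minimal sketch of how such ARM64 matrix entries could look in `.github/workflows/docker.yml`; the exact keys, tag names, and runner label are assumptions here, not copied from the PR (the Dockerfile paths come from the per-file summary below):

```yaml
# Illustrative sketch only, not the PR's actual matrix definition.
strategy:
  matrix:
    include:
      - tag: "cuda12"
        dockerfile: ".devops/cuda.Dockerfile"
        platforms: "linux/arm64"
        runs_on: "ubuntu-24.04-arm"   # assumed native ARM64 runner label
      - tag: "cuda13"
        dockerfile: ".devops/cuda-new.Dockerfile"
        platforms: "linux/arm64"
        runs_on: "ubuntu-24.04-arm"
```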
Reviewed changes
Copilot reviewed 3 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `.github/workflows/docker.yml` | Adds ARM64 CUDA build matrix entries and updates the CUDA 12 version used by published images. |
| `.devops/cuda.Dockerfile` | Bumps Ubuntu/CUDA defaults, switches the build toolchain to GCC 14, and updates Python packaging install behavior for Ubuntu 24.04. |
| `.devops/cuda-new.Dockerfile` | Uses the GCC 14 toolchain and minor formatting alignment for the CUDA 13 image. |
| `.devops/cpu.Dockerfile` | Formatting-only change in the base-stage apt install command. |
| `.devops/rocm.Dockerfile` | Formatting-only change in the base-stage apt install command. |
| `.devops/openvino.Dockerfile` | Formatting-only change in the base-stage apt install command. |
| `.devops/musa.Dockerfile` | Formatting-only change in the base-stage apt install command. |
| `.devops/intel.Dockerfile` | Formatting-only change in the base-stage apt install command. |
Force-pushed from 5601477 to 74b70f2
Pull request overview
Copilot reviewed 3 out of 8 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
.github/workflows/docker.yml:112
- This step name says “Log in to Docker Hub”, but it actually authenticates to ghcr.io. Renaming the step (e.g., “Log in to GHCR”) will make the workflow’s behavior clearer.
```yaml
- name: Log in to Docker Hub
  uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4
  with:
    registry: ghcr.io
    username: ${{ github.repository_owner }}
    password: ${{ secrets.GITHUB_TOKEN }}
```
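Applying the suggested rename would change only the step name; the action and its inputs stay exactly as above:

```yaml
- name: Log in to GHCR
  uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4
  with:
    registry: ghcr.io
    username: ${{ github.repository_owner }}
    password: ${{ secrets.GITHUB_TOKEN }}
```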
Force-pushed from d0d33ee to 9a56f11
Force-pushed from b9ac3f8 to c1376ee
|
https://github.com/ehfd/llama.cpp/actions/runs/23704816053 @CISC @taronaeo Works perfectly (s390x was excluded because the IBM runner is not available in the fork). |
Force-pushed from b0b92a6 to b220e6d
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
|
https://github.com/ehfd/llama.cpp/actions/runs/23731725922 @CISC @taronaeo Had to fix the Vulkan image because Ubuntu 26.04 Beta removed Python 3.13 overnight (replaced with …). Edit: Vulkan now passes. |
taronaeo
left a comment
I'm not familiar with the multi-architecture, multi-platform setup that you've proposed here and will need to read the Medium article linked before I can give an approval.
Same as @CISC's comment here, I think this is very elaborate and IMO hard to maintain (especially the jq commands).
If another maintainer understands this well enough, they can give the approval in the meantime :)
|
Well, the CI/CD passed. |
|
I'm reading this blog from Docker themselves: https://www.docker.com/blog/multi-arch-build-and-images-the-simple-way/ It appears that the solution proposed here is slightly outdated and, as I thought, harder than it should be. Can't we just use …? Edit: More documentation: https://docs.docker.com/build/building/multi-platform/#build-multi-platform-images |
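For reference, the "simple way" from the linked Docker documentation is a single build step that covers all target platforms on one runner, emulating the non-native ones with QEMU. A rough sketch (image name and tag are placeholders, not the repository's actual values):

```yaml
# Single-runner multi-platform build: the non-native platform is emulated via QEMU.
- uses: docker/setup-qemu-action@v3
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v6
  with:
    platforms: linux/amd64,linux/arm64
    push: true
    tags: ghcr.io/OWNER/IMAGE:TAG  # placeholder
```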
|
@taronaeo The simple way doesn't work here because we are running in GitHub Actions and pushing to the GitHub Container Registry, and that method would easily push build times from about 1 hour to 6 hours (which times out), since arm64 and s390x would be emulated inside x86_64 instead of built on separate native runners. There is one other way, not covered in that article, to use the syntax you proposed while still building on native runners, but it requires control over the build nodes for each architecture; in GitHub Actions we have no such control over dispatching builds to various remote runners. In such a situation, using … |
|
A more to-the-point AI explanation:

The short answer is: no, you cannot specify multiple platforms in a single build step and expect them all to run natively on different runners. When you set `platforms: "linux/amd64,linux/arm64"` in a single `docker/build-push-action` step, that specific runner tries to build both. Since a runner only has one native architecture, it will use QEMU to emulate the other one, which is significantly slower. However, your current workflow is actually already using the industry-standard best practice for achieving what you want. Here is the breakdown of why your current approach is the right one and how to think about the `platforms` field.

Why your current "matrix + merge" is better:
- Native speed: by splitting the matrix so `linux/amd64` runs on `ubuntu-24.04` and `linux/arm64` runs on `ubuntu-24.04-arm`, every instruction in your Dockerfile runs at native hardware speed.
- Avoids QEMU: emulating ARM on AMD64 (or vice versa) can be 10x–20x slower, especially for compiled languages (C++, Rust, Go).
- The manifest merge: the `merge_arch_tags` job using `imagetools create` is the "glue." It doesn't rebuild anything; it just points a single tag (like `my-image:latest`) to the different architecture-specific digests you already pushed.

What happens if you change `platforms` to both?
- The `ubuntu-24.04` (AMD64) runner would start. It would build the AMD64 version natively.
- It would see it also needs an ARM64 version and start a QEMU emulator.
- The ARM64 build would take much longer and likely time out or exhaust resources.

The "native" alternative: remote build nodes. This involves:
- Starting an ARM64 runner and an AMD64 runner.
- Connecting them via SSH.
- Configuring Buildx to use both as "nodes" in a single builder instance.

Why you shouldn't do this: it is significantly harder to maintain in GitHub Actions because of networking/SSH complexities between ephemeral runners. Your current "push by digest & merge" strategy is much more robust. |
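As a rough sketch of the "push by digest & merge" pattern described above (job names, image name, tag, and digests are illustrative, not taken from the repository's actual workflow):

```yaml
# Illustrative sketch: each architecture is built natively and pushed by digest,
# then a separate job stitches the digests into a single multi-arch tag.
jobs:
  build:
    strategy:
      matrix:
        include:
          - { platform: linux/amd64, runner: ubuntu-24.04 }
          - { platform: linux/arm64, runner: ubuntu-24.04-arm }
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v4
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - id: build
        uses: docker/build-push-action@v6
        with:
          platforms: ${{ matrix.platform }}
          # Push by digest only; no tag is assigned at this stage.
          outputs: type=image,name=ghcr.io/OWNER/IMAGE,push-by-digest=true,push=true
      # In practice the digest (steps.build.outputs.digest) is handed to the
      # merge job, e.g. via an uploaded artifact.

  merge:
    needs: build
    runs-on: ubuntu-24.04
    steps:
      - uses: docker/login-action@v4
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}
      # Point a single tag at the architecture-specific digests pushed above.
      - run: |
          docker buildx imagetools create \
            -t ghcr.io/OWNER/IMAGE:TAG \
            ghcr.io/OWNER/IMAGE@sha256:AMD64_DIGEST \
            ghcr.io/OWNER/IMAGE@sha256:ARM64_DIGEST
```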
|
Okay I am starting to get the full picture of this approach now, and can understand why. In this case, this PR is perfectly acceptable. |
|
@taronaeo Okay, then! Thank you. Can't wait to get it merged. |
|
@CISC Two approvals! |
|
Confirming everything is as intended. |
* CI: Enable CUDA and Vulkan ARM64 runners and fix CI/CD
Co-authored-by: Ts-sound <44093942+Ts-sound@users.noreply.github.com>
* Obtain source tag name from git tag
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Ts-sound <44093942+Ts-sound@users.noreply.github.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
|
Hi, this broke the llama.cpp install (using a container) on my Debian 13 server, as the latest NVIDIA driver there is 555.58.02-3, which only supports up to CUDA 12.4, and it looks like this PR brings the minimum up to 12.8.1 (possibly 12.8, from what I understand from the bug mentioned earlier). Would it be possible to bring the minimum back to 12.4, or to make it obvious to users that CUDA 12.4 is no longer supported when using containers? |
|
Here is where the currently supported CUDA version is documented: https://github.com/ggml-org/llama.cpp/blob/master/docs/docker.md#building-docker-locally Unfortunately, Debian is frequently problematic in combination with NVIDIA GPUs because it does not provide adequate driver versions. After discussion with the maintainer of this code, we decided to retain CUDA 12.8 due to its support for Blackwell GPUs. However, I suggest reading this to install the latest driver version available on Debian: https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/debian.html |
|
@nalf3in CUDA 12.9 binaries can run under older CUDA 12.x releases if they do not go through the runtime PTX path but instead ship prebuilt SASS binaries for the given architecture. That might come with a binary size penalty, though... |
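A minimal sketch of what "prebuilt SASS, no PTX" could look like when building through CMake, assuming the `GGML_CUDA` option llama.cpp uses; the architecture list is illustrative, not the project's actual default:

```yaml
# Illustrative build step: "NN-real" entries make CMake emit SASS for those
# architectures only (no embedded PTX), so the binary avoids the JIT path and
# can run under older CUDA 12.x drivers, at the cost of a larger binary.
- name: Build with explicit SASS targets (sketch)
  run: |
    cmake -B build -DGGML_CUDA=ON \
      -DCMAKE_CUDA_ARCHITECTURES="80-real;86-real;89-real;90-real"
    cmake --build build --config Release -j
```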
|
@ehfd Thank you for the clarifications. In my case I will stick with building my own containers but it's nice to know there's an alternative. This is a bit pedantic, but the CUDA version is only mentioned in the "Building Docker locally" section, not in the "Usage" part where the prebuilt image is discussed. Please let me know if you think a PR to address this would be a good idea. @mediouni-m This is interesting. I personally wouldn’t mind a larger binary size for the sake of wider compatibility, but I know that’s not everyone’s view. What do you all think? Is that a good trade-off? |
|
A PR is always appreciated, @nalf3in. And I do think SASS is a good tradeoff, especially for llama.cpp. |
Overview
Fixes #21123
Follow-up of #20929, enabling CUDA/Vulkan ARM64 runners and fixing the ARM64 CUDA Dockerfiles (for GH200, DGX Spark, etc.), as well as vastly accelerating ARM64 build times by switching to native runners.
AMD ROCm is only built for x86_64, as AMD doesn't provide ARM64 base images.
CC @CISC @taronaeo
Additional information
- Enables GCC/G++ 14 in the CUDA Dockerfile images, updates the CUDA 12 image to Ubuntu 24.04 and CUDA 12.5.1, and builds ARM64 Vulkan Dockerfile images.
- Applies a few aesthetic fixes (`libgomp1 curl\` → `libgomp1 curl \`) in other Dockerfiles.
- Corresponding changes to `docker.yml`, similar to CI: fix ARM64 image build error & enable compilation #20929.
- Enables `ubuntu-24.04-arm` native-architecture workers for ARM64, vastly accelerating build times.
- Properly handles multi-architecture container image manifests through digest-based upload, which was previously broken after CI: fix ARM64 image build error & enable compilation #20929.
Requirements