ci : reduce (disable SYCL and CANN builds/releases) by ggerganov · Pull Request #23705 · ggml-org/llama.cpp

ggerganov · 2026-05-26T07:45:05Z

Overview

I believe we are thrashing the GitHub Actions cache too much lately, which is slowing down CI overall. This PR disables some of the builds in order to relieve some of the pressure on the cache and the runners.

The SYCL builds alone consume more than 1/3 of the total 10 GB cache that we have. I don't think that's reasonable, so I'm disabling them for now. To re-enable them, we have to provision dedicated runners.
The openEuler builds do not consume cache, which is good. However, they take up slots on the GitHub-hosted runners. I'd like to move these builds to dedicated runners too.
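To illustrate why one large consumer can thrash a fixed-size cache, here is a toy simulation. It assumes the repository cache is capped at 10 GB and that, once the cap is hit, entries are evicted in order of last access (a simplification of the eviction behaviour described in the GitHub Actions cache documentation). The key names and sizes are hypothetical; the only figure taken from this PR is a SYCL-like entry using more than 1/3 of the budget.

```python
from collections import OrderedDict

CAP_GB = 10.0  # repository-wide cache budget mentioned in this PR


class LruCache:
    """Toy size-capped cache that evicts least-recently-used entries first."""

    def __init__(self, cap_gb: float) -> None:
        self.cap_gb = cap_gb
        self.entries: "OrderedDict[str, float]" = OrderedDict()
        self.misses = 0

    def used_gb(self) -> float:
        return sum(self.entries.values())

    def access(self, key: str, size_gb: float) -> None:
        if key in self.entries:
            # Hit: restore succeeds and the entry becomes most recently used.
            self.entries.move_to_end(key)
            return
        # Miss: the job builds cold and then saves a fresh entry.
        self.misses += 1
        self.entries[key] = size_gb
        # Evict from the least-recently-used end until we fit the budget again.
        while self.used_gb() > self.cap_gb:
            self.entries.popitem(last=False)


def count_misses(jobs, rounds):
    """Run every job once per round, in a fixed order, and count cache misses."""
    cache = LruCache(CAP_GB)
    for _ in range(rounds):
        for key, size_gb in jobs:
            cache.access(key, size_gb)
    return cache.misses


# Hypothetical workload: eight 1 GB jobs plus one 3.5 GB SYCL-like job (> 1/3 of 10 GB).
small_jobs = [(f"cache-gha-job-{i}", 1.0) for i in range(8)]
sycl_job = [("cache-gha-sycl", 3.5)]

print(count_misses(small_jobs, rounds=5))             # 8: only the cold first round misses
print(count_misses(small_jobs + sycl_job, rounds=5))  # 45: every job misses in every round
```

This is the classic worst case for LRU: a cyclic access pattern whose working set (11.5 GB here) slightly exceeds capacity makes every lookup miss, so removing a single large consumer can restore hits for all the other jobs.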

TODO

I am still considering what to do with the ROCm/HIP builds. These are probably important, but it also feels like we should already have some dedicated runners for them.
https://github.com/ggml-org/llama.cpp/pull/16452/changes#r3302314304

Additional information

I also prefixed the caches with `cache-gha-` so they are easy to search for and match.
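One benefit of a common key prefix is that usage can be tallied per group. A minimal sketch, assuming you already have the repository's cache entries as hypothetical `(key, size_in_bytes)` pairs and that the segment right after `cache-gha-` names the backend (an illustrative convention, not necessarily the one the workflows use):

```python
from collections import defaultdict

PREFIX = "cache-gha-"           # key prefix introduced in this PR
MIB = 1024 ** 2
BUDGET_BYTES = 10 * 1024 * MIB  # the 10 GB limit, treated here as 10 GiB


def usage_by_group(entries, prefix=PREFIX):
    """Sum the sizes of prefixed cache keys, grouped by the segment right after the prefix."""
    groups = defaultdict(int)
    for key, size_bytes in entries:
        if not key.startswith(prefix):
            continue  # keys without the prefix are ignored
        group = key[len(prefix):].split("-", 1)[0]
        groups[group] += size_bytes
    return dict(groups)


# Hypothetical (key, size) pairs, e.g. as exported from the repository's cache list.
entries = [
    ("cache-gha-sycl-linux-abc123", 1800 * MIB),
    ("cache-gha-sycl-win-def456", 1700 * MIB),
    ("cache-gha-cpu-x64-0a1b2c", 600 * MIB),
    ("some-other-key", 900 * MIB),
]

usage = usage_by_group(entries)
print(f"{usage['sycl'] / BUDGET_BYTES:.1%}")  # 34.2%
```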

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: NO

[no ci]

ggerganov · 2026-05-26T08:30:14Z

@arthw @hipudding PTAL - I am planning to disable the SYCL and CANN builds and releases until we provision more resources.

@IMbackK In case you have some ideas about the ROCm/HIP builds: these will likely stay for now, but if the CI continues to be slow, we'll probably have to remove those too.

Btw, I see that for ROCm we only create a Linux release and for HIP we only create a Windows release. Why is that? I.e., why not create both Linux and Windows releases for each?

IMbackK · 2026-05-26T10:04:54Z

@ggerganov HIP is a programming language, of which ROCm is an implementation. As for why the Linux build is called ROCm and the Windows build is called HIP, I have no idea. We should probably just call both of them ROCm, as we don't actually support running the HIP backend on platforms other than ROCm anymore (i.e. hip-cpu or hip-nvidia).

I don't really have any ideas for reducing the CI impact of the HIP backend. Really, we should be doing more builds of it, not fewer: currently we don't build for all targets until release time, which has caused release-time build failures before, since we have plenty of #ifdef'd code paths that are only compiled for specific targets.
The HIP backend suffers from this, and from the fact that LLVM's amdgcn target is particularly slow.

I don't see any way here other than acquiring more resources.

ggerganov · 2026-05-26T10:36:07Z

Ok, thanks. I'll fix the naming to use ROCm for both.

Edit: will keep the names for now.

CISC

We could also disable caching of the oneAPI toolkit, at the risk of the download failing...

ggerganov · 2026-05-26T12:21:10Z

We can try this from master. It will also depend on how long it takes.

arthw · 2026-05-26T12:31:01Z

@ggerganov
The SYCL CI only compiles the code and builds the binary package.
In practice it doesn't take much compute.

The SYCL CI has been separated from build.yml, so PRs for other backends won't trigger the SYCL CI workflow.
I could disable the cache in it,
so that the SYCL CI won't take up more resources.

Lots of Windows users use the binary package directly.
Would it be possible to bring the SYCL CI back after disabling the cache?

Thank you!

arthw · 2026-05-26T12:40:23Z

Maybe we could reduce the CI workload.
Here is my suggestion: #20446 (comment).
Currently, build.yml includes 12 tasks: mac, vulkan, cuda, cpu, windows.

ggerganov · 2026-05-26T12:44:17Z

How long do the SYCL jobs take without the cache?

arthw · 2026-05-26T12:45:12Z

About 20 minutes; most of that time is spent downloading oneAPI and installing it locally.

arthw · 2026-05-26T13:10:36Z

In a PR that only changes CUDA code (#23349),
there are 50 jobs in CI.

But in fact only the CUDA jobs are useful.
We could skip the other jobs in this case to reduce the workload.

Note that no SYCL job runs for this PR.

Determine tag name
build-cmake-pkg / linux
model-naming
editorconfig
ubuntu-22-hip-quality-check
labeler
server (default)
server (backend-sampling)
ggml-ci-nvidia-webgpu
macOS-latest-arm64
server-windows
ggml-ci-nvidia-cuda
macOS-latest-x64
ggml-ci-nvidia-vulkan-cm
macOS-latest-arm64-webgpu
ggml-ci-nvidia-vulkan-cm2
ubuntu-cpu (x64, ubuntu-22.04)
ubuntu-cpu (arm64, ubuntu-24.04-arm)
ubuntu-cpu (s390x, ubuntu-24.04-s390x)
ubuntu-cpu (ppc64le, ubuntu-24.04-ppc64le)
ggml-ci-mac-metal
android-arm64
ggml-ci-mac-webgpu
ubuntu-latest-rpc
ggml-ci-mac-vulkan
ubuntu-24-vulkan (x64, ubuntu-24.04)
ubuntu-24-vulkan (arm64, ubuntu-24.04-arm)
ggml-ci-linux-intel-vulkan
ubuntu-24-webgpu
ggml-ci-win-intel-vulkan
ubuntu-24-webgpu-wasm
ggml-ci-intel-openvino-gpu-low-perf
ubuntu-22-hip
ubuntu-22-musa
windows-latest (cpu-x64 (static), x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-...
windows-latest (openblas-x64, x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-wind...
windows-latest (vulkan-x64, x64, -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVE...
windows-latest (llvm-arm64, arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-wi...
windows-latest (llvm-arm64-opencl-adreno, arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=...
ubuntu-latest-cuda
windows-2022-cuda (12.4)
windows-latest-hip
ubuntu-cpu-riscv64-native
ggml-ci-x64-cpu-low-perf
ggml-ci-arm64-cpu-low-perf
ggml-ci-x64-cpu-high-perf
ggml-ci-arm64-cpu-high-perf
ggml-ci-arm64-cpu-high-perf-sve
ggml-ci-arm64-cpu-kleidiai
ggml-ci-arm64-cpu-kleidiai-graviton4
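The path-based skipping suggested above could be sketched as a function that maps a PR's changed files to the job groups it needs. The rules and group names below are hypothetical, not llama.cpp's actual CI configuration. Because the HIP backend is compiled from the ggml-cuda sources, a CUDA-only change conservatively triggers the HIP jobs too, and any path not covered by a rule (core ggml, CMake, and so on) falls back to the full CI.

```python
from fnmatch import fnmatchcase

# Hypothetical rules mapping changed paths to CI job groups; not llama.cpp's real config.
# fnmatch's "*" also matches "/", so each pattern covers the whole directory tree.
RULES = [
    ("ggml/src/ggml-cuda/*", {"cuda", "hip"}),  # HIP is built from the ggml-cuda sources
    ("ggml/src/ggml-hip/*", {"hip"}),
    ("ggml/src/ggml-sycl/*", {"sycl"}),
    ("ggml/src/ggml-vulkan/*", {"vulkan"}),
    ("ggml/src/ggml-metal/*", {"metal"}),
]
ALL_GROUPS = {"cpu", "cuda", "hip", "sycl", "vulkan", "metal"}


def groups_for(changed_files):
    """Return the job groups a PR needs; any unmatched path falls back to running everything."""
    needed = set()
    for path in changed_files:
        hits = [groups for pattern, groups in RULES if fnmatchcase(path, pattern)]
        if not hits:
            return set(ALL_GROUPS)  # shared code: run the full CI to be safe
        for groups in hits:
            needed |= groups
    return needed


print(sorted(groups_for(["ggml/src/ggml-cuda/example.cu"])))  # ['cuda', 'hip']
print(groups_for(["ggml/src/ggml.c"]) == ALL_GROUPS)  # True
```

Within GitHub Actions itself, the same idea is usually expressed with `paths` / `paths-ignore` filters on the `pull_request` trigger, which is how a separate per-backend workflow avoids running for unrelated PRs.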

IMbackK · 2026-05-26T13:22:41Z

In the case of that PR, it's actually only the HIP jobs that are useful, not the CUDA ones, although determining whether a change affects the HIP backend, the CUDA backend, or both is beyond what a CI script can do.


cristianadam · 2026-06-02T09:50:00Z

At Qt Creator we save the ccache directory of a build as an artifact. Then, when a new build starts, it looks through previous artifacts and downloads the corresponding ccache archive.

This bypasses the 10 GB cache that GitHub has, since you can have unlimited space for build artifacts 😅

See https://github.com/qt-creator/qt-creator/blob/master/.github/workflows/build_cmake.yml#L556 for details.

The build artifacts are short-lived, so I don't think they affect GitHub's disk space that much.

I don't think it would be too hard for an LLM to convert the CMake code into whatever llama.cpp's CI YAML files use.
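The lookup step of the artifact-based approach described above can be sketched as "pick the newest non-expired artifact whose name matches this job". This assumes artifact records shaped like GitHub REST API artifact objects (with `name`, `expired`, and `created_at` fields) and a hypothetical `ccache-<job>-<run>` naming scheme. It is not the Qt Creator implementation, and downloading the chosen archive and uploading the new one afterwards are not shown.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    # GitHub-style timestamps end in "Z"; older Pythons' fromisoformat needs "+00:00".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def pick_ccache_artifact(artifacts, name_prefix):
    """Return the newest non-expired artifact whose name starts with name_prefix, or None."""
    candidates = [
        a for a in artifacts
        if a["name"].startswith(name_prefix) and not a["expired"]
    ]
    if not candidates:
        return None  # cold start: build without a warm ccache
    return max(candidates, key=lambda a: parse_ts(a["created_at"]))


# Hypothetical artifact records for ccache archives.
artifacts = [
    {"name": "ccache-sycl-win-101", "expired": False, "created_at": "2026-06-01T10:00:00Z"},
    {"name": "ccache-sycl-win-102", "expired": False, "created_at": "2026-06-02T10:00:00Z"},
    {"name": "ccache-sycl-win-103", "expired": True, "created_at": "2026-06-03T10:00:00Z"},
    {"name": "ccache-cpu-x64-104", "expired": False, "created_at": "2026-06-04T10:00:00Z"},
]

best = pick_ccache_artifact(artifacts, "ccache-sycl-win-")
print(best["name"] if best else "none")  # ccache-sycl-win-102
```

Selecting only the newest matching archive keeps each job's ccache warm without touching the 10 GB cache, while a short artifact retention period keeps old archives from piling up.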

alexander454584-cpu · 2026-06-03T20:24:34Z

Please return SYCL.

arthw · 2026-06-04T02:49:15Z

@ggerganov @CISC
What do you think about @cristianadam's solution from Qt for avoiding the large ccache size?
If you agree, I can implement it in the SYCL CI.

Thank you!

ggerganov · 2026-06-04T05:37:32Z

@arthw Would need to see the implementation and the performance to decide. You can give it a try in a fork and when you have something working I'll take a look.

AG1M · 2026-06-04T12:25:54Z

Thanks a lot, having the Windows SYCL builds back again would be awesome.

NeoZhangJianyu · 2026-06-05T02:12:24Z

@ggerganov
Got it! I will do it.

Thank you!

Fmstrat · 2026-06-05T03:15:39Z

Will we get Linux SYCL back?

ci : reduce

c49777e

[no ci]

github-actions Bot added the devops improvements to build systems and github actions label May 26, 2026

ggerganov added 2 commits May 26, 2026 11:04

cont : disable sycl, cann + rename caches

26a034b

[no ci]

cont : cann

1c9afcb

[no ci]

ggerganov marked this pull request as ready for review May 26, 2026 08:33

ggerganov requested a review from a team as a code owner May 26, 2026 08:33

ggerganov changed the title from ci : reduce to ci : reduce (disable SYCL and CANN builds/releases) May 26, 2026

CISC approved these changes May 26, 2026


ggerganov merged commit 3dc7684 into master May 26, 2026
1 check passed

ggerganov deleted the gg/ci-reduce branch May 26, 2026 12:21

ggerganov mentioned this pull request May 27, 2026

ci : move ARM jobs to self-hosted + disable kleidiai mac release #23780

Merged

github-actions Bot mentioned this pull request May 27, 2026

chore: bump llama.cpp to b9528 leehack/llama-web-bridge#17

Open

4 tasks

fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

ci : reduce (disable SYCL and CANN builds/releases) (ggml-org#23705)

0c5883c

* ci : reduce [no ci] * cont : disable sycl, cann + rename caches [no ci] * cont : cann [no ci]

turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026

ci : reduce (disable SYCL and CANN builds/releases) (ggml-org#23705)

ddcb0a9

* ci : reduce [no ci] * cont : disable sycl, cann + rename caches [no ci] * cont : cann [no ci]

Fmstrat mentioned this pull request Jun 5, 2026

Releases: No Ubuntu SYCL pre-built #22021

Closed

This was referenced Jun 5, 2026

ci(sycl): drop auto-triggers (workflow has zero jobs upstream) heiervang-technologies/ht-llama.cpp#74

Open

ci(cann): drop auto-triggers (workflow has zero jobs upstream) heiervang-technologies/ht-llama.cpp#80

Open

9 participants
