
ci : reduce (disable SYCL and CANN builds/releases) #23705

Merged
ggerganov merged 3 commits into master from gg/ci-reduce
May 26, 2026

Conversation

@ggerganov
Member

@ggerganov ggerganov commented May 26, 2026

Overview

I believe we have been thrashing the GitHub Actions cache too much lately, which is slowing down CI overall. This PR disables some of the builds in order to relieve some of the pressure on the cache and the runners.

  • The SYCL builds alone consume more than 1/3 of the total 10GB cache that we have. I don't think it's reasonable, so disabling them for now. In order to re-enable, we have to provision dedicated runners.
  • The openEuler builds are not consuming cache which is good. However, they allocate slots from the GH hosted runners. I'd like to move these builds to dedicated runners too.

TODO

image

Additional information

Also prefixed the caches with cache-gha- to be able to search and match easily.
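
A minimal sketch of how a shared key prefix makes cache usage easy to audit. It groups cache entries by key prefix and reports each group's share of the 10GB total. The entry list, the sizes, and every key name beyond the `cache-gha-` prefix are made up for illustration; they are not taken from the repository.

```python
from collections import defaultdict

CACHE_LIMIT_BYTES = 10 * 1024**3  # the 10GB total mentioned above


def usage_by_prefix(entries, prefixes):
    """Sum cache sizes per key prefix; keys matching no prefix go under 'other'."""
    totals = defaultdict(int)
    for key, size in entries:
        match = next((p for p in prefixes if key.startswith(p)), "other")
        totals[match] += size
    return dict(totals)


# Hypothetical (key, size in bytes) pairs; NOT real data from the repository.
entries = [
    ("cache-gha-sycl-ubuntu", 4 * 1024**3),
    ("cache-gha-ccache-ubuntu-cpu", 1 * 1024**3),
    ("cache-gha-ccache-macos", 2 * 1024**3),
    ("legacy-key", 512 * 1024**2),
]

totals = usage_by_prefix(entries, ["cache-gha-sycl-", "cache-gha-ccache-"])
shares = {prefix: size / CACHE_LIMIT_BYTES for prefix, size in totals.items()}

for prefix, share in sorted(shares.items()):
    print(f"{prefix:<20} {share:.0%} of the cache limit")
```

With keys named this way, any entry that lands under "other" is one that has not been renamed yet.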

Requirements

[no ci]
@github-actions github-actions Bot added the devops improvements to build systems and github actions label May 26, 2026
@ggerganov
Member Author

ggerganov commented May 26, 2026

@arthw @hipudding PTAL - I am planning to disable the SYCL and CANN builds and releases until we provision more resources.

@IMbackK In case you have some ideas about the ROCm/HIP builds. These are likely to stay for now, but if the CI continues to be slow, we'll probably have to remove those too.

Btw, I see that for ROCm we only create a Linux release and for HIP we create only a Windows release. Why is that? I.e. why not create both Linux/Windows for both?

@ggerganov ggerganov marked this pull request as ready for review May 26, 2026 08:33
@ggerganov ggerganov requested a review from a team as a code owner May 26, 2026 08:33
@ggerganov ggerganov changed the title ci : reduce ci : reduce (disable SYCL and CANN builds/releases) May 26, 2026
@IMbackK
Collaborator

IMbackK commented May 26, 2026

@ggerganov HIP is a programming language of which ROCm is an implementation. As for why the Linux build is called ROCm and the Windows build is called HIP, I have no idea. We should probably just call both of them ROCm, as we don't actually support running the HIP backend on platforms other than ROCm anymore (i.e. hip-cpu or hip-nvidia).

I don't really have any ideas for reducing the CI impact of the HIP backend. Really, we should be doing more builds of it, not fewer: currently we don't build for all targets until release time, which has caused release-time build failures before, as we have plenty of ifdef'd code paths that are only compiled on specific targets.
The HIP backend suffers from this and from the fact that LLVM's amdgcn target is particularly slow.

I don't see any way here other than acquiring more resources.

@ggerganov
Member Author

ggerganov commented May 26, 2026

Ok, thanks. I'll fix the naming to use ROCm for both.

Edit: will keep the names for now.

Member

@CISC CISC left a comment


We could also disable caching of the oneAPI toolkit, at the risk of the download failing...

@ggerganov
Member Author

We can try this from master. It will depend on how long it takes too.

@ggerganov ggerganov merged commit 3dc7684 into master May 26, 2026
1 check passed
@ggerganov ggerganov deleted the gg/ci-reduce branch May 26, 2026 12:21
@arthw
Contributor

arthw commented May 26, 2026

@ggerganov
The SYCL CI only compiles the code and builds the binary package.
It does not actually take many computing resources.

The SYCL CI has been separated from build.yml, so PRs for other backends won't trigger the SYCL CI action.
I could disable the cache in it,
so that the SYCL CI won't take up more resources.

Lots of Windows users use the binary package directly.
Is it possible to restore the SYCL CI after the cache is disabled?

Thank you!

@arthw
Contributor

arthw commented May 26, 2026

Maybe we could reduce the CI workload.
Here is my suggestion: #20446 (comment).
Currently, build.yml includes 12 tasks: mac, vulkan, cuda, cpu, windows.

@ggerganov
Member Author

How long do the SYCL jobs take without the cache?

@arthw
Contributor

arthw commented May 26, 2026

About 20 minutes; most of that time is spent downloading oneAPI and installing it locally.

@arthw
Contributor

arthw commented May 26, 2026

In a PR that only changes CUDA code (#23349),
there are 50 jobs in CI.

But only the CUDA jobs are actually useful.
We could skip the other jobs in this case to reduce the workload.

Note that no SYCL job ran for this PR.

Determine tag name
build-cmake-pkg / linux
model-naming
editorconfig
ubuntu-22-hip-quality-check
labeler
server (default)
server (backend-sampling)
ggml-ci-nvidia-webgpu
macOS-latest-arm64
server-windows
ggml-ci-nvidia-cuda
macOS-latest-x64
ggml-ci-nvidia-vulkan-cm
macOS-latest-arm64-webgpu
ggml-ci-nvidia-vulkan-cm2
ubuntu-cpu (x64, ubuntu-22.04)
ubuntu-cpu (arm64, ubuntu-24.04-arm)
ubuntu-cpu (s390x, ubuntu-24.04-s390x)
ubuntu-cpu (ppc64le, ubuntu-24.04-ppc64le)
ggml-ci-mac-metal
android-arm64
ggml-ci-mac-webgpu
ubuntu-latest-rpc
ggml-ci-mac-vulkan
ubuntu-24-vulkan (x64, ubuntu-24.04)
ubuntu-24-vulkan (arm64, ubuntu-24.04-arm)
ggml-ci-linux-intel-vulkan
ubuntu-24-webgpu
ggml-ci-win-intel-vulkan
ubuntu-24-webgpu-wasm
ggml-ci-intel-openvino-gpu-low-perf
ubuntu-22-hip
ubuntu-22-musa
windows-latest (cpu-x64 (static), x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-...
windows-latest (openblas-x64, x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-wind...
windows-latest (vulkan-x64, x64, -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVE...
windows-latest (llvm-arm64, arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-wi...
windows-latest (llvm-arm64-opencl-adreno, arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=...
ubuntu-latest-cuda
windows-2022-cuda (12.4)
windows-latest-hip
ubuntu-cpu-riscv64-native
ggml-ci-x64-cpu-low-perf
ggml-ci-arm64-cpu-low-perf
ggml-ci-x64-cpu-high-perf
ggml-ci-arm64-cpu-high-perf
ggml-ci-arm64-cpu-high-perf-sve
ggml-ci-arm64-cpu-kleidiai
ggml-ci-arm64-cpu-kleidiai-graviton4
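
The job-skipping idea above could be sketched as a path filter: map changed files to job groups and fall back to running everything when a path is not recognized. This is only an illustration, not the project's CI logic. The directory patterns and group names are hypothetical, and sending CUDA source changes to the hip and musa groups as well is an assumption that those backends build from the same sources.

```python
from fnmatch import fnmatch

# Hypothetical mapping from path patterns to CI job groups; the real
# repository layout and job names may differ.
RULES = {
    "ggml/src/ggml-cuda/*": {"cuda", "hip", "musa"},  # assumed shared sources
    "ggml/src/ggml-sycl/*": {"sycl"},
    "ggml/src/ggml-vulkan/*": {"vulkan"},
}
ALL_GROUPS = {"cpu", "cuda", "hip", "metal", "musa", "sycl", "vulkan"}


def groups_to_run(changed_files):
    """Return the job groups to run for a list of changed file paths."""
    selected = set()
    for path in changed_files:
        matched = [groups for pattern, groups in RULES.items() if fnmatch(path, pattern)]
        if not matched:
            # Unknown path (common code, docs, CI config): be conservative
            # and run every group.
            return set(ALL_GROUPS)
        for groups in matched:
            selected |= groups
    return selected


print(sorted(groups_to_run(["ggml/src/ggml-cuda/mmq.cu"])))
print(sorted(groups_to_run(["ggml/src/ggml-sycl/common.cpp", "README.md"])))
```

The conservative fallback is the important design choice: a filter like this can only skip jobs safely when every changed file is known to be backend-specific.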

@IMbackK
Collaborator

IMbackK commented May 26, 2026

In the case of that PR, it's actually only the HIP jobs that are useful, not the CUDA ones, although working out whether a change affects the HIP backend, the CUDA backend, or both is beyond a CI script.

gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request May 26, 2026
* origin/master: (59 commits)
ggml-zendnn : fixed naming of matmul function (ggml-org#20964)
ci : do not allocate ccache for 3rd-party hosted runners (ggml-org#23730)
ci : move [no release] check to dedicated check_release job (ggml-org#23734)
ci : add `[no release]` keyword + fix sanitizer builds (ggml-org#23728)
ci : move macos jobs to the apple workflow + fix names (ggml-org#23721)
vulkan: optimize conv2d and implement coopmat1 support (ggml-org#22620)
ci : remove vulkan SDK dep from webgpu job (ggml-org#23718)
hexagon: add support for CONCAT op (ggml-org#23648)
ci : move more CPU jobs to self-hosted runners (ggml-org#23715)
ci : move sanitizer jobs to self-hosted runners (ggml-org#23713)
ci : reduce (disable SYCL and CANN builds/releases) (ggml-org#23705)
convert : support Gemma4ForCausalLM architecture (ggml-org#23682)
models : Attach Mistral3 NVFP4 weight scales (ggml-org#23629)
SYCL: implement ggml_sycl_pool_vmm (ggml-org#22862)
tests: test-backend-ops -j <N> to run tests in parallel (ggml-org#23637)
model : add support for talkie-1930-13b (ggml-org#22596)
ggml-webgpu: Add MMVQ path for Q4/Q8/Q2_K/Q4_K and clean up legacy MUL_MAT pipeline (ggml-org#23594)
[WebGPU] Check batch_compute_passes before sending passes when not doing GPU profiling (ggml-org#23457)
CUDA: missing PDL sync for FWHT, better fallback (ggml-org#23690)
metal : add apple device id (ggml-org#23566)
...
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026
* ci : reduce

[no ci]

* cont : disable sycl, cann + rename caches

[no ci]

* cont : cann

[no ci]
turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026
* ci : reduce

[no ci]

* cont : disable sycl, cann + rename caches

[no ci]

* cont : cann

[no ci]
@cristianadam

At Qt Creator we save the ccache directory of a build as an artifact. Then when a new build starts it looks over previous artifacts and downloads the corresponding ccache archive.

This bypasses the 10 GB cache that GitHub has, since you can have unlimited space for build artifacts 😅

See https://github.com/qt-creator/qt-creator/blob/master/.github/workflows/build_cmake.yml#L556 for details.

The build artifacts are short lived and I think it doesn't affect GitHub's disk space that much.

I think it won't be too hard for an LLM to convert the CMake code to something else used by llama.cpp's CI build yaml files.
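
A minimal sketch of the lookup step described above: given metadata for previous build artifacts, pick the newest non-expired one with a matching name. The artifact names and IDs are hypothetical, and the dictionary fields (`name`, `expired`, `created_at`) are only modeled on the kind of metadata an artifact listing returns. Downloading and unpacking the archive is left out.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse a UTC timestamp such as '2026-06-01T10:00:00Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def pick_ccache_artifact(artifacts, name):
    """Return the newest non-expired artifact with the given name, or None."""
    candidates = [a for a in artifacts if a["name"] == name and not a["expired"]]
    return max(candidates, key=lambda a: parse_ts(a["created_at"]), default=None)


# Hypothetical artifact metadata; NOT real data from any repository.
artifacts = [
    {"id": 1, "name": "ccache-ubuntu-sycl", "expired": False, "created_at": "2026-06-01T10:00:00Z"},
    {"id": 2, "name": "ccache-ubuntu-sycl", "expired": False, "created_at": "2026-06-03T08:30:00Z"},
    {"id": 3, "name": "ccache-ubuntu-sycl", "expired": True, "created_at": "2026-06-04T09:00:00Z"},
    {"id": 4, "name": "ccache-windows-sycl", "expired": False, "created_at": "2026-06-04T12:00:00Z"},
]

best = pick_ccache_artifact(artifacts, "ccache-ubuntu-sycl")
print(best["id"] if best else "no usable ccache artifact, starting cold")
```

When no artifact matches, the build simply starts with an empty ccache, so a missing or expired archive costs time but does not break the job.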

@alexander454584-cpu

Please return SYCL.

@arthw
Contributor

arthw commented Jun 4, 2026

@ggerganov @CISC
What do you think about @cristianadam's solution from Qt Creator for avoiding the large ccache size?
If you agree, I can implement it in the SYCL CI.

Thank you!

@ggerganov
Member Author

@arthw Would need to see the implementation and the performance to decide. You can give it a try in a fork and when you have something working I'll take a look.

@AG1M

AG1M commented Jun 4, 2026

Thanks a lot, having the Windows SYCL builds back again would be awesome.

@NeoZhangJianyu
Contributor

@ggerganov
Got it! I will do it.

Thank you!

@Fmstrat

Fmstrat commented Jun 5, 2026

Will we get Linux SYCL back?


Labels

devops improvements to build systems and github actions


9 participants