ci : refactor by ggerganov · Pull Request #23789 · ggml-org/llama.cpp

ggerganov · 2026-05-27T18:40:51Z

Overview

Split the CUDA workflows into two: build-cuda-ubuntu.yml and build-cuda-windows.yml
build-cuda-windows.yml is now manual
The entire release.yml workflow now runs sequentially in a queue. This allows non-timestamped caches for the release jobs, which should reduce the repo's cache usage
Make ccache names more consistent by prefixing them with the workflow name
Disable the Android ccache: it does not seem to improve build time and takes up a lot of space

Additional information

The main goal of this change is to avoid running the heavy CUDA Windows jobs too often. To that end, PRs will now automatically run only the CUDA Ubuntu jobs (including HIP and MUSA) in build-cuda-ubuntu.yml. The CUDA Windows jobs can only be started manually for a PR, using the new build-cuda-windows.yml workflow.
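
As a rough sketch of what a manual-only trigger looks like in GitHub Actions, the fragment below lists `workflow_dispatch` as the sole event. The job id, runner label and build step are placeholders for illustration, not the actual contents of build-cuda-windows.yml.

```yaml
# Sketch only: a workflow that never starts automatically.
name: build-cuda-windows          # placeholder name

on:
  workflow_dispatch:              # manual trigger only; no push / pull_request events

jobs:
  windows-cuda:                   # placeholder job id
    runs-on: windows-2022         # assumed runner label
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: echo "CUDA build steps go here"   # placeholder for the real build
```

With only `workflow_dispatch` listed, a run has to be started from the Actions tab or with something like `gh workflow run build-cuda-windows.yml --ref <branch>`.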

A problem on master was that both the build.yml and release.yml workflows would start the same CUDA Windows jobs. On a ccache miss, this results in 2 parallel jobs that each run for 3 hours and produce the same ccache twice. With multiple commits to master, this gets even worse. To fix that, all release workflows now run sequentially in a queue:

# note: run this workflow one at a time for better cache reuse
concurrency:
  group: release
  queue: max

I think this should result in better utilization of the runners and the ccache.

The release jobs also have a new ccache policy:

      - name: ccache
        uses: ggml-org/ccache-action@v1.2.21
        with:
          key: release-${{ matrix.os }}-${{ matrix.arch }}
          append-timestamp: false # note: use this only with non-concurrent jobs!

This prevents this from happening:

Now we will have a single cache entry per release job (note that there is no timestamp suffix):

We can do that because the jobs do not run concurrently.

Next PRs

Deduplicate the jobs in build-cuda-windows.yml and release.yml

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: NO

ggerganov · 2026-05-28T06:46:16Z

There is a non-negligible chance that this breaks something - will monitor closely.

ggerganov · 2026-05-28T09:29:22Z

A positive effect of the change is that we can now cancel intermediate releases when it makes sense. For example, at the moment we have:

So the 3111 release is ongoing and the next 3 releases are waiting for it to finish. We can cancel 3112 and 3113. This way we jump directly to 3114, saving some CI runs.

This is the release queue: https://github.com/ggml-org/llama.cpp/actions/workflows/release.yml

* ci : separate CUDA windows workflow + fix names
* ci : rename workflow
* ci : prefix cache names with workflow name
* ci : rename build.yml -> build-cpu.yml
* ci : cache keys
* ci : fix windows cuda/hip concurrency of release workflow
* ci : fix apple cache names
* ci : add TODOs
* cont : keep just the last cache
* ci : update release concurrency to queue
* ci : move the release trigger to ubuntu-slim
* ci : hip add TODO
* cont : improve words

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

CISC · 2026-05-29T21:45:07Z

The release jobs also have a new ccache policy:

      - name: ccache
        uses: ggml-org/ccache-action@v1.2.21
        with:
          key: release-${{ matrix.os }}-${{ matrix.arch }}
          append-timestamp: false # note: use this only with non-concurrent jobs!

This doesn't work as intended: it causes the ccache to never update, leaving outdated caches that currently get only 4% hits:

Failed to save: Unable to reserve cache with key ccache-release-windows-2022-x64-cuda-13.3-, another job may be creating this cache.
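
For context, GitHub Actions cache entries are immutable: once a key has been saved it cannot be overwritten, which is why a fixed key keeps serving the first (by now stale) ccache. A hedged sketch of the timestamped alternative follows, assuming the ggml-org fork behaves like the upstream hendrikmuhs/ccache-action, where each run saves under a fresh timestamp-suffixed key and restores the most recent entry matching the key prefix.

```yaml
      - name: ccache
        uses: ggml-org/ccache-action@v1.2.21
        with:
          key: release-${{ matrix.os }}-${{ matrix.arch }}
          # assumed to match the upstream default: every run saves a new entry,
          # so the restored cache keeps tracking recent builds
          append-timestamp: true
```

The cost is one extra cache entry per run, so older entries have to be evicted or deleted to keep the repository's cache usage down.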

CISC · 2026-05-29T22:26:05Z

I've deleted the ccaches without a timestamp now, so the next 6+ releases should be swifter.
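
A cleanup like this can also be scripted. The following is a minimal sketch of a manually triggered workflow that deletes every cache whose key starts with a given prefix, using the `gh cache` commands of the GitHub CLI. The workflow name, job id and key prefix are assumptions for illustration (the prefix is modelled on the error message above); this is not necessarily how the caches were actually removed.

```yaml
# Sketch only: delete Actions caches by key prefix.
name: clear-release-ccache        # hypothetical workflow name

on:
  workflow_dispatch:              # run on demand

permissions:
  actions: write                  # required to delete Actions caches

jobs:
  clear:                          # hypothetical job id
    runs-on: ubuntu-latest
    steps:
      - name: Delete caches by key prefix
        env:
          GH_TOKEN: ${{ github.token }}
          GH_REPO: ${{ github.repository }}
          KEY_PREFIX: ccache-release-   # assumed prefix, modelled on the error message above
        run: |
          # list matching cache ids, then delete them one by one
          gh cache list --key "$KEY_PREFIX" --limit 100 --json id --jq '.[].id' |
            while read -r id; do
              gh cache delete "$id"
            done
```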

ggerganov · 2026-05-30T05:39:16Z

Prepared a fix here: #23895

ggerganov requested a review from a team as a code owner May 27, 2026 18:40

github-actions Bot added the "devops" label (improvements to build systems and github actions) May 27, 2026

ggerganov commented May 27, 2026

Comment thread .github/workflows/build-cuda-ubuntu.yml

ggerganov added 12 commits May 28, 2026 08:13

ci : separate CUDA windows workflow + fix names

d83e6c5

ci : rename workflow

c0381e1

ci : prefix cache names with workflow name

acf7181

ci : rename build.yml -> build-cpu.yml

a9cb3cf

ci : cache keys

f993437

ci : fix windows cuda/hip concurrency of release workflow

8eaeb7d

ci : fix apple cache names

250f3a3

ci : add TODOs

1a865fd

cont : keep just the last cache

c9c97d4

[no ci]

ci : update release concurrency to queue

d645f84

ci : move the release trigger to ubuntu-slim

e206253

ci : hip add TODO

7988c6e

ggerganov force-pushed the gg/ci-win-concurrency branch from 7ae06e8 to 7988c6e Compare May 28, 2026 05:14

ggerganov commented May 28, 2026

Comment thread .github/workflows/release.yml Outdated

ggerganov commented May 28, 2026

Comment thread .github/workflows/build-cuda-windows.yml Outdated

ggerganov commented May 28, 2026

Comment thread .github/workflows/build-cuda-windows.yml Outdated

cont : improve words

5ecf7b4

[no ci] Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

ggerganov merged commit 491c4d7 into master May 28, 2026
1 check passed

ggerganov deleted the gg/ci-win-concurrency branch May 28, 2026 06:44

ggerganov mentioned this pull request May 30, 2026

ci : clear cache instead of "no timestamp" keys + fix macos #23895

Merged

ggerganov merged 13 commits into master from gg/ci-win-concurrency
