
ci : refactor #23789

Merged: ggerganov merged 13 commits into master from gg/ci-win-concurrency on May 28, 2026


Conversation

@ggerganov (Member) commented May 27, 2026

Overview

  • Split the CUDA workflows in two: build-cuda-ubuntu.yml and build-cuda-windows.yml
  • build-cuda-windows.yml is now triggered manually only
  • The entire release.yml workflow now runs sequentially in a queue. This allows the release jobs to use non-timestamped caches, which should reduce the repo's cache usage
  • Make ccache names more consistent by prefixing them with the workflow name
  • Disable the Android ccache: it does not seem to improve build time, yet it takes a lot of space

Additional information

The main goal of this change is to avoid running the heavy CUDA Windows jobs too often. To that end, PRs now automatically run only the CUDA Ubuntu jobs in build-cuda-ubuntu.yml (which also include the HIP and MUSA builds). The CUDA Windows jobs can be started for a PR only manually, through the new build-cuda-windows.yml workflow.
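A minimal sketch of how a workflow can be made manual-only in GitHub Actions, assuming build-cuda-windows.yml relies on the standard workflow_dispatch trigger (the actual file may differ). The workflow name, job name, runner label and elided steps below are hypothetical placeholders.

```yaml
# hypothetical sketch, not the actual build-cuda-windows.yml
name: Build CUDA Windows (sketch)

# workflow_dispatch is the only trigger, so no push or pull_request event starts
# this workflow; it runs only when started from the Actions tab, the REST API or
# `gh workflow run`
on:
  workflow_dispatch:

jobs:
  windows-cuda:            # hypothetical job name
    runs-on: windows-2022  # assumed runner label
    steps:
      - uses: actions/checkout@v4
      # ... CUDA toolkit setup, ccache and build steps elided
```

With such a trigger, a run against a PR branch could be started with gh workflow run build-cuda-windows.yml --ref <branch>. GitHub only offers the manual trigger for workflow files that exist on the default branch; --ref then selects which branch is built.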

A problem on master was that both the build.yml and release.yml workflows would start the same CUDA Windows jobs. On a ccache miss, this results in 2 parallel jobs that each run for 3 hours and produce the same ccache twice. With multiple commits to master, the duplication multiplies. To fix that, all release workflow runs now execute sequentially in a queue:

```yaml
# note: run this workflow one at a time for better cache reuse
concurrency:
  group: release
  queue: max
```

I think this should result in better utilization of the runners and the ccache.

The release jobs also have a new ccache policy:

```yaml
- name: ccache
  uses: ggml-org/ccache-action@v1.2.21
  with:
    key: release-${{ matrix.os }}-${{ matrix.arch }}
    append-timestamp: false # note: use this only with non-concurrent jobs!
```

This prevents a new timestamped cache entry from piling up for every release run, as happened before:

[screenshot omitted]

Now we will have a single cache entry per release job (note that there is no timestamp suffix):

[screenshot omitted]

We can do that because the jobs do not run concurrently.

Next PRs

  • Deduplicate the jobs in build-cuda-windows.yml and release.yml


ggerganov requested a review from a team as a code owner May 27, 2026 18:40
github-actions (bot) added the devops label (improvements to build systems and github actions) May 27, 2026
Comment thread .github/workflows/build-cuda-ubuntu.yml
ggerganov force-pushed the gg/ci-win-concurrency branch from 7ae06e8 to 7988c6e May 28, 2026 05:14
Comment thread .github/workflows/release.yml (outdated)
Comment thread .github/workflows/build-cuda-windows.yml (outdated)
Comment thread .github/workflows/build-cuda-windows.yml (outdated)
[no ci]

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
ggerganov merged commit 491c4d7 into master May 28, 2026
1 check passed
ggerganov deleted the gg/ci-win-concurrency branch May 28, 2026 06:44
@ggerganov (Member, Author) commented:

There is a non-negligible chance that this breaks something; I will monitor closely.

@ggerganov (Member, Author) commented May 28, 2026

A positive effect of the change is that we can now cancel intermediate releases when it makes sense. For example, at the moment we have:

[screenshot omitted]

So release 3111 is in progress and the next 3 releases are queued behind it. We can cancel 3112 and 3113 and jump directly to 3114, saving some CI runs.

This is the release queue: https://github.com/ggml-org/llama.cpp/actions/workflows/release.yml

adrianhoehne pushed a commit to adrianhoehne/llama.cpp that referenced this pull request May 28, 2026
* ci : separate CUDA windows workflow + fix names

* ci : rename workflow

* ci : prefix cache names with workflow name

* ci : rename build.yml -> build-cpu.yml

* ci : cache keys

* ci : fix windows cuda/hip concurrency of release workflow

* ci : fix apple cache names

* ci : add TODOs

* cont : keep just the last cache

* ci : update release concurrency to queue

* ci : move the release trigger to ubuntu-slim

* ci : hip add TODO

* cont : improve words

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
@CISC (Member) commented May 29, 2026

Regarding the new ccache policy of the release jobs (key: release-${{ matrix.os }}-${{ matrix.arch }} with append-timestamp: false):

This doesn't work as intended: it causes the ccache to never update, leaving outdated caches with currently only 4% hits:

Failed to save: Unable to reserve cache with key ccache-release-windows-2022-x64-cuda-13.3-, another job may be creating this cache.
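The failure follows from GitHub Actions cache entries being immutable: once a key has been saved it cannot be overwritten, so a fixed key is written by the first run and every later save of that same key is rejected, hence the "Unable to reserve cache" error. For comparison, here is a sketch of the timestamped policy, assuming the ggml-org fork keeps the upstream hendrikmuhs/ccache-action behavior, where append-timestamp defaults to true: each save writes a new key of the form ccache-<key>-<timestamp>, and the restore step falls back to the most recently created entry sharing the ccache-<key>- prefix. This is only an illustration, not necessarily what the fix in #23895 does.

```yaml
# sketch: timestamped variant of the release ccache step
- name: ccache
  uses: ggml-org/ccache-action@v1.2.21
  with:
    key: release-${{ matrix.os }}-${{ matrix.arch }}
    # true is the upstream default (assumed to be the same in the ggml-org fork):
    # every run saves a fresh entry instead of trying to overwrite an immutable key
    append-timestamp: true
```

The trade-off is the one the PR set out to avoid: one cache entry per run, so old entries accumulate until GitHub's cache eviction removes them.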

@CISC (Member) commented May 29, 2026

I've deleted the ccaches without a timestamp, so the next 6+ releases should be swifter.

@ggerganov (Member, Author) commented:
Prepared a fix here: #23895

fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026
adrianhoehne pushed a commit to adrianhoehne/llama.cpp that referenced this pull request Jun 1, 2026
turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026

Labels

devops (improvements to build systems and github actions)


2 participants