ci : refactor #23789
7ae06e8 to 7988c6e
[no ci] Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
There is a non-negligible chance that this breaks something - will monitor closely.
A positive effect of the change is that we can now cancel intermediate releases when it makes sense. For example, at the moment we have:
So the 3111 release is ongoing and the next 3 releases are waiting for it to finish. We can cancel 3112 and 3113. This way we jump directly to 3114, saving some CI runs. This is the release queue: https://github.com/ggml-org/llama.cpp/actions/workflows/release.yml
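The cancellation rule described above can be sketched as a small model. This is only an illustration, assuming the queue is a plain list of release numbers; the function name `plan_cancellations` is hypothetical and not part of any workflow.

```python
from typing import List, Optional, Tuple


def plan_cancellations(waiting: List[int]) -> Tuple[List[int], Optional[int]]:
    """Split the waiting releases into (runs to cancel, run to keep).

    Only the newest waiting release is kept; every older waiting release
    is an intermediate one that can be cancelled. The release that is
    already running is not part of `waiting` and is left untouched.
    """
    if not waiting:
        return [], None
    ordered = sorted(waiting)
    return ordered[:-1], ordered[-1]


# Release 3111 is running; 3112, 3113 and 3114 are waiting for it.
to_cancel, to_keep = plan_cancellations([3112, 3113, 3114])
print(to_cancel, to_keep)
```

With the numbers from the comment, the model cancels 3112 and 3113 and keeps 3114.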
* ci : separate CUDA windows workflow + fix names
* ci : rename workflow
* ci : prefix cache names with workflow name
* ci : rename build.yml -> build-cpu.yml
* ci : cache keys
* ci : fix windows cuda/hip concurrency of release workflow
* ci : fix apple cache names
* ci : add TODOs
* cont : keep just the last cache
* ci : update release concurrency to queue
* ci : move the release trigger to ubuntu-slim
* ci : hip add TODO
* cont : improve words
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
This doesn't work as intended. It causes the ccache to never update, leaving outdated caches that currently get only 4% hits:
I've deleted the ccaches without timestamp now so the next 6+ releases should be swifter.
Prepared a fix here: #23895

Overview
* The CUDA jobs now live in build-cuda-ubuntu.yml and build-cuda-windows.yml
* build-cuda-windows.yml is now manual
* The release.yml workflow is now executed sequentially in a queue. This allows us to have non-timestamped caches for the release jobs, which should reduce the cache usage of the repo
Additional information
The main goal of this change is to avoid running the heavy CUDA Windows jobs too often. To do that, PRs will now automatically run only the CUDA Ubuntu jobs (including hip and musa) in build-cuda-ubuntu.yml. The CUDA Windows jobs can only be started manually for a PR with the new build-cuda-windows.yml workflow.
A problem on master was that both the build.yml and release.yml workflows would start the same CUDA Windows jobs. When we miss a hit for the ccache, this results in 2 parallel jobs that run for 3 hours and produce the same ccache twice. With multiple commits to master, this can scale even more. To fix that, all release workflows now run sequentially in a queue:
I think this should result in better utilization of the runners and the ccache.
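To make the cost of the duplicated jobs concrete, here is a rough back-of-the-envelope sketch. It assumes the worst case where every job misses the ccache and takes the full 3 hours; the function name `runner_hours` and its parameters are hypothetical, chosen only for this illustration.

```python
def runner_hours(commits: int, workflows: int = 2, hours_per_job: float = 3.0) -> float:
    """Total runner hours when every workflow rebuilds the same job for every commit.

    Assumes every job misses the ccache, so each one runs for the full
    `hours_per_job` (3 hours in the description above).
    """
    return commits * workflows * hours_per_job


# One commit triggers the same CUDA Windows job from two workflows.
one_commit = runner_hours(1)
# Several commits landing before any cache is saved multiply the cost.
three_commits = runner_hours(3)
print(one_commit, three_commits)
```

Under these assumptions, a single commit already costs 6 runner hours to produce one cache, and three commits cost 18.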
The release jobs also have a new ccache policy:
This prevents the following from happening:
Now we will have a single cache entry per release job (note that there is no timestamp suffix):
We can do that because the jobs do not run concurrently.
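The difference between the two key schemes can be sketched with a toy cache store. This is a simplified model, not the actual workflow configuration: it assumes that an existing cache key is never overwritten, it uses a run counter in place of a real timestamp, and the job name `release-windows-cuda` is made up for the example.

```python
from typing import Dict


def save_cache(store: Dict[str, str], key: str, content: str) -> bool:
    """Save `content` under `key`; an existing key is never overwritten (assumed)."""
    if key in store:
        return False
    store[key] = content
    return True


def run_jobs(job: str, runs: int, timestamped: bool) -> Dict[str, str]:
    """Simulate `runs` sequential executions of one release job."""
    store: Dict[str, str] = {}
    for n in range(1, runs + 1):
        # The run counter stands in for the timestamp suffix.
        key = f"{job}-{n}" if timestamped else job
        save_cache(store, key, f"objects from run {n}")
    return store


timestamped = run_jobs("release-windows-cuda", 3, timestamped=True)
fixed = run_jobs("release-windows-cuda", 3, timestamped=False)
print(len(timestamped), len(fixed))
```

In this model the timestamped scheme accumulates one entry per run, while the fixed key keeps a single entry. Under the immutability assumption that single entry still holds the objects from the first run, which is consistent with the stale-cache report in the conversation.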
Next PRs
* build-cuda-windows.yml and release.yml
Requirements