CI: Don't include commit sha in cache key, ensure caches don't stay reserved #104076
- name: Restore Godot build cache
  uses: ./.github/actions/godot-cache-restore
  with:
    cache-name: ${{ matrix.cache-name }}
  continue-on-error: true
I just moved the cache restore and save steps as close as possible to the compile step, to minimize the time during which the cache is locked.
    scons-cache-limit: ${{ matrix.cache-limit }}

- name: Save Godot build cache
  if: always()
This makes it run even if previous steps fail, which ensures that if the cache was locked, it gets unlocked.
If the job fails before cache/restore, then this step still runs and passes fine with just warnings:
Warning: Path Validation Error: Path(s) specified in the action for caching do(es) not exist, hence no cache is being saved.
Warning: Cache save failed.
What I haven't tested, though, is what happens if the job is canceled (e.g. by pushing a new commit to the PR branch). Let's test this in this PR and make sure it also runs this step and unlocks the cache key.
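The step ordering discussed in these review comments can be sketched roughly as follows. This is an illustrative fragment, not the PR's exact diff: the `godot-build` and `godot-cache-save` action paths and the `Compilation` step name are assumptions, while `godot-cache-restore`, `cache-name`, `scons-cache-limit`, `continue-on-error: true`, and `if: always()` come from the snippets above. On the cancellation question, GitHub's documentation for the `always()` status check says the step executes even when the run is canceled, though that still deserves a real test here.

```yaml
# Sketch only: cache steps kept directly around the build step.
steps:
  # ... checkout and dependency setup run before the cache is touched ...

  - name: Restore Godot build cache
    uses: ./.github/actions/godot-cache-restore
    with:
      cache-name: ${{ matrix.cache-name }}
    continue-on-error: true

  - name: Compilation
    uses: ./.github/actions/godot-build  # assumed path of the build action
    with:
      scons-cache-limit: ${{ matrix.cache-limit }}

  - name: Save Godot build cache
    # always() evaluates to true after failures and cancellations,
    # so this step still runs when an earlier step did not succeed.
    if: always()
    uses: ./.github/actions/godot-cache-save  # assumed path of the save action
    with:
      cache-name: ${{ matrix.cache-name }}
```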
Once the CI is done, let's push some unrelated change to confirm it replaces the cache entry correctly.
I've actually put that one in its own PR as it can be merged independently from this: #104077.
Feel free to push commits to my PR branch to do tests, I probably won't have too much time to do it myself.
…eserved

Including the commit sha in the cache key means that we create a new unique cache for every PR commit push or merge event. Previous caches for obsolete commits don't get cleaned up until after 7 days and so our cache quota gets filled extremely fast, occasionally leading to losing our `master` branch cache.

Through some tests, I found that it works fine to just use the branch ref as part of the cache key, and thus expect that:

- Only the latest commit in the `master` or other dev branches has a cache.
- Only the latest commit in a PR has a cache.

So cache keys take the form:

```
linux-template-minimal|master
linux-template-minimal|104076/merge
```

When e.g. merging a PR in `master`, its CI workflow will reuse the existing `linux-template-minimal|master` cache, and then replace it with an updated cache using the same key (so the old cache is discarded).

There's a potential problem though, which is that when restoring a cache, GitHub Actions puts a "reserved" lock on it to prevent concurrent access from other workflows. This lock is removed when saving the cache in the same job, but if an intermediate step fails and terminates the job before that, the cache stays stuck in a reserved state (at least for one hour). Subsequent workflows can't save to it, instead getting an error like this:

```
Failed to save: Unable to reserve cache with key linux-template-minimal|104076/merge, another job may be creating this cache.
```

We work around it by ensuring that the `cache/save` action always runs even if previous steps failed, so that the lock is removed, if there is one.

This should drastically reduce our cache congestion.
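The key change described in this commit message can be sketched as below. It is a hypothetical excerpt from a composite cache action: the `scons-cache` input name and the exact form of the previous key are assumptions, and `actions/cache/restore@v4` is simply one published version of the restore action. The example keys in the message match `github.ref_name`, which GitHub documents as `<pr_number>/merge` for pull requests.

```yaml
# Hypothetical excerpt from a composite action's `runs.steps` list.
- name: Restore SCons cache directory
  uses: actions/cache/restore@v4
  with:
    path: ${{ inputs.scons-cache }}  # assumed input holding the cache directory
    # Assumed previous key, unique per commit:
    #   key: ${{ inputs.cache-name }}|${{ github.ref_name }}|${{ github.sha }}
    # Proposed key, one per branch or PR ref,
    # e.g. "linux-template-minimal|master":
    key: ${{ inputs.cache-name }}|${{ github.ref_name }}
```

This only illustrates the key format; whether saving under an already existing key succeeds is the question the rest of the thread addresses.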
5529d07 to 553fbeb
Unfortunately, it seems to fail to save the cache: https://github.com/godotengine/godot/actions/runs/13839172307/job/38721986719?pr=104076#step:9:35
The SHA was required because caches are immutable. If we wanted to save with the exact same name, it'd need to be immediately preceded by deleting the existing cache.
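A delete-then-save sequence along those lines might look like the sketch below. It is untested and not part of this PR: the step names and the `scons-cache` input are hypothetical, the job would need the `actions: write` permission, and the `gh cache delete` subcommand requires a reasonably recent GitHub CLI on the runner. Two caveats apply: the token given to pull requests from forks is read-only, so the delete would fail there, and there is a short window between the delete and the save in which no cache exists for that key.

```yaml
# Hypothetical steps inside a composite save action.
- name: Delete the previous cache entry
  if: always()
  continue-on-error: true  # nothing to delete on the first run for a ref
  shell: bash
  env:
    GH_TOKEN: ${{ github.token }}
    GH_REPO: ${{ github.repository }}
  run: gh cache delete "${{ inputs.cache-name }}|${{ github.ref_name }}"

- name: Save SCons cache directory
  if: always()
  uses: actions/cache/save@v4
  with:
    path: ${{ inputs.scons-cache }}  # assumed input holding the cache directory
    key: ${{ inputs.cache-name }}|${{ github.ref_name }}
```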
Hm, seems like I misread my results in another repo where I experimented with this. It looked to me like it was able to just replace the cache with a new entry of the same name, but now I see that it's indeed failing and still using a 2-day-old cache from when I first made that change. This isn't going to work then :(