Merged
28 changes: 28 additions & 0 deletions .github/workflows/cache_cleanup.yml
@@ -0,0 +1,28 @@
```yaml
# https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows#force-deleting-cache-entries
name: 🧹 Cache Cleanup
on:
  pull_request:
    types:
      - closed

jobs:
  cleanup:
    name: Cleanup PR caches
    runs-on: ubuntu-latest
    steps:
      - name: Cleanup
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GH_REPO: ${{ github.repository }}
          BRANCH: refs/pull/${{ github.event.pull_request.number }}/merge
        run: |
          echo "Fetching list of cache key"
          cache_keys_for_pr=$(gh cache list --ref $BRANCH --limit 100 --json id --jq '.[].id')
```
Member Author:
Note that this deletes only up to 100 caches (that's what the docs example does).

After #104076 we should only have a single cache key pattern per PR, so maybe around 20 caches in total from all our jobs; it should be fine.

Currently this might not even include all caches in a PR where e.g. a commit was amended 10 times, which would have around 200 unique caches. The rest would be culled after 7 days.

Member Author:
So since #104076 isn't going to work, maybe we need to reconsider how many caches we want to query here, as we might have several hundred in a lot of PRs.
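A minimal sketch of one option, assuming the batch size stays at the documented 100: keep listing and deleting until nothing remains for the ref, so the total number of caches per PR no longer matters.

```bash
# Hedged sketch: delete in batches of 100 until no caches remain for the PR ref,
# instead of a single pass that is capped at 100 entries.
while true; do
  cache_keys_for_pr=$(gh cache list --ref "$BRANCH" --limit 100 --json id --jq '.[].id')
  [ -z "$cache_keys_for_pr" ] && break
  for cache_key in $cache_keys_for_pr; do
    gh cache delete "$cache_key" || true  # continue even if a key was already evicted
  done
done
```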

Member (@AThousandShips, Mar 13, 2025):
100 caches would be a bit over 5 runs of a PR, and that's not an unreasonable case; 200 would be pretty sufficient I'd say, since that covers 10 recent runs.

Note also that a significant portion of PRs are merged when they have no caches around, or likely at most one set. With the pressure we have, the general retention for caches is about a day, and essentially never more than two days (I've never seen a cache on the repo that old, AFAIK).

So I'd say we only need to worry about cases where a PR is amended repeatedly shortly before (within 24 hours of) being merged. (I occasionally go through and check for old caches belonging to merged PRs, and there often aren't any, or many.)

Member Author:
I quickly checked some recent PRs that had multiple commits today, and they all have only 19 caches. So it sounds like previous caches already get dropped somehow, maybe because we're already at such a saturated level that they quickly get garbage collected.

We can reassess after we've reduced our cache congestion further to see if it needs to be increased.
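For reference, a spot check like that could be scripted with the same `gh cache list` call the workflow uses; the PR number below is a placeholder, not one from this discussion.

```bash
# Count the caches currently attached to a PR's merge ref (12345 is a placeholder).
gh cache list --ref refs/pull/12345/merge --limit 100 --json id --jq 'length'
```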

Member:
PRs that were merged recently?

Member Author:
No, some that are still open and had multiple commits today, e.g. the Traits PR.

```yaml
          # Setting this to not fail the workflow while deleting cache keys.
          set +e
          echo "Deleting caches..."
          for cache_key in $cache_keys_for_pr; do
            gh cache delete $cache_key
            echo "Deleted: $cache_key"
          done
          echo "Done"
```