Skip to content

Improve CI cache usage#14880

Closed
nineinchnick wants to merge 2 commits intotrinodb:masterfrom
nineinchnick:ci-cache
Closed

Improve CI cache usage#14880
nineinchnick wants to merge 2 commits intotrinodb:masterfrom
nineinchnick:ci-cache

Conversation

@nineinchnick
Copy link
Copy Markdown
Member

Description

I noticed we're still creating too many cache entries so it gets cleared too often, so we should not create them on forks, because such entries cannot be reused except for that fork and branch.

Ref: https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache

I also noticed downloading the cache sometimes gets stuck, like here: https://github.com/trinodb/trino/actions/runs/3384101557/jobs/5620679970. There's a new setting in the cache action documented here: https://github.com/actions/cache/blob/main/workarounds.md#cache-segment-restore-timeout and it should be available because actions/setup-java is using actions/cache 3.0.4 (the JS lib, not the GH action): https://github.com/actions/setup-java/blob/main/package.json

Ref: https://github.com/actions/toolkit/blob/main/packages/cache/RELEASES.md#304

Non-technical explanation

n/a

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

Caches created on forks cannot be shared so don't create them to avoid
wasting time and wasting cache limits.
Copy link
Copy Markdown
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW is this really a problem though? We can have around 6 cache entries with current size.

Only the latest one on master will get used anyway.

This change would mean PRs which change pom.xml will no longer benefit from caching (since the only reason a new cache entry would be created on fork would be if pom.xml changed).

@nineinchnick
Copy link
Copy Markdown
Member Author

nineinchnick commented Nov 3, 2022

BTW is this really a problem though? We can have around 6 cache entries with current size.

We see a lot of failures related to fetching artifacts from Maven Central. I want to increase the chances of a cache hit by avoiding the eviction of cache entries, especially if most of them cannot be reused.

Also, we waste time (50s) on uploading a cache entry that won't be reused:
image

Only the latest one on master will get used anyway.

Only if it exists.

This change would mean PRs which change pom.xml will no longer benefit from caching (since the only reason a new cache entry would be created on fork would be if pom.xml changed).

They don't benefit from cache anyway, since they can't use the one from master, because the cache key is a hash of all pom files. We could work around it by using a different restore key: https://github.com/actions/cache/blob/main/workarounds.md#update-a-cache but:

  • this input param is not exposed in setup-java, we'd need to extract the usage of the cache action
  • there are relatively few commits in the last 3 months that change poms:
 release_changes | pom_changes | all_changes 
-----------------+-------------+-------------
            1818 |         696 |       11723 

query:

SELECT
    count(*) FILTER (WHERE d.path_name like '%pom.xml' AND c.message like '[maven-release-plugin] %') AS release_changes
  , count(*) FILTER (WHERE d.path_name like '%pom.xml' AND c.message not like '[maven-release-plugin] %') AS pom_changes
  , count(*) AS all_changes
FROM git.default.commits c
JOIN git.default.diff_stats d ON d.commit_id = c.object_id
WHERE c.commit_time > NOW() - INTERVAL '3' MONTH
;

This is a different problem to solve in a separate PR though (restoring the cache vs creating a new cache entry).

Default timeout is 60 minutes, after which a stuck download would be
retried. Lower it to start retrying downloads when they get stuck.
@hashhar
Copy link
Copy Markdown
Member

hashhar commented Nov 3, 2022

Only the latest one on master will get used anyway.

Only if it exists.

Why would it not exist? The tip of master always creates a cache entry and since GHA rebases PRs onto master before running CI unless you get caught in the very tiny window where some PR is merged to master but it's cache is not yet uploaded only then would the cache not exist.

They don't benefit from cache anyway, since they can't use the one from master,

Only if you assume CI runs just once on such a PR which is rarely true.

For example if you go to https://github.com/trinodb/trino/actions/caches and see which one is getting used recently and when it was created you'll notice the one created from tip of master is most used one. But also others created from forks get used relatively often as well.

Note that the cache is LRU, not FIFO so as long as something uses the cache from tip of master it'll never get evicted - just eventually replaced when some other PR is merged to master.

@nineinchnick
Copy link
Copy Markdown
Member Author

Only the latest one on master will get used anyway.

Only if it exists.

Why would it not exist? The tip of master always creates a cache entry and since GHA rebases PRs onto master before running CI unless you get caught in the very tiny window where some PR is merged to master but it's cache is not yet uploaded only then would the cache not exist.

They don't benefit from cache anyway, since they can't use the one from master,

Only if you assume CI runs just once on such a PR which is rarely true.

For example if you go to https://github.com/trinodb/trino/actions/caches and see which one is getting used recently and when it was created you'll notice the one created from tip of master is most used one. But also others created from forks get used relatively often as well.

Note that the cache is LRU, not FIFO so as long as something uses the cache from tip of master it'll never get evicted - just eventually replaced when some other PR is merged to master.

You're right. We should first make sure that forks can reuse the cache from master, then we can analyze how often they reuse their own cache. I extracted the other commit into #14881, will close this one and work on using restore-key.

@nineinchnick
Copy link
Copy Markdown
Member Author

nineinchnick commented Nov 3, 2022

I just opened #14882 and trinodb/github-actions#15 if we decide to use this action in #14865.

@nineinchnick nineinchnick deleted the ci-cache branch November 3, 2022 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants