In CI skip installing into local repository #13367
@hashhar this is now ready for review. I changed install to package in a few places where we're only supposed to build the project and we run I checked that with these changes the size of the local Maven repo is ~1GB and shouldn't grow much. Currently, it's ~7GB, which is not sustainable.
You can close it and merge this one.
Thanks. Just FYI, it's red now.
I made some wrong assumptions before and had to reevaluate my approach here. When we run I reverted replacing
I don't know how our CI caching works. @hashhar can you PTAL?
How much time does our cache save us? If it's not a very large difference, I'd probably opt to drop the cache instead.
.github/workflows/ci.yml
This is the non-controversial part and we can merge it right away if extracted.
Based on highly scientific tests, that is, a single run of this branch, using the cache saves about 50 seconds. Note that this PR attempts to shrink the cache to under 1GB, instead of the 7GB we have now. Downloading that 7GB cache takes 2-5 minutes. I hope to get this under 30 seconds. Then the cache will be a net win. Note also that whether we install artifacts into the local Maven repo is a separate question from whether we cache it. If we don't have to copy files around, we shouldn't.
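To make the trade-off concrete, here is a rough sketch of the arithmetic, assuming the only two numbers that matter are the 50 seconds saved during the build and the time spent restoring the cache. The function name is made up for illustration, and the figures are the ones quoted in this comment, not new measurements.

```python
def net_saving_seconds(build_saving_s: float, restore_s: float) -> float:
    """Seconds gained per job by using the cache (negative means the cache costs time)."""
    return build_saving_s - restore_s


BUILD_SAVING_S = 50  # build time saved by a warm cache, from the single run quoted above

# Restore times quoted above: 2-5 minutes for the 7GB cache, 30 seconds as the target.
for label, restore_s in [
    ("7GB cache, best case", 2 * 60),
    ("7GB cache, worst case", 5 * 60),
    ("smaller cache, target", 30),
]:
    print(f"{label}: {net_saving_seconds(BUILD_SAVING_S, restore_s):+.0f} s")
# prints:
# 7GB cache, best case: -70 s
# 7GB cache, worst case: -250 s
# smaller cache, target: +20 s
```

Under these assumptions the current cache costs more time than it saves, and the break-even point is a restore time of 50 seconds.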
Oh and I think using the cache should also increase resiliency against Maven Central being flaky sometimes. But I don't have data about how often this happens.
I still see a large cache being used - https://github.com/trinodb/trino/runs/7672132855?check_suite_focus=true#step:4:81
I added a temporary commit with a change in the main pom to get a new cache key. When the run completes I'll trigger another one and that will demonstrate the changes.
I give up. I don't know what criteria GitHub uses when restoring the cache. It's supposed to check a hash of all pom files, but I modified one and it still uses the cache from the previous workflow run: where I computed the hash and it's: Maybe it's trying to be smart and restoring some cache when there's no exact hit. Anyway, I used this action to verify the hash:
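For reference, GitHub's documentation describes hashFiles as computing a SHA-256 hash for each matched file and then a final SHA-256 over those hashes. The sketch below mirrors that description so the hash can be checked locally. The function name is hypothetical, and the file ordering and the exact bytes fed into the final hash are assumptions, so the digest may well differ from what hashFiles prints. It is only useful for confirming that editing a pom changes the result.

```python
import hashlib
from pathlib import Path


def combined_pom_hash(root: str) -> str:
    """SHA-256 over the per-file SHA-256 digests of every pom.xml under root."""
    outer = hashlib.sha256()
    # Sort the paths so the result does not depend on directory traversal order.
    for pom in sorted(Path(root).rglob("pom.xml")):
        outer.update(hashlib.sha256(pom.read_bytes()).digest())
    return outer.hexdigest()


# Example (run from the repository root): print(combined_pom_hash("."))
```

A possible explanation for the behavior described above: actions/cache falls back to prefix matching on restore-keys when the primary key has no exact match, and its cache-hit output is only 'true' for an exact match. Whether this workflow sets restore-keys is not visible in this thread, so treat that as a guess.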
          retention-days: ${{ env.TEST_REPORT_RETENTION_DAYS }}
      - name: Clean local Maven repo
        if: steps.cache.outputs.cache-hit != 'true'
        run: rm -rf ~/.m2/repository
Add a code comment?
The commit changes some install into package, so worth noting why it's still worthwhile to delete ~/.m2/repository.
Also, why ~/.m2/repository and not e.g. ~/.m2/repository/io/trino?
This removes the whole local repo to avoid creating a cache entry from this job since it might not be the one with the most dependencies.


Description
I noticed the size of the cache package with the local Maven repo is about 7.5GB, near the 10GB limit. I think installing Trino packages into the local repository is not necessary, and skipping it might both save some time on copying and reduce the size of the cache. Downloading and extracting such a big cache takes about 1.5 minutes.
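One way to check where the space goes is to sum file sizes per group directory in the local repository. This is a minimal sketch, assuming the default ~/.m2/repository layout. The function names and the grouping depth of two path components (so that io/trino shows up as its own row) are choices made here for illustration, not anything from the CI setup.

```python
import os
from collections import Counter
from pathlib import Path


def size_by_prefix(repo: Path, depth: int = 2) -> Counter:
    """Total file size in bytes, grouped by the first `depth` path components under repo."""
    sizes = Counter()
    for dirpath, _dirnames, filenames in os.walk(repo):
        parts = Path(dirpath).relative_to(repo).parts[:depth]
        key = "/".join(parts) or "."  # "." collects files directly in the repo root
        for name in filenames:
            # lstat so that a broken symlink does not raise
            sizes[key] += (Path(dirpath) / name).lstat().st_size
    return sizes


def report(repo: Path, top: int = 10) -> None:
    """Print the `top` largest groups, biggest first."""
    for key, size in size_by_prefix(repo).most_common(top):
        print(f"{size / 1e9:8.2f} GB  {key}")


# Example: report(Path.home() / ".m2" / "repository")
```

If installed Trino artifacts are what inflates the cache, io/trino should appear near the top of the report.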
Related issues, pull requests, and links
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) No release notes entries required.
( ) Release notes entries required with the following suggested text: