Skip to content

In CI skip installing into local repository#13367

Merged
hashhar merged 1 commit intotrinodb:masterfrom
nineinchnick:ci-package-not-install
Aug 5, 2022
Merged

In CI skip installing into local repository#13367
hashhar merged 1 commit intotrinodb:masterfrom
nineinchnick:ci-package-not-install

Conversation

@nineinchnick
Copy link
Member

Description

I noticed the size of the cache package with the local Maven repo is about 7.5GB, near the 10GB limit. I think installing Trino packages into the local repository is not necessary, and might both save some time on copying and reduce the size of the cache. Downloading and extracting such a big cache takes about 1.5 minutes.

Is this change a fix, improvement, new feature, refactoring, or other?
improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)
ci

How would you describe this change to a non-technical end user or system administrator?
n/a

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Jul 26, 2022
@nineinchnick nineinchnick force-pushed the ci-package-not-install branch from 29db1e2 to 459e586 Compare July 27, 2022 07:29
@nineinchnick nineinchnick force-pushed the ci-package-not-install branch from 03079f1 to 7402157 Compare July 27, 2022 11:16
@nineinchnick nineinchnick added the no-release-notes This pull request does not require release notes entry label Jul 28, 2022
@nineinchnick nineinchnick force-pushed the ci-package-not-install branch from 7402157 to 202cb86 Compare July 28, 2022 10:53
@nineinchnick nineinchnick marked this pull request as ready for review July 28, 2022 10:54
@nineinchnick
Copy link
Member Author

@hashhar this is now ready for review. I changed install to package in a few places where we're only supposed to build the project and we run mvn test right after that. I changed it to verify in the Maven Checks job and I also added changes in the error-prone-checks job that @findepi suggested in #13257

I checked that with these changes the size of the local Maven repo is ~1GB and shouldn't grow much. Currently, it's ~7GB which is not sustainable.

@nineinchnick nineinchnick force-pushed the ci-package-not-install branch from 202cb86 to 0081cab Compare July 28, 2022 11:02
@nineinchnick nineinchnick requested review from findepi and wendigo July 28, 2022 11:16
@findepi
Copy link
Member

findepi commented Jul 28, 2022

I also added changes in the error-prone-checks job that @findepi suggested in #13257

Do you want me to merge or close that PR?

@nineinchnick
Copy link
Member Author

Do you want me to merge or close that PR?

You can close it and merge this one.

@findepi
Copy link
Member

findepi commented Jul 28, 2022

Thanks. just FYI, it's red now.

@nineinchnick
Copy link
Member Author

nineinchnick commented Jul 29, 2022

I made some wrong assumptions before and had to reevaluate my approach here. When we run mvn test there have to be artifacts from other modules in the local repository, because the tested module has to be built. If I skip the build phase, it tries to run the artifact from the local repository, not the build directory.

I reverted replacing install with package in most jobs, except for maven-checks, which should be the only job where the cache can be saved. I added clearing the local repo in other jobs to ensure that. I didn't find any different way to prevent storing the cache.

@nineinchnick nineinchnick force-pushed the ci-package-not-install branch 2 times, most recently from 0f252d6 to bf9831f Compare August 1, 2022 09:57
@nineinchnick nineinchnick requested a review from hashhar August 1, 2022 12:20
@findepi findepi removed their request for review August 2, 2022 18:31
@nineinchnick
Copy link
Member Author

@hashhar @findepi this is ready to go

@findepi
Copy link
Member

findepi commented Aug 2, 2022

I don't know how our CI caching works. @hashhar can you PTAL?

@hashhar
Copy link
Member

hashhar commented Aug 3, 2022

How much time does our cache save us. If it's not a very large difference I'd probably opt to drop the cache instead.

Comment on lines -100 to +103
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the non-controversial part and we can merge it right away if extracted.

@nineinchnick
Copy link
Member Author

How much time does our cache save us. If it's not a very large difference I'd probably opt to drop the cache instead.

Based on highly scientific tests, that is a single run of this branch

with cache:
image
without cache:
image

Note that this PR attempts to fix the cache to be under 1gb, instead of the 7gb we have now. Downloading that 7gb cache takes 2-5 minutes. I hope to get this under 30 seconds. Then it'll be beneficial, since it saves 50 seconds.

Note that not installing artifacts in a local maven repo should not be related to caching it or not. If we don't have to copy files around, we shouldn't.

@nineinchnick
Copy link
Member Author

Oh and I think using the cache should also increase resiliency against Maven Central being flaky sometimes. But I don't have data about how often this happens.

@hashhar
Copy link
Member

hashhar commented Aug 4, 2022

I still see large cache being used - https://github.com/trinodb/trino/runs/7672132855?check_suite_focus=true#step:4:81

Cache Size: ~6914 MB (7249763541 B)
/usr/bin/tar --use-compress-program zstd -d -xf /home/runner/work/_temp/10163cbd-356c-43ed-98d8-a3350f756b3e/cache.tzst -P -C /home/runner/work/trino/trino
Cache restored successfully

@nineinchnick nineinchnick force-pushed the ci-package-not-install branch from 11076f1 to b29d60e Compare August 4, 2022 18:08
@nineinchnick
Copy link
Member Author

I added a temporary commit with a change in the main pom to get a new cache key. When the run completes I'll trigger another one and that will demonstrate the changes.

@nineinchnick nineinchnick force-pushed the ci-package-not-install branch 13 times, most recently from 62aa0bb to 8f9e851 Compare August 4, 2022 21:38
@nineinchnick
Copy link
Member Author

I give up, I don't know what criteria Github uses when restoring the cache. It's supposed to check for a hash of all pom files, but I modified it and it still uses the cache from the previous workflow run:

Cache restored from key: setup-java-Linux-maven-0a53bc970763b2f9f964aba9ebd2fde72497ff9e8e884aac25d25409d362496b

where I computed the hash and its:

d14eb4b7c95b06aa930bc26818f8be71fb1aba18c6c758adbc7894f6c544a349

Maybe it's trying to be smart and getting some cache when there's no hit. Anyway, I used this action to verify the hash:

      - uses: nineinchnick/github-script@0a9984563e6199a44285d325a9921f1bff9517b2
        with:
          script: |
            const hash = await glob.hashFiles("**/pom.xml", {followSymbolicLinks: false}, true)
            core.info(hash)

@nineinchnick nineinchnick force-pushed the ci-package-not-install branch from 8f9e851 to cb7d18d Compare August 4, 2022 21:47
@hashhar hashhar merged commit 30dd905 into trinodb:master Aug 5, 2022
@github-actions github-actions bot added this to the 393 milestone Aug 5, 2022
@nineinchnick nineinchnick deleted the ci-package-not-install branch August 8, 2022 10:03
retention-days: ${{ env.TEST_REPORT_RETENTION_DAYS }}
- name: Clean local Maven repo
if: steps.cache.outputs.cache-hit != 'true'
run: rm -rf ~/.m2/repository
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a code comment?

The commit changes some install into package, so worth noting why it's still worthwhile to delete ~/.m2/repository.

Also, why ~/.m2/repository and not eg ~/.m2/repository/io/trino ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removes the whole local repo to avoid creating a cache entry from this job since it might not be the one with the most dependencies.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth a code comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed no-release-notes This pull request does not require release notes entry

Development

Successfully merging this pull request may close these issues.

3 participants