Figure out if we can cache pkgs folder #11

Closed
wolfv opened this issue Jan 26, 2021 · 13 comments

Comments

@wolfv
Member

wolfv commented Jan 26, 2021

GitHub Actions supports caching (up to 5 GB).

We could cache pkgs so that they don't have to be re-downloaded and unpacked every time. That could give another decent speed boost.

@wolfv
Member Author

wolfv commented Feb 19, 2021

micromamba now has a micromamba clean command that removes unused packages.

This could be useful here.

@wolfv
Member Author

wolfv commented Sep 22, 2021

Additionally, it would be really cool to have automatic ccache caching when using the micromamba action.

@wolfv
Member Author

wolfv commented Sep 22, 2021

This is how the cache package from actions/toolkit can be used from JS:

https://github.com/actions/toolkit/tree/main/packages/cache#usage

@maartenbreddels

In my experience, caching and restoring large environments was slower than building the environment; I suspected this was due to compression and/or decompression. Are the unzipped downloads stored somewhere?

@wolfv
Member Author

wolfv commented Oct 15, 2021

In the pkgs folder you have the cached tarballs and extracted directories.

We could remove the tarballs with micromamba clean -t and then cache only the extracted directories.
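A minimal plain-shell sketch of that idea follows. It assumes tarballs sit at the top level of the pkgs folder as *.tar.bz2 or *.conda files next to their extracted directories; prune_tarballs and archive_pkgs are hypothetical helpers, and in real CI you would run micromamba clean -t rather than prune_tarballs.

```shell
#!/bin/sh
# Rough stand-in for "micromamba clean -t" followed by archiving pkgs.
# ASSUMPTION: tarballs live at the top level of the pkgs dir as *.tar.bz2
# or *.conda files next to their extracted directories. In real CI, prefer
# "micromamba clean -t" over the hypothetical prune_tarballs helper.

# Delete top-level package tarballs, keeping the extracted directories.
prune_tarballs() {
    find "$1" -maxdepth 1 -type f \( -name '*.tar.bz2' -o -name '*.conda' \) \
        -exec rm -f {} +
}

# Archive the remaining contents of a pkgs dir ($1) into a .tar.gz ($2).
archive_pkgs() {
    tar -czf "$2" -C "$1" .
}

# Demo on a throwaway directory shaped like a pkgs folder (made-up names).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/pkgs/numpy-1.21.2-py39_0/lib"
echo "payload" > "$demo_dir/pkgs/numpy-1.21.2-py39_0/lib/file.txt"
echo "tarball" > "$demo_dir/pkgs/numpy-1.21.2-py39_0.tar.bz2"
echo "tarball" > "$demo_dir/pkgs/zlib-1.2.11-h_0.conda"

prune_tarballs "$demo_dir/pkgs"
# The archive is written outside pkgs so tar does not include itself.
archive_pkgs "$demo_dir/pkgs" "$demo_dir/pkgs-cache.tar.gz"
tar -tzf "$demo_dir/pkgs-cache.tar.gz"
```

The trade-off is that a cache hit then skips both the download and the extraction, at the cost of storing unpacked (larger, uncompressed-on-disk) directories in the cache archive.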

@maartenbreddels

In vaexio/vaex#1648 I see the following step taking a while (4 minutes 25 seconds):

/usr/bin/tar -cz -f /home/runner/work/_temp/dc3fb938-fa46-42b4-b219-a4a921e8d3b5/cache.tgz -C /usr/share/miniconda/envs/vaex-dev .

@wolfv
Member Author

wolfv commented Oct 15, 2021

Yeah, that's a bit different from my suggestion (it archives the environment rather than the pkgs folder) but may also work well.
It's just compressing quite a few files, depending on the env size, so it's expected to take some time to cache and restore :)

@maartenbreddels

Comparing compressing the env vs. the pkgs directory:

/usr/bin/tar -cvz -f cache.tar.gz envs/vaex-dev  94,38s user 4,08s system 90% cpu 1:48,75 total
$ ls -alh cache.tar.gz
-rw-rw-r-- 1 maartenbreddels maartenbreddels 893M okt 15 11:03 cache.tar.gz


/usr/bin/tar -cvz -f pkg.tar.gz pkgs  19,01s user 1,05s system 104% cpu 19,253 total
$ ls -alh pkg.tar.gz
-rw-rw-r-- 1 maartenbreddels maartenbreddels 773M okt 15 11:04 pkg.tar.gz

On my local computer, archiving pkgs is about 5.6x faster (19.25 s vs. 1:48.75 wall-clock) and the archive is somewhat smaller (773M vs. 893M).
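To repeat this env-vs-pkgs comparison on another machine or runner, a small measuring helper is enough. measure_archive is hypothetical; it streams a gzip archive of a directory into a pipe and reports only whole seconds, so it is coarser than the shell's time builtin used above.

```shell
#!/bin/sh
# Hypothetical helper: gzip a directory to a pipe and print
# "<elapsed whole seconds> <compressed bytes>".
measure_archive() {
    start=$(date +%s)
    # tr strips the padding some wc implementations add.
    bytes=$(tar -cz -C "$1" . | wc -c | tr -d ' ')
    end=$(date +%s)
    echo "$((end - start)) $bytes"
}

# Demo on a throwaway directory; in practice you would call it on
# "$MAMBA_ROOT_PREFIX/envs/<name>" and on "$MAMBA_ROOT_PREFIX/pkgs".
demo=$(mktemp -d)
mkdir -p "$demo/env"
head -c 100000 /dev/urandom > "$demo/env/blob"
measure_archive "$demo/env"
```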

@maartenbreddels

I'm also wondering how this should interact with an explicit file. I'm currently in a situation where I pass a dummy environment.yml and then, depending on whether I have a cache hit (which gives me back the pkgs folder and the explicit file), I install from either the explicit file or the environment file.
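That branching can be sketched as a small hypothetical helper. It assumes the lock file is a conda explicit file (which contains an @EXPLICIT line) restored by the cache next to environment.yml; the file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: if the cache restored a valid explicit lock file,
# install from it; otherwise fall back to environment.yml.
pick_install_source() {
    lock_file=$1
    env_file=$2
    if [ -f "$lock_file" ] && grep -q '^@EXPLICIT' "$lock_file"; then
        echo "$lock_file"
    else
        echo "$env_file"
    fi
}

# Demo with made-up file names in a throwaway directory.
work=$(mktemp -d)
printf 'name: demo\ndependencies:\n  - python\n' > "$work/environment.yml"

# Cache miss: no lock file yet, so the environment file is chosen.
pick_install_source "$work/explicit-linux-64.txt" "$work/environment.yml"

# Cache hit: the restored explicit file is chosen instead.
printf '@EXPLICIT\n' > "$work/explicit-linux-64.txt"
pick_install_source "$work/explicit-linux-64.txt" "$work/environment.yml"
```

The printed path would then be handed to the install step, e.g. micromamba create -n NAME -f PATH, assuming the micromamba version in use accepts explicit files via -f.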

@wolfv
Member Author

wolfv commented Oct 15, 2021

I think we should have an environment.yml file and then generate a lock file per platform. Then, on a cache hit, we recreate the env from the explicit env/lock file.
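One way to make "one lock file per platform" concrete is to put the conda platform (linux-64, osx-64, win-64) into both the lock file name and the cache key, and to add a checksum of environment.yml to the key so that editing the spec invalidates the cache. The naming below is a hypothetical scheme, not the action's actual one.

```shell
#!/bin/sh
# Hypothetical naming scheme: lock file and cache key both embed the conda
# platform; the key also embeds a checksum of environment.yml.

lock_file_for() {
    # $1 = conda platform, e.g. linux-64, osx-64, win-64
    echo "explicit-$1.txt"
}

cache_key_for() {
    # $1 = conda platform, $2 = path to environment.yml
    # cksum is POSIX; its first output field is a CRC of the input bytes.
    crc=$(cksum < "$2" | awk '{print $1}')
    echo "micromamba-$1-$crc"
}

spec=$(mktemp)
printf 'name: demo\ndependencies:\n  - python=3.9\n' > "$spec"
lock_file_for linux-64
cache_key_for linux-64 "$spec"
cache_key_for win-64 "$spec"
```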

The lock file should have an expiry date: if we hit the cache but the lock file is older than, e.g., 7 days, we recreate it (this value should be configurable).
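A minimal sketch of the expiry check, assuming a hypothetical convention of stamping the creation time into the lock file as a comment line ("# created-at: <epoch seconds>"); explicit files already carry # comment lines. MAX_AGE_DAYS is a made-up variable name standing in for the configurable value, defaulting to the 7 days mentioned above. Storing the timestamp in the file avoids depending on whether the cache restore preserves modification times.

```shell
#!/bin/sh
# Append a creation timestamp to the lock file (hypothetical convention).
stamp_lock() {
    printf '# created-at: %s\n' "$(date +%s)" >> "$1"
}

# Exit status 0 means "regenerate": the file is missing, unstamped, or
# older than MAX_AGE_DAYS (default 7). Exit status 1 means still fresh.
lock_is_stale() {
    [ -f "$1" ] || return 0
    # Use the last stamp in the file if there are several.
    created=$(sed -n 's/^# created-at: \([0-9][0-9]*\)$/\1/p' "$1" | tail -n 1)
    [ -n "$created" ] || return 0
    max_age=$(( ${MAX_AGE_DAYS:-7} * 86400 ))
    [ "$(( $(date +%s) - $created ))" -gt "$max_age" ]
}

lock=$(mktemp)
printf '@EXPLICIT\n' > "$lock"
lock_is_stale "$lock" && echo "unstamped: regenerate"
stamp_lock "$lock"
lock_is_stale "$lock" || echo "fresh: reuse"
printf '# created-at: %s\n' "$(( $(date +%s) - 8 * 86400 ))" >> "$lock"
lock_is_stale "$lock" && echo "8 days old: regenerate"
```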

@jonashaag
Collaborator

I've spent some time optimizing env/pkg cache for some repos recently. Observations:

  • Having a pkg cache is always faster than not having one.
  • On Windows, having an env cache is MUCH faster than having a pkg cache.

Nowadays the cache action uses zstd, so decompression should be really fast.
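Those observations suggest a per-OS choice of what to cache. The sketch below is one interpretation, not what the action does: cache the whole environment on Windows and the pkgs folder elsewhere. GitHub Actions sets RUNNER_OS to Linux, Windows or macOS; the root prefix and env name arguments are hypothetical inputs.

```shell
#!/bin/sh
# Hypothetical policy: env cache on Windows, pkgs cache on other runners.
cache_path_for() {
    # $1 = RUNNER_OS value, $2 = micromamba root prefix, $3 = env name
    case "$1" in
        Windows) echo "$2/envs/$3" ;;
        *)       echo "$2/pkgs" ;;
    esac
}

cache_path_for "${RUNNER_OS:-Linux}" /tmp/micromamba-root myenv
```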

@jonashaag
Collaborator

jonashaag commented Feb 2, 2022

By the way, we could also cache the micromamba download, although that would be more of a traffic-reduction technique than a speedup, because the download is really fast.

Update: on this repo's CI, the last Windows download took 10 s and the Ubuntu one took 1 s.

@jonashaag
Collaborator

Fixed with #38


3 participants