Introduce pex3 cache {dir,info,purge}. #2513

Merged 8 commits on Sep 4, 2024

@jsirois jsirois commented Aug 21, 2024

Restructure the Pex cache to support versioning and to add access
tracking: shared access for normal use and exclusive access when
portions of the cache need to be deleted. With this groundwork in
place, add a new pex3 cache {dir,info,purge} family of commands for
inspecting and safely trimming the Pex cache.

Closes #1176
Closes #1655
Closes #2201
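
The shared-versus-exclusive access scheme described above can be sketched with POSIX advisory file locks. This is a minimal illustration under stated assumptions, not the code in this PR: it assumes `fcntl.flock` as the locking primitive (POSIX only), and the helper names `acquire_lock` and `release_lock` are hypothetical.

```python
import fcntl
import os


def acquire_lock(lock_path, exclusive=False, block=True):
    """Return an open fd holding the lock, or None if non-blocking and the lock is busy.

    Hypothetical helper: normal cache users would take a shared lock, while a
    purge would take an exclusive lock and so wait for all shared holders.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR)
    operation = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    if not block:
        operation |= fcntl.LOCK_NB
    try:
        fcntl.flock(fd, operation)
    except BlockingIOError:
        # Only raised in non-blocking mode when another holder conflicts.
        os.close(fd)
        return None
    except BaseException:
        # E.g. CTRL-C while blocked waiting for the lock.
        os.close(fd)
        raise
    return fd


def release_lock(fd):
    """Drop the lock and close the underlying file descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Any number of shared holders can coexist, but an exclusive request only succeeds once every shared holder has released, which is the behavior a safe purge needs.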

@jsirois jsirois marked this pull request as ready for review September 2, 2024 21:53
@jsirois jsirois changed the title from "Introduce versioned cache dirs." to "Introduce pex3 cache {dir,info,purge}." Sep 2, 2024

jsirois commented Sep 2, 2024

Pants folks - I added you since you have: pantsbuild/pants#11167

This initial introduction of Pex cache management commands does not have any JSON output option; it's just for humans currently. The linked Pants issue seems to require structured information though. Much like Pants needing to come up with a representation it prefers for dependency graphs (I assume that's still not done), this integration point also needs similar spec'ing. If you have opinions, or better, a spec, I'd be happy to follow up with support for various alternate output formats for the cache usage information.

management = [
    # N.B.: Released on 2017-09-01 and added support for the `process_iter(attrs, ad_value)` API we
    # use in `pex.cache.access`.
    "psutil>=5.3"
Member Author

N.B.: If Pants wants to use these cache management commands and it still uses the Pex PEX releases, I'll either need to add a pex+management PEX to the release that embeds psutil for all supported platforms or else start releasing Pex PEX scies, which naturally handle platform-specific deps. I favor the scies here since the pex+management approach doesn't scale well once more CLI-specific deps are added, but I wanted to get the Pants maintainers' opinions on this.

Member Author

... there's a longer story on why I went with psutil / access info gathering at delete-time, but suffice it to say, recording usage info in the read-write access lock path (I used sqlite3) added ~2-30ms overhead to every PEX launch and I did not deem that acceptable.
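
Gathering access info at delete-time with psutil might look roughly like the sketch below. It is an illustration only and not the logic in `pex.cache.access`: selecting processes by their `PEX` / `PEX_*` environment variables is an assumption made here for simplicity, and `pex_env` and `iter_pex_processes` are hypothetical names. The `psutil.process_iter(attrs, ad_value)` call is the API named in the `management` extra comment.

```python
import datetime


def pex_env(environ):
    """Keep only the PEX-related variables from a process environment mapping."""
    return {k: v for k, v in environ.items() if k == "PEX" or k.startswith("PEX_")}


def iter_pex_processes():
    """Yield (pid, username, start time, Pex env, cmdline) for processes with Pex env vars.

    Yields nothing when psutil is not installed, mirroring the idea of
    degrading to basic output without the management extra.
    """
    try:
        import psutil
    except ImportError:
        return

    attrs = ["pid", "username", "create_time", "cmdline", "environ"]
    # ad_value is substituted for any attribute we are not permitted to read.
    for proc in psutil.process_iter(attrs=attrs, ad_value=None):
        info = proc.info
        env = pex_env(info["environ"] or {})
        if not env:
            continue
        created = info["create_time"]
        started = datetime.datetime.fromtimestamp(created) if created else None
        yield info["pid"], info["username"], started, env, info["cmdline"]
```

Because the scan only happens when a purge is requested, ordinary PEX launches pay no bookkeeping cost, which is the trade-off described above.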

Collaborator

Pants currently still uses the Pex PEX releases, but could straightforwardly switch to Pex scies (assuming they are drop-in replacements for each other in terms of the CLI interface, which I assume would be the case). Thanks!

Collaborator

One thing I wasn't clear on: If not scies, why would the alternative be a separate pex+management PEX rather than embedding psutil for all platforms in the multiplatform pex PEX? I can see why you want psutil in the management extra, to keep the base Pex dist widely installable, but does that have to be mirrored in the release PEX?

Member Author
@jsirois jsirois Sep 3, 2024

Well, two reasons:

  1. It means embedding all these wheels in the Pex PEX or the Pex+management PEX - so, in part size:
    • psutil-5.9.5-cp27-cp27m-macosx_10_9_x86_64.whl
    • psutil-5.9.5-cp27-cp27m-manylinux2010_i686.whl
    • psutil-5.9.5-cp27-cp27m-manylinux2010_x86_64.whl
    • psutil-5.9.5-cp27-cp27mu-manylinux2010_i686.whl
    • psutil-5.9.5-cp27-cp27mu-manylinux2010_x86_64.whl
    • psutil-5.9.5-cp36-abi3-macosx_10_9_x86_64.whl
    • psutil-5.9.5-cp36-abi3-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl
    • psutil-5.9.5-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    • psutil-5.9.5-cp38-abi3-macosx_11_0_arm64.whl
  2. But it also means inventing a bit of new functionality to support a Universal Target to complement the current AbbreviatedPlatform, CompletePlatform and LocalInterpreter Targets. This new Target type would only work when the Pex resolve was against a --lock or a --pex-repository, and it would be able to grab all the wheels listed in 1 despite a machine not having all the corresponding LocalInterpreter targets available.

Member Author

Hrm, I think I need to do most of 2 anyhow to support creating a lock for the Pex scies with something like pex3 lock create --project ".[management]" --pip-version latest --style universal --target-system linux --target-system mac --interpreter-constraint "CPython==3.12.*" -o pex-scie.lock. Alternatively, I could create a manual workflow that generated a complete platform for each of the 4 supported platforms using the scie PBS and generate a strict multi-lock using those 4 --complete-platform targets.

Collaborator

If that's a hassle then an alternative is for Pants to install pex as a dist, with the management extra, when it needs to run management commands?

Member Author
@jsirois jsirois Sep 4, 2024

It's free to do so, although, last I knew, this introduces a whole can of worms that Pants currently leaves to the Pex PEX "binary", which it only needs to know how to download. Now you're in the land of building a venv, caching it somewhere efficiently, etc.

I'll definitely be at least adding Pex scies to this release anyhow; so your choice.

Collaborator

I was thinking of downloading the Pex PEX as before for most uses, but then using that to bootstrap a Pex venv if needed for this one use case. But obviously Pex scies would be better.

Member Author

Aha, OK. Yeah - that would work. I should have the Pex scies + release PRs up today.


jsirois commented Sep 2, 2024

Some examples.

Get cache info sorted by size:

:; python -mpex.cli cache info -HS
Path: /home/jsirois/.cache/pex

Pex Docs: docs/0
Artifacts used in serving Pex docs via `pex --docs` and `pex3 docs`.
0 bytes in 0 subdirectories and 0 files.

Abbreviated Platforms: platforms/0
Information calculated about abbreviated platforms specified via `--platform`.
291 kB in 13 subdirectories and 26 files.

User Code: user_code/0
User code added to PEX files using `-D` / `--sources-directory`, `-P` / `--package` and `-M` / `--module`.
531 kB in 117 subdirectories and 168 files.

Packed Bootstraps: bootstrap_zips/0
PEX runtime bootstrap code, zipped up for `--layout packed` PEXes.
1.42 MB in 2 subdirectories and 4 files.

Unzipped PEXes: unzipped_pexes/0
The unzipped PEX files executed on this machine.
3.12 MB in 483 subdirectories and 1824 files.

Packed Wheels: packed_wheels/0
The same content as 'installed_wheels/0', but zipped up for `--layout packed` PEXes.
6.32 MB in 6 subdirectories and 12 files.

Interpreters: interpreters/0
Information about interpreters found on the system.
14.4 MB in 374 subdirectories and 704 files.

Bootstraps: bootstraps/0
PEX runtime bootstrap code.
19.5 MB in 125 subdirectories and 1657 files.

Pex Tools: tools/0
Caches for the various `PEX_TOOLS=1` / `pex-tools` subcommands.
21.4 MB in 254 subdirectories and 1886 files.

Built Wheels: built_wheels/0
Wheels built by Pex from resolved sdists when creating PEX files.
29.6 MB in 156 subdirectories and 59 files.

Isolated Pex Code: isolated/0
The Pex codebase isolated for internal use in subprocesses.
45.0 MB in 367 subdirectories and 3803 files.

Scie Tools: scies/0
Tools and caches used when building PEX scies via `--scie {eager,lazy}`.
151 MB in 5 subdirectories and 40 files.

Lock Artifact Downloads: downloads/0
Distributions downloaded when resolving from a Pex lock file.
236 MB in 379 subdirectories and 1980 files.

Pip Versions: pip/0
Isolated Pip caches and Pip PEXes Pex uses to resolve distributions.
324 MB in 3349 subdirectories and 17300 files.

Virtual Environments: venvs/0
Virtual environments generated at runtime for `--venv` mode PEXes.
583 MB in 9165 subdirectories and 40569 files.

Pre-installed Wheels: installed_wheels/0
Pre-installed wheel chroots used to both build PEXes and serve as runtime `sys.path` entries.
961 MB in 8975 subdirectories and 51739 files.

Total: 2.40 GB in 23770 subdirectories and 121771 files.
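
For readers curious how a per-entry summary like the one above could be produced, here is a rough sketch. It is not the implementation behind `cache info`: the function names are hypothetical, the size formatting is a simplification (decimal units, always two decimals), and it treats every immediate subdirectory of the given root as one cache entry.

```python
import os


def summarize(root):
    """Return (total bytes, subdirectory count, file count) under root."""
    total_bytes = subdirs = files = 0
    for dirpath, dirnames, filenames in os.walk(root):
        subdirs += len(dirnames)
        files += len(filenames)
        for name in filenames:
            # lstat so that symlinks are measured themselves, not their targets.
            total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
    return total_bytes, subdirs, files


def format_size(num_bytes):
    """Render a byte count using decimal (power of 1000) units."""
    if num_bytes < 1000:
        return "{} bytes".format(num_bytes)
    size = float(num_bytes)
    for unit in ("kB", "MB", "GB", "TB"):
        size /= 1000.0
        if size < 1000 or unit == "TB":
            break
    return "{:.2f} {}".format(size, unit)


def report(cache_root):
    """Return one summary line per entry directory, smallest first."""
    entries = []
    for name in os.listdir(cache_root):
        path = os.path.join(cache_root, name)
        if os.path.isdir(path):
            entries.append((summarize(path), name))
    return [
        "{}: {} in {} subdirectories and {} files.".format(
            name, format_size(size), subdirs, files
        )
        for (size, subdirs, files), name in sorted(entries)
    ]
```

Sorting the `(size, subdirs, files)` tuples ascending reproduces the smallest-first ordering seen in the size-sorted listing.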

Dry run purge of just installed_wheels cache:

:; python -mpex.cli cache purge --entries installed_wheels -nRH
Would purge requested entries from /home/jsirois/.cache/pex: installed_wheels/0
Would also purge those entries transitive dependents in: unzipped_pexes/0, venvs/0

Would have purged cache Unzipped PEXes from unzipped_pexes/0
3.12 MB in 483 subdirectories and 1824 files.

Would have purged cache Pre-installed Wheels from installed_wheels/0
961 MB in 8975 subdirectories and 51739 files.

Would have purged cache Virtual Environments from venvs/0
583 MB in 9165 subdirectories and 40569 files.

Total: 1.55 GB in 18623 subdirectories and 94132 files.
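
The "transitive dependents" expansion in the dry run above can be sketched as a small graph walk. This is an assumption-laden illustration, not Pex's code: the function name is hypothetical, and the only dependency edges used are the two implied by the output above (`unzipped_pexes` and `venvs` depending on `installed_wheels`).

```python
def transitive_dependents(depends_on, requested):
    """Return every entry that directly or indirectly depends on a requested entry.

    depends_on maps an entry name to the set of entry names it depends on.
    """
    requested = set(requested)
    found = set()
    frontier = set(requested)
    while frontier:
        # Next wave: entries not yet seen that depend on anything in the current wave.
        frontier = {
            entry
            for entry, deps in depends_on.items()
            if deps & frontier and entry not in found and entry not in requested
        }
        found |= frontier
    return sorted(found)


# Edges inferred from the dry-run output above; the real graph may contain more.
DEPENDS_ON = {
    "unzipped_pexes": {"installed_wheels"},
    "venvs": {"installed_wheels"},
}

print(transitive_dependents(DEPENDS_ON, ["installed_wheels"]))
```

Purging dependents along with the requested entry avoids leaving, say, a venv that points at wheel chroots that no longer exist.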

And go for it (no psutil):

:; python -mpex.cli cache purge --entries installed_wheels -RH
Purging requested entries from /home/jsirois/.cache/pex: installed_wheels/0
Also purging those entries transitive dependents in: unzipped_pexes/0, venvs/0

Failed to import psutil: No module named 'psutil'
Will proceed with basic output.
---
Note: this process will block until all other running Pex processes have exited.
To get information on which processes these are, re-install Pex with the
management extra; e.g.: with requirement pex[management]

Attempting to acquire cache write lock (press CTRL-C to abort) ...
^C
No cache entries purged.

With psutil:

:; python -mpex.cli cache purge --entries installed_wheels -RH
Purging requested entries from /home/jsirois/.cache/pex: installed_wheels/0
Also purging those entries transitive dependents in: unzipped_pexes/0, venvs/0

Waiting on 2 in flight processes (with shared lock on /home/jsirois/.cache/pex/access.lck) to complete before deleting:
---
1. pid 281904 started by jsirois at 2024-09-02 15:45:24
   Pex env: {'PEX': '/home/jsirois/dev/pex-tool/pex/empty.pex'}
   cmdline: ['/home/jsirois/.pyenv/versions/3.11.9/bin/python3.11', '/home/jsirois/.cache/pex/unzipped_pexes/0/292f879052303680091fdcb445c2a746967b4e0f']
2. pid 282594 started by jsirois at 2024-09-02 15:45:50
   Pex env: {'PEX_TOOLS': '1', 'PEX': '/home/jsirois/dev/pex-tool/pex/empty-tools.pex'}
   cmdline: ['/home/jsirois/.pyenv/versions/3.11.9/bin/python3.11', '/home/jsirois/.cache/pex/unzipped_pexes/0/2c278c488639385dd1ca190c245fc4e8da7a0f30', 'repository', 'extract', '-f', '/tmp/find-links', '--serve']

Attempting to acquire cache write lock (press CTRL-C to abort) ...
^C
No cache entries purged.

And ending the processes with the shared lock:

:; python -mpex.cli cache purge --entries installed_wheels -RH
Purging requested entries from /home/jsirois/.cache/pex: installed_wheels/0
Also purging those entries transitive dependents in: unzipped_pexes/0, venvs/0

Waiting on 2 in flight processes (with shared lock on /home/jsirois/.cache/pex/access.lck) to complete before deleting:
---
1. pid 281904 started by jsirois at 2024-09-02 15:45:17
   Pex env: {'PEX': '/home/jsirois/dev/pex-tool/pex/empty.pex'}
   cmdline: ['/home/jsirois/.pyenv/versions/3.11.9/bin/python3.11', '/home/jsirois/.cache/pex/unzipped_pexes/0/292f879052303680091fdcb445c2a746967b4e0f']
2. pid 282594 started by jsirois at 2024-09-02 15:45:43
   Pex env: {'PEX_TOOLS': '1', 'PEX': '/home/jsirois/dev/pex-tool/pex/empty-tools.pex'}
   cmdline: ['/home/jsirois/.pyenv/versions/3.11.9/bin/python3.11', '/home/jsirois/.cache/pex/unzipped_pexes/0/2c278c488639385dd1ca190c245fc4e8da7a0f30', 'repository', 'extract', '-f', '/tmp/find-links', '--serve']

Attempting to acquire cache write lock (press CTRL-C to abort) ...

Purged cache Unzipped PEXes from unzipped_pexes/0
3.12 MB in 483 subdirectories and 1824 files.

Purged cache Pre-installed Wheels from installed_wheels/0
961 MB in 8975 subdirectories and 51739 files.

Purged cache Virtual Environments from venvs/0
583 MB in 9165 subdirectories and 40569 files.

Total: 1.55 GB in 18623 subdirectories and 94132 files.

Collaborator
@benjyw benjyw left a comment

Really nice.

@jsirois jsirois merged commit 84a4196 into pex-tool:main Sep 4, 2024
26 checks passed
@jsirois jsirois deleted the cache/manage branch September 4, 2024 02:21