Implement pex3 lock sync. #2373
Conversation
Introduce the high level `pex3 lock sync` command. This should generally suffice for typical use cases and works as follows:

- On first use (where the specified `--lock` does not yet exist), it acts just like `pex3 lock create`.
- On subsequent uses it does a minimal synchronization of the lock based on the diff of the given current requirements against the requirements used to generate the specified `--lock`. This amounts to formulating a `pex3 lock update` command with the appropriate `-p`, `-R` and `-d` arguments.

In addition to creating and syncing a lock, it can also create and sync a venv (`--venv`) based on the lock. Further, a command can be specified to run in the synchronized venv with arguments following the `--` option terminator.

This latter set of features allows Pex to act as a concise tool in `tox` / `nox` / `invoke` / `make` setups to implement a simple build system.

Fixes pex-tool#2344
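The workflow described above might look like this in practice (the `lock.json` / `.venv` paths and the `requests` / `pytest` requirements are purely illustrative):

```shell
# First use: lock.json does not exist yet, so this acts like `pex3 lock create`.
pex3 lock sync --lock lock.json requests

# Subsequent uses: diff the current requirements against those recorded in
# lock.json and apply a minimal `pex3 lock update`-style sync; also create or
# sync a venv, then run a command in it after the `--` option terminator.
pex3 lock sync --lock lock.json --venv .venv requests pytest -- pytest -q
```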
Another pretty huge one. Thanks in advance for any scrutiny you can provide. I did not convert Pex's

@cburroughs I sent you an invite. I'd like to have you review, especially with an eye towards making sure you have everything you need for your Pants work with this change.

Thank you! I'll take a long look this week.

N.B.: I've noticed a few things and will be pushing a few individually reviewable commits here at the tail. None of these change the nature of the large 1st commit; so any review effort there is not wasted.
The specific case of a user Pip requirement was failing with a KeyError.
This is needed to pick up IC changes in universal locks amongst many other configuration changes that can affect what versions are chosen and which artifacts are locked.
Ok, with 120fe0d I think this is complete and I'll stay hands off until there is time to review.

Nice! I'll take a close look over the next couple of days.
This is very cool. I've focused on the core change in lock.py
and skimmed the tests, but haven't had time to look beyond that.
```python
venv = Virtualenv.enclosing(python=argv0_path)
if not venv:
    try:
        venv = Virtualenv(os.path.dirname(os.path.dirname(argv0_path)))
```
I see that this double parent is used a few places in the code base so probably isn't "new"... what does it mean? What `--venv` argument value might a user pass that falls into this path?
The venv bin dir. So `/a/venv/bin/script`. The parent is `/a/venv/bin/`, and its parent is `/a/venv/` - the venv dir.
N.B.: For Windows the structure is `C:\a\venv\Scripts\script.exe` and so the `..\..` treatment holds.
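The derivation above can be sketched in a few lines (a minimal illustration of the double-`dirname` step, not Pex's actual code; the function name is made up here):

```python
import os.path

def venv_root_from_script(argv0_path):
    # /a/venv/bin/script -> /a/venv/bin -> /a/venv (the venv dir).
    # On Windows, C:\a\venv\Scripts\script.exe gets the same ..\.. treatment.
    return os.path.dirname(os.path.dirname(argv0_path))

print(venv_root_from_script("/a/venv/bin/script"))  # -> /a/venv
```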
```python
to_remove = []  # type: List[Distribution]
to_install = []  # type: List[Distribution]
for distribution in distributions:
```
Confirming my understanding of transitive dependencies of requirements (e.g. `explicit` requires `hidden`, and the command is something like `pex3 lock sync explicit --venv ...`):

1. does `distributions` include all of them, not just the top level requirements?
2. I don't see explicit tests for this sort of transitive dependency, either for updating lockfiles (including something transitioning from transitive-only to top-level requirement and vice versa) or for updating a venv. Have I missed some? If not, is that something worth testing or is it well covered elsewhere?
1. Yes, distributions come from a full transitive lock resolve.
2. There are extensive tests of lock create and lock update in other files. This change just re-uses that 2 year old infra.

That said, re 2, the transition you mention is probably interesting enough to test here.
Ok, tested all of 2 in `test_sync_venv_transitive_to_direct_and_vice_versa`.
```python
if to_unlink:
    to_unlink_by_pin[
        (distribution.metadata.project_name, distribution.metadata.version)
    ] = [file for file in to_unlink if abs_venv_dir == commonpath((abs_venv_dir, file))]
```
I've noticed this idiom is used in quite a few places (not just this PR); is it worth a dedicated `is_prefix_of` (or something) function for it?
I guess I haven't thought so. The `commonpath` function is stdlib and this is about the only way to use it; so why obscure a stdlib idiom behind a helper bespoke to the Pex codebase?
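For reference, the idiom in question can be shown in isolation (the `is_under` name here is just illustrative, echoing the reviewer's hypothetical `is_prefix_of`):

```python
from os.path import commonpath

def is_under(root, path):
    # path lies inside root iff their longest common path prefix is root itself.
    return root == commonpath((root, path))

print(is_under("/a/venv", "/a/venv/bin/script"))   # True
print(is_under("/a/venv", "/a/venv2/bin/script"))  # False (common prefix is only /a)
```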
```python
for file in itertools.chain.from_iterable(to_unlink_by_pin.values()):
    safe_delete(file)
    parent_dirs.add(os.path.dirname(file))
for parent_dir in sorted(parent_dirs, reverse=True):
```
Cool trick with the `reverse=True` 👍
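The trick being praised: sorting paths in reverse lexicographic order visits children before their parents, so each directory is already empty by the time it is removed. A self-contained illustration (the `os.listdir` emptiness guard is added here for safety; Pex's actual cleanup may differ):

```python
import os
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b", "c"))
parent_dirs = {
    os.path.join(root, "a"),
    os.path.join(root, "a", "b"),
    os.path.join(root, "a", "b", "c"),
}
# Reverse sort yields .../a/b/c, then .../a/b, then .../a: deepest first.
for parent_dir in sorted(parent_dirs, reverse=True):
    if not os.listdir(parent_dir):
        os.rmdir(parent_dir)

print(os.path.exists(os.path.join(root, "a")))  # False
```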
`pex/cli/commands/lock.py` (outdated)
```python
"When creating a venv that doesn't exist, install pip in it. When syncing an "
"existing venv, retain its pip, if any, even if pip is not present in the lock "
"file being synced. N.B.: When in `--no-retain-pip` mode (the default) only remove "
"pip if it is both present in the venv and not present in the lock file being "
"synced."
```
If a lockfile has pip in it:

1. does `--no-retain-pip` (a) install pip to a new venv (or an existing one without it), and (b) update it if an existing venv has a different version?
2. does `--pip` use the version from the lockfile?

If so, the docs might be slightly clearer trying to convey something like:

- In the default `--no-retain-pip` mode, `pip` isn't special, and behaves like any other package in the lock (i.e. installed/updated/removed as required).
- With `--retain-pip`, `pip` is explicitly installed (aka, not removed) in the venv, even if it is not in the lockfile (and if it is in the lockfile, the lockfile's version is used?).
This is the grid that makes sense as I see it (keep in mind there is only a single bool here: the `--no-*` forms mean `False` / no Pip and the other two mean `True` / the venv should have Pip):

| | Existing venv + resolve has Pip | Existing venv + resolve does not have Pip | No venv + resolve has Pip | No venv + resolve does not have Pip |
|---|---|---|---|---|
| `--pip` | install resolve's Pip | `-m ensurepip` | install resolve's Pip | `-m ensurepip` |
| `--no-pip` | install resolve's Pip anyway | uninstall Pip if installed | install resolve's Pip anyway | no-op |

In other words the idea is `--no-pip` is ignored if the resolve does in fact contain Pip. One tactic there would be to error instead: you told me no Pip but your resolve contains Pip. My thinking here is the main point is "sync"; so clearly the lock trumps. In a scenario where there are many venvs being synced, it makes sense to say `pex3 lock sync --no-pip ...` (in code) to ensure final venvs do not have pip unless in the lock (i.e.: a use case where you generally want the venv <-> lock mapping to be 1-1 / controlled by sync with no cowboying the venv by using Pip later ... unless the lock explicitly asks for Pip).
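The grid reduces to a tiny decision function. This hypothetical `pip_action` helper merely encodes the table above for illustration; it is not Pex's API:

```python
def pip_action(want_pip, resolve_has_pip, venv_has_pip):
    """Encode the Pip-handling grid; want_pip is the single --pip/--no-pip bool."""
    if resolve_has_pip:
        # The lock trumps: --no-pip is ignored when the resolve contains Pip.
        return "install resolve's Pip"
    if want_pip:
        return "-m ensurepip"
    return "uninstall Pip" if venv_has_pip else "no-op"

print(pip_action(want_pip=False, resolve_has_pip=True, venv_has_pip=False))
# -> install resolve's Pip
```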
I believe the 1 case in the grid not handled is `--pip` + "Existing venv + resolve does not have Pip". According to the grid, Pip should be installed via `-m ensurepip` but it is not in that case. I'll fix that and add a test as well as take a second go at the option help docs.
Ok, the grid above is now tested and the option help improved, hopefully.
```python
if not sync_target:
    return Ok()

if self.options.dry_run:
```
From this, it looks like `--dry-run` only applies to the sync to the venv, and the `--lock` lockfile is still created/mutated, which I find a bit surprising: I'd expect a `--dry-run` flag to mean not mutate anything "real" on disk, only caches and temporary files. It looks like the docs for `--dry-run` say "Don't update the lock file".

Am I missing something?
You are, partly. The `--dry-run` option is accessed by helpers that perform the update and so a `--dry-run` update does not mutate the existing lock. The create path, though, does not handle `--dry-run` and always creates the lock file. I'm not sure what to do about that. A dry-run venv mutation makes sense (the create path can sync an existing venv instead of creating a new one). A dry-run new lock file creation makes a lot less sense.
Ok, there is now a trimmed down dry run report for lock creation such that no lock file hits disk.
I've piece-meal responded. I'll have full time to respond to all comments on 2/26 / 2/27. Some of the questions, though, are answered by tests in this PR if you get impatient.
Expand test coverage and improve option help.
Test a transitive <-> intransitive flip-flop.
@huonw I've fully responded as far as I can tell save for the case of what to do with

... I actually went with this in c1e26df
Ok, all of @huonw's feedback is addressed AFAICT; so ready for round 2 of feedback.

Tests are bloody, but it appears to be a GitHub outage, with action repos timing out during clone and ghcr.io also timing out when downloading Docker images.

Still looking and prototyping; I hope to have more useful feedback early next week.
The code looks good; thanks for the updates. I haven't had (and won't have, sorry) time to execute it / play with it, though; sounds like @cburroughs may be able to cover that more practical validation.
```
@@ -1414,8 +1428,28 @@ def _sync(self):
                pip_configuration=pip_configuration,
            )
        )
        with safe_open(lock_file_path, "w") as fp:
            self._dump_lockfile(lockfile, output=fp)
        if self.options.dry_run:
```
This seems like an appropriate way to handle creating with `--dry-run`, thanks for getting it in.
Thank you! This is great and I've been kicking the tires, so to speak, trying it out with our internal lockfile. A few in-progress comments so far.

**Conflicting output**

While syncing an internal lock file I've gotten this apparently conflicting message:
But I've been unable to figure out a smaller reproduction case so far.

**Extras/Constraints**

A bug where a requirement with extras resulted in an invalid tmp constraints file for the sync.
**Interpreter Constraints and friends**

I see 120fe0d but I'm not sure how these lock options are intended to be "sync-able" changes. That is, which can switch to "new" values while trying to hold the versions constant, or only values that are subsets of the original range. Naive example trying to "upgrade" Python versions with
An update since it's been a while again: I've not looked at "Conflicting output" and "Extras/Constraints" since those both look small / easy.
Ok, I am going to proceed without support for lockfiles with multiple locked resolves; that case did prove too much to get working in a way I'm satisfied with yesterday.
Previously a subset needed to be successful to select a locked resolve for syncing. This would fail if the lock was strict or else any single locked requirement had platform specific artifacts with no available sdist.
```python
        for target in update_targets
    ]
else:
    subset_result = try_(
```
N.B.: This is the old logic and it still applies for lock files with more than one locked resolve within; so these locks will still be subject to the more strict locked resolve selection rules and potentially fail to sync. I'll add a comment here pointing to an issue that tracks possibly handling those style locks better with a less strict, but still correct, scheme for selecting locked resolves to sync.
Comment added and filed #2386
@cburroughs this is good to go afaict.
Ok, I looked more deeply at "Extras/Constraints" and although I cannot repro using that case, I do see a bug in the code by inspection. I'll add a failing test and fix.
Previously constraints were taken directly from requirement strings which led to invalid constraints for requirements with extras or a direct reference URL. Introduce a distinct `Constraint` type that conforms to Pip's constraint expectations and fix up the lock model and lock updates to use `Constraints` to ensure Pip's expectations are met.
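The shape of the fix can be illustrated with a minimal sketch: Pip constraints files reject requirements with extras (and VCS/URL/local-project requirements), so a valid constraint keeps only the project name and version specifier. This regex-based `to_constraint` helper is purely illustrative; Pex's actual `Constraint` type lives in its lock model and handles more cases:

```python
import re

def to_constraint(req_str):
    # Drop the extras bracket, e.g. "shiv[rtd]==1.0.5" -> "shiv==1.0.5",
    # since Pip constraints files do not support requirements with extras.
    return re.sub(r"\[[^\]]*\]", "", req_str)

print(to_constraint("shiv[rtd]==1.0.5"))  # shiv==1.0.5
```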
Alright @cburroughs this should be truly good to hammer now. The only outstanding known issue is the cosmetic "updates: no updates" message.
```python
original_locked_requirements = collect_locked_requirements()
assert ProjectName("sphinx-click") not in original_locked_requirements

run_pex3("lock", "sync", "shiv[rtd]==1.0.5", "--indent", "2", "--lock", lock).assert_success()
```
Before the fix this failed with:

```
E AssertionError: integration test failed: return_code=1, output=, error=Encountered 1 error updating /tmp/pytest-of-jsirois/pytest-9/test_sync_extras0/lock.json:
E 1.) cp312-cp312-manylinux_2_35_x86_64: /tmp/lock_update.zcz8gryb.constraints.txt line 4:
E       shiv[rtd]==1.0.5
E     Constraint files do not support VCS, URL or local project requirements and they do not support requirements with extras. Search for 'We are also changing our support for Constraints Files' here: https://pip.pypa.io/en/stable/user_guide/#changes-to-the-pip-dependency-resolver-in-20-3-2020
```
Thanks! Have been swamped but will try to get my hammer out tomorrow.
Still haven't figured out what triggers this
👍 Sorry for my slowness. My pattern here has been to hammer on a large internal lockfile, find something that errors out, and then tease out if that's a real thing that happens in public. Usually that has involved a bunch of stumbling over internal packages, local PyPI registries, PEBCAK, or other things that don't generalize. I think I've found a case where if the transitive dependencies are strict (
@cburroughs my next work stint runs the afternoon of 3/27 through 4/1. I'll take a look then.
Okay, @cburroughs that last behavior you mentioned is what you get. You can fix by following the recommendation (and then once more when you get a new but similar conflict) to arrive at a successful:
The issue here is that acryl-datahub pins a whole heck of a lot of deps (instead of, say, constraining to a non-breaking semver range):
That sort of mass pinning will lead to issues when the pins change in another version given that Pex updates have these rules:
Afaict, to have these sorts of situations resolve automatically would require that Pex try, fail, and retry, switching constraint pins to semver range pins? But that would lead to more disruptive lock updates, growing akin to re-running

So, how do you wish to proceed? Afaict Pex cannot do better than this without becoming a resolver itself, which is not in the cards in Pex 2.x. The user has to be able to read the Pip conflict output and do what I did above or else Pants needs to have an escape hatch to re-run
@cburroughs I may move forward with this iteration this afternoon. I'm getting uncomfortable with the number of changes queued behind a release and it seems like Pants is at liberty to skip this initial release of
(Yeah, some of these large packages with narrow pins are challenging to deal with in a few different ways.) The mental model of "Pex is not a resolver and thus can't 'backtrack'" makes sense to me. I'll have to think about how to present that through Pants in a way that's helpful without getting in the user's way, but that was the essential design problem there anyway. Thanks for putting this all together.
One thing to point out is