Prefer backtracking on dependencies involved in the most recent conflict #10481
After a bit of thought I realized that older ancestors are all going to be pinned, so preferring all ancestors would probably at best only help situations that required deep backtracking. I did manage to implement a relatively simple version and it worked well with all my known test cases, but it didn't seem to provide any additional benefit. Further, it seems like it isn't strong enough to tackle the Airflow 1.10.13 case. So I think trying to prefer all the ancestors adds a bit of complexity for no provable benefit. Can revisit in the future though.
I think I may not be reading this correctly, but what are the additional benefits you feel should be expected? The patch only modifies the backtracking behaviour from what I can tell (because
You are correct that this patch only modifies the backtracking behavior. Sorry I did not explain myself very well; let me attempt to do better.

Say you had 3 requirements "X, Y, Z", where X depends on A, which depends on B, and Y depends on C, which also depends on B. But the latest version of C and the latest version of A are incompatible in their dependency on B. This creates a failure when attempting to pin at some point and causes Pip to backtrack. In this scenario the patch I provide is supposed to prefer B, C, and A. As A and C disagree about what version of B they should have, the assumption is that resolving this conflict before anything else will speed up total backtracking time.

What I thought (and have now crossed out in my original comment) is that it might make even more sense in some circumstances to prefer the whole ancestor chain, so the combination of X, A, B, and Y, C, B. My reasoning is that in some circumstances X and A, as well as Y and C, could be tightly coupled, and therefore preferring A and C may not be helpful, as you need to backtrack to earlier versions of X or Y.

I created a new patch which implemented this "prefer all ancestors of failure causes", but after experimenting with it I could not find an example where it actually helped more than the current patch I have submitted to you. I wanted to communicate this negative result to be clear where my thoughts on this optimization are and what I've tried.

Hope that explanation makes more sense? These ideas seem somewhat difficult to communicate textually, but I am happy to answer any questions you have.
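For readers following along, here is a minimal sketch of the "prefer the packages involved in the conflict" idea using the X/Y/Z example. It is only an illustration under stated assumptions: the `Cause` tuple and this standalone `is_backtrack_cause` helper are hypothetical stand-ins, not the objects pip or resolvelib actually pass around.

```python
from typing import NamedTuple, Optional, Sequence


class Cause(NamedTuple):
    """Hypothetical stand-in for one conflicting requirement."""

    requirement: str       # name of the package that could not be pinned
    parent: Optional[str]  # name of the package that asked for it, if any


def is_backtrack_cause(identifier: str, causes: Sequence[Cause]) -> bool:
    """Return True if `identifier` is a conflicting package or its direct parent."""
    for cause in causes:
        if identifier == cause.requirement:
            return True
        if cause.parent is not None and identifier == cause.parent:
            return True
    return False


# A and C both require B, with incompatible version ranges.
causes = [Cause("B", "A"), Cause("B", "C")]

# Only the conflicting package and its direct parents are preferred;
# the older ancestors X and Y (and the unrelated Z) are not.
preferred = [
    name
    for name in ["X", "Y", "Z", "A", "B", "C"]
    if is_backtrack_cause(name, causes)
]
print(preferred)
```

Preferring the whole ancestor chain would amount to also walking from A up to X and from C up to Y, which is the variant that showed no measurable benefit in the experiments described above.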
I really like this -- this is a very smart way to solve the problem of actively backtracking on the items that are implicated in the conflicts.
A couple of minor comments.
        return (
            not requires_python,
            delay_this,
            not direct,
            not pinned,
            not responsible_for_failure,
This should likely be the first item in this tuple. When a candidate is implicated in the failure, the most important thing is that we need to iterate on other candidates of that identifier IMO.
You are likely correct. My reasoning was python, direct, and pinned should all be singular package requirements, and I didn't want to mess with pinning them first. But as "responsible_for_failures" only becomes relevant once there are failures I guess it makes sense to put it first. I will do some sanity testing though that this doesn't cause problems.
As for "delay_this" I actually personally feel that this setuptools hack can be removed with this change, as it won't be responsible for failures and therefore pip won't be "accidentally" backtracking on setuptools anymore. I just wasn't brave enough to do it on this PR.
We can have a --use-deprecated=setuptools-resolve-hack that enables "delay_this" and point people to report to a tracking issue when they use this, so we can get an idea if it's actually useful after this. But I agree this should be a separate PR.
Meanwhile, I would put "not responsible_for_failure" after "delay_this" ("requires_python" should always be the first one, to make unresolvable Python version conflicts bubble up as soon as possible).
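To make the ordering question concrete, here is a small sketch of how the position of the failure flag in the preference tuple changes which identifier sorts first. Python compares tuples element by element, so earlier elements dominate. The `FLAGS` table is invented for illustration, and the sketch assumes the resolver picks the identifier with the smallest key.

```python
from typing import Dict, Tuple

# Hypothetical flags per identifier:
# (requires_python, delay_this, direct, pinned, responsible_for_failure)
FLAGS: Dict[str, Tuple[bool, bool, bool, bool, bool]] = {
    "python": (True, False, False, False, False),
    "setuptools": (False, True, False, False, False),
    "twine": (False, False, True, False, False),
    "keyring": (False, False, False, False, True),
    "packaging": (False, False, False, False, False),
}


def key_failure_last(name: str) -> Tuple[bool, ...]:
    """Ordering as in the diff under review: failure flag is the last element."""
    requires_python, delay_this, direct, pinned, failure = FLAGS[name]
    return (not requires_python, delay_this, not direct, not pinned, not failure)


def key_failure_after_delay(name: str) -> Tuple[bool, ...]:
    """Suggested ordering: failure flag directly after delay_this."""
    requires_python, delay_this, direct, pinned, failure = FLAGS[name]
    return (not requires_python, delay_this, not failure, not direct, not pinned)


# False sorts before True, so the smallest key is worked on first.
order_last = sorted(FLAGS, key=key_failure_last)
order_after_delay = sorted(FLAGS, key=key_failure_after_delay)
print(order_last)
print(order_after_delay)
```

With the flag last, the direct requirement ("twine" here) is still handled before the package implicated in the conflict; moving the flag up puts the implicated package ahead of direct and pinned requirements.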
I am running through my known test cases and I found a requirements file where the proposed change to the return order here makes it significantly slower to resolve (in fact it hasn't finished resolving):
# Python 3.7 only
poetry2setup
wheel
twine
Without the proposed change it significantly backtracks on "keyring" to "19.2.0", then after 1 more significant backtrack on "importlib_metadata" it resolves quickly.
With the proposed change the resolver backtracks on "keyring" to "15.1.0", backtracks significantly on "importlib_metadata" (but much slower), then backtracks significantly on "packaging", and that is as far as I've been able to get after an hour of backtracking.
I have not yet been able to figure out all the specific details. But I hypothesize it is very efficient to check single requirements (e.g. direct and pinned), even versus following backtrack failures.
Let me know what you think.
(Also, please fix the linter errors first so the actual tests can run.) |
Added the requested function and changed the name to "backtrack_causes" instead of "failure_causes". I think this makes more sense in context, as these aren't failures of the current state but of the state that the resolver backtracked from.
I'm realizing this is going to require modifying a lot of test cases or at least test set-up. In most cases though it should just require adding: Could also make the But before proceeding either way I'll wait to see if you think the approach is correct, maybe there's a way I should do this without adding an extra argument. |
I think adding a new (required, not optional) argument is the way to go. Not entirely sure |
Not gonna run the CI, since we know it's gonna fail until we update the vendored resolvelib, which needs a resolvelib release. :) |
FWIW, @notatallshaw -- would it be possible for you to share the list of small requirements.txt files that trigger the backtracking behaviour, so that others can see how this behaves? |
Yes, I've been meaning to do this for a while, but it ends up being pretty time consuming to go back and recheck everything. Where I've written Linux or Windows only, it's usually because system dependencies are also required. I've also made note of where the requirements files end up in ResolutionImpossible, as I found it interesting that even in these cases this optimization is still highly effective at reducing backtracking.
I would say I mostly tested against 1, 2, and 4 as in my testing they seem to represent a good range of challenges when it comes to finding a solution. Also I normally test in a clean virtual environment using the command Also I came across several more problematic requirements files but these were the ones I was able to reproduce today (except where noted). |
Question on Pip process: Once there is a resolvelib release, what is the process for vendoring the new version? Is it a separate PR? Do I include the updated config in this PR and then it automatically happens in the CI pipeline? Or do I run the vendoring sync locally and include the resolvelib changes in this PR?
Vendoring a new release of resolvelib (or indeed any vendored package) should be done as a separate PR. There's developer docs on how to do it, I believe, but basically it's a matter of updating |
Thanks I wasn't sure because a lot of tests will break with a new version of resolvelib until this PR lands due to the additional argument in the |
Ah, I see. I still think it should be a separate PR, but merging the two PRs should probably be co-ordinated, so I'll leave that to @pradyunsg or @uranusjr, as they've been the most directly involved with this work. |
On second thought, since the release will be done by either me or @pradyunsg, it could actually be easier for whoever does the release to also do the vendoring update, and rebase this PR to the branch instead. |
Hi! Many thanks for everyone's hard work here :-) I'm hoping that this PR makes it to the 21.3 release - and from reading the above, I'm presuming that's the intention? If so, should this PR be added to the 21.3 milestone? |
I'll add it in. I'm looking at our set of vendored dependencies right now, and will likely end up picking this up this weekend as well. |
Once But then I think the PR will be good to go pending some more thoughts and suggestions from the PIP maintainers. I'll also re-run through all the test cases I can still reproduce. |
Alrighty. The resolvelib 0.8.0 release has been made. Could you go ahead and update this PR @notatallshaw? Here's what I would suggest doing, to update this PR to use resolvelib 0.8.0:
|
I tried doing this but it seemed to add even more commits on to this pull request. Not exactly sure what the goal is here, my knowledge of using git in such a large project is very limited. |
Would it be fine if I did squashing instead then? |
5ee8193
to
351da10
Compare
You should be able to run:
And, that should get you synced up with the rebased version of this branch. :) |
Hmm... I'm gonna fiddle with this for a bit. Gimme a few minutes. :)
This can significantly reduce the amount of backtracking required, by avoiding backtracking on unrelated packages in the dependency graph. Co-authored-by: Pradyun Gedam <[email protected]>
351da10
to
1e3c127
Compare
@@ -66,12 +66,13 @@ def __init__(
     def identify(self, requirement_or_candidate: Union[Requirement, Candidate]) -> str:
         return requirement_or_candidate.name

-    def get_preference(
+    def get_preference(  # type: ignore
We need to update resolvelib's type annotations, to fix this.
Apologies, I missed this. I've never worked on a project before that splits the components into different repos like this. I should have tried to run the test cases all together before the merge into resolvelib.
No worries. The fact that this was missed is more a symptom of a bad process, and that's not something you need to apologise for. :)
Alrighty. I'm done. |
Yes, absolutely, I'm on a flight right now to a friend's wedding and may not be able to spend much time over the next few days, so feel free to push whatever needs pushing. I don't have any ownership feelings over this PR I just want to help Pip. But if you do need me to do something I'll catch up as soon as I can. |
Made these changes.
Thanks for the review @uranusjr! And, @notatallshaw, for all the work investigating and implementing this! ^>^ |
Here's to hoping it doesn't cause more issues than it solves! Feel free to ping me on 21.3+ heavy backtracking reports. I am very invested at this point in continuing to resolve this issue.
PR for #10479
I actually think it would make sense to prefer all the ancestors of the failures, but I couldn't figure out a simple way to do that.
Depends on sarugaku/resolvelib#84