Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge identical forks #5405

Merged
merged 1 commit into from
Jul 25, 2024
Merged

Merge identical forks #5405

merged 1 commit into from
Jul 25, 2024

Conversation

konstin
Copy link
Member

@konstin konstin commented Jul 24, 2024

Consider these requirements from pylint 3.2.5:

Requires-Dist: dill >=0.3.6 ; python_version >= "3.11"
Requires-Dist: dill >=0.3.7 ; python_version >= "3.12"

We will split on the python version, but then we may pick a version of dill that's >=0.3.7 in both branches and also have an otherwise identical resolution in both forks. In this case, we merge both forks and store only their conjoined markers.

Consider these requirements from pylint 3.2.5:

```
Requires-Dist: dill >=0.3.6 ; python_version >= "3.11"
Requires-Dist: dill >=0.3.7 ; python_version >= "3.12"
```

We will split on the python version, but then we may pick a version of `dill` that's `>=0.3.7` in both branches and also have an otherwise identical resolution in both forks. In this case, we merge both forks and store only their conjoined markers.
@konstin konstin added the preview Experimental behavior label Jul 24, 2024
@konstin konstin requested a review from BurntSushi July 24, 2024 12:16
@@ -2426,6 +2462,30 @@ pub(crate) struct ResolutionDependencyEdge {
}

impl Resolution {
fn universal() -> Self {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That method gets removed upstack

.map(ResolverMarkers::Fork)
.unwrap_or(ResolverMarkers::Universal);
continue 'FORK;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I buy this, but two thoughts come to mind.

Firstly, is it possible for more than one duplicate resolution to exist at any given point in time? If so, this would I believe only find one of them. But, I do not think this is case, since this is run for every resolution before it is "saved." So it should never be the case that more than one duplicate resolution appears.

Secondly, this is doing an exhaustive search over all existing resolutions to find a possible duplicate. And I suspect that the Resolution::same_graph routine is itself not especially cheap. I think this ends up being quadratic in the number of forks (which are themselves exponential in the number of dependencies I think? or possibly in the depth in the dependency tree). I don't have a good feel for how big of an issue that is in practice. Do we have a sense of what the common case is? I would guess the common case is that there aren't any duplicates. So perhaps we can optimize for that path. (To be clear, I don't mean to suggest that be done in this PR.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, is it possible for more than one duplicate resolution to exist at any given point in time? If so, this would I believe only find one of them. But, I do not think this is case, since this is run for every resolution before it is "saved." So it should never be the case that more than one duplicate resolution appears.

We fork every time we see conflicting markers, but in many of those cases the requirements themselves are not conflicting (say numpy >= 1.16 for one and numpy >= 1.19 for the other). When forking, we can't yet know whether we'll find a compatible numpy for both of. I've also seen cases where we end up rejecting the package version we forked on in both branches, removing the conflicting requirements. By copying over preferences from previous forks, we try to coerce two forks to resolving the same package version. Basically, our strategy is to fork often to avoid failing on avoidable conflicts, but still having a solution with as few divergences as possibles.

Re perf: I agree that this is potentially costly, but i think we have to do this to get desirable resolution. We have some short-cuts that we get from std that makes this cheaper: When two forks have a different number of packages, the check is a single usize comparison. We also usually have a small number of forks (and the more specific a fork is, the more likely it is we skip future fork points because we're already more specific), so i see this more as a fixed cost of maybe 10^2/2=50 checks. There are of course pathological cases, for those i think we just have to be a bit slow here avoid redundant forks in the lockfile.

@konstin konstin merged commit 93fb28f into main Jul 25, 2024
55 checks passed
@konstin konstin deleted the konsti/merge-identical-forks branch July 25, 2024 09:54
charliermarsh added a commit that referenced this pull request Aug 14, 2024
## Summary

This was added in #5405 but is now
the cause of an instability in `github_wikidata_bot`. Specifically, on
the initial run, we fork in `pydantic==2.8.2`, via:

```
Requires-Dist: typing-extensions>=4.12.2; python_version >= '3.13'
Requires-Dist: typing-extensions>=4.6.1; python_version < '3.13'
```

In the end, we resolve a single version of `typing-extensions`
(`4.12.2`)... But we don't recognize the two resolutions as the "same
graph", because we propagate the fork markers, and so the "edges" have
different markers on them...

In the second run through, when we have the forks in advance, we don't
split on Pydantic... We just try to solve from the root with the current
forks. This is fundamentally different and I fear it will be the cause
of many instabilities. But removing this graph check fixes the proximate
issue.

I don't really understand why this was added since there was no test
coverage in the PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Experimental behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants