Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dependency resolution way slower for packages with large number of releases from legacy secondary repositories #6436

Closed
3 tasks done
MasterNayru opened this issue Sep 7, 2022 · 4 comments · Fixed by #6442
Labels
kind/bug Something isn't working as expected

Comments

@MasterNayru
Copy link
Contributor

Issue

I am seeing a significant speed regression when Poetry attempts to resolve dependencies from a secondary repo with many releases. To replicate this issue so that the Poetry team can see what I am talking about, my gist configures pypi.org as a secondary repo and points the pyproject.toml to packages with many releases. I have not been seeing this issue in v1.1 and do believe this was introduced relatively recently with the changes to support detecting yanked releases.

It looks like with Poetry v1.2 there was a change in logic to how the links to packages in a legacy repository are obtained. Poetry, as of v1.2, this line tells Poetry to reach out to a legacy repository and grab every package published for that package: https://github.com/python-poetry/poetry/blob/master/src/poetry/repositories/legacy_repository.py#L84. It then checks each package to see whether the package has been yanked or not, and in doing so loops through every published package again, and reparses the page and creates package objects for every link on the page for every package linked on the page in the repo. I hope I am describing this clearly enough, but basically because the links are stored in a list, it seems like Poetry has no option but to keep continually looping through the links over and over again to reprocess them to work out which links belong to a particular release version.

To put numbers on how drastic an effect this has on Poetry's performance, with the linked pyproject.toml, it takes somewhere in the order of 10 to 11 minutes to resolve these dependencies. If the secondary repo is removed, it takes about 25 seconds. I was able to make changes to the code in a fork of mine to make it use a dict instead of a list iterator to try and store links by package name and version, but I struggled to update the tests to make sure that I didn't break any other parts of Poetry in doing so. With the changes I had made, I was able to resolve dependencies using a secondary repo with the above config in Poetry v1.2 in the 25 seconds or so that I was expecting.

At the risk of displaying my limitations when it comes to programming in Python, if looking at my garbage code in my fork will help communicate what I am describing in this issue, feel free to roast this: MasterNayru@1f27951 . It is probably not fixing the root cause of the issue but at least seems to limit the size of the nested loops that Poetry currently seems to run when trying to get version info from a legacy repo.

@MasterNayru MasterNayru added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Sep 7, 2022
@dimbleby
Copy link
Contributor

dimbleby commented Sep 7, 2022

I wondered about this when considering whether I was interested in resolving the merge conflicts in #6081.

My immediate thought was that the LinkSource should be refactored so that we learn whether versions are yanked as we discover them. Something like:

  • yanked(name, version) is removed
  • versions(name) is renamed / rearranged to return Iterator[tuple[Version, Yanked]] (where Yanked is str | bool)

I'm not expecting to get to this though, if anyone else would like to take this suggestion - or fix some other way - we won't be conflicting.

@radoering
Copy link
Member

@MasterNayru Thanks for the detailed analysis. I took your general idea and cleaned it up a bit in #6442.

@mwgamble
Copy link

mwgamble commented Sep 8, 2022

Thank you @MasterNayru so much for the detailed analysis! ❤️

I'm just hear to voice support for this. We've noticed a very similar performance regression when dealing with Poetry 1.2 and Nexus (last updated a week ago).

neersighted added a commit that referenced this issue Sep 9, 2022
…es (#6442)

Resolves: #6436

Measurements of `poetry lock` with warm cache with example
pyproject.toml from #6436:

|test case|time in s|peak memory usage in MB|
|---|---|---|
|legacy repository (before)|422|113|
|legacy repository (after)|3|118|
|pypi repository|1|92|

`backports.cached-property` is used in order to support
cached_property on Python 3.7. 

Co-authored-by: Jarrod Moore <[email protected]>
Co-authored-by: Bjorn Neergaard <[email protected]>
radoering added a commit to radoering/poetry that referenced this issue Sep 9, 2022
…es (python-poetry#6442)

Resolves: python-poetry#6436

Measurements of `poetry lock` with warm cache with example
pyproject.toml from python-poetry#6436:

|test case|time in s|peak memory usage in MB|
|---|---|---|
|legacy repository (before)|422|113|
|legacy repository (after)|3|118|
|pypi repository|1|92|

`backports.cached-property` is used in order to support
cached_property on Python 3.7.

Co-authored-by: Jarrod Moore <[email protected]>
Co-authored-by: Bjorn Neergaard <[email protected]>

(cherry picked from commit c4b2253)
radoering added a commit to radoering/poetry that referenced this issue Sep 9, 2022
…es (python-poetry#6442)

Resolves: python-poetry#6436

Measurements of `poetry lock` with warm cache with example
pyproject.toml from python-poetry#6436:

|test case|time in s|peak memory usage in MB|
|---|---|---|
|legacy repository (before)|422|113|
|legacy repository (after)|3|118|
|pypi repository|1|92|

`backports.cached-property` is used in order to support
cached_property on Python 3.7.

(cherry picked from commit c4b2253)

Co-authored-by: Jarrod Moore <[email protected]>
Co-authored-by: Bjorn Neergaard <[email protected]>
radoering added a commit to radoering/poetry that referenced this issue Sep 9, 2022
…es (python-poetry#6442)

Resolves: python-poetry#6436

Measurements of `poetry lock` with warm cache with example
pyproject.toml from python-poetry#6436:

|test case|time in s|peak memory usage in MB|
|---|---|---|
|legacy repository (before)|422|113|
|legacy repository (after)|3|118|
|pypi repository|1|92|

`backports.cached-property` is used in order to support
cached_property on Python 3.7.

(cherry picked from commit c4b2253)

Co-authored-by: Jarrod Moore <[email protected]>
Co-authored-by: Bjorn Neergaard <[email protected]>
neersighted added a commit that referenced this issue Sep 10, 2022
…es (#6442)

Resolves: #6436

Measurements of `poetry lock` with warm cache with example
pyproject.toml from #6436:

|test case|time in s|peak memory usage in MB|
|---|---|---|
|legacy repository (before)|422|113|
|legacy repository (after)|3|118|
|pypi repository|1|92|

`backports.cached-property` is used in order to support
cached_property on Python 3.7.

(cherry picked from commit c4b2253)

Co-authored-by: Jarrod Moore <[email protected]>
Co-authored-by: Bjorn Neergaard <[email protected]>
@mkniewallner mkniewallner removed the status/triage This issue needs to be triaged label Sep 18, 2022
Copy link

github-actions bot commented Mar 1, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/bug Something isn't working as expected
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants