-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dependency resolution way slower for packages with large number of releases from legacy secondary repositories #6436
Comments
I wondered about this when considering whether I was interested in resolving the merge conflicts in #6081. My immediate thought was that the
I'm not expecting to get to this though, if anyone else would like to take this suggestion - or fix some other way - we won't be conflicting. |
@MasterNayru Thanks for the detailed analysis. I took your general idea and cleaned it up a bit in #6442. |
Thank you @MasterNayru so much for the detailed analysis! ❤️ I'm just hear to voice support for this. We've noticed a very similar performance regression when dealing with Poetry 1.2 and Nexus (last updated a week ago). |
…es (#6442) Resolves: #6436 Measurements of `poetry lock` with warm cache with example pyproject.toml from #6436: |test case|time in s|peak memory usage in MB| |---|---|---| |legacy repository (before)|422|113| |legacy repository (after)|3|118| |pypi repository|1|92| `backports.cached-property` is used in order to support cached_property on Python 3.7. Co-authored-by: Jarrod Moore <[email protected]> Co-authored-by: Bjorn Neergaard <[email protected]>
…es (python-poetry#6442) Resolves: python-poetry#6436 Measurements of `poetry lock` with warm cache with example pyproject.toml from python-poetry#6436: |test case|time in s|peak memory usage in MB| |---|---|---| |legacy repository (before)|422|113| |legacy repository (after)|3|118| |pypi repository|1|92| `backports.cached-property` is used in order to support cached_property on Python 3.7. Co-authored-by: Jarrod Moore <[email protected]> Co-authored-by: Bjorn Neergaard <[email protected]> (cherry picked from commit c4b2253)
…es (python-poetry#6442) Resolves: python-poetry#6436 Measurements of `poetry lock` with warm cache with example pyproject.toml from python-poetry#6436: |test case|time in s|peak memory usage in MB| |---|---|---| |legacy repository (before)|422|113| |legacy repository (after)|3|118| |pypi repository|1|92| `backports.cached-property` is used in order to support cached_property on Python 3.7. (cherry picked from commit c4b2253) Co-authored-by: Jarrod Moore <[email protected]> Co-authored-by: Bjorn Neergaard <[email protected]>
…es (python-poetry#6442) Resolves: python-poetry#6436 Measurements of `poetry lock` with warm cache with example pyproject.toml from python-poetry#6436: |test case|time in s|peak memory usage in MB| |---|---|---| |legacy repository (before)|422|113| |legacy repository (after)|3|118| |pypi repository|1|92| `backports.cached-property` is used in order to support cached_property on Python 3.7. (cherry picked from commit c4b2253) Co-authored-by: Jarrod Moore <[email protected]> Co-authored-by: Bjorn Neergaard <[email protected]>
…es (#6442) Resolves: #6436 Measurements of `poetry lock` with warm cache with example pyproject.toml from #6436: |test case|time in s|peak memory usage in MB| |---|---|---| |legacy repository (before)|422|113| |legacy repository (after)|3|118| |pypi repository|1|92| `backports.cached-property` is used in order to support cached_property on Python 3.7. (cherry picked from commit c4b2253) Co-authored-by: Jarrod Moore <[email protected]> Co-authored-by: Bjorn Neergaard <[email protected]>
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
I am on the latest Poetry version.
I have searched the issues of this repo and believe that this is not a duplicate.
If an exception occurs when executing a command, I executed it again in debug mode (
-vvv
option).OS version and name: Fedora Linux 36, Debian 11
Poetry version: Poetry v1.2.0
Link of a Gist with the contents of your pyproject.toml file: https://gist.github.com/MasterNayru/d08b08c6ae86656faee171dcf387c0af
Issue
I am seeing a significant speed regression when Poetry attempts to resolve dependencies from a secondary repo with many releases. To replicate this issue so that the Poetry team can see what I am talking about, my gist configures pypi.org as a secondary repo and points the pyproject.toml to packages with many releases. I have not been seeing this issue in v1.1 and do believe this was introduced relatively recently with the changes to support detecting yanked releases.
It looks like with Poetry v1.2 there was a change in logic to how the links to packages in a legacy repository are obtained. Poetry, as of v1.2, this line tells Poetry to reach out to a legacy repository and grab every package published for that package: https://github.com/python-poetry/poetry/blob/master/src/poetry/repositories/legacy_repository.py#L84. It then checks each package to see whether the package has been yanked or not, and in doing so loops through every published package again, and reparses the page and creates package objects for every link on the page for every package linked on the page in the repo. I hope I am describing this clearly enough, but basically because the links are stored in a list, it seems like Poetry has no option but to keep continually looping through the links over and over again to reprocess them to work out which links belong to a particular release version.
To put numbers on how drastic an effect this has on Poetry's performance, with the linked pyproject.toml, it takes somewhere in the order of 10 to 11 minutes to resolve these dependencies. If the secondary repo is removed, it takes about 25 seconds. I was able to make changes to the code in a fork of mine to make it use a dict instead of a list iterator to try and store links by package name and version, but I struggled to update the tests to make sure that I didn't break any other parts of Poetry in doing so. With the changes I had made, I was able to resolve dependencies using a secondary repo with the above config in Poetry v1.2 in the 25 seconds or so that I was expecting.
At the risk of displaying my limitations when it comes to programming in Python, if looking at my garbage code in my fork will help communicate what I am describing in this issue, feel free to roast this: MasterNayru@1f27951 . It is probably not fixing the root cause of the issue but at least seems to limit the size of the nested loops that Poetry currently seems to run when trying to get version info from a legacy repo.
The text was updated successfully, but these errors were encountered: