-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace the secondary
source type with more granular types
#6713
Comments
#6669 represents incremental progress towards this goal, and the author is interested in implementing this proposal. |
Would it be fair say that a contributing factor to issues that users have when managing multiple repositories is that Poetry assumes the worst case scenario when it comes to how to obtain metadata from non-PyPI repositories? Lookups to PyPI are done through APIs and happen way quicker than the HTML parsing that happens with the other repositories. Obviously, Poetry has gone down that route to ensure that things work and don't just explode because some backend didn't whatever APIs PyPI is currently exposing. Just seems to me that while querying repos unnecessarily is worth avoiding, a real pain point is how these other package repositories end up being utilised. |
I did not reference it here as it is out of scope for this issue; however, yes, that is a problem. However, it is an ecosystem problem:
(from #4113 (comment)) In short, PyPI implements a non-standard API that has had to make breaking changes with no warning in the past, and that they intend to remove. It has served Poetry's use case so far and validated the need for this in the ecosystem, but it is not implementable by other package sources for a variety of reasons (ranging from the data model being coupled to warehouse internals and incorrect in the general case, to detection being problematic). There is a standards-based solution and path forward; Poetry will have to wait patiently with everyone else for gradual adoption. |
Indeed I am. The refactor is "hairy" (not my words), but it appears to be a nice way to get acquainted with the Poetry code and contribute. I'm now at the point where I write tests that express the desired behavior and found that the Currently, such dependencies are not bound to the source, meaning that they may be retrieved from any configured repo including the source repo. Perhaps the most natural option is to, when we introduce The first suggestion seems most appropriate. My private packages have private dependencies, so I'd expect that to work just fine. Still, I'd like to hear your thoughts before I make assumptions. |
Regarding the correct model for transient dependencies when requested from a I find this less surprising as all dependencies in Python are "peer dependencies" in other ecosystems, and thus we don't have to answer the question of where a dep comes from when it's a mutual dependency of multiple top level deps with disparate However, if we must propagate Poetry has generally taken a stance of "if you care about any details of your transient dependencies, well, now it's a top-level dependency," which is maybe constructed backwards vs. the reality of the Python ecosystem (most existing tooling and standards are built around flat lists of deps and not trees). In my mind, it makes more sense to users who are new to Python packaging and better expresses Poetry's goals of bringing useful ideas from other ecosystems to Python, while remaining compatible/a good citizen. |
Can I suggest setting out what you think the key use cases for configuring repositories are, and spelling out how those use cases would be met by this proposal? With any luck that'll validate that this is a great solution. With even more luck, it'll help someone to realise that their case is not well served, or that the model can be simplified without hurting anything. The cases that come to mind for me are
in my small world I don't know the use case for "internal mirror with fallback to pypi", but perhaps it's a thing people want. (Maybe those with slow or limited internet connections have reasons to prefer local mirrors but are willing to fall back to pypi?) |
One more that is crucial to my team:
|
@kkozmic that is also our primary use case. Specifically, poetry is starting to get unbearably slow in our larger projects. Our private repositories are hosted in gitlab and when gitlab doesn't find a package it returns with a redirect to pypi. This means any local caching is defeated because the requested and returned urls differ |
I've made a draft PR to facilitate the discussion. High-level feedback is very welcome (may save future revisions)! |
We regularly use an internal mirror with fallback to PyPI. Our packages are developed internally and made public with some delay (and not all versions are released externally). During development of package A, I need to be able to use the latest features of package B -- possibly not available on PyPI yet. When the new versions packages A and B are released externally, users of package A should just use the version of package B that is available on PyPI. |
@neersighted is |
This is an excellent proposal. In particular, the |
Is there any palliative solution? |
Related to this issue and #7204: If I try to force poetry to use the private source as the primary source and try to ignore the default pypi it still compares the results of pypi.org with the private source.
|
Since the implementation of Actually, I think we do only need an option to disable PyPI. This need not to be tied to setting a flag on a specific source. #7430 is a draft (with some TODOs) to deprecate |
In the last maintainer meeting we revisited the proposal, refined it and came up with a slightly differnt design alternative. I'll try to summarize it here, so we can decide if the alternative or the (revised) original proposal makes more sense. General decisions (same in both proposals)Secondary sources ( Setting a default source ( Without With only one remaining source type we need new options to define the following behaviors:
In the following I'll summarize the revised original proposal and the new variant omitting the deprecated features for more clarity. Original proposalIntroduce a new property for sources called Supplemental sources will only be queried if no suitable package/version has been found in other (primary and supplemental) sources before. The order of supplemental sources is the order of occurrence as in pyproject.toml. Explicit sources are only queried if they are defined explicitly for a dependency. New proposalIntroduce two new flags: In contrast to the original proposal, sources are not sorted by groups first, but divided into groups by the order of occurrence in the pyproject.toml. Let's consider the following example:
That would result in A and B being queried first "in parallel". If no suitable package/version is found in A or B, C and D are queried "in parallel". If no suitable package/version is found in C or D, E is queried. (If a package/version is found in one group the following groups are not queried.) Closing thoughtsIn both proposals, the default PyPI source (if not disabled) has the lowest priority and is only queried if no suitable package/version has been found in any other (non-explicit) source. This should help to avoid dependency confusion attacks. What do you think? Which proposal makes more sense? Which one is easier to teach? (The names of the new options are not carved in stone yet and may be replaced if there are more suitable names.) Is there any use case not covered by one (or even both) proposals? |
Thank you for the detailed overview and proposal. Perhaps the one thing that can be clarified is what the resolution order implies (perhaps also to add to the docs in the PR that handles the final proposal)? To the best of my understanding (please agree / correct). If several sources are queried for a certain dependency in the same 'batch' (that includes the current As for the general decisions part: you have my full support in both decisions. Regarding the other part, my thoughts are the following. The new proposal is strictly more generic than the original proposal -- if every supplemental source is flagged as Looking forward to others' thoughts. |
Hi agree with @b-kamphorst: the first solution seems easier to understand and - while less generic - it also feels less "over-engineered". Did you ever receive issues that the first proposal would not solve but the second one would? |
For anyone having the same issue of @kkozmic , I developed a simple opinionated plugin that makes sure that:
You can find it here, feedback is welcome! |
While writing down the two approaches, I already had the feeling that the new proposal was too elaborate. Thanks for the confirmation. I think we will stick (more or less) to the original proposal. #6713 (comment) shows that there is something we did not yet consider: People might have different requirements considering the priority of the default PyPI source. Some users want other sources only be looked up if there is nothing on PyPI, other users want PyPI only be looked up if there is nothing in other sources. Thus, it might make sense to make the priority of PyPI configurable. |
Just adding one more voice to the pool of "having a
Works for me, because if things get complicated I'd probably just set up a "pypi mirror plus private packages" and use that for everything, which solves the issue as a whole. Controlling the lookup and fallback order through clever setting of ever more granular flags for both sources and dependencies seems difficult and error prone. |
Short status update: The first step is done: #7658 has been merged into master. Thus, the next minor version of Poetry will introduce explicit package sources, which are only searched if they are explicitly configured for a dependency with IMO, the next step is #7801, which will allow to configure the priority of PyPI. In its issue description, I also described my target image. (#7801 will supersede #7430.) Opinions are welcome. |
@radoering I have a question about that part:
I don't really understand why this exception is only relevant to the
If I don't specify any Also, the scenario in which a) we don't have any Btw, the current docs are quite confusing:
|
It has a technical background. We need at least one non-explicit source. Otherwise, we'll get a "no versions for package xyz found" error which would be quite obscure.
Your reasoning makes sense. We probably should change it.
Technically, that's not an issue because if nothing is found in higher priority sources (which would be always true in this case), supplemental sources are searched. However, from a teaching perspective, I agree with you. If we want to change it we should print a warning first to avoid a breaking change (and change it to an error later).
PRs with docs improvements are always welcome. 😄
That's still true. If the same version of a package is found in two sources the higher priority wins.
In that case, at first, only the first supplemental source will be searched. Only if no versions fulfilling the constraint are found in the first one, the second supplemental source will be searched, and so on.
We can't (don't want to) deprecate
That is an alternative we also considered. However, we decided that it's better to keep |
@radoering thank you for the quick answer!
Fully agree with that approach. As you said - technically it is correct to have
To sum up then:
Is that right? If yes, I wonder if that's actually the expected behaviour for users, as that's a bit "inconsistent". I could imagine some people wish to query
Yeah, I get your point, I also had the same thoughts, just assumed redundant, but not-breaking,
I don't want to overpromise, but when I have the full picture (so when we discuss the above points), maybe I will find some time in the foreseeable future to prepare at least initial PR. |
In principle, yes. Strictly speaking, it's not always the highest version but the highest version that fulfills all constraints.
In that case, they can configure one source as
We considered making it more elaborate in #6713 (comment) but refrained from doing so since there were no real-world use cases.
Supporting multiple per-package source constraints is a new feature request. If you really have a use case for this, please create a new issue (if there is no such issue yet), so that we can discuss the consequences there.
Makes sense. |
Sure thing.
Right, good point.
Yep, I agree querying
That was actually just an example why we could make sources "priority" configurable for all types, not only |
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Poetry currently will look up packages in every source, even when unnecessary due to a
source =
constraint on a dependency, or if the dependency was already found. This behavior often surprises users (though it was originally modeled on pip). Ostensibly,secondary
is the solution to this -- however, it merely pushes the repository further back in the search order and does little to address the user complaints/confusion. As such, we should deprecate it and replace it with new options that better match user expectations.I think, to preserve backwards compatibility, the long-term path forward is probably two different options (in my example,
supplemental
andprivate
). I would propose something like the following, as an overview of behaviors:pip --extra-index-url
-style. This is known internally as primary, and is the current behavior.default = true
means that the repository replaces PyPI,pip --index-url
-style. This is the current behavior.supplemental = true
means that this source is only consulted after a lookup in thedefault
(implicitly PyPI unless configured otherwise) and primary sources fail.private = true
means that the repo will only be considered for packages that are explicitly configured withsource =
, and should be preferred by users for private packages to avoid dependency confusion attacks.secondary = true
would be kept around as a deprecated option (likely with a warning), and would maintain the legacy behavior of being searched exhaustively for backwards compatibility.Originally #5984 (comment)
The text was updated successfully, but these errors were encountered: