An option to start a local index proxy when running pip #11771
Comments
FWIW, I don't want to say that I'm -1 on this specifically, but I don't think it's going to solve the use cases that people keep asking for like filtering by age, etc. Like if people really want a way to start a temporary index server with pip, then it's fine? but I don't think most people want that, nor would be particularly happy with requiring that for the feature they actually want. |
I agree that it's a solution designed by a techie, which is only likely to be loved by techies. So sue me 🙂 And it's absolutely me scratching a personal itch. But I do think it could enable reasonably user friendly idioms. Imagine, for example, I can even imagine someone writing a customisable proxy, so that users could put |
I'm also inclined to think that "start a local index proxy" is a much bigger feature than "constrain certain packages to certain indexes (possibly via the constraints file)". Typically the latter is what's needed, and some kind of wildcard in names in constraints files would get to basically the same point as simpleindex covers (without drastically increasing the number of things they could screw up). In our case, we have builds of particular libraries that we want to ensure use our wheel rather than trying to build from sdist again (which we know is going to fail). Simply using That said, it's a pain in CI to start simpleindex, configure the correct In any case, I'm +1 on the problem description, and +0 on this proposal (I'll take it if there's motivation to implement it, but I think a separate constraints list is simpler and almost as good). |
Personally I am inclined to say this is not suitable for pip, but I have had similar needs, and I suspect the scenario is not uncommon (or it should be more common). It’s probably a good idea to develop a ready-made solution that spins up a server in the background and runs pip against it in one command, so we can point people to it when asked. |
In addition to a script, could you have a dependency specifier and an entry-point? Eg If the script is run through a sub-process, perhaps instead of passing arguments, you look at its stdout, grepping for eg |
TBH, the intention here is to remove complexity from pip, by providing one, simple, option that covers common requests. Adding complexity to this feature is to an extent working against that goal. So yes, we could, but I'd prefer not. For example, you're saying include a dependency specifier - do you expect pip to download that dependency? Where should pip install it? How should it download it, when we haven't started the proxy yet? Things get out of hand very fast.
Yeah, I feel like the reaction to this proposal has mostly been "we should make it easier to use proxy servers, but not like this". And if it doesn't result in fewer people asking for more complexity in pip, then it's not adding much value. Maybe a separately distributed utility |
If I understood the problem space correctly, I feel like this could be solved with a declarative configuration file like the one I hinted at here: https://discuss.python.org/t/python-packaging-strategy-discussion-part-1/22420/127 I will call this potential file format Constraints 2 for now, because I feel like I understand that by relying on a separate proxy process (like simpleindex) as suggested in the initial post, there would be relatively little change needed in pip itself, which is an advantage. On the other hand, although it would require much more work, I feel like the problem space that Constraints 2 could cover is much wider than what could be done with a proxy. If I am not mistaken, this would cover dependency confusion attacks as well, which seem to have prompted this discussion here and this other one. If there were to be such a Constraints 2 file format, possibly standardized, then other tools (installers, dev workflow tools, and so on) could take advantage of it as well. Is it something potentially interesting? Should I try to start a dedicated discussion thread somewhere? Gather some use cases and rough design concepts? |
IMO, it's offtopic for this issue, but it's potentially interesting. Personally, I'd like to see an implementation in the form of an index proxy that implemented filtering via a "constraints 2 format" configuration. That would be usable right now by pip, would be portable to any other application that wanted to view indexes through a "constraints 2" filter, and would allow rapid development independent of the resource constraints of projects like pip. If it became popular, and people found starting up a standalone index proxy a problem, then we could look at building the implementation into pip. At that point, we might even consider vendoring the proxy library and making the pip implementation be "start the vendored proxy, and point pip at it". So yes, go for it, but I would strongly recommend basing the initial proposal on a filtering server proxy, rather than on "if pip implements this, life would be great". Because the latter is a great way to end up with all talk and no action, unfortunately. |
@pfmoore It cannot be just a proxy, because what I have in mind is to also fix the metadata of some packages. So in other words it would not just impact the "package finder" but also the "dependency resolution". I will try to put my ideas in a dedicated thread. Thanks for feedback. Anyway, I very much like the concept and implementation of simpleindex, I recommend it often. So your initial suggestion (of a tighter integration) seems good to me. |
@sinoroc OK, in that case (a) it's definitely offtopic for this thread, and (b) I'm concerned that it's yet more complexity for pip (and other standards-conforming tools) to have to implement. I'd like to see the proposal discuss "how does this reduce overall complexity rather than increasing it", as the current trend seems to be to pile ever more requirements on tools. |
@pfmoore Understood. I will try to publish my thoughts on this in a document somewhere. Feel free to mask these messages as off-topic. |
Given that it's likely that the index proxy will have more complex settings than pip, maybe we should be pushing the command into the proxy tools? So more like:
(and it sets the environment variables for pip, and maybe other relevant tools if needed, while running arbitrary commands after the |
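A rough sketch of that inverted arrangement, where the proxy tool wraps the command instead of pip starting the proxy. PIP_INDEX_URL is pip's environment-variable form of --index-url; the function names and the example URL are made up for illustration, and the proxy is assumed to be listening already.

```python
import os
import subprocess


def index_env(index_url, base_env=None):
    """Return a copy of the environment with pip pointed at index_url."""
    env = dict(os.environ if base_env is None else base_env)
    # pip reads PIP_INDEX_URL as the default value for --index-url.
    env["PIP_INDEX_URL"] = index_url
    return env


def run_with_index(index_url, command):
    """Run an arbitrary command with the index variable set; return its exit code."""
    return subprocess.run(command, env=index_env(index_url)).returncode
```

A proxy tool could call run_with_index("http://127.0.0.1:8000/", ["pip", "install", "somepackage"]) once its server is up; variables for other tools would be added in index_env in the same way.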
Ah that makes sense. Probably not in the main entry point but something like |
I'm a strong proponent of there being a solid and re-usable simple repository core library which includes proxying, filtering and merging functionality. If such a library existed, why not vendor it into
I think the proposal would be quick to implement, but IMO lacks realistic adoption - it doesn't feel like a well integrated solution to the problem that vendoring a domain specific tool could provide. It is also unnecessarily wasteful to start a new process, to open up a new port, and to use HTML to communicate between processes (not to mention the subprocess issues that would inevitably turn up (e.g. a race condition on opening the target repository's port when invoking Based on my recent
Within The premise that you can chain a repository is also the basis for the |
What's the problem this feature will solve?
Many problems people have with index server handling (such as index priority, dependency confusion attacks, filtering available package releases by age, etc) can relatively simply be solved using an index proxy, that presents a "view" of the underlying index(es). However, users are typically reluctant to use such a proxy index. Generally it's difficult to get people to articulate the reasons they don't like this option, but the most common complaint is the need to manage a separate running service.
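To make the "view" idea concrete, here is a minimal sketch of one such filter, hiding files younger than a given age. It assumes the proxy already has the upstream file list as dictionaries carrying the optional "upload-time" key from the JSON simple API (PEP 700); the function names are illustrative, and dropping entries with no timestamp is a policy choice of this sketch, not anything pip or an index mandates.

```python
from datetime import datetime, timedelta, timezone


def parse_upload_time(value):
    """Parse a timestamp such as 2023-01-20T10:00:00.000000Z into an aware datetime."""
    # Older Python versions do not accept a trailing "Z", so spell out the offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def older_than(files, min_age, now=None):
    """Keep only file entries uploaded at least min_age ago.

    Entries without an upload time are dropped, since their age is unknown.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - min_age
    return [
        f for f in files
        if "upload-time" in f and parse_upload_time(f["upload-time"]) <= cutoff
    ]
```

A proxy would apply older_than(files, timedelta(days=7)) to each project page before serving it, so the installer only ever sees releases that have been public for a week.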
Describe the solution you'd like
If pip had an option that specified a script which started an index server for the duration of the pip invocation, people could use this to avoid the need to have a permanently running index proxy.
For example, the user creates [1] a script my_index.py, which starts up an index server. Then, they invoke pip using the command "pip install --index-script=my_index.py ...". When started, pip will run the index script, and communicate with it to agree on a proxy URL that it will provide. The rest of the pip invocation works as normal, using the temporary index. When pip completes, it shuts down the proxy automatically.
The details of the communication between pip and the script will need to be established, but it can probably be as simple as pip choosing a port number, and passing it as an argument to the script.
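The handshake described above could look roughly like the following from the caller's side. This is only a sketch under the "pip picks a port and passes it as an argument" assumption; the --index-script option does not exist in pip, and the helper names here are hypothetical.

```python
import contextlib
import socket
import subprocess
import time


def pick_free_port():
    """Ask the OS for a currently unused TCP port on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def wait_for_port(port, timeout=10.0):
    """Block until something accepts connections on the port, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection(("127.0.0.1", port), timeout=0.2):
                return
        except OSError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"nothing is listening on port {port}")
            time.sleep(0.05)


@contextlib.contextmanager
def temporary_index(script_argv):
    """Start script_argv with the port appended, yield the index URL, stop it on exit."""
    port = pick_free_port()
    proc = subprocess.Popen([*script_argv, str(port)])
    try:
        wait_for_port(port)
        yield f"http://127.0.0.1:{port}/"
    finally:
        # Shut the proxy down whether or not the wrapped work succeeded.
        proc.terminate()
        proc.wait()
```

Inside the with block, the rest of the run would use the yielded URL as its index. Note that the port is released before the script binds it, so another process could grab it in between; a real design would need to decide how to handle that.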
Alternative Solutions
The current approach, where the user has to manually start a proxy before running pip, is viable, but appears unattractive to users as a solution.
Alternative solutions to individual issues have been proposed as pip feature requests - for example, #8606 and https://discuss.python.org/t/proposal-preventing-dependency-confusion-attacks-with-the-map-file/23414. These solve individual problems, but are not as general as the proposed solution.
Additional context
Creating an index proxy script is potentially more complexity than many users will be comfortable with, which is likely to limit the adoption of this proposal. This can be addressed, at least in part, by publishing a set of scripts that handle well-known cases like index prioritisation.
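As an illustration of how small the core of such a published script might be, here is a sketch of index prioritisation: each project is routed to the first index, in priority order, that claims it. The routing function, the index URLs and the package names are all hypothetical; only the name normalisation rule is taken from the simple repository API (PEP 503).

```python
import re


def normalize(name):
    """Normalise a project name the way the simple repository API does."""
    return re.sub(r"[-_.]+", "-", name).lower()


def route(project, indexes, default=None):
    """Return the base URL of the first index that claims the project.

    indexes is an ordered list of (base_url, project_names) pairs, highest
    priority first. default is returned when no index claims the project.
    """
    wanted = normalize(project)
    for base_url, names in indexes:
        if wanted in {normalize(n) for n in names}:
            return base_url
    return default
```

With an internal index listed first, a name that exists on both indexes always resolves to the internal copy, which is the behaviour people usually want when guarding against dependency confusion.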
Unfortunately, pip has a limited amount of developer resource, and that makes it difficult to implement solutions to the various issues raised in a timely manner. Also, it's necessary to make sure that any solution works for every user's situation, which further delays resolution. By, in effect, "outsourcing" the work of solving the issue to the end user, simple, tightly focused solutions can be delivered in a much more timely manner, and pressure is taken off the volunteers supporting pip.
Footnotes
1. In an ideal world, an ecosystem of proxy implementations will become available, so the user simply downloads a suitable script and configures it.