-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pip overwrites existing files unconditionally during installation #4625
Comments
Is there a workaround for now? |
the name clash pretty much is the fault ofthe package authors ^^ pip cant help bad packagers and pypi is not curated |
@RonnyPfannschmidt So pip is so dumb that it cannot even check if it's overwriting files (better if it checked the directory/egg/package exist)? You are correct that pip can't help with name clashes "in the wild", but in my opinion it must warn the user when a name clash is happening on the user machine. Especially because pip itself, during automatic dependencies resolving, could install a package that the user did not explicitly request and clash with another one. |
The fact that the names clash is a packaging issue. The fact that pip installs such packages is (arguably) a pip bug. I agree that pip should be printing a warning; I won't want pip to refuse to overwrite files and fail... Off the top of my head, I imagine this would break namespace package installation (or at least need them to special cased).
I imagine the only "fix" to this issue is that the conflicting packages resolve such a conflict. To me, this issue as a tracker for a warning being added to pip, that's printed when pip overwrites an existing file, when it doesn't expect it to happen. |
Any case of overwriting a file that is not a namespace declaration is likely a packaging bug |
Agreed. pip warning about the fact that it's overwriting is nicer than pip silently breaking the installation; even though the flaw is in the package... pip is the one taking the action that breaks the environment, even if it's because of the flawed packaging. |
FTR; The fix for this issue IMO would be that pip starts printing warnings when it overwrites non-namespace declaration files. I feel that until a PR comes around, it doesn't make sense to discuss the ramifications of this change. |
Agreed. There's a whole load of corner cases / unusual usages that would have to be considered (upgrades, ``--target``` installs, for example) and it's much easier to debate details like that with actual code. |
We just got bitten by this - again. I finally researched for an issue for this as I expected that this is just an oversight and will be fixed anyway, if I report it or not. Consider using two library packages using Usually this worked out like this (without using the intermediate packages):
However, with less luck this is the result:
In reality we build using fresh virtualenvs each time and it seems like the installation order is deterministic. So our customers weren't affected. 😌 However we spent quite some time to find out what as going on when running the application on my machine, because somehow I got the dependencies installed in the wrong order. I could understand that this feature is missing if pip did not track the installed files. So I looked if maybe pip does already record this information: (pip-nameclash) torsten.landschoff@horatio:~/pip-nameclash$ find -type f -print0|xargs -0 grep ipaddress.py
Binary file ./lib/python2.7/site-packages/pip/_vendor/ipaddress.pyc matches
Binary file ./lib/python2.7/site-packages/ipaddress.pyc matches
./lib/python2.7/site-packages/pip-9.0.1.dist-info/RECORD:pip/_vendor/ipaddress.py,sha256=wimbqcE7rwwETlucn8A_4Qd_-NKXPOBcNxJHarUoXng,80176
./lib/python2.7/site-packages/pip-9.0.1.dist-info/RECORD:pip/_vendor/ipaddress.pyc,,
./lib/python2.7/site-packages/ipaddress-1.0.18.dist-info/RECORD:ipaddress.py,sha256=0U5GvYMuyiZrV5jScIRHYxFWbwLzM44HNiaY8x57qUA,80156
./lib/python2.7/site-packages/ipaddress-1.0.18.dist-info/RECORD:ipaddress.pyc,,
./lib/python2.7/site-packages/py2_ipaddress-3.4.1.dist-info/RECORD:ipaddress.py,sha256=lCp63yE4Z1-ia3T3qEiljD1CcG10UartIx3nYXXoRzE,74200
./lib/python2.7/site-packages/py2_ipaddress-3.4.1.dist-info/RECORD:ipaddress.pyc,, Surprise, it is recorded - in a file named Question is: Would a merge request be accepted that adds a check for conflicting files? I'd be interested in implementing this as I carry a bit of fear that we could run into this in production. For now I will be trying to come up with a script that checks the installed files vs. the RECORD information. |
Yes, a PR would be welcomed. As noted earlier in the thread, please make sure to cater for namespace packages (I don't know much about them myself, but a lot of people use them and they need to be supported). |
Good point. And probably non-trivial. Naive solution would be to ensure that all But I bet that will often not be the case. |
oh, and as a extra fun bit, pip will have to keep track of double installed files |
... and at the moment, pip doesn't actually keep track of what's installed. That's done in the Sorry @tlandschoff-scale - I don't mean to dump loads of objections on you. It would still be good to have a PR for this, but as you can see, it's not as simple as it first seems. Unfortunately, that's far too common a situation in Python packaging 😞 |
@pfmoore I think packages installed in other ways (e.g. |
Understood, but my point is that pip doesn't do any tracking of what it installed. It relies purely on standardised metadata to tell what files are installed and where (specifically the RECORD file). And that's essentially by design - we want package data (such as what is installed) to be introspectable by anyone, using well-defined standard metadata. |
... anyway, it's not worth debating the details just yet. Let's wait for someone to submit a PR and we can discuss in the context of actual code. |
From what I have found the RECORD file is actually rewritten during installation - but only when a wheel is installed. Which would probably account for 99% of all packages we are installing. Actually this is done inside the (vendored) distlib package in |
I agree with @pfmoore here. I think it's best to not discuss the details now and to discuss them in the context of some actual implementation, when a PR is made for this by someone. |
In the case of Like most of the other issues with
|
Just an observer's comment regarding
Arch Linux's packages are split in two parts: the "official" packages (https://www.archlinux.org/packages/) and the user-provided & uncurated packages (the Arch User Repositories, https://aur.archlinux.org/ https://wiki.archlinux.org/index.php/Arch_User_Repository). The package manager, So lack of curation is not really an argument there IMO. |
Conda has the same policy. |
(Arriving from pypa/packaging.python.org#513) As noted there, I think more desirable behaviour here would be for
As a UX enhancement, (Such a DB would also let |
Here's a different idea: The only legitimate usecase we know of, for this, is non-native namespace packages. In that case, we can reasonably restrict this to only allow overwriting |
But this still leaves the problem that when one of the packages gets uninstalled, it would break other packages that share the same |
Coming to this very late (from: https://discuss.python.org/t/python-module-conflict-discussion/23659) https://discuss.python.org/t/python-module-conflict-discussion/23659 I think it could be kept fairly simple, taking into account that there may be other sources than PyPi for packages, and there may be tools other than pip for installing. AFAICT, there are two different potential use cases here:
NOTE: namespace packages may not be the only use-case -- one I can think of off the top of my head are optional "accelerated" versions of packages. I might want the accelerated version to overwrite the "regular" version in some way. |
Any regular upgrade process is going to remove the old package first. There's literally no reason for a package install to need to overwrite any existing file, and the patch ought to be as simple as banning overwrites when extracting.1 The only exception - and it would be part of error handling the failed overwrite, not an eager test - is to allow overwrites for identical contents. The only somewhat legitimate case right now is multiple cooperating packages trying to form a namespace that want to include the So, in the absence of the
We don't need to fix uninstalls here - they're already busted. Most people I know don't ever uninstall stuff anymore, they just blow away the whole environment and start fresh. In any case, there are workarounds for it, and if pip starts warning on overwrites it'll be even more clear that it's because uninstalls won't work properly. Footnotes
|
There's And I disagree on uninstalls, as a general concept. If you're simply talking about uninstalls of namespace packages, then fair enough (I don't care much either way) but uninstalls in general are definitely an important thing to have, and "just blow away the environment" isn't a reasonable response to brokenness of uninstalls in general. |
I wasn't even aware 😄 But I see no reason to not treat that like
Maybe unreasonable, but it's what people do. That's the only point I was making. I never suggested further breaking or removing uninstall. All I was really pointing out is that this change doesn't break uninstalls further, so there's no reason to block it on those grounds. (Also worth noting that to properly fix uninstalls, we need to solve the conflict problem first, so this one comes first anyway.) |
From a conversation with @jaimergp at PackagingCon today, the way Conda has a flag for how to handle
We can probably add an option like that. |
I'm gonna do a bit of a rework of the wheel installation logic, once pypa/installer#153 is completed. |
Sorry I'm a bit green but how would one know this has occurred prior to some behavior? Is there something I can run before installation to avoid this? And is the environment ruined if I don't? Thanks |
Is this a security risk in that a malicious package can purposefully overwrite a file in another package? |
It's a security risk if you don't validate the contents of the package you're intending to install. Whether it then installs malicious code, runs a malicious build-time script, or overwrites existing files is equivalent risk. |
Pip version: any (tested on 1.5.4 and 9.0.1)
Python version: any (tested on 2.7.6 and 3.4.3)
Operating system: Should be OS independent (tested on Ubuntu 14.04.5 and Mint 17.3)
Description:
I'm installing packages with with
pip
, and it happens that some of these packages (with different names) have files with name clashes. Unfortunately,pip
silently ignores the issue and the results depend on the order in which the installs happens. In an ideal world, I would expect this to be impossible, i.e. to force the package maintainer to use a unique namespace. As a (far worse, but better than current behavior) second choice, I would expectpip
to warn the user before silently overwriting existing files, especially ifpip
itself installed those files.What I've run:
Create a clean slate to work on:
Install packages:
Lots of irrelevant stuff removed from log above (for the sake of making this issue easier to read). Analyzing what has been installed:
Now install silently conflicting package, and analyzing what happened:
As you can see,
pip
silently allow the two packages to overwrite each other'sjwt/exceptions.py
file (and in particular theInvalidKeyTypeError
is now gone, creating a whole lot of problem, depending on the order in which pip install ran)The text was updated successfully, but these errors were encountered: