Ensure bundled pip does not get too out of date #1821
Comments
I have not seen any research to back up this claim, so I'll consider this as your opinion.
With PEP 517, introduced a few years back, build backends and their dependencies are handled by the build frontend, so the sole beneficiary of an auto-upgrade feature would be the frontend, in this case pip. A major reason we stopped downloading by default is that it slows down the creation process significantly, especially on slow networks. pip releases a new version every four months; there really is no reason to slow creation down every time (IMHO pip should not check for an upgrade every time either, but at some sane interval like every three months, maybe warn for a day, and then give up). To make sure people use the latest pip, it's easier to tell them to upgrade virtualenv than to force a pip upgrade every time. We do repackage virtualenv shortly after pip releases (giving it a few days after release to allow pip bugs to be fixed).
This was not a regression. It was a conscious decision, made with the freedom of a new major version, which allowed us to redefine defaults without considering the changes regressions or breaking.
I'm personally ambivalent about this change, but I feel that this is something of an overstatement. I was watching the development of the rewrite with interest, and I don't recall any discussion with the virtualenv maintainers about whether we should change the download behaviour. I'm OK with the idea that this was changed as part of a rewrite, and that reassessing defaults was a reasonable part of that. I'm also fine with looking at how this decision works out going forward (things have changed significantly enough that prior experience should be taken into account cautiously). But it's certainly the case that this change wasn't made with the consensus of all the virtualenv maintainers (conceded: most of us had neglected virtualenv for many years, but the project wasn't "abandoned"). Personally, environment creation speed was always a major concern for me, as it was always far slower on Windows than on Unix. With the rewrite using symlinks where available, and linking to a shared version of pip rather than downloading, that has changed drastically. Those are benefits that I wouldn't want to see removed. But as a pip maintainer, and as a user, I don't like the fact that my virtual environments do not always have an up-to-date pip. So I'm sympathetic to the argument for downloading by default. As an alternative, virtualenv could auto-update its internal copy of pip, so that the cost of downloading is only paid when a new pip is released, and the rest of the time the only cost is a version check against PyPI, similar to pip's selfcheck code. I don't know if this option was considered during the rewrite.
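A minimal sketch of the "version check against PyPI" idea above, assuming PyPI's JSON API (`https://pypi.org/pypi/<project>/json`, whose `info.version` field holds the latest release) as the data source. The helper names are hypothetical and this is not virtualenv's or pip's actual selfcheck code; the version comparison only handles plain numeric versions, where real code would use `packaging.version`.

```python
import json
import urllib.request

# PyPI JSON API endpoint; the payload's ["info"]["version"] is the latest release.
PYPI_JSON_URL = "https://pypi.org/pypi/{project}/json"


def version_key(version):
    """Turn a plain release string like '20.0.2' into a comparable tuple.

    Assumption: purely numeric release segments only (no rc/dev/post tags).
    """
    return tuple(int(part) for part in version.split("."))


def latest_from_payload(payload):
    """Extract the latest released version from a PyPI JSON API response."""
    return payload["info"]["version"]


def fetch_latest(project="pip", timeout=5.0):
    """Ask PyPI for the newest release of *project* (requires network access)."""
    url = PYPI_JSON_URL.format(project=project)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return latest_from_payload(json.load(resp))


def needs_refresh(cached_version, latest_version):
    """True if the locally cached pip wheel is older than the latest release."""
    return version_key(latest_version) > version_key(cached_version)
```

With this split, the only per-creation cost is one small metadata request; the wheel download itself happens only when `needs_refresh` returns True.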
To my mind, the issue is not about reproducibility, it is about environment build speed. We had the same discussion previously, when I first introduced installing the embedded pip and setuptools from wheels using the embedded copy of pip. I was swayed last time because the other costs of environment builds made the download a minor point (and that is mitigated by pip's wheel cache anyway). With the much faster environment creation nowadays, the argument for using the embedded version is much stronger. For contrast, I find the fact that the Python
That's true, it wasn't stated explicitly in the RFC, nor in a dedicated ticket. That being said, none of the maintainers reached out with any significant feedback as part of the RFC (#1366), or raised any concerns on the initial PR after the rewrite. Three months passed before someone even noticed it, so I don't believe this hurts anyone that much.
Dunno, the project had very little maintenance for a long time before I became a maintainer, so I'm not sure what you consider abandoned and what you consider neglected.
I still think that's overkill, and it introduces significant overhead with very little benefit. We want people to update semi-regularly, but we would not consider pip 20.0 outdated today (I hope, at least). IMHO checking for an update once every two to three months is more than enough. The minority of people who always want the latest version and are willing to pay the performance price for it can use
The packaging tools have long aimed for correctness over performance, and this hurts development. A major complaint about tox is that it's slow. It is slow mainly because all the downstream tools introduce small overheads here and there in the name of complete, always-guaranteed correctness. All these small extras add up, so I'd like to move away from introducing overhead where it's not a must. And in this case, I don't consider it a must.
Call it what you want. We had a number of people ask for the feature before it was implemented. I can find only one person who asked for it to stop.
It was a regression in expected behavior for me at least.
Correct. In fact, long term, I think we should stop installing anything but pip, but that's a discussion for another ticket.
This belief is wrong IMO. First off, assuming the change was OK because nobody noticed it or gave any feedback about it for 3 months is silly. The rewrite was a large change and it's super easy to miss something like a boolean being changed. Since it wasn't called out or discussed in any way, I just assumed it had been kept as is. Second, the problem is that virtualenv 20 is going to get bundled in some Ubuntu or Debian LTS release and people are going to be using that for years. So you cannot assume the happy case of "well, someone has a relatively modern virtualenv so they're going to have a relatively modern pip". Honestly, this makes me super frustrated with the rewrite in its entirety. I was super excited to see that work finally getting done, but now I feel like it's going to be a net negative for the ecosystem as a whole, something we're not going to really start feeling the pain from for a year or more, but once we do, we'll continue to feel it for years after that. This will almost certainly cause a huge version drift that makes it far more difficult to roll out new changes to the ecosystem (in particular anything that depends on support in pip), and it is going to single-handedly hold things back. If you're worried about performance, there are far, far better ways to handle that than reverting one of the best mechanisms we have for getting people updated.
I will also say, the reason I didn't notice it for 3 months is that for almost all of that time virtualenv 20 had the latest pip bundled inside of it. So again, you can't use that time to mean anything, because my expected behavior was happening (I created a virtual environment and got the latest pip). The released version of virtualenv has only just now fallen behind the released version of pip (this is the first pip release since then), and I noticed it almost immediately and wrote a note to myself to sit down and figure out what had changed to break my expectations. It wasn't until last night that I had a chance, and found out that the rewrite had reversed the default.
The second issue is basically going to apply to some users. We can look back and ask "what if this change had happened earlier?" Take a look at, say, Ubuntu 16.04, a still-supported OS that would currently be installing pip 18.1 instead of pip 20.0.2, and which remains supported for another year. In some cases the damage has already been done, too: Ubuntu 20.04 shipped with virtualenv 20, which means that for the next 7(?) years users on Ubuntu 20.04 are going to get pip 20.0.2 inside their virtual environments (and possibly for longer: extended security maintenance on 20.04 lasts until 2032).
This is not true. Debian (and Ubuntu, which derives from it) de-vendors the embedded pip (starting with virtualenv 20) and all of pip's dependencies, so it will use whatever the OS ships with, not what virtualenv ships with. Furthermore, I'd expect them to also upgrade virtualenv patch versions, which would include the updates; or do they freeze all applications for 5 years? If so, IMHO they should also patch it to download instead of using the embedded wheels. But as I said, they already patch it to not use what we embed.
As for revisiting that decision: at the moment, there is me and @asottile in favour of it. @pfmoore seems to lean more towards download false than true. So now we have a few more people speaking in favour of stopping it.
virtualenv 20 is a new major version that also significantly changes how virtual environments are created (mostly coming in line with how venv does things). Expecting it to be a drop-in replacement for virtualenv 16 is, IMHO, not fair. From your posts, the only system you are really worried about is Debian's LTS packaging. If that is the case, then why not raise the issue with them? Should we inflict some pain on everyone just for Debian users? Debian may be popular, but that does not mean it has the most users. But as I said above, they don't actually use our embedded wheels, and they would also patch out a download flag set to true, to force users onto the OS packages instead of whatever we ship. See pre-commit/pre-commit#1383 and https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956144 for some other issues this has caused.
People who set download to false would be affected just as much: rolling out a breaking change would break them just as much. So we would need to instruct all of them to either upgrade pip (or, more easily, upgrade virtualenv) or set download to true.
IMHO this isn't the best mechanism. A much better mechanism I think is what I was suggesting above:
And I'm open to implementing something like that, so I'd like to move the discussion in that direction rather than back to always downloading. IMHO a good middle ground is to, say, check for an upgrade of the embedded wheels once every two weeks, and upgrade only if there have been no new releases in the last 10 days. This guarantees that no broken release of pip/setuptools/wheel breaks people's systems (alternatively, always upgrade if download is True, which the user can set).
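The middle ground described above boils down to two small rules: throttle the check to once every two weeks, and only adopt the newest release once 10 days have passed without another release. A minimal sketch, assuming the release upload times are already available as a `version -> datetime` mapping (how they are fetched is out of scope); the function and constant names are hypothetical, not virtualenv's implementation.

```python
from datetime import datetime, timedelta, timezone

CHECK_INTERVAL = timedelta(days=14)  # "check ... once every two weeks"
QUIET_PERIOD = timedelta(days=10)    # "no releases in the last 10 days"


def should_check(last_checked, now, interval=CHECK_INTERVAL):
    """Only contact the index if the previous check is at least *interval* old."""
    return last_checked is None or now - last_checked >= interval


def pick_upgrade(release_times, current, now, quiet=QUIET_PERIOD):
    """Return the version to upgrade to, or None to keep the current wheel.

    release_times maps version string -> upload datetime (assumed input).
    The most recently uploaded release is adopted only once it has been
    out for at least *quiet*, i.e. nothing was released in that window,
    which leaves time for a quick bug-fix release to land first.
    """
    if not release_times:
        return None
    newest = max(release_times, key=release_times.get)
    if now - release_times[newest] < quiet:
        return None  # too fresh: wait for the next periodic check
    return newest if newest != current else None
```

For example, if pip 20.1.1 was uploaded 12 days ago, `pick_upgrade` returns `"20.1.1"`; if it was uploaded 3 days ago, it returns `None` and the embedded or cached wheel keeps being used.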
while I can't speak on behalf of my colleagues at my previous companies, repeatability was a strong design goal of our python application setups. by freezing all requirements and having hash verification on installation we believed we were "safe" from external changes. but at both of my previous employment locations we were severely burned (read: public outages costing lots of money) by the 14.x-16.x
this forced us to update many, many install sites which called
I see the change to restore download=False as the default behaviour as an overwhelmingly positive one towards the goals of speed and repeatability. And especially as a maintainer of two popular projects which utilize virtualenv (tox / pre-commit), the
fd: I recognize that developing software is difficult and I mean no hard feelings towards pip / setuptools in this message -- all software is buggy and it's impossible to handle every use case <3
It is true: I pulled those version numbers from the Ubuntu archive, and Ubuntu 20.04 is very unlikely to upgrade its pip or virtualenv to newer versions. Upgrading a package to a newer version, full stop, is not something that commonly happens in these distros, even to a new patch release. A plan that relies on that happening is not a good plan.
Those issues just look like bugs caused by bad interactions between patches, and are wholly unrelated to this discussion. As far as I am aware no distro has ever patched virtualenv to set
I don't just care about Debian/Ubuntu LTS releases; they're just the easiest place to discuss this change in relation to. This would also affect, say, CI systems like Travis, which rarely upgrade virtualenv in their base images. The places this affects are not all single entities capable of patching this behavior out.
Yes, but you don't need to get 100% of the ecosystem onto the newest version to pull the ecosystem forward. You're never going to do that, so it's a non-goal. What you attempt to do is make it so that as many people as possible upgrade to a newer release in a timely fashion, so that when package authors decide what features they can rely on when packaging their software, they don't look and see that 50% of the user base is using tooling too old to support some feature they need. Packaging tooling is unlike most other tooling in that the network effects of old versions are a major problem. If 50% of people are using a decade-old version of tox, that doesn't really affect anyone else. If 50% of people are using a decade-old version of pip, then every package author releasing to PyPI pretty much has to eschew new features completely.
If those were the only goals that mattered, then sure, but virtualenv's special place in the ecosystem means it has to care about the overall ecosystem health as well.
Instead we just revert to the distutils2 days, where hypothetically there are some new features that make things better, but nobody can actually use them because half the world is pinned to some ancient version of tooling. This isn't some brave new world where we don't understand the effects of these changes: we had this as the default for a long time, it was overall harmful, and we made the decision to change it. I took a quick look because I could not remember anyone having opened an issue asking for
Anyways, that's all mostly background to try and explain why it's important. As to the actual suggested change:
That seems fine to me. The important thing is less that every single invocation of
Presumably this would download the wheel into some local cache directory or something, and use that for future invocations of
Yes. We already don't install the embedded wheel directly, but an extracted version of it; the wheel itself is acquired either from the embedded copy or from a previously downloaded local version. See https://github.com/pypa/virtualenv/blob/master/src/virtualenv/seed/via_app_data/via_app_data.py#L61. Note that we already do most of this, excluding the periodic automatic update. See https://virtualenv.pypa.io/en/latest/cli_interface.html#pip. E.g., if an install with download enabled pulls in a newer version than the one installed, subsequent creations will use that latest version rather than the bundled one (see the
PS. I'm not 100% on board with this idea either, but I'm considering it seriously.
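To make the wheel-selection behaviour described above concrete, here is a minimal sketch of choosing between the embedded wheel and wheels previously downloaded into the app-data directory. The request values (`"embed"`, `"latest"`, or an exact version) mirror the modes discussed in this thread, but `resolve_wheel` and the file layout are hypothetical, not virtualenv's actual code; version comparison again assumes plain numeric versions.

```python
from pathlib import Path


def version_key(version):
    # Assumption: plain numeric versions such as "20.0.2".
    return tuple(int(part) for part in version.split("."))


def wheel_version(path):
    """Wheel file names are name-version-...; e.g. pip-20.1-py2.py3-none-any.whl."""
    return Path(path).name.split("-")[1]


def resolve_wheel(request, embedded, app_data):
    """Choose the wheel used to seed a new environment.

    request:  "embed", "latest", or an exact version such as "20.0.2"
    embedded: path of the wheel bundled with virtualenv
    app_data: paths of wheels previously downloaded into the app-data dir
    """
    candidates = [embedded, *app_data]
    if request == "embed":
        return embedded
    if request == "latest":
        # Newest wheel already available locally; no network access here,
        # so a previous download silently changes what gets seeded.
        return max(candidates, key=lambda p: version_key(wheel_version(p)))
    for path in candidates:
        if wheel_version(path) == request:
            return path
    raise LookupError(f"no local wheel for pip {request}")
```

Note how `"latest"` depends on whatever happens to be in the app-data directory, so two machines with the same virtualenv version can seed different pip versions.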
I'm not 100% in love with the above idea, but I'm willing to consider it as a middle ground. I would first really like to hear the opinion of some other people too that are convinced that
IMHO, and in my experience, what software developers care about is the following (at my company, Bloomberg, at least 3k+ engineers use Python; I'm part of the group of people who preach the usage of Python and communicate with the community, so I get a reasonable amount of feedback from people who care more about their work than about the Python ecosystem; for them, Python is just a tool to reach a goal):
As I understand it, the reason we would like to have download set to true is to improve the experience on point 3. That being said, we should not make points 1 and 2 significantly worse to make some progress on 3.
I would say that pinning is good when it is done explicitly. The problem with the current behavior is that it is done implicitly and the policy is far too global; in many cases it's not even pinning so much as relying on random state on the machine itself. Ultimately, though, I don't think this is the right tool to enable reproducibility, because it actually does a pretty poor job at it. To break that down further: the "pin" is currently a global pin that affects every virtual environment on that machine. This means that it is both too coarsely grained and too finely grained at the same time. Because it covers the entire machine, two distinct projects cannot have different versions pinned (unless they go out of their way to have an explicit pin). It would be silly to expect end users to manage their pip version by modifying a global virtualenv installation. Likewise, because it's on a per-machine level (effectively, anyway, since virtualenv in virtualenv is rare), people are going to get different results if they run the same project on two different machines that happen to have different versions of virtualenv installed. Thus, if you want pinning in virtualenv, the right tool for that is an explicit pin that each individual project controls. This appears to be a new (and good) feature of virtualenv, which allows you to pass an explicit version of pip to install. A tool like tox could even add explicit configuration for this instead of relying on implicit behavior.
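As an illustration of the per-project pin argued for above, a minimal sketch of how a tool could turn project configuration into an explicit `--pip` argument for virtualenv. The `[seed]` section and its `pip` key are invented for this example (not an existing tox or virtualenv setting); only `--pip` itself is the virtualenv option referred to in this thread, so check `virtualenv --help` for your version.

```python
import configparser


def virtualenv_argv(config_text, env_dir):
    """Build a virtualenv command line from a hypothetical per-project config.

    The [seed] section and its "pip" key are illustrative assumptions.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    argv = ["virtualenv", env_dir]
    pin = parser.get("seed", "pip", fallback=None)
    if pin:
        # Explicit per-project pin: the seeded pip no longer depends on
        # which virtualenv version happens to be installed on the machine.
        argv += ["--pip", pin]
    return argv
```

A project without the setting falls back to virtualenv's default, so the pin is opt-in and visible in the project's own files rather than implied by machine state.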
Actually, if I'm reading this correctly, the current behavior doesn't even reliably pin implicitly: by default it's going to use the latest version it can find in the app-dir. That means if I ever install a newer version via the
Yes, you're right. That's a bug though 😅 That being said, the above proposal would suffer from the same issue, which is why I'm not 100% in on it. I'm still mostly tempted to default to manual updates, as IMHO that will cause the least friction for developers working with the packaging ecosystem and most likely provide the most stable experience. Projects wanting to use new features can always upgrade at their own speed within 🤷♂️
Is it really a bug? I would assume that was an intended difference between bundled and app-dir, as the --pip default is latest. What would latest mean if not the latest version?
My point isn’t that this is bad exactly. It’s that relying on the virtualenv version as a crude mechanism to pin your pip version is fraught with problems, so we shouldn’t concern ourselves with that. There is a mechanism for actually pinning your pip version built into virtualenv, so people who want pinning should be directed to use that. That frees the default behavior to allow periodically pulling in the latest version and storing it in your app dir.
On May 8, 2020, at 6:36 PM, Bernát Gábor wrote:
Yes, you're right. That's a bug though 😅
That being said the above proposal would suffer from the same issue, hence why not 100 in on it. I'm still mostly tempted to default to manual updates, as that will cause the least friction for developers with the packaging ecosystem IMHO and most likely provide the most stable experience.
While in the abstract your point sounds OK, in practice what both Anthony and I enumerated seems to work a lot better than forcing the latest pip to always be installed. A periodic upgrade is probably OK, though I want to think it over and hear whether other people agree too.
Can folks please use the 👍 reaction (on this post) to indicate that they're OK with this? If you're not OK with this, please respond below with why.
I'm not keen on going back to downloading, as it would make environment builds slower and get messy on systems without a network connection (or with a proxy that Python isn't configured for, or...; all of which are real issues for me). There's a load of more nuanced discussion about this on Discourse, but essentially I am in favour of "use a cached version" by default; if it helps satisfy people who want the latest version, I'm OK with (somehow) regularly updating the cache. By far my biggest concern is that the default behaviour should not sit there on my PC waiting for 2 minutes for the proxy to finally report back that, if Python's not willing to speak the NTLM protocol, then it doesn't feel like letting me see the internet. And no, I don't know how you could "quickly check" that this isn't going to happen :-(
(not going to fully reiterate my comments above) tl;dr (in no particular order):
We discussed this on Discourse. There will not be any download by default. The current behaviour I'm planning for is to check for updates every two weeks. Even then, the check won't happen inline; instead it triggers a background process that does it. The result would only become active when a subsequent creation finds a newer build ready to go. This means that CI environments that don't save the cache folder would always use the embedded version; they're encouraged to upgrade virtualenv itself to get newer versions, or to force an in-place download. End users would get periodic updates even if they don't upgrade virtualenv. Feel free to give feedback on this plan.
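A minimal sketch of the plan above: record when the last check happened, and when it is more than two weeks old, start the update in a separate process so environment creation never waits on the network. The stamp file name and the updater code passed in are hypothetical placeholders; this illustrates the scheduling idea, not how virtualenv implements it.

```python
import subprocess
import sys
import time
from pathlib import Path

STAMP = "last-update-check"           # hypothetical marker file in the app-data dir
INTERVAL_SECONDS = 14 * 24 * 60 * 60  # "check for update every two weeks"


def due_for_check(app_data, now=None):
    """True if no check was recorded or the last one is at least two weeks old."""
    stamp = Path(app_data) / STAMP
    now = time.time() if now is None else now
    return not stamp.exists() or now - stamp.stat().st_mtime >= INTERVAL_SECONDS


def trigger_background_update(app_data, updater_code):
    """Record the check and start the update without blocking env creation.

    updater_code is Python source for a hypothetical updater that downloads
    newer wheels into app_data; a *later* creation uses them once present.
    The caller is not expected to wait() on the returned process.
    """
    stamp = Path(app_data) / STAMP
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.touch()  # mark the check time even if the update later fails
    return subprocess.Popen(
        [sys.executable, "-c", updater_code, str(app_data)],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

Because the stamp is written before the update starts, an offline machine or a slow proxy costs at most one failed background attempt per interval rather than a delay on every creation.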
Link to the discussion: https://discuss.python.org/t/how-should-virtualenv-behave-respective-of-seed-packages-e-g-pip-by-default/4146. One thing the overwhelming majority agreed on is not to download by default. Opinion was split on periodically updating vs. always using the embedded version.
Huh. The responses seem to be about "download by default", which isn't what I asked about? I guess that's because the comment I quoted mentions "every single invocation of virtualenv gets the absolute most up to date version of pip"? I mean, the rest of the quote says that this specific behavior is not as important as actually doing periodic updates (which is why I quoted it), but that point seems to have been missed. I'm confused by the reactionary comments here, TBH. FWIW, I'd read this issue and the Discourse thread before posting that comment, and wanted to move the discussion forward by checking whether we agree that virtualenv should do timely updates of pip, even if virtualenv itself isn't updated. I'm just gonna step away from this. Everyone here seems to have strong feelings about this topic, and engaging here just isn't gonna be worthwhile for me. :)
This now has been released under https://virtualenv.pypa.io/en/20.0.24/changelog.html#v20-0-24-2020-06-22 |
https://build.opensuse.org/request/show/819158 by user scarabeus_iv + dimstar_suse

- Add patch from upstream to fix one failing test:
  * tests.patch
- Add missing dependencies
- Skip online test test_seed_link_via_app_data
- update to 20.0.25:
  * Fix that when the ``app-data`` seeders image creation fails the exception is silently ignored. Avoid two virtual environment creations to step on each other's toes by using a lock while creating the base images. By :user:`gaborbernat`. (`#1869 <https://github.com/pypa/virtualenv/issues/1869>`_)
  * Ensure that the seeded packages do not get too much out of date:
    + More details under :ref:`wheels` - by :user:`gaborbernat`. (`#1821 <https://github.com/pypa/virtualenv/issues/1821>`_)
  * Upgrade embed wheel content:
    + - ship wheels for Python ``3.9`` and ``3.10``
    + - upgrade setuptools for Python ``3.5+`` from ``47.1.1`` t
Issue
Prior to the rewrite of virtualenv, it would always ensure that the latest version of pip was installed. If I understand correctly, with the rewrite this behavior has regressed to installing ONLY the version of pip bundled with virtualenv.
The health of the packaging ecosystem depends in part on getting users onto newer versions of the packaging tools as quickly as possible. The decision to install the latest version of pip in virtualenv was made in part to support that. By reverting this feature to its legacy behavior, we're making it harder to move Python packaging forward as a whole.
From a user point of view, before we enabled download by default, a common complaint was that people would create a virtual environment and, the first time they ran a pip command, be told to upgrade pip. A number of users reported that the very first thing they did upon entering a new virtual environment was upgrade pip, which slowed things down because they had to uninstall and upgrade over the pip that had just been installed.
Users who wish to have reproducibility were in the minority and
It is, in my opinion, a fairly major regression in virtualenv, and I would strongly suggest bringing back the intended behavior.