Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setuptools is first installed from PyPI then installed again using the vendored version #1001

Closed
edmorley opened this issue Jul 22, 2020 · 1 comment · Fixed by #1007
Closed
Assignees

Comments

@edmorley
Copy link
Member

edmorley commented Jul 22, 2020

The Python setup step uses get-pip.py to install pip, here:

/app/.heroku/python/bin/python "$GETPIP_PY" pip=="$PIP_UPDATE" &> /dev/null

Afterwards the vendored setuptools wheel is installed, here:

/app/.heroku/python/bin/pip install "$ROOT_DIR/vendor/setuptools-39.0.1-py2.py3-none-any.whl" &> /dev/null

However get-pip.py actually installs setuptools itself (if it's not already installed):
https://pip.pypa.io/en/stable/installing/#installing-with-get-pip-py
https://github.com/pypa/get-pip/blob/eff16c878c7fd6b688b9b4c4267695cf1a0bf01b/templates/default.py#L152-L153

This means the latest version of setuptools (currently 49.2.0) is installed from PyPI, and then immediately after the vendored copy (currently 39.0.1) installed over it.

This adds time to the build unnecessarily, and negates any reliability/security benefits of vendoring setuptools.

We should either:

  1. Disable the automatic get-pip.py setuptools installation using --no-setuptools, and install only the vendored version
  2. Stop vendoring setuptools and instead pin the setuptools version by passing setuptools==VERSION to get-pip, so the copy it downloads from PyPI is the version we want rather than latest
@edmorley edmorley self-assigned this Jul 22, 2020
edmorley added a commit that referenced this issue Jul 23, 2020
The versions installed by the buildpack have been updated as follows:
* pip:
  - If using Python 3.4: No change (already using the last to support 3.4)
  - If using pipenv: No change (need to update to a newer pipenv first)
  - For everything else: `20.0.2` -> `20.1.1`
* setuptools:
  - If using Python 3.4: `39.0.1` -> `43.0.0` (latest for 3.4)
  - If using Python 2.7: `39.0.1` -> `44.1.1` (latest for 2.7)
  - For everything else: `39.0.1` -> `47.1.1` (until #1006 fixed)
* wheel:
  - If using Python 3.4: `unpinned` -> `0.33.6`
  - For everything else: `unpinned` -> `0.34.2`

This fixes #949 and fixes #1005, and means packages that rely on newer
setuptools will now install successfully.

Changelogs:
https://pip.pypa.io/en/stable/news/
https://setuptools.readthedocs.io/en/latest/history.html#v47-1-1
https://wheel.readthedocs.io/en/latest/news.html

In addition:
* Installed versions are now deterministic (fixes #1000, fixes #1003)
* The build output now includes the versions used, making it easier to
  debug future upgrades (closes #939)
* Errors during pip/setuptools/wheel install now correctly fail the
  build, and stderr is no longer sent to `/dev/null` (fixes #1002)
* Setuptools is no longer installed twice (fixes #1001)
* Everything that is downloaded is now used (fixes #999)
* `--no-cache` and `--disable-version-check` are now used, saving
  unnecessary work and preventing creation of unwanted files in `/app`
* The `PIP_UPDATE` env var no longer leaks into subprocesses.

As part of fixing version pinning, we now use pip itself to determine
whether the installed packages are up to date, since parsing pip's
output is fragile (eg #1003).

This means `pip install` is now called every time, however this is a
no-op for repeat builds where the versions have not changed, since
unless `--upgrade` is specified pip does not hit the index (PyPI) if
requirements are satisfied.

For the installation itself `get-pip.py` is no longer used, since:
- It uses `--force-reinstall`, which is unnecessary here and would slow
  down repeat builds (given we call pip install every time now).
  Trying to work around this by using `get-pip.py` only for the initial
  install, and real pip for subsequent updates would mean we lose
  protection against cached broken installs, plus significantly
  increase the version combinations test matrix.
- It means downloading pip twice (once embedded in `get-pip.py`, and
  again during the install, since `get-pip.py` can't install the
  embedded version directly)
- We would still have to manage several versions of get-pip.py, to
  support older Pythons.

We don't use `ensurepip` since:
- Not all of the previously generated Python runtimes on S3 include it
- We would still have to upgrade pip afterwards
- The versions of pip/setuptools bundled with ensurepip differ greatly
  depending on Python version, and we could easily start using a CLI
  flag for the first pip install before upgrade that isn't supported
  on all versions, without even knowing it (unless we test against
  hundreds of Python archives).

The new pip wheel assets on S3 were generated using:

```
$ pip download --no-cache pip==19.1.1
Collecting pip==19.1.1
  Downloading pip-19.1.1-py2.py3-none-any.whl (1.4 MB)
  Saved ./pip-19.1.1-py2.py3-none-any.whl
Successfully downloaded pip

$ pip download --no-cache pip==20.1.1
Collecting pip==20.1.1
  Downloading pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
  Saved ./pip-20.1.1-py2.py3-none-any.whl
Successfully downloaded pip

$ aws s3 sync . s3://lang-python/common/ --exclude "*" --include "*.whl" --acl public-read --dryrun
(dryrun) upload: ./pip-19.1.1-py2.py3-none-any.whl to s3://lang-python/common/pip-19.1.1-py2.py3-none-any.whl
(dryrun) upload: ./pip-20.1.1-py2.py3-none-any.whl to s3://lang-python/common/pip-20.1.1-py2.py3-none-any.whl

$ aws s3 sync . s3://lang-python/common/ --exclude "*" --include "*.whl" --acl public-read
upload: ./pip-19.1.1-py2.py3-none-any.whl to s3://lang-python/common/pip-19.1.1-py2.py3-none-any.whl
upload: ./pip-20.1.1-py2.py3-none-any.whl to s3://lang-python/common/pip-20.1.1-py2.py3-none-any.whl
```
@edmorley
Copy link
Member Author

In #1007 I went with option (2) above.

edmorley added a commit that referenced this issue Jul 29, 2020
Since:
* we'll be updating setuptools soon, and newer setuptools has dropped
  support for Python versions this buildpack needs to support. As such
  if we continued to vendor setuptools, we would need to vendor at
  least three different versions.
* we want to try and update setuptools more frequently than we have
  in the past, which will mean more repo bloat from binary churn.
* we're still pinning to a specific version, meaning vendoring doesn't
  have determinism benefits.
* setuptools is only fetched from PyPI for new installs (or where
  versions have changed), so this doesn't increase build time, load on
  PyPI, or reliance on PyPI in the common case.
* setuptools is already being inadvertently installed from PyPI prior to
  being installed from the vendored copy (see #1001), so we're in effect
  already using/depending on PyPI here.
* switching to storing setuptools on S3 wouldn't help reliability as
  much as it would appear at first glance, since the later `pip install`
  of customer dependencies will fail if PyPI is down anyway.
edmorley added a commit that referenced this issue Jul 29, 2020
`get-pip.py` installs setuptools itself (if it's not already installed):
https://pip.pypa.io/en/stable/installing/#installing-with-get-pip-py
https://github.com/pypa/get-pip/blob/eff16c878c7fd6b688b9b4c4267695cf1a0bf01b/templates/default.py#L152-L153

This means that previously the latest version of setuptools (currently
`49.2.0`) was being installed from PyPI, and then immediately after the
target version (currently `39.0.1`) installed over it.

This added time to the build unnecessarily.

The version of setuptools installed by `get-pip.py` can be overridden
by passing in a version as a normal requirements specifier.

Fixes #1001.
edmorley added a commit that referenced this issue Jul 29, 2020
Since:
* we'll be updating setuptools soon, and newer setuptools has dropped
  support for Python versions this buildpack needs to support. As such
  if we continued to vendor setuptools, we would need to vendor at
  least three different versions.
* we want to try and update setuptools more frequently than we have
  in the past, which will mean more repo bloat from binary churn.
* we're still pinning to a specific version, meaning vendoring doesn't
  have determinism benefits.
* setuptools is only fetched from PyPI for new installs (or where
  versions have changed), so this doesn't increase build time, load on
  PyPI, or reliance on PyPI in the common case.
* setuptools is already being inadvertently installed from PyPI prior to
  being installed from the vendored copy (see #1001), so we're in effect
  already using/depending on PyPI here.
* switching to storing setuptools on S3 wouldn't help reliability as
  much as it would appear at first glance, since the later `pip install`
  of customer dependencies will fail if PyPI is down anyway.
edmorley added a commit that referenced this issue Jul 29, 2020
`get-pip.py` installs setuptools itself (if it's not already installed):
https://pip.pypa.io/en/stable/installing/#installing-with-get-pip-py
https://github.com/pypa/get-pip/blob/eff16c878c7fd6b688b9b4c4267695cf1a0bf01b/templates/default.py#L152-L153

This means that previously the latest version of setuptools (currently
`49.2.0`) was being installed from PyPI, and then immediately after the
target version (currently `39.0.1`) installed over it.

This added time to the build unnecessarily.

The version of setuptools installed by `get-pip.py` can be overridden
by passing in a version as a normal requirements specifier.

Fixes #1001.
dryan pushed a commit to dryan/heroku-buildpack-python that referenced this issue Nov 19, 2020
Since:
* we'll be updating setuptools soon, and newer setuptools has dropped
  support for Python versions this buildpack needs to support. As such
  if we continued to vendor setuptools, we would need to vendor at
  least three different versions.
* we want to try and update setuptools more frequently than we have
  in the past, which will mean more repo bloat from binary churn.
* we're still pinning to a specific version, meaning vendoring doesn't
  have determinism benefits.
* setuptools is only fetched from PyPI for new installs (or where
  versions have changed), so this doesn't increase build time, load on
  PyPI, or reliance on PyPI in the common case.
* setuptools is already being inadvertently installed from PyPI prior to
  being installed from the vendored copy (see heroku#1001), so we're in effect
  already using/depending on PyPI here.
* switching to storing setuptools on S3 wouldn't help reliability as
  much as it would appear at first glance, since the later `pip install`
  of customer dependencies will fail if PyPI is down anyway.
dryan pushed a commit to dryan/heroku-buildpack-python that referenced this issue Nov 19, 2020
`get-pip.py` installs setuptools itself (if it's not already installed):
https://pip.pypa.io/en/stable/installing/#installing-with-get-pip-py
https://github.com/pypa/get-pip/blob/eff16c878c7fd6b688b9b4c4267695cf1a0bf01b/templates/default.py#L152-L153

This means that previously the latest version of setuptools (currently
`49.2.0`) was being installed from PyPI, and then immediately after the
target version (currently `39.0.1`) installed over it.

This added time to the build unnecessarily.

The version of setuptools installed by `get-pip.py` can be overridden
by passing in a version as a normal requirements specifier.

Fixes heroku#1001.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant