Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve pip/setuptools/wheel installation #1007

Merged
merged 11 commits into from
Jul 29, 2020
Merged

Conversation

edmorley
Copy link
Member

@edmorley edmorley commented Jul 23, 2020

Notable changes:

  • The installed versions of setuptools/wheel are now deterministic
  • The build output now includes the versions of pip/setuptools/wheel used, making it easier to debug future upgrades
  • Errors during pip/setuptools/wheel install now correctly fail the build, and error output is no longer sent to /dev/null
  • Pip is now installed using itself instead of get-pip.py
  • Setuptools is installed from PyPI rather than a vendored copy
  • Various other small fixes

For more details (and easier review) see the individual commit messages and diffs.

Fixes #1000.
Fixes #1001.
Fixes #1002.
Fixes #1003.
Closes #939.
Closes #999.

@dzuelke
Copy link
Contributor

dzuelke commented Jul 23, 2020

I'd suggest two things:

  1. break this up into more than one commit (and more than one changelog line) if at all possible
  2. tests? ;)

@edmorley edmorley marked this pull request as draft July 24, 2020 17:23
@edmorley edmorley force-pushed the update-pip-setuptools-wheel branch from 776a1f9 to 40a337e Compare July 29, 2020 08:06
edmorley added 11 commits July 29, 2020 09:30
Since:
* "explicit is better than implicit"
* we'll soon be upgrading pip, and debugging breakage caused by upgrades
  will be easier if versions are visible in the build log

Closes #939.
And use the `$PYTHON_VERSION` calculated in `bin/steps/python` instead
of re-implementing the Python version handling.
Since:
* "explicit is better than implicit"
* we'll soon be upgrading setuptools, and debugging breakage caused by
  upgrades will be easier if versions are visible in the build log
Since:
* we'll be updating setuptools soon, and newer setuptools has dropped
  support for Python versions this buildpack needs to support. As such
  if we continued to vendor setuptools, we would need to vendor at
  least three different versions.
* we want to try and update setuptools more frequently than we have
  in the past, which will mean more repo bloat from binary churn.
* we're still pinning to a specific version, meaning vendoring doesn't
  have determinism benefits.
* setuptools is only fetched from PyPI for new installs (or where
  versions have changed), so this doesn't increase build time, load on
  PyPI, or reliance on PyPI in the common case.
* setuptools is already being inadvertently installed from PyPI prior to
  being installed from the vendored copy (see #1001), so we're in effect
  already using/depending on PyPI here.
* switching to storing setuptools on S3 wouldn't help reliability as
  much as it would appear at first glance, since the later `pip install`
  of customer dependencies will fail if PyPI is down anyway.
`get-pip.py` installs setuptools itself (if it's not already installed):
https://pip.pypa.io/en/stable/installing/#installing-with-get-pip-py
https://github.com/pypa/get-pip/blob/eff16c878c7fd6b688b9b4c4267695cf1a0bf01b/templates/default.py#L152-L153

This means that previously the latest version of setuptools (currently
`49.2.0`) was being installed from PyPI, and then immediately after the
target version (currently `39.0.1`) installed over it.

This added time to the build unnecessarily.

The version of setuptools installed by `get-pip.py` can be overridden
by passing in a version as a normal requirements specifier.

Fixes #1001.
Before:
- if `wheel` was not already installed, then `get-pip.py` would
  automatically install the latest version on PyPI, which is `0.34.2`
  (or `0.33.6` for Python 3.4).
- if `wheel` was already installed, then it was left unchanged
  regardless of the version installed.

Now:
- if `wheel` is not already installed, then the same versions will be
  installed as before, except these versions are pinned and will now not
  change unexpectedly after future `wheel` releases.
- if `wheel` is already installed, then it's upgraded/downgraded to the
  target version as needed.

Partly addresses #1000, though this change only helps builds where the
pip/setuptools/wheel install flow is triggered (currently only new apps
or ones where Python was purged or pip was not the correct version).

Since the wheel version is now known, it's output to the build log to
ease debugging and for parity with pip/setuptools.

The rest of #1000 will be fixed in later commits.
Since `get-pip.py` / pip will automatically detect and remove old
pip/setuptools versions if needed, so removing them manually is both not
necessary and slows down the build in the case where the pip version
changed, but setuptools remained the same.
Previously the pip/setuptools/wheel install step was skipped so long
as Python hadn't just been clean installed (ie so long as not a new app,
emptied cache, Python upgrade, stack change) and pip was the expected
version.

This meant that setuptool/wheel could be the wrong version (or even just
not installed at all), and this would not be corrected.

Now, we now use pip itself to determine whether the installed packages
are up to date, since parsing pip's output is fragile (eg #1003) and
would be tedious given there would be three packages to check.

Unfortunately `get-pip.py` uses `--force-reinstall` which means
performing this step every time is not the no-op it would otherwise be,
but this will be resolved by switching away from `get-pip.py` in the
next commit.

Fixes #1000.
Fixes #1003.
Closes #999.
`get-pip.py` is no longer used, since:
- It uses `--force-reinstall`, which is unnecessary here and slows down
  repeat builds (given we call pip install every time now). Trying to
  work around this by using `get-pip.py` only for the initial install,
  and real pip for subsequent updates would mean we lose protection
  against cached broken installs, plus significantly increase the
  version combinations test matrix.
- It means downloading pip twice (once embedded in `get-pip.py`, and
  again during the install, since `get-pip.py` can't install the
  embedded version directly).
- We would still have to manage several versions of `get-pip.py`, to
  support older Pythons (once we upgrade to newer pip).

We don't use `ensurepip` since:
- not all of the previously generated Python runtimes on S3 include it.
- we would still have to upgrade pip/setuptools afterwards.
- the versions of pip/setuptools bundled with ensurepip differ greatly
  depending on Python version, and we could easily start using a CLI
  flag for the first pip install before upgrade that isn't supported on
  all versions, without even knowing it (unless we test against hundreds
  of Python archives).

Instead we install pip using itself in wheel form. See:
pypa/pip#2351 (comment)

The new pip wheel assets on S3 were generated using:

```
$ pip download --no-cache pip==19.1.1
Collecting pip==19.1.1
  Downloading pip-19.1.1-py2.py3-none-any.whl (1.4 MB)
  Saved ./pip-19.1.1-py2.py3-none-any.whl
Successfully downloaded pip

$ pip download --no-cache pip==20.0.2
Collecting pip==20.0.2
  Downloading pip-20.0.2-py2.py3-none-any.whl (1.4 MB)
  Saved ./pip-20.0.2-py2.py3-none-any.whl
Successfully downloaded pip

$ aws s3 sync . s3://lang-python/common/ --exclude "*" --include "*.whl" --acl public-read --dryrun
(dryrun) upload: ./pip-19.1.1-py2.py3-none-any.whl to s3://lang-python/common/pip-19.1.1-py2.py3-none-any.whl
(dryrun) upload: ./pip-20.0.2-py2.py3-none-any.whl to s3://lang-python/common/pip-20.0.2-py2.py3-none-any.whl

$ aws s3 sync . s3://lang-python/common/ --exclude "*" --include "*.whl" --acl public-read
upload: ./pip-19.1.1-py2.py3-none-any.whl to s3://lang-python/common/pip-19.1.1-py2.py3-none-any.whl
upload: ./pip-20.0.2-py2.py3-none-any.whl to s3://lang-python/common/pip-20.0.2-py2.py3-none-any.whl
```
…eel (#1007)

Since the version check is redundant given we control/choose the version.

The pip cache is redundant since we instead cache site-packages. The pip
cache also ends up in `/app` so isn't included in the build cache anyway.
They are now displayed in the build output (instead of being sent to
`/dev/null`) and fail the build early instead of failing later in
`bin/steps/pip-install`.

Fixes #1002.
@edmorley edmorley force-pushed the update-pip-setuptools-wheel branch from 40a337e to eb37945 Compare July 29, 2020 08:57
@edmorley edmorley changed the title Update pip/setuptools/wheel and refactor installation Improve pip/setuptools/wheel installation Jul 29, 2020
@edmorley edmorley marked this pull request as ready for review July 29, 2020 09:19
@edmorley
Copy link
Member Author

I've:

  • moved the pip/setuptools version upgrades out of this PR
  • split up the commits
  • added some tests

@edmorley edmorley merged commit 00e70ff into master Jul 29, 2020
@edmorley edmorley deleted the update-pip-setuptools-wheel branch July 29, 2020 18:11
edmorley added a commit that referenced this pull request Jul 29, 2020
Since:
* "explicit is better than implicit"
* we'll soon be upgrading pip, and debugging breakage caused by upgrades
  will be easier if versions are visible in the build log

Closes #939.
edmorley added a commit that referenced this pull request Jul 29, 2020
And use the `$PYTHON_VERSION` calculated in `bin/steps/python` instead
of re-implementing the Python version handling.
edmorley added a commit that referenced this pull request Jul 29, 2020
Since:
* "explicit is better than implicit"
* we'll soon be upgrading setuptools, and debugging breakage caused by
  upgrades will be easier if versions are visible in the build log
edmorley added a commit that referenced this pull request Jul 29, 2020
Since:
* we'll be updating setuptools soon, and newer setuptools has dropped
  support for Python versions this buildpack needs to support. As such
  if we continued to vendor setuptools, we would need to vendor at
  least three different versions.
* we want to try and update setuptools more frequently than we have
  in the past, which will mean more repo bloat from binary churn.
* we're still pinning to a specific version, meaning vendoring doesn't
  have determinism benefits.
* setuptools is only fetched from PyPI for new installs (or where
  versions have changed), so this doesn't increase build time, load on
  PyPI, or reliance on PyPI in the common case.
* setuptools is already being inadvertently installed from PyPI prior to
  being installed from the vendored copy (see #1001), so we're in effect
  already using/depending on PyPI here.
* switching to storing setuptools on S3 wouldn't help reliability as
  much as it would appear at first glance, since the later `pip install`
  of customer dependencies will fail if PyPI is down anyway.
edmorley added a commit that referenced this pull request Jul 29, 2020
`get-pip.py` installs setuptools itself (if it's not already installed):
https://pip.pypa.io/en/stable/installing/#installing-with-get-pip-py
https://github.com/pypa/get-pip/blob/eff16c878c7fd6b688b9b4c4267695cf1a0bf01b/templates/default.py#L152-L153

This means that previously the latest version of setuptools (currently
`49.2.0`) was being installed from PyPI, and then immediately after the
target version (currently `39.0.1`) installed over it.

This added time to the build unnecessarily.

The version of setuptools installed by `get-pip.py` can be overridden
by passing in a version as a normal requirements specifier.

Fixes #1001.
edmorley added a commit that referenced this pull request Jul 29, 2020
Before:
- if `wheel` was not already installed, then `get-pip.py` would
  automatically install the latest version on PyPI, which is `0.34.2`
  (or `0.33.6` for Python 3.4).
- if `wheel` was already installed, then it was left unchanged
  regardless of the version installed.

Now:
- if `wheel` is not already installed, then the same versions will be
  installed as before, except these versions are pinned and will now not
  change unexpectedly after future `wheel` releases.
- if `wheel` is already installed, then it's upgraded/downgraded to the
  target version as needed.

Partly addresses #1000, though this change only helps builds where the
pip/setuptools/wheel install flow is triggered (currently only new apps
or ones where Python was purged or pip was not the correct version).

Since the wheel version is now known, it's output to the build log to
ease debugging and for parity with pip/setuptools.

The rest of #1000 will be fixed in later commits.
edmorley added a commit that referenced this pull request Jul 29, 2020
Since `get-pip.py` / pip will automatically detect and remove old
pip/setuptools versions if needed, so removing them manually is both not
necessary and slows down the build in the case where the pip version
changed, but setuptools remained the same.
edmorley added a commit that referenced this pull request Jul 29, 2020
Previously the pip/setuptools/wheel install step was skipped so long
as Python hadn't just been clean installed (ie so long as not a new app,
emptied cache, Python upgrade, stack change) and pip was the expected
version.

This meant that setuptool/wheel could be the wrong version (or even just
not installed at all), and this would not be corrected.

Now, we now use pip itself to determine whether the installed packages
are up to date, since parsing pip's output is fragile (eg #1003) and
would be tedious given there would be three packages to check.

Unfortunately `get-pip.py` uses `--force-reinstall` which means
performing this step every time is not the no-op it would otherwise be,
but this will be resolved by switching away from `get-pip.py` in the
next commit.

Fixes #1000.
Fixes #1003.
Closes #999.
edmorley added a commit that referenced this pull request Jul 29, 2020
`get-pip.py` is no longer used, since:
- It uses `--force-reinstall`, which is unnecessary here and slows down
  repeat builds (given we call pip install every time now). Trying to
  work around this by using `get-pip.py` only for the initial install,
  and real pip for subsequent updates would mean we lose protection
  against cached broken installs, plus significantly increase the
  version combinations test matrix.
- It means downloading pip twice (once embedded in `get-pip.py`, and
  again during the install, since `get-pip.py` can't install the
  embedded version directly).
- We would still have to manage several versions of `get-pip.py`, to
  support older Pythons (once we upgrade to newer pip).

We don't use `ensurepip` since:
- not all of the previously generated Python runtimes on S3 include it.
- we would still have to upgrade pip/setuptools afterwards.
- the versions of pip/setuptools bundled with ensurepip differ greatly
  depending on Python version, and we could easily start using a CLI
  flag for the first pip install before upgrade that isn't supported on
  all versions, without even knowing it (unless we test against hundreds
  of Python archives).

Instead we install pip using itself in wheel form. See:
pypa/pip#2351 (comment)

The new pip wheel assets on S3 were generated using:

```
$ pip download --no-cache pip==19.1.1
Collecting pip==19.1.1
  Downloading pip-19.1.1-py2.py3-none-any.whl (1.4 MB)
  Saved ./pip-19.1.1-py2.py3-none-any.whl
Successfully downloaded pip

$ pip download --no-cache pip==20.0.2
Collecting pip==20.0.2
  Downloading pip-20.0.2-py2.py3-none-any.whl (1.4 MB)
  Saved ./pip-20.0.2-py2.py3-none-any.whl
Successfully downloaded pip

$ aws s3 sync . s3://lang-python/common/ --exclude "*" --include "*.whl" --acl public-read --dryrun
(dryrun) upload: ./pip-19.1.1-py2.py3-none-any.whl to s3://lang-python/common/pip-19.1.1-py2.py3-none-any.whl
(dryrun) upload: ./pip-20.0.2-py2.py3-none-any.whl to s3://lang-python/common/pip-20.0.2-py2.py3-none-any.whl

$ aws s3 sync . s3://lang-python/common/ --exclude "*" --include "*.whl" --acl public-read
upload: ./pip-19.1.1-py2.py3-none-any.whl to s3://lang-python/common/pip-19.1.1-py2.py3-none-any.whl
upload: ./pip-20.0.2-py2.py3-none-any.whl to s3://lang-python/common/pip-20.0.2-py2.py3-none-any.whl
```
edmorley added a commit that referenced this pull request Jul 29, 2020
…eel (#1007)

Since the version check is redundant given we control/choose the version.

The pip cache is redundant since we instead cache site-packages. The pip
cache also ends up in `/app` so isn't included in the build cache anyway.
dryan pushed a commit to dryan/heroku-buildpack-python that referenced this pull request Nov 19, 2020
Since:
* "explicit is better than implicit"
* we'll soon be upgrading pip, and debugging breakage caused by upgrades
  will be easier if versions are visible in the build log

Closes heroku#939.
dryan pushed a commit to dryan/heroku-buildpack-python that referenced this pull request Nov 19, 2020
And use the `$PYTHON_VERSION` calculated in `bin/steps/python` instead
of re-implementing the Python version handling.
dryan pushed a commit to dryan/heroku-buildpack-python that referenced this pull request Nov 19, 2020
)

Since:
* "explicit is better than implicit"
* we'll soon be upgrading setuptools, and debugging breakage caused by
  upgrades will be easier if versions are visible in the build log
dryan pushed a commit to dryan/heroku-buildpack-python that referenced this pull request Nov 19, 2020
Since:
* we'll be updating setuptools soon, and newer setuptools has dropped
  support for Python versions this buildpack needs to support. As such
  if we continued to vendor setuptools, we would need to vendor at
  least three different versions.
* we want to try and update setuptools more frequently than we have
  in the past, which will mean more repo bloat from binary churn.
* we're still pinning to a specific version, meaning vendoring doesn't
  have determinism benefits.
* setuptools is only fetched from PyPI for new installs (or where
  versions have changed), so this doesn't increase build time, load on
  PyPI, or reliance on PyPI in the common case.
* setuptools is already being inadvertently installed from PyPI prior to
  being installed from the vendored copy (see heroku#1001), so we're in effect
  already using/depending on PyPI here.
* switching to storing setuptools on S3 wouldn't help reliability as
  much as it would appear at first glance, since the later `pip install`
  of customer dependencies will fail if PyPI is down anyway.
dryan pushed a commit to dryan/heroku-buildpack-python that referenced this pull request Nov 19, 2020
`get-pip.py` installs setuptools itself (if it's not already installed):
https://pip.pypa.io/en/stable/installing/#installing-with-get-pip-py
https://github.com/pypa/get-pip/blob/eff16c878c7fd6b688b9b4c4267695cf1a0bf01b/templates/default.py#L152-L153

This means that previously the latest version of setuptools (currently
`49.2.0`) was being installed from PyPI, and then immediately after the
target version (currently `39.0.1`) installed over it.

This added time to the build unnecessarily.

The version of setuptools installed by `get-pip.py` can be overridden
by passing in a version as a normal requirements specifier.

Fixes heroku#1001.
dryan pushed a commit to dryan/heroku-buildpack-python that referenced this pull request Nov 19, 2020
Before:
- if `wheel` was not already installed, then `get-pip.py` would
  automatically install the latest version on PyPI, which is `0.34.2`
  (or `0.33.6` for Python 3.4).
- if `wheel` was already installed, then it was left unchanged
  regardless of the version installed.

Now:
- if `wheel` is not already installed, then the same versions will be
  installed as before, except these versions are pinned and will now not
  change unexpectedly after future `wheel` releases.
- if `wheel` is already installed, then it's upgraded/downgraded to the
  target version as needed.

Partly addresses heroku#1000, though this change only helps builds where the
pip/setuptools/wheel install flow is triggered (currently only new apps
or ones where Python was purged or pip was not the correct version).

Since the wheel version is now known, it's output to the build log to
ease debugging and for parity with pip/setuptools.

The rest of heroku#1000 will be fixed in later commits.
dryan pushed a commit to dryan/heroku-buildpack-python that referenced this pull request Nov 19, 2020
Since `get-pip.py` / pip will automatically detect and remove old
pip/setuptools versions if needed, so removing them manually is both not
necessary and slows down the build in the case where the pip version
changed, but setuptools remained the same.
dryan pushed a commit to dryan/heroku-buildpack-python that referenced this pull request Nov 19, 2020
…u#1007)

Previously the pip/setuptools/wheel install step was skipped so long
as Python hadn't just been clean installed (ie so long as not a new app,
emptied cache, Python upgrade, stack change) and pip was the expected
version.

This meant that setuptool/wheel could be the wrong version (or even just
not installed at all), and this would not be corrected.

Now, we now use pip itself to determine whether the installed packages
are up to date, since parsing pip's output is fragile (eg heroku#1003) and
would be tedious given there would be three packages to check.

Unfortunately `get-pip.py` uses `--force-reinstall` which means
performing this step every time is not the no-op it would otherwise be,
but this will be resolved by switching away from `get-pip.py` in the
next commit.

Fixes heroku#1000.
Fixes heroku#1003.
Closes heroku#999.
dryan pushed a commit to dryan/heroku-buildpack-python that referenced this pull request Nov 19, 2020
`get-pip.py` is no longer used, since:
- It uses `--force-reinstall`, which is unnecessary here and slows down
  repeat builds (given we call pip install every time now). Trying to
  work around this by using `get-pip.py` only for the initial install,
  and real pip for subsequent updates would mean we lose protection
  against cached broken installs, plus significantly increase the
  version combinations test matrix.
- It means downloading pip twice (once embedded in `get-pip.py`, and
  again during the install, since `get-pip.py` can't install the
  embedded version directly).
- We would still have to manage several versions of `get-pip.py`, to
  support older Pythons (once we upgrade to newer pip).

We don't use `ensurepip` since:
- not all of the previously generated Python runtimes on S3 include it.
- we would still have to upgrade pip/setuptools afterwards.
- the versions of pip/setuptools bundled with ensurepip differ greatly
  depending on Python version, and we could easily start using a CLI
  flag for the first pip install before upgrade that isn't supported on
  all versions, without even knowing it (unless we test against hundreds
  of Python archives).

Instead we install pip using itself in wheel form. See:
pypa/pip#2351 (comment)

The new pip wheel assets on S3 were generated using:

```
$ pip download --no-cache pip==19.1.1
Collecting pip==19.1.1
  Downloading pip-19.1.1-py2.py3-none-any.whl (1.4 MB)
  Saved ./pip-19.1.1-py2.py3-none-any.whl
Successfully downloaded pip

$ pip download --no-cache pip==20.0.2
Collecting pip==20.0.2
  Downloading pip-20.0.2-py2.py3-none-any.whl (1.4 MB)
  Saved ./pip-20.0.2-py2.py3-none-any.whl
Successfully downloaded pip

$ aws s3 sync . s3://lang-python/common/ --exclude "*" --include "*.whl" --acl public-read --dryrun
(dryrun) upload: ./pip-19.1.1-py2.py3-none-any.whl to s3://lang-python/common/pip-19.1.1-py2.py3-none-any.whl
(dryrun) upload: ./pip-20.0.2-py2.py3-none-any.whl to s3://lang-python/common/pip-20.0.2-py2.py3-none-any.whl

$ aws s3 sync . s3://lang-python/common/ --exclude "*" --include "*.whl" --acl public-read
upload: ./pip-19.1.1-py2.py3-none-any.whl to s3://lang-python/common/pip-19.1.1-py2.py3-none-any.whl
upload: ./pip-20.0.2-py2.py3-none-any.whl to s3://lang-python/common/pip-20.0.2-py2.py3-none-any.whl
```
dryan pushed a commit to dryan/heroku-buildpack-python that referenced this pull request Nov 19, 2020
…eel (heroku#1007)

Since the version check is redundant given we control/choose the version.

The pip cache is redundant since we instead cache site-packages. The pip
cache also ends up in `/app` so isn't included in the build cache anyway.
dryan pushed a commit to dryan/heroku-buildpack-python that referenced this pull request Nov 19, 2020
They are now displayed in the build output (instead of being sent to
`/dev/null`) and fail the build early instead of failing later in
`bin/steps/pip-install`.

Fixes heroku#1002.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment