Skip to content

Use pre-built python as much as possible.#6079

Merged
pavera merged 16 commits intomainfrom
pavera/pre-compiled-python
Nov 18, 2022
Merged

Use pre-built python as much as possible.#6079
pavera merged 16 commits intomainfrom
pavera/pre-compiled-python

Conversation

@pavera
Copy link
Copy Markdown
Contributor

@pavera pavera commented Nov 8, 2022

Context

Currently we pre-install via pyenv a single version of python (usually the latest available from pyenv). We use this version if the project doesn't specify a specific version. If the project specifies a different version, we install that version just in time during the update job. This can take a significant amount of time (10-15 minutes). It also results in the updater reaching out to the public internet which for some GHES use cases is not desirable.

Approach

This PR pre-installs the latest patch version of all of the currently supported major.minor releases of Python and limits the version selection logic to major.minor so that we greatly limit the number of just in time pyenv install commands we issue.

I've modified the Dockerfile to pre-install multiple versions via pyenv and tar.gz the resulting directories to limit the size of the resulting docker image. When a request is made for a specific python version, I untar the archive just in time. This does increase the build time of the dependabot-core image. A possible optimization would be to build a pyenv specific docker image and make core a multi-stage build that simply pulls in the pyenv tarball. The approach I've taken also has the downside of every python job needing to pay the cost of unpacking the tarball which takes around 10-15s in my testing so I feel like that's acceptable to save 10-15 minutes in the worst cases.

@deivid-rodriguez
Copy link
Copy Markdown
Contributor

This is so great!

I actually thought of doing the same thing, but didn't come up with the idea of compressing the installation, so I discarded the idea because it would increase image size too much. Smart!

I'm not sure how Python considers versions of Python itself for resolution, so my question is, could this affect correctness in some cases, namely, when a project requires a specific patch level version of Python that's not the latest?

@pavera
Copy link
Copy Markdown
Contributor Author

pavera commented Nov 8, 2022

I'm not sure how Python considers versions of Python itself for resolution, so my question is, could this affect correctness in some cases, namely, when a project requires a specific patch level version of Python that's not the latest?

Yes, this is also a concern with this approach. I've been trying to think about ways to capture data related to this but haven't had any jump out at me. I don't know how common it is for projects themselves to specify a python patch version. From the testing I did the default in poetry, pipenv, and pip-compile is to only record the major.minor in the project metadata. In this case currently we install the latest patch version via pyenv for that major.minor pair. Expanding this to ignore patch versions specified in the project may cause issues for those specific projects.

@deivid-rodriguez
Copy link
Copy Markdown
Contributor

Would it make sense to limit the improvement for now to the additional Python versions being preinstalled, until we know better where and how we will regress?

@pavera pavera changed the title Use pre-build python as much as possible. Use pre-built python as much as possible. Nov 9, 2022
@pavera pavera mentioned this pull request Nov 9, 2022
1 task
Breaking out individual additional python versions in their own tarballs. This is because we need a functional installed pyenv for file parsing before we've determined the python version to use, little chicken and egg problem.

I've also made pipenv properly ignore patch versions of python in this commit.
@Kurt-von-Laven
Copy link
Copy Markdown
Contributor

Overall, I would suggest using the latest Python version that matches the project specification, and I agree that can be extremely performant for sufficiently recent Python versions when the project specifies the latest patch version or doesn't specify the patch version at all. When the project specifies a different patch version, it seems like it should be possible to install it reasonably quickly, certainly far faster than 10-15 minutes. actions/setup-python does more or less all of the optimizations you are describing and more in the context of actions/runner-images, but maybe that project can be reused on the dependabot-core image. The Ubuntu 22.04 runner image ships with the latest patch version of several recent versions of Python, so you may even be able to reuse some of their image configuration code. We use actions/setup-python in a composite action, so I don't have the best sense of how long it takes to install an arbitrary version, but I'd guess 5-60 seconds since our entire workflow typically runs in under 3 minutes.

That being said, I suspect it will rarely make problems to use the latest patch version if you decide to go that route. I co-maintain MegaLinter, which pins the Python version for all our users, because MegaLinter itself is predominantly written in Python and uses a Python Alpine base Docker image. This can create issues for projects that aren't on the same major and minor versions of Python as the version of MegaLinter they are running, but differences in the patch version have yet to be an issue. Of course, Dependabot's user base is much larger, so it probably includes some highly eccentric projects.

@pavera pavera marked this pull request as ready for review November 10, 2022 05:45
@pavera pavera requested a review from a team as a code owner November 10, 2022 05:45
This change is intended to improve cache hits and prevent re-building python as much as possible
Copy link
Copy Markdown
Contributor

@brrygrdn brrygrdn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@pavera pavera merged commit 57a4252 into main Nov 18, 2022
@pavera pavera deleted the pavera/pre-compiled-python branch November 18, 2022 19:54
@pavera pavera mentioned this pull request Nov 30, 2022
@jeffwidman
Copy link
Copy Markdown
Member

FTR, this added about 0.9 GB to the resulting updater image size. That's probably an acceptable tradeoff here for now, but will be nice when we can eventually split the docker images by ecosystem:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants