Use pre-built python as much as possible.#6079
Conversation
|
This is so great! I actually thought of doing the same thing, but didn't come up with the idea of compressing the installation, so I discarded the idea because it would increase image size too much. Smart! I'm not sure how Python considers versions of Python itself for resolution, so my question is, could this affect correctness in some cases, namely, when a project requires a specific patch level version of Python that's not the latest? |
Yes, this is also a concern with this approach. I've been trying to think about ways to capture data related to this but haven't had any jump out at me. I don't know how common it is for projects themselves to specify a python patch version. From the testing I did the default in poetry, pipenv, and pip-compile is to only record the |
|
Would it make sense to limit the improvement for now to the additional Python versions being preinstalled, until we know better where and how we will regress? |
Breaking out individual additional python versions in their own tarballs. This is because we need a functional installed pyenv for file parsing before we've determined the python version to use, little chicken and egg problem. I've also made pipenv properly ignore patch versions of python in this commit.
|
Overall, I would suggest using the latest Python version that matches the project specification, and I agree that can be extremely performant for sufficiently recent Python versions when the project specifies the latest patch version or doesn't specify the patch version at all. When the project specifies a different patch version, it seems like it should be possible to install it reasonably quickly, certainly far faster than 10-15 minutes. actions/setup-python does more or less all of the optimizations you are describing and more in the context of actions/runner-images, but maybe that project can be reused on the dependabot-core image. The Ubuntu 22.04 runner image ships with the latest patch version of several recent versions of Python, so you may even be able to reuse some of their image configuration code. We use actions/setup-python in a composite action, so I don't have the best sense of how long it takes to install an arbitrary version, but I'd guess 5-60 seconds since our entire workflow typically runs in under 3 minutes. That being said, I suspect it will rarely make problems to use the latest patch version if you decide to go that route. I co-maintain MegaLinter, which pins the Python version for all our users, because MegaLinter itself is predominantly written in Python and uses a Python Alpine base Docker image. This can create issues for projects that aren't on the same major and minor versions of Python as the version of MegaLinter they are running, but differences in the patch version have yet to be an issue. Of course, Dependabot's user base is much larger, so it probably includes some highly eccentric projects. |
This change is intended to improve cache hits and prevent re-building python as much as possible
|
FTR, this added about 0.9 GB to the resulting |
Context
Currently we pre-install via pyenv a single version of python (usually the latest available from pyenv). We use this version if the project doesn't specify a specific version. If the project specifies a different version, we install that version just in time during the update job. This can take a significant amount of time (10-15 minutes). It also results in the updater reaching out to the public internet which for some GHES use cases is not desirable.
Approach
This PR pre-installs the latest patch version of all of the currently supported
major.minorreleases of Python and limits the version selection logic tomajor.minorso that we greatly limit the number of just in timepyenv installcommands we issue.I've modified the Dockerfile to pre-install multiple versions via pyenv and tar.gz the resulting directories to limit the size of the resulting docker image. When a request is made for a specific python version, I untar the archive just in time. This does increase the build time of the dependabot-core image. A possible optimization would be to build a pyenv specific docker image and make core a multi-stage build that simply pulls in the pyenv tarball. The approach I've taken also has the downside of every python job needing to pay the cost of unpacking the tarball which takes around 10-15s in my testing so I feel like that's acceptable to save 10-15 minutes in the worst cases.