Design document for new Docker images structure #7566
- Explains the current situation
- Mentions problems we already had
- Proposes a new and different naming scheme for images
- Lists ideas to support custom Docker images
I didn't review this super closely, but I commented on the bits that caught my eye. I think it's a good start, and I mostly just had some naming feedback.
.. note::

   I don't think it's useful to have ``ubuntu20-py37`` exposed to users,
Wouldn't that make the build images much smaller? Seems worth doing, no?
Each Python version is ~400 MB and we currently have 6 versions, so this would save ~2 GB:

```
# readthedocs/build:latest
docs@71578174d2ac:~/.pyenv$ du -hs versions/*
382M versions/2.7.18
449M versions/3.5.10
438M versions/3.6.12
451M versions/3.7.9
383M versions/3.8.6
161M versions/pypy3.5-7.0.0
```
Reducing image size is good, but we need to think about the complexity introduced by building one image per Python version, per OS version, and with/without PDF support (24 images with 6 Python versions and 2 OS versions supported, and 28 if we add py3.9). In this scenario, if we need to add something to the base image, we need to rebuild a lot of images. We need to find a balance, weighing the pros and cons of each scenario. What pros and cons can you think of?
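To make the arithmetic above concrete, here is a minimal sketch that enumerates the image matrix. The Python tags mirror the versions in the `du` output, the OS names follow the `ubuntu18`/`ubuntu20` convention used in this thread, and the `-nopdf` suffix is a hypothetical naming choice, not something the project has settled on.

```python
from itertools import product

# Tags derived from the versions listed in the `du` output above.
PYTHON_TAGS = ["py27", "py35", "py36", "py37", "py38", "pypy35"]
OS_VERSIONS = ["ubuntu18", "ubuntu20"]
# Hypothetical suffix for the variant without PDF support.
PDF_VARIANTS = ["", "-nopdf"]


def image_matrix(python_tags, os_versions=OS_VERSIONS, pdf_variants=PDF_VARIANTS):
    """Return every image name in the OS x Python x PDF matrix."""
    return [
        f"{os_name}-{tag}{pdf}"
        for os_name, tag, pdf in product(os_versions, python_tags, pdf_variants)
    ]


current = image_matrix(PYTHON_TAGS)
with_py39 = image_matrix(PYTHON_TAGS + ["py39"])
print(len(current), len(with_py39))  # 24 28
```

Every image in this list would need a rebuild whenever the base image changes, which is the cost being weighed here.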
I thought this was the main change we wanted to implement and I would be a big +1 on per python version images.
Pros/cons I see are:
👍 No more monolith images
👍 Changing the images would be less risky because images are more isolated
👍 If we want to make a small change to one image, we don't need to generate large images
👎 We have to build a pile of images on base image change
👍 Building the pile of images will be automated, Dockerfiles can be generated or automated with env vars
👍 We don't need the `nopdf` abstraction because end image size is already smaller
👎 Mixed environment images are .. weird
👍 Explicit versioning removes the need for versioning of the Docker images like we have been doing
The monolith image is where we notice the most friction now. Making a change to the monoliths is risky because of the high number of changes that can be introduced. Narrowing the focus of the images reduces the potential for side effects and allows us to make a small change (python minor version change) and only affect 1 image.
The pile of images is probably not much of an issue. The base image build will speed this process up, and we can automate the generation of Dockerfiles or the build process rather easily.
The `nopdf` abstraction becomes unnecessary if our build images are small. I am very happy to reduce work here and cut out the `nopdf` abstraction entirely.
I brought up an alternative example of versioning in #7620. This was closer to what I thought we were originally discussing.
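As a rough illustration of the "Dockerfiles can be generated" point, here is a minimal sketch that renders one Dockerfile per Python version from a template. The base image name `readthedocs/build:<os>-base` and the exact `pyenv` steps are assumptions made for the example, not the project's real Dockerfile.

```python
from string import Template

# Hypothetical template: base image name and pyenv steps are assumptions.
# `$$` is an escaped `$`, so `${PYTHON_VERSION}` survives into the Dockerfile.
DOCKERFILE_TEMPLATE = Template(
    "FROM readthedocs/build:${os_name}-base\n"
    "ARG PYTHON_VERSION=${python_version}\n"
    "RUN pyenv install $${PYTHON_VERSION} && pyenv global $${PYTHON_VERSION}\n"
)


def render_dockerfiles(os_name, python_versions):
    """Map image name -> Dockerfile text, one entry per Python version."""
    dockerfiles = {}
    for version in python_versions:
        major, minor = version.split(".")[:2]
        image = f"{os_name}-py{major}{minor}"
        dockerfiles[image] = DOCKERFILE_TEMPLATE.substitute(
            os_name=os_name, python_version=version
        )
    return dockerfiles


dockerfiles = render_dockerfiles("ubuntu20", ["3.7.9", "3.8.6"])
print(sorted(dockerfiles))
```

With something like this, a Python patch-version bump only changes the input list, and only the affected image needs to be rebuilt.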
.. Taken from https://github.com/readthedocs/readthedocs-docker-images/blob/master/Dockerfile

* ``ubuntu20-base``
`ubuntu20` is a confusing name. We should be explicit. Is this Ubuntu 20.04? If so, it should be `ubuntu-20.04-base` or something.
We use only Ubuntu LTS versions, so `ubuntu20` is Ubuntu 20.04 and `ubuntu22` will be Ubuntu 22.04.
I think we could use the Docker structure for labels and name this `ubuntu-base:20.04` or `ubuntu-base:20`.
Oh, this is already the "version" part of the images; I'm not sure if it's allowed to use `:` again here. Or maybe we can start naming the images as `readthedocs/build-ubuntu-base:20.04`, or maybe take some inspiration from CircleCI: https://circleci.com/docs/2.0/circleci-images/
`ubuntu-base:20.04` is the most conventional and understandable in my opinion. Even if you only use LTS versions, the Ubuntu version is not 20 but 20.04. Reducing it to 20 is an unnecessary and confusing convention, and Ubuntu can change its naming standards as it wishes.
I'm not super worried about how we will tag our own images; the most important thing here is how we will expose them to users via the config file (`build.image`).
I'm fine taking some inspiration from CircleCI and tagging them as `rtd/ubuntu20:py37`. The main point here is that it will be exposed as `ubuntu20:py37` or `ubuntu20-py37`, which is almost the same to the end user.
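The split between the user-facing name and the internal tag can be sketched as a small translation step. This is only an illustration: the regular expression, the function name `docker_reference`, and the `rtd` namespace default are assumptions based on the examples in this thread.

```python
import re

# Hypothetical pattern for the user-facing `build.image` value,
# e.g. "ubuntu20-py37" or "ubuntu20-base".
IMAGE_RE = re.compile(r"^(?P<os>ubuntu\d{2})-(?P<variant>py\d{2,3}|base)$")


def docker_reference(build_image, namespace="rtd"):
    """Translate a user-facing image name into a `namespace/repo:tag` reference."""
    match = IMAGE_RE.match(build_image)
    if match is None:
        raise ValueError(f"Unknown build image: {build_image!r}")
    return f"{namespace}/{match['os']}:{match['variant']}"


print(docker_reference("ubuntu20-py37"))  # rtd/ubuntu20:py37
```

Keeping the translation in one place means the internal tagging scheme can change later without touching what users write in their config file.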
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

At some point an old version of Python will be deprecated (eg. 3.4) and will be removed from our Docker images.
These versions should only be removed when the OS in the ``base`` is upgraded (eg. from ``ubuntu20`` to ``ubuntu22``).
This seems like a good reason to have explicit python versioned images.
How long do we want to support old Python versions? That's the question we need to answer. If we have explicit Python-versioned images, when are we going to remove the image `ubuntu18-py34` from servers?
The "when" is what we need a plan for. Here, I'm suggesting doing this breaking change together with other breaking changes (eg. OS upgrade), but we can keep the same Python versions from `ubuntu20` in `ubuntu22` if we want.
Talking through this, we can start without python versions on the docker image names (eg.
I think this approach is a solid upgrade, and will make a better experience for users, and we can adjust as we go. Most importantly, it gives us a lot of new options, without removing much in the way of future options.
I think OS versioning does accomplish something similar to latest/stable versioning, but it does feel like we've lost some of the versioning protection. Because I'm -1 on the `nopdf` image, the `base` image is only really useful for custom user images -- we don't get the benefit. Because of this, to me, the end outcome seems mostly limited to renaming `latest` and `stable` to `ubuntu20` and `ubuntu18`.
I'm a big +1 on Python-versioned images. I think we get the most benefits here, but adding in node/etc versions gets awkward quick.
I illustrated what I thought we were going for in #7620.
* user requirements
* plantuml, imagemagick, rsvg-convert, swig
* sphinx-js dependencies
* rust
I think the point would be to stop adding weird requirements that bloat the images like this. Something like rust could be included in a custom image until more users need it.
After the feedback received here, I think what we want is along these lines:
I came up with this idea:
Example of usage via config file:

```yaml
build:
  image: ubuntu20-py39
  apt:
    - swig
    - imagemagick
  extras:
    - node==14.16
    - rust==1.46.0
```

This case will trigger this command on the builder:

```bash
# The trailing "." is the build context, which `docker build` requires.
docker build \
  --tag ${BUILD_ID} \
  --file Dockerfile.custom \
  --build-arg RTD_IMAGE=ubuntu20-py39 \
  --build-arg RTD_NODE_VERSION=14.16 \
  --build-arg RTD_RUST_VERSION=1.46.0 \
  --build-arg RTD_APT_PACKAGES="swig imagemagick" \
  .
```

where `Dockerfile.custom` contains:

```dockerfile
# An ARG used in FROM must be declared before the FROM line.
ARG RTD_IMAGE
FROM readthedocs:${RTD_IMAGE}

ARG RTD_NODE_VERSION
ARG RTD_RUST_VERSION
ARG RTD_APT_PACKAGES

# System packages need root; the list is space-separated.
USER root
WORKDIR /
RUN apt-get update
RUN apt-get install -y ${RTD_APT_PACKAGES}

# Language toolchains are installed as the unprivileged build user.
USER docs
WORKDIR /home/docs

RUN nodenv install ${RTD_NODE_VERSION}
RUN nodenv global ${RTD_NODE_VERSION}
RUN curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain ${RTD_RUST_VERSION}
ENV PATH="/home/docs/.cargo/bin:$PATH"
```
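A minimal sketch of how the builder could turn the parsed `build` section into that `docker build` invocation. The function name `docker_build_command` and the `EXTRA_BUILD_ARGS` mapping are hypothetical; only the `RTD_*` build-arg names come from the proposal above. Returning an argv list rather than a shell string avoids quoting problems with the space-separated package list.

```python
# Hypothetical mapping from `extras` names to the build args proposed above.
EXTRA_BUILD_ARGS = {"node": "RTD_NODE_VERSION", "rust": "RTD_RUST_VERSION"}


def docker_build_command(build_id, build_config, dockerfile="Dockerfile.custom"):
    """Return the `docker build` argv for a parsed `build` config section."""
    build_args = {"RTD_IMAGE": build_config["image"]}
    for item in build_config.get("extras", []):
        name, _, version = item.partition("==")
        if name not in EXTRA_BUILD_ARGS or not version:
            raise ValueError(f"Unsupported extra: {item!r}")
        build_args[EXTRA_BUILD_ARGS[name]] = version
    # apt-get expects space-separated package names.
    build_args["RTD_APT_PACKAGES"] = " ".join(build_config.get("apt", []))

    command = ["docker", "build", "--tag", build_id, "--file", dockerfile]
    for name, value in build_args.items():
        command += ["--build-arg", f"{name}={value}"]
    command.append(".")  # build context directory
    return command


config = {
    "image": "ubuntu20-py39",
    "apt": ["swig", "imagemagick"],
    "extras": ["node==14.16", "rust==1.46.0"],
}
print(docker_build_command("build-123", config))
```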
I think that summarizes the overall plan very well. I agree with almost everything you said, and I like the apt install addition idea. Allowing arbitrary packages would help with a number of requests that we get for odd installation dependencies.

I'm not quite sold that we need to support both a ubuntu20 and a ubuntu18 base package, but this seems like a minor point we can discuss more. After a testing period, I'm not sure there are enough major differences in OS-level packages that we would want to support 2 sets of images.

The node/rust/etc dependency pattern doesn't feel complete yet, however, but this is a very tricky problem to solve well. If nodenv doesn't find a binary for Ubuntu, nodenv will compile node from source on every build for the user. If we expand this pattern to other
We've talked some about custom Dockerfiles, but there are enough operations concerns here that I think I would only ever support this for very custom use cases or enterprise users. Custom Docker images wouldn't be a good way to target a use case as common as installing node, because we'd end up with so many custom images that we'd also have to worry about operations and management of these extra images -- likely just doing redundant things like installing node too.
I think we should support this. There are a lot of people who depend on packages that are already installed in our images, and almost all package versions change from one OS version to another. I'm fine starting only with
I'm not super worried about this. I did some tests here and it took ~5s to install node and rust:
That said, we can easily change the way that we install these extra dependencies since they are managed in the
I'm 👍 on custom Dockerfiles for enterprise users, but I don't think we are there just yet. Allowing people to install extra dependencies and apt packages should cover 80%+ of the cases. If we wanted, we could add an extra "chunk of code" in the
I updated this document with the latest conversation we had. I'd like @readthedocs/core to review it and give some feedback before our roadmap meeting so we are all on the same page.
I'm also now realizing that the Dockerfile.custom abstraction seems unnecessary. Couldn't we just do these commands inside the application instead of adding a docker specific abstraction?
Yeah, I mentioned that this will depend on the version specified. I don't know what coverage looks like though; Ubuntu 18 and 20 could have most versions available by binary installation. If not though, users will randomly recompile the node version they specify.
Yes and no 😄 We can install BTW, building the
docker build \
  --tag ${BUILD_ID} \
  --file Dockerfile.custom \
  --build-arg RTD_IMAGE=ubuntu20-py39 \
  --build-arg RTD_NODE_VERSION=14.16.0 \
  --build-arg RTD_RUST_VERSION=1.46.0 \
  --build-arg RTD_APT_PACKAGES="swig imagemagick"
I'm not sure about building the Docker image before using it; I don't think there is an easy way to do this with our scale sets.
Why are you referring to scale sets here?
The image will be built on the builder immediately before starting the docs build process and, after being used for that particular build, it will be deleted:
- clone repository and parse config file
- `docker build ...`
- our build commands
- `docker rmi ${BUILD_ID}`
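The lifecycle above can be sketched as follows. This is an illustration under assumptions: `run_build` is a hypothetical helper, the build commands are passed in as argv lists, and the runner is injectable so the flow can be exercised without Docker. The `try`/`finally` makes sure the throwaway image is removed even if a build command fails.

```python
import subprocess


def run_build(build_id, docker_build_cmd, build_commands, run=subprocess.check_call):
    """Build a throwaway image, run the docs build, then always delete the image."""
    run(docker_build_cmd)
    try:
        for command in build_commands:
            run(command)
    finally:
        # Clean up the per-build image regardless of the build outcome.
        run(["docker", "rmi", build_id])


# Dry run with a recording runner instead of executing anything.
calls = []
run_build(
    "build-123",
    ["docker", "build", "--tag", "build-123", "."],
    [["echo", "our build commands"]],
    run=calls.append,
)
print(calls[-1])
```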
apt:
  - swig
  - imagemagick
extras:
  - node==14.16
  - rust==1.46.0
Also, maybe we just need to find a way to allow the current user to use apt, so these can be done with any custom command (people may want to use a custom package from a repository that isn't in Ubuntu).
Ansible does that easily with package modules, but you can restrict the package managers you want, eg. https://github.com/staticdev/linux-developer-playbook/blob/main/default.config.yml#L51
I know the plan is to maybe support nodeenv and rustenv. Maybe we consider not supporting those as a special case and we just have nodeenv/rustenv installed and they can run an arbitrary command to download and install the right version?
I decided to treat them as special, thinking that in the future we could pre-build images with the most used dependencies, similar to what @ericholscher said in our meeting.
Continuing along that line, I now mention that "only major version of node and minor version of rust are available", so we can generalize this dependency more in case we want to build those pre-built images.
I'm not sold on this, though. However, I think in general that having users specify dependencies is cleaner than asking them to run custom commands (*). Besides, it allows us to have better/standardized data and understand more about how they are using our platform.
(*) if we ever change `nodenv` for a different `node` version manager, those commands will break.
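One way to read the "only major version of node and minor version of rust" idea is as a normalization step on the `extras` entries. The sketch below is an assumption about how that could look: `GRANULARITY` and `parse_extra` are hypothetical names, and truncating the version (rather than rejecting a more specific one) is a design choice made only for illustration.

```python
# Hypothetical granularity: number of version components kept per tool
# (major for node, major.minor for rust).
GRANULARITY = {"node": 1, "rust": 2}


def parse_extra(spec):
    """Parse "name==version" and truncate the version to the supported granularity."""
    name, sep, version = spec.partition("==")
    if not sep or not version:
        raise ValueError(f"Expected 'name==version', got {spec!r}")
    if name not in GRANULARITY:
        raise ValueError(f"Unsupported extra: {name!r}")
    parts = version.split(".")[: GRANULARITY[name]]
    return name, ".".join(parts)


print(parse_extra("node==14.16"))   # ('node', '14')
print(parse_extra("rust==1.46.0"))  # ('rust', '1.46')
```

Because users declare a dependency instead of a command, the installer behind it (nodenv or something else) can change without breaking their config.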
I don't think we need to differentiate the images by its state (stable, latest, testing)
but by its main base differences: OS and Python version.
👍
At some point an old version of Python will be deprecated (eg. 3.4) and will be removed.
To achieve this, we can just remove the Docker image affected: ``ubuntu20-py34``,
once there are no users depending on it anymore.
I would consider raising some sort of warning in the build that surfaces on the build page. We shouldn't support these old unsupported versions forever just because somebody has forgotten they're using it.
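A minimal sketch of what such a warning check could look like. The `DEPRECATED_IMAGES` registry, the suggested replacement, and the message wording are all hypothetical; only the image names come from this thread.

```python
# Hypothetical registry: deprecated image -> suggested replacement.
DEPRECATED_IMAGES = {"ubuntu20-py34": "ubuntu20-py39"}


def deprecation_warning(build_image, deprecated=DEPRECATED_IMAGES):
    """Return a warning for the build page, or None if the image is still supported."""
    replacement = deprecated.get(build_image)
    if replacement is None:
        return None
    return (
        f"The build image '{build_image}' is deprecated and will be removed. "
        f"Please switch to '{replacement}'."
    )


print(deprecation_warning("ubuntu20-py34"))
```

Surfacing this on every build of an affected project gives forgotten projects a chance to migrate before the image is actually deleted.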
* same as ``-py*`` versions
* Conda version installed via ``pyenv``
* ``mamba`` executable (installed via ``conda``)
👍🏽😄
I think we can merge this PR. It seems we discussed everything already and we agree on most of it. There are other design documents for specific things created.
Rendered version: https://docs--7566.org.readthedocs.build/en/7566/development/design/build-images.html