
Design document for new Docker images structure #7566

Merged (7 commits, Apr 1, 2021)

Conversation

humitos
Member

@humitos humitos commented Oct 15, 2020

  • Explains a little the current situation
  • Mention problems we already had
  • Propose a new and different naming for images
  • Ideas to support custom Docker images

Rendered version: https://docs--7566.org.readthedocs.build/en/7566/development/design/build-images.html

Member

@ericholscher ericholscher left a comment


I didn't review this super closely, but I commented on the bits that caught my eye. I think it's a good start, and I mostly just had some naming feedback.


.. note::

I don't think it's useful to have ``ubuntu20-py37`` exposed to users,
Member

Wouldn't that make the build images much smaller? Seems worth doing, no?

Member Author

Each version of Python is ~400 MB and we currently have 6 versions, so it would reduce the image size by ~2 GB.

# readthedocs/build:latest
docs@71578174d2ac:~/.pyenv$ du -hs versions/*
382M	versions/2.7.18
449M	versions/3.5.10
438M	versions/3.6.12
451M	versions/3.7.9
383M	versions/3.8.6
161M	versions/pypy3.5-7.0.0
docs@71578174d2ac:~/.pyenv$

Reducing image size is good, but we need to think about the complexity introduced by building one image per Python version, per OS version, and with/without PDF support (24 images with 6 Python versions and 2 OS versions supported, and 28 if we add py3.9). In this scenario, if we need to add something to the base image, we need to rebuild a lot of images. We need to find a balance here, considering the pros and cons of each scenario. What pros/cons can you think of?
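To make the combinatorics concrete, here is a quick back-of-the-envelope sketch (the tag names are illustrative, not the project's actual tags) that enumerates a fully split image matrix:

```python
from itertools import product

def image_matrix(os_versions, python_versions, variants=("", "-nopdf")):
    """Enumerate every tag in a one-image-per-combination scheme."""
    return [
        f"{os_}-py{py}{variant}"
        for os_, py, variant in product(os_versions, python_versions, variants)
    ]

oses = ["ubuntu18", "ubuntu20"]
pythons = ["27", "35", "36", "37", "38", "pypy35"]
print(len(image_matrix(oses, pythons)))           # 24 images
print(len(image_matrix(oses, pythons + ["39"])))  # 28 images once py3.9 lands
```

Every new axis (OS, Python, PDF support) multiplies the count, which is why a base-image change forces so many rebuilds.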

Contributor

I thought this was the main change we wanted to implement, and I would be a big +1 on per-Python-version images.

Pros/cons I see are:

👍 No more monolith images
👍 Changing the images would be less risky because images are more isolated
👍 If we want to make a small change to one image, we don't need to generate large images
👎 We have to build a pile of images on base image change
👍 Building the pile of images will be automated, Dockerfiles can be generated or automated with env vars
👍 We don't need the nopdf abstraction because end image size is already smaller
👎 Mixed environment images are .. weird
👍 Explicit versioning removes the need for versioning of the docker images like we have been

The monolith image is where we notice the most friction now. Making a change to the monoliths is risky because of the high number of changes that can be introduced. Narrowing the focus of the images reduces the potential for side effects and allows us to make a small change (python minor version change) and only affect 1 image.

The pile of images is probably not much of an issue. The base image build will speed this process up, and we can automate the generation of Dockerfiles or the build process rather easily.

The nopdf abstraction becomes unnecessary if our build images are small. I am very happy to reduce work here and cut out the nopdf abstraction entirely.

Brought up an alternative example of versioning here:
#7620

This was closer to what I thought we were originally discussing
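The "Dockerfiles can be generated" point above could be as simple as rendering one file per Python version from a template. A hypothetical sketch (template contents and naming are illustrative, not the repo's actual files):

```python
# Render one Dockerfile per Python version from a single template, so a base
# image change only means regenerating and rebuilding, not hand-editing files.
TEMPLATE = """\
FROM readthedocs/build:{os_version}-base
RUN pyenv install {python_version}
RUN pyenv global {python_version}
"""

def render_dockerfiles(os_version, python_versions):
    """Return a mapping of Dockerfile name -> rendered contents."""
    return {
        f"Dockerfile.{os_version}-py{v.replace('.', '')[:2]}": TEMPLATE.format(
            os_version=os_version, python_version=v
        )
        for v in python_versions
    }

files = render_dockerfiles("ubuntu20", ["3.7.9", "3.8.6"])
print(sorted(files))  # ['Dockerfile.ubuntu20-py37', 'Dockerfile.ubuntu20-py38']
```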


.. Taken from https://github.com/readthedocs/readthedocs-docker-images/blob/master/Dockerfile

* ``ubuntu20-base``
Member

ubuntu20 is a confusing name. We should be explicit. Is this Ubuntu 20.04? If so, it should be ubuntu-20.04-base or something.

Member Author

We use only Ubuntu LTS versions, so ubuntu20 is Ubuntu 20.04 and ubuntu22 will be Ubuntu 22.04.

Member

I think we could use the docker structure for labels and name this ubuntu-base:20.04 or ubuntu-base:20

Member

oh, this is already the "version" part of the images; I'm not sure if it's allowed to use : again here. Or maybe we can start naming the images as readthedocs/build-ubuntu-base:20.04, or take some inspiration from CircleCI: https://circleci.com/docs/2.0/circleci-images/


ubuntu-base:20.04 is the most conventional and understandable in my opinion. Even if you only use LTS versions, the Ubuntu version is not 20 but 20.04; reducing it to 20 is an unnecessary and confusing convention, and Ubuntu can change its naming standards as it wishes.

Member Author

I'm not super worried about how we tag our own images; the most important thing here is how we expose them to users via the config file (build.image).

I'm fine taking some inspiration from CircleCI and tagging them as rtd/ubuntu20:py37. The main point here is that it will be exposed as ubuntu20:py37 or ubuntu20-py37, which is almost the same for the end user.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

At some point an old version of Python will be deprecated (eg. 3.4) and will be removed from our Docker images.
These versions should only be removed when the OS in the ``base`` is upgraded (eg. from ``ubuntu20`` to ``ubuntu22``).
Member

This seems like a good reason to have explicit python versioned images.

Member Author

@humitos humitos Oct 21, 2020

How long do we want to support old Python versions? That's the question we need to answer. If we have explicit Python-versioned images, when are we going to remove the ubuntu18-py34 image from the servers?

The when is what we need a plan for. Here I'm suggesting doing this breaking change together with other breaking changes (eg. an OS upgrade), but we can keep the same Python versions from ubuntu20 in ubuntu22 if we want.

@ericholscher
Member

Talking through this, we can start without Python versions in the Docker image names (eg. ubuntu20). In the future, if we want to change to an ubuntu20-py37, we can derive this from the user's config file via the python.version key, without the user having to specify it. This gives us a backwards-compatible way to move to those image names, so we aren't closing off that path, but we can move forward with this approach. This approach gives us:

  • Composable images, so we can build off the base images, instead of one huge one
  • Definition of the OS, allowing us to ship an ubuntu18 and ubuntu20 image, and users can stick with what they want
  • Pinned dependencies and a test suite. This is the major thing that caused downtime/issues for us in the past, so solving this issue is I think the most important step with these new images.
  • Gives us a standard base to start thinking about user images or additional custom images that we maintain based on top of ours. This isn't defined here yet, but we're doing composable images so we'll know a lot more about the workflow for our users.

I think this approach is a solid upgrade, and will make a better experience for users, and we can adjust as we go. Most importantly, it gives us a lot of new options, without removing much in the way of future options.
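The derivation described above could look roughly like this (a minimal sketch; the function name, defaults, and config shape are assumptions, not the application's actual code):

```python
# Derive the internal per-Python image from `build.image` plus
# `python.version`, so users never have to write `ubuntu20-py37` themselves.
def resolve_build_image(config):
    base = config.get("build", {}).get("image", "ubuntu20")
    python = str(config.get("python", {}).get("version", "3.7"))
    return f"{base}-py{python.replace('.', '')}"

print(resolve_build_image({"build": {"image": "ubuntu20"},
                           "python": {"version": "3.7"}}))
# ubuntu20-py37
```

Because the image name is computed server-side, it can change later without any user-facing config change.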

Contributor

@agjohnson agjohnson left a comment

I think OS versioning does accomplish something similar to latest/stable versioning, but it does feel like we've lost some of the versioning protection. Because I'm -1 on the nopdf image, the base image is only really useful for custom user images -- we don't get the benefit. Because of this, to me, the end outcome seems mostly limited to renaming latest and stable to ubuntu20 and ubuntu18.

I'm a big +1 on python versioned images. I think we get the most benefits here, but adding in node/etc versions gets awkward quick.

I illustrated what I thought we were going for here:
#7620



* user requirements
* plantuml, imagemagick, rsvg-convert, swig
* sphinx-js dependencies
* rust
Contributor

I think the point would be to stop adding weird requirements that bloat the images like this. Something like rust could be included in a custom image until more users need it.

@stale

stale bot commented Dec 18, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: stale Issue will be considered inactive soon label Dec 18, 2020
@stale stale bot closed this Dec 25, 2020
@humitos humitos reopened this Dec 28, 2020
@stale stale bot removed the Status: stale Issue will be considered inactive soon label Dec 28, 2020
@humitos humitos added the Accepted Accepted issue on our roadmap label Dec 28, 2020
@humitos
Member Author

humitos commented Mar 10, 2021

After the feedback received here, I think what we want is around these lines:

  • be able to share a base image with common dependencies
  • base image is tied to OS version
  • create Python-specific images on top of the base images
  • allow users to specify extra dependencies (apt packages, node, rust, etc) without breaking our structure
  • allow us to upgrade Python 3.6.10 to 3.6.13 without breaking/rebuilding any other image, just py36

I came up with this idea:

  • create base images that include everything needed to build PDFs
  • at the time of writing this, we will have only two base images: ubuntu18 and ubuntu20
  • pre-build Python images on top of the base ones:
    • ubuntu18-py27
    • ubuntu18-py36
    • ...
    • ubuntu20-py39
    • ubuntu20-conda47
  • automatically rebuild images on commit:
    • all of them if there is a change to the base images (eg. new apt packages added to the base)
    • only the one affected (eg. a Python patch version upgrade)
  • all Python versions will be managed by pyenv, including conda
  • mamba will be installed by default on -conda* images
  • users can specify extra dependencies via the config file:
    • APT dependencies
    • extra "known" dependencies (not available via APT, eg: node, rust)
  • if the user specifies extras, we build a Docker image on demand, based on one of the pre-built ones, to install them
  • users adding extra requirements will be (automatically) "penalized" since installing them uses up build time
  • sticking with the pre-built images is better for everybody

Example of usage via config file:

build:
  image: ubuntu20-py39
  apt:
    - swig
    - imagemagick
  extras:
    - node==14.16
    - rust==1.46.0

This case will trigger this command on the builder:

docker build \
  --tag ${BUILD_ID} \
  --file Dockerfile.custom \
  --build-arg RTD_IMAGE=ubuntu20-py39 \
  --build-arg RTD_NODE_VERSION=14.16 \
  --build-arg RTD_RUST_VERSION=1.46.0 \
  --build-arg RTD_APT_PACKAGES="swig imagemagick" \
  .

where Dockerfile.custom is:

# An ARG consumed by FROM has to be declared before the FROM line
ARG RTD_IMAGE
FROM readthedocs:${RTD_IMAGE}

ARG RTD_NODE_VERSION
ARG RTD_RUST_VERSION
ARG RTD_APT_PACKAGES

USER root
WORKDIR /

RUN apt-get update
RUN apt-get install -y ${RTD_APT_PACKAGES}

USER docs
WORKDIR /home/docs

RUN nodenv install ${RTD_NODE_VERSION}
RUN nodenv global ${RTD_NODE_VERSION}
RUN curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain ${RTD_RUST_VERSION}
ENV PATH="/home/docs/.cargo/bin:$PATH"

NOTE that we are pretty close to supporting custom Docker images, where the user provides a Dockerfile extending one of our images, but that requires more technical knowledge. However, it may be worth having this conversation.

@agjohnson
Contributor

I think that summarizes the overall plan very well. I agree with almost everything you said, and I like the apt install addition idea. Allowing arbitrary packages would help with a number of requests we get for odd installation dependencies.

I'm not quite sold that we need to support both an ubuntu20 and an ubuntu18 base image, but this seems like a minor point we can discuss more. After a testing period, I'm not sure there are enough major differences in OS-level packages that we would want to support 2 sets of images.

The node/rust/etc dependency pattern doesn't feel complete yet however, but this is a very tricky problem to solve well. If nodenv doesn't find a binary for ubuntu, nodenv will compile node from source every build for the user. If we expand this pattern to other *env tools, like rustenv, goenv, etc, I don't know if these support binary installation. A couple options here, which I'm not particularly invested in:

  • Limit the versions of extras we accept, make sure binaries are available
  • Host prebuilt binaries for extra language versions for *env tools
  • Package *env installed files up and install via apt, or find a repo source already doing something similar
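The first option above ("limit the versions of extras we accept") could start with a pre-flight check against the upstream release artifacts. A hypothetical sketch: the URL pattern is the real nodejs.org layout (it matches the tarball nodenv downloads), but the helper itself is an assumption, not existing application code:

```python
# Build the official nodejs.org tarball URL for a requested version; the
# application could HEAD-request it to verify a prebuilt binary exists
# before accepting the version, instead of letting nodenv fall back to
# compiling from source.
def node_binary_url(version, platform="linux-x64"):
    return f"https://nodejs.org/dist/v{version}/node-v{version}-{platform}.tar.gz"

print(node_binary_url("14.16.0"))
# https://nodejs.org/dist/v14.16.0/node-v14.16.0-linux-x64.tar.gz
```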

We've talked some about custom dockerfiles, but there are enough operations concerns here that I think I would only ever support this for very custom use cases or enterprise users. Custom docker images wouldn't be a good way to target such a common use case like installation of node because we'd end up with so many custom images we'd have to also worry about operations and management of these extra images -- likely just doing redundant things like installing node too.

@humitos
Member Author

humitos commented Mar 11, 2021

I'm not quite sold that we need to support a ubuntu20 and ubuntu18 base package, but this seems like a minor point we can discuss more. After a testing period, I'm not sure there are enough major differences in os level packages that we would want to support 2 sets of images.

I think we should support this. There are a lot of people who depend on packages that are already installed in our images, and most package versions change from one OS version to another. I'm fine starting only with ubuntu20-* images, but I think the OS has to be in the image name so we can easily support ubuntu22-* when it's released.

The node/rust/etc dependency pattern doesn't feel complete yet however, but this is a very tricky problem to solve well. If nodenv doesn't find a binary for ubuntu, nodenv will compile node from source every build for the user. If we expand this pattern to other *env tools, like rustenv, goenv, etc, I don't know if these support binary installation. A couple options here, which I'm not particularly invested in:

I'm not super worried about this. I did some tests here and it took ~5s to install node and rust:

docs@e7acf8b49fa0:~$ time curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain 1.46.0
info: downloading installer
...

Rust is installed now. Great!
...

real	0m4.682s
user	0m0.349s
sys	0m0.246s
docs@e7acf8b49fa0:~$ time ./.nodenv/bin/nodenv install 14.16.0
Downloading node-v14.16.0-linux-x64.tar.gz...
-> https://nodejs.org/dist/v14.16.0/node-v14.16.0-linux-x64.tar.gz
Installing node-v14.16.0-linux-x64...
Installed node-v14.16.0-linux-x64 to /home/docs/.nodenv/versions/14.16.0


real	0m5.686s
user	0m1.609s
sys	0m1.155s

NOTE that I also tried nodenv install nightly and it took ~7s (I tried different random versions and they took similar times). On the other hand, node-env (which needs to be compiled from source) took much longer: ~50m, which is unacceptable.

That said, we can easily change the way we install these extra dependencies since they are managed in Dockerfile.custom, and modifying that file won't break builds as long as we leave the binaries in the same place when changing the installation method. I'm happy to explore other ways of installing these extras (in particular, if we can find well-maintained PPAs and install them directly with apt-get), but I'd like to keep moving forward on this design so we don't block it on this. We can improve performance later if needed.

We've talked some about custom dockerfiles, but there are enough operations concerns here that I think I would only ever support this for very custom use cases or enterprise users

I'm 👍 on custom Dockerfiles for enterprise users, but I don't think we are there just yet. Allowing people to install extra dependencies and apt packages should cover 80%+ of the cases. If we wanted, we could add an extra "chunk of code" to Dockerfile.custom that executes custom commands, to reduce the cases that require a custom Docker image a little more. Currently, I see that as a complication to the new design that I'm not sure is super useful to add at this point. However, it should be easy to add later as an improvement/feature by letting people add those commands in the config file and just running them in the Dockerfile.

@humitos
Member Author

humitos commented Mar 16, 2021

I updated this document with the latest conversation we had. I'd like @readthedocs/core to review it and give some feedback before our roadmap meeting so we are all on the same page.

@humitos humitos requested a review from a team March 16, 2021 15:12
@agjohnson
Contributor

I'm also now realizing that the Dockerfile.custom abstraction seems unnecessary. Couldn't we just do these commands inside the application instead of adding a docker specific abstraction?

I'm not super worried about this. I did some tests here and it took ~5s to install node and rust:

Yeah, I mentioned that this will depend on the version specified. I don't know what the coverage looks like though; Ubuntu 18 and 20 could have most versions available via binary installation. If not, users will silently recompile the node version they specify.

humitos added a commit to readthedocs/readthedocs-docker-images that referenced this pull request Mar 17, 2021
@humitos
Member Author

humitos commented Mar 17, 2021

I'm also now realizing that the Dockerfile.custom abstraction seems unnecessary. Couldn't we just do these commands inside the application instead of adding a docker specific abstraction?

Yes and no 😄

We can install node and rust from the application itself since they are executed as docs user, but we can't install packages with apt-get since the application doesn't have root/sudo permissions on the Docker container.

BTW, building the Dockerfile.custom takes ~2m locally. I updated the document to reflect this.



Comment on lines 136 to 142
docker build \
--tag ${BUILD_ID} \
--file Dockerfile.custom \
--build-arg RTD_IMAGE=ubuntu20-py39 \
--build-arg RTD_NODE_VERSION=14.16.0 \
--build-arg RTD_RUST_VERSION=1.46.0 \
--build-arg RTD_APT_PACKAGES="swig imagemagick" \
.
Member

I'm not sure about building the Docker image before using it; I don't think there is an easy way to do this with our scale sets.

Member Author

Why are you referring to scale sets here?

The image will be built on the builder immediately before starting the docs build process, and after being used for that particular build, it will be deleted:

  • clone repository and parse config file
  • docker build ...
  • our build commands
  • docker rmi ${BUILD_ID}
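The per-build lifecycle listed above can be sketched as follows. The commands are returned as argument lists instead of executed, so the flow is readable (and testable) without Docker; the real builder would run them through its Docker client, and the function name and the placeholder build step are assumptions:

```python
# Return the sequence of Docker commands a builder would run for one build:
# build the throwaway custom image, run the docs build in it, delete it.
def custom_image_lifecycle(build_id, rtd_image="ubuntu20-py39"):
    return [
        ["docker", "build",
         "--tag", build_id,
         "--file", "Dockerfile.custom",
         "--build-arg", f"RTD_IMAGE={rtd_image}", "."],
        ["docker", "run", "--rm", build_id, "build-the-docs"],  # our commands
        ["docker", "rmi", build_id],  # image is discarded after this build
    ]

for step in custom_image_lifecycle("b123"):
    print(" ".join(step))
```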

Comment on lines 122 to 127
apt:
- swig
- imagemagick
extras:
- node==14.16
- rust==1.46.0
Member

Also, maybe we just need to find a way to allow the current user to use apt, so this can be done with any custom command (people may want to use a custom package from a repository that isn't in Ubuntu).


Ansible does that easily with package modules, but you can restrict the package managers you want, eg. https://github.com/staticdev/linux-developer-playbook/blob/main/default.config.yml#L51

Contributor

I know the plan is to maybe support nodeenv and rustenv. Maybe we should consider not treating those as a special case: just have nodeenv/rustenv installed, and let users run an arbitrary command to download and install the right version?

Member Author

I decided to treat them as special cases, thinking that in the future we could pre-build images with the most used dependencies --similar to what @ericholscher said in our meeting.

Continuing with that line, I've now mentioned that "only the major version of node and the minor version of rust are available", so we can generalize this dependency more in case we want to build those pre-built images.

I'm not sold on this, though. However, I think that in general, making users specify dependencies is cleaner than asking them to run custom commands (*). Besides, it allows us to have better/standardized data and to understand more about how they are using our platform.

(*) if we ever change nodenv for a different node version manager, those commands will break.
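Treating extras as declarative data also means the application can validate them up front. A hypothetical parser for the `extras` entries in the config example (`node==14.16`, `rust==1.46.0`); the regex and function are illustrative, not the config-file module's actual code:

```python
import re

# Accept `name==version` specs, where version is dot-separated digits.
EXTRA_RE = re.compile(r"^(?P<name>[a-z]+)==(?P<version>\d+(?:\.\d+)*)$")

def parse_extra(spec):
    """Split an extras spec into (name, version), rejecting malformed input."""
    match = EXTRA_RE.match(spec)
    if match is None:
        raise ValueError(f"invalid extra: {spec!r}")
    return match.group("name"), match.group("version")

print(parse_extra("node==14.16"))   # ('node', '14.16')
print(parse_extra("rust==1.46.0"))  # ('rust', '1.46.0')
```

Arbitrary commands, by contrast, give us nothing to validate and would break silently if the underlying version manager ever changed.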

Comment on lines 260 to 261
I don't think we need to differentiate the images by its state (stable, latest, testing)
but by its main base differences: OS and Python version.
Member

👍



At some point an old version of Python will be deprecated (eg. 3.4) and will be removed.
To achieve this, we can just remove the Docker image affected: ``ubuntu20-py34``,
once there are no users depending on it anymore.
Contributor

I would consider raising some sort of warning in the build that surfaces on the build page. We shouldn't support these old unsupported versions forever just because somebody has forgotten they're using it.

@humitos humitos requested a review from a team March 18, 2021 11:52

* same as ``-py*`` versions
* Conda version installed via ``pyenv``
* ``mamba`` executable (installed via ``conda``)
Contributor

👍🏽😄

@humitos
Member Author

humitos commented Mar 31, 2021

I think we can merge this PR. It seems we've discussed everything already and we agree on most of it. Other design documents have been created for specific things.

@humitos humitos enabled auto-merge April 1, 2021 09:18
@humitos humitos merged commit 8b2c504 into master Apr 1, 2021
@humitos humitos deleted the humitos/build-images-design-doc branch April 1, 2021 09:19
@humitos humitos mentioned this pull request May 4, 2021