Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: allow to install packages with apt #8065

Merged
merged 12 commits into from
May 12, 2021
24 changes: 23 additions & 1 deletion docs/config-file/v2.rst
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@ This is to avoid typos and provide feedback on invalid configurations.

.. contents::
:local:
:depth: 1
:depth: 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this will spam the TOC too much -- seems fine tho?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was between 2 or 3, but wanted to have build.apt_images and sphing.builder visible instead of just having sphinx and build, but no strong opinion.


version
~~~~~~~
Expand Down Expand Up @@ -298,6 +298,9 @@ Configuration for the documentation build process.
build:
image: latest
apt_packages:
- mysql-client
- cmatrix
python:
version: 3.7
Expand All @@ -318,6 +321,25 @@ as defined here:
* `stable <https://github.com/readthedocs/readthedocs-docker-images/tree/releases/5.x>`_: :buildpyversions:`stable`
* `latest <https://github.com/readthedocs/readthedocs-docker-images/tree/releases/6.x>`_: :buildpyversions:`latest`

build.apt_packages
``````````````````

List of `APT packages`_ to install.

.. _APT packages: https://packages.ubuntu.com/

:Type: ``list``
:Default: ``[]``

.. code-block:: yaml
version: 2
build:
apt_packages:
- mysql-client
- cmatrix
sphinx
~~~~~~

Expand Down
66 changes: 60 additions & 6 deletions readthedocs/config/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@
SUBMODULES_INVALID = 'submodules-invalid'
INVALID_KEYS_COMBINATION = 'invalid-keys-combination'
INVALID_KEY = 'invalid-key'
INVALID_NAME = 'invalid-name'

LATEST_CONFIGURATION_VERSION = 2

Expand Down Expand Up @@ -124,7 +125,9 @@ def _get_display_key(self):
# Checks for patterns similar to `python.install.0.requirements`
# if matched change to `python.install[0].requirements` using backreference.
return re.sub(
r'^(python\.install)(\.)(\d+)(\.\w+)$', r'\1[\3]\4', self.key
r'^([a-zA-Z_.-]+)\.(\d+)([a-zA-Z_.-]*)$',
r'\1[\2]\3',
self.key
)


Expand Down Expand Up @@ -622,7 +625,10 @@ def conda(self):
@property
def build(self):
"""The docker image used by the builders."""
return Build(**self._config['build'])
return Build(
apt_packages=[],
**self._config['build'],
)

@property
def doctype(self):
Expand Down Expand Up @@ -745,12 +751,60 @@ def validate_build(self):
),
)

# Allow to override specific project
config_image = self.defaults.get('build_image')
if config_image:
build['image'] = config_image
# Allow to override specific project
config_image = self.defaults.get('build_image')
if config_image:
build['image'] = config_image

with self.catch_validation_error('build.apt_packages'):
raw_packages = self._raw_config.get('build', {}).get('apt_packages', [])
validate_list(raw_packages)
# Transform to a dict, so is easy to validate individual entries.
self._raw_config.setdefault('build', {})['apt_packages'] = (
list_to_dict(raw_packages)
)

build['apt_packages'] = [
self.validate_apt_package(index)
for index in range(len(raw_packages))
]
if not raw_packages:
self.pop_config('build.apt_packages')

return build

def validate_apt_package(self, index):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we probably can be smarter about this. Can we not call apt after a --, which is the shell way of handling the end of options?

Unless otherwise noted, each builtin command documented as accepting options preceded by ‘-’ accepts ‘--’ to signify the end of the options. The :, true, false, and test/[ builtins do not accept options and do not treat ‘--’ specially. The exit, logout, return, break, continue, let, and shift builtins accept and process arguments beginning with ‘-’ without requiring ‘--’. Other builtins that accept arguments but are not specified as accepting options interpret arguments beginning with ‘-’ as invalid options and require ‘--’ to prevent this interpretation.

We should still raise a validation error here, but I'd like to see additional restrictions for this built in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point Eric.

Also, we need to take care of executing commands in the same line as well. Like,

apt-get install `touch /tmp/hello.txt`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe python's subprocess command will handle this, but we should definitely be sure and have tests for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, subprocess should handle this as long as you don't pass shell=True. However, it's worth a test to make sure we don't ever mess that up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we use subprocess, but the Docker client API (exec_create), tho.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use subprocess, we escape special chars in

def get_wrapped_command(self):
"""
Wrap command in a shell and optionally escape special bash characters.
In order to set the current working path inside a docker container, we
need to wrap the command in a shell call manually.
Some characters will be interpreted as shell characters without
escaping, such as: ``pip install requests<0.8``. When passing
``escape_command=True`` in the init method this escapes a good majority
of those characters.
"""
bash_escape_re = re.compile(
r"([\t\ \!\"\#\$\&\'\(\)\*\:\;\<\>\?\@"
r'\[\\\]\^\`\{\|\}\~])',
)
prefix = ''
if self.bin_path:
prefix += 'PATH={}:$PATH '.format(self.bin_path)
command = (
' '.join([
bash_escape_re.sub(r'\\\1', part) if self.escape_command else part
for part in self.command
])
)
return (
"/bin/sh -c 'cd {cwd} && {prefix}{cmd}'".format(
cwd=self.cwd,
prefix=prefix,
cmd=command,
)
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested defining packages in a malicious way? (e.g. trying to executed forbidden things). From this docstring it only happen under certain circumstances. Are we sure we are always calling this in that way?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

escape_command defaults to true if that's what you mean.

"""
Validate the package name to avoid injections of extra options.
Packages names can be a regex pattern.
We just validate that they aren't interpreted as an option or file.
"""
key = f'build.apt_packages.{index}'
package = self.pop_config(key)
with self.catch_validation_error(key):
validate_string(package)
package = package.strip()
invalid_starts = [
# Don't allow to inject extra options.
'-',
'\\',
# Don't allow to install from a path.
'/',
'.',
]
for start in invalid_starts:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left this bc feels like it gives a nice error for usual mistakes, but isn't needed as the regex below already validates this.

if package.startswith(start):
self.error(
key=key,
message=(
'Invalid package name. '
f'Package can\'t start with {start}',
),
code=INVALID_NAME,
)
return package

def validate_python(self):
"""
Validates the python key.
Expand Down
2 changes: 1 addition & 1 deletion readthedocs/config/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ def as_dict(self):

class Build(Base):

__slots__ = ('image',)
__slots__ = ('image', 'apt_packages')


class Python(Base):
Expand Down
51 changes: 51 additions & 0 deletions readthedocs/config/tests/test_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@
CONFIG_REQUIRED,
CONFIG_SYNTAX_INVALID,
INVALID_KEY,
INVALID_NAME,
PYTHON_INVALID,
VERSION_INVALID,
)
Expand Down Expand Up @@ -748,6 +749,7 @@ def test_as_dict(tmpdir):
},
'build': {
'image': 'readthedocs/build:latest',
'apt_packages': [],
},
'conda': None,
'sphinx': {
Expand Down Expand Up @@ -935,6 +937,54 @@ def test_build_image_check_invalid_type(self, value):
build.validate()
assert excinfo.value.key == 'build.image'

@pytest.mark.parametrize(
'value',
[
[],
['cmatrix'],
['mysql', 'cmatrix', 'postgresql-dev'],
['mysql', 'cmatrix', 'postgresql=1.2.3'],
],
)
def test_build_apt_packages_check_valid(self, value):
build = self.get_build_config({'build': {'apt_packages': value}})
build.validate()
assert build.build.apt_packages == value

@pytest.mark.parametrize(
'value',
[3, 'string', {}],
)
def test_build_apt_packages_invalid_type(self, value):
build = self.get_build_config({'build': {'apt_packages': value}})
with raises(InvalidConfig) as excinfo:
build.validate()
assert excinfo.value.key == 'build.apt_packages'

@pytest.mark.parametrize(
'error_index, value',
[
(0, ['/', 'cmatrix']),
(1, ['cmatrix', '-q']),
(1, ['cmatrix', ' -q']),
(1, ['cmatrix', '\\-q']),
(1, ['cmatrix', '--quiet']),
(1, ['cmatrix', ' --quiet']),
(2, ['cmatrix', 'quiet', './package.deb']),
(2, ['cmatrix', 'quiet', ' ./package.deb ']),
(2, ['cmatrix', 'quiet', '/home/user/package.deb']),
(2, ['cmatrix', 'quiet', ' /home/user/package.deb']),
(2, ['cmatrix', 'quiet', '../package.deb']),
(2, ['cmatrix', 'quiet', ' ../package.deb']),
],
)
def test_build_apt_packages_invalid_value(self, error_index, value):
build = self.get_build_config({'build': {'apt_packages': value}})
with raises(InvalidConfig) as excinfo:
build.validate()
assert excinfo.value.key == f'build.apt_packages.{error_index}'
assert excinfo.value.code == INVALID_NAME

@pytest.mark.parametrize('value', [3, [], 'invalid'])
def test_python_check_invalid_types(self, value):
build = self.get_build_config({'python': value})
Expand Down Expand Up @@ -2072,6 +2122,7 @@ def test_as_dict(self, tmpdir):
},
'build': {
'image': 'readthedocs/build:latest',
'apt_packages': [],
},
'conda': None,
'sphinx': {
Expand Down
18 changes: 17 additions & 1 deletion readthedocs/projects/tasks.py
Original file line number Diff line number Diff line change
Expand Up @@ -794,7 +794,7 @@ def run_build(self, record):
environment=self.build_env,
)
with self.project.repo_nonblockinglock(version=self.version):
self.setup_python_environment()
self.setup_build()

# TODO the build object should have an idea of these states,
# extend the model to include an idea of these outcomes
Expand Down Expand Up @@ -1152,6 +1152,10 @@ def update_app_instances(
search_ignore=self.config.search.ignore,
)

def setup_build(self):
self.install_system_dependencies()
self.setup_python_environment()

def setup_python_environment(self):
"""
Build the virtualenv and install the project into it.
Expand All @@ -1177,6 +1181,18 @@ def setup_python_environment(self):
if self.project.has_feature(Feature.LIST_PACKAGES_INSTALLED_ENV):
self.python_env.list_packages_installed()

def install_system_dependencies(self):
packages = self.config.build.apt_packages
if packages:
self.build_env.run(
'apt-get', 'update', '-y', '-q',
user=settings.RTD_BUILD_SUPER_USER,
)
self.build_env.run(
'apt-get', 'install', '-y', '-q', *packages,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should always use the long-version of arguments in code, to make it clearer what it does. Also add in the -- Ending

Suggested change
'apt-get', 'install', '-y', '-q', *packages,
# -- ends all command arguments to protect against users passing them
'apt-get', 'install', '--assume-yes', '--quiet', '--', *packages,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we want this to be quiet? I know the output is sort of long but it might help folks debug their own problems. I tested it with --quiet and it's still fairly verbose but maybe a comment is appropriate on why.

I looked at what CircleCI does and they basically give users free reign to run arbitrary commands including apt-get install. Our system seems quite a bit more locked down than that which seems fine. There's still some risk (could a user install some docker utilities and somehow access the host?) but I think as long as we're just installing built-in Debian packages and sanitizing the inputs really well, we should be OK.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we want this to be quiet? I know the output is sort of long but it might help folks debug their own problems. I tested it with --quiet and it's still fairly verbose but maybe a comment is appropriate on why.

-q will suppress the progress bar not the output itself, -qq will suppress everything.

user=settings.RTD_BUILD_SUPER_USER,
)

def build_docs(self):
"""
Wrapper to all build functions.
Expand Down
4 changes: 4 additions & 0 deletions readthedocs/rtd_tests/fixtures/spec/v2/schema.yml
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,10 @@ build:
# Note: it can be overridden by a project
image: enum('stable', 'latest', required=False)

# List of packages to be installed with apt-get
# Default: []
apt_packages: list(str(), required=False)

python:
# The Python version (this depends on the build image)
# Default: '3'
Expand Down
71 changes: 70 additions & 1 deletion readthedocs/rtd_tests/tests/test_celery.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
import shutil
from os.path import exists
from tempfile import mkdtemp
from unittest import mock
from unittest.mock import MagicMock, patch

from allauth.socialaccount.models import SocialAccount
Expand All @@ -17,7 +18,11 @@
LATEST,
)
from readthedocs.builds.models import Build, Version
from readthedocs.doc_builder.environments import LocalBuildEnvironment
from readthedocs.config.config import BuildConfigV2
from readthedocs.doc_builder.environments import (
BuildEnvironment,
LocalBuildEnvironment,
)
from readthedocs.doc_builder.exceptions import VersionLockedError
from readthedocs.oauth.models import RemoteRepository, RemoteRepositoryRelation
from readthedocs.projects import tasks
Expand Down Expand Up @@ -469,3 +474,67 @@ def test_send_build_status_no_remote_repo_or_social_account_gitlab(self, send_bu

send_build_status.assert_not_called()
self.assertEqual(Message.objects.filter(user=self.eric).count(), 1)

@patch('readthedocs.projects.tasks.UpdateDocsTaskStep.setup_python_environment', new=MagicMock)
@patch('readthedocs.projects.tasks.UpdateDocsTaskStep.build_docs', new=MagicMock)
@patch('readthedocs.doc_builder.environments.BuildEnvironment.update_build', new=MagicMock)
@patch('readthedocs.projects.tasks.UpdateDocsTaskStep.setup_vcs', new=MagicMock)
@patch.object(BuildEnvironment, 'run')
@patch('readthedocs.doc_builder.config.load_config')
def test_install_apt_packages(self, load_config, run):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a lot of patches? Seems we're trying to stop the builds from processing, but I wonder if there's a cleaner way to do this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to remove one patch, but I think a cleaner way would require refactor the build task so we can mock self.build_env, self.setup_env and self.python_env

config = BuildConfigV2(
{},
{
'version': 2,
'build': {
'apt_packages': [
'clangd',
'cmatrix',
],
},
},
source_file='readthedocs.yml',
)
config.validate()
load_config.return_value = config

version = self.project.versions.first()
build = get(
Build,
project=self.project,
version=version,
)
with mock_api(self.repo):
result = tasks.update_docs_task.delay(
version.pk,
build_pk=build.pk,
record=False,
intersphinx=False,
)
self.assertTrue(result.successful())

self.assertEqual(run.call_count, 2)
apt_update = run.call_args_list[0]
apt_install = run.call_args_list[1]
self.assertEqual(
apt_update,
mock.call(
'apt-get',
'update',
'-y',
'-q',
user='root:root',
)
)
self.assertEqual(
apt_install,
mock.call(
'apt-get',
'install',
'-y',
'-q',
'clangd',
'cmatrix',
user='root:root',
)
)
1 change: 1 addition & 0 deletions readthedocs/settings/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -433,6 +433,7 @@ def TEMPLATES(self):
# instance to avoid file permissions issues.
# https://docs.docker.com/engine/reference/run/#user
RTD_DOCKER_USER = 'docs:docs'
RTD_BUILD_SUPER_USER = 'root:root'

RTD_DOCKER_COMPOSE = False

Expand Down