Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: allow to install packages with apt #8065

Merged
merged 12 commits into from
May 12, 2021
Merged

Build: allow to install packages with apt #8065

merged 12 commits into from
May 12, 2021

Conversation

stsewd
Copy link
Member

@stsewd stsewd commented Mar 31, 2021

  • All package names and forms like patters/version (package-*, package==2.x) are allowed
  • We don't allow injecting extra options nor install from a path (this is validated in the config file, you'll get a nice error with the exact package that is invalid)
  • Update and installation is done before the python setup and after the clone step

You can test this locally with this branch from the test-builds https://github.com/readthedocs/test-builds/blob/use-apt/.readthedocs.yaml

Implements #8060
Related: #7566

@stsewd stsewd force-pushed the install-apt-packages branch from 4ca6676 to 2435a4a Compare April 1, 2021 23:10
@stsewd stsewd marked this pull request as ready for review April 1, 2021 23:26
@stsewd stsewd requested a review from a team April 1, 2021 23:26
Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I think we should look a bit more closely at the security aspects of it, but I think my changes will at least help:

$ sudo apt-get install --yes -- --quiet test
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package --quiet
E: Unable to locate package test

@@ -64,7 +64,7 @@ This is to avoid typos and provide feedback on invalid configurations.

.. contents::
:local:
:depth: 1
:depth: 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this will spam the TOC too much -- seems fine tho?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was between 2 or 3, but wanted to have build.apt_images and sphing.builder visible instead of just having sphinx and build, but no strong opinion.

docs/config-file/v2.rst Show resolved Hide resolved
readthedocs/config/config.py Show resolved Hide resolved
readthedocs/config/config.py Outdated Show resolved Hide resolved
return build

def validate_apt_package(self, index):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we probably can be smarter about this. Can we not call apt after a --, which is the shell way of handling the end of options?

Unless otherwise noted, each builtin command documented as accepting options preceded by ‘-’ accepts ‘--’ to signify the end of the options. The :, true, false, and test/[ builtins do not accept options and do not treat ‘--’ specially. The exit, logout, return, break, continue, let, and shift builtins accept and process arguments beginning with ‘-’ without requiring ‘--’. Other builtins that accept arguments but are not specified as accepting options interpret arguments beginning with ‘-’ as invalid options and require ‘--’ to prevent this interpretation.

We should still raise a validation error here, but I'd like to see additional restrictions for this built in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point Eric.

Also, we need to take care of executing commands in the same line as well. Like,

apt-get install `touch /tmp/hello.txt`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe python's subprocess command will handle this, but we should definitely be sure and have tests for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, subprocess should handle this as long as you don't pass shell=True. However, it's worth a test to make sure we don't ever mess that up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we use subprocess, but the Docker client API (exec_create), tho.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use subprocess, we escape special chars in

def get_wrapped_command(self):
"""
Wrap command in a shell and optionally escape special bash characters.
In order to set the current working path inside a docker container, we
need to wrap the command in a shell call manually.
Some characters will be interpreted as shell characters without
escaping, such as: ``pip install requests<0.8``. When passing
``escape_command=True`` in the init method this escapes a good majority
of those characters.
"""
bash_escape_re = re.compile(
r"([\t\ \!\"\#\$\&\'\(\)\*\:\;\<\>\?\@"
r'\[\\\]\^\`\{\|\}\~])',
)
prefix = ''
if self.bin_path:
prefix += 'PATH={}:$PATH '.format(self.bin_path)
command = (
' '.join([
bash_escape_re.sub(r'\\\1', part) if self.escape_command else part
for part in self.command
])
)
return (
"/bin/sh -c 'cd {cwd} && {prefix}{cmd}'".format(
cwd=self.cwd,
prefix=prefix,
cmd=command,
)
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested defining packages in a malicious way? (e.g. trying to executed forbidden things). From this docstring it only happen under certain circumstances. Are we sure we are always calling this in that way?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

escape_command defaults to true if that's what you mean.

readthedocs/projects/tasks.py Show resolved Hide resolved
user=settings.RTD_BUILD_SUPER_USER,
)
self.build_env.run(
'apt-get', 'install', '-y', '-q', *packages,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should always use the long-version of arguments in code, to make it clearer what it does. Also add in the -- Ending

Suggested change
'apt-get', 'install', '-y', '-q', *packages,
# -- ends all command arguments to protect against users passing them
'apt-get', 'install', '--assume-yes', '--quiet', '--', *packages,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we want this to be quiet? I know the output is sort of long but it might help folks debug their own problems. I tested it with --quiet and it's still fairly verbose but maybe a comment is appropriate on why.

I looked at what CircleCI does and they basically give users free reign to run arbitrary commands including apt-get install. Our system seems quite a bit more locked down than that which seems fine. There's still some risk (could a user install some docker utilities and somehow access the host?) but I think as long as we're just installing built-in Debian packages and sanitizing the inputs really well, we should be OK.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we want this to be quiet? I know the output is sort of long but it might help folks debug their own problems. I tested it with --quiet and it's still fairly verbose but maybe a comment is appropriate on why.

-q will suppress the progress bar not the output itself, -qq will suppress everything.

@patch('readthedocs.projects.tasks.UpdateDocsTaskStep.setup_vcs', new=MagicMock)
@patch.object(BuildEnvironment, 'run')
@patch('readthedocs.doc_builder.config.load_config')
def test_install_apt_packages(self, load_config, run):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a lot of patches? Seems we're trying to stop the builds from processing, but I wonder if there's a cleaner way to do this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to remove one patch, but I think a cleaner way would require refactor the build task so we can mock self.build_env, self.setup_env and self.python_env

Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

I also think we should take care a little deeper about security concerns here and make some extra QA for these cases.

docs/config-file/v2.rst Outdated Show resolved Hide resolved
return build

def validate_apt_package(self, index):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point Eric.

Also, we need to take care of executing commands in the same line as well. Like,

apt-get install `touch /tmp/hello.txt`

readthedocs/config/config.py Show resolved Hide resolved
readthedocs/settings/base.py Outdated Show resolved Hide resolved
@ericholscher
Copy link
Member

ericholscher commented Apr 6, 2021

Pinging @davidfischer here to chime in on the security angles of this. I think we've covered most of them (passing -- to end the argument parsing, tests for shelling out) but I feel like there's lots of terrible ways this can go wrong. In particular, we should write more about how the Python level protections work around client.exec_create in docker, and make sure we're thinking through all the various places this might break.

'/',
'.',
]
for start in invalid_starts:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left this bc feels like it gives a nice error for usual mistakes, but isn't needed as the regex below already validates this.

readthedocs/config/config.py Outdated Show resolved Hide resolved
Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes here look good. I think we can ship this soon 👍

.. note::

When possible avoid installing Python packages using apt (``python3-numpy`` for example),
:ref:`use pip or Conda instead <guides/reproducible-builds:pinning dependencies>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🎉 APT packages! 😻

docs/config-file/v2.rst Show resolved Hide resolved
return build

def validate_apt_package(self, index):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested defining packages in a malicious way? (e.g. trying to executed forbidden things). From this docstring it only happen under certain circumstances. Are we sure we are always calling this in that way?

@stsewd stsewd enabled auto-merge (squash) May 12, 2021 17:51
@stsewd stsewd merged commit 130b6b4 into master May 12, 2021
@stsewd stsewd deleted the install-apt-packages branch May 12, 2021 17:58
@jakirkham
Copy link
Contributor

I'm probably trying this too soon (sorry am really excited about this feature 😄). Is this the right way to use it ( rapidsai/ucx-py#734 )? Is there some way to tell the package is being installed? If I just need to wait a bit, happy to wait 🙂

@stsewd
Copy link
Member Author

stsewd commented May 12, 2021

@jakirkham this change should be live by next tuesday in the afternoon.

@stsewd
Copy link
Member Author

stsewd commented May 12, 2021

also, this will work only with the v2 of the conig file (you are using v1) https://docs.readthedocs.io/en/stable/config-file/v2.html

version: 2
build:
  apt_packages:
    - libnuma1

@jakirkham
Copy link
Contributor

Awesome! Thank you for working on this 🙏

Sorry for getting ahead of things here 😅 Will give this another try next week 🙂

@stsewd
Copy link
Member Author

stsewd commented May 18, 2021

@jakirkham and anyone else subscribed, this change is live now! Please let us know if you find any problems!

@jakirkham
Copy link
Contributor

Worked great! 😄 Thanks again 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants