Open bind mount sources from the host userns #2576

Merged 3 commits on Oct 28, 2021

alban
Contributor

@alban alban commented Sep 7, 2020

The source of a bind mount might not be accessible from a different user namespace, because a component of the source path might not be traversable by the users and groups mapped inside that user namespace. This caused errors such as the following:

  # time="2020-06-22T13:48:26Z" level=error msg="container_linux.go:367:
  starting container process caused: process_linux.go:459:
  container init caused: rootfs_linux.go:58:
  mounting \"/tmp/busyboxtest/source-inaccessible/dir\"
  to rootfs at \"/tmp/inaccessible\" caused:
  stat /tmp/busyboxtest/source-inaccessible/dir: permission denied"

To solve this problem, this patch performs the following:

  1. in nsexec.c, it opens the source path in the host userns (so we have the right permissions to open it) but in the container mntns (so the kernel's cross-mntns mount check lets us mount it later: https://github.com/torvalds/linux/blob/v5.8/fs/namespace.c#L2312).
  2. in nsexec.c, it passes the file descriptors of the source to the child process with SCM_RIGHTS.
  3. in runc init, in Go, it finishes the mounts while inside the userns, even without access to some components of the source paths.

Passing the fds with SCM_RIGHTS is necessary because once the child process is in the container mntns, it is already in the container userns so it cannot temporarily join the host mntns.
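
For readers unfamiliar with SCM_RIGHTS, here is a minimal Go sketch of the fd-passing idea only. It is not the code in this PR (the PR does the sending side in C, in nsexec.c): the helper names `sendFd`, `recvFd` and `demo` are made up for illustration, and a socketpair inside one process stands in for the parent/child pair.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// sendFd passes one open file descriptor to the peer of a Unix socket.
func sendFd(sock, fd int) error {
	// The fd travels in an SCM_RIGHTS control message; one byte of
	// ordinary payload is sent along with it.
	return syscall.Sendmsg(sock, []byte{0}, syscall.UnixRights(fd), nil, 0)
}

// recvFd receives one file descriptor sent by sendFd.
func recvFd(sock int) (int, error) {
	buf := make([]byte, 1)
	oob := make([]byte, syscall.CmsgSpace(4)) // room for a single 32-bit fd
	_, oobn, _, _, err := syscall.Recvmsg(sock, buf, oob, 0)
	if err != nil {
		return -1, err
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil {
		return -1, err
	}
	if len(msgs) != 1 {
		return -1, fmt.Errorf("expected 1 control message, got %d", len(msgs))
	}
	fds, err := syscall.ParseUnixRights(&msgs[0])
	if err != nil {
		return -1, err
	}
	if len(fds) != 1 {
		return -1, fmt.Errorf("expected 1 fd, got %d", len(fds))
	}
	return fds[0], nil
}

// demo writes "hello" to a temporary file, passes an fd for it through a
// socketpair, and reads the content back through the received fd.
func demo() (string, error) {
	tmp, err := os.CreateTemp("", "scm-rights-demo")
	if err != nil {
		return "", err
	}
	defer os.Remove(tmp.Name())
	defer tmp.Close()
	if _, err := tmp.WriteString("hello"); err != nil {
		return "", err
	}

	pair, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return "", err
	}
	defer syscall.Close(pair[0])
	defer syscall.Close(pair[1])

	if err := sendFd(pair[0], int(tmp.Fd())); err != nil {
		return "", err
	}
	fd, err := recvFd(pair[1])
	if err != nil {
		return "", err
	}
	received := os.NewFile(uintptr(fd), "received")
	defer received.Close()

	// The received fd shares the file offset with the sender's fd,
	// so read from an explicit offset instead of the current position.
	data := make([]byte, 5)
	n, err := received.ReadAt(data, 0)
	if err != nil {
		return "", err
	}
	return string(data[:n]), nil
}

func main() {
	got, err := demo()
	fmt.Println(got, err)
}
```

The receiver never opens the path itself. It only uses the descriptor it was handed, which is why the child can work with a source whose path it could not traverse on its own.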

This patch uses the existing mechanism with LIBCONTAINER* environment variables to pass the file descriptors from runc to runc init.

This patch uses the existing mechanism with the Netlink-style bootstrap to pass information about the list of source mounts to nsexec.c.


Fixes: #2484

TODO:

  • It does not work yet when the bind mount is configured as read-only in config.json.
  • Unit tests fail.
  • a single env var to pass on all the mount fds

@alban
Contributor Author

alban commented Sep 8, 2020

It does not work yet when the bind mount is configured as read-only in config.json.

This was not a bug but normal kernel behaviour because I tested without mount flags nosuid,nodev. When I test with the correct flags, it works. For details, see: #1229 (comment)

@alban alban force-pushed the alban/userns-2484-take2 branch from 040f486 to d5a10e2 Compare September 9, 2020 07:51
@alban
Contributor Author

alban commented Sep 9, 2020

Unit tests fail.

Unit tests seem to fail on the master branch too, it does not seem related to this PR.

@alban alban marked this pull request as ready for review September 9, 2020 09:27
@kolyshkin
Contributor

Unit tests seem to fail on the master branch too, it does not seem related to this PR.

Can you please rebase it on top of current master (merged #2580 should fix CI)

@alban alban force-pushed the alban/userns-2484-take2 branch from d5a10e2 to 80a3299 Compare September 9, 2020 16:37
Contributor

@kolyshkin kolyshkin left a comment

Left some comments.

@kolyshkin
Contributor

In general, can we maybe use a single env var to pass on all the mount fds at once (instead of introducing a variable per each fd)? That would probably be faster and more resource-wise.

@alban alban force-pushed the alban/userns-2484-take2 branch from 80a3299 to 7c0529f Compare September 10, 2020 10:25
@alban
Contributor Author

alban commented Sep 10, 2020

@kolyshkin Thanks for the reviews!

I addressed all the comments except the one about the single env var to pass on all the mount fds. I added that in the TODO list.

The unit tests on Travis still fail after a rebase but I don't see the error message. Do you know why? If not, I'll continue to investigate...

@alban alban force-pushed the alban/userns-2484-take2 branch from e14baed to 54eca43 Compare September 10, 2020 16:09
@alban
Contributor Author

alban commented Sep 11, 2020

In general, can we maybe use a single env var to pass on all the mount fds at once (instead of introducing a variable per each fd)? That would probably be faster and more resource-wise.

Added, with _LIBCONTAINER_MOUNT_FILE_FDS containing a list of fds separated by ;.
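
As a rough illustration of the single-variable format, a `;`-separated list of fd numbers could be parsed on the Go side along these lines. This is only a sketch: `parseMountFds` is a hypothetical helper, not a function from this PR, and how empty or malformed values should be handled is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMountFds is a hypothetical helper that turns a ';'-separated list
// of fd numbers (the format described for _LIBCONTAINER_MOUNT_FILE_FDS)
// into a slice of ints. An empty value is treated as "no fds" here,
// which is an assumption made for this sketch.
func parseMountFds(value string) ([]int, error) {
	if value == "" {
		return nil, nil
	}
	parts := strings.Split(value, ";")
	fds := make([]int, 0, len(parts))
	for _, p := range parts {
		fd, err := strconv.Atoi(p)
		if err != nil {
			return nil, fmt.Errorf("invalid fd %q: %w", p, err)
		}
		if fd < 0 {
			return nil, fmt.Errorf("negative fd %d", fd)
		}
		fds = append(fds, fd)
	}
	return fds, nil
}

func main() {
	fds, err := parseMountFds("7;8;9")
	fmt.Println(fds, err)
}
```

In a real caller the value would come from `os.Getenv`, and each parsed number would be wrapped with `os.NewFile` before use.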

@alban alban force-pushed the alban/userns-2484-take2 branch 2 times, most recently from 70b28f5 to 165745b Compare September 11, 2020 12:02
@alban
Contributor Author

alban commented Sep 11, 2020

The unit tests now pass fine. Changes:

  • mount source fd passing is disabled for rootless containers, because the runc parent process might not have the capabilities to call setns() on the mount namespace it created. In that case the mount is performed as before, and inaccessible dirs fail to mount as before. I don't know how to support that use case too.
  • whether to enable mount source fd passing is now decided in container_linux.go sendMountSources()
  • the tests are skipped in the rootless case
  • the tests disable readonly rootfs. Not sure why it didn't work without that.

man setns(2):
Changing the mount namespace requires that the caller possess both CAP_SYS_CHROOT and CAP_SYS_ADMIN capabilities in its own user namespace and CAP_SYS_ADMIN in the user namespace that owns the target mount namespace.

envMountFileFds, err)
}
mountFile := os.NewFile(uintptr(mountFileFd), "mount-file")
defer mountFile.Close()
Member

@cyphar cyphar Sep 12, 2020

These defers will never be called because this function ends up calling Execve. There are two things we should do:

  1. Explicitly set O_CLOEXEC on all of the file descriptors (fcntl(F_SETFD, FD_CLOEXEC) is what you want).
  2. Close them as soon as we can.
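
For reference, a small Go sketch of marking an already-open descriptor close-on-exec. FD_CLOEXEC is a file descriptor flag, so it is read and written with F_GETFD / F_SETFD. The helpers `fdFlags`, `setCloexec` and `demo` are illustrative names, not code from this PR, and `/dev/null` is just a convenient file to open.

```go
package main

import (
	"fmt"
	"syscall"
)

// fdFlags returns the file descriptor flags of fd via fcntl(F_GETFD).
func fdFlags(fd int) (int, error) {
	flags, _, errno := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), syscall.F_GETFD, 0)
	if errno != 0 {
		return 0, errno
	}
	return int(flags), nil
}

// setCloexec adds FD_CLOEXEC to the descriptor flags of fd via
// fcntl(F_SETFD), keeping any flags that were already set.
func setCloexec(fd int) error {
	flags, err := fdFlags(fd)
	if err != nil {
		return err
	}
	_, _, errno := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), syscall.F_SETFD, uintptr(flags|syscall.FD_CLOEXEC))
	if errno != 0 {
		return errno
	}
	return nil
}

// demo opens /dev/null without O_CLOEXEC and reports whether the
// close-on-exec flag is set before and after calling setCloexec.
func demo() (before, after bool, err error) {
	fd, err := syscall.Open("/dev/null", syscall.O_RDONLY, 0)
	if err != nil {
		return false, false, err
	}
	defer syscall.Close(fd)

	flags, err := fdFlags(fd)
	if err != nil {
		return false, false, err
	}
	before = flags&syscall.FD_CLOEXEC != 0

	if err := setCloexec(fd); err != nil {
		return before, false, err
	}
	flags, err = fdFlags(fd)
	if err != nil {
		return before, false, err
	}
	after = flags&syscall.FD_CLOEXEC != 0
	return before, after, nil
}

func main() {
	before, after, err := demo()
	fmt.Println(before, after, err)
}
```

In Go code that does not need the raw fcntl call, `syscall.CloseOnExec(fd)` has the same effect.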

Contributor Author

Thanks. I added a patch that does the following:

  • O_CLOEXEC is now set in nsexec.c via dup3()
  • the fds are closed earlier, in libcontainer/init_linux.go (for the initSetns case) and in libcontainer/standard_init_linux.go (for the initStandard case).

The "defer FOO.Close()" in this function follows the same code pattern of closing other fds in case i.Init() somehow fails to run Execve.

@alban alban force-pushed the alban/userns-2484-take2 branch 2 times, most recently from 6900793 to d7854e6 Compare September 15, 2020 15:31
@rata rata force-pushed the alban/userns-2484-take2 branch 2 times, most recently from c87cbd6 to 1413f82 Compare October 12, 2021 14:04
@rata
Member

rata commented Oct 12, 2021

@kolyshkin no problem, fixed! Pushed several times to see if the DCO check runs (it says expected, but stays like that for a long time), but didn't help. PTAL :)

@rata rata force-pushed the alban/userns-2484-take2 branch 4 times, most recently from ebdd748 to 1efa5f2 Compare October 12, 2021 18:32
@rata
Member

rata commented Oct 12, 2021

Pushing again as make lint was failing with timeout (locally it passes just fine). Let's see if the CI passes now, if not will try again tomorrow.


if strings.HasPrefix(tempBase, path) {
// We can't safely change permissions if it is not below tempBase.
if stats.Mode()&0o5 == 0 {
Contributor

Looks like the check is wrong. Consider the case when only r or x is set.
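
To spell out the problem: `mode&0o5 == 0` is true only when neither the r bit nor the x bit for others is set, so a mode like 0o751 (x only) slips through. Assuming the intent is "others have both r and x", a sketch of the two variants follows. The function names are made up for illustration, and the thread does not show which expression the final fix used.

```go
package main

import (
	"fmt"
	"os"
)

// othersRX is the r and x permission bits for "others".
const othersRX os.FileMode = 0o5

// buggyLacksAccess mirrors the check under review: it reports true only
// when others have neither r nor x.
func buggyLacksAccess(mode os.FileMode) bool {
	return mode&othersRX == 0
}

// lacksAccess reports true when others are missing r, x, or both.
// That this is the intended meaning is an assumption of this sketch.
func lacksAccess(mode os.FileMode) bool {
	return mode&othersRX != othersRX
}

func main() {
	for _, m := range []os.FileMode{0o750, 0o751, 0o754, 0o755} {
		fmt.Printf("%04o buggy=%v fixed=%v\n", uint32(m), buggyLacksAccess(m), lacksAccess(m))
	}
}
```

The two checks agree on 0o750 and 0o755 and differ on 0o751 and 0o754, which matches the case where the directories had no permissions for others at all and the original check happened to work.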

Member

@rata rata Oct 13, 2021

haha, I'm embarrassed I missed it 🙈 . I was unlucky and in practice it worked, because the dirs didn't have any permissions for others. But thanks for catching this, fixed now!

continue
}

if stats.Mode()&0o5 != 0 {
Contributor

ditto

Member

Fixed too, thanks! 🙈

@rata rata force-pushed the alban/userns-2484-take2 branch from 1efa5f2 to 04a4822 Compare October 13, 2021 11:23
@rata
Member

rata commented Oct 13, 2021

@kolyshkin PTAL

Add a unit test to check that bind mounts that have a part of their
path not accessible by others still work when using user namespaces.

To do this, we also modify newRoot() to return rootfs directories that
can be traversed by others, so the rootfs created works for all tests
(whether running in a userns or not).

Signed-off-by: Mauricio Vásquez <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
Co-authored-by: Rodrigo Campos <[email protected]>
@rata rata force-pushed the alban/userns-2484-take2 branch from 04a4822 to 8542322 Compare October 16, 2021 15:30
@rata
Member

rata commented Oct 16, 2021

@kolyshkin Thanks, PTAL. Hopefully this is ready now 🤞

Contributor

@kolyshkin kolyshkin left a comment

LGTM

@AkihiroSuda AkihiroSuda requested a review from cyphar October 19, 2021 09:02
@kolyshkin kolyshkin requested a review from AkihiroSuda October 19, 2021 17:13
@AkihiroSuda AkihiroSuda merged commit 4d17654 into opencontainers:master Oct 28, 2021
@rata
Member

rata commented Oct 28, 2021

Thanks all for the time and review!

I think we never added release notes for this. Here they are:

Fix using bind mounts when the user in the user namespace doesn't have permission to traverse the mount path (#2484)


runc run test_busybox
[ "$status" -eq 0 ]
}
Member

Doesn't seem to work on Fedora 35
#3258

Member

I mentioned it in several places already, but just in case someone is looking here, the PR fixing this is: #3260

Successfully merging this pull request may close these issues.

runc with user namespace enabled fails to bind mount host dirs with 750 permission
9 participants