-
Notifications
You must be signed in to change notification settings - Fork 583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/snap-confine: prevent cwd restore permission bypass #6642
Conversation
It was brought to our attention that specially crafted current working directory can be used to bypass permissions checks on a multi-user system. The principal reason for that was that snap-confine, during the course of execution, changes the current working directory, either explicitly or implicitly (e.g. via setns) and then tries to restore the original working directory while executing with root privileges. The recently introduced central handling of umask is now extended to handle current working directory. The code replaces individual implementation in several other places in the codebase. The new implementation is better than the original in two ways. First of all, it works based on a different principle which is discussed below. Second of all, it runs late, after the original user and group IDs have been restored (that is, it no longer runs as privileged user). The new method has one of three possible outcomes: 1) The original working directory cannot be represented in the execution environment. The process is moved to /var/lib/snapd/void 2) The original working directory can be represented in the execution environment but it now points to a different inode. The new inode is used and a filesystem permission check is performed by the kernel. 3) The original working directory can be represented in the execution environment and points to the same inode. The same inode is used. Two tests were affected by this change as they were running commands inside a LXD container as the root user, with the initial working directory set to /root, then using su to switch to the ubuntu user. Those commands were changed to go to /home/ubuntu instead. To show this all in operation I've added a new test that verifies each of the variant cases. Signed-off-by: Zygmunt Krynicki <[email protected]>
if (nread == sizeof orig_cwd) { | ||
die("cannot fit symbolic link target %s", fd_path); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sake of debugging, I'd add:
debug("original cwd in parent namespace: %s", orig_cwd);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot do that, that string is not terminated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant adding it below:
if (nread == sizeof orig_cwd) {
die("cannot fit symbolic link target %s", fd_path);
}
debug("original cwd in parent namespace: %s", orig_cwd);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but that is logged in several cases below?
Signed-off-by: Zygmunt Krynicki <[email protected]>
Signed-off-by: Zygmunt Krynicki <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work on this. Comments inline :)
We use O_PATH so it doesn't matter much but we can be stricter by passing O_DIRECTORY when we open with the current working directory. Signed-off-by: Zygmunt Krynicki <[email protected]>
Signed-off-by: Zygmunt Krynicki <[email protected]>
Signed-off-by: Zygmunt Krynicki <[email protected]>
Signed-off-by: Zygmunt Krynicki <[email protected]>
Inspect the current working directory in the initial mount namespace. While this doesn't change anything it is more natural and more resilient to breakage in case kernel changes semantics of fstatfs invoked across unshare/setns. Signed-off-by: Zygmunt Krynicki <[email protected]>
When originally implementing the first patch in this series I was considering using the file descriptor for the current working directory in the original mount namespace so I wanted to differentiate the cases where the inode it points to now and before is different from that when they are the same. Over time I realised that I cannot use fchdir with the original descriptor and implemented separate cases for fchdir + chdir but in doing so I didn't realise that fchdir and chdir are equivalent here because the path descriptor used is from the per-snap mount namespace and that it is opened just moments before calling fchdir. As such the whole fragment can be reduced to unconditional fchdir with the inode check serving just as UX differentiation. Signed-off-by: Zygmunt Krynicki <[email protected]>
Nothing is changed but it's clearer what is going on. Signed-off-by: Zygmunt Krynicki <[email protected]>
Signed-off-by: Zygmunt Krynicki <[email protected]>
Signed-off-by: Zygmunt Krynicki <[email protected]>
The new test is failing in the most interesting way:
There's no void directory on core18! |
There's quite a difference in the contents of /var/lib/snapd between
|
When the invoking user has no permission to stay in a directory (e.g. a symlink attack in /tmp or a su invocation from /root) then instead of failing the whole command chain, move the invoked process to the void directory. The void directory is empty and has no permissions to do anything inside. The set of affected tests were changed not to require a directory change anymore. In addition the pwd test was expanded to cover this specific situation. Signed-off-by: Zygmunt Krynicki <[email protected]>
Signed-off-by: Zygmunt Krynicki <[email protected]>
I will make the required changes to core18 to make this test pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
cmd/snap-confine/snap-confine.c
Outdated
} | ||
if (inner_cwd_fd < 0) { | ||
debug("cannot represent original working directory %s", orig_cwd); | ||
goto the_void; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hehe :)
cmd/snap-confine/snap-confine.c
Outdated
} | ||
return; | ||
the_void: | ||
if (chdir(sc_void_dir) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, if you are root and not confined, then this could be raced, but unconfined root can just do that without the race, so not a problem.
I found some more issues here. I'm iterating on this now. |
The void directory cannot be 000 anymore. When snap-confine chooses to restore the orignal working directory inside the mount namespace it must do so without abusing the root power it wields. As such it drops extra permissions and moves to the desired directory as the invoking user. This can fail for lack of permissions to access said directory. When this happens, instead of aborting execution, snap-confine moves to the special fallback directory, the void. This raises the problem that it cannot access the void directory as originally designed (with mode 000) so something must yield. We chose to yield the permissions by changing them to 111, that is, anyone can browse but nothing else. The change is done across many layers of packaging and test preparation but even that is insufficient. What we noticed is that the originally desired permissions were not really applied. I will separately address this in core and core18 so that the directory exists and with correct permissions but for now this should unblock the effort. Signed-off-by: Zygmunt Krynicki <[email protected]>
This failed in an odd way I didn't see locally. It would imply that the user test has permissions to be in /root, which is unexpected. I will investigate tomorrow.
|
I'm debugging this now. |
Signed-off-by: Zygmunt Krynicki <[email protected]>
The cwd test detected that core18 didn't ship the skeleton directory structure required by snapd. Since that was fixed the corresponding workaround can be now removed. Signed-off-by: Zygmunt Krynicki <[email protected]>
When the cwd test fails it can do so because of incorrect permissions in the image. Adding a debug section will allow us to track down the problem. Signed-off-by: Zygmunt Krynicki <[email protected]>
The test setup was incorrectly creating an open-access /root directory that is not representative of a real core or classic system. This patch fixes that. Signed-off-by: Zygmunt Krynicki <[email protected]>
This should be fixed now. |
@@ -762,7 +762,7 @@ popd | |||
%{_mandir}/man8/snap-confine.8* | |||
%{_mandir}/man8/snap-discard-ns.8* | |||
%{_systemdgeneratordir}/snapd-generator | |||
%attr(0000,root,root) %{_sharedstatedir}/snapd/void | |||
%attr(0111,root,root) %{_sharedstatedir}/snapd/void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we double check in test/upgrade/basic
that the permissions are actually updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in this pull request. This is not attempting to make that work across the distributions, just in our internal test suite. There are distributions that are known to have this broken, most notably ubuntu and ubuntu-core16
@@ -166,7 +166,7 @@ package() { | |||
install -dm755 "$pkgdir/var/lib/snapd/lib/vulkan" | |||
install -dm755 "$pkgdir/var/lib/snapd/lib/glvnd" | |||
# these dirs have special permissions | |||
install -dm000 "$pkgdir/var/lib/snapd/void" | |||
install -dm111 "$pkgdir/var/lib/snapd/void" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating this. I'll need to check whether pacman does the right thing on upgrade though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure we need to update all non-RPM distribution packaging to behave correctly. I doubt this is correct yet.
Maciej Borzęcki rightfully noted that we can fail to open the path (using O_PATH flag) when we don't have permissions to navigate the directory chain. Instead of dying like we do on most errors, we will now go to the void directory. This is more in line with similar case when fchdir fails to change the directory for permission reasons. Signed-off-by: Zygmunt Krynicki <[email protected]>
The -m option applies only to the leaf element. The -p option means that we can create additional components. Shellcheck recommends against that. Signed-off-by: Zygmunt Krynicki <[email protected]>
open(orig_cwd, O_PATH | O_DIRECTORY | O_CLOEXEC | O_NOFOLLOW); | ||
if (inner_cwd_fd < 0) { | ||
if (errno == EPERM || errno == EACCES || errno == ENOENT) { | ||
debug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, indent really does unspeakable things to the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes... I'm on the edge of using clang-format for snap-confine.c because this is just silly.
On top of that, it's not stable, run indent a few times and you get different output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed all the changes since my last review (thanks @bboozzoo for the O_PATH find :) and they look fine. Approving so long as the tests pass.
@@ -471,6 +480,9 @@ EOF | |||
# - make sure the group matches | |||
# - bind mount /root/test-etc/* to /etc/* via custom systemd job | |||
# We also create /var/lib/extrausers/* and append ubuntu,test there | |||
! test -e /mnt/system-data/root | |||
mkdir -m 700 /mnt/system-data/root | |||
test -d /mnt/system-data/root |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reminded me that for many years Ubuntu shipped an 0755 /root directory. It's possible that other distros might do this today. That obviously doesn't affect this hunk of the test so just fyi.
inner_cwd_fd = | ||
open(orig_cwd, O_PATH | O_DIRECTORY | O_CLOEXEC | O_NOFOLLOW); | ||
if (inner_cwd_fd < 0) { | ||
if (errno == EPERM || errno == EACCES || errno == ENOENT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a nice find. From 'man open' - Opening a file or directory with the O_PATH flag requires no permissions on the object itself (but does require execute permission on the directories in the path prefix)
On core systems the directory /var/lib/snapd/void may not be present. When it is absent it must be created by snap-confine on demand. Signed-off-by: Zygmunt Krynicki <[email protected]>
I force pushed to fix a typo that was upsetting the spell checker. |
The void directory is created by the test so we cannot check for it early. We can and should check for it late though. Signed-off-by: Zygmunt Krynicki <[email protected]>
I had earlier placed the definition of the variable across a goto jump that clang, probably rightfully, complains about. The jump supposedly jumps over the initialization so to stay safe, move the initialization and indeed the definition, earlier. Signed-off-by: Zygmunt Krynicki <[email protected]>
Force pushed to correct two misspellings in the last commit message. |
@@ -180,6 +180,7 @@ static void sc_restore_process_state(const sc_preserved_process_state * | |||
/* If the original working directory cannot be used for whatever reason then | |||
* move the process to a special void directory. */ | |||
const char *sc_void_dir = "/var/lib/snapd/void"; | |||
int void_dir_fd SC_CLEANUP(sc_cleanup_close) = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but perhaps it makes sense to declare this nearer its first use. Eg:
the_void:
/* The void directory may be absent. On core18 system, and other
* systems using bootable base snap coupled with snapd snap, the
* /var/lib/snapd directory structure is not provided with packages but
* created on demand. */
int void_dir_fd SC_CLEANUP(sc_cleanup_close) = -1;
void_dir_fd = open(sc_void_dir, O_DIRECTORY | O_PATH | O_NOFOLLOW | O_CLOEXEC);
...
Not a blocker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is invalid C. Labels cannot denote variable declarations. I tried that earlier :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, heh
It was brought to our attention that specially crafted current working
directory can be used to bypass permissions checks on a multi-user
system. The principal reason for that was that snap-confine, during the
course of execution, changes the current working directory, either
explicitly or implicitly (e.g. via setns) and then tries to restore the
original working directory while executing with root privileges.
The recently introduced central handling of umask is now extended to
handle current working directory. The code replaces individual
implementation in several other places in the codebase.
The new implementation is better than the original in two ways. First
of all, it works based on a different principle which is discussed
below. Second of all, it runs late, after the original user and group
IDs have been restored (that is, it no longer runs as privileged user).
The new method has one of three possible outcomes:
The original working directory cannot be represented in the execution
environment. The process is moved to /var/lib/snapd/void
The original working directory can be represented in the execution
environment but it now points to a different inode. The new inode is
used and a filesystem permission check is performed by the kernel.
The original working directory can be represented in the execution
environment and points to the same inode. The same inode is used.
Two tests were affected by this change as they were running commands
inside a LXD container as the root user, with the initial working
directory set to /root, then using su to switch to the ubuntu user.
Those commands were changed to go to /home/ubuntu instead.
To show this all in operation I've added a new test that verifies each
of the variant cases.
Signed-off-by: Zygmunt Krynicki [email protected]
Related-To: https://bugzilla.suse.com/show_bug.cgi?id=1127368