nsexec: retry unshare on EINVAL
Older kernels may return EINVAL on unshare when a process is reading
runc's /proc/$PID/status or /proc/$PID/maps. This was fixed by kernel
commit 12c641ab8270f ("unshare: Unsharing a thread does not require
unsharing a vm") in Linux v4.3.

For CentOS 7, the fix was backported to CentOS 7.7 (kernel 3.10.0-1062).

To work around this kernel bug, let's retry on EINVAL a few times.

Reported-by: zzyyzte <[email protected]>
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit cecb039)
Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin authored and cyphar committed Mar 26, 2023
1 parent 059d773 commit 8ec02ea
Showing 1 changed file with 22 additions and 8 deletions.
30 changes: 22 additions & 8 deletions libcontainer/nsenter/nsexec.c
@@ -832,6 +832,25 @@ void send_mountsources(int sockfd, pid_t child, char *mountsources, size_t mount
 		bail("failed to close container mount namespace fd %d", container_mntns_fd);
 }
 
+void try_unshare(int flags, const char *msg)
+{
+	write_log(DEBUG, "unshare %s", msg);
+	/*
+	 * Kernels prior to v4.3 may return EINVAL on unshare when another process
+	 * reads runc's /proc/$PID/status or /proc/$PID/maps. To work around this,
+	 * retry on EINVAL a few times.
+	 */
+	int retries = 5;
+	for (; retries > 0; retries--) {
+		if (unshare(flags) == 0) {
+			return;
+		}
+		if (errno != EINVAL)
+			break;
+	}
+	bail("failed to unshare %s", msg);
+}
+
 void nsexec(void)
 {
 	int pipenum;
@@ -1170,9 +1189,7 @@ void nsexec(void)
 			 * problem.
 			 */
 			if (config.cloneflags & CLONE_NEWUSER) {
-				write_log(DEBUG, "unshare user namespace");
-				if (unshare(CLONE_NEWUSER) < 0)
-					bail("failed to unshare user namespace");
+				try_unshare(CLONE_NEWUSER, "user namespace");
 				config.cloneflags &= ~CLONE_NEWUSER;
 
 				/*
@@ -1224,9 +1241,7 @@ void nsexec(void)
 			 * some old kernel versions where clone(CLONE_PARENT | CLONE_NEWPID)
 			 * was broken, so we'll just do it the long way anyway.
 			 */
-			write_log(DEBUG, "unshare remaining namespace (except cgroupns)");
-			if (unshare(config.cloneflags & ~CLONE_NEWCGROUP) < 0)
-				bail("failed to unshare remaining namespaces (except cgroupns)");
+			try_unshare(config.cloneflags & ~CLONE_NEWCGROUP, "remaining namespaces (except cgroupns)");
 
 			/* Ask our parent to send the mount sources fds. */
 			if (config.mountsources) {
@@ -1344,8 +1359,7 @@ void nsexec(void)
 			}
 
 			if (config.cloneflags & CLONE_NEWCGROUP) {
-				if (unshare(CLONE_NEWCGROUP) < 0)
-					bail("failed to unshare cgroup namespace");
+				try_unshare(CLONE_NEWCGROUP, "cgroup namespace");
 			}
 
 			write_log(DEBUG, "signal completion to stage-0");
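
The retry logic in try_unshare is hard to trigger on a fixed kernel, so one way to check it is to swap the real unshare(2) call for a stub. The following is a minimal sketch under that assumption, not code from runc: retry_on_einval, unshare_fn, fake_unshare, and fake_reset are hypothetical names introduced here for illustration, and the function returns an error code instead of calling bail() so the outcome can be inspected.

```c
#include <errno.h>

/* Hypothetical callback type standing in for unshare(2), so the loop can
 * be exercised without privileges or an affected kernel. */
typedef int (*unshare_fn)(int flags);

/*
 * Same control flow as try_unshare in the diff: call fn up to max_tries
 * times, retrying only when it fails with EINVAL. Returns 0 on success,
 * or -1 with errno left as set by the last failed attempt.
 */
static int retry_on_einval(unshare_fn fn, int flags, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (fn(flags) == 0)
			return 0;
		if (errno != EINVAL)
			break;	/* any other error is not the kernel bug; give up */
	}
	return -1;
}

/* State for the stub: how many calls were made, how many of the first
 * calls should fail, and which errno value those failures report. */
static int fake_calls;
static int fake_fail_count;
static int fake_errno;

/* Stub that fails fake_fail_count times with fake_errno, then succeeds. */
static int fake_unshare(int flags)
{
	(void)flags;
	fake_calls++;
	if (fake_calls <= fake_fail_count) {
		errno = fake_errno;
		return -1;
	}
	return 0;
}

/* Reset the stub before each scenario. */
static void fake_reset(int fail_count, int err)
{
	fake_calls = 0;
	fake_fail_count = fail_count;
	fake_errno = err;
}
```

With the stub set to fail twice with EINVAL, retry_on_einval(fake_unshare, 0, 5) succeeds on the third call. If the stub reports EPERM instead, the loop stops after a single call, which matches the intent that only the EINVAL case is treated as transient.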