nsexec: retry unshare on EINVAL
Older kernels may return EINVAL on unshare when a process is reading
runc's /proc/$PID/status or /proc/$PID/maps. This was fixed by kernel
commit 12c641ab8270f ("unshare: Unsharing a thread does not require
unsharing a vm") in Linux v4.3.

For CentOS 7, the fix was backported to CentOS 7.7 (kernel 3.10.0-1062).

To work around this kernel bug, let's retry on EINVAL a few times.

Reported-by: zzyyzte <[email protected]>
Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin committed Mar 16, 2023
1 parent b3a68fe commit cecb039
Showing 1 changed file with 22 additions and 8 deletions.
30 changes: 22 additions & 8 deletions libcontainer/nsenter/nsexec.c
@@ -833,6 +833,25 @@ void send_mountsources(int sockfd, pid_t child, char *mountsources, size_t mount
 		bail("failed to close container mount namespace fd %d", container_mntns_fd);
 }

+void try_unshare(int flags, const char *msg)
+{
+	write_log(DEBUG, "unshare %s", msg);
+	/*
+	 * Kernels prior to v4.3 may return EINVAL on unshare when another process
+	 * reads runc's /proc/$PID/status or /proc/$PID/maps. To work around this,
+	 * retry on EINVAL a few times.
+	 */
+	int retries = 5;
+	for (; retries > 0; retries--) {
+		if (unshare(flags) == 0) {
+			return;
+		}
+		if (errno != EINVAL)
+			break;
+	}
+	bail("failed to unshare %s", msg);
+}
+
 void nsexec(void)
 {
 	int pipenum;
@@ -1171,9 +1190,7 @@ void nsexec(void)
 			 * problem.
 			 */
 			if (config.cloneflags & CLONE_NEWUSER) {
-				write_log(DEBUG, "unshare user namespace");
-				if (unshare(CLONE_NEWUSER) < 0)
-					bail("failed to unshare user namespace");
+				try_unshare(CLONE_NEWUSER, "user namespace");
 				config.cloneflags &= ~CLONE_NEWUSER;

 				/*
@@ -1225,9 +1242,7 @@ void nsexec(void)
 			 * some old kernel versions where clone(CLONE_PARENT | CLONE_NEWPID)
 			 * was broken, so we'll just do it the long way anyway.
 			 */
-			write_log(DEBUG, "unshare remaining namespace (except cgroupns)");
-			if (unshare(config.cloneflags & ~CLONE_NEWCGROUP) < 0)
-				bail("failed to unshare remaining namespaces (except cgroupns)");
+			try_unshare(config.cloneflags & ~CLONE_NEWCGROUP, "remaining namespaces (except cgroupns)");

 			/* Ask our parent to send the mount sources fds. */
 			if (config.mountsources) {
@@ -1340,8 +1355,7 @@ void nsexec(void)
 			}

 			if (config.cloneflags & CLONE_NEWCGROUP) {
-				if (unshare(CLONE_NEWCGROUP) < 0)
-					bail("failed to unshare cgroup namespace");
+				try_unshare(CLONE_NEWCGROUP, "cgroup namespace");
 			}

 			write_log(DEBUG, "signal completion to stage-0");
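
The retry loop in try_unshare can be exercised without privileges or an affected kernel by swapping the system call for a callback. The following is only a sketch of that idea, not code from runc: `retry_on_einval`, `unshare_fn`, and the three stub functions are hypothetical names introduced for illustration, and the helper returns -1 instead of calling bail() so that a caller can inspect errno.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callback type standing in for unshare(2). */
typedef int (*unshare_fn)(int flags);

/*
 * Call fn(flags) up to max_tries times. Retry only when the call fails
 * with EINVAL; any other error stops the loop immediately.
 * Returns 0 on success, or -1 with errno left as set by the last call.
 */
static int retry_on_einval(unshare_fn fn, int flags, int max_tries)
{
	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		if (fn(flags) == 0)
			return 0;
		if (errno != EINVAL)
			break;
	}
	return -1;
}

/* Number of times a stub has been invoked; reset before each scenario. */
static int calls;

/* Stub: simulates the transient kernel bug (two EINVALs, then success). */
static int fail_twice_then_ok(int flags)
{
	(void)flags;
	if (++calls <= 2) {
		errno = EINVAL;
		return -1;
	}
	return 0;
}

/* Stub: simulates a permanent, non-retryable failure. */
static int always_eperm(int flags)
{
	(void)flags;
	calls++;
	errno = EPERM;
	return -1;
}

/* Stub: simulates EINVAL that never clears, so the retries run out. */
static int always_einval(int flags)
{
	(void)flags;
	calls++;
	errno = EINVAL;
	return -1;
}
```

With `calls` reset to 0 before each scenario, `retry_on_einval(fail_twice_then_ok, 0, 5)` succeeds on the third call, `always_eperm` is called exactly once, and `always_einval` is called five times before the helper gives up. In real use the callback would be `unshare` itself, and the caller would report the failure much as try_unshare does with bail().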