Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hotfix] nsenter: refuse to build with Go 1.22 #4234

Merged
merged 2 commits into from
Apr 2, 2024

Conversation

cyphar
Copy link
Member

@cyphar cyphar commented Mar 29, 2024

We will almost certainly need to eventually rework nsenter to:

  1. Figure out a way to make pthread_self() not break after nsenter runs
    (probably not possible, because the core issue is likely that we are
    ignoring the rules of signal-safety(7)); or
  2. Do an other re-exec of /proc/self/exe to execute the Go half of
    "runc init" -- after we've done the nsenter setup. This would reset
    all of the process state and ensure we have a clean glibc state for
    Go, but it would make runc slower...

For now, just block Go 1.22 builds to avoid having broken runcs floating
around until we resolve the issue. It seems possible for musl to also
have an issue, but it appears to work and so for now just block glibc
builds.

Note that this will only block builds for anything that uses nsenter --
so users of our (internal) libcontainer libraries should be fine. Only
users that are starting containers using nsenter to actually start
containers will see the error (which is precisely what we want).

See #4233
Signed-off-by: Aleksa Sarai [email protected]

@cyphar cyphar mentioned this pull request Mar 29, 2024
@AkihiroSuda AkihiroSuda added impact/changelog backport/1.1-todo A PR in main branch which needs to be backported to release-1.1 labels Mar 29, 2024
Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kolyshkin
Copy link
Contributor

@cyphar can you pick 7651760 to here? Otherwise cross-i386 job will fail.

@lifubang
Copy link
Member

Please see 6fb24a2 in #4193 , I think maybe we can resolve this clone(2) issue. Please help to see what's wrong with the remain failure test cases.

@cyphar
Copy link
Member Author

cyphar commented Mar 30, 2024

@lifubang Are you sure that actually solves the issue? Can you try running the test C program from golang/go#65625 with that change?

We don't use CLONE_VM so the processes are not running in the same memory. Note that the warning in clone(2) says

Since the child and calling process may share memory, it is not possible for the child process to execute in the same stack as the calling process.

In fact, clone3(2) explicitly allows you to use the current stack for the child process:

The stack for the child process is specified via cl_args.stack, which points to the lowest byte of the stack area, and cl_args.stack_size, which specifies the size of the stack in bytes. In the case where the CLONE_VM flag (see below) is specified, a stack must be explicitly allocated and specified. Otherwise, these two fields can be specified as NULL and 0, which causes the child to use the same stack area as the parent (in the child's own virtual address space).

I think the issue is more likely related to the thread-local storage used by pthreads. From golang/go#65625 it seems that the old TID is stored in the TLS and glibc then tries to do sched_getaffinity on that TID -- this just so happens to work in runc's case because the parent process is still alive and so pthreads gets the wrong process's information (oops).
I'm a little surprised this is the case -- gettid() works correctly after fork() and while all of the pthread_* stuff is theoretically unsafe to call after fork() it's very weird that they would allow the pthread_* stuff to get outdated if they already put the work into making gettid() work (not to mention IIRC gettid() is cached in thread-local storage). Also the debugging work in golang/go#65625 indicated that it was an issue with CLONE_PARENT in particular, which makes less sense -- if the problem is outdated TLS data, I would expect the issue to occur regardless of what the parent PID is.

Ultimately, our usage of clone is technically not "safe" from a UB perspective because we aren't meant to call any async-unsafe code in a child after a fork. I suspect this is the core issue.

@lifubang
Copy link
Member

lifubang commented Mar 31, 2024

Can you try running the test C program from golang/go#65625 with that change?

It seems not work for this C program, but I'm curious about why it can work in runc.

We don't use CLONE_VM so the processes are not running in the same memory. Note that the warning in clone(2) says

I think Copy-on-Write still has the effective even without CLONE_VM, if there is no write action to a memory page, it will be shared with the parent process, but I guess(only a guess) any write action to the thread-local storage may not cause Copy-on-Write, so we get a wrong tid?
With my patch, most ci jobs passed, only some jobs of the rootless(with idmap) in ubuntu 20.04 failed.

@cyphar
Copy link
Member Author

cyphar commented Apr 1, 2024

It seems not work for this C program, but I'm curious about why it can work in runc.

The reason the C program fails is because it kills the original process, and so the wrong PID no longer exists. With runc, the parent process still exists so the old PID still works.

I think Copy-on-Write still has the effective even without CLONE_VM, if there is no write action to a memory page, it will be shared with the parent process, but I guess(only a guess) any write action to the thread-local storage may not cause Copy-on-Write, so we get a wrong tid?

CLONE_VM means that there is no CoW -- the memory is shared between processes (like threads -- in fact, threads use CLONE_VM). I am only bringing it up because the man page warning you point to as a justification for needing to malloc only makes sense for CLONE_VM (which we don't use). The clone3(2) documentation does a better job of explaining this behaviour.

but I guess(only a guess) any write action to the thread-local storage may not cause Copy-on-Write, so we get a wrong tid?

I think the problem is that the TLS contains the old data, and glibc doesn't update it after clone(2). Strictly speaking, we can only safely execute "async-signal-safe" functions (see signal-safety(7)) in the child after a clone(2) or fork(2) -- glibc does not have to update the TID in TLS after clone(2) (none of pthread_* is "async-signal-safe", it's only "MT-Safe", so they don't need to provide any guarantees about pthread state after a fork(2) or clone(2)), but they do have to for getpid(2). It's just unfortunate that they don't update it because getpid(2) is cached in the TLS as well.

With my patch, most ci jobs passed, only some jobs of the rootless(with idmap) in ubuntu 20.04 failed.

It's possible the malloc helps avoid the corruption issue on old glibc somehow, I'm not sure. My point was more that I don't see how malloc fixes the actual issue, so I don't think adding it make sense -- we should fix the underlying issue, rather than having a patch that "solves" the issue without a good reason for why it solves it.

@cyphar cyphar force-pushed the go122 branch 2 times, most recently from edf8114 to 802b395 Compare April 1, 2024 14:54
@cyphar cyphar requested a review from AkihiroSuda April 1, 2024 14:55
Signed-off-by: Kir Kolyshkin <[email protected]>
Signed-off-by: Aleksa Sarai <[email protected]>
We will almost certainly need to eventually rework nsenter to:

 1. Figure out a way to make pthread_self() not break after nsenter runs
    (probably not possible, because the core issue is likely that we are
    ignoring the rules of signal-safety(7)); or
 2. Do an other re-exec of /proc/self/exe to execute the Go half of
    "runc init" -- after we've done the nsenter setup. This would reset
    all of the process state and ensure we have a clean glibc state for
    Go, but it would make runc slower...

For now, just block Go 1.22 builds to avoid having broken runcs floating
around until we resolve the issue. It seems possible for musl to also
have an issue, but it appears to work and so for now just block glibc
builds.

Note that this will only block builds for anything that uses nsenter --
so users of our (internal) libcontainer libraries should be fine. Only
users that are starting containers using nsenter to actually start
containers will see the error (which is precisely what we want).

Signed-off-by: Aleksa Sarai <[email protected]>
@AkihiroSuda AkihiroSuda removed the backport/1.1-todo A PR in main branch which needs to be backported to release-1.1 label Apr 1, 2024
CHANGELOG.md Show resolved Hide resolved
@kolyshkin
Copy link
Contributor

Do an other re-exec of /proc/self/exe to execute the Go half of
"runc init" -- after we've done the nsenter setup. This would reset
all of the process state and ensure we have a clean glibc state for
Go, but it would make runc slower...

Wonder how much slower that would be.

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kolyshkin kolyshkin merged commit 4641f17 into opencontainers:main Apr 2, 2024
45 checks passed
@cyphar cyphar deleted the go122 branch April 2, 2024 02:43
@kolyshkin kolyshkin added the backport/1.1-done A PR in main branch which has been backported to release-1.1 label Apr 2, 2024
@kolyshkin
Copy link
Contributor

1.1 backport: #4240

@liggitt
Copy link
Contributor

liggitt commented Apr 3, 2024

Will this PR break things which are using go1.22 to link / build runc as a library and don't actually exercise the nsenter code at all? If so, that seems unnecessary and really problematic.

Asking another way, do we know that everything importing this package is running nsenter?

@austinvazquez
Copy link
Contributor

@liggitt, for downstream repositories which have a dependency on the runc module but do not call the nsenter package then this hotfix would not prevent those users from linking and building. The easiest method to test if your use case uses nsenter package is to vendor dependencies and check for vendor/github.com/runc/libcontainer/nsenter package. Hope that helps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.1-done A PR in main branch which has been backported to release-1.1 impact/changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants