-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nsexec spring cleaning part I #3982
Nsexec spring cleaning part I #3982
Conversation
OK, this timeouts in CI the same way as #3953. |
c53ef6e
to
2d8e8e7
Compare
OK, I fixed the issue of CI timeout. This was caused by the wrong logic of error handling in + // We have a copy, the child can keep working. We don't need to
+ // wait for the seccomp notify listener to get the fd before we
+ // permit the child to continue because the child will happily wait
+ // for the listener if it hits SCMP_ACT_NOTIFY.
+ if err := writeSync(p.messageSockPair.parent, procSeccompDone); err != nil {
return err
}
- defer unix.Close(seccompFd)
bundle, annotations := utils.Annotations(p.config.Config.Labels)
containerProcessState := &specs.ContainerProcessState{
@@ -199,15 +213,10 @@ func (p *setnsProcess) start() (retErr error) {
containerProcessState, seccompFd); err != nil {
return err
}
-
- // Sync with child.
- if err := writeSync(p.messageSockPair.parent, procSeccompDone); err != nil {
- return err
- } |
libcontainer/utils/cmsg.go
Outdated
if i == 0 && err == nil { | ||
// Only close the first one on error. | ||
continue | ||
} | ||
// Always close extra ones. | ||
_ = unix.Close(fd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cyphar I slightly changed the code here; the original was this:
// Only close 0 if err != nil, and close everything else.
if i != 0 || err != nil {
_ = unix.Close(fd)
}
I feel that my version is less compact but more readable. Feel free to 👎🏻 and I will revert :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! I wanted to fix the error but forgot (the error is we should check Err
not err
). Fixed now.
2d8e8e7
to
33a1bfb
Compare
Can you drop the first procfs patch? This was needed for one of the patches you haven't included here, and I need to figure out why that is necessary with the cloned_binary change (the issue is that GHA masks procfs which trips the |
Otherwise TESTFLAGS="-run FooBar" will result in TESTFLAGS=-run being executed in the container. Signed-off-by: Aleksa Sarai <[email protected]>
The code in this function became quite complicated and not entirely correct over time. As a result, if an error is returned from parseSync, it might end up stuck waiting for the child to finish. 1. Let's not wait() for the child twice. We already do it in the defer statement (call p.terminate()) when we are returning an error. 2. Remove sentResume and sentRun since we do not want to check if these were sent or not. Instead, introduce and check seenProcReady, as procReady is always expected from runc init. 3. Eliminate the possibility to wrap nil as an error. 4. Make sure we always call shutdown on the sync socket, and do not let shutdown error shadow the ierr. This fixes the issue of stuck `runc runc` with the optimization patch (sending procSeccompDone earlier) applied. Signed-off-by: Kir Kolyshkin <[email protected]>
This includes quite a few cleanups and improvements to the way we do synchronisation. The core behaviour is unchanged, but switching to embedding json.RawMessage into the synchronisation structure will allow us to do more complicated synchronisation operations in future patches. The file descriptor passing through the synchronisation system feature will be used as part of the idmapped-mount and bind-mount-source features when switching that code to use the new mount API outside of nsexec.c. Signed-off-by: Aleksa Sarai <[email protected]> Signed-off-by: Kir Kolyshkin <[email protected]>
*os.File is correctly tracked by the garbage collector, and there's no need to use raw file descriptors for this code. Signed-off-by: Aleksa Sarai <[email protected]>
The kernel ignores these arguments, and passing them can lead to confusing error messages (the old source is irrelevant for MS_REMOUNT), as well as causing issues for a future patch where we switch to move_mount(2). Signed-off-by: Aleksa Sarai <[email protected]>
The original implementation of cgroupns had additional synchronisation to "ensure" that the process is in the correct cgroup before unsharing the cgroupns. This behaviour was actually never necessary, and after commit 5110bd2 ("nsenter: remove cgroupns sync mechanism") there is no synchronisation at all, meaning that CLONE_NEWCGROUP should not get any special treatment. Fixes: 5110bd2 ("nsenter: remove cgroupns sync mechanism") Fixes: df3fa11 ("Add support for cgroup namespace") Signed-off-by: Aleksa Sarai <[email protected]>
In the runc state JSON we always use snake_case. This is a no-op change, but it will cause any existing container state files to be incorrectly parsed. Luckily, commit fbf183c ("Add uid and gid mappings to mounts") has never been in a runc release so we can change this before a 1.2.z release. Fixes: fbf183c ("Add uid and gid mappings to mounts") Signed-off-by: Aleksa Sarai <[email protected]>
33a1bfb
to
1f25724
Compare
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for carrying this! I'll split the other features in #3953 into separate PRs based on this.
Commit 20b95f2 |
This is a partial carry of #3953, containing more-or-less simple changes from it.
The only differences from the commits in the original PR are:
err
->Err
);