Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nsexec spring cleaning part I #3982

Merged
merged 7 commits into from
Aug 16, 2023

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Aug 15, 2023

This is a partial carry of #3953, containing more-or-less simple changes from it.

The only differences from the commits in the original PR are:

  • ported to current HEAD (a few conflicts due to recently merged PRs);
  • fixed a misspelled word (syncrhonisation) in a commit message;
  • fixed a small issue in an intermediate commit (see dbdc562#r1294110056);
  • fixed a few cases of referring to old function names in doc and error messages;
  • fixed error handling in RecvFile (err -> Err);
  • added a commit that fixes the parseSync handling issue;
  • added my Signed-off-by to some commits.

@kolyshkin kolyshkin mentioned this pull request Aug 15, 2023
4 tasks
@kolyshkin kolyshkin marked this pull request as draft August 15, 2023 01:48
@kolyshkin
Copy link
Contributor Author

OK, this timeouts in CI the same way as #3953.

@kolyshkin kolyshkin force-pushed the nsexec-spring-cleaning-p1 branch from c53ef6e to 2d8e8e7 Compare August 16, 2023 02:19
@kolyshkin
Copy link
Contributor Author

OK, I fixed the issue of CI timeout. This was caused by the wrong logic of error handling in (p *initProcess) start(), which was uncovered by the following optimization in the "libcontainer: sync: cleanup synchronisation code" commit:

+                       // We have a copy, the child can keep working. We don't need to
+                       // wait for the seccomp notify listener to get the fd before we
+                       // permit the child to continue because the child will happily wait
+                       // for the listener if it hits SCMP_ACT_NOTIFY.
+                       if err := writeSync(p.messageSockPair.parent, procSeccompDone); err != nil {
                                return err
                        }
-                       defer unix.Close(seccompFd)
 
                        bundle, annotations := utils.Annotations(p.config.Config.Labels)
                        containerProcessState := &specs.ContainerProcessState{
@@ -199,15 +213,10 @@ func (p *setnsProcess) start() (retErr error) {
                                containerProcessState, seccompFd); err != nil {
                                return err
                        }
-
-                       // Sync with child.
-                       if err := writeSync(p.messageSockPair.parent, procSeccompDone); err != nil {
-                               return err
-                       }

@kolyshkin kolyshkin added the kind/refactor refactoring label Aug 16, 2023
@kolyshkin kolyshkin marked this pull request as ready for review August 16, 2023 02:43
@kolyshkin kolyshkin added this to the 1.2.0 milestone Aug 16, 2023
@kolyshkin
Copy link
Contributor Author

@cyphar @lifubang PTAL

Comment on lines 68 to 73
if i == 0 && err == nil {
// Only close the first one on error.
continue
}
// Always close extra ones.
_ = unix.Close(fd)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyphar I slightly changed the code here; the original was this:

			// Only close 0 if err != nil, and close everything else.
			if i != 0 || err != nil {
				_ = unix.Close(fd)
			}

I feel that my version is less compact but more readable. Feel free to 👎🏻 and I will revert :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! I wanted to fix the error but forgot (the error is we should check Err not err). Fixed now.

@kolyshkin kolyshkin force-pushed the nsexec-spring-cleaning-p1 branch from 2d8e8e7 to 33a1bfb Compare August 16, 2023 02:51
@cyphar
Copy link
Member

cyphar commented Aug 16, 2023

Can you drop the first procfs patch? This was needed for one of the patches you haven't included here, and I need to figure out why that is necessary with the cloned_binary change (the issue is that GHA masks procfs which trips the mount_too_revealing code, but weirdly this doesn't cause issues with current runc). I will include it in the follow-up PRs.

@kolyshkin kolyshkin requested review from cyphar and lifubang August 16, 2023 02:53
cyphar and others added 7 commits August 15, 2023 19:54
Otherwise TESTFLAGS="-run FooBar" will result in TESTFLAGS=-run being
executed in the container.

Signed-off-by: Aleksa Sarai <[email protected]>
The code in this function became quite complicated and not entirely
correct over time. As a result, if an error is returned from parseSync,
it might end up stuck waiting for the child to finish.

1. Let's not wait() for the child twice. We already do it in the
defer statement (call p.terminate()) when we are returning an error.

2. Remove sentResume and sentRun since we do not want to check if
these were sent or not. Instead, introduce and check seenProcReady, as
procReady is always expected from runc init.

3. Eliminate the possibility to wrap nil as an error.

4. Make sure we always call shutdown on the sync socket, and do not let
   shutdown error shadow the ierr.

This fixes the issue of stuck `runc runc` with the optimization patch
(sending procSeccompDone earlier) applied.

Signed-off-by: Kir Kolyshkin <[email protected]>
This includes quite a few cleanups and improvements to the way we do
synchronisation. The core behaviour is unchanged, but switching to
embedding json.RawMessage into the synchronisation structure will allow
us to do more complicated synchronisation operations in future patches.

The file descriptor passing through the synchronisation system feature
will be used as part of the idmapped-mount and bind-mount-source
features when switching that code to use the new mount API outside of
nsexec.c.

Signed-off-by: Aleksa Sarai <[email protected]>
Signed-off-by: Kir Kolyshkin <[email protected]>
*os.File is correctly tracked by the garbage collector, and there's no
need to use raw file descriptors for this code.

Signed-off-by: Aleksa Sarai <[email protected]>
The kernel ignores these arguments, and passing them can lead to
confusing error messages (the old source is irrelevant for MS_REMOUNT),
as well as causing issues for a future patch where we switch to
move_mount(2).

Signed-off-by: Aleksa Sarai <[email protected]>
The original implementation of cgroupns had additional synchronisation
to "ensure" that the process is in the correct cgroup before unsharing
the cgroupns. This behaviour was actually never necessary, and after
commit 5110bd2 ("nsenter: remove cgroupns sync mechanism") there is
no synchronisation at all, meaning that CLONE_NEWCGROUP should not get
any special treatment.

Fixes: 5110bd2 ("nsenter: remove cgroupns sync mechanism")
Fixes: df3fa11 ("Add support for cgroup namespace")
Signed-off-by: Aleksa Sarai <[email protected]>
In the runc state JSON we always use snake_case. This is a no-op change,
but it will cause any existing container state files to be incorrectly
parsed. Luckily, commit fbf183c ("Add uid and gid mappings to
mounts") has never been in a runc release so we can change this before a
1.2.z release.

Fixes: fbf183c ("Add uid and gid mappings to mounts")
Signed-off-by: Aleksa Sarai <[email protected]>
@kolyshkin kolyshkin force-pushed the nsexec-spring-cleaning-p1 branch from 33a1bfb to 1f25724 Compare August 16, 2023 02:54
@kolyshkin
Copy link
Contributor Author

Can you drop the first procfs patch? This was needed for one of the patches you haven't included here, and I need to figure out why that is necessary with the cloned_binary change (the issue is that GHA masks procfs which trips the mount_too_revealing code, but weirdly this doesn't cause issues with current runc). I will include it in the follow-up PRs.

done

Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for carrying this! I'll split the other features in #3953 into separate PRs based on this.

@AkihiroSuda
Copy link
Member

Commit 20b95f2 libcontainer: seccomp: pass around *os.File for notifyfd caused a regression

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/refactor refactoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants