-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runc kill: add support for cgroup.kill #3825
Merged
Merged
+154
−146
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
kolyshkin
force-pushed
the
cgroup.kill
branch
3 times, most recently
from
April 12, 2023 03:00
e9cdc73
to
53a2b6a
Compare
This comment was marked as outdated.
This comment was marked as outdated.
kolyshkin
force-pushed
the
cgroup.kill
branch
2 times, most recently
from
April 27, 2023 02:11
81c8c07
to
144339b
Compare
kolyshkin
commented
Apr 27, 2023
kolyshkin
force-pushed
the
cgroup.kill
branch
2 times, most recently
from
April 27, 2023 02:26
e861f49
to
d3fd53c
Compare
kolyshkin
force-pushed
the
cgroup.kill
branch
6 times, most recently
from
May 12, 2023 19:44
133885a
to
f1a50f1
Compare
kolyshkin
force-pushed
the
cgroup.kill
branch
4 times, most recently
from
May 13, 2023 01:36
bd44b32
to
9cc045c
Compare
@cyphar @AkihiroSuda PTAL |
kolyshkin
force-pushed
the
cgroup.kill
branch
2 times, most recently
from
May 18, 2023 00:59
122b9cc
to
e2254ea
Compare
CI failure is unrelated (#3868) |
cyphar
reviewed
Jun 8, 2023
It seems that set -x was temporarily added as a debug measure, but slipped into the final commit. Remove it, for the sake of test logs brevity. Fixes: 9f656db Signed-off-by: Kir Kolyshkin <[email protected]>
This is roughly the same as TestPIDHostInitProcessWait in libct/int, except that here we use separate processes to create and to kill a container, so the processes inside a container are not children of "runc kill", and also we hit different codepaths (nonChildProcess.signal rather than initProcess.signal). One other thing is, rootless is also tested. Signed-off-by: Kir Kolyshkin <[email protected]>
There are two very distinct usage scenarios for signalAllProcesses: * when used from the runc binary ("runc kill" command), the processes that it kills are not the children of "runc kill", and so calling wait(2) on each process is totally useless, as it will return ECHLD; * when used from a program that have created the container (such as libcontainer/integration test suite), that program can and should call wait(2), not the signalling code. So, the child reaping code is totally useless in the first case, and should be implemented by the program using libcontainer in the second case. I was not able to track down how this code was added, my best guess is it happened when this code was part of dockerd, which did not have a proper child reaper implemented at that time. Remove it, and add a proper documentation piece. Change the integration test accordingly. PS the first attempt to disable the child reaping code in signalAllProcesses was made in commit bb912eb, which used a questionable heuristic to figure out whether wait(2) should be called. This heuristic worked for a particular use case, but is not correct in general. While at it: - simplify signalAllProcesses to use unix.Kill; - document (container).Signal. Signed-off-by: Kir Kolyshkin <[email protected]>
When someone is using libcontainer to start and kill containers from a long lived process (i.e. the same process creates and removes the container), initProcess.wait method is used, which has a kludge to work around killing containers that do not have their own PID namespace. The code that checks for own PID namespace is not entirely correct. To be exact, it does not set sharePidns flag when the host/caller PID namespace is implicitly used. As a result, the above mentioned kludge does not work. Fix the issue, add a test case (which fails without the fix). Signed-off-by: Kir Kolyshkin <[email protected]>
By default, the container has its own PID namespace, and killing (with SIGKILL) its init process from the parent PID namespace also kills all the other processes. Obviously, it does not work that way when the container is sharing its PID namespace with the host or another container, since init is no longer special (it's not PID 1). In this case, killing container's init will result in a bunch of other processes left running (and thus the inability to remove the cgroup). The solution to the above problem is killing all the container processes, not just init. The problem with the current implementation is, the killing logic is implemented in libcontainer's initProcess.wait, and thus only available to libcontainer users, but not the runc kill command (which uses nonChildProcess.kill and does not use wait at all). So, some workarounds exist: - func destroy(c *Container) calls signalAllProcesses; - runc kill implements -a flag. This code became very tangled over time. Let's simplify things by moving the killing all processes from initProcess.wait to container.Signal, and documents the new behavior. In essence, this also makes `runc kill` to automatically kill all container processes when the container does not have its own PID namespace. Document that as well. Signed-off-by: Kir Kolyshkin <[email protected]>
As of previous commit, this is implied in a particular scenario. In fact, this is the one and only scenario that justifies the use of -a. Drop the option from the documentation. For backward compatibility, do recognize it, and retain the feature of ignoring the "container is stopped" error when set. Signed-off-by: Kir Kolyshkin <[email protected]>
Signed-off-by: Kir Kolyshkin <[email protected]>
cyphar
approved these changes
Jun 10, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Apparently this also fixes #4040 (added to description). |
Merged
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The
cgroup.kill
API was added to Linux kernel 5.14 (see [1], [2]). This PR adds its support to runc, and changes quite a few things around killing containers.[1] https://lwn.net/Articles/855049/
[2] https://lwn.net/Articles/855924/
Fixes: #3135
Fixes: #3866
Fixes: #3864
Fixes: #4040
Closes: #3199
This also removes child reaper from libcontainer. It is useless for runc itself, and any libcontainer users who start a container need to take care of reaping their own children. See "libct: signalAllProcesses: remove child reaping" commit for more details.
Release note