Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid killing runc init early #2855

Merged
merged 2 commits into from
Apr 3, 2021

Conversation

kolyshkin
Copy link
Contributor

Two commits, each one solving a problem of a stale bind mount
left by runc init after unsuccessful container start.

See #2843 for more details
and the initial investigation.

Closes: #2843

start: don't kill runc init too early

The stars can be aligned in a way that results in runc to leave a stale
bind mount in container's state directory, which manifests itself later,
while trying to remove the container, in an error like this:

remove /run/runc/test2: unlinkat /run/runc/test2/runc.W24K2t: device or resource busy

The stale mount happens because runc start/run/exec kills runc init
while it is inside ensure_cloned_binary(). One such scenario is when
a unified cgroup resource is specified for cgroup v1, a cgroup manager's
Apply returns an error (as of commit b006f4a), and when
(*initProcess).start() kills runc init just after it was started.

One solution is NOT to kill runc init too early. To achieve that,
amend the libcontainer/nsenter code to send a \0 byte to signal
that it is past the initial setup, and make start() (for both
run/start and exec) wait for this byte before proceeding with
kill on an error path.

While at it, improve some error messages.

libct/configs/validator: add some cgroup support

Add some minimal validation for cgroups. The following checks
are implemented:

  • cgroup name and/or prefix (or path) is set;
  • for cgroup v1, unified resources are not set;
  • for cgroup v2, if memorySwap is set, memory is also set,
    and memorySwap > memory.

This makes some invalid configurations fail earlier (before runc init
is started), which is better, as this should prevent killing runc init
in the middle of ensure_cloned_binary().

@kolyshkin kolyshkin added this to the 1.0.0-rc94 milestone Mar 16, 2021
@kolyshkin
Copy link
Contributor Author

Some repo (as configured by github) appears to be broken:

Err:51 https://dl.bintray.com/sbt/debian Release
502 Bad Gateway [IP: 52.43.227.140 443]

Restarted; got a different but similar error:

E: The repository 'https://dl.bintray.com/sbt/debian Release' is no longer signed.

@kolyshkin kolyshkin marked this pull request as draft March 16, 2021 17:22
@kolyshkin kolyshkin force-pushed the dont-kill-init-early branch 3 times, most recently from a59f526 to 6c408c9 Compare March 16, 2021 21:43
@kolyshkin
Copy link
Contributor Author

(validate CI is failing here -- fixed in #2860)

@AkihiroSuda @cyphar @lifubang PTAL

@kolyshkin kolyshkin marked this pull request as ready for review March 16, 2021 23:18
@cyphar cyphar self-requested a review March 18, 2021 05:13
@kolyshkin
Copy link
Contributor Author

@AkihiroSuda @cyphar @lifubang PTAL (please ignore failed CI -- it is fixed in #2860)

@AkihiroSuda
Copy link
Member

Could you rebase, LGTM then

Add some minimal validation for cgroups. The following checks
are implemented:

 - cgroup name and/or prefix (or path) is set;
 - for cgroup v1, unified resources are not set;
 - for cgroup v2, if memorySwap is set, memory is also set,
   and memorySwap > memory.

This makes some invalid configurations fail earlier (before runc init
is started), which is better.

Signed-off-by: Kir Kolyshkin <[email protected]>
The stars can be aligned in a way that results in runc to leave a stale
bind mount in container's state directory, which manifests itself later,
while trying to remove the container, in an error like this:

> remove /run/runc/test2: unlinkat /run/runc/test2/runc.W24K2t: device or resource busy

The stale mount happens because runc start/run/exec kills runc init
while it is inside ensure_cloned_binary(). One such scenario is when
a unified cgroup resource is specified for cgroup v1, a cgroup manager's
Apply returns an error (as of commit b006f4a), and when
(*initProcess).start() kills runc init just after it was started.

One solution is NOT to kill runc init too early. To achieve that,
amend the libcontainer/nsenter code to send a \0 byte to signal
that it is past the initial setup, and make start() (for both
run/start and exec) wait for this byte before proceeding with
kill on an error path.

While at it, improve some error messages.

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin kolyshkin force-pushed the dont-kill-init-early branch from 6c408c9 to 4ecff8d Compare March 31, 2021 21:37
@kolyshkin
Copy link
Contributor Author

Could you rebase, LGTM then

Rebased to include #2860

Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@cyphar cyphar closed this in 0d49470 Apr 3, 2021
@cyphar cyphar merged commit 0d49470 into opencontainers:master Apr 3, 2021
@kolyshkin kolyshkin mentioned this pull request Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants