Avoid killing runc init early #2855

kolyshkin · 2021-03-16T01:35:52Z

Two commits, each addressing the stale bind mount that runc init
can leave behind after an unsuccessful container start.

See #2843 for more details
and the initial investigation.

Closes: #2843

start: don't kill runc init too early

The stars can align in a way that results in runc leaving a stale
bind mount in the container's state directory, which manifests itself
later, while trying to remove the container, in an error like this:

remove /run/runc/test2: unlinkat /run/runc/test2/runc.W24K2t: device or resource busy

The stale mount happens because runc start/run/exec kills runc init
while it is inside ensure_cloned_binary(). One such scenario: a unified
cgroup resource is specified for cgroup v1, so the cgroup manager's
Apply returns an error (as of commit b006f4a), and
(*initProcess).start() then kills runc init just after it was started.

One solution is NOT to kill runc init too early. To achieve that,
amend the libcontainer/nsenter code to send a \0 byte to signal
that it is past the initial setup, and make start() (for both
run/start and exec) wait for this byte before proceeding with
kill on an error path.

While at it, improve some error messages.

libct/configs/validator: add some cgroup support

Add some minimal validation for cgroups. The following checks
are implemented:

- cgroup name and/or prefix (or path) is set;
- for cgroup v1, unified resources are not set;
- for cgroup v2, if memorySwap is set, memory is also set,
  and memorySwap > memory.

This makes some invalid configurations fail earlier (before runc init
is started), which is better, as this should prevent killing runc init
in the middle of ensure_cloned_binary().
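
The three checks could be sketched in Go roughly as follows. This is an illustration only: the `Cgroup` and `Resources` types are simplified, hypothetical stand-ins for the libcontainer config types, `validateCgroup` is an invented name, the cgroup version is passed in as a flag rather than detected, and treating 0 as "unset" is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// Resources and Cgroup are simplified, hypothetical stand-ins for the
// libcontainer config types; they carry only the fields the checks need.
type Resources struct {
	Memory     int64             // memory limit in bytes; 0 means unset (assumption)
	MemorySwap int64             // memory+swap limit in bytes; 0 means unset (assumption)
	Unified    map[string]string // raw cgroup v2 settings
}

type Cgroup struct {
	Name        string
	ScopePrefix string
	Path        string
	Resources   Resources
}

// validateCgroup applies the three checks listed above. The cgroupV2 flag is
// supplied by the caller so the sketch does not have to probe the host.
func validateCgroup(c Cgroup, cgroupV2 bool) error {
	if c.Name == "" && c.ScopePrefix == "" && c.Path == "" {
		return errors.New("cgroup: name, prefix, or path must be set")
	}
	r := c.Resources
	if !cgroupV2 {
		if len(r.Unified) > 0 {
			return errors.New("cgroup: unified resources require cgroup v2")
		}
		return nil
	}
	// Only positive swap values are checked; special values are out of scope here.
	if r.MemorySwap > 0 {
		if r.Memory <= 0 {
			return errors.New("cgroup: memorySwap is set but memory is not")
		}
		if r.MemorySwap <= r.Memory {
			return fmt.Errorf("cgroup: memorySwap (%d) must be greater than memory (%d)",
				r.MemorySwap, r.Memory)
		}
	}
	return nil
}

func main() {
	bad := Cgroup{
		Name:      "test2",
		Resources: Resources{Unified: map[string]string{"memory.high": "1048576"}},
	}
	fmt.Println(validateCgroup(bad, false)) // rejected: unified resources on cgroup v1
	fmt.Println(validateCgroup(bad, true))  // accepted on cgroup v2
}
```

Because these checks need nothing but the config, they can run before any child process exists, which is what lets an invalid configuration fail without runc init ever being started.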

kolyshkin · 2021-03-16T03:03:06Z

Some repo (as configured by github) appears to be broken:

Err:51 https://dl.bintray.com/sbt/debian Release
502 Bad Gateway [IP: 52.43.227.140 443]

Restarted; got a different but similar error:

E: The repository 'https://dl.bintray.com/sbt/debian Release' is no longer signed.

kolyshkin · 2021-03-16T23:17:55Z

(validate CI is failing here -- fixed in #2860)

@AkihiroSuda @cyphar @lifubang PTAL

kolyshkin · 2021-03-23T03:36:51Z

@AkihiroSuda @cyphar @lifubang PTAL (please ignore failed CI -- it is fixed in #2860)

AkihiroSuda · 2021-03-29T05:41:10Z

Could you rebase, LGTM then

Add some minimal validation for cgroups. The following checks
are implemented:

- cgroup name and/or prefix (or path) is set;
- for cgroup v1, unified resources are not set;
- for cgroup v2, if memorySwap is set, memory is also set,
  and memorySwap > memory.

This makes some invalid configurations fail earlier (before runc init
is started), which is better.

Signed-off-by: Kir Kolyshkin <[email protected]>

The stars can align in a way that results in runc leaving a stale
bind mount in the container's state directory, which manifests itself
later, while trying to remove the container, in an error like this:

> remove /run/runc/test2: unlinkat /run/runc/test2/runc.W24K2t: device or resource busy

The stale mount happens because runc start/run/exec kills runc init
while it is inside ensure_cloned_binary(). One such scenario: a unified
cgroup resource is specified for cgroup v1, so the cgroup manager's
Apply returns an error (as of commit b006f4a), and
(*initProcess).start() then kills runc init just after it was started.

One solution is NOT to kill runc init too early. To achieve that,
amend the libcontainer/nsenter code to send a \0 byte to signal
that it is past the initial setup, and make start() (for both
run/start and exec) wait for this byte before proceeding with
kill on an error path.

While at it, improve some error messages.

Signed-off-by: Kir Kolyshkin <[email protected]>

kolyshkin · 2021-03-31T21:38:13Z

> Could you rebase, LGTM then

Rebased to include #2860

cyphar

LGTM.

kolyshkin added this to the 1.0.0-rc94 milestone Mar 16, 2021

kolyshkin marked this pull request as draft March 16, 2021 17:22

kolyshkin force-pushed the dont-kill-init-early branch 3 times, most recently from a59f526 to 6c408c9 Compare March 16, 2021 21:43

kolyshkin marked this pull request as ready for review March 16, 2021 23:18

cyphar self-requested a review March 18, 2021 05:13

kolyshkin requested review from AkihiroSuda and mrunalp March 23, 2021 03:37

kolyshkin mentioned this pull request Mar 25, 2021

umount all mount points in runc root dir #2843

Closed

kolyshkin added 2 commits March 31, 2021 14:36

kolyshkin force-pushed the dont-kill-init-early branch from 6c408c9 to 4ecff8d Compare March 31, 2021 21:37

AkihiroSuda approved these changes Apr 1, 2021

thaJeztah mentioned this pull request Apr 2, 2021

libcontainer/configs/validate: make Validate() less DRY #2886

Merged

cyphar approved these changes Apr 3, 2021

cyphar closed this in 0d49470 Apr 3, 2021

cyphar merged commit 0d49470 into opencontainers:master Apr 3, 2021

kolyshkin mentioned this pull request May 6, 2021

rc94 discussion (mid-April 2021?) #2790

Closed

kolyshkin added the impact/changelog label May 6, 2021

kolyshkin mentioned this pull request Oct 11, 2024

libct: rm initWaiter #4441

Merged
