Ensure we can always terminate the parent process on error #4355

lifubang · 2024-07-20T01:09:25Z

We should terminate the parent process whenever starting the container process fails.
Today the terminate function is called from many places, for example in initProcess, setnsProcess, and Container.
If we forget the terminate call on any error path, the container is left in an unknown state,
so this PR moves the call to a few final places.

One possible place where the terminate call is missing:
https://github.com/opencontainers/runc/blob/v1.2.0-rc.2/libcontainer/container_linux.go#L357-L360
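
For illustration, here is a minimal sketch of the "final place" idea. It is not the actual runc code: `proc`, `startAll`, and the step functions are hypothetical names. A named return value plus a single deferred check means every error path triggers terminate exactly once, and no individual path has to remember it.

```go
package main

import "errors"

// proc is a hypothetical stand-in for a container's parent process.
type proc struct {
	terminated int // number of times terminate was called
}

func (p *proc) terminate() { p.terminated++ }

// errStep is the error returned by the failing example step.
var errStep = errors.New("step failed")

func okStep() error   { return nil }
func failStep() error { return errStep }

// startAll runs the steps in order and stops at the first failure.
// The deferred function is the single place that terminates the
// process when any step returned an error.
func startAll(p *proc, steps ...func() error) (retErr error) {
	defer func() {
		if retErr != nil {
			p.terminate()
		}
	}()
	for _, step := range steps {
		if err := step(); err != nil {
			return err
		}
	}
	return nil
}
```

With this shape, adding a new step (and a new error path) cannot skip the cleanup, which is the failure mode described above.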

libcontainer/container_linux.go

rata · 2024-07-22T13:46:11Z

libcontainer/container_linux.go

+	if process.Init {
+		if err := ignoreTerminateErrors(process.ops.terminate()); err != nil {
+			logrus.WithError(err).Warn("unable to terminate initProcess")
+		}
+		if err := c.cgroupManager.Destroy(); err != nil {
+			logrus.WithError(err).Warn("unable to destroy cgroupManager")
+		}
+		if c.intelRdtManager != nil {
+			if err := c.intelRdtManager.Destroy(); err != nil {
+				logrus.WithError(err).Warn("unable to destroy intelRdtManager")
+			}
+		}


From the PR description and this GitHub diff alone, it's not obvious why this makes sense. Can you please elaborate on why we did not call all of this in many places before, and why we do now? For example, the cgroup manager's destroy wasn't called in all code paths before.

Was it missing? Or was it not needed, but it's idempotent, so calling it is fine?

I think it is OK whether or not these destroy methods are called.
But we should not get different results within one function without a specific reason, for example in container.Run().

Also, I think we should not have destroyed them before, because runc doesn't delete a failed container automatically; users have to use 'runc delete' to destroy a failed container created by 'runc create' or 'runc run'. How about removing these destroy calls here? I don't think it causes any compatibility problems. WDYT?

Sorry, not sure I followed. Why do you want to remove these methods?

Also, in what state is the container left? The delete operation must only work for stopped containers: https://github.com/opencontainers/runtime-spec/blob/main/runtime.md#delete

Sorry, not sure I followed. Why do you want to remove these methods?

Yes, these destroy methods can't be removed, because we have to destroy the cgroup and intelRdt managers manually if we haven't saved the container's state yet.

libcontainer/process_linux.go

rata · 2024-07-22T13:51:20Z

libcontainer/container_linux.go

@@ -229,6 +239,30 @@ func (c *Container) Exec() error {
 	return c.exec()
 }

+// terminate is to kill the container's init/exec process when got failure.
+func (c *Container) terminate(process *Process) {


Is it simple to add tests, maybe for this function? We could fake the process interface, make it return an error, and test that everything that needs to happen does indeed happen.

This might not add a lot of value now, but it will if we refactor this code in the future.
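
One possible shape for such a test, as a rough sketch only: the `terminator` interface, `fakeProcess`, and `terminateOnError` below are hypothetical stand-ins, not the real process types in libcontainer. The fake records calls and returns a configurable error, so a test can check that cleanup ran and that a terminate failure does not mask the original start error.

```go
package main

import "errors"

// terminator is a hypothetical, minimal view of a process that can be killed.
type terminator interface {
	terminate() error
}

// fakeProcess records terminate calls and returns a preset error.
type fakeProcess struct {
	calls int
	err   error
}

func (f *fakeProcess) terminate() error {
	f.calls++
	return f.err
}

// Example errors a test can inject.
var (
	errStart = errors.New("start failed")
	errKill  = errors.New("terminate failed")
)

// terminateOnError calls start and, if it fails, terminates p.
// The start error is returned unchanged; a terminate failure is
// reported through warn instead of replacing the start error.
func terminateOnError(p terminator, start func() error, warn func(error)) error {
	err := start()
	if err == nil {
		return nil
	}
	if terr := p.terminate(); terr != nil {
		warn(terr)
	}
	return err
}
```

In a real `_test.go` file the checks would go through `testing.T`, for example asserting that `calls` is 1 after a failed start and 0 after a successful one.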

libcontainer/container_linux.go

lifubang · 2024-09-26T04:42:33Z

One possible place where the terminate call is missing:
https://github.com/opencontainers/runc/blob/v1.2.0-rc.2/libcontainer/container_linux.go#L357-L360

I think this is really a bug, but we will hit it only with a very tiny probability. Do we want this PR in the next 1.2.0 release candidate?

kolyshkin · 2024-12-03T01:04:47Z

libcontainer/container_linux.go

-func (c *Container) Start(process *Process) error {
+func (c *Container) Start(process *Process) (retErr error) {
 	c.m.Lock()
 	defer c.m.Unlock()
+	defer func() {
+		if retErr != nil {
+			c.terminate(process)
+		}
+	}()


To me, this looks overcomplicated, given that this function only calls c.start.

An alternative would be

func (c *Container) Start(process *Process) error {
	c.m.Lock()
	defer c.m.Unlock()
	if err := c.start(process); err != nil {
		c.terminate(process)
		return err
	}
	return nil
}

kolyshkin · 2024-12-03T01:09:43Z

libcontainer/container_linux.go

+			c.terminate(process)
+		}
+	}()
+


Similar to 150c32f#r1866835966, there's no need for a defer here. Something like this would work:

	if !process.Init {
		return nil
	}
	err := c.exec()
	if err != nil {
		c.terminate(process)
	}
	return err

We should terminate the parent process whenever starting the container process fails. Today the terminate function is called from many places, for example in `initProcess`, `setnsProcess`, and `Container`. If we forget the terminate call on any error path, the container is left in an unknown state, so we should call it in a few final places.

Signed-off-by: lifubang <[email protected]>

lifubang requested review from cyphar, kolyshkin and AkihiroSuda July 20, 2024 01:22

lifubang added the kind/bug label Jul 20, 2024

YanzhaoLi reviewed Jul 22, 2024

lifubang mentioned this pull request Jul 22, 2024

tests/int/hooks: fix failed hooks test #4352

Closed

thaJeztah reviewed Jul 22, 2024

thaJeztah reviewed Jul 22, 2024

lifubang force-pushed the fix-no-terminate-onerror branch from 77069ba to df3a94f Compare July 22, 2024 11:12

rata reviewed Jul 22, 2024

kolyshkin reviewed Sep 18, 2024

lifubang force-pushed the fix-no-terminate-onerror branch 2 times, most recently from 8dcd42b to 5a06eb6 Compare September 23, 2024 12:00

lifubang force-pushed the fix-no-terminate-onerror branch from 5a06eb6 to 150c32f Compare September 26, 2024 05:00

kolyshkin reviewed Dec 3, 2024

lifubang force-pushed the fix-no-terminate-onerror branch from 150c32f to 8ff0b71 Compare December 11, 2024 15:53
