Add restart policy for containers #2826

mheon · 2019-04-01T19:25:18Z

Cooked this up after lunch. Implements most of Docker's --restart flag. We can't implement unless-stopped (we don't have a daemon and as such never experience a daemon restart), and I haven't coded up restart counts yet (another half hour or so of work), but always and on-failure with infinite restarts work as advertised.

Needs tests, bash completions, and for me to stop being lazy and code up restart count.

openshift-ci-robot · 2019-04-01T19:25:21Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon

Needs approval from an approver in each of these files:

~~OWNERS~~ [mheon]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mheon · 2019-04-01T23:22:00Z

Added support for retry count with the on-failure policy. All that's left is manpages and tests

mheon · 2019-04-01T23:38:33Z

Manpages are in

mheon · 2019-04-02T02:09:28Z

I'm going to drop the WIP. Tests are still missing, though. Need to think about how we can test this in a way that doesn't introduce a half dozen more race conditions.

mheon · 2019-04-02T02:10:06Z

bot, retest this please

rhatdan · 2019-04-02T13:03:46Z

Restart policy is not allowed to be specified if user also specifies --rm.

rhatdan · 2019-04-02T12:59:20Z

libpod/container.go

Docker also supports unless-stopped

- **RestartPolicy** – The behavior to apply when the container exits. The value is an object with a `Name` property of either `"always"` to always restart, `"unless-stopped"` to restart always except when user has manually stopped the container or `"on-failure"` to restart only when the container exit code is non-zero. If `on-failure` is used, `MaximumRetryCount` controls the number of times to retry before giving up. The default is not to restart. (optional) An ever increasing delay (double the previous delay, starting at 100mS) is added before each restart to prevent flooding the server.
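To make the delay rule in the quoted Docker documentation concrete, here is a minimal Go sketch of "double the previous delay, starting at 100ms". The function name `restartDelay` and the upper cap are assumptions made for illustration; nothing in this thread says whether the PR applies any delay between restarts.

```go
package main

import "time"

// restartDelay returns the wait before restart attempt n (0-based),
// doubling from a 100ms base as the quoted Docker text describes.
// The cap is an assumption added here so the shift cannot overflow.
func restartDelay(attempt int) time.Duration {
	const base = 100 * time.Millisecond
	const maxShift = 10 // hypothetical cap: 100ms << 10 is about 102s

	if attempt < 0 {
		attempt = 0
	}
	if attempt > maxShift {
		attempt = maxShift
	}
	return base << uint(attempt)
}
```

With these assumptions, attempts 0, 1, 2, 3 wait 100ms, 200ms, 400ms, and 800ms respectively.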

Unless-stopped is very much a "restarted the daemon" thing, so I just throw an error about it in pkg/spec if they try to pass it.

We still want to point people to systemd for cases like that, where the container will be restarted after system reboot, for example. The manpages make this clear, but it might be good to add the warning to Podman itself too.

My reading of unless-stopped is that it is exactly what you are designing, i.e. if a user stops a container then it will not restart; otherwise, if the container fails, it will be restarted.

It seems the interpretation depends on where we look. The man page is referring to the daemon start only while the docs state:

> Similar to always, except that when the container is stopped (manually or otherwise), it is not restarted even after Docker daemon restarts

This makes me believe that we can support it.

From that description, it's identical to always - restart policy never triggers after the container is stopped via an API call.

(Minus the daemon restart aspect)

mheon · 2019-04-02T13:26:35Z

unless-stopped only makes sense with a daemon. We detect it being passed in and return an error saying it's not supported.

…

On Tue, Apr 2, 2019, 09:04 Daniel J Walsh ***@***.***> wrote:

> ***@***.**** commented on this pull request.
>
> In libpod/container.go <#2826 (comment)>:
>
> > @@ -102,6 +102,20 @@ func (ns LinuxNS) String() string {
> >  	}
> >  }
> > +// Valid restart policy types.
> > +const (
> > +	// RestartPolicyNone indicates that no restart policy has been requested
> > +	// by a container.
> > +	RestartPolicyNone = ""
> > +	// RestartPolicyNo is identical in function to RestartPolicyNone.
> > +	RestartPolicyNo = "no"
> > +	// RestartPolicyAlways unconditionally restarts the container.
> > +	RestartPolicyAlways = "always"
> > +	// RestartPolicyOnFailure restarts the container on non-0 exit code,
> > +	// with an optional maximum number of retries.
> > +	RestartPolicyOnFailure = "on-failure"
>
> Docker also supports unless-stopped
>
> In libpod/container.go <#2826 (comment)>:
>
> > @@ -102,6 +102,20 @@ func (ns LinuxNS) String() string {
> >  	}
> >  }
> > +// Valid restart policy types.
> > +const (
> > +	// RestartPolicyNone indicates that no restart policy has been requested
> > +	// by a container.
> > +	RestartPolicyNone = ""
> > +	// RestartPolicyNo is identical in function to RestartPolicyNone.
> > +	RestartPolicyNo = "no"
> > +	// RestartPolicyAlways unconditionally restarts the container.
> > +	RestartPolicyAlways = "always"
> > +	// RestartPolicyOnFailure restarts the container on non-0 exit code,
> > +	// with an optional maximum number of retries.
> > +	RestartPolicyOnFailure = "on-failure"
>
> - **RestartPolicy** – The behavior to apply when the container exits. The value is an object with a `Name` property of either `"always"` to always restart, `"unless-stopped"` to restart always except when user has manually stopped the container or `"on-failure"` to restart only when the container exit code is non-zero. If `on-failure` is used, `MaximumRetryCount` controls the number of times to retry before giving up. The default is not to restart. (optional) An ever increasing delay (double the previous delay, starting at 100mS) is added before each restart to prevent flooding the server.
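As a rough illustration of how the constants quoted in the review could be used to validate a `--restart` value and reject `unless-stopped`, consider the sketch below. The helper `parseRestartPolicy`, its error messages, and the `on-failure:N` syntax for the retry count (borrowed from Docker's CLI) are assumptions; the actual check in pkg/spec is not shown in this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Constants as quoted from libpod/container.go in the review.
const (
	RestartPolicyNone      = ""
	RestartPolicyNo        = "no"
	RestartPolicyAlways    = "always"
	RestartPolicyOnFailure = "on-failure"
)

// parseRestartPolicy is a hypothetical helper: it splits "name[:retries]",
// accepts the policies above, and rejects unless-stopped explicitly.
func parseRestartPolicy(value string) (name string, retries uint, err error) {
	parts := strings.SplitN(value, ":", 2)
	name = parts[0]

	switch name {
	case RestartPolicyNone, RestartPolicyNo, RestartPolicyAlways:
		if len(parts) == 2 {
			return "", 0, fmt.Errorf("restart policy %q does not take a retry count", name)
		}
		return name, 0, nil
	case RestartPolicyOnFailure:
		if len(parts) == 2 {
			n, convErr := strconv.ParseUint(parts[1], 10, 32)
			if convErr != nil {
				return "", 0, fmt.Errorf("invalid retry count %q: %w", parts[1], convErr)
			}
			retries = uint(n)
		}
		return name, retries, nil
	case "unless-stopped":
		// No daemon means no daemon restart, so this policy is refused.
		return "", 0, fmt.Errorf("restart policy %q is not supported", name)
	default:
		return "", 0, fmt.Errorf("invalid restart policy %q", name)
	}
}
```

Under these assumptions, `parseRestartPolicy("on-failure:3")` yields the name `on-failure` with three retries, while `parseRestartPolicy("unless-stopped")` returns an error.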

mheon · 2019-04-02T14:27:53Z

--rm and --restart now conflict
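A flag conflict like this can be checked before the container is created. The sketch below is only an illustration: `createOptions` and `validate` are hypothetical names, and treating `--restart=no` as compatible with `--rm` is an assumption rather than something stated in this thread.

```go
package main

import "errors"

// createOptions is a hypothetical stand-in for the parsed CLI flags.
type createOptions struct {
	Remove        bool   // --rm
	RestartPolicy string // --restart
}

// validate rejects combining --rm with an active restart policy.
// Assumption: an empty policy or "no" does not count as a conflict.
func (o createOptions) validate() error {
	if o.Remove && o.RestartPolicy != "" && o.RestartPolicy != "no" {
		return errors.New("the --rm and --restart options conflict")
	}
	return nil
}
```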

stevekuznetsov · 2019-04-02T16:34:45Z

/retest

kunalkushwaha · 2019-04-03T01:08:35Z

I tested the PR locally under all conditions, and it works as expected.

One question regarding the flow of control for triggering handleRestartPolicy() or container.Cleanup(): how are these invoked when podman is not running?

mheon · 2019-04-03T02:13:07Z

In short, it's not.

This version of restart policy takes effect when the Podman cleanup process fires. When a Podman container exits, conmon launches the podman cleanup command to clean mounts, handle --rm, etc. The major component of this is to call Cleanup(), which has the authority to invoke restart policy.

We can't actually detect if a restart is required there, though, as in some cases (podman run specifically), we have other processes that might be checking the container's state before podman cleanup gets there. These processes may pick up the container's state change from running to stopped before podman cleanup does, so we need to store the restart policy trigger in the database. The good news is that the place where it's detected, syncContainer(), is called before Cleanup() as well (and all other Podman commands that can modify a container's state), so we're guaranteed to pick up the condition that triggers the restart, and when the cleanup command executes we'll know to perform a restart.
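The two-step flow described above (record the trigger when the state change is first observed, act on it in cleanup) can be sketched roughly as follows. Only the names syncContainer() and Cleanup() come from the discussion; the types, fields, and the in-memory flag standing in for the database entry are all assumptions for illustration.

```go
package main

// Hypothetical stand-ins for libpod's container types.
type containerState struct {
	Running            bool
	StoppedByUser      bool // set when a user stops or signals the container
	RestartPolicyMatch bool // stands in for the flag stored in the database
}

type container struct {
	policy   string // "always", "on-failure", "no", or ""
	state    containerState
	restarts int
}

// syncState plays the role described for syncContainer(): whichever process
// first observes the running -> stopped transition records the trigger.
func (c *container) syncState(nowRunning bool, exitCode int) {
	if c.state.Running && !nowRunning && !c.state.StoppedByUser {
		switch c.policy {
		case "always":
			c.state.RestartPolicyMatch = true
		case "on-failure":
			c.state.RestartPolicyMatch = exitCode != 0
		}
	}
	c.state.Running = nowRunning
}

// cleanup plays the role described for Cleanup(): it consumes the stored
// trigger and restarts the container, but never while it is running.
func (c *container) cleanup() bool {
	if !c.state.RestartPolicyMatch || c.state.Running {
		return false
	}
	c.state.RestartPolicyMatch = false
	c.state.Running = true
	c.restarts++
	return true
}
```

Because the flag is only set on the running-to-stopped transition, a later sync by another process that already sees the container as stopped does not clear it, so the cleanup step still finds the trigger.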

There are limitations to this approach - most notably, we cannot restart containers after a system reboot, because we don't have a running daemon to handle restart policy, and we don't know when a Podman process will first run. For use cases requiring that, using systemd unit files is still recommended - @baude was discussing writing a podman generate systemd command to assist in generating them.

rhatdan · 2019-04-03T12:15:32Z

@mheon The information you just typed should probably be added to the man page, and perhaps to troubleshooting.md (minus the stuff about podman generate systemd, until it exists).

rhatdan · 2019-04-03T17:38:31Z

bot, retest this please

rhatdan · 2019-04-03T17:39:08Z

LGTM

mheon · 2019-04-03T18:18:26Z

Working on tests now. Once they're in, I'll rebase to pick up Cirrus fixes and we can merge

This initial version does not support restart count, but it works as advertised otherwise. Signed-off-by: Matthew Heon <[email protected]>

Noticed this when testing some behavior with Docker. Signed-off-by: Matthew Heon <[email protected]>

The on-failure restart option supports restarting only a given number of times. To do this, we need one additional field in the DB to track restart count (which conveniently fills a field in Inspect we weren't populating), plus some plumbing logic. Signed-off-by: Matthew Heon <[email protected]>
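The retry-count decision described in this commit message might look something like the sketch below. The function `shouldRestart` and its parameters are hypothetical, and treating a maximum of 0 as "retry indefinitely" is an assumption based on the earlier mention of infinite restarts.

```go
package main

// shouldRestart is a hypothetical helper that decides whether a restart is
// owed, given the policy, the exit code, and the stored restart count.
func shouldRestart(policy string, exitCode int, restartCount, maxRetries uint) bool {
	switch policy {
	case "always":
		return true
	case "on-failure":
		if exitCode == 0 {
			return false
		}
		// Assumption: maxRetries == 0 means retry indefinitely.
		return maxRetries == 0 || restartCount < maxRetries
	default:
		// "" and "no" never restart.
		return false
	}
}
```

The caller would increment the stored restart count after each restart it performs, which is the extra database field the commit message refers to.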

Signed-off-by: Matthew Heon <[email protected]>

mheon · 2019-05-03T14:36:39Z

Comments addressed, let's see if CI starts to behave

Ensure that we can decode the restart event with the new journald events. Signed-off-by: Matthew Heon <[email protected]>

Signed-off-by: Matthew Heon <[email protected]>

mheon · 2019-05-03T16:00:15Z

/retest

TomSweeneyRedHat · 2019-05-03T16:19:28Z

docs/podman-create.1.md

+- `always`                   : Restart containers when they exit, regardless of status, retrying indefinitely
+
+Please note that restart will not restart containers after a system reboot.
+This this functionality is required in your environment, you can invoke Podman from a systemd unit file, or create an init script for whichever init system is in use.


s/This this/If this/

TomSweeneyRedHat · 2019-05-03T16:19:50Z

docs/podman-run.1.md

+- `always`                   : Restart containers when they exit, regardless of status, retrying indefinitely
+
+Please note that restart will not restart containers after a system reboot.
+This this functionality is required in your environment, you can invoke Podman from a systemd unit file, or create an init script for whichever init system is in use.


ditto This this

Signed-off-by: Matthew Heon <[email protected]>

TomSweeneyRedHat · 2019-05-03T16:38:12Z

LGTM, but would like another head nod or two.

rhatdan · 2019-05-03T16:46:37Z

Once this passes test, I will merge.

Theory: it's SELinux blowing up and preventing us from creating files as the container. Try and use a fresh dir and :Z to fix. Signed-off-by: Matthew Heon <[email protected]>

mheon · 2019-05-03T19:42:00Z

Tests are finally starting to pass

TomSweeneyRedHat · 2019-05-03T21:13:25Z

I'll beat @rhatdan to the punch.
/lgtm

openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 1, 2019

openshift-ci-robot requested review from giuseppe and umohnani8 April 1, 2019 19:25

openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L labels Apr 1, 2019

mheon mentioned this pull request Apr 1, 2019

Support --restart=no for restart #2828

Closed

mheon changed the title ~~[WIP] Add restart policy for containers~~ Add restart policy for containers Apr 2, 2019

openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 2, 2019

rhatdan reviewed Apr 2, 2019

mheon force-pushed the restart_policy branch 5 times, most recently from 2da2bce to 73c1446 Compare April 3, 2019 22:37

mheon added the Release Notes 1.3.0 label Apr 4, 2019

mheon added 10 commits May 3, 2019 10:36

Add container restart policy to Libpod & Podman

0d73ee4

This initial version does not support restart count, but it works as advertised otherwise. Signed-off-by: Matthew Heon <[email protected]>

Sending signals to containers prevents restart policy

dc42304

Noticed this when testing some behavior with Docker. Signed-off-by: Matthew Heon <[email protected]>

Add manpage information for restart policy

357e4c3

Signed-off-by: Matthew Heon <[email protected]>

Move to using constants for valid restart policy types

7ba1b60

Signed-off-by: Matthew Heon <[email protected]>

Restart policy conflicts with the --rm flag

948fb5e

Signed-off-by: Matthew Heon <[email protected]>

Restart policy should not run if a container is running

56356d7

Signed-off-by: Matthew Heon <[email protected]>

Add a restart event, and make one during restart policy

cafb68e

Signed-off-by: Matthew Heon <[email protected]>

Add a test for restart policy

e1443fe

Signed-off-by: Matthew Heon <[email protected]>

Address review comments on restart policy

d7c367a

Signed-off-by: Matthew Heon <[email protected]>

mheon force-pushed the restart_policy branch from 0fc9ade to d7c367a Compare May 3, 2019 14:36

mheon added 2 commits May 3, 2019 10:43

Fix 'restart' event in journald

4d348d7

Ensure that we can decode the restart event with the new journald events. Signed-off-by: Matthew Heon <[email protected]>

Small code fix

5c4fefa

Signed-off-by: Matthew Heon <[email protected]>

TomSweeneyRedHat reviewed May 3, 2019

Fix manpage typos

ceaaed7

Signed-off-by: Matthew Heon <[email protected]>

mheon force-pushed the restart_policy branch from ca20274 to afb47f7 Compare May 3, 2019 18:06

Try and fix restart-policy tests

d328695

Theory: it's SELinux blowing up and preventing us from creating files as the container. Try and use a fresh dir and :Z to fix. Signed-off-by: Matthew Heon <[email protected]>

mheon force-pushed the restart_policy branch from afb47f7 to d328695 Compare May 3, 2019 18:38

openshift-ci-robot assigned TomSweeneyRedHat May 3, 2019

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 3, 2019

openshift-merge-robot merged commit 4aa9014 into containers:master May 3, 2019

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 26, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 26, 2023
