Skip to content

Conversation

@mheon
Copy link
Member

@mheon mheon commented Apr 1, 2019

Cooked this up after lunch. Implements most of Docker's --restart flag. We can't implement unless-stopped (we don't have a daemon and as such never experience a daemon restart), and I haven't coded up restart counts yet (another half hour or so of work), but always and on-error with infinite restarts work as advertised.

Needs tests, bash completions, and for me to stop being lazy and code up restart count.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 1, 2019
@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L labels Apr 1, 2019
@mheon
Copy link
Member Author

mheon commented Apr 1, 2019

Added support for retry count with the on-failure policy. All that's left is manpages and tests

@mheon
Copy link
Member Author

mheon commented Apr 1, 2019

Manpages are in

@mheon
Copy link
Member Author

mheon commented Apr 2, 2019

I'm going to drop the WIP. Tests are still missing, though. Need to think about how we can test this in a way that doesn't introduce a half dozen more race conditions.

@mheon mheon changed the title [WIP] Add restart policy for containers Add restart policy for containers Apr 2, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 2, 2019
@mheon
Copy link
Member Author

mheon commented Apr 2, 2019

bot, retest this please

@rhatdan
Copy link
Member

rhatdan commented Apr 2, 2019

Restart policy is not allowed to be specified if user also specifies --rm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docker also supports unless-stopped

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    -   **RestartPolicy** – The behavior to apply when the container exits.  The
            value is an object with a `Name` property of either `"always"` to
            always restart, `"unless-stopped"` to restart always except when
            user has manually stopped the container or `"on-failure"` to restart only when the container
            exit code is non-zero.  If `on-failure` is used, `MaximumRetryCount`
            controls the number of times to retry before giving up.
            The default is not to restart. (optional)
            An ever increasing delay (double the previous delay, starting at 100mS)
            is added before each restart to prevent flooding the server.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless-stopped is very much a "restarted the daemon" thing, so I just throw an error about it in pkg/spec if they try to pass it.

We still want to point people to systemd for cases like that, where the container will be restarted after system reboot, for example. The manpages make this clear, but it might be good to add the warning to Podman itself too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reading of unless-stopped, seems to be exactly what you are designing. IE If a user stops a container then it will not restart, otherwise if the container fails, it will be restarted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the interpretation depends on where we look. The man page is referring to the daemon start only while the docs state:

Similar to always, except that when the container is stopped (manually or otherwise), it is not restarted even after Docker daemon restarts

This makes me believe that we can support it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From that description, it's identical to always - restart policy never triggers after the container is stopped via an API call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Minus the daemon restart aspect)

@mheon
Copy link
Member Author

mheon commented Apr 2, 2019 via email

@mheon
Copy link
Member Author

mheon commented Apr 2, 2019

--rm and --restart now conflict

@stevekuznetsov
Copy link
Contributor

/retest

@kunalkushwaha
Copy link
Collaborator

I tested the PR locally with all condition.. working fine as expected.


One question regarding flow of control for triggering handleRestartPolicy() or container.Cleanup() function, how this is invoked when podman is not running?

@mheon
Copy link
Member Author

mheon commented Apr 3, 2019

In short, it's not.

This version of restart policy takes effect when the Podman cleanup process fires. When a Podman container exits, conmon launches the podman cleanup command to clean mounts, handle --rm, etc. The major component of this is to call Cleanup(), which has the authority to invoke restart policy.

We can't actually detect if a restart is required there, though, as in some cases (podman run specifically), we have other processes that might be checking the container's state before podman cleanup gets there. These processes may pick up the container's state change from running to stopped before podman cleanup does, so we need to store the restart policy trigger in the database. The good news is that the place where it's detected, syncContainer(), is called before Cleanup() as well (and all other Podman commands that can modify a container's state), so we're guaranteed to pick up the condition that triggers the restart, and when the cleanup command executes we'll know to perform a restart.

There are limitations to this approach - most notably, we cannot restart containers after a system reboot, because we don't have a running daemon to handle restart policy, and we don't know when a Podman process will first run. For use cases requiring that, using systemd unit files is still recommended - @baude was discussing writing a podman generate systemd command to assist in generating them.

@rhatdan
Copy link
Member

rhatdan commented Apr 3, 2019

@mheon The information you just typed, should probably be added to the man page, and perhaps a troublshoot.md (Minus the stuff about podman generate systemd, (Until it exists.)

@rhatdan
Copy link
Member

rhatdan commented Apr 3, 2019

bot, retest this please

@rhatdan
Copy link
Member

rhatdan commented Apr 3, 2019

LGTM

@mheon
Copy link
Member Author

mheon commented Apr 3, 2019

Working on tests now. Once they're in, I'll rebase to pick up Cirrus fixes and we can merge

@mheon mheon force-pushed the restart_policy branch 5 times, most recently from 2da2bce to 73c1446 Compare April 3, 2019 22:37
mheon added 10 commits May 3, 2019 10:36
This initial version does not support restart count, but it works
as advertised otherwise.

Signed-off-by: Matthew Heon <[email protected]>
Noticed this when testing some behavior with Docker.

Signed-off-by: Matthew Heon <[email protected]>
The on-failure restart option supports restarting only a given
number of times. To do this, we need one additional field in the
DB to track restart count (which conveniently fills a field in
Inspect we weren't populating), plus some plumbing logic.

Signed-off-by: Matthew Heon <[email protected]>
Signed-off-by: Matthew Heon <[email protected]>
@mheon mheon force-pushed the restart_policy branch from 0fc9ade to d7c367a Compare May 3, 2019 14:36
@mheon
Copy link
Member Author

mheon commented May 3, 2019

Comments addressed, let's see if CI starts to behave

mheon added 2 commits May 3, 2019 10:43
Ensure that we can decode the restart event with the new journald
events.

Signed-off-by: Matthew Heon <[email protected]>
Signed-off-by: Matthew Heon <[email protected]>
@mheon
Copy link
Member Author

mheon commented May 3, 2019

/retest

- `always` : Restart containers when they exit, regardless of status, retrying indefinitely

Please note that restart will not restart containers after a system reboot.
This this functionality is required in your environment, you can invoke Podman from a systemd unit file, or create an init script for whichever init system is in use.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/This this/If this/

- `always` : Restart containers when they exit, regardless of status, retrying indefinitely

Please note that restart will not restart containers after a system reboot.
This this functionality is required in your environment, you can invoke Podman from a systemd unit file, or create an init script for whichever init system is in use.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto This this

Signed-off-by: Matthew Heon <[email protected]>
@TomSweeneyRedHat
Copy link
Member

LGTM, but would like another head nod or two.

@rhatdan
Copy link
Member

rhatdan commented May 3, 2019

Once this passes test, I will merge.

@mheon mheon force-pushed the restart_policy branch from ca20274 to afb47f7 Compare May 3, 2019 18:06
Theory: it's SELinux blowing up and preventing us from creating
files as the container. Try and use a fresh dir and :Z to fix.

Signed-off-by: Matthew Heon <[email protected]>
@mheon mheon force-pushed the restart_policy branch from afb47f7 to d328695 Compare May 3, 2019 18:38
@mheon
Copy link
Member Author

mheon commented May 3, 2019

Tests are finally starting to pass

@TomSweeneyRedHat
Copy link
Member

I'll beat @rhatdan to the punch.
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 3, 2019
@openshift-merge-robot openshift-merge-robot merged commit 4aa9014 into containers:master May 3, 2019
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 26, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants