Redesign to additional probe
matthyx committed May 1, 2019
1 parent a124db0 commit 4131790
Showing 1 changed file with 44 additions and 38 deletions.
@@ -1,29 +1,30 @@
---
title: Add initializationFailureThreshold to health probes
title: Add pod-startup liveness-probe holdoff for slow-starting pods
authors:
- "@matthyx"
owning-sig: sig-node
participating-sigs:
- sig-apps
- sig-architecture
reviewers:
- @RobertKrawitz
- @thockin
approvers:
- @RobertKrawitz
- @derekwaynecarr
- @thockin
editor: TBD
creation-date: 2019-02-21
last-updated: 2019-04-12
status: provisional
last-updated: 2019-05-01
status: implementable
see-also:
replaces:
superseded-by:
---

# Add initializationFailureThreshold to health probes
# Add pod-startup liveness-probe holdoff for slow-starting pods

## Table of Contents

- [Add initializationFailureThreshold to health probes](#add-initializationFailurethreshold-to-health-probes)
- [Add pod-startup liveness-probe holdoff for slow-starting pods](#add-pod-startup-liveness-probe-holdoff-for-slow-starting-pods)
- [Table of Contents](#table-of-contents)
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
@@ -36,14 +37,16 @@ superseded-by:
- [Stateless kubelet](#stateless-kubelet)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
- [Feature Gate](#feature-gate)
- [Graduation Criteria](#graduation-criteria)
- [Implementation History](#implementation-history)

[Tools for generating]: https://github.com/ekalinin/github-markdown-toc

## Release Signoff Checklist

- [x] kubernetes/enhancements issue in release milestone, which links to KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR)
- [ ] KEP approvers have set the KEP status to `implementable`
- [X] KEP approvers have set the KEP status to `implementable`
- [ ] Design details are appropriately documented
- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] Graduation criteria is in place
@@ -58,7 +61,7 @@ superseded-by:

Slow-starting containers are difficult to handle with health probes as they currently work: they are either killed before they are up, or they can be left deadlocked for a very long time before being killed.

This proposal adds a numerical option `initializationFailureThreshold` to probes allowing a greater number of failures during the initial start of the container before taking action, while keeping `failureThreshold` at a minimum to restart deadlocked containers after an acceptable delay.
This proposal adds a new probe called `startupProbe` that holds off all the other probes until the pod has finished its startup. In the case of a slow-starting pod, it could poll on a relatively short period with a high `failureThreshold`. Once it is satisfied, the other probes can start.

## Motivation

@@ -83,70 +86,73 @@ However, none of these strategies provide an timely answer to slow starting cont
- Improve documentation of the `Probe` structure in core types' API.
- Improve `kubernetes.io/docs` section about Pod lifecycle:
- Clearly state that PostStart handlers do not delay probe executions.
- Introduce and explain this new option.
- Document that `kubelet` does not save states, and what are the implications with this new option (see Risks and Mitigations).
- Document appropriate use cases for this new option.
- Introduce and explain this new probe.
- Document that `kubelet` does not save states, and what are the implications with this new probe (see Risks and Mitigations).
- Document appropriate use cases for this new probe.

### Non-Goals

- This proposal does not address the issue of pod load affecting startup (or any other probe that may be delayed due to load). It is acting strictly at the pod level, not the node level.
- This proposal will only update the official Kubernetes documentation, excluding [A Pod's Life] and other well referenced pages explaining probes.
- This proposal does not propose to change the stateless nature of the kubelet.

[A Pod's Life]: https://blog.openshift.com/kubernetes-pods-life/

## Proposal

### Implementation Details

The proposed solution is to add a new `int32` field named `InitializationFailureThreshold` to the type `Probe` of the core API, which is mapped to the numerical option `initializationFailureThreshold`.
The proposed solution is to add a new probe named `startupProbe` in the container spec of a pod which will determine whether it has finished starting up.

It also requires keeping the state of the container (has the probe ever succeeded?) using a boolean `hasInitialized` inside the kubelet worker.
It also requires keeping the state of the container (has the `startupProbe` ever succeeded?) using a boolean `hasStarted` inside the kubelet worker.

The combination of `hasInitialized` false and `resultRun` count lower than `initializationFailureThreshold` becomes another condition to return `true` to the probe state in `worker.go`.
Depending on `hasStarted` the probing mechanism in `worker.go` might be altered:

For example, if `periodSeconds` is 10, `initializationFailureThreshold` is 20, and `failureThreshold` is 3, it means that:
- `hasStarted == true`: the kubelet worker works the same way as today
- `hasStarted == false`: the kubelet worker only probes the `startupProbe`

- The kubelet will allow the container 200 seconds to start (20 probes, spaced 10 seconds apart).
- If a probe succeeds at any time during that interval, the container is considered to have started, and `failureThreshold` is used thereafter.
If `startupProbe` fails more than `failureThreshold` times, the result is the same as today when `livenessProbe` fails: the container is killed and might be restarted depending on `restartPolicy`.

This means that all these cases will lead to a container being terminated:

- The container fails 20 probes at startup. It is considered to have failed, and is terminated after 200 seconds of downtime.
- The container fails 10 probes at startup, starts successfully, and after a long time fails 3 probes. The container is considered to have failed and is terminated after 30 seconds of downtime.
- The container fails 10 probes at startup, succeeds once, and fails 3 more probes. The container is considered to have started at 100 seconds, and even though it is still within the first 200 seconds of its lifetime covered by `initializationFailureThreshold`, it is considered to have failed because of `failureThreshold` and the fact that it had initially started successfully, therefore it is terminated after 30 seconds of downtime.

If `initializationFailureThreshold` is set smaller than `failureThreshold`, its value is overridden to `failureThreshold` to avoid having the container killed faster during startup than it would be in case of a deadlock, rendering it permanently unavailable.

This is being implemented and reviewed in PR [#71449].

[#71449]: https://github.com/kubernetes/kubernetes/pull/71449
If no `startupProbe` is defined, `hasStarted` is initialized with `true`.
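
The `hasStarted` gating described above can be sketched roughly as follows. This is a minimal illustrative model, not the code in `worker.go`: the `probeSpec`, `worker`, `action`, `newWorker`, and `tick` names are hypothetical. It assumes that the container is killed once consecutive `startupProbe` failures reach `failureThreshold` (as with `livenessProbe` today), and that the other probes begin on the period after the `startupProbe` first succeeds.

```go
package main

import "fmt"

// probeSpec is a hypothetical, trimmed-down stand-in for the core API Probe type.
type probeSpec struct {
	PeriodSeconds    int32
	FailureThreshold int32
}

// action is what the worker decides to do after one probe period.
type action string

const (
	holdOff        action = "hold off other probes"
	runOtherProbes action = "run liveness/readiness probes as today"
	killContainer  action = "kill container"
)

// worker is a hypothetical model of the per-container kubelet probe worker.
type worker struct {
	startup    *probeSpec // nil when the container defines no startupProbe
	hasStarted bool
	failures   int32 // consecutive startupProbe failures
}

// newWorker initializes hasStarted to true when no startupProbe is defined.
func newWorker(startup *probeSpec) *worker {
	return &worker{startup: startup, hasStarted: startup == nil}
}

// tick handles one probe period. startupOK is the startupProbe result for
// this period and is ignored once the container has started.
func (w *worker) tick(startupOK bool) action {
	if w.hasStarted {
		return runOtherProbes
	}
	if startupOK {
		w.hasStarted = true
		w.failures = 0
		return holdOff // assumption: the other probes begin on the next period
	}
	w.failures++
	// Assumption: like livenessProbe today, the container is killed once
	// consecutive failures reach failureThreshold.
	if w.failures >= w.startup.FailureThreshold {
		return killContainer
	}
	return holdOff
}

func main() {
	w := newWorker(&probeSpec{PeriodSeconds: 10, FailureThreshold: 3})
	for _, ok := range []bool{false, false, true, false} {
		fmt.Println(w.tick(ok))
	}
}
```

In this model, a failure that occurs after the first `startupProbe` success no longer counts against the startup threshold; it is left to the regular probes and their own `failureThreshold`.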

### Risks and Mitigations

#### Stateless kubelet

`kubelet` handles container/pod lifecycle-related functions by relying on the underlying container runtime to persist the states. For probes and/or lifecycle hooks, the kubelet relies on in-memory states only.

This means that the boolean `hasInitialized` as well as all probe counters could be reset during the lifetime of a container. There are several cases to consider:
This means that the boolean `hasStarted` as well as all probe counters could be reset during the lifetime of a container. There are several cases to consider:

- The container is starting: `hasInitialized` was still false, which means the container will have more probe attempts to successfully start. This is also the case today with a long `initialDelaySeconds`, except that with the new feature `failureThreshold` is taken into account after the first success of the probe.
- The container is running: `hasInitialized` is reverted to false until the next probe which will succeed and reset it to true immediately after.
- The container is deadlocked: `hasInitialized` is reverted to false, which means the container will have a maximum downtime of `periodSeconds` times `initializationFailureThreshold`. This is also the case today with a high `failureThreshold`.
- The container is starting: `hasStarted` was already false, which means the `startupProbe` will continue to be checked and hold off other probes until it succeeds or fails completely.
- The container is running: `hasStarted` is reverted to false until the next `startupProbe` run which will succeed and reset it to true immediately after.
- The container is deadlocked: `hasStarted` is reverted to false, which means the container will have a maximum downtime of `periodSeconds` times `failureThreshold` (from the `startupProbe`). This is also the case today when you need to set a high `failureThreshold` for the `livenessProbe`.

The only way of killing the deadlocked container faster would be to have a stateful kubelet, which is not the purpose of this KEP.
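
To make the worst case concrete, the maximum downtime after a state reset can be estimated as `periodSeconds` times `failureThreshold` of the `startupProbe`. The helper below is a hypothetical sketch (`maxStartupWindow` is not a kubelet function), the values 10 and 20 are illustrative only, and it ignores `initialDelaySeconds` and probe timeouts.

```go
package main

import (
	"fmt"
	"time"
)

// maxStartupWindow estimates how long a container that never passes its
// startupProbe can stay up before being killed: one probe per period,
// failureThreshold consecutive failures allowed.
func maxStartupWindow(periodSeconds, failureThreshold int32) time.Duration {
	return time.Duration(periodSeconds) * time.Duration(failureThreshold) * time.Second
}

func main() {
	// Illustrative values: probe every 10 seconds, tolerate 20 failures.
	fmt.Println(maxStartupWindow(10, 20)) // prints 3m20s
}
```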

## Design Details

### Test Plan

The following test cases can be covered by calling the `fakeExecProber` a number of times to verify:
TBD

### Feature Gate

- Expected feature gate key: `StartupProbeEnabled`
- Expected default value: `false`

- the container is killed after `initializationFailureThreshold` if it has never initialized (emulated by always calling `fakeExecProber{probe.Success, nil}`)
- the container is killed after `failureThreshold` once it has initialized (emulated by calling `fakeExecProber{probe.Success, nil}` once) with total probes > `initializationFailureThreshold`
- the container is killed after `failureThreshold` once it has initialized (emulated by calling `fakeExecProber{probe.Success, nil}` once) with total probes < `initializationFailureThreshold`
### Graduation Criteria

- Alpha: Initial support for `startupProbe` added. Disabled by default.
- Beta: `startupProbe` enabled with no default configuration.
- Stable: `startupProbe` enabled with no default configuration.

## Implementation History

- 2018-11-27: prototype implemented in PR [#71449] under review
- 2019-03-05: present KEP to sig-node
- 2019-04-11: open issue in enhancements [#950]
- 2019-05-01: redesign to additional probe after @thockin [proposal]

[#950]: https://github.com/kubernetes/enhancements/issues/950
[#71449]: https://github.com/kubernetes/kubernetes/pull/71449
[#950]: https://github.com/kubernetes/enhancements/issues/950
[proposal]: https://github.com/kubernetes/kubernetes/issues/27114#issuecomment-437208330
