KEP-4603: Docs for feature behind ReduceDefaultCrashLoopBackoffDecay feature gate #50065
@@ -261,21 +261,35 @@ problems, the kubelet resets the restart backoff timer for that container.

[Sidecar containers and Pod lifecycle](/docs/concepts/workloads/pods/sidecar-containers/#sidecar-containers-and-pod-lifecycle)
explains the behaviour of init containers when you specify the `restartPolicy` field on them.

### Reduced container restart delay

{{< feature-state feature_gate_name="ReduceDefaultCrashLoopBackOffDecay" >}}

With the alpha feature gate `ReduceDefaultCrashLoopBackOffDecay` enabled, the
delay between container start retries across your cluster starts at 1s
(instead of 10s) and doubles after each restart, up to a maximum delay of 60s
(instead of 300s, which is 5 minutes).

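The doubling described above can be checked with a short calculation. This is a simplified model, not kubelet code: the helper `backoff_delays` is hypothetical, delays are whole seconds, and it ignores the kubelet resetting the backoff timer after a container has run without problems.

```python
from __future__ import annotations


def backoff_delays(initial: int, maximum: int, restarts: int) -> list[int]:
    """Hypothetical helper: delay (seconds) before each of the first `restarts` restarts.

    Simplified model: start at `initial`, double after every restart,
    never exceed `maximum`. Backoff resets are not modelled.
    """
    delays = []
    delay = min(initial, maximum)
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, maximum)
    return delays


# Default behaviour: 10s initial delay, capped at 300s.
print("default:", backoff_delays(10, 300, 8))  # default: [10, 20, 40, 80, 160, 300, 300, 300]
# With ReduceDefaultCrashLoopBackOffDecay enabled: 1s initial delay, capped at 60s.
print("reduced:", backoff_delays(1, 60, 8))    # reduced: [1, 2, 4, 8, 16, 32, 60, 60]
```

Under this model the reduced settings reach their cap after 7 restarts, while the defaults reach theirs after 6.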
If you also enable the alpha feature gate `KubeletCrashLoopBackOffMax`
(described below), individual nodes can have different maximum delays.

### Configurable container restart delay

{{< feature-state feature_gate_name="KubeletCrashLoopBackOffMax" >}}

With the alpha feature gate `KubeletCrashLoopBackOffMax` enabled, you can
reconfigure the maximum delay between container start retries from the default
of 300s (5 minutes). This configuration is set per node using kubelet
configuration. In your [kubelet
configuration](/docs/tasks/administer-cluster/kubelet-config-file/), under
`crashLoopBackOff`, set the `maxContainerRestartPeriod` field to a value between
`"1s"` and `"300s"`. As described above in [Container restart policy](#restart-policy),
delays on that node still start at 10s and double after each restart, but are
now capped at your configured maximum. If the `maxContainerRestartPeriod` you
configure is less than the default initial value of 10s, the initial delay is
set to the configured maximum instead.

Review comment (contributor, PR author): I ended up leaving this the same as before. I thought about making it more generic sounding ("default initial value for your cluster" instead of "default initial value of 10s"), but most people will want to come to these docs and see the effect in concrete terms without assuming they are using another alpha gate. Only people who opt to use BOTH alpha gates should have to reason about whether they have a different default.

See the following kubelet configuration examples:

@@ -294,6 +308,13 @@ crashLoopBackOff:

```yaml
crashLoopBackOff:
  maxContainerRestartPeriod: "2s"
```

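As a rough illustration of the rules above, the following sketch computes the delays on a node for a given `maxContainerRestartPeriod`. It assumes the period is given in whole seconds rather than as a duration string such as `"2s"`, and the helper `node_delays` is hypothetical, not part of the kubelet.

```python
from __future__ import annotations

DEFAULT_INITIAL_S = 10      # default initial delay described in the text
MIN_MAX_S, MAX_MAX_S = 1, 300  # allowed range for maxContainerRestartPeriod


def node_delays(max_container_restart_period: int, restarts: int) -> list[int]:
    """Hypothetical helper: delays (seconds) on a node with a per-node maximum."""
    if not MIN_MAX_S <= max_container_restart_period <= MAX_MAX_S:
        raise ValueError("maxContainerRestartPeriod must be between 1s and 300s")
    # If the configured maximum is below 10s, the initial delay equals the maximum.
    delay = min(DEFAULT_INITIAL_S, max_container_restart_period)
    delays = []
    for _ in range(restarts):
        delays.append(delay)
        # Double after each restart, capped at the configured maximum.
        delay = min(delay * 2, max_container_restart_period)
    return delays


print(node_delays(2, 5))    # [2, 2, 2, 2, 2]
print(node_delays(100, 6))  # [10, 20, 40, 80, 100, 100]
```

With `"2s"`, as in the example configuration, every delay on that node is 2s because the initial delay is already at the cap.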
If you also enable the alpha feature gate `ReduceDefaultCrashLoopBackOffDecay`
(described above), the cluster defaults for the initial and maximum backoff
become 1s and 60s instead of 10s and 300s. Per-node configuration takes
precedence over the defaults set by `ReduceDefaultCrashLoopBackOffDecay`, even
if this results in a node having a longer maximum backoff than other nodes in
the cluster.

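A minimal sketch of how the two gates combine, assuming only the rules stated above: `ReduceDefaultCrashLoopBackOffDecay` changes the cluster defaults, and a per-node `maxContainerRestartPeriod` overrides only the maximum. The function `effective_backoff` is hypothetical and not part of any Kubernetes API; values are whole seconds.

```python
from __future__ import annotations

from typing import Optional


def effective_backoff(reduce_gate_enabled: bool, node_max_s: Optional[int]) -> tuple[int, int]:
    """Hypothetical helper: (initial, maximum) backoff in seconds for one node."""
    # Cluster defaults: 1s/60s with ReduceDefaultCrashLoopBackOffDecay, else 10s/300s.
    initial, maximum = (1, 60) if reduce_gate_enabled else (10, 300)
    # Per-node maxContainerRestartPeriod (KubeletCrashLoopBackOffMax) takes
    # precedence, even when it is longer than the cluster default maximum.
    if node_max_s is not None:
        maximum = node_max_s
    # The initial delay never exceeds the maximum.
    return min(initial, maximum), maximum


print(effective_backoff(True, None))   # (1, 60)
print(effective_backoff(True, 120))    # (1, 120)
print(effective_backoff(False, 5))     # (5, 5)
```

The second call shows the case the text warns about: a node configured with a 120s maximum backs off longer than nodes using the reduced 60s default.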
## Pod conditions

A Pod has a PodStatus, which has an array of

@@ -0,0 +1,15 @@
---
title: ReduceDefaultCrashLoopBackOffDecay
content_type: feature_gate
_build:
  list: never
  render: false

stages:
  - stage: alpha
    defaultValue: false
    fromVersion: "1.33"
---
Enables reducing both the initial delay and the maximum delay between container
restarts, across the cluster, for containers in `CrashLoopBackOff`: the initial
delay becomes `1s` and the maximum delay becomes `60s`.

Review comment: nit: we'd like this to be the default, right? I would suggest putting a note in the paragraph above that the numbers listed apply only when the feature gate is disabled. It may be too early for alpha, but when it goes to beta we will for sure switch the numbers in the paragraph above and change this section to "Old restart delay numbers".

Reply: I think it is too early to do that now, because I don't want to clutter up the "normal" docs with material that is for alpha users only right now. But yes, I will hoist it up more strongly at beta.