37 changes: 29 additions & 8 deletions content/en/docs/concepts/workloads/pods/pod-lifecycle.md
@@ -261,21 +261,35 @@ problems, the kubelet resets the restart backoff timer for that container.
[Sidecar containers and Pod lifecycle](/docs/concepts/workloads/pods/sidecar-containers/#sidecar-containers-and-pod-lifecycle)
explains the behaviour of init containers when you specify the `restartPolicy` field on them.

### Reduced container restart delay

{{< feature-state
feature_gate_name="ReduceDefaultCrashLoopBackOffDecay" >}}

With the alpha feature gate `ReduceDefaultCrashLoopBackOffDecay` enabled,
Member

nit: we'd like this to be the default, right? I would suggest adding a note to the paragraph above saying that the numbers listed apply only when the feature gate is disabled. It may be too early for alpha, but when it goes to beta we will certainly switch the numbers in the paragraph above and change this section to "Old restart delay numbers".

Contributor Author

I think it is too early to do that now, because I don't want to clutter up the "normal" docs with material that is relevant only to alpha users right now. But yes, I will hoist it up more strongly at beta.

the delay between container start retries across your cluster is reduced: it
begins at 1s (instead of 10s) and doubles with each restart up to a maximum
delay of 60s (instead of 300s, which is 5 minutes).
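
As an illustration of the numbers in this paragraph, the following Python sketch lists the first few restart delays under the standard defaults and under the reduced defaults. It is not kubelet code; the function name `backoff_delays` and its parameters are hypothetical, and the only assumption is the doubling-with-cap behaviour described in the text.

```python
from itertools import islice


def backoff_delays(initial_s, max_s):
    """Yield restart delays in seconds: start at initial_s, double each time, cap at max_s.

    Hypothetical helper for illustration only; it models the behaviour
    described in the docs text, not the kubelet implementation.
    """
    delay = initial_s
    while True:
        yield min(delay, max_s)
        delay = min(delay * 2, max_s)


# First eight delays with the standard defaults (10s initial, 300s maximum).
default_delays = list(islice(backoff_delays(10, 300), 8))
# -> [10, 20, 40, 80, 160, 300, 300, 300]

# First eight delays with ReduceDefaultCrashLoopBackOffDecay (1s initial, 60s maximum).
reduced_delays = list(islice(backoff_delays(1, 60), 8))
# -> [1, 2, 4, 8, 16, 32, 60, 60]
```

Under these assumptions the reduced schedule reaches its cap after roughly one minute of cumulative waiting, while the standard schedule takes about five minutes to reach its cap.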

If you use this feature along with the alpha feature
`KubeletCrashLoopBackOffMax` (described below), individual nodes may have
different maximum delays.

### Configurable container restart delay

{{< feature-state feature_gate_name="KubeletCrashLoopBackOffMax" >}}

With the alpha feature gate `KubeletCrashLoopBackOffMax` enabled, you can
reconfigure the maximum delay between container start retries from the default
of 300s (5 minutes). This configuration is set per node using kubelet
configuration. In your [kubelet
configuration](/docs/tasks/administer-cluster/kubelet-config-file/), under
`crashLoopBackOff` set the `maxContainerRestartPeriod` field between `"1s"` and
`"300s"`. As described above in [Container restart policy](#restart-policy),
delays on that node will still start at 10s and increase exponentially by 2x
each restart, but will now be capped at your configured maximum. If the
`maxContainerRestartPeriod` you configure is less than the default initial value
of 10s, the initial delay will instead be set to the configured maximum.
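
The capping and clamping rules in this paragraph can be sketched in Python as follows. This is a rough model rather than the kubelet's implementation: `node_restart_delays` and its parameter names are hypothetical, and the `100` second value is an arbitrary illustration.

```python
def node_restart_delays(max_restart_period_s, restarts, default_initial_s=10):
    """Return the first `restarts` delays (seconds) for a node with a configured maximum.

    Hypothetical helper modelling the text above: delays start at the default
    initial value, double on each restart, and are capped at the configured
    maximum. If the maximum is below the initial value, the initial delay is
    the maximum.
    """
    if not 1 <= max_restart_period_s <= 300:
        # The docs text allows values between "1s" and "300s".
        raise ValueError("maxContainerRestartPeriod must be between 1s and 300s")
    delay = min(default_initial_s, max_restart_period_s)
    delays = []
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, max_restart_period_s)
    return delays


capped = node_restart_delays(100, 6)   # -> [10, 20, 40, 80, 100, 100]
clamped = node_restart_delays(2, 3)    # -> [2, 2, 2]
```

The second call shows the clamping case: with a maximum of 2s, every delay, including the first, is 2s rather than 10s.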
Contributor Author

@lauralorenz lauralorenz Apr 8, 2025

I ended up leaving this the same as before. I thought about making it sound more generic ("default initial value for your cluster" instead of "default initial value of 10s"), but I think most people will want to come to these docs and see the effect in concrete terms, without assuming they are using another alpha gate. Only people who opt to use BOTH alpha gates should have to reason about whether they have a different default or not.


See the following kubelet configuration examples:

@@ -294,6 +308,13 @@ crashLoopBackOff:
maxContainerRestartPeriod: "2s"
```

If you use this feature along with the alpha feature
`ReduceDefaultCrashLoopBackOffDecay` (described above), your cluster defaults
for initial backoff and maximum backoff will no longer be 10s and 300s, but 1s
and 60s. Per-node configuration takes precedence over the defaults set by
`ReduceDefaultCrashLoopBackOffDecay`, even if this would result in a node having
a longer maximum backoff than other nodes in the cluster.
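
One way to reason about the combination of the two gates is the Python sketch below. It is only a model of the precedence rule stated above: `effective_backoff` and its parameters are hypothetical names, and it assumes that the initial delay is clamped to the per-node maximum in the same way as when `KubeletCrashLoopBackOffMax` is used alone.

```python
from typing import Optional, Tuple


def effective_backoff(reduce_default_gate: bool,
                      node_max_s: Optional[int] = None) -> Tuple[int, int]:
    """Return (initial_delay_s, max_delay_s) for a node.

    Hypothetical helper. `node_max_s` stands for a per-node
    maxContainerRestartPeriod in seconds, or None when it is not configured.
    """
    # Cluster-wide defaults depend on ReduceDefaultCrashLoopBackOffDecay.
    initial_s, max_s = (1, 60) if reduce_default_gate else (10, 300)
    # Per-node configuration takes precedence over the default maximum.
    if node_max_s is not None:
        max_s = node_max_s
    # Assumption: the initial delay never exceeds the maximum.
    return min(initial_s, max_s), max_s


both_gates_default = effective_backoff(True)        # -> (1, 60)
both_gates_long_max = effective_backoff(True, 300)  # -> (1, 300)
```

The second result illustrates the last sentence of the paragraph: a node configured with a 300s maximum keeps that longer cap even though other nodes in the cluster default to 60s.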

## Pod conditions

A Pod has a PodStatus, which has an array of
@@ -0,0 +1,15 @@
---
title: ReduceDefaultCrashLoopBackOffDecay
content_type: feature_gate
_build:
  list: never
  render: false

stages:
  - stage: alpha
    defaultValue: false
    fromVersion: "1.33"
---
Enables reduction of both the initial delay and the maximum delay between
container restarts for containers in `CrashLoopBackOff` across the cluster,
to a `1s` initial delay and a `60s` maximum delay.