
KEP-4603: tune crashloopbackoff for 1.32#4893

Merged
k8s-ci-robot merged 38 commits into kubernetes:master from lauralorenz:kep-4603-tune-crashloopbackoff-132
Oct 8, 2024

Conversation

@lauralorenz
Contributor

@lauralorenz lauralorenz commented Oct 2, 2024

  • One-line PR description: Tunable CrashLoopBackoff proposal for 1.32 based on new defaults and node level (via KubeletConfiguration) max backoff configuration
  • Other comments:

Signed-off-by: Laura Lorenz <lauralorenz@google.com>
@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 2, 2024
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Oct 2, 2024
@lauralorenz
Contributor Author

Thank you thank you @soltysh for following up so much 🚀 The PRR is ready for your review in this PR. Thanks again!

@tallclair
Member

/assign
/milestone v1.32

@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Oct 3, 2024
Member

@tallclair tallclair left a comment

Very close to LGTM, it just needs the proposed API for the KubeletConfiguration. The rest are nits and non-blocking.

kubelet as a config file or, beta as of Kubernetes 1.30, a config directory
([ref](https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/)).
Since this is a per-node configuration that likely will be set on a subset of
nodes, or potentially even differently per node, it's important that it can be
Member

I can think of 2 use cases for a heterogeneous configuration:

  1. Dedicated node pool for workloads that are expected to rapidly restart
  2. Machine size adjusted config

In either case, I'd expect this configuration to be shared among a node pool. Upstream k8s doesn't have a node pool concept, but I think we should think of this configuration as shared across a group of nodes.

Contributor Author

Added this (and one other case) explicitly to clarify the position of choosing KubeletConfiguration in 58df245

drops fields unrecognized by the current kubelet's schema, making it a good
choice to circumvent compatibility issues with n-3 kubelets. While there is an
argument that this could be better manipulated with a command-line flag, so
lifecycle tooling that configures nodes can expose it more transparently, the
Member

I don't think this argument holds weight. If we believed it, we shouldn't have added KubeletConfiguration in the first place. I don't think the backoff override is special enough that it should get hoisted up into a flag for better visibility.

That is, I agree with the decision to put it in the KubeletConfiguration rather than a flag.

Contributor Author

Clarified the position of choosing KubeletConfiguration with this in 58df245

[`client_go.Backoff.hasExpired`](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/util/flowcontrol/backoff.go#L178),
and configure the `client_go.Backoff` object created for use by the kube runtime
manager for container restart backoff with a function that compares to a flat
rate of 300 seconds.
Member

IMO 300s is too long for the backoff recovery, but I'm happy with reducing the scope of changes for the first alpha. Maybe add a beta graduation criteria to revisit this decision?

Contributor Author

Added as beta criteria in 4b3835f

* <<[UNRESOLVED]>>node upgrade and downgrade path <<[/UNRESOLVED]>>
- Fix https://github.com/kubernetes/kubernetes/issues/123602 if this blocks the
implementation, otherwise beta criteria
- New `int32 crashloopbackoff.max` field in `KubeletConfiguration` API, validated
Member

Do you propose the actual API / fieldname anywhere?

Contributor Author

Ah yeah, it was only just hidden here. Added explicitly in 1a5f314

benchmarking is worked up, this is gated by its own feature gate,
`ReduceDefaultCrashLoopBackoffDecay`.

### Per node config
Member

I think this section could benefit from a TL;DR of what exactly is being proposed. You can keep the justification and discussion, but the proposal is too buried right now. This should also include the specific field name being proposed.

Contributor Author

Added in 1a5f314

Comment on lines +1136 to +1138
- Test proving `KubeletConfiguration` objects will silently drop unrecognized
fields in the `config.validation_test` package
([ref](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/config/validation/validation_test.go)).
Member

Is this also the expected behavior when the feature gate is disabled?

Contributor Author

Yes. I did include this comment inline here in 1515af5

Signed-off-by: Laura Lorenz <lauralorenz@google.com>
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
crashloopbackoff:
max: 4

Contributor Author

Fixed in 220604e

@tallclair
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 7, 2024
@tallclair
Member

I think this addresses everything @dchen1107 listed in #4604 (comment)

@lauralorenz please confirm.

@lauralorenz
Contributor Author

To fill in the context to @tallclair's question in #4893 (comment) for @dchen1107

Rapid Restart Implementation

The previously named RestartPolicy: Rapid has been moved to Alternatives Considered here. If you mean the change to the default curve, that is here in the Proposal section, and the implementation is explained in more detail in this Design Details section.

Node-Level Opt-in

This is now introduced here in the Proposal section and further explained in this Design Details section.

Benchmarking and Stress Testing

Manual stress tests I did to rationalize the current default suggestions are discussed here; further benchmarking and stress testing remain graduation criteria for alpha and are described in more detail in the benchmarking Design Details section.

Interaction with Job API

This change in design now supports faster restarts for any workload -- not just those without support for RestartPolicy: Always in the Pod API, as the prior RestartPolicy: Rapid was, as described in these design slides. Explicitly supporting Jobs was added as an explicit KEP goal in this PR at fb56335#diff-ffa07b017bd73874c1a805a8492f4e83e1d334445fcf891da0be617332650427R291.

The interaction with a very specific Job API feature from KEP-3329 is described here and has not changed much since 1.31, because as of 1.32 they target different restart types (Never in KEP-3329 vs OnFailure and Always in this KEP).

Kubelet Overhead Analysis

See the much more expanded section for this here and in this Appendix.

@lauralorenz
Contributor Author

📟 Greetings @soltysh , just a friendly update that this has sig-node reviewer lgtm, if that time sequences anything regarding the PRR. @dchen1107 has it in her queue for the sig-node hard approval.

* Runs startup probes until the container has started
* Pulls the image again (which may mean more image downloads) if the image
  pull policy specifies it
  ([ref](https://github.com/kubernetes/kubernetes/blob/release-1.31/pkg/kubelet/images/image_manager.go#L135)).
* Recreates the pod sandbox and probe workers
Member

nit: Why does the kubelet recreate the pod sandbox when restarting a container within an already-running pod? cc/ @tallclair @yujuhong

Contributor

I don't think the kubelet does that.
The probe worker is also set up only once, when the pod is added.

Contributor

I also don't think it registers/unregisters the pod with the managers, as listed below...

@dchen1107
Member

/approve

Thanks @lauralorenz for the detailed design. Thanks @tallclair and others for the detailed review.

I have focused this time on the potential risks that have been brought up several times by the community: 1) Increased load on the kubelet: faster restarts mean the kubelet has to work harder and more frequently to manage pod lifecycles, which could lead to increased CPU and memory usage, potentially impacting node stability. 2) API server overload: each pod restart triggers API server requests to update pod status, and more frequent restarts could strain the API server, potentially affecting the entire cluster.

The KEP itself documented well the risk mitigation strategies:

  • Conservative Defaults
  • Node-Level Opt-in
  • Alpha Stage and Feature Gates
  • Extensive Testing
  • Controlled Experimentation
  • Gradual Rollout
  • Continuous Monitoring

cc/ @liggitt

Contributor

@soltysh soltysh left a comment

Minor nit, feel free to either fix it asap (I'll put a hold) or in a follow-up (in which case drop the hold).

/approve
the PRR



- `kubelet/kuberuntime/kuberuntime_manager_test`: **could not find a successful
Contributor

This is missing, I see some reasonable data in https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit

Contributor Author

Thanks! I will fix it now

Contributor Author

Fixed in another branch at lauralorenz@d367cd5; will merge it as a follow-on PR because I fear losing those hard-won lgtms and approves 😅

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, lauralorenz, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files.

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 8, 2024
@soltysh
Contributor

soltysh commented Oct 8, 2024

/hold
to fix the nit

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 8, 2024
@lauralorenz
Contributor Author

/unhold

merging in fix for nit at lauralorenz@d367cd5 after this

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 8, 2024
@k8s-ci-robot k8s-ci-robot merged commit 72e6683 into kubernetes:master Oct 8, 2024
@lauralorenz
Contributor Author

FYI follow up PR for PRR nit is in #4910


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
