Skip to content

Conversation

@mimowo
Copy link
Contributor

@mimowo mimowo commented Sep 21, 2023

  • One-line PR description: Update the KEP for Beta graduation for Backoff Limit Per Index
  • Other comments:

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 21, 2023
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 21, 2023
@mimowo mimowo force-pushed the backoff-limit-per-index-beta branch from ac34804 to d16aea7 Compare September 21, 2023 11:16
@mimowo mimowo changed the title Update KEP for Beta for "Backoff Limit Per Index" Update KEP-3850 "Backoff Limit Per Index" for Beta Sep 22, 2023
@mimowo mimowo force-pushed the backoff-limit-per-index-beta branch from d16aea7 to de0c18c Compare September 22, 2023 16:51
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 22, 2023
@wojtek-t wojtek-t self-assigned this Sep 25, 2023
@mimowo mimowo force-pushed the backoff-limit-per-index-beta branch from de0c18c to 20e655e Compare September 25, 2023 08:26
@mimowo mimowo changed the title Update KEP-3850 "Backoff Limit Per Index" for Beta WIP: Update KEP-3850 "Backoff Limit Per Index" for Beta Sep 25, 2023
@mimowo mimowo marked this pull request as ready for review September 25, 2023 08:28
@mimowo
Copy link
Contributor Author

mimowo commented Sep 25, 2023

/assign @alculquicondor

@mimowo mimowo changed the title WIP: Update KEP-3850 "Backoff Limit Per Index" for Beta Update KEP-3850 "Backoff Limit Per Index" for Beta Sep 25, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 25, 2023
@mimowo mimowo force-pushed the backoff-limit-per-index-beta branch from 45f0b4b to 20e655e Compare September 27, 2023 09:51
Copy link
Member

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 27, 2023
@alculquicondor
Copy link
Member

/assign @soltysh

Copy link
Contributor

@kannon92 kannon92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to mention that in beta we should move these failure reasons to staging?

xref kubernetes/kubernetes@a62eb45#diff-a7577a7dede7864ff38c631319033714fdd0bed91108d976282e1099507e6ff0


###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

This feature does not propose SLOs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any tests for feature enablement/disablement?

Have those been added? Can you link? If not, can you please prioritize?

How can a rollout or rollback fail? Can it impact already running workloads?

it doesn't impact already running pods, but it actually impacts the way they are restarted in case of failures - seems worth clarifying.

also, please address "how can rollout/rollback fail"

Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

Please ensure that this will happen and the KEP will get updated before graduating FG in k/k.
Even better would be to describe a test scenario now, see
https://github.com/kubernetes/enhancements/pull/3658/files
as an example

Are there any missing metrics that would be useful to have to improve observability of this feature?

Are you still planning it? Can you update?

https://github.com/kubernetes/enhancements/tree/master/keps/NNNN-kep-template#can-enabling--using-this-feature-result-in-resource-exhaustion-of-some-node-resources-pids-sockets-inodes-etc

as a new question was added - please copy and fill-in

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Wojtek here, this feature is proposed to be promoted to beta, ie. on by default, so all the questions around rollback and monitoring when such should be performed are very important.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wojtek-t @soltysh thanks for reviewing. I think all the questions are now addressed. PTAL.

@mimowo mimowo force-pushed the backoff-limit-per-index-beta branch from 20e655e to 099f0d1 Compare September 29, 2023 12:44
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 29, 2023
@mimowo mimowo force-pushed the backoff-limit-per-index-beta branch 3 times, most recently from 71409f6 to 576a4b2 Compare September 29, 2023 14:16
@mimowo mimowo force-pushed the backoff-limit-per-index-beta branch from 576a4b2 to 31e1403 Compare September 29, 2023 14:16
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 29, 2023

2. Simulate downgrade by disabling the feature for api server and control-plane:

```sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of keeping the KEP small for future consumption, I find the commands unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean all commands in this section, or just those to edit the control-plane manifests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, removed the commands to edit the manifests for now, and made some descriptions more concise. PTAL

Copy link
Member

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 29, 2023
- Address reviews and bug reports from Alpha users
- Propose and implement metrics
- Implement the `job_finished_indexes_total` metric
- E2e tests are in Testgrid and linked in KEP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see integration but not e2e?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test will be added after the feature gate is enabled by default, during the implementation phase for Beta.


No. The tests will be added in Alpha.
Yes, there is an [integration test](https://github.com/kubernetes/kubernetes/blob/dc28eeaa3a6e18ef683f4b2379234c2284d5577e/test/integration/job/job_test.go#L763)
which tests the following path: enablement -> disablement -> re-enablement.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome :)

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

For Beta we will consider introduction of a new metric `job_finished_indexes_total`
For Beta we will introduce a new metric `job_finished_indexes_total`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - thanks!

@wojtek-t
Copy link
Member

wojtek-t commented Oct 2, 2023

/lgtm
/approve PRR

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve
/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 2, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, soltysh, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 2, 2023
@k8s-ci-robot k8s-ci-robot merged commit f3466c7 into kubernetes:master Oct 2, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

7 participants