Update KEP-3850 "Backoff Limit Per Index" for Beta #4228
mimowo commented on Sep 21, 2023:
- One-line PR description: Update the KEP for Beta graduation for Backoff Limit Per Index
- Issue link: Backoff Limit Per Index For Indexed Jobs #3850
- Other comments:
/assign @alculquicondor
/lgtm
/assign @soltysh
Do you want to mention that in beta we should move these failure reasons to staging?
xref kubernetes/kubernetes@a62eb45#diff-a7577a7dede7864ff38c631319033714fdd0bed91108d976282e1099507e6ff0
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

This feature does not propose SLOs.
Are there any tests for feature enablement/disablement?
Have those been added? Can you link? If not, can you please prioritize?
How can a rollout or rollback fail? Can it impact already running workloads?
It doesn't impact already running Pods, but it does change how they are restarted after failures, which seems worth clarifying.
Also, please address "How can a rollout or rollback fail?"
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Please ensure that this will happen and the KEP will get updated before graduating FG in k/k.
Even better would be to describe a test scenario now; see https://github.com/kubernetes/enhancements/pull/3658/files as an example.
Are there any missing metrics that would be useful to have to improve observability of this feature?
Are you still planning this? Can you update the KEP?
A new question was added to the template; please copy it and fill it in.
I agree with Wojtek here. This feature is proposed for promotion to beta, i.e. on by default, so all the questions about rollback, and about the monitoring that tells you when a rollback should be performed, are very important.
2. Simulate a downgrade by disabling the feature for the API server and the control plane:
| ```sh |
For the sake of keeping the KEP small for future consumption, I find the commands unnecessary.
Do you mean all commands in this section, or just those to edit the control-plane manifests?
For now, I removed the commands that edit the manifests and made some descriptions more concise. PTAL
keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs/README.md
…EADME.md Co-authored-by: Aldo Culquicondor <[email protected]>
/lgtm
- Address reviews and bug reports from Alpha users
- Propose and implement metrics
- Implement the `job_finished_indexes_total` metric
- E2E tests are in Testgrid and linked in the KEP
I see integration tests, but not e2e tests?
The test will be added after the feature gate is enabled by default, during the implementation phase for Beta.
No. The tests will be added in Alpha.
Yes, there is an [integration test](https://github.com/kubernetes/kubernetes/blob/dc28eeaa3a6e18ef683f4b2379234c2284d5577e/test/integration/job/job_test.go#L763) which tests the following path: enablement -> disablement -> re-enablement.
Awesome :)
###### Are there any missing metrics that would be useful to have to improve observability of this feature?

For Beta we will consider introduction of a new metric `job_finished_indexes_total`
For Beta we will introduce a new metric `job_finished_indexes_total`
OK - thanks!
/lgtm
/lgtm
/approve
/label tide/merge-method-squash
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: mimowo, soltysh, wojtek-t