-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP 2551 - kubectl exit code normalization #2574
KEP 2551 - kubectl exit code normalization #2574
Conversation
Note that some exit code are not actual errors. You should definitely account for that. |
True!! So we've discussed in todays sig-cli and I'll try to reduce the scope/amount of error codes, like having:
|
|
||
### Goals | ||
|
||
* Document possible exit codes for kubectl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we also cover expectations for plugins and their exit codes? Maybe that's a non-goal at this stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Tim,
IMO we should not normalize the plugins exit codes, but plugins enter in the 'external programs' category, which we can say that the error code is 128 + the plugin exit code (same for kubectl diff, as an example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I'm questioning myself what happens if a plugin ends with an error code bigger then 128, which will be 128 (from the kubectl) + 128 (from the plugin).
During the design, this should be dealt like (if > 255 then return 255 or something inside the supported error range)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can recommend but that's not a hard requirement.
|
||
### Goals | ||
|
||
* Document possible exit codes for kubectl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can recommend but that's not a hard requirement.
The majority of commands already are organized as the following: | ||
* Run Complete to complete missing information with defaults. This runs on the client side | ||
* Run Validation to check command syntax and missing arguments. This runs on the client side | ||
* Run the command itself. This might run on the client side, be dry-run, run an external command or call the APIServer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might actually change during the separattion of cobra from Options
struct, so it would be nice to put it in a more generic way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example: validation and execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deejross will take a look on this
kubectl exec and run uses the pod exit code as its own exit code, we should figure out how we | ||
should deal with this | ||
https://github.com/kubernetes/kubernetes/blob/v1.22.0-alpha.0/test/e2e/kubectl/kubectl.go#L496 | ||
https://github.com/kubernetes/kubernetes/blob/v1.22.0-alpha.0/staging/src/k8s.io/kubectl/pkg/cmd/util/helpers.go#L178-L179 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a big one and how we can properly handle that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this follow the 128+exit code rule proposed for plugins?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other option is to start kubectl error codes at 201. This would allow exec to continue returning the pod exit code, assuming it's less than 201. If the pod exit code is 201 or greater, then set it to 255. Maybe we should apply the same logic to external commands as well, such as diff to keep it consistent. This would allow those commands to return their codes unaltered (if less than 201), and make kubectl generated error codes distinct to reduce confusion around the origin of an error. Thoughts on this approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@soltysh wdyt about this pattern of using RC > 200? I was discussing with Ross here that this may be a good approach but not the desired by sig-cli.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about having kubectl exec
take a parameter --use-exit-code
that defaults to true, and that you can force to false if you want success to mean “we were able to exec
something”?
### Creating new error parser functions | ||
|
||
Another design solution is to create helper functions for each steps: | ||
* When running Complete, Validate or other client side steps, call cmdutil.CheckClientErr(err) and exits with some well defined client error code, mapped to ErrorCodeClient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned before I'd describe this in slightly more generic way, since this implementation might change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deejross to take a look into this
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of nits, but mostly this looks good
/hold
for you to get PRR file added and nits addressed
/lgtm
/approve
for the developers to warn when a new deployment fails because of the lack of some permission, so those permissions can be | ||
updated for the pipeline to work correctly. | ||
|
||
The developers are making a lot of changes, and they keeps asking for Bruce to look for every pipeline execution, even those that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The developers are making a lot of changes, and they keeps asking for Bruce to look for every pipeline execution, even those that | |
The developers are making a lot of changes, and they keep asking for Bruce to look for every pipeline execution, even those that |
wants to warn users when the apply command fails because of differences between the manifests. | ||
|
||
#### Story 2 | ||
Bruce Wayne, the security administrator of the Gotham Inc Company is following the development of a new product. Bruce asked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bruce Wayne, the security administrator of the Gotham Inc Company is following the development of a new product. Bruce asked | |
Bruce Wayne, the security administrator of the Gotham Inc company is following the development of a new product. Bruce asked |
|
||
| Code | Description | | ||
| ----- | ----------------------------------------------------------------------------------------------------- | | ||
| 1-200 | Reserved for exit codes from exec and external commands | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if we need to reserve all 200, if just the initial 100 isn't sufficient? But that's something we'll need to empirically figure out as we go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed. This was a discussion I was on with Ross, that maybe the code reservation can be something more sparse or in a "range" other than reserving all the 200
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's "close" this list before going to beta, for alpha let's leave it open.
| 203 | Client configuration error, invalid or missing configuration | | ||
| 204 | Network failure, API could not be reached | | ||
| 205 | Authentication failure, identity could not be determined | | ||
| 206 | Authorization failure, identity was determined, but does not have access to requested resource(s) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and not found are hard to distinguish due to server returning 404 Not Found in both cases, due to security reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can leave this as an UNRESOLVED tag so we are stating this is not solved yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly what I said above :)
|
||
### Creating new error parser functions | ||
|
||
Another design solution is to create helper functions for each steps: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about it earlier this week and I think that implementing a set of new error codes inside kubectl where each command could use them and then having CheckErr
return new error codes with that env of yours and just -1 in the backwards compatible way as it does today. Which is close with what you're describing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry @soltysh is there any action on this? I'm a bit slow this latest days :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No action required, this is more like a thought for you and @deejross to consider when implementing 😉
CRI or CNI may require updating that component before the kubelet. | ||
--> | ||
|
||
## Production Readiness Review Questionnaire |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're missing file in https://github.com/kubernetes/enhancements/blob/master/keps/prod-readiness/ and make sure to update the template with current one.
PRR added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor tweaks needed to PRR but looks good in general.
# The milestone at which this feature was, or is targeted to be, at each stage. | ||
milestone: | ||
alpha: "v1.24" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're missing some fields from the latest template, please update
|
||
## Production Readiness Review Questionnaire | ||
|
||
### Feature Enablement and Rollback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you have an older version of the questions, please review the latest template and include any new questions as well. Thanks.
owning-sig: sig-cli | ||
status: implementable | ||
creation-date: 2021-03-16 | ||
reviewers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look thanks!
@johnbelamaric fixed based on your comments, thanks :) |
bacba4f
to
d51c5c2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor nits, but this is good, thx!
/lgtm
/approve
|
||
Items marked with (R) are required *prior to targeting to a milestone / release*. | ||
|
||
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: some of these checkboxes need to be checked.
|
||
| Code | Description | | ||
| ----- | ----------------------------------------------------------------------------------------------------- | | ||
| 1-200 | Reserved for exit codes from exec and external commands | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's "close" this list before going to beta, for alpha let's leave it open.
| 203 | Client configuration error, invalid or missing configuration | | ||
| 204 | Network failure, API could not be reached | | ||
| 205 | Authentication failure, identity could not be determined | | ||
| 206 | Authorization failure, identity was determined, but does not have access to requested resource(s) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly what I said above :)
|
||
### Creating new error parser functions | ||
|
||
Another design solution is to create helper functions for each steps: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No action required, this is more like a thought for you and @deejross to consider when implementing 😉
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: johnbelamaric, rikatz, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
xref Issue: #2551