-
Notifications
You must be signed in to change notification settings - Fork 1.5k
OCPBUGS-3706: Improve error reporting from agent wait-for install-complete #6730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-3706: Improve error reporting from agent wait-for install-complete #6730
Conversation
All errors produced by this function are accompanied by Fatal-level logs, so there is no need for an additional warning-level log.
err is always nil at this point, because we check it further up and it is not overwritten by the variable of the same name that is shadowing it inside the anonymous function, as was probably intended.
Create the Cluster object outside of the WaitFor...() implementation and pass it in, instead of creating it inside and returning it.
When running the 'agent wait-for install-complete' command, we first check that bootstrapping is complete (by running the equivalent of 'agent wait-for bootstrap-complete'. However, if this failed because the bootstrapping timed out, we would report it as an install failure along with the corresponding debug messages (stating that the problem is with the cluster operators, and inevitably failing to fetch data about which). If the failure occurs during bootstrapping, report it as a bootstrap error the same as you would get from 'agent wait-for bootstrap-complete'.
|
@zaneb: This pull request references Jira Issue OCPBUGS-3706, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/cherry-pick release-4.12 |
|
@zaneb: once the present PR merges, I will cherry-pick it on top of release-4.12 in a new PR and assign it to you. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@zaneb: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/lgtm |
|
|
||
| if err = agentpkg.WaitForInstallComplete(cluster); err != nil { | ||
| logrus.Debug("Printing the event list gathered from the Agent Rest API") | ||
| cluster.PrintInfraEnvRestAPIEventList() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why we would stop printing out the event list if the install failed to complete. Wouldn't it be useful information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The theory is that at this point we've succeeded in bootstrapping, and it's just that the CVO isn't complete, so the events that happened before/during bootstrapping seem less relevant at this point.
I think this was previously here because it was trying to cover all cases, since there was previously no way of telling if it failed during bootstrapping or after.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rwsu The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@zaneb: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-3706 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@zaneb: new pull request created: #6742 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
When running the
agent wait-for install-completecommand, we first check that bootstrapping is complete (by running the equivalent ofagent wait-for bootstrap-complete. However, if this failed because the bootstrapping timed out, we would report it as an install failure along with the corresponding debug messages (stating that the problem is with the cluster operators, and inevitably failing to fetch data about which).If the failure occurs during bootstrapping, report it as a bootstrap error the same as you would get from
agent wait-for bootstrap-complete. If the failure occurs later, don't print the assisted-installer event list. If the command failed because of a timeout, log that.