cmd/openshift-install/create: Mention 'wait-for' when install-complete fails #3726
Folks might wish to wait longer, possibly after trying to manually recover some cluster component. Personally, I'd rather drop the install-complete timeout entirely and have callers supply their own timeout, like:

  $ timeout 1h openshift-install create cluster

but Stephen Benjamin feels that the current installer output is not clear enough to let users make an informed decision about whether waiting longer makes sense. Product improvements like alerting on compute machines stuck in Provisioned, and installer logging of firing alerts, could help in this space. But until we can drop the timeout, pointing folks at the wait-for command makes that safety valve more discoverable.

The "Use the following command..." language originally comes from 07aa0e0 (cmd: add gather bootstrap subcommand for gathering logs on bootstrap failure, 2019-04-12, openshift#1627), so I'm just rolling forward with that approach instead of porting it to use argv[0] or similar. The hint therefore keeps its current assumption that the installer command will be "openshift-install".
```go
		logrus.Error("Attempted to gather ClusterOperator status after installation failure: ", err2)
	}
	logrus.Info("Use the following command if you want to wait longer for install completion:")
	logrus.Info("openshift-install wait-for install-complete --help")
```
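The hint above hard-codes `openshift-install`, and the description mentions porting it to `argv[0]` as the road not taken. As a hypothetical sketch of that alternative, `waitForHint` derives the command name from `os.Args[0]` and falls back to the hard-coded name; it is not part of the installer.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// defaultCommand is the name the current hint assumes.
const defaultCommand = "openshift-install"

// waitForHint (hypothetical) builds the wait-for suggestion from the invoked
// binary name, falling back to defaultCommand when argv[0] is empty or
// has no usable base name.
func waitForHint(argv0 string) string {
	name := filepath.Base(argv0) // filepath.Base("") returns "."
	if name == "." || name == string(filepath.Separator) {
		name = defaultCommand
	}
	return fmt.Sprintf("%s wait-for install-complete --help", name)
}

func main() {
	fmt.Println("Use the following command if you want to wait longer for install completion:")
	fmt.Println(waitForHint(os.Args[0]))
}
```

One argument for keeping the hard-coded name: `argv[0]` reflects however the binary was launched, so under `go run` or a wrapper script it can be a temporary build path or an unexpected name.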
Either here or in the help (which only says "Wait until the cluster is ready"), what do you think of explaining why someone might want to wait longer?
I think that decision is up to the user and depends on how they read the ClusterOperator and ClusterVersion conditions we report immediately before and after these new lines. What would you add?
To clarify somewhat: I do think the installer should always give up at some point; I just prefer that it make an informed decision rather than base it only on overall time. This seems fine to me for now.
/uncc
Obsoleted by #4259. /close
@wking: Closed this PR.