-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Bug 2072202: Check for api and api-int resolution during cluster install #5816
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 2072202: Check for api and api-int resolution during cluster install #5816
Conversation
e8e03e9 to
891fe87
Compare
|
/retest |
patrickdillon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unlikely that api-int is resolvable from the installer host, so this check would need to be run on the bootstrap host.
I think it would be a good idea to enumerate the failure cases we are trying to improve upon.
right, its not. the coredns static pod running on the nodes configures the A record for api-int |
The installer would be able to resolve api-int in the edge case where it is installing to an existing vpc (e.g. aws, azure, gcp) and the host is running within that vpc, right? Like I said, this is an edge case and does not affect the outcome for this PR, but trying to add context for @sadasu about why I said "unlikely." Also trying to edify myself in case I'm mistaken. |
Sorry yep, I was thinking onprem ;-) |
891fe87 to
24bcc65
Compare
|
@patrickdillon @jcpowermac thanks for the helpful comments. Hopefully the current location triggers this verification at the correct point on the bootstrap node on an install failure. Couple of questions:
|
5828d56 to
230e2e0
Compare
|
/cc |
jstuever
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a solid approach. I do have questions about how the new shell script is being referenced from bootkube.sh. I expect this is missing a source line at the top. Otherwise, looks good.
2050020 to
64d16cd
Compare
patrickdillon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Left some nits and ideas. Personally I would like to consider some different terminology than "DNS config."
data/data/bootstrap/files/usr/local/bin/bootstrap-verify-dns-config.sh
Outdated
Show resolved
Hide resolved
data/data/bootstrap/files/usr/local/bin/bootstrap-verify-dns-config.sh
Outdated
Show resolved
Hide resolved
65307c8 to
84928d0
Compare
|
/retest-required |
1 similar comment
|
/retest-required |
472c869 to
a8569df
Compare
|
/test e2e-alibaba |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except for some nitpicking on the indentation of some lines, it seems to be working as intended:
level=error msg=Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane.
level=error msg=The bootstrap machine is unable to resolve API and/or API-Int Server URLs
level=info msg=Successfully resolved API_INT_URL api-int.ci-op-97x1j7tm-920ba.alicloud-dev.devcluster.openshift.com
level=info msg=Unable to reach API_INT_URL's https endpoint at https://10.0.101.111:6443/version
level=info msg=It might be too early for the https://10.0.101.111:6443/version to be available.
data/data/bootstrap/files/usr/local/bin/bootstrap-verify-api-server-urls.sh
Outdated
Show resolved
Hide resolved
data/data/bootstrap/files/usr/local/bin/bootstrap-verify-api-server-urls.sh
Outdated
Show resolved
Hide resolved
data/data/bootstrap/files/usr/local/bin/bootstrap-verify-api-server-urls.sh
Outdated
Show resolved
Hide resolved
data/data/bootstrap/files/usr/local/bin/bootstrap-verify-api-server-urls.sh
Outdated
Show resolved
Hide resolved
data/data/bootstrap/files/usr/local/bin/bootstrap-verify-api-server-urls.sh
Outdated
Show resolved
Hide resolved
data/data/bootstrap/files/usr/local/bin/bootstrap-verify-api-server-urls.sh
Outdated
Show resolved
Hide resolved
data/data/bootstrap/files/usr/local/bin/bootstrap-verify-api-server-urls.sh
Outdated
Show resolved
Hide resolved
On the bootstrap node, verify if the API URL and API-INT URLS are resolvable when cluster bringup fails.
a8569df to
6bfb2e6
Compare
|
/lgtm |
|
/retest-required |
|
/skip |
|
/cc @patrickdillon |
|
/hold Revision 6bfb2e6 was retested 3 times: holding |
|
@sadasu: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/test e2e-gcp-ovn |
|
/hold cancel |
|
@sadasu: All pull requests linked via external trackers have merged: Bugzilla bug 2072202 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
During cluster bring up, when nodes are unable to resolve api-int, the cluster fails to come up and the messages in the logs, several times do not hint at any connectivity issues making it difficult to detect and resolve any infrastructure setup issues.
These changes contain: