data/data/bootstrap: use loopback kubeconfig for API access #2086
Conversation
force-pushed from 2d72ab0 to 19415cb

/retest
force-pushed from 19415cb to 16ccc5d

I wasn't aware of this. Coming soon!
force-pushed from 16ccc5d to 82d81d9

Grepping around for …
I'm not too concerned with this. The use-case for this script is that bootstrapping failed and we have an API running, but the bootstrap apiserver is usually not running when this is run, so keeping it on the LB is fine. progress always runs when the bootstrapping completes, so it needs to stay on the LB.

OpenStack doesn't end up with a weird LB blackhole and can keep using it if it likes; I'm not concerned when it's platform specific.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhixson74, wking

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
/retest

Please review the full test history for this PR and help us cut down flakes.
@jhixson74: The following tests failed, say /retest to rerun them all.

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
…ys for etcd-signer

Since the pivots to prefer loopback Kube-API access:

* bf59ebf (azure: generate loopback kubeconfig to access API locally, 2019-07-17, openshift#2085).
* 82d81d9 (data/data/bootstrap: use loopback kubeconfig for API access, 2019-07-24, openshift#2086).
* openshift/cluster-bootstrap@61d1428bea (pkg/start: use loopback kubeconfig to talk to API, 2019-07-23, openshift/cluster-bootstrap#28).
* possibly more

logs on the bootstrap machine have contained distracting errors like these reported in [1]:

    $ grep 'not localhost\|etcd-signer' journal-bootstrap.log
    ...
    Aug 20 10:33:56 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com podman[8366]: 2019-08-20 10:33:56.090073216 +0000 UTC m=+2.644782091 container start d0dcc42a1335c1224df35a48a279f63f1cb7a03c94de5ebb29e2633e6ee6c429 (image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f20394d571ff9a28aed9366434521d221d8d743a6efe2a3d6c6ad242198a522e, name=etcd-signer)
    Aug 20 10:33:58 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com openshift.sh[2867]: error: unable to recognize "./99_kubeadmin-password-secret.yaml": Get https://localhost:6443/api?timeout=32s: x509: certificate is valid for api.bm1.oc4, not localhost
    Aug 20 10:34:01 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com approve-csr.sh[2870]: Unable to connect to the server: x509: certificate is valid for api.bm1.oc4, not localhost
    ...
    Aug 20 10:43:55 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com openshift.sh[2867]: error: unable to recognize "./99_kubeadmin-password-secret.yaml": Get https://localhost:6443/api?timeout=32s: x509: certificate is valid for api.bm1.oc4, not localhost
    Aug 20 10:43:59 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com podman[15272]: 2019-08-20 10:43:59.68789639 +0000 UTC m=+0.188325679 container died d0dcc42a1335c1224df35a48a279f63f1cb7a03c94de5ebb29e2633e6ee6c429 (image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f20394d571ff9a28aed9366434521d221d8d743a6efe2a3d6c6ad242198a522e, name=etcd-signer)
    ...

With this commit, we pass the localhost cert to etcd-signer so we can form the TLS connection to gracefully say "sorry, I'm not really a Kube API server". Fixes [2].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1743661
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1743840
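A rough sketch of what passing the localhost cert to etcd-signer looks like in the bootkube.sh podman invocation. The `serve` flag names and the image variable below are illustrative assumptions, not the verbatim template; the authoritative invocation is in bootkube.sh.template:

    # Hedged sketch: flag names and KUBE_ETCD_SIGNER_IMAGE are assumptions.
    # Serving the localhost cert/key means clients probing
    # https://localhost:6443 complete the TLS handshake and get a clean
    # "not really a Kube API server" error instead of an x509 hostname
    # mismatch.
    podman run --rm --network host --name etcd-signer \
        --volume /opt/openshift/tls:/opt/openshift/tls:ro,z \
        "${KUBE_ETCD_SIGNER_IMAGE}" serve \
        --cacrt=/opt/openshift/tls/etcd-client-ca.crt \
        --cakey=/opt/openshift/tls/etcd-client-ca.key \
        --servcrt=/opt/openshift/tls/localhost.crt \
        --servkey=/opt/openshift/tls/localhost.key \
        --address=0.0.0.0:6443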
With this commit, I take advantage of openshift/cluster-bootstrap@fc5e0941 (start: wire the library-go dynamic client create, 2019-02-05, openshift/cluster-bootstrap#14) to replace our previous openshift.sh (with a minor change to the manifest directory). I'm currently using a cp in bootkube.sh to shift those manifests into the generic directory; I plan on consolidating Openshift into Manifests in pkg/asset/manifests in follow-up work.

This change is especially important since the pivot to loopback kubeconfigs in openshift.sh: 82d81d9 (data/data/bootstrap: use loopback kubeconfig for API access, 2019-07-24, openshift#2086), because once cluster-bootstrap (launched from bootkube.sh) decides it's done, it tears down the bootstrap control plane. Without the bootstrap control plane, further attempts by openshift.sh to push manifests via the loopback kubeconfig fail [1].

We could roll reporting into bootkube.sh as well (dropping progress.service), but Abhinav wanted to keep it separate [2].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1748452
[2]: openshift#1381 (comment)
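The cp mentioned above is a small step; a minimal sketch, assuming the usual /opt/openshift bootstrap layout where cluster-bootstrap consumes everything under manifests/:

    # Fold the OpenShift-specific manifests into the directory that
    # cluster-bootstrap already processes (paths assumed, not verbatim
    # from the template).
    cp --recursive /opt/openshift/openshift/* /opt/openshift/manifests/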
…ntrol-plane teardown

Since 82d81d9 (data/data/bootstrap: use loopback kubeconfig for API access, 2019-07-24, openshift#2086), we've been pushing the OpenShift-specific components via a loopback kubeconfig and the bootstrap control plane. Since 108a45b (data/bootstrap: Replace openshift.sh with cluster-bootstrap, 2019-01-24, landed 2019-09-30, openshift#1381), that push has always happened before the bootstrap control plane shuts down.

I've changed "into" to "via" because the data passes through either control plane on its way to rest in the shared etcd cluster. I've dropped "then" from the final step because none of the other steps said "then". I think ordering is clear enough from our use of an ordered list ;).
This PR copies the loopback kubeconfig into the Kubernetes configuration directory so that static pods can use it. approve-csr is also modified to use it in its service script.
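As a hedged sketch of the change on the bootstrap node, assuming the usual asset paths (the PR diff is authoritative):

    # Assumed paths: make the loopback kubeconfig the one that
    # node-local tooling reads by default.
    cp /opt/openshift/auth/kubeconfig-loopback /etc/kubernetes/kubeconfig

    # approve-csr.sh can then reach the apiserver without going through
    # the load balancer, along these lines:
    oc --kubeconfig=/etc/kubernetes/kubeconfig get csr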
This is necessary due to a limitation with Azure internal load balancers. See limitation #2 here: https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-overview#limitations
"Unlike public Load Balancers which provide outbound connections when transitioning from private IP addresses inside the virtual network to public IP addresses, internal Load Balancers do not translate outbound originated connections to the frontend of an internal Load Balancer as both are in private IP address space. This avoids potential for SNAT port exhaustion inside unique internal IP address space where translation is not required. The side effect is that if an outbound flow from a VM in the backend pool attempts a flow to frontend of the internal Load Balancer in which pool it resides and is mapped back to itself, both legs of the flow don't match and the flow will fail."
https://jira.coreos.com/browse/CORS-1094