BUG 1685704: assets: use internal apiserver name for all internal clients #1633
Conversation
|
/hold We don't want to land this until we have everyone on board (which may already be the case, but I want to be sure ;) |
|
you need to change the kubeconfig for kubelet to use `api-int`:
```console
$ rg 'https://api' ./pkg/asset
pkg/asset/kubeconfig/kubeconfig.go
34: Server: fmt.Sprintf("https://api.%s:6443", installConfig.ClusterDomain()),
pkg/asset/kubeconfig/kubeconfig_test.go
65: server: https://api.test-cluster-name.test.example.com:6443
89: server: https://api.test-cluster-name.test.example.com:6443
pkg/asset/manifests/utils.go
37: return fmt.Sprintf("https://api.%s:6443", ic.ClusterDomain())
```
|
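For context, a minimal sketch of the split being requested here, with hypothetical helpers mirroring the `fmt.Sprintf` calls found by `rg` above (this is not the actual diff):
```go
package main

import "fmt"

// internalAPIServerURL is a hypothetical helper: internal clients
// (e.g. the kubelet kubeconfig) should move to the api-int name.
func internalAPIServerURL(clusterDomain string) string {
	return fmt.Sprintf("https://api-int.%s:6443", clusterDomain)
}

// externalAPIServerURL keeps the external api name, which the admin
// kubeconfig should continue to use.
func externalAPIServerURL(clusterDomain string) string {
	return fmt.Sprintf("https://api.%s:6443", clusterDomain)
}

func main() {
	fmt.Println(internalAPIServerURL("test-cluster-name.test.example.com"))
	fmt.Println(externalAPIServerURL("test-cluster-name.test.example.com"))
}
```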
|
/test e2e-metal |
|
Interesting. All the worker kubelets are … |
|
Unfortunately the kubelet logs were not captured from the masters so it is hard to tell what happened. I'll try to recreate manually and see what is going on. |
|
I spent most of the day trying to figure this out, to no avail. There must be some side effect I'm not intending. The only effect I intend is that internal clients use the internal apiserver name. Any help would be appreciated! 🙏 |
… API
The `.status.apiServerURL` docs [1] state that this value can be used by components like kubelets on machines to contact the `apiserver` using the infrastructure provider rather than Kubernetes networking.
Therefore, this will be set to `api-int.$cluster_domain`, as that is the URL that should be used by the LB consumers of the apiserver.
Currently the external serving cert controller uses this value directly to generate its certificate, when it should be the one doing the replace magic from [2] rather than the internal serving cert controller; this causes the wrong certificate to be generated when we finally switch to `api-int` [3]:
```console
$ oc get secrets -n openshift-kube-apiserver external-loadbalancer-serving-certkey -oyaml
apiVersion: v1
data:
tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURYekNDQWtlZ0F3SUJBZ0lJYzJwVFVHSWw3Mmd3RFFZSktvWklodmNOQVFFTEJRQXdOekVTTUJBR0ExVUUKQ3hNSmIzQmxibk5vYVdaME1TRXdId1lEVlFRREV4aHJkV0psTFdGd2FYTmxjblpsY2kxc1lpMXphV2R1WlhJdwpIaGNOTVRrd05ERTNNak0xT0RNeVdoY05NVGt3TlRFM01qTTFPRE16V2pBbk1TVXdJd1lEVlFRREV4eGhjR2t0CmFXNTBMbUZrWVdocGVXRXRNQzUwZEM1MFpYTjBhVzVuTUlJQklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQ0FROEEKTUlJQkNnS0NBUUVBb293WHNVMW51TWRMSkhneUVWNTZXZmpTSDVCdmpqQ3c4eGJWYTVKNlRPUDhOTWVPaEtuOAowRVF6VDYzTTdQdTFtdlhqVmJ5T1JlcWZhL3Y4ZXJjdzIwS1liUDY3QXoyY2JrUXplUmpuVUVsVC8yU2hYa1E2CkVhS3Y4VGM4dlM3SFRhYkZTZVRmTW5RWHhNT0FOUDRyTW51NmpLUmw3aC90WU5Jck1xZFB3YUJaK0UyOWpBSW0KR0VaSUdCWDJWLy9Vb0hyK05vM2hHL0c1Ykdua3JhaUdUYkhTSjdQcExoNGJFYmFPTWlJWmR3K215WEtvS2ZScQo5SkVKd3llTDNueUNlQVQ1Y0tDa0NGZTR0eDQvcGMweTdQYmtHTmJzMG8yaHpMTkhHdTBNYVgwV2NYZkdObnhvCno4Y1p0S0ZGNjdaT25ZSGRtdGErVStjNWllcHZLTkJzY1FJREFRQUJvMzh3ZlRBT0JnTlZIUThCQWY4RUJBTUMKQmFBd0V3WURWUjBsQkF3d0NnWUlLd1lCQlFVSEF3RXdEQVlEVlIwVEFRSC9CQUl3QURBZkJnTlZIU01FR0RBVwpnQlNFQ09ZaFdUNDZObTdCOGQ0cHdYbEpQUXpDa3pBbkJnTlZIUkVFSURBZWdoeGhjR2t0YVc1MExtRmtZV2hwCmVXRXRNQzUwZEM1MFpYTjBhVzVuTUEwR0NTcUdTSWIzRFFFQkN3VUFBNElCQVFCbUVIOXQ5dGZDOXBMSVVXQUUKakE0dHBsZXpLVi9LR0ppZVh1dW5Td0NKU0FiNUcyclBMallVRGRRa0pJeks4cHdtNW1oMys2Vzd3bEo4ZVhWaAptVVJWT1RHNUlKQkkzdk1RT0hwWkt6YjY3a0ZZajVleSs3U2FmNjdKZzJ2L0lxS09QdmpVeU9VcUd6ckFWanI5Cm5FM2g3cVNwQXRYZk50Mk5IOGxuMTVKNFlsN1hhZkd6cGppUVZ2UVdBUHpoSnYxc2ZRVElpc0lSb2FvK2ZCUEcKRmtLcDUxRXRvc0xvbmtYRFZ5UGdzQVR4MW5jU2RvRE94WWhueXJteWEyT0Q2MVlMSDloalpjY2lENzBMdTVnOApFcUhrenhCL1F3S0JzbFdDUkNxYktpWkRVd0UvcWtEMmVST0NiS0hLckVKME8wWEVHVCtNRk1wU1RYcWIyaDA5CjhUV3IKLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQotLS0tLUJFR0lOIENFUlRJRklDQVRFLS0tLS0KTUlJRE1qQ0NBaHFnQXdJQkFnSUliRHFHUFpCdlk0a3dEUVlKS29aSWh2Y05BUUVMQlFBd056RVNNQkFHQTFVRQpDeE1KYjNCbGJuTm9hV1owTVNFd0h3WURWUVFERXhocmRXSmxMV0Z3YVhObGNuWmxjaTFzWWkxemFXZHVaWEl3CkhoY05NVGt3TkRFM01qTTBPREExV2hjTk1qa3dOREUwTWpNME9EQTFXakEzTVJJd0VBWURWUVFMRXdsdmNHVnUKYzJocFpuUXhJVEFmQmdOVkJBTVRHR3QxWW1VdFlYQnBjMlZ5ZG1WeUxXeGlMWE5wWjI1bGNqQ0NBU0l3RFFZSgpLb1pJaHZjTkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFPL1E0b3hYTjVvYkJ4WEptR3ZuOWkrRWpJRzJWN0RxCnFrTFFqMHU5STdnUDFMV3dEUE4rcUtFRWwzZnJjTGl2d0EvdTd5MlJ2Z0gxcDByRmFEUmhPYnBDWHU0VVN2aUYKLzVJZjl3dzFvMGlDRlNCczFmQm9GMERTNE1kMWc3cnhCVzVlTDlNVllsMGU1QzB0YkNVc3BWamkyNnR5K0dTWgpKNlRYVS9idEErV044STNnOUhQUHRZcnRLVVpycnBCUFpTWDNjZWxIWDlDczRFaDdFdXdrRFR6T2N4VGRsMUdoCjB4U1V5bE9lOU92eVVNVWM3SHpOS29QMGlRZE9scXNwL0ZvNjFwdHBOM0xUS1FCWUU5VUR2SUNFZ1NscXlzWFMKR3I1UzU0cFVMYzdva21LaXVLc0lkK0ZYWDd5STFGQTY1VVFUZDJOc2sweTFJNTF4VGp3LzN3OENBd0VBQWFOQwpNRUF3RGdZRFZSMFBBUUgvQkFRREFnS2tNQThHQTFVZEV3RUIvd1FGTUFNQkFmOHdIUVlEVlIwT0JCWUVGSVFJCjVpRlpQam8yYnNIeDNpbkJlVWs5RE1LVE1BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQ1dSdVpTaURaYjQ4NzcKNmZhVWpvQTg5eld5Rm15L0VodTBURFpSSTFTSElaQjhmM1FINGlMdEhzMU53VGZDcU5XSTNOeFNxQ0Ntdmx6ZAp0ZGppVjdPUkxRZjc3WVVWdU9ZclF2M0RuK2ozcmZ1dCtoSzMyM3hjTDdYbXBpRnRIUXhac2NQZ2JqaUd1dS9XCmdhYlZBOXF3MWUxQit3b25UWlY0YWM1bjJ5N1QxbzdoeWE3WnY5RWJ5RFp3cnJPN2NBSmJjWS9pUDJnM0pnMG4KS0hJT2FVSnRZZjF0RDNCSWhzWVIzL0tQVStLU0lyQ2R3Y0ptbXYyZ1V5eGJLTzlFb2QyU2R0RDRJZEVKVTNobQpFUjhkcmFFK3Y5MzJHVHZYWWYwRUFOdHExam5ycFRxajNIVUFMajVZTm1UdE5DQTVEdHhWa2xaZ0I4RkJpdS9NCitjNmJ3UjJVCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
tls.key: ....
kind: Secret
metadata:
annotations:
auth.openshift.io/certificate-hostnames: api-int.adahiya-0.tt.testing
auth.openshift.io/certificate-issuer: kube-apiserver-lb-signer
auth.openshift.io/certificate-not-after: 2019-05-17T23:58:33Z
auth.openshift.io/certificate-not-before: 2019-04-17T23:58:32Z
creationTimestamp: 2019-04-17T23:58:37Z
labels:
auth.openshift.io/managed-certificate-type: target
name: external-loadbalancer-serving-certkey
namespace: openshift-kube-apiserver
resourceVersion: "2560"
selfLink: /api/v1/namespaces/openshift-kube-apiserver/secrets/external-loadbalancer-serving-certkey
uid: b783479c-616c-11e9-8bba-52fdfc072182
type: kubernetes.io/tls
$ xclip -sel c -o | base64 -d | openssl x509 -noout -text
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 8316551266602184552 (0x736a53506225ef68)
Signature Algorithm: sha256WithRSAEncryption
Issuer: OU = openshift, CN = kube-apiserver-lb-signer
Validity
Not Before: Apr 17 23:58:32 2019 GMT
Not After : May 17 23:58:33 2019 GMT
Subject: CN = api-int.adahiya-0.tt.testing
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment
X509v3 Extended Key Usage:
TLS Web Server Authentication
X509v3 Basic Constraints: critical
CA:FALSE
X509v3 Authority Key Identifier:
keyid:84:08:E6:21:59:3E:3A:36:6E:C1:F1:DE:29:C1:79:49:3D:0C:C2:93
X509v3 Subject Alternative Name:
DNS:api-int.adahiya-0.tt.testing
```
This commit moves the replace magic [2] from internal to external serving cert controller.
[1]: https://github.com/openshift/api/blob/13b403bfb6ce84ddc053bd3b401b5d67bf175efa/config/v1/types_infrastructure.go#L55-L58
[2]: openshift#405
[3]: openshift/installer#1633
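As an illustration only (the real code lives in the controllers changed by [2]; `deriveExternalHostname` is a hypothetical name): once `.status.apiServerURL` carries the internal name, it is the external controller that has to rewrite the hostname, roughly:
```go
package main

import (
	"fmt"
	"strings"
)

// deriveExternalHostname is a hypothetical sketch of the "replace
// magic": with .status.apiServerURL now carrying the internal
// api-int name, the external serving cert controller must rewrite
// the prefix to obtain the external hostname (previously the
// internal controller rewrote api. to api-int. instead).
func deriveExternalHostname(internalHostname string) string {
	if strings.HasPrefix(internalHostname, "api-int.") {
		return "api." + strings.TrimPrefix(internalHostname, "api-int.")
	}
	return internalHostname
}

func main() {
	// Prints "api.adahiya-0.tt.testing".
	fmt.Println(deriveExternalHostname("api-int.adahiya-0.tt.testing"))
}
```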
Force-pushed from aff9253 to 668f56b
|
Making the changes @abhinavdahiya suggested, and now my etcd-members won't come up, stuck in the … Sounds like the bootstrap apiserver is not configured with the cert for `api-int`. |
|
from … |
|
Gah, this isn't the real apiserver (since there is no etcd yet). It is the etcd-signer-server. |
Force-pushed from 668f56b to 20464f8
Force-pushed from 20464f8 to 5e6c5e9
Force-pushed from 5e6c5e9 to c8c463c
Force-pushed from c8c463c to 052fcee
|
rebased due to collision with #1640 |
|
/retest |
|
I'm seeing this locally now: the self-hosted apiserver doesn't have a cert for … |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: abhinavdahiya, sjenning. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment. |
|
/hold cancel |
|
/hold Looks like we want to merge this after beta4 ships, i.e. after Monday. |
/hold cancel We are ready to merge this. |
|
@sjenning: The following test failed, say `/retest` to rerun them all:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
Currently the etcd-signer-server does not support serving TLS traffic based on SNI. OpenShift's kube-apiserver serves traffic on api.$cluster_domain and api-int.$cluster_domain [1], and during bootstrapping, when the etcd-signer-server is mimicking the kube-apiserver to sign the etcd client certificates, it can only serve traffic on a single domain. External clients trying to connect to `:6443` on api.$cluster_domain therefore see errors like:
```console
time="2019-04-24T13:25:11-07:00" level=debug msg="Still waiting for the Kubernetes API: Get https://api.adahiya-0.tt.testing:6443/version?timeout=32s: x509: certificate is valid for api-int.adahiya-0.tt.testing, not api.adahiya-0.tt.testing"
```
The etcd-signer-server is using the certs for `api-int.$cluster_domain` because the internal clients, i.e. the etcd agents, contact it on that domain, while the external clients, i.e. the installer, hit the etcd-signer on `api.$cluster_domain`. Allowing the etcd-signer-server to accept multiple certs and serve TLS based on SNI allows it to correctly mimic the kube-apiserver's capability.
[1]: openshift/installer#1633
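For illustration, Go's `crypto/tls` can already do this kind of SNI-based certificate selection when given multiple certificates; a minimal sketch of the capability the commit describes (file names and the handler are placeholders, not the actual signer code):
```go
package main

import (
	"crypto/tls"
	"log"
	"net/http"
)

func main() {
	// Placeholder cert/key file names: one key pair per served hostname.
	intCert, err := tls.LoadX509KeyPair("api-int.crt", "api-int.key")
	if err != nil {
		log.Fatal(err)
	}
	extCert, err := tls.LoadX509KeyPair("api.crt", "api.key")
	if err != nil {
		log.Fatal(err)
	}

	// With multiple certificates configured, crypto/tls selects the
	// certificate whose names match the SNI sent by the client, so
	// api.$domain and api-int.$domain clients each get the right cert.
	srv := &http.Server{
		Addr:    ":6443",
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { w.Write([]byte("ok")) }),
		TLSConfig: &tls.Config{
			Certificates: []tls.Certificate{intCert, extCert},
		},
	}
	log.Fatal(srv.ListenAndServeTLS("", ""))
}
```
On older Go releases the server also needed `cfg.BuildNameToCertificate()` to populate the SNI lookup table; newer releases scan `Certificates` automatically.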
…-int Catching up with 13e4b70 (data/aws: create an api-int dns name, 2019-04-11, openshift#1601) and 052fcee (asset/manifests: use internal apiserver name, 2019-04-17, openshift#1633).
Catching up with 13e4b70 (data/aws: create an api-int dns name, 2019-04-11, openshift#1601), now that 052fcee (asset/manifests: use internal apiserver name, 2019-04-17, openshift#1633) has moved some internal assets over to that name.
Add the hosts plugin to coredns so that we can create a static entry for api-int.$CLUSTER_DOMAIN. hosts is used because it doesn't have to be authoritative for the zone and can allow fallthrough of records that are not found. For more details, see https://bugzilla.redhat.com/show_bug.cgi?id=1685704 and openshift/installer#1633
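A hypothetical Corefile sketch of that setup (zone and address are placeholders): the `hosts` block answers for `api-int.$CLUSTER_DOMAIN` without being authoritative, and `fallthrough` lets unmatched names continue to the next plugin:
```
. {
    hosts {
        192.0.2.10 api-int.cluster.example.com
        fallthrough
    }
    forward . /etc/resolv.conf
}
```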
BZ1685704 requires that all internal clients, i.e. openshift cluster-infra clients inside the cluster that talk to the apiserver through the LB, move to using `api-int.$cluster_domain`, so that customers can modify the external LB URL for the apiserver without affecting the internal clients.
`data/data/bootstrap/files/usr/local/bin/bootkube.sh.template`
All the internal clients, which includes the etcd cert agent, moved to contacting the apiserver on `api-int.$cluster_domain`; therefore the etcd signer on the bootstrap node needs to use the serving cert for `api-int`.
`pkg/asset/kubeconfig/`
The admin kubeconfig that the installer provides to its users needs to continue contacting the apiserver on `api.$cluster_domain`.
The kubelet kubeconfig is moved to use `api-int.$cluster_domain`, as kubelets are internal clients to the apiserver (see the sketch at the end of this description).
`pkg/asset/manifests`
This changes the `.status.apiServerURL` for the `cluster` `infrastructures.config.openshift.io` object to point all internal clients at `api-int.$cluster_domain` for contacting the apiserver.
@deads2k @abhinavdahiya @wking
See if this works now that the groundwork is laid.
xref https://bugzilla.redhat.com/show_bug.cgi?id=1685704
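As the sketch referenced above: based on the kubeconfig format visible in `kubeconfig_test.go` earlier in this thread, the kubelet kubeconfig would end up with a cluster entry along these lines (cluster name, domain, and CA data are placeholders, not the actual asset output):
```yaml
clusters:
- cluster:
    certificate-authority-data: <base64-encoded CA bundle>
    server: https://api-int.test-cluster-name.test.example.com:6443
  name: test-cluster-name
```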