-
Notifications
You must be signed in to change notification settings - Fork 39.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes using externally managed certs for kubeadm #68296
Fixes using externally managed certs for kubeadm #68296
Conversation
/sig cluster-lifecycle |
/cc @timothysc |
a release note is possible needed. wasn't the cert overhaul after this issue chronologically:
/kind bug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some nits.
thanks for the bug fix @liztio
|
||
writePKIFiles(t, tmpdir, test.files) | ||
|
||
err := CreatePKIAssets(cfg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err := CreatePKIAssets(cfg)
if err != nil {
if test.expectError {
return
}
t.Fatalf("Unexpected error: %v", err)
}
if test.expectError {
t.Fatal("Expected error from CreatePKIAssets, got none")
}
can this be refactored to the following:
if err := CreatePKIAssets(cfg); (err != nil) != test.expectError {
t.Fatalf("expected error: %v, got: %v, error: %v", test.expectError, (err != nil), err)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This turns out to not quite be the case: if there's an error and we expect it, we don't want to continue to run assertCertsExist
|
||
for caCert, certs := range tree { | ||
if err := validateCACert(certKeyLocation{dir, caCert.BaseName, "", caCert.Name}); err != nil { | ||
t.Errorf("Couldn't validate CA certificate %v: %v", caCert.Name, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't
-> couldn't
uxName: leaf.Name, | ||
} | ||
if err := validateSignedCertWithCA(cl, caCert); err != nil { | ||
return fmt.Errorf("couldn't load expected cert %q and key %q does not exist: %v", leaf.Name, ca.Name, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possibly mod the message to:
could not load expected certificate %q or validate the existence of key %q for it: %v
🤔 ?
if err == nil { | ||
// Cert exists already, make sure it's valid | ||
if !caCert.IsCA { | ||
return fmt.Errorf("certificate %q should be a CA but is not", ca.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
certificate %q should be a CA but is not
->
certificate %q is not a CA
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with the statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
@liztio needs a release note if it was originally reported in 1.11 and fixed in 1.12
@neolit123 huh, maybe it was! I honestly just assumed that I broke it! |
5a00dbd
to
e0d5c09
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
Walk the certificate tree, at each step checking for a CACert. If the CACert is found, try to use it to generate certificates. Otherwise, generate a new CA cert.
e0d5c09
to
cda8c39
Compare
} | ||
|
||
// validateSignedCertWithCA tries to load a certificate and validate it with the given caCert | ||
func validateSignedCertWithCA(l certKeyLocation, caCert *x509.Certificate) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not urgent, but might be good to get test cases for this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a slight generalisation of the existing validateSignedCert so I didn't bother.
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: liztio, timothysc The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Automatic merge from submit-queue (batch tested with PRs 68087, 68256, 64621, 68299, 68296). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. |
PR kubernetes#68296 got ahead of this. This reverts: - commit 17b4c19 - commit 2fd01a0 ----------[corresponding planned issue]--------- TITLE: kubeadm HA setup: silently broken certificate chains issue, leading to failed etcd clustering <!-- This form is for bug reports and feature requests ONLY! If you're looking for help check [Stack Overflow](https://stackoverflow.com/questions/tagged/kubernetes) and the [troubleshooting guide](https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/). If the matter is security related, please disclose it privately via https://kubernetes.io/security/. --> **Is this a BUG REPORT or FEATURE REQUEST?**: /kind bug **What happened**: Following the [HA setup documentation](https://kubernetes.io/docs/setup/independent/high-availability/) and having done the part `5. Run the kubeadm phase commands to bootstrap the kubelet`, etcd starts complaining about the certificate issue, with the cluster health never be reaching ok. - CP0 (CP0_IP: 192.168.122.27) ``` ----(snip)---- 2018-09-08 01:50:36.892124 I | etcdserver/membership: added member 10309eb7b886aec3 [https://192.168.122.30:2380] to cluster 50244b6b5ec9586c 2018-09-08 01:50:36.892205 I | rafthttp: starting peer 10309eb7b886aec3... 2018-09-08 01:50:36.892446 I | rafthttp: started HTTP pipelining with peer 10309eb7b886aec3 2018-09-08 01:50:36.894198 I | rafthttp: started peer 10309eb7b886aec3 2018-09-08 01:50:36.894236 I | rafthttp: added peer 10309eb7b886aec3 2018-09-08 01:50:36.894429 I | rafthttp: started streaming with peer 10309eb7b886aec3 (writer) 2018-09-08 01:50:36.894443 I | rafthttp: started streaming with peer 10309eb7b886aec3 (writer) 2018-09-08 01:50:36.894474 I | rafthttp: started streaming with peer 10309eb7b886aec3 (stream MsgApp v2 reader) 2018-09-08 01:50:36.894663 I | rafthttp: started streaming with peer 10309eb7b886aec3 (stream Message reader) 2018-09-08 01:50:37.845351 W | etcdserver: failed to reach the peerURL(https://192.168.122.30:2380) of member 10309eb7b886aec3 (Get https://192.168.122.30:2380/version: dial tcp 192.168.122.30:2380: getsockopt: connection refused) 2018-09-08 01:50:37.845466 W | etcdserver: cannot get the version of member 10309eb7b886aec3 (Get https://192.168.122.30:2380/version: dial tcp 192.168.122.30:2380: getsockopt: connection refused) 2018-09-08 01:50:37.900635 W | raft: d9e6643beae8a97f stepped down to follower since quorum is not active 2018-09-08 01:50:37.900702 I | raft: d9e6643beae8a97f became follower at term 2 2018-09-08 01:50:37.900741 I | raft: raft.node: d9e6643beae8a97f lost leader d9e6643beae8a97f at term 2 2018-09-08 01:50:38.901030 I | raft: d9e6643beae8a97f is starting a new election at term 2 2018-09-08 01:50:38.901193 I | raft: d9e6643beae8a97f became candidate at term 3 2018-09-08 01:50:38.901261 I | raft: d9e6643beae8a97f received MsgVoteResp from d9e6643beae8a97f at term 3 2018-09-08 01:50:38.901319 I | raft: d9e6643beae8a97f [logterm: 2, index: 553] sent MsgVote request to 10309eb7b886aec3 at term 3 /* * etcd on CP1 boots up, but its client cert is invalid.. */ > 2018-09-08 01:50:39.673266 I | etcdmain: rejected connection from "192.168.122.30:41466" (error "tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd-ca\")", ServerName "") 2018-09-08 01:50:40.500956 I | raft: d9e6643beae8a97f is starting a new election at term 3 2018-09-08 01:50:40.501314 I | raft: d9e6643beae8a97f became candidate at term 4 2018-09-08 01:50:40.501751 I | raft: d9e6643beae8a97f received MsgVoteResp from d9e6643beae8a97f at term 4 2018-09-08 01:50:40.505545 I | raft: d9e6643beae8a97f [logterm: 2, index: 553] sent MsgVote request to 10309eb7b886aec3 at term 4 > 2018-09-08 01:50:40.938004 I | etcdmain: rejected connection from "192.168.122.30:41468" (error "tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd-ca\")", ServerName "") 2018-09-08 01:50:41.801044 I | raft: d9e6643beae8a97f is starting a new election at term 4 2018-09-08 01:50:41.801112 I | raft: d9e6643beae8a97f became candidate at term 5 2018-09-08 01:50:41.801137 I | raft: d9e6643beae8a97f received MsgVoteResp from d9e6643beae8a97f at term 5 2018-09-08 01:50:41.801221 I | raft: d9e6643beae8a97f [logterm: 2, index: 553] sent MsgVote request to 10309eb7b886aec3 at term 5 ----(snip)---- ``` - CP1 (CP1_IP: 192.168.122.30) ``` ----(snip)---- 2018-09-08 14:53:32.763499 W | etcdserver: could not get cluster response from https://192.168.122.27:2380: Get https://192.168.122.27:2380/members: remote error: tls: bad certificate 2018-09-08 14:53:32.768630 I | etcdmain: rejected connection from "192.168.122.27:33996" (error remote error: tls: bad certificate", ServerName "") 2018-09-08 14:53:32.771694 C | etcdmain: cannot fetch cluster info from peer urls: could not retrieve cluster information from the given urls /* * and k8s_etcd_etcd dies.. */ ----(snip)---- ``` This is because the scp-ed CA root certificate for etcd clustering is not going to be used when generating the daughter certificate so etcd in the first stacked control plane cannot verify it. This problem is not confined to etcd, but all the other PKI assets which are expected to be signed by corresponding scp-ed CAs as well. `kubeadm init` and `kubeadm join` paths seem not to be affected. In my opinion, there are two options we can take: - ensure the capability to utilize the pre-installed CA with kubeadm certs command, to be consistent with the current documentation. - log some warning or error message saying that the certificate chains has been broken with the command. and I think the former is reasonable, so I will post PR for that later. **What you expected to happen**: etcd clustering successfully starts with properly setup certificates chains. **How to reproduce it (as minimally and precisely as possible)**: As mentioned in **What happened:** section above. **Anything else we need to know?**: Nothing in particular. **Environment**: - Kubernetes version (use `kubectl version`): v1.13.0-alpha.0.1072+2c933695fa61d5 - Cloud provider or hardware configuration: hardware configuration (qemu-kvm) - OS (e.g. from /etc/os-release): fedora 28 - Kernel (e.g. `uname -a`): 4.19.0-rc2 - Install tools: kubeadm - Others:
What this PR does / why we need it:
The certificates overhaul caused a regression when using external certificates. This fixes that issue so external CAs no longer require a key if all certificates exist.
Walk the certificate tree, at each step checking for a CACert.
If the CACert is found, try to use it to generate certificates.
Otherwise, generate a new CA cert.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes kubernetes/kubeadm#918
Special notes for your reviewer:
Release note: