-
Notifications
You must be signed in to change notification settings - Fork 219
Conversation
pkg/asset/internal/templates.go
Outdated
- apiGroups: ["certificates.k8s.io"] | ||
resources: ["certificatesigningrequests/nodeclient"] | ||
verbs: ["create"] | ||
- apiGroups: ["certificates.k8s.io"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for later:
There's actually a system:node-bootstrapper
role we can just bind to https://github.com/kubernetes/kubernetes/blob/3a0d8f8fea75140e4a38f5165a75e93bcd28c60c/plugin/pkg/auth/authorizer/rbac/bootstrappolicy/testdata/cluster-roles.yaml#L724-L742
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/tlsutil/tlsutil.go
Outdated
@@ -127,3 +128,21 @@ func NewSignedCertificate(cfg CertConfig, key *rsa.PrivateKey, caCert *x509.Cert | |||
} | |||
return x509.ParseCertificate(certDERBytes) | |||
} | |||
|
|||
// NewRandomToken generates an n-byte random hexadecimal token. The output will have length n*2. | |||
func NewRandomToken(n int) ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: there's actually a defined format for another authenticator. We might just want to copy that for now: https://kubernetes.io/docs/admin/bootstrap-tokens/#token-format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/asset/tls.go
Outdated
// TLS organizations map to Kubernetes groups, and "system:masters" | ||
// is a well-known Kubernetes group that gives a user admin power. | ||
// | ||
// For now, put the kubelets in this group. Later we can restrict | ||
// their credentials, likely with the help of TLS bootstrapping. | ||
// For now, put the admins in this group. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this comment. We're not going to change this group ever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Seeing some issues with only using the RotateKubeletClientCertificate to perform the first CSR (and not the bootstrap flag). Reported issue is that it requires a kubelet restart after doing the CSR to start authenticating as a node. I think the kubelet is opening a connection to the API server with the bootstrapping credentials to perform its CSR, then keeping that connection alive even though it's got a client cert. The kubelet never closes its existing connection to the API server, so it never re-performs a TLS handshake to authenticate using its client cert. Only workaround I see is to have systemd restart the kubelet once it detects the kubelet's written its first certificate to disk. Blah. |
@ericchiang Yes, I can see how that would happen. The underlying connection is only re-established if it is disrupted for some reason. The rotated cert is set immediately, and will be used when the connection is re-established. |
That's concerning... the client and transport built from the bootstrapping credentials shouldn't be reused beyond the bootstrap phase. Once it gets the first client cert, the input to building the API client should be different, so it should get a different tls config, with a new set of keepalive connections. |
oh, by bootstrapping, you don't actually mean |
@jcbsmpsn because the API server authenticates x509 certs at the time of the request, and not on the handshake[1] I think we'll hit the same issue when the kubelet cert expires. E.g. once the kubelet's cert expires, it'll keep the connection open, but the requests will start getting forbidden errors. I think the kubelet's rotation code should shut down its connections once it successfully rotates a cert. That would let it immediately start using the new certs and prevent these kind of issues with long lived connections and old credentials. |
@ericchiang That would be good. I do not think that will be easy. |
3e2f0d2
to
cd0da11
Compare
kubernetes/kubernetes#49899 addresses closing existing connections when the certificate changes. |
@jcbsmpsn yes, I added a temporary commit that switches the kubelet to use a hyperkube with that PR to test. |
Automatic merge from submit-queue (batch tested with PRs 49237, 49656, 49980, 49841, 49899) certificate manager: close existing client conns once cert rotates After the kubelet rotates its client cert, it will keep connections to the API server open indefinitely, causing it to use its old credentials instead of the new certs. Because the API server authenticates client certs at the time of the request, and not the handshake, this could cause the kubelet to start hitting auth failures even if it rotated its certificate to a new, valid one. When the kubelet rotates its cert, close down existing connections to force a new TLS handshake. Ref kubernetes/enhancements#266 Updates kubernetes-retired/bootkube#663 ```release-note After a kubelet rotates its client cert, it now closes its connections to the API server to force a handshake using the new cert. Previously, the kubelet could keep its existing connection open, even if the cert used for that connection was expired and rejected by the API server. ``` /cc @kubernetes/sig-auth-bugs /assign @jcbsmpsn @mikedanese
@diegs this PR has been updated with 1.8 features. It also now enables the node authorizer and admission controller. https://kubernetes.io/docs/admin/authorization/node/ Everything works now except the checkpointer. I think we want to run the checkpointer with a service account and ideally its own RBAC role. Since it's running as a static pod, and wont have access to cc @dghubble @colemickens --- in case either of you want to take a look. edit: am trying to debug the test failures. hack/multi-node does work. |
e0c4ff9
to
e5bcfe8
Compare
pkg/asset/internal/templates.go
Outdated
@@ -56,7 +112,7 @@ spec: | |||
- name: kubelet | |||
image: {{ .Images.Hyperkube }} | |||
command: | |||
- ./hyperkube | |||
- /hyperkube |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woops, need to remove this. think its a rebase artifact.
…ce-backoff Automatic merge from submit-queue (batch tested with PRs 54773, 52523, 47497, 55356, 49429). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. certificate manager: reduce max backoff from 128s to 32s For TLS bootstrapping in bootkube we run a kubelet with a control plane run through static pods. That static control plane has an API server and controller manager that approve the kubelet's CSR. Since the kubelet has to wait for the static control plane to come up to be approved, we hit this backoff every time and it actually adds a notable overhead to startup times. kubernetes-retired/bootkube#663 If this choice is somewhat arbitrary, I'd like to see it lowered for 1.9. /assign @jcbsmpsn @mikedanese ```release-note NONE ```
Automatic merge from submit-queue (batch tested with PRs 54773, 52523, 47497, 55356, 49429). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. certificate manager: reduce max backoff from 128s to 32s For TLS bootstrapping in bootkube we run a kubelet with a control plane run through static pods. That static control plane has an API server and controller manager that approve the kubelet's CSR. Since the kubelet has to wait for the static control plane to come up to be approved, we hit this backoff every time and it actually adds a notable overhead to startup times. kubernetes-retired/bootkube#663 If this choice is somewhat arbitrary, I'd like to see it lowered for 1.9. /assign @jcbsmpsn @mikedanese ```release-note NONE ``` Kubernetes-commit: 0ff21718d127b9fc14bdfc068624e82fb84e99c2
hack/multi-node/user-data.sample
Outdated
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d | ||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets | ||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests | ||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/kubelet/tls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default directory for kubeadm is /etc/kubernetes/pki. Does bootkube care to be the same? Just might help people who use both systems but not a real concern for functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rothgar I think that's a good suggestion, but tls
is used in a few places in bootkube. Mind opening an issue to suggest aligning with kubeadm? We can address in a separate PR.
Automatic merge from submit-queue (batch tested with PRs 54773, 52523, 47497, 55356, 49429). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. certificate manager: reduce max backoff from 128s to 32s For TLS bootstrapping in bootkube we run a kubelet with a control plane run through static pods. That static control plane has an API server and controller manager that approve the kubelet's CSR. Since the kubelet has to wait for the static control plane to come up to be approved, we hit this backoff every time and it actually adds a notable overhead to startup times. kubernetes-retired/bootkube#663 If this choice is somewhat arbitrary, I'd like to see it lowered for 1.9. /assign @jcbsmpsn @mikedanese ```release-note NONE ``` Kubernetes-commit: 0ff21718d127b9fc14bdfc068624e82fb84e99c2
Automatic merge from submit-queue (batch tested with PRs 54773, 52523, 47497, 55356, 49429). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. certificate manager: reduce max backoff from 128s to 32s For TLS bootstrapping in bootkube we run a kubelet with a control plane run through static pods. That static control plane has an API server and controller manager that approve the kubelet's CSR. Since the kubelet has to wait for the static control plane to come up to be approved, we hit this backoff every time and it actually adds a notable overhead to startup times. kubernetes-retired/bootkube#663 If this choice is somewhat arbitrary, I'd like to see it lowered for 1.9. /assign @jcbsmpsn @mikedanese ```release-note NONE ``` Kubernetes-commit: 0ff21718d127b9fc14bdfc068624e82fb84e99c2
Checkpointer is running in a dedicated service account now, is there anything left [I can help with], which stops this PR from being merged? |
This is actually already done, even without @diegs changes. The changes were manifest only and didn't require any changes to the checkpointer itself. I think we still need to add an approve all rule for rotation[1 (different from bootstrap approval) as well as docs. Will update this when I get the chance. [1] https://kubernetes.io/docs/admin/kubelet-tls-bootstrapping/#kube-controller-manager-configuration |
@@ -38,7 +42,8 @@ coreos: | |||
--lock-file=/var/run/lock/kubelet.lock \ | |||
--network-plugin=cni \ | |||
--node-labels=node-role.kubernetes.io/master \ | |||
--pod-manifest-path=/etc/kubernetes/manifests | |||
--pod-manifest-path=/etc/kubernetes/manifests \ | |||
--rotate-certificates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't it be --bootstrap-kubeconfig
and --require-kubeconfig
as per https://kubernetes.io/docs/admin/kubelet-tls-bootstrapping/#kubelet-configuration ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually intentional. We want to start static pods even if the rotation hasn't completed. --bootstrap-kubeconfig
will block, but --rotate-certificates
wont, even though they both have the same effect.
Still waiting for e2e. If this passes then I think the code bits are done. We still need to do some rotation testing (e.g. if we run a long lived cluster the rotation works) and add some docs. |
@ericchiang any updates on this? |
No I haven't had time to do the short rotation time period testing. |
Found a bug in the rotation code. It's fixed now. Tested rotation with the following diff. We probably want to do something similar during the e2e tests. Maybe as a follow up? --- a/pkg/asset/internal/templates.go
+++ b/pkg/asset/internal/templates.go
@@ -503,6 +503,7 @@ spec:
- --leader-elect=true
- --root-ca-file=/etc/kubernetes/secrets/ca.crt
- --service-account-private-key-file=/etc/kubernetes/secrets/service-account.key
+ - --experimental-cluster-signing-duration=15m
livenessProbe:
httpGet:
path: /healthz
@@ -581,6 +581,7 @@ spec:
- --leader-elect=true
- --root-ca-file=/etc/kubernetes/secrets/ca.crt
- --service-account-private-key-file=/etc/kubernetes/secrets/service-account.key
+ - --experimental-cluster-signing-duration=15m
volumeMounts:
- name: secrets
mountPath: /etc/kubernetes/secrets |
@ericchiang sure, that sounds good, except does that require waiting 15m for the certs to rotate? |
@diegs yes. Maybe during conformance instead? |
@ericchiang SG. also code LGTM but I can't LGTM "my own" PR. |
coreosbot run e2e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding approval on behalf of @diegs
Needs another approval from a member, cc @rphillips @aaronlevy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two really small nits that are fine to leave in.
pkg/asset/k8s.go
Outdated
} | ||
|
||
for i, b := range token { | ||
token[i] = validBootstrapTokenChars[int(uint8(b))%len(validBootstrapTokenChars)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: don't think the uint8() necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
pkg/asset/k8s.go
Outdated
// newBootstrapToken constructs a bootstrap token in conformance with the following format: | ||
// https://kubernetes.io/docs/admin/bootstrap-tokens/#token-format | ||
func newBootstrapToken() (id string, secret string, err error) { | ||
token := make([]byte, 6+16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: not immediately clear what 6+16 is (id+secret).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment.
Automatic merge from submit-queue (batch tested with PRs 56290, 57984). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. kubeadm: more random tokens The strategy of hex encoding a random byte array only uses the following characters: 0123456789abcdef Instead of the entire bootstrapping token character set: 0123456789abcdefghijklmnopqrstuvwxyz Update the token generation to use the entire character set. This increases the token secret from 48 bits of entropy to ~82 bits. 256^8 (1.8e+19) vs. 36^16 (7.9e+24). Noticed this writing kubernetes-retired/bootkube#663 ```release-note NONE ``` /cc @mattmoyer @luxas
No description provided.