Skip to content
This repository has been archived by the owner on Jul 30, 2021. It is now read-only.

Add TLS auto-approval for kubelet. #663

Merged
merged 2 commits into from
Jan 9, 2018
Merged

Conversation

diegs
Copy link
Contributor

@diegs diegs commented Jul 27, 2017

No description provided.

@diegs diegs self-assigned this Jul 27, 2017
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 27, 2017
- apiGroups: ["certificates.k8s.io"]
resources: ["certificatesigningrequests/nodeclient"]
verbs: ["create"]
- apiGroups: ["certificates.k8s.io"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -127,3 +128,21 @@ func NewSignedCertificate(cfg CertConfig, key *rsa.PrivateKey, caCert *x509.Cert
}
return x509.ParseCertificate(certDERBytes)
}

// NewRandomToken generates an n-byte random hexadecimal token. The output will have length n*2.
func NewRandomToken(n int) ([]byte, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: there's actually a defined format for another authenticator. We might just want to copy that for now: https://kubernetes.io/docs/admin/bootstrap-tokens/#token-format

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

pkg/asset/tls.go Outdated
// TLS organizations map to Kubernetes groups, and "system:masters"
// is a well-known Kubernetes group that gives a user admin power.
//
// For now, put the kubelets in this group. Later we can restrict
// their credentials, likely with the help of TLS bootstrapping.
// For now, put the admins in this group.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this comment. We're not going to change this group ever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ericchiang
Copy link
Contributor

Seeing some issues with only using the RotateKubeletClientCertificate to perform the first CSR (and not the bootstrap flag). Reported issue is that it requires a kubelet restart after doing the CSR to start authenticating as a node.

I think the kubelet is opening a connection to the API server with the bootstrapping credentials to perform its CSR, then keeping that connection alive even though it's got a client cert. The kubelet never closes its existing connection to the API server, so it never re-performs a TLS handshake to authenticate using its client cert.

Only workaround I see is to have systemd restart the kubelet once it detects the kubelet's written its first certificate to disk. Blah.

@jcbsmpsn
Copy link

@ericchiang Yes, I can see how that would happen. The underlying connection is only re-established if it is disrupted for some reason. The rotated cert is set immediately, and will be used when the connection is re-established.

@liggitt
Copy link

liggitt commented Jul 28, 2017

I think the kubelet is opening a connection to the API server with the bootstrapping credentials to perform its CSR, then keeping that connection alive even though it's got a client cert.

That's concerning... the client and transport built from the bootstrapping credentials shouldn't be reused beyond the bootstrap phase. Once it gets the first client cert, the input to building the API client should be different, so it should get a different tls config, with a new set of keepalive connections.

@liggitt
Copy link

liggitt commented Jul 28, 2017

oh, by bootstrapping, you don't actually mean --bootstrap... you mean the initial content of --kubeconfig

@ericchiang
Copy link
Contributor

@jcbsmpsn because the API server authenticates x509 certs at the time of the request, and not on the handshake[1] I think we'll hit the same issue when the kubelet cert expires. E.g. once the kubelet's cert expires, it'll keep the connection open, but the requests will start getting forbidden errors.

I think the kubelet's rotation code should shut down its connections once it successfully rotates a cert. That would let it immediately start using the new certs and prevent these kind of issues with long lived connections and old credentials.

[1] https://github.com/kubernetes/kubernetes/blob/v1.7.2/staging/src/k8s.io/apiserver/pkg/authentication/request/x509/x509.go

@jcbsmpsn
Copy link

@ericchiang That would be good. I do not think that will be easy.

@jcbsmpsn
Copy link

jcbsmpsn commented Aug 2, 2017

kubernetes/kubernetes#49899 addresses closing existing connections when the certificate changes.

@diegs
Copy link
Contributor Author

diegs commented Aug 2, 2017

@jcbsmpsn yes, I added a temporary commit that switches the kubelet to use a hyperkube with that PR to test.

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this pull request Aug 3, 2017
Automatic merge from submit-queue (batch tested with PRs 49237, 49656, 49980, 49841, 49899)

certificate manager: close existing client conns once cert rotates

After the kubelet rotates its client cert, it will keep connections to the API server open indefinitely, causing it to use its old credentials instead of the new certs. Because the API server authenticates client certs at the time of the request, and not the handshake, this could cause the kubelet to start hitting auth failures even if it rotated its certificate to a new, valid one.
    
When the kubelet rotates its cert, close down existing connections to force a new TLS handshake.

Ref kubernetes/enhancements#266
Updates kubernetes-retired/bootkube#663

```release-note
After a kubelet rotates its client cert, it now closes its connections to the API server to force a handshake using the new cert. Previously, the kubelet could keep its existing connection open, even if the cert used for that connection was expired and rejected by the API server.
```

/cc @kubernetes/sig-auth-bugs 
/assign @jcbsmpsn @mikedanese
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 7, 2017
@ericchiang
Copy link
Contributor

ericchiang commented Nov 7, 2017

@diegs this PR has been updated with 1.8 features. It also now enables the node authorizer and admission controller.

https://kubernetes.io/docs/admin/authorization/node/
https://kubernetes.io/docs/admin/admission-controllers/#noderestriction

Everything works now except the checkpointer. I think we want to run the checkpointer with a service account and ideally its own RBAC role. Since it's running as a static pod, and wont have access to KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT, we probably want to do something similar the kube-proxy-kubeconfig in this PR.

cc @dghubble @colemickens --- in case either of you want to take a look.

edit: am trying to debug the test failures. hack/multi-node does work.

@@ -56,7 +112,7 @@ spec:
- name: kubelet
image: {{ .Images.Hyperkube }}
command:
- ./hyperkube
- /hyperkube
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woops, need to remove this. think its a rebase artifact.

hh pushed a commit to ii/kubernetes that referenced this pull request Nov 9, 2017
…ce-backoff

Automatic merge from submit-queue (batch tested with PRs 54773, 52523, 47497, 55356, 49429). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

certificate manager: reduce max backoff from 128s to 32s

For TLS bootstrapping in bootkube we run a kubelet with a control plane run through static pods. That static control plane has an API server and controller manager that approve the kubelet's CSR.

Since the kubelet has to wait for the static control plane to come up to be approved, we hit this backoff every time and it actually adds a notable overhead to startup times.

kubernetes-retired/bootkube#663

If this choice is somewhat arbitrary, I'd like to see it lowered for 1.9.

/assign @jcbsmpsn @mikedanese 

```release-note
NONE
```
sttts pushed a commit to sttts/client-go that referenced this pull request Nov 9, 2017
Automatic merge from submit-queue (batch tested with PRs 54773, 52523, 47497, 55356, 49429). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

certificate manager: reduce max backoff from 128s to 32s

For TLS bootstrapping in bootkube we run a kubelet with a control plane run through static pods. That static control plane has an API server and controller manager that approve the kubelet's CSR.

Since the kubelet has to wait for the static control plane to come up to be approved, we hit this backoff every time and it actually adds a notable overhead to startup times.

kubernetes-retired/bootkube#663

If this choice is somewhat arbitrary, I'd like to see it lowered for 1.9.

/assign @jcbsmpsn @mikedanese

```release-note
NONE
```

Kubernetes-commit: 0ff21718d127b9fc14bdfc068624e82fb84e99c2
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
ExecStartPre=/bin/mkdir -p /etc/kubernetes/kubelet/tls
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default directory for kubeadm is /etc/kubernetes/pki. Does bootkube care to be the same? Just might help people who use both systems but not a real concern for functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rothgar I think that's a good suggestion, but tls is used in a few places in bootkube. Mind opening an issue to suggest aligning with kubeadm? We can address in a separate PR.

k8s-publishing-bot pushed a commit to k8s-publishing-bot/client-go that referenced this pull request Nov 29, 2017
Automatic merge from submit-queue (batch tested with PRs 54773, 52523, 47497, 55356, 49429). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

certificate manager: reduce max backoff from 128s to 32s

For TLS bootstrapping in bootkube we run a kubelet with a control plane run through static pods. That static control plane has an API server and controller manager that approve the kubelet's CSR.

Since the kubelet has to wait for the static control plane to come up to be approved, we hit this backoff every time and it actually adds a notable overhead to startup times.

kubernetes-retired/bootkube#663

If this choice is somewhat arbitrary, I'd like to see it lowered for 1.9.

/assign @jcbsmpsn @mikedanese

```release-note
NONE
```

Kubernetes-commit: 0ff21718d127b9fc14bdfc068624e82fb84e99c2
k8s-publishing-bot pushed a commit to k8s-publishing-bot/client-go that referenced this pull request Dec 7, 2017
Automatic merge from submit-queue (batch tested with PRs 54773, 52523, 47497, 55356, 49429). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

certificate manager: reduce max backoff from 128s to 32s

For TLS bootstrapping in bootkube we run a kubelet with a control plane run through static pods. That static control plane has an API server and controller manager that approve the kubelet's CSR.

Since the kubelet has to wait for the static control plane to come up to be approved, we hit this backoff every time and it actually adds a notable overhead to startup times.

kubernetes-retired/bootkube#663

If this choice is somewhat arbitrary, I'd like to see it lowered for 1.9.

/assign @jcbsmpsn @mikedanese

```release-note
NONE
```

Kubernetes-commit: 0ff21718d127b9fc14bdfc068624e82fb84e99c2
@redbaron
Copy link
Contributor

redbaron commented Dec 11, 2017

Checkpointer is running in a dedicated service account now, is there anything left [I can help with], which stops this PR from being merged?

@diegs
Copy link
Contributor Author

diegs commented Dec 11, 2017

@redbaron I am wrapping up #784 which makes the tests better and changes the checkpointer to only look at its own namespace, which relaxes the required roll bindings. Hoping to finish that up in the next day or two.

@ericchiang
Copy link
Contributor

Checkpointer is running in a dedicated service account now, is there anything left [I can help with], which stops this PR from being merged?

This is actually already done, even without @diegs changes. The changes were manifest only and didn't require any changes to the checkpointer itself.

I think we still need to add an approve all rule for rotation[1 (different from bootstrap approval) as well as docs. Will update this when I get the chance.

[1] https://kubernetes.io/docs/admin/kubelet-tls-bootstrapping/#kube-controller-manager-configuration

@@ -38,7 +42,8 @@ coreos:
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node-role.kubernetes.io/master \
--pod-manifest-path=/etc/kubernetes/manifests
--pod-manifest-path=/etc/kubernetes/manifests \
--rotate-certificates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't it be --bootstrap-kubeconfig and --require-kubeconfig as per https://kubernetes.io/docs/admin/kubelet-tls-bootstrapping/#kubelet-configuration ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually intentional. We want to start static pods even if the rotation hasn't completed. --bootstrap-kubeconfig will block, but --rotate-certificates wont, even though they both have the same effect.

@ericchiang
Copy link
Contributor

Still waiting for e2e. If this passes then I think the code bits are done.

We still need to do some rotation testing (e.g. if we run a long lived cluster the rotation works) and add some docs.

@diegs diegs changed the title WIP: Add TLS auto-approval for kubelet. Add TLS auto-approval for kubelet. Dec 18, 2017
@aaronlevy
Copy link
Contributor

@ericchiang any updates on this?

@ericchiang
Copy link
Contributor

No I haven't had time to do the short rotation time period testing.

@ericchiang
Copy link
Contributor

ericchiang commented Jan 8, 2018

Found a bug in the rotation code. It's fixed now.

Tested rotation with the following diff. We probably want to do something similar during the e2e tests. Maybe as a follow up?

--- a/pkg/asset/internal/templates.go               
+++ b/pkg/asset/internal/templates.go               
@@ -503,6 +503,7 @@ spec:                           
         - --leader-elect=true                      
         - --root-ca-file=/etc/kubernetes/secrets/ca.crt                                                 
         - --service-account-private-key-file=/etc/kubernetes/secrets/service-account.key                
+        - --experimental-cluster-signing-duration=15m                                                   
         livenessProbe:                             
           httpGet:                                 
             path: /healthz                                       
@@ -581,6 +581,7 @@ spec:                           
     - --leader-elect=true                          
     - --root-ca-file=/etc/kubernetes/secrets/ca.crt                                                     
     - --service-account-private-key-file=/etc/kubernetes/secrets/service-account.key                    
+    - --experimental-cluster-signing-duration=15m  
     volumeMounts:                                  
     - name: secrets                                
       mountPath: /etc/kubernetes/secrets

@diegs
Copy link
Contributor Author

diegs commented Jan 8, 2018

@ericchiang sure, that sounds good, except does that require waiting 15m for the certs to rotate?

@ericchiang
Copy link
Contributor

@diegs yes. Maybe during conformance instead?

@diegs
Copy link
Contributor Author

diegs commented Jan 8, 2018

@ericchiang SG. also code LGTM but I can't LGTM "my own" PR.

@ericchiang
Copy link
Contributor

coreosbot run e2e

Copy link
Contributor

@ericchiang ericchiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding approval on behalf of @diegs

@diegs
Copy link
Contributor Author

diegs commented Jan 8, 2018

Needs another approval from a member, cc @rphillips @aaronlevy

Copy link
Contributor

@aaronlevy aaronlevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two really small nits that are fine to leave in.

pkg/asset/k8s.go Outdated
}

for i, b := range token {
token[i] = validBootstrapTokenChars[int(uint8(b))%len(validBootstrapTokenChars)]
Copy link
Contributor

@aaronlevy aaronlevy Jan 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: don't think the uint8() necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

pkg/asset/k8s.go Outdated
// newBootstrapToken constructs a bootstrap token in conformance with the following format:
// https://kubernetes.io/docs/admin/bootstrap-tokens/#token-format
func newBootstrapToken() (id string, secret string, err error) {
token := make([]byte, 6+16)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not immediately clear what 6+16 is (id+secret).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment.

@diegs diegs merged commit 38a5af6 into kubernetes-retired:master Jan 9, 2018
@diegs diegs deleted the tls branch January 9, 2018 03:37
k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this pull request Jan 9, 2018
Automatic merge from submit-queue (batch tested with PRs 56290, 57984). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

kubeadm: more random tokens

The strategy of hex encoding a random byte array only uses the
following characters:

	0123456789abcdef

Instead of the entire bootstrapping token character set:

	0123456789abcdefghijklmnopqrstuvwxyz

Update the token generation to use the entire character set. This
increases the token secret from 48 bits of entropy to ~82 bits.

256^8 (1.8e+19) vs. 36^16 (7.9e+24).

Noticed this writing kubernetes-retired/bootkube#663

```release-note
NONE
```

/cc @mattmoyer @luxas
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants