Disable Kubelet read-only port 10255 #1025
Can one of the admins verify this patch?
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: If they are not already assigned, you can assign the PR to them by writing The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I expect I'll need to split the checkpointer change and user-data change for tests to pass. But manually testing on clusters with
Thanks Dalton for looking into this. We just had a conversation about this exact issue. I am worried about the certificate rotation needed for the checkpointer -> kubelet secure connection. There is an edge case: if a node is down for a period of time and then comes back, the cert might have expired. Not to mention needing to maintain the cert rotation on that connection. We are definitely on board to try and remove the insecure port requirement.
Ah right. bootkube supports using kubelet TLS bootstrap, so the concern is that if there's no apiserver (e.g. the cluster is powered off) when the certificate expires, then on startup the kubelet starts pod-checkpointer, which may not make it far enough* to start the bootstrap-apiserver that issues the new cert so the kubelet can register. So given the choice, we prefer to at least fall back to 10255. I can post a standalone PR for the checkpointer to try 10250, then fall back to 10255.

*I'm not sure how that can happen, since on clusters that disable read-only today, pod-checkpointer can't contact anything, yet control plane recovery succeeds.
Added the ClusterRole and ClusterRoleBinding to give checkpointer permission to perform requests previously done via the kubelet read-only API. Requires #1027 and a checkpointer release before this can pass tests.
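For context on why RBAC objects are needed once the checkpointer stops using the read-only port: the kubelet secure API authorizes each request as a verb on a node subresource, as described in the Kubernetes kubelet authorization documentation. The sketch below reproduces that documented mapping in a few lines. It is not taken from this PR's manifests, and `kubeletAuthzAttributes` and `isSubpath` are hypothetical helper names used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// isSubpath reports whether path equals prefix or sits beneath it.
func isSubpath(path, prefix string) bool {
	return path == prefix || strings.HasPrefix(path, prefix+"/")
}

// kubeletAuthzAttributes is a hypothetical helper that mirrors the documented
// kubelet mapping from an HTTP request to the verb and node subresource that
// the caller must be authorized for.
func kubeletAuthzAttributes(method, path string) (verb, resource string) {
	switch strings.ToUpper(method) {
	case "POST":
		verb = "create"
	case "GET", "HEAD":
		verb = "get"
	case "PUT":
		verb = "update"
	case "PATCH":
		verb = "patch"
	case "DELETE":
		verb = "delete"
	}

	switch {
	case isSubpath(path, "/stats"):
		resource = "nodes/stats"
	case isSubpath(path, "/metrics"):
		resource = "nodes/metrics"
	case isSubpath(path, "/logs"):
		resource = "nodes/log"
	case isSubpath(path, "/spec"):
		resource = "nodes/spec"
	default:
		// Everything else, including /pods, falls under nodes/proxy.
		resource = "nodes/proxy"
	}
	return verb, resource
}

func main() {
	verb, resource := kubeletAuthzAttributes("GET", "/pods")
	fmt.Printf("GET /pods needs verb %q on resource %q\n", verb, resource)
}
```

Under that mapping, a service account that lists pods through the secure port would need `get` on `nodes/proxy`, which is the kind of permission a ClusterRole and ClusterRoleBinding can grant.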
@dghubble can you update the checkpointer hash to: |
Updated, and opened #1031 with just the image change, since we might consider it separate from the feature change of disabling read-only in test / example clusters. Release Note:
/ok-to-test
* Updates pod-checkpointer to prefer the Kubelet secure API (before falling back to the Kubelet read-only API that is disabled on Typhoon clusters since poseidon/typhoon#324)
* Previously, pod-checkpointer checkpointed an initial set of pods during bootstrapping so recovery from power cycling clusters was unaffected, but logs were noisy
* kubernetes-retired/bootkube#1027
* kubernetes-retired/bootkube#1025
Gah, |
coreosbot run e2e calico
@@ -24,5 +24,4 @@ The information below describes a minimum set of port allocations used by Kubern
| TCP | 4194 | Master & Worker Nodes | The port of the localhost cAdvisor endpoint |
| UDP | 4789 | Master & Worker Nodes | flannel overlay network - *vxlan backend* |
| TCP | 10250 | Master Nodes | Worker node Kubelet API for exec and logs. |
| TCP | 10255 | Master & Worker Nodes | Worker node read-only Kubelet API (Heapster). |
We should leave this network requirement in but label it 'optional'. Otherwise this LGTM and tests are passing.
Updated
Did we want to just bump the checkpointer via #1031, then rebase this to follow it? The two are somewhat independent. 🤷♂️ Either way works for me.
@dghubble Thanks! We can rebase this PR now.
* Add ClusterRole and ClusterRoleBinding to give checkpointer permission to perform requests previously done via the kubelet read-only API
Updated
coreosbot run e2e calico
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
@fejta-bot: Closed this PR. In response to this:
pod-checkpointer creates an `insecureClient` and `secureClient` to the kubelet, but only makes use of the former (via 10255). Adapt `localParentPods` to try to use the secure client first, then the insecure client, to support clusters that disable the kubelet read-only port.

Open questions:
Background
Today in bootkube, the kubelet read-only port (10255) is enabled and pod-checkpointer uses its `/pods` endpoint to get all pods (later filters to find parent pods). That all works fine.

We've come a long way toward eliminating the read-only port. Cloud load balancers can now health check the apiserver, Prometheus can use the kubelet secure API to scrape metrics, and heapster can get metrics from the kubelet secure API too.
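To make the "get all pods, then filter to parent pods" step from the Background concrete, here is a minimal sketch that decodes a `/pods`-style response and keeps only the pods marked for checkpointing. It assumes parent pods are identified by the annotation `checkpointer.alpha.coreos.com/checkpoint: "true"`; that key, the trimmed-down `pod` types, the `parentPods` helper, and the sample payload are illustrative assumptions, not code from pod-checkpointer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkpointAnnotation is assumed here to be the marker for parent pods.
const checkpointAnnotation = "checkpointer.alpha.coreos.com/checkpoint"

// pod and podList keep only the fields this sketch needs from a PodList.
type pod struct {
	Metadata struct {
		Namespace   string            `json:"namespace"`
		Name        string            `json:"name"`
		Annotations map[string]string `json:"annotations"`
	} `json:"metadata"`
}

type podList struct {
	Items []pod `json:"items"`
}

// samplePodList is a made-up response body used only for this example.
const samplePodList = `{"kind":"PodList","items":[
  {"metadata":{"namespace":"kube-system","name":"kube-apiserver-abc12",
    "annotations":{"checkpointer.alpha.coreos.com/checkpoint":"true"}}},
  {"metadata":{"namespace":"default","name":"web-0"}}
]}`

// parentPods decodes a pod list and returns "namespace/name" for every pod
// that carries the checkpoint annotation set to "true".
func parentPods(body []byte) ([]string, error) {
	var list podList
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, fmt.Errorf("decoding pod list: %w", err)
	}
	var parents []string
	for _, p := range list.Items {
		// Reading a missing key from a nil map yields "", so pods without
		// annotations are skipped without a special case.
		if p.Metadata.Annotations[checkpointAnnotation] == "true" {
			parents = append(parents, p.Metadata.Namespace+"/"+p.Metadata.Name)
		}
	}
	return parents, nil
}

func main() {
	parents, err := parentPods([]byte(samplePodList))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(parents)
}
```

The filter is independent of which port served the response, so the same step applies whether the pod list came from 10255 or from the secure API.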
Running clusters with kubelet `read-only-port=0` (disabled), the active pod-checkpointer will log:

Interestingly, even with localParentPods calls failing on these clusters, recovery from cluster power cycling is unaffected. But I figure if we can eliminate this one last use of the read-only API, all the better.
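The secure-first, read-only-fallback behaviour proposed at the top of this description can be sketched as follows. This is a minimal illustration, not the checkpointer's actual code: `podLister`, `listPodsPreferSecure`, and `errUnavailable` are hypothetical names, and the two clients are modelled as plain functions so the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable is a stand-in error for an endpoint that cannot be reached.
var errUnavailable = errors.New("kubelet endpoint unavailable")

// podLister is a hypothetical stand-in for a kubelet client call that lists
// pods; the real secureClient and insecureClient types are not shown here.
type podLister func() ([]string, error)

// listPodsPreferSecure tries the secure kubelet API first and only falls
// back to the read-only API when the secure call fails. If both fail, the
// returned error reports both causes.
func listPodsPreferSecure(secure, insecure podLister) ([]string, error) {
	pods, secureErr := secure()
	if secureErr == nil {
		return pods, nil
	}
	pods, insecureErr := insecure()
	if insecureErr == nil {
		return pods, nil
	}
	return nil, fmt.Errorf("secure API failed (%v); read-only API failed (%v)",
		secureErr, insecureErr)
}

func main() {
	// Simulate a cluster where the read-only port is disabled.
	secure := func() ([]string, error) {
		return []string{"kube-system/kube-apiserver"}, nil
	}
	readOnly := func() ([]string, error) { return nil, errUnavailable }

	pods, err := listPodsPreferSecure(secure, readOnly)
	fmt.Println(pods, err)
}
```

Trying the secure client first means clusters with the read-only port disabled succeed on the first call, while clusters whose secure connection is not usable (for example, the expired-certificate case raised in review) still have the read-only path available if it is enabled.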