Wait for kubelet port to be ready before setting #7041
Signed-off-by: Daishan Peng <daishan@acorn.io>
That is interesting; I'm not aware of any other projects that pre-create a node object before starting the kubelet. Can you explain more about why it's doing this?

@brandond I think it has to do with our integration when using k3s with Karpenter and the AWS cloud controller manager. When a pod becomes unschedulable, Karpenter first creates a Kubernetes Node object, then under the hood also calls AWS to create an instance with the userdata we provide to spin up the k3s-agent service. So when k3s-agent starts up, the node is already there, and k3s-agent always sets the kubelet port to zero before registering itself, so validation always fails for exec and log requests.

Ah, that's an interesting workflow. I didn't count on there being any cases where we'd get a Node object that wasn't properly set up by the kubelet. I wonder if we handle this properly when changing the kubelet port; it's possible we don't and would use the value from the last time the kubelet was up, instead of getting the new value.

I just confirmed that we do have a race condition here when the kubelet port is changed. If I restart a node with

I added another tweak to wait for the kubelet to update the ready status before checking the port; this should ensure that we're seeing the most recent update to the node.
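For readers following along, the "wait for the ready status, then check the port" idea can be sketched roughly as below. This is an illustration only, not the code in this PR: `Node`, `ReadyUpdated`, `waitForKubeletPort`, and `errKubeletNotReady` are hypothetical names, and the node is fetched through a plain callback instead of a real Kubernetes client.

```go
package main

import (
	"errors"
	"time"
)

// Node is a hypothetical, trimmed-down stand-in for the Kubernetes Node
// object; only the two pieces of state discussed in this thread are modelled.
type Node struct {
	// ReadyUpdated is assumed to be true once the kubelet has refreshed the
	// node's ready status since the agent started.
	ReadyUpdated bool
	// KubeletPort mirrors status.daemonEndpoints.kubeletEndpoint.Port.
	KubeletPort int32
}

// errKubeletNotReady is returned when polling gives up.
var errKubeletNotReady = errors.New("kubelet did not report ready with a valid port")

// waitForKubeletPort polls get until the node has a fresh ready status and a
// port greater than zero, giving up after the given number of attempts.
func waitForKubeletPort(get func() (Node, error), attempts int, interval time.Duration) (int32, error) {
	for i := 0; i < attempts; i++ {
		node, err := get()
		if err != nil {
			return 0, err
		}
		// A pre-created node (port 0) or a stale node (status not yet
		// refreshed by the kubelet) is skipped rather than trusted.
		if node.ReadyUpdated && node.KubeletPort > 0 {
			return node.KubeletPort, nil
		}
		time.Sleep(interval)
	}
	return 0, errKubeletNotReady
}
```

Checking the ready status first matters in the restart case: a port left over from the previous startup is non-zero, so a port check alone would accept it.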
b7efe08 to 6fc9370
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
6fc9370 to 07878a1
Thanks for the quick reviews @brandond @dereknola 👍

Think this can get backported to 1.25? 😅

For sure! We are backporting to 1.25 and 1.24 for the March releases. We are in the February code freeze right now; I will merge this when it is lifted.

@StrongMonkey / @brandond Can we get some testing steps to replicate and validate this issue?

@mdrahman-suse See #7041 (comment)
So, start K3s, then restart using that flag to change the kubelet port, and check the logs to see what kubelet port is reported. It should be the currently configured port, not the port that was used during the previous startup.
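As a complement to reading the logs, the port published on the Node object can be compared with the configured one. The sketch below is an assumption-laden helper, not part of this PR: `nodeDoc` and `checkKubeletPort` are hypothetical names, and the input is assumed to be the node's JSON (for example, saved from `kubectl get node <name> -o json`). It only confirms what the kubelet published in `status.daemonEndpoints.kubeletEndpoint.Port`; it does not inspect the agent's log line.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeDoc mirrors only the part of the Node JSON needed for this check.
type nodeDoc struct {
	Status struct {
		DaemonEndpoints struct {
			KubeletEndpoint struct {
				Port int32 `json:"Port"`
			} `json:"kubeletEndpoint"`
		} `json:"daemonEndpoints"`
	} `json:"status"`
}

// checkKubeletPort decodes a Node JSON document and returns an error if the
// published kubelet port differs from the port we expect to be configured.
func checkKubeletPort(nodeJSON []byte, want int32) error {
	var doc nodeDoc
	if err := json.Unmarshal(nodeJSON, &doc); err != nil {
		return fmt.Errorf("decode node: %w", err)
	}
	got := doc.Status.DaemonEndpoints.KubeletEndpoint.Port
	if got != want {
		return fmt.Errorf("node reports kubelet port %d, want %d", got, want)
	}
	return nil
}
```

Running it once after the first start and again after the restart with the changed port should return nil both times if the node reflects the current configuration.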
Proposed Changes
When using k3s with Karpenter, we have been unable to exec into or fetch logs from pods that land on Karpenter nodes.
When digging further, we found that every time we made an exec or log request, the k3s agent returned the following log
When k3s-agent starts, we've seen it set the kubelet port to 0, which is not correct.
I think what's happening here is that when using the k3s agent with Karpenter, the node resource is pre-created by Karpenter. So when the agent boots up, it already sees the node, but at this point the kubelet port might not be populated yet since k3s-agent is still starting. So it just sets the port to zero.
Types of Changes
To fix this bug, we need to make sure that we only set the port when it is ready (greater than 0). This prevents the k3s agent from validating against the wrong port and rejecting exec and log requests.
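A minimal sketch of that guard, under stated assumptions: `portTracker`, `update`, and `allow` are hypothetical names used for illustration and do not correspond to the actual k3s types. The point is only that a reported port of 0 never overwrites the tracked value, and that requests are validated against the tracked value.

```go
package main

import "fmt"

// portTracker is a hypothetical holder for the kubelet port that exec and
// log requests are validated against. It is not safe for concurrent use.
type portTracker struct {
	port int32
}

// update records the reported port only when it is greater than zero and
// reports whether the value was accepted.
func (p *portTracker) update(reported int32) bool {
	if reported <= 0 {
		// The kubelet has not published its port yet; keep what we have.
		return false
	}
	p.port = reported
	return true
}

// allow rejects a request when the port is still unknown or does not match
// the tracked kubelet port.
func (p *portTracker) allow(requested int32) error {
	if p.port == 0 {
		return fmt.Errorf("kubelet port not known yet")
	}
	if requested != p.port {
		return fmt.Errorf("port %d not allowed, kubelet port is %d", requested, p.port)
	}
	return nil
}
```

With a pre-created node, the first `update(0)` is ignored, so requests fail with "not known yet" until the kubelet publishes a real port, instead of being validated against port 0 indefinitely.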
Verification
Testing this with aws-karpenter
Testing
Linked Issues
User-Facing Change
Further Comments