
Wait for kubelet port to be ready before setting #7041

Merged
dereknola merged 2 commits into k3s-io:master from StrongMonkey:fix-kubelet-port on Mar 13, 2023

@StrongMonkey

StrongMonkey commented Mar 9, 2023

Contributor

Proposed Changes

When using k3s with karpenter, we have been unable to exec into or fetch logs from pods that land on karpenter-provisioned nodes.

Digging further, we found that every time we make an exec or log request, the k3s agent logs the following:

Tunnel authorizer checking dial request for 127.0.0.1:10250"
Mar 09 17:36:42 ip-10-0-0-125 k3s[15315]: time="2023-03-09T17:36:42Z" level=error msg="Remotedialer proxy error" error="connect not allowed"

When k3s-agent starts, we see it set the kubelet port to 0, which is incorrect:

Tunnel authorizer set Kubelet Port 0
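
To make the failure mode concrete, here is a minimal sketch of why a recorded port of 0 leads to "connect not allowed". The `portAuthorizer` type and its `authorized` method are hypothetical stand-ins, not the actual k3s tunnel authorizer, and the rule is deliberately reduced to "loopback address plus the recorded kubelet port".

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// portAuthorizer is a hypothetical, simplified stand-in for the agent's
// tunnel authorizer. It only remembers the kubelet port it was told about.
type portAuthorizer struct {
	kubeletPort int
}

// authorized reports whether a dial request to address should be allowed.
// Assumption for this sketch: only loopback dials to the recorded kubelet
// port are permitted.
func (a *portAuthorizer) authorized(address string) bool {
	host, portStr, err := net.SplitHostPort(address)
	if err != nil {
		return false
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return false
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback() && port == a.kubeletPort
}

func main() {
	// Port recorded as 0 because the node object was read too early.
	stale := &portAuthorizer{kubeletPort: 0}
	fmt.Println(stale.authorized("127.0.0.1:10250")) // false

	// Port recorded after the kubelet populated the node status.
	ready := &portAuthorizer{kubeletPort: 10250}
	fmt.Println(ready.authorized("127.0.0.1:10250")) // true
}
```

With the port stuck at 0, every exec or log dial to 127.0.0.1:10250 fails the comparison, which matches the error shown in the agent log.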

I think what's happening here is that when using k3s agent with karpenter, the node resource was precreated by karpenter. When the agent boots, it already sees the node, but at that point the kubelet port may not have been populated yet, since k3s-agent is still starting. So it just sets the port to zero.

Types of Changes

To fix this bug, we need to make sure we only set the port once it is ready (greater than 0). This prevents the k3s agent from validating against the wrong port and rejecting exec and log requests.
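
A rough sketch of the "only set the port once it is greater than 0" idea, using only the standard library. `nodeInfo`, `waitForKubeletPort`, and `simulate` are hypothetical names for illustration; the real change reads the Node object through the Kubernetes client, where the port lives at `status.daemonEndpoints.kubeletEndpoint.port`.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// nodeInfo is a hypothetical, trimmed-down view of a Node object.
// KubeletPort mirrors status.daemonEndpoints.kubeletEndpoint.port.
type nodeInfo struct {
	KubeletPort int32
}

// waitForKubeletPort polls get until it returns a port greater than zero,
// or until ctx is cancelled. Read errors are treated as "not ready yet".
func waitForKubeletPort(ctx context.Context, interval time.Duration, get func() (nodeInfo, error)) (int32, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if n, err := get(); err == nil && n.KubeletPort > 0 {
			return n.KubeletPort, nil
		}
		select {
		case <-ctx.Done():
			return 0, ctx.Err()
		case <-ticker.C:
		}
	}
}

// simulate models a pre-created node whose first zeroReads reads still
// report port 0, after which the kubelet has published port 10250.
func simulate(zeroReads int) (int32, error) {
	reads := 0
	get := func() (nodeInfo, error) {
		reads++
		if reads <= zeroReads {
			return nodeInfo{}, nil
		}
		return nodeInfo{KubeletPort: 10250}, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	return waitForKubeletPort(ctx, time.Millisecond, get)
}

func main() {
	port, err := simulate(2)
	fmt.Println(port, err) // 10250 <nil>
}
```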

Verification

Testing this with aws-karpenter

Testing

Linked Issues

User-Facing Change

The agent tunnel authorizer now waits for the kubelet to be ready before reading the kubelet port from the node object.

Further Comments

Signed-off-by: Daishan Peng <daishan@acorn.io>
@StrongMonkey StrongMonkey requested a review from a team as a code owner March 9, 2023 18:54
@brandond

brandond commented Mar 9, 2023

Member

when using k3s agent with karpenter, the node resource was precreated by karpenter.

That is interesting, I'm not aware of any other projects that pre-create a node object before starting the Kubelet. Can you explain more about why it's doing this?

@StrongMonkey

Contributor Author

@brandond I think it has to do with how we integrate k3s with karpenter and the AWS cloud controller manager.

What's happening here is that when a pod becomes unschedulable, karpenter first creates a k8s node object and then, under the hood, calls AWS to create an instance with the userdata we provide to start the k3s-agent service. So when k3s-agent starts up, the node is already there, and k3s-agent always sets the kubelet port to zero before registering itself, so validation always fails for exec and log requests.

@brandond

brandond commented Mar 9, 2023

Member

Ah, that's an interesting workflow. I didn't count on there being any cases where we'd get a Node object that wasn't properly set up by the kubelet. I wonder if we handle this properly when changing the kubelet port; it's possible we don't and would use the value from the last time the kubelet was up, instead of getting the new value.

@brandond

brandond commented Mar 9, 2023

Member

I just confirmed that we do have a race condition here when the kubelet port is changed. If I restart a node with --kubelet-arg=port=11250 we still use the original port, as this runs before the kubelet has updated the setting on the node object. So just waiting for it to be non-zero doesn't address the full scope of the problem.

@brandond

Member

I added another tweak to wait for the kubelet to update the ready status before checking the port; this should ensure that we're seeing the most recent update to the node.
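
As an illustration of that idea, the sketch below only trusts the port once the node's Ready condition has been refreshed after the agent started. The types and the `freshKubeletPort` and `demoCheck` helpers are hypothetical, and comparing the Ready condition's heartbeat time against the agent start time is an assumption about one way to detect a fresh kubelet update, not a description of the merged code.

```go
package main

import (
	"fmt"
	"time"
)

// condition and nodeStatus are hypothetical, trimmed-down versions of the
// Node status fields this sketch needs.
type condition struct {
	Type              string
	Status            string
	LastHeartbeatTime time.Time
}

type nodeStatus struct {
	Conditions  []condition
	KubeletPort int32
}

// freshKubeletPort returns the kubelet port only if the Ready condition was
// refreshed at or after agentStart and the port is non-zero.
func freshKubeletPort(status nodeStatus, agentStart time.Time) (int32, bool) {
	var readyAt time.Time
	for _, c := range status.Conditions {
		if c.Type == "Ready" && c.Status == "True" {
			readyAt = c.LastHeartbeatTime
		}
	}
	if readyAt.Before(agentStart) {
		// Status was last written by a previous kubelet run (or never).
		return 0, false
	}
	if status.KubeletPort <= 0 {
		return 0, false
	}
	return status.KubeletPort, true
}

// demoCheck builds a status whose Ready heartbeat is offsetSeconds away from
// a fixed agent start time, then runs freshKubeletPort on it.
func demoCheck(offsetSeconds int, port int32) (int32, bool) {
	start := time.Date(2023, 3, 9, 17, 36, 0, 0, time.UTC)
	status := nodeStatus{
		Conditions: []condition{{
			Type:              "Ready",
			Status:            "True",
			LastHeartbeatTime: start.Add(time.Duration(offsetSeconds) * time.Second),
		}},
		KubeletPort: port,
	}
	return freshKubeletPort(status, start)
}

func main() {
	fmt.Println(demoCheck(-60, 10250)) // 0 false: stale value from the previous run
	fmt.Println(demoCheck(5, 11250))   // 11250 true: kubelet has updated the node
}
```

This covers both cases discussed above: a pre-created node with no port yet, and a restarted node still carrying the port from its previous run.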

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
@cjellick

Thanks for the quick reviews @brandond @dereknola 👍

@cjellick

Think this can get backported to 1.25? 😅

@dereknola

Member

For sure! We are backporting to 1.25 and 1.24 for March releases. We are in Feb code freeze right now, I will merge this when lifted.

@dereknola dereknola merged commit b7f90f3 into k3s-io:master Mar 13, 2023
@mdrahman-suse


@StrongMonkey / @brandond Can we get some testing steps to replicate and validate this issue?
CC @ShylajaDevadiga @VestigeJ

@brandond

brandond commented Mar 22, 2023

Member

@mdrahman-suse See #7041 (comment)

If I restart a node with --kubelet-arg=port=11250 we still use the original port

So, start K3s, then restart using that flag to change the kubelet port, and check the logs to see what kubelet port is reported. It should be the currently configured port, not the port that was used during the previous startup.
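
For anyone scripting that check, a small helper along these lines could pull the reported port out of an agent log line. It relies only on the "Tunnel authorizer set Kubelet Port N" message shown earlier in this PR; the function name and the sample log line (timestamp and level included) are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// portLine matches the message the agent logs when it records the port.
var portLine = regexp.MustCompile(`Tunnel authorizer set Kubelet Port (\d+)`)

// reportedKubeletPort extracts the port from a single log line, if present.
func reportedKubeletPort(line string) (int, bool) {
	m := portLine.FindStringSubmatch(line)
	if m == nil {
		return 0, false
	}
	port, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	return port, true
}

func main() {
	// Illustrative log line; the timestamp and level are not from a real run.
	line := `time="2023-03-22T00:00:00Z" level=info msg="Tunnel authorizer set Kubelet Port 11250"`
	fmt.Println(reportedKubeletPort(line)) // 11250 true
}
```

After restarting with `--kubelet-arg=port=11250`, the extracted value should be 11250 rather than the port from the previous startup.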
