Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: auto-refresh tokens in a kube config file #5699

Closed
wants to merge 1 commit into from
Closed

bug: auto-refresh tokens in a kube config file #5699

wants to merge 1 commit into from

Conversation

Skarlso
Copy link

@Skarlso Skarlso commented Apr 20, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

Continuously update the token provided by a kube config file.

Which issue(s) this PR fixes:

Fixes #4784

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 20, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Skarlso
Once this PR has been reviewed and has the lgtm label, please assign elmiko for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from arunmk and elmiko April 20, 2023 20:09
@k8s-ci-robot k8s-ci-robot added area/cluster-autoscaler size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 20, 2023
Copy link
Contributor

@elmiko elmiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Skarlso , i studied the code and i don't necessarily know the client-go parts super well, but while looking through the client-go code i came across a section i am curious about.

this function https://github.com/kubernetes/client-go/blob/master/transport/round_trippers.go#L39-L69 should be called during the managementClient, err := dynamic.NewForConfig(managementConfig) portion of our builder code, but i don't understand why that reference to NewBearerAuthWithRefreshRoundTripper is not being used as the transport for our client.

it seems to me like the dynamic.NewForConfig should use the same tranport as specified here if the kubeconfig contains the bearer token. any thoughts? (and apologies if you covered this in the conversation with @liggitt , i know this stuff is complex)

thanks to @stlaz for helping me understand the client-go stuff

@Skarlso
Copy link
Author

Skarlso commented May 24, 2023

@elmiko If I look at the code, I see in that dynamic.NewConfig that it sets up the roundtripper through the config here:

// TransportFor returns an http.RoundTripper that will provide the authentication
// or transport level security defined by the provided Config. Will return the
// default http.DefaultTransport if no special case behavior is needed.
func TransportFor(config *Config) (http.RoundTripper, error) {
	cfg, err := config.TransportConfig()
	if err != nil {
		return nil, err
	}
	return transport.New(cfg)
}

I think that's the magic that will make it refresh, as far as I understand this thing. :/

@Skarlso
Copy link
Author

Skarlso commented May 24, 2023

Especially this part:

// TransportConfig converts a client config to an appropriate transport config.
func (c *Config) TransportConfig() (*transport.Config, error) {
	conf := &transport.Config{
		UserAgent:          c.UserAgent,
		Transport:          c.Transport,

Where it will set Transport: c.Transport from the config. I think that's what we are looking for, right?

@elmiko
Copy link
Contributor

elmiko commented May 24, 2023

Where it will set Transport: c.Transport from the config. I think that's what we are looking for, right?

when i was looking through the code path we have, i would have expected c.Transport to be nil at this point, which later i would expect to get overwritten by transport.New(cfg) call (from this section https://github.com/kubernetes/client-go/blob/master/transport/transport.go#L51-L60).

at the end of https://github.com/kubernetes/client-go/blob/master/transport/transport.go#L51-L60 , the call to HTTPWrapperForConfig seems like it should then configure the transport with the refreshing bearer token style. i was wondering if there is something about the kubeconfig that is causing it not to use that transport (eg something in your config is causing it to skip https://github.com/kubernetes/client-go/blob/master/transport/round_trippers.go#L50-L55)

edit: mind you, i'm not positing this as fact, i am just trying to understand what is happening.

@Skarlso
Copy link
Author

Skarlso commented May 24, 2023

Yeah gotcha. I will have to re-read this to understand the context. :D

@Skarlso
Copy link
Author

Skarlso commented May 24, 2023

Just rebased...

@Skarlso
Copy link
Author

Skarlso commented May 24, 2023

@elmiko So.... you're saying that this should have already worked, but for some reason it doesn't?

@elmiko
Copy link
Contributor

elmiko commented May 24, 2023

that's my understanding after reading through all the code in client-go (and talking with @stlaz). it seems like this should be the default behavior when a bearer token is present in the kubeconfig.

@Skarlso
Copy link
Author

Skarlso commented May 24, 2023

yeah, I see that now. The only explanation would be if the transporter is not nil but I highly doubt that. 🤔

@Skarlso
Copy link
Author

Skarlso commented May 24, 2023

/cc @richardcase maybe you could shed a better light on this scenario? How does a CAPA kubeconfig look like that makes this stop working? :D

@Skarlso
Copy link
Author

Skarlso commented May 24, 2023

@elmiko Dude.

Because the workload cluster isn't using the dynamic client. ;)

	workloadClient, err := kubernetes.NewForConfig(workloadConfig)
	if err != nil {
		klog.Fatalf("create kube clientset failed: %v", err)
	}

And that kubernetes.NewForConfig isn't using any wrappers as far as I can find.

@Skarlso
Copy link
Author

Skarlso commented May 24, 2023

It's just using HTTPClientFor which isn't using any wrapping:

func NewForConfig(c *rest.Config) (*Clientset, error) {
	configShallowCopy := *c

	if configShallowCopy.UserAgent == "" {
		configShallowCopy.UserAgent = rest.DefaultKubernetesUserAgent()
	}

	// share the transport between all clients
	httpClient, err := rest.HTTPClientFor(&configShallowCopy)
	if err != nil {
		return nil, err
	}

	return NewForConfigAndClient(&configShallowCopy, httpClient)
}

this might explain why the management cluster refreshes but the workload cluster doesn't?

@Skarlso
Copy link
Author

Skarlso commented May 24, 2023

Nevermind. HTTPClientFor does use transport, err := TransportFor(config) which does call return transport.New(cfg) which calls the wrapping. Argh. :/ I have no idea then what's going on.

@Skarlso
Copy link
Author

Skarlso commented May 24, 2023

BTW, if Auth header is set it won't do the refresh anymore. And now I remember something about that header... :D

func (rt *bearerAuthRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	if len(req.Header.Get("Authorization")) != 0 {
		return rt.rt.RoundTrip(req)
	}

	req = utilnet.CloneRequest(req)
	token := rt.bearer
	if rt.source != nil {
		if refreshedToken, err := rt.source.Token(); err == nil {
			token = refreshedToken.AccessToken
		}
	}
	req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", token))
	return rt.rt.RoundTrip(req)
}

Edit: What I remembered was about headers being copied.

However, this is interesting. So once Authorization is set, it won't refresh again if I read this correctly?

@elmiko
Copy link
Contributor

elmiko commented May 25, 2023

thanks for all the investigations @Skarlso , this is a really tricky bit to figure out.

However, this is interesting. So once Authorization is set, it won't refresh again if I read this correctly?

good question, have you tried running the autoscaler with --v=11? you should be able to see all the http traffic with that level of verbosity.

@Skarlso
Copy link
Author

Skarlso commented May 25, 2023

@elmiko will do that.

I did some asking around and @liggitt answered that it's probably receiving these without the auth header mostly.

But I will have to run tests for this. I'm trying to get our people to give me some kind of environment where they are observing this behaviour so I can reproduce it consistently.

@elmiko
Copy link
Contributor

elmiko commented May 25, 2023

sounds good, thanks again for all the hard work @Skarlso =)

@Skarlso
Copy link
Author

Skarlso commented May 25, 2023

Thanks. :) This is quite interesting. I'm not a 100% sure now what's going on. :D :D But I'll try to get to the end of it.

@stlaz
Copy link
Member

stlaz commented May 26, 2023

However, this is interesting. So once Authorization is set, it won't refresh again if I read this correctly?

RoundTrip() is a method called on a per-request basis (hence *http.Request in the function arguments).

The lines

		if refreshedToken, err := rt.source.Token(); err == nil {
			token = refreshedToken.AccessToken
		}

should be re-reading the token file on every request the client is making, that is, if you really have the token file set as the source of your bearer token.

Notice that the error on re-read is ignored, I'd expect it to at least be logged on some logging level.

If you were to debug, I think you may want to add a few prints around these lines/throw a few breakpoints around with delve.

@richardcase
Copy link

/cc @richardcase maybe you could shed a better light on this scenario? How does a CAPA kubeconfig look like that makes this stop working? :D

Sorry for the delay in respond @Skarlso . The CAPA kubeconfig for an EKS cluster contains an embedded token that expiries in 15 mins (the max allowed TTL for the token) and we re-create the kubeconfig with a new token every 10 mins.

@cnmcavoy
Copy link
Contributor

I also took a stab at solving this problem over in #5951 @Skarlso

This approach won't work, the BearerTokenFile is always "" for the management and workload kube configs. Also, the NewForConfig already sets up the same type of transport wrapper. So functionally this PR is a no-op.

@Skarlso
Copy link
Author

Skarlso commented Jul 14, 2023

Yepp that's pretty much the conclusion here sadly. I'm not sure this is an autoscaler problem anymore.

My apologies. I have stepped away from open source for a little bit as I got burned out. I'll be back eventually.

@Skarlso Skarlso closed this by deleting the head repository Oct 13, 2023
@Skarlso
Copy link
Author

Skarlso commented Oct 13, 2023

Argh, so I deleted my fork because somehow my access broke.

Nevertheless, I think this isn't an issue in autoscaler. 🤔

So, I'm fine with closing it. :)

@elmiko
Copy link
Contributor

elmiko commented Oct 16, 2023

thanks for the update @Skarlso

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cluster-autoscaler caches workload cluster kubeconfig
6 participants