oidc auto refresh token during api invocation #492

mvle · 2018-03-21T15:50:13Z

Refresh the token not only at configuration initialization time but also during invocation of the K8 API calls. The token might expire right before a call thus causing the call to fail with an authentication failure. This will be needed for services that run a long time and rely on being able to reach the K8 Api Server (also short services that happen to be unlucky).

Possibly be done with a check to 'exp' field in the id-token right before a call_api or catch a 401 return status and perform a refresh token operation and update the configuration object with the new id_token and refresh_token values from the OAuth server.

Related to #368 #195, kubernetes-client/python-base#48

mvle · 2018-03-21T15:53:39Z

As mentioned by @roycaihw, swagger-codegen does not support OAuth2 refresh.
What I'm proposing is rather hacky. Anyway to do this so generated code will do the right thing? Wrapper calls?

flylo · 2018-03-27T16:05:32Z

@yliaog this is an issue for us, as we are running kubernetes pods as batch processes and tracking the state of the pods with a static client. Once the client object is initialized, it does no refresh, and therefore any job that takes longer than 60 minutes will fail. Looking through python-base, it seems like we could make a wrapper class that generates a new client (or refreshes the config) every n minutes, or checks status prior to every call (as @mvle suggested). The best fix would be in swagger-codegen, but a temporary solution would probably be very useful for a lot of people.

yliaog · 2018-03-27T17:01:11Z

I agree it's better to have a solution before swagger-codegen fix is in.

I think it's likely more efficient to check the status and do the refresh only when it fails. mind to send a PR?

In the meantime, do you mind to ping the swagger-codegen issue?

flylo · 2018-03-28T15:25:15Z

@yliaog Even with a try/catch wrapper the token wasn't working, but I've found the root cause of that issue. In debugging, I noticed that there is already logic to handle the case of token expiry, but the method is incorrect.

def _is_expired(expiry):
    return ((parse_rfc3339(expiry) + EXPIRY_SKEW_PREVENTION_DELAY) <=
            datetime.datetime.utcnow().replace(tzinfo=UTC))

With the logic above, we arbitrarily push back expiration time 5 minutes, so there is always a 5 minute interval in which any call to the API will be using an expired token with no refresh, and will therefore throw a 401.

Will submit a PR in a bit with a fix to that flipped sign, and also to add a retry wrapper for that API.

mvle · 2018-03-30T14:16:08Z

@flylo awesome. Can't wait to see the PR and for this feature to be implemented.

I was looking through the code but couldn't figure out where to place the wrapper that would catch all the api calls without it being overwritten by swagger-codegen. Only thing I could think of would require everyone using the library to try/catch in their own code when invoking the api calls.

flylo · 2018-03-30T16:35:21Z

@mvle @yliaog i submitted this today to unblock the API wrapper.
kubernetes-client/python-base#55

I merged a version of the wrapper in my own repo that has some stupid sleeping to get around the flipped sign bug. The wrapper is dumb simple... This is what it looks like right now:

class RetryingCoreV1Api(CoreV1Api):

    def __init__(self, client_provider,
                 failure_tolerance=None,
                 status_codes_to_catch=None,
                 seconds_before_retry=1):
        self._client_provider = client_provider
        self.num_failures = 0
        self._failure_tolerance = failure_tolerance
        self._status_codes = status_codes_to_catch
        self._seconds_pause_before_retry = seconds_before_retry
        super(RetryingCoreV1Api, self).__init__(self._client_provider())

    def __call__(self, instance_function, *args, **kwargs):
        try:
            return instance_function(*args, **kwargs)
        except ApiException as e:
            logger.info('Caught exception %s in %s ' % (e, instance_function.__name__,))
            self.num_failures += 1
            if self._status_codes is not None:
                if e.status not in self._status_codes:
                    raise e
            if self._failure_tolerance is not None:
                if self.num_failures > self._failure_tolerance:
                    raise e
            super(RetryingCoreV1Api, self).__init__(self._client_provider())
            return self.__call__(instance_function, *args, **kwargs)

and you'd use it like so:

from functools import partial
cfg_file = "<path to config file>"
client_provider = partial(new_client_from_config, cfg_file)
api = RetryingCoreV1Api(client_provider)
api(api.create_namespaced_pod, body=..., namespace=...)

I've tested it out in our internal repo and it works fine, but now I just have to adapt my tests to the python-base repo. My plan was to put it in a new module in config called config.patch, since this is just a temporary workaround to an issue in the swagger-codegen.

Before I go through too much trouble of refactoring my tests and making this backwards compatible, will this actually fix your issues?

flylo · 2018-03-30T16:38:48Z

also @mvle we'd have to put it in python-base.

mvle · 2018-04-01T16:22:36Z

@flylo this approach does indeed solve the problem. But, if I understand your proposal correctly, this is going to require patching the python-client, why not just create a patch that will directly add the retry logic to the "__call_api__" in ApiClient? That way we won't need create a RetryV1Api for each of the V1Api objects and require users to change their applications to make use of this feature. Otherwise, we would have to document how to use this feature with the new calling convention.

flylo · 2018-04-02T01:48:18Z

@mvle we can't directly edit the ApiClient because that code is auto-generated by swagger-codegen and it seems like all auth is handled in the manual python-base repo. This is just a temporary approach. It is simply a helper that wraps calls to the api in a try/catch block, and re-initializes the client object every time one of those calls fails. Also, at first glance it seems like we can't override call_api in the ApiClient, because the ApiClient has no knowledge of the authentication mechanism used to generate kube config tokens. Maybe there is an easy solution to this in swagger clients, but I'm pretty new to this.

I suppose we could create another child class for the client that overrides call_api, but this would just be more things to maintain and test, and if this is just a temporary patch that seems not ideal.

mvle · 2018-04-02T14:38:06Z

@flylo i agree that what i'm proposing might be hard to maintain. We'll have to add something into Configuration in python-client to store a callback or some info we can use to decide whether to do a renewal like in auth_settings. But i'm hopeful as I think these generated classes don't change much. The patch can be applied on installation of client library. I don't exactly like what I'm proposing much TBH since it's really hacky but it's most transparent to use.

As to your proposal, is it possible to make a single Retry class that is not inheriting from a specific V1Api class, e.g,. CoreV1Api? This way we end up with just one generic retry class.

flylo · 2018-04-02T21:01:34Z

@mvle
How about this:

class RetryingApiWrapper(object):

    def __init__(self,
                 client_provider,
                 api_cls,
                 failure_tolerance=None,
                 status_codes_to_catch=None,
                 seconds_before_retry=1):
        self._client_provider = client_provider
        self.num_failures = 0
        self._failure_tolerance = failure_tolerance
        self._status_codes = status_codes_to_catch
        self._seconds_pause_before_retry = seconds_before_retry
        self._api_clazz = api_cls
        self._api = None
        self.__refresh_api()

    def __refresh_api(self):
        self._api = self._api_clazz(api_client=self._client_provider())

    def wrap_api_call(self, unbound_method, *args, **kwargs):
        try:
            partially_applied_method = partial(unbound_method, *args, **kwargs)
            bound_method = MethodType(partially_applied_method, self._api, self._api_clazz)
            return bound_method()
        except ApiException as e:
            logger.info('Caught exception %s in %s ' % (e, unbound_method.__name__,))
            self.num_failures += 1
            if self._status_codes is not None:
                if e.status not in self._status_codes:
                    raise e
            if self._failure_tolerance is not None:
                if self.num_failures > self._failure_tolerance:
                    raise e
            self.__refresh_api()
            return self.wrap_api_call(unbound_method, *args, **kwargs)

The wrapper would get used like so:

client_provider = client.new_client_from_config
api = RetryingApiWrapper(client_provider, CoreV1Api)
api.wrap_api_call(CoreV1Api.create_namespaced_pod, body=pod_payload, namespace=self.namespace)

So, you construct an instance of the wrapper by passing it a client provider and an API class. Then, to make calls, you pass an unbound method reference to wrap_api_call. The wrapper will partially apply it, bind it to a private instance of the API, then attempt to call the bound instance method.

mvle · 2018-04-03T14:02:49Z

@flylo this is awesome! I think this will make it easy to use, maintain and not intrusive.

@yliaog what do you think?

flylo · 2018-04-03T14:38:09Z

sweet - i'll submit a PR as soon as i get a chance.

yliaog · 2018-04-03T17:14:16Z

I like it, looking forward to the PR. Thanks.

flylo · 2018-04-04T12:17:59Z

@yliaog this pr is blocking the wrapper one so if you could take a look that'd be awesome.
kubernetes-client/python-base#55

roycaihw · 2018-09-22T00:11:51Z

@flylo How is the wrapper going? Just want to make sure you get everything you need. Is there anything I can help with?

flylo · 2018-09-24T20:11:24Z

@roycaihw hey I actually ended up using that snippet above in an internal repo because working through the ITs in this one was a bit more time-consuming than I was hoping at the time. Let me take another look at this over the next few days and make sure that I can submit a fully-tested PR rather than just pasting a snippet.

hannes-brt · 2018-11-27T00:42:05Z

Are there still plans to fix this? Or has this issue been fixed out of channel?

Thanks!

fejta-bot · 2019-04-27T03:44:05Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2019-05-27T04:26:41Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot · 2019-06-26T05:18:00Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot · 2019-06-26T05:18:07Z

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

mvle mentioned this issue Mar 21, 2018

Add oidc auth kubernetes-client/python-base#48

Merged

flylo mentioned this issue Mar 30, 2018

fixing flipped sign in expiry time padding kubernetes-client/python-base#55

Merged

roycaihw mentioned this issue Sep 21, 2018

GCP Token is not refreshed kubernetes-client/python-base#59

Closed

flylo mentioned this issue Sep 24, 2018

GCP Token not getting refreshed => 404 Error #612

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 27, 2019

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 27, 2019

k8s-ci-robot closed this as completed Jun 26, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

oidc auto refresh token during api invocation #492

oidc auto refresh token during api invocation #492

mvle commented Mar 21, 2018

mvle commented Mar 21, 2018

flylo commented Mar 27, 2018

yliaog commented Mar 27, 2018

flylo commented Mar 28, 2018 •

edited

Loading

mvle commented Mar 30, 2018

flylo commented Mar 30, 2018

flylo commented Mar 30, 2018

mvle commented Apr 1, 2018 •

edited

Loading

flylo commented Apr 2, 2018 •

edited

Loading

mvle commented Apr 2, 2018

flylo commented Apr 2, 2018

mvle commented Apr 3, 2018

flylo commented Apr 3, 2018

yliaog commented Apr 3, 2018

flylo commented Apr 4, 2018

roycaihw commented Sep 22, 2018

flylo commented Sep 24, 2018

hannes-brt commented Nov 27, 2018

fejta-bot commented Apr 27, 2019

fejta-bot commented May 27, 2019

fejta-bot commented Jun 26, 2019

k8s-ci-robot commented Jun 26, 2019

oidc auto refresh token during api invocation #492

oidc auto refresh token during api invocation #492

Comments

mvle commented Mar 21, 2018

mvle commented Mar 21, 2018

flylo commented Mar 27, 2018

yliaog commented Mar 27, 2018

flylo commented Mar 28, 2018 • edited Loading

mvle commented Mar 30, 2018

flylo commented Mar 30, 2018

flylo commented Mar 30, 2018

mvle commented Apr 1, 2018 • edited Loading

flylo commented Apr 2, 2018 • edited Loading

mvle commented Apr 2, 2018

flylo commented Apr 2, 2018

mvle commented Apr 3, 2018

flylo commented Apr 3, 2018

yliaog commented Apr 3, 2018

flylo commented Apr 4, 2018

roycaihw commented Sep 22, 2018

flylo commented Sep 24, 2018

hannes-brt commented Nov 27, 2018

fejta-bot commented Apr 27, 2019

fejta-bot commented May 27, 2019

fejta-bot commented Jun 26, 2019

k8s-ci-robot commented Jun 26, 2019

flylo commented Mar 28, 2018 •

edited

Loading

mvle commented Apr 1, 2018 •

edited

Loading

flylo commented Apr 2, 2018 •

edited

Loading