Implement Discoverer class and lazy loading method for API resources #220

asetty · 2018-10-03T14:57:38Z

@fabianvf Here's the pull request, testing and comments from your side are very welcome :)

The motivation: in certain use cases (e.g. Ansible) the dynamic client is recreated often, and many times we are only dealing with a single resource type. Discovering the entire API in these cases is wasteful, so we try to limit requests to the ones that are needed. The request to discover groups is always made because I could not think of a good way to avoid it. Still, we see a nice speedup in all cases, which lets us sit on our hands a couple of seconds less when deploying with Ansible. :)

Performance varies by resource type. If we have a manifest such as:

version: v1
kind: Pod
...

We need to request the resources for all groups that have a v1 version.
If we have the group and version such as

version: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
...

we only need to request resources for group apiextensions.k8s.io, version v1beta1.
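To make the two cases concrete, here is a minimal sketch of the lookup decision. It is not the code from this PR: `split_api_version`, `groups_to_query`, and the `known_groups` mapping (group name to served versions, standing in for the result of the groups request) are hypothetical names used only for illustration.

```python
def split_api_version(api_version):
    """Split a manifest version string into (group, version).

    A bare version such as "v1" carries no group, so group is None.
    """
    if "/" in api_version:
        group, version = api_version.split("/", 1)
        return group, version
    return None, api_version


def groups_to_query(api_version, known_groups):
    """Return the (group, version) pairs whose resource lists must be fetched.

    known_groups is a hypothetical mapping of group name -> served versions.
    """
    group, version = split_api_version(api_version)
    if group is not None:
        # Group and version are both known: a single request is enough.
        return [(group, version)]
    # Only the version is known: every group serving it has to be checked.
    return [(name, version)
            for name, versions in sorted(known_groups.items())
            if version in versions]
```

With a bare `v1` the number of requests grows with the number of groups that serve `v1`, while a fully qualified `apiextensions.k8s.io/v1beta1` always resolves to one request, which matches the timing difference between the two tasks.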

Some simple test profiling:
Task 1

- name: Create a k8s namespace
  k8s:
    name: testing
    api_version: v1
    kind: Namespace
    context: dev

Time before: ~4.6 s
Time after: ~2.5-3 s

Task 2

...
apiVersion: extensions/v1beta1
kind: Deployment
...
[DETAILS left out bc not important]

Time before: ~4.6 s
Time after: ~1.2-2 s

asetty · 2018-10-03T15:03:29Z

Issue mentioned here:
ansible/ansible#31650

fabianvf

Looks good, had a few questions but nothing major. Will try to pull this down and look at it more closely soon (I'm at Ansiblefest right now so it's a little crazy)

openshift/dynamic/client.py

fabianvf · 2018-10-03T15:03:05Z

openshift/dynamic/client.py

@@ -377,20 +312,6 @@ def __init__(self, parent, **kwargs):
        self.verbs = kwargs.pop('verbs', None)
        self.extra_args = kwargs

-    #TODO(fabianvf): Determine proper way to handle differences between resources + subresources


Was this intentionally removed?

Nope, will add back

fabianvf · 2018-10-03T15:08:52Z

openshift/dynamic/client.py

+            if not resourcePart:
+                return []
+            elif isinstance(resourcePart, self.ResourceGroup):
+                assert len(reqParams) == 2, "prefix and group params should be present, have %s" % reqParams


Could you raise an Exception (maybe ValueError or something like that) rather than an assertion error?
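A minimal sketch of what that could look like, assuming the check is pulled into a helper. `check_request_params` is a hypothetical name, not a function in the PR. One practical difference from `assert` is that the check still runs when Python is started with `-O`, which strips assertions.

```python
def check_request_params(req_params):
    """Validate that both the prefix and group params are present.

    Raises ValueError instead of relying on assert, so the check is not
    stripped when Python runs with the -O flag.
    """
    if len(req_params) != 2:
        # Wrap in a 1-tuple so a tuple argument is formatted as one value.
        raise ValueError(
            "prefix and group params should be present, have %s" % (req_params,)
        )
    return req_params
```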

fabianvf · 2018-10-03T15:12:07Z

/ok-to-test

fabianvf · 2018-10-03T15:14:58Z

openshift/dynamic/client.py

-    def __init__(self, resources):
-        self.__resources = resources
+    class ResourceGroup(object):
+        def __init__(self, preferred, resources={}):


You probably don't want to use a mutable default value for resources, I would default it to None and then do resources = resources or {}
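For illustration, a stripped-down version of the suggested pattern. This `ResourceGroup` is a simplified stand-in with only the two constructor arguments shown in the diff; the real class may hold more state.

```python
class ResourceGroup(object):
    """Simplified stand-in for the ResourceGroup in the diff above."""

    def __init__(self, preferred, resources=None):
        self.preferred = preferred
        # A `resources={}` default is created once and shared by every
        # instance; defaulting to None gives each instance its own dict.
        self.resources = resources or {}


first = ResourceGroup(True)
second = ResourceGroup(False)
first.resources["Pod"] = "v1"
# second.resources is still empty because the dicts are not shared.
```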

asetty · 2018-10-03T23:33:54Z

@fabianvf Not sure why the build failed when I just added a blank line... Nothing jumps out at me, but possibly a flaky test?

But, also it would be good if the behavioral tests would test all Discoverer types (currently EagerDiscoverer and LazyDiscoverer). I'm guessing you would more readily know how to add this? :)

willthames · 2018-10-04T16:02:28Z

I'm going to test this from the Ansible end.

willthames · 2018-10-04T16:39:13Z

My ansiblefest2018 tests running from Austin to EKS in Oregon go from 26-27 seconds before this change to 16-17 seconds with this change

fabianvf · 2018-10-04T16:54:04Z

@asetty Yeah it was a build flake, I've been looking into what's causing those, seems to be nitty race conditions for rolebinding creation...

Agree it would be good to have it test with different discoverer types, I think we should be able to parametrize the fixture that returns the client, I'll explore it a little and submit a PR to your branch if it works out.

fabianvf · 2018-10-04T18:55:39Z

@asetty, would you mind rebasing? I think the conflict is fairly minor, I just added *List versions of the resources to the list when it's being added.

fabianvf

Had a few more questions/comments, it looks like there might be a few more places we can squeeze out some performance without adding too much complexity. Let me know if I was wrong about anything.

I had one additional concern, which is that the EagerDiscoverer is iterable while the LazyDiscoverer is not (easy way to test that, the main function at the bottom of the file iterates over the discovered resources and prints them out, so you can just run ./openshift/dynamic/client.py and see that it currently raises an exception). Can you think of a good way to implement the __iter__ method for the LazyDiscoverer?

fabianvf · 2018-10-05T21:55:19Z

openshift/dynamic/client.py

+    def search(self, **kwargs):
+        return self.__search(self.__build_search(**kwargs), self.__resources, [])
+
+    def __search(self,  parts, resources, reqParams):


Not a merge blocker, but we may want to refactor this logic at some point, it's a little tough to follow with all the nesting.

fabianvf · 2018-10-05T21:59:11Z

openshift/dynamic/client.py

+
+        prefix = 'apis'
+        groups['apis'] = {}
+        groups_response = load_json(self.client.request('GET', '/{}'.format(prefix)))['groups']


You might be able to avoid this call entirely, or at least only make it when the api_groups property is accessed. If you assume that the group and api_version that the user passes in is accurate you could attempt to skip the groups call and go straight to the specific group, only going back and doing this logic if that request 404s.

This is true, but one thing that gets a little trickier is handling the preferred field because AFAIK I can only get this in the groups response. That's why I went ahead and did it at the beginning because it's just one request. Any thoughts on this? Maybe the preferred field isn't needed? Or we could ignore it in some way until we have to do the groups request when either api_groups is accessed or get 404 on a request.

My original thinking was that since it's one request and in many cases we won't have all of the required fields to request the resource directly we might as well do it at the beginning, but more optimization is always nice.

Yeah, I'm not sure how useful the preferred flag is tbh, I'm not sure it's really used. I would just ignore it for now.
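A rough sketch of the "go straight to the group, fall back on a 404" idea discussed in this thread. Everything here is an assumption made for illustration: `NotFound`, `FakeClient`, and `resources_for` are hypothetical, and the fake client only mimics the shape of the `/apis` group list response.

```python
class NotFound(Exception):
    """Hypothetical stand-in for the client's 404 error."""


class FakeClient(object):
    """Hypothetical in-memory client that records every requested path."""

    def __init__(self, responses):
        self.responses = responses
        self.requested = []

    def get(self, path):
        self.requested.append(path)
        if path not in self.responses:
            raise NotFound(path)
        return self.responses[path]


def resources_for(client, group, version):
    """Fetch resources directly; only list groups if that request 404s."""
    try:
        return client.get("/apis/%s/%s" % (group, version))["resources"]
    except NotFound:
        # Fall back to the groups request to see what is actually served.
        for entry in client.get("/apis")["groups"]:
            if entry["name"] == group:
                served = [v["version"] for v in entry["versions"]]
                raise ValueError("%s serves %s, not %s" % (group, served, version))
        raise ValueError("unknown API group %s" % group)
```

In the happy path this costs one request instead of two, and the groups request is only paid for when the user-supplied group or version turns out to be wrong.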

fabianvf · 2018-10-05T22:01:14Z

openshift/dynamic/client.py

+    def version(self):
+        return self.__version
+
+    def _load_server_info(self):


I would recommend making this lazy as well, just make the request for version information when the version property is accessed. That will save us two requests per instantiation, since usually the version isn't used.

Honestly I would go ahead and make this lazy just in this base implementation, there's really no need for it to be eager at all.

I thought about this as well, but ultimately didn't make it lazy: for EagerDiscoverer, we call version in the default_groups function to check for openshift resources in the cluster when setting up resources, and for LazyDiscoverer I was calling it in __setup_resources.
e.g.

if self.version.get('openshift'):
    groups['oapi'] = {'': {'v1': self.ResourceGroup(True)}}

I'm going to look at getting rid of the group requests in LazyDiscoverer until they are actually needed like you mentioned in your other comment, then there may be some benefit.

Yeah, that default groups logic is actually going to be broken in newer versions of openshift anyway, as I think they've removed the openshift version API entirely. Not really relevant to this PR though, just noting it.
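One way the lazy version lookup suggested in this thread could be sketched is a property that makes the request on first access and caches the result. `LazyVersion` and the `fetch_version` callable are hypothetical; the callable stands in for whatever requests `_load_server_info` makes.

```python
class LazyVersion(object):
    """Hypothetical holder that defers the version request until first use."""

    def __init__(self, fetch_version):
        # fetch_version is an assumed callable wrapping the version request(s).
        self._fetch_version = fetch_version
        self._version = None

    @property
    def version(self):
        # The request is made on first access only; later reads use the cache.
        if self._version is None:
            self._version = self._fetch_version()
        return self._version
```

Code paths that never read `version` then pay nothing, while callers such as the `default_groups` check quoted above trigger the request exactly once.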

asetty · 2018-10-30T21:44:22Z

Getting back to this finally... For the __iter__ method we could just implement it the same as EagerDiscoverer and we will only get the resources that have already been "discovered". Another option could be to do a full discovery when someone iterates over the LazyDiscoverer and take that cost then.

fabianvf · 2018-10-31T16:16:58Z

@asetty I think the best thing to do for __iter__ would be to get the group list, and iterate through each group yielding a resource at a time. If the iteration is interrupted, a full API walk will not have been performed, just the APIs we've crawled so far. I think that's probably the best way to do it lazily.

fabianvf · 2018-10-31T16:24:14Z

@asetty also looks like a stupid import merge conflict, should be a simple rebase

asetty · 2018-11-01T19:06:08Z

> @asetty I think the best thing to do for __iter__ would be to get the group list, and iterate through each group yielding a resource at a time. If the iteration is interrupted, a full API walk will not have been performed, just the APIs we've crawled so far. I think that's probably the best way to do it lazily.

To clarify: you're saying we should make the requests for the APIs we have yet to crawl s.t. we will have the same result as EagerDiscoverer?

fabianvf · 2018-11-01T19:26:44Z

Right, but we should just make the group requests one at a time (rather than all at once), so that if the loop is interrupted we don't do a full discovery, i.e.:

groups = discover_api_groups()
for group in groups:
    resources = resources_for_api_group(group)
    for resource in resources:
        yield resource

This will allow us to implement different strategies for discovering API resources i.e. all requests at beginning, completely lazy, background loading.
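The generator approach can be checked with a small runnable sketch. The two callables passed in are stand-ins for the group and resource requests (the names mirror the pseudocode in this comment but the bodies are made up), and `requests` records which groups were actually fetched.

```python
import itertools


def iter_resources(discover_groups, resources_for_group):
    """Yield resources one group at a time so unused groups are never fetched."""
    for group in discover_groups():
        for resource in resources_for_group(group):
            yield resource


requests = []  # records which groups were actually requested


def discover_groups():
    # Stand-in for the single groups request.
    return ["apps", "batch", "extensions"]


def resources_for_group(group):
    # Stand-in for one per-group resource request.
    requests.append(group)
    return ["%s/resource-%d" % (group, i) for i in range(2)]


# Stop after three resources: only the first two groups get requested.
first_three = list(
    itertools.islice(iter_resources(discover_groups, resources_for_group), 3)
)
```

Because the generator is only advanced as far as the consumer asks, breaking out of the loop early leaves the remaining groups undiscovered.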

fabianvf

One small thing about the cache integration, and a question about tests. Other than that I think it's good to go.

fabianvf · 2018-12-04T22:00:30Z

openshift/dynamic/client.py

-        self.__resources = resources
-        self.__client = client
+    # Special key used to mark when cache needs to be upated
+    UPDATE_KEY = "__needs_update__"


It might be cleaner to make an attribute on your discoverer like __update_cache and then override the _write_cache method to be something like:

if self.__update_cache:
    super(LazyDiscoverer, self)._write_cache()
    self.__update_cache = False

Then you can get rid of most of the checks below and just set self.__update_cache = True whenever you discover a new resource, and just call _write_cache at the end.
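A self-contained sketch of that dirty-flag pattern follows. The classes are simplified stand-ins, not the ones in the PR: the base `_write_cache` only counts calls instead of writing a file, `discover` is a hypothetical method, and a single leading underscore is used for the flag to keep the example free of name mangling.

```python
class Discoverer(object):
    """Simplified stand-in for the base class; counts cache writes."""

    def __init__(self):
        self.writes = 0  # stands in for real file I/O

    def _write_cache(self):
        self.writes += 1


class LazyDiscoverer(Discoverer):
    """Simplified stand-in that only writes the cache when it is dirty."""

    def __init__(self):
        super(LazyDiscoverer, self).__init__()
        self._update_cache = False
        self._resources = {}

    def discover(self, name, resource):
        # Hypothetical hook: mark the cache dirty only for new resources.
        if name not in self._resources:
            self._resources[name] = resource
            self._update_cache = True

    def _write_cache(self):
        if self._update_cache:
            super(LazyDiscoverer, self)._write_cache()
            self._update_cache = False
```

Callers can then invoke `_write_cache` unconditionally at the end of a lookup, and the write is skipped whenever nothing new was discovered.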

fabianvf · 2018-12-04T22:02:58Z

test/unit/test_resource_container.py

@@ -1,82 +0,0 @@
-import pytest


Is it possible to update these tests or add new ones for the new discoverer classes? I'm happy to follow up with another PR to add them if you don't want to muck about in pytest.

fabianvf · 2018-12-12T19:13:30Z

@asetty I'm going to hold off on merging any major PRs until this is in to avoid breaking you again, let me know if you need any help getting the last few comments resolved.

…penshift#220)
* Remove resource container and implement Discoverer class
  This will allow us to implement different strategies for discovering API resources i.e. all requests at beginning, completely lazy, background loading.
* Remove resource container unit test
* Add back line for ResourceList kind
* Add cache updating when resources are requested in __iter__ method
* Add back case where there is no match for _type field in JSONDecoder
* Change flag to for update cache to a field of Discoverer
* Add pytest unit test for discoverer
* Use generic get method for Discoverer subclasses
  Only the logic for search needs to change between discoverers, so the get function is now defined in the Discoverer class instead of in each subclass
  Signed-off-by: Fabian von Feilitzsch <[email protected]>
* Update unit tests for discoverers
  - Use request fixture to get the discoverer in the client fixture
  - Minor style changes
  Signed-off-by: Fabian von Feilitzsch <[email protected]>

openshift-ci-robot added the size/L and needs-ok-to-test labels Oct 3, 2018

fabianvf requested changes Oct 3, 2018

openshift-ci-robot removed the needs-ok-to-test label Oct 3, 2018

fabianvf requested changes Oct 3, 2018

asetty force-pushed the lazy-load-resources branch from 2541bc8 to 80ec23a Compare October 5, 2018 00:01

fabianvf requested changes Oct 5, 2018

openshift-bot added the needs-rebase label Nov 6, 2018

asetty force-pushed the lazy-load-resources branch from 7185315 to 996cb4e Compare November 7, 2018 21:04

openshift-bot removed the needs-rebase label Nov 7, 2018

fabianvf mentioned this pull request Nov 19, 2018

add automatic caching for discovery requests, refreshing on a miss #238

Merged

asetty force-pushed the lazy-load-resources branch 2 times, most recently from 038eee9 to 50bf150 Compare November 27, 2018 22:57

openshift-bot added the needs-rebase label Nov 27, 2018

asetty force-pushed the lazy-load-resources branch from 50bf150 to c034af3 Compare November 27, 2018 23:18

asetty added 2 commits November 30, 2018 16:48

Remove resource container and implement Discoverer class

47e54c0

This will allow us to implement different strategies for discovering API resources i.e. all requests at beginning, completely lazy, background loading.

Remove resource container unit test

c51a821

asetty force-pushed the lazy-load-resources branch from c034af3 to c51a821 Compare December 1, 2018 00:48

openshift-bot removed the needs-rebase label Dec 1, 2018

asetty added 2 commits December 1, 2018 12:06

Add back line for ResourceList kind

d533cd6

Add cache updating when resources are requested in __iter__ method

464f43b

openshift-ci-robot added the size/XL label and removed the size/L label Dec 3, 2018

Add back case where there is no match for _type field in JSONDecoder

4132815

fabianvf requested changes Dec 4, 2018

Change flag to for update cache to a field of Discoverer

12aa2d1

openshift-ci-robot added the size/L label and removed the size/XL label Dec 12, 2018

Add pytest unit test for discoverer

8f986d8

asetty force-pushed the lazy-load-resources branch from f8ad9da to 8f986d8 Compare December 13, 2018 00:22

fabianvf added 2 commits December 13, 2018 13:08

Use generic get method for Discoverer subclasses

1e3eb5c

Only the logic for search needs to change between discoverers, so the get function is now defined in the Discoverer class instead of in each subclass Signed-off-by: Fabian von Feilitzsch <[email protected]>

Update unit tests for discoverers

015edef

- Use request fixture to get the discoverer in the client fixture - Minor style changes Signed-off-by: Fabian von Feilitzsch <[email protected]>

fabianvf approved these changes Dec 13, 2018

Merge pull request #1 from fabianvf/lazy-load-resources

427e522

Fix tests for discoverers

fabianvf merged commit 307dd81 into openshift:master Dec 13, 2018
