-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Discoverer class and lazy loading method for API resources #220
Conversation
Issue mentioned here: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, had a few questions but nothing major. Will try to pull this down and look at it more closely soon (I'm at Ansiblefest right now so it's a little crazy)
openshift/dynamic/client.py
Outdated
@@ -377,20 +312,6 @@ def __init__(self, parent, **kwargs): | |||
self.verbs = kwargs.pop('verbs', None) | |||
self.extra_args = kwargs | |||
|
|||
#TODO(fabianvf): Determine proper way to handle differences between resources + subresources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this intentionally removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, will add back
openshift/dynamic/client.py
Outdated
if not resourcePart: | ||
return [] | ||
elif isinstance(resourcePart, self.ResourceGroup): | ||
assert len(reqParams) == 2, "prefix and group params should be present, have %s" % reqParams |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you raise an Exception (maybe ValueError or something like that) rather than an assertion error?
/ok-to-test |
openshift/dynamic/client.py
Outdated
def __init__(self, resources): | ||
self.__resources = resources | ||
class ResourceGroup(object): | ||
def __init__(self, preferred, resources={}): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably don't want to use a mutable default value for resources, I would default it to None and then do resources = resources or {}
@fabianvf Not sure why the build failed when I just added a blank line... Nothing jumps out at me, but possibly a flaky test? But, also it would be good if the behavioral tests would test all Discoverer types (currently EagerDiscoverer and LazyDiscoverer). I'm guessing you would more readily know how to add this? :) |
I'm going to test this from the Ansible end. |
My ansiblefest2018 tests running from Austin to EKS in Oregon go from 26-27 seconds before this change to 16-17 seconds with this change |
@asetty Yeah it was a build flake, I've been looking into what's causing those, seems to be nitty race conditions for rolebinding creation... Agree it would be good to have it test with different discoverer types, I think we should be able to parametrize the fixture that returns the client, I'll explore it a little and submit a PR to your branch if it works out. |
@asetty, would you mind rebasing? I think the conflict is fairly minor, I just added *List versions of the resources to the list when it's being added. |
2541bc8
to
80ec23a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a few more questions/comments, it looks like there might be a few more places we can squeeze out some performance without adding too much complexity. Let me know if I was wrong about anything.
I had one additional concern, which is that the EagerDiscoverer is iterable while the LazyDiscoverer is not (easy way to test that, the main function at the bottom of the file iterates over the discovered resources and prints them out, so you can just run ./openshift/dynamic/client.py and see that it currently raises an exception). Can you think of a good way to implement the __iter__
method for the LazyDiscoverer?
def search(self, **kwargs): | ||
return self.__search(self.__build_search(**kwargs), self.__resources, []) | ||
|
||
def __search(self, parts, resources, reqParams): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a merge blocker, but we may want to refactor this logic at some point, it's a little tough to follow with all the nesting.
openshift/dynamic/client.py
Outdated
|
||
prefix = 'apis' | ||
groups['apis'] = {} | ||
groups_response = load_json(self.client.request('GET', '/{}'.format(prefix)))['groups'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be able to avoid this call entirely, or at least only make it when the api_groups property is accessed. If you assume that the group and api_version that the user passes in is accurate you could attempt to skip the groups call and go straight to the specific group, only going back and doing this logic if that request 404s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true, but one thing that gets a little trickier is handling the preferred field because AFAIK I can only get this in the groups response. That's why I went ahead and did it at the beginning because it's just one request. Any thoughts on this? Maybe the preferred field isn't needed? Or we could ignore it in some way until we have to do the groups request when either api_groups is accessed or get 404 on a request.
My original thinking was that since it's one request and in many cases we won't have all of the required fields to request the resource directly we might as well do it at the beginning, but more optimization is always nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not sure how useful the preferred flag is tbh, I'm not sure it's really used. I would just ignore it for now.
def version(self): | ||
return self.__version | ||
|
||
def _load_server_info(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend making this lazy as well, just make the request for version information when the version property is accessed. That will save us two requests per instantiation, since usually the version isn't used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I would go ahead and make this lazy just in this base implementation, there's really no need for it to be eager at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this as well, but ultimately didn't make it lazy because for EagerDsicoverer, We call version to check if for openshift resources in the cluster in the default_groups
function when setting up resources and for LazyDiscoverer I was calling it in __setup_resources
.
e.g.
if self.version.get('openshift'):
groups['oapi'] = { '': {
'v1': self.ResourceGroup(True)
}}
I'm going to look at getting rid of the group requests in LazyDiscoverer until they are actually needed like you mentioned in your other comment, then there may be some benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that default groups logic is actually going to be broken in newer versions of openshift anyway, as I think they've removed the openshift version API entirely. Not really relevant to this PR though, just noting it.
Getting back to this finally... For the |
@asetty I think the best thing to do for |
@asetty also looks like a stupid import merge conflict, should be a simple rebase |
To clarify: you're saying we should make the requests for the APIs we have yet to crawl s.t. we will have the same result as EagerDiscoverer? |
Right, but we should just make the group requests one at a time (rather than all at once), so that if the loop is interrupted we don't do a fill discovery. ie:
|
7185315
to
996cb4e
Compare
038eee9
to
50bf150
Compare
50bf150
to
c034af3
Compare
This will allow us to implement different strategies for discovering API resources i.e. all requests at beginning, completely lazy, background loading.
c034af3
to
c51a821
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small thing about the cache integration, and a question about tests. Other than that I think it's good to go.
openshift/dynamic/client.py
Outdated
self.__resources = resources | ||
self.__client = client | ||
# Special key used to mark when cache needs to be upated | ||
UPDATE_KEY = "__needs_update__" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be cleaner to make an attribute on your discoverer like __update_cache
and then override the _write_cache
method to be something like:
if self.__update_cache:
super(LazyDiscoverer, self)._write_cache()
self.__update_cache = False
Then you can get rid of most of the checks below and just set self.__update_cache = True
whenever you discover a new resource, and just call _write_cache
at the end.
test/unit/test_resource_container.py
Outdated
@@ -1,82 +0,0 @@ | |||
import pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to update these tests or add new ones for the new discoverer classes? I'm happy to follow up with another PR to add them if you don't want to muck about in pytest.
@asetty I'm going to hold off on merging any major PRs until this is in to avoid breaking you again, let me know if you need any help getting the last few comments resolved. |
f8ad9da
to
8f986d8
Compare
Only the logic for search needs to change between discoverers, so the get function is now defined in the Discoverer class instead of in each subclass Signed-off-by: Fabian von Feilitzsch <[email protected]>
- Use request fixture to get the discoverer in the client fixture - Minor style changes Signed-off-by: Fabian von Feilitzsch <[email protected]>
Fix tests for discoverers
…penshift#220) * Remove resource container and implement Discoverer class This will allow us to implement different strategies for discovering API resources i.e. all requests at beginning, completely lazy, background loading. * Remove resource container unit test * Add back line for ResourceList kind * Add cache updating when resources are requested in __iter__ method * Add back case where there is no match for _type field in JSONDecoder * Change flag to for update cache to a field of Discoverer * Add pytest unit test for discoverer * Use generic get method for Discoverer subclasses Only the logic for search needs to change between discoverers, so the get function is now defined in the Discoverer class instead of in each subclass Signed-off-by: Fabian von Feilitzsch <[email protected]> * Update unit tests for discoverers - Use request fixture to get the discoverer in the client fixture - Minor style changes Signed-off-by: Fabian von Feilitzsch <[email protected]>
…penshift#220) * Remove resource container and implement Discoverer class This will allow us to implement different strategies for discovering API resources i.e. all requests at beginning, completely lazy, background loading. * Remove resource container unit test * Add back line for ResourceList kind * Add cache updating when resources are requested in __iter__ method * Add back case where there is no match for _type field in JSONDecoder * Change flag to for update cache to a field of Discoverer * Add pytest unit test for discoverer * Use generic get method for Discoverer subclasses Only the logic for search needs to change between discoverers, so the get function is now defined in the Discoverer class instead of in each subclass Signed-off-by: Fabian von Feilitzsch <[email protected]> * Update unit tests for discoverers - Use request fixture to get the discoverer in the client fixture - Minor style changes Signed-off-by: Fabian von Feilitzsch <[email protected]>
@fabianvf Here's the pull request, testing and comments from your side are very welcome :)
Motivation is for certain use cases (e.g. ansible) the dynamic client is recreated often and many times we are only dealing with a single resource type. Discovering the entire API in these cases is wasteful so we try to minimize requests to those that are needed. The request to discover groups is always made because I could not think of a good way to avoid this. Still we see a nice speedup in all cases which allows us to sit on our hands a couple seconds less when deploying with ansible. :)
Performance varies for different resource types as if we have a manifest such as:
We need to request the resources for all groups that have a v1 version.
If we have the group and version such as
we only need to request resources for group
apiextensions.k8s.io
, versionv1beta1
.Some simple test profiling:
Task 1
Time before: ~4.6 s
Time after: ~2.5-3 s
Task 2
Time before: ~4.6
Time after: ~1.2-2s