-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvement/revamp cursor iteration #44
base: master
Are you sure you want to change the base?
Improvement/revamp cursor iteration #44
Conversation
…alue set by the user is not lost. Also, fix a bug where a second iteration on the cursor would results in no content at all, because of self.retrieved not being reset. This is not an useful information outside of the iteration algo so self.retrieved was made local to reduce impact. Adding .vscode in .gitignore
…sts/helpers.py. The last page generated was not getting the right start_element and causing a StopIteration when the generated collection was assigned as a side_effect on a cursor.
…o made. The logic handling skip (if defined) and limit (if defined) has been transfered from __iter__ to iter_pages to avoid unecessary round-trips with AppNexus API. Add unit tests around that logic and revamp of the helpers that generate collections when mocking client.get results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good! There are a few things I do not understand in the tests, but that doesn't look like a big deal.
count = page["count"] | ||
start_element = start_element + page["num_elements"] | ||
num_elements = min(page["count"] - num_elements, self.batch_size) | ||
count = min(page["count"], self._skip + self._limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just keep num_elements
, and entirely remove count
from the method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't address this comment.
I remember trying hard to handle all corners cases and also avoiding all these intermediate variables and always bumping into a problem or having a pretty unreadable code.
I think the code is both functional and maintanable as is so I would keep it that way.
tests/helpers.py
Outdated
def gen_ordered_collection(start_element, count, object_type="campaigns"): | ||
return gen_collection( | ||
object_generator_func=lambda index: {"id": index}, | ||
start_element=start_element, count=count, object_type=object_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it would be more readable/maintainable to add a random: bool
parameter to gen_collection
and gen_page
than having these two very similar functions, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, changes made
tests/helpers.py
Outdated
start_element=start_element + i * 100, | ||
num_elements=volume % 100) | ||
result.append(page) | ||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you keep gen_collection
under gen_page
so that we can see the actual diff? FYI, the functions are ordered that way because we usually order the functions "à la C", i.e. with functions used in other functions at the top, although this is not a strict convention nor something important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/cursor.py
Outdated
client = AppNexusClient("test", "test") | ||
mocker.patch.object(client, "get") | ||
client.get.side_effect = ordered_response_dict * 2 | ||
return Cursor(client, "campaign", representations.raw) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixture doesn't seem used anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
tests/cursor.py
Outdated
|
||
|
||
def test_skip_none(mocker): | ||
cursor = mock_ordered_cursor(mocker, start=0, count=COLLECTION_SIZE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you use this rather than your ordered_cursor
fixture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No good reason in this case, updated
def test_skip_ten(mocker): | ||
skip = 10 | ||
cursor = mock_ordered_cursor(mocker, start=skip, count=COLLECTION_SIZE) | ||
cursor.skip(skip) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure to understand why you mock_ordered_cursor(..., start=skip, ...)
and then cursor.skip(skip)
, can you explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of this test and the ones following is to validate that the cursor is iterating over pages properly taking skip
and limit
parmeters into account. These parameters are setting up what is going to be asked of the API, not doing some post-request manipulation of the results.
The start
, count
or factor
arguments given to mock_ordered_cursor
are used to simulate the expected response from the API. Therefore, we need to align what we want (ie. setting up the cursor) with what is expected to be returned by the client.get
method that we are patching, feeding gen_collection
results to it.
I think the tests are ok as is (used them to validate the algo changes) though I admit that they could be a bit better.
One potentially better approach would be to actually monkey patch request.get
and similar to change the raw results based on the query params of the url requested. I felt this was a bit overkill for what I wanted to achieve here.
appnexus/cursor.py
Outdated
@@ -39,21 +38,11 @@ def __getitem__(self, idx): | |||
|
|||
def __iter__(self): | |||
"""Iterate over all AppNexus objects matching the specifications""" | |||
retrieved = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment from @rambobinator
The retrieved
variable is not used anymore, remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable removed in last commits
Address issue #41, #42 and #43.
I tried to work around the current design in
master
to fix all corner cases but was unable to do that and, at the same time, actually taking into account the skip/limit at the url level.Most 'limited' fixes (changing as little code as possible) would have required some sort of duplication of code between
cursor.__iter__
tocursor.iter_pages
and shared responsibilities of the skip/limit logic. I felt that it was cleaner to centralize that logic in one place and that this place had to be the one closer to the api calls.Therefore, the skip/limit logic has been moved from
cursor.__iter__
tocursor.iter_pages
so that query parameters used to query AppNexus are actually impacted by the user's configuration.