[PP-1358] add new events #2174
Conversation
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2174   +/-   ##
=======================================
  Coverage   91.09%   91.09%
=======================================
  Files         363      363
  Lines       41215    41240     +25
  Branches     8830     8831      +1
=======================================
+ Hits        37544    37568     +24
- Misses       2406     2407      +1
  Partials     1265     1265
Force-pushed from 4020690 to a1700c8
I'm a bit concerned about the performance impacts of this change. I'd like to understand that a bit better before this one goes in.
alembic/versions/20241119_58b0ae7f5b67_make_loan_patron_id_and_hold_patron_id_.py
-def remove_expired_holds_for_collection(db: Session, collection_id: int) -> int:
+def remove_expired_holds_for_collection(
+    db: Session, collection_id: int, analytics: Analytics
+) -> int:
Do you have a sense of how much slower this function is now? I spent some time optimizing it, so previously it was very fast, since it was just running a single delete query on fairly well indexed data. Now we are doing a select, iterating through each row, and pushing a file to s3 for each row. Seems like that has the potential to be a lot slower.
I ask because I've been trying to keep the run time of any one celery task under two minutes, so eventually we can run these tasks on AWS spot instances, and everything becomes a lot easier if we are able to complete a task within the shutdown window.
This function gets called within a single task for every collection on a CM, see my note below.
    Hold.end < utc_now(),
    Hold.license_pool_id == LicensePool.id,
    LicensePool.collection_id == collection_id,
)
Since this is done in two steps, what we log in analytics could get out of sync with the DB, right? I'd suggest making this into a select-for-update query, then doing the delete in the for loop where we also log the analytics events.
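The suggestion above can be sketched roughly as follows. This is a minimal, hypothetical illustration using the stdlib sqlite3 module so it stays self-contained; the real code uses SQLAlchemy, where the select would carry `.with_for_update()` to lock the rows. Table, column, and event names here are made up for the sketch.

```python
import sqlite3


def remove_expired_holds(conn, collection_id, log_event):
    # Select the expired holds, then delete each row and log its
    # analytics event inside the same transaction, so the events can
    # never drift from what was actually deleted.
    rows = conn.execute(
        "SELECT id, patron_id FROM holds"
        " WHERE collection_id = ? AND end < datetime('now')",
        (collection_id,),
    ).fetchall()
    for hold_id, patron_id in rows:
        conn.execute("DELETE FROM holds WHERE id = ?", (hold_id,))
        log_event("hold_expired", patron_id)  # logged alongside the delete
    conn.commit()
    return len(rows)
```

If the event logging raises, the transaction never commits and neither the deletes nor the events take effect, which is the consistency property being asked for.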
@@ -123,7 +153,9 @@ def remove_expired_holds(task: Task) -> None:
     ]
     for collection_id, collection_name in collections:
         with task.transaction() as session:
-            removed = remove_expired_holds_for_collection(session, collection_id)
+            removed = remove_expired_holds_for_collection(
Here is where the function I noted above gets called. So within a single task it iterates over all the collections in a CM. Since the function completes quickly now, it's not a concern, but I'm a bit worried that with these changes it might be slow enough that we need a different approach here.
Can you do some testing to get an idea, with a CM like CA or CT, of how long the updated function will take if we have a lot of holds outstanding? If it still completes reasonably fast then these changes are fine, but if it is now taking more than 30-45 seconds, I think we should consider either re-queuing the task after a certain amount of work, or only processing one collection per task, or something like that to speed it up.
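The "re-queue after a certain amount of work" idea can be sketched celery-free like this; all names are illustrative, and in the real task the remainder would be handed back via something like `apply_async` rather than a plain callback.

```python
def process_with_budget(collections, process_one, requeue, budget=3):
    """Process at most `budget` collections, then hand the rest back.

    Each task invocation does a bounded amount of work so it finishes
    well inside the shutdown window; any remaining collections are
    re-enqueued as a fresh task instead of extending this one.
    """
    remaining = list(collections)
    done = 0
    while remaining and done < budget:
        process_one(remaining.pop(0))
        done += 1
    if remaining:
        # In celery this would be roughly:
        #     task.apply_async(args=[remaining])
        requeue(remaining)
    return done
```

The other option mentioned, one collection per task, is the degenerate case of this pattern with a budget of one.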
@@ -200,8 +232,12 @@ def recalculate_hold_queue_collection(
             f"Skipping license pool {license_pool_id} because it no longer exists."
         )
         continue

+    analytics = task.services.analytics.analytics()
Minor: No need to create this service on every loop iteration; it can be moved outside of the for loop.
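The hoisting being suggested, as a minimal sketch with made-up names: a `make_analytics` factory stands in for `task.services.analytics.analytics()`, and `handle` stands in for the per-pool work.

```python
def recalculate_naive(pool_ids, make_analytics, handle):
    # Pattern flagged above: the service is created on every iteration.
    for pool_id in pool_ids:
        analytics = make_analytics()
        handle(pool_id, analytics)


def recalculate_hoisted(pool_ids, make_analytics, handle):
    # Suggested fix: create the service once and reuse it in the loop.
    analytics = make_analytics()
    for pool_id in pool_ids:
        handle(pool_id, analytics)
```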
@@ -880,11 +880,18 @@ def run_once(self, *args, **kwargs):
         count = qu.count()
         self.log.info("Deleting %d row(s)", count)
         while count > 0:
+            post_delete_ops = []
Is this being used for anything? I see a for loop below that references it, but I don't see anywhere that data gets added to it.
@@ -897,6 +904,9 @@ def delete(self, row):
         """
         self._db.delete(row)

+    def post_delete(self, row) -> None:
Minor: Why add a new function for this? Can't LoanlikeReaperMonitor just override delete, call the super method, and then do whatever it needs to do?
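The override-plus-super shape being suggested, sketched with stand-in classes; in the real code the base `delete` calls `self._db.delete(row)`, and the follow-up work would be emitting the analytics event rather than appending to a list.

```python
class ReaperMonitor:
    """Stand-in for the base monitor; delete() does the actual removal."""

    def __init__(self):
        self.deleted = []

    def delete(self, row):
        self.deleted.append(row)  # real code: self._db.delete(row)


class LoanlikeReaperMonitor(ReaperMonitor):
    """Overrides delete instead of adding a separate post_delete hook."""

    def __init__(self):
        super().__init__()
        self.events = []

    def delete(self, row):
        super().delete(row)      # base behavior first
        self.events.append(row)  # then the loan/hold-specific follow-up
```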
assert True == is_new
assert CirculationEvent.NEW_PATRON == analytics.event_type
assert 1 == analytics.count
assert is_new == True
Minor: If we are updating these we should update boolean checks to be strict as well:
assert is_new is True
assert "Achewood" == patron.neighborhood
assert 1 == analytics.count
assert patron.authorization_identifier == "2"
assert is_new == False
Minor: Same as above
assert is_new is False
converted to hold.
Force-pushed from a1700c8 to 69cd3e4
Description
This update captures five new circulation events:
I have a couple of questions/comments to inform the review:
We are capturing hold expirations where the CM is and is not the source of truth. However, for loans I believe we are only handling expired loans where the CM is not the source of truth. Did I miss something here? Or should we be doing something for loan expirations where the CM is the source of truth?
When we convert holds to loans, I capture both the conversion event and a "checkout", since it is a checkout event that happens to be a conversion. I think that's what we want, but others may disagree.
I put a note in the code, but I'll call it out again here: there appears to be a case where a loan can be converted to a hold. I'm not sure what the scenario is, but I've captured it as a distinct event in addition to the placing of the hold.
I ran into some mypy issues with the collect_event method. That led me to observe that all loans and holds have patrons: the intention seems to be that every loan and hold must have a patron, but the database does not enforce that constraint. I included the migration here, but I am happy to move it to a separate PR if that seems cleaner.
Motivation and Context
https://ebce-lyrasis.atlassian.net/browse/PP-1804
How Has This Been Tested?
New unit tests added. Older unit tests amended.
Checklist