This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Bulk-invalidate e2e cached queries after claiming keys #16613

Merged: 15 commits from dmr/bulk-cache-invalidation into develop, Nov 9, 2023

Conversation

@DMRobertson (Contributor) commented Nov 8, 2023:

As threatened in #16554 (comment)

History:

I wrote a dirty test for the new bulk invalidation that I'm not particularly happy with. Completely untested otherwise.

Nominating Erik because a) streams and b) PC helped me write this.

Comment on lines +503 to +504:

    for keys in key_tuples:
        txn.call_after(cache_func.invalidate, keys)
Member commented:

@DMRobertson wondered whether it would be better to do:

Suggested change:

    - for keys in key_tuples:
    -     txn.call_after(cache_func.invalidate, keys)
    + def invalidate():
    +     for keys in key_tuples:
    +         cache_func.invalidate(keys)
    + txn.call_after(invalidate)

But I don't think it matters much, since txn.call_after just calls them in a loop anyway? Maybe one way would use less memory. 🤷
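For context, here is a minimal model of how those after-callbacks might be queued and run. This is an assumption for illustration only: `FakeTransaction` is hypothetical, and Synapse's real `LoggingTransaction` is not reproduced here.

    # Hypothetical model, not Synapse's LoggingTransaction: call_after
    # queues (callback, args) pairs and runs them after the txn commits.
    from typing import Any, Callable, List, Tuple

    class FakeTransaction:
        def __init__(self) -> None:
            self._after_callbacks: List[Tuple[Callable[..., Any], Tuple[Any, ...]]] = []

        def call_after(self, callback: Callable[..., Any], *args: Any) -> None:
            self._after_callbacks.append((callback, args))

        def commit(self) -> None:
            # Either way the callbacks run sequentially in one loop, so N
            # small per-key callbacks and one callback that loops over all
            # keys do the same work; only the queue's footprint differs.
            for callback, args in self._after_callbacks:
                callback(*args)

Under this model the two suggested forms behave identically, which matches the "calls them in a loop anyway" observation.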

Member commented:

Should we update the non-bulk versions of these to actually require tuples?

Contributor (author) commented:

I was wondering if we should actually be requiring List[str], since the keys column is text[]. It'd save a tuple->list conversion here and there? 🤷

Member commented:

I think tuples are used because they're immutable, so they're what gets used as the cache keys?

Contributor (author) commented:

ahh, that makes sense!

I'd be in favour of specifying Tuple[str, ...], but in another PR.

Member commented:

Do they have to be strings? I'd be kind of surprised if we don't have other types in there?

Anyway, yes sounds like a different PR.
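As background on the immutability point above, a plain-Python illustration (the example user ID and device ID are made up; this is not Synapse code):

    # Tuples are hashable, so they can key a dict-backed cache.
    cache = {("@alice:example.com", "DEVICEID"): "a_cached_value"}
    print(cache[("@alice:example.com", "DEVICEID")])  # a_cached_value

    # Lists are mutable and unhashable, so they cannot be keys.
    try:
        cache[["@alice:example.com", "DEVICEID"]] = "nope"
    except TypeError as err:
        print(err)  # unhashable type: 'list'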

Comment on lines +767 to +768:

    # TODO Can we call this for just the last position or somehow batch
    # _add_persisted_position.
Member commented:

I think this is a question for @erikjohnston -- _add_persisted_position seems fairly heavy and if we could optimize what we call it with (or take multiple IDs at once) I think that'd be a win.

Member commented:

I think we certainly could make it take multiple IDs? It does expect to be called with every position (though I don't think anything would break, it would just be less efficient).

Member commented:

I guess if it has to be called with every position, maybe it isn't a huge deal...
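To make the "take multiple IDs at once" idea concrete, a purely hypothetical sketch: the name `_add_persisted_positions` and its body are assumptions, and the real `_add_persisted_position` internals are not shown in this thread.

    from typing import Iterable

    # Hypothetical batched variant, sketched for illustration only.
    def _add_persisted_positions(self, new_ids: Iterable[int]) -> None:
        # Every position is still fed through, as the discussion says the
        # method expects, but a single entry point would let any shared
        # bookkeeping happen once per batch rather than once per ID.
        for new_id in sorted(new_ids):
            self._add_persisted_position(new_id)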

@DMRobertson DMRobertson marked this pull request as ready for review November 9, 2023 11:34
@DMRobertson DMRobertson requested a review from a team as a code owner November 9, 2023 11:34
Review comment on changelog.d/16613.feature (outdated, resolved).
    @@ -671,14 +671,50 @@ def get_next_txn(self, txn: LoggingTransaction) -> int:

             return self._return_factor * next_id

    -    def _mark_id_as_finished(self, next_id: int) -> None:
    -        """The ID has finished being processed so we should advance the
    +    def get_next_mult_txn(self, txn: LoggingTransaction, n: int) -> List[int]:
Member commented:

I wonder if get_next_txn should call this instead of duplicating some of the logic? (Same kind of goes for the cache & stream functions above TBH)
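A sketch of that suggestion: the signatures come from the diff above, but the delegation below is an illustration, not the actual change.

    from typing import List

    # Implement the single-ID getter as a thin wrapper over the bulk one,
    # so the stream bookkeeping lives in one place.
    def get_next_txn(self, txn: "LoggingTransaction") -> int:
        # Ask the bulk variant for a batch of one and unwrap it.
        return self.get_next_mult_txn(txn, 1)[0]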

@erikjohnston (Member) left a comment:

Looks sane from my side, without having looked too closely

@DMRobertson DMRobertson merged commit 91587d4 into develop Nov 9, 2023
39 of 41 checks passed
@DMRobertson DMRobertson deleted the dmr/bulk-cache-invalidation branch November 9, 2023 15:57