This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Bulk-invalidate e2e cached queries after claiming keys #16613

Merged: 15 commits from dmr/bulk-cache-invalidation into develop, Nov 9, 2023

Conversation

@DMRobertson (Contributor) commented Nov 8, 2023:

As threatened in #16554 (comment)

History:

I wrote a dirty test for the new bulk invalidation that I'm not particularly happy with. Completely untested otherwise.

Nominating Erik because a) streams and b) PC helped me write this.

Comment on lines +503 to +504:

    for keys in key_tuples:
        txn.call_after(cache_func.invalidate, keys)
Member commented:

@DMRobertson wondered whether it would be better to do:

Suggested change:

    - for keys in key_tuples:
    -     txn.call_after(cache_func.invalidate, keys)
    + def invalidate():
    +     for keys in key_tuples:
    +         cache_func.invalidate(keys)
    + txn.call_after(invalidate)

But I don't think it matters much, since txn.call_after just calls them in a loop anyway? Maybe one way would use less memory. 🤷
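For context, here is a minimal model of how those after-callbacks might be queued and run. This is an assumption for illustration only: `FakeTransaction` is hypothetical, and Synapse's real `LoggingTransaction` is not reproduced here.

    # Hypothetical model, not Synapse's LoggingTransaction: call_after
    # queues (callback, args) pairs and runs them after the txn commits.
    from typing import Any, Callable, List, Tuple

    class FakeTransaction:
        def __init__(self) -> None:
            self._after_callbacks: List[Tuple[Callable[..., Any], Tuple[Any, ...]]] = []

        def call_after(self, callback: Callable[..., Any], *args: Any) -> None:
            self._after_callbacks.append((callback, args))

        def commit(self) -> None:
            # Either way the callbacks run sequentially in one loop, so N
            # small per-key callbacks and one callback that loops over all
            # keys do the same work; only the queue's footprint differs.
            for callback, args in self._after_callbacks:
                callback(*args)

Under this model the two suggested forms behave identically, which matches the "calls them in a loop anyway" observation.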

Member commented:

Should we update the non-bulk versions of these to actually require tuples?

Contributor (author) commented:

I was wondering if we should actually be requiring List[str], since the keys column is text[]. It'd save a tuple->list conversion here and there? 🤷

Member commented:

I think tuples are used because they're immutable, so they're what gets used as the cache keys?

Contributor (author) commented:

ahh, that makes sense!

I'd be in favour of specifying Tuple[str, ...], but in another PR.

Member commented:

Do they have to be strings? I'd be kind of surprised if we don't have other types in there?

Anyway, yes sounds like a different PR.
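As background on the immutability point above, a plain-Python illustration (the example user ID and device ID are made up; this is not Synapse code):

    # Tuples are hashable, so they can key a dict-backed cache.
    cache = {("@alice:example.com", "DEVICEID"): "a_cached_value"}
    print(cache[("@alice:example.com", "DEVICEID")])  # a_cached_value

    # Lists are mutable and unhashable, so they cannot be keys.
    try:
        cache[["@alice:example.com", "DEVICEID"]] = "nope"
    except TypeError as err:
        print(err)  # unhashable type: 'list'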

Comment on lines +767 to +768:

    # TODO Can we call this for just the last position or somehow batch
    # _add_persisted_position.
Member commented:

I think this is a question for @erikjohnston -- _add_persisted_position seems fairly heavy and if we could optimize what we call it with (or take multiple IDs at once) I think that'd be a win.

Member commented:

I think we certainly could make it take multiple IDs? It does expect to be called with every position (though I don't think anything would break, it would just be less efficient).

Member commented:

I guess if it has to be called with every position, maybe it isn't a huge deal...
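To make the "take multiple IDs at once" idea concrete, a purely hypothetical sketch: the name `_add_persisted_positions` and its body are assumptions, and the real `_add_persisted_position` internals are not shown in this thread.

    from typing import Iterable

    # Hypothetical batched variant, sketched for illustration only.
    def _add_persisted_positions(self, new_ids: Iterable[int]) -> None:
        # Every position is still fed through, as the discussion says the
        # method expects, but a single entry point would let any shared
        # bookkeeping happen once per batch rather than once per ID.
        for new_id in sorted(new_ids):
            self._add_persisted_position(new_id)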

@DMRobertson DMRobertson marked this pull request as ready for review November 9, 2023 11:34
@DMRobertson DMRobertson requested a review from a team as a code owner November 9, 2023 11:34
Review comment on changelog.d/16613.feature (outdated, resolved).
    @@ -671,14 +671,50 @@ def get_next_txn(self, txn: LoggingTransaction) -> int:

             return self._return_factor * next_id

    -    def _mark_id_as_finished(self, next_id: int) -> None:
    -        """The ID has finished being processed so we should advance the
    +    def get_next_mult_txn(self, txn: LoggingTransaction, n: int) -> List[int]:
Member commented:

I wonder if get_next_txn should call this instead of duplicating some of the logic? (Same kind of goes for the cache & stream functions above TBH)
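A sketch of that suggestion: the signatures come from the diff above, but the delegation below is an illustration, not the actual change.

    from typing import List

    # Implement the single-ID getter as a thin wrapper over the bulk one,
    # so the stream bookkeeping lives in one place.
    def get_next_txn(self, txn: "LoggingTransaction") -> int:
        # Ask the bulk variant for a batch of one and unwrap it.
        return self.get_next_mult_txn(txn, 1)[0]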

@erikjohnston (Member) left a comment:

Looks sane from my side, without having looked too closely

@DMRobertson DMRobertson merged commit 91587d4 into develop Nov 9, 2023
39 of 41 checks passed
@DMRobertson DMRobertson deleted the dmr/bulk-cache-invalidation branch November 9, 2023 15:57