Faster room joins: fix race in recalculation of current room state #13151

squahtx · 2022-07-01T10:48:58Z

When we finish un-partial stating all events in a room, we recalculate
the current state using forward extremities. This can race with
persistence of another event, which could result in an invalid current
room state in the database.

To avoid the race, we recalculate current room state in the same
queue as event persistence. The event persistence queue may be on
another worker, so a new replication endpoint is required as well.
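A minimal sketch of why sharing a queue removes the race. It assumes a single process and a hypothetical `PerRoomQueue` class built on `asyncio.Lock`. This is not Synapse's actual persistence queue, and it ignores the cross-worker replication hop. Because persisting events and recalculating current state for a room both go through the same per-room lock, a recalculation can no longer run while a persistence is half-finished.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, DefaultDict, List


# Hypothetical sketch: a per-room queue that runs one task at a time.
class PerRoomQueue:
    def __init__(self) -> None:
        # One lock per room, created lazily inside the running event loop.
        self._locks: DefaultDict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def run(self, room_id: str, task: Callable[[], Awaitable[None]]) -> None:
        # asyncio.Lock is fair (first waiter proceeds first), so tasks for the
        # same room run in submission order and never interleave.
        async with self._locks[room_id]:
            await task()


log: List[str] = []


async def persist_event(event_id: str) -> None:
    log.append(f"persist:{event_id}:start")
    await asyncio.sleep(0)  # yield to the event loop, as a database write would
    log.append(f"persist:{event_id}:end")


async def recalculate_current_state() -> None:
    log.append("recalc:start")
    await asyncio.sleep(0)  # stands in for reading forward extremities and resolving state
    log.append("recalc:end")


async def main() -> None:
    queue = PerRoomQueue()
    room_id = "!room:example.org"
    await asyncio.gather(
        queue.run(room_id, lambda: persist_event("$a")),
        queue.run(room_id, recalculate_current_state),
        queue.run(room_id, lambda: persist_event("$b")),
    )


asyncio.run(main())
print(log)
```

Each start entry in `log` is immediately followed by its own end entry, which is the property the fix is after. With multiple workers, the lock only helps if the work actually runs on the process that owns the room's queue. That is presumably why the change consults `events_shard_config` and adds a replication endpoint.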

Fixes #13007.

May be easiest to review commit by commit.

This moves us closer to fixing the race between recalculation of a room's current state and event persistence. The next step is to move recalculation of current state into the event persistence queue.

Signed-off-by: Sean Quah <[email protected]>

erikjohnston

The tests are failing, alas.

erikjohnston · 2022-07-05T08:11:00Z

synapse/replication/http/state.py

+    async def _serialize_payload(room_id: str) -> JsonDict:  # type: ignore[override]
+        return {}


I think you can drop this since returning an empty dict is the default implementation in the base class.

Python complains because the base implementation is an abstractmethod.
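The following is a small illustration of the abstract-method point, using hypothetical class names and Python's standard `abc` module. A subclass that does not override an `@abc.abstractmethod` stays abstract and cannot be instantiated, so even a trivial `return {}` override is required. The claim that `room_id` travels in the URL path rather than the body is an assumption of this sketch.

```python
import abc
import asyncio
from typing import Any, Dict

JsonDict = Dict[str, Any]


# Hypothetical base class standing in for a replication endpoint.
class EndpointBase(abc.ABC):
    @staticmethod
    @abc.abstractmethod
    async def _serialize_payload(*args: Any, **kwargs: Any) -> JsonDict:
        """Build the JSON request body; every subclass must implement this."""
        raise NotImplementedError


class MissingOverride(EndpointBase):
    pass  # does not override _serialize_payload, so the class stays abstract


class UpdateCurrentStateEndpoint(EndpointBase):
    @staticmethod
    async def _serialize_payload(room_id: str) -> JsonDict:  # type: ignore[override]
        # Trivial override: assuming room_id is sent in the URL path,
        # the request body itself can be empty.
        return {}


try:
    MissingOverride()
except TypeError as e:
    print("cannot instantiate:", e)

UpdateCurrentStateEndpoint()  # instantiates fine once the method is overridden
print(asyncio.run(UpdateCurrentStateEndpoint._serialize_payload("!room:example.org")))
```

The `# type: ignore[override]` is still needed because the override's signature is narrower than the base method's.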

erikjohnston · 2022-07-05T08:12:02Z

synapse/state/__init__.py

+        self._events_shard_config = hs.config.worker.events_shard_config
+        self._instance_name = hs.get_instance_name()
+
+        self._update_current_state = (


Might be worth naming this _client to make it obvious what the difference is between it and update_current_state?

erikjohnston · 2022-07-05T08:16:30Z

synapse/storage/controllers/persist_events.py

            end_item = queue[-1]
+            existing_task = queue[-1].task
+            # add our events to the existing queue item
+            existing_task.events_and_contexts.extend(task.events_and_contexts)


I wonder if it'd be cleaner to have a def try_update(..) -> bool function in _EventPersistQueueTask that encapsulates this logic? For _UpdateCurrentStateTask it'd simply always return false (i.e. update failed), and in _PersistEventsTask have this check? Though that might complicate things unnecessarily.

I had a go at this
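This is a sketch of the refactor suggested in this thread, reduced to plain Python. Apart from `_EventPersistQueueTask`, `_PersistEventsTask` and `_UpdateCurrentStateTask`, every name is invented for illustration. The method is called `try_merge` here, but the real method name may differ. The merge condition may also differ: this sketch assumes batches merge only when a hypothetical `backfilled` flag matches. Each queued task decides whether a newly queued task can be folded into it. A current-state recalculation always refuses, so it acts as a barrier between batches of persisted events.

```python
import abc
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical stand-in for the (event, context) pairs being persisted.
EventAndContext = Tuple[str, object]


class _EventPersistQueueTask(abc.ABC):
    @abc.abstractmethod
    def try_merge(self, task: "_EventPersistQueueTask") -> bool:
        """Fold `task` into this queued item if possible; return True on success."""


@dataclass
class _PersistEventsTask(_EventPersistQueueTask):
    events_and_contexts: List[EventAndContext] = field(default_factory=list)
    backfilled: bool = False  # hypothetical flag; merge rule below is assumed

    def try_merge(self, task: _EventPersistQueueTask) -> bool:
        if isinstance(task, _PersistEventsTask) and task.backfilled == self.backfilled:
            self.events_and_contexts.extend(task.events_and_contexts)
            return True
        return False


@dataclass
class _UpdateCurrentStateTask(_EventPersistQueueTask):
    def try_merge(self, task: _EventPersistQueueTask) -> bool:
        # A state recalculation is never combined with other work.
        return False


def add_to_queue(queue: List[_EventPersistQueueTask], task: _EventPersistQueueTask) -> None:
    # Only the last item is a merge candidate, so queue order is preserved.
    if queue and queue[-1].try_merge(task):
        return
    queue.append(task)


queue: List[_EventPersistQueueTask] = []
add_to_queue(queue, _PersistEventsTask([("$a", None)]))
add_to_queue(queue, _PersistEventsTask([("$b", None)]))
add_to_queue(queue, _UpdateCurrentStateTask())
add_to_queue(queue, _PersistEventsTask([("$c", None)]))
print([type(t).__name__ for t in queue])
```

Here `$a` and `$b` share one batch. `$c` lands in a new batch after the recalculation instead of being pulled in front of it.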

squahtx · 2022-07-05T15:03:52Z

TestRestrictedRoomsLocalJoin and TestSendJoinPartialStateResponse are known worker mode flakes: #13161

Sean Quah added 4 commits June 30, 2022 23:51

Move recalculation of current state into the event persistence queue

db0430a

Avoid races between event persistence and recalculation of a room's current state by putting them in the same queue. Signed-off-by: Sean Quah <[email protected]>

Give recalculation of a room's current state a real stream ordering

6738bbc

Signed-off-by: Sean Quah <[email protected]>

Add newsfile

af5735c

squahtx requested a review from a team as a code owner July 1, 2022 10:48

Fix tests

2479743

squahtx mentioned this pull request Jul 1, 2022

Allow presence transactions during faster room joins tests matrix-org/complement#402

Merged

erikjohnston reviewed Jul 5, 2022

Sean Quah added 3 commits July 5, 2022 14:15

Rename replication client to _update_current_state_client

e03565e

Factor out persistence queue task merging logic into its own method

7588ab5

Underscore unused parameters

c7d6d6e

squahtx requested a review from erikjohnston July 5, 2022 15:03

erikjohnston approved these changes Jul 7, 2022

squahtx enabled auto-merge (squash) July 7, 2022 11:50

Merge branch 'develop' into squah/faster_room_joins_fix_current_state_recalculation_race

f8743a9

squahtx merged commit 1391a76 into develop Jul 7, 2022

squahtx deleted the squah/faster_room_joins_fix_current_state_recalculation_race branch July 7, 2022 12:19

richvdh mentioned this pull request Oct 10, 2022

Faster joins: support worker-mode deployments #12994

Closed