Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Reduce event lookups during room creation by passing known event IDs #13210

Merged

Conversation

Fizzadar
Copy link
Contributor

@Fizzadar Fizzadar commented Jul 7, 2022

Been looking into speeding up room creation recently, in particular reducing the number of queries required. Inspired by the room batch handler this uses previous event inserts to pre-populate state events during room creation, reducing the number of queries required to create a room.

Signed off by Nick @ Beeper (@Fizzadar)

Pull Request Checklist

Inspired by the room batch handler this uses previous event inserts to
pre-populate state events during room creation, reducing the number of
queries required to create a room.
@Fizzadar Fizzadar requested a review from a team as a code owner July 7, 2022 10:27
Copy link
Contributor

@DMRobertson DMRobertson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand right, this saves a few calls to

  • get_prev_events_for_room
  • get_current_state_ids (so get_state_group_for_events, get_state_for_groups, get_state_group_delta)

Have you had a chance to measure this at all? Would love to know e.g. the change in the number of DB queries made!

synapse/handlers/room.py Outdated Show resolved Hide resolved
synapse/handlers/room.py Outdated Show resolved Hide resolved
synapse/handlers/room.py Outdated Show resolved Hide resolved
@Fizzadar
Copy link
Contributor Author

Fizzadar commented Jul 7, 2022

@DMRobertson thanks for looking into this! Yes was doing some local testing on this - seeing a reduction of around 6 queries per createRoom request on a blank/empty monolith synapse, totalling 48 vs. 54 before, so ~10% maybe; was hoping for more!

I think there's actually a few more to be reduced in the subsequent name/topic/member events inside the create_room function itself, but that feels like a separate PR (can do here if preferred).

I can't replicate the complement failures locally which is annoying, even running with workers configured, any pointers here? EDIT: just seen #13161!

@DMRobertson
Copy link
Contributor

I can't replicate the complement failures locally which is annoying, even running with workers configured, any pointers here? EDIT: just seen #13161!

Yeah, I would basically ignore any complement errors that happen only on the complement-with-workers job.

@DMRobertson
Copy link
Contributor

I think there's actually a few more to be reduced in the subsequent name/topic/member events inside the create_room function itself, but that feels like a separate PR (can do here if preferred).

I did wonder if it was possible to create the events all at once, in memory, and then persist them simultaneously in one big batch. This certainly seems like a step towards that though!

Copy link
Contributor

@DMRobertson DMRobertson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! LGTM but I'd just like to double-check and get a second opinion. Could another pair of eyes from @matrix-org/synapse-core take a look?

@DMRobertson DMRobertson requested a review from a team July 7, 2022 18:42
@Fizzadar
Copy link
Contributor Author

Fizzadar commented Jul 8, 2022

I did wonder if it was possible to create the events all at once, in memory, and then persist them simultaneously in one big batch. This certainly seems like a step towards that though!

This is the dream! The batch sender doesn't currently do this either, looks like there's a bunch of work to do on the event persister to support this, but it would be a dramatic improvement for sure!

Copy link
Contributor

@squahtx squahtx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seeing a reduction of around 6 queries per createRoom request on a blank/empty monolith synapse, totalling 48 vs. 54 before, so ~10% maybe; was hoping for more!

Would you be able to test on a worker deployment with a dedicated event persister? And/or get a list of all the sql queries createRoom runs?

Reading through the code, it looks like we'll now call get_events_as_list(state_event_ids) when creating each event instead, and the efficiency of that relies on having the previously persisted event in the event cache. I think only the event persister worker will have newly sent events cached, so there'll be an additional number of send() calls - 1 db requests with workers compared to a monolith.

The code change itself looks fine, I just wonder whether it's effective!

This actually increases the number of queries executed.
@Fizzadar
Copy link
Contributor Author

Reading through the code, it looks like we'll now call get_events_as_list(state_event_ids) when creating each event instead, and the efficiency of that relies on having the previously persisted event in the event cache. I think only the event persister worker will have newly sent events cached, so there'll be an additional number of send() calls - 1 db requests with workers compared to a monolith.

The code change itself looks fine, I just wonder whether it's effective!

Great shout! The number of queries does indeed increase for each send in workers mode - this still comes out lower than the baseline of develop branch with workers. Inspired by your comment I tried removing passing state event IDs around and it yields a further improvement - change: 0218814.

Results (two create rooms per run, for cold/hot cache results):

develop baseline:
monolith - 56, 50
workers - 63, 57

With state event IDs provided (PR before comment):
monolith - 50, 45
workers - 57, 52

Without state event IDs prvoided (latest commit push):
monolith - 44, 39
workers - 51, 46

So an even greater saving now! Feels odd that explicitly providing state IDs is less efficient...

@DMRobertson
Copy link
Contributor

Reading through the code, it looks like we'll now call get_events_as_list(state_event_ids) when creating each event instead

Due to this snippet?

state_events = await self.store.get_events_as_list(state_event_ids)
# Create a StateMap[str]
state_map = {(e.type, e.state_key): e.event_id for e in state_events}

If so, I wonder if we should be passing in a state_map instead of a state_event_ids list to create_new_client_event. I.e. force the caller to do the deduplication. See also this thread: #13210 (comment) .

Fizzadar added a commit to Fizzadar/synapse that referenced this pull request Jul 11, 2022
This yields a further improvement as we don't need to lookup those
events, instead they get pulled from state.

See discussion on: matrix-org#13210
@squahtx
Copy link
Contributor

squahtx commented Jul 11, 2022

Reading through the code, it looks like we'll now call get_events_as_list(state_event_ids) when creating each event instead

Due to this snippet?

state_events = await self.store.get_events_as_list(state_event_ids)
# Create a StateMap[str]
state_map = {(e.type, e.state_key): e.event_id for e in state_events}

That's the one!

If so, I wonder if we should be passing in a state_map instead of a state_event_ids list to create_new_client_event. I.e. force the caller to do the deduplication. See also this thread: #13210 (comment) .

Sounds plausible. Off the top of the my head, I don't remember all the places which explicitly pass state_event_ids, so can't comment on the feasibility.

Copy link
Contributor

@squahtx squahtx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Results (two create rooms per run, for cold/hot cache results):

develop baseline:
monolith - 56, 50
workers - 63, 57

With state event IDs provided (PR before comment):
monolith - 50, 45
workers - 57, 52

Without state event IDs prvoided (latest commit push):
monolith - 44, 39
workers - 51, 46

Those are some really nice numbers!

To reduce the likelihood of a regression, can we add a comment in async def send() explaining why we don't pass state_event_ids?

@clokep
Copy link
Member

clokep commented Jul 11, 2022

To reduce the likelihood of a regression

We could also update the unit tests to assert a number of queries if we wanted (using the resource_usage property of the channel).

@Fizzadar
Copy link
Contributor Author

Fizzadar commented Jul 11, 2022

Comment added in cf1cf26. Would it be appropriate just to assert channel.resource_usage.db_txn_count in the RoomsCreateTestCase.test_post_room_no_keys test? Or all the create room tests perhaps?

TESTS! fd9b836 & 27d8298

Copy link
Contributor

@squahtx squahtx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for adding regression tests!

@squahtx
Copy link
Contributor

squahtx commented Jul 11, 2022

I'm going to merge, since TestRestrictedRoomsLocalJoinInMSC3787Room is a known complement flake on workers

@squahtx squahtx enabled auto-merge (squash) July 11, 2022 16:59
@squahtx squahtx disabled auto-merge July 11, 2022 16:59
@squahtx squahtx merged commit 92202ce into matrix-org:develop Jul 11, 2022
@Fizzadar Fizzadar deleted the optimise-room-creation-event-lookups branch July 11, 2022 17:02
Fizzadar added a commit to beeper/synapse-legacy-fork that referenced this pull request Aug 23, 2022
…atrix-org#13210)

Inspired by the room batch handler, this uses previous event inserts to
pre-populate prev events during room creation, reducing the number of
queries required to create a room.

Signed off by Nick @ Beeper (@Fizzadar)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants