Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Faster joins: Update room stats once state re-sync completes #12814

Closed
1 task done
Tracked by #14030
squahtx opened this issue May 20, 2022 · 6 comments · Fixed by matrix-org/complement#585 or #14874
Closed
1 task done
Tracked by #14030

Faster joins: Update room stats once state re-sync completes #12814

squahtx opened this issue May 20, 2022 · 6 comments · Fixed by matrix-org/complement#585 or #14874
Assignees
Labels
A-Federated-Join joins over federation generally suck T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.

Comments

@squahtx
Copy link
Contributor

squahtx commented May 20, 2022


Potentially useful / possible blockers:

@babolivier babolivier added the T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. label May 20, 2022
@MadLittleMods MadLittleMods added the A-Federated-Join joins over federation generally suck label Jun 3, 2022
@richvdh
Copy link
Member

richvdh commented Jun 9, 2022

Also: what do the room stats say during re-sync, and is that ok?

@richvdh
Copy link
Member

richvdh commented Jun 9, 2022

Points of note:

  • if success:
    logger.info("State resync complete for %s", room_id)
    self._storage_controllers.state.notify_room_un_partial_stated(
    room_id
    )
    # TODO(faster_joins) update room stats and user directory?
    # https://github.com/matrix-org/synapse/issues/12814
    # https://github.com/matrix-org/synapse/issues/12815
    return
  • # FIXME(faster_joins): what do we do here?
    # https://github.com/matrix-org/synapse/issues/12814
    # https://github.com/matrix-org/synapse/issues/12815
    # https://github.com/matrix-org/synapse/issues/13008
    return await self.stores.main.get_partial_current_state_deltas(
    prev_stream_id, max_stream_id
    )

@H-Shay
Copy link
Contributor

H-Shay commented Jan 6, 2023

Looking at the code, it seems that there are two tables that would need updating: room_stats_state, and room_stats_current. Is there any others that I am missing?
As for an approach to this, would it make sense to just pull the current state map and use the info contained within to update the tables? It appears that most of the already-existing functions to update the stats tables rely on using state deltas, which seems less useful in this context?

@squahtx
Copy link
Contributor Author

squahtx commented Jan 10, 2023

Looking at the code, it seems that there are two tables that would need updating: room_stats_state, and room_stats_current. Is there any others that I am missing?

Those are all the tables for this issue I believe.

As for the approach:

Non-faster joins operation
Normally, when persisting an event, we

  1. call PersistEventsStore._update_current_state_txn with the state delta due to the event.
  2. PersistEventsStore._update_current_state_txn inserts the delta as rows into current_state_delta_stream under a given, new, stream id (created using store._stream_id_gen).
  3. I believe we call Notifier.notify_replication() somewhere, but I have not looked.

Notifier.notify_replication() runs 4 callbacks:

StatsHandler.notify_new_event starts off a processing loop which consumes new current_state_delta_stream rows and updates room_stats_state and room_stats_current.

Faster joins operation
When un-partial stating a room, we call StateHandler.update_current_state, which also eventually calls PersistEventsStore._update_current_state_txn with the state delta due to un-partial stating the room and a new stream id (created using store._stream_id_gen). We then call Notifier.notify_replication().

So in single process mode, it looks like things might already just work? We'll want to write a complement test to confirm it.

Worker mode
We need to convince ourselves that Notifier.notify_replication gets called on other workers appropriately, otherwise room stats will lag behind under certain circumstances (when the stats processing loop isn't running until we receive another, possibly unrelated, event).

I believe StateHandler.update_current_state causes us to send EventsStreamCurrentStateRows over replication.
We currently don't trigger Notifier.notify_replication in response, which will need fixing.


In summary, we might expect things to just work in single process mode. We'll want to write a complement test for it.
In worker mode, we are missing a notify_replication call somewhere and we would expect the complement test to flake until it's added.

Hopefully the above isn't too confusing or unclear.

@H-Shay
Copy link
Contributor

H-Shay commented Jan 10, 2023

That's extremely clear, thank you very much!

squahtx pushed a commit that referenced this issue Jan 19, 2023
When finishing a partial state join to a room, we update the current
state of the room without persisting additional events. Workers receive
notice of the current state update over replication, but neglect to wake
the room stats and user directory updaters, which then get incidentally
triggered the next time an event is persisted or an unrelated event
persister sends out a stream position update.

We wake the room stats and user directory updaters at the appropriate
time in this commit.

Part of #12814 and #12815.

Signed-off-by: Sean Quah <[email protected]>
squahtx pushed a commit that referenced this issue Jan 19, 2023
When finishing a partial state join to a room, we update the current
state of the room without persisting additional events. Workers receive
notice of the current state update over replication, but neglect to wake
the room stats and user directory updaters, which then get incidentally
triggered the next time an event is persisted or an unrelated event
persister sends out a stream position update.

We wake the room stats and user directory updaters at the appropriate
time in this commit.

Part of #12814 and #12815.

Signed-off-by: Sean Quah <[email protected]>
squahtx pushed a commit that referenced this issue Jan 19, 2023
When finishing a partial state join to a room, we update the current
state of the room without persisting additional events. Workers receive
notice of the current state update over replication, but neglect to wake
the room stats and user directory updaters, which then get incidentally
triggered the next time an event is persisted or an unrelated event
persister sends out a stream position update.

We wake the room stats and user directory updaters at the appropriate
time in this commit.

Part of #12814 and #12815.

Signed-off-by: Sean Quah <[email protected]>
@squahtx
Copy link
Contributor Author

squahtx commented Jan 19, 2023

Also: what do the room stats say during re-sync, and is that ok?

During re-sync, the room stats reflect the current state of the room and exclude all remote users except new joiners.

DMRobertson pushed a commit that referenced this issue Jan 23, 2023
…n finishing join (#14874)

* Faster joins: Update room stats and user directory on workers when done

When finishing a partial state join to a room, we update the current
state of the room without persisting additional events. Workers receive
notice of the current state update over replication, but neglect to wake
the room stats and user directory updaters, which then get incidentally
triggered the next time an event is persisted or an unrelated event
persister sends out a stream position update.

We wake the room stats and user directory updaters at the appropriate
time in this commit.

Part of #12814 and #12815.

Signed-off-by: Sean Quah <[email protected]>

* fixup comment

Signed-off-by: Sean Quah <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Federated-Join joins over federation generally suck T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Projects
None yet
5 participants