This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
On April 11th, 2021 at ~12:00 UTC we saw matrix.org's user directory worker start using 100% CPU consistently, and it continued doing so until it was restarted on April 12th at 16:10 UTC.
It turns out that it was stuck doing state resolution for an IRC room with 123,000+ state events.
It's a little surprising that the user directory is doing state resolution at all, though, as it should just be listening for membership changes on the current_state_deltas stream and updating the tables used for user directory search accordingly.
In the logs, we see the following repeated multiple times per second:
2021-04-12 00:00:44,506 - synapse.replication.tcp.handler - 496 - INFO - replication_command_handler@7f0b5b2e2268 - Handling 'POSITION events event_persister-2 1939721421 1939721422'
2021-04-12 00:00:44,506 - synapse.replication.tcp.handler - 549 - INFO - process-replication-data-48623630 - Caught up with stream 'events' to 1939721422
2021-04-12 00:00:44,507 - synapse.replication.tcp.handler - 496 - INFO - replication_command_handler@7f0b5b2e2268 - Handling 'POSITION events event_persister-2 1939721422 1939721423'
2021-04-12 00:00:44,507 - synapse.replication.tcp.handler - 549 - INFO - process-replication-data-48623632 - Caught up with stream 'events' to 1939721423
2021-04-12 00:00:44,610 - synapse.state - 576 - INFO - Measure[resolve_state_groups_for_events]@7f09dc222840 - Resolving state for !xxx:domain with groups [596595428, 596513551]
2021-04-12 00:00:44,714 - synapse.state.v1 - 84 - INFO - Measure[state._resolve_events]@7f09dc222d68 - Asking for 104/104 conflicted events
2021-04-12 00:00:44,715 - synapse.state.v1 - 118 - INFO - Measure[state._resolve_events]@7f09dc222d68 - Asking for 3/3 auth events
(Note that we are using redis replication, even though that code lives in the tcp/handler.py module.)
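To put a number on "multiple times per second", the log lines above could be tallied with a small script. This is only a sketch: it assumes the log format is exactly as shown in the excerpt, and `count_resolutions` is a hypothetical helper, not part of Synapse.

```python
import re
from collections import Counter

# Matches the "Resolving state for <room_id> with groups [...]" lines emitted
# by the synapse.state logger, as they appear in the excerpt above.
# Assumption: timestamps always look like "YYYY-MM-DD HH:MM:SS,mmm".
RESOLVE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ - synapse\.state - .*"
    r"Resolving state for (?P<room>\S+) with groups"
)


def count_resolutions(lines):
    """Count state resolutions per room and per wall-clock second."""
    per_room = Counter()
    per_second = Counter()
    for line in lines:
        match = RESOLVE_RE.match(line)
        if match:
            per_room[match.group("room")] += 1
            per_second[match.group("ts")] += 1
    return per_room, per_second
```

Feeding it an open log file (`count_resolutions(open(path))`) and looking at `per_room.most_common(5)` should show whether a single room accounts for most of the work.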
So it seems (I think) that the user directory is listening to the events stream in addition to the current_state_deltas stream; see synapse/synapse/handlers/user_directory.py, lines 160 to 162 in b7748d3.
Ideally the user directory would just accept membership updates from other worker processes without needing to perform state resolution itself in the meantime.
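As a rough illustration of that idea, a consumer that only applies membership deltas might look like the sketch below. Everything here is hypothetical: `StateDelta` and `DirectoryIndex` are made-up names, the delta shape is simplified and is not Synapse's actual row format, and the in-memory dict stands in for the real user directory tables.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class StateDelta:
    """Hypothetical, simplified delta: one change to a room's current state."""

    room_id: str
    event_type: str
    state_key: str  # for m.room.member events this is the user ID
    membership: Optional[str]  # e.g. "join" or "leave"; None if the state was removed


class DirectoryIndex:
    """Toy stand-in for the user directory tables (in-memory only)."""

    def __init__(self) -> None:
        self.rooms_by_user: Dict[str, Set[str]] = {}

    def apply_deltas(self, deltas: Iterable[StateDelta]) -> None:
        # Only already-resolved membership changes are consumed here, so this
        # process never needs to run state resolution itself.
        for delta in deltas:
            if delta.event_type != "m.room.member":
                continue
            rooms = self.rooms_by_user.setdefault(delta.state_key, set())
            if delta.membership == "join":
                rooms.add(delta.room_id)
            else:
                rooms.discard(delta.room_id)
            if not rooms:
                # Drop users who are no longer in any room.
                del self.rooms_by_user[delta.state_key]
```

The point of the design is that the worker treats the deltas as authoritative output from whichever process persisted the events, rather than recomputing room state from the events stream.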