Initial sync is taking a long time to finish #12158
After some research we found some known bugs and issues with state groups.[1][2] In our case the number of unreferenced state groups is 115173:

select count(*) from state_groups sg
left join event_to_state_groups esg on esg.state_group=sg.id
left join state_group_edges e on e.prev_state_group=sg.id
where esg.state_group is null and e.prev_state_group is null;
count
--------
115173
(1 row)

However, our database is fairly small:

SELECT pg_size_pretty( pg_database_size('synapse') );
pg_size_pretty
----------------
1884 MB
(1 row)

The individual tables are small too:

select schemaname as table_schema,
relname as table_name,
pg_size_pretty(pg_total_relation_size(relid)) as total_size,
pg_size_pretty(pg_relation_size(relid)) as data_size,
pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid))
as external_size
from pg_catalog.pg_statio_user_tables
order by pg_total_relation_size(relid) desc,
pg_relation_size(relid) desc
limit 10;
table_schema | table_name | total_size | data_size | external_size
--------------+------------------------+------------+-----------+---------------
public | event_json | 311 MB | 294 MB | 17 MB
public | event_edges | 139 MB | 49 MB | 90 MB
public | events | 128 MB | 57 MB | 71 MB
public | event_auth | 114 MB | 81 MB | 33 MB
public | state_groups_state | 67 MB | 44 MB | 23 MB
public | event_reference_hashes | 65 MB | 29 MB | 36 MB
public | state_groups | 55 MB | 32 MB | 23 MB
public | event_search | 55 MB | 35 MB | 20 MB
public | event_to_state_groups | 43 MB | 21 MB | 22 MB
public | event_auth_chain_links | 37 MB | 25 MB | 13 MB
(10 rows)

And we don't have a large number of rooms:

select count(*) from rooms;
count
-------
3383
(1 row)

This is the number of state_groups_state rows per room (top 10):

SELECT room_id, count(*) cnt FROM state_groups_state GROUP BY room_id ORDER BY cnt DESC LIMIT 10;
room_id | cnt
------------------------------------------+-----
!BgEfFfYMhyOcttvNfO:domain.com | 634
!yEqKcqXpBFSlmaaUrK:domain.com | 604
!uSIfnORIJBDvblUerc:domain.com | 461
!XfQBTuKiJUHBDccOJo:domain.com | 456
!BGgFHXSbKHvIxtiMBn:domain.com | 447
!iECYZMWICqTXgiCPfr:domain.com | 436
!BBZfJtPWDLJQltvdpU:domain.com | 432
!emAdrLpvJFfSxRVUJu:domain.com | 422
!xSPVtuCmdALkdlWYwb:domain.com | 405
!VvZUZDTnaWEWMMhVbM:domain.com | 403
(10 rows)

We have some custom state events that are sent to the rooms, but the number of current state events is small:

SELECT s.room_id, s.current_state_events
FROM room_stats_current s
LEFT JOIN room_stats_state r USING (room_id)
WHERE s.room_id ='!BgEfFfYMhyOcttvNfO:domain.com';
room_id | current_state_events
------------------------------------------+----------------------
!BgEfFfYMhyOcttvNfO:domain.com | 18
(1 row)
This is what we've tried to improve performance (following suggestions from the linked discussions):

CREATE INDEX state_groups_state_state_key_type ON state_groups_state (state_key, type);
ALTER TABLE state_groups_state ALTER COLUMN state_group SET (n_distinct = -0.02);
ANALYZE state_groups_state (state_group);
ALTER TABLE state_groups_state ALTER state_key SET STATISTICS 5000;
ALTER TABLE state_groups_state ALTER type SET STATISTICS 5000;
ANALYZE state_groups_state;
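To check whether those statistics overrides actually took effect, one option is to inspect the planner statistics after the ANALYZE (pg_stats is a standard PostgreSQL view; the columns listed here are just the ones tuned above):

-- Inspect the statistics PostgreSQL holds for the tuned columns.
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'state_groups_state'
  AND attname IN ('state_group', 'state_key', 'type');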
We have also checked the related issues, but we don't have any duplicate records:

SELECT COUNT(1) from ( SELECT state_group, prev_state_group FROM state_group_edges GROUP BY state_group, prev_state_group HAVING COUNT(ctid) > 1 ) AS dupes;
count
-------
0
(1 row)

We added an index to state_groups_state, as suggested in the linked discussion:

CREATE INDEX state_group_idx ON state_groups_state USING BTREE(state_group);

We also ran maintenance operations:

VACUUM FULL
REINDEX
ANALYZE
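For completeness: a bare REINDEX is not a complete PostgreSQL statement (it needs a target such as a table or database), so the table-scoped forms for the hottest table are sketched below:

-- Sketch: table-scoped maintenance on the table the recursive queries hit hardest.
VACUUM FULL VERBOSE state_groups_state;
REINDEX TABLE state_groups_state;
ANALYZE VERBOSE state_groups_state;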
Hi! Thanks for the extensive write-up. Since it's not especially clear: is the query that you're seeing the same every time, or does the number change?

The main thing I would suggest is tuning your caches. The rough idea is that you want to set up monitoring (Prometheus + Grafana is what we use) and look at the graphs for caches to figure out which caches should be increased in size. We have some documentation about that here: https://matrix-org.github.io/synapse/latest/usage/administration/understanding_synapse_through_grafana_graphs.html#caches. Nowadays you should be able to turn your caches up quite a bit without too much worry about wasteful memory usage, since entries also expire after 30 minutes.

You could also consider offloading some of these expensive operations to different workers, but I don't know the usage of your homeserver well enough to advise whether that's suitable.

You might also consider tuning your Postgres database if you haven't done so; some people recommend PGTune to help you get started: https://pgtune.leopard.in.ua/. Using SSDs rather than spinning hard drives seems to help as well.
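For what it's worth, here is a minimal sketch of applying PGTune-style settings from psql; the values below are placeholders and should be replaced with the numbers PGTune suggests for your hardware:

-- Check the current values first.
SHOW shared_buffers;
SHOW effective_cache_size;
SHOW random_page_cost;

-- Placeholder values only; take the real ones from PGTune.
ALTER SYSTEM SET effective_cache_size = '6GB';
ALTER SYSTEM SET random_page_cost = 1.1;    -- lower planner cost suits SSDs
ALTER SYSTEM SET shared_buffers = '2GB';    -- needs a full PostgreSQL restart to apply
SELECT pg_reload_conf();                    -- applies the reloadable settings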
Hello. We have done some tests according to the suggestions and we would like to include some information. The number in the query changes a lot. The database log looks like this:

WITH RECURSIVE state(state_group) AS ( VALUES(30051::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state ) AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@supervisor:domain.com')) ORDER BY type, state_key, state_group DESC
SELECT tag, content FROM room_tags WHERE user_id = '@supervisor:domain.com' AND room_id = '!FzoifeqrdPBmHrqnPR:domain.com'
SET default_transaction_isolation TO DEFAULT
WITH RECURSIVE state(state_group) AS ( VALUES(15780::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state ) AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@supervisor:domain.com')) ORDER BY type, state_key, state_group DESC
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT event_id, topological_ordering, stream_ordering FROM event_relations INNER JOIN events USING (event_id) WHERE relates_to_id = '$Sxcjm83FA3XhaaWSicYHm0tVE2psVBwinRPMpfF9vnE' AND room_id = '!FzyWxEeqZyZcIYVuwY:domain.com' AND relation_type = 'm.reference' ORDER BY topological_ordering ASC, stream_ordering ASC LIMIT 6
COMMIT
COMMIT
SELECT tag, content FROM room_tags WHERE user_id = '@supervisor:domain.com' AND room_id = '!FzoFEvRzLCJvkvtdVy:domain.com'
BEGIN ISOLATION LEVEL REPEATABLE READ
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT e.event_id, e.stream_ordering, ej.internal_metadata, ej.json, ej.format_version, r.room_version, rej.reason, e.outlier FROM events AS e JOIN event_json AS ej USING (event_id) LEFT JOIN rooms r ON r.room_id = e.room_id LEFT JOIN rejections as rej USING (event_id) WHERE e.event_id = ANY(ARRAY['$MuFlEJj_sHwBmVJ6nXsDxFK-LgfBbcCdID2C6cpBtuw'])
SET default_transaction_isolation TO DEFAULT
BEGIN ISOLATION LEVEL REPEATABLE READ
WITH RECURSIVE state(state_group) AS ( VALUES(238978::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state ) AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@supervisor:domain.com')) ORDER BY type, state_key, state_group DESC
WITH RECURSIVE state(state_group) AS ( VALUES(238948::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state ) AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@supervisor:domain.com')) ORDER BY type, state_key, state_group DESC
BEGIN ISOLATION LEVEL REPEATABLE READ
SET LOCAL enable_seqscan=off
WITH RECURSIVE state(state_group) AS ( VALUES(238949::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state ) AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@supervisor:domain.com')) ORDER BY type, state_key, state_group DESC
BEGIN ISOLATION LEVEL REPEATABLE READ

Tuning the database according to https://pgtune.leopard.in.ua/ doesn't seem to improve the /sync time.
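To narrow down where the time goes, one of the logged queries can be wrapped in EXPLAIN (ANALYZE, BUFFERS); this sketch reuses a sample state group id (30051) and the user id taken from the log above:

EXPLAIN (ANALYZE, BUFFERS)
WITH RECURSIVE state(state_group) AS (
    VALUES(30051::bigint)
    UNION ALL
    SELECT prev_state_group FROM state_group_edges e, state s
    WHERE s.state_group = e.state_group
)
SELECT DISTINCT ON (type, state_key) type, state_key, event_id
FROM state_groups_state
WHERE state_group IN ( SELECT state_group FROM state )
  AND ((type = 'm.room.history_visibility' AND state_key = '')
       OR (type = 'm.room.member' AND state_key = '@supervisor:domain.com'))
ORDER BY type, state_key, state_group DESC;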
Also, we ran synapse_auto_compressor and the number of unreferenced state groups decreased to 92852. Again, maybe the backfilling is related, as stated on the synapse_auto_compressor project page:

select count(*) from state_groups sg
left join event_to_state_groups esg on esg.state_group=sg.id
left join state_group_edges e on e.prev_state_group=sg.id
where esg.state_group is null and e.prev_state_group is null;
count
-------
92852
(1 row)

After this, the /sync time is no better. As for storage, this deployment runs in the cloud on SSD disks.
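One way to sanity-check whether the compressor actually shortened the state group chains that the recursive query walks is to measure the chain length for a sample state group; 238978 is just an id taken from the log above:

-- Walk the prev_state_group links from one sample state group and count the hops.
WITH RECURSIVE chain(state_group, depth) AS (
    VALUES(238978::bigint, 0)
    UNION ALL
    SELECT e.prev_state_group, c.depth + 1
    FROM state_group_edges e
    JOIN chain c ON c.state_group = e.state_group
)
SELECT max(depth) AS chain_length, count(*) AS groups_visited FROM chain;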
You can tune the caches through two methods: setting a global factor, which is applied to all caches, or setting cache factors for individual caches. Increasing a cache factor will reduce cache misses but will increase memory usage. You can find out more about how to set the cache factors in the sample config, under the relevant heading.
Hi. We've been working really hard on trying to reduce the initial sync time, but the results haven't been very encouraging. We set the cache environment variables in our Docker Synapse service like this:

SYNAPSE_CACHE_FACTOR_GETEVENT=2.0
SYNAPSE_CACHE_FACTOR_STATEGROUPCACHE=2.0
SYNAPSE_CACHE_FACTOR_STATEGROUPMEMBERSCACHE=2.0
SYNAPSE_CACHE_FACTOR_DEVICEFEDERATIONOUTBOXSTREAMCHANGECACHE=2.0

However, login time didn't change much (only about 30 seconds of difference). We've tried many combinations of the state_groups compressor script, vacuum, reindex, and an index on (state_group, key) in the state_groups_state table, but again we haven't been able to reduce the initial sync time after logging in to Element.
Hi @dbedoya and @jffrancob. Here are a few questions which might shed some light on the situation:
Hi @DMRobertson. At this point we think it's important to add that our deployments use the Mautrix-WhatsApp bridge, and a bot joined to each room is used to invite a member into the room. Once the conversation ends, the member leaves the room and the process starts again later. We use a stack in Docker Swarm with 5 services:

We manage an average of 5000 rooms in each deployment. Apart from the owner of the room (which is the user created by the mautrix bridge for each room), we have about 10-20 users that, as we said before, join and leave the room constantly. We think this deployment is not complex enough to add workers: since we don't have a lot of simultaneous /sync requests and even a single /sync request takes a long time to finish, we think using workers is not going to improve the time.
To answer your third question, we have noticed that users who have joined and left rooms many times appear to be more affected, while recently created users are not affected at all. Also, we took a user and performed a leave in all the rooms they had joined, and the /sync time for that user didn't improve.
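A rough way to quantify that churn directly in the database is sketched below; it assumes the room_memberships table in Synapse's schema (one row per membership event) and reuses the example user id from the log above:

-- Rooms where this user has accumulated the most membership (join/leave/kick) events.
SELECT room_id, count(*) AS membership_events
FROM room_memberships
WHERE user_id = '@supervisor:domain.com'
GROUP BY room_id
ORDER BY membership_events DESC
LIMIT 10;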
Just a small thing that came to mind: are they joining/leaving different rooms each time? Perhaps you would do well by having the users 'forget' the rooms they leave so that they don't come down /sync.
Is this still an issue/has this been resolved?
It's still an issue in our environments. We've tried a lot of approaches, but none of them have improved the /sync time for the affected users, not even purging the history (or even forgetting the room). In the meantime, we have deactivated the users with long /sync times, created new users to act as replacements, and force-joined the new users into the same rooms the deactivated users were in. The new users can perform /sync in a short time. Taking this into account, we think the problem with the /sync request is that the state_groups queries are performed from the first time the user joined the room, no matter whether a forget has been executed.
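To double-check whether the forget actually stuck at the database level, a query along these lines can help; it assumes the forgotten column on room_memberships that the /forget endpoint updates, and the user id is just the example one from the log above:

-- How many distinct left rooms are marked as forgotten for this user.
SELECT forgotten, count(DISTINCT room_id) AS rooms
FROM room_memberships
WHERE user_id = '@supervisor:domain.com'
  AND membership = 'leave'
GROUP BY forgotten;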
When users are leaving the rooms, is the leave initiated by their own user, or does the bot kick them? In the former case, initial sync shouldn't do any processing for those rooms, unless ...

For synapse developers:
@jffrancob are you still experiencing this? Are you able to answer squahtx's question?
Has there been any progress on this issue? I see the same effect on two different setups (a. dockerized matrix-synapse, b. matrix-synapse in an LXC container, both with HAProxy as an SSL accelerator in front and Postgres in a separate container).
@thommierother It might be worthwhile opening your own issue with a fuller description, linking to this one if you think it's related. This issue isn't very clear and the original reporter seems to have gone inactive, so it will probably be cleaner that way and avoid derailing this existing thread in case they turn out to be different things.
Closing, as there has been no update from the original reporter. It's worth noting that "Initial sync is taking a long time to finish" is pretty much business as usual, at least on a largish account.
I will open a new issue for this, as I see the problem on a fresh Synapse installation with a fresh account.
Hi, an update on the test requested by @squahtx. We took a Synapse deployment where a user was joined to 4000 rooms. Note: this user was not leaving the rooms where he was interacting; he was kicked by a bot, so he had kick-related events in thousands of rooms.

The first test we did was to kick this user out of the 4000 rooms he was in; we then logged in with this user and had a wait time of over 8 minutes.

The second test was to remove this user from the 4000 rooms using a leave event (leaving by himself); then we logged in and it took 1 minute and 45 seconds.

The third test was to take this user and invite him to all 13000 rooms that exist on the entire test server (obviously he was not invited to the rooms he had already joined), and this user automatically accepted the invitations. The user then left all the rooms he had joined. Then we connected and it took 5 seconds.
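For reference, a sketch of how to check which of these cases applies for a given user straight from the database; it assumes the local_current_membership and room_memberships tables in current Synapse schemas, and the user id is just an example:

-- For rooms where the user's current membership is 'leave', was the leave event
-- sent by the user themselves (a self-leave) or by someone else (a kick)?
SELECT (m.sender = m.user_id) AS self_initiated, count(*) AS rooms
FROM local_current_membership c
JOIN room_memberships m USING (event_id)
WHERE c.user_id = '@supervisor:domain.com'
  AND c.membership = 'leave'
GROUP BY 1;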
OK, thanks for the info. In that case I guess my problem has another root cause. The server is virtualized (Proxmox LXC container). Will try with a Dendrite (Golang) setup again ...
Description
Currently we're facing a big performance issue with Synapse when a user logs in to Element and performs the initial sync. The request is taking up to 20 minutes to finish. All we can see is a recurring query in the DB that looks like the recursive WITH RECURSIVE state(state_group) query over state_group_edges and state_groups_state quoted in the comments above.
Steps to reproduce
It's important to add that our deployment is not federated, but we use the Mautrix-WhatsApp bridge and it was set up to perform backfilling.
We tried to increase the machine resources, but it didn't improve the response time.
Version information
Version: Our Synapse version is v1.49.0 and we have already tried upgrading to v1.53.0.
Platform: Docker Swarm over Debian bullseye