Description
Some users have been complaining about slowness on my homeserver lately. Looking into it for a bit, I noticed that postgres is spending a large amount of time in DELETE FROM statements on the device_inbox table, and performs a huge amount of IO on these statements (see the pg_stat_statements output under "Relevant log output" below).
This behavior is consistent: every minute or so, a bunch of postgres workers run these DELETE FROM statements, reading gigabytes of data.
This has caused a consistently large amount of load since roughly the 12th, i.e. two days after the v1.94.0 release. I don't remember precisely when I performed that update, though I typically apply updates fairly quickly.
This is a VM running only postgres, synapse, and a few related services. No major config changes were made around the 12th; the last change was the addition of mjolnir on August 25th. Synapse has produced a constant load of these DELETE FROM queries since then.
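To put a number on the read volume, the same pg_stat_statements view can be queried for shared_blks_read directly. This is a minimal sketch, assuming a standard pg_stat_statements setup and the default 8 kB block size:

-- Approximate data read per statement, assuming the default 8 kB block size.
SELECT query,
       calls,
       shared_blks_read,
       pg_size_pretty(shared_blks_read * 8192::bigint) AS approx_read
  FROM pg_stat_statements
 ORDER BY shared_blks_read DESC
 LIMIT 5;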
Steps to reproduce
Sorry, I have no idea how to reproduce this. If I can give any more insight I'll try to provide it, up to a call to inspect the server if that helps.
Homeserver
matrix.stratum0.org
Synapse Version
1.95.1
Installation Method
Debian packages from packages.matrix.org
Database
postgres, single server, never ported
Workers
Single process
Platform
Debian in a VM
Configuration
Presence is on 🤷
Relevant log output
matrix=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
                nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
           FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
                                  query                                  | calls  |    total_time    |  rows   |     hit_percent
-------------------------------------------------------------------------+--------+------------------+---------+---------------------
 DELETE FROM device_inbox WHERE ctid IN (                               +|   4227 | 125140420.808502 | 4213183 | 83.0263107880089348
     SELECT ctid FROM device_inbox                                      +|        |                  |         |
     WHERE user_id = $1 AND device_id = $2 AND stream_id <= $3          +|        |                  |         |
     LIMIT $4                                                           +|        |                  |         |
 )                                                                       |        |                  |         |
 DELETE FROM device_inbox WHERE ctid IN (                               +|   3414 | 105166717.010538 |   45377 | 69.3587470823784226
     SELECT ctid FROM device_inbox                                      +|        |                  |         |
     WHERE user_id = $1 AND device_id = $2 AND stream_id <= $3          +|        |                  |         |
                                                                        +|        |                  |         |
 )                                                                       |        |                  |         |
 SELECT COUNT(*) FROM e2e_room_keys WHERE user_id = $1 AND version = $2  |   2043 | 649328.821114998 |    2043 | 96.4265751586556058
 SELECT DISTINCT substring(state_key FROM $1)                           +| 231678 | 320314.261538005 | 6226101 | 97.9818494936092724
     FROM current_state_events                                          +|        |                  |         |
     WHERE                                                              +|        |                  |         |
         type = $2                                                      +|        |                  |         |
         AND membership = $3                                            +|        |                  |         |
         AND room_id = $4                                                |        |                  |         |
 select count(*) from device_inbox                                       |     10 |    304487.276555 |      10 |  8.8124870074439053
(5 rows)
There are also errors like the following in the log, related to these DELETE statements:
Oct 20 22:24:44 metiadoc matrix-synapse[769]: 2023-10-20 22:24:44,127 - synapse.util.task_scheduler - 362 - ERROR - task-delete_device_messages-368-oyCvahTgzGOLEzSl- schedul>
Oct 20 22:24:44 metiadoc matrix-synapse[769]: Traceback (most recent call last):
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/util/task_scheduler.py", line 359, in wrapper
Oct 20 22:24:44 metiadoc matrix-synapse[769]: (status, result, error) = await function(task)
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/handlers/device.py", line 406, in _delete_device_messages
Oct 20 22:24:44 metiadoc matrix-synapse[769]: res = await self.store.delete_messages_for_device(
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/logging/opentracing.py", line 916, in _wrapper
Oct 20 22:24:44 metiadoc matrix-synapse[769]: return await func(*args, **kwargs)
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/databases/main/deviceinbox.py", line 493, in dele>
Oct 20 22:24:44 metiadoc matrix-synapse[769]: count = await self.db_pool.runInteraction(
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 934, in runInteraction
Oct 20 22:24:44 metiadoc matrix-synapse[769]: return await delay_cancellation(_runInteraction())
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1993, in _inlineCallbacks
Oct 20 22:24:44 metiadoc matrix-synapse[769]: result = context.run(
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/failure.py", line 518, in throwExceptionIntoGenera>
Oct 20 22:24:44 metiadoc matrix-synapse[769]: return g.throw(self.type, self.value, self.tb)
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 900, in _runInteraction
Oct 20 22:24:44 metiadoc matrix-synapse[769]: result = await self.runWithConnection(
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 1029, in runWithConnection
Oct 20 22:24:44 metiadoc matrix-synapse[769]: return await make_deferred_yieldable(
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/threadpool.py", line 244, in inContext
Oct 20 22:24:44 metiadoc matrix-synapse[769]: result = inContext.theWork() # type: ignore[attr-defined]
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/threadpool.py", line 260, in <lambda>
Oct 20 22:24:44 metiadoc matrix-synapse[769]: inContext.theWork = lambda: context.call( # type: ignore[attr-defined]
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
Oct 20 22:24:44 metiadoc matrix-synapse[769]: return self.currentContext().callWithContext(ctx, func, *args, **kw)
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
Oct 20 22:24:44 metiadoc matrix-synapse[769]: return func(*args, **kw)
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/enterprise/adbapi.py", line 282, in _runWithConnection
Oct 20 22:24:44 metiadoc matrix-synapse[769]: result = func(conn, *args, **kw)
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 1022, in inner_func
Oct 20 22:24:44 metiadoc matrix-synapse[769]: return func(db_conn, *args, **kwargs)
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 762, in new_transaction
Oct 20 22:24:44 metiadoc matrix-synapse[769]: r = func(cursor, *args, **kwargs)
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/databases/main/deviceinbox.py", line 490, in dele>
Oct 20 22:24:44 metiadoc matrix-synapse[769]: txn.execute(sql, (user_id, device_id, up_to_stream_id))
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 421, in execute
Oct 20 22:24:44 metiadoc matrix-synapse[769]: self._do_execute(self.txn.execute, sql, parameters)
Oct 20 22:24:44 metiadoc matrix-synapse[769]: File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 473, in _do_execute
Oct 20 22:24:44 metiadoc matrix-synapse[769]: return func(sql, *args, **kwargs)
Oct 20 22:24:44 metiadoc matrix-synapse[769]: psycopg2.errors.QueryCanceled: canceling statement due to statement timeout
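The QueryCanceled error means postgres is cancelling the DELETE because a statement_timeout is in effect. I haven't confirmed whether that timeout comes from postgresql.conf, a per-database/per-role override, or the connection options Synapse uses; a minimal sketch for checking it from psql, using only standard catalog views:

-- Effective value in the current session
SHOW statement_timeout;

-- Any per-database or per-role overrides (set via ALTER DATABASE/ROLE ... SET)
SELECT d.datname, r.rolname, s.setconfig
  FROM pg_db_role_setting s
  LEFT JOIN pg_database d ON d.oid = s.setdatabase
  LEFT JOIN pg_roles    r ON r.oid = s.setrole;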
Anything else that would be useful to know?
_No response_
Ah, one more thing: the device_inbox table currently has ~4M rows (which seems huge for ~130 active users, btw), and yesterday it was ~8.5M. Because of the timeouts, I increased the postgres connection limit; possibly because of this the DELETE statements started working and entries were actually deleted.
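For anyone wanting to check the same numbers on their own instance, this is roughly how the row count and on-disk size can be read from psql. A sketch using standard postgres functions; note an exact COUNT(*) on a table this size is itself slow, so the planner estimate is often good enough:

-- Planner's row estimate (cheap, approximate)
SELECT reltuples::bigint AS approx_rows
  FROM pg_class
 WHERE relname = 'device_inbox';

-- On-disk size including indexes and TOAST
SELECT pg_size_pretty(pg_total_relation_size('device_inbox'));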
I'll keep investigating that query. It seems that postgres just isn't using the index to run it 🤔
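To check that, a plain EXPLAIN of the delete with concrete values substituted for the bind parameters should show whether the subquery can use an index on (user_id, device_id, stream_id) or falls back to a sequential scan. A sketch; the user, device and stream values below are made-up placeholders:

-- Plain EXPLAIN only plans the statement; it does not execute the delete.
-- For real timings, wrap EXPLAIN (ANALYZE, BUFFERS) in BEGIN; ... ROLLBACK;.
EXPLAIN
DELETE FROM device_inbox WHERE ctid IN (
    SELECT ctid FROM device_inbox
    WHERE user_id = '@someuser:matrix.stratum0.org'
      AND device_id = 'SOMEDEVICEID'
      AND stream_id <= 1000000
    LIMIT 100
);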
Ah, the issue described here goes back to #16479 (comment). This issue caused many terabytes of useless IO on my relatively small instance and severely impacted responsiveness for some users, so I'm glad it got sorted out quickly.