Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Delete room API gets stuck in a very slow database query and (practically) never completes #15635

Closed
intelfx opened this issue May 19, 2023 · 2 comments · Fixed by #15693
Closed
Assignees
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db A-Performance Performance, both client-facing and admin-facing O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Critical Blocks development, potential data loss, more than 25% of users possibly affected, no workarounds. X-Regression Something broke which worked on a previous release

Comments

@intelfx
Copy link
Contributor

intelfx commented May 19, 2023

Description

Attempting to delete a room using the delete room API gets stuck inside a PostgreSQL query deleting from events table:

$ psql -c "select datname, now() - query_start AS query_elapsed, state, query from pg_stat_activity where datname = 'synapse' and state = 'active' and now() - query_start > interval '1 minute';"
 datname |  query_elapsed  | state  |                               query                               
---------+-----------------+--------+-------------------------------------------------------------------
 synapse | 00:05:00.760974 | active | DELETE FROM events WHERE room_id='!HbIRFiVYbAwQBbobIc:matrix.org'
(1 row)

The relevant PostgreSQL backend process becomes pinned at 100%CPU with almost no I/O for a period of time disproportionate to the amount of affected rows (>30 minutes for 70k-ish rows on a high-end workstation-grade machine; I ran out of patience before the query ever completed).

Steps to reproduce

  • use synapse-admin or a curl invocation to delete a room (in my case, !HbIRFiVYbAwQBbobIc:matrix.org) using the delete room API
  • wait until synapse starts deleting from events table
  • observe a PostgreSQL backend process busy with the DELETE FROM events ... query for a disproportionately large time
  • cancel the query with PostgreSQL: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'synapse' AND state = 'active' AND query LIKE 'DELETE FROM events %';"
  • observe relevant context in the backend termination message in PostgreSQL logs

Homeserver

intelfx.name

Synapse Version

{"server_version":"1.83.0","python_version":"3.11.3"}

Installation Method

Other (please mention below)

Database

PostgreSQL 15.3 (single, never ported from SQLite, restored from backups in the past)

Workers

Multiple workers

Platform

  • OS: Arch Linux in a nspawn container on bare metal
  • Synapse installed from the distribution packages
  • Hardware: single workstation-grade machine (AMD Ryzen CPU, >50G DDR4 non-ECC RAM, NVMe SSD)

Configuration

Relevant log output

Synapse log:

May 19 22:55:17 synapse[109688]: synapse.handlers.pagination: [DELETE-233] starting shutdown room_id !HbIRFiVYbAwQBbobIc:matrix.org with delete_id JCvYBmcOnUhcQFZl
May 19 22:55:17 synapse[109688]: synapse.access.http.8400: [DELETE-233] 127.0.0.1 - 8400 - {@intelfx:intelfx.name} Processed request: 0.004sec/0.000sec (0.001sec, 0.000sec) (0.000sec/0.002sec/3) 32B 200 "DELETE /_synapse/admin/v2/rooms/!HbIRFiVYbAwQBbobIc:matrix.org HTTP/1.1" "curl/8.1.0" [0 dbevts]
May 19 22:55:17 synapse[109688]: synapse.handlers.room: [shutdown_and_purge_room-0] Shutting down room '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:17 synapse[109688]: synapse.handlers.room: [shutdown_and_purge_room-0] Kicking '@intelfx:intelfx.name' from '!HbIRFiVYbAwQBbobIc:matrix.org'...
May 19 22:55:17 synapse[109688]: synapse.access.http.8400: [GET-234] 127.0.0.1 - 8400 - {@intelfx:intelfx.name} Processed request: 0.001sec/0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/1) 171B 200 "GET /_synapse/admin/v2/rooms/!HbIRFiVYbAwQBbobIc:matrix.org/delete_status HTTP/1.1" "curl/8.1.0" [0 dbevts]
May 19 22:55:17 synapse[109688]: synapse.storage.controllers.persist_events: [persist_events-6] Server no longer in room !HbIRFiVYbAwQBbobIc:matrix.org
May 19 22:55:17 synapse[109688]: synapse.handlers.pagination: [shutdown_and_purge_room-0] starting purge room_id !HbIRFiVYbAwQBbobIc:matrix.org
May 19 22:55:17 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] Starting initial main purge of [1/2]
May 19 22:55:17 synapse[109728]: synapse.handlers.sync: [GET-115] User membership change between getting rooms and current token: @intelfx:intelfx.name leave !HbIRFiVYbAwQBbobIc:matrix.org
May 19 22:55:18 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_auth_chain_links
May 19 22:55:20 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_auth
May 19 22:55:23 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_edges
May 19 22:55:35 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_json
May 19 22:55:38 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_push_actions_staging
May 19 22:55:38 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_relations
May 19 22:55:38 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_to_state_groups
May 19 22:55:38 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_auth_chains
May 19 22:55:40 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_auth_chain_to_calculate
May 19 22:55:40 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from redactions
May 19 22:55:40 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from rejections
May 19 22:55:40 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from state_events
May 19 22:55:41 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from current_state_events
May 19 22:55:41 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from destination_rooms
May 19 22:55:42 synapse[109688]: synapse.storage.txn: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [TXN OPERROR] {purge_room-2c5} deadlock detected
                                 DETAIL:  Process 109715 waits for ShareLock on transaction 42293317; blocked by process 109999.
                                 Process 109999 waits for ShareLock on transaction 42293268; blocked by process 109715.
                                 HINT:  See server log for query details.
                                 CONTEXT:  while deleting tuple (6,73) in relation "destination_rooms"
                                  0/5
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-11] Catching up rooms to matrix.im: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-18] Catching up rooms to kde.org: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-13] Catching up rooms to matrix.org: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-12] Catching up rooms to cutefunny.art: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-22] Catching up rooms to gnome.org: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-14] Catching up rooms to tchncs.de: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-15] Catching up rooms to mozilla.org: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-19] Catching up rooms to darkcloud.ca: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-25] Catching up rooms to nasnotfound.de: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-20] Catching up rooms to pub.solar: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-34] Catching up rooms to feline.support: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-16] Catching up rooms to ergaster.org: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-42] Catching up rooms to fyralabs.com: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-17] Catching up rooms to librem.one: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-26] Catching up rooms to qtmlabs.xyz: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-32] Catching up rooms to opensuse.org: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-23] Catching up rooms to freiburg.social: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-27] Catching up rooms to archlinux.org: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-28] Catching up rooms to matrix.domainepublic.net: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-29] Catching up rooms to envs.net: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-31] Catching up rooms to massivebox.net: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-41] Catching up rooms to ctrl-c.liu.se: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-37] Catching up rooms to kepstin.ca: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-33] Catching up rooms to russ.network: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-40] Catching up rooms to libera.chat: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-35] Catching up rooms to fedora.im: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-24] Catching up rooms to halogen.city: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109725]: synapse.federation.sender.per_destination_queue: [federation_transaction_transmission_loop-39] Catching up rooms to half-shot.uk: '!HbIRFiVYbAwQBbobIc:matrix.org'
May 19 22:55:42 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_auth_chain_links
May 19 22:55:45 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_auth
May 19 22:55:45 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_edges
May 19 22:55:45 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_json
May 19 22:55:45 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_push_actions_staging
May 19 22:55:45 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_relations
May 19 22:55:45 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_to_state_groups
May 19 22:55:46 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_auth_chains
May 19 22:55:46 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_auth_chain_to_calculate
May 19 22:55:46 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from redactions
May 19 22:55:46 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from rejections
May 19 22:55:46 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from state_events
May 19 22:55:47 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from current_state_events
May 19 22:55:47 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from destination_rooms
May 19 22:55:47 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_backward_extremities
May 19 22:55:47 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_forward_extremities
May 19 22:55:47 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_push_actions
May 19 22:55:47 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_search
May 19 22:55:48 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from event_failed_pull_attempts
May 19 22:55:48 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from partial_state_events
May 19 22:55:48 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from partial_state_rooms_servers
May 19 22:55:48 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from partial_state_rooms
May 19 22:55:48 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from local_current_membership
May 19 22:55:48 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from room_memberships
May 19 22:55:48 synapse[109688]: synapse.storage.databases.main.purge_events: [shutdown_and_purge_room-0-!HbIRFiVYbAwQBbobIc:matrix.org] [purge] removing from events
May 19 22:56:17 synapse[109688]: synapse.access.http.8400: [GET-642] 127.0.0.1 - 8400 - {@intelfx:intelfx.name} Processed request: 0.001sec/0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/1) 188B 200 "GET /_synapse/admin/v2/rooms/!HbIRFiVYbAwQBbobIc:matrix.org/delete_status HTTP/1.1" "curl/8.1.0" [0 dbevts]
May 19 22:57:17 synapse[109688]: synapse.access.http.8400: [GET-817] 127.0.0.1 - 8400 - {@intelfx:intelfx.name} Processed request: 0.001sec/0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/1) 188B 200 "GET /_synapse/admin/v2/rooms/!HbIRFiVYbAwQBbobIc:matrix.org/delete_status HTTP/1.1" "curl/8.1.0" [0 dbevts]
May 19 22:58:17 synapse[109688]: synapse.access.http.8400: [GET-1001] 127.0.0.1 - 8400 - {@intelfx:intelfx.name} Processed request: 0.001sec/0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/1) 188B 200 "GET /_synapse/admin/v2/rooms/!HbIRFiVYbAwQBbobIc:matrix.org/delete_status HTTP/1.1" "curl/8.1.0" [0 dbevts]

(repeated queries at the end are related to polling from my purge room script)

pg_stat_activity:

 datid | datname |  pid   | leader_pid | usesysid | usename | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |          query_start          |         state_change          | wait_event_type | wait_event | state  | backend_xid | backend_xmin | query_id |                               query                               |  backend_type  
-------+---------+--------+------------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+------------+--------+-------------+--------------+----------+-------------------------------------------------------------------+----------------
 19354 | synapse | 109715 |            |    16388 | synapse |                  |             |                 |          -1 | 2023-05-19 22:54:43.329757+03 | 2023-05-19 22:55:42.853882+03 | 2023-05-19 22:55:48.174517+03 | 2023-05-19 22:55:48.174518+03 |                 |            | active |    42294370 |     42294370 |          | DELETE FROM events WHERE room_id='!HbIRFiVYbAwQBbobIc:matrix.org' | client backend
(1 row)

PostgreSQL logs:

After terminating the query, PostgreSQL logs are enlightening as to why this might happen:

May 19 23:04:51 postgres[109715]: [8-1] 2023-05-19 23:04:51.806 MSK [109715] FATAL:  terminating connection due to administrator command
May 19 23:04:51 postgres[109715]: [8-2] 2023-05-19 23:04:51.806 MSK [109715] CONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."room_memberships" x WHERE $1 OPERATOR(pg_catalog.=) "event_stream_ordering" FOR KEY SHARE OF x"
May 19 23:04:51 postgres[109715]: [8-3] 2023-05-19 23:04:51.806 MSK [109715] STATEMENT:  DELETE FROM events WHERE room_id='!HbIRFiVYbAwQBbobIc:matrix.org'

Anything else that would be useful to know?

Judging by PostgreSQL logs (the context is reproducible across multiple purge attempts), the database spends most of the time verifying foreign key constraints for various tables referencing events.stream_ordering:

Referenced by:
    <...>
    TABLE "current_state_events" CONSTRAINT "event_stream_ordering_fkey" FOREIGN KEY (event_stream_ordering) REFERENCES events(stream_ordering) NOT VALID
    TABLE "room_memberships" CONSTRAINT "event_stream_ordering_fkey" FOREIGN KEY (event_stream_ordering) REFERENCES events(stream_ordering) NOT VALID
    TABLE "local_current_membership" CONSTRAINT "event_stream_ordering_fkey" FOREIGN KEY (event_stream_ordering) REFERENCES events(stream_ordering) NOT VALID
    <...>

It looks like none of these tables are indexed on event_stream_ordering, whereas the SELECT 1 FROM ONLY ... FOR KEY SHARE OF x query seems to be a part of how the FK constraints are actually implemented in PostgreSQL.

Excerpt of the EXPLAIN ANALYZE for the problematic query (note execution time for LIMIT 10):

 Trigger for constraint event_stream_ordering_fkey: time=386.843 calls=10
 Trigger for constraint event_stream_ordering_fkey: time=159.558 calls=10
 Trigger for constraint event_stream_ordering_fkey: time=4503.586 calls=10
<...>
 Execution Time: 5051.214 ms

This exact issue has also been spotted in the wild on pgsql-general: https://www.postgresql.org/message-id/CAA-aLv78noHZ2_nFyxd3zxoRPvq6Gm2enKpRuoxm56PtALU3Bw@mail.gmail.com

Creating the necessary indices by hand fixes the issue:

CREATE INDEX tmp_current_state_events_event_stream_ordering ON current_state_events(event_stream_ordering);                                                                                                
CREATE INDEX tmp_room_memberships_event_stream_ordering ON room_memberships(event_stream_ordering);
CREATE INDEX tmp_local_current_membership_event_stream_ordering ON local_current_membership(event_stream_ordering);
DELETE FROM events WHERE room_id='!HbIRFiVYbAwQBbobIc:matrix.org';
DELETE 74083

All in all, looks like a regression from #15356.

@reivilibre reivilibre added A-Performance Performance, both client-facing and admin-facing S-Major Major functionality / product severely impaired, no satisfactory workaround. A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db O-Uncommon Most users are unlikely to come across this or unexpected workflow labels May 24, 2023
@reivilibre
Copy link
Contributor

Thank you for the very comprehensive report!

It seems to me that you are right:

    TABLE "current_state_events" CONSTRAINT "event_stream_ordering_fkey" FOREIGN KEY (event_stream_ordering) REFERENCES events(stream_ordering) NOT VALID
    TABLE "room_memberships" CONSTRAINT "event_stream_ordering_fkey" FOREIGN KEY (event_stream_ordering) REFERENCES events(stream_ordering) NOT VALID
    TABLE "local_current_membership" CONSTRAINT "event_stream_ordering_fkey" FOREIGN KEY (event_stream_ordering) REFERENCES events(stream_ordering) NOT VALID

these referees need to have an index so the DELETE can check it isn't breaking any foreign key constraints by deleting the potentially-referenced rows.

As you say, it just appears to have been forgotten in #15356.

@reivilibre reivilibre added S-Critical Blocks development, potential data loss, more than 25% of users possibly affected, no workarounds. and removed S-Major Major functionality / product severely impaired, no satisfactory workaround. labels May 24, 2023
@erikjohnston erikjohnston added the X-Regression Something broke which worked on a previous release label May 25, 2023
@reivilibre
Copy link
Contributor

This should be fixed in v1.85.0rc2 ­— you'll need to give it enough time for the background update to build the indices, but afterwards it should be good.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db A-Performance Performance, both client-facing and admin-facing O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Critical Blocks development, potential data loss, more than 25% of users possibly affected, no workarounds. X-Regression Something broke which worked on a previous release
Projects
None yet
3 participants