Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

history purge does not remove redactions, causing backfill to fail #8707

Closed
jwh opened this issue Nov 3, 2020 · 21 comments · Fixed by #10343
Closed

history purge does not remove redactions, causing backfill to fail #8707

jwh opened this issue Nov 3, 2020 · 21 comments · Fixed by #10343
Labels
S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@jwh
Copy link

jwh commented Nov 3, 2020

Description

2020-11-03 00:15:22,910 - synapse.handlers.federation - 596 - WARNING - GET-101024 - Failed to fetch missing state/auth events for $-UIzk2q1h4lzcIoiH8PbTmj8J3-LIJS04l_1bcozSeo {'$xM3Wv8GUU39u2ihr3I1xY2C4kxirgTqqtQqqYlZdqG4', '$6t2x6QF6MYqInr976BQrqd5fc9G8IUkhwa3OPTIEC7E', '$lLR9FKrM-R0J_lFgTZ8lRTj5u9hB6UUv4TbUVTfHepY', '$xAMXdmqI5m6yCvohFsX61GMzEn9ntoQVOiNjR8i_jmY', '$kLFaE-QKOnRoRtM3stSDUav0EzOi_PX2R_9TGp23m0A', '$Gsv4UnCSnaoim7CV3_3LbRaVZUTcxLuhImAwL7_z5Ss', '$5pJ-oeWmqWE-mBHvyhC_C5QqqIEXi2fpa1p1JBJD8HM', '$dSFyjXMIVWwrnTwZBofMgggmFcfWpF5S-qd0LBcEy2c', '$NqZw81jCfyk9yXt3KVk3YqS4w_k3qEw49yyVRJsoUtU', '$BRrLqFq4-JKxoFm61DxYa3Zxbl7BVXl3WGU7OJ7hslM'}
2020-11-03 00:15:23,064 - synapse.handlers.federation - 1129 - ERROR - GET-101024 - Failed to backfill from maunium.net because FirstError[#0, [Failure instance: Traceback: <class 'psycopg2.errors.UniqueViolation'>: duplicate key value violates unique constraint "redactions_event_id_key"
DETAIL:  Key (event_id)=($-UIzk2q1h4lzcIoiH8PbTmj8J3-LIJS04l_1bcozSeo) already exists.

/usr/lib/python3.8/site-packages/twisted/internet/defer.py:654:_runCallbacks
/usr/lib/python3.8/site-packages/twisted/internet/defer.py:1475:gotResult
/usr/lib/python3.8/site-packages/twisted/internet/defer.py:1416:_inlineCallbacks
/usr/lib/python3.8/site-packages/twisted/python/failure.py:512:throwExceptionIntoGenerator
--- <exception caught here> ---
/usr/lib/python3.8/site-packages/synapse/storage/persist_events.py:150:handle_queue_loop
/usr/lib/python3.8/site-packages/synapse/storage/persist_events.py:302:persisting_queue
/usr/lib/python3.8/site-packages/synapse/storage/persist_events.py:502:_persist_events
/usr/lib/python3.8/site-packages/synapse/storage/databases/main/events.py:161:_persist_events_and_state_updates
/usr/lib/python3.8/site-packages/synapse/storage/database.py:627:runInteraction
/usr/lib/python3.8/site-packages/synapse/storage/database.py:707:runWithConnection
/usr/lib/python3.8/site-packages/twisted/python/threadpool.py:250:inContext
/usr/lib/python3.8/site-packages/twisted/python/threadpool.py:266:<lambda>
/usr/lib/python3.8/site-packages/twisted/python/context.py:122:callWithContext
/usr/lib/python3.8/site-packages/twisted/python/context.py:85:callWithContext
/usr/lib/python3.8/site-packages/twisted/enterprise/adbapi.py:306:_runWithConnection
/usr/lib/python3.8/site-packages/twisted/python/compat.py:464:reraise
/usr/lib/python3.8/site-packages/twisted/enterprise/adbapi.py:297:_runWithConnection
/usr/lib/python3.8/site-packages/synapse/storage/database.py:702:inner_func
/usr/lib/python3.8/site-packages/synapse/storage/database.py:505:new_transaction
/usr/lib/python3.8/site-packages/synapse/logging/utils.py:71:wrapped
/usr/lib/python3.8/site-packages/synapse/storage/databases/main/events.py:399:_persist_events_txn
/usr/lib/python3.8/site-packages/synapse/storage/databases/main/events.py:954:_update_metadata_tables_txn
/usr/lib/python3.8/site-packages/synapse/storage/databases/main/events.py:1060:_store_redaction
/usr/lib/python3.8/site-packages/synapse/storage/database.py:809:simple_insert_txn
/usr/lib/python3.8/site-packages/synapse/storage/database.py:266:execute
/usr/lib/python3.8/site-packages/synapse/storage/database.py:292:_do_execute
]]
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/synapse/handlers/federation.py", line 1097, in try_backfill
    await self.backfill(
  File "/usr/lib/python3.8/site-packages/synapse/handlers/federation.py", line 909, in backfill
    await self._handle_new_events(dest, room_id, ev_infos, backfilled=True)
  File "/usr/lib/python3.8/site-packages/synapse/handlers/federation.py", line 1940, in _handle_new_events
    await self.persist_events_and_notify(
  File "/usr/lib/python3.8/site-packages/synapse/handlers/federation.py", line 2912, in persist_events_and_notify
    events, max_stream_token = await self.storage.persistence.persist_events(
  File "/usr/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 240, in persist_events
    ret_vals = await make_deferred_yieldable(
twisted.internet.defer.FirstError: FirstError[#0, [Failure instance: Traceback: <class 'psycopg2.errors.UniqueViolation'>: duplicate key value violates unique constraint "redactions_event_id_key"
DETAIL:  Key (event_id)=($-UIzk2q1h4lzcIoiH8PbTmj8J3-LIJS04l_1bcozSeo) already exists.

/usr/lib/python3.8/site-packages/twisted/internet/defer.py:654:_runCallbacks
/usr/lib/python3.8/site-packages/twisted/internet/defer.py:1475:gotResult
/usr/lib/python3.8/site-packages/twisted/internet/defer.py:1416:_inlineCallbacks
/usr/lib/python3.8/site-packages/twisted/python/failure.py:512:throwExceptionIntoGenerator
--- <exception caught here> ---
/usr/lib/python3.8/site-packages/synapse/storage/persist_events.py:150:handle_queue_loop
/usr/lib/python3.8/site-packages/synapse/storage/persist_events.py:302:persisting_queue
/usr/lib/python3.8/site-packages/synapse/storage/persist_events.py:502:_persist_events
/usr/lib/python3.8/site-packages/synapse/storage/databases/main/events.py:161:_persist_events_and_state_updates
/usr/lib/python3.8/site-packages/synapse/storage/database.py:627:runInteraction
/usr/lib/python3.8/site-packages/synapse/storage/database.py:707:runWithConnection
/usr/lib/python3.8/site-packages/twisted/python/threadpool.py:250:inContext
/usr/lib/python3.8/site-packages/twisted/python/threadpool.py:266:<lambda>
/usr/lib/python3.8/site-packages/twisted/python/context.py:122:callWithContext
/usr/lib/python3.8/site-packages/twisted/python/context.py:85:callWithContext
/usr/lib/python3.8/site-packages/twisted/enterprise/adbapi.py:306:_runWithConnection
/usr/lib/python3.8/site-packages/twisted/python/compat.py:464:reraise
/usr/lib/python3.8/site-packages/twisted/enterprise/adbapi.py:297:_runWithConnection
/usr/lib/python3.8/site-packages/synapse/storage/database.py:702:inner_func
/usr/lib/python3.8/site-packages/synapse/storage/database.py:505:new_transaction
/usr/lib/python3.8/site-packages/synapse/logging/utils.py:71:wrapped
/usr/lib/python3.8/site-packages/synapse/storage/databases/main/events.py:399:_persist_events_txn
/usr/lib/python3.8/site-packages/synapse/storage/databases/main/events.py:954:_update_metadata_tables_txn
/usr/lib/python3.8/site-packages/synapse/storage/databases/main/events.py:1060:_store_redaction
/usr/lib/python3.8/site-packages/synapse/storage/database.py:809:simple_insert_txn
/usr/lib/python3.8/site-packages/synapse/storage/database.py:266:execute
/usr/lib/python3.8/site-packages/synapse/storage/database.py:292:_do_execute
]]

Steps to reproduce

  • Setup fresh Synapse instance
  • use it, after a few days rooms appear as unread and are missing history or element think there is history when there isn't
  • suffer

This time round (2nd time this exact issue has happened), the affected rooms are:

  • @appservice-irc:matrix.org (some history appears, element thinks there is unread history prior to the creation of the room)
  • 3 local mautrix bridges (no history, element thinks there is unread history prior to the creation of the room)

No errors are emitted by synapse, element just returns a permission denied type message when hitting "jump to unread"

Version information

1.20.0 -> 1.22.1

  • Install method:
    pip

  • Platform:
    Arch, LXD container

@jwh
Copy link
Author

jwh commented Nov 3, 2020

I'm not sure the duplicate key issues and the broken rooms are related as the events in question aren't related to the rooms that are broken, they appear to be arbitary - whats more odd is that it only seems to happen when I'm away from my machine and element is idle

@erikjohnston
Copy link
Member

That sounds worrying. Can you share your full logs please (at INFO level if they're at WARN currently)?

@jwh
Copy link
Author

jwh commented Nov 3, 2020

I'll see if I can reproduce it, I already have logging set to DEBUG so hopefully I'll be able to catch it - separately though I was under the impression that the redactions issue was fixed but it does seem to happen with the latest release, only for that particular room though - perhaps some broken events being received from some servers? not really sure how to diagnose that

@erikjohnston
Copy link
Member

Thanks! Yup, I thought the redactions problem was fixed, but perhaps it didn't fix existing problems?

@jwh
Copy link
Author

jwh commented Nov 9, 2020

I'm pretty sure I saw it after a fresh install on 1.22 but I'll see if I can rejoin a few rooms and trigger it

@jwh
Copy link
Author

jwh commented Nov 9, 2020

Alright, so pretty sure the history weirdness is just an element issue, only 1 error and 3 warnings (unrelated) logged with DEBUG on, and triggering activity from another client fixes the state on the broken element instance - haven't seen the redactions issue again, sorry for the noise! just spotted it, I'll pester you on matrix if you don't mind so I can share the logs

@erikjohnston
Copy link
Member

Had a chat with @jwh and it looks like this is caused by the fact that purging history in the room doesn't appear to delete from the redactions table:

for table in (
"events",
"event_json",
"event_auth",
"event_edges",
"event_forward_extremities",
"event_reference_hashes",
"event_relations",
"event_search",
"rejections",
):

In @jwh this was caused by using the room retention settings, but it will also affect anyone who uses the purge history admin API.

@erikjohnston erikjohnston added z-bug (Deprecated Label) z-p2 (Deprecated Label) Z-Help-Wanted We know exactly how to fix this issue, and would be grateful for any contribution and removed info-needed labels Nov 10, 2020
@richvdh
Copy link
Member

richvdh commented Apr 30, 2021

I'm not really sure what this issue is about - could someone update the summary to be clearer?

@richvdh richvdh added X-Needs-Info This issue is blocked awaiting information from the reporter and removed Z-Help-Wanted We know exactly how to fix this issue, and would be grateful for any contribution labels Apr 30, 2021
@callahad
Copy link
Contributor

callahad commented May 6, 2021

We think Erik's comment from earlier is the issue

purging history in the room doesn't appear to delete from the redactions table

Which breaks the client's sense of what rooms have history (or not)

@richvdh
Copy link
Member

richvdh commented May 6, 2021

... except that purge_history does delete from redactions

@callahad
Copy link
Contributor

callahad commented May 6, 2021

In the absence of more targeted steps to reproduce (and lack of other similar reports), we're going to close this for now.

Erik notes that it might be possible to get that backtrace by: join room, purge history, try to backfill... but we're not sure this is still an issue.

@callahad callahad closed this as completed May 6, 2021
@erikjohnston
Copy link
Member

... except that purge_history does delete from redactions

I'm only seeing "redactions" appear once in that file under purge_room? https://github.com/matrix-org/synapse/blob/develop/synapse/storage/databases/main/purge_events.py#L349

@richvdh
Copy link
Member

richvdh commented May 6, 2021

oh! sorry, yes, that's what confused me.

Should we reopen this with a clearer summary then? (or open a new issue?)

@richvdh richvdh changed the title Unread room markers and absent history, maybe history purge does not remove redactions, causing backfill to fail May 6, 2021
@richvdh richvdh reopened this May 6, 2021
@jwh
Copy link
Author

jwh commented May 6, 2021

Yeah so, in this case - it occured because I had retention configured, and it seems to be the issue that erik noted above after we did some testing/debug - I haven't had retention enabled since so I don't know if it's been fixed since, but a fairly easy way to reproduce at the time was enabling a short retention period, then wait for events from rooms (in most cases this was in the synapse room, but it happened in others ie; bridges that redact messages for transient errors)

@erikjohnston erikjohnston added S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. and removed X-Needs-Info This issue is blocked awaiting information from the reporter z-bug (Deprecated Label) z-p2 (Deprecated Label) labels May 13, 2021
@superboum
Copy link
Contributor

superboum commented Jul 3, 2021

I am also encountering this bug, on a redaction event, after having used the /_synapse/admin/v1/purge_history_status/ API.
One line as an example, but it seems to occurs on most rooms:

2021-07-03 13:24:58,350 - synapse.handlers.federation - 1226 - ERROR - GET-84508- Failed to backfill from matrix.org because FirstError[#0, [Failure instance: Traceback: <class 'psycopg2.errors.UniqueViolation'>: duplicate key value violates unique constraint "redactions_event_id_key"
DETAIL:  Key (event_id)=($14999685271477323kiflt:matrix.org) already exists.

Maybe an even easier way to test the bug would be to join Matrix HQ from a fresh homeserver, then purge the history of the room, then to try to scroll in the past.

@moritzdietz
Copy link
Contributor

moritzdietz commented Jul 5, 2021

I also just got bitten by this. Used the Delete Room API with {"purge": true} for Synapse Announcements and not just the the Purge History API.

Click to to see logs
2021-07-05 21:34:49,687 - synapse.handlers.federation - 615 - WARNING - GET-89377 - Failed to fetch missing state events for $loiBSa9Mat5PzCXC4ZlY67ha26j1qvByHz8g3dub1g4 {'$Hd3igT_CXju8aD3w04_ju_SdOX8pn_mx_wa84XJDro8', '$JwdbJO1lzyTuJgBSCnazDslqO2YqFvKGqnlsBa5WFAg'}
2021-07-05 21:34:49,688 - synapse.handlers.federation - 628 - WARNING - GET-89377 - Failed to fetch missing auth events for $loiBSa9Mat5PzCXC4ZlY67ha26j1qvByHz8g3dub1g4 {'$5ggKX3HjhqqGJF23c-MVC3ceEr2g2J3dhQmrabDeQ3E', '$fKpErKjDOz3ppO_cgDL51A5vz9gRR7VCPPVkcEq1aYU', '$gz5_aKXT4PLoIVuddTYYzeKDpkPxykPh2MVfKnxlGRE', '$84Xemq2AeQ50MFw6xbhsPDEJNe267KzTsl2xeu-sK5E', '$Ent5NC35ODmD2DJSZu80LZUozKo4p2PhzWR543ZxjqA', '$8nPP37gQ3wkVLyJuTF-7UYRANIOXUQlHgTDvLjO4IC4'}
2021-07-05 21:34:49,857 - synapse.handlers.federation - 1242 - ERROR - GET-89377 - Failed to backfill from matrix.org because duplicate key value violates unique constraint "redactions_event_id_key"
DETAIL:  Key (event_id)=($9RVwexrk8bjuJTjRpZ7wCeMZ9_wjCzhIHJxgD2S30S0) already exists.
Traceback (most recent call last):
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/handlers/federation.py", line 1210, in try_backfill
    await self.backfill(
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/handlers/federation.py", line 1013, in backfill
    await self._auth_and_persist_events(
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/handlers/federation.py", line 2303, in _auth_and_persist_events
    await self.persist_events_and_notify(
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/handlers/federation.py", line 3295, in persist_events_and_notify
    events, max_stream_token = await self.storage.persistence.persist_events(
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/logging/opentracing.py", line 774, in _trace_inner
    return await func(*args, **kwargs)
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 320, in persist_events
    ret_vals = await yieldable_gather_results(enqueue, partitioned.items())
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 237, in handle_queue_loop
    ret = await self._per_item_callback(
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 577, in _persist_event_batch
    await self.persist_events_store._persist_events_and_state_updates(
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 176, in _persist_events_and_state_updates
    await self.db_pool.runInteraction(
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/storage/database.py", line 681, in runInteraction
    result = await self.runWithConnection(
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/storage/database.py", line 770, in runWithConnection
    return await make_deferred_yieldable(
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/twisted/python/threadpool.py", line 238, in inContext
    result = inContext.theWork()  # type: ignore[attr-defined]
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/twisted/python/threadpool.py", line 254, in <lambda>
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/twisted/python/context.py", line 83, in callWithContext
    return func(*args, **kw)
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/twisted/enterprise/adbapi.py", line 293, in _runWithConnection
    compat.reraise(excValue, excTraceback)
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/twisted/python/deprecate.py", line 298, in deprecatedFunction
    return function(*args, **kwargs)
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/twisted/python/compat.py", line 403, in reraise
    raise exception.with_traceback(traceback)
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/twisted/enterprise/adbapi.py", line 284, in _runWithConnection
    result = func(conn, *args, **kw)
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/storage/database.py", line 765, in inner_func
    return func(db_conn, *args, **kwargs)
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/storage/database.py", line 549, in new_transaction
    r = func(cursor, *args, **kwargs)
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/logging/utils.py", line 69, in wrapped
    return f(*args, **kwargs)
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 398, in _persist_events_txn
    self._update_metadata_tables_txn(
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 1500, in _update_metadata_tables_txn
    self._store_redaction(txn, event)
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 1583, in _store_redaction
    self.db_pool.simple_insert_txn(
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/storage/database.py", line 873, in simple_insert_txn
    txn.execute(sql, vals)
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/storage/database.py", line 297, in execute
    self._do_execute(self.txn.execute, sql, *args)
  File "/opt/venvs/matrix-synapse/lib/python3.8/site-packages/synapse/storage/database.py", line 330, in _do_execute
    return func(sql, *args)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "redactions_event_id_key"
DETAIL:  Key (event_id)=($9RVwexrk8bjuJTjRpZ7wCeMZ9_wjCzhIHJxgD2S30S0) already exists.

@andir
Copy link
Contributor

andir commented Jul 8, 2021

Same is happening here on a rather fresh (~1 week old) server:

Jul 08 17:27:14 matrix postgres[114432]: [114432] ERROR:  duplicate key value violates unique constraint "redactions_event_id_key"
Jul 08 17:27:14 matrix postgres[114432]: [114432] DETAIL:  Key (event_id)=($xOIiufu0hoiXGGptMkAnLOXJQFCjJaxCJ0gamYgRuAs) already exists.
Jul 08 17:27:14 matrix postgres[114432]: [114432] STATEMENT:  INSERT INTO redactions (event_id, redacts, received_ts) VALUES('$xOIiufu0hoiXGGptMkAnLOXJQFCjJaxCJ0gamYgRuAs', '$A2NfXY6tSMQOATU7cPbhtBNTMyUVlmgezvJcSUYJCA8', 1625765234509)
Jul 08 17:27:14 matrix synapse[114414]: synapse.handlers.federation: [GET-171543] Failed to backfill from nordgedanken.dev because duplicate key value violates unique constraint "redactions_event_id_key"
                                        DETAIL:  Key (event_id)=($xOIiufu0hoiXGGptMkAnLOXJQFCjJaxCJ0gamYgRuAs) already exists.
                                        Traceback (most recent call last):
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/handlers/federation.py", line 1210, in try_backfill
                                            await self.backfill(
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/handlers/federation.py", line 1013, in backfill
                                            await self._auth_and_persist_events(
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/handlers/federation.py", line 2303, in _auth_and_persist_events
                                            await self.persist_events_and_notify(
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/handlers/federation.py", line 3295, in persist_events_and_notify
                                            events, max_stream_token = await self.storage.persistence.persist_events(
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 320, in persist_events
                                            ret_vals = await yieldable_gather_results(enqueue, partitioned.items())
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 237, in handle_queue_loop
                                            ret = await self._per_item_callback(
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 577, in _persist_event_batch
                                            await self.persist_events_store._persist_events_and_state_updates(
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 176, in _persist_events_and_state_updates
                                            await self.db_pool.runInteraction(
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/storage/database.py", line 681, in runInteraction
                                            result = await self.runWithConnection(
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/storage/database.py", line 770, in runWithConnection
                                            return await make_deferred_yieldable(
                                          File "/nix/store/9fx266jnq1vwjv7cjwwphyd7w5938kgl-python3.8-Twisted-20.3.0/lib/python3.8/site-packages/twisted/python/threadpool.py", line 250, in inContext
                                            result = inContext.theWork()
                                          File "/nix/store/9fx266jnq1vwjv7cjwwphyd7w5938kgl-python3.8-Twisted-20.3.0/lib/python3.8/site-packages/twisted/python/threadpool.py", line 266, in <lambda>
                                            inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
                                          File "/nix/store/9fx266jnq1vwjv7cjwwphyd7w5938kgl-python3.8-Twisted-20.3.0/lib/python3.8/site-packages/twisted/python/context.py", line 122, in callWithContext
                                            return self.currentContext().callWithContext(ctx, func, *args, **kw)
                                          File "/nix/store/9fx266jnq1vwjv7cjwwphyd7w5938kgl-python3.8-Twisted-20.3.0/lib/python3.8/site-packages/twisted/python/context.py", line 85, in callWithContext
                                            return func(*args,**kw)
                                          File "/nix/store/9fx266jnq1vwjv7cjwwphyd7w5938kgl-python3.8-Twisted-20.3.0/lib/python3.8/site-packages/twisted/enterprise/adbapi.py", line 306, in _runWithConnection
                                            compat.reraise(excValue, excTraceback)
                                          File "/nix/store/9fx266jnq1vwjv7cjwwphyd7w5938kgl-python3.8-Twisted-20.3.0/lib/python3.8/site-packages/twisted/python/compat.py", line 464, in reraise
                                            raise exception.with_traceback(traceback)
                                          File "/nix/store/9fx266jnq1vwjv7cjwwphyd7w5938kgl-python3.8-Twisted-20.3.0/lib/python3.8/site-packages/twisted/enterprise/adbapi.py", line 297, in _runWithConnection
                                            result = func(conn, *args, **kw)
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/storage/database.py", line 765, in inner_func
                                            return func(db_conn, *args, **kwargs)
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/storage/database.py", line 549, in new_transaction
                                            r = func(cursor, *args, **kwargs)
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/logging/utils.py", line 69, in wrapped
                                            return f(*args, **kwargs)
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 398, in _persist_events_txn
                                            self._update_metadata_tables_txn(
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 1500, in _update_metadata_tables_txn
                                            self._store_redaction(txn, event)
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 1583, in _store_redaction
                                            self.db_pool.simple_insert_txn(
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/storage/database.py", line 873, in simple_insert_txn
                                            txn.execute(sql, vals)
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/storage/database.py", line 297, in execute
                                            self._do_execute(self.txn.execute, sql, *args)
                                          File "/nix/store/bcrfq4ddxzwf3kjzvqf4mwpg06ij4c60-matrix-synapse-1.37.1/lib/python3.8/site-packages/synapse/storage/database.py", line 330, in _do_execute
                                            return func(sql, *args)
                                        psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "redactions_event_id_key"
                                        DETAIL:  Key (event_id)=($xOIiufu0hoiXGGptMkAnLOXJQFCjJaxCJ0gamYgRuAs) already exists.

Perhaps an intermediate solution to the problem might be to treat the UniqueViolation as successfull backfill?

andir added a commit to andir/synapse that referenced this issue Jul 8, 2021
Occasionally, in combination with retention, redactions aren't deleted
from the database whenever they are due for deletion. The server will
eventually try to backfill the deleted events and trip over the already
existing redaction events.

Switching to an UPSERT for those events allows us to recover from there
situations. The retention code still needs fixing but that is outside of
my current comfort zone on this code base.

This is related to matrix-org#8707 where the error was discussed already.

Signed-off-by: Andreas Rammhold <[email protected]>
@richvdh
Copy link
Member

richvdh commented Jul 8, 2021

Perhaps an intermediate solution to the problem might be to treat the UniqueViolation as successfull backfill?

It's a one-line change to fix the underlying problem. I'd much rather do that than plaster over it.

@richvdh
Copy link
Member

richvdh commented Jul 8, 2021

@andir
Copy link
Contributor

andir commented Jul 8, 2021

I think adding redactions to https://github.com/matrix-org/synapse/blob/v1.37.1/synapse/storage/databases/main/purge_events.py#L208 should fix it.

That sounds good as well. I still believe that we will require my fix to recover those servers that are confused about the data already existing when they are backfilling.

@andir
Copy link
Contributor

andir commented Jul 8, 2021

@richvdh I added a 2nd commit to my PR and added the fix that you pointed out.

@richvdh richvdh linked a pull request Jul 8, 2021 that will close this issue
4 tasks
andir added a commit to andir/synapse that referenced this issue Jul 8, 2021
Previously redacints where left behind leading to backfilling issues
when the server stumbled across the already existing yet to be
backfilled redactions.

This issues has been discussed in matrix-org#8707.

Signed-off-by: Andreas Rammhold <[email protected]>
richvdh pushed a commit that referenced this issue Jul 9, 2021
* Upsert redactions in case they already exists

Occasionally, in combination with retention, redactions aren't deleted
from the database whenever they are due for deletion. The server will
eventually try to backfill the deleted events and trip over the already
existing redaction events.

Switching to an UPSERT for those events allows us to recover from there
situations. The retention code still needs fixing but that is outside of
my current comfort zone on this code base.

This is related to #8707 where the error was discussed already.

Signed-off-by: Andreas Rammhold <[email protected]>

* Also purge redactions when purging events

Previously redacints where left behind leading to backfilling issues
when the server stumbled across the already existing yet to be
backfilled redactions.

This issues has been discussed in #8707.

Signed-off-by: Andreas Rammhold <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants