history purge does not remove redactions, causing backfill to fail #8707
Comments
I'm not sure the duplicate key issues and the broken rooms are related, as the events in question aren't related to the rooms that are broken; they appear to be arbitrary. What's more odd is that it only seems to happen when I'm away from my machine and Element is idle. |
That sounds worrying. Can you share your full logs please (at INFO level if they're at WARN currently)? |
I'll see if I can reproduce it. I already have logging set to DEBUG, so hopefully I'll be able to catch it. Separately, I was under the impression that the redactions issue was fixed, but it does seem to happen with the latest release, though only for that particular room. Perhaps some broken events are being received from some servers? I'm not really sure how to diagnose that. |
Thanks! Yup, I thought the redactions problem was fixed, but perhaps it didn't fix existing problems? |
I'm pretty sure I saw it after a fresh install on 1.22 but I'll see if I can rejoin a few rooms and trigger it |
Alright, so pretty sure the history weirdness is just an element issue, only 1 error and 3 warnings (unrelated) logged with DEBUG on, and triggering activity from another client fixes the state on the broken element instance - |
Had a chat with @jwh and it looks like this is caused by the fact that purging history in the room doesn't appear to delete from the synapse/synapse/storage/databases/main/purge_events.py Lines 206 to 216 in 7941372
In @jwh's case this was caused by using the room retention settings, but it will also affect anyone who uses the purge history admin API. |
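For readers following along, here is a minimal sketch of the failure mode described in the issue title, using an in-memory SQLite database. The two-table schema and the helper names (`persist_redaction`, `buggy_purge`) are assumptions made for illustration and are far simpler than Synapse's real storage layer; the only point is that a purge which leaves a row behind in a uniquely keyed table makes a later re-insert fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified schema (not Synapse's actual tables).
conn.executescript(
    """
    CREATE TABLE events (event_id TEXT PRIMARY KEY, room_id TEXT NOT NULL);
    CREATE TABLE redactions (event_id TEXT PRIMARY KEY, redacts TEXT NOT NULL);
    """
)


def persist_redaction(event_id, room_id, redacts):
    """Store a redaction event with plain INSERTs, all in one transaction."""
    with conn:  # commits on success, rolls back if an insert fails
        conn.execute("INSERT INTO events VALUES (?, ?)", (event_id, room_id))
        conn.execute("INSERT INTO redactions VALUES (?, ?)", (event_id, redacts))


def buggy_purge(room_id):
    """Purge a room's events but forget the matching redactions rows."""
    with conn:
        conn.execute("DELETE FROM events WHERE room_id = ?", (room_id,))


persist_redaction("$redaction", "!room", "$target")
buggy_purge("!room")

# Backfill later fetches the same redaction again and tries to store it.
try:
    persist_redaction("$redaction", "!room", "$target")
    backfill_failed = False
except sqlite3.IntegrityError:
    # The stale redactions row violates the unique key on event_id.
    backfill_failed = True
```

In this toy setup the second `persist_redaction` call trips over the leftover row, which is analogous to (though not the same code path as) the duplicate key errors reported in this thread.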
I'm not really sure what this issue is about - could someone update the summary to be clearer? |
We think Erik's comment from earlier is the issue
Which breaks the client's sense of what rooms have history (or not) |
... except that purge_history does delete from |
In the absence of more targeted steps to reproduce (and lack of other similar reports), we're going to close this for now. Erik notes that it might be possible to get that backtrace by: join room, purge history, try to backfill... but we're not sure this is still an issue. |
I'm only seeing "redactions" appear once in that file under |
oh! sorry, yes, that's what confused me. Should we reopen this with a clearer summary then? (or open a new issue?) |
Yeah, so in this case it occurred because I had retention configured, and it seems to be the issue that Erik noted above, after we did some testing and debugging. I haven't had retention enabled since, so I don't know whether it has been fixed, but a fairly easy way to reproduce it at the time was to enable a short retention period and then wait for events from rooms (in most cases this was the Synapse room, but it happened in others, such as those with bridges that redact messages for transient errors). |
I am also encountering this bug, on a
Maybe an even easier way to test the bug would be to join Matrix HQ from a fresh homeserver, then purge the history of the room, then try to scroll back into the past. |
I also just got bitten by this. Used the Delete Room API with Click to to see logs
|
Same is happening here on a rather fresh (~1 week old) server:
Perhaps an intermediate solution to the problem might be to treat the UniqueViolation as a successful backfill? |
Occasionally, in combination with retention, redactions aren't deleted from the database whenever they are due for deletion. The server will eventually try to backfill the deleted events and trip over the already existing redaction events. Switching to an UPSERT for those events allows us to recover from these situations. The retention code still needs fixing but that is outside of my current comfort zone on this code base. This is related to matrix-org#8707 where the error was discussed already. Signed-off-by: Andreas Rammhold <[email protected]>
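As a rough illustration of the UPSERT idea in the commit message above (not the actual patch), the sketch below uses SQLite's `INSERT ... ON CONFLICT ... DO UPDATE`, which needs SQLite 3.24 or newer. The one-table schema and the `upsert_redaction` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; the real one has more columns.
conn.execute(
    "CREATE TABLE redactions (event_id TEXT PRIMARY KEY, redacts TEXT NOT NULL)"
)


def upsert_redaction(event_id, redacts):
    """Insert the redaction, or overwrite the row if event_id already exists."""
    with conn:
        conn.execute(
            """
            INSERT INTO redactions (event_id, redacts) VALUES (?, ?)
            ON CONFLICT (event_id) DO UPDATE SET redacts = excluded.redacts
            """,
            (event_id, redacts),
        )


upsert_redaction("$redaction", "$target")
# A second write for the same event (e.g. during backfill) no longer raises.
upsert_redaction("$redaction", "$target")

rows = conn.execute("SELECT event_id, redacts FROM redactions").fetchall()
```

The table still ends up with a single row per redaction event, so a server that already holds a stale row can backfill past it instead of failing.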
It's a one-line change to fix the underlying problem. I'd much rather do that than plaster over it. |
I think adding |
That sounds good as well. I still believe that we will require my fix to recover those servers that are confused about the data already existing when they are backfilling. |
@richvdh I added a 2nd commit to my PR and added the fix that you pointed out. |
Previously redactions were left behind, leading to backfilling issues when the server stumbled across the already existing yet to be backfilled redactions. This issue has been discussed in matrix-org#8707. Signed-off-by: Andreas Rammhold <[email protected]>
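The second commit's approach, purging redactions together with the events they belong to, could be sketched roughly as follows. This is an assumption-laden toy using SQLite with a simplified two-table schema; `purge_room` and `persist_redaction` are hypothetical helpers and do not mirror how Synapse's purge code is structured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified schema (not Synapse's actual tables).
conn.executescript(
    """
    CREATE TABLE events (event_id TEXT PRIMARY KEY, room_id TEXT NOT NULL);
    CREATE TABLE redactions (event_id TEXT PRIMARY KEY, redacts TEXT NOT NULL);
    """
)


def persist_redaction(event_id, room_id, redacts):
    """Store a redaction event with plain INSERTs, all in one transaction."""
    with conn:
        conn.execute("INSERT INTO events VALUES (?, ?)", (event_id, room_id))
        conn.execute("INSERT INTO redactions VALUES (?, ?)", (event_id, redacts))


def purge_room(room_id):
    """Purge a room's events and the redactions rows that belong to them."""
    with conn:
        # Delete redactions first, while events still tells us which
        # event IDs belong to the room being purged.
        conn.execute(
            "DELETE FROM redactions WHERE event_id IN "
            "(SELECT event_id FROM events WHERE room_id = ?)",
            (room_id,),
        )
        conn.execute("DELETE FROM events WHERE room_id = ?", (room_id,))


persist_redaction("$redaction", "!room", "$target")
purge_room("!room")
# Backfilling the same redaction now succeeds with a plain INSERT.
persist_redaction("$redaction", "!room", "$target")

redaction_count = conn.execute("SELECT COUNT(*) FROM redactions").fetchone()[0]
```

With nothing left behind by the purge, the plain insert works again; the UPSERT from the first commit is then only needed for servers that already have stale rows.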
* Upsert redactions in case they already exist

Occasionally, in combination with retention, redactions aren't deleted from the database whenever they are due for deletion. The server will eventually try to backfill the deleted events and trip over the already existing redaction events. Switching to an UPSERT for those events allows us to recover from these situations. The retention code still needs fixing but that is outside of my current comfort zone on this code base.

This is related to #8707 where the error was discussed already.

Signed-off-by: Andreas Rammhold <[email protected]>

* Also purge redactions when purging events

Previously redactions were left behind, leading to backfilling issues when the server stumbled across the already existing yet to be backfilled redactions. This issue has been discussed in #8707.

Signed-off-by: Andreas Rammhold <[email protected]>
Description
Steps to reproduce
This time round (2nd time this exact issue has happened), the affected rooms are:
No errors are emitted by Synapse; Element just returns a permission-denied type message when hitting "jump to unread".
Version information
1.20.0 -> 1.22.1
Install method:
pip
Platform:
Arch, LXD container