Postgres port script breaks the database after Synapse 1.26; new DMs cannot be created #9382
Comments
Reading through the code introduced in #8868, I'm starting to suspect that this is an effect of another improperly seeded sequence in the Postgres migration script. Unfortunately, the checks that flagged other problem sequences in the migration script didn't catch this one. If that hunch is right, I think running this query by hand should fix the problem:
Would anyone who knows the event auth chain schema better than I do like to confirm that fiddling with the database directly there won't mess something else up?
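To make the hypothesis above concrete, here is a toy Python model. It is not Synapse code. The names ChainTable and Sequence are hypothetical, and the uniqueness rule on (chain_id, sequence_number) is an assumption inferred from the index name event_auth_chains_c_seq_index. The model shows how copying rows with their original IDs, without advancing the sequence, makes the next allocated ID collide with an existing row. It also shows how restarting the sequence past the current maximum avoids the collision.

```python
# Toy model (hypothetical names, not Synapse code): a unique (chain_id,
# sequence_number) key, assumed from the index name event_auth_chains_c_seq_index,
# plus a sequence that hands out new chain IDs.


class UniqueViolation(Exception):
    """Stands in for 'duplicate key value violates unique constraint'."""


class Sequence:
    """Minimal stand-in for a PostgreSQL sequence."""

    def __init__(self, start=1):
        self._next = start

    def nextval(self):
        value = self._next
        self._next += 1
        return value

    def restart(self, start):
        # Roughly what ALTER SEQUENCE ... RESTART WITH <start> does.
        self._next = start


class ChainTable:
    """Rows identified by a (chain_id, sequence_number) pair that must be unique."""

    def __init__(self):
        self.keys = set()

    def insert(self, chain_id, sequence_number):
        key = (chain_id, sequence_number)
        if key in self.keys:
            raise UniqueViolation(f"duplicate key {key}")
        self.keys.add(key)

    def max_chain_id(self):
        return max((chain_id for chain_id, _ in self.keys), default=0)


table = ChainTable()
seq = Sequence()  # freshly created on the Postgres side, so it still starts at 1

# The port copies existing rows with their original IDs but never advances seq.
for chain_id in (1, 2, 3):
    table.insert(chain_id, 1)

collided = False
try:
    table.insert(seq.nextval(), 1)  # a new room asks for a new chain and gets ID 1 again
except UniqueViolation as exc:
    collided = True
    print("insert failed:", exc)

# Workaround: move the sequence past the highest ID already stored.
seq.restart(table.max_chain_id() + 1)
new_id = seq.nextval()
table.insert(new_id, 1)
print("inserted chain", new_id)
```

The model makes the failure mode clear: the port itself succeeds, and the problem only surfaces later, the first time a new ID is drawn from the stale sequence.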
Thanks for digging into this; it looks like we've forgotten to … I can confirm that it is safe to increase the values of any of our sequences (while the server is offline).
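Since raising a sequence is confirmed safe while the server is offline, the sketch below shows a small helper that builds such a statement. It rests on several assumptions. The helper name is hypothetical. The maximum value would come from a query such as SELECT MAX(chain_id) FROM event_auth_chains, where chain_id is an assumed column name. This is not necessarily the query referred to earlier in the thread. Only run the resulting SQL (e.g. via psql) while Synapse is stopped.

```python
import re
from typing import Optional

# Conservative check (unqualified, lowercase names only) so the name can be
# interpolated into SQL without quoting.
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def restart_sequence_sql(sequence_name: str, max_existing_id: Optional[int]) -> str:
    """Return an ALTER SEQUENCE statement whose next nextval() is max_existing_id + 1.

    max_existing_id is the highest ID already stored in the table the sequence
    feeds, or None if that table is empty (MAX() over no rows is NULL).
    """
    if not _IDENTIFIER.fullmatch(sequence_name):
        raise ValueError(f"unexpected sequence name: {sequence_name!r}")
    next_id = (max_existing_id or 0) + 1
    return f"ALTER SEQUENCE {sequence_name} RESTART WITH {next_id}"


# Hypothetical usage: 41 stands in for the result of a query such as
# SELECT MAX(chain_id) FROM event_auth_chains (column name assumed).
print(restart_sequence_sql("event_auth_chain_id", 41))
```

Using RESTART WITH max + 1 means the next nextval() returns an ID strictly above every row already copied over, which is the condition the port script failed to establish.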
Related: #9344
Ran that
I experienced this issue recently on 1.28.0 and reverted to a backup of my SQLite database. I recently retried with 1.29.0, since the release notes mentioned both #9449 and #9470. This time, I received an error for 'event_auth_chain_id' on incremental runs of synapse_port_db. To me, this suggests that #9470 is working, but there may still be a case that #9449 does not cover. @krithin's workaround did seem to work, and I have since migrated. Has anyone else experienced this after updating to 1.29.0? My installation is on Debian 10.8 via the Debian package.
Description
I upgraded from SQLite to Postgres using synapse_port_db. The script ran successfully, apart from the issues I mentioned in #9344. A few days later, however, a new user on a different homeserver tried to start a DM with me, and I was unable to accept the invite: "Failed to join room: Internal Server Error". My Synapse logs indicate that the problem is:
duplicate key value violates unique constraint "event_auth_chains_c_seq_index"
Searching the history of #synapse:matrix.org for event_auth_chains_c_seq_index turned up a couple more cases where people saw that error and were unable to create new rooms or DMs after migrating to Postgres, so I know it's not just me. In the past, the advice they were given was to wipe their server and start fresh, but I think that kind of data-loss bug (or a database-state bug whose only remedy is data loss) is pretty unacceptable for a messaging service.
This is particularly pernicious because the Postgres migration script completes successfully, and it may be only a few days later, when someone tries to create a new room, that you find the database is borked. This would be less of a problem if there were a clear, documented workaround (as there is for the bugs in #9344), but for this bug the crowd in #synapse:matrix.org does not know of a nondestructive fix.
Steps to reproduce
Version information
If not matrix.org:
Version: {"server_version":"1.26.0","python_version":"3.8.5"}
Install method: apt:
matrix-synapse-py3/unknown,now 1.26.0+focal1 amd64 [installed]
Platform: Ubuntu 20.04.2 LTS