Faster joins: handle total failure to sync state #13000
Comments
As you say, it's not clear what we want in this case. Should we eventually boot the user(s) out of the room and shut it down, pretending it never happened? That sounds pretty janky, but perhaps defensible if the UI makes it clear that you're not 'properly joined' whilst the partial join is going on. Is that something we'd want to do?
I kinda think that's what we'll have to do, ultimately, though we'd probably have to figure out a way to get the memo to the clients about the reason we're giving up on the room. To be honest that sounds like a general problem - "we've given up on this room" can happen for other reasons (notably: it getting shut down by an admin) - so this might need spec changes.
Can we do an out-of-band leave like we do for rejecting invites? I think that would end up doing roughly the right thing? I'm kinda assuming this situation would be rare enough that we don't need to worry too much about making the UX slick, so long as we end up in a sane state.
So I took a stab at this and have a branch where I do an out-of-band leave when syncing hits the total failure state (with a test for this). However, I then realized that the code I called to process the leave is only defined on the master, so this solution would not work for worker instances. This is as far as I got with it. I've pushed the branch here if that's helpful for anyone.
Currently, if we try every server in the room and are unable to sync state from any of them, we give up, leaving us with a room stuck in "partial state" state, and any C-S requests for state in that room timing out indefinitely.
It's not entirely clear what we should do in this case:
synapse/synapse/handlers/federation.py
Lines 1594 to 1610 in 7c6b220
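For illustration only, here is a minimal sketch of the failure mode described above and of the "give up and leave" fallback suggested in the comments. It is not Synapse's code: `sync_state_or_give_up`, `StateSyncError`, `fetch_state` and `on_total_failure` are all hypothetical names, and the sketch ignores the master/worker split mentioned in the thread.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Optional


class StateSyncError(Exception):
    """Hypothetical error: one remote server could not supply the room state."""


async def sync_state_or_give_up(
    room_id: str,
    servers: Iterable[str],
    fetch_state: Callable[[str, str], Awaitable[dict]],
    on_total_failure: Callable[[str], Awaitable[None]],
) -> Optional[dict]:
    """Try each server in turn; if all of them fail, run a give-up action.

    `fetch_state` and `on_total_failure` are assumed callbacks, standing in
    for "ask this server for state" and "do an out-of-band leave".
    """
    for server in servers:
        try:
            # The first server that answers wins.
            return await fetch_state(server, room_id)
        except StateSyncError:
            # This server failed; move on to the next candidate.
            continue

    # Every server failed. Rather than leaving the room stuck in partial
    # state, hand over to the caller-supplied fallback.
    await on_total_failure(room_id)
    return None
```

The point of passing the fallback in as a callback is that the policy question raised in this issue (leave out-of-band, retry later, notify clients) stays separate from the retry loop itself.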