Room events arrive faster than to-device messages during federation connectivity problems, causing decryption failures #1123
Related to #966 (potential duplicate)
According to https://matrix-org.github.io/synapse/dev-docs/latest/modules/federation_sender.html#a-note-on-failures-and-back-offs, Synapse already mitigates this to some degree by restarting the to-device transmission loop once it receives an inbound request from the remote server.
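The mitigation described in that note can be pictured with a deliberately simplified model: each failed send to a destination lengthens its backoff, and any inbound request from that destination proves it is reachable again, so the backoff is cleared and queued to-device messages become eligible for an immediate retry. The class and method names below (`DestinationQueue`, `on_send_failure`, `on_inbound_request`) and the 1 s / 3600 s delays are hypothetical, not Synapse's actual code or defaults; see the linked docs for the real behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class DestinationQueue:
    """Toy per-destination retry state (hypothetical; not Synapse's real class)."""
    base_delay: float = 1.0      # assumed first retry delay, in seconds
    max_delay: float = 3600.0    # assumed cap on the backoff, in seconds
    delay: float = 0.0           # 0.0 means "not currently backing off"
    pending_to_device: list = field(default_factory=list)

    def on_send_failure(self) -> float:
        # Exponential backoff: start at base_delay, then double, capped at max_delay.
        if self.delay == 0.0:
            self.delay = self.base_delay
        else:
            self.delay = min(self.delay * 2, self.max_delay)
        return self.delay

    def on_inbound_request(self) -> bool:
        # The remote server just contacted us, so it is evidently reachable:
        # clear the backoff and report whether queued to-device messages
        # should be retried right away.
        self.delay = 0.0
        return bool(self.pending_to_device)


q = DestinationQueue()
q.pending_to_device.append({"type": "m.room_key"})
q.on_send_failure()           # 1.0
q.on_send_failure()           # 2.0
retry_now = q.on_inbound_request()
```

Note that this only helps when the remote server happens to contact us directly; it does nothing for the case discussed next, where events reach B by another route.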
The core problem here is that to-device messages don't flow transitively, but timeline messages do. So if server A can't reach server B, messages may still flow transitively (A->C->B), but the keys will never get through, no matter what the retry schedule is. Possible solutions off the top of my head:
The latter is more fiddly, but preserves Matrix's somewhat desirable self-healing transitive delivery properties. Alternatively, does MLS solve this by putting the key data in the timeline rather than in to-device messages? (cc @uhoreg)
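The asymmetry can be shown with a toy reachability model. It assumes that a timeline event reaches B whenever any chain of working links exists from the origin to B (in practice B learns of A's event through later events from C and fetches it as a missing event, which is simplified away here), while a to-device message is delivered only over the direct origin-to-destination link. The functions `reachable` and `deliver` are illustrative only.

```python
from collections import deque


def reachable(links: set[tuple[str, str]], src: str) -> set[str]:
    """Servers that data from src can reach over working directed links (BFS)."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        for a, b in links:
            if a == node and b not in seen:
                seen.add(b)
                queue.append(b)
    return seen


def deliver(links: set[tuple[str, str]], origin: str, dest: str) -> tuple[bool, bool]:
    # Timeline events can be relayed: dest gets them if any path exists.
    timeline = dest in reachable(links, origin)
    # To-device messages (e.g. room keys) are only sent directly.
    to_device = (origin, dest) in links
    return timeline, to_device


# A -> B is broken, but A -> C and C -> B work.
links = {("A", "C"), ("C", "B")}
timeline_ok, keys_ok = deliver(links, "A", "B")
```

With these links B receives the encrypted timeline event but never the key, which is exactly the decryption failure in this issue; no retry schedule changes that while the A -> B link stays down.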
Most of the MLS stuff happens in-room. The one thing that can happen in to-device messages is Welcome messages, but those could also be sent in-room. We haven't figured out if it would be better to use to-device messages or in-room messages for that.
Presumably Matrix could have gotten away with putting key data into the room too, but for some reason chose not to. Are there any concerns with MLS key data being stored in the DAG forever? E.g. bloat, or problems arising from making permanent something that should be ephemeral?
Yeah, the main concerns about storing the MLS Welcome messages in the DAG are bloat-related. It would be irrelevant information to most of the people in the room.
We may be able to flag events that were relayed to us by a different server, so that clients can indicate that there may be a decryption failure.
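One way such a flag could be computed, as a sketch: compare the sender's homeserver (the part of a Matrix user ID after the first colon, as in `@alice:a.example`) with the server the event was actually received from. The `received_from` value and the function names are hypothetical; how a homeserver records which server handed it an event, and how it would surface the flag to clients, are assumptions not specified in this thread.

```python
def sender_server(user_id: str) -> str:
    # Matrix user IDs have the form @localpart:server_name; the server name
    # follows the first colon and may itself include a port.
    return user_id.split(":", 1)[1]


def relayed_from_elsewhere(event: dict, received_from: str) -> bool:
    """True if the event came from a server other than the sender's own.

    Such events may have outrun the sender's to-device room keys, so a client
    could show a "keys may still be on their way" hint instead of a hard error.
    """
    return sender_server(event["sender"]) != received_from


event = {"type": "m.room.encrypted", "sender": "@alice:a.example"}
flag = relayed_from_elsewhere(event, received_from="c.example")
```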
I think there are two separate issues here:
The latter is quite frequent: we see this (almost) daily between element.io <--> matrix.org. If we renegotiated room keys less frequently (or gave the room key more time to arrive by not sending it when typing), I think this would concretely help.
I don't understand what you mean by this. By sending it when typing, that means that we send it earlier, which means it has more time to arrive before the room event is received?
A document on that subject has been created: https://docs.google.com/document/d/1f5XMf4qXEzoUBEFDu5IWJXvFAc76D9_dwqt0ReoCRpw/edit#heading=h.a4ncf0eur6o2
MLS key data would most likely be sent via to-device messages in the Linearized Matrix models we've been discussing, though the delivery guarantees there are a bit stronger.
Suppose server A is struggling to send federation traffic to server B. (This might be because B is slow, is having connectivity problems, or there is just a huge queue of traffic.)
In this situation, B will often still receive encrypted messages from A: they may be pulled in from other servers in the room. However, the keys to these messages will be stuck in a queue on A. Users on B therefore see decryption errors.
ref element-hq/element-web#3754