Room events arrive faster than to-device messages during federation connectivity problems, causing decryption failures #1123

turt2live · 2022-06-14T05:07:41Z

Suppose server A is struggling to send federation traffic to server B. (This might be because B is slow, is having connectivity problems, or there is just a huge queue of traffic.)

In this situation, B will often still receive encrypted messages from A: they may be pulled in from other servers in the room. However, the keys to these messages will be stuck in a queue on A. Users on B therefore see decryption errors.
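A toy model may make the failure mode concrete. This is only a sketch: the `Server` class, the `deliver` function and the `links` set are hypothetical names invented for illustration, not part of any homeserver implementation. It assumes room events can reach a server through any reachable server that already has them, while to-device messages wait in a per-destination queue on the origin.

```python
from collections import deque


class Server:
    def __init__(self, name):
        self.name = name
        self.events = {}   # event_id -> id of the session needed to decrypt it
        self.keys = set()  # session ids received over to-device
        self.outbox = {}   # destination name -> deque of queued session ids

    def undecryptable(self):
        return sorted(e for e, s in self.events.items() if s not in self.keys)


def deliver(origin, others, event_id, session_id, links):
    """Send one encrypted event from `origin` to every server in `others`.

    `links` is the set of (from, to) server-name pairs that currently work.
    """
    origin.events[event_id] = session_id
    origin.keys.add(session_id)
    for dest in others:
        # To-device traffic only ever travels directly from origin to dest.
        queue = origin.outbox.setdefault(dest.name, deque())
        queue.append(session_id)
        if (origin.name, dest.name) in links:
            while queue:
                dest.keys.add(queue.popleft())
            dest.events[event_id] = session_id
    # Room events can also be pulled from any reachable server that has them.
    for dest in others:
        for relay in others:
            if event_id in relay.events and (relay.name, dest.name) in links:
                dest.events[event_id] = session_id


a, b, c = Server("A"), Server("B"), Server("C")
links = {("A", "C"), ("C", "B")}  # A cannot currently reach B directly
deliver(a, [b, c], "$event1", "session1", links)
```

In this model B ends up holding `$event1` (obtained via C) while the key for it is still sitting in A's outbox for B, which is the decryption error described above.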

ref element-hq/element-web#3754

MadLittleMods · 2023-04-11T23:46:08Z

Related to #966 (potential duplicate)

richvdh · 2023-04-12T09:42:36Z

> Related to #966 (potential duplicate)

It's not a duplicate: #966 talks about the ordering of to-device messages relative to one another, while this is about to-device messages relative to room events.

dkasak · 2023-05-16T13:17:58Z

According to https://matrix-org.github.io/synapse/dev-docs/latest/modules/federation_sender.html#a-note-on-failures-and-back-offs, Synapse already mitigates this to some degree by restarting the to-device transmission loop once it receives an inbound request from the remote server.
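As a rough illustration of that kind of mitigation (not Synapse's actual code), the sketch below keeps a per-destination backoff that is reset when an inbound request from the destination is seen. The class name, method names and delay values are all assumptions made for the example.

```python
class DestinationBackoff:
    """Hypothetical per-destination retry state with exponential backoff."""

    def __init__(self, base_delay=10.0, max_delay=3600.0):
        self.base_delay = base_delay  # assumed value, in seconds
        self.max_delay = max_delay    # assumed value, in seconds
        self.failures = 0
        self.next_attempt_at = 0.0

    def record_failure(self, now):
        # Double the wait after each consecutive failure, up to max_delay.
        self.failures += 1
        delay = min(self.base_delay * 2 ** (self.failures - 1), self.max_delay)
        self.next_attempt_at = now + delay

    def record_inbound_request(self, now):
        # The remote server just contacted us, so it is probably reachable
        # again: drop the backoff and allow an immediate retry.
        self.failures = 0
        self.next_attempt_at = now

    def may_send(self, now):
        return now >= self.next_attempt_at


backoff = DestinationBackoff()
backoff.record_failure(now=0.0)
backoff.record_failure(now=10.0)
blocked = not backoff.may_send(now=20.0)
backoff.record_inbound_request(now=20.0)
unblocked = backoff.may_send(now=20.0)
```

A reset like this only helps when the remote server can still reach us; it does nothing if the two servers cannot route to each other in either direction.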

ara4n · 2023-10-06T19:13:35Z

The core problem here is that todevice messages don't flow transitively, but timeline msgs do. So if server A can't route to server B, messages may all flow transitively A->C->B - but the keys will never get through, no matter what the retry schedule is.

Possible solutions off the top of my head:

- Forbid transitive backfill for encrypted rooms
- Let server B request missing todevice traffic from server A via server C. So:
  - server B realises it's receiving current events from server A via server C
  - server B assumes that A can't route to it directly
  - server B asks server C to ask server A for B's todevice msgs.

The latter is more fiddly, but preserves Matrix's somewhat desirable self-healing transitive delivery properties.
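The detection half of the second option could look something like the sketch below. It is hypothetical throughout: `ReceivedEvent`, `delivered_by`, `origins_needing_relay` and the `threshold` parameter are made-up names, and the step where B asks C to ask A is left out because no such federation request is described in this issue.

```python
from dataclasses import dataclass


@dataclass
class ReceivedEvent:
    event_id: str
    origin: str        # server that created the event
    delivered_by: str  # server whose request carried the event to us


def origins_needing_relay(events, threshold=3):
    """Return {origin: relay} for origins we only hear from indirectly.

    An origin qualifies if none of its recent events arrived directly and
    at least `threshold` of them arrived via the same other server.
    """
    direct = set()
    relayed = {}  # origin -> {relay: count}
    for ev in events:
        if ev.delivered_by == ev.origin:
            direct.add(ev.origin)
        else:
            counts = relayed.setdefault(ev.origin, {})
            counts[ev.delivered_by] = counts.get(ev.delivered_by, 0) + 1

    result = {}
    for origin, counts in relayed.items():
        if origin in direct:
            continue  # A can evidently still route to us
        relay, n = max(counts.items(), key=lambda kv: kv[1])
        if n >= threshold:
            result[origin] = relay
    return result


recent = [
    ReceivedEvent("$1", origin="A", delivered_by="C"),
    ReceivedEvent("$2", origin="A", delivered_by="C"),
    ReceivedEvent("$3", origin="A", delivered_by="C"),
    ReceivedEvent("$4", origin="D", delivered_by="D"),
    ReceivedEvent("$5", origin="D", delivered_by="C"),
]
candidates = origins_needing_relay(recent)
```

Here `candidates` maps A to C, meaning B would treat C as the server to route a to-device request for A through. D is excluded because one of its events arrived directly.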

Alternatively, does MLS solve this by putting the key data in the timeline rather than todevice msgs? (cc @uhoreg)

uhoreg · 2023-10-07T14:30:55Z

Most of the MLS stuff happens in-room. The one thing that can happen in to-device messages is Welcome messages, but those could also be sent in-room. We haven't figured out if it would be better to use to-device messages or in-room messages for that.

dkasak · 2023-10-08T12:18:31Z

Presumably Matrix could've gotten away with putting its key data into the room too but for some reason chose not to. Are there any concerns with MLS key data being stored in the DAG forever? E.g. bloat or problems arising out of making permanent something that should be ephemeral?

uhoreg · 2023-10-10T16:06:12Z

Yeah, the main concerns about storing the MLS Welcome messages in the DAG are bloat-related. It would be irrelevant information to most of the people in the room.

uhoreg · 2024-01-19T16:15:20Z

We may be able to flag events that we got relayed from a different server, so that clients can indicate that there may be a decryption failure.

kegsay · 2024-01-26T16:34:17Z

I think there are two separate issues here:

- network partitions over federation, i.e. no amount of waiting will give you the to-device msgs.
- temporary lag where to-device msgs take ages to arrive.

The latter is quite frequent, and we see this (almost) daily between element.io <--> matrix.org. If we renegotiated room keys less frequently (or had more time to let the room key arrive by not sending it when typing..) this would concretely help I think.

uhoreg · 2024-01-31T02:54:56Z

> (or had more time to let the room key arrive by not sending it when typing..)

I don't understand what you mean by this. By sending it when typing, that means that we send it earlier, which means it has more time to arrive before the room event is received?

BillCarsonFr · 2024-09-24T11:18:40Z

> Alternatively, does MLS solve this by putting the key data in the timeline rather than todevice msgs?

A document on that subject was created: https://docs.google.com/document/d/1f5XMf4qXEzoUBEFDu5IWJXvFAc76D9_dwqt0ReoCRpw/edit#heading=h.a4ncf0eur6o2

turt2live · 2024-09-24T15:35:56Z

MLS key data would most likely be sent over to-device in the linearized matrix models we've been discussing. The guarantees for delivery are a bit higher, though.

turt2live added the wart (A point where the protocol is inconsistent or inelegant) and A-E2EE (Issues about end-to-end encryption) labels Jun 14, 2022

turt2live mentioned this issue Jun 14, 2022

if your server goes down, when it comes back you get messages long before the keys element-hq/element-web#3754

Closed

richvdh changed the title ~~if your server goes down, when it comes back you get messages long before the keys~~ Room events arrive faster than to-device messages during federation connectivity problems, causing decruption failures Jan 13, 2023

richvdh changed the title ~~Room events arrive faster than to-device messages during federation connectivity problems, causing decruption failures~~ Room events arrive faster than to-device messages during federation connectivity problems, causing decryption failures Jan 13, 2023

richvdh added the feature (Suggestion for a significant extension which needs considerable consideration) label and removed the wart (A point where the protocol is inconsistent or inelegant) label Jan 13, 2023

richvdh mentioned this issue Jan 13, 2023

Unable To Decrypt meta issue element-hq/element-meta#245

Open

82 tasks

richvdh mentioned this issue Feb 28, 2024

Epic: Users should not see UTDs for messages they are not supposed to be able to read element-hq/element-meta#2312

Closed

richvdh mentioned this issue Sep 23, 2024

Federation catchup doesn't send to_device EDUs until the remote end has caught up element-hq/synapse#8691

Open
