When a remote server falls behind on federation, Synapse backs off and starts batching up requests. Usually this isn't too bad, as the remote end will only be one or two transactions behind; however, more serious occurrences can put the server behind by hundreds of transactions or thousands of events.
Many of those messages could be encrypted, which means they are potentially accompanied by to_device EDUs needed to decrypt the messages on the client side. If the EDUs aren't sent as part of the catch-up transactions, clients may be unable to decrypt the messages, which makes users sad/angry.
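For context, a federation /send transaction body carries PDUs and EDUs side by side, so the room keys (delivered as m.direct_to_device EDUs) can in principle travel in the same catch-up transactions as the encrypted PDUs they unlock. A rough sketch of the shape, with illustrative placeholder values (field names follow the federation spec; the actual key-sharing payload is Olm-encrypted and elided here):

```python
# Rough shape of a federation /send transaction during catch-up.
# Field names follow the Matrix federation spec; the values below are
# illustrative placeholders, not real traffic.
transaction = {
    "origin": "matrix.org",
    "origin_server_ts": 1604000000000,
    "pdus": [
        # up to 50 room events (e.g. m.room.encrypted messages)
    ],
    "edus": [
        {
            "edu_type": "m.direct_to_device",
            "content": {
                "sender": "@alice:matrix.org",
                "type": "m.room.encrypted",  # Olm-wrapped room key share
                "message_id": "abc123",
                "messages": {
                    "@bob:t2bot.io": {
                        "BOBS_DEVICE_ID": {
                            # Olm-encrypted payload carrying the megolm
                            # session key needed to decrypt the PDUs above
                        },
                    },
                },
            },
        },
    ],
}
```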
Here's an example of this happening in real life:
For background on this graph: t2bot.io (the server in question) runs two federation readers, one of which (03) is dedicated to handling just matrix.org's traffic. The other (04) is left to handle any other random server that might exist in the wild.
In the graph, t2bot.io was behind on matrix.org's transactions and thus had a very spiky waveform, due to the 50-PDU transactions having to be retried. When it did catch up, it was also met with all the EDUs it had missed, creating a significant spike. Traffic after that returns to normal.
This has been observed to happen on several catch-ups already, and was only noticed today (with Synapse 1.22.0). It's unclear whether this is an issue in prior versions of Synapse or a matrix.org federation-sender-specific issue.
Version information
Homeserver: t2bot.io
Version: 1.22.0 (with minor, unrelated patches)
Install method: pip
Platform: Ubuntu 20.04, bare metal
This is a particular problem because, if your server spends a lot of time lagging behind, you can end up receiving room events but never the e2e keys for those events.
This is still a real problem, causing real UTDs (unable-to-decrypt errors). IMHO to-device events should be prioritised ahead of PDUs.
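As a sketch of what that prioritisation could look like (hypothetical function and queue names, not Synapse's actual internals): when building each catch-up transaction for a destination, drain the pending to-device EDUs first and only then fill the transaction with up to 50 PDUs, so room keys arrive with (or before) the encrypted events they unlock.

```python
# Hypothetical sketch of prioritising to-device EDUs during catch-up.
# The function and argument names are made up for illustration; Synapse's
# real per-destination queueing works differently.

MAX_PDUS_PER_TXN = 50   # federation spec limit on PDUs per transaction
MAX_EDUS_PER_TXN = 100  # federation spec limit on EDUs per transaction

def build_catchup_transaction(pending_to_device_edus, pending_pdus):
    """Build one catch-up transaction, giving to-device EDUs priority."""
    # Take to-device EDUs first so clients get the keys needed to decrypt
    # the backlog of encrypted PDUs being caught up on.
    edus = pending_to_device_edus[:MAX_EDUS_PER_TXN]
    pdus = pending_pdus[:MAX_PDUS_PER_TXN]
    return {"pdus": pdus, "edus": edus}
```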
This issue has been migrated from #8691.