MSC2775: Lazy loading over federation #2775

Closed

ara4n wants to merge 8 commits into old_master from matthew/msc2775

Member

ara4n commented Sep 14, 2020 (edited)

An MSC to define how to massively speed up joins (and MSC #2444 peeks) by incrementally sending m.room.member events after your server has joined/peeked the room, rather than up front.

ara4n added 2 commits

September 14, 2020 02:05


          MSC2775: Lazy loading over federation

e217542

A first cut at an MSC to define how to massively speed up joins (and MSC #2444 peeks)
by sending irrelevant  events after you join rather than up front


          suggest we LL referenced user_ids

192d508

ara4n added the proposal label


          punctuation

8b6ff35

turt2live added kind:feature proposal-in-review kind:core and removed kind:feature labels


          security considerations

cf2ad9b

ara4n commented

View reviewed changes

proposals/2775-lazy-loading-over-federation.md Outdated

+              Clients which are lazy loading members however may return the initial `/join`
+              or `/peek` before `/state` has completed.  However, we need a way to tell
+              clients once the server has finished synchronising its local state.  For

Member Author

ara4n Sep 14, 2020

Need to spec how the client tells the server that it knows how to handle lazy loading over the server-server API (SS LL).

neilalexander reviewed

View reviewed changes

proposals/2775-lazy-loading-over-federation.md (resolved)

turt2live self-requested a review

September 15, 2020 00:16


          lots of fixes thanks to @erikjohnston

b640f45

erikjohnston reviewed

View reviewed changes

proposals/2775-lazy-loading-over-federation.md Outdated (resolved)

erikjohnston reviewed

View reviewed changes

proposals/2775-lazy-loading-over-federation.md Outdated (resolved)

ara4n added 3 commits

September 15, 2020 17:32


          spell out that you may need to fetch more events to auth

21da07a


          lazyloaded backfill

3b78a26


          clarify the alt option

6e2d7c7

neilalexander reviewed

View reviewed changes

proposals/2775-lazy-loading-over-federation.md

+                 [requirements of the CS API](https://github.com/matrix-org/matrix-doc/blob/1c7a6a9c7fa2b47877ce8790ea5e5c588df5fa90/api/client-server/sync.yaml#L148))
+               * any members which are in the auth chain for the state events in the response
+               * any members which are power events (aka control events): bans & kicks.
+               * one joined member per server (if we want to be able to send messages while

Contributor

neilalexander Sep 19, 2020

I worry that this will create a potential race at the protocol level that may be exploitable by a bad actor in the room.

For example, suppose you join a room with two or more other users who are resident on the same server, remote.com, and you learn about user A but not about B, C or D.

User A detects your join and then leaves the room immediately, before you are able to retrieve the rest of the room state, so you think you are the only occupant of the room. As you don’t know about any users from remote.com anymore, you no longer know if that server is still resident in the room and therefore you don’t know if you can ask it for room state.

The impact of this is lessened if you can include more than one membership from a given homeserver—even knowing about two or three users reduces the chance of this ever being an issue.
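A minimal sketch of the mitigation suggested above, assuming the resident server has the joined user IDs to hand and that a server name is everything after the first colon of a user ID. The function name `pick_members_per_server` and the `per_server` limit are hypothetical and not part of the MSC.

```python
from collections import defaultdict


def pick_members_per_server(joined_user_ids, per_server=3):
    """Return up to `per_server` joined user IDs for each homeserver.

    Assumption: a user ID looks like "@localpart:server.name" and the
    server name is everything after the first colon.
    """
    by_server = defaultdict(list)
    for user_id in joined_user_ids:
        server = user_id.split(":", 1)[1]
        # Stop collecting once this server has enough representatives.
        if len(by_server[server]) < per_server:
            by_server[server].append(user_id)
    return dict(by_server)


joined = [
    "@a:remote.com", "@b:remote.com", "@c:remote.com",
    "@d:remote.com", "@me:example.org",
]
picked = pick_members_per_server(joined, per_server=2)
```

With `per_server=2`, user A leaving still leaves one known member on remote.com, so the joining server keeps a reason to believe remote.com is in the room.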

neilalexander reviewed

View reviewed changes

proposals/2775-lazy-loading-over-federation.md

+              The joining server can then sync in the remaining membership events by calling
+              `/state` as of the user's join event.  To avoid retrieving duplicate data, we
+              propose adding a parameter of `lazy_load_members_only: true` to the JSON
+              request body which would then only return the missing `m.room.member` events.

Contributor

neilalexander Sep 19, 2020

This implies that the homeserver needs to track which membership events have been sent to which users, which feels like it might create a lot of additional complexity for homeserver implementors. It might just be better (certainly a lot simpler) to send the entire room state and deal with the duplicates.
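A rough sketch of the simpler alternative mentioned above: fetch the entire room state and discard duplicates on the joining side. It assumes events are plain dicts that already carry an `event_id` key, and `merge_full_state` is a hypothetical helper name, not anything from an existing homeserver.

```python
def merge_full_state(known_events, full_state):
    """Add events from a full state response, skipping ones already held.

    `known_events` maps event_id -> event and is updated in place.
    `full_state` is a list of event dicts with an "event_id" key
    (an assumed shape, for illustration only).
    Returns the list of events that were actually new.
    """
    new_events = []
    for event in full_state:
        event_id = event["event_id"]
        if event_id not in known_events:
            known_events[event_id] = event
            new_events.append(event)
    return new_events


known = {"$create": {"event_id": "$create", "type": "m.room.create"}}
response = [
    {"event_id": "$create", "type": "m.room.create"},
    {"event_id": "$member_b", "type": "m.room.member"},
]
added = merge_full_state(known, response)
```

The cost of this approach is bandwidth rather than bookkeeping: the resident server does not have to remember what it already sent.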

neilalexander reviewed

View reviewed changes

proposals/2775-lazy-loading-over-federation.md

+              The vast majority of state events in Matrix today are `m.room.member` events.
+              For instance, 99.4% (30661 out of 30856) of Matrix HQ's state is
+              `m.room.member`s (see Stats section below).

Contributor

neilalexander Sep 19, 2020

It would be interesting to know how many of these are actually "membership": "join" and not users that have left.

Certainly another optimisation would be to not bother telling homeservers about "leave" membership events until they need to know them for some reason (which is probably when processing their next join and, even then, unless they are "invite" or "ban" I’m still not sure why we care about their previous "leave" as long as join rules permit).

emorrp1 Nov 9, 2020

Looks like it's approximately 2:1 join:leave for #matrix:matrix.org, but approximately even for #fdroid:f-droid.org. It'd be a trickier query for "not users that have left".

select json::json#>>'{content,membership}' as membership, count(*) from state_events natural join event_json where type='m.room.member' and room_id='!OGEhHVWSdvArJzumhm:matrix.org' group by membership order by count(*) desc;
 membership | count 
------------+-------
 join       | 24269
 leave      | 12029
 ban        |   307
 invite     |   145

Member

richvdh Oct 25, 2021

current figures for #matrix:

 count | membership 
-------+------------
  1458 | ban
 26287 | join
 26251 | leave

neilalexander reviewed

View reviewed changes

proposals/2775-lazy-loading-over-federation.md

+                 to)
+               * any membership events with membership `invite` (to mitigate risk of double invites)
+               * any members for user_ids which are referred to by the content of state events
+                 in the response (e.g. `m.room.power_levels`) <-- TBD.  These could be irrelevant,

Contributor

neilalexander Sep 19, 2020

This does seem irrelevant, as the power levels are still enforced even for users that we don’t know about yet. Anything that’s important for auth will already be in the auth chain.

neilalexander reviewed

View reviewed changes

proposals/2775-lazy-loading-over-federation.md

+              server has fully synchronised the state for this room.  Operations which are
+              blocked on state being fully synchronised are:
+               * Sending E2EE messages, otherwise some of the users will not have the keys

Contributor

neilalexander Sep 19, 2020

Something to consider here is that we can’t even start sending device list updates for users until we learn about those users, let alone exchanging keys, so this might create another protocol-level race when joining E2E rooms if you start sending messages into the room before you know about all the devices in the room (resulting in UTDs).

Member

richvdh Oct 25, 2021

@neilalexander I'm not really following you here. You seem to be saying the same thing as the MSC (if a user sends E2E messages before they have the user list, their client will not know who to encrypt for).

neilalexander reviewed

View reviewed changes

proposals/2775-lazy-loading-over-federation.md

+              calculate resolved state as of that event for authorising events and servicing
+              /state queries etc.  Loading the power events up front lets us authorise new
+              events (backfilled & new traffic) using partial state - when you receive an
+              event you do the lookup of the event to the list of event state keys you need

Contributor

neilalexander Sep 19, 2020 (edited)

It’s worth calling out that checking for soft-failures depends on knowing not just the state at the time of the event but also the current room state. It is therefore essential to retrieve the latest membership state for a user as well, in addition to the membership event supplied in the newly-received event’s auth events (which may be out of date).

We need to be able to consult a server that we know to be in the room right now for that information.

Member

richvdh Oct 25, 2021

> We need to be able to consult a server that we know to be in the room right now for that information.

Consider we receive an event from a user which the federation as a whole doesn't believe is currently in the room. Clearly the sending server believes that user is in the room, because it let the user send a message.

In short, we'd need to carefully consider which server we consult.

That said - maybe it's ok to allow such events through for the period we are syncing state.

This was referenced Oct 27, 2020

MSC1772: Matrix spaces #1772

Merged

Slow joins lead to poor UX element-hq/element-meta#1383

Open

turt2live removed their request for review

March 22, 2021 03:01

turt2live added the needs-implementation label

turt2live force-pushed the old_master branch from e895827 to dca99ee Compare

August 30, 2021 22:34

richvdh reviewed

View reviewed changes

proposals/2775-lazy-loading-over-federation.md

+               * Room state can be big.  For instance, a /send_join response for Matrix HQ is
+                 currently 24MB of JSON covering 28,188 events, and could easily take tens of
+                 seconds to calculate and send (especially on lower-end hardware).
+               * All these events have to be verified by the receiving server.

Member

richvdh Oct 25, 2021

Suggested change

      
             * All these events have to be verified by the receiving server.
          
             * All these events have to be verified and persisted by the receiving server.

Our testing shows the main problem is writing the events to the database.

timokoesters Dec 30, 2021

Is it possible to keep them in ram and persist them in the background while users can already use the room? Probably not because the server might crash...

proposals/2775-lazy-loading-over-federation.md

+                 [requirements of the CS API](https://github.com/matrix-org/matrix-doc/blob/1c7a6a9c7fa2b47877ce8790ea5e5c588df5fa90/api/client-server/sync.yaml#L148))
+               * any members which are in the auth chain for the state events in the response
+               * any members which are power events (aka control events): bans & kicks.
+               * one joined member per server (if we want to be able to send messages while

Member

richvdh Oct 25, 2021

for context, this will be 2462 membership events for Matrix HQ (of 54194 total state) at present.

Member

richvdh Oct 25, 2021

Can we just ask the server we join via to send us a list of the servers in the room? There doesn't seem to be any need to have actual membership events for them.
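To illustrate the idea, here is a small sketch of how a resident server could derive such a server list from its own membership state. The event shape (dicts with `state_key` and `content.membership`) follows `m.room.member` events, while `servers_in_room` is a hypothetical helper name and no response field for carrying the list is defined in this thread.

```python
def servers_in_room(member_events):
    """Derive the sorted list of servers with at least one joined user.

    Assumes each event is a dict with "state_key" (the user ID) and a
    "content" dict holding "membership", as m.room.member events do.
    """
    servers = set()
    for event in member_events:
        if event["content"].get("membership") == "join":
            # Server name is everything after the first colon.
            servers.add(event["state_key"].split(":", 1)[1])
    return sorted(servers)


members = [
    {"state_key": "@a:remote.com", "content": {"membership": "join"}},
    {"state_key": "@b:remote.com", "content": {"membership": "leave"}},
    {"state_key": "@c:other.org", "content": {"membership": "join"}},
    {"state_key": "@d:gone.net", "content": {"membership": "ban"}},
]
resident = servers_in_room(members)
```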

proposals/2775-lazy-loading-over-federation.md

+              Causes include:
+               * Room state can be big.  For instance, a /send_join response for Matrix HQ is
+                 currently 24MB of JSON covering 28,188 events, and could easily take tens of

Member

richvdh Oct 25, 2021

now 115MB and 144K events, for the record.

proposals/2775-lazy-loading-over-federation.md

Comment on lines +94 to +98

+              /state queries etc.  Loading the power events up front lets us authorise new
+              events (backfilled & new traffic) using partial state - when you receive an
+              event you do the lookup of the event to the list of event state keys you need
+              to auth; and if any of those are missing you need to fetch them from the
+              remote server by type & state_key via /state (and auth them too).

Member

richvdh Oct 25, 2021

The problem here is that we're effectively trusting a remote server to tell us about part of the state at a given point in the DAG.

Currently we trust one single server to give us a correct impression of the room state, via the /send_join response - but in that case we've normally chosen a specific server via the room alias. I'm a bit worried about opening it up so that any server in the federation (not even necessarily in the room) can make claims about room state.

Given we should already have the ACL list and the kick/ban list, I'm not sure I can think of a way to abuse this too much that isn't already a problem, but I think it's something we need to be careful about.
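The lookup described in the quoted proposal text could be sketched roughly as follows. This is a simplified reading of the auth-event selection rules in the Matrix specification: third-party invites and other special cases are deliberately left out, and both function names are hypothetical.

```python
def auth_state_keys(event):
    """Return the (type, state_key) pairs needed to authorise `event`.

    Simplified sketch: covers the create event, power levels, the
    sender's membership and, for membership events, the target's
    membership plus join rules for joins and invites.
    """
    keys = {
        ("m.room.create", ""),
        ("m.room.power_levels", ""),
        ("m.room.member", event["sender"]),
    }
    if event["type"] == "m.room.member":
        keys.add(("m.room.member", event["state_key"]))
        if event["content"].get("membership") in ("join", "invite"):
            keys.add(("m.room.join_rules", ""))
    return keys


def missing_auth_keys(event, partial_state):
    """Pairs that must be fetched remotely before `event` can be authed.

    `partial_state` maps (type, state_key) -> event for the state we hold.
    """
    return auth_state_keys(event) - set(partial_state)


partial = {("m.room.create", ""): {}, ("m.room.power_levels", ""): {}}
message = {"type": "m.room.message", "sender": "@b:remote.com", "content": {}}
to_fetch = missing_auth_keys(message, partial)
```

Whatever comes back for the missing pairs is exactly the data whose provenance is at issue in the comment above, since it is supplied by whichever remote server is asked.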

proposals/2775-lazy-loading-over-federation.md

+              events (backfilled & new traffic) using partial state - when you receive an
+              event you do the lookup of the event to the list of event state keys you need
+              to auth; and if any of those are missing you need to fetch them from the
+              remote server by type & state_key via /state (and auth them too).

Member

richvdh Oct 25, 2021

It may also be worth noting that the auth chain for some events can be pretty huge, since it includes all past joins and leaves for a given user, meaning that some users in HQ now have auth chains thousands of events long.

proposals/2775-lazy-loading-over-federation.md

Comment on lines +106 to +108

+              of all the new events we've been sent since joining the room.  We should not
+              need to re-auth these events, given the new state should not impact their
+              auth results.  This ensures that the server ends up with correct historical

Member

richvdh Oct 25, 2021

> We should not need to re-auth these events, given the new state should not impact their auth results.

Per the above, I think this is a risky assumption. The new state totally could affect their auth results.

proposals/2775-lazy-loading-over-federation.md

Comment on lines +152 to +153

		We currently trust the server we join via to provide us with accurate room state.
		This proposal doesn't make this any better or worse.

Member

richvdh Oct 25, 2021

per the above, I think it does, since we now trust lots of servers to give us accurate room state - at least while we're lazy-loading the state.

callahad mentioned this pull request

Prototype MSC2775: Lazy loading over federation matrix-org/synapse#11249

Closed

richvdh reviewed

View reviewed changes

proposals/2775-lazy-loading-over-federation.md

+              Therefore, in the response to `/send_join` (or a MSC2444 `/peek`), we propose
+              sending only the following `m.room.member` events (if the initiating server
+              includes `lazy_load_members: true` in their JSON request body):

Member

richvdh Dec 1, 2021

the request body for a /send_join is the membership event itself, so we'll have to put this flag elsewhere. Suggest a lazy_load_members=true|false query-param.

Member

richvdh Dec 1, 2021

also, needs an unstable prefix, I guess.

Suggest org.matrix.msc2775.lazy_load_members
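As an illustration of the suggestion, the request path could be built like this. The endpoint is the federation v2 `/send_join` path; the query parameter name is the unstable prefix suggested in this thread rather than a finalised part of the API, and `send_join_path` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def send_join_path(room_id, event_id, lazy_load=True):
    """Build the federation /send_join path with the suggested flag.

    The flag travels as a query parameter because the request body is
    the membership event itself.
    """
    path = "/_matrix/federation/v2/send_join/{}/{}".format(
        quote(room_id, safe=""), quote(event_id, safe="")
    )
    query = urlencode(
        {"org.matrix.msc2775.lazy_load_members": "true" if lazy_load else "false"}
    )
    return path + "?" + query


url_path = send_join_path("!abc:example.org", "$ev")
```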

richvdh reviewed

View reviewed changes

proposals/2775-lazy-loading-over-federation.md

+               * the "hero" room members which are needed for clients to display
+                 a summary of the room (based on the
+                 [requirements of the CS API](https://github.com/matrix-org/matrix-doc/blob/1c7a6a9c7fa2b47877ce8790ea5e5c588df5fa90/api/client-server/sync.yaml#L148))
+               * any members which are in the auth chain for the state events in the response

Member

richvdh Dec 1, 2021

the auth chain ends up in a separate section, so I think this is a no-op.

proposals/2775-lazy-loading-over-federation.md

Comment on lines +37 to +39

+               * the "hero" room members which are needed for clients to display
+                 a summary of the room (based on the
+                 [requirements of the CS API](https://github.com/matrix-org/matrix-doc/blob/1c7a6a9c7fa2b47877ce8790ea5e5c588df5fa90/api/client-server/sync.yaml#L148))

Member

richvdh Dec 1, 2021

do we really need this as well as a summary?

proposals/2775-lazy-loading-over-federation.md

+                 a summary of the room (based on the
+                 [requirements of the CS API](https://github.com/matrix-org/matrix-doc/blob/1c7a6a9c7fa2b47877ce8790ea5e5c588df5fa90/api/client-server/sync.yaml#L148))
+               * any members which are in the auth chain for the state events in the response
+               * any members which are power events (aka control events): bans & kicks.

Member

richvdh Dec 1, 2021

why do we need kicks here?

proposals/2775-lazy-loading-over-federation.md

+              `/send_join` or `/peek` must include a `lazy_load_members: true` field if the
+              state is partial and members need to be subsequently loaded by `/state`.
+              Clients which are not lazy loading members (by MSC1227) must block returning

Member

richvdh Dec 2, 2021

what does it mean for a client to "block returning" an API?

richvdh reviewed

View reviewed changes

proposals/2775-lazy-loading-over-federation.md

+              Clients which are lazy loading members however may return the initial `/join`
+              or `/peek` before `/state` has completed.  However, we need a way to tell
+              clients once the server has finished synchronising its local state. We do this
+              by adding a `syncing: true` field to the room's `state` block in the `/sync`

Member

richvdh Dec 6, 2021

what exactly do clients do with this field? I thought that clients which do lazy-loading syncs were obliged to expect partial state blocks anyway?

Member Author

ara4n Dec 20, 2021

they do expect partial state blocks, but they currently don't know that they're partial - and so then go and hit /members anyway to fill in the missing members. so this i think is fixing that thinko by giving the client a clear way to know that state is partial and they need to fill it in.

richvdh reviewed

View reviewed changes

proposals/2775-lazy-loading-over-federation.md

Comment on lines +37 to +48

+               * the "hero" room members which are needed for clients to display
+                 a summary of the room (based on the
+                 [requirements of the CS API](https://github.com/matrix-org/matrix-doc/blob/1c7a6a9c7fa2b47877ce8790ea5e5c588df5fa90/api/client-server/sync.yaml#L148))
+               * any members which are in the auth chain for the state events in the response
+               * any members which are power events (aka control events): bans & kicks.
+               * one joined member per server (if we want to be able to send messages while
+                 the room state is synchronising, otherwise we won't know where to send them
+                 to)
+               * any membership events with membership `invite` (to mitigate risk of double invites)
+               * any members for user_ids which are referred to by the content of state events
+                 in the response (e.g. `m.room.power_levels`) <-- TBD.  These could be irrelevant,
+                 plus we don't know where to look for user_ids in arbitrary state events.

Member

richvdh Dec 22, 2021 (edited)

something else I'd like to change while we're messing about with /send_join responses: we should make it explicit that events do not need to be duplicated between state and auth_chain. For example, the m.room.create is necessarily both part of state but is also on the auth chain for all the events in the response. There is no point in sending two copies of such events - servers should be able to elide them from auth_chain.

This needs to be opt-in, because existing implementations (such as Synapse) rely on at least the create event being returned in auth_chain - so this is a good time to change it (when we are adding a query param anyway).

We should do something similar to /state.
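A sketch of what that elision might look like on each side, assuming events are dicts that already carry an `event_id`. The helper names are hypothetical, and the opt-in mechanism itself is not shown.

```python
def elide_auth_chain(state, auth_chain):
    """Sender side: drop auth_chain events already present in state."""
    state_ids = {event["event_id"] for event in state}
    return [e for e in auth_chain if e["event_id"] not in state_ids]


def index_events(state, auth_chain):
    """Receiver side: build one lookup table covering both sections.

    With elision in effect, an auth event such as m.room.create may only
    appear in `state`, so lookups must consult both lists.
    """
    by_id = {event["event_id"]: event for event in state}
    by_id.update({event["event_id"]: event for event in auth_chain})
    return by_id


state = [
    {"event_id": "$create", "type": "m.room.create"},
    {"event_id": "$pl", "type": "m.room.power_levels"},
]
auth_chain = [
    {"event_id": "$create", "type": "m.room.create"},
    {"event_id": "$old_join", "type": "m.room.member"},
]
trimmed = elide_auth_chain(state, auth_chain)
lookup = index_events(state, trimmed)
```

A receiver that reads the create event only from `auth_chain` would fail against the trimmed response, which is why the comment above asks for the behaviour to be opt-in.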

timokoesters reviewed

View reviewed changes

proposals/2775-lazy-loading-over-federation.md

+              Causes include:
+               * Room state can be big.  For instance, a /send_join response for Matrix HQ is
+                 currently 24MB of JSON covering 28,188 events, and could easily take tens of

timokoesters Dec 30, 2021

I think there is some duplication we can get rid of. Some state events are also mentioned in the auth chain. Maybe this can be fixed by only sending it as one list of event jsons and one list of only the event ids that are in the state

timokoesters Dec 31, 2021

Ah, this was mentioned already in #2775 (comment)

proposals/2775-lazy-loading-over-federation.md

+              Causes include:
+               * Room state can be big.  For instance, a /send_join response for Matrix HQ is
+                 currently 24MB of JSON covering 28,188 events, and could easily take tens of

timokoesters Dec 30, 2021

We can also think about how we can improve the auth chain size. Member events don't have to mention the previous member event in most cases and can instead mention an old member event or none at all in public rooms

proposals/2775-lazy-loading-over-federation.md

+                 seconds to calculate and send (especially on lower-end hardware).
+               * All these events have to be verified by the receiving server.
+               * Your server may have to fetch the signing keys for all the servers who have
+                 sent state into the room.

timokoesters Dec 30, 2021

This can be improved by including the public key in the event (instead of the server name?)

turt2live removed the proposal-in-review label

Member Author

ara4n commented Aug 12, 2022

@richvdh presumably this is obsoleted by #3706 now?

Member

richvdh commented Aug 12, 2022

> @richvdh presumably this is obsoleted by #3706 now?

largely yes, though I'm mostly keeping this open as a reminder that we probably need an MSC that describes how a server is meant to use the extensions proposed in #3706.

Member

richvdh commented Oct 3, 2022

> largely yes, though I'm mostly keeping this open as a reminder that we probably need an MSC that describes how a server is meant to use the extensions proposed in #3706.

That MSC now exists in the form of MSC3902, so maybe we can close this one now?

Member

richvdh commented Oct 18, 2022

closing as above.

richvdh closed this

turt2live added the obsolete label

Reviewers

timokoesters left review comments

erikjohnston left review comments

richvdh left review comments

neilalexander left review comments

emorrp1 left review comments

Labels

kind:core, needs-implementation, obsolete, proposal