-
Notifications
You must be signed in to change notification settings - Fork 415
MSC4143: MatrixRTC #4143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
MSC4143: MatrixRTC #4143
Changes from 12 commits
9cbe448
b2b4e5e
8c1340a
8c800a6
1dcbfce
813a21a
d65e42e
4ab679a
6d84256
d68e942
3b19b49
a278ff8
f97e4e6
b626d51
ec9fa8b
50db42a
9bc444c
33231dc
cea4cdd
d61969a
4ad3961
1fbd843
60e23d9
680ef7d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,278 @@ | ||
| # MSC4143: MatrixRTC | ||
|
|
||
| This MSC defines the modules with which the MatrixRTC (Matrix Real Time Communication) signalling system is built. | ||
|
|
||
| The MatrixRTC specification is separated into different modules. | ||
|
|
||
| - The MatrixRTC room state that defines the state of the real time application.\ | ||
| It is the source of truth for: | ||
| - Who is part of a session | ||
| - Who is connected via what technology/backend | ||
| - Metadata per device used by other participants to decide whether the streams | ||
| from this source are of interest / need to be subscribed. | ||
| - The RTC backend. | ||
| - It defines how to connect the participating peers. | ||
| - Livekit is the standard for this as of writing. | ||
| - Defines how to connect to a server/other peers, how to update the connection, | ||
| how to subscribe to different streams... | ||
| - Another planned backend is a full mesh implementation based on MSC3401. | ||
| - The RTCSession types (application) have their own per application spec. | ||
| - Calls can be done with an application of type `m.call` see (TODO: link call msc) | ||
| - The application defines all the details of the RTC experience: | ||
| - How to interpret the metadata of the member events. | ||
| - What streams to connect to. | ||
| - What data in which format to sent over the RTC channels. | ||
|
|
||
| This MSC will focus on the Matrix room state, which can be seen as the most high | ||
| level signalling of a call: | ||
|
|
||
| ## Proposal | ||
|
|
||
| Each RTC session is made out of a collection of `m.rtc.member` state events. | ||
| Each `m.rtc.member` event defines the application type: `application` | ||
| and a `call_id`. | ||
| The first element of the state key is the `userId` and the second the `deviceId`. | ||
| (see [this proposal for state keys](https://github.com/matrix-org/matrix-spec-proposals/pull/3757#issuecomment-2099010555) | ||
| for context about second/first state key.) | ||
|
|
||
| ### The MatrixRTC room state | ||
|
|
||
| Everything required for working MatrixRTC | ||
| (current session, sessions history, join/leave events, ...) only | ||
| require one event type. | ||
|
|
||
| A complete `m.rtc.member` state event looks like this: | ||
|
|
||
| ```json5 | ||
| // event type: "m.rtc.member" | ||
| // event key: "@user:matrix.domain_DEVICEID" | ||
| { | ||
| "application": "m.my_session_type", | ||
| "call_id": "", | ||
| "device_id": "DEVICEID", | ||
| "created_ts": Time | undefined, | ||
| "expires_after": Duration, | ||
| "focus_active": {...FOCUS_A}, | ||
| "foci_preferred": [ | ||
| {...FOCUS_1}, | ||
| {...FOCUS_2} | ||
| ] | ||
| } | ||
| ``` | ||
|
|
||
| > [!NOTE] | ||
| > This relies on [MSC3757](https://github.com/matrix-org/matrix-spec-proposals/pull/3757). | ||
| > We need to have one state event per device, hence multiple "non-overwritable" state | ||
| > events per user. | ||
|
|
||
| This gives us the information, that user: `@user:matrix.domain` with device `DEVICEID` | ||
| is part of an RTCSession of type `m.call` in the scope/sub-session `""` (empty | ||
| string as call id) connected over `FOCUS_A`. This is all information that is needed | ||
| for another room member to detect the running session and join it. | ||
|
|
||
| We include the device_id in the member content to not rely on the exact format of the state key. | ||
| In case [MSC3757](https://github.com/matrix-org/matrix-spec-proposals/pull/3757) is used it would not | ||
| be the second element of the state key array. | ||
|
|
||
| `created_ts` is an optional property that caches the time of creation. It is not required | ||
| for an event that, has not yet been updated, there the `origin_server_ts` is used. | ||
|
|
||
| > [!NOTE] | ||
| > We introduce `created_ts()` as the notation for `created_ts ?? origin_server_ts` | ||
|
|
||
| Once the event gets updated, the origin_server_ts needs to be copied into the `created_ts` field. | ||
| An existing `created_ts` field implies that this is a state event updating the current session | ||
| and a missing `created_ts` field implies that it is a join state event. | ||
| All membership events that belong to one member session can be grouped with the index | ||
| `created_ts()`+`device_id`. This is why the `m.rtc.member` events deliberately do NOT include a `membership_id`. | ||
|
|
||
| Other then the membership sessions, there is **no event** to represent a rtc session (containing all members). | ||
| Such an event would include shared information, and deciding who has authority over that is not trivial. | ||
|
||
| Instead the session is a computed value based on `m.rtc.member` events. | ||
| The list of events with the same `application` and `m.call_id` represent one session. | ||
| This array allows to compute fields such as participant count, start time, etc. | ||
|
|
||
| Sending an empty `m.rtc.member` event represents a leave action. | ||
| Sending a well formatted `m.rtc.member` represents a join action. | ||
|
|
||
| Based on the value of `application`, the event might include additional parameters | ||
| required to provide additional session parameters. | ||
|
|
||
| > A [thirdroom](https://thirdroom.io)-like experience could include the information of an approximate position | ||
| > on the map, so that clients can omit connecting to participants that are not in their | ||
| > area of interest. | ||
|
|
||
| #### Reliability requirements for the room state | ||
|
|
||
| Room state is a very well suited place to store the data for a MatrixRTC session, as | ||
| it allows: | ||
|
|
||
| - The client to determine current ongoing sessions without loading history for every room, | ||
| or doing additional work other than the sync loop that needs to run anyway. | ||
| - The client can compute/access data of past sessions without any additional redundant data. | ||
| - Sessions (start/end/participant count) are federated and there is not redundant data storage that | ||
| could result in conflicts, or can get out of sync. The room state events are part of the dag and this | ||
| is solved like any other PDU in matrix. | ||
|
|
||
| A challenge with using the room state to represent a session is disconnection behaviour. | ||
| If the client disconnects from a call because of a network issue, | ||
| an application crash, or a user forcefully quitting the client - then the room state cannot be updated any more. | ||
| The client is required to leave by sending a new empty state which cannot happen once connection is lost. | ||
|
|
||
| If the state is not updated correctly we end up with incorrect session end timestamps, and a room state that is not | ||
| correctly representing the current RTC session state. Historic and current MatrixRTC session data would be broken. | ||
|
|
||
| For an acceptable solution, the following requirements need to be taken into consideration: | ||
|
|
||
| - Room state is set to empty if the client loses connection. (A heardbeat like system is desired) | ||
| - The best source of truth for a call participation is a working connection to the SFU. | ||
| It is desired that the disconnect of the SFU is connected to the room state. | ||
| - It should be possible to update the room state without the client being online. | ||
| - All of this should still work when Matrix uses cryptographic identities (e.g. | ||
| [MSC4080](https://github.com/matrix-org/matrix-spec-proposals/pull/4080)). | ||
|
|
||
| [MSC4140](https://github.com/matrix-org/matrix-spec-proposals/pull/4140) proposes a concept to | ||
| delay the leave events until one of the leave conditions (heartbeat or SFU disconnect) occur | ||
| and fulfil all of the these requirements. | ||
|
|
||
| A MatrixRTC client has to first send/schedule the following delayed leave event: | ||
|
|
||
| ```json5 | ||
| // event type: "m.rtc.member" | ||
| // event key: "@user:matrix.domain_DEVICEID" | ||
| { | ||
| "leave_reason": "CONNECTION_LOST" | ||
|
||
| } | ||
| ``` | ||
|
|
||
| Subsequently, the actual state event can be sent, so that we guarantee that the state will be empty eventually. | ||
| The `leave_reason` is added so clients can be more verbal about why a user disconnected from a call. | ||
|
|
||
| Receiving clients will be able to detect if the delayed event request was recognised by the presence of the `has_delayed_overwrite: true` | ||
| unsigned property. If the property is missing the event is invalid. | ||
|
|
||
| This also invalidates delayed leave events that are send with a valid membership content. They do not contain the | ||
| `has_delayed_overwrite: true` unsigned property. | ||
|
|
||
| #### Historic sessions | ||
|
|
||
| Since there is no single entry for a historic session (because of the ownership ambiguity), | ||
| historic sessions need to be computed on the client. | ||
|
|
||
| Each state event can either mark a join or leave: | ||
|
|
||
| - join: `prev_state.application != current_state.application` && | ||
| `prev_state.m.call_id != current_state.m.call_id` && | ||
| `current_state.application != undefined` | ||
| (where an empty `m.rtc.member` event would imply `state.application == undefined`) | ||
| - leave: `prev_state.application != current_state.application` && | ||
| `prev_state.m.call_id != current_state.m.call_id` && | ||
| `current_state.application == undefined` | ||
|
|
||
| Based on this one can find user sessions. The range between a join and a leave | ||
| event gives the specific times and duration of the session. | ||
| The collection of all overlapping user sessions with the same `call_id` and | ||
| `application` define one MatrixRTC history event. | ||
|
|
||
| ### The RTC backend | ||
|
|
||
| `foci_active` and `foci_preferred` are used to communicate: | ||
|
|
||
| - how a user is connected to the session (`foci_active`) | ||
| - what connection method this user knows about would like to connect with. | ||
|
|
||
| The only enforced parameter of a `foci_preferred` or `foci_active` is `type`. | ||
| Based on the focus type a different amount of parameters might be needed to, | ||
| communicate how to connect to other users. | ||
| `foci_preferred` and `foci_active` can have different parameters so that it is, | ||
| possible to use a combination of the two to figure our that everyone is connected | ||
| with each other. | ||
|
|
||
| Only users with the same type can connect in one session. If a frontend does | ||
| not support the used type they cannot connect. | ||
|
|
||
| Each focus type will get its own MSC, describing how to get from the foci | ||
| information to establishing WebRTC connections for all participants. | ||
|
|
||
| - [`livekit`](www.example.com) TODO: create `livekit` focus MSC and add link here. | ||
| - [`full_mesh`](https://github.com/matrix-org/matrix-spec-proposals/pull/3401) | ||
| TODO: create `full-mesh` focus MSC based on[MSC3401](https://github.com/matrix-org/matrix-spec-proposals/pull/3401) | ||
| and add link here. | ||
|
|
||
| #### Sourcing `foci_preferred` | ||
|
|
||
| At some point participants have to decide/propose which focus they use. | ||
| Based on the focus type and use case choosing a `foci_preferred` can be different. | ||
| If possible these guidelines should be obeyed: | ||
|
|
||
| - If there is a relation between the `focus_active` and a preferred focus (`type: livekit` is an example for this) | ||
| it is recommended to copy _the preferred focus that relates to the current `focus_active`_ of other participants to | ||
| the start of the `foci_preferred` array of the member event. | ||
| (The exact definition of: _the preferred focus that relates to the current `focus_active`_ is part of the | ||
| specification for each focus type. For `full_mesh` for example there is no such thing as: _the preferred focus that | ||
| relates to the current `focus_active`_ ) | ||
| - Homeservers can proposes `preferred_foci` via the well known. An array of preferred foci is provided behind the well | ||
| known key `m.rtc_foci`. This is defined in [MSC4158](https://github.com/matrix-org/matrix-spec-proposals/pull/4158). | ||
| They are related and it is recommended to also read | ||
| [MSC4158](https://github.com/matrix-org/matrix-spec-proposals/pull/4158) with this MSC. | ||
| Those proposals from **your own** homeserver should come next in the `foci_preferred` list of the member event. | ||
| - Clients also have the option to configure a preferred foci even though this is not recommended (see below). | ||
| Those come last in the list. | ||
|
|
||
| The rationale for these guidelines are: | ||
|
|
||
| - It is always desired to have as few focus switches as possible. | ||
| That is why the highest priority is to prefer the focus that is already in use. | ||
| - MatrixRTC is designed around the same architecture as the rest of Matrix, with | ||
| conversations being powered by many homeservers from across the network. | ||
| MatrixRTC has the same goal. To achieve a stable and healthy ecosystem | ||
| RTC infrastructure should be thought of as a part of a homeserver. It is very similar | ||
| to a turn server: mostly traffic and little cpu load. | ||
| To not end up in a world where each user is only using one central SFU but where the traffic | ||
| is split over multiple SFU's it is important that we leverage the SFU distribution similarly to the | ||
| distribution of homeservers. | ||
| For this reason the second guideline is to lookup the preferred foci from the homeserver's well_known. | ||
| - Looking up the preferred foci from a client is toxic to a federated system. If the majority of users | ||
| decide to use the same client all of the users will use one focus. This destroys the passive security mechanism that | ||
| each instance is not an interesting attack vector since it is only a fraction of the network. | ||
| Additionally it will result in poor performance if every user on Matrix would use the same focus. | ||
| There are cases where this is acceptable: | ||
| - Transitioning to MatrixRTC. Here it might be beneficial to have a client that has a fallback focus | ||
| so calls also work with homeservers not supporting it. | ||
| - For testing purposes where a different focus should be tested but one does not want to touch the .well_known | ||
| - For custom deployments that benefit from having the Focus configuration on a per client basis instead of per homeserver. | ||
|
|
||
| ### The RTC Session types (application) | ||
|
|
||
| Each session type can have its own specification in how the different streams | ||
| are interpreted and even what focus type to use. This makes this proposal extremely | ||
| flexible. For instance, a Jitsi conference could be added by introducing a new `application` | ||
| and a new focus type and would be MatrixRTC compatible. It would not be compatible | ||
| with applications that do not use the Jitsi focus but clients would know that there | ||
| is an ongoing session of unknown type and unknown focus and could display/represent | ||
| this in the user interface. | ||
|
|
||
| To make it easy for clients to support different RTC session types, the recommended | ||
| approach is to provide a Matrix widget for each session type, so that client developers | ||
| can use the widget as the first implementation if they want to support this RTC | ||
| session type. | ||
|
|
||
| Each application should get its own MSC in which the all the additional | ||
| fields are explained and how the communication with the possible foci is | ||
| defined: | ||
|
|
||
| - [`m.call`](www.example.com) TODO: create `m.call` MSC and add link here. | ||
|
||
|
|
||
| ## Potential issues | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The fact we don't reference how to tell users that a call is happening (i.e. m.call.notify) is very disorienting here. |
||
|
|
||
| ## Alternatives | ||
|
|
||
| ## Security considerations | ||
|
|
||
| ## Unstable prefix | ||
|
|
||
| The state events and the well_known key introduced in this MSC use the unstable prefix | ||
| `org.matrix.msc4143.` instead of `m.` as used in the text. | ||
|
||
|
|
||
| Possible values inside the `m.rtc.member` event (like `m.call`) will use a prefix defined in the | ||
| related PR (TODO create and link `m.call` application type PR) | ||
Uh oh!
There was an error while loading. Please reload this page.