From 750087ff4eedcd988143031d1e68940fa856006d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0imon=20Brandner?=
Date: Sun, 25 Sep 2022 19:30:57 +0200
Subject: [PATCH 01/29] Native Matrix VoIP signalling for cascaded SFUs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Šimon Brandner
---
 proposals/0000-sfu.md | 357 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 357 insertions(+)
 create mode 100644 proposals/0000-sfu.md

diff --git a/proposals/0000-sfu.md b/proposals/0000-sfu.md
new file mode 100644
index 00000000000..a710073e22a
--- /dev/null
+++ b/proposals/0000-sfu.md
@@ -0,0 +1,357 @@
+# MSC0000: Native Matrix VoIP signalling for cascaded SFUs
+
+[MSC3401](https://github.com/matrix-org/matrix-spec-proposals/pull/3401)
+specifies how full-mesh group calls work in Matrix. While that MSC works well
+for small group calls, it does not work so well for large conferences due to
+bandwidth (and other) issues.
+
+Selective Forwarding Units (SFUs) are servers which forward WebRTC streams
+between peers (which could be clients or SFUs or both). To make use of them
+effectively, peers need to be able to tell the SFU which streams they want to
+receive at what resolutions.
+
+To solve the issue of centralization, the SFUs are also allowed to connect to
+each other ("cascade") and therefore the peers also need a way to tell an SFU to
+which other SFUs to connect.
+
+## Proposal
+
+**TODO: spell out how this works with active speaker detection & associated
+signalling** **TODO: spell out how the DC traffic interacts with
+application-layer traffic** **TODO: how do we prove to the SFU that we have the
+right to subscribe to track?**
+
+### Diagrams
+
+#### 1:1
+
+```
+  A -------- B
+```
+
+#### Full mesh between clients
+
+```
+  A -------- B
+   \        /
+    \      /
+     \    /
+      \  /
+       C
+```
+
+#### SFU (aka Focus)
+
+```
+  A __    __ B
+      \  /
+        F
+        |
+        |
+        C
+
+Where F is an SFU focus
+```
+
+#### Cascaded decentralised SFU
+
+```
+  A1 --.           .-- B1
+  A2 ---Fa ----- Fb--- B2
+          \       /
+           \     /
+            \   /
+             \ /
+              Fc
+             |  |
+            C1  C2
+
+Where Fa, Fb and Fc are SFU foci, one per homeserver, each with two clients.
+```
+
+### State events
+
+#### `m.call` state event
+
+This MSC proposes adding an _optional_ `m.foci` field to the `m.call` state
+event. It as list of recommended SFUs that the call initiator can recommend to
+users who do not want to use their own SFU (because they don't have one, or
+because they would be the only person on their SFU for their call, and so choose
+to connect direct to save bandwidth).
+
+For instance:
+
+```json
+{
+  "type": "m.call",
+  "state_key": "cvsiu2893",
+  "content": {
+    "m.intent": "m.room",
+    "m.type": "m.voice",
+    "m.name": "Voice room",
+    "m.foci": [
+      "@sfu-lon:matrix.org",
+      "@sfu-nyc:matrix.org"
+    ]
+  }
+}
+```
+
+#### `m.call.member` state event
+
+This MSC proposes adding an _optional_ `m.foci` field to the `m.call.member`
+state event. It is used, if the user wants to be contacted via an SFU rather
+than called directly (either 1:1 or full mesh).
+
+For instance:
+
+```jsonc
+{
+  "type": "m.call.member",
+  "state_key": "@matthew:matrix.org",
+  "content": {
+    "m.calls": [
+      {
+        "m.call_id": "cvsiu2893",
+        // TODO: Should this be at the device level?
+        "m.foci": [
+          "@sfu-lon:matrix.org",
+          "@sfu-nyc:matrix.org",
+        ],
+        "m.devices": [...]
+      }
+    ],
+    "m.expires_ts": 1654616071686
+  }
+}
+```
+
+### Choosing an SFU
+
+**TODO: How does a client discover SFUs** **TODO: Is SFU identified by just
+`user_id` or `(user_id, device_id)`?**
+
+* When initiating a group call, we need to decide which devices to actually talk
+  to.
+  * If the client has no SFU configured, we try to use the `m.foci` in the
+    `m.call` event.
+    * If there are multiple `m.foci`, we select the closest one based on
+      latency, e.g. by trying to connect to all of them simultaneously and
+      discarding all but the first call to answer.
+    * If there are no `m.foci` in the `m.call` event, then we look at which foci
+      in `m.call.member` that are already in use by existing participants, and
+      select the most common one. (If the foci is overloaded it can reject us
+      and we should then try the next most populous one, etc).
+    * If there are no `m.foci` in the `m.call.member`, then we connect full
+      mesh.
+    * If subsequently `m.foci` are introduced into the conference, then we
+      should transfer the call to them (effectively doing a 1:1->group call
+      upgrade).
+  * If the client does have an SFU configured, then we decide whether to use it.
+    * If other conf participants are already using it, then we use it.
+    * If there are other users from our homeserver in the conference, then we
+      use it (as presumably they should be using it too)
+    * If there are no other `m.foci` (either in the `m.call` or in the
+      participant state) then we use it.
+    * Otherwise, we save bandwidth on our SFU by not cascading and instead
+      behaving as if we had no SFU configured.
+* We do not recommend that users utilise an SFU to hide behind for privacy, but
+  instead use a TURN server, only providing relay candidates, rather than
+  consuming SFU resources and unnecessarily mandating the presence of an SFU.
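The selection rules in this first revision can be condensed into a single decision function. The following is a minimal Python sketch under stated assumptions, not part of the proposal: `Participant`, `choose_focus` and the `latency_ms` map are hypothetical names, latencies are assumed to have been measured already (for example by racing calls to every candidate), and foci are plain user IDs as in the `m.foci` examples at this point in the series. Rejection by an overloaded focus is not modelled; a caller could retry with the next most common focus.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Participant:
    # Hypothetical view of one m.call.member entry.
    user_id: str
    foci: list[str] = field(default_factory=list)


def server_name(user_id: str) -> str:
    # "@alice:example.org" -> "example.org"
    return user_id.split(":", 1)[1]


def choose_focus(
    own_user_id: str,
    own_sfu: Optional[str],
    call_foci: list[str],
    participants: list[Participant],
    latency_ms: dict[str, float],
) -> Optional[str]:
    """Return the focus to call, or None to fall back to full mesh."""
    others = [p for p in participants if p.user_id != own_user_id]
    in_use = Counter(f for p in others for f in p.foci)

    if own_sfu is not None:
        same_server = any(
            server_name(p.user_id) == server_name(own_user_id) for p in others
        )
        no_other_foci = not call_foci and not in_use
        if own_sfu in in_use or same_server or no_other_foci:
            return own_sfu
        # Otherwise save our SFU's bandwidth and act as if none were configured.

    if call_foci:
        # Closest recommended focus; unmeasured foci sort last.
        return min(call_foci, key=lambda f: latency_ms.get(f, float("inf")))
    if in_use:
        # Most common focus among existing participants.
        return in_use.most_common(1)[0][0]
    return None  # no foci anywhere: connect full mesh
```

Keeping the function free of I/O means the latency probing and the actual call setup can be swapped out without touching the decision logic.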
+
+### Initial offer/answer dance
+
+During the initial offer/answer dance, the client establishes a data-channel
+between itself and the SFU to use later for rapid signalling.
+
+### Simulcast
+
+#### RTP munging
+
+#### vp8 munging
+
+### RTCP re-transmission
+
+### Data-channel messaging
+
+The client uses the established data channel connection to the SFU to perform
+low-latency signalling to rapidly (un)subscribe/(un)publish streams, send
+keep-alive messages, metadata, cascade and perform re-negotiation.
+
+**TODO: It feels like these ought to be `m.` namespaced** **TODO: Why `op`
+instead of `type`?** **TODO: It feels like these ought to have `content` rather
+than being on the same layer**
+
+#### SDP Stream Metadata extension
+
+The client will be receiving multiple streams from the SFU and it will need to
+be able to distinguish them; this therefore builds on
+[MSC3077](https://github.com/matrix-org/matrix-spec-proposals/pull/3077) and
+[MSC3291](https://github.com/matrix-org/matrix-spec-proposals/pull/3291) to
+provide the client with the necessary metadata. Some of the data-channel events
+include a `metadata` field including a description of the stream being sent
+either from the SFU to the client or from the client to the SFU.
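To illustrate how a receiver might use this metadata to tell incoming streams apart, here is a hedged Python sketch. It assumes the metadata object has the shape of the `streamId1` example in this patch (stream IDs mapping to `purpose`, mute flags and per-track info); the `describe_track` helper is hypothetical and is not defined by this proposal or by MSC3077/MSC3291.

```python
from typing import Any, Optional

# Assumed shape, mirroring the metadata example in this proposal:
# {stream_id: {"purpose": ..., "audio_muted": ..., "video_muted": ..., "tracks": {...}}}
Metadata = dict[str, dict[str, Any]]


def describe_track(metadata: Metadata, stream_id: str, track_id: str) -> Optional[dict[str, Any]]:
    """Say what an incoming (stream, track) pair is, or None if not yet described."""
    stream = metadata.get(stream_id)
    if stream is None or track_id not in stream.get("tracks", {}):
        # Unknown track: a client would typically wait for a metadata update.
        return None
    track = stream["tracks"][track_id]
    return {
        "purpose": stream.get("purpose"),  # e.g. "m.usermedia" or "m.screenshare"
        "audio_muted": stream.get("audio_muted", False),
        "video_muted": stream.get("video_muted", False),
        # Tracks without width/height (such as "trackId2" in the example) get None.
        "resolution": (track["width"], track["height"]) if "width" in track else None,
    }
```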
+
+```json5
+{
+  "streamId1": {
+    "purpose": "m.usermedia",
+    "audio_muted": false,
+    "video_muted": true,
+    "tracks": {
+      "trackId1": {
+        "width": 1920,
+        "height": 1080
+      },
+      "trackId2": {}
+    }
+  }
+}
+```
+
+#### Event types
+
+##### Subscribe
+
+```json5
+{
+  "op": "subscribe",
+  "streamId": "streamId1",
+  "trackId1": "trackId1",
+  "width": 1920,
+  "height": 1080
+}
+```
+
+##### Unsubscribe
+
+```json5
+{
+  "op": "unsubscribe",
+  "streamId": "streamId1",
+  "trackId1": "trackId1"
+}
+```
+
+##### Publish
+
+##### Unpublish
+
+##### Offer
+
+##### Answer
+
+##### Metadata
+
+```json5
+{
+  "op": "metadata",
+  "metadata": {...} // As specified in the Metadata section
+}
+```
+
+##### Keep-alive
+
+```json5
+{
+  "op": "alive"
+}
+```
+
+##### Connect
+
+If a user is using their SFU in a call, it will need to know how to connect to
+other SFUs present in order to participate in the full-mesh of SFU traffic (if
+any). The client is responsible for doing this using the `connect` op.
+
+```json5
+{
+  op: "connect"
+  // TODO: How should this look?
+}
+```
+
+### Encryption
+
+When SFUs are on the media path, they will necessarily terminate the SRTP
+traffic from the peer, breaking E2EE. To address this, we apply an additional
+end-to-end layer of encryption to the media using [WebRTC Encoded
+Transform](https://github.com/w3c/webrtc-encoded-transform/blob/main/explainer.md)
+(formerly Insertable Streams) via
+[SFrame](https://datatracker.ietf.org/doc/draft-omara-sframe/).
+
+In order to provide PFS, the symmetric key used for these streams from a given
+participating device is a megolm key. Unlike a normal megolm key, this is shared
+via `m.room_key` over Olm to the devices participating in the conference
+including an `m.call_id` and `m.room_id` field on the key to correlate it to the
+conference traffic, rather than using the `session_id` event field to correlate
+(given the encrypted traffic is SRTP rather than events, and we don't want to
+have to send fake events from all senders every time the megolm session is
+replaced).
+
+The megolm key is ratcheted forward for every SFrame, and shared with new
+participants at the current index via `m.room_key` over Olm as per above. When
+participants leave, a new megolm session is created and shared with all
+participants over Olm. The new session is only used once all participants have
+received it.
+
+## Potential issues
+
+The SFUs participating in a conference end up in a full mesh. Rather than
+inventing our own spanning-tree system for SFUs however, we should fix it for
+Matrix as a whole (as is happening in the LB work) and use a Pinecone tree or
+similar to decide what better-than-full-mesh topology to use. In practice, full
+mesh cascade between SFUs is probably not that bad (especially if SFUs only
+request the streams over the trunk their clients care about) - and on aggregate
+will be less obnoxious than all the clients hitting a single SFU.
+
+Too many foci will chew bandwidth due to full-mesh between them. In the worst
+case, if every user is on their own HS and picks a different focus, it degenerates
+to a full-mesh call (just server-side rather than client-side). Hopefully this
+shouldn't happen as you will converge on using a single SFU with the most
+clients, but need to check how this works in practice.
+
+SFrame mandates its own ratchet currently which is almost the same as megolm but
+not quite. Switching it out for megolm seems reasonable right now (at least
+until MLS comes along).
+
+## Alternatives
+
+An option would be to treat 1:1 (and full mesh) entirely differently to SFU
+based calling rather than trying to unify them. Also, it's debatable whether
+supporting full mesh is useful at all. In the end, it feels like unifying 1:1
+and SFU calling is for the best though, as it then gives you the ability to
+trivially upgrade 1:1 calls to group calls and vice versa, and avoids
+maintaining two separate hunks of spec. It also forces 1:1 calls to take
+multi-stream calls seriously, which is useful for more exotic capture devices
+(stereo cameras; 3D cameras; surround sound; audio fields etc).
+
+### Cascading
+
+One option here is for SFUs to act as an AS and sniff the `m.call.member`
+traffic of their associated server, and automatically call any other `m.foci`
+which appear. (They don't need to make outbound calls to clients, as clients
+always dial in).
+
+## Security considerations
+
+Malicious users could try to DoS SFUs by specifying them as their foci.
+
+SFrame E2EE may go horribly wrong if we can't send the new megolm session fast
+enough to all the participants when a participant leaves (and meanwhile if we
+keep using the old session, we're technically leaking call media to the parted
+participant until we manage to rotate).
+
+Need to ensure there's no scope for media forwarding loops through SFUs.
+
+In order to authenticate that only legitimate users are allowed to subscribe to
+a given `conf_id` on an SFU, it would make sense for the SFU to act as an AS and
+sniff the `m.call` events on their associated server, and only act on to-device
+`m.call.*` events which come from a user who is confirmed to be in the room for
+that `m.call`. (In practice, if the conf is E2EE then it's of limited use to
+connect to the SFU without having the keys to decrypt the traffic, but this
+feature is desirable for non-E2EE confs and to stop bandwidth DoS)
+
+## Unstable prefixes
+
+We probably don't care for this for the data-channel?
From aa53398309872ddbe7edf51c37308b2fb3584987 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0imon=20Brandner?=
Date: Sun, 25 Sep 2022 19:33:41 +0200
Subject: [PATCH 02/29] Update MSC number
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Šimon Brandner
---
 proposals/0000-sfu.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/0000-sfu.md b/proposals/0000-sfu.md
index a710073e22a..af7332ee2a2 100644
--- a/proposals/0000-sfu.md
+++ b/proposals/0000-sfu.md
@@ -1,4 +1,4 @@
-# MSC0000: Native Matrix VoIP signalling for cascaded SFUs
+# MSC3898: Native Matrix VoIP signalling for cascaded SFUs
 
 [MSC3401](https://github.com/matrix-org/matrix-spec-proposals/pull/3401)
 specifies how full-mesh group calls work in Matrix. While that MSC works well
From de302cb85e75e465b52d634c2d30f42d67ac828a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0imon=20Brandner?=
Date: Sun, 2 Oct 2022 11:40:26 +0200
Subject: [PATCH 03/29] Link to diagrams from MSC3401
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Šimon Brandner
---
 proposals/0000-sfu.md | 47 ++-----------------------------------------
 1 file changed, 2 insertions(+), 45 deletions(-)

diff --git a/proposals/0000-sfu.md b/proposals/0000-sfu.md
index af7332ee2a2..18001abdff2 100644
--- a/proposals/0000-sfu.md
+++ b/proposals/0000-sfu.md
@@ -23,51 +23,8 @@ right to subscribe to track?**
 
 ### Diagrams
 
-#### 1:1
-
-```
-  A -------- B
-```
-
-#### Full mesh between clients
-
-```
-  A -------- B
-   \        /
-    \      /
-     \    /
-      \  /
-       C
-```
-
-#### SFU (aka Focus)
-
-```
-  A __    __ B
-      \  /
-        F
-        |
-        |
-        C
-
-Where F is an SFU focus
-```
-
-#### Cascaded decentralised SFU
-
-```
-  A1 --.           .-- B1
-  A2 ---Fa ----- Fb--- B2
-          \       /
-           \     /
-            \   /
-             \ /
-              Fc
-             |  |
-            C1  C2
-
-Where Fa, Fb and Fc are SFU foci, one per homeserver, each with two clients.
-```
+The diagrams of how this all looks can be found in
+[MSC3401](https://github.com/matrix-org/matrix-spec-proposals/pull/3401).
 
 ### State events
 
From 7474782819af164d1c64588ab9d9776a6c9b48ba Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0imon=20Brandner?=
Date: Sun, 2 Oct 2022 11:40:48 +0200
Subject: [PATCH 04/29] Use correct number for file
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Šimon Brandner
---
 proposals/{0000-sfu.md => 3898-sfu.md} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename proposals/{0000-sfu.md => 3898-sfu.md} (100%)

diff --git a/proposals/0000-sfu.md b/proposals/3898-sfu.md
similarity index 100%
rename from proposals/0000-sfu.md
rename to proposals/3898-sfu.md
From 5cad46d4c0b0b02c6b2c85dd54dbbafce6a22629 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0imon=20Brandner?=
Date: Fri, 11 Nov 2022 16:38:56 +0100
Subject: [PATCH 05/29] Update sub and unsub ops
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Šimon Brandner
---
 proposals/3898-sfu.md | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md
index 18001abdff2..335249c1dbb 100644
--- a/proposals/3898-sfu.md
+++ b/proposals/3898-sfu.md
@@ -170,13 +170,24 @@ either from the SFU to the client or from the client to the SFU.
 
 ##### Subscribe
 
+This event is sent by the client to request a set of tracks. In the case of
+video tracks the client can also request a specific resolution of a given
+track; this resolution is a resolution the client wishes to receive but the SFU
+may send a lower one due to bandwidth etc.
+
+If the user for example switches from "spotlight" (one large tile) to "grid"
+(multiple small tiles) view, it should also send this request to let the SFU
+know of the resolution change.
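A client might build the `subscribe` op roughly as follows. This Python sketch is an assumption-laden illustration, not the proposal's wire format: the example that follows in this patch puts `"stream_id": ...` pairs directly inside the `start` array, which is not valid JSON, so the sketch assumes each array entry is an object. `subscribe_op` and `TrackRequest` are hypothetical names, and `width`/`height` are only included when a resolution is requested.

```python
import json
from typing import Optional

# (stream_id, track_id, requested (width, height) or None, e.g. for audio)
TrackRequest = tuple[str, str, Optional[tuple[int, int]]]


def subscribe_op(requests: list[TrackRequest]) -> str:
    """Serialise a hypothetical `subscribe` message for several tracks at once.

    The resolution is only a wish: the SFU may still send a lower one.
    """
    start = []
    for stream_id, track_id, resolution in requests:
        entry: dict[str, object] = {"stream_id": stream_id, "track_id": track_id}
        if resolution is not None:
            entry["width"], entry["height"] = resolution
        start.append(entry)
    return json.dumps({"op": "subscribe", "start": start})
```

When the user switches from spotlight to grid view, the client would call this again for the same tracks with the smaller tile size.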
+
 ```json5
 {
   "op": "subscribe",
-  "streamId": "streamId1",
-  "trackId1": "trackId1",
-  "width": 1920,
-  "height": 1080
+  "start": [
+    "stream_id": "streamId1",
+    "track_id": "trackId1",
+    "width": 1920,
+    "height": 1080
+  ],
 }
 ```
 
@@ -185,8 +196,10 @@ either from the SFU to the client or from the client to the SFU.
 ```json5
 {
   "op": "unsubscribe",
-  "streamId": "streamId1",
-  "trackId1": "trackId1"
+  "stop": [
+    "stream_id": "streamId1",
+    "track_id": "trackId1"
+  ],
 }
 ```
From f542fcbd190f3b1bf9eb4581131455f70197fb24 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0imon=20Brandner?=
Date: Fri, 11 Nov 2022 16:47:04 +0100
Subject: [PATCH 06/29] Give a reason for specifying res in metadata
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Šimon Brandner
---
 proposals/3898-sfu.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md
index 335249c1dbb..6385bbb6d0a 100644
--- a/proposals/3898-sfu.md
+++ b/proposals/3898-sfu.md
@@ -149,6 +149,11 @@ provide the client with the necessary metadata. Some of the data-channel events
 include a `metadata` field including a description of the stream being sent
 either from the SFU to the client or from the client to the SFU.
 
+Other than mute information and stream purpose, the metadata includes video
+track resolution. The SFU may not be able to determine the resolution of the
+track itself but it does need to know for simulcast; therefore, we include this
+in the metadata.
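Why the SFU needs the published resolution becomes clearer with a layer-selection sketch. This hedged Python illustration is not the proposal's algorithm: it assumes the publisher sends simulcast layers scaled down by factors 1, 2 and 4 from the resolution advertised in the metadata (a common WebRTC setup that this MSC does not mandate), and picks the smallest layer that still covers the size the subscriber asked for. `pick_layer` and `SCALE_DOWN_FACTORS` are hypothetical names.

```python
# Assumed simulcast setup: each layer is the published resolution divided by one
# of these factors. The proposal does not fix any particular layer layout.
SCALE_DOWN_FACTORS = (1, 2, 4)


def pick_layer(
    published: tuple[int, int],
    requested: tuple[int, int],
    scale_factors: tuple[int, ...] = SCALE_DOWN_FACTORS,
) -> tuple[int, int]:
    """Return the (width, height) of the smallest layer covering the request.

    `published` comes from the stream metadata; without it the SFU could not
    compute the layer sizes at all.
    """
    pub_w, pub_h = published
    req_w, req_h = requested
    # Largest factor first, i.e. smallest layer first.
    layers = [(pub_w // f, pub_h // f) for f in sorted(scale_factors, reverse=True)]
    for width, height in layers:
        if width >= req_w and height >= req_h:
            return (width, height)
    # Nothing is large enough: forward the largest layer.
    return layers[-1]
```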
+
 ```json5
 {
   "streamId1": {
From 6f01a945245bcbfdd2cef0b7687cca21da575a55 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0imon=20Brandner?=
Date: Sat, 12 Nov 2022 17:27:05 +0100
Subject: [PATCH 07/29] Specify foci by `device_id` too
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Šimon Brandner
---
 proposals/3898-sfu.md | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md
index 6385bbb6d0a..917b2f30d81 100644
--- a/proposals/3898-sfu.md
+++ b/proposals/3898-sfu.md
@@ -47,9 +47,9 @@ For instance:
     "m.type": "m.voice",
     "m.name": "Voice room",
     "m.foci": [
-      "@sfu-lon:matrix.org",
-      "@sfu-nyc:matrix.org"
-    ]
+      { "user_id": "@sfu-lon:matrix.org", "device_id": "FS5F589EF" },
+      { "user_id": "@sfu-nyc:matrix.org", "device_id": "VT4GA35VS" },
+    ],
   }
 }
 ```
@@ -70,11 +70,10 @@ For instance:
     "m.calls": [
       {
         "m.call_id": "cvsiu2893",
-        // TODO: Should this be at the device level?
         "m.foci": [
-          "@sfu-lon:matrix.org",
-          "@sfu-nyc:matrix.org",
+          { "user_id": "@sfu-lon:matrix.org", "device_id": "FS5F589EF" },
+          { "user_id": "@sfu-nyc:matrix.org", "device_id": "VT4GA35VS" },
         ],
         "m.devices": [...]
       }
From 575e16c5c6242a16712adc56a2418b0f2e0aa1df Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0imon=20Brandner?=
Date: Sat, 12 Nov 2022 17:28:32 +0100
Subject: [PATCH 08/29] Fixup some json
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Šimon Brandner
---
 proposals/3898-sfu.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md
index 917b2f30d81..8e8090e50cc 100644
--- a/proposals/3898-sfu.md
+++ b/proposals/3898-sfu.md
@@ -62,7 +62,7 @@ than called directly (either 1:1 or full mesh).
 
 For instance:
 
-```jsonc
+```json
 {
   "type": "m.call.member",
   "state_key": "@matthew:matrix.org",
@@ -153,7 +153,7 @@ track resolution. The SFU may not be able to determine the resolution of the
 track itself but it does need to know for simulcast; therefore, we include this
 in the metadata.
 
-```json5
+```json
 {
   "streamId1": {
     "purpose": "m.usermedia",
@@ -183,7 +183,7 @@ If the user for example switches from "spotlight" (one large tile) to "grid"
 (multiple small tiles) view, it should also send this request to let the SFU
 know of the resolution change.
 
-```json5
+```json
 {
   "op": "subscribe",
   "start": [
@@ -197,7 +197,7 @@ know of the resolution change.
 
 ##### Unsubscribe
 
-```json5
+```json
 {
   "op": "unsubscribe",
   "stop": [
@@ -217,7 +217,7 @@ know of the resolution change.
 
 ##### Metadata
 
-```json5
+```json
 {
   "op": "metadata",
   "metadata": {...} // As specified in the Metadata section
@@ -226,7 +226,7 @@ know of the resolution change.
 
 ##### Keep-alive
 
-```json5
+```json
 {
   "op": "alive"
 }
@@ -238,9 +238,9 @@ If a user is using their SFU in a call, it will need to know how to connect to
 other SFUs present in order to participate in the full-mesh of SFU traffic (if
 any). The client is responsible for doing this using the `connect` op.
 
-```json5
+```json
 {
-  op: "connect"
+  "op": "connect"
   // TODO: How should this look?
 }
 ```
From 33b1880a29b34d262a41cb4140c855263ee3c89e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0imon=20Brandner?=
Date: Sat, 12 Nov 2022 17:32:28 +0100
Subject: [PATCH 09/29] Typo
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Šimon Brandner
---
 proposals/3898-sfu.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md
index 8e8090e50cc..5c47173d05c 100644
--- a/proposals/3898-sfu.md
+++ b/proposals/3898-sfu.md
@@ -96,7 +96,7 @@ For instance:
       discarding all but the first call to answer.
     * If there are no `m.foci` in the `m.call` event, then we look at which foci
       in `m.call.member` that are already in use by existing participants, and
-      select the most common one. (If the foci is overloaded it can reject us
+      select the most common one. (If the focus is overloaded it can reject us
       and we should then try the next most populous one, etc).
     * If there are no `m.foci` in the `m.call.member`, then we connect full
       mesh.
From 65faee445fa2108b79416433333aa1ad178a20e7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0imon=20Brandner?=
Date: Sun, 13 Nov 2022 13:11:19 +0100
Subject: [PATCH 10/29] Specify how to handle foci better
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Šimon Brandner
---
 proposals/3898-sfu.md | 167 +++++++++++++++++++++++++-----------------
 1 file changed, 98 insertions(+), 69 deletions(-)

diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md
index 5c47173d05c..ad689c7716b 100644
--- a/proposals/3898-sfu.md
+++ b/proposals/3898-sfu.md
@@ -26,39 +26,10 @@ right to subscribe to track?**
 The diagrams of how this all looks can be found in
 [MSC3401](https://github.com/matrix-org/matrix-spec-proposals/pull/3401).
 
-### State events
+### Additions to the `m.call.member` state event
 
-#### `m.call` state event
-
-This MSC proposes adding an _optional_ `m.foci` field to the `m.call` state
-event. It as list of recommended SFUs that the call initiator can recommend to
-users who do not want to use their own SFU (because they don't have one, or
-because they would be the only person on their SFU for their call, and so choose
-to connect direct to save bandwidth).
-
-For instance:
-
-```json
-{
-  "type": "m.call",
-  "state_key": "cvsiu2893",
-  "content": {
-    "m.intent": "m.room",
-    "m.type": "m.voice",
-    "m.name": "Voice room",
-    "m.foci": [
-      { "user_id": "@sfu-lon:matrix.org", "device_id": "FS5F589EF" },
-      { "user_id": "@sfu-nyc:matrix.org", "device_id": "VT4GA35VS" },
-    ],
-  }
-}
-```
-
-#### `m.call.member` state event
-
-This MSC proposes adding an _optional_ `m.foci` field to the `m.call.member`
-state event. It is used, if the user wants to be contacted via an SFU rather
-than called directly (either 1:1 or full mesh).
+This MSC proposes adding two _optional_ fields to the `m.call.member` state event:
+`m.foci.preferred` and `m.foci.active`.
 
 For instance:
 
@@ -70,11 +41,16 @@ For instance:
     "m.calls": [
       {
         "m.call_id": "cvsiu2893",
-        "m.foci": [
-          { "user_id": "@sfu-lon:matrix.org", "device_id": "FS5F589EF" },
-          { "user_id": "@sfu-nyc:matrix.org", "device_id": "VT4GA35VS" },
-        ],
-        "m.devices": [...]
+        "m.devices": [{
+          "device_id": "U738KDF9WJ",
+          "m.foci.active": [
+            { "user_id": "@sfu-lon:matrix.org", "device_id": "FS5F589EF" }
+          ],
+          "m.foci.preferred": [
+            { "user_id": "@sfu-bon:matrix.org", "device_id": "3FSF589EF" },
+            { "user_id": "@sfu-mon:matrix.org", "device_id": "GFSDH93EF" },
+          ]
+        }]
       }
     ],
     "m.expires_ts": 1654616071686
@@ -82,38 +58,83 @@ For instance:
 }
 ```
 
-### Choosing an SFU
-
-**TODO: How does a client discover SFUs** **TODO: Is SFU identified by just
-`user_id` or `(user_id, device_id)`?**
-
-* When initiating a group call, we need to decide which devices to actually talk
-  to.
-  * If the client has no SFU configured, we try to use the `m.foci` in the
-    `m.call` event.
-    * If there are multiple `m.foci`, we select the closest one based on
-      latency, e.g. by trying to connect to all of them simultaneously and
-      discarding all but the first call to answer.
-    * If there are no `m.foci` in the `m.call` event, then we look at which foci
-      in `m.call.member` that are already in use by existing participants, and
-      select the most common one. (If the focus is overloaded it can reject us
-      and we should then try the next most populous one, etc).
-    * If there are no `m.foci` in the `m.call.member`, then we connect full
-      mesh.
-    * If subsequently `m.foci` are introduced into the conference, then we
-      should transfer the call to them (effectively doing a 1:1->group call
-      upgrade).
-  * If the client does have an SFU configured, then we decide whether to use it.
-    * If other conf participants are already using it, then we use it.
-    * If there are other users from our homeserver in the conference, then we
-      use it (as presumably they should be using it too)
-    * If there are no other `m.foci` (either in the `m.call` or in the
-      participant state) then we use it.
-    * Otherwise, we save bandwidth on our SFU by not cascading and instead
-      behaving as if we had no SFU configured.
-* We do not recommend that users utilise an SFU to hide behind for privacy, but
-  instead use a TURN server, only providing relay candidates, rather than
-  consuming SFU resources and unnecessarily mandating the presence of an SFU.
+#### `m.foci.active`
+
+This field is a list of foci the user's device is publishing to. Usually, this
+list will have a length of 1, yet a client might publish to multiple foci if
+they are on different networks, for instance, or to simultaneously fan-out in
+different directions from the client if there is no nearby focus. If the client
+is participating full-mesh, it should either omit this field from the state
+event or leave the list empty.
+
+#### `m.foci.preferred`
+
+This field is a list of foci the client would prefer to switch to from the
+current active focus, if any other client also starts using the given focus. If
+the client is already using one of its preferred foci, it should either omit
+this field from the state event or leave the list empty.
+
+### Choosing a focus
+
+#### Discovering foci
+
+**TODO: How does a client discover foci? We could use well-known or a custom endpoint**
+
+Foci are identified by a tuple of `user_id` and `device_id`.
+
+#### Determining the best focus
+
+There are many ways to determine the best focus; this MSC recommends the
+following:
+
+- Is the quickest to respond to `m.call.invite` with `m.call.answer`.
+- Is the quickest to rapidly reject a spurious HTTPS request to a high-numbered
+  port on the SFU's IP address, if the SFU exposes its IP somewhere - similar to
+  the [apenwarr/blip](https://github.com/apenwarr/blip) trick, in order to
+  measure media-path latency rather than signalling path latency.
+- Has the best latency of data-channel traffic flows.
+- Has the best latency and bandwidth determined by sending a small splurge of
+  media down the pipe to probe.
+
+#### Joining a call
+
+The following diagram explains how a client chooses a focus when joining a call.
+
+```mermaid
+flowchart TD;
+wantsToJoin[Wants to join a call];
+hasPreferred(Has preferred focus?);
+callPreferred[Calls preferred foci without media to grab a slot];
+publishPreferred[Publishes `m.foci.preferred`];
+checkMembers(Call has more than 2 members including the client itself?);
+callFullMesh[Calls other member full-mesh];
+callMembersFoci[Tries calling foci from `m.call.member` events];
+orderFoci[Orders foci from best to worst];
+findFocusPreferredByOtherMember(Goes through ordered foci to find one which is preferred by at least one other member);
+callBestPreferred[Calls the focus];
+callBestActive[Calls the best active focus in room];
+publishActive[Publishes `m.foci.active`];
+
+wantsToJoin-->hasPreferred;
+hasPreferred--->|Yes|callPreferred;
+hasPreferred--->|No|checkMembers;
+callPreferred--->publishPreferred;
+publishPreferred--->checkMembers;
+checkMembers--->|Yes|callMembersFoci;
+checkMembers--->|No|callFullMesh;
+callMembersFoci--->orderFoci;
+orderFoci--->findFocusPreferredByOtherMember;
+findFocusPreferredByOtherMember--->|Found|callBestPreferred;
+callBestPreferred--->publishActive;
+findFocusPreferredByOtherMember--->|Not found|callBestActive;
+callBestActive--->publishActive;
+```
+
+#### Mid-call changes
+
+Once in a call, the client listens for changes to `m.call.member` state events
+and if another member starts using one of the client's preferred foci, the client
+switches to that focus.
 
 ### Initial offer/answer dance
 
@@ -269,6 +290,14 @@ participants leave, a new megolm session is created and shared with all
 participants over Olm. The new session is only used once all participants have
 received it.
 
+### Notes
+
+#### Hiding behind foci
+
+We do not recommend that users utilise a focus to hide behind for privacy, but
+instead use a TURN server, only providing relay candidates, rather than
+consuming focus resources and unnecessarily mandating the presence of a focus.
+
 ## Potential issues
 
 The SFUs participating in a conference end up in a full mesh. Rather than
From 9882c97c211135a4e6b4455b8830cb8e92fd95ee Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0imon=20Brandner?=
Date: Sun, 13 Nov 2022 13:18:02 +0100
Subject: [PATCH 11/29] Amend TODOs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Šimon Brandner
---
 proposals/3898-sfu.md | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md
index ad689c7716b..bdfbe5abd50 100644
--- a/proposals/3898-sfu.md
+++ b/proposals/3898-sfu.md
@@ -16,10 +16,8 @@ which other SFUs to connect.
 
 ## Proposal
 
-**TODO: spell out how this works with active speaker detection & associated
-signalling** **TODO: spell out how the DC traffic interacts with
-application-layer traffic** **TODO: how do we prove to the SFU that we have the
-right to subscribe to track?**
+- **TODO: spell out how this works with active speaker detection & associated
+signalling**
 
 ### Diagrams
 
@@ -78,7 +76,7 @@ this field from the state event or leave the list empty.
 
 #### Discovering foci
 
-**TODO: How does a client discover foci? We could use well-known or a custom endpoint**
+- **TODO: How does a client discover foci? We could use well-known or a custom endpoint**
 
 Foci are identified by a tuple of `user_id` and `device_id`.
 
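The joining flowchart and the mid-call rule introduced in patch 10 can be read as two small functions. This Python sketch is one possible interpretation, with hypothetical names throughout: `Focus` is the `(user_id, device_id)` tuple, `try_call` stands in for calling a focus and reports whether it answered, `rank` stands in for ordering foci from best to worst using any of the measurements listed under "Determining the best focus", and the returned dictionary approximates the `m.foci.*` fields the client would publish. The diagram does not say whether the candidate list includes other members' preferred foci; the sketch assumes it does.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Focus = tuple[str, str]  # (user_id, device_id)


@dataclass
class Member:
    # Hypothetical view of another member's device entry in m.call.member.
    active: list[Focus] = field(default_factory=list)
    preferred: list[Focus] = field(default_factory=list)


def join_call(
    own_preferred: list[Focus],
    others: list[Member],
    try_call: Callable[[Focus], bool],
    rank: Callable[[list[Focus]], list[Focus]],
) -> dict[str, list[Focus]]:
    """Follow the joining flowchart; return the m.foci.* fields to publish."""
    published: dict[str, list[Focus]] = {}
    if own_preferred:
        # "Calls preferred foci without media to grab a slot" (assumption: only
        # foci that answered are published as preferred).
        reachable = [f for f in own_preferred if try_call(f)]
        if reachable:
            published["m.foci.preferred"] = reachable
    if len(others) + 1 <= 2:
        return published  # two members or fewer: full mesh, no m.foci.active
    # "Tries calling foci from m.call.member events", then orders them.
    seen = dict.fromkeys(f for m in others for f in m.active + m.preferred)
    ordered = rank([f for f in seen if try_call(f)])
    preferred_by_others = {f for m in others for f in m.preferred}
    active_in_room = {f for m in others for f in m.active}
    chosen = next((f for f in ordered if f in preferred_by_others), None)
    if chosen is None:
        chosen = next((f for f in ordered if f in active_in_room), None)
    if chosen is not None:
        published["m.foci.active"] = [chosen]
        if chosen in own_preferred:
            # Already on a preferred focus: m.foci.preferred may be omitted.
            published.pop("m.foci.preferred", None)
    return published


def maybe_switch(current: Optional[Focus], own_preferred: list[Focus],
                 others: list[Member]) -> Optional[Focus]:
    """Mid-call rule: move to a preferred focus once another member uses it."""
    if current in own_preferred:
        return current
    used_by_others = {f for m in others for f in m.active}
    return next((f for f in own_preferred if f in used_by_others), current)
```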
@@ -155,9 +153,12 @@ The client uses the established data channel connection to the SFU to perform
 low-latency signalling to rapidly (un)subscribe/(un)publish streams, send
 keep-alive messages, metadata, cascade and perform re-negotiation.
 
-**TODO: It feels like these ought to be `m.` namespaced** **TODO: Why `op`
-instead of `type`?** **TODO: It feels like these ought to have `content` rather
-than being on the same layer**
+- **TODO: It feels like these ought to be `m.` namespaced**
+- **TODO: Why `op` instead of `type`?**
+- **TODO: It feels like these ought to have `content` rather than being on the
+  same layer**
+- **TODO: Spell out how the DC traffic interacts with application-layer
+traffic**
 
 #### SDP Stream Metadata extension
 
@@ -204,6 +205,9 @@ If the user for example switches from "spotlight" (one large tile) to "grid"
 (multiple small tiles) view, it should also send this request to let the SFU
 know of the resolution change.
 
+- **TODO: how do we prove to the SFU that we have the right to subscribe to
+track?**
+
 ```json
 {
   "op": "subscribe",
From c66bbe487ed932f455b656dd143c531c3d4ff5f2 Mon Sep 17 00:00:00 2001
From: Daniel Abramov
Date: Tue, 15 Nov 2022 13:06:12 +0100
Subject: [PATCH 12/29] Add rationale behind usage of data channels

---
 proposals/3898-sfu.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md
index bdfbe5abd50..11facb3a869 100644
--- a/proposals/3898-sfu.md
+++ b/proposals/3898-sfu.md
@@ -153,6 +153,9 @@ The client uses the established data channel connection to the SFU to perform
 low-latency signalling to rapidly (un)subscribe/(un)publish streams, send
 keep-alive messages, metadata, cascade and perform re-negotiation.
 
+See the section about the [rationale](#the-use-of-the-data-channels-for-signaling)
+behind the use of the data channels for signaling.
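Every data-channel message in this proposal is a JSON object whose `op` field selects its meaning, so a receiver can route messages through a table of handlers. The Python sketch below is illustrative only: `make_dispatcher`, `ProtocolError` and the decision to reject unknown ops rather than ignore them are assumptions, and the open TODOs about `m.` namespacing, `op` versus `type` and a `content` wrapper could all change the message shape.

```python
import json
from typing import Any, Callable


class ProtocolError(Exception):
    """Raised for data-channel messages that cannot be handled."""


Handler = Callable[[dict[str, Any]], None]


def make_dispatcher(handlers: dict[str, Handler]) -> Callable[[str], str]:
    """Build a function that parses one data-channel message and routes it by `op`."""

    def dispatch(raw: str) -> str:
        try:
            message = json.loads(raw)
        except json.JSONDecodeError as err:
            raise ProtocolError(f"not JSON: {err}") from err
        if not isinstance(message, dict) or not isinstance(message.get("op"), str):
            raise ProtocolError("message must be an object with a string 'op'")
        handler = handlers.get(message["op"])
        if handler is None:
            raise ProtocolError(f"unknown op {message['op']!r}")
        handler(message)
        return message["op"]

    return dispatch
```

Returning the op name makes it easy for the caller to, for example, refresh a liveness timestamp whenever an `alive` message arrives.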
+ - **TODO: It feels like these ought to be `m.` namespaced** - **TODO: Why `op` instead of `type`?** - **TODO: It feels like these ought to have `content` rather than being on the @@ -333,6 +336,31 @@ maintaining two separate hunks of spec. It also forces 1:1 calls to take multi-stream calls seriously, which is useful for more exotic capture devices (stereo cameras; 3D cameras; surround sound; audio fields etc). +### The use of the data channels for signaling + +The current specification assumes that signaling works over Matrix, but +side-chains to the data channel once the peer connection is established +in order to perform low-latency signaling. + +In an ideal scenario the use of the data channels would not be required and +the usage of native Matrix signaling would be sufficient. However, due to +the fact that regular Matrix signaling may need to traverse different +servers, e.g. `client <-> home server <-> home server <-> sfu`, our +signaling would not be quite as fast as we need it to be. The effect will +be even greater when coupled with the fact that certain protocols like +HTTP would not be as efficient for real-time communication as e.g. WebRTC +data channels or WebSockets. + +The problem would be solved if the clients could connect to the SFU +**directly** and communicate via Matrix for all signaling messages. This +would allow us to use a faster transport (WebSockets, QUIC etc) to transmit +signaling messages. However, this is *currently* not possible due to the fact +that it would require the support of the P2P Matrix that is still being under +development at the time of writing this MSC. + +To read more about the problem and get more context, please refer to the +[discussion](https://github.com/matrix-org/matrix-spec-proposals/pull/3898#discussion_r1019098025).
+ ### Cascading One option here is for SFUs to act as an AS and sniff the `m.call.member` From 1b2d74064ea807411c07b42630c386a9e014951f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0imon=20Brandner?= Date: Fri, 2 Dec 2022 18:25:54 +0100 Subject: [PATCH 13/29] Add TODO MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Šimon Brandner --- proposals/3898-sfu.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md index 11facb3a869..26c7adc2b74 100644 --- a/proposals/3898-sfu.md +++ b/proposals/3898-sfu.md @@ -134,6 +134,8 @@ Once in a call, the client listens for changes to `m.call.member` state events and if another member starts using one of the client's preferred foci, the client switches to that focus. +**TODO: other cases?** + ### Initial offer/answer dance During the initial offer/answer dance, the client establishes a data-channel From feb064b06cd851943719d6f7b6640210d2eea30e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0imon=20Brandner?= Date: Fri, 2 Dec 2022 18:26:20 +0100 Subject: [PATCH 14/29] Update event types MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Šimon Brandner --- proposals/3898-sfu.md | 36 +++++++++++++++++++++++++++++++++--- 1 file changed, 33 insertions(+), 3 deletions(-) diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md index 26c7adc2b74..f5131c50fd4 100644 --- a/proposals/3898-sfu.md +++ b/proposals/3898-sfu.md @@ -212,6 +212,7 @@ know of the resolution change. - **TODO: how do we prove to the SFU that we have the right to subscribe to track?** +- **TODO: should this be `select` instead?** ```json { @@ -227,6 +228,10 @@ track?** ##### Unsubscribe +If a client no longer wishes to be subscribed to a track, it should send this event. 
+ +- **TODO: should this be `unselect` instead?** + ```json { "op": "unsubscribe", @@ -237,16 +242,36 @@ track?** } ``` -##### Publish +##### Offer -##### Unpublish +Whenever the client/focus creates an SDP offer, it should send it over to the +other side using this event. The other side should then respond with an `answer` +event. -##### Offer +```json +{ + "op": "offer", + "sdp": "..." +} +``` ##### Answer +Whenever the client/focus creates an SDP answer in response to an SDP offer, it +should send it over to the other side using this event. + +```json +{ + "op": "answer", + "sdp": "..." +} +``` + ##### Metadata +Whenever the metadata changes (e.g. mute state changes happen), the client/focus +can send a `metadata` event which includes a `metadata` field. + ```json { "op": "metadata", @@ -256,6 +281,11 @@ track?** ##### Keep-alive +Clients should send `alive` message to foci every so often. If the client does +not send an `alive` message for 30 seconds, the focus should hang up. + +- **TODO: should this be configurable somehow?** + ```json { "op": "alive" From d96d101a9501710c29eb3fb1ee72d1cec98e1600 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0imon=20Brandner?= Date: Fri, 2 Dec 2022 18:56:16 +0100 Subject: [PATCH 15/29] Add unstable prefixes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Šimon Brandner --- proposals/3898-sfu.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md index f5131c50fd4..4fefde21e34 100644 --- a/proposals/3898-sfu.md +++ b/proposals/3898-sfu.md @@ -422,3 +422,11 @@ feature is desirable for non-E2EE confs and to stop bandwidth DoS) ## Unstable prefixes We probably don't care for this for the data-channel? + +While this MSC is not considered stable, implementations should use +`org.matrix.msc3898` as a namespace. 
+ +|Stable (post-FCP) |Unstable | +|------------------|-----------------------------------| +|`m.foci.active` |`org.matrix.msc3898.foci.active` | +|`m.foci.preferred`|`org.matrix.msc3898.foci.preferred`| From d538e1e7b23dc7c88866139f2cd2bd57e515b368 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0imon=20Brandner?= Date: Tue, 6 Dec 2022 11:06:14 +0100 Subject: [PATCH 16/29] Use `subscribe` instead of `select` MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Šimon Brandner --- proposals/3898-sfu.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md index 4fefde21e34..dbd0ff3c0a1 100644 --- a/proposals/3898-sfu.md +++ b/proposals/3898-sfu.md @@ -212,7 +212,6 @@ know of the resolution change. - **TODO: how do we prove to the SFU that we have the right to subscribe to track?** -- **TODO: should this be `select` instead?** ```json { @@ -230,8 +229,6 @@ track?** If a client no longer wishes to be subscribed to a track, it should send this event. -- **TODO: should this be `unselect` instead?** - ```json { "op": "unsubscribe", From 91470a2a603fd75ee0571412f9eecec7791153d4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0imon=20Brandner?= Date: Tue, 6 Dec 2022 16:10:22 +0100 Subject: [PATCH 17/29] `op` -> `event` MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Šimon Brandner --- proposals/3898-sfu.md | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md index dbd0ff3c0a1..5e1acab0f4f 100644 --- a/proposals/3898-sfu.md +++ b/proposals/3898-sfu.md @@ -159,7 +159,6 @@ See the section about the [rationale](#the-use-of-the-data-channels-for-signalin behind the use of the data channels for signaling. 
- **TODO: It feels like these ought to be `m.` namespaced** -- **TODO: Why `op` instead of `type`?** - **TODO: It feels like these ought to have `content` rather than being on the same layer** - **TODO: Spell out how the DC traffic interacts with application-layer @@ -215,7 +214,7 @@ track?** ```json { - "op": "subscribe", + "type": "subscribe", "start": [ "stream_id": "streamId1", "track_id": "trackId1", @@ -231,7 +230,7 @@ If a client no longer wishes to be subscribed to a track, it should send this ev ```json { - "op": "unsubscribe", + "type": "unsubscribe", "stop": [ "stream_id": "streamId1", "track_id": "trackId1" @@ -247,7 +246,7 @@ event. ```json { - "op": "offer", + "type": "offer", "sdp": "..." } ``` @@ -259,7 +258,7 @@ should send it over to the other side using this event. ```json { - "op": "answer", + "type": "answer", "sdp": "..." } ``` @@ -271,7 +270,7 @@ can send a `metadata` event which includes a `metadata` field. ```json { - "op": "metadata", + "type": "metadata", "metadata": {...} // As specified in the Metadata section } ``` @@ -285,7 +284,7 @@ not send an `alive` message for 30 seconds, the focus should hang up. ```json { - "op": "alive" + "type": "alive" } ``` @@ -293,11 +292,11 @@ not send an `alive` message for 30 seconds, the focus should hang up. If a user is using their SFU in a call, it will need to know how to connect to other SFUs present in order to participate in the full-mesh of SFU traffic (if -any). The client is responsible for doing this using the `connect` op. +any). The client is responsible for doing this using the `connect` event. ```json { - "op": "connect" + "type": "connect" // TODO: How should this look? 
} ``` From 2ef7425a63e39b4fb9bea569a4c059abcbce88fd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0imon=20Brandner?= Date: Tue, 6 Dec 2022 16:10:42 +0100 Subject: [PATCH 18/29] Fixup formatting MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Šimon Brandner --- proposals/3898-sfu.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md index 5e1acab0f4f..19c42cf2159 100644 --- a/proposals/3898-sfu.md +++ b/proposals/3898-sfu.md @@ -382,7 +382,7 @@ data channels or WebSockets. The problem would be solved if the clients could connect to the SFU **directly** and communicate via Matrix for all signaling messages. This would allow us to use a faster transport (WebSockets, QUIC etc) to transmit -signaling messages. However, this is *currently* not possible due to the fact +signaling messages. However, this is **currently** not possible due to the fact that it would require the support of the P2P Matrix that is still being under development at the time of writing this MSC. 
From 5a186e489615aeaa5e1994d8ef0c23b12a9f2e04 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0imon=20Brandner?= Date: Tue, 6 Dec 2022 16:24:41 +0100 Subject: [PATCH 19/29] Use `content` MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Šimon Brandner --- proposals/3898-sfu.md | 59 +++++++++++++++++++++++++++++++------------ 1 file changed, 43 insertions(+), 16 deletions(-) diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md index 19c42cf2159..bf43a7ffa08 100644 --- a/proposals/3898-sfu.md +++ b/proposals/3898-sfu.md @@ -215,12 +215,22 @@ track?** ```json { "type": "subscribe", - "start": [ - "stream_id": "streamId1", - "track_id": "trackId1", - "width": 1920, - "height": 1080 - ], + "content": { + "start": [ + { + "stream_id": "streamId1", + "track_id": "trackId1", + "width": 1920, + "height": 1080 + }, + { + "stream_id": "streamId2", + "track_id": "trackId2", + "width": 256, + "height": 144 + } + ] + } } ``` @@ -231,10 +241,18 @@ If a client no longer wishes to be subscribed to a track, it should send this ev ```json { "type": "unsubscribe", - "stop": [ - "stream_id": "streamId1", - "track_id": "trackId1" - ], + "content": { + "stop": [ + { + "stream_id": "streamId1", + "track_id": "trackId1" + }, + { + "stream_id": "streamId2", + "track_id": "trackId2" + } + ] + } } ``` @@ -247,7 +265,9 @@ event. ```json { "type": "offer", - "sdp": "..." + "content": { + "sdp": "..." + } } ``` @@ -259,7 +279,9 @@ should send it over to the other side using this event. ```json { "type": "answer", - "sdp": "..." + "content": { + "sdp": "..." + } } ``` @@ -271,7 +293,9 @@ can send a `metadata` event which includes a `metadata` field. ```json { "type": "metadata", - "metadata": {...} // As specified in the Metadata section + "content": { + "metadata": {...} // As specified in the Metadata section + } } ``` @@ -284,7 +308,8 @@ not send an `alive` message for 30 seconds, the focus should hang up. 
```json { - "type": "alive" + "type": "alive", + "content": {} } ``` @@ -296,8 +321,10 @@ any). The client is responsible for doing this using the `connect` event. ```json { - "type": "connect" - // TODO: How should this look? + "type": "connect", + "content": { + // TODO: How should this look? + } } ``` From b46152526d4039537e5faaf52afaece6576b465f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0imon=20Brandner?= Date: Tue, 6 Dec 2022 16:29:31 +0100 Subject: [PATCH 20/29] Namespace things MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Šimon Brandner --- proposals/3898-sfu.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md index bf43a7ffa08..52105712ae3 100644 --- a/proposals/3898-sfu.md +++ b/proposals/3898-sfu.md @@ -214,9 +214,9 @@ track?** ```json { - "type": "subscribe", + "type": "m.subscribe", "content": { - "start": [ + "m.start": [ { "stream_id": "streamId1", "track_id": "trackId1", @@ -240,9 +240,9 @@ If a client no longer wishes to be subscribed to a track, it should send this ev ```json { - "type": "unsubscribe", + "type": "m.unsubscribe", "content": { - "stop": [ + "m.stop": [ { "stream_id": "streamId1", "track_id": "trackId1" @@ -264,9 +264,9 @@ event. ```json { - "type": "offer", + "type": "m.offer", "content": { - "sdp": "..." + "m.sdp": "..." } } ``` @@ -278,9 +278,9 @@ should send it over to the other side using this event. ```json { - "type": "answer", + "type": "m.answer", "content": { - "sdp": "..." + "m.sdp": "..." } } ``` @@ -292,9 +292,9 @@ can send a `metadata` event which includes a `metadata` field. ```json { - "type": "metadata", + "type": "m.metadata", "content": { - "metadata": {...} // As specified in the Metadata section + "m.metadata": {...} // As specified in the Metadata section } } ``` @@ -308,12 +308,12 @@ not send an `alive` message for 30 seconds, the focus should hang up. 
```json { - "type": "alive", + "type": "m.alive", "content": {} } ``` -##### Connect +##### Connect to focus If a user is using their SFU in a call, it will need to know how to connect to other SFUs present in order to participate in the full-mesh of SFU traffic (if @@ -321,7 +321,7 @@ any). The client is responsible for doing this using the `connect` event. ```json { - "type": "connect", + "type": "m.connect_to_focus", "content": { // TODO: How should this look? } From e49e80df68468166293df5878a1662f6605e6aba Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0imon=20Brandner?= Date: Tue, 6 Dec 2022 16:31:44 +0100 Subject: [PATCH 21/29] Further namespacing MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Šimon Brandner --- proposals/3898-sfu.md | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md index 52105712ae3..313cf03594f 100644 --- a/proposals/3898-sfu.md +++ b/proposals/3898-sfu.md @@ -158,9 +158,6 @@ keep-alive messages, metadata, cascade and perform re-negotiation. See the section about the [rationale](#the-use-of-the-data-channels-for-signaling) behind the use of the data channels for signaling. -- **TODO: It feels like these ought to be `m.` namespaced** -- **TODO: It feels like these ought to have `content` rather than being on the - same layer** - **TODO: Spell out how the DC traffic interacts with application-layer traffic** @@ -171,7 +168,7 @@ be able to distinguish them, this therefore build on [MSC3077](https://github.com/matrix-org/matrix-spec-proposals/pull/3077) and [MSC3291](https://github.com/matrix-org/matrix-spec-proposals/pull/3291) to provide the client with the necessary metadata. 
Some of the data-channel events -include a `metadata` field including a description of the stream being sent +include an `m.metadata` field including a description of the stream being sent either from the SFU to the client or from the client to the SFU. Other than mute information and stream purpose, the metadata includes video @@ -259,7 +256,7 @@ If a client no longer wishes to be subscribed to a track, it should send this ev ##### Offer Whenever the client/focus creates an SDP offer, it should send it over to the -other side using this event. The other side should then respond with an `answer` +other side using this event. The other side should then respond with an `m.answer` event. ```json @@ -288,7 +285,7 @@ should send it over to the other side using this event. ##### Metadata Whenever the metadata changes (e.g. mute state changes happen), the client/focus -can send a `metadata` event which includes a `metadata` field. +can send an `m.metadata` event which includes an `m.metadata` field. ```json { From 6b3fd47cc25bdc9cc1b75f3cd7646a33f6e3b6db Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0imon=20Brandner?= Date: Tue, 6 Dec 2022 20:11:39 +0100 Subject: [PATCH 22/29] Update the events to match current Matrix MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Šimon Brandner --- proposals/3898-sfu.md | 115 +++++++++++++++++++----------------------- 1 file changed, 52 insertions(+), 63 deletions(-) diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md index 313cf03594f..551241a5530 100644 --- a/proposals/3898-sfu.md +++ b/proposals/3898-sfu.md @@ -168,8 +168,8 @@ be able to distinguish them, this therefore build on [MSC3077](https://github.com/matrix-org/matrix-spec-proposals/pull/3077) and [MSC3291](https://github.com/matrix-org/matrix-spec-proposals/pull/3291) to provide the client with the necessary metadata. 
Some of the data-channel events -include an `m.metadata` field including a description of the stream being sent -either from the SFU to the client or from the client to the SFU. +include an `sdp_stream_metadata` field including a description of the stream +being sent either from the SFU to the client or from the client to the SFU. Other than mute information and stream purpose, the metadata includes video track resolution. The SFU may not be able to determine the resolution of the @@ -195,25 +195,35 @@ in the metadata. #### Event types -##### Subscribe +This MSC adds a few new `m.call.*` events and extends a few of the existing ones. -This event is sent by the client to request a set of tracks. In the case of -video tracks the client can also request a specific resolution of a given a -track; this resolution is a resolution the client wishes to receive but the SFU -may send a lower one due to bandwidth etc. +##### `m.call.track_subscription` + +This event is sent to the focus to let it know about the tracks the client would +like to start/stop subscribing to. + +Upon receiving this event, a focus should make the subscribe changes based on +the `start` and `stop` arrays and respond with an `m.call.negotiate` event. + +In the case of video tracks, in the `start` array the client may also request a +specific resolution for a given track; this resolution is a resolution the +client wishes to receive but the SFU may send a lower one due to bandwidth etc. If the user for example switches from "spotlight" (one large tile) to "grid" -(multiple small tiles) view, it should also send this request to let the SFU -know of the resolution change. +(multiple small tiles) view, it should also send this event with the updated +resolution in the `start` array to let the focus know of the resolution change. + +Clients may request each track only once: foci should ignore multiple requests +of the same track. 
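As a rough illustration of the focus-side handling described above, the following TypeScript sketch keeps a per-client map of subscribed tracks. It uses the `subscribe`/`unsubscribe` key names from the JSON example as renamed in [PATCH 24/29]; the type and function names (`TrackSubscriptionContent`, `applyTrackSubscription`) are hypothetical and not part of any SDK. Because the same section asks clients to resend a track to change its resolution, the sketch assumes that "ignore multiple requests of the same track" means ignoring exact repeats, while a repeat with a different resolution updates the stored request.

```typescript
// Hypothetical types mirroring the JSON examples in this MSC.
interface TrackRef {
  stream_id: string;
  track_id: string;
}

interface SubscribeRequest extends TrackRef {
  // Requested video resolution; the focus may still send a lower one.
  width?: number;
  height?: number;
}

interface TrackSubscriptionContent {
  subscribe?: SubscribeRequest[];
  unsubscribe?: TrackRef[];
}

// A track is identified by its stream ID and track ID together.
function trackKey(track: TrackRef): string {
  return `${track.stream_id}/${track.track_id}`;
}

// Applies one m.call.track_subscription content to the focus's map of
// tracks a single client is subscribed to. Returns true if the map changed.
function applyTrackSubscription(
  subscriptions: Map<string, SubscribeRequest>,
  content: TrackSubscriptionContent,
): boolean {
  let changed = false;

  for (const request of content.subscribe ?? []) {
    const key = trackKey(request);
    const existing = subscriptions.get(key);
    // Assumption: an identical repeat is ignored, but a repeat with a new
    // resolution (e.g. after a spotlight -> grid switch) updates the request.
    if (
      existing === undefined ||
      existing.width !== request.width ||
      existing.height !== request.height
    ) {
      subscriptions.set(key, { ...request });
      changed = true;
    }
  }

  // Unsubscriptions are applied after subscriptions (an assumption; the MSC
  // does not define an ordering within one event).
  for (const ref of content.unsubscribe ?? []) {
    if (subscriptions.delete(trackKey(ref))) {
      changed = true;
    }
  }

  return changed;
}
```

A focus might call `applyTrackSubscription` for each incoming `m.call.track_subscription` and then respond with an `m.call.negotiate`; skipping that renegotiation when the function returns `false` is an optimisation this sketch assumes rather than something the MSC specifies.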
-- **TODO: how do we prove to the SFU that we have the right to subscribe to +- **TODO: how do we prove to the focus that we have the right to subscribe to track?** ```json { - "type": "m.subscribe", + "type": "m.call.track_subscription", "content": { - "m.start": [ + "start": [ { "stream_id": "streamId1", "track_id": "trackId1", @@ -226,99 +236,78 @@ track?** "width": 256, "height": 144 } - ] - } -} -``` - -##### Unsubscribe - -If a client no longer wishes to be subscribed to a track, it should send this event. - -```json -{ - "type": "m.unsubscribe", - "content": { - "m.stop": [ + ], + "stop": [ { - "stream_id": "streamId1", - "track_id": "trackId1" + "stream_id": "streamId3", + "track_id": "trackId4" }, { - "stream_id": "streamId2", - "track_id": "trackId2" + "stream_id": "streamId4", + "track_id": "trackId4" } ] } } ``` -##### Offer +##### `m.call.negotiate` -Whenever the client/focus creates an SDP offer, it should send it over to the -other side using this event. The other side should then respond with an `m.answer` -event. +This event works exactly like the `m.call.negotiate` event in 1:1 calls. ```json { - "type": "m.offer", + "type": "m.call.negotiate", "content": { - "m.sdp": "..." + "description": { + "type": "offer", + "sdp": "..." + }, + "sdp_stream_metadata": {...} // As specified in the Metadata section } } ``` -##### Answer - -Whenever the client/focus creates an SDP answer in response to an SDP offer, it -should send it over to the other side using this event. - -```json -{ - "type": "m.answer", - "content": { - "m.sdp": "..." - } -} -``` +##### `m.call.sdp_stream_metadata` -##### Metadata +This event works very similarly to the 1:1 call `m.call.sdp_stream_metadata`. -Whenever the metadata changes (e.g. mute state changes happen), the client/focus -can send an `m.metadata` event which includes an `m.metadata` field. 
+- **TODO: Spec how foci actually use this to advertise tracks** ```json { - "type": "m.metadata", + "type": "m.call.sdp_stream_metadata", "content": { - "m.metadata": {...} // As specified in the Metadata section + "sdp_stream_metadata": {...} // As specified in the Metadata section } } ``` -##### Keep-alive +##### `m.call.keep_alive` -Clients should send `alive` message to foci every so often. If the client does -not send an `alive` message for 30 seconds, the focus should hang up. +Clients should send an `m.call.keep_alive` event to foci every so often. If +the client does not send an `m.call.keep_alive` event for 30 seconds, the +focus should hang up. - **TODO: should this be configurable somehow?** ```json { - "type": "m.alive", + "type": "m.call.keep_alive", "content": {} } ``` -##### Connect to focus +##### `m.call.connect_to_focus` -If a user is using their SFU in a call, it will need to know how to connect to -other SFUs present in order to participate in the full-mesh of SFU traffic (if -any). The client is responsible for doing this using the `connect` event. +If a user is using their focus in a call, it will need to know how to connect to +other foci present in order to participate in the full-mesh of SFU traffic (if +any). The client is responsible for doing this using the +`m.call.connect_to_focus` event. ```json { - "type": "m.connect_to_focus", + "type": "m.call.connect_to_focus", "content": { // TODO: How should this look? 
} From bf52e02293014d4cec73b465b40c693a1f472764 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0imon=20Brandner?= Date: Wed, 7 Dec 2022 14:50:24 +0100 Subject: [PATCH 23/29] Fix typo Co-authored-by: David Baker --- proposals/3898-sfu.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md index 551241a5530..e950e84635e 100644 --- a/proposals/3898-sfu.md +++ b/proposals/3898-sfu.md @@ -164,7 +164,7 @@ traffic** #### SDP Stream Metadata extension The client will be receiving multiple streams from the SFU and it will need to -be able to distinguish them, this therefore build on +be able to distinguish them, this therefore builds on [MSC3077](https://github.com/matrix-org/matrix-spec-proposals/pull/3077) and [MSC3291](https://github.com/matrix-org/matrix-spec-proposals/pull/3291) to provide the client with the necessary metadata. Some of the data-channel events From f81dd9d92eea146c2df18d6bc4f902e06b07b6d7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0imon=20Brandner?= Date: Wed, 7 Dec 2022 15:57:18 +0100 Subject: [PATCH 24/29] Use `subscribe`/`unsbuscribe` MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Šimon Brandner --- proposals/3898-sfu.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md index e950e84635e..2dbf0a0267b 100644 --- a/proposals/3898-sfu.md +++ b/proposals/3898-sfu.md @@ -223,7 +223,7 @@ track?** { "type": "m.call.track_subscription", "content": { - "start": [ + "subscribe": [ { "stream_id": "streamId1", "track_id": "trackId1", @@ -237,7 +237,7 @@ track?** "height": 144 } ], - "stop": [ + "unsubscribe": [ { "stream_id": "streamId3", "track_id": "trackId4" From 9c32b96c58c343769a67edbb73f99ab78c073df4 Mon Sep 17 00:00:00 2001 From: David Baker Date: Thu, 8 Dec 2022 19:18:35 +0000 Subject: [PATCH 25/29] Add informational section on active/preferred foci. 
--- proposals/3898-sfu.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md index 2dbf0a0267b..b0e4ee9a327 100644 --- a/proposals/3898-sfu.md +++ b/proposals/3898-sfu.md @@ -29,6 +29,17 @@ The diagrams of how this all looks can be found in This MSC proposes adding two _optional_ fields to the `m.call.member` state event: `m.foci.preferred` and `m.foci.active`. +Informational: This attempts to avoid the situation where a conference is ongoing +with several users in, for example, New York. These users are all connected to the +focus in New York. Alice joins from London: rather than connecting to the focus +in London, she connects directly to the one in New York since that's where all the +other participants are connected. If more users then join from London, however, they +will all make the same decision and connect to the New York focus rather than the +optimal configuration of the London users connected to the London focus. With active +and preferred foci, the second user that joins from London will know that although +Alice's active focus is Newe York, her preferred is London, and can therefore choose +the London focus instead. + For instance: ```json From 6f8c9d19d86637cc4278be41df2f6ecf6b6b8e17 Mon Sep 17 00:00:00 2001 From: David Baker Date: Thu, 8 Dec 2022 19:35:30 +0000 Subject: [PATCH 26/29] Change keepalives to ping/pong --- proposals/3898-sfu.md | 30 +++++++++++++++++++++--------- 1 file changed, 21 insertions(+), 9 deletions(-) diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md index b0e4ee9a327..efbe2a39695 100644 --- a/proposals/3898-sfu.md +++ b/proposals/3898-sfu.md @@ -164,7 +164,7 @@ between itself and the SFU to use later for rapid signalling. The client uses the established data channel connection to the SFU to perform low-latency signalling to rapidly (un)subscribe/(un)publish streams, send -keep-alive messages, metadata, cascade and perform re-negotiation. 
+ping messages, metadata, cascade and perform re-negotiation. See the section about the [rationale](#the-use-of-the-data-channels-for-signaling) behind the use of the data channels for signaling. @@ -294,17 +294,29 @@ This event works very similarly to the 1:1 call `m.call.sdp_stream_metadata`. } ``` -##### `m.call.keep_alive` - -Clients should send an `m.call.keep_alive` event to foci every so often. If -the client does not send an `m.call.keep_alive` event for 30 seconds, the -focus should hang up. - -- **TODO: should this be configurable somehow?** +##### `m.call.ping`, `m.call.pong` +A ping message must be sent by the focus to the client at an interval +no greater than 30 seconds. On receiving a ping message, a client must respond +immediately with a pong message. A client may therefore detect that the +connection has failed after an amount of time of its choosing (greater than +30 seconds) has elapsed since it last saw a ping message. A server may deem a +client unresponsive after not receiving a pong some amount of time after it +has sent a ping; again, the amount of time the server waits is up to the +implementation. Either side should hang up once deeming the other side +unresponsive. + +focus -> client: +```json +{ + "type": "m.call.ping", + "content": {} +} +``` +client -> focus: ```json { - "type": "m.call.keep_alive", + "type": "m.call.pong", "content": {} } ``` From ecf2425e58fbf19808307b49d7825433d40d0a99 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0imon=20Brandner?= Date: Thu, 8 Dec 2022 21:30:56 +0100 Subject: [PATCH 27/29] Add empty line --- proposals/3898-sfu.md | 1 + 1 file changed, 1 insertion(+) diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md index efbe2a39695..5e23998fd75 100644 --- a/proposals/3898-sfu.md +++ b/proposals/3898-sfu.md @@ -295,6 +295,7 @@ This event works very similarly to the 1:1 call `m.call.sdp_stream_metadata`. ``` ##### `m.call.ping`, `m.call.pong`
``` ##### `m.call.ping`, `m.call.pong` + A ping message must be sent by the focus to the client at an interval no greater than 30 seconds. On receiving a ping message, a client must respond immediately with a pong message. A client may therefore detect that the From bf04b17b7be79b555105f72aab0de3516e4315bb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0imon=20Brandner?= Date: Fri, 9 Dec 2022 15:03:10 +0100 Subject: [PATCH 28/29] Fix event name MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Šimon Brandner --- proposals/3898-sfu.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md index 5e23998fd75..2a63d3bbc6d 100644 --- a/proposals/3898-sfu.md +++ b/proposals/3898-sfu.md @@ -279,15 +279,15 @@ This event works exactly like the `m.call.negotiate` event in 1:1 calls. } ``` -##### `m.call.sdp_stream_metadata` +##### `m.call.sdp_stream_metadata_changed` -This event works very similarly to the 1:1 call `m.call.sdp_stream_metadata`. +This event works very similarly to the 1:1 call `m.call.sdp_stream_metadata_changed`. 
- **TODO: Spec how foci actually use this to advertise tracks** ```json { - "type": "m.call.sdp_stream_metadata", + "type": "m.call.sdp_stream_metadata_changed", "content": { "sdp_stream_metadata": {...} // As specified in the Metadata section } From 1896fc7cdab7cbf5e653f84b650772e894e26485 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0imon=20Brandner?= Date: Mon, 12 Dec 2022 13:28:27 +0100 Subject: [PATCH 29/29] Remove encryption section as it's glossing over details MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Šimon Brandner --- proposals/3898-sfu.md | 28 +++------------------------- 1 file changed, 3 insertions(+), 25 deletions(-) diff --git a/proposals/3898-sfu.md b/proposals/3898-sfu.md index 2a63d3bbc6d..4755f103974 100644 --- a/proposals/3898-sfu.md +++ b/proposals/3898-sfu.md @@ -37,7 +37,7 @@ other participants are connected. If more users then join from London, however, will all make the same decision and connect to the New York focus rather than the optimal configuration of the London users connected to the London focus. With active and preferred foci, the second user that joins from London will know that although -Alice's active focus is Newe York, her preferred is London, and can therefore choose +Alice's active focus is New York, her preferred is London, and can therefore choose the London focus instead. For instance: @@ -307,6 +307,7 @@ implementation. Either send should hang up once deeming the other side unresponsive. focus -> client: + ```json { "type": "m.call.ping", @@ -315,6 +316,7 @@ focus -> client: ``` client -> focus: + ```json { "type": "m.call.pong", @@ -338,30 +340,6 @@ any). The client is responsible for doing this using the } ``` -### Encryption - -When SFUs are on the media path, they will necessarily terminate the SRTP -traffic from the peer, breaking E2EE. 
To address this, we apply an additional -end-to-end layer of encryption to the media using [WebRTC Encoded -Transform](https://github.com/w3c/webrtc-encoded-transform/blob/main/explainer.md) -(formerly Insertable Streams) via -[SFrame](https://datatracker.ietf.org/doc/draft-omara-sframe/). - -In order to provide PFS, The symmetric key used for these streams from a given -participating device is a megolm key. Unlike a normal megolm key, this is shared -via `m.room_key` over Olm to the devices participating in the conference -including an `m.call_id` and `m.room_id` field on the key to correlate it to the -conference traffic, rather than using the `session_id` event field to correlate -(given the encrypted traffic is SRTP rather than events, and we don't want to -have to send fake events from all senders every time the megolm session is -replaced). - -The megolm key is ratcheted forward for every SFrame, and shared with new -participants at the current index via `m.room_key` over Olm as per above. When -participants leave, a new megolm session is created and shared with all -participants over Olm. The new session is only used once all participants have -received it. - ### Notes #### Hiding behind foci