[WIP] MSC3898: Native Matrix VoIP signalling for cascaded foci (SFUs, MCUs...) #3898

SimonBrandner · 2022-09-25T17:32:53Z

Builds on/split out of #3401

Signed-off-by: Šimon Brandner <[email protected]>

ara4n · 2022-09-25T17:37:38Z

i'm all in favour of splitting out the SFU bits from MSC3401, and leaving MSC3401 to focus on full mesh - but perhaps don't duplicate stuff between the two (e.g. the architecture diagrams, or m.call.* events?)

SimonBrandner · 2022-09-25T17:40:22Z

i'm all in favour of splitting out the SFU bits from MSC3401, and leaving MSC3401 to focus on full mesh - but perhaps don't duplicate stuff between the two (e.g. the architecture diagrams, or m.call.* events?)

I've duplicated the diagrams here mainly for context, and the m.call.* events are not duplicates: they only describe what needs to be added for SFUs to work (m.foci). The whole event is included only as an example.


daniel-abramov

Another issue that I ran into while refactoring the SFU was that I noticed that we don't have any feedback regarding messages sent to (and from) the SFU failing. This made things hard to debug since the client never knows if the message sent to the SFU was processed and if the result was successful. I.e. if you try to subscribe to a track and you don't get any tracks back, you don't really know if you need to retry and if so, for how long. We probably need to introduce a mechanism to report errors from/to the SFU. And because of the async nature of the communication, each such message will have to have some sort of a transaction ID or something like this, so that if the client sends Subscribe, Subscribe, Subscribe and gets an error back, it knows which Subscribe command the error corresponds to.
Also, we probably need to mention somewhere how to deal with "obsolete" To-Device messages. When testing the SFU I've noticed that there were cases when I stopped the SFU, but clients continued to post messages to it (not knowing that the SFU was down). That resulted in lots of messages accumulating somewhere in the home server. Once the SFU got restarted, it received all of those messages (which triggered many actions that should not have happened). We need to somehow receive and drop all old messages on the SFU side once we start it, i.e. send a request to the home server with something like "Hey, give me the messages since time.Now() and drop the rest" (conceptually; we can probably achieve this using the current API of the client SDK).
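
The transaction ID idea from the first paragraph of this comment could look roughly like the sketch below. It is only an illustration in Python: the `txn_id` field name, the `PendingRequests` class and the `subscribe` event name are hypothetical and not defined by the MSC.

```python
import uuid


class PendingRequests:
    """Tracks requests sent to the focus until a response or an error arrives."""

    def __init__(self):
        self._pending = {}

    def send(self, event_type, content):
        # Attach a fresh transaction ID (hypothetical field name) to the content.
        txn_id = uuid.uuid4().hex
        self._pending[txn_id] = (event_type, content)
        return {"type": event_type, "content": {**content, "txn_id": txn_id}}

    def resolve(self, response):
        # Return the request a response or error refers to, or None if unknown.
        return self._pending.pop(response.get("txn_id"), None)


requests = PendingRequests()
first = requests.send("subscribe", {"track_id": "trackId1"})
second = requests.send("subscribe", {"track_id": "trackId2"})

# An error that echoes the second transaction ID maps back to the second request.
error = {"txn_id": second["content"]["txn_id"], "error": "unknown track"}
failed = requests.resolve(error)
```

With this shape, a client that sends several Subscribe messages in a row can tell which one failed and decide whether to retry only that one.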

daniel-abramov · 2022-12-02T21:50:40Z

proposals/3898-sfu.md

+        "stream_id": "streamId1",
+        "track_id": "trackId1",


Btw, do we really need both track ID and stream ID for the SFU use case?

The track IDs that browsers generate seem to be GUIDs that are unique enough (i.e. it's unlikely that there would be 2 tracks with the same GUID). Does this mean that instead of using two values, we could just send the track_id? (the server anyway always knows the stream ID of each track).

I don't have any strong arguments here. @ara4n and @dbkr, do you have any thoughts?

To explain the reasoning here: while browsing the Pion docs, I noticed that StreamID is said to be unique only within a single peer connection (but not globally), while TrackID is meant to be unique within a stream, but not globally. This means that in the case of the SFU, the combination of StreamID and TrackID as per Pion would not be enough to uniquely identify a track, so initially I was worried that our implementation was not correct. However, when checking what the browsers actually send as TrackID and StreamID, I noticed that both are randomly generated, with TrackID being a GUID (it's also not that far off from the official spec). If the TrackID is a GUID, then it would be enough to use the GUID as an identifier for tracks when trying to subscribe/unsubscribe from tracks. The rest (StreamID) would be known to the client anyway once the subscription is completed, since they will get the remote track along with its stream ID.

Also, using only the TrackID would allow us to support streamless tracks that may potentially exist in the MatrixRTC use case.

daniel-abramov · 2022-12-02T21:51:28Z

proposals/3898-sfu.md

+If a user is using their SFU in a call, it will need to know how to connect to
+other SFUs present in order to participate in the full-mesh of SFU traffic (if
+any). The client is responsible for doing this using the `connect` op.


Do we want to specify the cascading specific logic in this MSC or would it be better to make a separate MSC for cascading?

Rationale: if we have a dedicated MSC for the SFU, we'll be able to finalize and merge it faster to master. Iterating with small MSCs might be a better idea given the amount of time it normally takes until the MSC is merged? (just a gut feeling)

The issue is that the event fields used in a single focus case are quite different from the cascading case. I wonder if there is a way to avoid that issue

Sorry, I did not fully get what you mean.

I think the reason why I initially commented is that it seems like we're not going to have the cascading implemented in the very nearest future (currently we don't really support it), so I thought maybe it would be faster to limit this MSC to the SFU and then create a cascading MSC after that (once we have a stable SFU). I was just afraid that otherwise the MSC would stay open (or in a draft state) for too long.

I agree with that, but I am not sure how to technically handle this. The MSC currently specifies an SFU selection algorithm and the fields it uses; if we wanted to split the MSC into two, we would need two completely different ways to specify the SFU, I think...

daniel-abramov · 2022-12-02T21:55:29Z

proposals/3898-sfu.md

+    "content": {
+        "m.calls": [
+            {
+                "m.call_id": "cvsiu2893",


Note that the call_id does not seem to be necessary.

When the SFU sends To-Device messages to the clients, the conf_id is specified, and given that the conf_id is a unique identifier of a conference/call, there seems to be no need to have a call_id in addition to that.

Recently I ran into an issue where I realized that call_id and conf_id are not the same (despite MSC3401 giving me the impression that they are identical). The conf_id was the ID of a conference (as expected), but the call_id was another random string that was different for every participant, which forced us to use both call_id and conf_id when sending messages back to the clients (otherwise they would be rejected).

It looks like call_id should either be removed or (if we want to keep it for the backward compatibility with the older MSC?) it must be equal to the conf_id.

I think this comment belongs on MSC3401, as this line is specified in the other MSC, although I think the conclusion is just that there's confusion between call_id and conf_id and we should rename this to conf_id (there's no other conf ID in this event so it is necessary, not just for backwards compat).

Yeah, I've also written a comment about it in MSC3401 😛

Basically, the problem is not only that they are called differently, but also that the value of call_id and conf_id is different, so they are different for some reason (and on the SFU we are obligated to take both into account: conf_id for a conference ID and the call_id to set a value in outgoing To-Device messages without which the client would discard the messages).

Do you agree that the correct resolution is to change this to m.conf_id?

Yes, that would be great! Though I wonder what the consequence of that would be (i.e. what is that value that the current call_id has? - It's not a conference ID, it's something different, or maybe it's a leftover from a previous implementation for 1:1s where call_id meant something?)

Really?

Yes 🙂 That's something that I discovered a week ago when deploying the first iteration of refactored SFU. I've just tried to join the SFU and the conf_id field is equal to 1668002318158qFQmBWgVHHXTZsPA, while the call_id is 1670443502134bOWVqa3btIfDQMjJ.

conf_id and call_id from where though? There will also be call_id in the individual calls which will definitely be different. Otherwise we need to work out what's going on here.

conf_id and call_id from where though?

From To-Device messages that the participants of the conference send to the SFU. We then reply with To-Device messages back (e.g. when we generate an answer), in which case we also set both conf_id and call_id.

Do you mean https://github.com/matrix-org/matrix-js-sdk/blob/develop/src/webrtc/call.ts#L2252? conf_id is the ID of the conference call (state key of the m.call event), call_id is the ID of the 1:1 call between the individual group call participants.

Yeah, seems like it. But the thing is that, from the SFU's standpoint, the call_id does not have any semantics, yet currently we're obligated to store both conf_id and call_id (which have different values), where the call_id is only used in order to send To-Device messages to the clients. I.e. when I want to send messages from the SFU to the client, I have to set both the conf_id (the ID of a conference) and the call_id (the ID of the 1:1 call between individual group call participants).

So my point is that we probably want to get rid of mandating call_id for the SFU calls, since it doesn't seem to have any semantic value for this use case, and only use the conf_id instead?
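
To make the obligation described above concrete, here is a minimal Python sketch of how a focus might build the content of an answer it sends back to a client. The function name and the exact content shape are assumptions for illustration; only the fact that both conf_id and call_id currently have to be echoed comes from this thread.

```python
def build_answer_content(incoming_content, answer_sdp):
    """Build reply content that echoes both identifiers from the client's message."""
    return {
        # ID of the conference (the state key of the m.call event).
        "conf_id": incoming_content["conf_id"],
        # ID of the 1:1 call between this client and the focus. The focus
        # attaches no meaning to it and only stores it so it can echo it back.
        "call_id": incoming_content["call_id"],
        "answer": {"type": "answer", "sdp": answer_sdp},
    }


incoming = {
    "conf_id": "1668002318158qFQmBWgVHHXTZsPA",
    "call_id": "1670443502134bOWVqa3btIfDQMjJ",
}
reply = build_answer_content(incoming, "v=0 (SDP elided)")
```

If call_id were dropped for focus calls, the focus would only need to keep the conf_id per conference rather than one extra ID per participant.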

SimonBrandner · 2022-12-03T11:39:18Z

proposals/3898-sfu.md

+
+## Security considerations
+
+Malicious users could try to DoS SFUs by specifying them as their foci.


Are SFUs not (by default, with an option to the admin/operator to open it up) authenticated using one's matrix account? Shouldn't they be?
The cascaded decentralized SFU concept appears to be that there is one focus associated with each homeserver. Hence I would expect that I can ever only access my hs's SFU(s).

(by @HarHarLinks from #3401 (comment))

As I learn more about this topic, foci seem to not be authenticated.
As a server admin, I would like it if not just anyone could use the focus I host. It would appear logical to allow only users of one or more associated homeservers, and at most also temporarily their remote call members if the algorithm deems the focus favourable.
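
A minimal sketch of the kind of restriction asked for here, in Python. It only checks the server name part of a Matrix user ID against an operator-configured allow-list; the function names and the allow-list itself are hypothetical, and the MSC does not currently define any such mechanism.

```python
def server_name(user_id):
    """Return the server name part of a Matrix user ID such as '@alice:example.org'."""
    if not user_id.startswith("@") or ":" not in user_id:
        raise ValueError(f"not a valid Matrix user ID: {user_id!r}")
    # Split on the first colon only, so a port in the server name is preserved.
    return user_id.split(":", 1)[1]


def may_use_focus(user_id, allowed_servers):
    """Allow only users whose homeserver is on the operator's allow-list."""
    return server_name(user_id) in allowed_servers


allowed = {"example.org"}
local_ok = may_use_focus("@alice:example.org", allowed)
remote_ok = may_use_focus("@mallory:evil.example", allowed)
```

This assumes the focus can trust the sender's user ID on incoming messages. Temporarily admitting remote call members, as suggested above, would need additional state that this sketch does not model.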

Signed-off-by: Šimon Brandner <[email protected]>

daniel-abramov · 2022-12-06T20:05:23Z

proposals/3898-sfu.md

+like to start/stop subscribing to.
+
+Upon receiving this event, a focus should make the subscribe changes based on
+the `start` and `stop` arrays and respond with an `m.call.negotiate` event.


Should we always respond with an m.call.negotiate (we may re-use the transceiver if there is such a possibility)? Maybe we can just mention that the server may reply with an m.call.negotiate if it's practical/necessary.

I'd stick with the current wording until we figure out something better and more specific
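
For reference, the bookkeeping a focus does with the `start` and `stop` arrays can be sketched in Python as below. Keying a subscription by (stream_id, track_id) and processing `stop` before `start` are assumptions made for this sketch; whether a renegotiation is actually needed afterwards is exactly the open question in this thread.

```python
def apply_subscription_update(subscribed, content):
    """Return the new set of subscriptions after applying start/stop arrays."""

    def key(track):
        return (track["stream_id"], track["track_id"])

    updated = set(subscribed)
    # Assumption: removals are applied before additions.
    for track in content.get("stop", []):
        updated.discard(key(track))
    for track in content.get("start", []):
        updated.add(key(track))
    return updated


current = {("streamId1", "trackId1")}
update = {
    "start": [{"stream_id": "streamId2", "track_id": "trackId2"}],
    "stop": [{"stream_id": "streamId1", "track_id": "trackId1"}],
}
new_subscriptions = apply_subscription_update(current, update)
# The focus could skip renegotiation when nothing changed.
needs_negotiation = new_subscriptions != current
```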

dbkr · 2022-12-07T11:29:29Z

proposals/3898-sfu.md

+|Stable (post-FCP) |Unstable                           |
+|------------------|-----------------------------------|
+|`m.foci.active`   |`org.matrix.msc3898.foci.active`   |
+|`m.foci.preferred`|`org.matrix.msc3898.foci.preferred`|
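
As an illustration of how an implementation might consume the table above while the MSC is unstable, here is a small Python sketch. The `get_foci_field` helper and the choice to prefer the stable name when both are present are assumptions, not part of the proposal.

```python
STABLE_TO_UNSTABLE = {
    "m.foci.active": "org.matrix.msc3898.foci.active",
    "m.foci.preferred": "org.matrix.msc3898.foci.preferred",
}


def get_foci_field(content, stable_name):
    """Read a field by its stable name, falling back to the unstable prefix."""
    if stable_name in content:
        return content[stable_name]
    return content.get(STABLE_TO_UNSTABLE[stable_name])


# The focus entry below is made-up example data.
event_content = {"org.matrix.msc3898.foci.active": [{"user_id": "@sfu:example.org"}]}
active = get_foci_field(event_content, "m.foci.active")
preferred = get_foci_field(event_content, "m.foci.preferred")
```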


Moving this to a line comment so it can be a thread:

Also, we probably need to mention somewhere how to deal with "obsolete" To-Device messages. When testing the SFU I've noticed that there were cases when I stopped the SFU, but clients continued to post messages to it (not knowing that the SFU was down). That resulted in lots of messages accumulating somewhere in the home server. Once the SFU got restarted, it received all of those messages (which triggered many actions that should not have happened). We need to somehow receive and drop all old messages on the SFU side once we start it, i.e. send a request to the home server with something like "Hey, give me the messages since time.Now() and drop the rest" (conceptually; we can probably achieve this using the current API of the client SDK).

Yep, this is broadly the same problem as a client starting back up and receiving old messages, some of which are call invites, and having to determine whether the calls have been and gone, in which case it should just ignore the events, or if the call is still ringing.

There's actually not much we can do about this at the client-server API level: to-device messages are just store-and-forward, so there's no real way to tell which messages are old and which are recent. To-device messages don't have timestamps either, unlike room events. I think the options are either to read & discard all to-device messages at startup, or to add a timestamp & expiry at the event level. This will also tie in to how we manage the lifetime of a focus and how we balance between foci, both to distribute load and so we can restart individual foci.
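
The second option (a timestamp and expiry at the event level) could be sketched in Python as follows. The `sent_ts` and `lifetime_ms` field names are hypothetical, and treating messages without them as stale is a design choice made only for this sketch.

```python
import time


def is_stale(content, now_ms=None):
    """Decide whether a to-device message is too old to act on."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    sent_ts = content.get("sent_ts")
    lifetime_ms = content.get("lifetime_ms")
    if sent_ts is None or lifetime_ms is None:
        # Assumption: without expiry information we cannot trust the message.
        return True
    return now_ms > sent_ts + lifetime_ms


def drop_stale(messages, now_ms=None):
    """Keep only the messages that are still within their lifetime."""
    return [m for m in messages if not is_stale(m, now_ms)]


queued = [
    {"sent_ts": 1_000, "lifetime_ms": 5_000, "op": "old"},
    {"sent_ts": 9_000, "lifetime_ms": 5_000, "op": "recent"},
]
fresh = drop_stale(queued, now_ms=10_000)
```

The first option needs no protocol change, since a restarted focus simply discards everything that was queued, but it cannot distinguish a message sent a second before startup from one sent an hour earlier.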

dbkr · 2022-12-07T13:22:29Z

proposals/3898-sfu.md

+
+#### Discovering foci
+
+- **TODO: How does a client discover foci? We could use well-known or a custom endpoint**


Thoughts: how we load balance between SFUs and manage availability will have a bearing on this, i.e. would we expect SFUs to become unavailable when they get restarted / updated, and therefore how often would a client expect its SFU (or list of SFUs?) to change?

dbkr · 2022-12-07T13:40:51Z

proposals/3898-sfu.md

+        "start": [
+            {
+                "stream_id": "streamId1",
+                "track_id": "trackId1",


Maybe we should include the user ID of the user sending the track we want here? That way we're not relying on stream/track IDs being globally unique (plus it will make the signalling much easier to understand when looking at it). The stream ID feels unnecessary in either case.

Hmm, interesting point, perhaps device_id as well? So it would be (user_id, device_id, track_id)? @daniel-abramov, what do you think?

Though I guess that if we have these we might as well leave the stream_id there for flexibility....

That seems like a part of a discussion that we've recently had about the stream/track IDs.

So far the trackIDs were unique regardless of the browser we used for the tests (we even changed the code of the waterfall to only rely on trackID when subscribing to tracks and it seems to work just fine and the handling is simpler and more elegant).

I think we have 2 options here:

1. Either use trackID only (seems to be totally fine since trackIDs are GUIDs).

2. Or use a tuple of track_id, device_id and stream_id that @SimonBrandner suggested in the comment above.

The current implementation in the SFU uses (1), which also seems to be OK from the RFC's standpoint:

[..] A good practice is to use a UUID [rfc4122], which is 36 characters long in its canonical form. To avoid fingerprinting, implementations SHOULD use the forms in section 4.4 or 4.5 of RFC 4122 when generating UUIDs. [..]

I don't have a strong opinion, but I'm always biased toward elegant and simple solutions, so my personal preference would be option (1).

Ah yes, sorry - this is very similar, but github has hidden that comment as outdated. The RFC is only suggesting UUIDs as good practice though, so I'm not sure we can rely on it. Šimon's correct too in that we'd need the device ID too if we couldn't be sure that the track ID was globally unique.
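
To illustrate why the extra fields matter if track IDs cannot be assumed to be globally unique, here is a Python sketch that indexes published tracks by a composite key. The `TrackKey` type, the `index_tracks` helper and the sample data are hypothetical.

```python
from typing import NamedTuple


class TrackKey(NamedTuple):
    user_id: str
    device_id: str
    track_id: str


def index_tracks(published):
    """Index published tracks by (user_id, device_id, track_id)."""
    return {
        TrackKey(p["user_id"], p["device_id"], p["track_id"]): p
        for p in published
    }


published = [
    {"user_id": "@alice:example.org", "device_id": "DEVA", "track_id": "audio0"},
    {"user_id": "@bob:example.org", "device_id": "DEVB", "track_id": "audio0"},
]
by_key = index_tracks(published)
# Keying by track_id alone would silently merge the two tracks above.
by_track_id_only = {p["track_id"]: p for p in published}
```

With GUID track IDs the two indexes have the same size in practice, which is why option (1) has worked so far; the composite key only helps when a sender does not follow the UUID recommendation.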

Another thing we could do here is specify the SFU(s?) to get the stream from? I think this would mean we wouldn't need the connect-to_focus message?

Another thing we could do here is specify the SFU(s?) to get the stream from? I think this would mean we wouldn't need the connect-to_focus message?

For cascading? - Yeah, probably, but I have not thought through the whole cascading thing yet (but probably we could approach the cascading topic similarly to what we did with the SFU conferencing: experiment with things in code and update an MSC once we've gathered more information on what works).

Actually, we shouldn't be using WebRTC track-ids at all (https://blog.mozilla.org/webrtc/the-evolution-of-webrtc/). We should identify by mids to the focus and either use this directly or make up our own ID to reference media here, mapping it to the mid on the focus with a stream_metadata.

Another thing we could do here is specify the SFU(s?) to get the stream from? I think this would mean we wouldn't need the connect-to_focus message?

It's not very clear to me how that would work, tbh

Actually, we shouldn't be using WebRTC track-ids at all (https://blog.mozilla.org/webrtc/the-evolution-of-webrtc/). We should identify by mids to the focus and either use this directly or make up our own ID to reference media here, mapping it to the mid on the focus with a stream_metadata.

This is a good point, @dbkr. I also read this article in the past, but got confused and ignored the conclusion since I saw that the approach of using stream IDs and track IDs did seem to work for the EC despite that article from Mozilla stating that it's a no go (other, newer articles had similar conclusions).

I tried to correlate the information between that article + another article on transceivers from Mozilla + webrtcforthecurious + WebRTC API docs from Mozilla to understand what the correct way to tackle this problem is.

Since my notes were rather large for a comment, I've created a discussion page for that as we agreed.

Please take a look: https://github.com/vector-im/voip-internal/discussions/79

proposals/3898-sfu.md

Co-authored-by: David Baker <[email protected]>

Signed-off-by: Šimon Brandner <[email protected]>

proposals/3898-sfu.md

Signed-off-by: Šimon Brandner <[email protected]>

Native Matrix VoIP signalling for cascaded SFUs

750087f

Signed-off-by: Šimon Brandner <[email protected]>

SimonBrandner added the voip, proposal, kind:core and needs-implementation labels Sep 25, 2022

SimonBrandner mentioned this pull request Sep 25, 2022

Rip out SFU bits out of MSC3401 #3897

Merged

SimonBrandner changed the title ~~[WIP] MSC0000: Native Matrix VoIP signalling for cascaded SFUs~~ [WIP] MSC3898: Native Matrix VoIP signalling for cascaded SFUs Sep 25, 2022

Update MSC number

aa53398

Signed-off-by: Šimon Brandner <[email protected]>

SimonBrandner added 2 commits October 2, 2022 11:40

Link to diagrams from MSC3401

de302cb

Signed-off-by: Šimon Brandner <[email protected]>

Use correct number for file

7474782

Signed-off-by: Šimon Brandner <[email protected]>

daniel-abramov reviewed Nov 10, 2022

SimonBrandner added 3 commits November 11, 2022 16:38

Update sub and unsub ops

5cad46d

Signed-off-by: Šimon Brandner <[email protected]>

Merge remote-tracking branch 'upstream/main' into SimonBrandner/msc/sfu

2cbc2d6

Give a reason for specifying res in metadata

f542fcb

Signed-off-by: Šimon Brandner <[email protected]>

SimonBrandner mentioned this pull request Nov 12, 2022

Implement MSC3898: Native Matrix VoIP signalling for cascaded SFUs matrix-org/matrix-js-sdk#2423

Draft

SimonBrandner added 3 commits November 12, 2022 17:27

Specify foci by device_id too

6f01a94

Signed-off-by: Šimon Brandner <[email protected]>

Fixup some json

575e16c

Signed-off-by: Šimon Brandner <[email protected]>

Typo

33b1880

Signed-off-by: Šimon Brandner <[email protected]>

ara4n reviewed Nov 12, 2022

SimonBrandner and others added 3 commits November 13, 2022 13:11

Specify how to handle foci better

65faee4

Signed-off-by: Šimon Brandner <[email protected]>

Amend TODOs

9882c97

Signed-off-by: Šimon Brandner <[email protected]>

Add rationale behind usage of data channels

c66bbe4

daniel-abramov mentioned this pull request Dec 1, 2022

SFU Refactoring matrix-org/waterfall#52

Merged

SimonBrandner added 3 commits December 2, 2022 18:56

Add TODO

1b2d740

Signed-off-by: Šimon Brandner <[email protected]>

Update event types

feb064b

Signed-off-by: Šimon Brandner <[email protected]>

Add unstable prefixes

d96d101

Signed-off-by: Šimon Brandner <[email protected]>

daniel-abramov reviewed Dec 2, 2022

daniel-abramov mentioned this pull request Dec 2, 2022

MSC3401: Native Group VoIP Signalling #3401

Open

matrix-org deleted a comment from daniel-abramov Dec 3, 2022

SimonBrandner changed the title ~~[WIP] MSC3898: Native Matrix VoIP signalling for cascaded SFUs~~ [WIP] MSC3898: Native Matrix VoIP signalling for cascaded foci Dec 3, 2022

SimonBrandner changed the title ~~[WIP] MSC3898: Native Matrix VoIP signalling for cascaded foci~~ [WIP] MSC3898: Native Matrix VoIP signalling for cascaded foci (SFUs, MCUs...) Dec 3, 2022

SimonBrandner commented Dec 3, 2022

Use subscribe instead of select

d538e1e

Signed-off-by: Šimon Brandner <[email protected]>

SimonBrandner mentioned this pull request Dec 6, 2022

Update events shape to match the MSC matrix-org/waterfall#62

Closed

SimonBrandner added 6 commits December 6, 2022 16:11

op -> event

91470a2

Signed-off-by: Šimon Brandner <[email protected]>

Fixup formatting

2ef7425

Signed-off-by: Šimon Brandner <[email protected]>

Use content

5a186e4

Signed-off-by: Šimon Brandner <[email protected]>

Namespace things

b461525

Signed-off-by: Šimon Brandner <[email protected]>

Further namespacing

e49e80d

Signed-off-by: Šimon Brandner <[email protected]>

Update the events to match current Matrix

6b3fd47

Signed-off-by: Šimon Brandner <[email protected]>

daniel-abramov reviewed Dec 6, 2022

dbkr reviewed Dec 7, 2022

dbkr reviewed Dec 7, 2022

dbkr reviewed Dec 7, 2022

dbkr reviewed Dec 7, 2022

SimonBrandner and others added 4 commits December 7, 2022 14:50

Fix typo

bf52e02

Co-authored-by: David Baker <[email protected]>

Use subscribe/unsbuscribe

f81dd9d

Signed-off-by: Šimon Brandner <[email protected]>

Add informational section on active/preferred foci.

9c32b96

Change keepalives to ping/pong

6f8c9d1

SimonBrandner commented Dec 8, 2022

SimonBrandner added 3 commits December 8, 2022 21:30

Add empty line

ecf2425

Fix event name

bf04b17

Signed-off-by: Šimon Brandner <[email protected]>

Remove encryption section as it's glossing over details

1896fc7

Signed-off-by: Šimon Brandner <[email protected]>

daniel-abramov mentioned this pull request Dec 12, 2022

Change event shape to match MSC3898 matrix-org/waterfall#70

Merged

