@toger5 toger5 commented May 10, 2024

Rendered

To-do:

Pull Request Checklist

@toger5 toger5 marked this pull request as draft May 10, 2024 10:33
@toger5 toger5 force-pushed the toger5/matrixRTC branch 2 times, most recently from d717c0b to cff8291 Compare May 10, 2024 10:34
Signed-off-by: Timo K <[email protected]>
@toger5 toger5 force-pushed the toger5/matrixRTC branch from cff8291 to 9cbe448 Compare May 10, 2024 10:35
@turt2live turt2live changed the title MatrixRTC (draft) MSC4143: MatrixRTC May 10, 2024
@turt2live turt2live added the voip, proposal (A matrix spec change proposal), kind:core (MSC which is critical to the protocol's success), and needs-implementation (This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP.) labels May 10, 2024
Signed-off-by: Timo K <[email protected]>
@toger5 toger5 force-pushed the toger5/matrixRTC branch from b0aa20b to b2b4e5e Compare May 14, 2024 09:24
## Unstable prefix

The state events and the well_known key introduced in this MSC use the unstable prefix
`org.matrix.msc4143.` instead of `m.` as used in the text.
Member

empirically we seem to be using org.matrix.msc3401.call.member rather than org.matrix.msc4143.rtc.member in Element?

Author

The current implementation is still using the msc3401 prefix. That is wrong and will be addressed. There is still the open topic of how exactly we want to do the state keys and the event ownership, and on top of that we have plans for how to index RTC member events in a better way.
The reason we changed it (...call... -> ...rtc...) is that we need the call namespace for the particular MatrixRTC application (session type) of video calling over MatrixRTC. Using that word for both MatrixRTC sessions in general and calls would be confusing in the long run.
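
For readers following the prefix discussion, the substitution described in the "Unstable prefix" section can be sketched as below. This is a minimal illustration, not part of the MSC: the helper name `to_unstable` is hypothetical, and it only covers identifiers that start with `m.`.

```python
STABLE_PREFIX = "m."
UNSTABLE_PREFIX = "org.matrix.msc4143."


def to_unstable(identifier: str) -> str:
    """Swap the stable `m.` prefix for the MSC4143 unstable prefix."""
    if not identifier.startswith(STABLE_PREFIX):
        raise ValueError(f"not a stable identifier: {identifier!r}")
    return UNSTABLE_PREFIX + identifier[len(STABLE_PREFIX):]


print(to_unstable("m.rtc.member"))  # org.matrix.msc4143.rtc.member
```

As noted in the thread, the implementation at the time of this comment still used `org.matrix.msc3401.call.member`, so a client talking to it could not rely on this mapping alone.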

`created_ts()`+`device_id`. This is why the `m.rtc.member` events deliberately do NOT include a `membership_id`.

Other than the membership sessions, there is **no event** to represent an RTC session (containing all members).
Such an event would include shared information, and deciding who has authority over that is not trivial.
Member

I continue to trip over whether it is wise to force clients to read all possible m.rtc.member events to figure out if a call is happening, and who created it.

I /think/ that a better reason for not having an m.rtc state event describing the existence of an RTC session is that you'd have to handle disconnection semantics on it similar to delayed-events for m.rtc.member... at which point, why not leverage the membership events?

However, it still feels REALLY weird to not have something in state telling you whether semantically a call is intended to be happening now (and what that sort of call is, when it began, and who initiated it) - versus having to infer it.

Member

Alternatively, if the thought experiment is "what if two users both create an m.rtc state event on different forks of the DAG at the same time?" ... is that really so bad? and does aggregating m.rtc.member state actually make it better? if so, how?

In other words, we actually need to justify the lack of an m.rtc state event much better here, imo. In particular, it would be useful to have somewhere to store the metadata about the call at the point of creation (its name, its ID, whether it's intended to be a voice/video room or a group call or a conference, etc.).

Author

There is a large list of reasons and I 100% support the idea to go into detail in the MSC to justify this approach.

As for this comment I will just give a couple of short examples/arguments:

The security and "trolling" surface is huge. If we have one state event, we either limit who can create a call or we allow everyone to mess with the event. This can range from smaller issues, like ending the call for fun, to larger issues, like changing where the call is happening without the members noticing (in a LiveKit world at least).

But even if everyone plays fair, a call can be stopped by a delayed event because the creator failed to send the refresh event. This now disconnects everyone from the call.
Independent of how we make things behave, if we have a public/shared event controlling the experience for everyone, we switch from:

  • If there is a client with issues (or a user that actively introduces issues), only that user has a degraded experience.

to:

  • If there is a client with issues (or a user that actively introduces issues), everyone's experience can be broken.

In the context of Matrix, where there is no central entity controlling the clients, the former seems to be the only valid option.
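
To make the trade-off in this thread concrete: without a shared `m.rtc` state event, a client has to derive "is a session running?" from the per-member state. The following is a minimal sketch under stated assumptions. The room state is assumed to be already flattened into a hypothetical mapping from state key to event content, empty content is treated as a leave (as the "Leaving a session" section of the MSC describes), and expiry handling is deliberately left out.

```python
from typing import Any, Mapping


def active_members(member_state: Mapping[str, Mapping[str, Any]]) -> list:
    """Return the state keys whose m.rtc.member content is non-empty."""
    return sorted(key for key, content in member_state.items() if content)


def session_is_active(member_state: Mapping[str, Mapping[str, Any]]) -> bool:
    """A session is inferred to exist as soon as one membership is non-empty."""
    return bool(active_members(member_state))


# Hypothetical state: one joined device and one device that has left.
state = {
    "DEVICEA_m.call": {"member": {"id": "xyzABCDEF10123", "device_id": "DEVICEA"}},
    "DEVICEB_m.call": {},
}
print(active_members(state), session_is_active(state))
```

A single misbehaving client can only add or remove its own entry in this mapping, which is the property the author argues for; the cost, as the reviewer points out, is that every client has to scan all member events to reach the answer.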


- [`m.call`](www.example.com) TODO: create `m.call` MSC and add link here.

## Potential issues
Member

The fact we don't reference how to tell users that a call is happening (i.e. m.call.notify) is very disorienting here.

spantaleev added a commit to spantaleev/matrix-docker-ansible-deploy that referenced this pull request Mar 17, 2025
…atrix_client_property_org_matrix_msc4143_rtc_foci` so that a list is produced

`group_vars/matrix_servers` was correctly populating `matrix_static_files_file_matrix_client_property_org_matrix_msc4143_rtc_foci_auto` with a list, but:

- the defaults for these variables were hinting that hashmaps are necessary

- merging of `_auto` and `_custom` was done as if for hashmaps, not lists

As a result, `/.well-known/matrix/client` looked like this:

```json
{
	"org.matrix.msc4143.rtc_foci": {
		"livekit_service_url": "https://matrix.example.com/livekit-jwt-service",
		"type": "livekit"
	}
}
```

.. instead of what's expected as per MSC4143 (matrix-org/matrix-spec-proposals#4143):

```json
{
	"org.matrix.msc4143.rtc_foci": [
		{
			"livekit_service_url": "https://matrix.example.com/livekit-jwt-service",
			"type": "livekit"
		}
	]
}
```

Regardless of our incorrectly formatted `org.matrix.msc4143.rtc_foci`
configuration in `/.well-known/matrix/client`, Element Web still seemed
to be able to discover LiveKit JWT Service (and by extension, LiveKit Server) correctly,
even without this fix.
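
The difference between the two `.well-known` shapes above can be handled defensively on the reading side. The sketch below is a hypothetical tolerant parser, not a description of how Element Web performs discovery: it accepts the list form expected by MSC4143 and wraps the single-object (hashmap) form into a one-element list.

```python
def parse_rtc_foci(well_known: dict) -> list:
    """Return the advertised foci as a list, tolerating the single-object form."""
    foci = well_known.get("org.matrix.msc4143.rtc_foci", [])
    if isinstance(foci, dict):
        # Hashmap form produced by the misconfiguration described above.
        foci = [foci]
    if not isinstance(foci, list) or not all(isinstance(f, dict) for f in foci):
        raise ValueError("rtc_foci must be a list of objects")
    return foci


focus = {
    "livekit_service_url": "https://matrix.example.com/livekit-jwt-service",
    "type": "livekit",
}
print(parse_rtc_foci({"org.matrix.msc4143.rtc_foci": [focus]}))
```

Both shapes then yield the same list, while anything else (for example a bare string) is rejected.
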
},
"member": {
"id": "xyzABCDEF10123",
"device_id": "DEVICEID",
Member

Why are there `device_id` and `user_id` fields here?
These are claimed fields; there is no need to include them, since they can be found from the Olm session that was used for decryption.

Author

@toger5 toger5 Apr 18, 2025

There is no olm session for the m.call.member state events.
And we might want the option to create memberships from other accounts. (bots creating multiple memberships)

The fields are as follows:

- `member` required object - describes the participant of the RTC session:
  - `id` required string - a unique identifier for this session membership as defined above. Recommended to be a UUID. It can be reused if the user leaves and rejoins the session.
Contributor

Maybe obvious but presumably the recommendation is for UUIDv4?
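
If the answer turns out to be UUIDv4, generating the `id` is a one-liner in most languages. A minimal sketch, assuming version 4 is what the MSC intends (the thread leaves this open) and using a hypothetical helper `new_member`; the `device_id` and `user_id` fields follow the ones discussed in the review comments above.

```python
import uuid


def new_member(device_id: str, user_id: str) -> dict:
    """Build a `member` object with a fresh random (version 4) UUID as its id."""
    return {"id": str(uuid.uuid4()), "device_id": device_id, "user_id": user_id}


member = new_member("DEVICEID", "@alice:example.org")
print(member["id"])
```

Because the MSC allows the `id` to be reused when a user leaves and rejoins, a client following this sketch would store the generated value rather than call `new_member` again on rejoin.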


#### Leaving a session

Sending an empty `m.rtc.member` event represents a leave action. The state key must be the same as before
Contributor

@Johennes Johennes Mar 27, 2025

It is said further up that the session participant (member.user_id) is not necessarily identical to the sender of the m.rtc.member event. How does this interact with state protection when a user wants to leave a session that they were added to by somebody else?

Author

This is a really good reason why the state key should not be used for state protection. We would like an external user to be able to create a session for you, and to allow both you and them to modify the event under this state key.

Comment on lines 120 to 121
All membership events that belong to one member session can be grouped with the index
`created_ts()`+`state_key`. This is why the `m.rtc.member` events deliberately do NOT include something akin to a `membership_id`.
Contributor

I'm curious why the chain is being built via the timestamp of the initial event and not its ID?

Author

The timestamp is needed anyhow to compute expired events. So this is one field that is the same in all events and where we do not need an iterative approach. In the end this is an implementation detail. The spec only clarifies that this is a required property (one an implementation is allowed to rely on) of a membership.

`membershipsForSingleDevice.filter((m) => (m.created_ts ?? m.origin_server_ts) === related_events_ts)`

is a valid implementation to collect all events for one connected session.

It still does not help with finding the associated leave event, but it is helpful in making clear how `created_ts` behaves.
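
The grouping index from the quoted diff (`created_ts()` + `state_key`) can be sketched as follows. This is an illustration only: the event dictionaries are hypothetical, and placing `created_ts` inside `content` is an assumption made for the example rather than something this thread specifies.

```python
from collections import defaultdict


def session_start_ts(event: dict) -> int:
    """Return `created_ts` if the content carries one, else `origin_server_ts`."""
    created = event["content"].get("created_ts")
    return created if created is not None else event["origin_server_ts"]


def group_by_session(events: list) -> dict:
    """Group events under the index (session start timestamp, state key)."""
    groups = defaultdict(list)
    for event in events:
        groups[(session_start_ts(event), event["state_key"])].append(event)
    return dict(groups)


# Hypothetical events for one device: a join, a later update, and a rejoin.
events = [
    {"state_key": "DEVICEID_m.call", "origin_server_ts": 1000, "content": {"member": {}}},
    {"state_key": "DEVICEID_m.call", "origin_server_ts": 2000, "content": {"created_ts": 1000}},
    {"state_key": "DEVICEID_m.call", "origin_server_ts": 5000, "content": {"member": {}}},
]
groups = group_by_session(events)
print({key: len(value) for key, value in groups.items()})
```

An empty leave event carries no `created_ts`, so under this sketch it lands in a group of its own, which matches the caveat that the index does not help with finding the associated leave event.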

> on the map, so that clients can omit connecting to participants that are not in their
> area of interest.

#### State key for `m.rtc.member`


Several applications (widgets) can send this event to the same room. In most cases an application should only receive and send its own member events, so that data is not shared and cannot be modified by another application, unless someone writes a widget to show and/or manage all RTC data in the room.

It would be nice if the spec could define this behavior. Maybe we could have some changes to widget capabilities or some default behavior for this state event.

Author

Based on what field would the widget API/driver filter the events? They all have the same type.

Contributor

<application>?

Comment on lines 171 to 172
Even thought the proposed format is`<device_id>_<application><optional_application_id>`,
the state key only has to fulfill be unique in regards to device, application and application_id.
Contributor

Suggested change
- Even thought the proposed format is`<device_id>_<application><optional_application_id>`,
- the state key only has to fulfill be unique in regards to device, application and application_id.
+ Even though the proposed format is`<device_id>_<application><optional_application_id>`,
+ the state key only has to be unique in regards to device, application and application_id.

This makes it grammatically correct, but I still struggle to understand it. Do you mean that the state key is a unique combination from these 3 variables?
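
One way to read the quoted sentence is that the proposed layout is just one possible encoding of the (device, application, application ID) triple. A minimal sketch of that layout, with a hypothetical helper name and example values:

```python
def rtc_member_state_key(device_id: str, application: str, application_id: str = "") -> str:
    """Join the components as `<device_id>_<application><optional_application_id>`."""
    return f"{device_id}_{application}{application_id}"


print(rtc_member_state_key("DEVICEID", "m.call"))  # DEVICEID_m.call
```

Plain concatenation is only unambiguous if the components cannot run into each other: an application `m.call` with ID `2` and an application `m.call2` with no ID would produce the same key. That may be why the text stresses uniqueness over the exact format.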

@ara4n ara4n added the matrix-2.0 (Required for Matrix 2.0) label Sep 5, 2025
@github-project-automation github-project-automation bot moved this to Tracking for review in Spec Core Team Workflow Sep 5, 2025
venimus pushed a commit to superhero-com/matrix-docker-ansible-deploy that referenced this pull request Oct 8, 2025
@fkwp fkwp marked this pull request as ready for review October 14, 2025 13:16
Michael-Ixo pushed a commit to ixoworld/synapse that referenced this pull request Oct 23, 2025
Deployments that make use of the
[synapse-s3-storage-provider](https://github.com/matrix-org/synapse-s3-storage-provider)
module must upgrade to
[v1.6.0](https://github.com/matrix-org/synapse-s3-storage-provider/releases/tag/v1.6.0).
Using older versions of the module with this release of Synapse will prevent
users from being able to upload or download media.

No significant changes since 1.140.0rc1.

- Add [a new Media Query by ID Admin API](https://element-hq.github.io/synapse/v1.140/admin_api/media_admin_api.html#query-a-piece-of-media-by-id) that allows server admins to query and investigate the metadata of local or cached remote media via
  the `origin/media_id` identifier found in a [Matrix Content URI](https://spec.matrix.org/v1.14/client-server-api/#matrix-content-mxc-uris). ([\element-hq#18911](element-hq#18911))
- Add [a new Fetch Event Admin API](https://element-hq.github.io/synapse/v1.140/admin_api/fetch_event.html) to fetch an event by ID. ([\element-hq#18963](element-hq#18963))
- Update [MSC4284: Policy Servers](matrix-org/matrix-spec-proposals#4284) implementation to support signatures when available. ([\element-hq#18934](element-hq#18934))
- Add experimental implementation of the `GET /_matrix/client/v1/rtc/transports` endpoint for the latest draft of [MSC4143: MatrixRTC](matrix-org/matrix-spec-proposals#4143). ([\element-hq#18967](element-hq#18967))
- Expose a `defer_to_threadpool` function in the Synapse Module API that allows modules to run a function on a separate thread in a custom threadpool. ([\element-hq#19032](element-hq#19032))

- Fix room upgrade `room_config` argument and documentation for `user_may_create_room` spam-checker callback. ([\element-hq#18721](element-hq#18721))
- Compute a user's last seen timestamp from their devices' last seen timestamps instead of IPs, because the latter are automatically cleared according to `user_ips_max_age`. ([\element-hq#18948](element-hq#18948))
- Fix bug where ephemeral events were not filtered by room ID. Contributed by @frastefanini. ([\element-hq#19002](element-hq#19002))
- Update Synapse main process version string to include git info. ([\element-hq#19011](element-hq#19011))

- Explain how `Deferred` callbacks interact with logcontexts. ([\element-hq#18914](element-hq#18914))
- Fix documentation for `rc_room_creation` and `rc_reports` to clarify that a `per_user` rate limit is not supported. ([\element-hq#18998](element-hq#18998))

- Remove deprecated `LoggingContext.set_current_context`/`LoggingContext.current_context` methods which already have equivalent bare methods in `synapse.logging.context`. ([\element-hq#18989](element-hq#18989))
- Drop support for unstable field names from the long-accepted [MSC2732](matrix-org/matrix-spec-proposals#2732) (Olm fallback keys) proposal. ([\element-hq#18996](element-hq#18996))

- Cleanly shutdown `SynapseHomeServer` object, allowing artifacts of embedded small hosts to be properly garbage collected. ([\element-hq#18828](element-hq#18828))
- Update OEmbed providers to use 'X' instead of 'Twitter' in URL previews, following a rebrand. Contributed by @HammyHavoc. ([\element-hq#18767](element-hq#18767))
- Fix `server_name` in logging context for multiple Synapse instances in one process. ([\element-hq#18868](element-hq#18868))
- Wrap the Rust HTTP client with `make_deferred_yieldable` so it follows Synapse logcontext rules. ([\element-hq#18903](element-hq#18903))
- Fix the GitHub Actions workflow that moves issues labeled "X-Needs-Info" to the "Needs info" column on the team's internal triage board. ([\element-hq#18913](element-hq#18913))
- Disconnect background process work from request trace. ([\element-hq#18932](element-hq#18932))
- Reduce overall number of calls to `_get_e2e_cross_signing_signatures_for_devices` by increasing the batch size of devices the query is called with, reducing DB load. ([\element-hq#18939](element-hq#18939))
- Update error code used when an appservice tries to masquerade as an unknown device using [MSC4326](matrix-org/matrix-spec-proposals#4326). Contributed by @tulir @ Beeper. ([\element-hq#18947](element-hq#18947))
- Fix `no active span when trying to log` tracing error on startup (when OpenTracing is enabled). ([\element-hq#18959](element-hq#18959))
- Fix `run_coroutine_in_background(...)` incorrectly handling logcontext. ([\element-hq#18964](element-hq#18964))
- Add debug logs wherever we change current logcontext. ([\element-hq#18966](element-hq#18966))
- Update dockerfile metadata to fix broken link; point to documentation website. ([\element-hq#18971](element-hq#18971))
- Note that the code is additionally licensed under the [Element Commercial license](https://github.com/element-hq/synapse/blob/develop/LICENSE-COMMERCIAL) in SPDX expression field configs. ([\element-hq#18973](element-hq#18973))
- Fix logcontext handling in `timeout_deferred` tests. ([\element-hq#18974](element-hq#18974))
- Remove internal `ReplicationUploadKeysForUserRestServlet` as a follow-up to the work in element-hq#18581 that moved device changes off the main process. ([\element-hq#18988](element-hq#18988))
- Switch task scheduler from raw logcontext manipulation to using the dedicated logcontext utils. ([\element-hq#18990](element-hq#18990))
- Remove `MockClock()` in tests. ([\element-hq#18992](element-hq#18992))
- Switch back to our own custom `LogContextScopeManager` instead of OpenTracing's `ContextVarsScopeManager` which was causing problems when using the experimental `SYNAPSE_ASYNC_IO_REACTOR` option with tracing enabled. ([\element-hq#19007](element-hq#19007))
- Remove `version_string` argument from `HomeServer` since it's always the same. ([\element-hq#19012](element-hq#19012))
- Remove duplicate call to `hs.start_background_tasks()` introduced from a bad merge. ([\element-hq#19013](element-hq#19013))
- Split homeserver creation (`create_homeserver`) and setup (`setup`). ([\element-hq#19015](element-hq#19015))
- Swap near-end-of-life `macos-13` GitHub Actions runner for the `macos-15-intel` variant. ([\element-hq#19025](element-hq#19025))
- Introduce `RootConfig.validate_config()` which can be subclassed in `HomeServerConfig` to do cross-config class validation. ([\element-hq#19027](element-hq#19027))
- Allow any command of the `release.py` script to accept a `--gh-token` argument. ([\element-hq#19035](element-hq#19035))

* Bump Swatinem/rust-cache from 2.8.0 to 2.8.1. ([\element-hq#18949](element-hq#18949))
* Bump actions/cache from 4.2.4 to 4.3.0. ([\element-hq#18983](element-hq#18983))
* Bump anyhow from 1.0.99 to 1.0.100. ([\element-hq#18950](element-hq#18950))
* Bump authlib from 1.6.3 to 1.6.4. ([\element-hq#18957](element-hq#18957))
* Bump authlib from 1.6.4 to 1.6.5. ([\element-hq#19019](element-hq#19019))
* Bump bcrypt from 4.3.0 to 5.0.0. ([\element-hq#18984](element-hq#18984))
* Bump docker/login-action from 3.5.0 to 3.6.0. ([\element-hq#18978](element-hq#18978))
* Bump lxml from 6.0.0 to 6.0.2. ([\element-hq#18979](element-hq#18979))
* Bump phonenumbers from 9.0.13 to 9.0.14. ([\element-hq#18954](element-hq#18954))
* Bump phonenumbers from 9.0.14 to 9.0.15. ([\element-hq#18991](element-hq#18991))
* Bump prometheus-client from 0.22.1 to 0.23.1. ([\element-hq#19016](element-hq#19016))
* Bump pydantic from 2.11.9 to 2.11.10. ([\element-hq#19017](element-hq#19017))
* Bump pygithub from 2.7.0 to 2.8.1. ([\element-hq#18952](element-hq#18952))
* Bump regex from 1.11.2 to 1.11.3. ([\element-hq#18981](element-hq#18981))
* Bump serde from 1.0.224 to 1.0.226. ([\element-hq#18953](element-hq#18953))
* Bump serde from 1.0.226 to 1.0.228. ([\element-hq#18982](element-hq#18982))
* Bump setuptools-rust from 1.11.1 to 1.12.0. ([\element-hq#18980](element-hq#18980))
* Bump twine from 6.1.0 to 6.2.0. ([\element-hq#18985](element-hq#18985))
* Bump types-pyyaml from 6.0.12.20250809 to 6.0.12.20250915. ([\element-hq#19018](element-hq#19018))
* Bump types-requests from 2.32.4.20250809 to 2.32.4.20250913. ([\element-hq#18951](element-hq#18951))
* Bump typing-extensions from 4.14.1 to 4.15.0. ([\element-hq#18956](element-hq#18956))
"application": {
"type": "m.call",
// optional: app specific slot metadata
"m.call.id": UUID, // Note your application must handle rollback due to state resolution
Contributor

...how?
