diff --git a/proposals/4033-event-thread-and-order.md b/proposals/4033-event-thread-and-order.md new file mode 100644 index 00000000000..3406042b6f3 --- /dev/null +++ b/proposals/4033-event-thread-and-order.md @@ -0,0 +1,386 @@ +# MSC4033: Explicit ordering of events for receipts + +The [spec](https://spec.matrix.org/unstable/client-server-api/#receipts) states +that receipts are "read-up-to" without explaining what order the events are in, +so it is difficult to decide whether an event is before or after a receipt. + +We propose adding an explicit order number to all events, so that it is clear +which events are read. + +This proposal covers receipts, and not fully-read markers. Fully-read markers +have the same issue in terms of ordering, and should probably be fixed in a +similar way, but they are not addressed here. + +## Motivation + +To decide whether a room is unread, a Matrix client must decide whether it +contains any unread messages. + +Similarly, to decide whether a room has notifications, we must decide whether +any of its potentially-notifying messages is unread. + +Both of these tasks require us to decide whether a message is read or unread. + +To make this decision we have receipts. We use the following rule: + +> An event is read if the room contains an unthreaded receipt pointing at an +> event which is *after* the event, or a threaded receipt pointing at an event +> that is in the same thread as the event, and is *after* or the same as the +> event. +> +> Otherwise, it is unread. + +(In both cases we only consider receipts sent by the current user, obviously. We +consider either private or public read receipts.) + +To perform this calculation we need a clear definition of *after*. + +### Current definition of *after* + +The current spec (see +[11.6 Receipts](https://spec.matrix.org/latest/client-server-api/#receipts)) is not clear +about what "read up to" means. 
+ +Clients like Element Web make the assumption that *after* means "after in Sync +Order", where "Sync Order" means "the order in which I (the client) received the +events from the server via sync", so if a client received an event and another +event for which it has a receipt via sync, then the event that was later in the +sync or received in a later sync, is after the other one. + +See +[room-dag-concepts](https://github.com/matrix-org/synapse/blob/develop/docs/development/room-dag-concepts.md#depth-and-stream-ordering) +for some Synapse-specific information on Stream Order. In Synapse, Sync Order is +expected to be identical to its concept of Stream Order. + +See also [Spec Issue #1167](https://github.com/matrix-org/matrix-spec/issues/1167), +which calls out this ambiguity about the meaning of "read up to". + +### Problems with the current definition + +The current definition of *after* is ambiguous, and difficult for clients to +calculate. It depends on only receiving events via sync, which is impossible +since we sometimes want messages that did not arrive via sync, so we use +different APIs such as `messages` or `relations`. + +The current definition also makes it needlessly complex for clients to determine +whether an event is read because the receipt itself does not hold enough +information: the referenced event must be fetched and correctly ordered. + +Note: these problems actually apply to all receipts, not just those of the +current user. The symptoms are much more visible and impactful when the current +user's receipts are misinterpreted than for other users, but this proposal +covers both cases. + +## Proposal + +We propose to add an explicit order number to events and receipts, so we can +easily compare whether an event is before or after a receipt. + +This order should be a number that is attached to an event by the server before +it sends it to any client, and it should never change. 
It should, +loosely-speaking, increase for "newer" messages within the same room. + +The order of an event may be negative, and if so it is understood that this +event is always read. The order included with a receipt should never be +negative. + +The ordering must be consistent between a user's homeserver and all of that +user's connected clients. There are no guarantees it is consistent across +different users or rooms. It will be inconsistent across federation as there is +no mechanism to sync order between homeservers. For this reason, we propose that +`order` be included in an event's `unsigned` property. + +This proposal attaches no particular meaning to the rate at which the ordering +increments. (Although we can imagine that some future proposal might want to +expand this idea to include some meaning.) + +### Examples + +Example event (changes are highlighted in bold): + +
{ + "type": "m.room.message", + "content": { + "body": "This is an example text message", + "format": "org.matrix.custom.html", + "formatted_body": "<b>This is an example text message</b>", + "msgtype": "m.text" + }, + "event_id": "$143273582443PhrSn:example.org", + "origin_server_ts": 1432735824653, + "room_id": "!jEsUZKDJdhlrceRyVU:example.org", + "sender": "@example:example.org", + "unsigned": { + "age": 1234, + "order": 56764334543 + } +}+ +Example encrypted event (changes are highlighted in bold): + +
{ + "type": "m.room.encrypted", + "content": { + "algorithm": "m.megolm.v1.aes-sha2", + "sender_key": " ", + "device_id": " ", + "session_id": " ", + "ciphertext": " " + }, + "event_id": "$143273582443PhrSn:example.org", + "origin_server_ts": 1432735824653, + "room_id": "!jEsUZKDJdhlrceRyVU:example.org", + "sender": "@example:example.org", + "unsigned": { + "age": 1234, + "order": 56764334543 + } +}+ +Example receipt (changes are highlighted in bold): + +
{ + "content": { + "$1435641916114394fHBLK:matrix.org": { + "m.read": { + "@erikj:jki.re": { + "ts": 1436451550453, + "order": 56764334544 + } + } + } + }, + "type": "m.receipt" +}+ +We propose: + +* all events should contain an `order` property inside `unsigned`. +* all receipts should contain an `order` property alongside `ts` inside the + information about an event, which is a cache of the `order` property within + the referred-to event. + +The `order` property in receipts should be inserted by servers when they are +creating the aggregated receipt event. + +If the server is not able to provide the order of a receipt (e.g. because it +does not have the relevant event) it should not send the receipt. If a server +later receives an event, allowing it to provide an order for this receipt, it +should send the receipt at that time. Rationale: without the order, a receipt is +not useful to the client since it is not able to use it to determine which +events are read. If a receipt points at an unknown event, the safest assumption +is that other events in the room are unread i.e. there is no receipt. + +If a receipt is received for an event with negative order, the server should set +the order in the receipt to zero. All events with negative order are understood +to be read. + +Note that the `order` property for a particular event will probably be the same +for every user, so will be repeated multiple times in an aggregated receipt +event. This structure was chosen to reduce the chance of breaking existing +clients by introducing `order` at a higher level. + +### Proposed definition of *after* + +We propose that the definition of *after* should be: + +* Event A is after event B if its order is larger. + +We propose updating the spec around receipts +([11.6 Receipts](https://spec.matrix.org/latest/client-server-api/#receipts)) +to be explicit about what "read up to" means, using the above definition. 
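As a rough illustration of the rules above, the following Python sketch shows how a server might fill in the `order` of a receipt, and how *after* reduces to a numeric comparison. The function names `receipt_order` and `is_after` and the `known_orders` lookup table are hypothetical; they are not part of this proposal or of any existing homeserver.

```python
from typing import Dict, Optional


def receipt_order(event_id: str, known_orders: Dict[str, int]) -> Optional[int]:
    """Return the order to cache in a receipt, or None to withhold the receipt.

    `known_orders` is an assumed mapping from event ID to the `order` the
    server assigned to that event.
    """
    order = known_orders.get(event_id)
    if order is None:
        # The server does not have the event, so it cannot provide an order
        # and should not send the receipt until the event arrives.
        return None
    # A receipt never carries a negative order: clamp it to zero.
    return max(order, 0)


def is_after(order_a: int, order_b: int) -> bool:
    """Event A is after event B if its order is larger."""
    return order_a > order_b
```

Returning `None` here stands for "hold the receipt for now"; a server could call `receipt_order` again once the missing event has been received.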
+ +### Definition of read and unread events + +We propose that the definition of whether an event is read should include the +original definition plus the above definition of *after*, and also include this +clarification: + +> (Because the receipt itself contains the `order` of the pointed-to event, +> there is no need to examine the pointed-to event: it is sufficient to compare +> the `order` of the event in question with the `order` in the receipt.) + +Further, it should be stated that events with negative order are always read, +even if no receipt exists. + +### Order does not have to be unique + +If this proposal required the `order` property to be unique within a room, it +might inadvertently put constraints on the implementation of servers since some +linearised process would need to be involved. + +So, we do not require that `order` should be unique within a room. Instead, if +two events have the same `order`, they are both marked as read by a receipt with +that order. + +Events with identical order introduce some imprecision into the process of +marking events as read, so they should be minimised where possible, but some +overlap is tolerable where the server implementation requires it. + +So, a server might choose to use the epoch millisecond at which it received a +message as its order. However, if a server receives a large batch of messages in +the same millisecond, this might cause undesirable behaviour, so a refinement +might be the millisecond as the integer part and a fractional part that +increases as the batch is processed, preserving the order in which the server +receives the messages in the batch. + +If a server were processing multiple batches in parallel, it could implement +this in each process separately, and accept that some events would receive +identical orders, but this would be rare in practice and have little effect on +end users' experience of unread markers. 
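Taken together, the definition of read events and the tolerance of equal orders could be sketched on the client side as below. This is a minimal sketch, assuming threads are ignored (thread IDs are left to a separate MSC) and that `receipt_orders` holds the `order` values of the current user's own receipts in the room, public or private. The name `is_read` is hypothetical.

```python
from typing import Iterable


def is_read(event_order: int, receipt_orders: Iterable[int]) -> bool:
    """Decide whether an event is read using only `order` values."""
    if event_order < 0:
        # Events with negative order are always read, even with no receipt.
        return True
    # A receipt marks as read every event whose order is less than or equal
    # to its own, so two events sharing an order are read together.
    return any(r >= event_order for r in receipt_orders)
```

For example, `is_read(5, [5])` is true because a receipt with the same order covers the event, while `is_read(6, [5])` is false. The pointed-to event never needs to be fetched.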
+ +### Redacted events + +Existing servers already include an `unsigned` section with redacted events, +despite `unsigned` not being mentioned in the [redaction +rules](https://spec.matrix.org/unstable/rooms/v10/#redactions). + +Therefore we propose that redacted events should include `order` in exactly the +same way as all room events. + +## Discussion + +### What order to display events in the UI? + +It is desirable that the order property should match the order of events +displayed in the client as closely as possible, so that receipts behave +consistently with the displayed timeline. However, clients may have different +ideas about where to display late-arriving messages, so it is impossible to +define an order that works for all clients. Instead we agree that a consistent +answer is the best we can do, and rely on clients to provide the best UX they +can for late-arriving messages. + +### Stream order or Topological Order? + +The two orders that we might choose to populate the `order` property are "stream +order" where late-arriving messages tend to receive higher numbers, or +"Topological Order" where late-arriving messages tend to receive lower numbers. + +We believe that it is better to consider late-arriving messages as unread, +meaning the client has the information that these newly arrived messages have +not been read and can choose how to display it (or not). This is what leads us +to suggest Stream Order as the correct choice. + +However, if servers choose Topological Order, this proposal still works - we +just have what the authors consider undesirable behaviour regarding +late-arriving events (they are seen as read even though they are not). + +### Inconsistency across federation + +Because order may be inconsistent across federation[^1], one user may +occasionally see a different unread status for another user from what that user +themselves sees. 
We regard this as impossible to avoid, and expect that in most +cases it will be unnoticeable, since home servers with good connectivity will +normally see events in similar orders. When servers have long network splits, +there will be a noticeable difference at first, but once messages start flowing +normally and users start reading them, the differences will disappear as new +events will have higher Stream order than the older ones on both servers. + +[^1]: In fact, order could also be inconsistent across different users on the + same home server, although we expect in practice this will not happen. + +The focus of this proposal is that a single user sees consistent behaviour +around their own read receipts, and we consider that much more important than +the edge case of inconsistent behaviour across federation after a network split. + +## Implementation Notes + +Some home servers such as Synapse already have a concept of Stream Order. We +expect that the order defined here could be implemented using Stream Order. + +## Potential issues + +This explicitly allows receipts to be inconsistent across federation. In +practice this is already the case in the wild, and is impossible to solve using +Stream Order. The problems with using Topological Order (and Sync Order) have +already been outlined. + +## Alternatives + +### Solves the same problem MSC3981 Relations Recursion tried to solve + +This proposal would not replace +[MSC3981: /relations recursion](https://github.com/matrix-org/matrix-spec-proposals/pull/3981) +but would make it less important, because we would no longer depend on the +server providing messages in Sync Order, so we could happily fetch messages +recursively and still be able to slot them into the right thread and ordering. + +Note that the expectation (from some client devs e.g. 
me @andybalaam) was that +MSC3981 would solve many problems for clients because the events in a thread +would be returned in Sync Order, but this is not true: the proposal will return +events in Topological Order, which is useless for determining which events are +read. + +### The server could report which rooms are unread + +We could use the definitions within this proposal but avoid calculating what was +unread on the client. Instead we could ask the server to figure out which rooms +are unread. + +The client will still need to know which events are unread in order to process +notifications that are encrypted when they pass through the server, so this +proposal would probably be unaltered even if we added the capability for servers +to surface which rooms are unread. + +### Location of order property in receipts + +Initially, we included `order` as a sibling of `m.read` inside the content of a +receipt: + +
{ + "content": { + "$1435641916114394fHBLK:matrix.org": { + "order": 56764334544, + "m.read": { "@rikj:jki.re": { "ts": 1436451550453, "thread_id": "$x" } }, + "m.read.private": { "@self:example.org": { "ts": 1661384801651 } } + } + }, + "type": "m.receipt" +}+ +We moved it inside the content, as a sibling to `ts`, because multiple existing +clients (mautrix-go, mautrix-python and matrix-rust-sdk) would have failed to +parse the above JSON if they encountered it without first being updated. + +### Drop receipts with missing order information + +In the case where a server has a receipt to send to the client, but does not +have the event to which it refers, and therefore cannot find its order, we +proposed above that the server should hold the receipt until it has the relevant +event, and send it then. + +Alternatively, we could simply never send the receipt under these circumstances. +We believe that this is reasonable because it is not expected to happen for the +user's own events, which are the most critical to provide accurate read +receipts, and implementing the "hold and send later" strategy may cause extra +work for the server for little practical gain. + +## Security considerations + +None highlighted so far. + +## Unstable prefix + +TODO + +## Dependencies + +None at this time. + +## Acknowledgements + +Formed from a discussion with @ara4n, with early review from @clokep. Built on +ideas from @t3chguy, @justjanne, @germain-gg and @weeman1337. + +## Changelog + +* 2023-07-04 Initial draft by @andybalaam after conversation with @ara4n. +* 2023-07-05 Remove thread roots from their thread after conversation with @clokep. 
+* 2023-07-05 Make redactions never unread after conversation with @t3chguy +* 2023-07-05 Give a definition of Stream Order +* 2023-07-05 Be explicit about Stream Order not going over federation +* 2023-07-05 Mention disagreeing about what another user has read +* 2023-07-05 Move thread_id into content after talking to @deepbluev7 +* 2023-07-06 Reduced to just order. Thread IDs will be a separate MSC +* 2023-07-06 Moved order deeper within receipts to reduce existing client impact +* 2023-07-13 Include order with redacted events after comments from @clokep