-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MSC4033: Explicit ordering of events for receipts #4033
base: main
Are you sure you want to change the base?
Conversation
@bradtgmurray is the client code open source? |
Unfortunately no |
Co-authored-by: Matthew Hodgson <[email protected]>
Co-authored-by: Matthew Hodgson <[email protected]>
…r when we get them
Further, it should be stated that events with negative order are always read, | ||
even if no receipt exists. | ||
|
||
### Order does not have to be unique |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a conversation with @erikjohnston about not wanting to constrain server implementations, I added this section. Comments welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I think the problem with this is where the client gets a bunch of messages with an ordering much lower than the current ordering of the room. Do you a) treat them as read immediately, or b) make it hard to correctly mark them as read?
so it is difficult to decide whether an event is before or after a receipt. | ||
|
||
We propose adding an explicit order number to all events, so that it is clear | ||
which events are read. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with numbers is that you can't fill gaps between them, so if you already have 15 and 16, there's no place inbetween them. Instead, you could treat the order as a string of digits instead with the property that
15 < 151 < 16
For easier understanding, you can compare this to decimal numbers: 0.15 < 0.151 < 0.16
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you need to be able to insert a new value between any existing pair, you can use Dyadic Rationals, basically a value of the form a * 2^b
where a
and b
are integers.
e.g. if you have 23
and 24
, you can use 47 * 2^-1 = 23.5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using an array as the index is also a concept that can be found in CRDT's.
[1]
, [1,1]
, [1,2]
, [1,2,1]
,[2]
(very similar to what timokoesters is proposing but strings bring more typing ambiguity/issues.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of storage/memory used per event (thus bandwidth usage), dyadic rationals sound optimal: sufficiently large space, only require two u32 or two u64. Strings or array representations would require bigger allocations. (Nobody really proposed it, but FP numbers would be a waste in terms of storage space b/o all the NaN
s + IEEE754 is hard to get right.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dyadic rationals only require one byte for the exponent, unless you want to support a value with >256 bits (imo excessive), so they only need 5 bytes (32-bit mantissa) or 9 bytes (64-bit mantissa), though you can definitely use more for alignment or convenience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we will need to be able to handle edge cases where we run out of exponents, e.g. you have an event A and you keep inserting events just after it (which may very well end up a common case). If you only have a one byte exponent then you quickly run out of room.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you run out of exponent bits, that also means the mantissa has to be at least 256-bits wide (or something like that), so you will need big-integer arithmetic for that...i think that's likely excessive.
maybe a better idea is to maintain a binary tree with one node per message and have the server send the list of changed nodes (if the message's contents doesn't change, all that needs to be sent is the node ids of the tree's children, since that's all that changes during tree balancing), this allows tree balancing to avoid the tree getting too deep.
messages would then be ordered by their position in an in-order traversal of that tree.
As a SDK implementer, I am super interested in this proposal, as it could be useful for other purposes too, in addition to making computing the unread badges very simple. For instance, I think this would allow figuring whether we have gaps when reconciling a local/cached timeline of events with the results of a /messages query. edit (2024-02-13): turns out this is not sufficient to figure out gaps, because the ordering number space has to have holes, so the HS can "always" insert events coming late from a laggy federated server. |
@@ -0,0 +1,386 @@ | |||
# MSC4033: Explicit ordering of events for receipts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as a quick note: I have strong suspicions that: a) this is not possible to do in a way that doesn't limit scalability of servers, b) this isn't necessarily the order in which clients should render events, and c) trying to handle filling in gaps is complicated as you need to deal with edge cases.
I'm sure we can do something close here, but needs investigation from server teams to figure out viability.
One change that might help a lot here is: instead of inserting an ordering on all events, instead have an opaque "receipt_key" field for events coming down /sync
or /messages
, which you can pass to the /receipt
API. This is a lot less powerful than a full linearized history, but feels immediately more viable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, also: the server sometimes doesn't know the correct ordering of two historical events immediately (e.g. it has two disconnected chunks of DAG). This is less of a problem for /sync
and read receipts (as historical events don't go down /sync
or count as unread, server side at least), but may be more of one for other use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, also: the server sometimes doesn't know the correct ordering of two historical events immediately (e.g. it has two disconnected chunks of DAG).
this is easily handled by my binary tree idea, the server just picks an arbitrary spot in the tree and then changes it when it knows the correct spot. since the tree nodes' order can be changed without changing message/node ids, this makes it more flexible than using a numeric order.
…remental sync (#17510) Use `stream_ordering` based `timeline` pagination for incremental `/sync` in Sliding Sync. Previously, we were always using a `topological_ordering` but we should only be using that for historical scenarios (initial `/sync`, newly joined, or haven't sent the room down the connection before). This is slightly different than what the [spec suggests](https://spec.matrix.org/v1.10/client-server-api/#syncing) > Events are ordered in this API according to the arrival time of the event on the homeserver. This can conflict with other APIs which order events based on their partial ordering in the event graph. This can result in duplicate events being received (once per distinct API called). Clients SHOULD de-duplicate events based on the event ID when this happens. But we've had a [discussion below in this PR](#17510 (comment)) and this matches what Sync v2 already does and seems like it makes sense. Created a spec issue matrix-org/matrix-spec#1917 to clarify this. Related issues: - matrix-org/matrix-spec#1917 - matrix-org/matrix-spec#852 - matrix-org/matrix-spec-proposals#4033
Rendered
Points out ambiguities in the current spec about what a receipt means, because it is not clear on the order of events that means they are "before" or "after" a receipt. Proposes that events and receipts contain an
order
property that makes it explicit which receipt marks which event as read.Fixes matrix-org/matrix-spec#1167