Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC3871: Gappy timelines #3871

Open
wants to merge 19 commits into
base: main
Choose a base branch
from
Open
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
130 changes: 130 additions & 0 deletions proposals/3871-gappy-timelines.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,130 @@
# MSC3871: Gappy timeline
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved

`/messages` returns a linearized version of the event DAG. From any given
homeservers perspective of the room, the DAG can have gaps where they're missing
events. This could be because the homeserver hasn't fetched them yet or because
it failed to fetch the events because those homeservers are unreachable and no
one else knows about the event.

Currently, there is an unwritten rule between the server and client that the
server will always return all contiguous events in that part of the timeline.
But the server has to break this rule sometimes when it doesn't have the event
and is unable to get the event from anyone else. This MSC aims to change the
dynamic so the server can give the client feedback and an indication of where
the gaps are.

This way, clients know where they are missing events and can even retry fetching
by perhaps adding some UI to the timeline like "We failed to get some messages
in this gap, try again."

This can also make servers faster to respond to `/messages`. For example,
currently, Synapse always tries to backfill and fill in the gap (even when it
has enough messages locally to respond). In big rooms like `#matrix:matrix.org`
kegsay marked this conversation as resolved.
Show resolved Hide resolved
(Matrix HQ), almost every place you ask for has gaps in it (thousands of
backwards extremities) and lots of those events are unreachable so we try the
same thing over and over hoping the response will be different this time but
instead, we just make the `/messages` response time slow. With this MSC, we can
instead be more intelligent about backfilling in the background and just tell
the client about the gap that they can retry fetching a little later.


## Proposal

Add a `?gaps_allowed=true` query parameter flag to `GET
/_matrix/client/v3/rooms/{roomId}/messages` which allows usage of a
`m.timeline.gap` indicator, that can be used in the `response` `chunk` list of
events. There can be multiple gaps per response.


### `m.timeline.gap`

key | type | value | description | required
--- | --- | --- | --- | ---
`gap_start_event_id` | string | Event ID | The event ID that the homeserver is missing where the gap begins | yes
`pagination_token` | string | Pagination token | A pagination token that represents the spot in the DAG after the missing `gap_start_event_id`. Useful when retrying to fetch the missing part of the timeline again via `/messages?dir=b&from=<pagination_token>` | yes

Pagination tokens are positions between events. This already an established
concept but to illustrate this better, see the following diagram:
```
pagination_token
|
<oldest-in-time> [0]<--[1] <gap> [gap_start_event_id]▼<--[4]<--[5]<--[6] <newest-in-time>
```

`m.timeline.gap` has a similar shape to a normal event so it's still easy to
iterate over the `/messages` response and process but has no `event_id` itself
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
so it should not be mistaken as a real event in the room.

A full example of the `m.timeline.gap` indicator:

```json
{
"type": "m.timeline.gap",
"content": {
"gap_start_event_id": "$12345",
"pagination_token": "t47409-4357353_219380_26003_2265",
}
}
```

`/messages` response example with a gap:

```json
{
"chunk": [
{
"type": "m.room.message",
"content": {
"body": "foo",
}
},
{
"type": "m.timeline.gap",
"content": {
"gap_start_event_id": "$12345",
"pagination_token": "t47409-4357353_219380_26003_2265",
}
},
{
"type": "m.room.message",
"content": {
"body": "baz",
}
},
]
}
```


## Potential issues

Lots of gaps/extremities are generated when a spam attack occurs and federation
falls behind. If clients start showing gaps with retry links, we might just be
exposing the spam more.


## Alternatives
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved

As an alternative, we can continue to do nothing as we do today and not worry
about the occasional missing events. People seem not to notice any missing
messages anyway but they do probably see our slow `/messages` pagination.



## Security considerations

Only your own homeserver controls whether a `m.timeline.gap` indicator is added to the
message response and it isn't an event of the room so there shouldn't be any weird
edge case where the gap is trying to get you to fetch spam or something.


## Unstable prefix

Servers will indicate support for the new endpoint via a true value for feature
flag `org.matrix.msc3871` in `unstable_features` in the response to `GET
/_matrix/client/versions`.

While this feature is in development, it can be used as `GET
/_matrix/client/unstable/org.matrix.msc3871/rooms/{roomId}/messages?gaps_allowed=true`

MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved