
/sync doesn't inform clients when there is a gap in the timeline #16463

Closed
erikjohnston opened this issue Oct 11, 2023 · 12 comments · Fixed by #16485
Labels
A-Federation, A-User-Experience, O-Uncommon (most users are unlikely to come across this or unexpected workflow), S-Major (major functionality / product severely impaired, no satisfactory workaround), T-Defect (bugs, crashes, hangs, security vulnerabilities, or other reported issues)

Comments

@erikjohnston
Member

If there has been e.g. a netsplit in a room during which the remote servers have sent more than ten events, then when the netsplit resolves the local server will fetch up to ten events. The remaining events will only be backfilled when a client makes an appropriate call to /messages.

The syncing clients will receive the new events down their timeline; however, the limited flag is not set. This means the client thinks there is no gap, and so won't try to backfill any missed messages.

The net result is that clients won't ever see the older events in the remote fork.
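
For reference, the timeline section of a /sync response for a joined room looks roughly like this (a minimal illustration based on the client-server spec, shown as a Python dict; the token value is made up):

```python
# Shape of the "timeline" object for a joined room in a /sync response,
# shown as a Python dict purely for illustration (token value is made up).
timeline = {
    "events": [
        # ...only the most recent events, up to the timeline limit...
    ],
    # When true, the client knows this is a partial view and should call
    # /messages with prev_batch to fill the gap. In the scenario above it
    # stays false, so clients never fetch the missing remote events.
    "limited": False,
    "prev_batch": "t123-456_789",
}
```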

@erikjohnston
Member Author

I think the fix here is to somehow detect that we have a backwards extremity between the prev_batch and current token, and if so set the limited flag (ensuring we don't send down any events from before the potential gap).
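
A very rough sketch of that idea (the store helper and return shape here are invented for illustration, not Synapse's actual API):

```python
async def timeline_for_sync(store, room_id, since_token, now_token, events):
    """Hypothetical sketch of forcing `limited` when a gap may exist.

    `store.first_backward_extremity_between` is an invented helper meaning
    "the earliest event in (since_token, now_token] whose prev_events we
    never received".
    """
    gap_event = await store.first_backward_extremity_between(
        room_id, since_token, now_token
    )
    if gap_event is not None:
        # Don't send anything from before the potential gap, and tell the
        # client the timeline is limited so it backfills via /messages.
        events = [e for e in events if e.stream_ordering >= gap_event.stream_ordering]
        return events, True  # limited
    return events, False
```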

@clokep
Member

clokep commented Oct 11, 2023

Not directly related, but MSC3871 talks a bit about detecting gaps in timelines. (Although from a /messages POV.)

@clokep
Member

clokep commented Oct 11, 2023

> The syncing clients will receive the new events down their timeline; however, the limited flag is not set. This means the client thinks there is no gap, and so won't try to backfill any missed messages.

Why is the limited flag not set? Is it because 10 events is less than the default 20 events that clients usually request?

@clokep
Member

clokep commented Oct 11, 2023

> The remaining events will only be backfilled when a client makes an appropriate call to /messages.

Would #13576 help at all? (Proactively backfilling more events.)

@erikjohnston
Member Author

> The syncing clients will receive the new events down their timeline; however, the limited flag is not set. This means the client thinks there is no gap, and so won't try to backfill any missed messages.

> Why is the limited flag not set? Is it because 10 events is less than the default 20 events that clients usually request?

Yup, I believe so.

> The remaining events will only be backfilled when a client makes an appropriate call to /messages.

> Would #13576 help at all? (Proactively backfilling more events.)

It'd mitigate this a bit, yes.

@clokep
Member

clokep commented Oct 11, 2023

I guess a band-aid might be to try to backfill 100 events or something. This won't work, though, when the netsplit is only 11 events.


I'm having some trouble tracking how the backfill would even work here. Though currently the homeserver wouldn't attempt to backfill those until the user backscrolled (via /messages) to where, topologically, the netsplit occurred, correct? (What I'm asking is: even with the limited flag set, how would a client easily get those messages, given the split in ordering between /messages and /sync -- see matrix-org/matrix-spec#852?)

@erikjohnston
Member Author

Oh, hmm.

Firstly: I care less right now if you have to backpaginate a bunch before you actually do a backfill, but agreed that is sucky.

Secondly, I think that if clients call /messages with from set to prev_batch and to set to the sync token returned by the previous sync, we'll never actually paginate to the topological ordering where we'd trigger backfill. I'm not sure if clients actually set a to?
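
For reference, the pagination pattern in question would look roughly like this from a client (placeholder homeserver, room ID, and tokens; whether real clients actually pass to is exactly the open question):

```python
import requests

# Hypothetical gap-fill request: paginate backwards from the prev_batch token
# returned by /sync, optionally stopping at the token from the previous sync.
resp = requests.get(
    "https://example.org/_matrix/client/v3/rooms/!room:example.org/messages",
    params={
        "from": "PREV_BATCH_FROM_SYNC",    # start of the gap
        "to": "TOKEN_FROM_PREVIOUS_SYNC",  # optional stop point
        "dir": "b",                        # paginate backwards
        "limit": 50,
    },
    headers={"Authorization": "Bearer ACCESS_TOKEN"},
)
messages = resp.json()
```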

@clokep
Member

clokep commented Oct 11, 2023

> I'm not sure if clients actually set a to?

I do see some hits for it, but they are all either a 401 error with a missing access token, or cases where the client disconnects before a response is sent?

$ grep "to=" synchrotron*.log | grep -v " 401 " | grep -v "already disconnected"

@erikjohnston
Member Author

Oh, that's exciting!

@clokep
Member

clokep commented Oct 11, 2023

> I think the fix here is to somehow detect that we have a backwards extremity between the prev_batch and current token, and if so set the limited flag (ensuring we don't send down any events from before the potential gap).

We talked a bit about this on the phone; it might work to check something about the topo ordering of the first (or last?) event sent down sync vs. the latest topo ordering of the room. I'm unsure whether this would work.

Maybe another check would be whether any of the events in the room since the since token have a negative stream ordering, or a prev event with a negative stream ordering?
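
Something like this, perhaps (the store helpers are invented for illustration; it leans on the assumption that backfilled events get a negative stream ordering):

```python
async def might_have_gap(store, room_id, since_token, now_token):
    """Hypothetical check for the idea above: look for events in the sync
    window whose prev events are missing or were only backfilled (negative
    stream ordering). Both store helpers are invented for illustration."""
    events = await store.get_room_events_between(room_id, since_token, now_token)
    for event in events:
        prev_orderings = await store.get_stream_orderings_for(event.prev_event_ids())
        if any(o is None or o < 0 for o in prev_orderings):
            return True  # a prev event is missing or backfilled: possible gap
    return False
```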

@erikjohnston
Member Author

erikjohnston commented Oct 11, 2023

It looks like, when the limited flag is artificially forced, Element Web at least doesn't use the to parameter, so that might still be enough for a short-term fix. It seems to throw away the entire timeline and paginate it back in?

@erikjohnston
Member Author

My plan is to add a table that records whenever we receive an event over federation for which we don't have (and haven't fetched) the prev events, i.e. where we have a "gap". Then we use that to force the limited flag.
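
Roughly like this (schema and query are illustrative only, and do not necessarily match what the eventual fix in #16485 does):

```python
# Illustrative only: a possible shape for the proposed "gap" table and the
# check /sync would make against it.

CREATE_TIMELINE_GAPS = """
CREATE TABLE timeline_gaps (
    room_id TEXT NOT NULL,
    event_id TEXT NOT NULL,          -- event received without its prev_events
    stream_ordering BIGINT NOT NULL  -- where the gap sits in the live stream
);
"""

async def room_has_gap_between(db, room_id, since_ordering, now_ordering):
    """Return True if a recorded gap falls inside the sync window, in which
    case /sync should set the limited flag. `db.fetchone` stands in for
    whatever database wrapper is in use."""
    row = await db.fetchone(
        "SELECT 1 FROM timeline_gaps"
        " WHERE room_id = ? AND stream_ordering > ? AND stream_ordering <= ?",
        (room_id, since_ordering, now_ordering),
    )
    return row is not None
```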

@reivilibre added the A-Federation, S-Major, T-Defect, O-Uncommon, and A-User-Experience labels on Oct 12, 2023
erikjohnston added a commit to matrix-org/complement that referenced this issue Oct 19, 2023