Sliding Sync: Handle room_subscriptions that increase the timeline_limit
#17503
Changes from all commits: 11f3492, 9284cc0, ce09ef0, 394c25a, f17ff7c, 9c2354b, d10361d
@@ -0,0 +1 @@
+Handle requests for more events in a room when using experimental sliding sync.
@@ -653,6 +653,13 @@ async def current_sync_for_user(
                 else:
                     assert_never(status.status)

+                if status.timeline_limit is not None and (
+                    status.timeline_limit < relevant_room_map[room_id].timeline_limit
+                ):
+                    # If the timeline limit has increased we want to send down
+                    # more historic events (even if nothing has since changed).
+                    rooms_should_send.add(room_id)
+
             # We only need to check for new events since any state changes
             # will also come down as new events.
             rooms_that_have_updates = self.store.get_rooms_that_might_have_updates(
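As a concrete illustration of what this branch reacts to, here is a minimal sketch of two successive requests on the same sliding sync connection (field names follow the MSC3575-style request shape; the room ID and state filter are invented for the example):

```python
# Sketch only: two request bodies for the same sliding sync connection.
# The second raises the room's timeline_limit, which is what the new check
# above detects so the room is re-sent with more history.
first_request = {
    "room_subscriptions": {
        "!room:example.org": {
            "timeline_limit": 1,
            "required_state": [["m.room.name", ""]],
        }
    }
}

second_request = {
    "room_subscriptions": {
        "!room:example.org": {
            # Increased from 1 to 20: even if nothing new has happened in the
            # room, the server should now send down more historic events.
            "timeline_limit": 20,
            "required_state": [["m.room.name", ""]],
        }
    }
}
```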
@@ -697,6 +704,7 @@ async def handle_room(room_id: str) -> None:
         if has_lists or has_room_subscriptions:
             connection_position = await self.connection_store.record_rooms(
                 sync_config=sync_config,
+                room_configs=relevant_room_map,
                 from_token=from_token,
                 sent_room_ids=relevant_room_map.keys(),
                 # TODO: We need to calculate which rooms have had updates since the `from_token` but were not included in the `sent_room_ids`
@@ -1475,14 +1483,20 @@ async def get_room_sync_data(
         #  - When users `newly_joined`
         #  - For an incremental sync where we haven't sent it down this
         #    connection before
+        #
+        # We also decide if we should ignore the timeline bound or not. This is
+        # to handle the case where the client has requested more historical
+        # messages in the room by increasing the timeline limit.
         from_bound = None
+        ignore_timeline_bound = False
         initial = True
         if from_token and not room_membership_for_user_at_to_token.newly_joined:
             room_status = await self.connection_store.have_sent_room(
                 sync_config=sync_config,
                 connection_token=from_token.connection_position,
                 room_id=room_id,
             )

             if room_status.status == HaveSentRoomFlag.LIVE:
                 from_bound = from_token.stream_token.room_key
                 initial = False
@@ -1496,9 +1510,24 @@ async def get_room_sync_data(
             else:
                 assert_never(room_status.status)

+            if room_status.timeline_limit is not None and (
+                room_status.timeline_limit < room_sync_config.timeline_limit
+            ):
+                # If the timeline limit has been increased since previous
+                # requests then we treat it as request for more events. We do
+                # this by sending down a `limited` sync, ignoring the from
+                # bound.
+                ignore_timeline_bound = True
+
             log_kv({"sliding_sync.room_status": room_status})

-        log_kv({"sliding_sync.from_bound": from_bound, "sliding_sync.initial": initial})
+        log_kv(
+            {
+                "sliding_sync.from_bound": from_bound,
+                "sliding_sync.ignore_timeline_bound": ignore_timeline_bound,
+                "sliding_sync.initial": initial,
+            }
+        )

         # Assemble the list of timeline events
         #
@@ -1541,7 +1570,7 @@ async def get_room_sync_data(
             # (from newer to older events) starting at to_bound.
             # This ensures we fill the `limit` with the newest events first,
             from_key=to_bound,
-            to_key=from_bound,
+            to_key=from_bound if not ignore_timeline_bound else None,
             direction=Direction.BACKWARDS,
             # We add one so we can determine if there are enough events to saturate
             # the limit or not (see `limited`)
@@ -1566,6 +1595,12 @@ async def get_room_sync_data(
                     stream=timeline_events[0].internal_metadata.stream_ordering - 1
                 )

+            if ignore_timeline_bound:
+                # If we're ignoring the timeline bound we *must* set limited to
+                # true, as otherwise the client will append the received events
+                # to the timeline, rather than replacing it.
+                limited = True
+
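For context on why `limited` has to be forced here, this is a rough sketch of the client-side rule the comment refers to (not taken from any particular SDK; the function and variable names are invented for the example):

```python
from typing import List


def apply_timeline_update(
    local_timeline: List[str], new_events: List[str], limited: bool
) -> List[str]:
    """Sketch of how sliding sync clients typically apply a room's timeline
    chunk: a `limited` response means there is (or may be) a gap, so the local
    timeline is replaced; otherwise the new events are simply appended."""
    if limited:
        return list(new_events)
    return local_timeline + list(new_events)


# Without `limited = True`, the re-sent historic events would be appended
# after the events the client already has, duplicating and reordering them.
```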
Comment on lines +1599 to +1602

This seems dubious. If some client replaces everything on `limited` … For example, for an incremental sync if there are more events than the `timeline_limit` … Even …

Yeah, this is sort of an abuse of what `limited` means. What the SS proxy does is to set … As I see it the options to deal with timeline trickling (in the short term) are: … I think the last one is the least worst. It's also worth noting that the client (mostly) opts into this behaviour by doing explicit room subscriptions with a differing `timeline_limit`.

(Side note: if we do do this PR we should update the docs for …)

The pattern laid out is to use … If they're trying to get more … Clients should handle gaps though. Or at least we shouldn't punish clients who do handle gaps. For example, Hydrogen handles this with fragments and gap filling since it has offline support and doesn't throw away its work. If we still think Sliding Sync should somehow address this, then I think we need to apply more thought to the API design.

Just to be clear: this is implementing the behaviour that the sliding sync proxy already supports. The way that the rust sdk currently starts is to: …

I agree that there is something a bit confusing with allowing the client to change the timeline limit, but TBF they are opting into this behaviour (to an extent).

We would be tweaking … The alternative is to just use …

I don't think this really punishes clients that handle gaps that much; it means that if they change the timeline limit they need to handle the fact that they may have already seen a subset of the events that have been returned. If we do just use …

We're not in a great position to change things around right now, though I agree it should go on the list of things we should look into once we've reimplemented everything.

This is an active regression in behaviour and UX from the SS proxy. It's clear we need to do something here to allow the apps to continue to operate well.

Yes, we're subtly changing the meaning of …

It's really unclear to me what the semantics of increasing the timeline limit should be. Explicitly increasing the timeline limit very much feels like the app wants to get a bigger chunk of timeline down 🤷 I guess there is a time where this sort of happens implicitly, where we have two lists with different timeline limits, e.g. an "all-room" list with a limit of 1 and a "top-20" list with a limit of 20. We'd want to be careful to handle that sanely, though if an old room gets an update (which bumps it to the top-20 list) I don't think it's insane for it to include the last 20 timeline events.

So to summarise my understanding here: … My opinion is that timeline trickling is awful and should be replaced with a bulk … However, we do not live in this future. We live in the here and now, where we want to get SSS landed in Synapse ASAP and then iterate on any warts such as this. As a result, I don't really care which proposal we go with. If neither of you can agree on the semantics, then might I suggest a tiebreak and literally just implement the behaviour of the proxy (that is …). Failing that, I would lean towards Erik's …

It's really nice to see the candid agreement on how bizarre this behavior is and plans for a better future! ⭐ My actual proposal is that the client can use an initial sync request and ask for …

From the weekly meeting, it seems like the only downside to my suggestion is that it requires client-side changes. The ElementX/Rust client team is out today so we aren't able to see whether this is a big deal or not. Besides my suggestion, the better of the proposed options if we really want to push this through: use … Besides the things already discussed in this thread, we also went over the possibility of adding a completely new flag to accurately describe and trigger this behavior, or even a new field like …

Conversation continued in #17579 (comment)

Why isn't the client expanding their room timeline by paginating `/messages`? That seems just as good if they're adding room subscriptions just to get more …

Yeah, so this is really a (bit of an) abuse of the SS API so the client can avoid doing e.g. 20 pagination requests simultaneously.
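For reference, the `/messages` alternative raised above would look roughly like this on the client side (a sketch against the standard Matrix client-server API; the helper name and token plumbing are illustrative):

```python
import urllib.parse

import requests  # any HTTP client would do; used here for brevity


def backfill_room(
    homeserver: str, access_token: str, room_id: str, prev_batch: str, limit: int = 20
) -> dict:
    """Fetch older events for a single room via /messages rather than bumping
    the sliding sync timeline_limit. The trade-off discussed above: a client
    warming up many rooms would need one such request per room."""
    url = (
        f"{homeserver}/_matrix/client/v3/rooms/"
        f"{urllib.parse.quote(room_id)}/messages"
    )
    response = requests.get(
        url,
        params={"from": prev_batch, "dir": "b", "limit": str(limit)},
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=30,
    )
    response.raise_for_status()
    # The body contains a "chunk" of older events and an "end" token for
    # further back-pagination.
    return response.json()
```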
Comment on lines +1598 to +1602

One edge case this doesn't handle is when there are only a few events in the room total. This will claim `limited` …
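To make that edge case concrete, one possible guard (purely illustrative, not something this PR implements) would only force `limited` when the requested window was actually truncated:

```python
def should_force_limited(
    num_events_returned: int, timeline_limit: int, hit_start_of_room: bool
) -> bool:
    """Illustrative guard for the edge case above: if the room only has a
    handful of events we can return all of them, and claiming `limited`
    would wrongly suggest there is a gap left to paginate into."""
    if hit_start_of_room or num_events_returned < timeline_limit:
        return False
    return True
```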
             # Make sure we don't expose any events that the client shouldn't see
             timeline_events = await filter_events_for_client(
                 self.storage_controllers,
@@ -2232,19 +2267,26 @@ class HaveSentRoom:
            contains the last stream token of the last updates we sent down
            the room, i.e. we still need to send everything since then to the
            client.
+        timeline_limit: The timeline limit config for the room, if LIVE or
+            PREVIOUSLY. This is used to track if the client has increased
+            the timeline limit to request more events.
     """

     status: HaveSentRoomFlag
     last_token: Optional[RoomStreamToken]
+    timeline_limit: Optional[int]

+    @staticmethod
+    def live(timeline_limit: int) -> "HaveSentRoom":
+        return HaveSentRoom(HaveSentRoomFlag.LIVE, None, timeline_limit)
+
     @staticmethod
-    def previously(last_token: RoomStreamToken) -> "HaveSentRoom":
+    def previously(last_token: RoomStreamToken, timeline_limit: int) -> "HaveSentRoom":
         """Constructor for `PREVIOUSLY` flag."""
-        return HaveSentRoom(HaveSentRoomFlag.PREVIOUSLY, last_token)
+        return HaveSentRoom(HaveSentRoomFlag.PREVIOUSLY, last_token, timeline_limit)


-HAVE_SENT_ROOM_NEVER = HaveSentRoom(HaveSentRoomFlag.NEVER, None)
-HAVE_SENT_ROOM_LIVE = HaveSentRoom(HaveSentRoomFlag.LIVE, None)
+HAVE_SENT_ROOM_NEVER = HaveSentRoom(HaveSentRoomFlag.NEVER, None, None)


 @attr.s(auto_attribs=True)
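Pulling the new field together with the handler change earlier in the diff, here is a condensed, self-contained sketch of how the stored limit is meant to be consumed (names simplified; this is not the Synapse code itself):

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Flag(Enum):
    NEVER = auto()
    PREVIOUSLY = auto()
    LIVE = auto()


@dataclass(frozen=True)
class SentRoomStatus:
    """Simplified stand-in for HaveSentRoom: timeline_limit is the limit that
    was in force the last time this room was sent down the connection."""
    status: Flag
    timeline_limit: Optional[int]


def wants_more_history(stored: SentRoomStatus, requested_limit: int) -> bool:
    # A strictly larger limit than the one previously recorded is treated as
    # a request for more historic events.
    return stored.timeline_limit is not None and stored.timeline_limit < requested_limit


assert wants_more_history(SentRoomStatus(Flag.LIVE, 1), 20)
assert not wants_more_history(SentRoomStatus(Flag.LIVE, 20), 20)
assert not wants_more_history(SentRoomStatus(Flag.NEVER, None), 20)
```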
@@ -2299,6 +2341,7 @@ async def have_sent_room(
     async def record_rooms(
         self,
         sync_config: SlidingSyncConfig,
+        room_configs: Dict[str, RoomSyncConfig],
         from_token: Optional[SlidingSyncStreamToken],
         *,
         sent_room_ids: StrCollection,
@@ -2339,8 +2382,12 @@ async def record_rooms(
             # end we can treat this as a noop.
             have_updated = False
             for room_id in sent_room_ids:
-                new_room_statuses[room_id] = HAVE_SENT_ROOM_LIVE
-                have_updated = True
+                prev_state = new_room_statuses.get(room_id)
+                new_room_statuses[room_id] = HaveSentRoom.live(
+                    room_configs[room_id].timeline_limit
+                )
+                if prev_state != new_room_statuses[room_id]:
+                    have_updated = True

             # Whether we add/update the entries for unsent rooms depends on the
             # existing entry:
@@ -2351,18 +2398,22 @@ async def record_rooms(
             #   given token, so we don't need to update the entry.
             # - NEVER: We have never previously sent down the room, and we haven't
             #   sent anything down this time either so we leave it as NEVER.
+            #
+            # We only need to do this if `from_token` is not None, as if it is then
+            # we know that there are no existing entires.

-            # Work out the new state for unsent rooms that were `LIVE`.
             if from_token:
-                new_unsent_state = HaveSentRoom.previously(from_token.stream_token.room_key)
-            else:
-                new_unsent_state = HAVE_SENT_ROOM_NEVER
-
-            for room_id in unsent_room_ids:
-                prev_state = new_room_statuses.get(room_id)
-                if prev_state is not None and prev_state.status == HaveSentRoomFlag.LIVE:
-                    new_room_statuses[room_id] = new_unsent_state
-                    have_updated = True
+                for room_id in unsent_room_ids:
+                    prev_state = new_room_statuses.get(room_id)
+                    if (
+                        prev_state is not None
+                        and prev_state.status == HaveSentRoomFlag.LIVE
+                    ):
+                        new_room_statuses[room_id] = HaveSentRoom.previously(
+                            from_token.stream_token.room_key,
+                            room_configs[room_id].timeline_limit,
+                        )
+                        have_updated = True

             if not have_updated:
                 return prev_connection_token
One edge case we thought of during our weekly meeting is that it seems like it's only possible to bump the `timeline_limit` once to get a batch of timeline messages. So if you initially request everything with `timeline_limit: 1` and then `timeline_limit: 20` to get a bunch of timeline messages, requesting with `timeline_limit: 20` again won't give you a batch of timeline again. You would have to request `timeline_limit: 21`, etc. into infinity every time you want to fetch the initial batch of timeline.

This comes into play even with ElementX because it grows and shrinks its range over time and will want to fetch timeline for rooms that come into view again.

I think this is actually covered in the PR (and I need to add tests), but: when the timeline limit is reduced we update the timeline limit we stored, so we do correctly handle increase -> decrease -> increase.
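A small worked example of the increase -> decrease -> increase sequence from this thread, under the behaviour described in the reply (the stored limit is updated on every request; the helper below is illustrative, not the real connection store):

```python
from typing import Optional


def wants_more_history(stored_limit: Optional[int], requested_limit: int) -> bool:
    # Same comparison as in the handler: only a limit higher than the one we
    # last recorded counts as a request for more history.
    return stored_limit is not None and stored_limit < requested_limit


stored: Optional[int] = None
for requested in (1, 20, 20, 1, 20):
    print(requested, wants_more_history(stored, requested))
    stored = requested  # the connection store records the latest limit each time

# Output: 1 False, 20 True, 20 False, 1 False, 20 True. Because the reduced
# limit (1) is stored, the later jump back to 20 is again treated as a
# request for more history, covering increase -> decrease -> increase.
```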