Sliding Sync: Use stream_ordering based timeline pagination for incremental sync #17510

MadLittleMods (Contributor) commented Jul 31, 2024:

Use `stream_ordering`-based timeline pagination for incremental `/sync` in Sliding Sync. Previously, we always used `topological_ordering`, but we should only use it for historical scenarios (initial `/sync`, a newly joined room, or a room we haven't sent down the connection before).

This is slightly different from what the spec suggests:

> Events are ordered in this API according to the arrival time of the event on the homeserver. This can conflict with other APIs which order events based on their partial ordering in the event graph. This can result in duplicate events being received (once per distinct API called). Clients SHOULD de-duplicate events based on the event ID when this happens.
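On the client side, the de-duplication the spec asks for can be as simple as keeping the first copy of each event ID. This is a minimal sketch, assuming events are plain dicts with an `event_id` key and that chunks arrive in the order the client wants to keep them; the function name is hypothetical.

```python
from typing import Iterable, List, Set


def dedupe_events(chunks: Iterable[List[dict]]) -> List[dict]:
    """Merge event lists from several APIs, keeping the first copy of each event ID."""
    seen: Set[str] = set()
    merged: List[dict] = []
    for chunk in chunks:
        for event in chunk:
            event_id = event["event_id"]
            if event_id in seen:
                # Already received via another API; drop the duplicate.
                continue
            seen.add(event_id)
            merged.append(event)
    return merged
```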

However, as discussed below in this PR, this matches what Sync v2 already does and seems sensible. I created a spec issue, matrix-org/matrix-spec#1917, to clarify this.
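A minimal sketch of the per-room rule described above: topological ordering for the historical cases, stream ordering for everything else. The enum, the function name, and the three boolean flags are hypothetical stand-ins for whatever state the Sliding Sync handler actually tracks; this is not Synapse's API.

```python
from enum import Enum


class Ordering(Enum):
    # Hypothetical labels for the two orderings discussed in this PR.
    TOPOLOGICAL = "topological_ordering"
    STREAM = "stream_ordering"


def choose_timeline_ordering(
    is_initial_sync: bool,
    newly_joined: bool,
    previously_sent_down_connection: bool,
) -> Ordering:
    """Pick which ordering to paginate a room's timeline by.

    Historical scenarios paginate backwards through history, so they use
    topological ordering. Everything else is an incremental update and uses
    stream ordering (arrival order on the homeserver).
    """
    if is_initial_sync or newly_joined or not previously_sent_down_connection:
        return Ordering.TOPOLOGICAL
    return Ordering.STREAM
```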

Related issues:

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

```
limit=room_sync_config.timeline_limit + 1,
event_filter=None,
timeline_events, new_room_key = (
await self.store.get_room_events_stream_for_room(
```
MadLittleMods (Contributor Author) commented Jul 31, 2024:

As an alternative, we could adapt `paginate_room_events(...)` to conditionally use `topological_ordering` vs. `stream_ordering`.
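A rough sketch of that alternative, using an in-memory SQLite table as a stand-in for the `events` table. The schema, the `order_by` values, and this simplified `paginate_room_events` signature are assumptions for illustration (the real method works with stream tokens rather than a bare limit). Topological ordering is tie-broken by `stream_ordering` here, and the column names come from a fixed whitelist so nothing caller-supplied is interpolated into the SQL.

```python
import sqlite3
from typing import List

# Whitelisted ORDER BY columns; topological order is tie-broken by stream order.
_ORDER_COLUMNS = {
    "topological": ("topological_ordering", "stream_ordering"),
    "stream": ("stream_ordering",),
}


def paginate_room_events(
    conn: sqlite3.Connection, room_id: str, order_by: str, limit: int
) -> List[str]:
    """Return up to `limit` event IDs for a room, newest first, by the chosen ordering."""
    columns = _ORDER_COLUMNS[order_by]  # raises KeyError for unknown orderings
    order_clause = ", ".join(f"{col} DESC" for col in columns)
    rows = conn.execute(
        f"SELECT event_id FROM events WHERE room_id = ? ORDER BY {order_clause} LIMIT ?",
        (room_id, limit),
    ).fetchall()
    return [event_id for (event_id,) in rows]
```

With an event that arrives late but sits early in the DAG, the two orderings disagree about which events count as the newest.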

```
)
# We want to return the events in ascending order (the last event is the
# most recent).
events.reverse()
```
MadLittleMods (Contributor Author) commented:

Since we seem to need to reverse this everywhere it's used, we could just make ascending order the default. To avoid surprises, I aligned with what `paginate_room_events(...)` currently does.
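To illustrate the convention being discussed: a backwards query naturally returns the newest events first, so each caller flips the result into chronological order. The helper names and the `(stream_ordering, event_id)` row shape are hypothetical. Note that `list.reverse()` works in place and returns `None`, so its result must not be reassigned.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (stream_ordering, event_id)


def fetch_newest_first(stream: List[Row], limit: int) -> List[Row]:
    """Hypothetical backwards query: the `limit` newest rows, newest first."""
    return sorted(stream, key=lambda row: row[0], reverse=True)[:limit]


def fetch_timeline(stream: List[Row], limit: int) -> List[Row]:
    """Same rows in ascending order (the last event is the most recent)."""
    events = fetch_newest_first(stream, limit)
    # list.reverse() reverses in place and returns None, so we must not
    # write `events = events.reverse()`.
    events.reverse()
    return events
```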

```
# We add one so we can determine if there are enough events to saturate
# the limit or not (see `limited`)
limit=room_sync_config.timeline_limit + 1,
)
```
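The `limit + 1` trick in isolation: fetch one row more than the client asked for, and if that extra row comes back, the timeline is `limited`. A minimal sketch, with a plain newest-first list standing in for the database query; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def take_timeline(
    rows_newest_first: Sequence[str], timeline_limit: int
) -> Tuple[List[str], bool]:
    # Slicing stands in for a query with LIMIT timeline_limit + 1.
    fetched = list(rows_newest_first[: timeline_limit + 1])
    # If the extra row came back, there are more events than fit: `limited`.
    limited = len(fetched) > timeline_limit
    timeline = fetched[:timeline_limit]
    timeline.reverse()  # ascending order: the last event is the most recent
    return timeline, limited
```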
MadLittleMods (Contributor Author) commented Jul 31, 2024:

One possible question: do we even want to do this?

Using `stream_ordering` makes sense in terms of `/sync` returning new items that the server has received. It's also what the spec says we should do. It also ensures that if a new event is received, it's returned to the client. Otherwise, if we use `topological_ordering`, there seems to be an edge case where a new event might not be returned because its topological position is calculated to be older.
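A toy model of that edge case. It assumes each event carries a topological position (roughly its depth in the room DAG) and a stream position (its arrival order on this server); real Synapse tokens are more involved than a single integer. An event that arrives late but sits early in the DAG is missed by a topological cut-off and caught by a stream cut-off. All names here are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    event_id: str
    topological_ordering: int  # roughly the event's depth in the room DAG
    stream_ordering: int  # order in which this server received the event


def new_by_topological(events: List[Event], since: int) -> List[str]:
    return [e.event_id for e in events if e.topological_ordering > since]


def new_by_stream(events: List[Event], since: int) -> List[str]:
    return [e.event_id for e in events if e.stream_ordering > since]


events = [
    Event("$a", topological_ordering=1, stream_ordering=1),
    Event("$b", topological_ordering=2, stream_ordering=2),
    # Arrives late (e.g. over federation) but sits early in the DAG.
    Event("$late", topological_ordering=1, stream_ordering=3),
]
# Suppose the client last synced just after "$b" (topological 2, stream 2).
```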

Using `topological_ordering` makes sense in terms of matching `/messages`, so people don't have to worry about things being ordered differently between APIs.

Member commented:

Yeah, so the way that Sync v2 currently works in Synapse is:

  1. The first time a room is sent down `/sync`, we use `paginate_room_events` to get a chunk of history for the client. This basically gives the same result as if no history had been sent down and the client had used `/messages`.
  2. We use `get_room_events_stream_for_rooms` to get all updates for rooms since the last `/sync`, which is in stream ordering.

I think this is probably the right way of looking at it: we use topological ordering when we paginate backwards, but stream ordering to fetch new updates to the room (even if those updates happened a while ago in the past).

What we probably want to do for Sliding Sync is to more or less replicate that behaviour: for an incremental sync, fetch all updates (i.e. `get_room_events_stream_for_rooms` for joined rooms, and fetch all membership changes), and then only use `paginate_room_events` for the new rooms.

(This also has the advantage that we'd only need to sort and filter rooms we know have updates.)
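A sketch of the incremental step, in the spirit of `get_room_events_stream_for_rooms`: take stream rows in the range `(from_stream, to_stream]` and group them by room, so only rooms that actually changed need sorting and filtering. The row shape and the function name are assumptions for illustration, not the storage method's real signature.

```python
from typing import Dict, Iterable, List, Tuple


def rooms_with_updates(
    stream_rows: Iterable[Tuple[int, str, str]],
    from_stream: int,
    to_stream: int,
) -> Dict[str, List[str]]:
    """Group events with from_stream < stream_ordering <= to_stream by room.

    Each row is (stream_ordering, room_id, event_id). Only rooms present in
    the result need to be sorted, filtered, and sent down for this sync.
    """
    updates: Dict[str, List[str]] = {}
    for stream_ordering, room_id, event_id in sorted(stream_rows):
        if from_stream < stream_ordering <= to_stream:
            updates.setdefault(room_id, []).append(event_id)
    return updates
```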

MadLittleMods (Contributor Author) commented:

Sounds pretty reasonable to me 👍

I've created a spec issue to track this: matrix-org/matrix-spec#1917

@MadLittleMods changed the title from "Sliding Sync: Use stream_ordering based timeline paginating" to "Sliding Sync: Use stream_ordering based timeline pagination" on Jul 31, 2024
@MadLittleMods marked this pull request as ready for review July 31, 2024 22:18
@MadLittleMods requested a review from a team as a code owner July 31, 2024 22:18
synapse/storage/databases/main/stream.py (resolved)
```
room_ids, from_key.stream
)
elif direction == Direction.BACKWARDS:
if to_key is not None:
```
Member commented:

`to_key` is never `None`?

MadLittleMods (Contributor Author) commented:

Makes sense, but we should probably just update the signature to match what `get_room_events_stream_for_room(...)` uses, where it can be `None`.

MadLittleMods (Contributor Author) commented:

Updated and aligned ✅

Member commented:

Thinking about this: we should enforce that `to_key` is not `None` when we're going forwards, as we always need an upper bound when querying the events stream (otherwise we could pick up events that are not fully persisted yet).

We could do this in the type system, but it's probably easier to document and assert it.
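One way to document and assert it, as a hypothetical sketch. The local `Direction` enum mirrors the names used in the code fragments of this PR but is defined here so the block stands alone, and `check_stream_bounds` is an illustrative name. As the thread goes on to note, callers such as `/messages?dir=f` pass `to_key=None` today, so an assertion like this would not yet hold in practice.

```python
from enum import Enum
from typing import Optional


class Direction(Enum):
    # Local stand-in for the direction enum used in the storage code.
    FORWARDS = "f"
    BACKWARDS = "b"


def check_stream_bounds(
    direction: Direction, from_key: int, to_key: Optional[int]
) -> None:
    """Validate bounds for a stream-ordered events query.

    Going FORWARDS, `from_key` is the lower bound and `to_key` the upper
    bound. `to_key` is required: without an upper bound we could pick up
    events that are not fully persisted yet.

    Going BACKWARDS, `from_key` is the upper bound and `to_key=None` means
    "no lower bound".
    """
    if direction == Direction.FORWARDS:
        assert to_key is not None, "to_key is required when paginating forwards"
```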

MadLittleMods (Contributor Author) commented Aug 6, 2024:

I don't think we should do that in this PR (it's a future task), given that it looks like we don't actually do this in practice yet. For example, `/messages?dir=f` -> `get_messages()` calls `paginate_room_events_by_topological_ordering(from_key=xxx, to_key=None, Direction.FORWARDS)`. Making this change would require more default tokens in various places, which deserves its own PR to think through.

Added some FIXME comments.

Member commented:

Oh lord, really? Bleurgh, then fair.

@MadLittleMods changed the title from "Sliding Sync: Use stream_ordering based timeline pagination" to "Sliding Sync: Use stream_ordering based timeline pagination for incremental sync" on Aug 5, 2024
```
):
# Token selection matches what we do in `_paginate_room_events_txn` if there
# are no rows
return [], to_key if to_key else from_key
```
Member commented:

We should keep these checks outside: if we hit these code paths, we save getting a DB connection and switching threads.

MadLittleMods (Contributor Author) commented Aug 6, 2024:

Should we have it in both places? I moved it because plenty of places use `_paginate_room_events_by_topological_ordering_txn(...)` directly but don't have the checks outside.

(Currently updated with the checks in both places.)

Member commented:

Yeah, having it in both places is fine, though ideally we'd always try to catch it before we go onto a DB thread.

…imeline-paginating

Conflicts:
	synapse/storage/databases/main/stream.py
@MadLittleMods merged commit 11db575 into develop on Aug 7, 2024
41 checks passed
@MadLittleMods deleted the madlittlemods/use-stream-ordering-based-timeline-paginating branch August 7, 2024 16:27

MadLittleMods (Contributor Author) commented:

Thanks for the review @erikjohnston 🐓

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Sep 14, 2024
# Synapse 1.114.0 (2024-09-02)

This release enables support for
[MSC4186](matrix-org/matrix-spec-proposals#4186) —
Simplified Sliding Sync. This allows using the upcoming releases of the Element
X mobile apps without having to run a Sliding Sync Proxy.


### Features

- Enable native sliding sync support ([MSC3575](matrix-org/matrix-spec-proposals#3575) and [MSC4186](matrix-org/matrix-spec-proposals#4186)) by default. ([\#17648](element-hq/synapse#17648))




# Synapse 1.114.0rc3 (2024-08-30)

### Bugfixes

- Fix regression in v1.114.0rc2 that caused workers to fail to start. ([\#17626](element-hq/synapse#17626))




# Synapse 1.114.0rc2 (2024-08-30)

### Features

- Improve cross-signing upload when using [MSC3861](matrix-org/matrix-spec-proposals#3861) to use a custom UIA flow stage, with web fallback support. ([\#17509](element-hq/synapse#17509))
- Make `hash_password` script accept password input from stdin. ([\#17608](element-hq/synapse#17608))

### Bugfixes

- Fix hierarchy returning 403 when room is accessible through federation. Contributed by Krishan (@kfiven). ([\#17194](element-hq/synapse#17194))
- Fix content-length on federation `/thumbnail` responses. ([\#17532](element-hq/synapse#17532))
- Fix authenticated media responses using a wrong limit when following redirects over federation. ([\#17543](element-hq/synapse#17543))

### Internal Changes

- MSC3861: load the issuer and account management URLs from OIDC discovery. ([\#17407](element-hq/synapse#17407))
- Refactor sliding sync class into multiple files. ([\#17595](element-hq/synapse#17595))
- Store sliding sync per-connection state in the database. ([\#17599](element-hq/synapse#17599))
- Make the sliding sync `PerConnectionState` class immutable. ([\#17600](element-hq/synapse#17600))
- Add support to `@tag_args` for standalone functions. ([\#17604](element-hq/synapse#17604))
- Speed up incremental syncs in sliding sync by adding some more caching. ([\#17606](element-hq/synapse#17606))
- Always return the user's own read receipts in sliding sync. ([\#17617](element-hq/synapse#17617))
- Replace `isort` and `black` with `ruff`. ([\#17620](element-hq/synapse#17620))
- Refactor sliding sync code to move room list logic out into a separate class. ([\#17622](element-hq/synapse#17622))



### Updates to locked dependencies

* Bump attrs from 23.2.0 to 24.2.0. ([\#17609](element-hq/synapse#17609))
* Bump cryptography from 42.0.8 to 43.0.0. ([\#17584](element-hq/synapse#17584))
* Bump phonenumbers from 8.13.43 to 8.13.44. ([\#17610](element-hq/synapse#17610))
* Bump pygithub from 2.3.0 to 2.4.0. ([\#17612](element-hq/synapse#17612))
* Bump pyyaml from 6.0.1 to 6.0.2. ([\#17611](element-hq/synapse#17611))
* Bump sentry-sdk from 2.12.0 to 2.13.0. ([\#17585](element-hq/synapse#17585))
* Bump serde from 1.0.206 to 1.0.208. ([\#17581](element-hq/synapse#17581))
* Bump serde from 1.0.208 to 1.0.209. ([\#17613](element-hq/synapse#17613))
* Bump serde_json from 1.0.124 to 1.0.125. ([\#17582](element-hq/synapse#17582))
* Bump serde_json from 1.0.125 to 1.0.127. ([\#17614](element-hq/synapse#17614))
* Bump types-jsonschema from 4.23.0.20240712 to 4.23.0.20240813. ([\#17583](element-hq/synapse#17583))
* Bump types-setuptools from 71.1.0.20240726 to 71.1.0.20240818. ([\#17586](element-hq/synapse#17586))

# Synapse 1.114.0rc1 (2024-08-20)

### Features

- Add a flag to `/versions`, `org.matrix.simplified_msc3575`, to indicate whether experimental sliding sync support has been enabled. ([\#17571](element-hq/synapse#17571))
- Handle changes in `timeline_limit` in experimental sliding sync. ([\#17579](element-hq/synapse#17579))
- Correctly track read receipts that should be sent down in experimental sliding sync. ([\#17575](element-hq/synapse#17575), [\#17589](element-hq/synapse#17589), [\#17592](element-hq/synapse#17592))

### Bugfixes

- Start handlers for new media endpoints when media resource configured. ([\#17483](element-hq/synapse#17483))
- Fix timeline ordering (using `stream_ordering` instead of topological ordering) in experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17510](element-hq/synapse#17510))
- Fix experimental sliding sync implementation to remember any updates in rooms that were not sent down immediately. ([\#17535](element-hq/synapse#17535))
- Better exclude partially stated rooms if we must await full state in experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17538](element-hq/synapse#17538))
- Handle lower-case http headers in `_Mulitpart_Parser_Protocol`. ([\#17545](element-hq/synapse#17545))
- Fix fetching federation signing keys from servers that omit `old_verify_keys`. Contributed by @tulir @ Beeper. ([\#17568](element-hq/synapse#17568))
- Fix bug where we would respond with an error when a remote server asked for media that had a length of 0, using the new multipart federation media endpoint. ([\#17570](element-hq/synapse#17570))

### Improved Documentation

- Clarify default behaviour of the
  [`auto_accept_invites.worker_to_run_on`](https://element-hq.github.io/synapse/develop/usage/configuration/config_documentation.html#auto-accept-invites)
  option. ([\#17515](element-hq/synapse#17515))
- Improve docstrings for profile methods. ([\#17559](element-hq/synapse#17559))

### Internal Changes

- Add more tracing to experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17514](element-hq/synapse#17514))
- Fixup comment in sliding sync implementation. ([\#17531](element-hq/synapse#17531))
- Replace override of deprecated method `HTTPAdapter.get_connection` with `get_connection_with_tls_context`. ([\#17536](element-hq/synapse#17536))
- Fix performance of device lists in `/key/changes` and sliding sync. ([\#17537](element-hq/synapse#17537), [\#17548](element-hq/synapse#17548))
- Bump setuptools from 67.6.0 to 72.1.0. ([\#17542](element-hq/synapse#17542))
- Add a utility function for generating random event IDs. ([\#17557](element-hq/synapse#17557))
- Speed up responding to media requests. ([\#17558](element-hq/synapse#17558), [\#17561](element-hq/synapse#17561), [\#17564](element-hq/synapse#17564), [\#17566](element-hq/synapse#17566), [\#17567](element-hq/synapse#17567), [\#17569](element-hq/synapse#17569))
- Test github token before running release script steps. ([\#17562](element-hq/synapse#17562))
- Reduce log spam of multipart files. ([\#17563](element-hq/synapse#17563))
- Refactor per-connection state in experimental sliding sync handler. ([\#17574](element-hq/synapse#17574))
- Add histogram metrics for sliding sync processing time. ([\#17593](element-hq/synapse#17593))



### Updates to locked dependencies

* Bump bytes from 1.6.1 to 1.7.1. ([\#17526](element-hq/synapse#17526))
* Bump lxml from 5.2.2 to 5.3.0. ([\#17550](element-hq/synapse#17550))
* Bump phonenumbers from 8.13.42 to 8.13.43. ([\#17551](element-hq/synapse#17551))
* Bump regex from 1.10.5 to 1.10.6. ([\#17527](element-hq/synapse#17527))
* Bump sentry-sdk from 2.10.0 to 2.12.0. ([\#17553](element-hq/synapse#17553))
* Bump serde from 1.0.204 to 1.0.206. ([\#17556](element-hq/synapse#17556))
* Bump serde_json from 1.0.122 to 1.0.124. ([\#17555](element-hq/synapse#17555))
* Bump sigstore/cosign-installer from 3.5.0 to 3.6.0. ([\#17549](element-hq/synapse#17549))
* Bump types-pyyaml from 6.0.12.20240311 to 6.0.12.20240808. ([\#17552](element-hq/synapse#17552))
* Bump types-requests from 2.31.0.20240406 to 2.32.0.20240712. ([\#17524](element-hq/synapse#17524))

# Synapse 1.113.0 (2024-08-13)

No significant changes since 1.113.0rc1.




# Synapse 1.113.0rc1 (2024-08-06)

### Features

- Track which rooms have been sent to clients in the experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17447](element-hq/synapse#17447))
- Add Account Data extension support to experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17477](element-hq/synapse#17477))
- Add receipts extension support to experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17489](element-hq/synapse#17489))
- Add typing notification extension support to experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17505](element-hq/synapse#17505))

### Bugfixes

- Update experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint to handle invite/knock rooms when filtering. ([\#17450](element-hq/synapse#17450))
- Fix a bug introduced in v1.110.0 which caused `/keys/query` to return incomplete results, leading to high network activity and CPU usage on Matrix clients. ([\#17499](element-hq/synapse#17499))

### Improved Documentation

- Update the [`allowed_local_3pids`](https://element-hq.github.io/synapse/v1.112/usage/configuration/config_documentation.html#allowed_local_3pids) config option's msisdn address to a working example. ([\#17476](element-hq/synapse#17476))

### Internal Changes

- Change sliding sync to use their own token format in preparation for storing per-connection state. ([\#17452](element-hq/synapse#17452))
- Ensure we don't send down negative `bump_stamp` in experimental sliding sync endpoint. ([\#17478](element-hq/synapse#17478))
- Do not send down empty room entries down experimental sliding sync endpoint. ([\#17479](element-hq/synapse#17479))
- Refactor Sliding Sync tests to better utilize the `SlidingSyncBase`. ([\#17481](element-hq/synapse#17481), [\#17482](element-hq/synapse#17482))
- Add some opentracing tags and logging to the experimental sliding sync implementation. ([\#17501](element-hq/synapse#17501))
- Split and move Sliding Sync tests so we have some more sane test file sizes. ([\#17504](element-hq/synapse#17504))
- Update the `limited` field description in the Sliding Sync response to accurately describe what it actually represents. ([\#17507](element-hq/synapse#17507))
- Easier to understand `timeline` assertions in Sliding Sync tests. ([\#17511](element-hq/synapse#17511))
- Reset the sliding sync connection if we don't recognize the per-connection state position. ([\#17529](element-hq/synapse#17529))



### Updates to locked dependencies

* Bump bcrypt from 4.1.3 to 4.2.0. ([\#17495](element-hq/synapse#17495))
* Bump black from 24.4.2 to 24.8.0. ([\#17522](element-hq/synapse#17522))
* Bump phonenumbers from 8.13.39 to 8.13.42. ([\#17521](element-hq/synapse#17521))
* Bump ruff from 0.5.4 to 0.5.5. ([\#17494](element-hq/synapse#17494))
* Bump serde_json from 1.0.120 to 1.0.121. ([\#17493](element-hq/synapse#17493))
* Bump serde_json from 1.0.121 to 1.0.122. ([\#17525](element-hq/synapse#17525))
* Bump towncrier from 23.11.0 to 24.7.1. ([\#17523](element-hq/synapse#17523))
* Bump types-pyopenssl from 24.1.0.20240425 to 24.1.0.20240722. ([\#17496](element-hq/synapse#17496))
* Bump types-setuptools from 70.1.0.20240627 to 71.1.0.20240726. ([\#17497](element-hq/synapse#17497))

2 participants