Add Sliding Sync /sync/e2ee endpoint for To-Device messages #17167

Merged: MadLittleMods merged 39 commits into develop from madlittlemods/msc3575-sliding-sync-e2ee on May 23, 2024

@MadLittleMods MadLittleMods commented May 7, 2024

Add Sliding Sync /sync/e2ee endpoint for To-Device messages.

This is being introduced as part of Sliding Sync but doesn't have any sliding window component. It's just a way to get E2EE events without having to sit through a big initial sync (/sync v2). And we can avoid encryption events being backed up by the main sync response or vice-versa.

Splitting To-Device messages out to their own endpoint also helps when clients need to have 2 or more sync streams open at a time, e.g. a push notification process and a main process. Having two streams can cause the two processes to race to fetch the To-Device events, resulting in the need for complex synchronisation rules to ensure the token is correctly and atomically exchanged between processes. (based on the words from MSC3885)
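
As a rough illustration of a process that only cares about E2EE traffic, here is a minimal sketch of building the long-poll request URL. The unstable path and the `since`/`timeout` query parameters are assumptions modelled on `/sync` v2, and `build_e2ee_sync_url` is a hypothetical helper, not part of Synapse.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: unstable path for the experimental endpoint; confirm against your server.
E2EE_SYNC_PATH = "/_matrix/client/unstable/org.matrix.msc3575/sync/e2ee"


def build_e2ee_sync_url(
    base_url: str, since: Optional[str] = None, timeout_ms: int = 30000
) -> str:
    """Build the URL for one E2EE-only sync request (hypothetical helper)."""
    # Assumption: `since` and `timeout` behave like their /sync v2 counterparts.
    params = {"timeout": str(timeout_ms)}
    if since is not None:
        # Only send a token once we have one from a previous response.
        params["since"] = since
    return f"{base_url.rstrip('/')}{E2EE_SYNC_PATH}?{urlencode(params)}"
```

A push notification process and a main process could each call this with their own token, which is the separation the paragraph above is after.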

Part of the Sliding Sync simplification. See this discussion below for why it may not be as useful as we thought before implementing.

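Based on:

  • matrix-org/matrix-spec-proposals#3575
  • matrix-org/matrix-spec-proposals#3885
  • matrix-org/matrix-spec-proposals#3884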

Dev notes

Running relevant tests:

$ SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.rest.client.test_sliding_sync

# Or without running with poetry but in the poetry environment
$ poetry shell
$ SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO python -m twisted.trial tests.rest.client.test_sync tests.rest.client.test_sliding_sync tests.rest.client.test_sendtodevice

$ SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO python -m twisted.trial tests.rest.client.test_sliding_sync

Notes on sharing/inheriting Twisted trial unit tests: #17167 (comment)


Putting quotes around a type is called a forward reference. (mypy)
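
For reference, a small self-contained sketch of a forward reference. The `Node` class is made up for illustration and has nothing to do with the Synapse code in this PR.

```python
from typing import Optional


class Node:
    """A singly linked list node (illustrative only)."""

    # `Node` is not bound yet while the class body is being executed, so the
    # annotations are written as strings. mypy resolves them later.
    def __init__(self, value: int, next_node: "Optional[Node]" = None) -> None:
        self.value = value
        self.next_node = next_node

    def append(self, value: int) -> "Node":
        """Attach a new node after this one and return it."""
        self.next_node = Node(value)
        return self.next_node
```

Without the quotes, the `-> Node` annotation would raise a `NameError` when the class is defined, unless `from __future__ import annotations` is in effect.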

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct
    (run the linters)

CLAassistant commented May 7, 2024

CLA assistant check
All committers have signed the CLA.

@MadLittleMods MadLittleMods changed the title from "Add Sliding Sync /sync/e2eeendpoint for To-Device messages" to "Add Sliding Sync /sync/e2ee endpoint for To-Device messages" May 7, 2024
We were using the enum just to distinguish /sync v2
vs Sliding Sync /sync/e2ee so we should just make an
enum for that instead of trying to glom onto the
existing `sync_type` (overloading it).
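
A minimal sketch of what such a dedicated enum could look like. The names `SyncVersion`, `SYNC_V2`, `E2EE_SYNC` and `describe` are assumptions for illustration and may not match the code in this PR.

```python
from enum import Enum


class SyncVersion(Enum):
    """Which sync flavour a request is for (hypothetical names)."""

    SYNC_V2 = "sync_v2"
    E2EE_SYNC = "e2ee_sync"


def describe(version: SyncVersion) -> str:
    """Branch on the sync flavour without overloading any other field."""
    if version is SyncVersion.SYNC_V2:
        return "full /sync v2 response"
    if version is SyncVersion.E2EE_SYNC:
        return "E2EE-only /sync/e2ee response"
    # Fail loudly if a new member is added without updating this function.
    raise ValueError(f"Unknown sync version: {version}")
```

Keeping this separate from `sync_type` means each value has a single meaning.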
Comment on lines 1867 to 1873
# TODO: Do we need to worry about these? All of this info is
# normally calculated when we `_generate_sync_entry_for_rooms()` but we
# probably don't want to do all of that work for this endpoint.
newly_joined_rooms=frozenset(),
newly_joined_or_invited_or_knocked_users=frozenset(),
newly_left_rooms=frozenset(),
newly_left_users=frozenset(),
Contributor Author

@MadLittleMods MadLittleMods May 9, 2024

Since I'm not familiar exactly with how /sync should operate or exactly what we care about, I need some advice on whether this part matters.

Does /sync/e2ee need to care about the device changes that would come from doing all of that extra work with rooms?

Perhaps device_lists (device changes) should be in their own endpoint? It seems like a lot of work but I haven't profiled or seen how fast all of these things go.

Member

Yeah, it's important that device_lists get told about those things, which sucks a bit. I think you're right that in future we might want to reconsider bundling device_lists and e2ee, but I think for now we should keep it bundled (as that is what the Rust SDK expects).

We can always change it later once we can try it out in practice.

Contributor Author

@MadLittleMods MadLittleMods May 20, 2024

Updated to include all of the room derived information ✅

As far as I can tell, we don't have tests for the scenarios that this important information is necessary for.


It feels like all the benefits of this endpoint have fallen away though.

We're doing all the same work for this new endpoint as the old sync v2 so the performance is bound to be about the same. Which also means To-Device events will probably get backed up by other events just the same. I did add a filter to only work with membership events which might help.
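
To make the membership-only idea concrete, here is a rough sketch using the shape of a Matrix client filter. It is an assumption about what such a filter could look like, not the filter this PR actually builds, and `keep_membership_events` is a hypothetical helper that ignores wildcard support.

```python
from typing import Any, Dict, List

# Assumption: a client-style filter restricting room events to membership.
MEMBERSHIP_ONLY_FILTER: Dict[str, Any] = {
    "room": {
        "timeline": {"types": ["m.room.member"]},
        "state": {"types": ["m.room.member"]},
    }
}


def keep_membership_events(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only events whose type is listed in the timeline filter."""
    allowed = set(MEMBERSHIP_ONLY_FILTER["room"]["timeline"]["types"])
    # Exact type matching only; real filters also accept `*` wildcards.
    return [event for event in events if event.get("type") in allowed]
```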

Work Sync v2 /sync/e2ee
_generate_sync_entry_for_account_data()
_generate_sync_entry_for_rooms() ✅ (sets up derived data for device_lists)
_generate_sync_entry_for_presence() ✅ (not run for matrix.org anyway)
_generate_sync_entry_for_device_list()
_generate_sync_entry_for_to_device()
device_one_time_keys_count
device_unused_fallback_key_types

And it doesn't seem to relieve the Sliding Sync proxy if we're going to use the same device for sync v2 and this new endpoint. This point is being tracked in this discussion -> #17167 (comment)

Member

Hmm, if we can split up _generate_sync_entry_for_rooms so that we can get just the membership changes then I think that would help a lot. Though equally happy to land it as is and do follow up in a separate PR

Contributor Author

They're pretty gnarled and coupled at the moment. In order to decouple, instead of trying to re-use the data we pull out in _generate_sync_entry_for_rooms(), it might be easier just to do our own dedicated lookup to figure out these membership changes in our own new function.

I would definitely prefer to land this and update in a follow-up 👍
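
For the follow-up, the dedicated lookup discussed in this thread might be shaped roughly like the sketch below. Everything here is hypothetical: `MembershipChange`, `DerivedMembership` and `derive_membership_changes` are invented for illustration, the input is assumed to be membership transitions since the last sync token, and the rules are deliberately simplified compared to what `_generate_sync_entry_for_rooms()` does.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class MembershipChange:
    """One membership transition since the last sync (hypothetical type)."""

    room_id: str
    user_id: str
    prev_membership: Optional[str]
    membership: str


@dataclass(frozen=True)
class DerivedMembership:
    """The four sets that device_lists needs (field names from the diff above)."""

    newly_joined_rooms: FrozenSet[str]
    newly_left_rooms: FrozenSet[str]
    newly_joined_or_invited_or_knocked_users: FrozenSet[str]
    newly_left_users: FrozenSet[str]


def derive_membership_changes(
    syncing_user_id: str, changes: Iterable[MembershipChange]
) -> DerivedMembership:
    joined_rooms: Set[str] = set()
    left_rooms: Set[str] = set()
    joined_users: Set[str] = set()
    left_users: Set[str] = set()

    for change in changes:
        if change.membership == change.prev_membership:
            # e.g. a display name update: no membership transition.
            continue
        if change.user_id == syncing_user_id:
            # Transitions of the syncing user decide which rooms are new or gone.
            if change.membership == "join":
                joined_rooms.add(change.room_id)
            elif change.prev_membership == "join":
                left_rooms.add(change.room_id)
        elif change.membership in ("join", "invite", "knock"):
            joined_users.add(change.user_id)
        elif change.membership in ("leave", "ban"):
            left_users.add(change.user_id)

    return DerivedMembership(
        frozenset(joined_rooms),
        frozenset(left_rooms),
        frozenset(joined_users),
        frozenset(left_users),
    )
```

This sketch does not handle a user who leaves one shared room but remains in another, which a real implementation would need to consider.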

Contributor Author

@erikjohnston Before trying to optimize /sync/e2ee, it would be good to see just how fast/slow it goes. Traces from matrix.org on a good-sized account would probably give us the best indication. Not sure how hard it may be to get enough to_device/device_lists traffic for a good sample.

@@ -689,43 +689,177 @@ def test_noop_sync_does_not_tightloop(self) -> None:


class DeviceListSyncTestCase(unittest.HomeserverTestCase):
Contributor Author

Consolidated all of these device_lists tests spread between test_devices.py and test_sync.py to here

Comment on lines 880 to 881
class DeviceOneTimeKeysSyncTestCase(unittest.HomeserverTestCase):
"""Tests regarding device one time keys (`device_one_time_keys_count`) changes."""
Contributor Author

Added tests for device_one_time_keys_count in /sync because they didn't exist before

@MadLittleMods MadLittleMods requested review from erikjohnston and removed request for erikjohnston May 20, 2024 18:03
Member

@erikjohnston erikjohnston left a comment

Otherwise I think looks good

 - Added `room.account_data` and `room.presence` to avoid extra work in `_generate_sync_entry_for_rooms()`
 - Added a comment to the top-level `account_data` and `presence` filters that `(This is just here for good measure)`

See #17167 (comment)
@MadLittleMods MadLittleMods merged commit c97251d into develop May 23, 2024
38 checks passed
@MadLittleMods MadLittleMods deleted the madlittlemods/msc3575-sliding-sync-e2ee branch May 23, 2024 17:06
@MadLittleMods
Contributor Author

Thanks for the review @erikjohnston and tips/context @anoadragon453 @kegsay @Hywan @MatMaul 🐧

@Hywan
Member

Hywan commented May 27, 2024

🎉 thanks for your work!

H-Shay pushed a commit to H-Shay/hq_synapse that referenced this pull request May 31, 2024
…t-hq#17167)

This is being introduced as part of Sliding Sync but doesn't have any sliding window component. It's just a way to get E2EE events without having to sit through a big initial sync  (`/sync` v2). And we can avoid encryption events being backed up by the main sync response or vice-versa.

Part of some Sliding Sync simplification/experimentation. See [this discussion](element-hq#17167 (comment)) for why it may not be as useful as we thought.

Based on:

 - matrix-org/matrix-spec-proposals#3575
 - matrix-org/matrix-spec-proposals#3885
 - matrix-org/matrix-spec-proposals#3884
Mic92 pushed a commit to Mic92/synapse that referenced this pull request Jun 14, 2024
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Jun 18, 2024
# Synapse 1.109.0 (2024-06-18)

- Add the ability to auto-accept invites on the behalf of users. See
  the
  [`auto_accept_invites`](https://element-hq.github.io/synapse/latest/usage/configuration/config_documentation.html#auto-accept-invites)
  config option for
  details. ([\#17147](element-hq/synapse#17147))

- Add experimental
  [MSC3575](matrix-org/matrix-spec-proposals#3575)
  Sliding Sync `/sync/e2ee` endpoint for to-device messages and device
  encryption
  info. ([\#17167](element-hq/synapse#17167))

- Support
  [MSC3916](matrix-org/matrix-spec-proposals#3916)
  by adding unstable media endpoints to
  `/_matrix/client`. ([\#17213](element-hq/synapse#17213))

- Add logging to tasks managed by the task scheduler, showing CPU and
  database
  usage. ([\#17219](element-hq/synapse#17219))


# Synapse 1.108.0 (2024-05-28)

- Add a feature that allows clients to query the configured federation
  whitelist. Disabled by
  default. ([\#16848](element-hq/synapse#16848),
  [\#17199](element-hq/synapse#17199))

- Add the ability to allow numeric user IDs with a specific prefix
  when in the CAS flow. Contributed by Aurélien
  Grimpard. ([\#17098](element-hq/synapse#17098))


# Synapse 1.107.0 (2024-05-14)

- Add preliminary support for [MSC3823: Account
  Suspension](matrix-org/matrix-spec-proposals#3823).
  ([\#17051](element-hq/synapse#17051))

- Declare support for [Matrix
  v1.10](https://matrix.org/blog/2024/03/22/matrix-v1.10-release/). Contributed
  by
  @clokep. ([\#17082](element-hq/synapse#17082))

- Add support for [MSC4115: membership metadata on
  events](matrix-org/matrix-spec-proposals#4115).
  ([\#17104](element-hq/synapse#17104),
  [\#17137](element-hq/synapse#17137))


# Synapse 1.106.0 (2024-04-30)

- Send an email if the address is already bound to a user
  account. ([\#16819](element-hq/synapse#16819))

- Implement the rendezvous mechanism described by
  [MSC4108](matrix-org/matrix-spec-proposals#4108).
  ([\#17056](element-hq/synapse#17056))

- Support delegating the rendezvous mechanism described by
  [MSC4108](matrix-org/matrix-spec-proposals#4108)
  to an external
  implementation. ([\#17086](element-hq/synapse#17086))
yingziwu added a commit to yingziwu/synapse that referenced this pull request Jun 26, 2024
- Fix the building of binary wheels for macOS by switching to macOS 12 CI runners. ([\#17319](element-hq/synapse#17319))

- When rolling back to a previous Synapse version and then forwards again to this release, don't require server operators to manually run SQL. ([\#17305](element-hq/synapse#17305), [\#17309](element-hq/synapse#17309))

- Use the release branch for sytest in release-branch PRs. ([\#17306](element-hq/synapse#17306))

- Fix bug where one-time-keys were not always included in `/sync` response when using workers. Introduced in v1.109.0rc1. ([\#17275](element-hq/synapse#17275))
- Fix bug where `/sync` could get stuck due to edge case in device lists handling. Introduced in v1.109.0rc1. ([\#17292](element-hq/synapse#17292))

- Add the ability to auto-accept invites on the behalf of users. See the [`auto_accept_invites`](https://element-hq.github.io/synapse/latest/usage/configuration/config_documentation.html#auto-accept-invites) config option for details. ([\#17147](element-hq/synapse#17147))
- Add experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync/e2ee` endpoint for to-device messages and device encryption info. ([\#17167](element-hq/synapse#17167))
- Support [MSC3916](matrix-org/matrix-spec-proposals#3916) by adding unstable media endpoints to `/_matrix/client`. ([\#17213](element-hq/synapse#17213))
- Add logging to tasks managed by the task scheduler, showing CPU and database usage. ([\#17219](element-hq/synapse#17219))

- Fix deduplicating of membership events to not create unused state groups. ([\#17164](element-hq/synapse#17164))
- Fix bug where duplicate events could be sent down sync when using workers that are overloaded. ([\#17215](element-hq/synapse#17215))
- Ignore attempts to send to-device messages to bad users, to avoid log spam when we try to connect to the bad server. ([\#17240](element-hq/synapse#17240))
- Fix handling of duplicate concurrent uploading of device one-time-keys. ([\#17241](element-hq/synapse#17241))
- Fix reporting of default tags to Sentry, such as worker name. Broke in v1.108.0. ([\#17251](element-hq/synapse#17251))
- Fix bug where typing updates would not be sent when using workers after a restart. ([\#17252](element-hq/synapse#17252))

- Update the LemonLDAP documentation to say that claims should be explicitly included in the returned `id_token`, as Synapse won't request them. ([\#17204](element-hq/synapse#17204))

- Improve DB usage when fetching related events. ([\#17083](element-hq/synapse#17083))
- Log exceptions when failing to auto-join new user according to the `auto_join_rooms` option. ([\#17176](element-hq/synapse#17176))
- Reduce work of calculating outbound device lists updates. ([\#17211](element-hq/synapse#17211))
- Improve performance of calculating device lists changes in `/sync`. ([\#17216](element-hq/synapse#17216))
- Move towards using `MultiWriterIdGenerator` everywhere. ([\#17226](element-hq/synapse#17226))
- Replaces all usages of `StreamIdGenerator` with `MultiWriterIdGenerator`. ([\#17229](element-hq/synapse#17229))
- Change the `allow_unsafe_locale` config option to also apply when setting up new databases. ([\#17238](element-hq/synapse#17238))
- Fix errors in logs about closing incorrect logging contexts when media gets rejected by a module. ([\#17239](element-hq/synapse#17239), [\#17246](element-hq/synapse#17246))
- Clean out invalid destinations from `device_federation_outbox` table. ([\#17242](element-hq/synapse#17242))
- Stop logging errors when receiving invalid User IDs in key query requests. ([\#17250](element-hq/synapse#17250))

* Bump anyhow from 1.0.83 to 1.0.86. ([\#17220](element-hq/synapse#17220))
* Bump bcrypt from 4.1.2 to 4.1.3. ([\#17224](element-hq/synapse#17224))
* Bump lxml from 5.2.1 to 5.2.2. ([\#17261](element-hq/synapse#17261))
* Bump mypy-zope from 1.0.3 to 1.0.4. ([\#17262](element-hq/synapse#17262))
* Bump phonenumbers from 8.13.35 to 8.13.37. ([\#17235](element-hq/synapse#17235))
* Bump prometheus-client from 0.19.0 to 0.20.0. ([\#17233](element-hq/synapse#17233))
* Bump pyasn1 from 0.5.1 to 0.6.0. ([\#17223](element-hq/synapse#17223))
* Bump pyicu from 2.13 to 2.13.1. ([\#17236](element-hq/synapse#17236))
* Bump pyopenssl from 24.0.0 to 24.1.0. ([\#17234](element-hq/synapse#17234))
* Bump serde from 1.0.201 to 1.0.202. ([\#17221](element-hq/synapse#17221))
* Bump serde from 1.0.202 to 1.0.203. ([\#17232](element-hq/synapse#17232))
* Bump twine from 5.0.0 to 5.1.0. ([\#17225](element-hq/synapse#17225))
* Bump types-psycopg2 from 2.9.21.20240311 to 2.9.21.20240417. ([\#17222](element-hq/synapse#17222))
* Bump types-pyopenssl from 24.0.0.20240311 to 24.1.0.20240425. ([\#17260](element-hq/synapse#17260))