Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sliding Sync: Update filters to be robust against remote invite rooms #17450

Merged
merged 51 commits into from
Jul 30, 2024
Merged
Show file tree
Hide file tree
Changes from 19 commits
Commits
Show all changes
51 commits
Select commit Hold shift + click to select a range
3a6eec1
Make `filters.is_encrypted` robust to remote invites
MadLittleMods Jul 15, 2024
7f72d80
Try share work
MadLittleMods Jul 15, 2024
18ec437
More robust
MadLittleMods Jul 15, 2024
a67157e
Fix lints
MadLittleMods Jul 15, 2024
ffb61cf
Return bool from db
MadLittleMods Jul 15, 2024
9b77c97
Add tests
MadLittleMods Jul 16, 2024
d609205
Standardize
MadLittleMods Jul 16, 2024
8ca8ae5
Add TODO for more tests
MadLittleMods Jul 16, 2024
4f50054
Add changelog
MadLittleMods Jul 16, 2024
7ae2425
Invalidate caches
MadLittleMods Jul 16, 2024
b35a2da
Use `ROOM_UNKNOWN_SENTINEL`
MadLittleMods Jul 16, 2024
4bb34d3
Iterate on sentinels
MadLittleMods Jul 16, 2024
a07ee22
Simply sentinel for now
MadLittleMods Jul 16, 2024
2c9ec61
Merge branch 'develop' into madlittlemods/robust-remote-invite-filters
MadLittleMods Jul 16, 2024
27a97a0
fetch -> get
MadLittleMods Jul 16, 2024
517bfc4
Update comments
MadLittleMods Jul 16, 2024
d22bae4
Fix typo
MadLittleMods Jul 16, 2024
8624fb0
Better docstring
MadLittleMods Jul 16, 2024
a77b70e
Use constants
MadLittleMods Jul 16, 2024
df5093b
Prefer not implementing if not used
MadLittleMods Jul 18, 2024
7c33c83
Separate to its own function (`_bulk_get_stripped_state_for_rooms` ->…
MadLittleMods Jul 18, 2024
474b480
Use `StateMap`
MadLittleMods Jul 18, 2024
82fa037
Fix function name in assertion message
MadLittleMods Jul 18, 2024
6f05bb3
Add tests for when the server leaves the room
MadLittleMods Jul 18, 2024
d4c4de1
Ideally, not just current state
MadLittleMods Jul 18, 2024
0ace82d
Add `test_filter_encrypted_after_we_left`
MadLittleMods Jul 18, 2024
8c97eaa
Try get server left rooms (not working)
MadLittleMods Jul 19, 2024
8dd5a47
Implement `is_local_host_in_rooms(...)`
MadLittleMods Jul 19, 2024
5267b03
Start of join room_stats_current
MadLittleMods Jul 19, 2024
329bf09
Working `room_stats_state` joined by `room_stats_current.local_users_…
MadLittleMods Jul 19, 2024
d7dcbfd
Add corresponding tests for `filter.room_types`
MadLittleMods Jul 19, 2024
60ec4e3
Fix some lints
MadLittleMods Jul 19, 2024
affee22
Fix lints
MadLittleMods Jul 19, 2024
8ee1708
Explain why we join tables
MadLittleMods Jul 19, 2024
fe2d84b
Use `StrippedStateEvent` type in tests
MadLittleMods Jul 19, 2024
4ae3079
Prefer f-string
MadLittleMods Jul 22, 2024
05bbcb0
Add `_bulk_get_partial_current_state_content_for_rooms_from_sync_room…
MadLittleMods Jul 23, 2024
376d900
Simplify and share
MadLittleMods Jul 23, 2024
28a36e8
Remove unused methods
MadLittleMods Jul 23, 2024
ac7ca67
Maybe current state in the future
MadLittleMods Jul 23, 2024
15e157e
Fix cache busting
MadLittleMods Jul 23, 2024
19968e1
Fix lints
MadLittleMods Jul 23, 2024
8f5e4bf
Shorten name
MadLittleMods Jul 23, 2024
3c144b3
Reconsider all stripped events
MadLittleMods Jul 23, 2024
f3b4c9f
Merge branch 'develop' into madlittlemods/robust-remote-invite-filters
MadLittleMods Jul 23, 2024
1732b02
Prefer cache invalidation helper
MadLittleMods Jul 23, 2024
20902e6
Merge branch 'develop' into madlittlemods/robust-remote-invite-filters
MadLittleMods Jul 25, 2024
e9b76cc
Go back to dedicated `room_stats_state` lookups
MadLittleMods Jul 25, 2024
484007e
Update cache busting
MadLittleMods Jul 25, 2024
7acb1ad
Prefer `assert_never`
MadLittleMods Jul 30, 2024
45ed4ce
Merge branch 'develop' into madlittlemods/robust-remote-invite-filters
MadLittleMods Jul 30, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/17450.bugfix
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Update experimental [MSC3575](https://github.com/matrix-org/matrix-spec-proposals/pull/3575) Sliding Sync `/sync` endpoint to handle invite/knock rooms when filtering.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, wonder if its worth worrying too much about the encrypted filter? Not sure anyone actually uses it?

-- @erikjohnston, #17450 (comment)

The is_encrypted filter and room_types filter both share the same challenges in fetching room state so solving one isn't more work than the other.

We will also need to apply these patterns for the other filters like room_name_like so figuring out how to share the work between filters also makes sense.

We're going to implement these other filters at some point anyway.

6 changes: 6 additions & 0 deletions synapse/api/constants.py
Original file line number Diff line number Diff line change
Expand Up @@ -225,6 +225,9 @@ class EventContentFields:
# This is deprecated in MSC2175.
ROOM_CREATOR: Final = "creator"

# The version of the room for `m.room.create` events.
ROOM_VERSION: Final = "room_version"

# Used in m.room.guest_access events.
GUEST_ACCESS: Final = "guest_access"

Expand All @@ -237,6 +240,9 @@ class EventContentFields:
# an unspecced field added to to-device messages to identify them uniquely-ish
TO_DEVICE_MSGID: Final = "org.matrix.msgid"

# `m.room.encryption`` algorithm field
ENCRYPTION_ALGORITHM: Final = "algorithm"


class EventUnsignedContentFields:
"""Fields found inside the 'unsigned' data on events"""
Expand Down
221 changes: 194 additions & 27 deletions synapse/handlers/sliding_sync.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,18 +17,26 @@
# [This file includes modifications made by New Vector Limited]
#
#
import enum
import logging
from itertools import chain
from typing import TYPE_CHECKING, Any, Dict, Final, List, Mapping, Optional, Set, Tuple

import attr
from immutabledict import immutabledict

from synapse.api.constants import AccountDataTypes, Direction, EventTypes, Membership
from synapse.api.constants import (
AccountDataTypes,
Direction,
EventContentFields,
EventTypes,
Membership,
)
from synapse.events import EventBase
from synapse.events.utils import strip_event
from synapse.handlers.relations import BundledAggregations
from synapse.storage.databases.main.roommember import extract_heroes_from_room_summary
from synapse.storage.databases.main.state import ROOM_UNKNOWN_SENTINEL
from synapse.storage.databases.main.stream import CurrentStateDeltaMembership
from synapse.storage.roommember import MemberSummary
from synapse.types import (
Expand All @@ -37,6 +45,7 @@
Requester,
RoomStreamToken,
StateMap,
StrCollection,
StreamKeyType,
StreamToken,
UserID,
Expand All @@ -51,6 +60,12 @@
logger = logging.getLogger(__name__)


class Sentinel(enum.Enum):
# defining a sentinel in this way allows mypy to correctly handle the
# type of a dictionary lookup and subsequent type narrowing.
UNSET_SENTINEL = object()
Comment on lines +88 to +91
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently works and aligns with other sentinels we already have in the codebase. But I can't use the same pattern in synapse/storage/databases/main/state.py because we need an immutable value to cache.

I can't figure out a way that has these mypy type narrowing benefits while also being immutable.

Seems like we need python/mypy#15553 or adjust our custom mypy plugin that checks whether something is_cacheable (not sure exactly what to adjust, maybe a new custom Synapse Sentinel type that we add to IMMUTABLE_CUSTOM_TYPES).

In any case, it works and we could just merge but it would be nice to find a pattern to use everywhere.



# The event types that clients should consider as new activity.
DEFAULT_BUMP_EVENT_TYPES = {
EventTypes.Message,
Expand Down Expand Up @@ -1085,6 +1100,72 @@ async def filter_rooms(
A filtered dictionary of room IDs along with membership information in the
room at the time of `to_token`.
"""
room_id_to_stripped_state_map: Dict[str, Optional[List[Any]]] = {}

async def _bulk_get_stripped_state_for_rooms(
room_ids: StrCollection,
) -> None:
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
"""
Fetch stripped state for a list of room IDs. Stripped state is only
applicable to invite/knock rooms. Other rooms will have `None` as their
stripped state.

For invite rooms, we pull from `unsigned.invite_room_state`.
For knock rooms, we pull from `unsigned.knock_room_state`.
"""
# Fetch what we haven't before
room_ids_to_fetch = [
room_id
for room_id in room_ids
if room_id not in room_id_to_stripped_state_map
]

# Gather a list of event IDs we can grab stripped state from
invite_or_knock_event_ids: List[str] = []
for room_id in room_ids_to_fetch:
if sync_room_map[room_id].membership in (
Membership.INVITE,
Membership.KNOCK,
):
event_id = sync_room_map[room_id].event_id
# If this is an invite/knock then there should be an event_id
assert event_id is not None
invite_or_knock_event_ids.append(event_id)
else:
room_id_to_stripped_state_map[room_id] = None

invite_or_knock_events = await self.store.get_events(
invite_or_knock_event_ids
)
for invite_or_knock_event in invite_or_knock_events.values():
room_id = invite_or_knock_event.room_id
membership = invite_or_knock_event.membership

if membership == Membership.INVITE:
# Scrutinize unsigned things
invite_room_state = invite_or_knock_event.unsigned.get(
"invite_room_state", None
)
# `invite_room_state` should be a list of stripped events
if isinstance(invite_room_state, list):
room_id_to_stripped_state_map[room_id] = invite_room_state
else:
room_id_to_stripped_state_map[room_id] = None
elif membership == Membership.KNOCK:
# Scrutinize unsigned things
knock_room_state = invite_or_knock_event.unsigned.get(
"knock_room_state", None
)
# `knock_room_state` should be a list of stripped events
if isinstance(knock_room_state, list):
room_id_to_stripped_state_map[room_id] = knock_room_state
else:
room_id_to_stripped_state_map[room_id] = None
else:
raise AssertionError(
f"Unexpected membership {membership} (this is a problem with Synapse itself)"
)

filtered_room_id_set = set(sync_room_map.keys())

# Filter for Direct-Message (DM) rooms
Expand All @@ -1104,31 +1185,72 @@ async def filter_rooms(
if not sync_room_map[room_id].is_dm
}

if filters.spaces:
if filters.spaces is not None:
raise NotImplementedError()

# Filter for encrypted rooms
if filters.is_encrypted is not None:
# Lookup the encryption state from the database. Since this function is
# cached, we need to make a mutable copy via `dict(...)`.
room_id_to_encryption = dict(
await self.store.bulk_get_room_encryption(filtered_room_id_set)
)
room_ids_with_results = [
room_id
for room_id, encryption in room_id_to_encryption.items()
if encryption is not ROOM_UNKNOWN_SENTINEL
]

# We might not have room state for remote invite/knocks if we are the first
# person on our server to see the room. The best we can do is look in the
# optional stripped state from the invite/knock event.
room_ids_without_results = filtered_room_id_set.difference(
room_ids_with_results
)
await _bulk_get_stripped_state_for_rooms(room_ids_without_results)

# Update our `room_id_to_encryption` map based on the stripped state
for room_id in room_ids_without_results:
stripped_state = room_id_to_stripped_state_map.get(
room_id, Sentinel.UNSET_SENTINEL
)
assert stripped_state is not Sentinel.UNSET_SENTINEL, (
f"Stripped state left unset for room {room_id}. "
+ "Make sure you're calling `_bulk_get_stripped_state_for_rooms(...)` "
+ "with that room_id. (this is a problem with Synapse itself)"
)

if stripped_state is not None:
for event in stripped_state:
event_type = event.get("type")
event_content = event.get("content")
# Scrutinize unsigned stripped state
if isinstance(event_type, str) and isinstance(
event_content, dict
):
if event_type == EventTypes.RoomEncryption:
room_id_to_encryption[room_id] = event_content.get(
EventContentFields.ENCRYPTION_ALGORITHM
)
break
else:
# Didn't see any encryption events in the stripped state so we
# can assume the room is unencrypted.
room_id_to_encryption[room_id] = None

# Make a copy so we don't run into an error: `Set changed size during
# iteration`, when we filter out and remove items
for room_id in filtered_room_id_set.copy():
state_at_to_token = await self.storage_controllers.state.get_state_at(
room_id,
to_token,
state_filter=StateFilter.from_types(
[(EventTypes.RoomEncryption, "")]
),
# Partially-stated rooms should have all state events except for the
# membership events so we don't need to wait because we only care
# about retrieving the `EventTypes.RoomEncryption` state event here.
# Plus we don't want to block the whole sync waiting for this one
# room.
await_full_state=False,
)
is_encrypted = state_at_to_token.get((EventTypes.RoomEncryption, ""))
encryption = room_id_to_encryption.get(room_id, ROOM_UNKNOWN_SENTINEL)

# Just remove rooms if we can't determine their encryption status
if encryption is ROOM_UNKNOWN_SENTINEL:
filtered_room_id_set.remove(room_id)
continue

# If we're looking for encrypted rooms, filter out rooms that are not
# encrypted and vice versa
is_encrypted = encryption is not None
if (filters.is_encrypted and not is_encrypted) or (
not filters.is_encrypted and is_encrypted
):
Expand All @@ -1154,15 +1276,60 @@ async def filter_rooms(
# provided in the list. `None` is a valid type for rooms which do not have a
# room type.
if filters.room_types is not None or filters.not_room_types is not None:
room_to_type = await self.store.bulk_get_room_type(
{
room_id
for room_id in filtered_room_id_set
# We only know the room types for joined rooms
if sync_room_map[room_id].membership == Membership.JOIN
}
# Lookup the room type from the database. Since this function is
# cached, we need to make a mutable copy via `dict(...)`.
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
room_id_to_type = dict(
await self.store.bulk_get_room_type(filtered_room_id_set)
)
room_ids_with_results = [
room_id
for room_id, room_type in room_id_to_type.items()
if room_type is not ROOM_UNKNOWN_SENTINEL
]

# We might not have room state for remote invite/knocks if we are the first
# person on our server to see the room. The best we can do is look in the
# optional stripped state from the invite/knock event.
room_ids_without_results = filtered_room_id_set.difference(
room_ids_with_results
)
for room_id, room_type in room_to_type.items():
await _bulk_get_stripped_state_for_rooms(room_ids_without_results)

# Update our `room_id_to_type` map based on the stripped state
for room_id in room_ids_without_results:
stripped_state = room_id_to_stripped_state_map.get(
room_id, Sentinel.UNSET_SENTINEL
)
assert stripped_state is not Sentinel.UNSET_SENTINEL, (
f"Stripped state left unset for room {room_id}. "
+ "Make sure you're calling `_bulk_get_stripped_state_for_rooms(...)` "
+ "with that room_id. (this is a problem with Synapse itself)"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true that either the server is in the room or the membership is INVITE/KNOCK? I'm wondering about the case of e.g. leaves where the server is no longer in the room?

Copy link
Contributor Author

@MadLittleMods MadLittleMods Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code does work fine when the server leaves the room (added some tests for that). For the non-invite/knock rooms, room_id_to_stripped_state_map[room_id] will be set to None and won't hit this UNSET_SENTINEL.


That does bring up a good point though. Ideally, we shouldn't be using current state for LEAVE/BAN rooms.

I think we can assume the room type isn't secret since it's defined at the the start of the room in the m.room.create event meaning if you've had any membership before, you already knew about it and for invite/knock it is handed out as one of the stripped events anyway.

For the encrypted state, most of the time it's set right away when the room is created but it's possible to set it later. Can we assume it's not considered leaking data? I would say yes especially since it's one of the stripped state events that's just given away.

If we can make these assumptions, we can keep our bulk lookup shortcuts which would be nice.


if stripped_state is not None:
for event in stripped_state:
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
event_type = event.get("type")
event_content = event.get("content")
# Scrutinize unsigned stripped state
if isinstance(event_type, str) and isinstance(
event_content, dict
):
if event_type == EventTypes.Create:
room_id_to_type[room_id] = event_content.get(
EventContentFields.ROOM_TYPE
)
break

# Make a copy so we don't run into an error: `Set changed size during
# iteration`, when we filter out and remove items
for room_id in filtered_room_id_set.copy():
room_type = room_id_to_type.get(room_id, ROOM_UNKNOWN_SENTINEL)

# Just remove rooms if we can't determine their type
if room_type is ROOM_UNKNOWN_SENTINEL:
filtered_room_id_set.remove(room_id)
continue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These filters have gotten complex. There is an inkling to just say, "we only apply filters to joined rooms". But the problem with that is when you're joined to a room and filtering with sliding sync, as soon as you leave the room, you won't get that final update down sync because the room was filtered out.

It's important that the filters apply regardless of your membership.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way we can deduplicate some of these things? Maybe we need to have filter classes or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I shared the logic in a new function _bulk_get_partial_current_state_content_for_rooms(...)

if (
filters.room_types is not None
and room_type not in filters.room_types
Expand All @@ -1175,13 +1342,13 @@ async def filter_rooms(
):
filtered_room_id_set.remove(room_id)

if filters.room_name_like:
if filters.room_name_like is not None:
raise NotImplementedError()

if filters.tags:
if filters.tags is not None:
raise NotImplementedError()

if filters.not_tags:
if filters.not_tags is not None:
raise NotImplementedError()

# Assemble a new sync room map but only with the `filtered_room_id_set`
Expand Down
4 changes: 3 additions & 1 deletion synapse/handlers/stats.py
Original file line number Diff line number Diff line change
Expand Up @@ -293,7 +293,9 @@ async def _handle_deltas(
"history_visibility"
)
elif delta.event_type == EventTypes.RoomEncryption:
room_state["encryption"] = event_content.get("algorithm")
room_state["encryption"] = event_content.get(
EventContentFields.ENCRYPTION_ALGORITHM
)
elif delta.event_type == EventTypes.Name:
room_state["name"] = event_content.get("name")
elif delta.event_type == EventTypes.Topic:
Expand Down
4 changes: 4 additions & 0 deletions synapse/storage/_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,8 @@ def _invalidate_state_caches(
# Purge other caches based on room state.
self._attempt_to_invalidate_cache("get_room_summary", (room_id,))
self._attempt_to_invalidate_cache("get_partial_current_state_ids", (room_id,))
self._attempt_to_invalidate_cache("get_room_type", (room_id,))
self._attempt_to_invalidate_cache("get_room_encryption", (room_id,))

def _invalidate_state_caches_all(self, room_id: str) -> None:
"""Invalidates caches that are based on the current state, but does
Expand All @@ -147,6 +149,8 @@ def _invalidate_state_caches_all(self, room_id: str) -> None:
self._attempt_to_invalidate_cache("get_user_in_room_with_profile", None)
self._attempt_to_invalidate_cache("get_rooms_for_user", None)
self._attempt_to_invalidate_cache("get_room_summary", (room_id,))
self._attempt_to_invalidate_cache("get_room_type", (room_id,))
self._attempt_to_invalidate_cache("get_room_encryption", (room_id,))
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved

def _attempt_to_invalidate_cache(
self, cache_name: str, key: Optional[Collection[Any]]
Expand Down
14 changes: 14 additions & 0 deletions synapse/storage/databases/main/cache.py
Original file line number Diff line number Diff line change
Expand Up @@ -269,12 +269,18 @@ def _process_event_stream_row(self, token: int, row: EventsStreamRow) -> None:

if data.type == EventTypes.Member:
self.get_rooms_for_user.invalidate((data.state_key,)) # type: ignore[attr-defined]
elif data.type == EventTypes.RoomEncryption:
self.get_room_encryption.invalidate((data.room_id,)) # type: ignore[attr-defined]
elif data.type == EventTypes.Create:
self.get_room_type.invalidate((data.room_id,)) # type: ignore[attr-defined]
elif row.type == EventsStreamAllStateRow.TypeId:
assert isinstance(data, EventsStreamAllStateRow)
# Similar to the above, but the entire caches are invalidated. This is
# unfortunate for the membership caches, but should recover quickly.
self._curr_state_delta_stream_cache.entity_has_changed(data.room_id, token) # type: ignore[attr-defined]
self.get_rooms_for_user.invalidate_all() # type: ignore[attr-defined]
self.get_room_type.invalidate((data.room_id,)) # type: ignore[attr-defined]
self.get_room_encryption.invalidate((data.room_id,)) # type: ignore[attr-defined]
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
else:
raise Exception("Unknown events stream row type %s" % (row.type,))

Expand Down Expand Up @@ -342,6 +348,10 @@ def _invalidate_caches_for_event(
self._attempt_to_invalidate_cache(
"get_forgotten_rooms_for_user", (state_key,)
)
elif etype == EventTypes.Create:
self._attempt_to_invalidate_cache("get_room_type", (room_id,))
elif etype == EventTypes.RoomEncryption:
self._attempt_to_invalidate_cache("get_room_encryption", (room_id,))

if relates_to:
self._attempt_to_invalidate_cache(
Expand Down Expand Up @@ -399,6 +409,8 @@ def _invalidate_caches_for_room_events(self, room_id: str) -> None:
self._attempt_to_invalidate_cache("get_thread_summary", None)
self._attempt_to_invalidate_cache("get_thread_participated", None)
self._attempt_to_invalidate_cache("get_threads", (room_id,))
self._attempt_to_invalidate_cache("get_room_type", (room_id,))
self._attempt_to_invalidate_cache("get_room_encryption", (room_id,))

self._attempt_to_invalidate_cache("_get_state_group_for_event", None)

Expand Down Expand Up @@ -451,6 +463,8 @@ def _invalidate_caches_for_room(self, room_id: str) -> None:
self._attempt_to_invalidate_cache("get_forgotten_rooms_for_user", None)
self._attempt_to_invalidate_cache("_get_membership_from_event_id", None)
self._attempt_to_invalidate_cache("get_room_version_id", (room_id,))
self._attempt_to_invalidate_cache("get_room_type", (room_id,))
self._attempt_to_invalidate_cache("get_room_encryption", (room_id,))

# And delete state caches.

Expand Down
Loading
Loading