Cache invalidation #294
Comments
Start simple: an e2e test that makes a room, sets a name, syncs and gets the name. It then redacts the name, sends a sentinel, syncs until the sentinel arrives, and checks that the room name has reset to something sensible.
RoomType shouldn't change here: it's a property of the create event, which is unique and immutable.
Could also have a NID field on the rooms table tracking the oldest event NID after which we have a contiguous timeline. That would mean we don't have to update 5000 bools if we see a gap after 5000 contiguous timeline events. Or maybe your point is that we have a flag "parentUnknown" or something, default false and set to true in the situation we describe. Then any SQL queries that load events paginating backwards have to stop when they see this flag is true.
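A minimal sketch of the flag variant described in this comment, to make the "stop paginating at the flag" rule concrete. The `Event` type, its `ParentUnknown` field and the `contiguousTail` helper are hypothetical names for illustration only and are not taken from the proxy's schema or code.

```go
package main

import "fmt"

// Event is a hypothetical, simplified event row. ParentUnknown marks an
// event whose predecessor in the timeline was never seen (a gap before it).
type Event struct {
	NID           int64
	ParentUnknown bool
}

// contiguousTail walks events from newest to oldest and returns at most
// limit events in chronological order. It stops after the first event
// flagged ParentUnknown, because anything older lies beyond a gap.
// events must be sorted by ascending NID.
func contiguousTail(events []Event, limit int) []Event {
	var out []Event
	for i := len(events) - 1; i >= 0 && len(out) < limit; i-- {
		out = append(out, events[i])
		if events[i].ParentUnknown {
			break
		}
	}
	// Reverse into chronological order.
	for l, r := 0, len(out)-1; l < r; l, r = l+1, r-1 {
		out[l], out[r] = out[r], out[l]
	}
	return out
}

func main() {
	events := []Event{
		{NID: 1}, {NID: 2},
		{NID: 3, ParentUnknown: true}, // gap detected before this event
		{NID: 4}, {NID: 5},
	}
	for _, ev := range contiguousTail(events, 10) {
		fmt.Println(ev.NID)
	}
}
```

Even though the caller asks for 10 events, only NIDs 3, 4 and 5 are returned, since the events before NID 3 are not known to be contiguous with it. The NID-on-the-rooms-table variant would express the same cut-off as a single comparison per room instead of a per-event flag.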
Cache invalidation in SS
This is the process of informing downstream API caches that what they have remembered for a room is incorrect and needs refetching from the database. We (dmr & kegan) propose invalidation is room ID scoped, and not more fine-grained for now. This keeps things simple but means we ask the DB for more information than we strictly need to.
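As a rough illustration of room-ID-scoped invalidation, the sketch below drops everything cached for one room and lets the next read refetch it. `RoomCache`, its `loader` callback (standing in for a database query) and the single cached field are assumptions made for this example, not the proxy's actual cache types.

```go
package main

import (
	"fmt"
	"sync"
)

// RoomCache is a hypothetical cache keyed by room ID. loader stands in for
// a database query and is an assumption of this sketch.
type RoomCache struct {
	mu     sync.Mutex
	names  map[string]string // room ID -> cached room name
	loader func(roomID string) string
}

func NewRoomCache(loader func(roomID string) string) *RoomCache {
	return &RoomCache{names: make(map[string]string), loader: loader}
}

// Name returns the cached name, loading it via loader on a cache miss.
func (c *RoomCache) Name(roomID string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if name, ok := c.names[roomID]; ok {
		return name
	}
	name := c.loader(roomID)
	c.names[roomID] = name
	return name
}

// Invalidate forgets everything remembered for one room. The scope is the
// whole room, not the individual field that changed.
func (c *RoomCache) Invalidate(roomID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.names, roomID)
}

func main() {
	db := map[string]string{"!a:example.org": "Old name"}
	cache := NewRoomCache(func(id string) string { return db[id] })

	fmt.Printf("%q\n", cache.Name("!a:example.org"))
	db["!a:example.org"] = ""                        // e.g. the name event was redacted
	fmt.Printf("%q\n", cache.Name("!a:example.org")) // still the stale value
	cache.Invalidate("!a:example.org")
	fmt.Printf("%q\n", cache.Name("!a:example.org")) // refetched after invalidation
}
```

The trade-off matches the one stated above: the cache entry for the whole room is refetched even if only one field changed, which costs extra database work but keeps the invalidation logic simple.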
Cases when invalidation is needed:
Redactions. Whilst the proxy doesn't cache the event content of messages, it definitely does cache things like the room name. If the room name is redacted, we need to inform the global/user/connection caches that the name is now unset. This also applies to member display names (for `room.name`) and avatar URLs (for `room.avatar`), as well as the canonical alias (for `room.name`).
State resolution. This is part of a broader strategy for refreshing entire room state snapshots. The downstream room state caches are currently loaded on startup then strictly updated when live timeline events arrive from the poller. This is a problem when:
In the latter case, we fudged a solution by incorrectly prepending `state` block events to the timeline to get the caches to update downstream correctly. This makes the timeline incorrect when the client asks for a sufficiently high `timeline_limit`, though.

We propose to add a new payload type `V2InvalidateRoom` which will contain the room ID to invalidate and the new snapshot ID to load from. This will cause downstream API processes to reload caches from that snapshot. The precise data that needs to be reloaded is the entire list of all joined members and invited members in addition to populating this struct:

For redactions, we strictly only need `Heroes`, `NameEvent`, `AvatarEvent`, `CanonicalAlias` as they contain redactable user input.

For state resets, we need all state-related fields. This is all fields with the exception of `LatestEventsByType`, `LastMessageTimestamp` and `TypingEvent`, though in practice in most cases we would have a new latest event in the room, and the typing event would be nullified as too old.

Even if we populate downstream caches effectively, we need to communicate this information to the client. The room list sorting logic assumes 1 update affects at most 1 room and therefore causes at most 0-1 move operations. In the invalidation scenario, this assumption holds if we invalidate 1 room per payload. This also matches reality well as often we are talking about a single room being joined, redacted or state reset, not many. In theory, it should be enough to just send an update to connections with the newly updated `RoomConnMetadata` values and have it do the right thing, but code will need to be checked to not assume that only 1 action can take place per payload (e.g. it is possible for the room name to change AND space children to change AND typing to change, etc). We probably also need to resend any `required_state` events with their new values. E.g. if `m.room.name` was requested and we just redacted it, we should resend it with `content: {}`.

Testing wise, we need to engineer several failure scenarios:
`room.name` updates to canonical alias.

Redactions work was done in #296.
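The `V2InvalidateRoom` proposal could be sketched roughly as follows. Only the payload name and the metadata field names (`Heroes`, `NameEvent`, `AvatarEvent`, `CanonicalAlias`, `LastMessageTimestamp`) come from the text above; the payload's field names, the field types, the trimmed-down `RoomMetadata` struct, the `SnapshotLoader` hook and `applyInvalidation` are all assumptions for illustration.

```go
package main

import "fmt"

// V2InvalidateRoom mirrors the proposed payload. The field names are
// assumptions: the proposal only says it carries a room ID and the new
// snapshot ID to load from.
type V2InvalidateRoom struct {
	RoomID     string
	SnapshotID int64
}

// RoomMetadata is a hypothetical, heavily simplified stand-in for the
// cached per-room struct. Field types are guesses for this sketch.
type RoomMetadata struct {
	Heroes               []string
	NameEvent            string
	AvatarEvent          string
	CanonicalAlias       string
	LastMessageTimestamp uint64 // timeline-derived, not part of room state
}

// SnapshotLoader is an assumed hook that rebuilds the state-derived
// metadata for a room from a stored snapshot.
type SnapshotLoader func(roomID string, snapshotID int64) RoomMetadata

// applyInvalidation replaces the cached state-derived fields with values
// from the snapshot, while keeping the timeline-derived timestamp.
func applyInvalidation(cache map[string]RoomMetadata, p V2InvalidateRoom, load SnapshotLoader) {
	fresh := load(p.RoomID, p.SnapshotID)
	if old, ok := cache[p.RoomID]; ok {
		fresh.LastMessageTimestamp = old.LastMessageTimestamp
	}
	cache[p.RoomID] = fresh
}

func main() {
	cache := map[string]RoomMetadata{
		"!a:example.org": {NameEvent: "Old name", LastMessageTimestamp: 42},
	}
	// Pretend snapshot 7 was taken after the name event was redacted.
	load := func(roomID string, snapshotID int64) RoomMetadata {
		return RoomMetadata{CanonicalAlias: "#a:example.org"}
	}
	applyInvalidation(cache, V2InvalidateRoom{RoomID: "!a:example.org", SnapshotID: 7}, load)
	fmt.Printf("%+v\n", cache["!a:example.org"])
}
```

Because each payload names exactly one room, a consumer built this way touches one cache entry per payload, which is consistent with the room list assumption that one update moves at most one room.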
Detecting cache invalidation in pollers:
If the timeline is `limited` and `timeline[0]` is unknown, then we don't know if `timeline[-1]` is known or unknown, so we cannot safely respond to `timeline_limit`s which span these events. We should "invalidate" the timeline to only return the span we know to be safe (the latest response). We do not store timelines in-memory, so the invalidation is purely database-backed (probably via a flag on the event row itself).
If we receive a `state` block for a room and we don't know some of its events (i.e. the prepend state events logic), we should make a new snapshot and invalidate caches downstream.
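The two poller-side checks can be sketched as a single pure function. `RoomUpdate`, `Verdict` and `inspect` are hypothetical names, and the set of known event IDs stands in for a database lookup; this is an illustration of the detection rule under those assumptions, not the poller's actual code.

```go
package main

import "fmt"

// RoomUpdate is a hypothetical, trimmed-down view of one room's section of
// a sync response as seen by a poller.
type RoomUpdate struct {
	Limited  bool
	Timeline []string // event IDs, oldest first
	State    []string // event IDs from the state block
}

// Verdict says which kind of invalidation, if any, the update calls for.
type Verdict struct {
	TimelineGap  bool // the timeline may not join up with what is stored
	NeedSnapshot bool // the state block contained events we have not seen
}

// inspect applies the two rules: a limited timeline whose first event is
// unknown signals a possible gap, and any unknown event in the state block
// means a new snapshot should be made and caches invalidated downstream.
func inspect(u RoomUpdate, known map[string]bool) Verdict {
	var v Verdict
	if u.Limited && len(u.Timeline) > 0 && !known[u.Timeline[0]] {
		v.TimelineGap = true
	}
	for _, id := range u.State {
		if !known[id] {
			v.NeedSnapshot = true
			break
		}
	}
	return v
}

func main() {
	known := map[string]bool{"$a": true, "$b": true}
	update := RoomUpdate{
		Limited:  true,
		Timeline: []string{"$x", "$y"}, // $x has never been seen before
		State:    []string{"$a", "$z"}, // $z has never been seen before
	}
	fmt.Printf("%+v\n", inspect(update, known))
}
```

A caller would then set the database-backed flag on the first timeline event when `TimelineGap` is true, and emit a room invalidation when `NeedSnapshot` is true.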