Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Check the stream position before checking if the cache is empty. #14639

Merged
merged 5 commits into from
Dec 8, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/14639.bugfix
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fix a long-standing bug where the user directory and room/user stats might be out of sync.
9 changes: 5 additions & 4 deletions synapse/util/caches/stream_change_cache.py
Original file line number Diff line number Diff line change
Expand Up @@ -213,16 +213,17 @@ def has_any_entity_changed(self, stream_pos: int) -> bool:
"""
assert isinstance(stream_pos, int)

if not self._cache:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that has_any_entity_changed only has a single caller: get_partial_current_state_deltas, which is called from get_current_state_deltas which affects presence, stats, and the user directory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Searching the matrix.org database to see if there are descrepencies that could be due to this (or #14435 or #14592) shows that there are rooms where the room stats do not match current state events, this makes me think the user directory table might also be out of date? (I'll file a separate issue about this, I think.)

SELECT room_id, event_id, name AS stats_name, json::json->'content'->>'name' AS json_name
FROM current_state_events
INNER JOIN event_json USING (event_id, room_id)
INNER JOIN room_stats_state USING (room_id)
WHERE
    type = 'm.room.name' AND state_key = ''
    AND name != json::json->'content'->>'name';

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #14643.

# If the cache is empty, nothing can have changed.
return False

# _cache is not valid at or before the earliest known stream position, so
# return that an entity has changed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a comment here that the ordering is important. Something like:

It's important that we check stream position for changes first before we do the early-return check against the cache below. If the stream position is at/earlier than the earliest known stream position, but the cache is empty we should still consider entities to have changed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clokep Thoughts? Picking this up?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary -- the updated comments are clear and the tests assert this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clokep I don't pick any of this information up from the comments. Only from your PR description.

It's great that we have tests for this though.

if stream_pos <= self._earliest_known_stream_pos:
self.metrics.inc_misses()
return True

# If the cache is empty, nothing can have changed.
if not self._cache:
self.metrics.inc_misses()
return False

self.metrics.inc_hits()
return stream_pos < self._cache.peekitem()[0]

Expand Down
7 changes: 4 additions & 3 deletions tests/util/test_stream_change_cache.py
Original file line number Diff line number Diff line change
Expand Up @@ -144,9 +144,10 @@ def test_has_any_entity_changed(self) -> None:
"""
cache = StreamChangeCache("#test", 1)

# With no entities, it returns False for the past, present, and future.
self.assertFalse(cache.has_any_entity_changed(0))
self.assertFalse(cache.has_any_entity_changed(1))
# With no entities, it returns True for the past, present, and False for
# the future.
self.assertTrue(cache.has_any_entity_changed(0))
self.assertTrue(cache.has_any_entity_changed(1))
self.assertFalse(cache.has_any_entity_changed(2))

# We add an entity
Expand Down