When restarting a partial join resync, prioritise the server which actioned a partial join #14126
This means that
- if we decline to retry, we'll move on to try another HS
- if there are no other servers left to retry, we'll have slightly better logging
6e292a8
to
b9a77e8
Compare
# TODO(faster joins): To make this robust we should run both SELECTs in the
# same transaction.
I left this as a TODO for pragmatism's sake (read: I don't want to read database.py right now). I'm assuming it's very unlikely that a new partial join will have started and completed between the two SELECTs.
Note that a transaction only helps you at a `REPEATABLE READ` (or higher) isolation level (which, in fairness, we do use by default, but every so often we talk about reducing it).
I'd just do `room_servers.get(room_id)` and add a check for `None`, to save having to worry about it.
Thanks---the comment is naive as written.
(The reference here is https://www.postgresql.org/docs/current/transaction-iso.html .)
> I'd just do `room_servers.get(room_id)` and add a check for `None`, to save having to worry about it.

Are you suggesting we ignore this row if we get a `None` here, or loudly log a warning?
I think we should ignore rows that are returned from `partial_state_rooms_servers` and not `partial_state_rooms`. (Which will give the same effect as we would see if the transaction was properly isolated).
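A minimal sketch of that suggestion, assuming both SELECTs have already been read into plain Python rows. The names `ResyncInfo` and `merge_partial_state_rows` and the row shapes are hypothetical and not taken from Synapse; only the `room_servers.get(room_id)` lookup and the `None` check come from the thread above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class ResyncInfo:
    # Hypothetical container: the server the join went via, plus the
    # servers known to be in the room.
    joined_via: Optional[str]
    servers_in_room: List[str] = field(default_factory=list)


def merge_partial_state_rows(
    room_rows: Iterable[Tuple[str, Optional[str]]],
    server_rows: Iterable[Tuple[str, str]],
) -> Dict[str, ResyncInfo]:
    """Combine rows from two separate queries without assuming they are consistent."""
    room_servers: Dict[str, ResyncInfo] = {}

    # Rows assumed to come from partial_state_rooms: (room_id, joined_via).
    for room_id, joined_via in room_rows:
        room_servers[room_id] = ResyncInfo(joined_via=joined_via)

    # Rows assumed to come from partial_state_rooms_servers: (room_id, server_name).
    for room_id, server_name in server_rows:
        entry = room_servers.get(room_id)
        if entry is None:
            # The room was only seen by the second query, so ignore the row.
            continue
        entry.servers_in_room.append(server_name)

    return room_servers
```

Skipping the unmatched row gives the same result a properly isolated transaction would, without depending on the isolation level.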
Many thanks. This is 9e9be2b
synapse/storage/schema/main/delta/73/09partial_joined_via_destination.sql
Outdated
synapse/handlers/federation.py
Outdated
If an `initial_destination` is given, it takes top priority. Otherwise
all servers are treated equally.
"""
if initial_destination is not None:
suggest inverting this condition for simplicity:
- if initial_destination is not None:
+ if initial_destination is None:
+     return other_destinations
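For illustration, the inverted condition could look like the sketch below. This is not the code from the PR: the function name `prioritise_destinations`, the tuple return type, and the de-duplication of `initial_destination` are assumptions made so the example runs on its own.

```python
from typing import Collection, Optional, Tuple


def prioritise_destinations(
    initial_destination: Optional[str],
    other_destinations: Collection[str],
) -> Tuple[str, ...]:
    """Put `initial_destination` first if given; otherwise treat all servers equally."""
    # Early return for the simple case, as the review comment suggests.
    if initial_destination is None:
        return tuple(other_destinations)

    # Assumption: avoid listing the initial destination twice.
    rest = tuple(d for d in other_destinations if d != initial_destination)
    return (initial_destination,) + rest
```

With the early return, the remaining body deals only with the case where a preferred server exists.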
synapse/handlers/federation.py
Outdated
@@ -1617,7 +1617,7 @@ async def _resume_sync_partial_state_room(self) -> None:
 async def _sync_partial_state_room(
     self,
     initial_destination: Optional[str],
-    other_destinations: Collection[str],
+    other_destinations: Sequence[str],
Can you explain why this is needed? `_prioritise_destinations_for_partial_state_resync` takes a `Collection`, not a `Sequence`.
I think this might have been a left-over from a WIP version before I pulled out the helper. (Or maybe I was trying to express "we preserve the iteration order of `other_destinations`"? 🤷)
See 1ccf9b8, in any case
This reverts commit d05d53d.
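As background to the `Collection` versus `Sequence` question in that thread, here is a small sketch of how the two abstract types differ in standard Python. The server names are made up for the example.

```python
from collections.abc import Collection, Sequence

# Hypothetical server names, used only to show the type relationships.
servers_as_set = {"hs1.example", "hs2.example"}
servers_as_tuple = ("hs1.example", "hs2.example")

# A set is sized, iterable and supports `in`, so it is a Collection,
# but it has no positional indexing, so it is not a Sequence.
set_is_collection = isinstance(servers_as_set, Collection)
set_is_sequence = isinstance(servers_as_set, Sequence)

# A tuple satisfies both.
tuple_is_collection = isinstance(servers_as_tuple, Collection)
tuple_is_sequence = isinstance(servers_as_tuple, Sequence)
```

A parameter annotated as `Sequence` therefore rejects sets at type-check time, while `Collection` accepts them.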
lgtm
Fixes #12999. See #12999 (comment) and the comments below it for context.
Should be commitwise reviewable.