This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Catch-up after Federation Outage #8096
Catch-up after Federation Outage #8096
Changes from all commits
18d900e
07a415c
5bc321a
c60e259
20c896a
d232200
967d8c1
d910798
74a6f4f
c1b32ae
5de9313
759e027
6c52666
9ba56cb
af13948
558af38
44765e9
56aaa17
84dbc43
2c740a7
d77e444
33874d4
c1a2b68
16eec5c
92517e9
ef4680d
de5caf0
3e308f9
b0bdadd
843403f
ad7124d
b1fd67b
e6890c7
7cfecf3
bf51d2f
7589a03
8d9f4ba
b60ad35
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when would this happen? I'm struggling to imagine how we could end up here with an event with a lower stream ordering than the catch-up max.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a (potentially hypothetical?) race condition.
I'm not sure if ^ has an opportunity to race or not; but even if it doesn't now, what about if someone innocently comes along and plops an await in there (at the X)?
The uncertainty made me feel like I should try and make this robust and 'just deal with it'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
N.B. needs scrutiny about whether that needs to check we actually have
self._catch_up
🤔There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems to me that we shouldn't add new events to the queue while we are catching up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you note later, I set a high-water-mark. Anything after this watermark should be handled in the normal way.
(Part of this is also to reduce the use of
/get_missing_events
for when we think the remote is online.)If we are backing off for a long time, it will clear the queue and high-water-mark when it attempts a transaction on the next line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: why do we need a high-water-mark at all? why not just keep going until
get_catch_up_room_event_ids
returns an empty list?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(doing so without care will introduce races though...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we could do that if you are keen — as you say though, needs more thought.
The advantage of this approach is that it's easy to keep the logic in my head. Was keen not to give a chance for any nasty bugs to crawl in, because it'd be nice to have confidence in this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as noted elsewhere: I think the
order
is redundant: it might be easier to get rid of it (in a separate PR).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll put it on my list, then :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please can you take these fixes to a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd kinda like to see the logic shuffled so that this can be a foreign key; however at the very least it needs a comment saying why it can't be one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, would it suffice to
INSERT IGNORE
a NULL row into the destinations table when upserting adestination_rooms
row?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably. It would be nice to avoid doing so on every PDU (for example by doing it when we first create a
PerDestinationQueue
for the destination), but that might be fiddly, and is something we can do later.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ON DELETE CASCADE
sounds scary. Why do we do that?