Notify batch notifier for actual sequence number because it could be blocking it #5868
crates/sui-core/src/authority.rs
Outdated
if seq < ticket_seq {
    debug!("Notifying during retry, current low watermark {:?}, ticket_seq {:?}, seq {:?}", self.batch_notifier.low_watermark(), ticket_seq, seq);
    self.batch_notifier.notify(seq);
The notifier_ticket.notify(); in fn commit_certificate does not work. Is it because it only notifies the ticket_seq but not the old one (seq)?
Is it better to move the notify-old-seq logic to the end of fn commit_certificate, where notifier_ticket.notify(); happens?
@longbowlu lmk if you feel strongly about moving this check into commit_certificate, but I like the fact that all the tricky batch notifier checks are in one place here
no strong opinions
crates/sui-core/src/authority.rs
Outdated
@@ -851,10 +851,19 @@ impl AuthorityState {
} else {
    error!(?digest, "commit_certificate failed: {}", err);
Can we log seq here?
error!(?digest, seq=?ticket_seq, "commit_certificate failed: {}", err);
Same thing for the debug! above.
sure, done
crates/sui-core/src/authority.rs
Outdated
debug!(
    "Failed to notify ticket with sequence number: {}",
    ticket_seq
);
- debug!(
-     "Failed to notify ticket with sequence number: {}",
-     ticket_seq
- );
+ debug!(
+     seq=?ticket_seq,
+     "Ticket not notified due to commit failure",
+ );
done
The probability is probably low, but you could have the following scenario:
- The first run failed, so we have a ticket that's not released
- The second run failed again, and we have another ticket that's not released.
- The third time succeeded, but it will only unlock the first one, not the second.
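To make the scenario concrete, here is a minimal sketch using a toy notifier. `ToyNotifier` is a hypothetical stand-in, not the sui-core batch notifier, and it assumes the low watermark can only advance past a sequence number once that ticket has been notified.

```rust
use std::collections::BTreeSet;

/// Hypothetical toy model, NOT the sui-core batch notifier.
/// Assumption: the low watermark is the smallest ticket that has
/// been handed out but not yet notified.
struct ToyNotifier {
    next_seq: u64,
    low_watermark: u64,
    outstanding: BTreeSet<u64>,
}

impl ToyNotifier {
    fn new() -> Self {
        ToyNotifier { next_seq: 0, low_watermark: 0, outstanding: BTreeSet::new() }
    }

    /// Hand out the next sequence number and track it as unreleased.
    fn ticket(&mut self) -> u64 {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.outstanding.insert(seq);
        seq
    }

    /// Release a ticket and recompute the low watermark.
    fn notify(&mut self, seq: u64) {
        self.outstanding.remove(&seq);
        self.low_watermark = self
            .outstanding
            .iter()
            .next()
            .copied()
            .unwrap_or(self.next_seq);
    }

    fn low_watermark(&self) -> u64 {
        self.low_watermark
    }
}

fn main() {
    let mut notifier = ToyNotifier::new();
    let first = notifier.ticket(); // first run fails, ticket not released
    let second = notifier.ticket(); // second run fails, ticket not released
    let third = notifier.ticket(); // third run succeeds

    // The successful run releases its own ticket and the first one only.
    notifier.notify(third);
    notifier.notify(first);

    // The second ticket still pins the watermark.
    assert_eq!(notifier.low_watermark(), second);
    println!("low watermark stuck at {}", notifier.low_watermark());
}
```

In this model the watermark only moves again once the second ticket is notified too, which is why every ticket taken by a failed attempt needs to be released.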
Force-pushed from 3036428 to 1d8f870
Good point! I think I handled your comments @lxfind, lmk what you think
crates/sui-core/src/authority.rs
Outdated
"Ticket not notified due to commit failure",
);
// Check if we were able to sequence the tx at all
if let Some(tx_seq) = self.db().get_tx_sequence(*certificate.digest()).await? {
I am probably being picky, but this line could return Err as well.
I am ok with this going in though, as the chance should be really really really low. We need to start somewhere and this is a big improvement already
Yeah I agree, I thought of it too and wanted to add a few retries there to handle the db read failing. Let me do it in a follow-up PR, and we can let this go for now
Maybe add a TODO
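For reference, a bounded retry around a fallible read could look roughly like the sketch below. This is only an illustration of the follow-up idea: `retry` is a hypothetical helper, it is synchronous for simplicity (the real read is async), and it retries immediately with no backoff.

```rust
/// Hypothetical helper: run `op` until it succeeds or `max_attempts`
/// tries have been made, returning the last error on failure.
/// `op` always runs at least once, even if `max_attempts` is 0.
fn retry<T, E>(max_attempts: usize, mut op: impl FnMut() -> Result<T, E>) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            // Otherwise try again.
            Err(_) => attempt += 1,
        }
    }
}

fn main() {
    // Simulated read that fails twice before succeeding.
    let mut calls = 0;
    let result: Result<u64, &str> = retry(3, || {
        calls += 1;
        if calls < 3 { Err("db read failed") } else { Ok(42) }
    });
    assert_eq!(result, Ok(42));
    println!("succeeded after {} calls", calls);
}
```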
Force-pushed from 1d8f870 to 28887bc
The authority notifier gets stuck if a previous attempt at committing a certificate sequenced the transaction but failed to commit. In this case, we should notify both the previous sequence number and the new one.
Also, if we failed to lock a sequence number and failed to commit, we should notify the ticket as well.
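As a rough sketch of the rule described above (not the code in this PR), the set of sequence numbers to notify can be written as a small pure function. `seqs_to_notify` and its parameters are hypothetical names; `recorded_seq` stands for whatever sequence number an earlier attempt already assigned to the transaction, if any.

```rust
/// Hypothetical helper illustrating the description above.
/// Returns the sequence numbers whose tickets should be notified:
/// the one recorded by an earlier attempt (if it is older than the
/// current ticket) and the current ticket itself.
fn seqs_to_notify(ticket_seq: u64, recorded_seq: Option<u64>) -> Vec<u64> {
    let mut seqs = Vec::new();
    if let Some(seq) = recorded_seq {
        // An earlier attempt sequenced the tx under a smaller number.
        if seq < ticket_seq {
            seqs.push(seq);
        }
    }
    // The current ticket is always released, even if commit failed.
    seqs.push(ticket_seq);
    seqs
}

fn main() {
    // Retry after an attempt that sequenced the tx at 3 but failed to commit.
    assert_eq!(seqs_to_notify(5, Some(3)), vec![3, 5]);
    // Nothing was sequenced before: only the current ticket is released.
    assert_eq!(seqs_to_notify(5, None), vec![5]);
    println!("ok");
}
```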