Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Notify batch notifier for actual sequence number because it could be blocking it #5868

Merged
merged 1 commit into from
Nov 4, 2022

Conversation

sadhansood
Copy link
Contributor

@sadhansood sadhansood commented Nov 4, 2022

Authority notifier gets stuck if the previous attempt of commiting a certificate sequenced the transaction but failed to commit. In this case, we should notify both previous and old sequence.
Also, if we failed to lock a sequence number and failed to commit, we should notify the ticket as well

Comment on lines 861 to 897
if seq < ticket_seq {
debug!("Notifying during retry, current low watermark {:?}, ticket_seq {:?}, seq {:?}", self.batch_notifier.low_watermark(), ticket_seq, seq);
self.batch_notifier.notify(seq);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the notifier_ticket.notify(); in fn commit_certificate does not work, is it because it only notifies the ticket_seq but not the old one (seq) ?
Is it better to move the notify-old-seq logic to the end of fn commit_certificate where notifier_ticket.notify(); happens?

https://github.com/MystenLabs/sui/blob/qd-retry-locked-objs/crates/sui-core/src/authority.rs#L2127

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@longbowlu lmk if you strongly feel about moving this check in commit_certificate but i like the fact that all tricky batch notifier checks are in one place here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no strong opinions

@@ -851,10 +851,19 @@ impl AuthorityState {
} else {
error!(?digest, "commit_certificate failed: {}", err);
Copy link
Collaborator

@longbowlu longbowlu Nov 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we log seq here

error!(?digest, seq?=ticket_seq, "commit_certificate failed: {}", err);

same thing for the above debug!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, done

Comment on lines 854 to 856
debug!(
"Failed to notify ticket with sequence number: {}",
ticket_seq
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
debug!(
"Failed to notify ticket with sequence number: {}",
ticket_seq
);
debug!(
seq=?ticket_seq,
"Ticket not notified due to commit failure",
);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@lxfind lxfind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The probability is probably low, but you could have the following scenario:

  1. The first run failed, so we have a ticket that's not released
  2. The second run failed again, and we have another ticket that's not released.
  3. The third time succeeded, but it will only unlock the first one, not the second.

@sadhansood sadhansood force-pushed the sadhan/fix_notifier branch 2 times, most recently from 3036428 to 1d8f870 Compare November 4, 2022 22:06
@sadhansood
Copy link
Contributor Author

The probability is probably low, but you could have the following scenario:

  1. The first run failed, so we have a ticket that's not released
  2. The second run failed again, and we have another ticket that's not released.
  3. The third time succeeded, but it will only unlock the first one, not the second.

Good point! I think I handled your comments @lxfind, lmk what you think

"Ticket not notified due to commit failure",
);
// Check if we were able to sequence the tx at all
if let Some(tx_seq) = self.db().get_tx_sequence(*certificate.digest()).await? {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am probably being picky, but this line could return Err as well..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with this going in though, as the chance should be really really really low. We need to start somewhere and this is a big improvement already

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree, i thought of it too and wanted to add a few retries there to handle db read failing. Let me do it in a follow up PR, and we can let this go for now

Copy link
Contributor

@lxfind lxfind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a TODO

@sadhansood sadhansood merged commit 8d105bb into main Nov 4, 2022
@sadhansood sadhansood deleted the sadhan/fix_notifier branch November 4, 2022 22:54
sadhansood added a commit that referenced this pull request Nov 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants