Skip to content

PohRecorder: SharedWorkingBank#7280

Merged
apfitzge merged 10 commits intoanza-xyz:masterfrom
apfitzge:arc-swap-leader-bank
Aug 5, 2025
Merged

PohRecorder: SharedWorkingBank#7280
apfitzge merged 10 commits intoanza-xyz:masterfrom
apfitzge:arc-swap-leader-bank

Conversation

@apfitzge
Copy link
Copy Markdown

@apfitzge apfitzge commented Aug 1, 2025

Problem

  • LeaderBankNotifier has an inner lock that is seeing contention
  • We don't need to lock poh recorder to get the working bank

Summary of Changes

  • Add SharedWorkingBank to PohRecorder
    • arc-swappable bank controlled by the recorder
    • can be shared on other threads, in the wrapper that only gives read-access
  • Delete LeaderBankNotifier

Fixes #

@apfitzge apfitzge changed the title add poh arc-swap dependency PohRecorder: SharedWorkingBank Aug 1, 2025
.upgrade()
/// Get the current poh working bank with a timeout.
fn get_working_bank_with_timeout(&self) -> Option<Arc<Bank>> {
const TIMEOUT: Duration = Duration::from_millis(50);
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kept the timeout the same in this PR. I think we may want to remove this in the near future.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for my own understanding since this is the first thing that jumped out: when/why
did we need such a long timeout?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

originally this was here because txs can (still can unless #7170 lands) be "leaked" over to the next bank.
Bank::new_from_parent can take unexpectedly long times during some cases.

I think if #7170 lands, we can just remove any timeout - either we have the bank or we don't.

@apfitzge
Copy link
Copy Markdown
Author

apfitzge commented Aug 1, 2025

@alessandrod once merged we can use this in DecisionMaker to remove the poh lock optimistically when we are leader.

We'll need some additional changes to remove the lock to get time until leader when we're not leader. I think that's also quite straight-forward though!

Comment thread poh/src/poh_recorder.rs
Comment on lines 178 to +180
tick_cache: Vec<(Entry, u64)>, // cache of entry and its tick_height
working_bank: Option<WorkingBank>,
working_bank_sender: Sender<WorkingBankEntry>,
shared_working_bank: SharedWorkingBank,
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split this from working_bank because of unified-scheduler type messyness. Would have had like 3 or 4 levels of Arc indirection and would have required splitting the bank from the other working bank meta anyway.

This was the easier path.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems fine, but maybe add a comment explaining that working_bank and
shared_working_bank are the same thing? Just in the (unlikely) case someone goes
and reworks working bank and doesn't notice the shared one or something

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Aug 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.8%. Comparing base (87769e8) to head (84a3616).
⚠️ Report is 41 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master    #7280     +/-   ##
=========================================
- Coverage    82.8%    82.8%   -0.1%     
=========================================
  Files         803      802      -1     
  Lines      364439   364255    -184     
=========================================
- Hits       302080   301870    -210     
- Misses      62359    62385     +26     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@apfitzge apfitzge marked this pull request as ready for review August 2, 2025 00:58
alessandrod
alessandrod previously approved these changes Aug 5, 2025
Copy link
Copy Markdown

@alessandrod alessandrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just a couple nits

.upgrade()
/// Get the current poh working bank with a timeout.
fn get_working_bank_with_timeout(&self) -> Option<Arc<Bank>> {
const TIMEOUT: Duration = Duration::from_millis(50);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for my own understanding since this is the first thing that jumped out: when/why
did we need such a long timeout?

Comment thread poh/src/poh_recorder.rs Outdated
tick_cache: vec![],
working_bank: None,
working_bank_sender,
shared_working_bank: SharedWorkingBank::default(),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd rather have this be SharedWorkingBank::empty() or something

IMO default is an anti pattern when the default isn't obvious, and in this case
it isn't clear what a default SharedWorkingBank is

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.get_or_wait_for_in_progress(Duration::from_millis(50))
.upgrade()
/// Get the current poh working bank with a timeout.
fn get_working_bank_with_timeout(&self) -> Option<Arc<Bank>> {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we must kill the get_ prefixes at some point (thing(), set_thing() is
idiomatic rust)

Copy link
Copy Markdown
Author

@apfitzge apfitzge Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread poh/src/poh_recorder.rs
Comment on lines 178 to +180
tick_cache: Vec<(Entry, u64)>, // cache of entry and its tick_height
working_bank: Option<WorkingBank>,
working_bank_sender: Sender<WorkingBankEntry>,
shared_working_bank: SharedWorkingBank,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems fine, but maybe add a comment explaining that working_bank and
shared_working_bank are the same thing? Just in the (unlikely) case someone goes
and reworks working bank and doesn't notice the shared one or something

@apfitzge apfitzge merged commit dc0f51a into anza-xyz:master Aug 5, 2025
52 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants