
LeaderBankNotifier #30395

Merged: 36 commits merged into solana-labs:master on Mar 27, 2023
Conversation

@apfitzge (Contributor) commented Feb 17, 2023

Problem

Currently there's not a great way to wait for a new leader slot to start or end without busy waiting. Often a leader has multiple (4) back-to-back slots, and transactions end up hitting PohRecorderError::MaxHeightReached during the transition between bank N and bank N+1. Instead of just running a fast loop waiting for the next slot to begin, we can use a Condvar and notify other threads when the slot begins/ends.

Summary of Changes

  • New struct LeaderBankNotifier
    • Locked state machine for slot transitions that notifies on changes to InProgress or StandBy
    • Also stores a weak reference to the most recent leader bank
  • Set the InProgress state when we set the working bank in PohRecorder.
  • Set the StandBy state when the working bank is cleared in PohRecorder.

This also enables us to have less-busy threads in BankingStage - probably not going to rework the existing BankingStage, but I plan to use the functionality in scheduler "worker" threads.
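For illustration, here is a minimal sketch of what a Condvar-based notifier along these lines might look like. This is not the code added in this PR: Bank is a stand-in type, the locking is simplified, and only the method names mentioned in this conversation (set_in_progress, set_completed, wait_for_in_progress) are mirrored.

```rust
use std::sync::{Arc, Condvar, Mutex, Weak};
use std::time::Duration;

// Stand-in for the real runtime Bank type.
pub struct Bank;

// Slot state: StandBy between leader slots, InProgress while a leader bank is active.
enum Status {
    StandBy,
    InProgress(Weak<Bank>),
}

pub struct LeaderBankNotifier {
    state: Mutex<Status>,
    condvar: Condvar,
}

impl LeaderBankNotifier {
    pub fn new() -> Self {
        Self {
            state: Mutex::new(Status::StandBy),
            condvar: Condvar::new(),
        }
    }

    // Called when PohRecorder sets the working bank: keep only a weak reference and wake waiters.
    pub fn set_in_progress(&self, bank: &Arc<Bank>) {
        *self.state.lock().unwrap() = Status::InProgress(Arc::downgrade(bank));
        self.condvar.notify_all();
    }

    // Called when PohRecorder clears the working bank: return to StandBy and wake waiters.
    pub fn set_completed(&self) {
        *self.state.lock().unwrap() = Status::StandBy;
        self.condvar.notify_all();
    }

    // Block for up to `timeout` until a leader bank is in progress; return None on timeout
    // or if the bank has already been dropped.
    pub fn wait_for_in_progress(&self, timeout: Duration) -> Option<Arc<Bank>> {
        let guard = self.state.lock().unwrap();
        let (guard, result) = self
            .condvar
            .wait_timeout_while(guard, timeout, |state| matches!(state, Status::StandBy))
            .unwrap();
        if result.timed_out() {
            return None;
        }
        match &*guard {
            Status::InProgress(bank) => bank.upgrade(),
            Status::StandBy => None,
        }
    }
}
```

Worker threads can then block on wait_for_in_progress instead of spinning, and the weak reference means a waiter holding the notifier never keeps a completed bank alive.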

An example of a slot ending notification being useful is in #30396

Fixes #

@apfitzge (Contributor Author):

@ryoqun requesting an early review on this. You had a similar mechanism in your branch, which I had adapted for my own uses - using a condvar to wake threads up when they have a valid bank to commit to.

LMK what you want/need for this so we can start using a common struct for it.

@apfitzge (Contributor Author):

Thinking that an issue with the current impl is that when we are no longer the leader, we'd end up with txs waiting forever for a bank to commit into.

Probably need to allow some short timeout for the wait_for_in_progress as well - maybe wait up to 50ms before we just stop processing txs in the execution thread(s).
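As a rough sketch of that idea, building on the toy notifier sketched in the PR description above (worker_loop, process_batch, the exit flag, and the 50ms value are all illustrative, not part of this PR):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::time::Duration;

// Hypothetical execution-thread loop: wait briefly for a leader bank instead of spinning,
// and give up on this iteration if none shows up within 50ms.
fn worker_loop(notifier: Arc<LeaderBankNotifier>, exit: Arc<AtomicBool>) {
    while !exit.load(Ordering::Relaxed) {
        let Some(bank) = notifier.wait_for_in_progress(Duration::from_millis(50)) else {
            continue; // no leader bank right now; re-check the exit flag and wait again
        };
        process_batch(&bank); // placeholder for executing/committing a batch of txs
    }
}

fn process_batch(_bank: &Arc<Bank>) {
    // placeholder: execute and commit transactions against the bank
}
```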

@ryoqun (Member) commented Mar 1, 2023

> @ryoqun requesting an early review on this. You had a similar mechanism in your branch, which I had adapted for my own uses - using a condvar to wake threads up when they have a valid bank to commit to.
>
> LMK what you want/need for this so we can start using a common struct for it.

hehe, yeah, I had a similar struct called CommitStatus in my branch. Actually, I'm still not decided whether parking worker threads is the best idea during the gap of bank freezing... alternatively, signal a retry to the scheduler thread, and let the scheduler thread process incoming txs in search of a higher-prio tx while waiting for a new tpu bank.

That said, the impl itself looks good at a quick glance. As for the name, I'd prefer something which alludes to the blocking nature here (so I don't like CommitStatus either...).

@ryoqun (Member) commented Mar 1, 2023

this is just fyi, but, as for TransactionRecorder, its synchronization arrangement is too inefficient. namely, a one-shot channel isn't needed. I'm planning to fix it using this upstream change: crossbeam-rs/crossbeam#959

@apfitzge (Contributor Author) commented Mar 6, 2023

> alternatively, signal a retry to the scheduler thread, and let the scheduler thread process incoming txs in search of a higher-prio tx while waiting for a new tpu bank.

I've tried signaling for retry, and what I found was that the scheduler scheduled a bunch of work, most of which was not processed and was sent back, and then needed to be re-scheduled. That seemed inefficient, and I ended up trying a few different versions of waiting for the next bank; condvars seem the most reliable.

Not convinced it's 100% the best way, but right now it seems better than the other options I've tested and I'd like to move forward - "don't let perfect be the enemy of good" and all.
In terms of searching for a higher-prio tx while we wait, an option there is that we could potentially use skiplists instead of channels - that way we're essentially building dynamically updating queues for the threads. I have not tested this yet, however.

@apfitzge (Contributor Author) commented Mar 7, 2023

> this is just fyi, but, as for TransactionRecorder, its synchronization arrangement is too inefficient. namely, a one-shot channel isn't needed. I'm planning to fix it using this upstream change: crossbeam-rs/crossbeam#959

Could you elaborate on what you mean here? AFAICT TransactionRecorder doesn't use a oneshot right now, as it's got a loop to receive potentially multiple messages. 100% agree it's inefficient though.

Wasn't entirely clear what you meant in terms of changes with that crossbeam PR - is your desired synchronization in one of your branches?

@codecov bot commented Mar 13, 2023

Codecov Report

Merging #30395 (974cf7e) into master (5a05e9b) will increase coverage by 0.0%.
The diff coverage is 98.1%.

@@           Coverage Diff            @@
##           master   #30395    +/-   ##
========================================
  Coverage    81.5%    81.5%            
========================================
  Files         726      727     +1     
  Lines      204809   204971   +162     
========================================
+ Hits       167027   167197   +170     
+ Misses      37782    37774     -8     

@ryoqun (Member) commented Mar 15, 2023

> > this is just fyi, but, as for TransactionRecorder, its synchronization arrangement is too inefficient. namely, a one-shot channel isn't needed. I'm planning to fix it using this upstream change: crossbeam-rs/crossbeam#959
>
> Could you elaborate on what you mean here? AFAICT TransactionRecorder doesn't use a oneshot right now, as it's got a loop to receive potentially multiple messages. 100% agree it's inefficient though.

here, it's creating an unbounded() channel per batch (i.e. per transaction for my unified scheduler):

let (result_sender, result_receiver) = unbounded();
let res = self
    .record_sender
    .send(Record::new(mixin, transactions, bank_slot, result_sender));
if res.is_err() {
    // If the channel is dropped, then the validator is shutting down so return that we are hitting
    // the max tick height to stop transaction processing and flush any transactions in the pipeline.
    return Err(PohRecorderError::MaxHeightReached);
}
// Besides validator exit, this timeout should primarily be seen to affect test execution environments where the various pieces can be shutdown abruptly
let mut is_exited = false;
loop {
    let res = result_receiver.recv_timeout(Duration::from_millis(1000));

@ryoqun (Member) commented Mar 15, 2023

> Wasn't entirely clear what you meant in terms of changes with that crossbeam PR - is your desired synchronization in one of your branches?

nope, it's only in my mind and @behzadnouri's (he was faster than me to report this optimization opportunity to crossbeam: crossbeam-rs/crossbeam#861) ... lol

anyway, it's straightforward: we can remove the .recv_timeout entirely, along with the FUTEX_WAKE in .send in the quoted code from my previous comment in the worst case (it's likely to happen, considering poh's idling nature; so there is no amortization fluff applied here).

so, we can elide 2 syscalls per batch on the banking thread side and 1 syscall per batch on the poh thread side. worse, those syscalls must be serialized.... 🤮 I already confirmed this is a significant bottleneck according to my off-cpu profiling: https://github.com/solana-labs/solana/wiki/General-Debugging#perf-based-profiling

@ryoqun (Member) commented Mar 15, 2023

> The diff coverage is 98.8%.

btw, diff coverage is quite good. 💯

@apfitzge (Contributor Author) commented Mar 15, 2023

> > The diff coverage is 98.8%.
>
> btw, diff coverage is quite good. 💯

I missed testing the timeout case for both of the functions. Added tests which should bump this to 100% - and actually found a bug where, even on timeout, wait_for_in_progress still returned a bank!

@apfitzge apfitzge marked this pull request as ready for review March 15, 2023 23:19
@ryoqun (Member) commented Mar 20, 2023

> > > The diff coverage is 98.8%.
> >
> > btw, diff coverage is quite good. 💯
>
> I missed testing the timeout case for both of the functions. Added tests which should bump this to 100% - and actually found a bug where, even on timeout, wait_for_in_progress still returned a bank!

see? codecov has some value. :)

Resolved review threads (outdated): poh/src/poh_recorder.rs (6), core/src/replay_stage.rs (1)
@apfitzge (Contributor Author):

@ryoqun, I had to rebase due to some dependency audit failures unrelated to the change, which screwed up some of the commit links I replied with...

In addition to addressing your comments, I added an additional commit 42d778b to add appropriate assertions in set_completed, since it is now only called by the recorder - similar to our previous assertions for set_in_progress.
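As an illustration only (not the contents of 42d778b), such an assertion on the toy notifier sketched earlier could look like this:

```rust
// Hypothetical variant of set_completed on the toy notifier above, with the kind of
// assertion described here: the recorder should only clear a bank that was actually set.
pub fn set_completed(&self) {
    let mut state = self.state.lock().unwrap();
    assert!(
        matches!(*state, Status::InProgress(_)),
        "set_completed called while no leader bank was in progress"
    );
    *state = Status::StandBy;
    self.condvar.notify_all();
}
```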

Comment on lines 139 to 159

#[derive(Clone)]
pub struct TransactionRecorder {
    // shared by all users of PohRecorder
    pub record_sender: Sender<Record>,
    pub is_exited: Arc<AtomicBool>,
}

impl Clone for TransactionRecorder {
    fn clone(&self) -> Self {
        TransactionRecorder::new(self.record_sender.clone(), self.is_exited.clone())
    }
}

impl TransactionRecorder {
    pub fn new(record_sender: Sender<Record>, is_exited: Arc<AtomicBool>) -> Self {
        Self {
            // shared
            record_sender,
            // shared
            is_exited,
        }
@ryoqun (Member):

hmm, seems like a rebase miss... the manual Clone is revived unexpectedly...

@apfitzge (Contributor Author):

🤦 nice catch. I manually "reverted" my original derive by re-adding these manually.
That's what I get for not using git properly!

@apfitzge (Contributor Author):

Added a single commit to make your re-review simpler, which should do nothing once we squash everything on merge: b679dee
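For reference, the end state is presumably just the derived impl, since both fields are themselves Clone. A sketch (not necessarily the exact content of b679dee; the imports are assumptions, and Record is assumed to be the request type defined alongside TransactionRecorder):

```rust
use crossbeam_channel::Sender;
use std::sync::{atomic::AtomicBool, Arc};

// With both fields Clone, the derive alone is sufficient; no manual `impl Clone` is needed.
#[derive(Clone)]
pub struct TransactionRecorder {
    // shared by all users of PohRecorder
    pub record_sender: Sender<Record>,
    pub is_exited: Arc<AtomicBool>,
}
```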

ryoqun previously approved these changes Mar 25, 2023
@ryoqun (Member) left a comment:

lgtm with nits.

super hard work here. really appreciate you addressing all of my comments seriously.. :) Through that journey, I think the impl pivoting made sense.

At this point, I've thoroughly analyzed the code path and I think all of the newly added panic sources are safe.

There are a few nits still, but I think it's ok to merge as-is, unless you're as obsessed with the cleanest code possible as I am... xD

@apfitzge apfitzge requested a review from ryoqun March 25, 2023 23:47
@apfitzge (Contributor Author):

@ryoqun I pushed those 2 nits...

unless you're so obsessed with cleanest code possible like me... xD

I think you may have me pegged 😆

@ryoqun (Member) left a comment:

re-lgtm. thanks for the extra chores.

@apfitzge apfitzge merged commit a575ea2 into solana-labs:master Mar 27, 2023
@apfitzge apfitzge deleted the feature/leader_bank_status branch March 27, 2023 15:17
@ryoqun ryoqun mentioned this pull request May 8, 2023