Remove pubkey from DecisionMaker#7077
Conversation
0b103d3 to
3e77501
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7077 +/- ##
=======================================
Coverage 83.2% 83.2%
=======================================
Files 854 854
Lines 374686 374613 -73
=======================================
- Hits 312083 312025 -58
+ Misses 62603 62588 -15 🚀 New features to boost your workflow:
|
bw-solana
left a comment
There was a problem hiding this comment.
LGTM, but I'm too stupid to understand why we ever needed the old code.. so want to make sure I'm not missing something
| @@ -101,17 +94,9 @@ impl DecisionMaker { | |||
| // Node will be leader within ~20 slots, hold the transactions in | |||
| // case it is the only node which produces an accepted slot. | |||
| BufferedPacketsDecision::ForwardAndHold | |||
There was a problem hiding this comment.
I'm not understanding the case where we would even hit the Hold case for the old logic.
- If we didn't know the leader (because we don't have the leader schedule), but in reality we generate the schedule so far in advance relative to how far ahead we look that this seems impossible outside of weird, contrived test cases.
- If we somehow don't think we will be leader within 20 slots based on ticks but also think we will be leader in 2 slots based on the leader schedule. Is there even a corner case where this could happen?
There was a problem hiding this comment.
Want to be clear, that we hit Hold very often. But there are 3 different cases in which the old logic could have resulted in Hold:
would_be_leader_shortly_fnreturns true:- This is the one we hit a lot.
- We are within 2 slots of being leader
- This also includes when we are currently leader, but don't yet a Bank to operate on.
leader_pubkey_fnreturnsSome(x)wherexIS the nodes ID pubkey (at start up)- We search 2 slots into the future in cached leader schedule. If it IS us, then we return
Hold. BUT this can only happen if our POH has an inconsistent view of who is leader in slot N+2 from the leader-schedule cache. - In practice the
ifdoesn't even work. Many operators have failover setup so that when they are starting up they are in a non-voting mode with a dummy key...since we grab it at start-up, we're comparing against the wrong key in this case anyway.
- We search 2 slots into the future in cached leader schedule. If it IS us, then we return
leader_pubkey_fnreturnsNone- this shouldn't ever happen because we generate leader schedules ahead of time and only look a small offset into the future (2 slots)
There was a problem hiding this comment.
yeah, sorry, 1 makes perfect sense. I was referring to the cases we're removing (2 and 3)
There was a problem hiding this comment.
Right, and to summarize my comment - 2 and 3 should NOT happen unless there are other bugs, to the best of my knowledge.
tao-stones
left a comment
There was a problem hiding this comment.
looks good, thank for the explanations.
|
This is non-urgent, gonna wait for @jstarry's review since he is good at finding edge-cases in my understanding of poh. |
Problem
DecisionMakercurrently takes in aPubkeywhich should be the ID of the nodeDecisionMakerwould likely failDecisionMakerlooks very shortly into the future, 2 slots. We should not be in a situation where there is not anEpochSchedulein the cache for that slotSummary of Changes
DecisionMakerFixes #