Use poh grace ticks when new reset bank is pending #794
mergify[bot] merged 3 commits into anza-xyz:master
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc. that is not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case-by-case basis.
I would like to backport to v1.18 because I believe it will have a notable impact on skip rate, and I would like to see it rolled out on testnet before mainnet.
Force-pushed from 275fa30 to 43df692
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #794    +/-   ##
=======================================
  Coverage    81.9%   81.9%
=======================================
  Files         851     851
  Lines      231504  231542    +38
=======================================
+ Hits       189731  189796    +65
+ Misses      41773   41746    -27
This is an interesting PR; it implements an idea similar to the yielding to slow leaders that @AshwinSekar and I had been discussing in previous weeks. I'm wondering how we can experimentally verify that this causes fewer forks. Can we
We then compare the metric from 1 between validators with 2 turned on and off: how many got their slots included versus not included in the main fork.
@carllin I added a metric which reports the leader slot if we detect a pending fork and whether or not we decided to yield for some grace ticks. We can then check whether those reported leader slots were confirmed or not.
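One way to run the check described in this comment is to tally confirmed slots separately for the cases where the validator yielded and where it did not. The following is a minimal sketch under stated assumptions: the `PendingForkReport` struct and the `confirmed_counts` helper are hypothetical names introduced for illustration, and the `confirmed` field is assumed to have been filled in separately for each reported leader slot. Nothing here is taken from the PR's code.

```rust
/// Hypothetical record: one pending-fork metric report joined with the
/// eventual outcome of the reported leader slot (assumed to be looked up
/// separately, for example from confirmed block data).
#[derive(Debug, Clone, Copy)]
pub struct PendingForkReport {
    /// Leader slot reported by the metric.
    pub leader_slot: u64,
    /// Whether the validator yielded for grace ticks.
    pub yielded: bool,
    /// Whether the leader slot ended up confirmed.
    pub confirmed: bool,
}

/// Returns `(confirmed, total)` counts over the reports whose `yielded`
/// flag matches the requested value.
pub fn confirmed_counts(reports: &[PendingForkReport], yielded: bool) -> (usize, usize) {
    let mut confirmed = 0;
    let mut total = 0;
    for report in reports.iter().filter(|r| r.yielded == yielded) {
        total += 1;
        if report.confirmed {
            confirmed += 1;
        }
    }
    (confirmed, total)
}
```

Comparing `confirmed_counts(&reports, true)` with `confirmed_counts(&reports, false)` gives the confirmation ratio for each group, which is the comparison suggested earlier in the thread.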
Force-pushed from d49e98c to 971380e
If we're experimenting, can we make the flag hidden, at least for the backport to 1.18? It seems a couple of staked validators running against mainnet would suffice to test this out.
@AshwinSekar made it hidden, can I get another approval?
* Use poh grace ticks when new reset bank is pending
* feedback
* make it hidden
(cherry picked from commit 1c1b4c3)
# Conflicts:
#   core/src/validator.rs
#   local-cluster/src/validator_configs.rs
#   validator/src/main.rs
* Use poh grace ticks when new reset bank is pending
* feedback
* make it hidden
Problem
I've observed that many skipped slots occur when a leader "A" tries to skip the previous leader "B" when producing its blocks, even though leader B had already started broadcasting shreds for its blocks. In this case, there's a race between the forks of leaders A and B, and it's almost certain that leader B's fork will be confirmed, because its block will likely be finished and replayed by the cluster before leader A's block, which has only just started being produced.
Summary of Changes
Currently, when leader A skips all of leader B's blocks, it doesn't use grace ticks when deciding when to start building its block. After this PR, when a leader skips all of the previous leader's blocks and has ticked to its leader slot (without grace ticks), it will first check whether it has received any shreds for a potential new reset bank. If it has received some shreds, it will apply grace ticks, waiting a bit longer to allow time for the bank corresponding to those shreds to be frozen (or marked dead). If it hasn't received any shreds, it goes ahead with producing its block without waiting for grace ticks, as before.
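The decision described above can be sketched as a small pure function. This is an illustrative sketch only, not the PR's actual `PohRecorder` code: the `LeaderStart` enum, the `decide_leader_start` function, and all of its parameter names are hypothetical, and the sketch assumes the leader is already skipping all of the previous leader's blocks.

```rust
/// Hypothetical outcome of the leader-start check (not the real PohRecorder API).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum LeaderStart {
    /// PoH has not yet ticked to the first tick of our leader slot.
    NotReached,
    /// Shreds for a potential new reset bank were seen: keep waiting during grace ticks.
    YieldForGraceTicks,
    /// Start producing our block now.
    Start,
}

/// Decide what a leader that skips all of the previous leader's blocks should do.
///
/// All parameters are illustrative assumptions:
/// * `tick_height`: current PoH tick height.
/// * `leader_first_tick_height`: first tick of our leader slot, without grace ticks.
/// * `grace_ticks`: extra ticks to wait when a fork is pending.
/// * `delay_for_pending_fork`: whether the opt-in behavior is enabled.
/// * `has_pending_shreds`: whether any shreds for a potential new reset bank were received.
pub fn decide_leader_start(
    tick_height: u64,
    leader_first_tick_height: u64,
    grace_ticks: u64,
    delay_for_pending_fork: bool,
    has_pending_shreds: bool,
) -> LeaderStart {
    if tick_height < leader_first_tick_height {
        return LeaderStart::NotReached;
    }
    // The grace window covers the ticks in [first_tick, first_tick + grace_ticks).
    let within_grace = tick_height < leader_first_tick_height.saturating_add(grace_ticks);
    if delay_for_pending_fork && has_pending_shreds && within_grace {
        LeaderStart::YieldForGraceTicks
    } else {
        // No pending fork, feature disabled, or grace ticks exhausted.
        LeaderStart::Start
    }
}
```

In this sketch the leader starts immediately when no shreds have been seen, which matches the "as before" behavior, and it stops yielding once the grace ticks have elapsed even if the pending bank never froze.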
* Adds the flag --delay-leader-block-for-pending-fork, which allows validators to opt into this new behavior.
* Adds the metric poh_recorder-detected_pending_fork, which reports when a pending fork was detected and whether or not the validator yielded.
Fixes #