Updates static assertions for CLUSTER_NODES_CACHES_NUM_EPOCH_CAP #8599
Conversation
```rust
// 1. There must be at least two epochs because near an epoch boundary you might receive
//    shreds from the other side of the epoch boundary.
// 2. It does not make sense to have capacity more than the number of epoch-stakes in Bank.
assert!(CLUSTER_NODES_CACHE_NUM_EPOCH_CAP >= 2);
```
Is 2 sufficient here? The reasons cited in the linked discussion include warping, which shouldn't be an issue since all other nodes would have to warp to the same slot (and thus the same epoch) as well; warping is really only used in ledger-tool and test-validator anyway.
The other reason was unknown unknowns. Should that suggest we use a different value?
Unknown unknowns are not ideal.
Broadcast stage only operates on cluster nodes at the tip of the chain (±32 slots), so 2 epochs is 100% enough there.
From what we have in ClusterSlotsService, anything > 50000 slots in the past is effectively impossible to repair, and thus 2 epochs should always be enough for repair. (Yes, during cluster startup epochs are shorter, but in those cases we are also unlikely to be catching up from more than 1 epoch in the past anyway.)
Gossip only operates with the latest available stake weights, and can tolerate massive divergence in stake amounts.
Overall, from a networking PoV we should never need more than 2 epochs' worth of information. If we do, it's a bug we need to fix.
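As a quick sanity check on the repair-horizon reasoning, assuming mainnet's default epoch length of 432,000 slots (DEFAULT_SLOTS_PER_EPOCH in the SDK); the 50,000-slot figure is the one cited above:

```rust
// Assumption: DEFAULT_SLOTS_PER_EPOCH = 432_000 (mainnet default epoch length).
// The 50_000-slot repair horizon is the figure quoted in the comment above.
const DEFAULT_SLOTS_PER_EPOCH: u64 = 432_000;
const REPAIR_HORIZON_SLOTS: u64 = 50_000;

fn main() {
    // ~0.12 epochs: the repair horizon fits comfortably inside a single epoch,
    // so caching the current epoch plus one adjacent epoch covers it.
    println!(
        "repair horizon = {:.2} epochs",
        REPAIR_HORIZON_SLOTS as f64 / DEFAULT_SLOTS_PER_EPOCH as f64
    );
}
```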
Generally speaking, 2 seems sufficient.
I could see some corner cases when we shrink epochs down to the minimum number of slots, especially during the bootstrap period. But my understanding is this shouldn't fail; it would just thrash the LRU cache.
Might be nice to add some metrics in this area to get visibility into thrashing (doesn't have to be in this PR).
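A minimal sketch of what such a thrashing metric could look like (hypothetical, not part of this PR; the counter name and the place it would be hooked in are made up):

```rust
// Hypothetical sketch: count cluster-nodes cache evictions so that LRU thrashing
// (e.g. during short bootstrap epochs) becomes visible in metrics.
use std::sync::atomic::{AtomicU64, Ordering};

pub static CLUSTER_NODES_CACHE_EVICTIONS: AtomicU64 = AtomicU64::new(0);

// Called wherever the cache drops an epoch entry to make room for a new one.
pub fn record_eviction() {
    CLUSTER_NODES_CACHE_EVICTIONS.fetch_add(1, Ordering::Relaxed);
}

// Read (and reset) periodically by whatever submits the datapoints.
pub fn take_eviction_count() -> u64 {
    CLUSTER_NODES_CACHE_EVICTIONS.swap(0, Ordering::Relaxed)
}
```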
@bw-solana - requesting your review since you reviewed the PR that I grabbed the comments/values from: #1735. Please request reviews from other people as needed. Thanks!
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff            @@
##           master    #8599    +/-   ##
=========================================
  Coverage     83.1%    83.1%
=========================================
  Files          850      850
  Lines       369043   369043
=========================================
+ Hits        306926   306959       +33
+ Misses       62117    62084       -33
```
From the network protocol PoV we should never need >2 epochs of ClusterNodes (current + upcoming).
My understanding is that each epoch of the epoch stakes cache contains the pubkey and delegation from every single stake account. There are about 1.175 million stake accounts on mnb, and the epoch stakes cache holds 6 epochs in v2.3 and v3.0. So over 7 million entries in each snapshot.
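A back-of-the-envelope check of those numbers (both figures are taken directly from the comment above):

```rust
// Figures from the comment above: ~1.175 million stake accounts on mainnet-beta,
// and 6 epochs held in the epoch stakes cache on v2.3 / v3.0.
const STAKE_ACCOUNTS_ON_MNB: u64 = 1_175_000;
const EPOCHS_IN_CACHE: u64 = 6;

fn main() {
    // ~7.05 million (pubkey, delegation) entries serialized into each snapshot.
    println!("{} entries", STAKE_ACCOUNTS_ON_MNB * EPOCHS_IN_CACHE);
}
```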
This sounds very excessive. Either way, storing 6 epochs should be overkill; I'd be surprised if we need more than 2, but we can be conservative and go down to 3. Either way, this needs to be merged before =)
Oh yes 😸
🤝
Problem
We'd like to reduce the number of epochs in MAX_LEADER_SCHEDULE_STAKES, but the static assertions on CLUSTER_NODES_CACHES_NUM_EPOCH_CAP in retransmit/broadcast currently require this constant to be a specific value.

(Why do we want to reduce the number of epochs in MAX_LEADER_SCHEDULE_STAKES? Good question! This is because the epoch stakes cache is serialized into the snapshot, and is by far the largest component, both in size and time. We don't need all the epochs either.

Note that MAX_LEADER_SCHEDULE_STAKES = 5 was arbitrarily chosen. Here's its origin: solana-labs#7668 (comment). Also note that we used to actually store 6 epochs in the cache. That was changed to 5 in #8584.)
Summary of Changes
Update the static assertions for CLUSTER_NODES_CACHES_NUM_EPOCH_CAP to not require a specific value for MAX_LEADER_SCHEDULE_STAKES. Instead, assert the range in which it is valid.
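As a rough illustration of what range-based assertions like these can look like, here is a minimal, self-contained sketch with placeholder constant values; it is not the actual diff, and the real constants live in the turbine and SDK code:

```rust
// Placeholder values for illustration only; the real constants are defined elsewhere.
const MAX_LEADER_SCHEDULE_STAKES: u64 = 5;
const CLUSTER_NODES_CACHE_NUM_EPOCH_CAP: u64 = 2;

const _: () = {
    // 1. At least two epochs: near an epoch boundary a node may receive shreds
    //    from the other side of the boundary.
    assert!(CLUSTER_NODES_CACHE_NUM_EPOCH_CAP >= 2);
    // 2. No point in caching more epochs than the Bank keeps epoch-stakes for.
    assert!(CLUSTER_NODES_CACHE_NUM_EPOCH_CAP <= MAX_LEADER_SCHEDULE_STAKES);
};

fn main() {} // compiles only while both bounds hold
```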