
Updates static assertions for CLUSTER_NODES_CACHES_NUM_EPOCH_CAP #8599

Merged

brooksprumo merged 1 commit into anza-xyz:master from brooksprumo:leader-schedule-stakes/network on Oct 22, 2025

Conversation

@brooksprumo

Problem

We'd like to reduce the number of epochs in MAX_LEADER_SCHEDULE_STAKES, but this constant is required to be a specific value by CLUSTER_NODES_CACHES_NUM_EPOCH_CAP in retransmit/broadcast.

(Why do we want to reduce the number of epochs in MAX_LEADER_SCHEDULE_STAKES? Good question! This is because the epoch stakes cache is serialized into the snapshot, and it is by far the largest component, both in size and in serialization time. We also don't need all of the epochs.

Note that MAX_LEADER_SCHEDULE_STAKES = 5 was arbitrarily chosen. Here's its origin: solana-labs#7668 (comment).

Also note that we used to actually store 6 epochs in the cache. That was changed to 5 in #8584.)

Summary of Changes

Update the static assertions for CLUSTER_NODES_CACHES_NUM_EPOCH_CAP so they no longer require a specific value for MAX_LEADER_SCHEDULE_STAKES; instead, assert the range in which it is valid.
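For illustration, a minimal sketch of what such a range-based compile-time assertion can look like in Rust; the constant names mirror the ones discussed here, but the values and placement are placeholders rather than the actual agave code:

// Illustrative placeholders; the real constants live in the agave runtime and
// turbine crates and may differ in value and type.
const MAX_LEADER_SCHEDULE_STAKES: u64 = 5;
const CLUSTER_NODES_CACHE_NUM_EPOCH_CAP: u64 = 2;

// Compile-time range check instead of pinning MAX_LEADER_SCHEDULE_STAKES to one value:
// the cache must hold at least 2 epochs (shreds can arrive from either side of an epoch
// boundary) and no more than the number of epoch-stakes the Bank retains.
const _: () = {
    assert!(CLUSTER_NODES_CACHE_NUM_EPOCH_CAP >= 2);
    assert!(CLUSTER_NODES_CACHE_NUM_EPOCH_CAP <= MAX_LEADER_SCHEDULE_STAKES);
};

fn main() {}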

@brooksprumo brooksprumo self-assigned this Oct 21, 2025
// 1. There must be at least two epochs because near an epoch boundary you might receive
// shreds from the other side of the epoch boundary.
// 2. It does not make sense to have capacity more than the number of epoch-stakes in Bank.
assert!(CLUSTER_NODES_CACHE_NUM_EPOCH_CAP >= 2);
@brooksprumo (Author)

Is 2 sufficient here? The reasons cited in the linked discussion include warping, which shouldn't be an issue since all other nodes would have to warp to the same slot as well (and thus the same epoch); warping is really only used in ledger-tool and test-validator anyway.

The other reason was unknown unknowns. Should that suggest that we use a different value?

@alexpyattaev (Oct 21, 2025)

Unknown unknowns are not ideal.

Broadcast stage only operates on cluster nodes at the tip of the chain (±32 slots), so 2 epochs is 100% enough there.

From what we have in ClusterSlotsService, anything more than 50,000 slots in the past is effectively impossible to repair, so 2 epochs should always be enough for repair. (Yes, during cluster startup epochs are shorter, but then we are also unlikely to be catching up from more than 1 epoch in the past anyway.)

Gossip only operates with the latest available stake weights, and can tolerate massive divergence in stake amounts.

Overall, from a networking PoV we should never need more than 2 epochs' worth of information. If we do, it's a bug we need to fix.
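
(As a rough sanity check of those windows, here is a small illustrative Rust snippet comparing them to mainnet-beta's standard epoch length of 432,000 slots; the constants are assumptions for illustration, not imports from agave.)

// Rough sanity check of the windows mentioned above against mainnet-beta's standard
// epoch length; the numbers are illustrative assumptions, not values read from agave.
const SLOTS_PER_EPOCH: u64 = 432_000; // mainnet-beta epoch length
const REPAIR_HORIZON_SLOTS: u64 = 50_000; // practical repair horizon cited above
const BROADCAST_WINDOW_SLOTS: u64 = 32; // broadcast works at the tip, roughly +/- 32 slots

fn main() {
    // Both windows fit well inside a single epoch, so any slot broadcast or repaired is at
    // most one epoch boundary away from the tip, i.e. within 2 cached epochs.
    assert!(REPAIR_HORIZON_SLOTS < SLOTS_PER_EPOCH);
    assert!(BROADCAST_WINDOW_SLOTS < SLOTS_PER_EPOCH);
    let fraction = REPAIR_HORIZON_SLOTS as f64 / SLOTS_PER_EPOCH as f64;
    println!("repair horizon is {:.1}% of one epoch", fraction * 100.0);
}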


Generally speaking, 2 seems sufficient.

I could see some corner cases when we shrink epochs down to the minimum number of slots, especially during the bootstrap period. But my understanding is that this shouldn't fail; it would just thrash the LRU cache.

Might be nice to add some metrics in this area to get visibility into thrashing (doesn't have to be this PR).
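
(One possible shape for such a metric, sketched as a plain atomic eviction counter; the names and the reporting hook are hypothetical, not agave's actual metrics plumbing.)

use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical counter for LRU evictions in the cluster-nodes cache; bump it wherever an
// entry is evicted and report it periodically so thrashing shows up in dashboards.
static CLUSTER_NODES_CACHE_EVICTIONS: AtomicU64 = AtomicU64::new(0);

fn record_eviction() {
    CLUSTER_NODES_CACHE_EVICTIONS.fetch_add(1, Ordering::Relaxed);
}

fn report_and_reset() -> u64 {
    // Read-and-reset so each report covers one interval.
    CLUSTER_NODES_CACHE_EVICTIONS.swap(0, Ordering::Relaxed)
}

fn main() {
    record_eviction();
    println!("evictions this interval: {}", report_and_reset());
}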

@brooksprumo brooksprumo marked this pull request as ready for review October 21, 2025 13:26
@brooksprumo brooksprumo requested a review from a team as a code owner October 21, 2025 13:26
@brooksprumo (Author)

@bw-solana - requesting your review since you reviewed the PR that I grabbed the comments/values from: #1735
@gregcusack - requesting your review as SME for networking

Please request reviews from other people as needed. Thanks!

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.1%. Comparing base (e6661f4) to head (950473a).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #8599   +/-   ##
=======================================
  Coverage    83.1%    83.1%           
=======================================
  Files         850      850           
  Lines      369043   369043           
=======================================
+ Hits       306926   306959   +33     
+ Misses      62117    62084   -33     

@alexpyattaev

From the network protocol PoV we should never need more than 2 epochs of ClusterNodes (current + upcoming).
Can you elaborate how the epoch stakes cache ends up occupying a lot of space? It seems like we are storing 1 hashmap of ~2000 entries per epoch, no?

@brooksprumo (Author)

> Can you elaborate how the epoch stakes cache ends up occupying a lot of space?

My understanding is that each epoch of the epoch stakes cache contains the pubkey and delegation from every single stake account.

There's about 1.175 million stake accounts on mnb, and the epoch stakes cache holds 6 epochs in v2.3 and v3.0. So over 7 million entries in each snapshot.
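
(A back-of-envelope size estimate for that, as a small Rust snippet; the per-entry size is an assumption of a 32-byte pubkey plus an 8-byte delegation amount, not the exact serialized layout.)

fn main() {
    let stake_accounts: u64 = 1_175_000; // ~1.175M stake accounts on mainnet-beta
    let epochs: u64 = 6; // epochs kept in v2.3 / v3.0 snapshots
    let bytes_per_entry: u64 = 32 + 8; // assumed: pubkey + delegation amount

    let entries = stake_accounts * epochs;
    let approx_mib = entries * bytes_per_entry / (1024 * 1024);
    // ~7.05 million entries and on the order of ~270 MiB, before any other per-entry overhead.
    println!("{} entries, roughly {} MiB", entries, approx_mib);
}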

@alexpyattaev

> Can you elaborate how the epoch stakes cache ends up occupying a lot of space?
>
> My understanding is that each epoch of the epoch stakes cache contains the pubkey and delegation from every single stake account.
>
> There's about 1.175 million stake accounts on mnb, and the epoch stakes cache holds 6 epochs in v2.3 and v3.0. So over 7 million entries in each snapshot.

This sounds very excessive. Either way, storing 6 epochs should be overkill; I'd be surprised if we need more than 2, but we can be conservative and go down to 3. Either way, this needs to be merged first =)

@alexpyattaev left a comment

LGTM

@brooksprumo (Author)

> This sounds very excessive.

Oh yes 😸

> Either way, storing 6 epochs should be overkill; I'd be surprised if we need more than 2, but we can be conservative and go down to 3. Either way, this needs to be merged first =)

🤝

@brooksprumo brooksprumo added this pull request to the merge queue Oct 22, 2025
Merged via the queue into anza-xyz:master with commit 6f91f67 Oct 22, 2025
43 checks passed
@brooksprumo brooksprumo deleted the leader-schedule-stakes/network branch October 22, 2025 13:55