RFC | Don’t apply throttling unless a specific load threshold is reached. #9493
Closed
stablebits wants to merge 5 commits into anza-xyz:master
The approach taken here is simply not to apply any throttling until a configurable load threshold has been reached. For the v3.1.x fix, it is probably safer to make a minimal change that does not (yet) remove the current throttling entirely. For v4.0.0, we can replace it with a better mechanism.
Simulations with the current EMA load mechanism (stream_throttle.rs) show that staked connections with very low stake (e.g., ~0.01% of total stake) receive streams-per-100ms quotas that are similar to unstaked connections even in no-load scenarios.
Example:
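| step | load_in_5ms | ema | quota_0.01% | quota_0.1% | quota_1% |
|------|-------------|-----|-------------|------------|----------|
| 0    | 0           | 0   | 21          | 160        | 1600     |
| 1    | 3000        | 544 | 21          | 160        | 1600     |
| 2    | 1000        | 626 | 21          | 160        | 1600     |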
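The `ema` column above can be reproduced with a small fixed-point EMA. This is a minimal sketch, not the code from stream_throttle.rs: the function names and both constants (a 10-sample window and a 1024 fixed-point multiplier) are assumptions, chosen because a smoothing factor of 2/(N+1) with N = 10 matches the values in the table.

```rust
/// Number of 5 ms samples the EMA spans (assumed value).
const EMA_INTERVAL_COUNT: u128 = 10;
/// Fixed-point scale used to avoid floating-point math (assumed value).
const EMA_MULTIPLIER: u128 = 1024;

/// One EMA step: new_ema = load * alpha + ema * (1 - alpha),
/// with alpha = 2 / (EMA_INTERVAL_COUNT + 1) expressed in fixed point.
fn ema_step(current_ema: u128, recent_load: u128) -> u128 {
    // Integer division gives alpha = 186 out of 1024.
    let alpha = 2 * EMA_MULTIPLIER / (EMA_INTERVAL_COUNT + 1);
    (recent_load * alpha + current_ema * (EMA_MULTIPLIER - alpha)) / EMA_MULTIPLIER
}

/// Feeds a sequence of per-interval loads through the EMA, starting from 0,
/// and returns the EMA value after each step.
fn ema_series(loads: &[u128]) -> Vec<u128> {
    let mut ema = 0;
    loads
        .iter()
        .map(|&load| {
            ema = ema_step(ema, load);
            ema
        })
        .collect()
}
```

Calling `ema_series(&[0, 3000, 1000])` yields `[0, 544, 626]`, the same sequence as the `ema` column. The quota columns are not modeled here.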
Data collected on mds1 (over a few leader slots) also showed these low-stake connections being throttled:
[2025-12-04T22:56:59.929547468Z ERROR solana_streamer::nonblocking::stream_throttle] Throttling tpu stream from 3.66.188.50:8016, peer type: Staked(30314578869242), current_load: 11, total_stake: 415746706271632896, max_streams_per_interval: 28, read_interval_streams: 28 throttle_duration: 99.948899ms
In all observed cases, the effective load was close to zero (3–25 streams per 5ms), while the affected connections had quotas of 28–64 streams per 100ms and stakes of ~0.007–0.016% of total stake.
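As a quick cross-check, the stake share of the peer in the log line above can be computed from its `Staked(...)` and `total_stake` fields. The helper name below is hypothetical and the floating-point conversion is only for illustration.

```rust
/// Stake share of a peer as a percentage of total stake.
/// Uses f64, which is precise enough for a rough percentage.
fn stake_share_percent(peer_stake: u64, total_stake: u64) -> f64 {
    peer_stake as f64 / total_stake as f64 * 100.0
}
```

For the logged values, `stake_share_percent(30_314_578_869_242, 415_746_706_271_632_896)` comes out at roughly 0.0073%, which sits at the low end of the reported range.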
With the change in this PR, mds1 has been running for multiple days without any staked connections being throttled.
(ema-window) to be properly used in available_throttled_load_capacity(). The old implementation was giving quotas incorrectly (larger than expected). Set the no-throttling threshold to 90%.
alexpyattaev
left a comment
I think we can make the diff smaller.
    // Unstaked nodes must contribute to the EMA load for this threshold to be meaningful.
    // See increment_load().
    const UNSTAKED_STREAM_THROTTLING_LOAD_THRESHOLD_PERCENT: u64 = 0;
I think we should not introduce a const if it does nothing. We want to keep the diff minimal for the backport.
Author
Reworked approach is #9580; closing this one.
Problem
Staked connections are being throttled on unloaded systems.
Summary of Changes
For the v3.1.x fix, it is probably safer to make a minimal change that does not (yet) remove the current throttling entirely. For v4.0.0, we'll replace it with a better mechanism.
The approach taken here is simply not to apply any throttling until a configurable load threshold has been reached.
Open Questions
Details
Simulations with the current EMA load mechanism (stream_throttle.rs) show that staked connections with very low stake (e.g., ~0.01% of total stake) receive streams-per-100ms quotas that are similar to unstaked connections even in no-load scenarios.
Data collected on mds1 (over a few leader slots) also showed these low-stake connections being throttled:
In all observed cases, the effective load was close to zero (3–25 streams per 5ms), while the affected connections had quotas of 28–64 streams per 100ms and stakes of ~0.007–0.016% of total stake.
With the change in this PR, mds1 has been running for multiple days (since Dec 6) without any staked connections being throttled.
Simulation of the new approach:
Note that throttling is applied after the load is above 10K (50% of 20K-per-5ms => 500K TPS target configuration in this case).
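The gating described in this note can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: the function names, the `Option` return convention, and the strict "above the threshold" comparison are all hypothetical, and the stake-based quota is passed in rather than computed.

```rust
/// Load above which throttling starts, as a percentage of capacity.
/// Both arguments are in streams per measurement interval (assumed unit).
fn throttling_threshold(capacity: u64, threshold_percent: u64) -> u64 {
    capacity.saturating_mul(threshold_percent) / 100
}

/// Returns the stream quota to enforce, or `None` while the EMA load is at or
/// below the threshold, meaning no throttling is applied at all.
fn effective_quota(
    ema_load: u64,
    capacity: u64,
    threshold_percent: u64,
    stake_quota: u64,
) -> Option<u64> {
    if ema_load > throttling_threshold(capacity, threshold_percent) {
        // Above the threshold: fall back to the existing stake-based quota.
        Some(stake_quota)
    } else {
        // Below the threshold: the connection is not throttled.
        None
    }
}
```

With a capacity of 20,000 and a 50% threshold, `throttling_threshold` returns 10,000, so `effective_quota(626, 20_000, 50, 21)` returns `None`, while a load of 10,001 returns `Some(21)`.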