Don't apply throttling until a configurable load threshold is reached. #9580
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff            @@
##           master    #9580     +/-   ##
=========================================
- Coverage    82.5%    82.5%    -0.1%
=========================================
  Files         844      844
  Lines      316757   316679     -78
=========================================
- Hits       261599   261499    -100
- Misses      55158    55180     +22
```
Is it possible to have a plot for quotas with the old/new method?
alexpyattaev left a comment
Can this work without hysteresis? If so, worth exploring to reduce complexity. Also a few small nits.
gregcusack left a comment
Ideally, this EMA-based estimator with a configured TPS target should be replaced by backpressure from dedup/sigver/scheduler, so that the TPU throttles/slows (prioritizing high-stake connections) based on that indicator instead of the current EMA.
However, the changes in this PR are arguably an improvement until this new mechanism is implemented.
ya i agree. if we don't make this change, then even with backpressure, low staked nodes are still going to get unfairly throttled
this PR looks good to me as long as the hysteresis discussion is resolved. (As mentioned, I think we should remove it for now).
removed hysteresis here: 1f75c55
alexpyattaev left a comment
Generally this LGTM, just some leftover mentions of hysteresis to clean up. We need to focus on making this suitable for a backport. I'd consider splitting off the changes to existing consts into a separate PR based on previous reviews (to make sure we clearly understand their impact in isolation from what we are doing here). Might be a good time to ask the backports team about how backportable this looks in general.
```rust
// EMA smoothing window to make the load signal less sensitive to short bursts at the start
// of a leader slot and only trigger throttling if saturation is sustained.
// 40 was chosen based on simulations. See ema_function().
const STREAM_LOAD_EMA_INTERVAL_COUNT: u64 = 40;
```
I believe the backports team would not appreciate us changing existing constants and backporting functional changes in the same PR.
if we don't change this number do we get any benefit? based on dmitry's numbers above, we'll still throttle the crap out of low staked nodes if we leave N=10
the ask is to split the changes, not to refrain from making them. backports are free, we can take two
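For intuition on what moving from N=10 to N=40 does, here is a minimal sketch of a standard EMA with smoothing window N (alpha = 2/(N+1)); the repo's actual ema_function() may differ in its integer arithmetic:

```rust
/// Standard EMA with smoothing window N: alpha = 2 / (N + 1).
/// With N = 10, one busy 5 ms interval moves the estimate by ~18% of the
/// burst size; with N = 40 it moves by only ~5%, so throttling requires
/// sustained saturation rather than a single spike.
const STREAM_LOAD_EMA_INTERVAL_COUNT: u64 = 40;

fn ema_update(current_ema: f64, interval_load: f64) -> f64 {
    let alpha = 2.0 / (STREAM_LOAD_EMA_INTERVAL_COUNT as f64 + 1.0);
    alpha * interval_load + (1.0 - alpha) * current_ema
}

fn main() {
    // A single 20K-stream burst into an otherwise idle estimator:
    println!("{:.0}", ema_update(0.0, 20_000.0)); // ~976 with N = 40 (vs ~3636 with N = 10)
}
```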
```rust
    max_streams_per_ms: u64,
    // These values are in streams/STREAM_LOAD_EMA_INTERVAL_MS.
    staked_throttling_on_load_threshold: u64,
    staked_throttling: AtomicBool,
```
Suggested change:

```diff
-    staked_throttling: AtomicBool,
+    staked_throttling_enabled: AtomicBool,
```
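A minimal sketch of the gate these fields imply (field names taken from the diff above; the surrounding struct and logic are assumptions, not the PR's exact code):

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

/// Hypothetical excerpt: staked streams are only throttled once the EMA
/// load crosses the configured threshold (both values in streams per
/// STREAM_LOAD_EMA_INTERVAL_MS).
struct StakedStreamLoadEma {
    current_load_ema: AtomicU64,
    staked_throttling_on_load_threshold: u64,
    staked_throttling_enabled: AtomicBool,
}

impl StakedStreamLoadEma {
    fn update_throttling_flag(&self) {
        let load = self.current_load_ema.load(Ordering::Relaxed);
        // Without hysteresis this is a single comparison; the flag exists
        // so per-stream checks stay a cheap atomic read.
        self.staked_throttling_enabled.store(
            load >= self.staked_throttling_on_load_threshold,
            Ordering::Relaxed,
        );
    }

    fn should_throttle_staked(&self) -> bool {
        self.staked_throttling_enabled.load(Ordering::Relaxed)
    }
}
```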
(force-pushed 84b550f to db2124d)
lgtm as well but need SME approval, and agree with alex that we need some insight into how backportable this is. From dmitry's tests, it looks like STREAM_LOAD_EMA_INTERVAL_COUNT = 10 doesn't have any significant benefit. But not sure we can backport STREAM_LOAD_EMA_INTERVAL_COUNT = 40.
@t-nelson or @alessandrod what are your thoughts?
have we repeated tests on mds1 and confirmed this fixes the issues we observed?
This is currently running on mds1 (has been for over a week now, with a small gap Saturday-Monday this week).
nice! Do we have updated numbers showing that low staked nodes are not getting throttled as much as they were before? Something like what Dmitry has at the top of the PR description under […]
generally not opposed. this "ema" "code" has been a bug since its inception. that the line count is reducing can only be an improvement. that it is contained and the simulation results seem to match is a bonus. try to get it in before friday though
@alessandrod @gregcusack This is mds1 running this branch: [plot]
The spikes in the middle correspond to the roll-back to the official release (security fixes). To the right of the spikes is this branch after it's been synced with master (security fixes).
Simulations with the existing EMA-based load metric (stream_throttle.rs) showed that very low-stake staked connections (~0.01% of total stake) could end up with streams-per-100ms quotas similar to unstaked connections even under near-zero load.

Data collected on mds1 (mainnet) over a few leader slots also showed low-stake connections being throttled under effectively idle conditions:

```
[2025-12-04T22:56:59.929547468Z ERROR solana_streamer::nonblocking::stream_throttle] Throttling tpu stream from 3.66.188.50:8016, peer type: Staked(30314578869242), current_load: 11, total_stake: 415746706271632896, max_streams_per_interval: 28, read_interval_streams: 28, throttle_duration: 99.948899ms
```

In observed cases, effective load was near 0 (3–25 streams per 5ms) while affected connections had quotas of 28–64 streams per 100ms and stakes of ~0.007–0.016% of total stake.

Also:
- Fix update_ema() catch-up behavior so missed slots do not re-apply the same accumulated load.
- available_load_capacity_in_throttling_duration() mixed load values in streams/5ms and streams/50ms. Replaced it with a simpler stake-only quota under load.
(force-pushed 5125629 to 017a87d)
@gregcusack reverted the […]. The […]
@gregcusack we ran tests with @alexpyattaev today to confirm that throttling is activated under load. It does.

The test: 3 clients (2 high stake and 1 low stake), each with 16 connections. RTT ≈ 50ms. Max TPS is 400K for the validator, so […]

This PR alone (without the 10->40 change; 3sec run time): throttling was activated. The ema jumping from the threshold to low and back suggests that all (or most) connections were throttled.

With N=40 (30sec run time): throttling is less frequent and TPS is higher. The ema doesn't jump to extreme lows (suggesting that not all connections were throttled).

Now, N=40 but the same 3 clients with 8 connections (30sec run time), i.e. less load overall: no throttling was observed in this case. The overall load (but mostly its spikes/fluctuations) wasn't sufficient to reach the threshold.
Yes, the tests were quite conclusive that the desired effect is achieved, i.e. staked connections can ignore throttling unless load is high enough. One correction: the ping to the server was 30ms, not 50.
I guess you meant throughput here?
Which spikes? We had consistent full-buffer traffic generation on our clients; as you recall from the other test, under CUBIC CC there are no load spikes.
TPS
The overall (all clients) TPS was ~160K for N=10 and ~300K for N=40, which is below the 400K target (and below 95% of it) on the validator side. If transaction arrivals were spread evenly over each second, that would correspond to ~800 and ~1.5K transactions per 5ms (the EMA sampling window), in which case throttling should not have been activated. The fact that it was activated (logs show the EMA reaching ~1900) suggests either a bug or that load isn't evenly distributed, creating denser intervals (spikes) where 1.9K+ transactions arrive per 5ms. My bet is the latter heh. Especially given that we saw no throttling when using 8 connections instead of 16 -- at 210K TPS.
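For reference, the arithmetic above in a trivial sketch (the 5 ms window is the EMA sampling interval discussed in this PR):

```rust
// Convert whole-second TPS into the 5 ms EMA sampling window.
const EMA_SAMPLE_MS: u64 = 5;
const INTERVALS_PER_SEC: u64 = 1_000 / EMA_SAMPLE_MS; // 200

fn tx_per_interval(tps: u64) -> u64 {
    tps / INTERVALS_PER_SEC
}

fn main() {
    assert_eq!(tx_per_interval(160_000), 800);   // N = 10 run
    assert_eq!(tx_per_interval(300_000), 1_500); // N = 40 run
    // Logs showed the EMA reaching ~1900, i.e. some 5 ms windows carried
    // well above the evenly-spread rate.
}
```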
When we used 16 connections they could not get to full speed (as they got throttled). With 8 connections per ID we generated 210K TPS on average (including the slow-start) => with 16 we could make 420K TPS easily enough (and thus go above the threshold). No bugs needed.
could this also just be how we're measuring TPS? the data dmitry has above shows connections getting throttled and then unthrottled quickly. If we measure TPS over multiple of those throttled on/off windows, we would see an average TPS lower than the expected throttle threshold.
thank you for getting these numbers! looks like a significant improvement! lgtm!
EDIT: let's get #10033 merged next
Yes, we have tested both with the #10033 changes and without them (as seen in the results listed above). #10033 is a clear improvement on what we get with this.
ya i saw that. i was just saying let's merge this one and then that one.
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc. that is not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case-by-case basis.
Don't apply throttling until a configurable load threshold is reached. (#9580) (cherry picked from commit 82836bf)

Problem
Staked connections are being throttled under virtually no load.
Mainnet observations
Data collected on `mds1` over a few leader slots showed low-stake connections being throttled under effectively idle conditions:

```
[2025-12-04T22:56:59.929547468Z ERROR solana_streamer::nonblocking::stream_throttle] Throttling tpu stream from 3.66.188.50:8016, peer type: Staked(30314578869242), current_load: 11, total_stake: 415746706271632896, max_streams_per_interval: 28, read_interval_streams: 28, throttle_duration: 99.948899ms
```

In observed cases, effective load was near 0 (3–25 streams per 5ms) while affected connections had quotas of 28–64 streams per 100ms and stakes of ~0.007–0.016% of total stake.
Simulations
Simulations with the existing EMA-based load metric (`stream_throttle.rs`) confirmed that very low-stake staked connections (~0.01% of total stake) could end up with streams-per-100ms quotas similar to unstaked connections even under near-zero load:

How to read this plot
The load estimator operates on 5ms sampling intervals. This plot shows how the current load value (ema) evolves over time for a given load per sampling interval. For example, 15K streams are processed in the first interval, 20K in the second and third intervals, and so on.
The plot also shows the quotas allocated to connections with a given stake (as a percentage of the total stake).
The load estimate is floored at 25% of the maximum expected load (20K per 50ms in this case), even when the actual load is lower. This is why the quotas do not change until ema exceeds 5K.
Note that connections with a low stake receive the same quota regardless of the load.
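A rough sketch of the floor behavior described above (constants taken from the plot's numbers; the real stream_throttle.rs quota arithmetic is more involved):

```rust
// Hypothetical illustration of the plot's behavior, not the repo's code.
const MAX_EXPECTED_LOAD: u64 = 20_000; // streams per 50 ms in the plot
const MIN_LOAD: u64 = MAX_EXPECTED_LOAD / 4; // floor: 5K

fn effective_load(ema: u64) -> u64 {
    // The estimator never reports below 25% of the expected maximum,
    // which is why quotas stay flat until the EMA exceeds 5K.
    ema.max(MIN_LOAD)
}

fn quota_per_100ms(ema: u64, stake: u64, total_stake: u64, capacity: u64) -> u64 {
    // Stake-proportional share, shrinking as load grows past the floor.
    let load = effective_load(ema);
    (capacity as u128 * stake as u128 * MIN_LOAD as u128
        / (total_stake as u128 * load as u128)) as u64
}

fn main() {
    let total = 1_000_000u64;
    // A ~0.01%-stake connection gets the same tiny quota at EMA 0 and 5K:
    assert_eq!(
        quota_per_100ms(0, 100, total, 40_000),
        quota_per_100ms(5_000, 100, total, 40_000)
    );
}
```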
Ideally, this EMA-based estimator with a configured TPS target should be replaced by backpressure from dedup/sigver/scheduler, so that the TPU throttles/slows (prioritizing high-stake connections) based on that indicator instead of the current EMA.
However, the changes in this PR are arguably an improvement until this new mechanism is implemented.
Summary of Changes
Only start throttling once the measured EMA load exceeds a configurable threshold. (An earlier revision also used hysteresis, load_on / load_off, to avoid rapid on/off toggling, but it was removed during review; see 1f75c55.)
Also:
- Fix `update_ema()` catch-up behavior so missed slots do not re-apply the same accumulated load.
- `available_load_capacity_in_throttling_duration()` mixed load values in streams/5ms and streams/50ms when calculating quotas. Replaced it with a simpler stake-only (no-load) calculation under load.
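A minimal sketch of the catch-up fix in the first bullet, assuming the integer EMA weighting alpha = 2/(N+1); the actual `update_ema()` in `stream_throttle.rs` may differ:

```rust
const N: u128 = 40; // STREAM_LOAD_EMA_INTERVAL_COUNT

fn ema_step(ema: u64, interval_load: u64) -> u64 {
    // alpha = 2 / (N + 1), computed in integer math.
    ((2 * interval_load as u128 + (N - 1) * ema as u128) / (N + 1)) as u64
}

/// Catch up after `missed_intervals` intervals with no update. The buggy
/// version re-applied the same accumulated load on every missed interval;
/// the fix applies it once, then decays with zero load for the rest.
fn update_ema(ema: &mut u64, accumulated_load: u64, missed_intervals: u64) {
    *ema = ema_step(*ema, accumulated_load);
    for _ in 1..missed_intervals {
        *ema = ema_step(*ema, 0);
    }
}

fn main() {
    let mut ema = 1_000;
    update_ema(&mut ema, 500, 3);
    println!("{ema}"); // load applied once, then two zero-load decay steps
}
```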