Don't apply throttling until a configurable load threshold is reached.#9580

Merged
gregcusack merged 1 commit into anza-xyz:master from stablebits:rfc-throttling-threshold-v2
Jan 15, 2026

Conversation


@stablebits stablebits commented Dec 16, 2025

Problem

Staked connections are being throttled under virtually no load.

Mainnet observations

Data collected on mds1 over a few leader slots showed low-stake connections being throttled under effectively idle conditions:

[2025-12-04T22:56:59.929547468Z ERROR solana_streamer::nonblocking::stream_throttle] Throttling tpu stream from 3.66.188.50:8016, peer type: Staked(30314578869242), current_load: 11, total_stake: 415746706271632896, max_streams_per_interval: 28, read_interval_streams: 28, throttle_duration: 99.948899ms

In observed cases, effective load was near 0 (3–25 streams per 5ms) while affected connections had quotas of 28–64 streams per 100ms and stakes of ~0.007–0.016% of total stake.
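As a sanity check, the stake fraction implied by the log line above works out to ~0.0073% of total stake, consistent with the cited range. A minimal sketch (values copied from the log line; nothing here is from the actual throttling code):

fn main() {
    // Stake and total stake from the throttling log line above.
    let stake: u128 = 30_314_578_869_242;
    let total_stake: u128 = 415_746_706_271_632_896;
    // Stake as a percentage of total stake: ~0.0073%.
    let pct = stake as f64 / total_stake as f64 * 100.0;
    println!("stake fraction: {pct:.4}%");
}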

Simulations

Simulations with the existing EMA-based load metric (stream_throttle.rs) confirmed that very low-stake staked connections (~0.01% of total stake) could end up with streams-per-100ms quotas similar to unstaked connections even under near-zero load:

 # max_streams_per_ms=500 max_unstaked_connections=500 max_staked_load_in_ema_window=20000 max_unstaked_load_in_throttling_window=20
        step  load_in_5ms          ema     quota_1%   quota_0.1%  quota_0.01%
           0            0            0         1600          160           21
           1        15000         2724         1600          160           21
           2        20000         5862         1364          136           21
           3        20000         8430          948           94           21
           4        10000         8715          916           90           21
           5         1000         7313         1092          108           21
           6         1000         6166         1296          128           21
           7         1000         5227         1530          152           21
           8          500         4368         1600          160           21

How to read this table

The load estimator operates on 5 ms sampling intervals. The table shows how the current load value (ema) evolves over time for a given load per sampling interval. For example, 15 K streams are processed in the first interval, 20 K in the second and third intervals, and so on.

It also shows the quotas allocated to connections with a given stake (as a percentage of the total stake).

The load value used for quota computation is floored at 25% of the maximum expected load (20 K per 50 ms in this case), even when the actual load is lower. This is why the quotas do not change until the ema exceeds 5 K.

Note that connections with a low stake receive the same quota regardless of the load.
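A minimal simulation sketch that approximately reproduces the ema column above, assuming the standard EMA form of ema_function() with STREAM_LOAD_EMA_INTERVAL_COUNT = 10 (rounding in the original simulation may differ slightly):

const STREAM_LOAD_EMA_INTERVAL_COUNT: u64 = 10;

// Standard EMA with smoothing factor 2 / (N + 1).
fn ema_function(current_ema: u64, recent_load: u64) -> u64 {
    (2 * recent_load + current_ema * (STREAM_LOAD_EMA_INTERVAL_COUNT - 1))
        / (STREAM_LOAD_EMA_INTERVAL_COUNT + 1)
}

fn main() {
    // Per-5ms loads from the table above (steps 0..8).
    let loads = [0u64, 15_000, 20_000, 20_000, 10_000, 1_000, 1_000, 1_000, 500];
    let mut ema = 0u64;
    for (step, load) in loads.iter().enumerate() {
        ema = ema_function(ema, *load);
        // Quotas are computed from max(ema, 25% of max expected load),
        // i.e. a floor of 5_000 here, so they only shrink once ema exceeds it.
        let effective_load = ema.max(5_000);
        println!("step {step}: load={load} ema={ema} effective={effective_load}");
    }
}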

Ideally, this EMA-based estimator with a configured TPS target should be replaced by backpressure from dedup/sigver/scheduler, so that TPU throttles/slows (prioritizing high-stake connections) based on that indicator instead of the current EMA.

However, the changes in this PR are arguably an improvement until this new mechanism is implemented.

Summary of Changes

Only start throttling once the measured EMA load exceeds a configurable threshold, using hysteresis (load_on / load_off) to avoid rapid on/off toggling (sketched below).
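A minimal sketch of that gating with hysteresis as originally proposed (identifiers here are illustrative, not the PR's actual code; the hysteresis was later dropped during review, see below):

use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical gate: load_on > load_off, so the throttling state only flips
// on a sustained change in the EMA load, not on every fluctuation.
struct ThrottleGate {
    load_on: u64,  // start throttling when the EMA rises above this
    load_off: u64, // stop throttling when the EMA falls below this
    throttling: AtomicBool,
}

impl ThrottleGate {
    fn should_throttle(&self, ema_load: u64) -> bool {
        if ema_load > self.load_on {
            self.throttling.store(true, Ordering::Relaxed);
        } else if ema_load < self.load_off {
            self.throttling.store(false, Ordering::Relaxed);
        }
        // Between the two thresholds, keep the previous state.
        self.throttling.load(Ordering::Relaxed)
    }
}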

Also:

  • Fix update_ema() catch-up behavior so missed slots do not re-apply the same accumulated load (see the sketch after this list).
  • available_load_capacity_in_throttling_duration() mixed load values in streams/5 ms and streams/50 ms when calculating quotas. It is replaced with a simpler stake-only (no-load-term) quota calculation under load.
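A hedged sketch of the update_ema() catch-up fix from the first bullet (the interval bookkeeping shown here is illustrative; only update_ema and ema_function are names from the actual module):

const STREAM_LOAD_EMA_INTERVAL_COUNT: u64 = 10;

fn ema_function(current_ema: u64, recent_load: u64) -> u64 {
    (2 * recent_load + current_ema * (STREAM_LOAD_EMA_INTERVAL_COUNT - 1))
        / (STREAM_LOAD_EMA_INTERVAL_COUNT + 1)
}

// Catch up over `missed_intervals` 5 ms ticks (assumed >= 1): apply the
// accumulated load exactly once, then only decay the EMA for the remaining
// ticks, instead of re-applying the same accumulated load on every missed
// tick (the old bug).
fn update_ema(ema: &mut u64, accumulated_load: u64, missed_intervals: u64) {
    *ema = ema_function(*ema, accumulated_load);
    for _ in 1..missed_intervals {
        *ema = ema_function(*ema, 0);
    }
}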


codecov-commenter commented Dec 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.5%. Comparing base (bba10d8) to head (017a87d).

Additional details and impacted files
@@            Coverage Diff            @@
##           master    #9580     +/-   ##
=========================================
- Coverage    82.5%    82.5%   -0.1%     
=========================================
  Files         844      844             
  Lines      316757   316679     -78     
=========================================
- Hits       261599   261499    -100     
- Misses      55158    55180     +22     

@KirillLykov

Is it possible to have a plot for quotas with old/new method?


@alexpyattaev alexpyattaev left a comment


Can this work without hysteresis? If so, worth exploring to reduce complexity. Also a few small nits.

@stablebits stablebits requested a review from gregcusack January 9, 2026 10:19

@gregcusack gregcusack left a comment


Ideally, this EMA-based estimator with a configured TPS target should be replaced by backpressure from dedup/sigver/scheduler, so that TPU throttles/slows (prioritizing high-stake connections) based on that indicator instead of the current EMA.

However, the changes in this PR are arguably an improvement until this new mechanism is implemented.

ya i agree. if we don't make this change, then even with backpressure, low staked nodes are still going to get unfairly throttled

this PR looks good to me as long as the hysteresis discussion is resolved. (As mentioned, I think we should remove it for now).

@stablebits
Author

Ideally, this EMA-based estimator with a configured TPS target should be replaced by backpressure from dedup/sigver/scheduler, so that TPU throttles/slows (prioritizing high-stake connections) based on that indicator instead of the current EMA.
However, the changes in this PR are arguably an improvement until this new mechanism is implemented.

ya i agree. if we don't make this change, then even with backpressure, low staked nodes are still going to get unfairly throttled

this PR looks good to me as long as the hysteresis discussion is resolved. (As mentioned, I think we should remove it for now).

removed hysteresis here: 1f75c55


@alexpyattaev alexpyattaev left a comment


Generally this LGTM, just some leftover mentions of hysteresis to clean up. We need to focus on making this suitable for a backport. I'd consider splitting off the changes to existing consts into a separate PR based on previous reviews (to make sure we clearly understand their impact in isolation from what we are doing here). Might be a good time to ask the backports team how backportable this looks in general.

// EMA smoothing window to make the load signal less sensitive to short bursts at the start
// of a leader slot and only trigger throttling if saturation is sustained.
// 40 was chosen based on simulations. See ema_function().
const STREAM_LOAD_EMA_INTERVAL_COUNT: u64 = 40;

I believe the backports team would not appreciate us changing existing constants and backporting functional changes in the same PR.


if we don't change this number do we get any benefit? based on dmitry's numbers above, we'll still throttle the crap out of low staked nodes if we leave N=10


the ask is to split the changes, not to refrain from making them. backports are free, we can take two

max_streams_per_ms: u64,
// These values are in streams/STREAM_LOAD_EMA_INTERVAL_MS.
staked_throttling_on_load_threshold: u64,
staked_throttling: AtomicBool,

Suggested change
staked_throttling: AtomicBool,
staked_throttling_enabled: AtomicBool,

@stablebits stablebits force-pushed the rfc-throttling-threshold-v2 branch from 84b550f to db2124d on January 12, 2026 20:25

@gregcusack gregcusack left a comment


lgtm as well but need SME approval, and agree with alex that we need some insight into how backportable this is. From dmitry's tests, it looks like STREAM_LOAD_EMA_INTERVAL_COUNT = 10 doesn't have any significant benefit. But not sure we can backport STREAM_LOAD_EMA_INTERVAL_COUNT = 40.

@t-nelson or @alessandrod what are your thoughts?

@alessandrod

have we repeated tests on mds1 and confirmed this fixes the issues we observed?

@alexpyattaev

have we repeated tests on mds1 and confirmed this fixes the issues we observed?

This is currently running on mds1 (it has been for over a week now, with a small gap Saturday-Monday this week).

@gregcusack

have we repeated tests on mds1 and confirmed this fixes the issues we observed?

This is currently running on mds1 (it has been for over a week now, with a small gap Saturday-Monday this week).

nice! Do we have updated numbers showing that low staked nodes are not getting throttled as much as they were before? Something like what Dmitry has at the top of the PR description under Mainnet Observations

@t-nelson

@t-nelson or @alessandrod what are your thoughts?

generally not opposed. this "ema" "code" has been a bug since its inception. that the line count is reducing can only be an improvement. that it is contained and the simulation results seem to match is a bonus

try to get it in before friday though

@stablebits
Author

stablebits commented Jan 14, 2026

have we repeated tests on mds1 and confirmed this fixes the issues we observed?

@alessandrod @gregcusack This is throttled_staked_streams on mds1:

[Screenshot: throttled_staked_streams on mds1, captured 2026-01-14 14:22]

The spikes in the middle correspond to the roll-back to the official release (security fixes). To the right of the spike is this branch after it's been synced with master (security fixes).

[1] https://metrics.solana.com:8889/sources/27/chronograf/data-explorer?query=SELECT%20mean%28%22throttled_staked_streams%22%29%20AS%20%22mean_throttled_staked_streams%22%20FROM%20%22mainnet-beta%22.%22autogen%22.%22quic_streamer_tpu%22%20WHERE%20time%20%3E%20%3AdashboardTime%3A%20AND%20time%20%3C%20%3AupperDashboardTime%3A%20AND%20%22host_id%22%3D%27mds1WWedpezW3qvgML4WgP341jZksYAy5SbMLwjP5KC%27%20GROUP%20BY%20time%28%3Ainterval%3A%29%2C%20%22host_name%22%20FILL%28null%29#.

Simulations with the existing EMA-based load metric (stream_throttle.rs) showed that
very low-stake staked connections (~0.01% of total stake) could end up with
streams-per-100ms quotas similar to unstaked connections even under near-zero load.

Data collected on mds1 (mainnet) over a few leader slots also showed low-stake connections
being throttled under effectively idle conditions:
[2025-12-04T22:56:59.929547468Z ERROR solana_streamer::nonblocking::stream_throttle] Throttling tpu stream from 3.66.188.50:8016, peer type: Staked(30314578869242),
current_load: 11, total_stake: 415746706271632896, max_streams_per_interval: 28, read_interval_streams: 28, throttle_duration: 99.948899ms

In observed cases, effective load was near 0 (3–25 streams per 5ms) while affected connections had quotas of 28–64 streams per 100ms and stakes of ~0.007–0.016% of
total stake.

Also:
- Fix update_ema() catch-up behavior so missed slots do not re-apply the same
  accumulated load.
- available_load_capacity_in_throttling_duration() mixed load values in streams/5ms and streams/50ms. Replaced it with a simpler
  stake-only quota under load.
@stablebits stablebits force-pushed the rfc-throttling-threshold-v2 branch from 5125629 to 017a87d on January 14, 2026 14:23
@stablebits
Author

stablebits commented Jan 14, 2026

@gregcusack reverted the 10->40 change; rebased.

The 10->40 change is introduced by the follow-up PR #10033; I'll rebase it once this PR gets merged.

@stablebits
Author

stablebits commented Jan 15, 2026

@gregcusack we ran tests with @alexpyattaev today to confirm that throttling is activated under load. It is.

The test: 3 clients (2 high-stake and 1 low-stake), each with 16 connections. RTT ≈ 50 ms. Max TPS is 400K for the validator, so the threshold of 1900 that you see below is 95% of 2000 tx per 5 ms (the EMA update window).
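The threshold arithmetic spelled out, as a minimal check of the numbers quoted above:

fn main() {
    // 400K TPS spread over 1000 ms => transactions per 5 ms EMA update window.
    let max_tps: u64 = 400_000;
    let per_window = max_tps * 5 / 1000; // 2_000 tx per 5 ms
    let threshold = per_window * 95 / 100; // 1_900, matching the logs below
    assert_eq!(threshold, 1_900);
}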

This PR alone (without the 10->40 change; 3sec run time):

td2GGWDsCJ6LvjN89oLJvmrDwE14neNrbqQ9s3tVkPy: sent=242524   (80841 TPS)
td3n5NGhP7JKWrL638gzau3NY7mF4K3ztZww3GkpywJ: sent=225785   (75261 TPS)
DZ6m64cghUKgVcvsFgAb3NDaeEyRpuMSK8hwqETndNTS: sent=17923   (5974 TPS)

Throttling was activated:

[2026-01-15T10:42:34.213157417Z DEBUG solana_streamer::nonblocking::stream_throttle] activating throttling: ema 1921, threshold 1900
[2026-01-15T10:42:34.294720936Z DEBUG solana_streamer::nonblocking::stream_throttle] deactivating throttling: ema 61, threshold 1900
[2026-01-15T10:42:34.311541923Z DEBUG solana_streamer::nonblocking::stream_throttle] activating throttling: ema 1925, threshold 1900
[2026-01-15T10:42:34.395830588Z DEBUG solana_streamer::nonblocking::stream_throttle] deactivating throttling: ema 62, threshold 1900
[2026-01-15T10:42:34.412581674Z DEBUG solana_streamer::nonblocking::stream_throttle] activating throttling: ema 2059, threshold 1900
[2026-01-15T10:42:34.497043850Z DEBUG solana_streamer::nonblocking::stream_throttle] deactivating throttling: ema 66, threshold 1900
[2026-01-15T10:42:34.514117727Z DEBUG solana_streamer::nonblocking::stream_throttle] activating throttling: ema 2004, threshold 1900
[2026-01-15T10:42:34.597186099Z DEBUG solana_streamer::nonblocking::stream_throttle] deactivating throttling: ema 65, threshold 1900

The ema jumping from the threshold down to very low values and back suggests that all (or most) connections were throttled.

With N=40 (30sec run time):

td2GGWDsCJ6LvjN89oLJvmrDwE14neNrbqQ9s3tVkPy: sent=3929659   (130988 TPS)
td3n5NGhP7JKWrL638gzau3NY7mF4K3ztZww3GkpywJ: sent=4128660   (137622 TPS)
DZ6m64cghUKgVcvsFgAb3NDaeEyRpuMSK8hwqETndNTS: sent=1337545   (44584 TPS)

Throttling is less frequent and TPS is higher.

[2026-01-15T10:53:03.693445265Z DEBUG solana_streamer::nonblocking::stream_throttle] deactivating throttling: ema 1757, threshold 1900
[2026-01-15T10:53:03.964389878Z DEBUG solana_streamer::nonblocking::stream_throttle] activating throttling: ema 1920, threshold 1900
[2026-01-15T10:53:03.970419795Z DEBUG solana_streamer::nonblocking::stream_throttle] deactivating throttling: ema 1740, threshold 1900
[2026-01-15T10:53:03.995491707Z DEBUG solana_streamer::nonblocking::stream_throttle] activating throttling: ema 1903, threshold 1900
[2026-01-15T10:53:04.072341016Z DEBUG solana_streamer::nonblocking::stream_throttle] deactivating throttling: ema 863, threshold 1900
[2026-01-15T10:53:04.267823114Z DEBUG solana_streamer::nonblocking::stream_throttle] activating throttling: ema 1908, threshold 1900
[2026-01-15T10:53:04.273038979Z DEBUG solana_streamer::nonblocking::stream_throttle] deactivating throttling: ema 1816, threshold 1900

The ema doesn't drop to extreme lows, suggesting that not all connections were throttled.

Now, N=40 but the same 3 clients with 8 connections each (30 sec run time) -- less load overall.

td2GGWDsCJ6LvjN89oLJvmrDwE14neNrbqQ9s3tVkPy: sent=2802975   (93432 TPS)
td3n5NGhP7JKWrL638gzau3NY7mF4K3ztZww3GkpywJ: sent=2806442   (93548 TPS)
DZ6m64cghUKgVcvsFgAb3NDaeEyRpuMSK8hwqETndNTS: sent=776462   (25882 TPS)

No throttling was observed in this case. The overall load (and in particular its spikes/fluctuations) wasn't sufficient to reach the threshold.

@alexpyattaev

Yes, the tests were quite conclusive that the desired effect is achieved, i.e. staked connections can ignore throttling unless load is high enough. One correction: the ping to the server was 30 ms, not 50.

Now, N=40 but the same 3 clients with 8 connections each (30 sec run time) -- less load overall.

I guess you meant throughput here?

No throttling was observed in this case. The overall load (and in particular its spikes/fluctuations) wasn't sufficient to reach the threshold.

Which spikes? We had consistent full-buffer traffic generation on our clients; as you recall from the other test, under CUBIC CC there are no load spikes.

@stablebits
Author

Now, N=40 but the same 3 clients with 8 connections each (30 sec run time) -- less load overall.

I guess you meant throughput here?

TPS

No throttling was observed in this case. The overall load (and in particular its spikes/fluctuations) wasn't sufficient to reach the threshold.

Which spikes? We had consistent full-buffer traffic generation on our clients; as you recall from the other test, under CUBIC CC there are no load spikes.

The overall (all clients) TPS was ~160K for N=10 and ~300K for N=40, which is below the 400K target (and 95%) on the validator side. If transaction arrivals were spread evenly over each second, that would correspond to ~800 and ~1.5K transactions per 5ms (EMA sampling window), in which case throttling should not have been activated. The fact that it was activated (logs show the EMA reaching ~1900) suggests either a bug or that load isn’t evenly distributed—creating denser intervals (spikes) where 1.9K+ transactions arrive per 5ms. My bet is the latter heh. Especially given that we saw no throttling when using 8 connections instead of 16 -- at 210K TPS.

@alexpyattaev

The overall (all clients) TPS was ~160K for N=10 and ~300K for N=40, which is below the 400K target (and 95%) on the validator side. If transaction arrivals were spread evenly over each second, that would correspond to ~800 and ~1.5K transactions per 5ms (EMA sampling window), in which case throttling should not have been activated. The fact that it was activated (logs show the EMA reaching ~1900) suggests either a bug or that load isn’t evenly distributed—creating denser intervals (spikes) where 1.9K+ transactions arrive per 5ms. My bet is the latter heh. Especially given that we saw no throttling when using 8 connections instead of 16 -- at 210K TPS.

When we used 16 connections they could not get to full speed (as they got throttled). With 8 connections per ID we generated 210K TPS average (including the slow-start) => with 16 we could easily make 420K TPS (and thus go above the threshold). No bugs needed.

@gregcusack

The overall (all clients) TPS was ~160K for N=10 and ~300K for N=40, which is below the 400K target (and 95%) on the validator side. If transaction arrivals were spread evenly over each second, that would correspond to ~800 and ~1.5K transactions per 5ms (EMA sampling window), in which case throttling should not have been activated. The fact that it was activated (logs show the EMA reaching ~1900) suggests either a bug or that load isn’t evenly distributed—creating denser intervals (spikes) where 1.9K+ transactions arrive per 5ms. My bet is the latter heh. Especially given that we saw no throttling when using 8 connections instead of 16 -- at 210K TPS.

When we used 16 connections they could not get to full speed (as they got throttled). With 8 connections per ID we generated 210K TPS average (including the slow-start) => with 16 we could make 420K TPS easily enough (and thus going above the threshold). No bugs needed.

could this also just be how we're measuring TPS? the data dmitry has above shows connections getting throttled and then unthrottled quickly. If we measure TPS over multiple of those throttled on/off windows, we would see an average TPS lower than the expected throttle threshold.

@gregcusack gregcusack self-requested a review January 15, 2026 17:30

@gregcusack gregcusack left a comment


thank you for getting these numbers! looks like a significant improvement! lgtm!

EDIT: let's get #10033 merged next

@alexpyattaev

EDIT: let's get #10033 merged next

Yes, we have tested both with the #10033 changes and without them (as seen in the results listed above). #10033 is a clear improvement on top of what we get with this PR.

@gregcusack

EDIT: let's get #10033 merged next

Yes, we have tested both with the #10033 changes and without them (as seen in the results listed above). #10033 is a clear improvement on top of what we get with this PR.

ya i saw that. i was just saying let's merge this one and then that one.

@gregcusack gregcusack added this pull request to the merge queue Jan 15, 2026
Merged via the queue into anza-xyz:master with commit 82836bf Jan 15, 2026
47 checks passed
@gregcusack gregcusack added the v3.1 Backport to v3.1 branch label Jan 16, 2026

mergify Bot commented Jan 16, 2026

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

mergify Bot pushed a commit that referenced this pull request Jan 16, 2026
Don't apply throttling until a configurable load threshold is reached. (#9580)


(cherry picked from commit 82836bf)
@gregcusack gregcusack mentioned this pull request Jan 29, 2026