
RFC | Don’t apply throttling unless a specific load threshold is reached.#9493

Closed
stablebits wants to merge 5 commits into anza-xyz:master from stablebits:rfc-throttling-threshold


@stablebits

Problem

Staked connections are being throttled on unloaded systems.

Summary of Changes

For the v3.1.x fix, it is probably safer to make a minimal change that does not yet remove the current throttling entirely. For v4.0.0, we'll replace it with a better mechanism.
The approach taken here is simply not to apply any throttling until a configurable load threshold has been reached.
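A minimal sketch of the threshold gate, assuming the gate compares the current EMA load against a percentage of `max_staked_load_in_ema_window` (a parameter named in the PR's simulation output). `throttling_enabled` and `threshold_percent` are hypothetical names, and treating a load exactly at the threshold as "throttle" is an assumption. With a threshold of 0 the gate always throttles, which is the pre-change behaviour.

```rust
/// Hypothetical gate: returns true when per-connection throttling should apply.
/// Parameter names mirror the simulation output; this is not the PR's code.
fn throttling_enabled(
    current_load_ema: u64,
    max_staked_load_in_ema_window: u64,
    threshold_percent: u64,
) -> bool {
    // Widen to u128 so the percentage multiplication cannot overflow.
    let threshold =
        u128::from(max_staked_load_in_ema_window) * u128::from(threshold_percent) / 100;
    // Assumed boundary: a load equal to the threshold is throttled, so a
    // threshold of 0 means "always throttle" (the old behaviour).
    u128::from(current_load_ema) >= threshold
}

fn main() {
    // 50% of a 20K window: throttling starts at an EMA load of 10K.
    for ema in [0u64, 8706, 10757] {
        println!("ema={ema:>5} throttled={}", throttling_enabled(ema, 20_000, 50));
    }
}
```

Keeping the gate as a single comparison in front of the existing quota logic is what keeps the diff small enough to backport.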

Open Questions

  • Throttling threshold (50% was chosen arbitrarily).
  • Throttling limit (5K now).

Details

Simulations with the current EMA load mechanism (stream_throttle.rs) show that staked connections with very low stake (e.g., ~0.01% of total stake) receive streams-per-100ms quotas that are similar to unstaked connections even in no-load scenarios.

Example (quota_X% is the streams-per-100ms quota for a connection holding X% of total stake):
        step  load_in_5ms          ema  quota_0.01%   quota_0.1%     quota_1%
           0            0            0           21          160         1600
           1         3000          544           21          160         1600
           2         1000          626           21          160         1600
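The `ema` column can be reproduced with an integer (fixed-point) EMA over 5 ms load samples. The sketch below assumes a smoothing factor of 2/(N + 1) with N = 10 intervals and a fixed-point multiplier of 1024; these constants are assumptions about stream_throttle.rs rather than code taken from this PR, but with them the function yields exactly the 0, 544, 626 sequence shown here (and the 1, 9 and 8706 → 10757 → 12435 steps in the later simulation).

```rust
// Assumed constants: 10-interval EMA window, 1024 fixed-point multiplier.
const EMA_INTERVAL_COUNT: u128 = 10;
const EMA_MULTIPLIER: u128 = 1024;

/// One EMA step: updated = load * a + ema * (1 - a), with a = 2 / (N + 1),
/// computed in fixed point to avoid floating-point math.
fn ema_update(current_ema: u64, load_in_5ms: u64) -> u64 {
    // 2 * 1024 / 11 = 186 (integer division)
    let alpha = 2 * EMA_MULTIPLIER / (EMA_INTERVAL_COUNT + 1);
    let updated = (u128::from(load_in_5ms) * alpha
        + u128::from(current_ema) * (EMA_MULTIPLIER - alpha))
        / EMA_MULTIPLIER;
    // A weighted average never exceeds the larger input, so this fits in u64.
    updated as u64
}

/// Feed a sequence of per-5ms loads through the EMA, starting from 0.
fn simulate(loads: &[u64]) -> Vec<u64> {
    let mut ema = 0;
    loads
        .iter()
        .map(|&load| {
            ema = ema_update(ema, load);
            ema
        })
        .collect()
}

fn main() {
    for (step, ema) in simulate(&[0, 3000, 1000]).iter().enumerate() {
        println!("step {step}: ema {ema}");
    }
}
```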

Data collected on mds1 (over a few leader slots) also showed these low-stake connections being throttled:

[2025-12-04T22:56:59.929547468Z ERROR solana_streamer::nonblocking::stream_throttle] Throttling tpu stream from 3.66.188.50:8016, peer type: Staked(30314578869242), current_load: 11, total_stake: 415746706271632896, max_streams_per_interval: 28, read_interval_streams: 28 throttle_duration: 99.948899ms

In all observed cases, the effective load was essentially zero (3–25 streams per 5ms), while the affected connections had quotas of 28–64 streams per 100ms and stakes of ~0.007–0.016% of total stake.
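As a quick check of the stake range, the peer in the log line above can be converted to a share of total stake. `stake_percent` is a hypothetical helper for illustration only; the two inputs are copied from the `Staked(...)` and `total_stake` fields of that log line.

```rust
/// Hypothetical helper: a peer's stake as a percentage of total stake.
fn stake_percent(stake: u64, total_stake: u64) -> f64 {
    stake as f64 * 100.0 / total_stake as f64
}

fn main() {
    // Values from the "Throttling tpu stream from 3.66.188.50:8016" log line.
    let pct = stake_percent(30_314_578_869_242, 415_746_706_271_632_896);
    println!("stake share: {pct:.4}%"); // prints "stake share: 0.0073%"
}
```

At roughly 0.0073% of stake, this peer sits at the low end of the affected range, and it was throttled at a quota of 28 streams per 100ms while the observed load was only 11.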

With the change in this PR, mds1 has been running for multiple days (since Dec 6) without any staked connections being throttled.

[Screenshot, 2025-12-10 17:17:58]

Simulation of the new approach:

# max_streams_per_ms=500 max_unstaked_connections=2000 max_staked_load_in_ema_window=20000 max_unstaked_load_in_throttling_window=20
        step  load_in_5ms          ema     quota_0%  quota_0.00104%  quota_0.00135%   quota_0.0111%   quota_0.1145%        quota_1%
           0            0            0          200           50000           50000           50000           50000           50000
           1           10            1          200           50000           50000           50000           50000           50000
           2           50            9          200           50000           50000           50000           50000           50000
          [...]
          12        20000         8706          200           50000           50000           50000           50000           50000
          13        20000        10757           20              21              21              21              84             742
          14        20000        12435           20              21              21              21              72             642

Note that throttling is applied only once the EMA load exceeds 10K (50% of the 20K-per-5ms maximum, i.e. the 500K TPS target configuration in this case).
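The quota columns of both simulations can be reproduced with the sketch below. It is a reconstruction, not the PR's code: it assumes a 50 ms EMA window and a 100 ms throttling interval, a staked quota of max_load² / load × (stake / total_stake) with the load floored at 25% of max_load, and a hypothetical `min_staked_quota` floor (21 in both tables). Below the threshold it returns the full per-interval budget, max_streams_per_ms × 100 ms = 50000. `Params`, `threshold_percent` and `min_staked_quota` are hypothetical names.

```rust
const THROTTLING_INTERVAL_MS: u128 = 100; // assumed
const EMA_WINDOW_MS: u128 = 50; // assumed: 10 intervals of 5 ms

struct Params {
    max_streams_per_ms: u64,
    max_staked_load_in_ema_window: u64,
    threshold_percent: u64, // hypothetical
    min_staked_quota: u64,  // hypothetical floor for tiny stakes
}

/// Streams a staked peer may open per 100 ms (sketch under the assumptions above).
fn staked_quota(p: &Params, current_load_ema: u64, stake: u64, total_stake: u64) -> u64 {
    if total_stake == 0 {
        return p.min_staked_quota;
    }
    let max_load = u128::from(p.max_staked_load_in_ema_window);
    let ema = u128::from(current_load_ema);
    // Below the threshold: no throttling, the full per-interval budget applies.
    if ema < max_load * u128::from(p.threshold_percent) / 100 {
        return p.max_streams_per_ms * THROTTLING_INTERVAL_MS as u64;
    }
    // Treat low loads as at least 25% of max so quotas stay bounded.
    let load = ema.max(max_load / 4);
    let in_ema_window =
        max_load * max_load * u128::from(stake) / (load * u128::from(total_stake));
    let in_interval = in_ema_window * THROTTLING_INTERVAL_MS / EMA_WINDOW_MS;
    u64::try_from(in_interval)
        .unwrap_or(u64::MAX)
        .max(p.min_staked_quota)
}

fn main() {
    let p = Params {
        max_streams_per_ms: 500,
        max_staked_load_in_ema_window: 20_000,
        threshold_percent: 50,
        min_staked_quota: 21,
    };
    // Steps 12-14 of the simulation for a peer with 1% of stake.
    for ema in [8706u64, 10757, 12435] {
        println!("ema={ema:>5} quota_1%={}", staked_quota(&p, ema, 1, 100));
    }
}
```

Setting `threshold_percent` to 0 turns the gate off and reproduces the earlier no-threshold table (1600 / 160 / 21 at low load), which is why a zero-valued threshold constant changes nothing.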

@stablebits marked this pull request as draft on December 15, 2025 12:53
@codecov-commenter

codecov-commenter commented Dec 15, 2025

Codecov Report

❌ Patch coverage is 97.05882% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.5%. Comparing base (1776695) to head (1e74e13).
⚠️ Report is 8 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master    #9493     +/-   ##
=========================================
- Coverage    82.5%    82.5%   -0.1%     
=========================================
  Files         901      901             
  Lines      323326   323367     +41     
=========================================
+ Hits       267058   267060      +2     
- Misses      56268    56307     +39     

(ema-window) to be properly used in available_throttled_load_capacity().

The old implementation was giving quotas incorrectly (larger than
expected).

Set the no-throttling threshold to 90%.

@alexpyattaev alexpyattaev left a comment


I think we can make the diff smaller.


// Unstaked nodes must contribute to the EMA load for this threshold to be meaningful.
// See increment_load().
const UNSTAKED_STREAM_THROTTLING_LOAD_THRESHOLD_PERCENT: u64 = 0;

I think we should not introduce a const if it does nothing. We want to keep the diff minimal for the backport.

@stablebits (Author)

reworked approach is #9580; closing this one

@stablebits stablebits closed this Dec 22, 2025

3 participants