Increase the duration of the EMA smoothing window (STREAM_LOAD_EMA_INTERVAL_COUNT) #10033

Merged
alexpyattaev merged 2 commits into anza-xyz:master from stablebits:increase-stream_load_ema_interval_count on Jan 16, 2026

@stablebits stablebits commented Jan 14, 2026

Note: This change is a follow-up to #9580.

The STREAM_LOAD_EMA_INTERVAL_COUNT constant controls the duration of the EMA smoothing window used to reduce sensitivity to short-lived load spikes at the start of a leader slot. With #9580 in place, throttling is only triggered when saturation is sustained (reaching 95% of max target).

Problem

With N=10, the smoothing window is too short. A single short spike is enough to trigger throttling (see the simulation results below).

Summary of Changes

The value 40 was chosen based on simulations: at a max target TPS of ~400K, it allows the system to absorb a burst of ~50K transactions over ~40 ms before throttling activates.

There is no magic about N=40; the value should be tuned based on the size and duration of spikes we want to tolerate.
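The derivation below is a back-of-the-envelope way to see how N trades off against the spike size. It is our own derivation and ignores the integer rounding a real implementation may apply. Write the EMA weight as α = 2/(N+1) and start the EMA at 0 under a constant load L per 5 ms step. The EMA then follows a geometric series, so one can solve for the number of steps it takes to reach a throttling threshold T < L:

$$
\begin{aligned}
\mathrm{ema}_k &= \alpha L + (1-\alpha)\,\mathrm{ema}_{k-1}, \quad \mathrm{ema}_{-1} = 0
\;\Longrightarrow\; \mathrm{ema}_k = L\left(1 - (1-\alpha)^{k+1}\right), \\
\mathrm{ema}_k \ge T &\iff k + 1 \ge \frac{\ln(1 - T/L)}{\ln(1 - \alpha)}, \qquad \alpha = \frac{2}{N+1}.
\end{aligned}
$$

For instance, take a sustained L = 5,000 per 5 ms against the simulation's threshold of T = 1,900 per step. The right-hand side is about 2.4 for N=10 and about 9.6 for N=40. Throttling therefore trips on roughly the 3rd 5 ms step with N=10 versus the 10th with N=40.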

The choice is based on simulations. In the EMA update (new_ema = alpha * latest + (1 - alpha) * ema), alpha is effectively 2/(N+1), where N is STREAM_LOAD_EMA_INTERVAL_COUNT.
The larger N is, the more slowly the EMA grows, and therefore the larger the burst it can absorb. With N=10 (the current value), alpha ≈ 0.18. For example, here is the EMA growth under a sustained load of 1K per 5 ms:

N=10 (alpha ≈ 0.18)

        step  load_in_5ms          ema
           0         1000          181
           1         1000          329
           2         1000          450
           3         1000          549
           4         1000          630
           5         1000          697
           6         1000          752
           7         1000          797
           8         1000          833
           9         1000          863

N=40 (alpha ≈ 0.049)

        step  load_in_5ms          ema
           0         1000           47
           1         1000           92
           2         1000          135
           3         1000          176
           4         1000          215
           5         1000          252
           6         1000          287
           7         1000          321
           8         1000          353
           9         1000          383
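The two tables can be reproduced with a small fixed-point EMA. The sketch below is ours, not the streamer code. It assumes the smoothing factor 2/(N+1) is scaled by a hypothetical multiplier of 1024 and applied with truncating integer division. Under that assumption the printed values come out exactly, and slightly below the exact floating-point EMA (for example 863 rather than about 866 at step 9 for N=10). `ema_step` and `ema_growth` are illustrative names.

```rust
/// Hypothetical fixed-point scale so that alpha = 2/(N+1) can be applied
/// with integer math only (an assumption, not necessarily the real value).
const EMA_MULTIPLIER: u64 = 1024;

/// One EMA step: new_ema = alpha * latest + (1 - alpha) * ema,
/// with alpha = 2 / (n + 1) expressed as a fraction of EMA_MULTIPLIER.
fn ema_step(ema: u64, latest: u64, n: u64) -> u64 {
    // Integer division truncates: 2048 / 11 = 186, 2048 / 41 = 49.
    let alpha_scaled = 2 * EMA_MULTIPLIER / (n + 1);
    (latest * alpha_scaled + ema * (EMA_MULTIPLIER - alpha_scaled)) / EMA_MULTIPLIER
}

/// EMA value after each step of a constant per-step load, starting from 0.
fn ema_growth(n: u64, load_per_step: u64, steps: usize) -> Vec<u64> {
    let mut ema = 0;
    (0..steps)
        .map(|_| {
            ema = ema_step(ema, load_per_step, n);
            ema
        })
        .collect()
}

fn main() {
    let load = 1000;
    for n in [10, 40] {
        let alpha = 2.0 / (n as f64 + 1.0);
        println!("N={n} (alpha ≈ {alpha:.3})");
        println!("{:>12} {:>12} {:>12}", "step", "load_in_5ms", "ema");
        for (step, ema) in ema_growth(n, load, 10).into_iter().enumerate() {
            println!("{step:>12} {load:>12} {ema:>12}");
        }
    }
}
```

With N=10 the scaled factor is 186/1024 ≈ 0.18. With N=40 it is 49/1024 ≈ 0.048. Each 5 ms step therefore pulls the EMA toward the latest load nearly four times less with N=40.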

Below is a simulated ingestion of ~60K transactions over 100 ms with a spike at the beginning. It roughly matches a pattern we recently saw on mds1 (mainnet), but with about 10x more traffic.
Note: throttling is activated when the load EMA reaches 95% of the target (500K TPS) and deactivated when it falls below 90%. The quota_X% columns show the quota for a staked peer holding X% of the total stake (stakes 1, 10 and 100 out of 10,000). A quota of 40K effectively means unthrottled.

N=10

Running `target/debug/ema_sim 5000 15000 1000 3000 4000 7000 5000 5000 3000 5000 1000 2000 1000 1000 1000 1000 1000 1000 1000 1000 --stakes 1,10,100 --total-stake 10000`
# max_streams_per_ms=500 max_unstaked_connections=500 max_staked_load_in_throttling_window=40000 max_unstaked_load_in_throttling_window=20 throttling_on_threshold=1900
        step  load_in_5ms          ema  quota_0.01%   quota_0.1%     quota_1%
           0         5000          908        40000        40000        40000
           1        15000         3467           21           40          400
           2         1000         3018           21           40          400
           3         3000         3014           21           40          400
           4         4000         3193           21           40          400
           5         7000         3884           21           40          400
           6         5000         4086           21           40          400
           7         5000         4252           21           40          400
           8         3000         4024           21           40          400
           9         5000         4201           21           40          400
          10         1000         3619           21           40          400
          11         2000         3324           21           40          400
          12         1000         2901           21           40          400
          13         1000         2555           21           40          400
          14         1000         2272           21           40          400
          15         1000         2040           21           40          400
          16         1000         1851           21           40          400
          17         1000         1696        40000        40000        40000
          18         1000         1569        40000        40000        40000
          19         1000         1465        40000        40000        40000

N=40

# max_streams_per_ms=500 max_unstaked_connections=500 max_staked_load_in_throttling_window=40000 max_unstaked_load_in_throttling_window=20 throttling_on_threshold=1900
        step  load_in_5ms          ema  quota_0.01%   quota_0.1%     quota_1%
           0         5000          239        40000        40000        40000
           1        15000          945        40000        40000        40000
           2         1000          947        40000        40000        40000
           3         3000         1045        40000        40000        40000
           4         4000         1186        40000        40000        40000
           5         7000         1464        40000        40000        40000
           6         5000         1633        40000        40000        40000
           7         5000         1794        40000        40000        40000
           8         3000         1851        40000        40000        40000
           9         5000         2001           21           40          400
          10         1000         1953           21           40          400
          11         2000         1955           21           40          400
          12         1000         1909           21           40          400
          13         1000         1865           21           40          400
          14         1000         1823           21           40          400
          15         1000         1783        40000        40000        40000
          16         1000         1745        40000        40000        40000
          17         1000         1709        40000        40000        40000
          18         1000         1675        40000        40000        40000
          19         1000         1642        40000        40000        40000

With N=40, the system absorbs ~50K transactions (including the spike) over ~40 ms before throttling is activated.
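The on/off pattern in both runs can be approximated with a hysteresis check on top of the same kind of EMA step. This is a hypothetical sketch, not the `ema_sim` tool, and rests on three assumptions:

- It reuses the assumed 1024 fixed-point multiplier.
- It turns throttling on when the EMA reaches `throttling_on_threshold=1900` from the simulation header.
- It turns throttling off below an assumed 1800 per step. We chose 1800 because that is where the printed quotas flip back to 40000 in both runs. Note that 1900 and 1800 are 95% and 90% of 2,000 per 5 ms.

The sketch also sums the load absorbed before throttling first activates. All function names are illustrative.

```rust
const EMA_MULTIPLIER: u64 = 1024; // assumed fixed-point scale for alpha
const ON_THRESHOLD: u64 = 1900; // throttling_on_threshold from the sim header
const OFF_THRESHOLD: u64 = 1800; // assumed off threshold (90% of 2,000 per step)
const STEP_MS: u64 = 5; // each sample covers 5 ms

/// Fixed-point EMA step with alpha = 2 / (n + 1), truncating like integer math.
fn ema_step(ema: u64, latest: u64, n: u64) -> u64 {
    let alpha_scaled = 2 * EMA_MULTIPLIER / (n + 1);
    (latest * alpha_scaled + ema * (EMA_MULTIPLIER - alpha_scaled)) / EMA_MULTIPLIER
}

/// Indices of the 5 ms steps during which throttling is active, evaluated
/// after each step's EMA update, with on/off hysteresis.
fn throttled_step_indices(loads: &[u64], n: u64) -> Vec<usize> {
    let (mut ema, mut throttled) = (0, false);
    let mut active = Vec::new();
    for (step, &load) in loads.iter().enumerate() {
        ema = ema_step(ema, load, n);
        if !throttled && ema >= ON_THRESHOLD {
            throttled = true; // sustained saturation: start throttling
        } else if throttled && ema < OFF_THRESHOLD {
            throttled = false; // load has subsided: stop throttling
        }
        if throttled {
            active.push(step);
        }
    }
    active
}

/// Transactions and milliseconds absorbed before throttling first activates.
fn absorbed_before_throttling(loads: &[u64], n: u64) -> (u64, u64) {
    let first = throttled_step_indices(loads, n)
        .first()
        .copied()
        .unwrap_or(loads.len());
    (loads[..first].iter().sum(), first as u64 * STEP_MS)
}

fn main() {
    // Per-5 ms load, same sequence as the ema_sim invocation above.
    let loads: &[u64] = &[
        5000, 15000, 1000, 3000, 4000, 7000, 5000, 5000, 3000, 5000,
        1000, 2000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000,
    ];
    for n in [10, 40] {
        let (tx, ms) = absorbed_before_throttling(loads, n);
        println!(
            "N={n}: throttled at steps {:?}; {tx} tx absorbed in the first {ms} ms",
            throttled_step_indices(loads, n)
        );
    }
}
```

Under these assumptions, the sketch reports throttling on steps 1 through 16 for N=10 and steps 9 through 14 for N=40, matching the quota columns above. For N=40 it also reports 48,000 transactions absorbed in the first 45 ms, in line with the ~50K / ~40 ms figure.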

codecov-commenter commented Jan 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.6%. Comparing base (0e59c0b) to head (3b743ee).

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #10033   +/-   ##
=======================================
  Coverage    82.6%    82.6%           
=======================================
  Files         844      844           
  Lines      316634   316634           
=======================================
+ Hits       261608   261619   +11     
+ Misses      55026    55015   -11     

@gregcusack

NOTE: data for this change is here: #9580

@stablebits, can you rebase and just leave the last constant commit, and then we can merge! thank you!!

This constant controls the duration of the EMA smoothing window used to
reduce sensitivity to short-lived load spikes at the start of a leader
slot. Throttling is only triggered when saturation is sustained.

The value 40 was chosen based on simulations: at a max target TPS of ~400K,
it allows the system to absorb a burst of ~50K transactions over ~40 ms
before throttling activates.

There is no magic about N=40; the value should be tuned based on the size
and duration of spikes we want to tolerate.
@stablebits stablebits force-pushed the increase-stream_load_ema_interval_count branch from 6fc5d7c to 98486db on January 16, 2026 13:59
@stablebits (Author)

@gregcusack rebased

@gregcusack gregcusack left a comment

lgtm! thank you!!

@alexpyattaev alexpyattaev added this pull request to the merge queue Jan 16, 2026
@alexpyattaev alexpyattaev added the v3.1 Backport to v3.1 branch label Jan 16, 2026

mergify Bot commented Jan 16, 2026

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

Merged via the queue into anza-xyz:master with commit 51ebbc4 Jan 16, 2026
48 checks passed
mergify Bot pushed a commit that referenced this pull request Jan 16, 2026
…TERVAL_COUNT) (#10033)

streamer/TPU: increase STREAM_LOAD_EMA_INTERVAL_COUNT from 10 to 40

(cherry picked from commit 51ebbc4)

# Conflicts:
#	streamer/src/nonblocking/stream_throttle.rs
@alexpyattaev alexpyattaev mentioned this pull request Jan 22, 2026