
v3.1: Increase the duration of the EMA smoothing window (STREAM_LOAD_EMA_INTERVAL_COUNT) (backport of #10033) #10089

Closed
mergify[bot] wants to merge 1 commit into v3.1 from mergify/bp/v3.1/pr-10033

@mergify mergify Bot commented Jan 16, 2026

Note
This change is a follow-up to #9580.

The STREAM_LOAD_EMA_INTERVAL_COUNT constant controls the duration of the EMA smoothing window used to reduce sensitivity to short-lived load spikes at the start of a leader slot. With #9580 in place, throttling is only triggered when saturation is sustained (reaching 95% of max target).

Problem

With the current value of 10, the smoothing window is too short (see the simulation results below).

Summary of Changes

The value 40 was chosen based on simulations: at a max target TPS of ~400K, it allows the system to absorb a burst of ~50K transactions over ~40 ms before throttling activates.

There is no magic about N=40; the value should be tuned based on the size and duration of spikes we want to tolerate.

The choice is based on simulations. The alpha in the EMA (new_ema = alpha * latest + (1 - alpha) * ema) is approximately 2/(N+1), where N is STREAM_LOAD_EMA_INTERVAL_COUNT.
The larger N is, the slower the EMA grows (i.e., the larger the burst it can absorb). With N=10 (current code), alpha ≈ 0.18. For example, here is the EMA growth under a sustained load of 1K per 5 ms interval.

N=10 (alpha ≈ 0.18)

        step  load_in_5ms          ema
           0         1000          181
           1         1000          329
           2         1000          450
           3         1000          549
           4         1000          630
           5         1000          697
           6         1000          752
           7         1000          797
           8         1000          833
           9         1000          863

N=40 (alpha ≈ 0.047)

        step  load_in_5ms          ema
           0         1000           47
           1         1000           92
           2         1000          135
           3         1000          176
           4         1000          215
           5         1000          252
           6         1000          287
           7         1000          321
           8         1000          353
           9         1000          383
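
For readers who want to experiment with other values of N, the growth tables above can be approximated with a minimal floating-point sketch. It assumes the EMA starts at zero and uses alpha = 2/(N+1) exactly; the function names (`ema_alpha`, `ema_step`, `ema_growth`) are hypothetical and are not taken from `stream_throttle.rs`. The results land within a few percent of the tables rather than matching them digit for digit, presumably because the real code does not use this exact floating-point formula (for instance, 2/41 ≈ 0.049, while the effective alpha quoted above is ≈ 0.047).

```rust
/// Smoothing factor for an EMA over `n` intervals: alpha = 2 / (n + 1).
/// Hypothetical helper; the real streamer code may compute this differently.
fn ema_alpha(n: u32) -> f64 {
    2.0 / (f64::from(n) + 1.0)
}

/// One EMA update: new_ema = alpha * latest + (1 - alpha) * ema.
fn ema_step(ema: f64, latest: f64, alpha: f64) -> f64 {
    alpha * latest + (1.0 - alpha) * ema
}

/// EMA value after each of `steps` intervals of constant `load`,
/// starting from an EMA of zero (an assumption of this sketch).
fn ema_growth(n: u32, load: f64, steps: usize) -> Vec<f64> {
    let alpha = ema_alpha(n);
    let mut ema = 0.0;
    (0..steps)
        .map(|_| {
            ema = ema_step(ema, load, alpha);
            ema
        })
        .collect()
}
```

Calling `ema_growth(10, 1000.0, 10)` and `ema_growth(40, 1000.0, 10)` gives the two growth curves for a sustained load of 1K per interval; the N=40 curve rises more slowly at every step.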

Below is a simulated ingestion of ~60K transactions over 100 ms with a spike at the beginning. It roughly corresponds to a pattern we recently saw on mds1 (mainnet), but at about 10x the traffic.
Note: throttling is activated at 95% of the target load (500K TPS) and deactivated at 90%. A quota of 40K effectively means unthrottled.

N=10

Running `target/debug/ema_sim 5000 15000 1000 3000 4000 7000 5000 5000 3000 5000 1000 2000 1000 1000 1000 1000 1000 1000 1000 1000 --stakes 1,10,100 --total-stake 10000`
# max_streams_per_ms=500 max_unstaked_connections=500 max_staked_load_in_throttling_window=40000 max_unstaked_load_in_throttling_window=20 throttling_on_threshold=1900
        step  load_in_5ms          ema  quota_0.01%   quota_0.1%     quota_1%
           0         5000          908        40000        40000        40000
           1        15000         3467           21           40          400
           2         1000         3018           21           40          400
           3         3000         3014           21           40          400
           4         4000         3193           21           40          400
           5         7000         3884           21           40          400
           6         5000         4086           21           40          400
           7         5000         4252           21           40          400
           8         3000         4024           21           40          400
           9         5000         4201           21           40          400
          10         1000         3619           21           40          400
          11         2000         3324           21           40          400
          12         1000         2901           21           40          400
          13         1000         2555           21           40          400
          14         1000         2272           21           40          400
          15         1000         2040           21           40          400
          16         1000         1851           21           40          400
          17         1000         1696        40000        40000        40000
          18         1000         1569        40000        40000        40000
          19         1000         1465        40000        40000        40000

N=40

# max_streams_per_ms=500 max_unstaked_connections=500 max_staked_load_in_throttling_window=40000 max_unstaked_load_in_throttling_window=20 throttling_on_threshold=1900
        step  load_in_5ms          ema  quota_0.01%   quota_0.1%     quota_1%
           0         5000          239        40000        40000        40000
           1        15000          945        40000        40000        40000
           2         1000          947        40000        40000        40000
           3         3000         1045        40000        40000        40000
           4         4000         1186        40000        40000        40000
           5         7000         1464        40000        40000        40000
           6         5000         1633        40000        40000        40000
           7         5000         1794        40000        40000        40000
           8         3000         1851        40000        40000        40000
           9         5000         2001           21           40          400
          10         1000         1953           21           40          400
          11         2000         1955           21           40          400
          12         1000         1909           21           40          400
          13         1000         1865           21           40          400
          14         1000         1823           21           40          400
          15         1000         1783        40000        40000        40000
          16         1000         1745        40000        40000        40000
          17         1000         1709        40000        40000        40000
          18         1000         1675        40000        40000        40000
          19         1000         1642        40000        40000        40000

With N=40, we can absorb ~50K transactions (with a spike) over ~40ms before throttling gets activated.
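
The activation point in the two runs above can be cross-checked with a rough sketch. It reuses the load sequence from the `ema_sim` command line and the `throttling_on_threshold=1900` value from the simulator output. Everything else is an assumption of the sketch: the floating-point alpha = 2/(N+1), the EMA starting at zero, the `>=` comparison, and the names `SIM_LOADS` and `first_throttled_step`. It models only activation, not the 90% deactivation threshold or the per-stake quotas.

```rust
/// Per-5ms loads taken from the `ema_sim` invocation shown above.
const SIM_LOADS: [f64; 20] = [
    5000.0, 15000.0, 1000.0, 3000.0, 4000.0, 7000.0, 5000.0, 5000.0, 3000.0, 5000.0,
    1000.0, 2000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0,
];

/// Returns the index of the first interval whose EMA reaches `on_threshold`,
/// together with the total load seen in the intervals before it.
/// Returns `None` if the EMA never reaches the threshold.
/// Hypothetical helper: floating-point EMA with alpha = 2 / (n + 1).
fn first_throttled_step(n: u32, loads: &[f64], on_threshold: f64) -> Option<(usize, f64)> {
    let alpha = 2.0 / (f64::from(n) + 1.0);
    let mut ema = 0.0;
    let mut absorbed = 0.0;
    for (step, &load) in loads.iter().enumerate() {
        ema = alpha * load + (1.0 - alpha) * ema;
        if ema >= on_threshold {
            return Some((step, absorbed));
        }
        // Count only the load from intervals that ended unthrottled.
        absorbed += load;
    }
    None
}
```

With a threshold of 1900, `first_throttled_step(10, &SIM_LOADS, 1900.0)` activates at step 1, and `first_throttled_step(40, &SIM_LOADS, 1900.0)` activates at step 9 after 48,000 transactions, which agrees with the tables and with the ~50K over ~40 ms figure.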

Fixes #


This is an automatic backport of pull request #10033 done by [Mergify](https://mergify.com).

…TERVAL_COUNT) (#10033)

streamer/TPU: increase STREAM_LOAD_EMA_INTERVAL_COUNT from 10 to 40

This constant controls the duration of the EMA smoothing window used to
reduce sensitivity to short-lived load spikes at the start of a leader
slot. Throttling is only triggered when saturation is sustained.

The value 40 was chosen based on simulations: at a max target TPS of ~400K,
it allows the system to absorb a burst of ~50K transactions over ~40 ms
before throttling activates.

There is no magic about N=40; the value should be tuned based on the size
and duration of spikes we want to tolerate.

(cherry picked from commit 51ebbc4)

# Conflicts:
#	streamer/src/nonblocking/stream_throttle.rs
@mergify mergify Bot requested a review from a team as a code owner January 16, 2026 17:52
@mergify mergify Bot added the conflicts label Jan 16, 2026

mergify Bot commented Jan 16, 2026

Cherry-pick of 51ebbc4 has failed:

On branch mergify/bp/v3.1/pr-10033
Your branch is up to date with 'origin/v3.1'.

You are currently cherry-picking commit 51ebbc43a.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   streamer/src/nonblocking/stream_throttle.rs

no changes added to commit (use "git add" and/or "git commit -a")

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally


stablebits commented Jan 19, 2026

@gregcusack @alexpyattaev It doesn't look like my first PR was merged. So this one can't be merged yet...

This one should be backported first:

commit 82836bf7f03c9e3125200e5d7691fbd08530c720
Author: Dmitry Adamushko <dmitry.adamushka@anza.xyz>
Date:   Thu Jan 15 23:40:02 2026 +0100

    Don't apply throttling until a configurable load threshold is reached. (#9580)

@gregcusack

ya you're good! nothing to do on this one right now. as you mentioned, the other needs to be merged first

@t-nelson t-nelson closed this Mar 30, 2026
@steviez steviez deleted the mergify/bp/v3.1/pr-10033 branch April 24, 2026 03:50