Increase the duration of the EMA smoothing window (STREAM_LOAD_EMA_INTERVAL_COUNT) #10033
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #10033 +/- ##
=======================================
Coverage 82.6% 82.6%
=======================================
Files 844 844
Lines 316634 316634
=======================================
+ Hits 261608 261619 +11
+ Misses 55026 55015 -11
|
|
NOTE: data for this change is here: #9580. @stablebits, can you rebase and leave just the last commit (the constant change)? Then we can merge! Thank you!! |
This constant controls the duration of the EMA smoothing window used to reduce sensitivity to short-lived load spikes at the start of a leader slot. Throttling is only triggered when saturation is sustained. The value 40 was chosen based on simulations: at a max target TPS of ~400K, it allows the system to absorb a burst of ~50K transactions over ~40 ms before throttling activates. There is no magic about N=40; the value should be tuned based on the size and duration of spikes we want to tolerate.
6fc5d7c to 98486db
|
@gregcusack rebased |
|
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis. |
…TERVAL_COUNT) (#10033)
streamer/TPU: increase STREAM_LOAD_EMA_INTERVAL_COUNT from 10 to 40
This constant controls the duration of the EMA smoothing window used to reduce sensitivity to short-lived load spikes at the start of a leader slot. Throttling is only triggered when saturation is sustained. The value 40 was chosen based on simulations: at a max target TPS of ~400K, it allows the system to absorb a burst of ~50K transactions over ~40 ms before throttling activates. There is no magic about N=40; the value should be tuned based on the size and duration of spikes we want to tolerate.
(cherry picked from commit 51ebbc4)
# Conflicts:
# streamer/src/nonblocking/stream_throttle.rs
Note
This change is a follow-up to #9580.
The STREAM_LOAD_EMA_INTERVAL_COUNT constant controls the duration of the EMA smoothing window used to reduce sensitivity to short-lived load spikes at the start of a leader slot. With #9580 in place, throttling is only triggered when saturation is sustained (reaching 95% of max target).
Problem
With the current value of 10, the smoothing window is too short (see the simulation results below).
Summary of Changes
The value 40 was chosen based on simulations: at a max target TPS of ~400K, it allows the system to absorb a burst of ~50K transactions over ~40 ms before throttling activates.
There is no magic about N=40; the value should be tuned based on the size and duration of spikes we want to tolerate.
The alpha in the EMA (new_ema = alpha * latest + (1 - alpha) * ema) is approximately 2/(N+1), where N is STREAM_LOAD_EMA_INTERVAL_COUNT.
The larger N is, the slower the EMA grows (i.e., the larger the burst it can absorb). With N=10 (current code), alpha ≈ 0.18. For example, here is the EMA growth under a sustained load of 1K / 5 ms.
N=10 (alpha ≈ 0.18)
N=40 (alpha ≈ 0.049)
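The relationship between N and how quickly the EMA rises can be sketched as below. This is only an illustration, not the code in streamer/src/nonblocking/stream_throttle.rs: the function names (ema_alpha, ema_step, intervals_to_reach) are hypothetical, the arithmetic is floating point, and it assumes that alpha is exactly 2/(N+1) and that the EMA starts at zero.

```rust
/// Smoothing factor for an N-interval EMA, assuming alpha = 2 / (N + 1).
fn ema_alpha(interval_count: u32) -> f64 {
    2.0 / (f64::from(interval_count) + 1.0)
}

/// One EMA update: new_ema = alpha * latest + (1 - alpha) * ema.
fn ema_step(ema: f64, latest: f64, alpha: f64) -> f64 {
    alpha * latest + (1.0 - alpha) * ema
}

/// Counts how many update intervals a zero-initialized EMA needs to reach
/// `fraction` of a constant per-interval `load`.
/// Assumes `load > 0.0` and `0.0 < fraction < 1.0`, otherwise the loop
/// may never terminate.
fn intervals_to_reach(interval_count: u32, load: f64, fraction: f64) -> u32 {
    let alpha = ema_alpha(interval_count);
    let mut ema = 0.0;
    let mut steps = 0;
    while ema < fraction * load {
        ema = ema_step(ema, load, alpha);
        steps += 1;
    }
    steps
}
```

Comparing intervals_to_reach(10, 1000.0, 0.95) with intervals_to_reach(40, 1000.0, 0.95) shows how many more update intervals the larger window needs before the EMA reaches 95% of a constant 1K-per-interval load, which is the extra headroom available for absorbing a short burst.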
Below is a simulated ingestion of ~60K transactions over 100 ms with a spike at the beginning, roughly corresponding to a pattern we recently saw on mds1 (mainnet) but at about 10x the traffic.
Note: throttling is activated at 95% of the target load (500K TPS) and deactivated at 90%. The quota of 40K basically means unthrottled.
N=10
N=40
With N=40, we can absorb ~50K transactions (with a spike) over ~40ms before throttling gets activated.
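The 95% activation and 90% deactivation thresholds noted above form a hysteresis band. A minimal sketch of that behavior follows. The ThrottleState type, its update method, and the exact comparison operators are assumptions made for illustration and are not taken from the streamer code.

```rust
/// Hypothetical throttle state with hysteresis (illustrative only).
#[derive(Default)]
struct ThrottleState {
    active: bool,
}

impl ThrottleState {
    /// Updates the state from the smoothed load and returns whether
    /// throttling is active afterwards.
    /// Assumed rule: turn on at >= 95% of `max_target`, turn off below 90%.
    fn update(&mut self, ema_load: f64, max_target: f64) -> bool {
        if self.active {
            // Stay throttled until the smoothed load drops below 90%.
            if ema_load < 0.90 * max_target {
                self.active = false;
            }
        } else if ema_load >= 0.95 * max_target {
            // Only sustained saturation pushes the EMA this high.
            self.active = true;
        }
        self.active
    }
}
```

Because the input is the smoothed load and not the raw per-interval load, a larger N delays the moment the 95% threshold is crossed, and the gap between the two thresholds keeps the state from flapping when the load hovers near the limit.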
Fixes #