
Streamer/TPU: scale amount of bytes in flight with peer RTT #7745

Merged
alexpyattaev merged 4 commits into anza-xyz:master from alexpyattaev:autotune_number_of_streams
Sep 8, 2025

Conversation


@alexpyattaev alexpyattaev commented Aug 27, 2025

Problem

SWQOS ignores RTT in some of its calculations. This means that connections with high latency are heavily rate limited, much more so than their stake amount would suggest. This happens before throttling even has a chance to kick in, and is thus very counterintuitive to the client.
A deeper version of #7706. These PRs are mutually exclusive.

  • A sender cannot have more than receive_window bytes in flight between itself and the server.
  • A sender cannot have more than max_concurrent_streams open streams at any time.
  • Together, these limit how many TXs can be “in flight” between client and server and not yet ACKed. Currently, both limits are computed based on stake alone. We need to compute them based on stake and RTT to the client, since a longer RTT means more data must be on the wire before an ACK comes back (a rough sketch of the idea follows this list).
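
As a rough sketch of the BDP idea (the function and constant names here are illustrative assumptions, not the PR's actual code), the receive window could be derived from a stake-scaled target rate and the measured RTT:

use std::time::Duration;

// Hypothetical bounds, for illustration only; the PR's real constants differ.
const MIN_RECEIVE_WINDOW: u64 = 12_320; // room for a few full-size packets
const MAX_RECEIVE_WINDOW: u64 = 8 * 1024 * 1024;

/// Sketch: size the receive window as a bandwidth-delay product, so a peer
/// with a longer RTT is allowed proportionally more unacknowledged bytes.
fn receive_window_bdp(target_bytes_per_second: u64, rtt: Duration) -> u64 {
    // BDP = throughput * RTT; the same target rate needs a larger window
    // when ACKs take longer to come back.
    let bdp = (target_bytes_per_second as f64 * rtt.as_secs_f64()) as u64;
    bdp.clamp(MIN_RECEIVE_WINDOW, MAX_RECEIVE_WINDOW)
}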

This mechanism is not intended as the actual rate limiter for compliant clients; it is just a limit on network buffers and the like. It is designed to allocate more bandwidth than what you would get today. Note that a client can open up to 8 concurrent connections per identity, allowing for up to 16000 TPS at the network level (before throttling), so this should not reduce TPU bandwidth in any way. If operators want to limit TPS on a per-client basis, they should use the throttling logic for it rather than receive_window.

  • The number of concurrent streams should match the BDP of the link to prevent starvation of the client.
  • Throughput is not limited by the window size or the number of streams in flight, but rather by the throttling logic behind all of this, which operates per staked identity, not per connection.

Summary of Changes

  • Make the number of concurrent streams allowed scale with the RX window size (see the sketch below).
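
A minimal sketch of that scaling, assuming the stream budget is derived from the window and the smallest transaction size (the divisor and function name are assumptions; only the MAX_ALLOWED_UNI_STREAMS = 1024 cap appears in the PR itself):

/// Upper bound on streams per connection, as introduced by the PR.
const MAX_ALLOWED_UNI_STREAMS: u64 = 1024;

/// Sketch: grant roughly one stream per smallest transaction that fits in the
/// RX window, so the stream limit tracks the window rather than stake alone.
fn max_concurrent_streams(receive_window_bytes: u64) -> u64 {
    const MIN_TX_BYTES: u64 = 180; // assumed lower bound on transaction size
    (receive_window_bytes / MIN_TX_BYTES).clamp(1, MAX_ALLOWED_UNI_STREAMS)
}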

Without this PR, for the same set of nodes each holding 20% of stake:

{'latency': 10, 'clients': 5, 'duration': 10.0, 'tx-size': 250}
Server captured 468315 transactions (46831 TPS)

{'latency': 200, 'clients': 5, 'duration': 10.0, 'tx-size': 250}
Server captured 57811 transactions (5781 TPS)

With this PR:

{'latency': 10, 'clients': 5, 'duration': 10.0, 'tx-size': 250}
Server captured 196799 transactions (19679 TPS)

{'latency': 200, 'clients': 5, 'duration': 10.0, 'tx-size': 250}
Server captured 135651 transactions (13565 TPS)

Clearly, the PR makes the variation in TPS caused by latency differences far less severe.

Importantly, we are getting > 2000 TPS per staked connection (so up to 16000 TPS per staked identity before throttling).
Mainnet is currently serving ~1000 TPS, so this should never limit the overall TPS.


codecov-commenter commented Aug 27, 2025

Codecov Report

❌ Patch coverage is 98.91304% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 83.1%. Comparing base (4a3e05a) to head (3cf2ba1).
⚠️ Report is 2358 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7745   +/-   ##
=======================================
  Coverage    83.0%    83.1%           
=======================================
  Files         812      812           
  Lines      356900   356820   -80     
=======================================
- Hits       296578   296529   -49     
+ Misses      60322    60291   -31     

@alexpyattaev alexpyattaev force-pushed the autotune_number_of_streams branch from d1b02ea to 644e3f8 on August 27, 2025 19:35
@alexpyattaev alexpyattaev force-pushed the autotune_number_of_streams branch 2 times, most recently from bf5768c to e267c8a on August 27, 2025 19:44

lvboudre commented Aug 27, 2025

@alexpyattaev

I think we need a max stream limit, because window size alone is not a good proxy for bandwidth.

Window size represents the number of bytes that can be sent without waiting for an acknowledgement.
If someone receives ACKs quickly, the window capacity will never be filled.
This is why the window size needs to be greater for higher-latency peers.

Someone with really low latency, who receives ACKs faster from the remote validator, will be able to send more transactions (and thus use more bandwidth) than another peer with higher stake.

IMO, max concurrent streams is closer to what we call "bandwidth".

EDIT:

I didn't notice compute_receive_window_bdp, which was used to compute the new max_concurrent_streams.

@alexpyattaev

@alexpyattaev

I think we need a max stream limit, because window size alone is not a good proxy for bandwidth.

Unfortunately, the stream limit is an even worse proxy for bandwidth on the wire, since a stream can carry anywhere between 180 and 1232 bytes, giving a factor-of-7 "error" in how many bits per second we allow.

Thus, if we rate limit primarily based on streams and not the RX window, it becomes harder to predict how much bandwidth a given TPU peer will ultimately be able to use. Once 4 KB transactions are available, this variability will be even more pronounced. That would force the server to be more conservative with bandwidth allocations, resulting in less bandwidth for everyone.
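
For illustration only (not code from the PR), the spread a pure stream-count limit implies:

// Illustration only: with a fixed stream budget, the implied amount of data
// in flight varies by the ratio of the largest to the smallest TX size.
const MIN_TX_BYTES: u64 = 180; // roughly the smallest useful transaction
const MAX_TX_BYTES: u64 = 1232; // current packet-size ceiling

fn main() {
    let streams = 512;
    let (lo, hi) = (streams * MIN_TX_BYTES, streams * MAX_TX_BYTES);
    // Prints a ~6.8x spread: the same stream limit maps to very different bandwidths.
    println!("{streams} streams pin between {lo} and {hi} bytes in flight");
}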

  • do not update RX window on every TX, only every 64 TXs
  • bump max RTT to 300ms based on popular request
  • Not adjusting stream count during connection lifetime reduces allocations
@alexpyattaev alexpyattaev force-pushed the autotune_number_of_streams branch from 6f2c114 to 3cf2ba1 on August 29, 2025 08:20
@alexpyattaev alexpyattaev marked this pull request as ready for review August 29, 2025 08:20
@alexpyattaev alexpyattaev changed the title from "autotune number of streamer streams" to "Streamer/TPU: scale receive window and number of streams with peer RTT" on Aug 29, 2025
const MAX_ALLOWED_RTT: Duration = Duration::from_millis(300);

/// Maximal possible amount of streams to allocate per connection
const MAX_ALLOWED_UNI_STREAMS: u64 = 1024;

@alexpyattaev alexpyattaev Sep 3, 2025

This used to max out at 512 for the highest-staked peer. It is not intended for throttling.

@alexpyattaev alexpyattaev changed the title from "Streamer/TPU: scale receive window and number of streams with peer RTT" to "Streamer/TPU: scale amount of bytes in flight with peer RTT" on Sep 3, 2025
@alexpyattaev

Ran this on mds1 for 3 days, no obvious issues found.

@alexpyattaev alexpyattaev merged commit 82716b8 into anza-xyz:master Sep 8, 2025
43 checks passed
@alexpyattaev alexpyattaev deleted the autotune_number_of_streams branch September 8, 2025 12:59
@brooksprumo brooksprumo mentioned this pull request Sep 8, 2025
alexpyattaev added a commit to alexpyattaev/agave that referenced this pull request Sep 8, 2025
@alexpyattaev alexpyattaev restored the autotune_number_of_streams branch September 8, 2025 14:19
@alexpyattaev alexpyattaev deleted the autotune_number_of_streams branch September 8, 2025 14:19
alexpyattaev added a commit that referenced this pull request Sep 8, 2025
Revert "Streamer/TPU: scale amount of bytes in flight with peer RTT (#7745)" (#7953)

This reverts commit 82716b8.
const CONNECTION_CLOSE_CODE_DISALLOWED: u32 = 2;
const CONNECTION_CLOSE_REASON_DISALLOWED: &[u8] = b"disallowed";

const CONNECTION_CLOSE_CODE_EXCEED_MAX_STREAM_COUNT: u32 = 3;

Why is this removed? We still have max streams, no?

@alexpyattaev (Author)

Yes, we do. It is enforced by quinn.
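
As a rough illustration of what "enforced by quinn" means (this sketch assumes quinn's public TransportConfig API; the parameter names and values are placeholders, not the PR's code):

use std::sync::Arc;
use quinn::{TransportConfig, VarInt};

// Sketch: hand the per-peer budgets to quinn so the QUIC stack itself refuses
// streams and bytes beyond the limits; the server needs no manual counting.
fn per_peer_transport_config(receive_window: u64, max_uni_streams: u64) -> Arc<TransportConfig> {
    let mut cfg = TransportConfig::default();
    cfg.receive_window(VarInt::from_u64(receive_window).unwrap())
        // A client that tries to open more streams simply stalls on flow
        // control instead of being disconnected with an error code.
        .max_concurrent_uni_streams(VarInt::from_u64(max_uni_streams).unwrap())
        .max_concurrent_bidi_streams(0u32.into());
    Arc::new(cfg)
}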

alexpyattaev added a commit to alexpyattaev/agave that referenced this pull request Sep 9, 2025
…#7745)

* use BDP in SWQOS calculations
* set number of streamer streams based on BDP
do not update RX window on every TX, only every 64 TXs
Set max RTT to 300ms
alexpyattaev added a commit to alexpyattaev/agave that referenced this pull request Sep 14, 2025
…#7745)

* use BDP in SWQOS calculations
* set number of streamer streams based on BDP
do not update RX window on every TX, only every 64 TXs
Set max RTT to 300ms
alexpyattaev added a commit to alexpyattaev/agave that referenced this pull request Sep 23, 2025
…#7745)

* use BDP in SWQOS calculations
* set number of streamer streams based on BDP
do not update RX window on every TX, only every 64 TXs
Set max RTT to 300ms
alexpyattaev added a commit to alexpyattaev/agave that referenced this pull request Sep 29, 2025
…#7745)

* use BDP in SWQOS calculations
* set number of streamer streams based on BDP
do not update RX window on every TX, only every 64 TXs
Set max RTT to 300ms

address review comments from Lijun
alexpyattaev added a commit to alexpyattaev/agave that referenced this pull request Sep 29, 2025
…#7745)

* use BDP in SWQOS calculations
* set number of streamer streams based on BDP
do not update RX window on every TX, only every 64 TXs
Set max RTT to 300ms

address review comments from Lijun
alexpyattaev added a commit to alexpyattaev/agave that referenced this pull request Oct 16, 2025
…#7745)

* use BDP to compute the rx window before SWQOS throttling is applied
* set number of streamer streams based on BDP
do not update RX window on every TX, only every 64 TXs
Set max RTT to 300ms
alexpyattaev added a commit to alexpyattaev/agave that referenced this pull request Nov 1, 2025
…#7745)

* use BDP to compute the rx window before SWQOS throttling is applied
* set number of streamer streams based on BDP
do not update RX window on every TX, only every 64 TXs
Set max RTT to 300ms
alexpyattaev added a commit to alexpyattaev/agave that referenced this pull request Nov 2, 2025
…#7745)

* use BDP to compute the rx window before SWQOS throttling is applied
* set number of streamer streams based on BDP
do not update RX window on every TX, only every 64 TXs
Set max RTT to 300ms
alexpyattaev added a commit to alexpyattaev/agave that referenced this pull request Nov 2, 2025
…#7745)

* use BDP to compute the rx window before SWQOS throttling is applied
* set number of streamer streams based on BDP
* target up to 80 Mbps service rate per max-staked connection
do not update RX window on every TX, only every 64 TXs
Set max RTT to 400ms
alexpyattaev added a commit to alexpyattaev/agave that referenced this pull request Nov 3, 2025
…#7745)

* use BDP to compute the rx window before SWQOS throttling is applied,
  helps high-latency senders (>50ms RTT) get reasonable TPS
* set number of streamer streams based on BDP (for same reason)
* For any RTT below 400ms, target up to 80 Mbps service rate per max-staked connection
* add a workaround to keep giving higher bandwidth to close-by nodes
* update RX window every 128 TXs in case someone tries to spoof it
alexpyattaev added a commit to alexpyattaev/agave that referenced this pull request Nov 3, 2025
…#7745)

* use BDP to compute the rx window before SWQOS throttling is applied,
  helps high-latency senders (>50ms RTT) get reasonable TPS
* set number of streamer streams based on BDP (for same reason)
* For any RTT below 400ms, target up to 80 Mbps service rate per max-staked connection
* add a workaround to keep giving higher bandwidth to close-by nodes
* update RX window every 128 TXs in case someone tries to spoof it
