use BDP in SWQOS calculations #7706

Closed
alexpyattaev wants to merge 2 commits into anza-xyz:master from alexpyattaev:quic_window_qos
@alexpyattaev alexpyattaev commented Aug 25, 2025

Problem

Streamer does not allow larger RX windows for peers with larger RTT. As a result, those peers cannot achieve their target rates.

Summary of Changes

  • Adjust the logic that assigns the RX window so that it matches the RTT of the connection
  • Reformulate the bandwidth limits in terms of Kbps rather than nominal max-sized Solana packets, for ease of reasoning
  • Eliminate the min_stake concept: it was always effectively zero because low-staked nodes are present on any cluster.
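
To illustrate the second bullet, here is a minimal sketch of converting between a limit expressed in max-sized packets per second and one expressed in Kbps. The helper names are hypothetical and not taken from this PR, and the sketch assumes a max-sized packet of 1232 bytes (Solana's `PACKET_DATA_SIZE`).

```rust
/// Assumed size in bytes of a max-sized Solana packet (PACKET_DATA_SIZE).
const PACKET_DATA_SIZE: u64 = 1232;

/// Hypothetical helper: express a limit given in max-sized packets per
/// second as kilobits per second (1 Kbps = 1000 bits/s), rounded down.
fn packets_per_sec_to_kbps(packets_per_sec: u64) -> u64 {
    // bytes per second * 8 bits per byte / 1000 bits per kilobit
    packets_per_sec * PACKET_DATA_SIZE * 8 / 1000
}

/// Hypothetical inverse: how many max-sized packets per second fit
/// into a Kbps budget, rounded down.
fn kbps_to_packets_per_sec(kbps: u64) -> u64 {
    kbps * 1000 / 8 / PACKET_DATA_SIZE
}
```

A limit in Kbps does not depend on how large the sender's transactions actually are, which is what makes it easier to reason about than a count of nominal packets.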

Measurements before change:

{'38raKQgrSVtSvcS9stYWKLZct2JeN11bfWccgwuxnn7n': {'latency': 50},
 '3kbp21mvmHMhAba4KitGsemKuCMqNP3H362x61crezXC': {'latency': 50},
 'FRPwjpqnoDc9UNwBgGPD8RaNkZrXT2Mqyn6txEyNVUSo': {'latency': 50},
 'duration': 3.0,
 'tx-size': 1000}
Server captured 34865 transactions (11621 TPS)
3kbp21mvmHMhAba4KitGsemKuCMqNP3H362x61crezXC: sent=11229 got=11229 lost 0 (3743 TPS)
FRPwjpqnoDc9UNwBgGPD8RaNkZrXT2Mqyn6txEyNVUSo: sent=11851 got=11851 lost 0 (3950 TPS)
38raKQgrSVtSvcS9stYWKLZct2JeN11bfWccgwuxnn7n: sent=11785 got=11785 lost 0 (3928 TPS)

{'38raKQgrSVtSvcS9stYWKLZct2JeN11bfWccgwuxnn7n': {'latency': 100},
 '3kbp21mvmHMhAba4KitGsemKuCMqNP3H362x61crezXC': {'latency': 100},
 'FRPwjpqnoDc9UNwBgGPD8RaNkZrXT2Mqyn6txEyNVUSo': {'latency': 100},
 'duration': 3.0,
 'tx-size': 1000}
Server captured 13286 transactions (4428 TPS)
3kbp21mvmHMhAba4KitGsemKuCMqNP3H362x61crezXC: sent=4310 got=4310 lost 0 (1436 TPS)
FRPwjpqnoDc9UNwBgGPD8RaNkZrXT2Mqyn6txEyNVUSo: sent=4478 got=4478 lost 0 (1492 TPS)
38raKQgrSVtSvcS9stYWKLZct2JeN11bfWccgwuxnn7n: sent=4498 got=4498 lost 0 (1499 TPS)

{'38raKQgrSVtSvcS9stYWKLZct2JeN11bfWccgwuxnn7n': {'latency': 200},
 '3kbp21mvmHMhAba4KitGsemKuCMqNP3H362x61crezXC': {'latency': 200},
 'FRPwjpqnoDc9UNwBgGPD8RaNkZrXT2Mqyn6txEyNVUSo': {'latency': 200},
 'duration': 3.0,
 'tx-size': 1000}
Server captured 2755 transactions (918 TPS)
3kbp21mvmHMhAba4KitGsemKuCMqNP3H362x61crezXC: sent=915 got=766 lost 149 (255 TPS)
FRPwjpqnoDc9UNwBgGPD8RaNkZrXT2Mqyn6txEyNVUSo: sent=998 got=998 lost 0 (332 TPS)
38raKQgrSVtSvcS9stYWKLZct2JeN11bfWccgwuxnn7n: sent=991 got=991 lost 0 (330 TPS)

{'38raKQgrSVtSvcS9stYWKLZct2JeN11bfWccgwuxnn7n': {'latency': 35},
 '3kbp21mvmHMhAba4KitGsemKuCMqNP3H362x61crezXC': {'latency': 100},
 'FRPwjpqnoDc9UNwBgGPD8RaNkZrXT2Mqyn6txEyNVUSo': {'latency': 200},
 'duration': 3.0,
 'tx-size': 1000}
Server captured 22817 transactions (7605 TPS)
3kbp21mvmHMhAba4KitGsemKuCMqNP3H362x61crezXC: sent=4407 got=4407 lost 0 (1469 TPS)
FRPwjpqnoDc9UNwBgGPD8RaNkZrXT2Mqyn6txEyNVUSo: sent=941 got=941 lost 0 (313 TPS)
38raKQgrSVtSvcS9stYWKLZct2JeN11bfWccgwuxnn7n: sent=17469 got=17469 lost 0 (5823 TPS)

Measurements after change:

{'5o9jPpKLYrV9eLCMiZXHgtfTmyJAXj6vPjedRSe6VjsT': {'latency': 50},
 'CCoRhxgcAABjTTSUswdDkexB98xj5LAo3uAUjwtY8juS': {'latency': 50},
 'CZYzndWS5djCjy2i1GGy34NcoHSPhqkZ6UTmVLRTuXM1': {'latency': 50},
 'duration': 4.0,
 'tx-size': 1000}
Server captured 28666 transactions (7166 TPS)
CZYzndWS5djCjy2i1GGy34NcoHSPhqkZ6UTmVLRTuXM1: sent=9898 got=9898 lost 0 (2474 TPS)
CCoRhxgcAABjTTSUswdDkexB98xj5LAo3uAUjwtY8juS: sent=9384 got=9384 lost 0 (2346 TPS)
5o9jPpKLYrV9eLCMiZXHgtfTmyJAXj6vPjedRSe6VjsT: sent=9384 got=9384 lost 0 (2346 TPS)


{'5o9jPpKLYrV9eLCMiZXHgtfTmyJAXj6vPjedRSe6VjsT': {'latency': 100},
 'CCoRhxgcAABjTTSUswdDkexB98xj5LAo3uAUjwtY8juS': {'latency': 100},
 'CZYzndWS5djCjy2i1GGy34NcoHSPhqkZ6UTmVLRTuXM1': {'latency': 100},
 'duration': 4.0,
 'tx-size': 1000}
Server captured 26307 transactions (6576 TPS)
CZYzndWS5djCjy2i1GGy34NcoHSPhqkZ6UTmVLRTuXM1: sent=8734 got=8734 lost 0 (2183 TPS)
CCoRhxgcAABjTTSUswdDkexB98xj5LAo3uAUjwtY8juS: sent=8751 got=8751 lost 0 (2187 TPS)
5o9jPpKLYrV9eLCMiZXHgtfTmyJAXj6vPjedRSe6VjsT: sent=8822 got=8822 lost 0 (2205 TPS)


{'5o9jPpKLYrV9eLCMiZXHgtfTmyJAXj6vPjedRSe6VjsT': {'latency': 200},
 'CCoRhxgcAABjTTSUswdDkexB98xj5LAo3uAUjwtY8juS': {'latency': 200},
 'CZYzndWS5djCjy2i1GGy34NcoHSPhqkZ6UTmVLRTuXM1': {'latency': 200},
 'duration': 4.0,
 'tx-size': 1000}
Server captured 12456 transactions (3114 TPS)
CZYzndWS5djCjy2i1GGy34NcoHSPhqkZ6UTmVLRTuXM1: sent=4155 got=4154 lost 1 (1038 TPS)
CCoRhxgcAABjTTSUswdDkexB98xj5LAo3uAUjwtY8juS: sent=4117 got=4117 lost 0 (1029 TPS)
5o9jPpKLYrV9eLCMiZXHgtfTmyJAXj6vPjedRSe6VjsT: sent=4185 got=4185 lost 0 (1046 TPS)

{'5o9jPpKLYrV9eLCMiZXHgtfTmyJAXj6vPjedRSe6VjsT': {'latency': 35},
 'CCoRhxgcAABjTTSUswdDkexB98xj5LAo3uAUjwtY8juS': {'latency': 200},
 'CZYzndWS5djCjy2i1GGy34NcoHSPhqkZ6UTmVLRTuXM1': {'latency': 100},
 'duration': 4.0,
 'tx-size': 1000}
Server captured 22321 transactions (5580 TPS)
CZYzndWS5djCjy2i1GGy34NcoHSPhqkZ6UTmVLRTuXM1: sent=8741 got=8741 lost 0 (2185 TPS)
CCoRhxgcAABjTTSUswdDkexB98xj5LAo3uAUjwtY8juS: sent=4173 got=4173 lost 0 (1043 TPS)
5o9jPpKLYrV9eLCMiZXHgtfTmyJAXj6vPjedRSe6VjsT: sent=9407 got=9407 lost 0 (2351 TPS)
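
For reading the logs above: the TPS figures appear to be the received count divided by the test duration, rounded down (for example, 766 / 3.0 gives 255, whereas sent / duration would give 305). A minimal sketch of that arithmetic follows; the helper name `tps` is hypothetical and not part of the test tooling.

```rust
/// Hypothetical helper: transactions per second, rounded down,
/// from a received-transaction count and a test duration in seconds.
fn tps(got: u64, duration_secs: f64) -> u64 {
    (got as f64 / duration_secs).floor() as u64
}
```

Applied to the 200 ms runs, this gives 2755 / 3.0 = 918 TPS at the server before the change and 12456 / 4.0 = 3114 TPS after it.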

A more complete rework is available in #7745

@alexpyattaev alexpyattaev added the noCI Suppress CI on this Pull Request label Aug 25, 2025
@alexpyattaev alexpyattaev marked this pull request as ready for review August 25, 2025 19:49
@alexpyattaev alexpyattaev added CI Pull Request is ready to enter CI and removed noCI Suppress CI on this Pull Request labels Aug 25, 2025
@anza-team anza-team removed the CI Pull Request is ready to enter CI label Aug 25, 2025
drop(connection_table_l);

if let Ok(receive_window) = receive_window {
    connection.set_receive_window(receive_window);
Author

we now set it a bit later


/// Maximal allowed RTT for SWQOS calculations (to limit abuse)
const MAX_ALLOWED_RTT: Duration = Duration::from_millis(200);

Author

These are just placeholder values to be tuned.
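
A minimal sketch of how such a cap could be applied before the RTT enters the window calculation. The helper `clamp_rtt` is hypothetical; only the constant and its 200 ms placeholder value come from the diff above.

```rust
use std::time::Duration;

/// Maximal allowed RTT for SWQOS calculations (placeholder value from the diff).
const MAX_ALLOWED_RTT: Duration = Duration::from_millis(200);

/// Hypothetical helper: cap a measured RTT so that a peer reporting a very
/// large RTT cannot obtain an arbitrarily large receive window.
fn clamp_rtt(measured: Duration) -> Duration {
    measured.min(MAX_ALLOWED_RTT)
}
```

RTTs below the cap pass through unchanged; anything above it is treated as exactly `MAX_ALLOWED_RTT`.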

..
} = params;

let max_receive_rate_kbps = compute_max_receive_rate_kbps(params.max_stake, params.peer_type);
Author

We compute the RX window immediately after the connection is confirmed.

Comment thread streamer/src/nonblocking/quic.rs Outdated
stats.total_streams.fetch_sub(1, Ordering::Relaxed);
stream_load_ema.update_ema_if_needed();

let new_window = compute_receive_window_bdp(max_receive_rate_kbps, connection.rtt());
Author

We also re-compute it after each full stream is received, so that the window keeps tracking the current state of the channel.
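
For readers unfamiliar with the term, the bandwidth-delay product (BDP) is the amount of data that must be in flight to sustain a given rate over a given RTT. The sketch below shows one way a function with this name and argument order could compute it. Only the name and the call shape come from the diff; the body is an assumption and not the code in this PR.

```rust
use std::time::Duration;

/// Sketch of a BDP-based receive window in bytes: the number of bytes that
/// must be in flight to sustain `max_receive_rate_kbps` over one `rtt`.
/// The body is an assumption; only the name and arguments appear in the diff.
fn compute_receive_window_bdp(max_receive_rate_kbps: u64, rtt: Duration) -> u64 {
    // 1 Kbps = 1000 bits/s = 125 bytes/s.
    let bytes_per_sec = max_receive_rate_kbps as u128 * 125;
    // Multiply by the RTT in microseconds, then scale back to seconds.
    let window = bytes_per_sec * rtt.as_micros() / 1_000_000;
    // Saturate instead of wrapping if the result does not fit in u64.
    window.min(u64::MAX as u128) as u64
}
```

Under this formula, a peer allowed 8000 Kbps needs a 100,000-byte window at 100 ms RTT and a 200,000-byte window at 200 ms, which is why a fixed window penalizes distant peers.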

min_stake = 10000;
let ratio = compute_receive_window_ratio_for_staked_node(max_stake, min_stake, max_stake);
assert_eq!(ratio, max_ratio);

Author

Since we embrace min_stake == 0, we can eliminate this part of the test.

codecov-commenter commented Aug 25, 2025

Codecov Report

❌ Patch coverage is 96.42857% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.0%. Comparing base (996570b) to head (f79a707).
⚠️ Report is 2343 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7706   +/-   ##
=======================================
  Coverage    83.0%    83.0%           
=======================================
  Files         812      812           
  Lines      356934   356907   -27     
=======================================
+ Hits       296535   296557   +22     
+ Misses      60399    60350   -49     

do not update RX window on every TX, only every 64 TXs
bump max RTT to 300ms based on popular request
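
A minimal sketch of the throttling described in the first commit message above, under the assumption that a simple per-connection counter is used. The type `WindowUpdateGate` and the constant name are hypothetical and not taken from the PR.

```rust
/// Hypothetical interval: re-compute the RX window once per this many transactions.
const WINDOW_UPDATE_INTERVAL: u64 = 64;

/// Hypothetical per-connection counter that decides when to update the window.
struct WindowUpdateGate {
    received: u64,
}

impl WindowUpdateGate {
    fn new() -> Self {
        Self { received: 0 }
    }

    /// Record one received transaction. Returns true on every 64th call,
    /// which is when the caller would re-compute the receive window.
    fn on_transaction(&mut self) -> bool {
        self.received += 1;
        self.received % WINDOW_UPDATE_INTERVAL == 0
    }
}
```

Updating less often trades a little responsiveness to RTT changes for less work on the per-transaction path.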
@alexpyattaev
Author

closing in favor of #7954

@alexpyattaev alexpyattaev deleted the quic_window_qos branch September 10, 2025 07:56