
streamer: set fixed RX window for all connections #9143

Merged
alexpyattaev merged 1 commit into anza-xyz:master from alexpyattaev:streamer_simplify_window_sizing
Nov 23, 2025

Conversation

@alexpyattaev

@alexpyattaev alexpyattaev commented Nov 19, 2025

Problem

  • Streamer RX window management logic is unnecessarily complex
  • We control MAX_DATA and MAX_STREAMS independently even though controlling one of them is sufficient
  • Since most of our streamer-related logic operates in terms of streams, setting the RX window in bytes adds an unnecessary conversion from streams to bytes

Summary of Changes

  • Allow an 8 MB MAX_DATA for all connections (enough to reach 200 Mbps over 320 ms RTT; reasonable connections will never be limited by this)
  • Ensure the SWQOS efficiently throttles connections based on the MAX_STREAMS setting
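
The 8 MB figure follows from the bandwidth-delay product: a connection's throughput is capped at roughly window size divided by RTT. A quick sanity check (a sketch, not code from this PR):

```rust
// Sanity check for the 8 MB receive window: achievable throughput is bounded
// by the bandwidth-delay product, i.e. window size divided by round-trip time.
fn max_throughput_mbps(window_bytes: u64, rtt_ms: u64) -> u64 {
    // bits per second, converted to megabits per second (integer math)
    window_bytes * 8 * 1000 / rtt_ms / 1_000_000
}

fn main() {
    let window = 8 * 1024 * 1024; // matches CONNECTION_RECEIVE_WINDOW_BYTES
    let mbps = max_throughput_mbps(window, 320);
    println!("{mbps} Mbps at 320 ms RTT"); // ~209 Mbps, above the 200 Mbps target
    assert!(mbps >= 200);
}
```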

@alexpyattaev alexpyattaev force-pushed the streamer_simplify_window_sizing branch from d5b96d6 to 306d3a1 on November 19, 2025 10:28
@alexpyattaev alexpyattaev marked this pull request as ready for review November 19, 2025 10:29
Comment thread streamer/src/streamer.rs
}

impl StakedNodes {
/// Calculate the stake stats: return the new (total_stake, min_stake and max_stake) tuple
Author

This can be simplified now that min_stake and max_stake are not used anymore.

Comment thread streamer/src/quic.rs

let config = Arc::get_mut(&mut server_config.transport).unwrap();

// QUIC_MAX_CONCURRENT_STREAMS doubled, which was found to improve reliability
Author

This was doing nothing since we override these settings after handshake anyway.

Comment thread streamer/src/streamer.rs
self.total_stake
}

#[inline]
Author

min_stake and max_stake were only used in receive window calculations

use super::*;

#[test]
fn test_cacluate_receive_window_ratio_for_staked_node() {
Author

testing removed code

Comment thread streamer/src/nonblocking/quic.rs Outdated
);
stats.total_connections.fetch_add(1, Ordering::Relaxed);

connection.set_receive_window(CONNECTION_RECEIVE_WINDOW_BYTES);
Author

This is the key change: we set the RX window to a constant after the connection is fully confirmed.


minor: since we now disable streams in the default transport config in configure_server(), it should be safe to set the window value to CONNECTION_RECEIVE_WINDOW_BYTES there too. One less place that tweaks connection settings. wdyt?

Author

Moved to server config.


@alexpyattaev alexpyattaev left a comment

Had to apply more changes to SimpleQos as it did not have its own logic to set max_streams. Removing throttling from it completely is not possible since our throttling limits are very low (20 streams/s is normal), so MAX_STREAMS alone is not granular enough to achieve the enforcement we want for voting.
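
The granularity problem is easy to see: with a concurrent-stream cap, the achievable stream rate is roughly cap / RTT, so even the minimum cap of one stream cannot enforce rates below 1/RTT. A sketch (the helper name is hypothetical, not from the PR):

```rust
// Effective stream rate permitted by a MAX_STREAMS cap alone, assuming each
// stream slot can be reused once per round trip (hypothetical helper).
fn streams_per_second(max_concurrent_streams: u32, rtt_ms: u32) -> u32 {
    max_concurrent_streams * 1000 / rtt_ms
}

fn main() {
    // Even a cap of 1 concurrent stream still allows ~20 streams/s at 50 ms RTT,
    // so enforcing lower rates needs explicit throttling in the QoS layer.
    assert_eq!(streams_per_second(1, 50), 20);
}
```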

Author

For SimpleQos we either set max_streams to some constant, or compute based on RTT. Computing based on RTT seems less brittle.
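
An RTT-based computation could look roughly like this (illustrative names modeled on the PR's STREAMS_IN_FLIGHT_MARGIN; the margin value and function signature are assumptions, not the PR's exact code):

```rust
// Assumed floor so connections never end up with zero streams in flight.
const STREAMS_IN_FLIGHT_MARGIN: u64 = 1;

// Derive a MAX_STREAMS value from a target per-connection stream rate and the
// measured RTT: the cap is the number of streams admitted during one RTT.
fn max_streams_in_flight(max_streams_per_second: u64, rtt_ms: u64) -> u64 {
    (max_streams_per_second * rtt_ms / 1000).max(STREAMS_IN_FLIGHT_MARGIN)
}

fn main() {
    // 20 streams/s at 100 ms RTT -> 2 streams in flight
    assert_eq!(max_streams_in_flight(20, 100), 2);
    // very low rates clamp to the margin instead of zero
    assert_eq!(max_streams_in_flight(2, 100), 1);
}
```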

Comment thread streamer/src/nonblocking/simple_qos.rs
@codecov-commenter

codecov-commenter commented Nov 19, 2025

Codecov Report

❌ Patch coverage is 95.23810% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 82.6%. Comparing base (e8cbabe) to head (7abf156).

Additional details and impacted files
@@            Coverage Diff            @@
##           master    #9143     +/-   ##
=========================================
- Coverage    82.6%    82.6%   -0.1%     
=========================================
  Files         890      890             
  Lines      320845   320773     -72     
=========================================
- Hits       265285   265147    -138     
- Misses      55560    55626     +66     

Comment thread streamer/src/nonblocking/quic.rs Outdated
/// connection to require more bandwidth. This prevents MAX_DATA from affecting
/// the bitrate achieved by a single connection. Actual throttling is achieved based
/// on the number of concurrent streams.
const CONNECTION_RECEIVE_WINDOW_BYTES: VarInt = VarInt::from_u32(8 * 1024 * 1024);

I'd remove the "unreasonable to expect a single connection..." part and expose this setting via config. No need to enforce strict limits.

Author

Then we will have to explain to users how to choose this value. Our users build from source anyway and can customize it if they really want to.

Comment thread streamer/src/nonblocking/simple_qos.rs
@KirillLykov

KirillLykov commented Nov 21, 2025

I agree that we should use stake parametrization for only one of MAX_DATA or MAX_STREAMS. Reasoning in terms of streams (transactions) might be simpler for users. Limiting streams might be a bit more restrictive when there are many small transactions, but since the idea is that we don't restrict users when we have bandwidth for them, that doesn't sound like a blocking issue.
I'm sort of neutral on which one to pick. The only thing is that in the discussions I found an argument that in the quinn implementation moving a stream is more expensive than the receive window. No idea how much more expensive.

Have we checked in stress tests of an isolated streamer + mock client that this change doesn't lead to any surprises?

if let Ok(receive_window) = receive_window {
connection.set_receive_window(receive_window);
}
connection.set_max_concurrent_uni_streams(max_uni_streams);

Why not apply RTT-based scaling like we introduce in simple_qos? In this case, we should also ensure that senders don’t receive less than before -- at least until we have auto-tuning, and maybe even afterward unless these defaults turn out to be inadequate.

Author

In a separate PR. One small change at a time.

// for very low values of max_streams_per_second, prevent connections from having zero
// streams in flight
let max_streams_in_flight = max_streams_in_flight.max(STREAMS_IN_FLIGHT_MARGIN);
connection.set_max_concurrent_uni_streams(VarInt::from_u32(max_streams_in_flight));

It looks like the receive window remained set to PACKET_DATA_SIZE (the default in configure_server()), thus limiting vote TPS to 1 per RTT.
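
For context, with the receive window at PACKET_DATA_SIZE only a single packet fits in the window, so the sender stalls for a full round trip after each transaction. A sketch (assuming PACKET_DATA_SIZE = 1232 bytes, Solana's packet payload size):

```rust
// Assumed value of Solana's packet payload size.
const PACKET_DATA_SIZE: usize = 1232;

// How many full packets fit in a given receive window.
fn packets_in_flight(window_bytes: usize) -> usize {
    window_bytes / PACKET_DATA_SIZE
}

fn main() {
    // window == one packet -> one transaction per round trip
    assert_eq!(packets_in_flight(PACKET_DATA_SIZE), 1);
    // at 100 ms RTT that is only ~10 votes/s per connection
    let rtt_ms = 100;
    assert_eq!(1000 / rtt_ms * packets_in_flight(PACKET_DATA_SIZE), 10);
}
```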


@alexpyattaev alexpyattaev Nov 21, 2025

No, the receive window is set for both QoS modules in the common code, see async fn handle_connection<Q, C>(...)


@stablebits stablebits Nov 21, 2025

Yes, I wanted to say the 1 transaction/RTT problem exists in the current (before this PR) code.

Author

Yes, it does exist today indeed. And we're fixing it =)

@alexpyattaev alexpyattaev force-pushed the streamer_simplify_window_sizing branch from fa4887e to 7abf156 on November 21, 2025 18:27
@alexpyattaev

The only thing is that in the discussions I found an argument that in the quinn implementation moving a stream is more expensive than the receive window. No idea how much more expensive.

Let us fix the QoS logic first to be correct, then we focus on making it performant.

Have we checked in stress tests of an isolated streamer + mock client that this change doesn't lead to any surprises?

Yes, this was tested against mock_clients: no TPS change compared to the case before the change.

@alexpyattaev alexpyattaev added this pull request to the merge queue Nov 23, 2025
Merged via the queue into anza-xyz:master with commit a155621 Nov 23, 2025
47 checks passed
@alexpyattaev alexpyattaev deleted the streamer_simplify_window_sizing branch November 23, 2025 08:53
AvhiMaz pushed a commit to AvhiMaz/agave that referenced this pull request Nov 28, 2025
@alexpyattaev alexpyattaev added the v3.1 Backport to v3.1 branch label Nov 29, 2025
@mergify

mergify Bot commented Nov 29, 2025

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

mergify Bot pushed a commit that referenced this pull request Nov 29, 2025
(cherry picked from commit a155621)

# Conflicts:
#	streamer/src/nonblocking/simple_qos.rs
#	streamer/src/nonblocking/swqos.rs
#	streamer/src/streamer.rs
alexpyattaev added a commit that referenced this pull request Dec 3, 2025
(cherry picked from commit a155621)

# Conflicts:
#	streamer/src/nonblocking/simple_qos.rs
#	streamer/src/nonblocking/swqos.rs
#	streamer/src/streamer.rs
alexpyattaev added a commit that referenced this pull request Dec 3, 2025
…9143) (#9330)

* streamer: set fixed RX window for all connections (#9143)

(cherry picked from commit a155621)

# Conflicts:
#	streamer/src/nonblocking/simple_qos.rs
#	streamer/src/nonblocking/swqos.rs
#	streamer/src/streamer.rs

* resolve conflicts

---------

Co-authored-by: Alex Pyattaev <alex.pyattaev@anza.xyz>
