Skip to content

Streamer: swap governor crate with TokenBucket#8740

Merged
alexpyattaev merged 1 commit intoanza-xyz:masterfrom
alexpyattaev:rm_governor_2
Nov 5, 2025
Merged

Streamer: swap governor crate with TokenBucket#8740
alexpyattaev merged 1 commit intoanza-xyz:masterfrom
alexpyattaev:rm_governor_2

Conversation

@alexpyattaev
Copy link
Copy Markdown

@alexpyattaev alexpyattaev commented Oct 28, 2025

Problem

Summary of Changes

  • Use TokenBucket instead of governor crate
  • Add a patch to prevent localhost connections from getting ratelimited

CONTEXT

https://discord.com/channels/428295358100013066/560503042458517505/1430348086042951722

@alexpyattaev alexpyattaev force-pushed the rm_governor_2 branch 2 times, most recently from d6cac0d to eba17f2 Compare October 29, 2025 02:03
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Oct 29, 2025

Codecov Report

❌ Patch coverage is 72.88136% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.2%. Comparing base (b4cce97) to head (d0874c8).

Additional details and impacted files
@@            Coverage Diff            @@
##           master    #8740     +/-   ##
=========================================
- Coverage    83.2%    83.2%   -0.1%     
=========================================
  Files         863      863             
  Lines      373947   373921     -26     
=========================================
- Hits       311236   311184     -52     
- Misses      62711    62737     +26     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@alexpyattaev
Copy link
Copy Markdown
Author

alexpyattaev commented Oct 29, 2025

Extra context:
Previous version of this PR was reverted due to regression in local-cluster

This PR addresses the regression by bumping number of connections per IP for test-validator setups

 cargo nextest run --profile ci --test-threads=1 --package solana-local-cluster -- test_run_test_load_program_accounts_root
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.28s
────────────
 Nextest run ID e175959f-0a95-4bca-8c3f-a87f3e5e308c with nextest profile: ci
    Starting 1 test across 2 binaries (51 tests skipped)
        PASS [  42.601s] solana-local-cluster::local_cluster test_run_test_load_program_accounts_root
────────────
     Summary [  42.604s] 1 test run: 1 passed, 51 skipped

Full partition:

cargo nextest run --profile ci --package solana-local-cluster --test local_cluster --partition hash:9/10 --test-threads=1 --no-tests=warn 
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.28s
────────────
 Nextest run ID be8af19b-3b78-4641-8f65-293b324a072d with nextest profile: ci
    Starting 7 tests across 1 binary (45 tests skipped)
        PASS [  18.627s] solana-local-cluster::local_cluster test_local_cluster_signature_subscribe
        PASS [  40.884s] solana-local-cluster::local_cluster test_no_lockout_violation_with_tower
        PASS [  17.204s] solana-local-cluster::local_cluster test_no_voting
  TRY 1 SLOW [> 60.000s] solana-local-cluster::local_cluster test_run_test_load_program_accounts_partition_root
        PASS [  67.404s] solana-local-cluster::local_cluster test_run_test_load_program_accounts_partition_root
        PASS [  49.464s] solana-local-cluster::local_cluster test_run_test_load_program_accounts_root
        PASS [  33.753s] solana-local-cluster::local_cluster test_spend_and_verify_all_nodes_2
  TRY 1 SLOW [> 60.000s] solana-local-cluster::local_cluster test_wait_for_max_stake
        PASS [ 116.574s] solana-local-cluster::local_cluster test_wait_for_max_stake
────────────
     Summary [ 343.917s] 7 tests run: 7 passed (2 slow), 45 skipped

All CI partitions of local-cluster complete in <10 min.

Comment thread core/src/validator.rs
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bumping the max_connections_per_ipaddr_per_min way up to avoid tests ever flaking in the future due to ratelimits.

@alexpyattaev alexpyattaev marked this pull request as ready for review October 29, 2025 07:11
@alexpyattaev alexpyattaev requested a review from apfitzge October 29, 2025 07:11
@alexpyattaev alexpyattaev changed the title swap governor crate with TokenBucket Streamer: swap governor crate with TokenBucket Oct 29, 2025
@apfitzge
Copy link
Copy Markdown

Have no context beyond the previously slow CI test. Will defer to @lijunwangs for the review.
Sorry for the trouble 🫡

@lijunwangs
Copy link
Copy Markdown

Have no context beyond the previously slow CI test. Will defer to @lijunwangs for the review. Sorry for the trouble 🫡

@apfitzge can you point to the slow CI test link? Thanks

@alexpyattaev
Copy link
Copy Markdown
Author

Have no context beyond the previously slow CI test. Will defer to @lijunwangs for the review. Sorry for the trouble 🫡

@apfitzge can you point to the slow CI test link? Thanks

https://discord.com/channels/428295358100013066/560503042458517505/1430348086042951722

let quic_server_params = Arc::new(quic_server_params);
let rate_limiter = Arc::new(ConnectionRateLimiter::new(
quic_server_params.max_connections_per_ipaddr_per_min,
// allow for 10x burst to make sure we can accommodate legitimate
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do this only to test? Would this make our rate limiting ineffective against abusive clients?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will not. If someone is truly abusive they will exhaust this very quickly, and sustained attack will be stopped, as intended.

Comment thread core/src/validator.rs
pub fn new_for_tests(tpu_enable_udp: bool) -> Self {
let tpu_quic_server_config = SwQosQuicStreamerConfig {
quic_streamer_config: QuicStreamerConfig {
max_connections_per_ipaddr_per_min: 32,
Copy link
Copy Markdown

@KirillLykov KirillLykov Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this test trying to create > 32 connections per minute? Seems suspicious. I think this should be investigated as a spin-off in a new issue to avoid blocking progress on the current change

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not dive too deeply into test_run_test_load_program_accounts_root. However now that we allow a larger initial burst it should not be necessary to patch new_for_tests.

lijunwangs
lijunwangs previously approved these changes Nov 5, 2025
add burst handling to the per-IP rate limiter

This should allow legit container users to connect
without triggering ratelimits more easily
@alexpyattaev
Copy link
Copy Markdown
Author

Had to rebase due to conflicts + clean up commit messages.

@alexpyattaev alexpyattaev added the automerge automerge Merge this Pull Request automatically once CI passes label Nov 5, 2025
@alexpyattaev alexpyattaev added this pull request to the merge queue Nov 5, 2025
Merged via the queue into anza-xyz:master with commit a28400f Nov 5, 2025
56 checks passed
@alexpyattaev alexpyattaev deleted the rm_governor_2 branch November 5, 2025 15:40
rustopian pushed a commit to rustopian/agave that referenced this pull request Nov 20, 2025
swap governor crate with TokenBucket

add burst handling to the per-IP rate limiter

This should allow legit container users to connect
without triggering ratelimits more easily
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automerge automerge Merge this Pull Request automatically once CI passes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants