multiquic support in solana_streamer -- rebase of #634 to latest master #1452

Merged
lijunwangs merged 13 commits into anza-xyz:master from lijunwangs:multi-quic-rebase
Jun 21, 2024

Conversation

@lijunwangs lijunwangs commented May 22, 2024

Problem

We have found in production that the QUIC streamer endpoint gets overwhelmed with connections and transaction packets, which causes a large number of connection timeouts and send-transaction timeouts. With multiple endpoints, multiple threads can be used to ingest the incoming packets.

Summary of Changes

  • tpu: use multiple quic endpoints

  • cluster-info: manage port range by hand...

  • local-cluster: keep udp tpu socket around for tests

  • First set the number of endpoints to 1 -- pending more test results on the skip rate problem in the latest v1.18.
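
To illustrate the idea of one ingest thread per endpoint, here is a minimal sketch. It is not the solana_streamer code: it uses plain `std::net::UdpSocket` instead of QUIC endpoints, binds each socket to its own OS-assigned port (the standard library does not expose SO_REUSEPORT), and the helper name `spawn_ingest_threads` is hypothetical.

```rust
use std::net::{SocketAddr, UdpSocket};
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Hypothetical helper: bind `num_endpoints` UDP sockets and start one
/// ingest thread per socket. Returns the bound addresses and a channel
/// that yields (endpoint index, payload) for every packet received.
fn spawn_ingest_threads(
    num_endpoints: usize,
) -> std::io::Result<(Vec<SocketAddr>, Receiver<(usize, Vec<u8>)>)> {
    let (sender, receiver) = channel();
    let mut addrs = Vec::with_capacity(num_endpoints);
    for index in 0..num_endpoints {
        // Assumption of this sketch: port 0 lets the OS pick any free port.
        let socket = UdpSocket::bind("127.0.0.1:0")?;
        addrs.push(socket.local_addr()?);
        let sender = sender.clone();
        thread::spawn(move || {
            let mut buf = [0u8; 1500];
            // Each thread drains only its own socket, so ingest work is
            // spread across threads instead of going through one endpoint.
            while let Ok((len, _from)) = socket.recv_from(&mut buf) {
                if sender.send((index, buf[..len].to_vec())).is_err() {
                    break; // receiver dropped, stop ingesting
                }
            }
        });
    }
    Ok((addrs, receiver))
}
```

A caller would send packets to any of the returned addresses and read `(index, payload)` pairs from the receiver; with the endpoint count set to 1 this degenerates to the single-thread case.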

Fixes #

@lijunwangs force-pushed the multi-quic-rebase branch from cf9ae04 to 8250821 June 4, 2024 23:30
@lijunwangs changed the title from "multiquic support in solana_streamer -- rebase of #634" to "multiquic support in solana_streamer -- rebase of #634 to latest master" Jun 4, 2024
@alessandrod

I've re-re-re-reviewed this. It looks great (😏), except for the manual range thing, which I would try to undo.

See this commit: solana-labs@fb0eea2. I think if we revert this - so we don't set reuseport=true in multi_bind_in_range - it should work (tm).

I don't think that reverting that commit actually breaks anything. In most other places we don't set reuseport=true, so WSL will break there anyway. I'd just revert it.
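
For context on why the reuseport flag can matter to a port-range scan, here is a rough sketch under stated assumptions. It is not the net-utils `multi_bind_in_range` implementation: `bind_first_free` is a hypothetical helper built only on `std::net::UdpSocket`, which does not set SO_REUSEPORT. In that default mode, binding a UDP port that is already held fails, and a scanner can use that failure to skip taken ports.

```rust
use std::io;
use std::net::{IpAddr, SocketAddr, UdpSocket};

/// Hypothetical helper: bind the first free UDP port in `[start, end)`.
fn bind_first_free(ip: IpAddr, start: u16, end: u16) -> io::Result<(u16, UdpSocket)> {
    for port in start..end {
        // Without SO_REUSEPORT, binding a port that is already in use
        // returns an error, so the scan moves on to the next port.
        if let Ok(socket) = UdpSocket::bind(SocketAddr::new(ip, port)) {
            return Ok((port, socket));
        }
    }
    Err(io::Error::new(
        io::ErrorKind::AddrInUse,
        "no free port in range",
    ))
}
```

Calling the helper twice over the same range while holding the first socket yields two different ports. If the sockets were opened with SO_REUSEPORT instead, the second bind to the same port could succeed, so a scan written this way would no longer detect that the port was taken.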

Comment thread local-cluster/src/local_cluster.rs
Comment thread net-utils/src/lib.rs Outdated
Comment thread net-utils/src/lib.rs Outdated
@lijunwangs lijunwangs merged commit 2443048 into anza-xyz:master Jun 21, 2024
samkim-crypto pushed a commit to samkim-crypto/agave that referenced this pull request Jul 31, 2024
…est master (anza-xyz#1452)

* net-utils: support SO_REUSEPORT

tpu: use multiple quic endpoints

cluster-info: manage port range by hand...

local-cluster: keep udp tpu socket around for tests

* Missing cargo file

* sort cargo.toml

* divide the concurrent_connections among the endpoints for multiquic

* Change default multiquic endpoint count to 1

* Missing Cargo.lock changes

* revert reuseaddr changes

* revert reuseaddr changes;fmt code

* reverted port range changes

* revert DEFAULT_TPU_ENABLE_UDP change in local_cluster

* Turn tpu_enable_udp to true to prevent concurrent local cluster tests to use the same QUIC ports

* changed QUIC_ENDPOINTS to 10 for testing

* Turn QUIC_ENDPOINTS to 1 for now

---------

Co-authored-by: Trent Nelson <trent@solana.com>
Co-authored-by: Lijun Wang <lijun.wang@oracle.com>
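
The commit "divide the concurrent_connections among the endpoints for multiquic" listed above could look roughly like the sketch below. The function name `per_endpoint_limits` and the handling of the remainder are assumptions for illustration; the commit log does not say how an uneven split is resolved.

```rust
/// Hypothetical helper: split a total connection budget across endpoints.
/// Assumption: any remainder is handed out one connection at a time to the
/// first endpoints, so the limits always sum to the original budget.
fn per_endpoint_limits(max_concurrent_connections: usize, num_endpoints: usize) -> Vec<usize> {
    assert!(num_endpoints > 0, "need at least one endpoint");
    let base = max_concurrent_connections / num_endpoints;
    let extra = max_concurrent_connections % num_endpoints;
    (0..num_endpoints)
        .map(|index| base + usize::from(index < extra))
        .collect()
}
```

With a single endpoint the whole budget stays on that endpoint, which matches the default of 1 chosen in the later commits.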