perf: tighten network timeouts for faster connections #68
Merged
Reduce timeouts based on RTT analysis (300ms worst-case cross-globe RTT). Previous values caused excessive delays when peers were unreachable.

Changes:
- DIAL_TIMEOUT: 90s → 25s (derived from strategy stage sum + margin)
- DEFAULT_CONNECTION_TIMEOUT_SECS: 90s → 25s (matches DIAL_TIMEOUT)
- SEND_TIMEOUT: 15s → 10s (3x margin over 4MB at 10Mbps)
- REQUEST_TIMEOUT: 30s → 10s (aligned with libp2p/BEP 5 practice)
- IDENTITY_EXCHANGE_TIMEOUT: 10s → 5s (1-2 RTTs + margin)
- BOOTSTRAP_IDENTITY_TIMEOUT_SECS: 10s → 5s (matches identity exchange)

Companion to saorsa-transport timeout tightening.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
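The SEND_TIMEOUT rationale ("3x margin over 4MB at 10Mbps") can be checked with a little arithmetic. The following is a minimal sketch, not code from this PR: the helper name transfer_time is hypothetical, and it assumes "4MB" means 4,000,000 bytes and "10Mbps" means 10,000,000 bits per second.

```rust
use std::time::Duration;

/// Hypothetical helper: time needed to move `bytes` over a link
/// that carries `bits_per_sec` bits per second.
fn transfer_time(bytes: u64, bits_per_sec: u64) -> Duration {
    Duration::from_secs_f64((bytes * 8) as f64 / bits_per_sec as f64)
}

fn main() {
    // Assumption: decimal units (4,000,000 bytes; 10,000,000 bit/s).
    let base = transfer_time(4_000_000, 10_000_000);
    // Apply the 3x safety margin named in the commit message.
    let with_margin = base * 3;
    println!("base transfer: {:?}, with 3x margin: {:?}", base, with_margin);
}
```

Under these assumptions the base transfer is about 3.2s and the margined value about 9.6s, which is consistent with rounding up to a 10s SEND_TIMEOUT for the transfer alone.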
Pull request overview
Reduces several network operation timeouts to shorten failure detection and improve overall connection/send responsiveness (especially when peers/addresses are stale).
Changes:
- Reduced peer dial timeout in the transport adapter (90s → 25s).
- Reduced raw send path timeout in the transport adapter (15s → 10s).
- Reduced default node connection timeout and DHT handler / identity-exchange timeouts (e.g., 90s → 25s; 30s → 10s; 10s → 5s).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/transport/saorsa_transport_adapter.rs | Tightens dial and send operation timeouts used by the adapter APIs. |
| src/network.rs | Lowers the default node connection timeout and bootstrap identity exchange timeout. |
| src/dht_network_manager.rs | Lowers DHT handler request timeout and identity exchange timeout caps. |
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
send_to_peer_raw calls the same dial_addr path as connect_to_peer but had only 10s — less than the relay fallback alone. Bump to 25s to match. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Aligns with the tighter timeout profile while staying above the relay stage (~10s) so dial_candidate's NAT traversal cascade is not truncated. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Point saorsa-transport dependency at the saorsa-labs Git repo on the rc-2026.4.1 release-candidate branch instead of crates.io. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jacderida
approved these changes
Apr 4, 2026
Summary
- DIAL_TIMEOUT and DEFAULT_CONNECTION_TIMEOUT_SECS reduced from 90s→25s (derived from sum of connection strategy stages)

Timeout changes
- DIAL_TIMEOUT
- DEFAULT_CONNECTION_TIMEOUT_SECS
- SEND_TIMEOUT
- REQUEST_TIMEOUT
- IDENTITY_EXCHANGE_TIMEOUT
- BOOTSTRAP_IDENTITY_TIMEOUT_SECS

Test plan
- cargo check: clean
- cargo fmt --all -- --check: clean
- cargo clippy --lib --all-features -- -D warnings: clean

🤖 Generated with Claude Code
Greptile Summary
This PR reduces six network timeouts across the transport adapter, DHT network manager, and node configuration: dial/connection timeouts from 90s→25s, send timeout from 15s→10s, and request/identity timeouts from 30s→10s and 10s→5s. The rationale (sum-of-stages analysis: direct 2s + hole-punch 3+1s + relay 10s = ~20s, 25s with margin) is well-documented and the approach is sound for the majority of changes. However, SEND_TIMEOUT in send_to_peer_raw presents a concrete issue: it wraps both dial_addr and the data transfer in the same 10s budget, while the companion DIAL_TIMEOUT analysis documents that relay connections alone take ~10s, leaving no time for data writes in relay-only NAT scenarios.

- DIAL_TIMEOUT 90s→25s: Well-justified from stage analysis; consistent with DEFAULT_CONNECTION_TIMEOUT_SECS.
- SEND_TIMEOUT 15s→10s: The budget covers both the dial_addr call (up to ~20s for relay) and the actual transfer, so relay-only peers will almost certainly time out before any data is sent.
- REQUEST_TIMEOUT 30s→10s: Reasonable tightening for DoS-protection on message handlers.
- IDENTITY_EXCHANGE_TIMEOUT 10s→5s: Safe; the min(config.request_timeout, IDENTITY_EXCHANGE_TIMEOUT) guard still correctly applies the 5s cap.
- DEFAULT_CONNECTION_TIMEOUT_SECS 90s→25s and BOOTSTRAP_IDENTITY_TIMEOUT_SECS 10s→5s: Well-reasoned, no concerns.

Confidence Score: 3/5
Mostly safe to merge, but SEND_TIMEOUT in send_to_peer_raw is too tight for relay paths and will cause silent send failures to NAT'd peers that require relay connections.
Five of six timeout changes are well-reasoned and consistent with the documented stage-sum analysis. The SEND_TIMEOUT change introduces a concrete regression: by wrapping dial_addr (which can take ~20s in relay scenarios) inside a 10s budget, sends to relay-only peers will reliably time out. The E2E testnet validation across mixed NAT nodes is also still pending. These two factors lower confidence from what would otherwise be a 4-5.
src/transport/saorsa_transport_adapter.rs — specifically the SEND_TIMEOUT constant and send_to_peer_raw function; verify whether saorsa-transport's dial_addr reuses existing connections before applying the new timeout
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant Caller
    participant send_to_peer_raw
    participant dial_addr
    participant Transport
    Caller->>send_to_peer_raw: send(addr, data)
    Note over send_to_peer_raw: SEND_TIMEOUT = 10s (entire block)
    send_to_peer_raw->>dial_addr: dial_addr(addr)
    dial_addr->>Transport: direct attempt (~2s)
    alt direct succeeds
        Transport-->>dial_addr: conn ✓
    else hole-punch
        dial_addr->>Transport: hole-punch rounds (~3s + 1s)
        alt hole-punch succeeds
            Transport-->>dial_addr: conn ✓
        else relay fallback
            dial_addr->>Transport: relay connection (~10s)
            Note over dial_addr,Transport: Relay alone ≈ 10s = SEND_TIMEOUT ⚠️
            Transport-->>dial_addr: conn (or timeout ❌)
        end
    end
    dial_addr-->>send_to_peer_raw: conn
    send_to_peer_raw->>Transport: open_uni_typed + write_all + finish
    Transport-->>send_to_peer_raw: Ok(())
    send_to_peer_raw-->>Caller: Ok(()) or Elapsed timeout error

Comments Outside Diff (2)
src/transport/saorsa_transport_adapter.rs, line 520-558

SEND_TIMEOUT too tight to cover relay dial + data transfer

send_to_peer_raw wraps both dial_addr and the subsequent stream write inside the same 10-second budget. According to the updated comment on DIAL_TIMEOUT in connect_to_peer (lines 440–443), the full NAT-traversal flow is a direct attempt, then hole-punch rounds, then a relay fallback.

The relay fallback alone is documented as 10s, which exactly equals the new SEND_TIMEOUT. For a fresh relay connection to a NAT'd peer, dial_addr can consume the entire 10-second window before a single byte of data is written, meaning send_to_peer_raw will reliably time out in the worst-case relay scenario.

connect_to_peer correctly budgets 25s for the same dial path via DIAL_TIMEOUT. send_to_peer_raw invokes the same transport but has only 10s total for dial + transfer. Consider splitting the timeout budget.
src/dht_network_manager.rs, line 2539-2541

DEFAULT_REQUEST_TIMEOUT_SECS not updated alongside REQUEST_TIMEOUT

The module-level REQUEST_TIMEOUT (DoS-protection timeout on DHT message handlers) was reduced from 30s → 10s. However, DEFAULT_REQUEST_TIMEOUT_SECS still defaults to 30s and feeds DhtNetworkConfig::default().request_timeout, which drives:
- response_timeout in wait_for_dht_response (30s)
- dial_timeout = transport.connection_timeout().min(config.request_timeout)

These two constants serve different roles so diverging values may be intentional, but given the PR's stated goal of a 10s request budget it is worth confirming DEFAULT_REQUEST_TIMEOUT_SECS should remain at 30s.
Reviews (1): Last reviewed commit: "perf: tighten network timeouts for faste..."