Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 6 additions & 20 deletions src/transport_handle.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1528,26 +1528,12 @@ impl TransportHandle {
continue;
}

// Spawn registration work so the accept loop immediately
// returns to draining the handshake channel. Previously
// the two write locks below were taken inline, serialising
// the accept loop behind `peers` and `active_connections`
// contention. In a 1000-node network this caused the
// bounded handshake channel (cap 32) to fill, blocking all
// new connection handoffs and stalling identity exchange.
let peers = peers.clone();
let active_connections = active_connections.clone();
let handle = tokio::spawn(async move {
let channel_id = remote_sock.to_string();
let remote_addr = MultiAddr::quic(remote_sock);
register_new_channel(&peers, &channel_id, &remote_addr).await;
active_connections.write().await.insert(channel_id);
});
tokio::spawn(async move {
if let Err(e) = handle.await {
warn!("Accept registration task failed: {}", e);
}
});
let channel_id = remote_sock.to_string();
let remote_addr = MultiAddr::quic(remote_sock);
// PeerConnected is emitted later when the peer's identity is
// authenticated via a signed message — not at transport level.
register_new_channel(&peers, &channel_id, &remote_addr).await;
active_connections.write().await.insert(channel_id);
Comment on lines +1531 to +1536
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Accept loop write lock stall re-introduced

This restores the exact code that PR #82 identified and fixed: register_new_channel acquires a peers write lock and the following line acquires an active_connections write lock, both inline inside the accept loop. At 1000-node scale, lock contention on these two guards causes the loop to fall behind draining dual.accept_any(), which sits behind the bounded handshake channel (cap 32 in saorsa-transport). Once that channel fills, all new connection handoffs block — identity exchange times out and upload performance degrades (documented: 175 s → 358 s+ after 17 h of operation).

The PR description says PR #82 "introduced a regression" but does not describe it. Without knowing what that regression was, it is impossible to evaluate whether reverting (and re-exposing this documented production failure mode) represents the right tradeoff, or whether a targeted fix to #82's implementation would be safer.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/transport_handle.rs
Line: 1531-1536

Comment:
**Accept loop write lock stall re-introduced**

This restores the exact code that PR #82 identified and fixed: `register_new_channel` acquires a `peers` write lock and the following line acquires an `active_connections` write lock, both inline inside the accept loop. At 1000-node scale, lock contention on these two guards causes the loop to fall behind draining `dual.accept_any()`, which sits behind the bounded handshake channel (cap 32 in `saorsa-transport`). Once that channel fills, all new connection handoffs block — identity exchange times out and upload performance degrades (documented: 175 s → 358 s+ after 17 h of operation).

The PR description says PR #82 "introduced a regression" but does not describe it. Without knowing what that regression was, it is impossible to evaluate whether reverting (and re-exposing this documented production failure mode) represents the right tradeoff, or whether a targeted fix to #82's implementation would be safer.

How can I resolve this? If you propose a fix, please make it concise.

}
});
*self.listener_handle.write().await = Some(handle);
Expand Down
153 changes: 0 additions & 153 deletions tests/accept_loop_stress.rs

This file was deleted.

Loading