transport: Make accept async to close the gap on service races#525
Conversation
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
…-are-propagated-sooner
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
| })); | ||
| } | ||
| Err(error) => { | ||
| tracing::debug!( |
There was a problem hiding this comment.
Don't we need to clean up some data in this case? Accept will not run at all when we arrive here, but we already modified some state in on_connection_established
| return Some(TransportEvent::ConnectionEstablished { peer, endpoint }); | ||
| } | ||
| Err(error) => { | ||
| tracing::error!( |
There was a problem hiding this comment.
Same as the other comment, if we fail here we need to clean up some data like this here:
litep2p/src/transport/manager/mod.rs
Line 899 in fe1a448
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
|
|
||
| Ok(Box::pin(async move { | ||
| // First, notify all protocols about the connection establishment | ||
| protocol_set.report_connection_established(peer, endpoint_clone).await?; |
There was a problem hiding this comment.
report_connection_established internally sends an event to protocols. Can it be a problem if the protocol doesn't read the event fast enough?
May be we need to modify InnerTransportEvent::ConnectionEstablished to include oneshot channel and collect ACKs from the protocols?
There was a problem hiding this comment.
Yep this could be a great followup. Would like to keep this separate as we'll need to revalidate if this adds any delays to connection / deadlocks if one single protocol is under pressure during high loads. 🙏
## [0.13.1] - 2026-02-27 This release includes multiple fixes of transports and protocols, fixing connection stability issues with other librariies (specifically, [smoldot](https://github.com/smol-dot/smoldot/)) and increasing success rates of dialing & opening substreams, especially in extreme cases when remote nodes have a lot of private addresses published to the DHT. ### Fixed - ping: Conform to the spec & exclude from connection keep-alive ([#416](#416)) - transport: Make accept async to close the gap on service races ([#525](#525)) - transport: Limit dial concurrency and bound total dialing time ([#538](#538)) - webrtc: Support `FIN`/`FIN_ACK` handshake for substream shutdown ([#513](#513)) - transport: Expose failed addresses to the transport manager ([#529](#529)) ### Changed - manager: Prioritize public addresses for dialing ([#530](#530))
This PR ensures that protocols know about an incoming connection established before the litep2p users.
There is a race condition in the litep2p that is caused by the following:
Under heavy load this can produce a gap, where the higher levels know about the connection, but protocols cannot open a new substream with the said connection.
To close the gap, this PR modified the
acceptfunction to first report the connection to the protocols before the higher levels are inforemd.