
transport: Make accept async to close the gap on service races #525

Merged
lexnv merged 13 commits into master from lexnv/ensure-events-are-propagated-sooner on Feb 26, 2026

Conversation

@lexnv
Collaborator

@lexnv commented on Jan 27, 2026

This PR ensures that protocols learn about a newly established connection before litep2p users do.

There is a race condition in litep2p caused by the following ordering:

  • Established connections are reported immediately to users (the code that polls the litep2p object to advance the network).
  • Only then are the notification and request-response protocols informed about the connection.

Under heavy load this produces a gap in which the higher layers already know about the connection, but the protocols cannot yet open a new substream on it.

To close the gap, this PR makes the `accept` function async, so that it first reports the connection to the protocols and only then informs the higher layers.
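A minimal sketch of the ordering described above, using simplified stand-ins: `ProtocolSet`, `ProtocolEvent`, `TransportEvent`, `PeerId`, and the small `block_on` executor are hypothetical and not litep2p's actual definitions, and protocols are modeled as std `mpsc` channels. The future returned by `accept` queues a `ConnectionEstablished` event for every protocol before it yields the event that litep2p users would see.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::mpsc::{channel, Sender};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Hypothetical, simplified stand-ins; not litep2p's real definitions.
type PeerId = u64;

#[derive(Debug, Clone, PartialEq)]
enum ProtocolEvent {
    ConnectionEstablished { peer: PeerId },
}

#[derive(Debug, PartialEq)]
enum TransportEvent {
    ConnectionEstablished { peer: PeerId },
}

type BoxFuture<T> = Pin<Box<dyn Future<Output = T>>>;

/// Each protocol is modeled as the sending half of a channel.
#[derive(Clone)]
struct ProtocolSet {
    protocols: Vec<Sender<ProtocolEvent>>,
}

impl ProtocolSet {
    /// Queue a `ConnectionEstablished` event for every protocol.
    /// Fails if any protocol has dropped its receiver.
    async fn report_connection_established(&self, peer: PeerId) -> Result<(), String> {
        for tx in &self.protocols {
            tx.send(ProtocolEvent::ConnectionEstablished { peer })
                .map_err(|_| "protocol channel closed".to_string())?;
        }
        Ok(())
    }
}

/// The returned future informs the protocols first and only then yields
/// the event that would be handed to litep2p users.
fn accept(protocol_set: ProtocolSet, peer: PeerId) -> BoxFuture<Result<TransportEvent, String>> {
    Box::pin(async move {
        protocol_set.report_connection_established(peer).await?;
        Ok::<_, String>(TransportEvent::ConnectionEstablished { peer })
    })
}

/// Minimal single-threaded executor (the pattern from the `std::task::Wake` docs).
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let (notif_tx, notif_rx) = channel();
    let (req_resp_tx, req_resp_rx) = channel();
    let protocol_set = ProtocolSet { protocols: vec![notif_tx, req_resp_tx] };

    let event = block_on(accept(protocol_set, 42)).expect("all protocols are alive");

    // Both protocols already have the event queued by the time the user sees it.
    assert_eq!(notif_rx.try_recv(), Ok(ProtocolEvent::ConnectionEstablished { peer: 42 }));
    assert_eq!(req_resp_rx.try_recv(), Ok(ProtocolEvent::ConnectionEstablished { peer: 42 }));
    println!("user sees: {event:?}");
}
```

Because the user-facing event is only produced after `report_connection_established` returns, each protocol already has the event in its queue when users learn about the connection. Queued is not the same as processed, though; that distinction is the concern raised later in the review.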

lexnv added 3 commits January 27, 2026 14:42
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Comment thread src/transport/webrtc/connection.rs Outdated
Comment thread src/transport/manager/mod.rs Outdated
lexnv added 4 commits February 3, 2026 09:25
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Comment thread src/transport/manager/mod.rs Outdated
Comment thread src/transport/manager/mod.rs Outdated
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv changed the title from "wip/transport: Make accept async to close the gap on service races" to "transport: Make accept async to close the gap on service races" on Feb 19, 2026
lexnv and others added 2 commits February 19, 2026 15:53
}));
}
Err(error) => {
tracing::debug!(

Don't we need to clean up some state in this case? `accept` will not run at all when we get here, but we have already modified some state in `on_connection_established`.

return Some(TransportEvent::ConnectionEstablished { peer, endpoint });
}
Err(error) => {
tracing::error!(

Same as the other comment: if we fail here, we need to clean up some state, as is done here:

.accept_established_connection(endpoint.connection_id(), endpoint.is_listener());
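The reviewers' point can be sketched as a rollback on the error path. This assumes a hypothetical `Manager` whose `on_connection_established` records the connection in a map, plus a `report` closure standing in for notifying the protocols; `rollback_connection`, the map, and the simplified `TransportEvent` are illustrative, not litep2p's API. If reporting fails, the bookkeeping is undone and no event is emitted.

```rust
use std::collections::HashMap;

// Hypothetical, simplified types; not litep2p's real transport manager.
type PeerId = u64;
type ConnectionId = usize;

#[derive(Debug, PartialEq)]
enum TransportEvent {
    ConnectionEstablished { peer: PeerId, connection: ConnectionId },
}

#[derive(Default)]
struct Manager {
    /// Bookkeeping written before the protocols are notified.
    connections: HashMap<ConnectionId, PeerId>,
}

impl Manager {
    fn on_connection_established(&mut self, peer: PeerId, connection: ConnectionId) {
        self.connections.insert(connection, peer);
    }

    /// Undo exactly what `on_connection_established` recorded.
    fn rollback_connection(&mut self, connection: ConnectionId) {
        self.connections.remove(&connection);
    }

    /// `report` stands in for notifying the protocols, which may fail.
    fn handle_established<F>(
        &mut self,
        peer: PeerId,
        connection: ConnectionId,
        report: F,
    ) -> Option<TransportEvent>
    where
        F: FnOnce(PeerId) -> Result<(), String>,
    {
        self.on_connection_established(peer, connection);

        match report(peer) {
            Ok(()) => Some(TransportEvent::ConnectionEstablished { peer, connection }),
            Err(error) => {
                // Without the rollback, the manager would keep tracking a
                // connection that neither the protocols nor the user were told about.
                eprintln!("failed to report connection {connection}: {error}");
                self.rollback_connection(connection);
                None
            }
        }
    }
}

fn main() {
    let mut manager = Manager::default();

    let ok = manager.handle_established(1, 10, |_| Ok(()));
    assert_eq!(ok, Some(TransportEvent::ConnectionEstablished { peer: 1, connection: 10 }));

    let failed = manager.handle_established(2, 11, |_| Err("protocol closed".to_string()));
    assert_eq!(failed, None);

    // Only the successfully reported connection is still tracked.
    assert!(manager.connections.contains_key(&10));
    assert!(!manager.connections.contains_key(&11));
}
```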

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Comment thread src/transport/quic/mod.rs

Ok(Box::pin(async move {
// First, notify all protocols about the connection establishment
protocol_set.report_connection_established(peer, endpoint_clone).await?;
Collaborator


`report_connection_established` internally sends an event to each protocol. Could it be a problem if a protocol doesn't read the event fast enough?

Maybe we need to modify `InnerTransportEvent::ConnectionEstablished` to include a oneshot channel and collect ACKs from the protocols?

Collaborator Author


Yep, this could be a great follow-up. I'd like to keep it separate, since we'll need to re-validate whether it adds connection delays, or deadlocks when a single protocol is under pressure during high load. 🙏
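The ACK-based follow-up was not part of this PR. A minimal sketch of what the reviewer's proposal could look like, assuming a std bounded channel of capacity 1 as a stand-in for a oneshot and hypothetical names (`report_and_collect_acks`, the `ack` field). The shared timeout is also an assumption, not something from the thread; it is one way to address the concern that a single protocol under pressure could delay or deadlock connection setup.

```rust
use std::sync::mpsc::{channel, sync_channel, Sender, SyncSender};
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical type; a bounded(1) std channel stands in for a oneshot.
type PeerId = u64;

enum ProtocolEvent {
    ConnectionEstablished { peer: PeerId, ack: SyncSender<()> },
}

/// Notify every protocol, then wait up to `timeout` in total for their ACKs.
/// Returns the indices of protocols that did not acknowledge.
fn report_and_collect_acks(
    protocols: &[Sender<ProtocolEvent>],
    peer: PeerId,
    timeout: Duration,
) -> Vec<usize> {
    let deadline = Instant::now() + timeout;
    let mut missing = Vec::new();
    let mut pending = Vec::new();

    for (index, tx) in protocols.iter().enumerate() {
        let (ack_tx, ack_rx) = sync_channel(1);
        match tx.send(ProtocolEvent::ConnectionEstablished { peer, ack: ack_tx }) {
            Ok(()) => pending.push((index, ack_rx)),
            // The protocol is gone, so no ACK can ever arrive.
            Err(_) => missing.push(index),
        }
    }

    // One shared deadline: a slow protocol cannot stall setup for longer than `timeout`.
    for (index, ack_rx) in pending {
        let remaining = deadline.saturating_duration_since(Instant::now());
        if ack_rx.recv_timeout(remaining).is_err() {
            missing.push(index);
        }
    }

    missing.sort_unstable();
    missing
}

fn main() {
    // Protocol 0 acknowledges promptly from its own thread.
    let (fast_tx, fast_rx) = channel::<ProtocolEvent>();
    let worker = thread::spawn(move || {
        if let Ok(ProtocolEvent::ConnectionEstablished { peer, ack }) = fast_rx.recv() {
            println!("protocol 0 saw peer {peer}");
            let _ = ack.send(());
        }
    });

    // Protocol 1 is "under pressure": its event stays unread in the queue.
    let (slow_tx, _slow_rx) = channel::<ProtocolEvent>();

    let missing = report_and_collect_acks(&[fast_tx, slow_tx], 7, Duration::from_millis(200));
    worker.join().unwrap();
    assert_eq!(missing, vec![1]);
}
```

The timeout bounds how long one slow protocol can hold up connection setup, at the cost of the caller having to decide what a missing ACK means (for example, report the connection anyway or close it).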

@lexnv merged commit c43c1a9 into master on Feb 26, 2026
8 checks passed
@lexnv deleted the lexnv/ensure-events-are-propagated-sooner branch on February 26, 2026 10:53
dmitry-markin added a commit that referenced this pull request Feb 27, 2026
## [0.13.1] - 2026-02-27

This release includes multiple fixes of transports and protocols, fixing
connection stability issues with other libraries (specifically,
[smoldot](https://github.com/smol-dot/smoldot/)) and increasing success
rates of dialing & opening substreams, especially in extreme cases when
remote nodes have a lot of private addresses published to the DHT.

### Fixed

- ping: Conform to the spec & exclude from connection keep-alive
([#416](#416))
- transport: Make accept async to close the gap on service races
([#525](#525))
- transport: Limit dial concurrency and bound total dialing time
([#538](#538))
- webrtc: Support `FIN`/`FIN_ACK` handshake for substream shutdown
([#513](#513))
- transport: Expose failed addresses to the transport manager
([#529](#529))

### Changed

- manager: Prioritize public addresses for dialing
([#530](#530))

4 participants