protocols/identify: Fix race condition in discover_peer_after_disconnect #2744
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
Summary of the plot of the
discover_peer_after_disconnecttest:swarm2connects toswarm1.swarm2requests an identify response fromswarm1.swarm1sends the response toswarm2.swarm2disconnects fromswarm1.swarm2tries to disconnect.Problem
libp2p-identifysetsKeepAlive::Nowhen it identified the remote. Thusswarm1mightidentify
swarm2beforeswarm2identifiedswarm1.swarm1then setsKeepAlive::Noand thus closes the connection toswarm2beforeswarm2identifiedswarm1. In such case the unit testdiscover_peer_after_disconnect hangs indefinitely.Solution
Add an initial delay to
swarm1requesting an identification fromswarm2, thus ensuringswarm2is always able to identify
swarm1.Links to any relevant issues
Fixes #2743
Open Questions
We could as well introduce
libp2p-pingand setPingConfig::with_keep_alive. I think the fix here is sufficient though.As another alternative, instead of setting
KeepAlive::Noonce we identified the remote, we could as well setKeepAlive::Until(Instant::now() + Duration::from_secs(10)), thus giving the remote a chance to identify us.Change checklist