This repository was archived by the owner on Nov 15, 2023. It is now read-only.
Adapt to rust-libp2p#1440. #5066
Closed
Overview
These are the current changes necessary for adapting substrate to libp2p/rust-libp2p#1440. As described in the libp2p PR, the underlying changes are primarily in libp2p-core. For the first iteration, the impact on the libp2p-swarm API, and thus on substrate, is relatively minimal, because at this point the libp2p-swarm API does not actually permit a NetworkBehaviour to explicitly request multiple connections per peer. That will change later. For the moment, realistically, a second connection to the same peer only occurs if two peers connect to each other "at the same time". As a side effect, existing connections are also no longer closed in favour of new ones, which should implicitly address #4272, though I didn't get around to verifying that yet.

The approach taken here to integrate the libp2p changes can be summarised as follows (also in the code comments).
Details
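The points that follow describe how GenericProto copes with more than one connection per peer. As a rough illustration of that bookkeeping, here is a minimal std-only sketch; it is not the substrate code. Behaviour, PeerState, InitReply, Out and the methods on_init, on_open, on_closed and send_target are hypothetical names, and peers and connections are plain strings and integers instead of libp2p's PeerId and ConnectionId. Out::Open and Out::Closed stand in for GenericProtoOut::CustomProtocolOpen and CustomProtocolClosed, and on_open / on_closed stand in for a handler reporting NotifsHandlerOut::Open / Closed. Sending over the oldest still-open connection is an assumption: it is one way to satisfy "always use the same connection while it is open", but it may not be how the fallback connection is actually chosen.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for libp2p's PeerId and ConnectionId.
type PeerId = &'static str;
type ConnId = u64;

/// Reply to a handler's initial `Init` request (enable or disable).
#[derive(Debug, PartialEq)]
enum InitReply {
    Enable,
    Disable,
}

/// Events reported upwards, loosely mirroring
/// GenericProtoOut::CustomProtocolOpen / CustomProtocolClosed.
#[derive(Debug, PartialEq)]
enum Out {
    Open(PeerId),
    Closed(PeerId),
}

#[derive(Default)]
struct PeerState {
    enabled: bool,
    /// Connections whose handler reported "open", oldest first.
    open: Vec<ConnId>,
}

#[derive(Default)]
struct Behaviour {
    peers: HashMap<PeerId, PeerState>,
}

impl Behaviour {
    fn set_enabled(&mut self, peer: PeerId, enabled: bool) {
        self.peers.entry(peer).or_default().enabled = enabled;
    }

    /// Every new connection handler asks for initialisation first;
    /// the answer depends only on the peer's current state.
    fn on_init(&mut self, peer: PeerId, _conn: ConnId) -> InitReply {
        if self.peers.entry(peer).or_default().enabled {
            InitReply::Enable
        } else {
            InitReply::Disable
        }
    }

    /// Only the first connection to report "open" produces Out::Open.
    fn on_open(&mut self, peer: PeerId, conn: ConnId) -> Option<Out> {
        let state = self.peers.entry(peer).or_default();
        let first = state.open.is_empty();
        if !state.open.contains(&conn) {
            state.open.push(conn);
        }
        if first { Some(Out::Open(peer)) } else { None }
    }

    /// Only the last open connection to report "closed" produces Out::Closed.
    fn on_closed(&mut self, peer: PeerId, conn: ConnId) -> Option<Out> {
        let state = self.peers.get_mut(&peer)?;
        let before = state.open.len();
        state.open.retain(|c| *c != conn);
        let removed = state.open.len() != before;
        if removed && state.open.is_empty() { Some(Out::Closed(peer)) } else { None }
    }

    /// All outgoing data goes over the oldest connection that is still open,
    /// so ordering is preserved for as long as that connection lives.
    fn send_target(&self, peer: PeerId) -> Option<ConnId> {
        self.peers.get(&peer)?.open.first().copied()
    }
}
```

Keeping the open connections as an ordered list lets the first-open / last-closed rule and the sending fallback share the same state. A simultaneous dial simply adds a second entry instead of replacing the first.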
- To make the GenericProto behaviour aware of all connection handlers (and thus connections), each handler now explicitly emits an Init event as its very first event, requesting initialisation (enable/disable) from the behaviour. This was previously implicit.
- send_packet and write_notification always send all data over the same connection, as long as that connection is open, to preserve the ordering provided by the transport. If it closes, a second open connection, if one exists, may take over. In terms of potential reordering and dropped messages, that case should be no different from a single connection failing and being re-established. Messages can be received on any connection.
- GenericProtoOut::CustomProtocolOpen is emitted when the first connection reports NotifsHandlerOut::Open.
- GenericProtoOut::CustomProtocolClosed is emitted when the last connection reports NotifsHandlerOut::Closed.

In this way, the number of connections actually established to a peer is an implementation detail of the GenericProto behaviour. As mentioned before, in practice and at the time of this writing, there may be at most two connections to a peer, and only as a result of simultaneous dialing. However, the implementation accommodates any number of connections.

Noteworthy
During intermediate testing with the (by default disabled) integration tests test_consensus, test_sync and test_connectivity, it was revealed that these tests very often failed when run in release mode. The common symptom was that the last node to start in a round of testing would see no other peers (i.e. an empty DHT routing table) and thus make no progress while all the others kept running, causing the tests to time out waiting for the problematic peer to reach a certain state.

The tests mainly use add_reserved_peer on the network to initialise the topology. However, add_reserved_peer ultimately results in a call to add_known_peer on the DiscoveryBehaviour, which did not actually add the address to the Kademlia routing table. It only added it to the user_defined peers, which are added to the Kademlia routing table when passed in the constructor of the behaviour. I thus changed add_known_peer to also add the given address to the Kademlia routing table. That resolved the issues with these integration tests, and the test_connectivity test also seems to run notably faster (release mode). My current guess is that the tests were so far unknowingly relying on a timing assumption w.r.t. the initial discovery / connection setup and DHT queries in order for all peers to find each other, in particular when simultaneous connection attempts are in play, as often happens in release mode. Ultimately, the change of letting add_known_peer add the given address to the Kademlia routing table may be a patch worth extracting separately, because it does look like an oversight to me.
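The add_known_peer change can be modelled with a minimal std-only sketch. DiscoveryBehaviour, RoutingTable, add_known_peer_old and the string-typed peers and addresses here are simplified stand-ins invented for this illustration, not the substrate or libp2p types. In the real code, the routing-table update would go through libp2p's Kademlia behaviour (presumably its add_address method), which is not reproduced here.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for libp2p's PeerId and Multiaddr.
type PeerId = &'static str;
type Multiaddr = String;

/// Simplified stand-in for the Kademlia routing table (not the libp2p type).
#[derive(Default)]
struct RoutingTable {
    entries: HashMap<PeerId, Vec<Multiaddr>>,
}

impl RoutingTable {
    fn add_address(&mut self, peer: PeerId, addr: Multiaddr) {
        let addrs = self.entries.entry(peer).or_default();
        if !addrs.contains(&addr) {
            addrs.push(addr);
        }
    }

    fn contains(&self, peer: PeerId) -> bool {
        self.entries.contains_key(&peer)
    }
}

/// Hypothetical, heavily simplified discovery behaviour.
struct DiscoveryBehaviour {
    user_defined: Vec<(PeerId, Multiaddr)>,
    kademlia: RoutingTable,
}

impl DiscoveryBehaviour {
    /// Peers passed to the constructor are seeded into the routing table.
    fn new(user_defined: Vec<(PeerId, Multiaddr)>) -> Self {
        let mut kademlia = RoutingTable::default();
        for (peer, addr) in &user_defined {
            kademlia.add_address(*peer, addr.clone());
        }
        Self { user_defined, kademlia }
    }

    /// Before the change: the peer is only remembered as user-defined,
    /// so it never reaches the routing table after construction.
    fn add_known_peer_old(&mut self, peer: PeerId, addr: Multiaddr) {
        self.user_defined.push((peer, addr));
    }

    /// After the change: the address is also added to the routing table.
    fn add_known_peer(&mut self, peer: PeerId, addr: Multiaddr) {
        self.user_defined.push((peer, addr.clone()));
        self.kademlia.add_address(peer, addr);
    }
}
```

In this model, a reserved peer added after construction only becomes visible to DHT lookups with the second variant, which matches the symptom of an empty routing table on the last node to start.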