Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

litep2p/req-resp: Always provide main protocol name in responses #6603

Merged
merged 4 commits into from
Nov 27, 2024

Conversation

lexnv
Copy link
Contributor

@lexnv lexnv commented Nov 21, 2024

Request responses are initialized with a main protocol name, and optional protocol names as a fallback.

Running litep2p in kusama as a validator has surfaced a debug_asserts coming from the sync component:

warn!(
target: LOG_TARGET,
"Unexpected generic response protocol {protocol_name}, strategy key \
{key:?}",
);
debug_assert!(false);
}

The issue is that we initiate a request-response over the main protocol name /genesis/sync/2 but receive a response over the legacy procotol ksm/sync/2. This behavior is correct because litep2p propagates to the higher levels the protocol that responded.

In contrast, libp2p provides the main protocol name regardless of negotiating a legacy protocol.

Because of this, higher level components assume that only the main protocol name will respond.
To not break this assumption, this PR alings litep2p shim layer with the libp2p behavior.

Closes: #6581

response,
fallback.map_or_else(|| self.protocol.clone(), Into::into),
)));
let _ = tx.send(Ok((response, self.protocol.clone())));
Copy link
Contributor

@dmitry-markin dmitry-markin Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this is not a fix for our issue because the fallback mentioned here is actually a fallback protocol, and not a fallback name. Here we need to pass the actual name of the fallback protocol for the client code to correctly implement protocol backward-compatibility (e.g., correctly process a different protobuf schema).

I have created an issue in litep2p for adding support for actual fallback names:
paritytech/litep2p#289

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To sum up the difference in operation of libp2p & litep2p network backends:

libp2p network backend

  1. libp2p backend is responsible for sending the fallback request, not libp2p.
  2. If only the fallback protocol name was used on the wire and no fallback request happened, the main protocol name is returned to the client code
  3. If the fallback request was sent, the name of the fallback request protocol is returned to the client code (as it is just another protocol from the libp2p backend perspective).

litep2p network backend

  1. litep2p sends the fallback request internally and it is the one responsible for actually performing the fallback request, not the network backend.

    Surprisingly enough, litep2p network backend also has the logic of handling fallback requests in case litep2p returned UnsupportedProtocol for some reason.

  2. If only the fallback protocol name was used on the wire and no fallback request happened, the fallback protocol name is returned to the client code. This is what makes litep2p different and leads to panics in network/sync: Panic on unexpected generic response protocol #6581.

  3. If the fallback request was sent, the name of the fallback request protocol is returned to the client code (the same as libp2p).

I will create a PR to make litep2p/backend behave in a way libp2p backend behaves.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

itep2p sends the fallback request internally and it is the one responsible for actually performing the fallback request, not the network backend.

This turned out to be not the case: litep2p network backend does not use the litep2p's try_send_request_with_fallback and uses try_send_request instead. This means fallback requests are handled completely by the backend in substrate, and not by litep2p.

So, we can be sure that a fallback reported by litep2p is just the fallback name of the main protocol and safely override it with the main protocol name in substrate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oki doki, thanks a lot for the detailed info and for confirming, this helps a lot🙏

@lexnv lexnv added the T0-node This PR/Issue is related to the topic “node”. label Nov 25, 2024
@lexnv lexnv marked this pull request as ready for review November 25, 2024 14:26
@lexnv lexnv added the A4-needs-backport Pull request must be backported to all maintained releases. label Nov 27, 2024
@lexnv lexnv enabled auto-merge November 27, 2024 10:33
@lexnv lexnv added this pull request to the merge queue Nov 27, 2024
Merged via the queue into master with commit 2a0b268 Nov 27, 2024
246 of 252 checks passed
@lexnv lexnv deleted the lexnv/fix-sync-panic2 branch November 27, 2024 12:02
github-actions bot pushed a commit that referenced this pull request Nov 27, 2024
Request responses are initialized with a main protocol name, and
optional protocol names as a fallback.

Running litep2p in kusama as a validator has surfaced a `debug_asserts`
coming from the sync component:

https://github.com/paritytech/polkadot-sdk/blob/3906c578c96d97a8a099a4bdac4685acbe375a7c/substrate/client/network/sync/src/strategy/chain_sync.rs#L640-L646

The issue is that we initiate a request-response over the main protocol
name `/genesis/sync/2` but receive a response over the legacy procotol
`ksm/sync/2`. This behavior is correct because litep2p propagates to the
higher levels the protocol that responded.

In contrast, libp2p provides the main protocol name regardless of
negotiating a legacy protocol.

Because of this, higher level components assume that only the main
protocol name will respond.
To not break this assumption, this PR alings litep2p shim layer with the
libp2p behavior.

Closes: #6581

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
(cherry picked from commit 2a0b268)
@paritytech-cmd-bot-polkadot-sdk

Successfully created backport PR for stable2407:

github-actions bot pushed a commit that referenced this pull request Nov 27, 2024
Request responses are initialized with a main protocol name, and
optional protocol names as a fallback.

Running litep2p in kusama as a validator has surfaced a `debug_asserts`
coming from the sync component:

https://github.com/paritytech/polkadot-sdk/blob/3906c578c96d97a8a099a4bdac4685acbe375a7c/substrate/client/network/sync/src/strategy/chain_sync.rs#L640-L646

The issue is that we initiate a request-response over the main protocol
name `/genesis/sync/2` but receive a response over the legacy procotol
`ksm/sync/2`. This behavior is correct because litep2p propagates to the
higher levels the protocol that responded.

In contrast, libp2p provides the main protocol name regardless of
negotiating a legacy protocol.

Because of this, higher level components assume that only the main
protocol name will respond.
To not break this assumption, this PR alings litep2p shim layer with the
libp2p behavior.

Closes: #6581

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
(cherry picked from commit 2a0b268)
@paritytech-cmd-bot-polkadot-sdk

Successfully created backport PR for stable2409:

github-actions bot pushed a commit that referenced this pull request Nov 27, 2024
Request responses are initialized with a main protocol name, and
optional protocol names as a fallback.

Running litep2p in kusama as a validator has surfaced a `debug_asserts`
coming from the sync component:

https://github.com/paritytech/polkadot-sdk/blob/3906c578c96d97a8a099a4bdac4685acbe375a7c/substrate/client/network/sync/src/strategy/chain_sync.rs#L640-L646

The issue is that we initiate a request-response over the main protocol
name `/genesis/sync/2` but receive a response over the legacy procotol
`ksm/sync/2`. This behavior is correct because litep2p propagates to the
higher levels the protocol that responded.

In contrast, libp2p provides the main protocol name regardless of
negotiating a legacy protocol.

Because of this, higher level components assume that only the main
protocol name will respond.
To not break this assumption, this PR alings litep2p shim layer with the
libp2p behavior.

Closes: #6581

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
(cherry picked from commit 2a0b268)
@paritytech-cmd-bot-polkadot-sdk

Successfully created backport PR for stable2412:

EgorPopelyaev pushed a commit that referenced this pull request Nov 27, 2024
Backport #6603 into `stable2409` from lexnv.

See the
[documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md)
on how to use this bot.

<!--
  # To be used by other automation, do not modify:
  original-pr-number: #${pull_number}
-->

Co-authored-by: Alexandru Vasile <[email protected]>
EgorPopelyaev added a commit that referenced this pull request Nov 27, 2024
Backport #6603 into `stable2412` from lexnv.

See the
[documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md)
on how to use this bot.

<!--
  # To be used by other automation, do not modify:
  original-pr-number: #${pull_number}
-->

Co-authored-by: Alexandru Vasile <[email protected]>
Co-authored-by: Egor_P <[email protected]>
EgorPopelyaev pushed a commit that referenced this pull request Dec 3, 2024
Backport #6603 into `stable2407` from lexnv.

See the
[documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md)
on how to use this bot.

<!--
  # To be used by other automation, do not modify:
  original-pr-number: #${pull_number}
-->

Co-authored-by: Alexandru Vasile <[email protected]>
Krayt78 pushed a commit to Krayt78/polkadot-sdk that referenced this pull request Dec 18, 2024
…itytech#6603)

Request responses are initialized with a main protocol name, and
optional protocol names as a fallback.

Running litep2p in kusama as a validator has surfaced a `debug_asserts`
coming from the sync component:

https://github.com/paritytech/polkadot-sdk/blob/3906c578c96d97a8a099a4bdac4685acbe375a7c/substrate/client/network/sync/src/strategy/chain_sync.rs#L640-L646

The issue is that we initiate a request-response over the main protocol
name `/genesis/sync/2` but receive a response over the legacy procotol
`ksm/sync/2`. This behavior is correct because litep2p propagates to the
higher levels the protocol that responded.

In contrast, libp2p provides the main protocol name regardless of
negotiating a legacy protocol.

Because of this, higher level components assume that only the main
protocol name will respond.
To not break this assumption, this PR alings litep2p shim layer with the
libp2p behavior.


Closes: paritytech#6581

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A4-needs-backport Pull request must be backported to all maintained releases. T0-node This PR/Issue is related to the topic “node”.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

network/sync: Panic on unexpected generic response protocol
3 participants