
Conversation

@tomaka
Contributor

@tomaka tomaka commented Jul 10, 2020

Close #5536
Relates to paritytech/polkadot-sdk#553 and paritytech/polkadot-sdk#559

Introduces a convenient API for request-response protocols to the NetworkService.

In order to use them, a new field is introduced in NetworkConfiguration that lets you specify the list of supported request-response protocols. As part of this field, you can pass a channel where incoming requests will be sent and can be answered.
Afterwards, call NetworkService::request in order to perform a request. This is a simple asynchronous method that returns a Result<Vec<u8>, SomeError>.
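As an illustration, here is a minimal, self-contained model of that flow using plain futures channels rather than the actual sc-network types (the IncomingRequest struct and all other names below are made up for the example): incoming requests arrive on a bounded channel that was registered in the configuration, each one carries a one-shot channel used to answer it, and the requesting side simply awaits the response bytes.

// Model of the request-response flow described above; not the actual
// sc-network API. Requires only the `futures` crate.
use futures::channel::{mpsc, oneshot};
use futures::{executor::block_on, SinkExt, StreamExt};

/// An incoming request, as handed to the protocol implementation.
struct IncomingRequest {
    payload: Vec<u8>,
    /// Answer by sending bytes here; dropping it signals an error to the remote.
    response_tx: oneshot::Sender<Vec<u8>>,
}

fn main() {
    // Bounded queue that would be registered in the network configuration.
    let (mut inbound_tx, mut inbound_rx) = mpsc::channel::<IncomingRequest>(16);

    block_on(async {
        // Response-building task: answers each incoming request.
        let responder = async move {
            while let Some(req) = inbound_rx.next().await {
                let mut resp = req.payload.clone();
                resp.reverse();
                let _ = req.response_tx.send(resp);
            }
        };

        // Requesting side, mirroring the shape of `NetworkService::request`:
        // submit the request together with a one-shot sender, await the answer.
        let requester = async move {
            let (tx, rx) = oneshot::channel();
            inbound_tx
                .send(IncomingRequest { payload: b"ping".to_vec(), response_tx: tx })
                .await
                .expect("queue closed");
            let response = rx.await.expect("responder dropped the request");
            assert_eq!(response, b"gnip".to_vec());
        };

        futures::join!(responder, requester);
    });
}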

Requests and responses are simply exposed as Vec<u8>, which I think ends up being less confusing than an API that automatically encodes/decodes requests and responses. Unlike libp2p, we can't provide strong typing in Substrate, because the generics would cascade too much everywhere.

I went for an API that only allows sending requests to nodes we're already connected to, as request-response protocols aren't meant to be used to send requests to random peers, but rather to complement notifications. You are expected to start a request in response to a received notification, not out of nowhere.

I've also modified the request-response-related metrics. The in_total and out_finished metrics have been replaced with in_success_total, in_failure_total, out_success_total and out_failure_total. The success variants include the time the request/response took, and the failure variants include the reason for the failure. Considering that requests/responses are very unidirectional in nature, it makes sense to me not to merge the in and out variants together.
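For illustration, the split could be wired up roughly as follows with the prometheus crate; the metric names and label sets below follow the description above but are not copied from the PR.

// Sketch of the renamed outbound metrics; names are illustrative.
use prometheus::{CounterVec, HistogramOpts, HistogramVec, Opts, Registry};

fn register_outbound_metrics(
    registry: &Registry,
) -> Result<(HistogramVec, CounterVec), prometheus::Error> {
    // Successful outgoing requests, observed with the time they took, per protocol.
    let out_success = HistogramVec::new(
        HistogramOpts::new(
            "requests_out_success_total",
            "Time between an outgoing request being emitted and a successful response",
        ),
        &["protocol"],
    )?;
    // Failed outgoing requests, counted per protocol and failure reason.
    let out_failure = CounterVec::new(
        Opts::new("requests_out_failure_total", "Outgoing requests that ended in failure"),
        &["protocol", "reason"],
    )?;
    registry.register(Box::new(out_success.clone()))?;
    registry.register(Box::new(out_failure.clone()))?;
    Ok((out_success, out_failure))
}

A success would then be recorded with out_success.with_label_values(&[protocol]).observe(elapsed.as_secs_f64()) and a failure with out_failure.with_label_values(&[protocol, reason]).inc(); the inbound variants mirror this.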

This PR is marked as "in progress" because tests haven't been written yet, but it can already be reviewed and feedback given.

Known caveats

  • Since there is now a channel between the network and the task that builds responses, and that channel has limited capacity, if a peer floods the local node with expensive requests, then we will start dropping the requests made by other peers (see the sketch after this list). This isn't something new, and is the definition of a DoS attack. Before this PR, however, a DoS attack would slow down the entire networking of the node, while after this PR it will cause requests to be dropped (without slowing down the networking, hopefully). Ideally, there should be a mechanism that guarantees some equal distribution of the requests based on their origin, so that even if a node sends us thousands of requests, we will also process requests made by other nodes. An issue should be opened about that after this PR is merged.

  • Ideally I'd prefer not to do this, but we might have to provide a register_request_response_protocol method on the network service in order to integrate it into Polkadot.
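Regarding the first caveat, the drop-instead-of-queue behaviour boils down to a try_send on a bounded channel, roughly like the following toy model (not the PR's code):

// Once the bounded queue towards the response-building task is full, further
// requests are dropped and the remote eventually sees a "busy"-style failure.
use futures::channel::mpsc;

fn main() {
    let (mut to_response_builder, _from_network) = mpsc::channel::<Vec<u8>>(2);

    for i in 0..8u8 {
        match to_response_builder.try_send(vec![i]) {
            Ok(()) => println!("request {} queued", i),
            // Queue full: drop the request instead of slowing down the networking.
            Err(err) if err.is_full() => println!("request {} dropped: busy", i),
            Err(_) => println!("request {} dropped: channel closed", i),
        }
    }
}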

Future work

We could rather easily change sync/block-requests/finality-proof-requests/light-client-requests to be based on this API in the future. An issue should also be opened about this.

@tomaka tomaka added the A3-in_progress (Pull request is in progress. No review needed at this stage.), B0-silent (Changes should not be mentioned in any release notes) and C1-low (PR touches the given topic and has a low impact on builders.) labels on Jul 10, 2020
@tomaka tomaka requested review from mxinden and romanb July 10, 2020 16:27
@tomaka tomaka added the A0-please_review (Pull request needs code review.) label and removed the A3-in_progress (Pull request is in progress. No review needed at this stage.) label on Jul 13, 2020
@tomaka tomaka requested a review from twittner July 13, 2020 17:03
Contributor

@mxinden mxinden left a comment

I think this does a great job at hiding the complexity involved behind a simple interface (async fn request).

///
/// This event is for statistics purposes only. The request and response handling are entirely
/// internal to the behaviour.
OpaqueRequestStarted {
Contributor

Suggested change
OpaqueRequestStarted {
OutboundRequestStarted {

Would that not be more descriptive? It would also match InboundRequest above.

},

/// A request initiated using [`Behaviour::send_request`] has succeeded or failed.
RequestFinished {
Contributor

Suggested change
RequestFinished {
OutboundRequestFinished {

Would be easier for me to understand without having to read the doc comment.

@romanb
Contributor

romanb commented Jul 14, 2020

My review is in progress and will be finished later today.

protocol: Vec<u8>,
/// Time it took to build the response.
build_time: Duration,
protocol: Cow<'static, str>,
Contributor

Why do protocol names have to be strings instead of bytes?

Contributor Author

@tomaka tomaka Jul 14, 2020

While libp2p allows bytes, I went for restricting this to strings in Substrate because:

  • In practice they're always strings anyway.
  • These protocol names are reported to Prometheus, and Prometheus expects strings (which is the reason for maybe_utf8_bytes_to_string).
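For reference, a helper of that kind amounts to roughly the following; this is a reconstruction from the description in this thread, not the actual implementation.

// Valid UTF-8 is passed through, anything else is merely formatted so it can
// still be used as a (diagnostic) Prometheus label.
fn bytes_to_label(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        Ok(s) => s.to_owned(),
        Err(_) => format!("{:?}", bytes),
    }
}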

match event {
block_requests::Event::AnsweredRequest { peer, total_handling_time } => {
self.events.push_back(BehaviourOut::AnsweredRequest {
let protocol = maybe_utf8_bytes_to_string(&self.block_requests.protocol_name())
Contributor

According to its documentation maybe_utf8_bytes_to_string is meant for diagnostics only and non-UTF-8 bytes are just format!ed as a string. If protocol names are now supposed to be strings (why?), should BlockRequests::protocol_name not return a &str?

Contributor Author

@tomaka tomaka Jul 14, 2020

The protocol name contained in BlockRequests unfortunately contains the ProtocolId, which might theoretically not be UTF-8. In practice, this ProtocolId is always UTF-8, and we could add a requirement that the ProtocolId must be UTF-8, but this was a bit too off-topic for this PR.

Contributor

Ideally this would be handled consistently. If changing BlockRequests::protocol_name to return a &str is too much for this PR maybe protocol names should continue to be &[u8] for the time being and a follow-up PR could transition to &str everywhere at once.

Contributor Author

@tomaka tomaka Jul 15, 2020

I understand that you hate maybe_utf8_bytes_to_string, but it's not something that this PR introduces, and the change is out of scope in my opinion. I've opened #6660.

// initialization.
if let Some(resp_builder) = resp_builder {
// If the response builder is too busy, silently drop `tx`.
// This will be reported as a `Busy` error.
Contributor

The requirement to size the "response builder" channel properly, and the dropping of requests when it has reached capacity, look quite brittle to me. I would expect back-pressure to be exercised, i.e. unless the response builder is ready to process another inbound request, the RequestResponseBehaviour should not even accept new inbound requests, such that the senders cannot produce ever more. I do realise that the events from RequestResponse need to be processed, but maybe the back-pressure handling should be rethought: e.g. instead of immediately passing on the whole inbound request, RequestResponse may queue a single request, generate an event that a request is ready to be consumed, and only after RequestResponseBehaviour has actually retrieved it will RequestResponse read another inbound request.

Contributor Author

@tomaka tomaka Jul 14, 2020

It's not really a question of API here, but of what to do on a more fundamental level.

If we are connected to 200 other nodes, and they all send us a request at the same time, and it takes on average 100ms to build a response to each request, then the 200th node would receive a response only after 20 seconds.

This PR assumes that 20 seconds is an unacceptably long time to send back a response, and prefers to immediately make these long-duration requests fail rather than queuing them.

Contributor

I agree this is a fundamental decision, but my conclusion is the opposite of yours. In particular, I think it is up to the requesting endpoint to decide how long it is willing to wait for an answer.

Contributor

@romanb romanb Jul 15, 2020

On a general level, I think it would be nice if a configurable concurrent inbound substream limit could be employed for back-pressure already in the muxers, but as @twittner pointed out, that is probably not how the muxers currently behave. Of course, such a limit would also apply to all substreams (whether used for notifications, request-response, etc.).

I think all such measures - not accepting new substreams (or delaying accepting them), sending early error responses, "dropping" requests (i.e. just closing the substream, a form of inbound substream limit at a higher layer) - are reasonable and can be complementary for managing inbound traffic. I just also think that considerations about average response times are not really important for sizing queues on the receiving end, as the numbers involved are subject to a lot of variation. Rather, the receiver merely decides how much resources (primarily in terms of memory) it is willing to dedicate to pending requests for a certain protocol. With the max_request_size one can calculate the upper limits reasonably well. How long one is willing to wait for a response, on the other hand, is primarily up to the sender of a request and different senders can have different tolerances in that regard - one shouldn't assume that all senders have the same request timeout, since we're not even assuming all peers to be substrate nodes.
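As a back-of-the-envelope illustration of that last point (numbers made up): with a queue of 64 pending requests and a max_request_size of 1 MiB, the receiver commits at most 64 MiB to pending request payloads for that protocol, regardless of how long any individual sender is willing to wait for its response.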

Contributor Author

@tomaka tomaka Jul 15, 2020

I generally agree with @twittner's suggestion to modify the way the RequestResponse behaves.

However, muxers in general indeed don't permit applying back-pressure on incoming substreams. As far as I know, it's not something that we can change in rust-libp2p, but rather something that multiplexing protocols in general don't permit.

Assuming that the receiving side should never refuse incoming requests, I think that there are two possible courses of action:

  • Change the protocol to no longer open a substream per request, but rather append multiple requests one behind the other in the same substream. It would be a protocol error to open more than one substream per protocol per connection.
  • Enforce a limit, for each protocol, to the number of ongoing requests. The sending side would have to throttle itself in order to not reach this limit.

With the first option, once the sending side has started to write a request, it cannot cancel it anymore and has no choice but to finish writing it, even if the substream is back-pressured for longer than is desirable.
The second option, however, leaves the possibility for the sending side to cancel their ongoing requests by closing the substream.
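For concreteness, the second option could look roughly like this on the sending side (purely illustrative bookkeeping, not code from the PR): the sender tracks its in-flight requests per protocol and throttles itself once a fixed limit is reached.

// Sender-side limit on simultaneous requests per protocol.
use std::collections::HashMap;

struct OutboundLimiter {
    limit: usize,
    in_flight: HashMap<&'static str, usize>,
}

impl OutboundLimiter {
    fn new(limit: usize) -> Self {
        Self { limit, in_flight: HashMap::new() }
    }

    /// Returns `true` if a new request on `protocol` may be started now.
    fn try_start(&mut self, protocol: &'static str) -> bool {
        let count = self.in_flight.entry(protocol).or_insert(0);
        if *count < self.limit {
            *count += 1;
            true
        } else {
            false
        }
    }

    /// Called whenever a request finishes (response, error, or cancellation).
    fn finish(&mut self, protocol: &'static str) {
        if let Some(count) = self.in_flight.get_mut(protocol) {
            *count = count.saturating_sub(1);
        }
    }
}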

Contributor

@twittner twittner Jul 15, 2020

The throttling can be made explicit in the protocol. If we augment the request-response protocol and send metadata along with the actual payload data (something we should maybe do anyway for future extensibility), we could make use of tokens which are required for sending. More specifically, the receiver issues tokens to the sender which are valid for a given number of requests the sender is allowed to send before having to wait for the next token. The initial request is free, because the sender does not have a token yet and we probably do not want to require a roundtrip to get one; otherwise we could require that the sender first has to get a token. RequestResponse queues the requests and issues a new token to the sender when the queue is empty:

   Sender                     Receiver                           Consumer
|         |               |             |                       |
|Request  |----(Send)---->|[]           |                       |
|         |               |             |                       |
|<wait>   |               |[Request]    |<---(Get Request)----- |
|         |               |             |<-(Accept N requests)--|
|         |               |             |                       |
|         |<-(Token1(N))--|[]           |                       |
|         |               |             |                       |
|Request1,|               |             |                       |
|Token1   |----(Send)---->|[Request1]   |                       |
|         |               |             |                       |
|...      |               |             |                       |
|         |               |             |                       |
|RequestN,|               |             |                       |
|Token1   |----(Send)---->|[Request1,   |                       |
|         |               | ... RequestN|                       |
|         |               |]            |                       |
|         |               |             |                       |
|<wait>   |               |[Request1,   |<---(Get Request)------|
|         |               | ... RequestN|                       |
|         |               |]            |                       |
|         |               |             |                       |
|         |               |...          |                       |
|         |               |             |                       |
|         |               |[RequestN]   |<---(Get Request)------|
|         |               |             |<-(Accept M requests)--|
|         |<-(Token2(M))--|[]           |                       |
...

Sending requests with invalid tokens or tokens whose associated quota is used up is a protocol violation. The quota applies per peer and protocol and is independent of the number of substreams or connections.
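A toy model of the sender-side bookkeeping this implies (illustrative only; the actual prototype lives in the budgets branch linked further down): the first request is free, and afterwards the sender may only send while it holds credit from the most recent token.

/// Sender-side token budget: `remaining` is `None` until the first token arrives.
struct SendBudget {
    remaining: Option<u64>,
    sent_initial: bool,
}

impl SendBudget {
    fn new() -> Self {
        Self { remaining: None, sent_initial: false }
    }

    /// The receiver granted a new token allowing `n` further requests.
    fn grant(&mut self, n: u64) {
        self.remaining = Some(n);
    }

    /// Account for one outgoing request; `false` means the sender must wait for
    /// the next token (sending anyway would be a protocol violation).
    fn try_send(&mut self) -> bool {
        match &mut self.remaining {
            // No token yet: only the very first request is free.
            None => {
                if self.sent_initial {
                    false
                } else {
                    self.sent_initial = true;
                    true
                }
            }
            Some(0) => false,
            Some(n) => {
                *n -= 1;
                true
            }
        }
    }
}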

Contributor Author

I still don't know what to do, practically speaking. As a reminder, we're trying to ship parachains as soon as possible, and I would like to avoid this protocol design discussion taking two months of back-and-forth.

Contributor

I started prototyping the required changes here: tomaka/polkadot@request-responses...twittner:budgets It contains some TODOs, mostly to reset send/receive budgets on timeouts etc. Let me know if this makes sense to you.

Contributor Author

I'm quite reluctant to implement something complex, as it means:

  • We're potentially going to spend a lot of time implementing it and fixing bugs. Same for any other implementation team.
  • We wouldn't be able to base the existing block-request/finality-request/light-client-requests on top of this new request-response system unless we do a transition period, which is again quite annoying.
  • I'm not convinced that there isn't a simpler solution that achieves the same thing. In particular, is there anything wrong with instead limiting the number of simultaneous requests per peer for any given protocol?

Contributor

In particular, is there anything wrong with instead limiting the number of simultaneous requests per peer for any given protocol?

As discussed in private, a protocol-level constant should be fine and does not require the proper sizing of mpsc channels. It is awkward to roll out changes to this constant, but as I understand it this is not expected to happen.

}

/// Error when registering a protocol.
#[derive(Debug, derive_more::Display, derive_more::Error)]
Contributor

This crate uses derive_more and thiserror. Maybe we could eventually select one?

///
/// Any response larger than this value will be declined as a way to avoid allocating too
/// much memory for it.
pub max_response_size: usize,
Contributor

Maybe instead of using usize we should be machine-independent and use u32 or u64 for max_request_size and max_response_size?

Contributor

@romanb romanb left a comment

Apart from the minor comments and naming suggestions that I left, there are two aspects which I'd like to discuss:

  1. Is it really beneficial to build-in the inbound request handling via requests_processing, pending_responses, InboundError::Busy etc. in this way as compared to just emitting an event for every inbound request on the API and offering NetworkWorker::respond complementary to NetworkWorker::request, analogous to libp2p-request-response? Essentially leaving it to client code how many, if any, queues are used per protocol, how to react to "too many requests" etc.?

  2. It seems quite unfortunate that despite being "generic" (i.e. Vec<u8>) over protocols, requests and responses, the implementation needs to employ one RequestResponse behaviour per protocol, with all the associated additional implementation complexity and runtime overhead. As far as I can tell, the only thing that stands in the way of using a single RequestResponse behaviour is the ability to have different request timeouts and protocol support per protocol. If this is true, it may be worth thinking about adding fn protocol_support(Self::Protocol) -> ... and fn request_timeout(Self::Protocol) -> ... to RequestResponseCodec with default implementations that are either complementary or complete replacements for the corresponding configuration done in RequestResponseConfig and RequestResponse::new. The idea would be that the GenericCodec then contains an Arc to a map of protocol configurations built at initialisation, which it can consult for: request timeout, protocol support, max payload sizes, ... If I didn't overlook a substantial roadblock and there is interest in this idea, I could prepare a libp2p PR for further discussion; this is not something that needs to stall this PR, since it concerns implementation details and could be a subsequent refactoring.
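For concreteness, the per-protocol configuration map mentioned in point 2 could be shaped roughly like this (hypothetical types and names, not the actual libp2p or Substrate definitions):

use std::{collections::HashMap, sync::Arc, time::Duration};

#[derive(Clone)]
struct ProtocolConfig {
    request_timeout: Duration,
    max_request_size: u64,
    max_response_size: u64,
    /// Whether we accept inbound requests on this protocol at all.
    inbound_allowed: bool,
}

/// A codec generic over protocol names, holding a shared map of per-protocol
/// configuration built at initialisation; a single behaviour could consult it
/// for timeouts, size limits and protocol support.
#[derive(Clone)]
struct GenericCodec {
    protocols: Arc<HashMap<&'static str, ProtocolConfig>>,
}

impl GenericCodec {
    fn request_timeout(&self, protocol: &str) -> Option<Duration> {
        self.protocols.get(protocol).map(|cfg| cfg.request_timeout)
    }

    fn max_response_size(&self, protocol: &str) -> Option<u64> {
        self.protocols.get(protocol).map(|cfg| cfg.max_response_size)
    }

    fn inbound_allowed(&self, protocol: &str) -> bool {
        self.protocols.get(protocol).map_or(false, |cfg| cfg.inbound_allowed)
    }
}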

/// If this is `None`, then the local node will not advertise support for this protocol towards
/// other peers. If this is `Some` but the channel is closed, then the local node will
/// advertise support for this protocol, but any incoming request will lead to an error being
/// sent back.
Contributor

If this is Some but the channel is closed, then the local node will advertise support for this protocol, but any incoming request will lead to an error being sent back.

Could you link to the code that sends back an error under this circumstance?

Contributor Author

I took a bit of a shortcut in this comment. The error is "reported" by destroying the libp2p_request_response::ResponseChannel, which closes the substream, which gets detected as an error by the remote.


@tomaka
Contributor Author

tomaka commented Aug 27, 2020

Is it really beneficial to build-in the inbound request handling via requests_processing, pending_responses, InboundError::Busy etc. in this way as compared to just emitting an event for every inbound request on the API and offering NetworkWorker::respond complementary to NetworkWorker::request, analogous to libp2p-request-response? Essentially leaving it to client code how many, if any, queues are used per protocol, how to react to "too many requests" etc.?

In practice, the NetworkWorker is completely owned by sc-service, and exposing it to Polkadot would be a huge refactoring task. The API that we want general users to use is NetworkService.

These queues of incoming requests have to exist somewhere, and this PR implements them inside of the NetworkWorker. I don't really see any better solution that wouldn't involve a refactoring of sc-service.

It seems quite unfortunate that despite being "generic" (i.e. Vec<u8>) over protocols, requests and responses, the implementation needs to employ one RequestResponse behaviour per protocol, with all the associated additional implementation complexity and runtime overhead. As far as I can tell, the only thing that stands in the way of using a single RequestResponse behaviour is the ability to have different request timeouts and protocol support per protocol. If this is true, it may be worth thinking about adding fn protocol_support(Self::Protocol) -> ... and fn request_timeout(Self::Protocol) -> ... to RequestResponseCodec with default implementations that are either complementary or complete replacements for the corresponding configuration done in RequestResponseConfig and RequestResponse::new. The idea would be that the GenericCodec then contains an Arc to a map of protocol configurations built at initialisation, which it can consult for: request timeout, protocol support, max payload sizes, ... If I didn't overlook a substantial roadblock and there is interest in this idea, I could prepare a libp2p PR for further discussion; this is not something that needs to stall this PR, since it concerns implementation details and could be a subsequent refactoring.

I don't really have a strong opinion on the question of one-behaviour-per-protocol vs one-behaviour-that-groups-everything.
I'm not opposed to doing the changes in libp2p that allow that.

Unfortunately I'd strongly advocate for not blocking this pull request on this concern. I do realize that if I had answered this comment earlier, we could have done the change by now, sorry for that.

Similarly, since libp2p/rust-libp2p#1715 isn't ready quite yet, I'd strongly prefer to merge this pull request beforehand. The work on Polkadot's collation protocol is blocked on this PR.

@tomaka
Contributor Author

tomaka commented Aug 27, 2020

As mentioned, I'd like to merge this in the very near future.
I do realize that this PR isn't ideal, but this feature is quite important for parachains.
Between merging this PR now and improving the feature later, or writing higher-level protocols using work-arounds, the former is very strongly preferable.

@tomaka
Contributor Author

tomaka commented Aug 27, 2020

bot merge

@ghost

ghost commented Aug 27, 2020

Trying merge.

@ghost ghost merged commit 295a670 into paritytech:master Aug 27, 2020
@tomaka tomaka deleted the request-responses branch August 27, 2020 12:53
@tomaka tomaka added the B5-clientnoteworthy label and removed the B0-silent (Changes should not be mentioned in any release notes) label on Sep 1, 2020