Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syncing strategy refactoring (part 2) #5666

Merged

Conversation

nazar-pc
Copy link
Contributor

@nazar-pc nazar-pc commented Sep 10, 2024

Description

Follow-up to #5469 and mostly covering #5333.

The primary change here is that syncing strategy is no longer created inside of syncing engine, instead syncing strategy is an argument of syncing engine, more specifically it is an argument to build_network that most downstream users will use. This also extracts addition of request-response protocols outside of network construction, making sure they are physically not present when they don't need to be (imagine syncing strategy that uses none of Substrate's protocols in its implementation for example).

This technically allows to completely replace syncing strategy with whatever strategy chain might need.

There will be at least one follow-up PR that will simplify SyncingStrategy trait and other public interfaces to remove mentions of block/state/warp sync requests, replacing them with generic APIs, such that strategies where warp sync is not applicable don't have to provide dummy method implementations, etc.

Integration

Downstream projects will have to write a bit of boilerplate calling build_polkadot_syncing_strategy function to create previously default syncing strategy.

Review Notes

Please review PR through individual commits rather than the final diff, it will be easier that way. The changes are mostly just moving code around one step at a time.

Checklist

  • My PR includes a detailed description as outlined in the "Description" and its two subsections above.
  • My PR follows the labeling requirements of this project (at minimum one label for T required)
    • External contributors: ask maintainers to put the right label on your PR.
  • I have made corresponding changes to the documentation (if applicable)

@nazar-pc
Copy link
Contributor Author

@dmitry-markin @lexnv another one for you folks

@dmitry-markin dmitry-markin added the T0-node This PR/Issue is related to the topic “node”. label Sep 10, 2024
@dmitry-markin
Copy link
Contributor

Impeccable! Once again thanks for excellent work!

@dmitry-markin dmitry-markin requested a review from bkchr September 12, 2024 15:10
…refactoring-part-2

# Conflicts:
#	polkadot/node/service/src/lib.rs
let request = WarpProofRequest { begin };

let Some(protocol_name) = self.protocol_name.clone() else {
warn!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Wouldn't this generate too much noise?

Copy link
Contributor Author

@nazar-pc nazar-pc Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will not, this method is not supposed to be called when there is no need for warp requests, that would basically be an implementation bug. This code effectively migrated from substrate/client/network/sync/src/engine.rs (see the change at the bottom of the file). Should be clear if looking at individual commits.

It is just that the invariants for warp sync are a bit strange, which causes this property to be optional (I only did mechanical refactoring, I didn't change the algorithm).

Copy link
Contributor

@lexnv lexnv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for contributing! 🙏

@nazar-pc
Copy link
Contributor Author

What else is left here? I have another set of changes that depend on this for review already 🙂

@dmitry-markin dmitry-markin added this pull request to the merge queue Sep 17, 2024
Merged via the queue into paritytech:master with commit 43cd6fd Sep 17, 2024
206 of 208 checks passed
@nazar-pc nazar-pc deleted the syncing-strategy-refactoring-part-2 branch September 17, 2024 12:25
/// Metrics.
pub metrics: NotificationMetrics,
}

/// Build the network service, the network status sinks and an RPC sender.
pub fn build_network<TBl, TNet, TExPool, TImpQu, TCl>(
params: BuildNetworkParams<TBl, TNet, TExPool, TImpQu, TCl>,
pub fn build_network<Block, Net, TxPool, IQ, Client>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm late to the party (sorry).

Before this goes into any stable release or whatever, can we please revert this?

Basically we should not expose build_polkadot_syncing_strategy . Instead we should do this internally in this build_network function and then expose some build_network_advanced or whatever for @nazar-pc that takes then the syncing strategy. This way there is no breaking change to the outside for developers. (that only have a node and not any custom syncing stuff)

The end goal should be that all the node/service changes to all the nodes in this repo are reverted.

@lexnv or @dmitry-markin can you please take care of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#5737 actually extracted the whole syncing strategy out of build_network and exposed build_default_syncing_engine function to construct the typical setup.

The fact that build_network was not just building the network, but also syncing engine, syncing strategies and hardcoding all the protocol was a long standing problem IMO. It was also taking (and still does) the whole sc_service::config::Configuration, most of which it doesn't use, which is also a problem (though I didn't address it in any of my changes). It makes using Substrate as a library quite painful for those who need to customize it the way we do at Subspace.

My goal would be for build_network to be simplified to little more than Net::new() call. The fact that it unconditionally adds a whole bunch of the protocols is annoying and undesirable.

Having build_network_advanced is an option, but not a long-term solution the way I imagine it. Happy to push it into #5737 if that is something you feel strongly about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the gist of it is that build_network should build the network, not syncing engines and other stuff it is coupled with right now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the gist of it is that build_network should build the network, not syncing engines and other stuff it is coupled with right now.

I mean I'm not against this and I support this. Just about how it is done. I mean just keep the function do what it was doing all the time and then provide some extended way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted build_network's API to before this PR and added build_network_advanced instead in #5737

nazar-pc added a commit to nazar-pc/polkadot-sdk that referenced this pull request Sep 20, 2024
nazar-pc added a commit to autonomys/polkadot-sdk that referenced this pull request Sep 27, 2024
# Description

Follow-up to paritytech#5469 and
mostly covering paritytech#5333.

The primary change here is that syncing strategy is no longer created
inside of syncing engine, instead syncing strategy is an argument of
syncing engine, more specifically it is an argument to `build_network`
that most downstream users will use. This also extracts addition of
request-response protocols outside of network construction, making sure
they are physically not present when they don't need to be (imagine
syncing strategy that uses none of Substrate's protocols in its
implementation for example).

This technically allows to completely replace syncing strategy with
whatever strategy chain might need.

There will be at least one follow-up PR that will simplify
`SyncingStrategy` trait and other public interfaces to remove mentions
of block/state/warp sync requests, replacing them with generic APIs,
such that strategies where warp sync is not applicable don't have to
provide dummy method implementations, etc.

## Integration

Downstream projects will have to write a bit of boilerplate calling
`build_polkadot_syncing_strategy` function to create previously default
syncing strategy.

## Review Notes

Please review PR through individual commits rather than the final diff,
it will be easier that way. The changes are mostly just moving code
around one step at a time.

# Checklist

* [x] My PR includes a detailed description as outlined in the
"Description" and its two subsections above.
* [x] My PR follows the [labeling requirements](

https://github.com/paritytech/polkadot-sdk/blob/master/docs/contributor/CONTRIBUTING.md#Process
) of this project (at minimum one label for `T` required)
* External contributors: ask maintainers to put the right label on your
PR.
* [x] I have made corresponding changes to the documentation (if
applicable)

(cherry picked from commit 43cd6fd)
github-merge-queue bot pushed a commit that referenced this pull request Nov 7, 2024
# Description

This is a continuation of
#5666 that finally fixes
#5333.

This should allow developers to create custom syncing strategies or even
the whole syncing engine if they so desire. It also moved syncing engine
creation and addition of corresponding protocol outside
`build_network_advanced` method, which is something Bastian expressed as
desired in
#5 (comment)

Here I replaced strategy-specific types and methods in `SyncingStrategy`
trait with generic ones. Specifically `SyncingAction` is now used by all
strategies instead of strategy-specific types with conversions.
`StrategyKey` was an enum with a fixed set of options and now replaced
with an opaque type that strategies create privately and send to upper
layers as an opaque type. Requests and responses are now handled in a
generic way regardless of the strategy, which reduced and simplified
strategy API.

`PolkadotSyncingStrategy` now lives in its dedicated module (had to edit
.gitignore for this) like other strategies.

`build_network_advanced` takes generic `SyncingService` as an argument
alongside with a few other low-level types (that can probably be
extracted in the future as well) without any notion of specifics of the
way syncing is actually done. All the protocol and tasks are created
outside and not a part of the network anymore. It still adds a bunch of
protocols like for light client and some others that should eventually
be restructured making `build_network_advanced` just building generic
network and not application-specific protocols handling.

## Integration

Just like #5666
introduced `build_polkadot_syncing_strategy`, this PR introduces
`build_default_block_downloader`, but for convenience and to avoid
typical boilerplate a simpler high-level function
`build_default_syncing_engine` is added that will take care of creating
typical block downloader, syncing strategy and syncing engine, which is
what most users will be using going forward. `build_network` towards the
end of the PR was renamed to `build_network_advanced` and
`build_network`'s API was reverted to
pre-#5666, so most users
will not see much of a difference during upgrade unless they opt-in to
use new API.

## Review Notes

For `StrategyKey` I was thinking about using something like private type
and then storing `TypeId` inside instead of a static string in it, let
me know if that would preferred.

The biggest change happened to requests that different strategies make
and how their responses are handled. The most annoying thing here is
that block response decoding, in contrast to all other responses, is
dependent on request. This meant request had to be sent throughout the
system. While originally `Response` was `Vec<u8>`, I didn't want to
re-encode/decode request and response just to fit into that API, so I
ended up with `Box<dyn Any + Send>`. This allows responses to be truly
generic and each strategy will know how to downcast it back to the
concrete type when handling the response.

Import queue refactoring was needed to move `SyncingEngine` construction
out of `build_network` that awkwardly implemented for `SyncingService`,
but due to `&mut self` wasn't usable on `Arc<SyncingService>` for no
good reason. `Arc<SyncingService>` itself is of course useless, but
refactoring to replace it with just `SyncingService` was unfortunately
rejected in #5454

As usual I recommend to review this PR as a series of commits instead of
as the final diff, it'll make more sense that way.

# Checklist

* [x] My PR includes a detailed description as outlined in the
"Description" and its two subsections above.
* [x] My PR follows the [labeling requirements](

https://github.com/paritytech/polkadot-sdk/blob/master/docs/contributor/CONTRIBUTING.md#Process
) of this project (at minimum one label for `T` required)
* External contributors: ask maintainers to put the right label on your
PR.
* [x] I have made corresponding changes to the documentation (if
applicable)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T0-node This PR/Issue is related to the topic “node”.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants