Fix on-different-forks metrics during initialization by svyatonik · Pull Request #1468 · paritytech/parity-bridges-common

svyatonik · 2022-06-17T12:49:04Z

This PR fixes point (3) from #1462 (comment)

Currently it breaks the RialtoParachain<>Millau relay, since it thinks that the with-RialtoParachain finality pallet at Millau is not initialized. Needs some debugging => draft

Update below:

There's another important commit in this PR, which "initializes" the parachains finality pallet when the on-demand parachains relay is being used (that's what we have now). Without it, messages are never delivered, because the message relay keeps getting the "BridgePalletsIsNotInitialized" error.

During (nontrivial) debugging, I've seen a couple of other issues locally. One is #2477 and the other is a stall similar to #1463. The latter may be fixed by adding more accounts, but it is rare (not as many stalls as in #1463) and stall recovery works well here. So I'm leaving it as is and will fix it properly if it causes excess alerts.

This PR breaks existing runtime APIs - best_finalized now returns Option<(BlockNumber, BlockHash)> instead of (BlockNumber, BlockHash).
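For callers of the runtime API, the practical effect is that the "nothing finalized yet" case becomes explicit instead of being encoded as default values. A minimal sketch of how a caller might handle the new return type, assuming simplified stand-in types (`BlockNumber`, `BlockHash` and `describe_best_finalized` are hypothetical and not taken from the repository):

```rust
// Hypothetical stand-ins for the runtime's block number and hash types.
type BlockNumber = u32;
type BlockHash = [u8; 32];

/// Describes the result of a `best_finalized`-style runtime API call.
fn describe_best_finalized(best: Option<(BlockNumber, BlockHash)>) -> String {
    match best {
        // The bridge pallet knows at least one finalized header.
        Some((number, _hash)) => format!("best finalized header: #{}", number),
        // The pallet has not been initialized yet. With the old tuple return
        // type this case looked like a real header with default number and hash.
        None => String::from("bridge pallet is not initialized"),
    }
}

fn main() {
    println!("{}", describe_best_finalized(Some((42, [0u8; 32]))));
    println!("{}", describe_best_finalized(None));
}
```

Old relayers that decode a plain tuple would presumably fail to decode the new `Option` value, which is consistent with the PR being labelled as breaking existing relayers.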

serban300

Looks good! I just left a couple of nits, which I'd be happy to implement if they make sense.

serban300 · 2022-07-27T12:35:01Z

-				Some(head) => (*head.number(), head.hash()),
-				None => (Default::default(), Default::default()),
-			}
+			>::best_parachain_head(RIALTO_PARACHAIN_ID.into())?;


Nit: I would use bp_rialto_parachain::RIALTO_PARACHAIN_ID directly and remove the use statement above.

serban300 · 2022-07-27T12:39:10Z

-			let header = BridgeRialtoGrandpa::best_finalized();
-			(header.number, header.hash())
+		fn best_finalized() -> Option<(bp_rialto::BlockNumber, bp_rialto::Hash)> {
+			BridgeRialtoGrandpa::best_finalized().map(|header| (header.number, header.hash()))


Nit: I think we could use the new HeaderIdProvider::id() in these implementations. Could we also return an Option<HeaderId> here? Or are we limited to basic types like tuples for the values returned by the API methods?

Yeah - I think it could be done. Thank you!

serban300 · 2022-07-27T13:41:51Z

 		RelayState::RelayingRelayHeader(_) => unreachable!("processed by previous match; qed"),
 		RelayState::RelayingParaHeader(para_header_id) => {
-			if data.para_header_at_target < para_header_id.0 {
+			if para_header_at_target_or_zero < para_header_id.0 {


2 small Nits not related to this PR:

I would return state here instead of RelayState::RelayingParaHeader(para_header_id).

I think I would use 2 if statements here instead of 2 matches.
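One way to read these two nits, as a rough sketch only: the `RelayingParaHeader` check can be written with `if let`, returning the unchanged `state` rather than rebuilding the variant. The enum below is a simplified stand-in for the real generic type, and `keep_relaying_para_header` is a hypothetical helper name.

```rust
// Simplified stand-in for the relay state; the real type is generic over
// block numbers and hashes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RelayState {
    Idle,
    RelayingRelayHeader(u32),
    // (number, hash) of the parachain header being relayed
    RelayingParaHeader((u32, u64)),
}

/// Returns `Some(state)` if the selected parachain header still has to be
/// delivered to the target chain, and `None` if processing may continue.
fn keep_relaying_para_header(
    state: RelayState,
    para_header_at_target_or_zero: u32,
) -> Option<RelayState> {
    if let RelayState::RelayingParaHeader(para_header_id) = state {
        if para_header_at_target_or_zero < para_header_id.0 {
            // parachain header hasn't yet been relayed: return the state unchanged
            return Some(state);
        }
    }
    None
}

fn main() {
    let pending = RelayState::RelayingParaHeader((10, 0xabcd));
    println!("{:?}", keep_relaying_para_header(pending, 5));
    println!("{:?}", keep_relaying_para_header(pending, 10));
    println!("{:?}", keep_relaying_para_header(RelayState::Idle, 0));
    println!("{:?}", keep_relaying_para_header(RelayState::RelayingRelayHeader(3), 0));
}
```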

serban300 · 2022-07-27T16:09:39Z

+	let (required_para_header, para_header_at_target) = match data.para_header_at_target {
+		Some(para_header_at_target) => (data.required_para_header, para_header_at_target),
+		None => (para_header_at_source.0, Zero::zero()),
+	};


Would it make sense to have something like the following ?

let required_para_header = max(data.required_para_header, para_header_at_source.0);
let para_header_at_target = para_header_at_target_or_zero;

Since we want to sync the latest header if I understand correctly.

IIUC max(data.required_para_header, para_header_at_source.0) isn't fully correct. We choose what we need to sync using the following algorithm:

if there are no parachain headers at the target (i.e. data.para_header_at_target is None) - then we want to sync the current best header at the source (para_header_at_source.0);

otherwise, we shall never sync past the artificial limit (data.required_para_header).
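The two rules above can be sketched as a small standalone function. This is an illustration only: it works on plain block numbers, and `select_headers` is a hypothetical name rather than a function from the relay.

```rust
/// Picks the parachain header that must be synced and the header assumed to be
/// known to the target chain. Returns `(required_para_header, para_header_at_target)`.
fn select_headers(
    required_para_header: u32,
    para_header_at_source: u32,
    para_header_at_target: Option<u32>,
) -> (u32, u32) {
    match para_header_at_target {
        // The target already has some parachain header: respect the limit.
        Some(at_target) => (required_para_header, at_target),
        // Nothing at the target yet: sync the current best header at the source.
        None => (para_header_at_source, 0),
    }
}

fn main() {
    // Target has header 7, limit is 10, source is at 50: sync up to 10 only.
    println!("{:?}", select_headers(10, 50, Some(7)));
    // Target has no parachain headers: sync the best source header (50).
    println!("{:?}", select_headers(10, 50, None));
}
```

With `max(required_para_header, para_header_at_source)` the first call would select 50 instead of 10, which is the "sync past the limit" behaviour the reply warns about.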

Thanks! I think this part makes sense now. I have one more question. Sorry for insisting on this and sorry if it doesn't make much sense, but I'm trying to understand this logic (and the parachain heads syncing logic in general) better since I got to it here. One thing that stands out to me is that we do a lot of parachain validations:

// this switch is responsible for processing `RelayingParaHeader` state
let para_header_at_target_or_zero = data.para_header_at_target.unwrap_or_else(Zero::zero);
match state {
	RelayState::Idle => (),
	RelayState::RelayingRelayHeader(_) => unreachable!("processed by previous match; qed"),
	RelayState::RelayingParaHeader(para_header_id) => {
		if para_header_at_target_or_zero < para_header_id.0 {
			// parachain header hasn't yet been relayed
			return RelayState::RelayingParaHeader(para_header_id)
		}
	},
}

// if we haven't read para head from the source, we can't yet do anything
let para_header_at_source = match data.para_header_at_source {
	Some(ref para_header_at_source) => para_header_at_source.clone(),
	None => return RelayState::Idle,
};
...

And at the end we do:

// we need relay chain header first
if data.relay_header_at_target < data.relay_header_at_source {
	return RelayState::RelayingRelayHeader(data.relay_header_at_source)
}

And I was wondering if this order of operations is absolutely required. Do we need to know, for example, that para_header_at_source is Some before returning RelayState::RelayingRelayHeader? Or could we rearrange this logic a bit, starting with something like:

// We need relay chain header first
if data.relay_header_at_target < data.relay_header_at_source {
	return RelayState::RelayingRelayHeader(data.relay_header_at_source);
}

// this switch is responsible for processing `RelayingRelayHeader` state
match state {
	RelayState::Idle | RelayState::RelayingParaHeader(_) => (),
	RelayState::RelayingRelayHeader(relay_header_number) => {
		if data.relay_header_at_target < relay_header_number {
			// required relay header hasn't yet been relayed
			return RelayState::RelayingRelayHeader(relay_header_number)
		}

		// we may switch to `RelayingParaHeader` if parachain head is available
		if let Some(para_header_at_relay_header_at_target) =
			data.para_header_at_relay_header_at_target.clone()
		{
			state = RelayState::RelayingParaHeader(para_header_at_relay_header_at_target);
		} else {
			// otherwise, we'd need to restart (this may happen only if parachain has been
			// deregistered)
			state = RelayState::Idle;
		}
	},
}

// this switch is responsible for processing `RelayingParaHeader` state
let para_header_at_target_or_zero = data.para_header_at_target.unwrap_or_else(Zero::zero);
match state {
	RelayState::Idle => (),
	RelayState::RelayingRelayHeader(_) => unreachable!("processed by previous match; qed"),
	RelayState::RelayingParaHeader(para_header_id) => {
		if para_header_at_target_or_zero < para_header_id.0 {
			// parachain header hasn't yet been relayed
			return RelayState::RelayingParaHeader(para_header_id)
		}
	},
}
...

This would create a separation between the relay headers logic and para headers logic, and personally I would have found it easier to follow.

Please keep asking questions - that's absolutely fine!

If we put this part

// We need relay chain header first
if data.relay_header_at_target < data.relay_header_at_source {
	return RelayState::RelayingRelayHeader(data.relay_header_at_source);
}

at the beginning of the code, we'll end up always syncing relay headers, and parachain headers won't ever be synced. That's because (as a rule) it takes us more time to craft, send and mine a transaction at the target chain than it takes the source chain to generate a new block.

There's also a big story behind this on-demand relay - it is called on-demand for a reason :) So we don't need to sync all source headers to the target chain. That's because then we'll:

lose a ton of tokens - e.g. the cost of import-Kusama-header-to-Polkadot was, iirc, somewhere between 1 and 2 DOTs. So if we sync all Kusama headers to Polkadot, we'll be losing ~24*60*60/6*2=28800 DOTs per day;

we'll be generating unneeded congestion at the target network - i.e. we'll be generating quite large (in both size and weight terms) transactions at almost every block;

we simply don't need it.
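The daily cost estimate in the first point can be checked with a few lines of arithmetic. The sketch below only restates the numbers from the comment (one header every 6 seconds, 1 to 2 DOTs per imported header); the constant and function names are made up for illustration.

```rust
const SECONDS_PER_DAY: u64 = 24 * 60 * 60;
// Block time implied by the `/6` in the estimate above.
const BLOCK_TIME_SECONDS: u64 = 6;

/// Cost (in DOTs) of importing every source header for one day,
/// given the cost of a single header import.
fn daily_cost_dot(cost_per_header_dot: u64) -> u64 {
    let headers_per_day = SECONDS_PER_DAY / BLOCK_TIME_SECONDS;
    headers_per_day * cost_per_header_dot
}

fn main() {
    // Lower and upper bounds of the "between 1 and 2 DOTs" estimate.
    println!("low estimate:  {} DOTs per day", daily_cost_dot(1));
    println!("high estimate: {} DOTs per day", daily_cost_dot(2));
}
```

So the quoted 28800 DOTs per day corresponds to the upper bound of 2 DOTs per header; at 1 DOT per header the figure would be 14400.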

So initially the relayers' architecture was layered, similar to how our pallets are layered (GRANDPA at the bottom, then parachains, then messaging on the app level). We have 3 separate relayer subcommands: one that relays GRANDPA proofs and headers (substrate-relay relay-headers), one that relays parachain headers (substrate-relay relay-parachains), and the messages relay (relay-messages). They all have their own task, and e.g. the messages relay will stall if the other relays are not working (simply because it needs relay and parachain headers to be able to prove messages).

But running a standalone headers relay (substrate-relay relay-headers) would lead us to the problem that I've described above. The solution was to introduce the so-called complex relay (substrate-relay relay-headers-and-messages), which is a messages relay with "paused" background headers (and parachains) relays. So when the messages relay sees that it needs header-100 to deliver message-1 from the source to the target chain, it "awakes" the background headers relay and asks it to relay header-100. Once it is relayed, the headers relay goes back to sleep and the messages relay does its job.

There are also other caveats there - e.g. we sometimes need to awake the background relay to relay mandatory headers - but overall, when the complex relay is running (which is, as I said, the messages relay + on-demand headers and parachains relays), we are trying to minimize the number of transactions that we're submitting.
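The on-demand idea described above can be illustrated with a toy model. This is a sketch under heavy simplification, not the relay's actual implementation: `OnDemandHeadersRelay`, `require_header` and `tick` are hypothetical names, headers are plain numbers, and mandatory headers are ignored.

```rust
/// Hypothetical, heavily simplified on-demand headers relay.
struct OnDemandHeadersRelay {
    /// Best source header known to the target chain.
    best_header_at_target: u32,
    /// Header requested by the messages relay, if any.
    required_header: Option<u32>,
    /// Number of header transactions submitted so far.
    submitted_transactions: u32,
}

impl OnDemandHeadersRelay {
    fn new(best_header_at_target: u32) -> Self {
        Self { best_header_at_target, required_header: None, submitted_transactions: 0 }
    }

    /// Called by the messages relay when it needs `header` at the target chain.
    fn require_header(&mut self, header: u32) {
        if header > self.best_header_at_target {
            // Keep the highest requested header only.
            let required = self.required_header.map_or(header, |r| r.max(header));
            self.required_header = Some(required);
        }
    }

    /// One iteration of the background loop: does nothing unless a header
    /// has been requested, in which case exactly that header is relayed.
    fn tick(&mut self) {
        if let Some(required) = self.required_header.take() {
            self.best_header_at_target = required;
            self.submitted_transactions += 1;
        }
    }
}

fn main() {
    let mut relay = OnDemandHeadersRelay::new(10);
    // The source chain keeps producing blocks, but nobody asked for them.
    relay.tick();
    relay.tick();
    println!("transactions while idle: {}", relay.submitted_transactions);

    // The messages relay needs header 100 to prove a message.
    relay.require_header(100);
    relay.tick();
    println!(
        "best header at target: {}, transactions: {}",
        relay.best_header_at_target, relay.submitted_transactions
    );
}
```

In this model a single transaction covers the jump from header 10 to header 100, whereas a standalone headers relay would try to submit one for nearly every source block.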

Oh, ok, this is super useful both for this review and for understanding more details about the project. Thanks! I was thinking that I might be missing something.

serban300 · 2022-07-27T16:43:44Z


 # last time when we have been asking for conversion rate update
 LAST_CONVERSION_RATE_UPDATE_TIME=0
-# current conversion rate


All these scripts look very similar. I wonder if we could move the common logic in a function that's accessible to all. Just saying, but probably it's not worth doing now.

Yeah - it has become complicated with that conversion rate override stuff, but we'll probably need to remove it soon (it isn't usable in the XCM infra). But the overall idea looks good (if there's enough code that may be shared).

I created #1526 for this

serban300 · 2022-07-29T08:18:53Z

The comments have been addressed as part of #1525. Marking this PR as reviewed.

paritytech/parity-bridges-common#1468

* fix on-different-forks metrics during initialization
* "initialize" parachain finality pallet in on-demand parachains relay
* decrease converstion rate requests count
* more error logging
* fix compilation
* clippy

svyatonik added 4 commits June 17, 2022 15:45

fix on-different-forks metrics during initialization

1fbcd0c

"initialize" parachain finality pallet in on-demand parachains relay

b10f341

decrease converstion rate requests count

8c5f904

more error logging

3f6ca6c

svyatonik added P-Relay PR-breaksrelay A PR that is going to break existing relayers. I.e. some Runtime changes render old relayers unusabl labels Jun 21, 2022

svyatonik added 2 commits June 21, 2022 14:27

fix compilation

d799c64

clippy

d16260b

svyatonik marked this pull request as ready for review June 21, 2022 11:37

svyatonik enabled auto-merge (squash) June 21, 2022 11:39

svyatonik merged commit 51a8760 into master Jun 21, 2022

svyatonik deleted the fix-different-forks-metrics-during-initialization branch June 21, 2022 11:52

This was referenced Jun 21, 2022

Remove local images refs #1470

Merged

Unconfirmed rewards at Millau->RaltoParachain alert on test deployments #1469

Closed

serban300 reviewed Jul 27, 2022

serban300 mentioned this pull request Jul 28, 2022

Followu-up un #1468 #1525

Merged

wuminzhe pushed a commit to darwinia-network/darwinia-messages-substrate that referenced this pull request Aug 8, 2022

Companion for 1468

7b7efe4

paritytech/parity-bridges-common#1468

wuminzhe mentioned this pull request Aug 9, 2022

Sync upstream darwinia-network/darwinia-messages-substrate#174

Closed

30 tasks
