Skip to content

Fix on-different-forks metrics during initialization#1468

Merged
svyatonik merged 6 commits into
masterfrom
fix-different-forks-metrics-during-initialization
Jun 21, 2022
Merged

Fix on-different-forks metrics during initialization#1468
svyatonik merged 6 commits into
masterfrom
fix-different-forks-metrics-during-initialization

Conversation

@svyatonik
Copy link
Copy Markdown
Contributor

@svyatonik svyatonik commented Jun 17, 2022

This PR fixes point (3) from #1462 (comment)

Currently It breaks RialtoParachain<>Millau relay, since it thinks that with-RialtoParachain finality pallet at Millau. Needs some debugging => draft

Update below:

There's another important commit in this PR, which "initializes" parachains finality pallet when on-demand parachains relay is being used (that's what we have now). Without it, messages are never delivered, because message relay keep getting "BridgePalletsIsNotInitialized" error.

During (nontrivial) debugging, I've seen couple of other issues locally. One is #2477 and another one is stall similar to the #1463. The latter one may be fixed by adding more accounts, but it is rare (not as many stalls as in #1463) and stall recovery works well here. So I'm leaving it as is && will fix properly if it'll cause excess alerts.

This PR breaks existing runtime APIs - best_finalized now returns Option<(BlockNumber, BlockHash)> instead of (BlockNumber, BlockHash).

@svyatonik svyatonik added P-Relay PR-breaksrelay A PR that is going to break existing relayers. I.e. some Runtime changes render old relayers unusabl labels Jun 21, 2022
@svyatonik svyatonik marked this pull request as ready for review June 21, 2022 11:37
@svyatonik svyatonik enabled auto-merge (squash) June 21, 2022 11:39
@svyatonik svyatonik merged commit 51a8760 into master Jun 21, 2022
@svyatonik svyatonik deleted the fix-different-forks-metrics-during-initialization branch June 21, 2022 11:52
Copy link
Copy Markdown
Collaborator

@serban300 serban300 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good ! I just left a couple of nits which I'd be happy to implement if they make sense.

Some(head) => (*head.number(), head.hash()),
None => (Default::default(), Default::default()),
}
>::best_parachain_head(RIALTO_PARACHAIN_ID.into())?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I would use bp_rialto_parachain::RIALTO_PARACHAIN_ID directly and remove the use statement above.

let header = BridgeRialtoGrandpa::best_finalized();
(header.number, header.hash())
fn best_finalized() -> Option<(bp_rialto::BlockNumber, bp_rialto::Hash)> {
BridgeRialtoGrandpa::best_finalized().map(|header| (header.number, header.hash()))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think we could use the new HeaderIdProvider::id() in these implementations. Could we also return an Option<HeaderId> here ? Or are we limited to basic types like tuples for the values returned by the API methods ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I think it could be done. Thank you!

RelayState::RelayingRelayHeader(_) => unreachable!("processed by previous match; qed"),
RelayState::RelayingParaHeader(para_header_id) => {
if data.para_header_at_target < para_header_id.0 {
if para_header_at_target_or_zero < para_header_id.0 {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 small Nits not related to this PR:

  1. I would return state here instead of RelayState::RelayingParaHeader(para_header_id).
  2. I think I would use 2 if statements here instead of 2 matches.

Comment on lines +481 to +484
let (required_para_header, para_header_at_target) = match data.para_header_at_target {
Some(para_header_at_target) => (data.required_para_header, para_header_at_target),
None => (para_header_at_source.0, Zero::zero()),
};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have something like the following ?

let required_para_header = max(data.required_para_header, para_header_at_source.0);
let para_header_at_target = para_header_at_target_or_zero;

Since we want to sync the latest header if I understand correctly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC max(data.required_para_header, para_header_at_source.0) isn't fully correct. So we are choosing what we need to sync using following algorithm:

  1. if there are no parachain headers at the target (i.e. data.para_header_at_target is None) - then we want to sync current best header at source (para_header_at_source.0);
  2. otherwise, we shall never sync past the an artificial limit (data.required_para_header).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks ! I think this part makes sense now. I have one more question. Sorry for insisting on this and sorry if it doesn't make much sense, but I'm trying to understand this logic (and the parachain heads syncing logic in general) better since I got to it here. One thing that stands out to me is that we do a lot of parachain validations:

	// this switch is responsible for processing `RelayingParaHeader` state
	let para_header_at_target_or_zero = data.para_header_at_target.unwrap_or_else(Zero::zero);
	match state {
		RelayState::Idle => (),
		RelayState::RelayingRelayHeader(_) => unreachable!("processed by previous match; qed"),
		RelayState::RelayingParaHeader(para_header_id) => {
			if para_header_at_target_or_zero < para_header_id.0 {
				// parachain header hasn't yet been relayed
				return RelayState::RelayingParaHeader(para_header_id)
			}
		},
	}

	// if we haven't read para head from the source, we can't yet do anyhting
	let para_header_at_source = match data.para_header_at_source {
		Some(ref para_header_at_source) => para_header_at_source.clone(),
		None => return RelayState::Idle,
	};
    ...

And at the end we do:

	// we need relay chain header first
	if data.relay_header_at_target < data.relay_header_at_source {
		return RelayState::RelayingRelayHeader(data.relay_header_at_source)
	}

And I was wondering if this order of operations is absolutely required. Do we need to know for example that para_header_at_source is Some, before returning RelayState::RelayingRelayHeader ? Or could we rearrange this logic a bit, starting with something like:

	// We need relay chain header first
	if data.relay_header_at_target < data.relay_header_at_source {
		return RelayState::RelayingRelayHeader(data.relay_header_at_source);
	}
	// this switch is responsible for processing `RelayingRelayHeader` state
	match state {
		RelayState::Idle | RelayState::RelayingParaHeader(_) => (),
		RelayState::RelayingRelayHeader(relay_header_number) => {
			if data.relay_header_at_target < relay_header_number {
				// required relay header hasn't yet been relayed
				return RelayState::RelayingRelayHeader(relay_header_number)
			}

			// we may switch to `RelayingParaHeader` if parachain head is available
			if let Some(para_header_at_relay_header_at_target) =
				data.para_header_at_relay_header_at_target.clone()
			{
				state = RelayState::RelayingParaHeader(para_header_at_relay_header_at_target);
			} else {
				// otherwise, we'd need to restart (this may happen only if parachain has been
				// deregistered)
				state = RelayState::Idle;
			}
		},
	}




	// this switch is responsible for processing `RelayingParaHeader` state
	let para_header_at_target_or_zero = data.para_header_at_target.unwrap_or_else(Zero::zero);
	match state {
		RelayState::Idle => (),
		RelayState::RelayingRelayHeader(_) => unreachable!("processed by previous match; qed"),
		RelayState::RelayingParaHeader(para_header_id) => {
			if para_header_at_target_or_zero < para_header_id.0 {
				// parachain header hasn't yet been relayed
				return RelayState::RelayingParaHeader(para_header_id)
			}
		},
	}
    ...

This would create a separation between the relay headers logic and para headers logic, and personally I would have found it easier to follow.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep asking questions - that's absolutely fine!

If we put this part

	// We need relay chain header first
	if data.relay_header_at_target < data.relay_header_at_source {
		return RelayState::RelayingRelayHeader(data.relay_header_at_source);
	}

at the beginning of the code, then we'll end up always syncing relay headers and parachain headers won't ever be synced. Just because (as a rule) it'll take more time for us to craft + send + mine transaction at the target chain vs just generating new block of the source chain.

There's also a big story behind this on-demand relay - it is called on-demand for a reason :) So we don't need to sync all source headers to the target chain. That's because then we'll:

  1. lose a ton of tokens - e.g. cost of import-Kusama-header-to-Polkadot iirc was somewhere between 1 and 2 DOTs. So if we'll sync all Kusama headers to Polkadot, we'll be losing ~24*60*60/6*2=28800 DOTs per day;
  2. we'll be generating unneeded congestion at the target network - i.e. we'll be generating quite large (in both size and weight terms) transactions at almost every block;
  3. we simply don't need it.

So initially relayers architecture was layered - similar to how our pallet are also layered (GRANDPA at the bottom, then parachains, then messaging on app level). And we have 3 separate relayer subcommands - one that relays GRANDPA proofs and headers (substrate-relay relay-headers), one that relays parachain headers (substrate-relay relay-parachains) and the messages relay (relay-messages). They all have their own task and e.g. messages relay will stall if other relays are not working (simply because it needs relay and parachain headers to be able to prove messages).

But running standalone headers relay (substrate-relay relay-headers) would lead us to the problem that I've described above. And the solution was to introduce so called complex relay (substrate-relay relay-headers-and-messages), which is a messages relay with "paused" background headers (and parachains) relays. So when messages relay is seeing that it needs header-100 to deliver message-1 from source to the target chain, it "awakes" the background headers relay and asks it to relay header-100. Once it is relayed, headers relay starts sleeping again and messages relay does its job.

There are also other caveats there - e.g. we need to awake background relay sometimes and relay mandatory headers, but overall - when complex relay is running (which is as I said: messages relay + on-demand headers and parachains relay), we are trying to minimize number of transactions that we're submitting.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, ok, this is super useful both for this review and for understanding more details about the project. Thanks ! I was thinking that I might be missing something.


# last time when we have been asking for conversion rate update
LAST_CONVERSION_RATE_UPDATE_TIME=0
# current conversion rate
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these scripts look very similar. I wonder if we could move the common logic in a function that's accessible to all. Just saying, but probably it's not worth doing now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - it has became complicated with that conversion rate override stuff, but we'll probably would need to remove it soon (it isn't usable in the XCM infra). But overall idea looks good (if there's enough code that may be shared)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created #1526 for this

@serban300 serban300 mentioned this pull request Jul 28, 2022
@serban300
Copy link
Copy Markdown
Collaborator

The comments have been addressed as part of #1525 . Marking this PR as reviewed.

wuminzhe pushed a commit to darwinia-network/darwinia-messages-substrate that referenced this pull request Aug 8, 2022
svyatonik pushed a commit that referenced this pull request Jul 17, 2023
Bumps [clap](https://github.com/clap-rs/clap) from 3.2.13 to 3.2.15.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/v3.2.15/CHANGELOG.md)
- [Commits](clap-rs/clap@v3.2.13...v3.2.15)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
serban300 pushed a commit to serban300/parity-bridges-common that referenced this pull request Mar 27, 2024
* fix on-different-forks metrics during initialization

* "initialize" parachain finality pallet in on-demand parachains relay

* decrease converstion rate requests count

* more error logging

* fix compilation

* clippy
serban300 pushed a commit to serban300/parity-bridges-common that referenced this pull request Apr 8, 2024
* fix on-different-forks metrics during initialization

* "initialize" parachain finality pallet in on-demand parachains relay

* decrease converstion rate requests count

* more error logging

* fix compilation

* clippy
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

P-Relay PR-breaksrelay A PR that is going to break existing relayers. I.e. some Runtime changes render old relayers unusabl

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants