Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Conversation

@guoxbin
Copy link

@guoxbin guoxbin commented Jan 4, 2020

Fixes #4523

v1.0 light node cannot sync to the newest block.
There are 2 problems:

  • light node call a wrong method name version
    version is a method of Core mudule, so the method name should be Core_version

  • the import blocks channel of light node has a limited capacity
    light node makes remote calls when importing and verifying new blocks and costs more time than full node does.
    When a light node has a lot of new blocks to sync, a limited capacity channel will be a problem.

@parity-cla-bot
Copy link

It looks like @guoxbin hasn't signed our Contributor License Agreement, yet.

The purpose of a CLA is to ensure that the guardian of a project's outputs has the necessary ownership or grants of rights over all contributions to allow them to distribute under the chosen licence.
Wikipedia

You can read and sign our full Contributor License Agreement at the following URL: https://cla.parity.io

Once you've signed, please reply to this thread with [clabot:check] to prove it.

Many thanks,

Parity Technologies CLA Bot

@guoxbin
Copy link
Author

guoxbin commented Jan 4, 2020

[clabot:check]

@parity-cla-bot
Copy link

It looks like @guoxbin signed our Contributor License Agreement. 👍

Many thanks,

Parity Technologies CLA Bot

@gavofyork gavofyork added the A0-please_review Pull request needs code review. label Jan 4, 2020
@bkchr
Copy link
Member

bkchr commented Jan 5, 2020

Why will the channel be a problem? We have a limited channel to not blow up the memory usage.

@guoxbin
Copy link
Author

guoxbin commented Jan 7, 2020

Why will the channel be a problem? We have a limited channel to not blow up the memory usage.

When a light node verify a new block header, It needs to get state (like authories).

Since the light node doesn't maintain local state, It need make remote call to get the state, which slows down the verification speed and exhausts the import block channel.
    

@arkpar
Copy link
Member

arkpar commented Jan 7, 2020

cc @svyatonik

@bkchr
Copy link
Member

bkchr commented Jan 7, 2020

Why will the channel be a problem? We have a limited channel to not blow up the memory usage.

When a light node verify a new block header, It needs to get state (like authories).

Since the light node doesn't maintain local state, It need make remote call to get the state, which slows down the verification speed and exhausts the import block channel.

Nevertheless, I would stick to a bounded channel. We could increase the size. Maybe to 40 elements?

@guoxbin
Copy link
Author

guoxbin commented Jan 7, 2020

Nevertheless, I would stick to a bounded channel. We could increase the size. Maybe to 40 elements?

I know your concern.

According to my recent study, I think the requirement of remote call is quite related to how the consensus module is implemented.
If the block verification makes lots of remote calls, the sync from very early block number to a large best number may be a mission impossible.

I find that a light node will lose it's genesis state after a restart, which causes the node make a remote call even when querying state of genesis block. This causes things worse.

So I suggest:
Firstly, fix the genesis state lost problem,
and then if unbounded(4) is acceptable, we don't need modify that.

Some code about genesis state lost problem
the genesis_state is initialized with None and no code will reload it from disk storage:

pub struct Backend<S, F, H> {
	blockchain: Arc<Blockchain<S, F>>,
	genesis_state: RwLock<Option<InMemoryState<H>>>,
}

source

impl<S, F, H> Backend<S, F, H> {
	/// Create new light backend.
	pub fn new(blockchain: Arc<Blockchain<S, F>>) -> Self {
		Self {
			blockchain,
			genesis_state: RwLock::new(None),
		}
	}

source

@bkchr
Copy link
Member

bkchr commented Jan 7, 2020

The problem is also that you are working here with deprecated code. We don't work on this anymore. Did you checked master to see if your problems are solved there?

@guoxbin
Copy link
Author

guoxbin commented Jan 7, 2020

Yes, our project is based on v1.0.

We may migrate to the newest version later, but this is not a simple work.

I would like to help fix the problem on branch v1.0 for someone else may need a solid legacy version.

@svyatonik
Copy link
Contributor

just a side note: the idea behind light client was that it shouldn't have made any runtime calls during sync at all. So if in 1.0 it makes such calls, then it isn't actually a light client (iirc it has been syncing much longer than a full client).

To make light client working for 1.0, you'll at least need #1622 and #1669 (and even more, but these are essential), even though they're already obsoleted in the master branch. I'm not sure - whether we accept PRs to v1.0 branch at all?

@guoxbin
Copy link
Author

guoxbin commented Jan 7, 2020

Agree that light client shouldn't have made any runtime calls.

Branch v1.0 contains the commit by pr #1622.
This makes the node have the genesis state for the first time, but the genesis state will be lost after restarting.

@bkchr
Copy link
Member

bkchr commented Jan 8, 2020

Yeah, I'm also not that convinced of merging something into version 1.0.

@bkchr
Copy link
Member

bkchr commented Jan 8, 2020

I would really encourage you to migrate to 2.0.

For the time, I would propose that you just use a fork of Substrate 1.0 to add fixes you need. Sorry, but we don't have the resources to maintain this old version.

@bkchr bkchr closed this Jan 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

A0-please_review Pull request needs code review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants