[R4R] Separate Processing and State Verification on BSC #926

Merged
merged 58 commits into from
Jul 4, 2022

Conversation

dean65
Contributor

@dean65 dean65 commented May 24, 2022

Motivation

The increasing adoption of BSC has made the network more active, and node maintainers have a hard time keeping their nodes in sync with the chain. A new syncing protocol that lowers the hardware requirements is urgently needed.

Currently a BSC node keeps two kinds of world state: MPT and snapshot. The MPT (Merkle Patricia Trie) is a tree-structured world state. Its key function is to generate the state root that ensures state consistency, but queries and commits on the MPT are quite slow. The snapshot is a flattened, key-value-based world state. It provides fast queries and commits, and its storage size grows slowly even under a large transaction volume. The snapshot is usually used for block processing, while the MPT is used for state verification.

To lower the hardware requirements while keeping security, we introduce two types of nodes that make full use of the different storages: the fast node and the verify node. The fast node processes blocks with the snapshot only and performs every verification on a block except the state root check. The verify node receives a diffhash from the fast node and responds with the MPT root.

The fast node doesn’t need to store the MPT, so its storage and computation requirements are much lower.

Specification

Architecture

The network topology of the fast node and verify node:
[Figure: topology]

A fast node is a BSC client that does a full sync using only the snapshot and generates difflayers. It needs a confirm message from the verify node before freezing blocks, and it must wait for the confirm message for the ancestor block before inserting new blocks.

A verify node is a normal BSC full node that does a full sync using both the snapshot and the MPT and generates difflayers. It receives a diffhash from a fast node, finds the difflayer whose diffhash matches, and then responds to the fast node with an MPT root message.

All messages exchanged between the fast node and the verify node use the trust p2p protocol.

The relationship between the fast node and the verify node:
[Figure: architecture]

On the fast node side, the verify manager creates a verify task for each inserted block. The verify task submits the block’s diffhash to multiple trusted verify nodes and waits for their responses. Once a confirm message is received, the block is considered verified.
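The verify task described above can be sketched as a concurrent fan-out. This is only an illustration and not the code from this PR: `VerifyPeer`, `VerifyResult`, `verifyBlock`, `fakePeer` and `verifyDemo` are hypothetical names, and the simulated peers stand in for real trust-protocol connections.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// VerifyResult is a hypothetical reply from a verify node.
type VerifyResult struct {
	Confirmed bool   // true when the verify node found a matching difflayer
	Root      string // MPT root reported by the verify node
}

// VerifyPeer is a hypothetical handle to one trusted verify node.
type VerifyPeer interface {
	RequestRoot(ctx context.Context, number uint64, diffHash string) (VerifyResult, error)
}

// verifyBlock sends the diffhash to all peers concurrently and returns the
// first confirmation. It fails if nobody confirms or ctx expires first.
func verifyBlock(ctx context.Context, peers []VerifyPeer, number uint64, diffHash string) (VerifyResult, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // stop outstanding requests once we return

	results := make(chan VerifyResult, len(peers)) // buffered: senders never block
	for _, p := range peers {
		go func(p VerifyPeer) {
			res, err := p.RequestRoot(ctx, number, diffHash)
			if err != nil {
				res = VerifyResult{} // treat errors as "not confirmed"
			}
			results <- res
		}(p)
	}
	for range peers {
		select {
		case res := <-results:
			if res.Confirmed {
				return res, nil
			}
		case <-ctx.Done():
			return VerifyResult{}, ctx.Err()
		}
	}
	return VerifyResult{}, errors.New("no verify node confirmed the block")
}

// fakePeer simulates a verify node that knows some diffhashes.
type fakePeer struct {
	known map[string]string // diffhash -> MPT root
	delay time.Duration
}

func (f fakePeer) RequestRoot(ctx context.Context, _ uint64, diffHash string) (VerifyResult, error) {
	select {
	case <-time.After(f.delay):
	case <-ctx.Done():
		return VerifyResult{}, ctx.Err()
	}
	root, ok := f.known[diffHash]
	return VerifyResult{Confirmed: ok, Root: root}, nil
}

// verifyDemo runs one verify task against two simulated verify nodes.
func verifyDemo(diffHash string) (VerifyResult, error) {
	peers := []VerifyPeer{
		fakePeer{known: map[string]string{}, delay: 5 * time.Millisecond},
		fakePeer{known: map[string]string{"0xd1ff": "0xr00t"}, delay: 20 * time.Millisecond},
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	return verifyBlock(ctx, peers, 100, diffHash)
}

func main() {
	res, err := verifyDemo("0xd1ff")
	fmt.Println(res.Confirmed, res.Root, err)
}
```

Asking several verify nodes at once means a single slow or lagging verify node does not stall block insertion, at the cost of a few redundant requests.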

On the verify node side, the node receives requests from fast nodes, processes them, and responds immediately. To handle a diffhash, it first looks for the difflayer in the cache; if it is not found there, it searches the diff store; if there is still no difflayer, it falls back to the blockchain headers. This lookup is very quick.
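The lookup order on the verify node (cache, then diff store, then headers) might look like the following minimal sketch. The type names, the in-memory maps and the status codes are assumptions made for illustration; the actual storage layers and response codes in the PR may differ.

```go
package main

import "fmt"

// DiffLayer holds only the fields this sketch needs; the real structure is richer.
type DiffLayer struct {
	DiffHash string
	Root     string // MPT root of the block that produced this difflayer
}

// Status is a hypothetical set of response codes.
type Status int

const (
	StatusVerified     Status = iota // difflayer found, diffhash matches
	StatusDiffMismatch               // difflayer found, diffhash differs
	StatusNoDiffLayer                // block known from headers, no difflayer
	StatusUnknownBlock               // block not known at all
)

// VerifyNode models the three lookup tiers with plain maps keyed by block hash.
type VerifyNode struct {
	cache   map[string]DiffLayer // recent difflayers kept in memory
	store   map[string]DiffLayer // persisted difflayers (diff store)
	headers map[string]bool      // block hashes known to the chain
}

// Lookup checks the cache first, then the diff store, then the headers.
func (v *VerifyNode) Lookup(blockHash, diffHash string) (Status, string) {
	layer, ok := v.cache[blockHash]
	if !ok {
		layer, ok = v.store[blockHash]
	}
	if ok {
		if layer.DiffHash != diffHash {
			return StatusDiffMismatch, ""
		}
		return StatusVerified, layer.Root
	}
	if v.headers[blockHash] {
		return StatusNoDiffLayer, ""
	}
	return StatusUnknownBlock, ""
}

func main() {
	v := &VerifyNode{
		cache:   map[string]DiffLayer{"0xb2": {DiffHash: "0xd2", Root: "0xr2"}},
		store:   map[string]DiffLayer{"0xb1": {DiffHash: "0xd1", Root: "0xr1"}},
		headers: map[string]bool{"0xb0": true, "0xb1": true, "0xb2": true},
	}
	for _, h := range []string{"0xb2", "0xb1", "0xb0", "0xb9"} {
		status, root := v.Lookup(h, "0xd"+h[3:])
		fmt.Println(h, status, root)
	}
}
```

Distinguishing "block known but no difflayer" from "unknown block" lets the fast node decide how strict to be when the verify node cannot produce a difflayer.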

Authentication

A fast node can only rely on trusted verify nodes, deployed either by the same developer or by a trusted organization. Peers already verify each other's peer ID during the handshake, and we borrow this mechanism for authentication. We introduce a VerifyNodes setting, which is a list of encoded addresses; fast nodes only build trust-protocol connections with the nodes in VerifyNodes.

Organizations can deploy their own verify nodes.

Individual developers can connect to verify nodes provided by well-known organizations or validators.
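As a rough illustration of the allowlist idea, the sketch below builds a set of node IDs from a VerifyNodes list and checks a peer ID against it. It assumes the encoded addresses are enode-style URLs of the form `enode://<id>@host:port`; `TrustGate`, `NewTrustGate`, `Allow` and `nodeID` are hypothetical names, and the real handshake check lives in the p2p layer.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// nodeID extracts the node ID from an enode-style URL (enode://<id>@host:port).
func nodeID(enode string) (string, error) {
	u, err := url.Parse(enode)
	if err != nil {
		return "", err
	}
	if u.Scheme != "enode" || u.User == nil || u.User.Username() == "" {
		return "", fmt.Errorf("not an enode URL: %q", enode)
	}
	return strings.ToLower(u.User.Username()), nil
}

// TrustGate is a hypothetical allowlist built from the VerifyNodes setting.
type TrustGate struct {
	allowed map[string]struct{}
}

// NewTrustGate parses every configured address and rejects malformed entries.
func NewTrustGate(verifyNodes []string) (*TrustGate, error) {
	g := &TrustGate{allowed: make(map[string]struct{}, len(verifyNodes))}
	for _, n := range verifyNodes {
		id, err := nodeID(n)
		if err != nil {
			return nil, err
		}
		g.allowed[id] = struct{}{}
	}
	return g, nil
}

// Allow reports whether the peer ID seen during the handshake belongs to a
// configured verify node.
func (g *TrustGate) Allow(peerID string) bool {
	_, ok := g.allowed[strings.ToLower(peerID)]
	return ok
}

func main() {
	gate, err := NewTrustGate([]string{"enode://abcd@10.0.0.1:30311"})
	if err != nil {
		panic(err)
	}
	fmt.Println(gate.Allow("abcd"), gate.Allow("ffff"))
}
```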

Chain Tools

Implement a new prune command to prune all MPT storage.

Prototype verification

Fast node performance on mainnet improves by 3x.

Command

Fast Node

Introduce a new tries-verify-mode setting with four modes:

  • local: a normal full node with the complete world state (both MPT and snapshot). The merkle state root is verified against the block header.
  • full: a fast node with only the snapshot world state. The merkle state root is verified by a trusted remote verify node, by comparing the diffhash (an identifier of the difflayer generated by the block) and the state root.
  • insecure: same as full mode, except that the block is accepted without diffhash verification when the verify node does not have the diffhash.
  • none: no merkle state root verification at all, so there is no need to set up or connect to a remote verify node. It is lighter than full and insecure mode, but there is a very small chance that the state becomes inconsistent with other peers.

If the fast node runs in any mode other than local, it disables the diff protocol by default. If it runs in full or insecure mode, it enables the trust protocol by default.

./geth --config ./config.toml --datadir ./node --syncmode full --cache 5000 --tries-verify-mode none
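The mode rules above can be summarized in a small sketch. It assumes that the trust protocol default applies to the two modes that rely on a remote verify node (full and insecure); the type, method and field names are hypothetical and chosen for illustration only.

```go
package main

import "fmt"

// VerifyMode mirrors the four values of the tries-verify-mode setting.
type VerifyMode string

const (
	LocalVerify    VerifyMode = "local"
	FullVerify     VerifyMode = "full"
	InsecureVerify VerifyMode = "insecure"
	NoneVerify     VerifyMode = "none"
)

// ParseVerifyMode validates the value passed on the command line.
func ParseVerifyMode(s string) (VerifyMode, error) {
	switch m := VerifyMode(s); m {
	case LocalVerify, FullVerify, InsecureVerify, NoneVerify:
		return m, nil
	}
	return "", fmt.Errorf("unknown tries-verify-mode %q", s)
}

// NoTries reports whether the node runs without MPT storage (any non-local mode).
func (m VerifyMode) NoTries() bool { return m != LocalVerify }

// NeedRemoteVerify reports whether the node depends on a remote verify node.
// Assumption: this covers exactly the full and insecure modes.
func (m VerifyMode) NeedRemoteVerify() bool {
	return m == FullVerify || m == InsecureVerify
}

// ProtocolDefaults is a hypothetical view of the protocol switches derived
// from the mode.
type ProtocolDefaults struct {
	DisableDiffProtocol bool
	EnableTrustProtocol bool
}

// DefaultsFor derives the protocol defaults described in the text.
func DefaultsFor(m VerifyMode) ProtocolDefaults {
	return ProtocolDefaults{
		DisableDiffProtocol: m.NoTries(),
		EnableTrustProtocol: m.NeedRemoteVerify(),
	}
}

func main() {
	for _, s := range []string{"local", "full", "insecure", "none"} {
		m, err := ParseVerifyMode(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-8s %+v\n", m, DefaultsFor(m))
	}
}
```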

Verify node

A full node that has enabled the trust protocol can serve as a verify node. We also recommend enabling persist diff and disabling the snap and diff protocols when running a verify node.

./geth --config ./config.toml --datadir ./node --syncmode full --cache 5000 --persistdiff --enabletrustprotocol --disablesnapprotocol --disablediffprotocol

Prune tries node

Prune the tries node: ./geth snapshot insecure-prune-all --datadir ./node ./config.toml ./genesis.json

Notable

While using noTrie mode on a test node we found a problem. Because of the pipecommit change, snapshot updates now run in parallel with insertBlock. During insertBlock, the EVM execution needs to read data from the snapshot while the snapshots are refreshed in the background. A difflayer that has been merged into the underlying snapshot is marked stale, and a reader that hits it gets an empty array.
On a node with the trie, the correct data can still be retrieved from the trie, but in noTrie mode no data can be retrieved from the trie, so the stale read causes problems.

To solve this issue, we have introduced a fix: reading snapshot data in noTrie mode retries if it gets a stale error.

Normally a snapshot that has become outdated is never read, but there is a time gap between the snapshot update and taking the read lock, so under parallel execution there is a chance of getting a stale difflayer.
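The retry idea can be illustrated with a minimal sketch. This is not the code from the PR: `layer`, `errStale`, `readWithRetry` and `staleThenFresh` are hypothetical names, and the bounded attempt count is an assumption of this example.

```go
package main

import (
	"errors"
	"fmt"
)

// errStale stands in for the stale error returned by a merged difflayer.
var errStale = errors.New("snapshot layer is stale")

// layer is a hypothetical snapshot layer.
type layer struct {
	stale bool
	data  map[string][]byte
}

// Get fails with errStale once the layer has been merged downwards.
func (l *layer) Get(key string) ([]byte, error) {
	if l.stale {
		return nil, errStale
	}
	return l.data[key], nil
}

// readWithRetry re-resolves the current layer on every attempt and retries
// only while the read fails with errStale. With attempts <= 0 it reports
// errStale without reading.
func readWithRetry(current func() *layer, key string, attempts int) ([]byte, error) {
	err := errStale
	for i := 0; i < attempts; i++ {
		var val []byte
		val, err = current().Get(key)
		if !errors.Is(err, errStale) {
			return val, err
		}
	}
	return nil, err
}

// staleThenFresh simulates the race: the first lookup still sees the layer
// that was just merged, later lookups see the refreshed one.
func staleThenFresh() func() *layer {
	old := &layer{stale: true}
	merged := &layer{data: map[string][]byte{"acct": []byte("balance=7")}}
	calls := 0
	return func() *layer {
		calls++
		if calls == 1 {
			return old
		}
		return merged
	}
}

func main() {
	val, err := readWithRetry(staleThenFresh(), "acct", 3)
	fmt.Println(string(val), err)
}
```

Resolving the layer again on each attempt matters here: retrying against the same stale difflayer would fail every time.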

RealUncle and others added 30 commits January 26, 2022 14:37
[R4R]Separate Processing and State Verification on BSC: sync develop branch
Signed-off-by: kyrie-yl <[email protected]>
unclezoro
unclezoro previously approved these changes Jun 30, 2022
unclezoro
unclezoro previously approved these changes Jul 1, 2022
core/blockchain.go (outdated, resolved)
keefel
keefel previously approved these changes Jul 1, 2022
@dean65 dean65 dismissed stale reviews from keefel and unclezoro via 149718f July 1, 2022 07:39
@setunapo setunapo self-requested a review July 3, 2022 08:33
10 participants