
Enable ancient block pruning #1216

Merged
merged 60 commits into develop from manav/ancient-pruner-test-1
May 9, 2024
Conversation

@manav2401 (Contributor) commented Apr 11, 2024

Description

Updated version of original PR from @jsvisa: #751

This PR implements PIP-32: Ancient Block Pruning. It adds a new pruner that can be used to prune data from the ancient/freezer database in bor. The aim is to delete historical block data (headers, txs, and receipts) for old blocks that are no longer needed to verify the chain beyond a certain point.

Specification

Usage: bor snapshot prune-block [options...]
  This command will prune ancient blockchain data at the given datadir location. Options:
  -datadir <value>                  Path of the data directory to store information
  -datadir.ancient <value>          Path of the old ancient data directory
  -block-amount-reserved <value>    Sets the expected reserved number of blocks for offline block prune (default: 1024)
  -cache <value>                    Megabytes of memory allocated to internal caching (default: 1024)
  -cache.triesinmemory <value>      Number of block states (tries) to keep in memory (default: 128)
  -check-snapshot-with-mpt          Enable checking between snapshot and Merkle Patricia Tree (default: false)

The prune command keeps the most recent N blocks in the Freezer database and removes the rest. N, set via the block-amount-reserved flag, can range from 0 to K, where K is the length of the frozen chain, i.e. the chain excluding the most recent blocks (90k by default) that still live in the KvDB (LevelDB/PebbleDB). Pruning always proceeds from genesis toward the most recent block and can be run multiple times.

For example: say the Freezer database holds blocks [0, 1, ..., 999, 1000]. If block-amount-reserved is set to 100, then [0, 1, ..., 900] will be pruned and the database will be left with [901, ..., 1000]. On a second round of pruning with block-amount-reserved set to 50, the remaining blocks will be [951, ..., 1000].

The pruner maintains an offset in the key-value store, which records the first block number present in the Freezer db containing ancient data. The pruner updates this value after each pruning round and uses it in the next round to determine the starting point. During a round, the pruner opens a backup Freezer db and moves the blocks to be kept into it from the old db location. Upon completion, it validates that the Freezer db and the KvDB (LevelDB/PebbleDB) are contiguous, and then deletes the old db entirely.

Note: Ancient pruning currently only works for the hash-based storage scheme; it does not yet support the path-based scheme.

Backwards Compatibility

This change is backwards-incompatible once pruning has been performed (at least once) on a node.

Changes

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)
  • Changes only for a subset of nodes

Breaking changes

Once pruning is done (for a particular range of blocks), one won't be able to serve RPC requests and P2P requests involving blocks in that range.

Checklist

  • I have added at least 2 reviewers or the whole pos-v1 team
  • I have added sufficient documentation in code
  • I will be resolving comments - if any - by pushing each fix in a separate commit and linking the commit hash in the comment reply
  • Created a task in Jira and informed the team for implementation in Erigon client (if applicable)
  • Includes RPC methods changes, and the Notion documentation has been updated

Cross repository changes

  • This PR requires changes to heimdall
    • In case link the PR here:
  • This PR requires changes to matic-cli
    • In case link the PR here:

Testing

  • I have added unit tests
  • I have added tests to CI
  • I have tested this code manually on local environment
  • I have tested this code manually on remote devnet using express-cli
  • I have tested this code manually on mumbai
  • I have created new e2e tests into express-cli

Manual tests

Tested on devnets, and mumbai.

jsvisa and others added 30 commits March 17, 2023 13:15
@manav2401 (Contributor, Author) commented Apr 14, 2024

Updates: I have resolved all the conflicts and the tests are passing now. Below are a few pending things:

@manav2401

Updates:

  • This feature doesn't work out of the box with PBSS enabled, because PBSS adds more tables to the freezer db that need additional handling. If required, this can be taken up later in a separate PR (this addresses points 2 and 5)
  • Devnet testing completed

Next steps

  • Test on Amoy/Mumbai once

@manav2401

> Have we tested the behavior of the node under impacted RPC calls, e.g. getBlockByNumber?

Yes. After pruning of the ancient data, RPC calls for the pruned blocks will not return block/tx data (which is expected); they return null instead. It does not affect the operation of the node in any other way.

@manav2401 manav2401 requested review from marcello33, cffls and a team April 22, 2024 11:46
@manav2401 manav2401 requested a review from a team April 25, 2024 09:17
@0x090909 commented May 6, 2024

updates?

@manav2401 manav2401 merged commit d95c05b into develop May 9, 2024
13 checks passed
@manav2401 manav2401 deleted the manav/ancient-pruner-test-1 branch May 9, 2024 05:49
@VSGic commented Jul 11, 2024

Hi there, I have a question about the block prune implementation and its intended workflow. My node was almost out of disk space, so I started a block prune; within a few hours the node ran out of space completely and crashed. Some explanation of the block prune workflow would help. My goal is to set up a node with pruned blocks, which I assumed would save disk space on the server. Should I set the node up again, download a snapshot, and start bor with these settings to exclude blocks from the database? How much additional space is needed?

@manav2401 (Contributor, Author) commented Jul 12, 2024

Hey @VSGic, a very brief workflow is written in this PR description. I'd like to know which parameters you used to run the ancient pruning, and whether there were any errors in the logs.

Moreover, you can inspect the ancient datadir via the command below:

bor snapshot inspect-ancient-db --datadir <datadir> --datadir.ancient <ancient_dir>

This will tell you how many blocks your ancient dir contains and whether they were actually pruned (or left intact due to errors during the pruning run).

Additionally, can you check the size of the ancient datadir (du -sh <datadir>/bor/chaindata/ancient) and let me know?

I don't think you need to set up the node again and download a snapshot. The underlying blockchain state also takes quite a bit of space; I don't know the individual amounts, but one of our internal nodes takes 3.8TB (mainnet, full datadir size). You might also want to prune the state data (a lengthy process that can take a few hours), which will reduce your state size by a great extent. See the bor snapshot prune-state command.

Finally, we'll also be releasing a PBSS snapshot soon, which significantly reduces the state size, if you want to wait for that.

I'd appreciate it if you could create a new issue to discuss this. Thanks!
