Allow pruning to multiple targets, including without snapshots #186

Merged
PlasmaPower merged 20 commits into master from arbitrary-prune-point on Apr 13, 2023

Conversation

PlasmaPower
Collaborator

@PlasmaPower PlasmaPower commented Dec 7, 2022

No description provided.

Contributor

@tsahee tsahee left a comment

some notes & thoughts

// don't belong to the target state and the genesis state
// - iterate the snapshot, reconstruct the relevant state
// - iterate the database, delete all other state entries which
// don't belong to the target state and the genesis state
Contributor

It would be better to avoid any unnecessary geth changes.

Collaborator Author

This is just gofmt which happens every time I save the file. I'd be happy to change it back, but I want to make sure I won't need to make any future changes first, so I'll do it after you approve. Though at this point I've already changed a large percentage of the file.

@@ -261,6 +261,8 @@ func (p *Pruner) Prune(root common.Hash) error {
}
// Use the bottom-most diff layer as the target
root = layers[len(layers)-1].Root()
} else if p.snaptree.Snapshot(root) == nil {
p.snaptree.Rebuild(root, false)
Contributor

This seems like it will work. However, I think a better alternative would be to skip creating the snaptree in NewPruner and pass the right root to snapshot.New, which wouldn't require any change in snapshot.

For nitro, I think we'll need to support multiple roots (so everything is cleaned unless it is reachable from one of these roots). This should probably be done with multiple snaptrees, each with a different root, each updating the same stateBloom.

At least in a validator, we'd like to keep state for: something recent + latest validated + latest accepted onchain.

Collaborator Author

I've added support for multiple roots, but I found that having multiple snaptrees was way too expensive, and would completely destroy any existing snapshots. Instead, I've added support for generating the pruning bloom filters without a snapshot :)
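
To illustrate the multi-root idea discussed in this thread, here is a minimal sketch (not the PR's implementation). It assumes a toy in-memory node store, and the names `node`, `markReachable`, and `pruneToRoots` are hypothetical. An exact `map` set stands in for the shared bloom filter: every root marks its reachable keys into the same set, and anything unmarked is deleted.

```go
package main

import "fmt"

// node is a toy stand-in for a trie node: it is stored under its hash and
// lists the hashes of its children. Real trie nodes are encoded differently.
type node struct {
	children []string
}

// markReachable adds every key reachable from root to keep. The keep set
// plays the role of the shared state bloom filter, except that a map has no
// false positives.
func markReachable(db map[string]node, root string, keep map[string]struct{}) error {
	if _, seen := keep[root]; seen {
		return nil // already marked, possibly via another root
	}
	n, ok := db[root]
	if !ok {
		return fmt.Errorf("missing node %q", root)
	}
	keep[root] = struct{}{}
	for _, child := range n.children {
		if err := markReachable(db, child, keep); err != nil {
			return err
		}
	}
	return nil
}

// pruneToRoots deletes every entry that is not reachable from any of the
// given roots and returns the number of deleted entries.
func pruneToRoots(db map[string]node, roots []string) (int, error) {
	keep := make(map[string]struct{})
	for _, root := range roots {
		if err := markReachable(db, root, keep); err != nil {
			return 0, err
		}
	}
	deleted := 0
	for key := range db {
		if _, ok := keep[key]; !ok {
			delete(db, key) // deleting during range is safe in Go
			deleted++
		}
	}
	return deleted, nil
}

func main() {
	db := map[string]node{
		"rootA":  {children: []string{"shared", "a1"}},
		"rootB":  {children: []string{"shared", "b1"}},
		"rootC":  {children: []string{"c1"}},
		"shared": {},
		"a1":     {},
		"b1":     {},
		"c1":     {},
	}
	deleted, err := pruneToRoots(db, []string{"rootA", "rootB"})
	if err != nil {
		panic(err)
	}
	fmt.Println("deleted:", deleted, "remaining:", len(db))
}
```

Because all roots write into one set, nodes shared between roots are visited once, and no per-root snapshot is needed for the marking phase.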

func (t *Tree) Rebuild(root common.Hash) {
func (t *Tree) Rebuild(root common.Hash, async bool) {
var genPending chan struct{}
if !async {
Contributor

  1. Why use defer rather than just checking async at the end of the function?
    (If using defer, I'd at least check that genPending is not nil.)

  2. Why not use waitBuild for waiting?

Collaborator Author

I used a defer because I wanted it to happen after the mutex was unlocked. I'll switch it to waitBuild though.
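
The ordering argument here relies on Go running deferred calls in last-in-first-out order. The following is a minimal sketch of that pattern, not the code from this PR: the `tree` type, its `events` field, and the `rebuild` method are hypothetical. A wait that is deferred before the unlock is deferred ends up running after the mutex has been released.

```go
package main

import (
	"fmt"
	"sync"
)

// tree is a hypothetical stand-in for a structure guarded by a mutex.
type tree struct {
	lock   sync.Mutex
	events []string // order of operations, recorded for illustration only
}

// rebuild starts a background job. When async is false, it also waits for
// the job, but only after the lock has been released.
func (t *tree) rebuild(async bool) {
	done := make(chan struct{})
	if !async {
		// Registered first, so it runs last: after the unlock below.
		defer func() {
			<-done
			t.events = append(t.events, "wait")
		}()
	}

	t.lock.Lock()
	// Registered second, so it runs first when the function returns.
	defer func() {
		t.lock.Unlock()
		t.events = append(t.events, "unlock")
	}()

	t.events = append(t.events, "start")
	go func() {
		// Stand-in for background generation work.
		close(done)
	}()
}

func main() {
	tr := &tree{}
	tr.rebuild(false)
	fmt.Println(tr.events) // [start unlock wait]
}
```

Checking `async` at the end of the function body instead would perform the wait while the lock is still held, which is the situation the defer ordering avoids.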

Collaborator Author

I've removed the need for this change.

@PlasmaPower PlasmaPower changed the title from "Fix pruning with an arbitrary target root" to "Allow pruning to multiple targets, including without snapshots" Dec 15, 2022
}

// We assume output does not need the value, only the key
func dumpRawTrieDescendants(db ethdb.Database, root common.Hash, output ethdb.KeyValueWriter) error {
Contributor

@tsahee tsahee Dec 19, 2022

Taking a stateBloom instead of a KeyValueWriter would be better for self-documentation.
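
As a rough illustration of this suggestion (a sketch under assumptions, not the PR's code): the `keyValueWriter` interface below is a local stand-in with the usual `Put`/`Delete` shape, and `keySet`, `dumpGeneric`, and `dumpToSet` are hypothetical names. The generic signature hides the fact that values are discarded, while the concrete parameter type makes it visible.

```go
package main

import "fmt"

// keyValueWriter is a local stand-in for a generic key-value write interface.
type keyValueWriter interface {
	Put(key, value []byte) error
	Delete(key []byte) error
}

// keySet records keys only and ignores values. It stands in for a bloom
// filter used as a write target; unlike a bloom filter it is exact.
type keySet struct {
	keys map[string]struct{}
}

func newKeySet() *keySet {
	return &keySet{keys: make(map[string]struct{})}
}

// Put stores the key and drops the value.
func (s *keySet) Put(key, _ []byte) error {
	s.keys[string(key)] = struct{}{}
	return nil
}

// Delete is not meaningful for this write-only set.
func (s *keySet) Delete(key []byte) error {
	return fmt.Errorf("keySet does not support deleting %x", key)
}

// Contains reports whether the key was recorded.
func (s *keySet) Contains(key []byte) bool {
	_, ok := s.keys[string(key)]
	return ok
}

// dumpGeneric accepts any writer, so a comment is needed to explain that
// only keys are written.
func dumpGeneric(keys [][]byte, out keyValueWriter) error {
	for _, k := range keys {
		if err := out.Put(k, nil); err != nil {
			return err
		}
	}
	return nil
}

// dumpToSet takes the concrete type, so the signature itself documents that
// the output only tracks keys.
func dumpToSet(keys [][]byte, out *keySet) error {
	return dumpGeneric(keys, out)
}

func main() {
	set := newKeySet()
	if err := dumpToSet([][]byte{[]byte("a"), []byte("b")}, set); err != nil {
		panic(err)
	}
	fmt.Println(set.Contains([]byte("a")), set.Contains([]byte("z")))
}
```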

Contributor

@tsahee tsahee left a comment

I'm mostly comparing dumpTrieDescendant to extractGenesis.
If there is a good reason for the difference, LGTM. Otherwise, sharing code between the two might have nice advantages.

@@ -228,10 +233,130 @@ func prune(snaptree *snapshot.Tree, root common.Hash, maindb ethdb.Database, sta
return nil
}

func nodeIteratorKey(it trie.NodeIterator) (common.Hash, error) {
if it.Leaf() {
Contributor

Looking through extractGenesis, it doesn't have this part, only it.Hash().
Also, if I'm reading this correctly, LeafKey would be the path to the data, and I don't recall that it should appear as a key in the database (?)

Collaborator Author

extractGenesis is right here :) I'm updating my function and having extractGenesis just call mine since mine is parallel and has ETA tracking
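
The comment mentions ETA tracking without describing how it works. One common approach, sketched here purely as an assumption and not taken from this PR, is to treat the current key as a position in a uniformly distributed, ordered key space and extrapolate from elapsed time. The names `progressFraction` and `estimateRemaining` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"time"
)

// progressFraction interprets the first 8 bytes of the current key as a
// big-endian position in the key space and returns a value in [0, 1].
// Keys shorter than 8 bytes are zero-padded on the right.
func progressFraction(key []byte) float64 {
	var buf [8]byte
	copy(buf[:], key)
	return float64(binary.BigEndian.Uint64(buf[:])) / (1 << 64)
}

// estimateRemaining extrapolates the remaining time, assuming keys are
// visited in order and are uniformly distributed. The boolean is false when
// no progress has been made yet and no estimate is possible.
func estimateRemaining(elapsed time.Duration, fraction float64) (time.Duration, bool) {
	if fraction <= 0 {
		return 0, false
	}
	if fraction >= 1 {
		return 0, true
	}
	return time.Duration(float64(elapsed) * (1 - fraction) / fraction), true
}

func main() {
	// Hypothetical state: a quarter of the key space covered in 10 seconds.
	current := []byte{0x40}
	fraction := progressFraction(current)
	if remaining, ok := estimateRemaining(10*time.Second, fraction); ok {
		fmt.Printf("progress %.0f%%, eta %s\n", fraction*100, remaining)
	}
}
```

With parallel workers that each cover a slice of the key space, the same estimate could be computed per worker and the slowest one reported, though that is a design choice rather than something stated in this thread.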

Contributor

@tsahee tsahee left a comment

LGTM + small suggestion (can be done separately or skipped)


func bloomFilterName(datadir string, hash common.Hash) string {
return filepath.Join(datadir, fmt.Sprintf("%s.%s.%s", stateBloomFilePrefix, hash.Hex(), stateBloomFileSuffix))
return dumpRawTrieDescendants(db, genesis.Root(), stateBloom)
Contributor

Code already changed; you could also make this func handle the real Arbitrum genesis.

Collaborator Author

Unfortunately we need the genesis block number separately on the nitro side to compute the latest finalized L2 block from the latest finalized message number. Getting the block number is a bit trickier here without the ReadChainConfig helper (it means we'd need to first read block 0, and then read the actual genesis block), so I've kept it on the nitro side for now.
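
To show why the genesis block number is needed for this conversion, here is a small sketch. It rests on an assumption that is not stated in this thread: message index 0 produces the genesis block, so message index i produces block genesis + i. The function name `blockNumberForMessageCount` and the numbers used are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// blockNumberForMessageCount maps a message count to the number of the last
// L2 block produced by those messages.
// ASSUMPTION: message index 0 yields the genesis block, so message index i
// yields block genesisBlockNum + i.
func blockNumberForMessageCount(messageCount, genesisBlockNum uint64) (uint64, error) {
	if messageCount == 0 {
		return 0, errors.New("no messages, so there is no block to report")
	}
	return genesisBlockNum + messageCount - 1, nil
}

func main() {
	// Hypothetical values: a chain whose genesis block is not block 0.
	const genesisBlockNum = 1000
	block, err := blockNumberForMessageCount(25, genesisBlockNum)
	if err != nil {
		panic(err)
	}
	fmt.Println("latest finalized L2 block:", block)
}
```

Without the genesis offset, the same message count would map to a different block on chains whose genesis is not block 0.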

@PlasmaPower PlasmaPower merged commit 8c5b933 into master Apr 13, 2023
@PlasmaPower PlasmaPower deleted the arbitrary-prune-point branch April 13, 2023 18:59
@Tristan-Wilson Tristan-Wilson mentioned this pull request Apr 18, 2023
13 tasks
2 participants