
db/store full (forked) blocks cleanup #2585

Closed
antiochp opened this issue Feb 16, 2019 · 10 comments

@antiochp
Member

We compact/prune the blocks db periodically to remove historical full blocks older than 1 week (unless running in archive mode).

But we do this based on the current chain, by traversing back from the head of the chain.

So if we see any short-lived forks or reorgs, we will never actually clean those forked blocks up and they will remain in the db indefinitely.

i.e. if we have the following blocks in the chain:

B1 <- B2 <- B3 etc.

we will never remove any forked blocks, say a competing B2' for example.
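
To make this concrete, here is a minimal sketch of the traversal-style cleanup described above, using toy stand-in types rather than grin's actual chain API. Because it only follows prev_hash links back from the head, a forked block such as B2' is never visited and therefore never deleted:

```rust
use std::collections::HashMap;

// Toy stand-ins for the real chain types; hypothetical, for illustration only.
#[derive(Clone)]
struct BlockHeader {
    hash: u64,
    prev_hash: u64,
    height: u64,
}

struct Store {
    headers: HashMap<u64, BlockHeader>, // hash -> header (main chain and forks)
    blocks: HashMap<u64, Vec<u8>>,      // hash -> full block body
}

// Traversal-style cleanup: walk back from head via prev_hash links and delete
// full blocks below the horizon. A forked block (e.g. B2') is never reachable
// from head via prev_hash, so it is never deleted.
fn compact_by_traversal(store: &mut Store, head: &BlockHeader, horizon: u64) {
    let cutoff = head.height.saturating_sub(horizon);
    let mut current = head.clone();
    while let Some(prev) = store.headers.get(&current.prev_hash).cloned() {
        if prev.height < cutoff {
            store.blocks.remove(&prev.hash);
        }
        current = prev;
    }
}
```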

@192-sean

Where would I start if I were to tackle this?

@antiochp
Member Author

Right now we traverse back through historical blocks and then go to the db for each one to be removed. But I think a better approach may be to start at the db and simply iterate over the blocks in it, removing any that are candidates for removal.

This might be a good introduction to the lower-level db access stuff, so I'd start by looking around our lmdb code to see how we might use an iterator over full blocks. We do something similar in the peers db (via an iterator).
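
As a rough illustration of that shape (the iterator method, key prefix, and types below are assumptions for the sketch, not grin's actual lmdb API), a prefix iterator over the full-blocks keyspace lets compaction see every stored block, forked or not:

```rust
use std::collections::BTreeMap;

type Hash = u64;

struct Block {
    hash: Hash,
    height: u64,
}

// Toy key-value store standing in for lmdb; blocks live under a key prefix,
// mirroring how the peers db is scanned with a prefix iterator.
struct Db {
    inner: BTreeMap<(u8, Hash), Block>,
}

const BLOCK_PREFIX: u8 = b'b';

impl Db {
    // Iterate every stored full block under the prefix, on-chain or forked.
    fn blocks_iter(&self) -> impl Iterator<Item = &Block> + '_ {
        self.inner
            .range((BLOCK_PREFIX, Hash::MIN)..=(BLOCK_PREFIX, Hash::MAX))
            .map(|(_, block)| block)
    }

    // Delete every block the caller flags as a removal candidate.
    fn compact(&mut self, is_candidate: impl Fn(&Block) -> bool) {
        let doomed: Vec<(u8, Hash)> = self
            .blocks_iter()
            .filter(|b| is_candidate(b))
            .map(|b| (BLOCK_PREFIX, b.hash))
            .collect();
        for key in doomed {
            self.inner.remove(&key);
        }
    }
}
```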

@miketang84
Contributor

I would like to work on this.

miketang84 added a commit to miketang84/grin that referenced this issue May 6, 2019
@miketang84
Contributor

miketang84 commented May 7, 2019

Hi, @antiochp I have the following questions:

If we clean old blocks by iterating the db directly, how do we check which blocks are on a short-lived fork, or on any fork at all?

Also, should we remove all forked blocks from the db?

I've also opened a pull request, #2807. Is it going in the right direction?

Thank you.

@hayeah

hayeah commented May 7, 2019

My understanding is that instead of using batch.get_previous_header(&current) to iterate through the blocks, you should access the underlying store of the Batch type (which is an LMDB instance):

```rust
/// An atomic batch in which all changes can be committed all at once or
/// discarded on error.
pub struct Batch<'a> {
    db: store::Batch<'a>,
}
```

Create an LMDB iterator to access ALL the blocks, regardless of whether each block is on the chain or not.

Then we need to have an efficient way to check whether a particular block exists on the chain (if not, remove it).

To check whether a block is on chain, one naive way is to iterate through the chain (using get_previous_header), and save block hashes in a map. The CUT_THROUGH_HORIZON is ~10k blocks, so we'll have to keep 10k entries in memory for the compaction process.
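
A minimal sketch of that naive approach, using plain hash maps as stand-ins for the header and block stores (toy types, not grin's actual API), with CUT_THROUGH_HORIZON set to 10,080 (one week of one-minute blocks, matching the ~10k figure above):

```rust
use std::collections::{HashMap, HashSet};

const CUT_THROUGH_HORIZON: u64 = 10_080; // ~1 week of blocks, ~10k entries

// headers: hash -> prev_hash, for walking back from head;
// blocks:  hash -> full block bytes, the data we want to compact.
fn compact_with_hash_set(
    headers: &HashMap<u64, u64>,
    blocks: &mut HashMap<u64, Vec<u8>>,
    head_hash: u64,
) {
    // 1. Walk back up to CUT_THROUGH_HORIZON headers from head,
    //    recording the hashes that are on the current chain.
    let mut on_chain = HashSet::new();
    let mut current = head_hash;
    for _ in 0..CUT_THROUGH_HORIZON {
        on_chain.insert(current);
        match headers.get(&current) {
            Some(&prev) => current = prev,
            None => break, // reached genesis (or an unknown header)
        }
    }
    // 2. Iterate ALL stored full blocks, forked or not, and keep only
    //    those on the recent portion of the current chain.
    blocks.retain(|hash, _| on_chain.contains(hash));
}
```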

miketang84 added a commit to miketang84/grin that referenced this issue May 7, 2019
@antiochp
Member Author

antiochp commented May 7, 2019

> Then we need to have an efficient way to check whether a particular block exists on the chain (if not, remove it).

chain.is_on_current_chain()

This checks the header against the header at that height based on our current chain: if they match then we're good (the header at height h is the one we think it is).
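
For reference, the check is conceptually just a height-index lookup plus a hash comparison; a rough sketch with toy types (not the actual implementation):

```rust
use std::collections::HashMap;

struct BlockHeader {
    hash: u64,
    height: u64,
}

struct Chain {
    // height -> hash of the header at that height on the current chain
    hash_by_height: HashMap<u64, u64>,
}

impl Chain {
    // A header is on the current chain iff it matches the header we have
    // indexed at that same height.
    fn is_on_current_chain(&self, header: &BlockHeader) -> bool {
        self.hash_by_height.get(&header.height) == Some(&header.hash)
    }
}
```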

@miketang84
Contributor

miketang84 commented May 8, 2019

Hi, CI on #2810 reported:

```
running 5 tests
test hard_forks ... ok
test next_target_adjustment ... ok
test test_secondary_pow_ratio ... ok
test adjustment_scenarios ... test adjustment_scenarios has been running for over 60 seconds
test test_secondary_pow_scale ... test test_secondary_pow_scale has been running for over 60 seconds

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
```

What does this mean? On my local machine I don't get these timeout errors. How should I resolve this consensus test timeout?

@chisa0a

chisa0a commented May 9, 2019

@daogangtang had a similar issue on a resource-constrained device. If you have the ability to increase the memory available on the CI box, that might help.

Another option is to refactor the tests to consume fewer resources, which is being partly addressed by #2813 (IIUC).

miketang84 added a commit to miketang84/grin that referenced this issue May 9, 2019
@miketang84
Contributor

@chisa0a Yes, thank you. I've reposted the PR as #2815, and it passes now.

antiochp pushed a commit that referenced this issue May 13, 2019
…#2815)

* fix: try to fix issue #2585 by adding block cleanup from db directly.

Signed-off-by: Mike Tang <[email protected]>

* use another effective algorithm to do old block and short-lived block cleanup.

Signed-off-by: Mike Tang <[email protected]>

* 1. rename iter_lived_blocks to blocks_iter;
2. comments and iterator calling optimizations.

Signed-off-by: Mike Tang <[email protected]>

* Fix a locking bug when calling is_on_current_chain() in batch.blocks_iter, simply by removing the call,
because "we want to delete blocks older (i.e. lower height) than tail.height".

Signed-off-by: Mike Tang <[email protected]>
@antiochp
Member Author

Resolved in #2815. 🚀
