TrieLogPruner preload with 30 second timeout#7365
Merged
siladu merged 12 commits intobesu-eth:mainfrom Jul 26, 2024
Merged
Conversation
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
pinges
approved these changes
Jul 24, 2024
Contributor
pinges
left a comment
There was a problem hiding this comment.
I've left a question for you, but I think it looks good and adding the timeout is a good thing!
| private void preloadQueue( | ||
| final AtomicBoolean timeoutOccurred, final ScheduledFuture<?> timeoutFuture) { | ||
|
|
||
| try (final Stream<byte[]> trieLogKeys = rootWorldStateStorage.streamTrieLogKeys(loadingLimit)) { |
Contributor
There was a problem hiding this comment.
Would it make sense to start streaming at a random key? There is a method streamFromKey() in class KeyValueStorage that allows you to pass in the starting key.
…meout Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
This was referenced Jul 26, 2024
8 tasks
gconnect
pushed a commit
to gconnect/besu
that referenced
this pull request
Aug 26, 2024
Also reduce pruning window from 30_000 to 5_000 --------- Signed-off-by: Simon Dudley <simon.dudley@consensys.net> Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com> Signed-off-by: gconnect <agatevureglory@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR description
The TrieLogPruner preload operation executes during startup. It streams a batch of trie logs from rocksdb, looks up their block headers to get a block number and adds them to the prune queue. Finally it triggers a prune operation on the prune queue, which may prune some of these trie logs if they are eligible - note pruning from the queue is fast.
The preload is executed synchronously on the main thread. Under certain conditions, this can take a long time, particularly when the trie log column family contains a large number of keys.
This PR wraps the whole operation in a 30 second timeout to avoid startup hangs.
It is still synchronous and blocks the main startup thread because doing this async (as explored in #7337) affects the performance (especially disk I/O) of the normal operation of the node: block import and potentially block proposal.
In addition, the default pruning window size is reduced from 30_000 to 5_000 which should be sufficient to cover any pruning gaps due to maintenance downtime, while reducing the impact of the performance issue.
The expectation is that if the user has a backlog of trie logs, they will follow the guide to use the prune subcommand. As such, a log has been added which will be the last log line printed before the 30 second timeout window expires.
Here's an example from before this PR of a 2.5minute load...
And now with this PR...
Fixed Issue(s)
#7322
Thanks for sending a pull request! Have you done the following?
doc-change-requiredlabel to this PR if updates are required.Locally, you can run these tests to catch failures early:
./gradlew build./gradlew acceptanceTest./gradlew integrationTest./gradlew ethereum:referenceTests:referenceTests