Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import export trie log #6363

Merged
merged 51 commits into from
Jan 22, 2024
Merged

Conversation

gfukushima
Copy link
Contributor

@gfukushima gfukushima commented Jan 8, 2024

PR description

This PR extends the x-trie-log subcommand to allow exporting and importing of a trie log.

This PR is built on top of #6303

siladu and others added 26 commits December 15, 2023 16:00
Signed-off-by: Simon Dudley <[email protected]>
Signed-off-by: Gabriel Fukushima <[email protected]>
Signed-off-by: Simon Dudley <[email protected]>

Signed-off-by: Gabriel Fukushima <[email protected]>
Signed-off-by: Gabriel Fukushima <[email protected]>
Signed-off-by: Gabriel Fukushima <[email protected]>
Signed-off-by: Gabriel Fukushima <[email protected]>
Signed-off-by: Gabriel Fukushima <[email protected]>
Signed-off-by: Gabriel Fukushima <[email protected]>
Signed-off-by: Gabriel Fukushima <[email protected]>
Signed-off-by: Gabriel Fukushima <[email protected]>
Signed-off-by: Gabriel Fukushima <[email protected]>
Signed-off-by: Gabriel Fukushima <[email protected]>
Signed-off-by: Gabriel Fukushima <[email protected]>
Copy link

github-actions bot commented Jan 8, 2024

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.
  • I thought about the changelog and included a changelog update if required.
  • If my PR includes database changes (e.g. KeyValueSegmentIdentifier) I have thought about compatibility and performed forwards and backwards compatibility tests

# Conflicts:
#	besu/src/main/java/org/hyperledger/besu/cli/subcommands/storage/TrieLogHelper.java
#	besu/src/main/java/org/hyperledger/besu/cli/subcommands/storage/TrieLogSubCommand.java
#	besu/src/test/java/org/hyperledger/besu/cli/subcommands/storage/TrieLogHelperTest.java
@Command(
name = "export",
description =
"This command prunes all trie log layers below the retention threshold, including orphaned trie logs.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the pruning description not the export description

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching this

@Command(
name = "import",
description =
"This command prunes all trie log layers below the retention threshold, including orphaned trie logs.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description looks like the pruning description

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching this!

private CommandLine.Model.CommandSpec spec; // Picocli injects reference to command spec

@CommandLine.Option(
names = "--trie-log-hash",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the exported file contain more than one trielog? Would of expected the import to specify the filename and just import all trielogs in that file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at the moment, this could be extended in the future to do so if needed.

final String trieLogFile =
dataDirectoryPath.resolve(DATABASE_PATH).resolve(trieLogHash.toString()).toString();

var trieLog = readTrieLogsFromFile(trieLogFile);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not sure how important this is so feel free to ignore it doesn't make sense. But having the format in RLP to match the storage in the database trielog column family might make comparison easier without the extra encoding of the IdentityMap. Depends on the purpose of this tool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added this subcommands to allow extraction and insertion of trie logs mostly for debug or troubleshooting purposes, but a version that imports and exports RLP could be very useful too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd lean towards a more readable format rather than binary as it's hard to work with. Agree RLP is a more useful middle ground than custom binary (if that's what this is).
e.g. I wonder if this could be used to import test data...compare with RlpBlockImporter.java and JsonBlockImporter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have us to extend the current feature to allow the export to RLP reusing this feature if necessary, it isn't quite simple to do that right now and it would impact the sub-command prune logic as well.

paramLabel = DefaultCommandValues.MANDATORY_LONG_FORMAT_HELP,
description = "The hash of the block you want to import the trie log.",
arity = "1..1")
private String trieLogHash;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to blockHash? the variable name suggests this is the hash of the trielog

paramLabel = DefaultCommandValues.MANDATORY_LONG_FORMAT_HELP,
description = "The hash of the block you want to import the trie log.",
arity = "1..1")
private String trieLogHash;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be useful to be able to specify a block number instead of block hash. But only a suggestion not sure which one we will use more in practice

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think block hash is prob better is this case, as you could eventually request by number and not have the right block hash if your node forked at that specific block requested.

@gfukushima gfukushima added the TeamGroot GH issues worked on by Groot Team label Jan 9, 2024
final String trieLogFile =
dataDirectoryPath.resolve(DATABASE_PATH).resolve(trieLogHash.toString()).toString();

var trieLog = readTrieLogsFromFile(trieLogFile);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd lean towards a more readable format rather than binary as it's hard to work with. Agree RLP is a more useful middle ground than custom binary (if that's what this is).
e.g. I wonder if this could be used to import test data...compare with RlpBlockImporter.java and JsonBlockImporter

Copy link
Contributor

@garyschulte garyschulte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great feature 👍 feedback mostly about import/export introspection

@@ -97,16 +97,15 @@ private static void processTrieLogBatches(
final String batchFileNameBase) {

for (long batchNumber = 1; batchNumber <= numberOfBatches; batchNumber++) {

final String batchFileName = batchFileNameBase + "-" + batchNumber;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might make more sense to have first/last block numbers included in the filename. Otherwise it won't be clear what is actually in the files after an export

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The batch filenames aren't used as part of import/export subcommands the filename is taken from command line args instead. This is only used for the prune subcommand

@@ -285,11 +279,10 @@ private static void saveTrieLogsInFile(
}

@SuppressWarnings("unchecked")
private static IdentityHashMap<byte[], byte[]> readTrieLogsFromFile(
final String batchFileNameBase, final long batchNumber) {
static IdentityHashMap<byte[], byte[]> readTrieLogsFromFile(final String batchFileName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non blocking feedback -

I am all for code reuse, but if we are going to allow for arbitrary import and export, the import files should be more readable and "createable".

The ObjectOutputStream seems fine for backup/recovery of a pruning process, but when part of a general import/export process this file format is too inscrutable IMO.

At least for import/export we should serialize/deserialize these as json maps. Key as the hash string, and the trielog itself as hex (or as a rich json object if we wanted to be super transparent). In addition to being a bit more introspectable, it would allow us to create and import our own handcrafted trielogs when debugging

@garyschulte
Copy link
Contributor

another cadillac feature would be the ability to specify a URL for the import files. This could have been used to recover nodes from the Jan 6 halting event. We could have provided import command instructions and a URL to delivery the correct trielog.

Copy link
Contributor

@siladu siladu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Rename tidy-up could happen in a subsequent PR

"--Xbonsai-trie-log-pruning-enabled",
"--Xbonsai-trie-log-pruning-limit",
"--Xbonsai-limit-trie-logs-enabled",
"--Xbonsai-trie-logs-pruning-limit",
"1");
}

@Test
public void bonsaiTrieLogPruningLimitShouldBePositive() {
internalTestFailure(
"--Xbonsai-trie-log-pruning-limit=0 must be greater than 0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"--Xbonsai-trie-log-pruning-limit=0 must be greater than 0",
"--Xbonsai-trie-logs-pruning-limit=0 must be greater than 0",

Copy link
Contributor

@siladu siladu Jan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few other Strings where this rename needs to happen too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do this in a followup PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be covered by #6445

@jframe jframe enabled auto-merge (squash) January 19, 2024 05:47
@jframe jframe disabled auto-merge January 19, 2024 05:48
Signed-off-by: Jason Frame <[email protected]>
@jframe jframe enabled auto-merge (squash) January 19, 2024 06:15
@jframe jframe disabled auto-merge January 19, 2024 06:43
@jframe jframe enabled auto-merge (squash) January 22, 2024 00:40
@jframe jframe merged commit cfea3ab into hyperledger:main Jan 22, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TeamGroot GH issues worked on by Groot Team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants