cmd/geth: add inspect trie tool to analysis trie storage #28892
lightclient merged 15 commits into ethereum:master from
This PR on mainnet:
Nice job! @MariusVanDerWijden Currently, only the hash of the contract address is printed. The contract address itself can be recovered from the hash preimage.
Still getting OOM-killed for me...
gballet left a comment
I left a few comments. Regarding verkle/binary, I can add support for it later.
```go
// triestat tracks the type and count of trie nodes at each level in the trie.
//
// Note: theoretically it is possible to have up to 64 trie levels. Since it is
// unlikely to encounter such a large trie, the stats are capped at 16 levels to
// avoid substantial unneeded allocation.
type triestat struct {
	level [16]stat
}
```
There is a similar structure in `core/stateless/stats`; the two should be merged.
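For illustration, here is a minimal sketch of a 16-level capped tracker in the spirit of `triestat`. The `stat` fields, the `nodeKind` enum, the `add`/`total` methods, and the choice to fold nodes deeper than the cap into the last bucket are all hypothetical. The atomic counters follow the later port's description of its per-level stats being safe for concurrent writers.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// maxLevels mirrors the 16-level cap described in the triestat comment.
const maxLevels = 16

// nodeKind is a hypothetical enum for the three node types being counted.
type nodeKind int

const (
	shortNode nodeKind = iota
	fullNode
	valueNode
)

// stat holds hypothetical per-level counters; they are atomic so that
// concurrent trie walkers can share a single triestat.
type stat struct {
	short, full, value atomic.Uint64
}

type triestat struct {
	level [maxLevels]stat
}

// add records one node of the given kind at depth. Depths at or beyond the
// cap are folded into the deepest bucket (an assumption for this sketch).
func (t *triestat) add(depth int, kind nodeKind) {
	if depth >= maxLevels {
		depth = maxLevels - 1
	}
	s := &t.level[depth]
	switch kind {
	case shortNode:
		s.short.Add(1)
	case fullNode:
		s.full.Add(1)
	case valueNode:
		s.value.Add(1)
	}
}

// total returns the number of nodes of all kinds recorded at depth.
func (t *triestat) total(depth int) uint64 {
	s := &t.level[depth]
	return s.short.Load() + s.full.Load() + s.value.Load()
}

func main() {
	var ts triestat
	ts.add(0, fullNode)
	ts.add(1, shortNode)
	ts.add(1, valueNode)
	ts.add(40, valueNode) // clamped into level 15
	fmt.Println(ts.total(0), ts.total(1), ts.total(15)) // prints: 1 2 1
}
```

Clamping keeps the fixed-size array safe to index. Dropping or separately counting overflow nodes would be equally valid, and the review thread does not settle which is preferable.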
| } | ||
|
|
||
| table := newTableWriter(os.Stdout) | ||
| table := tablewriter.NewWriter(os.Stdout) |
I don't see the point of moving this to its own package: the table writer is only used in `trie`, which already imports `rawdb`.
I moved it to its own package because it isn't exported from `rawdb`, and it felt wrong to expose it publicly in that package.
Co-authored-by: Fynn <zcheng1004@gmail.com>
Co-authored-by: lightclient <lightclient@protonmail.com>
Co-authored-by: MariusVanDerWijden <m.vanderwijden@live.de>
I've rebased the PR on top of master and added optional JSON output. This would be very beneficial for the Pandaops team, who would like to use it to display the current state of the trie on their website.
@lightclient, can we review this again?
Two main things have been blocking this PR:
I confirmed that this PR still OOMs on mainnet with 16 GB of memory. I'm currently testing a new 2-pass strategy for the inspector: instead of keeping trie stats in memory, each goroutine flushes its result (i.e. the stats for a single storage trie) to disk, and a second pass then summarizes those results and determines the top N to print. The streamed file should only be 8-12 GB on disk, so not a heavy lift. Will update with the results from that test.
```go
var ret error
if in.dumpBuf != nil {
	if err := in.dumpBuf.Flush(); err != nil {
		ret = errors.Join(ret, fmt.Errorf("failed to flush trie dump %s: %w", in.config.DumpPath, err))
	}
}
```
Do you really need to call Join here?
Okay, I was able to run the PR against mainnet. The OOM is generally solved by streaming storage trie results to disk and executing a second pass over them at the end, which takes less than 1 second on mainnet.

**Mainnet inspect-trie results**

Ran against mainnet state at block **24,397,789** on `geth-ai-experiments-01`. Full run completed in ~9 hours. Dump file is 5.3 GiB (25,223,117 records × 224 bytes, cleanly divisible).

**Account Trie**

Max depth 13. 848,861,501 total nodes.

**Storage Summary**

**Storage Trie Depth Distribution**

84% of storage tries have depth ≤ 2. Only 4 reach depth 12.

**Top 10 Storage Tries (by total node count)**

**Dump file**
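The "cleanly divisible" check is simple arithmetic: 25,223,117 × 224 = 5,649,978,208 bytes, about 5.26 GiB. A sketch of that validation follows, with hypothetical `recordCount` and `checkDump` helpers and the 224-byte record size taken from the run above; the actual check in the PR may differ.

```go
package main

import (
	"fmt"
	"os"
)

// recordSize is the per-record size quoted for the mainnet dump. It is an
// implementation detail, so treat it as a parameter rather than a constant of nature.
const recordSize = 224

// recordCount derives how many fixed-size records a dump of the given byte size
// holds, rejecting sizes that are not an exact multiple (e.g. a truncated write).
func recordCount(size, recSize int64) (int64, error) {
	if recSize <= 0 {
		return 0, fmt.Errorf("invalid record size %d", recSize)
	}
	if size%recSize != 0 {
		return 0, fmt.Errorf("dump size %d is not a multiple of record size %d", size, recSize)
	}
	return size / recSize, nil
}

// checkDump applies recordCount to a dump file on disk.
func checkDump(path string) (int64, error) {
	fi, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return recordCount(fi.Size(), recordSize)
}

func main() {
	// 25,223,117 records x 224 bytes = 5,649,978,208 bytes.
	n, err := recordCount(5_649_978_208, recordSize)
	fmt.Println(n, err) // prints: 25223117 <nil>

	_, err = recordCount(5_649_978_208-10, recordSize)
	fmt.Println(err != nil) // prints: true (truncated dump rejected)
}
```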
Some updated tables, notably including the contract-mode output:
gballet left a comment
LGTM. I'm not a fan of dumping things into a gigabyte-sized intermediate file by default, but let's improve on the concept if the command turns out to be widely used.
…8892) Ports ethereum/go-ethereum#28892 to morph's v1.10.26-era trie package. The implementation matches upstream's public API surface while adapting to the differences in morph's trie opening / reader layer and to the ZKTrie-aware chain config.

New building blocks
-------------------

`trie/levelstats.go`: Per-depth classifier that counts short / full / value nodes and raw byte size per level, backed by atomic counters for concurrent writers. AddLeaf / LeafDepths preserve the witness-stats API so the same type can be reused by stateless verification if morph ever pulls in that path.

`trie/inspect.go`:

- Inspect(triedb, root, config): two-pass inspector. Pass 1 walks the account trie with bounded parallelism (16 concurrent walkers via semaphore.Weighted + errgroup), streams one fixed-size record per non-empty storage trie into a dump file on disk, and emits the account trie as a sentinel record. A ticker logs progress every 8s for long scans.
- InspectContract(triedb, db, stateRoot, address): inspects a single contract's storage footprint by running the storage trie walk in parallel with an iterator over the snapshot storage prefix, so operators see both views side-by-side.
- Summarize(dumpPath, config): pass 2 aggregates per-level totals, builds three top-N rankings (by max depth, total node count, value-node count), and emits a human-readable table via internal/tablewriter or a JSON blob when --output is supplied.
- ErrUnsupportedTrieFormat signals ZKTrie-encoded state (morph pre-JadeFork) so the CLI can refuse before producing bogus statistics.

`internal/tablewriter`: Self-contained text/tabwriter-backed stub mirroring the public API used by upstream's tablewriter call sites (SetHeader / AppendBulk / SetFooter / Render). Avoids pulling in a third-party dependency.
CLI surface (cmd/geth db inspect-trie)
--------------------------------------

- `latest | <blocknum> | snapshot`: select state root
- `--exclude-storage`: skip per-account storage walks
- `--top N`: top-N ranking size (default 10)
- `--output <path>`: write JSON report instead of stdout tables
- `--dump-path <path>`: custom pass-1 dump location (default `<datadir>/trie-dump.bin`)
- `--summarize <path>`: re-summarize an existing dump, skipping the walk
- `--contract 0xADDR`: run InspectContract against the resolved state root

Tests
-----

- TestInspectRoundTripsDump: Inspect output and a standalone Summarize over the same dump produce byte-identical JSON.
- TestInspectNoStorageSkipsWalk: NoStorage short-circuits the storage walk but still emits the account sentinel.
- TestInspectEmptyRootEmitsAccountSentinel: one 352-byte record on an empty trie.
- TestInspectRejectsMissingRoot: clean error on unknown root.
- TestSummarizeRejectsTruncatedDump: dump file size validation.
- TestSummarizeRejectsMissingAccountSentinel: sentinel record is mandatory.
- TestInspectContract: end-to-end run on a synthetic contract with snapshot data.
- TestInspectContractRejectsMissingAccount / TestInspectContractRejectsStorageless: refuse absent or empty storage tries.

Level-stats + tablewriter unit tests round out the bucket.

Intentionally out of scope
--------------------------

- core/stateless/stats.go does not currently use LevelStats on morph; migrating it is a separate change that can follow once stateless verification tracks upstream more closely.

Made-with: Cursor
This PR adds a tool named `inspect-trie`, aimed at analyzing the MPT and its node storage more efficiently.

Example
```shell
./geth db inspect-trie --datadir server/data-seed/ latest 4000
```
Result