feat(l1): implement debug_getModifiedAccountsByNumber/Hash#6700
feat(l1): implement debug_getModifiedAccountsByNumber/Hash#6700azteca1998 wants to merge 6 commits into
Conversation
|
Lines of code reportTotal lines added: Detailed view |
🤖 Kimi Code ReviewThis is a well-structured implementation of the Minor issues:
Security & Correctness:
Testing: The integration tests cover the critical paths: happy path, empty range, ordering validation, and unknown blocks. The helper module in Verdict: Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt |
🤖 Claude Code ReviewNow I have enough context to write a thorough review. PR Review:
|
| Severity | Item |
|---|---|
| Medium | Wrong error variant (Internal vs. BadParams) for invalid user-supplied blocks |
| Medium | No maximum block-range limit — potential DoS for expensive state traversals |
| Medium | API divergence (hashed vs. plain addresses) needs clearer external documentation |
| Low | O(1) memory claim in doc comment is inaccurate |
| Low | Misleading ordering error message |
| Low | Missing test: by-hash variant with unknown hash |
| Low | Missing test: recipient in diff |
The core merge-join algorithm is correct and the spawn_blocking usage is appropriate. The main blockers before merging are the missing range limit and the wrong error variant for user-supplied invalid blocks.
Automated review by Claude (Anthropic) · sonnet · custom prompt
🤖 Codex Code ReviewFindings:
The lockstep trie walk itself looks reasonable and the endpoint is memory-efficient. I couldn’t run the targeted tests locally because Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
Greptile SummaryImplements
Confidence Score: 3/5The core merge-join logic and async dispatch are correct, but the response serializes keccak256 hashes instead of addresses, making the endpoints incompatible with any standard Ethereum tooling that calls these debug methods. The return type divergence from the Ethereum spec is a present, user-visible defect: callers receive 32-byte hashed keys where they expect 20-byte addresses, and they have no way to reverse the hash. The diff algorithm itself, the spawn_blocking offload, and the test suite are all well-implemented, but the API contract mismatch would affect anyone integrating this with standard tools. crates/networking/rpc/debug/get_modified_accounts.rs — specifically the response type and the block-not-found error classification
|
| Filename | Overview |
|---|---|
| crates/networking/rpc/debug/get_modified_accounts.rs | New file implementing debug_getModifiedAccountsByNumber/Hash; returns Vec (hashed keys) instead of the spec-defined Vec, breaking geth API compatibility; merge-join diff algorithm is otherwise correct |
| crates/networking/rpc/debug/mod.rs | Adds module declaration for get_modified_accounts; no issues |
| crates/networking/rpc/rpc.rs | Wires debug_getModifiedAccountsByNumber and debug_getModifiedAccountsByHash into the RPC dispatcher; no issues |
| test/tests/rpc/debug_get_modified_accounts_tests.rs | Integration tests covering happy path, empty range, start-after-end error, and unknown block error; tests are well-structured but test the hashed-key response shape rather than actual addresses |
| test/tests/rpc/helpers.rs | New shared test helpers for RPC integration tests; setup_single_transfer_block, rpc_call, and rpc_call_expect_err are clean and reusable |
| test/tests/rpc/mod.rs | Registers the two new test modules; no issues |
Sequence Diagram
sequenceDiagram
participant C as Client
participant R as RPC Dispatcher
participant H as GetModifiedAccountsHandler
participant S as Store
participant T as spawn_blocking
C->>R: debug_getModifiedAccountsByNumber(start, end)
R->>H: parse(params)
H->>S: resolve_block_number(start_block)
H->>S: resolve_block_number(end_block)
H->>S: get_block_header(start_number) → start_root
H->>S: get_block_header(end_number) → end_root
H->>T: diff_state_roots(storage, start_root, end_root)
T->>S: iter_accounts(start_root) → start_iter
T->>S: iter_accounts(end_root) → end_iter
loop merge-join both iterators
T->>T: compare hashed keys, emit modified H256s
end
T-->>H: "Vec<H256>"
H-->>C: JSON array of hashed addresses
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
crates/networking/rpc/debug/get_modified_accounts.rs:11-14
**Incompatible response type — hashes instead of addresses**
The Ethereum JSON-RPC spec and geth both define the return type as `Vec<Address>` (20-byte addresses). This implementation returns `Vec<H256>` (32-byte keccak256 hashes of those addresses), so any caller using a standard Ethereum client library (`web3.js`, `ethers.js`, `cast debug getModifiedAccountsByNumber`, etc.) will receive 32-byte values where it expects 20-byte addresses and will either fail to parse them or misinterpret them silently. The doc comment itself explains the gap, but the gap itself is the issue: without a preimage store, callers have no way to reverse the hash back to an address, making the response practically unusable for the common task of "which accounts changed?"
### Issue 2 of 3
crates/networking/rpc/debug/get_modified_accounts.rs:59-64
**`RpcErr::Internal` used for a user-caused block-not-found condition**
When `start_number` or `end_number` resolves successfully (the block tag/number itself is valid) but the corresponding block header doesn't exist in storage, the handler returns `RpcErr::Internal` with "Start block header not found". `RpcErr::Internal` is intended for unexpected server-side failures; a missing block at a requested number is a client error and would be better surfaced as `RpcErr::BadParams` (or the equivalent) so callers can distinguish "the RPC server broke" from "the block you asked for doesn't exist".
### Issue 3 of 3
crates/networking/rpc/debug/get_modified_accounts.rs:152-168
**`peek()` borrows held across mutable `next()` calls in the match loop**
Each match arm in the `loop` calls `start_iter.peek()` and/or `end_iter.peek()`, binding references that belong to the respective `Peekable` wrapper. Several arms subsequently call `start_iter.next()` or `end_iter.next()` while the peeked references are still nominally in scope. This compiles today under NLL because the borrow checker sees that the references are last used before `next()` is called, but the pattern is fragile: any future refactor that adds even a single read of a peeked reference after the `next()` call will introduce a compile error that is non-obvious to diagnose. Consider extracting copies of the keys before the match body to make the lifetime boundaries explicit.
Reviews (1): Last reviewed commit: "fix(rpc): correct, stream, and spawn_blo..." | Re-trigger Greptile
| /// **Divergence from geth**: geth returns `Vec<Address>` by looking original | ||
| /// addresses up through its preimage store. ethrex has no preimage store, so | ||
| /// the response is `Vec<H256>` of *hashed* addresses (`keccak256(address)`). | ||
| /// Callers that know the address they care about can hash it themselves to |
There was a problem hiding this comment.
Incompatible response type — hashes instead of addresses
The Ethereum JSON-RPC spec and geth both define the return type as Vec<Address> (20-byte addresses). This implementation returns Vec<H256> (32-byte keccak256 hashes of those addresses), so any caller using a standard Ethereum client library (web3.js, ethers.js, cast debug getModifiedAccountsByNumber, etc.) will receive 32-byte values where it expects 20-byte addresses and will either fail to parse them or misinterpret them silently. The doc comment itself explains the gap, but the gap itself is the issue: without a preimage store, callers have no way to reverse the hash back to an address, making the response practically unusable for the common task of "which accounts changed?"
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/networking/rpc/debug/get_modified_accounts.rs
Line: 11-14
Comment:
**Incompatible response type — hashes instead of addresses**
The Ethereum JSON-RPC spec and geth both define the return type as `Vec<Address>` (20-byte addresses). This implementation returns `Vec<H256>` (32-byte keccak256 hashes of those addresses), so any caller using a standard Ethereum client library (`web3.js`, `ethers.js`, `cast debug getModifiedAccountsByNumber`, etc.) will receive 32-byte values where it expects 20-byte addresses and will either fail to parse them or misinterpret them silently. The doc comment itself explains the gap, but the gap itself is the issue: without a preimage store, callers have no way to reverse the hash back to an address, making the response practically unusable for the common task of "which accounts changed?"
How can I resolve this? If you propose a fix, please make it concise.| .await? | ||
| .ok_or(RpcErr::Internal("End block not found".to_string()))?; | ||
|
|
||
| if start_number > end_number { | ||
| return Err(RpcErr::BadParams(format!( | ||
| "start block ({start_number}) must be older than end block ({end_number})" |
There was a problem hiding this comment.
RpcErr::Internal used for a user-caused block-not-found condition
When start_number or end_number resolves successfully (the block tag/number itself is valid) but the corresponding block header doesn't exist in storage, the handler returns RpcErr::Internal with "Start block header not found". RpcErr::Internal is intended for unexpected server-side failures; a missing block at a requested number is a client error and would be better surfaced as RpcErr::BadParams (or the equivalent) so callers can distinguish "the RPC server broke" from "the block you asked for doesn't exist".
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/networking/rpc/debug/get_modified_accounts.rs
Line: 59-64
Comment:
**`RpcErr::Internal` used for a user-caused block-not-found condition**
When `start_number` or `end_number` resolves successfully (the block tag/number itself is valid) but the corresponding block header doesn't exist in storage, the handler returns `RpcErr::Internal` with "Start block header not found". `RpcErr::Internal` is intended for unexpected server-side failures; a missing block at a requested number is a client error and would be better surfaced as `RpcErr::BadParams` (or the equivalent) so callers can distinguish "the RPC server broke" from "the block you asked for doesn't exist".
How can I resolve this? If you propose a fix, please make it concise.| let mut modified = Vec::new(); | ||
|
|
||
| loop { | ||
| match (start_iter.peek(), end_iter.peek()) { | ||
| (None, None) => break, | ||
| (Some((s_hash, _)), None) => { | ||
| modified.push(*s_hash); | ||
| start_iter.next(); | ||
| } | ||
| (None, Some((e_hash, _))) => { | ||
| modified.push(*e_hash); | ||
| end_iter.next(); | ||
| } | ||
| (Some((s_hash, _)), Some((e_hash, _))) if s_hash < e_hash => { | ||
| // Account in start but not yet (and never, since iters are | ||
| // monotonic) seen in end at this position — deleted. | ||
| modified.push(*s_hash); |
There was a problem hiding this comment.
peek() borrows held across mutable next() calls in the match loop
Each match arm in the loop calls start_iter.peek() and/or end_iter.peek(), binding references that belong to the respective Peekable wrapper. Several arms subsequently call start_iter.next() or end_iter.next() while the peeked references are still nominally in scope. This compiles today under NLL because the borrow checker sees that the references are last used before next() is called, but the pattern is fragile: any future refactor that adds even a single read of a peeked reference after the next() call will introduce a compile error that is non-obvious to diagnose. Consider extracting copies of the keys before the match body to make the lifetime boundaries explicit.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/networking/rpc/debug/get_modified_accounts.rs
Line: 152-168
Comment:
**`peek()` borrows held across mutable `next()` calls in the match loop**
Each match arm in the `loop` calls `start_iter.peek()` and/or `end_iter.peek()`, binding references that belong to the respective `Peekable` wrapper. Several arms subsequently call `start_iter.next()` or `end_iter.next()` while the peeked references are still nominally in scope. This compiles today under NLL because the borrow checker sees that the references are last used before `next()` is called, but the pattern is fragile: any future refactor that adds even a single read of a peeked reference after the `next()` call will introduce a compile error that is non-obvious to diagnose. Consider extracting copies of the keys before the match body to make the lifetime boundaries explicit.
How can I resolve this? If you propose a fix, please make it concise.Add debug_getModifiedAccountsByNumber and debug_getModifiedAccountsByHash which return addresses modified between two blocks. Compares state tries of start and end blocks to find accounts that were created, modified, or deleted. Note: returns hashed addresses (converted to Address) since ethrex does not maintain a preimage store. The brute-force trie comparison is O(n) in total accounts; a trie-diff algorithm would be more efficient for mainnet scale. Part of #6572
Execute a transfer, then query getModifiedAccountsByNumber and getModifiedAccountsByHash. Assert modified accounts are returned.
Three real issues with the original handler: - Wrong return values. The handler emitted `Address::from(hash)`, which takes the low 20 bytes of the 32-byte hashed-address key. That value isn't the original address — without a preimage store (ethrex has none) you can't recover it. Switch the response type to `Vec<H256>` of hashed addresses and document the divergence from geth on the type. The new `*_includes_sender` integration test asserts the sender's `keccak256(address)` appears in the result; it would have failed against the original truncation. - Blocking I/O on the async runtime. `Store::iter_accounts` opens four long-lived locked DB transactions and walks the trie synchronously (same reason snap-sync's validators use `spawn_blocking`). The handler now runs the diff on a blocking thread. - Full state materialised twice. The original collected every account from both tries into `HashSet<(H256, Vec<u8>)>` keyed on a stringified debug-format value — O(state) memory and millions of `format!()` allocations on mainnet. Replace with a streaming merge of the two iterators (both yield in hashed-key order from the trie traversal), comparing `AccountState` by value. O(1) extra memory. Test setup boilerplate moves out of the misnamed `debug_trace_tests.rs` into the shared `rpc::helpers` module introduced in #6701; the file is renamed to `debug_get_modified_accounts_tests.rs`. Integration coverage now spans: sender-appears-in-diff, by-number/by-hash equivalence, empty-range, start-after-end (both variants), and unknown-block.
eaaf276 to
78b3dcf
Compare
… in debug_getModifiedAccounts - Replace RpcErr::Internal with RpcErr::WrongParam for block-not-found conditions (these are client errors, not server failures). - Extract copies from peek() before calling next() so the Peekable borrows are cleanly released and future refactors won't hit borrow-checker surprises.
Summary
debug_getModifiedAccountsByNumberanddebug_getModifiedAccountsByHashdiff_state_rootshelper that iterates both triesCloses part of #6572
Test plan
Closes #6651