
perf(l1): lazy BAL cursor for per-tx parallel execution #6669

Merged
edg-l merged 8 commits into main from perf/bal-lazy-cursor on May 22, 2026

Conversation

@edg-l
Contributor

@edg-l edg-l commented May 18, 2026

Summary

Replaces eager per-tx BAL prefix materialization inside execute_block_parallel with an on-read LazyBalCursor installed on each per-tx GeneralizedDatabase (LEVM's in-memory state cache that the EVM reads accounts/slots through during execution). Each tx materializes only the accounts/slots it actually touches instead of the full BAL prefix.

The two outer sequential seed_db_from_bal callers (system-call recovery, post-tx outer seed) are unchanged; the cursor is per-tx only.
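The on-read idea can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not LEVM code: `ToyTxDb`, `AccountInfo` and the `u32` addresses are hypothetical, and the BAL view is modelled as a map already cut off at the tx's prefix. A cache miss consults the BAL view, then the store, and caches only the one account that was read.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified account info; the real LevmAccount has more fields.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AccountInfo {
    pub balance: u64,
    pub nonce: u64,
}

/// Hypothetical stand-in for a per-tx GeneralizedDatabase: a cache in front
/// of an optional BAL view and a backing store.
pub struct ToyTxDb {
    pub cache: HashMap<u32, AccountInfo>,
    /// BAL prefix visible to this tx (assumed already cut off at max_idx).
    pub lazy_bal: Option<HashMap<u32, AccountInfo>>,
    pub store: HashMap<u32, AccountInfo>,
}

impl ToyTxDb {
    /// Cache hit returns immediately; on a miss, consult the BAL view first,
    /// then the store, and cache only the single account that was read.
    pub fn load_account(&mut self, addr: u32) -> AccountInfo {
        if let Some(info) = self.cache.get(&addr) {
            return *info;
        }
        let from_bal = self
            .lazy_bal
            .as_ref()
            .and_then(|bal| bal.get(&addr).copied());
        let info = from_bal
            .or_else(|| self.store.get(&addr).copied())
            .unwrap_or(AccountInfo { balance: 0, nonce: 0 });
        self.cache.insert(addr, info);
        info
    }
}
```

The point of the shape is that per-tx work scales with the accounts a tx actually reads, not with the size of the BAL prefix.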

Benchmark

Fixture: bal-devnet-7-mainnet-mix-460 (460 blocks, ~30 Ggas, transfer/EVM-mix). Single run, release-with-debug profile, import-bench --with-bal.

| metric | baseline (parallel, eager seed) | this PR (lazy cursor) | delta |
|---|---|---|---|
| wall time | 8.58 s | 6.85 s | -1.73 s (-20.2%) |
| agg Ggas/s | 3.90 | 5.02 | +28.7% |
| avg ms / block | 16.88 | 13.11 | -3.77 ms (-22.3%) |
| p95 ms / block | 17.73 | 14.09 | -3.64 ms (-20.5%) |
| max ms / block | 90.06 | 52.39 | -37.67 ms |
| exec avg (ms) | 15.57 | 11.89 | -3.68 ms (-23.7%) |
| merkle avg (ms) | 0.48 | 0.44 | -0.04 ms |
| store avg (ms) | 0.67 | 0.63 | -0.04 ms |
| warmer avg (ms) | 1.37 | 1.37 | flat |

The win is concentrated in exec, which is exactly what the cursor targets; merkle, store and warmer barely move, so the gain is not time shifted from one phase to another.

Changes

  • Extract seed_one_address_info_from_bal and seed_one_storage_slot_from_bal from seed_db_from_bal as reusable helpers in ethrex-levm. seed_db_from_bal becomes a thin loop over these helpers (behavior-preserving).
  • Add Clone on BalAddressIndex.
  • Add lazy_bal: Option<LazyBalCursor> field on GeneralizedDatabase. LazyBalCursor holds Arc<BlockAccessList>, bal_index: u32, Arc<BalAddressIndex>.
  • load_account consults the cursor for account info (balance, nonce, code_hash) on cache miss before falling through to the store. Does not inject account.storage.
  • get_storage_value consults the cursor per-slot on cache miss.
  • execute_block_parallel sets tx_db.lazy_bal = Some(...) per tx instead of calling seed_db_from_bal eagerly.
  • Per-tx GeneralizedDatabase capacity hint drops from bal_account_count to 32.
  • code_from_bal deduplicated into gen_db.rs.

Invariants

  1. Cursor bal_index = tx_idx + 1; effective cutoff is bal_index.saturating_sub(1), matching the existing seed_db_from_bal's max_idx = tx_idx. debug_assert!(bal_index >= 1).
  2. load_account only injects account-info fields, never account.storage. Storage stays lazy through get_storage_value.
  3. In seed_one_address_info_from_bal, code_update is computed before the &mut LevmAccount borrow; db.codes.entry().or_insert() runs after the borrow is released.
  4. In get_storage_value, the cursor result is copied to a local before taking &mut current_accounts_state.
  5. load_account .take()s the cursor before calling the helper (whose partial-coverage path calls db.get_account internally) and restores it after; prevents re-entry into the lazy hook.
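Invariant 1 can be illustrated with a small sketch, assuming the EIP-7928 convention that block_access_index 0 holds pre-execution system-call changes and tx `i` records at index `i + 1`. `Change`, `bal_index_for_tx` and `visible_value` are hypothetical names; the cutoff uses `partition_point` over entries sorted by index.

```rust
/// Hypothetical flattened BAL entry: one recorded post-value per index.
#[derive(Clone, Copy, Debug)]
pub struct Change {
    pub block_access_index: u32,
    pub post_value: u64,
}

/// Per-tx cursor position: bal_index = tx_idx + 1 (index 0 is pre-execution).
pub fn bal_index_for_tx(tx_idx: u32) -> u32 {
    tx_idx + 1
}

/// Latest post-value recorded at or before the cutoff, or None if nothing is
/// visible yet. `changes` must be sorted by block_access_index.
pub fn visible_value(changes: &[Change], bal_index: u32) -> Option<u64> {
    debug_assert!(bal_index >= 1);
    // Effective cutoff excludes this tx's own index and everything after it.
    let max_idx = bal_index.saturating_sub(1);
    let pos = changes.partition_point(|c| c.block_access_index <= max_idx);
    (pos > 0).then(|| changes[pos - 1].post_value)
}
```

With this cutoff, tx 1 sees tx 0's write (index 1) but never its own write (index 2), which is the boundary `tx1_sees_tx0_write` targets.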

Tests

test/tests/levm/bal_view_tests.rs:

  • tx1_sees_tx0_write: off-by-one boundary
  • load_account_does_not_inject_storage: no storage injection
  • sstore_sees_prior_write: SSTORE pre-image flows through cursor
  • lazy_load_account_partial_coverage_does_not_recurse: .take() guard

Test plan

  • cargo test -p ethrex-test --features rayon bal_view_tests (4/4)
  • cargo test -p ethrex-vm -p ethrex-levm -p ethrex-blockchain
  • make lint
  • cargo fmt --all --check
  • make -C tooling/ef_tests/state test
  • make -C tooling/ef_tests/blockchain test

@github-actions

github-actions Bot commented May 18, 2026

⚠️ Known Issues — intentionally skipped tests

Source: docs/known_issues.md

Known Issues

Tests intentionally excluded from CI. Source of truth for the Known Issues section the L1 workflow appends to each ef-tests job summary and posts as a sticky PR comment.

EF Tests — Stateless coverage narrowed to EIP-8025 optional-proofs

make -C tooling/ef_tests/blockchain test calls test-stateless-zkevm
instead of test-stateless. The zkevm@v0.3.3 fixtures are filled against
bal@v5.6.1, out of sync with current bal spec; the broad target trips ~549
fixtures. Re-broaden once the zkevm bundle is regenerated.

Why and resolution path

PR #6527 broadened
test-stateless to extract the entire for_amsterdam/ tree from the
zkevm bundle and run all of it under --features stateless; combined with
this branch's bal-devnet-7 semantics that scope produces ~549
GasUsedMismatch / ReceiptsRootMismatch /
BlockAccessListHashMismatch failures.

test-stateless-zkevm filters cargo to the eip8025_optional_proofs
suite, which still validates the stateless harness without the bal-version
mismatch.

Re-broaden by switching test: back to test-stateless in
tooling/ef_tests/blockchain/Makefile once the zkevm bundle is regenerated
against the current bal spec.

@github-actions github-actions Bot added L1 Ethereum client performance Block execution throughput and performance in general labels May 18, 2026
@github-actions

github-actions Bot commented May 18, 2026

Lines of code report

Total lines added: 212
Total lines removed: 61
Total lines changed: 273

Detailed view
+-------------------------------------------------+-------+------+
| File                                            | Lines | Diff |
+-------------------------------------------------+-------+------+
| ethrex/crates/common/types/block_access_list.rs | 1163  | +11  |
+-------------------------------------------------+-------+------+
| ethrex/crates/vm/backends/levm/mod.rs           | 2387  | -61  |
+-------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/db/gen_db.rs          | 762   | +201 |
+-------------------------------------------------+-------+------+

@github-actions

github-actions Bot commented May 18, 2026

Benchmark Results Comparison

No significant difference was registered for any benchmark run.

Detailed Results

Benchmark Results: BubbleSort

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| main_revm_BubbleSort | 2.972 ± 0.019 | 2.945 | 2.991 | 1.07 ± 0.01 |
| main_levm_BubbleSort | 2.917 ± 0.289 | 2.758 | 3.682 | 1.05 ± 0.10 |
| pr_revm_BubbleSort | 2.962 ± 0.042 | 2.918 | 3.053 | 1.07 ± 0.02 |
| pr_levm_BubbleSort | 2.778 ± 0.020 | 2.749 | 2.811 | 1.00 |

Benchmark Results: ERC20Approval

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_ERC20Approval | 989.4 ± 5.8 | 982.4 | 999.1 | 1.02 ± 0.01 |
| main_levm_ERC20Approval | 1059.6 ± 10.3 | 1039.4 | 1076.0 | 1.10 ± 0.01 |
| pr_revm_ERC20Approval | 966.4 ± 3.7 | 959.5 | 971.9 | 1.00 |
| pr_levm_ERC20Approval | 1052.5 ± 8.3 | 1037.3 | 1062.7 | 1.09 ± 0.01 |

Benchmark Results: ERC20Mint

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_ERC20Mint | 136.1 ± 2.0 | 134.3 | 140.3 | 1.02 ± 0.02 |
| main_levm_ERC20Mint | 156.3 ± 1.0 | 154.9 | 157.8 | 1.17 ± 0.02 |
| pr_revm_ERC20Mint | 133.3 ± 1.6 | 130.6 | 135.7 | 1.00 |
| pr_levm_ERC20Mint | 154.9 ± 1.0 | 153.9 | 156.9 | 1.16 ± 0.02 |

Benchmark Results: ERC20Transfer

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_ERC20Transfer | 237.6 ± 1.3 | 236.1 | 239.3 | 1.03 ± 0.01 |
| main_levm_ERC20Transfer | 262.5 ± 2.3 | 259.3 | 266.6 | 1.14 ± 0.01 |
| pr_revm_ERC20Transfer | 230.7 ± 1.8 | 229.0 | 233.8 | 1.00 |
| pr_levm_ERC20Transfer | 260.8 ± 1.2 | 259.0 | 262.8 | 1.13 ± 0.01 |

Benchmark Results: Factorial

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_Factorial | 227.1 ± 1.7 | 224.5 | 229.7 | 1.00 |
| main_levm_Factorial | 270.6 ± 4.2 | 266.4 | 278.6 | 1.19 ± 0.02 |
| pr_revm_Factorial | 227.5 ± 3.2 | 220.2 | 233.1 | 1.00 ± 0.02 |
| pr_levm_Factorial | 268.2 ± 1.5 | 266.6 | 270.4 | 1.18 ± 0.01 |

Benchmark Results: FactorialRecursive

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| main_revm_FactorialRecursive | 1.726 ± 0.038 | 1.657 | 1.793 | 1.06 ± 0.02 |
| main_levm_FactorialRecursive | 1.641 ± 0.019 | 1.612 | 1.665 | 1.01 ± 0.01 |
| pr_revm_FactorialRecursive | 1.711 ± 0.033 | 1.651 | 1.752 | 1.05 ± 0.02 |
| pr_levm_FactorialRecursive | 1.628 ± 0.010 | 1.612 | 1.647 | 1.00 |

Benchmark Results: Fibonacci

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_Fibonacci | 206.1 ± 2.1 | 204.0 | 210.2 | 1.01 ± 0.01 |
| main_levm_Fibonacci | 254.1 ± 3.6 | 249.3 | 259.0 | 1.25 ± 0.02 |
| pr_revm_Fibonacci | 203.2 ± 1.2 | 201.7 | 205.2 | 1.00 |
| pr_levm_Fibonacci | 249.4 ± 1.5 | 247.3 | 253.0 | 1.23 ± 0.01 |

Benchmark Results: FibonacciRecursive

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_FibonacciRecursive | 911.1 ± 7.7 | 895.7 | 922.4 | 1.25 ± 0.03 |
| main_levm_FibonacciRecursive | 730.5 ± 26.9 | 714.5 | 804.9 | 1.01 ± 0.04 |
| pr_revm_FibonacciRecursive | 907.3 ± 11.8 | 889.5 | 933.1 | 1.25 ± 0.03 |
| pr_levm_FibonacciRecursive | 726.3 ± 15.7 | 712.0 | 765.5 | 1.00 |

Benchmark Results: ManyHashes

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_ManyHashes | 8.5 ± 0.2 | 8.4 | 9.0 | 1.01 ± 0.02 |
| main_levm_ManyHashes | 9.9 ± 0.1 | 9.9 | 10.1 | 1.18 ± 0.02 |
| pr_revm_ManyHashes | 8.4 ± 0.1 | 8.3 | 8.6 | 1.00 |
| pr_levm_ManyHashes | 10.0 ± 0.2 | 9.8 | 10.5 | 1.19 ± 0.03 |

Benchmark Results: MstoreBench

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_MstoreBench | 260.1 ± 5.7 | 255.9 | 274.7 | 1.14 ± 0.03 |
| main_levm_MstoreBench | 229.1 ± 1.2 | 227.5 | 230.8 | 1.00 |
| pr_revm_MstoreBench | 263.3 ± 8.9 | 255.6 | 276.3 | 1.15 ± 0.04 |
| pr_levm_MstoreBench | 236.7 ± 4.5 | 232.7 | 248.3 | 1.03 ± 0.02 |

Benchmark Results: Push

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_Push | 291.0 ± 2.2 | 288.4 | 295.1 | 1.00 ± 0.01 |
| main_levm_Push | 295.0 ± 1.5 | 293.2 | 297.4 | 1.02 ± 0.01 |
| pr_revm_Push | 290.5 ± 1.2 | 288.6 | 292.4 | 1.00 |
| pr_levm_Push | 293.6 ± 1.3 | 292.2 | 296.7 | 1.01 ± 0.01 |

Benchmark Results: SstoreBench_no_opt

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_SstoreBench_no_opt | 169.9 ± 6.0 | 165.6 | 183.1 | 1.67 ± 0.06 |
| main_levm_SstoreBench_no_opt | 101.8 ± 0.9 | 100.5 | 103.2 | 1.00 ± 0.02 |
| pr_revm_SstoreBench_no_opt | 165.6 ± 2.6 | 162.7 | 170.7 | 1.63 ± 0.03 |
| pr_levm_SstoreBench_no_opt | 101.5 ± 1.3 | 100.0 | 104.8 | 1.00 |

@edg-l edg-l force-pushed the perf/bal-lazy-cursor branch from 7abf051 to 01ecf18 Compare May 19, 2026 07:30
@github-actions

github-actions Bot commented May 19, 2026

Benchmark Block Execution Results Comparison Against Main

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base | 66.767 ± 0.209 | 66.480 | 67.095 | 1.00 |
| head | 67.136 ± 0.264 | 66.650 | 67.444 | 1.01 ± 0.01 |

@edg-l edg-l marked this pull request as ready for review May 19, 2026 08:35
@edg-l edg-l requested a review from a team as a code owner May 19, 2026 08:35
@ethrex-project-sync ethrex-project-sync Bot moved this to In Review in ethrex_l1 May 19, 2026
@github-actions

🤖 Codex Code Review

  1. High: shared_base now masks the BAL prefix, so later txs can execute against stale pre-state for any account already loaded during prepare_block. In crates/vm/levm/src/db/gen_db.rs:346-401, load_account() returns the shared_base snapshot before consulting lazy_bal. Before this PR, seed_db_from_bal() overlaid prior-tx BAL changes on top of that snapshot. After this change, an account touched by system-contract setup and then modified by tx 0 will still be seen by tx 1 as the post-system-call version, not the post-tx-0 version. The same stale-state problem also applies to storage slots already present in the cloned shared-base account, because get_storage_value() returns cached storage before the lazy BAL hook at :957. This is consensus-critical.

  2. Medium: the new lazy storage lookup is linear in the number of changed slots for that account. seed_one_storage_slot_from_bal() does .iter().find(...) over storage_changes at :163-166, and that helper is now on the SLOAD/SSTORE miss path via get_storage_value(). Since BAL storage entries are slot-sorted, this should stay binary-search-based; otherwise storage-heavy txs regress toward quadratic behavior across many distinct slot reads.
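A sketch of the binary-search lookup suggested in finding 2, assuming slot-sorted entries with unique slots and a hypothetical `StorageChange` type with `u64` slots (the real type uses a U256 slot). `binary_search_by_key` gives O(log n) per cold-slot read and agrees with the linear `find` whenever the slice is sorted.

```rust
/// Hypothetical per-slot record: (block_access_index, post_value) pairs.
pub struct StorageChange {
    pub slot: u64,
    pub post_values: Vec<(u32, u64)>,
}

/// Linear scan, as in the flagged helper: O(n) per cold-slot read.
pub fn find_linear(changes: &[StorageChange], slot: u64) -> Option<&StorageChange> {
    changes.iter().find(|sc| sc.slot == slot)
}

/// Binary search: valid only because entries are sorted by slot.
pub fn find_sorted(changes: &[StorageChange], slot: u64) -> Option<&StorageChange> {
    changes
        .binary_search_by_key(&slot, |sc| sc.slot)
        .ok()
        .map(|i| &changes[i])
}
```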

The .take() recursion fix itself looks sound, but I would not merge this until the shared_base/lazy_bal ordering bug is fixed. I could not run cargo test in this environment because rustup cannot create temp files on the read-only filesystem.
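The ordering problem in finding 1 can be reproduced with plain maps. In this hypothetical sketch, `Sources` reduces shared_base, the BAL prefix and the store to address-to-balance maps, and the values echo the later regression test (snapshot balance 0, BAL balance 42_000). Checking the snapshot first returns the stale value; consulting the BAL overlay first, the order the follow-up commit adopts for load_account, returns the overlaid one.

```rust
use std::collections::HashMap;

/// Hypothetical lookup sources, reduced to address -> balance maps.
pub struct Sources {
    /// Pre-block snapshot of system-touched accounts (shared_base analogue).
    pub shared_base: HashMap<u32, u64>,
    /// BAL prefix visible to the current tx (lazy_bal analogue).
    pub bal_prefix: HashMap<u32, u64>,
    pub store: HashMap<u32, u64>,
}

impl Sources {
    /// Order flagged in the review: the snapshot can mask the BAL overlay.
    pub fn balance_snapshot_first(&self, addr: u32) -> u64 {
        self.shared_base
            .get(&addr)
            .or_else(|| self.bal_prefix.get(&addr))
            .or_else(|| self.store.get(&addr))
            .copied()
            .unwrap_or(0)
    }

    /// Reordered lookup: BAL overlay wins, the snapshot is only a fallback.
    pub fn balance_bal_first(&self, addr: u32) -> u64 {
        self.bal_prefix
            .get(&addr)
            .or_else(|| self.shared_base.get(&addr))
            .or_else(|| self.store.get(&addr))
            .copied()
            .unwrap_or(0)
    }
}
```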


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

🤖 Claude Code Review



Review: perf(l1): lazy BAL cursor for per-tx parallel execution

Summary

The design is sound and the performance numbers are compelling (+28.7% Ggas/s). The .take() anti-recursion guard is correct, the borrow-checker concerns are handled properly, and the off-by-one semantics (bal_index = tx_idx + 1 → max_idx = tx_idx) match the previous eager-seed behaviour. The tests cover the key regression surfaces. Several issues below range from a potential correctness bug to minor nits.


Potential Correctness Bug: get_storage_value hook fires before any "already cached" check

crates/vm/levm/src/db/gen_db.rs, hunk @@ -725,6 +954

The BAL hook is placed immediately after the contains_key(address) guard, but there is no check that the slot is already in account.storage before the hook fires:

// Lazy-BAL hook — no prior "is key already in account.storage?" guard
let bal_hit = self.db.lazy_bal.as_ref().and_then(|cursor| {
    seed_one_storage_slot_from_bal(&cursor.bal, acct_idx, key, max_idx)
});
if let Some(value) = bal_hit {
    account.storage.insert(key, value);   // could silently overwrite a live SSTORE
    return Ok(value);
}

If get_storage_value is called for a slot that the current transaction has already written via SSTORE (so the slot IS in account.storage), and the BAL also covers that slot, the hook would overwrite the live written value with the BAL's pre-tx view and return the stale value. Whether this actually happens depends on whether the SLOAD/SSTORE path has an account.storage.get(&key) fast-return before calling get_storage_value — that path is not visible in this diff. Please confirm (and ideally add a test) that get_storage_value is never called for a slot already present in account.storage, or add the guard here.
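The guard the review asks for amounts to checking the per-account cache before the BAL. A minimal sketch, with `ToyAccount`, `toy_get_storage_value` and the `u64` slots and values all hypothetical; whether ethrex's SLOAD/SSTORE path already short-circuits on cached slots is the open question raised above, and this sketch does not answer it.

```rust
use std::collections::HashMap;

/// Hypothetical per-account storage cache.
pub struct ToyAccount {
    pub storage: HashMap<u64, u64>,
}

/// Cache-first slot read: a live value (e.g. written by SSTORE in this tx)
/// is returned before the BAL is consulted, so it is never overwritten.
pub fn toy_get_storage_value(
    account: &mut ToyAccount,
    bal_slots: &HashMap<u64, u64>,
    store: &HashMap<u64, u64>,
    key: u64,
) -> u64 {
    if let Some(v) = account.storage.get(&key) {
        return *v;
    }
    // Copy the BAL/store result to a local before mutating the cache.
    let value = bal_slots
        .get(&key)
        .or_else(|| store.get(&key))
        .copied()
        .unwrap_or(0);
    account.storage.insert(key, value);
    value
}
```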


O(n²) Storage Seeding in seed_db_from_bal

crates/vm/backends/levm/mod.rs, around the new storage loop

for sc in &acct_changes.storage_changes {
    let key = ethrex_common::utils::u256_to_h256(sc.slot);
    if let Some(value) = seed_one_storage_slot_from_bal(bal, acct_idx, key, max_idx) {
        acc.storage.insert(key, value);
    }
}

seed_one_storage_slot_from_bal accepts a key: H256 and re-scans acct_changes.storage_changes via .find(|sc| u256_to_h256(sc.slot) == key) for every call. The calling loop is also iterating over the same storage_changes slice, so each account with N storage slots costs O(N²) for this outer path.

The outer seed_db_from_bal is only called twice per block (system-call recovery and post-tx seed), so the blast radius is limited, but for accounts with many touched slots this is a clear regression from the O(N log M) old code. The simplest fix is to inline the partition_point logic directly in the loop (as the old code did) rather than delegating to the helper:

for sc in &acct_changes.storage_changes {
    let pos = sc.slot_changes.partition_point(|c| c.block_access_index <= max_idx);
    if pos > 0 {
        let key = ethrex_common::utils::u256_to_h256(sc.slot);
        acc.storage.insert(key, sc.slot_changes[pos - 1].post_value);
    }
}

Double Hash-Map Lookup in load_account Fast Path

crates/vm/levm/src/db/gen_db.rs, load_account rewrite

if self.current_accounts_state.contains_key(&address) {
    return self
        .current_accounts_state
        .get_mut(&address)           // second lookup
        .ok_or(InternalError::AccountNotFound);
}

Two map lookups for the common case. The old Entry::Occupied pattern was a single lookup. Use:

if let Entry::Occupied(e) = self.current_accounts_state.entry(address) {
    return Ok(e.into_mut());
}

The same double-lookup pattern appears for initial_accounts_state and the shared_base path. The ok_or(AccountNotFound) after get_mut is also unreachable (we just confirmed presence with contains_key), which would be eliminated by using the Entry API.
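A sketch of the single-lookup shape using `std::collections::hash_map::Entry`, with a hypothetical `u32 -> u64` cache and a loader closure standing in for the real miss path. Matching both arms, rather than returning early from an `if let Entry::Occupied` and then mutating the same map again, also sidesteps a known limitation of today's borrow checker, which can reject a conditional early return of a mutable borrow followed by further mutable use of that map.

```rust
use std::collections::hash_map::{Entry, HashMap};

/// Single lookup: on a hit, `into_mut` returns a reference tied to the map;
/// on a miss, the vacant entry is filled by the hypothetical `load` closure.
pub fn get_or_load<'a>(
    cache: &'a mut HashMap<u32, u64>,
    addr: u32,
    load: impl FnOnce(u32) -> u64,
) -> &'a mut u64 {
    match cache.entry(addr) {
        Entry::Occupied(e) => e.into_mut(),
        Entry::Vacant(e) => e.insert(load(addr)),
    }
}
```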


helper_result Discards the Semantic Bool from seed_one_address_info_from_bal

crates/vm/levm/src/db/gen_db.rs, load_account lazy-BAL block

Some(
    seed_one_address_info_from_bal(self, &cursor.bal, acct_idx, max_idx)
        .map(|_| true),   // bool return is thrown away
)

The function returns Ok(false) to signal "no fields applied" and Ok(true) to signal "at least one field applied", but that meaning is erased here by .map(|_| true). The subsequent check relies on contains_key as the real signal — which works, but obscures intent. Either use the returned bool to decide whether to fall through, or change the return type to Result<(), InternalError> and use the contains_key check explicitly with a comment explaining why.


Unnecessary saturating_sub(1) after Proven-Non-Zero Guards

crates/vm/levm/src/db/gen_db.rs, seed_one_address_info_from_bal

// inside `if code_pos > 0 { ... }`
let entry = acct_changes
    .code_changes
    .get(code_pos.saturating_sub(1))     // code_pos > 0, so this is code_pos - 1
    .ok_or(InternalError::AccountNotFound)?;

Inside if code_pos > 0, saturating_sub(1) is identical to - 1. Same pattern appears for balance_pos and nonce_pos in the has_all_info branch. Just use code_pos - 1; the intent is clearer and the ok_or is also unreachable (partition_point guarantees pos - 1 < len when pos > 0).


Confusing filter(|_| pos > 0) Pattern in seed_one_storage_slot_from_bal

crates/vm/levm/src/db/gen_db.rs

sc.slot_changes
    .get(pos.saturating_sub(1))   // returns Some(...) when pos == 0 if slice non-empty
    .filter(|_| pos > 0)          // then immediately discards it
    .map(|c| c.post_value)

When pos == 0, get(0) may return Some even though there is nothing to return. The .filter corrects this, but the sequence is needlessly opaque. Prefer:

(pos > 0).then(|| sc.slot_changes[pos - 1].post_value)
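Both forms agree for any position produced by `partition_point`, which lies in `0..=len`; a quick sketch over a plain `u64` slice (function names hypothetical) checks that. They differ only for out-of-range positions, where the indexed form panics and the `get`-based one returns `None`.

```rust
/// Form flagged in review: get(pos - 1) may return Some(values[0]) when
/// pos == 0 (saturating_sub), and filter then discards it.
pub fn via_filter(values: &[u64], pos: usize) -> Option<u64> {
    values.get(pos.saturating_sub(1)).filter(|_| pos > 0).copied()
}

/// Suggested form: decide on pos first, then index. In bounds whenever
/// pos comes from partition_point (0 <= pos <= len).
pub fn via_then(values: &[u64], pos: usize) -> Option<u64> {
    (pos > 0).then(|| values[pos - 1])
}
```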

Stale Line-Number References in Doc Comment

crates/vm/backends/levm/mod.rs, seed_db_from_bal doc

/// Eager BAL prefix seed — used only by the outer DB path (system-call recovery
/// at `:440` and post-tx outer seed at `:463`).

Absolute line numbers in comments rot immediately. Replace with function/variable names or conceptual labels.


Minor: Arc::new(bal.clone()) Adds a Whole-BAL Clone Per Block

crates/vm/backends/levm/mod.rs, execute_block_parallel

let arc_bal = Arc::new(bal.clone());

bal is already available as a shared reference. Unless the caller owns it and wrapping in Arc without cloning is impossible, consider whether the BAL could be passed as Arc<BlockAccessList> further up the call chain to eliminate this clone. This is a lower-priority concern given it's one clone per block, but for very large BALs it contributes to peak memory.
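A sketch of per-tx sharing, assuming the caller can hand over an owned `Arc` (the `Bal` and `Cursor` types here are hypothetical): each cursor clone bumps a reference count and the BAL itself is never copied. `Arc::new(bal.clone())` pays one deep clone up front, which passing the `Arc` down the call chain would avoid.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for BlockAccessList.
pub struct Bal {
    pub entries: Vec<u64>,
}

/// Hypothetical per-tx cursor; cloning bumps a refcount, never copies the BAL.
#[derive(Clone)]
pub struct Cursor {
    pub bal: Arc<Bal>,
    pub bal_index: u32,
}

/// Takes the Arc by value, so no deep clone of the BAL is needed here.
pub fn cursors_for_block(bal: Arc<Bal>, tx_count: u32) -> Vec<Cursor> {
    (0..tx_count)
        .map(|tx_idx| Cursor { bal: Arc::clone(&bal), bal_index: tx_idx + 1 })
        .collect()
}
```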


Test Coverage Gap

There is no test exercising the get_storage_value BAL hook (bal_hit path) through the full EVM/VM machinery. The existing tests cover seed_one_storage_slot_from_bal and seed_one_address_info_from_bal directly, plus the recursion guard. A test that constructs a minimal VM with lazy_bal = Some(...), calls SLOAD for a slot present in the BAL, and verifies the returned value would close the coverage gap for the hook that carries the most correctness risk.


Positive Notes

  • The .take() + restore pattern to break the load_account → seed_one_address_info_from_bal → get_account → load_account recursion cycle is correct and the explanation in the comment is excellent.
  • Copying bal_hit to a local before the &mut current_accounts_state borrow is the right fix for the borrow-checker tension in get_storage_value.
  • Using Arc<BlockAccessList> / Arc<BalAddressIndex> cloned cheaply per-tx (rather than cloning data per-tx) is the right pattern.
  • debug_assert!(cursor.bal_index >= 1) at both hook sites is good defensive practice.
  • The four regression tests are well-targeted: off-by-one boundary, storage-non-injection, multi-write boundaries, and recursion prevention.
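The `.take()` + restore guard noted above reduces to a few lines. In this hypothetical sketch, `Db`, `Cursor` and `partial_coverage_helper` stand in for GeneralizedDatabase, LazyBalCursor and the partial-coverage path that calls back into the database; without the `take()`, the nested call would re-enter the hook and recurse indefinitely.

```rust
/// Hypothetical cursor and db; only the re-entrancy shape matters here.
pub struct Cursor;

pub struct Db {
    pub lazy_bal: Option<Cursor>,
    pub hook_calls: u32,
}

impl Db {
    /// Analogue of load_account: runs the lazy hook if a cursor is installed.
    pub fn load_account(&mut self) -> Result<(), String> {
        // Take the cursor out so any nested call sees None and skips the hook.
        let Some(cursor) = self.lazy_bal.take() else {
            return Ok(()); // plain path: no cursor installed
        };
        self.hook_calls += 1;
        let result = self.partial_coverage_helper();
        // Restore before propagating any error, so later reads keep the cursor.
        self.lazy_bal = Some(cursor);
        result
    }

    /// Analogue of the partial-coverage path that calls back into the db.
    fn partial_coverage_helper(&mut self) -> Result<(), String> {
        self.load_account()
    }
}
```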

Automated review by Claude (Anthropic) · sonnet · custom prompt

@greptile-apps

greptile-apps Bot commented May 19, 2026

Greptile Summary

This PR replaces the eager per-tx BAL prefix materialization in execute_block_parallel with an on-read LazyBalCursor installed on each per-tx GeneralizedDatabase, so only the accounts and slots actually touched during a transaction are materialized rather than the entire BAL prefix upfront. The benchmark shows a ~20% wall-time improvement concentrated entirely in the exec phase, with merkle/store/warmer metrics flat, validating the targeted scope of the change.

  • LazyBalCursor is added to GeneralizedDatabase; load_account and get_storage_value consult it on cache-miss before falling back to the store, with a .take()/restore guard in load_account to prevent infinite re-entry through partial-coverage accounts that call db.get_account internally.
  • seed_db_from_bal is refactored into reusable seed_one_address_info_from_bal and seed_one_storage_slot_from_bal helpers; the outer eager-seed callers (system-call recovery, post-tx outer seed) are unchanged and the lazy cursor is per-tx only.
  • Four unit tests in bal_view_tests.rs cover the off-by-one BAL boundary, no-storage-injection invariant, multi-write cursor semantics, and the recursion guard.

Confidence Score: 4/5

The parallel per-tx path is safe — the lazy cursor correctly replicates the semantics of the eager seed with a well-documented recursion guard. The two outer eager-seed callers are untouched.

The core lazy-cursor implementation in gen_db.rs is carefully structured and the off-by-one invariants, borrow-split patterns, and anti-recursion guard are all correct. The regression lives in the outer seed_db_from_bal storage loop in mod.rs, where the new code delegates to seed_one_storage_slot_from_bal (which re-searches storage_changes by key) while iterating storage_changes itself — trading O(n) for O(n²) per account. This path runs only twice per block, limiting impact, but it is a clear regression relative to the old code.

crates/vm/backends/levm/mod.rs — specifically the refactored storage inner-loop inside seed_db_from_bal

Important Files Changed

| Filename | Overview |
|---|---|
| crates/vm/levm/src/db/gen_db.rs | Core of the PR: adds LazyBalCursor struct, seed_one_address_info_from_bal/seed_one_storage_slot_from_bal helpers, lazy_bal field on GeneralizedDatabase, and hooks into load_account and get_storage_value. The recursion guard (take/restore of cursor) and borrow-split patterns are correctly implemented. |
| crates/vm/backends/levm/mod.rs | seed_db_from_bal refactored to delegate info-seeding to seed_one_address_info_from_bal; execute_block_parallel switches from eager seed to lazy cursor. The storage inner-loop introduces an O(n²) scan for the outer eager seed path. |
| crates/common/types/block_access_list.rs | Adds #[derive(Clone)] to BalAddressIndex to allow Arc wrapping; minimal, safe change. |
| test/tests/levm/bal_view_tests.rs | Adds four unit tests: off-by-one boundary, no-storage-injection invariant, multi-write cursor semantics, and recursion guard. Good coverage of the non-trivial edge cases. |

Sequence Diagram

sequenceDiagram
    participant EP as execute_block_parallel
    participant TxDB as per-tx GeneralizedDatabase
    participant Cursor as LazyBalCursor
    participant BAL as BlockAccessList
    participant Store as backing Store

    EP->>TxDB: "set lazy_bal = Some(LazyBalCursor)"
    Note over EP,TxDB: replaces eager seed_db_from_bal call

    TxDB->>TxDB: load_account(addr) — cache miss
    TxDB->>Cursor: take() cursor (anti-recursion guard)
    Cursor->>BAL: seed_one_address_info_from_bal(addr, max_idx)
    alt has_all_info
        BAL-->>TxDB: insert LevmAccount with BAL fields
    else partial coverage
        TxDB->>Store: get_account_state(addr)
        Store-->>TxDB: base account + overlay BAL fields
    else not in BAL
        TxDB->>Store: get_account_state(addr)
        Store-->>TxDB: account
    end
    TxDB->>Cursor: restore cursor unconditionally

    TxDB->>TxDB: get_storage_value(addr, key) — slot not cached
    TxDB->>Cursor: as_ref() read only
    Cursor->>BAL: seed_one_storage_slot_from_bal(acct_idx, key, max_idx)
    alt slot in BAL
        BAL-->>TxDB: post_value → cache → return
    else not in BAL
        TxDB->>Store: get_value_from_database
        Store-->>TxDB: value → cache → return
    end
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
crates/vm/backends/levm/mod.rs:921-929
The refactored storage loop reintroduces an O(n²) cost in the outer eager `seed_db_from_bal` path. For each `sc` yielded by the outer `for sc in &acct_changes.storage_changes` iteration, `seed_one_storage_slot_from_bal` re-scans `storage_changes` via `iter().find()` to locate the *same* `sc` by key. With n storage entries per account this is O(n²), whereas the old code used the already-available iterator value and did a single O(log k) `partition_point` call — net O(n) per account. For a contract with hundreds of BAL-tracked slots this compounds noticeably in both the system-call-recovery path and the post-tx outer seed.

```suggestion
            let acc = db
                .get_account_mut(addr)
                .map_err(|e| EvmError::Custom(format!("seed storage mut: {e}")))?;
            for sc in &acct_changes.storage_changes {
                let pos = sc
                    .slot_changes
                    .partition_point(|c| c.block_access_index <= max_idx);
                if pos > 0 {
                    let key = ethrex_common::utils::u256_to_h256(sc.slot);
                    acc.storage.insert(key, sc.slot_changes[pos - 1].post_value);
                }
            }
```

### Issue 2 of 2
crates/vm/levm/src/db/gen_db.rs:407-414
**Linear slot scan on every storage cache-miss**

`seed_one_storage_slot_from_bal` uses `iter().find()` over the full `storage_changes` slice on every call from the lazy-cursor hook in `get_storage_value`. For a contract whose BAL entry has many storage-change records the cost is O(n_storage_changes_in_bal) per cold-slot read. The `BalAddressIndex` gives O(1) address lookup, but there is no equivalent per-account slot index. For the described workload (transfer / EVM-mix) the average account has few BAL storage slots and this is fine; however, a DeFi block dominated by high-storage-turnover contracts (e.g., AMM pools with many SSTORE'd ticks) could see the lazy path regress relative to the old eager seed. A `FxHashMap<H256, usize>` slot-to-position index on `LazyBalCursor`, built when the cursor is constructed, would restore O(1) slot lookup without increasing peak memory meaningfully.

Reviews (1): Last reviewed commit: "docs(changelog): add lazy BAL cursor per..."

Comment thread crates/vm/backends/levm/mod.rs
Comment thread crates/vm/levm/src/db/gen_db.rs
Comment thread crates/vm/levm/src/db/gen_db.rs Outdated
@edg-l edg-l force-pushed the perf/bal-lazy-cursor branch from 91648bb to 4d26edc Compare May 20, 2026 10:54
edg-l added a commit that referenced this pull request May 20, 2026
Address greptile findings on PR #6669:
- seed_db_from_bal eager loop walked storage_changes, then
  seed_one_storage_slot_from_bal re-found the same sc by slot key.
  Use the outer sc directly via a new post_value_at_or_before helper.
- seed_one_storage_slot_from_bal (lazy cursor) did iter().find() over
  storage_changes on every cache miss. Resolve slot in O(1) via a new
  per-account slot_idx_by_account map on BalAddressIndex, built once
  per block in build_validation_index.

Safe under EIP-7928: canonical-ordering validation enforces strictly
ascending unique slots per account, so map insert order matches the
former find() semantics.

Verified clean: 8721 + 93 ef-tests pass on a clean vectors checkout.
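The O(1) slot lookup described in this commit can be sketched as follows. Everything here is hypothetical: `SlotChanges`, `AccountChanges`, `build_slot_index` and `lookup` use `u64` slots and a `Vec<HashMap<u64, usize>>` indexed by account position in place of the real slot_idx_by_account map on BalAddressIndex, whose exact shape this does not reproduce, and `post_value_at_or_before` mirrors the commit's helper name with an assumed signature. The strictly-ascending check reflects the EIP-7928 argument: with unique slots, the map resolves to the same entry `find()` would.

```rust
use std::collections::HashMap;

/// Hypothetical per-slot entry: (block_access_index, post_value), sorted.
pub struct SlotChanges {
    pub slot: u64,
    pub changes: Vec<(u32, u64)>,
}

/// Hypothetical per-account record with slot-sorted storage_changes.
pub struct AccountChanges {
    pub storage_changes: Vec<SlotChanges>,
}

/// Post-value at or before max_idx for an already-located slot entry.
pub fn post_value_at_or_before(sc: &SlotChanges, max_idx: u32) -> Option<u64> {
    let pos = sc.changes.partition_point(|&(idx, _)| idx <= max_idx);
    (pos > 0).then(|| sc.changes[pos - 1].1)
}

/// Built once per block: per-account slot -> position map. Returns None on
/// non-canonical input, since unique ascending slots make the map match find().
pub fn build_slot_index(accounts: &[AccountChanges]) -> Option<Vec<HashMap<u64, usize>>> {
    let mut index: Vec<HashMap<u64, usize>> = Vec::with_capacity(accounts.len());
    for acct in accounts {
        let strictly_ascending = acct
            .storage_changes
            .windows(2)
            .all(|w| w[0].slot < w[1].slot);
        if !strictly_ascending {
            return None;
        }
        let map: HashMap<u64, usize> = acct
            .storage_changes
            .iter()
            .enumerate()
            .map(|(pos, sc)| (sc.slot, pos))
            .collect();
        index.push(map);
    }
    Some(index)
}

/// Lazy-path lookup: slot resolved with one hash probe instead of a scan.
pub fn lookup(
    accounts: &[AccountChanges],
    index: &[HashMap<u64, usize>],
    acct_idx: usize,
    slot: u64,
    max_idx: u32,
) -> Option<u64> {
    let pos = *index.get(acct_idx)?.get(&slot)?;
    post_value_at_or_before(&accounts[acct_idx].storage_changes[pos], max_idx)
}
```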
@edg-l edg-l requested a review from iovoid May 20, 2026 11:23
Contributor

@ElFantasma ElFantasma left a comment


Three inline findings, all minor — none blocking.

Comment thread crates/vm/levm/src/db/gen_db.rs
Comment thread crates/vm/levm/src/db/gen_db.rs
Comment thread crates/vm/levm/src/db/gen_db.rs
Comment thread crates/vm/levm/src/db/gen_db.rs
Comment thread crates/vm/levm/src/db/gen_db.rs
Comment thread crates/vm/backends/levm/mod.rs Outdated
edg-l added a commit that referenced this pull request May 21, 2026
@edg-l edg-l force-pushed the perf/bal-lazy-cursor branch from 4839027 to a57c161 Compare May 21, 2026 14:07
edg-l added 8 commits May 21, 2026 16:47
Replaces eager per-tx BAL prefix materialization inside execute_block_parallel
with an on-read LazyBalCursor installed on each per-tx GeneralizedDatabase.
load_account consults the cursor for account info only; get_storage_value
consults it per-slot. Each tx now materializes only what it actually touches
instead of the full BAL prefix.

The two outer sequential seed_db_from_bal callers (system-call recovery,
post-tx outer seed) remain untouched.

- Extract seed_one_address_info_from_bal + seed_one_storage_slot_from_bal
  from seed_db_from_bal as reusable helpers in ethrex-levm
- Add Clone to BalAddressIndex so it can be Arc-wrapped once per block
- Add lazy_bal: Option<LazyBalCursor> on GeneralizedDatabase
- Hook load_account and get_storage_value with explicit borrow-ordering
- Switch execute_block_parallel to set tx_db.lazy_bal instead of seeding
- Drop per-tx DB capacity hint from bal_account_count to 32

Tests in test/tests/levm/bal_view_tests.rs cover:
- T1 off-by-one cutoff (tx1_sees_tx0_write)
- T2 no storage injection in load_account
- T3 SSTORE pre-image flows through cursor
- T4 partial-coverage load_account does not recurse (cursor .take() guard)
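The lazy-cursor idea can be illustrated with a minimal sketch that uses simplified, hypothetical types. Only balances are modeled, and the BAL is a plain Vec of per-account change lists that the sketch assumes are sorted by block_access_index. The sketch also assumes bal_index means the highest block_access_index a tx may observe; ethrex's exact off-by-one convention (the T1 case) is not reproduced here. The cursor mirrors the three fields the PR description names: Arc<BlockAccessList>, bal_index: u32, and Arc<BalAddressIndex>. Installing one per tx therefore costs a few Arc clones, and each tx's cache fills only with the accounts it actually reads.

```rust
use std::collections::HashMap;
use std::sync::Arc;

type Address = [u8; 20];

// Hypothetical, heavily simplified BAL shapes (balance only).
struct BalanceChange { block_access_index: u32, post_balance: u128 }
struct AccountChanges { address: Address, balance_changes: Vec<BalanceChange> } // ascending by index
struct BlockAccessList { accounts: Vec<AccountChanges> }

/// Address -> position in `accounts`, built once per block and shared.
struct BalAddressIndex { by_address: HashMap<Address, usize> }

impl BalAddressIndex {
    fn build(bal: &BlockAccessList) -> Self {
        let by_address = bal.accounts.iter().enumerate().map(|(i, a)| (a.address, i)).collect();
        Self { by_address }
    }
}

/// Per-tx cursor: cloning it bumps two Arcs and copies a u32.
#[derive(Clone)]
struct LazyBalCursor {
    bal: Arc<BlockAccessList>,
    bal_index: u32, // assumed: highest block_access_index visible to this tx
    index: Arc<BalAddressIndex>,
}

impl LazyBalCursor {
    /// Latest BAL balance at or before `bal_index`, if the BAL records one.
    fn balance(&self, addr: &Address) -> Option<u128> {
        let acc = &self.bal.accounts[*self.index.by_address.get(addr)?];
        let n = acc.balance_changes.partition_point(|c| c.block_access_index <= self.bal_index);
        n.checked_sub(1).map(|i| acc.balance_changes[i].post_balance)
    }
}

/// Toy per-tx database: a cache in front of a shared pre-block store.
struct TxDb {
    cache: HashMap<Address, u128>,
    store: Arc<HashMap<Address, u128>>,
    lazy_bal: Option<LazyBalCursor>,
}

impl TxDb {
    fn balance(&mut self, addr: Address) -> u128 {
        if let Some(b) = self.cache.get(&addr) {
            return *b;
        }
        // Cache miss: consult the cursor first, then fall back to the store.
        let b = self
            .lazy_bal
            .as_ref()
            .and_then(|cur| cur.balance(&addr))
            .or_else(|| self.store.get(&addr).copied())
            .unwrap_or(0);
        self.cache.insert(addr, b); // only touched accounts are materialized
        b
    }
}

fn main() {
    let (a, b, c) = ([1u8; 20], [2u8; 20], [3u8; 20]);
    let change = |i, v| BalanceChange { block_access_index: i, post_balance: v };
    let bal = Arc::new(BlockAccessList {
        accounts: vec![
            AccountChanges { address: a, balance_changes: vec![change(1, 500), change(2, 700)] },
            AccountChanges { address: b, balance_changes: vec![change(1, 9)] },
            AccountChanges { address: c, balance_changes: vec![change(2, 8)] },
        ],
    });
    let index = Arc::new(BalAddressIndex::build(&bal));
    let store = Arc::new(HashMap::from([(a, 100u128)]));

    // One cursor per tx: Arc clones, not a copy of the BAL prefix.
    let make_db = |bal_index: u32| TxDb {
        cache: HashMap::new(),
        store: Arc::clone(&store),
        lazy_bal: Some(LazyBalCursor { bal: Arc::clone(&bal), bal_index, index: Arc::clone(&index) }),
    };

    let mut db0 = make_db(0);
    let mut db1 = make_db(1);
    println!("{} {}", db0.balance(a), db1.balance(a)); // 100 500
    println!("{}", db1.cache.len()); // 1
}
```

In main, a cursor at bal_index 1 observes the change recorded at index 1 but not the one at index 2. Reading one address materializes a single cache entry, even though the BAL lists three accounts.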

The per-tx GeneralizedDatabase in execute_block_parallel is configured with
both a shared_base (pre-block snapshot of system-touched addresses, captured
from initial_accounts_state after prepare_block) and a LazyBalCursor that
materialises the BAL prefix on cache-miss. load_account previously consulted
shared_base before the cursor, so any address present in both would short-
circuit to the pre-block balance / nonce / code and miss the BAL overlay.

For a predeploy touched by prepare_block (e.g. the withdrawal / consolidation
request contracts) whose info is then mutated by a prior tx in the same block,
a later tx reading that info via BALANCE / EXTCODE* would observe the stale
pre-block value. Storage reads are unaffected because shared_base accounts are
cloned with empty .storage and slot reads go through the lazy_bal hook in
get_storage_value.

Reorder load_account: lazy_bal hook runs first, falling back to shared_base
only when the cursor has no entry for the address. The .take() guard already
prevents the partial-coverage recursion through db.get_account; the inner
call now lands on shared_base (or store), then the outer overlays BAL info.

Regression test in test/tests/levm/bal_view_tests.rs constructs a per-tx db
with a shared_base balance of 0 and a BAL balance_change of 42_000 at
block_access_index 1, and asserts load_account returns the BAL value.

Verified clean: full blockchain ef-tests (8721 + 93 = 8814 tests, 0 failed)
on a freshly downloaded amsterdam fixtures bundle.
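The corrected lookup order can be sketched with a toy model. This is not GeneralizedDatabase, and GenDb, Cursor, BalInfo and base_or_store are hypothetical names. The sketch makes three simplifications:

- Cursor is assumed to have already resolved each field to its latest value at or before the tx's index.
- BalInfo models partial coverage: a BAL entry may carry a balance change without a nonce change.
- Storage and code are omitted.

The cursor is consulted first. When it has an entry, the cursor is removed with .take() so that the inner load_account cannot recurse into the cursor branch. The inner call instead falls through to shared_base or the store, and the outer call then overlays the BAL fields.

```rust
use std::collections::HashMap;

type Address = [u8; 20];

#[derive(Clone, Copy, Debug, PartialEq, Default)]
struct AccountInfo { balance: u128, nonce: u64 }

/// Hypothetical partial BAL view: a field is Some only if the BAL records a
/// change to it at or before this tx's index (already resolved here).
#[derive(Clone, Copy, Default)]
struct BalInfo { balance: Option<u128>, nonce: Option<u64> }

#[derive(Clone, Default)]
struct Cursor { entries: HashMap<Address, BalInfo> }

#[derive(Default)]
struct GenDb {
    cache: HashMap<Address, AccountInfo>,
    lazy_bal: Option<Cursor>,
    shared_base: HashMap<Address, AccountInfo>, // pre-block snapshot
    store: HashMap<Address, AccountInfo>,
}

impl GenDb {
    fn load_account(&mut self, addr: Address) -> AccountInfo {
        if let Some(info) = self.cache.get(&addr) {
            return *info;
        }
        // 1. Cursor first, so a BAL overlay is never shadowed by shared_base.
        let bal_entry = self.lazy_bal.as_ref().and_then(|c| c.entries.get(&addr).copied());
        let info = match bal_entry {
            Some(bal) => {
                // .take() guard: the inner call runs with no cursor installed,
                // so it lands on shared_base or the store instead of recursing.
                let cursor = self.lazy_bal.take();
                let mut base = self.load_account(addr);
                self.lazy_bal = cursor;
                // 2. Overlay only the fields the BAL actually covers.
                if let Some(b) = bal.balance { base.balance = b; }
                if let Some(n) = bal.nonce { base.nonce = n; }
                base
            }
            // 3. No BAL entry: shared_base, then the store.
            None => self.base_or_store(addr),
        };
        self.cache.insert(addr, info);
        info
    }

    fn base_or_store(&self, addr: Address) -> AccountInfo {
        self.shared_base.get(&addr).or_else(|| self.store.get(&addr)).copied().unwrap_or_default()
    }
}

fn main() {
    let predeploy = [0x42u8; 20];
    let mut db = GenDb::default();
    // shared_base says balance 0; a prior tx's BAL change says 42_000.
    db.shared_base.insert(predeploy, AccountInfo { balance: 0, nonce: 1 });
    let mut cursor = Cursor::default();
    cursor.entries.insert(predeploy, BalInfo { balance: Some(42_000), nonce: None });
    db.lazy_bal = Some(cursor);
    println!("{:?}", db.load_account(predeploy)); // AccountInfo { balance: 42000, nonce: 1 }
}
```

main mirrors the regression scenario described above. shared_base reports balance 0 and the BAL reports 42_000. The BAL value wins, while the uncovered nonce still comes from shared_base.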

L2 lint (built without the rayon feature) flagged SlotChange as an unused
import, since post_value_at_or_before is rayon-gated.

Replace fragile line-number references in seed_db_from_bal doc with
descriptive context.
edg-l force-pushed the perf/bal-lazy-cursor branch from a57c161 to df9a356 on May 21, 2026 14:47
ilitteri added this pull request to the merge queue on May 21, 2026
ilitteri removed this pull request from the merge queue due to a manual request on May 21, 2026
edg-l added this pull request to the merge queue on May 22, 2026
Merged via the queue into main with commit 17c3d14 on May 22, 2026
70 checks passed
edg-l deleted the perf/bal-lazy-cursor branch on May 22, 2026 06:37
github-project-automation (bot) moved this from In Review to Done in ethrex_l1 on May 22, 2026
github-project-automation (bot) moved this from Todo to Done in ethrex_performance on May 22, 2026
ilitteri added a commit that referenced this pull request May 26, 2026
Resolves a conflict in crates/vm/backends/levm/mod.rs introduced by #6669
(lazy BAL cursor) and #6655 (BAL optimistic merkleization), which rewrote the
same lines this branch un-gated. Took main's version of the file wholesale,
then re-stripped the rayon/eip-8025 cfg gates — keeping main's is_amsterdam
correctness guard and gen_db refactor.

The merge also pulled in new rayon/eip-8025 gates from #6669 in files that did
not conflict (auto-merged): crates/vm/levm/src/db/gen_db.rs, test/Cargo.toml,
and test/tests/levm/bal_view_tests.rs. Stripped those too, so the only
remaining eip-8025 gates are the four guest binary main.rs files and no rayon
feature gates remain. The bal_view_tests now run unconditionally.

Labels: L1 (Ethereum client), performance (Block execution throughput and performance in general)

Projects: ethrex_l1 (Status: Done), ethrex_performance (Status: Done)

4 participants