Skip to content

perf(l1): move per-tx BAL validation into the par_iter closure#6677

Open
edg-l wants to merge 3 commits into
mainfrom
perf/bal-validation-into-par-iter
Open

perf(l1): move per-tx BAL validation into the par_iter closure#6677
edg-l wants to merge 3 commits into
mainfrom
perf/bal-validation-into-par-iter

Conversation

@edg-l
Copy link
Copy Markdown
Contributor

@edg-l edg-l commented May 19, 2026

Summary

execute_block_parallel previously returned (current_state, codes, shadow_touched, shadow_reads) per tx and validated them in a serial post-par_iter loop across all txs. Validation is per-tx pure work; only the marking of unread_storage_reads / unaccessed_pure_accounts mutates cross-tx state.

This PR moves validate_tx_execution + shadow-touched / shadow-reads checks into the rayon closure (parallel). The closure also precomputes the small (Vec<(Address, H256)>, Vec<Address>) inputs the serial pass uses to update the shared sets, so current_state + codes no longer cross the rayon boundary.

BAL validation errors are deferred via Option<EvmError> in the result tuple so the post-par_iter gas-limit check still takes priority (preserves GAS_USED_OVERFLOW > BAL mismatch on blocks exceeding the gas limit; the BAL is built assuming rejected txs, so miner balance in the BAL won't match execution that ran all txs).

Changes

  • TxExecResult tuple shrinks from 8 fields (with two heavy FxHashMaps) to 7 fields with Vecs only.
  • Inside the closure, after execute_tx_in_block:
    • Compute reads_satisfied: Vec<(Address, H256)> (touched storage keys per non-destroyed account) and destroyed: Vec<Address> (selfdestructed accounts whose remaining BAL storage_reads are wiped) from current_state.
    • Run validate_tx_execution + shadow-touched + shadow-reads checks; capture the first error as Option<EvmError> (deferred).
    • Drop current_state and codes before returning.
  • After par_iter:
    • Existing 2D gas allowance / breakdowns / block-overflow loop unchanged.
    • New step: surface the first deferred BAL error in tx order (only reached if the gas-limit check passes).
    • Apply reads_satisfied / destroyed to unread_storage_reads and tracked_accounts to unaccessed_pure_accounts (cheap hash-set ops).

Invariants

  1. Gas-limit error priority preserved: the serial pass runs gas / 2D allowance / block-overflow before surfacing any deferred BAL error.
  2. Tx-order error reporting preserved: errors are surfaced in sorted tx_idx order; the first failing tx still wins.
  3. Destroyed-account semantics preserved: the Destroyed | DestroyedModified branch in the unread-reads marking is now an explicit Vec<Address> from the closure; the serial pass does the same retain(|&(a, _)| a != addr) it did before.
  4. No semantic change to validation: same checks, same error strings, only the location moves.

Benchmark

Fixture: bal-devnet-7-mainnet-mix-460 (460 blocks, ~30 Ggas, transfer/EVM mix). release-with-debug, import-bench --with-bal. Baseline = feat/import-bench-bal-tooling (bench tooling only, no perf change).

metric baseline this PR delta
wall time 8.575 s 7.871 s -704 ms (-8.2%)
agg Ggas/s 3.901 4.291 +10.0%
avg ms / block 16.884 15.349 -1.54 ms (-9.1%)
p95 ms / block 17.730 16.050 -1.68 ms (-9.5%)
max ms / block 90.060 85.000 -5.06 ms
exec avg 15.574 ms 14.172 ms -1.40 ms (-9.0%)
merkle 0.479 ms 0.412 ms flat (run variance)
store 0.667 ms 0.610 ms flat (run variance)
warmer 1.365 ms 1.315 ms flat

The win is concentrated in exec (-9%), which is exactly where the serial post-par_iter validation pass used to live. The merkle / store / warmer deltas are within run-to-run variance; this PR doesn't touch those phases.

Compare dashboard: https://edgl.dev/share/compare-feat-import-bench-bal-tooling@02a5663c-vs-perf-bal-validation-into-par-iter@cc8ba27c.html

Test plan

  • cargo check --bin ethrex (default features)
  • cargo fmt --all --check
  • cargo clippy --workspace --no-deps -- -D warnings
  • EF blockchain tests (Amsterdam; two-pass parallel runner)
  • Hive bal group

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 19, 2026

⚠️ Known Issues — intentionally skipped tests

Source: docs/known_issues.md

Known Issues

Tests intentionally excluded from CI. Source of truth for the Known
Issues
section the L1 workflow appends to each ef-tests job summary
and posts as a sticky PR comment.

EF Tests — Stateless coverage narrowed to EIP-8025 optional-proofs

make -C tooling/ef_tests/blockchain test calls test-stateless-zkevm
instead of test-stateless. The zkevm@v0.3.3 fixtures are filled against
bal@v5.6.1, out of sync with current bal spec; the broad target trips ~549
fixtures. Re-broaden once the zkevm bundle is regenerated.

Why and resolution path

PR #6527 broadened
test-stateless to extract the entire for_amsterdam/ tree from the
zkevm bundle and run all of it under --features stateless; combined with
this branch's bal-devnet-7 semantics that scope produces ~549
GasUsedMismatch / ReceiptsRootMismatch /
BlockAccessListHashMismatch failures.

test-stateless-zkevm filters cargo to the eip8025_optional_proofs
suite, which still validates the stateless harness without the bal-version
mismatch.

Re-broaden by switching test: back to test-stateless in
tooling/ef_tests/blockchain/Makefile once the zkevm bundle is regenerated
against the current bal spec.

@github-actions github-actions Bot added L1 Ethereum client performance Block execution throughput and performance in general labels May 19, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 19, 2026

Lines of code report

Total lines added: 18
Total lines removed: 0
Total lines changed: 18

Detailed view
+---------------------------------------+-------+------+
| File                                  | Lines | Diff |
+---------------------------------------+-------+------+
| ethrex/crates/vm/backends/levm/mod.rs | 2405  | +18  |
+---------------------------------------+-------+------+

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 19, 2026

Benchmark Block Execution Results Comparison Against Main

Command Mean [s] Min [s] Max [s] Relative
base 62.978 ± 0.232 62.521 63.257 1.00 ± 0.01
head 62.890 ± 0.297 62.457 63.303 1.00

@edg-l edg-l marked this pull request as ready for review May 19, 2026 15:49
@edg-l edg-l requested a review from a team as a code owner May 19, 2026 15:49
@ethrex-project-sync ethrex-project-sync Bot moved this to In Review in ethrex_l1 May 19, 2026
@github-actions
Copy link
Copy Markdown

🤖 Kimi Code Review

The PR optimizes parallel transaction execution by moving BAL (Block Access List) validation into the rayon parallel closure, reducing the serial bottleneck. The implementation correctly preserves consensus-critical error ordering (gas limit checks before BAL validation) and maintains EIP-7928 compliance.

No critical issues found. Below are minor observations and suggestions:

Code Quality & Safety

crates/vm/backends/levm/mod.rs

  1. Lines 1194–1208: The iteration over current_state to build reads_satisfied and destroyed clones storage keys. Ensure acct.storage.keys() yields &H256 (not &U256) to match the unread_storage_reads: FxHashSet<(Address, H256)> type. The diff shows this change (from U256 to H256), which is correct for consistency.

  2. Lines 1250–1252: binary_search_by assumes acct.storage_changes is sorted by slot. Verify that the BAL construction guarantees this invariant. If the BAL can be unsorted, this will cause false negatives in validation.

  3. Lines 1263–1264: Explicit drop(current_state); drop(codes); is good documentation that these large structures must not cross the Rayon boundary, even though Rust would drop them implicitly at the end of the closure.

  4. Lines 1338–1340: unread_storage_reads.retain(|&(a, _)| a != *addr) is O(n) per destroyed account. Given selfdestruct is rare and unread_storage_reads is bounded by BAL size, this is acceptable, but consider documenting this trade-off.

Error Handling

  1. Lines 1329–1333: The deferred error surfacing correctly maintains transaction order after sorting. Using `.take

Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions
Copy link
Copy Markdown

🤖 Codex Code Review

  1. crates/vm/backends/levm/mod.rs:1218-1269, 1292-1335
    Moving validate_tx_execution() into the par_iter means full BAL validation now runs for every tx before the block-level gas checks decide whether the block is invalid anyway. That preserves error precedence, but it changes rejection cost for adversarial inputs: an over-gas / allowance-invalid block now still pays all per-tx BAL validation work, including fallback DB/code reads, before being rejected. In a peer-facing execution client, that is an avoidable CPU/DB amplification path on invalid blocks. I’d keep the cheap per-tx preprocessing (reads_satisfied / destroyed / tracked) in the closure, but leave validate_tx_execution() in the post-gas serial phase, or otherwise gate the heavy validation behind “gas checks passed”.

I did not find a correctness or consensus-rule regression in the refactor itself: tx-order error surfacing is preserved, the EIP-7928 shadow checks still run in tx order, and the reads_satisfied / destroyed extraction matches the prior semantics. The main gap I see is performance/security on invalid blocks. A targeted regression test for “gas overflow beats BAL mismatch on the parallel path” would also be worth adding to lock the intended precedence down.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions
Copy link
Copy Markdown

🤖 Claude Code Review

Now I have everything I need for a thorough review.


PR Review: perf(l1): move per-tx BAL validation into the par_iter closure

Overall Assessment

This is a well-reasoned refactor. The core idea — moving read-only validation work from the serial post-par_iter loop into the parallel closure, while deferring errors via Option<EvmError> to preserve GAS_USED_OVERFLOW priority — is sound. The benchmarks (+8–10%) are compelling. The logic is semantically equivalent to the original in all observable outcomes.

A few things worth reviewing before merging:


Correctness

Deferred error priority is preserved correctly.
Step 4 iterates exec_results in sorted (tx) order and returns on the first Some(err). Because validate_tx_execution + shadow checks are both captured in the same IIFE, neither class of BAL error escapes the deferral. The GAS_USED_OVERFLOW > BAL mismatch ordering is maintained. ✓

reads_satisfied is semantically equivalent to the original current_state iteration.
Original step 4 iterated current_state.storage.keys() for non-destroyed accounts. The new closure does the same thing to build reads_satisfied, then the serial step 5 applies them. The special-case for Destroyed | DestroyedModified (retain-by-address instead of per-key remove) is preserved identically. ✓

Set-update behavior when an error is surfaced.
In the original, a BAL error at tx N still had txs 0..N-1's set updates applied (same loop). In the new code, step 4 returns before step 5 runs at all, so unread_storage_reads / unaccessed_pure_accounts are never updated when there's an error. This is harmless — the function returns Err in both cases and the caller never uses those sets — but it is a semantic difference worth being aware of if this code is ever refactored further.

Thread safety of validate_tx_execution in parallel context.
bal, validation_index, system_seed, and store are now accessed simultaneously across all rayon worker threads. The Rust borrow checker enforces that these are all Sync (verified at compile time, since the code passes cargo check). The function takes only shared references to all shared inputs, so there are no data races. ✓


Minor Observations

Unconditional reads_satisfied allocation inside the closure.
crates/vm/backends/levm/mod.rs, around line 1194:

let mut reads_satisfied: Vec<(Address, H256)> = Vec::new();
let mut destroyed: Vec<Address> = Vec::new();
for (addr, acct) in &current_state { ... }

This runs unconditionally for every tx, even if unread_storage_reads is empty (which would happen on blocks without any BAL storage reads). The original code gated this work behind if !unread_storage_reads.is_empty(). The benchmark improvement suggests this case is rare or the overhead is trivial, but for completeness you could add a short-circuit:

let build_reads = !unread_storage_reads_is_populated; // capture a bool before the closure

Since unread_storage_reads is not accessible from inside the closure, you'd need to capture a bool flag. Whether this is worth doing depends on how often BAL blocks have no storage reads.

Dangling comment context after removing "Mark pure-access accounts..." header.
crates/vm/backends/levm/mod.rs, step 5 loop body:

// The coinbase is always accessed during fee finalization (geth's
// readerTracker records it), even when the miner fee is zero and
// ethrex skips the load_account call.

The comment was originally preceded by // Mark pure-access accounts that were accessed during this tx. which provided context. Without it, the coinbase comment now appears without a lead-in explaining why tracked_accounts is being iterated at all. A one-line header like // Remove pure-access accounts that were accessed during this tx. would restore context.

All BAL validations complete before any error is surfaced.
Because rayon runs all tx closures to completion, a BAL failure in tx 0 still causes txs 1..N to run their full validation. In the original serial loop, tx 0's failure returned immediately. This is not a bug — the parallel path is read-only — but it changes the "fail-fast" behaviour. No action needed, just worth noting in case future work adds any write-path in these closures.

u32::try_from(tx_idx + 1).unwrap_or(u32::MAX) (pre-existing).
This was already in the original code and is inherited unchanged. The + 1 on a usize is not guarded against usize::MAX before the try_from, but this is not realistic in practice. Not introduced by this PR.


Summary

The refactor is correct, safe, and the performance win is real. The two minor items worth addressing before merge:

  • Restore a one-line header comment before the tracked_accounts iteration in step 5.
  • Optionally consider whether the unconditional reads_satisfied/destroyed allocation in the closure needs a guard (low priority — likely not observable in practice).

Automated review by Claude (Anthropic) · sonnet · custom prompt

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 19, 2026

Greptile Summary

This PR moves per-tx BAL validation (including validate_tx_execution and EIP-7928 shadow recorder checks) from a serial post-par_iter loop into the rayon parallel closure, delivering a measured ~9% speedup in block execution time. Errors are carried as Option<EvmError> in the result tuple and surfaced in tx order after the gas-limit check, preserving the GAS_USED_OVERFLOW > BAL-mismatch priority invariant.

  • TxExecResult sheds its two heavy FxHashMap fields (current_state and codes) by precomputing reads_satisfied: Vec<(Address, H256)> and destroyed: Vec<Address> in the closure, then explicitly dropping the maps before the result crosses the rayon boundary.
  • The serial post-par_iter pass is now three lightweight steps: gas/2D-allowance accounting (unchanged), deferred-BAL-error surfacing (new, O(n) scan), and cheap hash-set ops for unread_storage_reads / unaccessed_pure_accounts (semantically equivalent to the old logic).
  • EF Amsterdam blockchain tests and the Hive bal group are marked not yet run in the test plan; this path handles parallel block execution and BAL validation, so those results would strengthen confidence before merging.

Confidence Score: 4/5

Safe to merge pending EF Amsterdam blockchain tests and Hive bal group results.

The semantic equivalence between old and new code is clear and well-documented: reads_satisfied/destroyed precomputation mirrors the old serial loop exactly, deferred errors surface in tx order after the gas-limit check, and explicit drop() calls ensure heavy maps never cross the rayon boundary. The one gap is that EF blockchain tests and the Hive bal group have not yet run.

crates/vm/backends/levm/mod.rs — specifically the deferred error surfacing loop (step 4) and the reads_satisfied/destroyed application loop (step 5).

Important Files Changed

Filename Overview
crates/vm/backends/levm/mod.rs Core parallel execution refactoring: BAL validation and shadow checks moved into the rayon closure; TxExecResult shrinks from 8 to 7 fields by replacing two heavy FxHashMaps with pre-computed Vecs; deferred errors preserve gas-limit-check priority; EF/Hive tests not yet run.
CHANGELOG.md Changelog entry added for the BAL validation parallelization improvement.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant SerialPre as Serial (pre-par_iter)
    participant Rayon as Rayon par_iter closure (per tx, parallel)
    participant SerialPost as Serial (post-par_iter)

    Caller->>SerialPre: execute_block_parallel()
    SerialPre->>SerialPre: prepare_block (system calls, seed unread_storage_reads)
    SerialPre->>Rayon: spawn N parallel tasks

    Note over Rayon: Per tx (parallel):
    Rayon->>Rayon: seed_db_from_bal()
    Rayon->>Rayon: execute_tx_in_block()
    Rayon->>Rayon: compute reads_satisfied + destroyed from current_state
    Rayon->>Rayon: validate_tx_execution() capture error
    Rayon->>Rayon: shadow_touched / shadow_reads checks capture error
    Rayon->>Rayon: drop(current_state), drop(codes)
    Rayon-->>SerialPost: (tx_idx, tx_type, report, tracked, reads_satisfied, destroyed, Option EvmError)

    SerialPost->>SerialPost: sort by tx_idx
    SerialPost->>SerialPost: Step 3 gas limit + 2D allowance check
    SerialPost->>SerialPost: Step 4 surface first deferred BAL error in tx order
    SerialPost->>SerialPost: Step 5 apply reads_satisfied/destroyed to unread_storage_reads
    SerialPost->>SerialPost: Step 6 build receipts
    SerialPost-->>Caller: Ok(receipts, block_gas_used, unread_storage_reads, ...)
Loading

Reviews (1): Last reviewed commit: "docs(changelog): add per-tx BAL validati..." | Re-trigger Greptile

Copy link
Copy Markdown
Contributor

@ElFantasma ElFantasma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small inline observations + a body note. None blocking.

Body finding: the closure's validate_tx_execution call is wrapped in map_err(|e| EvmError::Custom(format!("BAL validation failed for tx {tx_idx}: {e}"))) (line 1236-1238) and the two Custom(...) return Err paths below it also bake tx_idx into the message. When the deferred-error surfacing loop (line 1332-1336) returns the first error, the original tx_idx is preserved in the message — good. But: EvmError::Custom loses structured information that a future caller might want (e.g., EvmError::BalValidation { tx_idx, kind, ... }). Adding a typed variant for BAL errors is out of scope for this PR but worth considering as a follow-up — right now consumers can only string-match.

Comment thread crates/vm/backends/levm/mod.rs
Comment thread crates/vm/backends/levm/mod.rs Outdated
edg-l added 3 commits May 22, 2026 10:37
execute_block_parallel previously returned (current_state, codes,
shadow_touched, shadow_reads) per tx and validated them in a serial post-loop
across all txs (validate_tx_execution + shadow checks + mark
unread_storage_reads / unaccessed_pure_accounts).

Validation is per-tx pure work; only marking the shared sets mutates
cross-tx state. Move validate_tx_execution + shadow_touched/shadow_reads
checks into the closure. Precompute the small (Vec<(Address, H256)>,
Vec<Address>) inputs the serial pass needs to update the shared sets.
Drop current_state + codes inside the closure so they no longer cross
the rayon boundary.

Defer BAL validation errors via Option<EvmError> in the result tuple so
the post-par_iter gas-limit check still takes priority over BAL mismatch
(preserves GAS_USED_OVERFLOW > BAL error ordering for blocks exceeding
the gas limit).

Expected win on 200-tx blocks: the ~3 ms median serial validation pass
goes away. Also reduces per-tx allocator pressure across rayon workers
since the per-tx maps are dropped before the result tuple is constructed.
- reads_satisfied: pre-size with current_state.len() * 4 to skip 2-3
  reallocations on the hot path (rough avg slots-per-account).
- destroyed: keep Vec::new() since selfdestruct is rare post-EIP-6780;
  no-allocation default is optimal.
- Document why the post-collect sort_unstable_by_key is a defensive
  no-op (IndexedParallelIterator preserves order) so a future refactor
  doesn't drop the guard.

Addresses ElFantasma feedback on #6677.
@edg-l
Copy link
Copy Markdown
Contributor Author

edg-l commented May 22, 2026

Re. the body note on EvmError::Custom(format!(...)) losing structure: agreed, but out of scope for this PR. Opened #6711 as a follow-up to introduce a typed EvmError::BalValidation { tx_idx, kind } variant and migrate the BAL call sites.

Rebased on main (PR #6669 landed in the meantime; CHANGELOG conflict resolved by collapsing both ### 2026-05-19 Perf entries under one date). Force-pushed.

edg-l added a commit that referenced this pull request May 28, 2026
- reads_satisfied: pre-size with current_state.len() * 4 to skip 2-3
  reallocations on the hot path (rough avg slots-per-account).
- destroyed: keep Vec::new() since selfdestruct is rare post-EIP-6780;
  no-allocation default is optimal.
- Document why the post-collect sort_unstable_by_key is a defensive
  no-op (IndexedParallelIterator preserves order) so a future refactor
  doesn't drop the guard.

Addresses ElFantasma feedback on #6677.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

L1 Ethereum client performance Block execution throughput and performance in general

Projects

Status: In Review
Status: Todo

Development

Successfully merging this pull request may close these issues.

4 participants