perf(l1): move per-tx BAL validation into the par_iter closure #6677
edg-l wants to merge 3 commits into
🤖 Kimi Code Review
The PR optimizes parallel transaction execution by moving BAL (Block Access List) validation into the No critical issues found. Below are minor observations and suggestions:
Code Quality & Safety
Error Handling
Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt |
🤖 Codex Code Review
I did not find a correctness or consensus-rule regression in the refactor itself: tx-order error surfacing is preserved, the EIP-7928 shadow checks still run in tx order, and the
Automated review by OpenAI Codex · gpt-5.4 · custom prompt
🤖 Claude Code Review
Now I have everything I need for a thorough review. PR Review:
|
Greptile Summary
This PR moves per-tx BAL validation (including
Confidence Score: 4/5
Safe to merge pending EF Amsterdam blockchain tests and Hive bal group results.
The semantic equivalence between old and new code is clear and well-documented: reads_satisfied/destroyed precomputation mirrors the old serial loop exactly, deferred errors surface in tx order after the gas-limit check, and explicit drop() calls ensure heavy maps never cross the rayon boundary. The one gap is that EF blockchain tests and the Hive bal group have not yet run.
crates/vm/backends/levm/mod.rs: specifically the deferred error surfacing loop (step 4) and the reads_satisfied/destroyed application loop (step 5).
|
| Filename | Overview |
|---|---|
| crates/vm/backends/levm/mod.rs | Core parallel execution refactoring: BAL validation and shadow checks moved into the rayon closure; TxExecResult shrinks from 8 to 7 fields by replacing two heavy FxHashMaps with pre-computed Vecs; deferred errors preserve gas-limit-check priority; EF/Hive tests not yet run. |
| CHANGELOG.md | Changelog entry added for the BAL validation parallelization improvement. |
Sequence Diagram
sequenceDiagram
participant Caller
participant SerialPre as Serial (pre-par_iter)
participant Rayon as Rayon par_iter closure (per tx, parallel)
participant SerialPost as Serial (post-par_iter)
Caller->>SerialPre: execute_block_parallel()
SerialPre->>SerialPre: prepare_block (system calls, seed unread_storage_reads)
SerialPre->>Rayon: spawn N parallel tasks
Note over Rayon: Per tx (parallel):
Rayon->>Rayon: seed_db_from_bal()
Rayon->>Rayon: execute_tx_in_block()
Rayon->>Rayon: compute reads_satisfied + destroyed from current_state
Rayon->>Rayon: validate_tx_execution() capture error
Rayon->>Rayon: shadow_touched / shadow_reads checks capture error
Rayon->>Rayon: drop(current_state), drop(codes)
Rayon-->>SerialPost: (tx_idx, tx_type, report, tracked, reads_satisfied, destroyed, Option EvmError)
SerialPost->>SerialPost: sort by tx_idx
SerialPost->>SerialPost: Step 3 gas limit + 2D allowance check
SerialPost->>SerialPost: Step 4 surface first deferred BAL error in tx order
SerialPost->>SerialPost: Step 5 apply reads_satisfied/destroyed to unread_storage_reads
SerialPost->>SerialPost: Step 6 build receipts
SerialPost-->>Caller: Ok(receipts, block_gas_used, unread_storage_reads, ...)
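The ordering of steps 3 and 4 in the diagram can be illustrated with a minimal std-only sketch. Everything here is an assumption for illustration: `TxResult`, `run_tx`, and `execute_block` are hypothetical names, `std::thread::scope` stands in for rayon's `par_iter`, errors are plain strings, and the cumulative gas check is a simplification of whatever the real gas-limit check does.

```rust
use std::thread;

/// Hypothetical, trimmed-down per-tx result: only small values cross the
/// thread boundary, plus an optional deferred validation error.
#[derive(Debug)]
struct TxResult {
    tx_idx: usize,
    gas_used: u64,
    deferred_error: Option<String>,
}

/// Stand-in for the per-tx closure body: "execute", validate, and capture
/// (rather than return) any validation error.
fn run_tx(tx_idx: usize, gas_used: u64, bal_ok: bool) -> TxResult {
    let deferred_error =
        (!bal_ok).then(|| format!("BAL validation failed for tx {tx_idx}"));
    TxResult { tx_idx, gas_used, deferred_error }
}

/// Runs every tx on its own scoped thread (a std-only substitute for rayon),
/// then applies the serial post-pass in the order the diagram gives.
fn execute_block(txs: &[(u64, bool)], gas_limit: u64) -> Result<u64, String> {
    let mut results: Vec<TxResult> = thread::scope(|s| {
        let handles: Vec<_> = txs
            .iter()
            .enumerate()
            .map(|(i, &(gas, ok))| s.spawn(move || run_tx(i, gas, ok)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    results.sort_unstable_by_key(|r| r.tx_idx);

    // Step 3: the gas-limit check runs first, so it outranks any BAL error.
    let mut total = 0u64;
    for r in &results {
        total += r.gas_used;
        if total > gas_limit {
            return Err("GAS_USED_OVERFLOW".to_string());
        }
    }
    // Step 4: surface the first deferred error in tx order.
    if let Some(e) = results.into_iter().find_map(|r| r.deferred_error) {
        return Err(e);
    }
    Ok(total)
}
```

In this sketch, a block that both exceeds the gas limit and contains a BAL mismatch reports the gas error, which is the priority the diagram describes.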
Reviews (1): Last reviewed commit: "docs(changelog): add per-tx BAL validati..."
ElFantasma
left a comment
Two small inline observations + a body note. None blocking.
Body finding: the closure's validate_tx_execution call is wrapped in map_err(|e| EvmError::Custom(format!("BAL validation failed for tx {tx_idx}: {e}"))) (line 1236-1238) and the two Custom(...) return Err paths below it also bake tx_idx into the message. When the deferred-error surfacing loop (line 1332-1336) returns the first error, the original tx_idx is preserved in the message — good. But: EvmError::Custom loses structured information that a future caller might want (e.g., EvmError::BalValidation { tx_idx, kind, ... }). Adding a typed variant for BAL errors is out of scope for this PR but worth considering as a follow-up — right now consumers can only string-match.
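As a rough illustration of the follow-up suggested above, a typed variant might look like the sketch below. This is not the project's actual `EvmError`: the enum here is a hypothetical stand-in that keeps only `Custom` (known from the review) and adds the proposed `BalValidation` variant; `BalErrorKind`, the `detail` field, and `failing_tx` are invented for the example.

```rust
use std::fmt;

/// Hypothetical classification of what the BAL check rejected.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
enum BalErrorKind {
    ValidationMismatch,
    ShadowTouched,
    ShadowReads,
}

/// Hypothetical stand-in for the project's `EvmError`; only `Custom` is
/// known from the review, `BalValidation` is the proposed addition.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
enum EvmError {
    Custom(String),
    BalValidation { tx_idx: usize, kind: BalErrorKind, detail: String },
}

impl fmt::Display for EvmError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EvmError::Custom(msg) => write!(f, "{msg}"),
            // Same human-readable message as the current string-based error.
            EvmError::BalValidation { tx_idx, detail, .. } => {
                write!(f, "BAL validation failed for tx {tx_idx}: {detail}")
            }
        }
    }
}

/// Callers can branch on the variant instead of string-matching.
fn failing_tx(err: &EvmError) -> Option<usize> {
    match err {
        EvmError::BalValidation { tx_idx, .. } => Some(*tx_idx),
        EvmError::Custom(_) => None,
    }
}
```

The display text stays identical to the current message, so logs would not change while consumers gain structured access to `tx_idx` and the failure kind.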
execute_block_parallel previously returned (current_state, codes, shadow_touched, shadow_reads) per tx and validated them in a serial post-loop across all txs (validate_tx_execution + shadow checks + mark unread_storage_reads / unaccessed_pure_accounts). Validation is per-tx pure work; only marking the shared sets mutates cross-tx state.
- Move validate_tx_execution + shadow_touched/shadow_reads checks into the closure.
- Precompute the small (Vec<(Address, H256)>, Vec<Address>) inputs the serial pass needs to update the shared sets.
- Drop current_state + codes inside the closure so they no longer cross the rayon boundary.
- Defer BAL validation errors via Option<EvmError> in the result tuple so the post-par_iter gas-limit check still takes priority over BAL mismatch (preserves GAS_USED_OVERFLOW > BAL error ordering for blocks exceeding the gas limit).
Expected win on 200-tx blocks: the ~3 ms median serial validation pass goes away. Also reduces per-tx allocator pressure across rayon workers since the per-tx maps are dropped before the result tuple is constructed.
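The split between per-tx precomputation and the serial update of the shared set could look roughly like this. It is a sketch under assumptions: `Address` and `Slot` are small integer stand-ins for the real `Address` and `H256`, and `AccountChange`, `summarize`, and `apply` are hypothetical names, not functions from the codebase.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for Address and H256.
type Address = u8;
type Slot = u32;

/// Hypothetical per-account outcome of one tx.
struct AccountChange {
    destroyed: bool,
    touched_slots: Vec<Slot>,
}

/// Runs inside the per-tx closure: boil the heavy state map down to two
/// small vectors so the map itself can be dropped before returning.
fn summarize(state: HashMap<Address, AccountChange>) -> (Vec<(Address, Slot)>, Vec<Address>) {
    let mut reads_satisfied = Vec::new();
    let mut destroyed = Vec::new();
    for (addr, change) in &state {
        if change.destroyed {
            destroyed.push(*addr);
        } else {
            reads_satisfied.extend(change.touched_slots.iter().map(|s| (*addr, *s)));
        }
    }
    drop(state); // the heavy map never leaves the closure
    (reads_satisfied, destroyed)
}

/// Runs in the serial post-pass: the only step that mutates cross-tx state.
fn apply(
    unread: &mut HashSet<(Address, Slot)>,
    reads_satisfied: &[(Address, Slot)],
    destroyed: &[Address],
) {
    for key in reads_satisfied {
        unread.remove(key);
    }
    for addr in destroyed {
        // A destroyed account wipes all of its remaining unread entries.
        unread.retain(|&(a, _)| a != *addr);
    }
}
```

Only `apply` needs exclusive access to the shared set, so it is the one piece that has to stay serial; `summarize` touches nothing outside its own tx.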
- reads_satisfied: pre-size with current_state.len() * 4 to skip 2-3 reallocations on the hot path (rough avg slots-per-account).
- destroyed: keep Vec::new() since selfdestruct is rare post-EIP-6780; no-allocation default is optimal.
- Document why the post-collect sort_unstable_by_key is a defensive no-op (IndexedParallelIterator preserves order) so a future refactor doesn't drop the guard.
Addresses ElFantasma feedback on #6677.
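Both tweaks in this commit are small enough to show in isolation. The sketch below assumes simplified types (`u8` accounts, `u32` slots) and hypothetical function names; the `* 4` factor simply mirrors the commit's rough slots-per-account estimate.

```rust
/// Flattens per-account slot lists into (account, slot) pairs, reserving
/// space up front. The `* 4` factor mirrors the commit's rough
/// slots-per-account guess; it is a heuristic, not a bound.
fn flatten_reads(per_account: &[(u8, Vec<u32>)]) -> Vec<(u8, u32)> {
    let mut out = Vec::with_capacity(per_account.len() * 4);
    for (addr, slots) in per_account {
        out.extend(slots.iter().map(|s| (*addr, *s)));
    }
    out
}

/// Defensive re-sort: a no-op when results are already ordered by index,
/// but it keeps the tx-order guarantee if the collection strategy changes.
fn in_tx_order(mut results: Vec<(usize, &'static str)>) -> Vec<(usize, &'static str)> {
    results.sort_unstable_by_key(|r| r.0);
    results
}
```

`Vec::new()` does not allocate until the first push, which is why leaving `destroyed` unsized costs nothing in the common case where no account selfdestructs.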
|
Re. the body note on Rebased on main (PR #6669 landed in the meantime; CHANGELOG conflict resolved by collapsing both |
8008a21 to 48b6967
Summary
execute_block_parallel previously returned (current_state, codes, shadow_touched, shadow_reads) per tx and validated them in a serial post-par_iter loop across all txs. Validation is per-tx pure work; only the marking of unread_storage_reads / unaccessed_pure_accounts mutates cross-tx state.
This PR moves validate_tx_execution + shadow-touched / shadow-reads checks into the rayon closure (parallel). The closure also precomputes the small (Vec<(Address, H256)>, Vec<Address>) inputs the serial pass uses to update the shared sets, so current_state + codes no longer cross the rayon boundary.
BAL validation errors are deferred via Option<EvmError> in the result tuple so the post-par_iter gas-limit check still takes priority (preserves GAS_USED_OVERFLOW > BAL mismatch on blocks exceeding the gas limit; the BAL is built assuming rejected txs, so miner balance in the BAL won't match execution that ran all txs).
Changes
- TxExecResult tuple shrinks from 8 fields (with two heavy FxHashMaps) to 7 fields with Vecs only.
- Inside the closure, after execute_tx_in_block:
  - Compute reads_satisfied: Vec<(Address, H256)> (touched storage keys per non-destroyed account) and destroyed: Vec<Address> (selfdestructed accounts whose remaining BAL storage_reads are wiped) from current_state.
  - Run validate_tx_execution + shadow-touched + shadow-reads checks; capture the first error as Option<EvmError> (deferred).
  - Drop current_state and codes before returning.
- After par_iter:
  - Apply reads_satisfied / destroyed to unread_storage_reads and tracked_accounts to unaccessed_pure_accounts (cheap hash-set ops).
Invariants
- Deferred BAL errors surface in tx_idx order; the first failing tx still wins.
- The Destroyed | DestroyedModified branch in the unread-reads marking is now an explicit Vec<Address> from the closure; the serial pass does the same retain(|&(a, _)| a != addr) it did before.
Benchmark
Fixture: bal-devnet-7-mainnet-mix-460 (460 blocks, ~30 Ggas, transfer/EVM mix). release-with-debug, import-bench --with-bal. Baseline = feat/import-bench-bal-tooling (bench tooling only, no perf change).
The win is concentrated in exec (-9%), which is exactly where the serial post-par_iter validation pass used to live. The merkle / store / warmer deltas are within run-to-run variance; this PR doesn't touch those phases.
Compare dashboard: https://edgl.dev/share/compare-feat-import-bench-bal-tooling@02a5663c-vs-perf-bal-validation-into-par-iter@cc8ba27c.html
Test plan
- cargo check --bin ethrex (default features)
- cargo fmt --all --check
- cargo clippy --workspace --no-deps -- -D warnings
- Hive bal group