feat(l1): add statetest subcommand to ef_tests-statev2 for goevmlab fuzzing by ElFantasma · Pull Request #6663 · lambdaclass/ethrex

ElFantasma · 2026-05-15T17:58:10Z

Motivation

goevmlab is the standard differential-fuzzing harness for EL clients (geth, nethermind, besu, erigon, nimbus, evmone, eels, revm). It runs the same EF state test through every configured EVM and diffs their per-opcode (EIP-3155) traces and post-state roots. To plug ethrex into that fleet, goevmlab needs a tiny CLI surface on ethrex that emits EIP-3155 trace JSONL plus a stateRoot line on stderr. This PR provides that.

Description

Adds a statetest subcommand to the existing ef_tests-statev2 binary. Two modes:

statetest --json <fixture>: emits EIP-3155 per-opcode JSONL on stderr, followed by one {"stateRoot":"0x..."} line per (fork, post-index) case in the fixture.
statetest --json-outcome <fixture>: same but without the per-opcode trace — only the stateRoot line(s).

Output goes exclusively to stderr (stdout reserved for crash diagnostics, matching geth/revm convention). Exit code 0 means every case's computed post-state root matched the fixture's expected root; 1 means at least one mismatch (tolerated by goevmlab); any other non-zero / panic indicates an actual crash.

The subcommand reuses the existing fixture parser, env/tx builders, and post-state-root computation from state_v2. The per-opcode trace is generated by LevmOpcodeTracer (added in #6595) — for each test case we activate the tracer, execute the tx, and serialize each OpcodeStep as one JSONL line. The geth-structLogger wrapper used by the RPC adapter is sidestepped: goevmlab expects raw JSONL.

Also rounds out the state_v2 fixture deserializer with the fork mappings for Osaka, Amsterdam, and BPO1–BPO5. These variants are already in the Fork enum but were missing from the deserializer's match arm, so any fixture's post map containing them aborted parsing (the unmodified strict-error behavior is preserved for truly unknown fork names — adding a new variant remains a build break that has to be addressed).

Backwards-compatible: running the binary without a subcommand still triggers the existing bulk runner. Branch stacked on top of #6595 (feat/eip-3155-tracer) since the implementation depends on its LevmOpcodeTracer.

Closes #6575

How to Test

cd tooling

# full per-opcode trace + stateRoot per case
cargo run --release -p ef_tests-statev2 -- statetest --json \
  ef_tests/state/vectors/state_tests/cancun/eip5656_mcopy/test_mcopy_on_empty_memory.json \
  2> trace.jsonl
echo "exit=$?"
head -3 trace.jsonl     # per-opcode JSONL: {"pc":..., "op":..., ...}
grep stateRoot trace.jsonl   # final stateRoot line(s)

# outcome only
cargo run --release -p ef_tests-statev2 -- statetest --json-outcome \
  ef_tests/state/vectors/state_tests/cancun/eip5656_mcopy/test_mcopy_on_empty_memory.json \
  2>&1

Checklist

Sub-issue of the debug_* / fuzzing tracker ([Tracking] Implement missing debug_* RPC endpoints #6572).

Add per-opcode StructLog tracing wired through `debug_traceTransaction` and `debug_traceBlockByNumber`. Output is byte-compatible with geth's `structLogLegacy` JSON shape so goevmlab and EELS reference traces diff cleanly. - `crates/common/tracing.rs`: `StructLog`, `MemoryChunk`, `StructLogResult` with manual Serialize impls matching geth's `toLegacyJSON` field-by-field (decimal pc/gas/gasCost, opcode-name op string, geth uint256 hex stack, chunked memory, accumulated per-contract storage at SLOAD/SSTORE, omitempty refund/error/returnData). - `crates/vm/levm/src/struct_log_tracer.rs`: `LevmStructLogTracer` with zero-cost `active: bool` gate. `pre_step_capture` + `finalize_step` keep the dispatch-loop hook to one branch when disabled. - `crates/vm/levm/src/vm.rs`: dispatch-loop hook (pc captured pre-advance, stack reversed top→bottom for wire format), helper methods, end-of-tx capture using post-refund `gas_spent` (matches geth's `receipt.GasUsed`). - `crates/vm/levm/src/opcode_handlers/system.rs`: explicit `gasCost` for CALL/CALLCODE/DELEGATECALL/STATICCALL/CREATE/CREATE2 = `intrinsic + callGasTemp`, matching geth. - `crates/vm/backends/levm/tracing.rs`, `crates/vm/tracing.rs`, `crates/blockchain/tracing.rs`: `trace_tx_struct_log` plumbing. - `crates/networking/rpc/tracing.rs`: `TracerType::StructLogger` variant (alias `structLog`), `StructLogTracerConfig` with five geth-aligned flags. Default tracer remains `CallTracer` for Blockscout compat; documented divergence from geth. - Tests: 21 unit tests pinning every wire-format rule against geth source; 2 LEVM unit tests for dispatch+SSTORE; 3 fixture-diff integration tests (`eip3155_sstore_basic.json`, `eip3155_mstore_memory.json`, `eip3155_identity_return_data.json`). - `tooling/scripts/gen_structlog_fixtures.sh`: pinned-geth regen procedure. - `crates/vm/levm/benches/struct_log_disabled.rs`: Criterion microbench for the disabled hot path (~7.7 µs per 2k-opcode loop on dev machine).

Place the three structLog fixture JSONs alongside their integration tests in the project's standard test crate (test/tests/levm/fixtures/) instead of cmd/ethrex/tests/fixtures/. Updates the loader path in struct_log_tracer_tests.rs and the regen-script doc comments.

…ogLegacy

Wire-format rules and per-opcode capture semantics are already pinned by the 26 unit tests in ethrex-common and the 2 LEVM unit tests. The three JSON fixtures + gen helper + geth-targeted shell script added ~600 LoC of snapshot machinery for end-to-end coverage that one focused smoke test can provide. The smoke test exercises the full RPC pipeline (`LEVM::trace_tx_struct_log` -> `serde_json::to_value`) on a `PUSH1 PUSH1 SSTORE STOP` program and asserts the resulting JSON has the EIP-3155 strict wrapper (`pass`/`gasUsed`/`output`/`structLogs`) and per-step shape (numeric `op`, `opName`, hex `gas`/`gasCost`/`refund`, always-present `returnData`/`memSize`, single-entry `storage` on SSTORE). Removes: - test/tests/levm/fixtures/*.json (3 files) - test/tests/levm/struct_log_fixture_gen.rs - tooling/scripts/gen_structlog_fixtures.sh (geth comparison script, obsolete after the move to EIP-3155 strict)

"structLog" inherits geth's Go-type jargon and now misleads consumers: since this PR moved to strict EIP-3155 output, clients sending `tracer: "structLog"` expecting geth's structLogLegacy shape get a different format. The new name is self-describing and sits naturally beside `callTracer` and `prestateTracer`. No aliases — clients pass `"opcodeTracer"` explicitly.

Match the prestateTracer/callTracer convention: all tracer tests live under `test/tests/levm/`, none in `crates/`. - Deletes the 26 unit tests in `crates/common/tracing.rs` (Serialize field-by-field assertions). The single-field tests were dev scaffolding; end-to-end coverage now lives in `opcode_tracer_tests.rs`. - Deletes the 2 LEVM unit tests + in-tracer TestDb harness in `crates/vm/levm/src/opcode_tracer.rs`. Same coverage rebuilt as bytecode-driven e2e tests via the real `LEVM::trace_tx_opcodes` entry point and the shared `TestDatabase` fixture. - Renames `struct_log_tracer_tests.rs` -> `opcode_tracer_tests.rs` to match the existing `prestate_tracer_tests.rs` naming. - New e2e set covers: basic execution + wrapper shape, single-entry storage on SSTORE, memory capture under `enableMemory`, return-data capture under `enableReturnData`, stack=null under `disableStack`.

The microbench (PUSH1/POP loop, no `main` baseline, no enabled-path variant) only confirmed the disabled-path branch is hot-path-clean. It served as dev scaffolding; not worth carrying long-term.

Renaming `StructLogger` to `OpcodeTracer` made all three variants share the `Tracer` suffix, which clippy flags via `enum_variant_names`. The suffix is required because `rename_all = "camelCase"` derives the externally-fixed wire names `callTracer` / `prestateTracer` / `opcodeTracer` from the variant identifiers.

- Add CLZ (0x1E) and SLOTNUM (0x4B) names to opcode_name. - Drop redundant OpcodeTracerRpcConfig; deserialize OpcodeTracerConfig directly. - Replace total_size counter with last_step_captured flag so finalize_step doesn't clobber the last retained entry once the limit cap is hit. - Use call frame `to` (storage context) instead of `code_address` for SLOAD/SSTORE capture so DELEGATECALL/CALLCODE record under the caller's account. - Replace unsafe transmute-based U256->H256 with BigEndianHash::from_uint. - Narrow SLOAD error fallback: only AccountNotFound returns zero; other DB errors omit the storage entry instead of fabricating a value.

- omit `opName` for unknown opcodes (geth-compatible, EIP-3155 allows) - SLOAD: omit storage entry on any read failure, not just non-AccountNotFound - drop unused `addr` from `read_storage_for_trace` tuple

Align debug_traceTransaction output with the cross-client structLogger shape: {failed, gas, returnValue, structLogs}; per-step gas/gasCost/refund as numbers, op as the mnemonic string (opName dropped). EIP-3155 step content preserved (memSize, returnData, refund always emitted). Fix: jump() fused JUMP/JUMPI with the destination JUMPDEST, dropping the JUMPDEST step from the trace and inflating the parent's gasCost by 1. Now synthesizes a JUMPDEST entry when the tracer is active; the disabled hot path keeps the fusion. last_step_captured replaced with last_step_index so the synthetic entry doesn't shadow the parent's finalize_step patch target.

…ernode - ethereum-package pinned at e4b3305 (2025-04) -> 71b02f6 (current main). Required to pick up the besu launcher fix that drops CLIQUE from --rpc-http-api (besu 26.x removed the namespace). - geth v1.15.2 -> v1.17.3; besu main-142a5e6 -> main-6d54451; lighthouse v8.0.0-rc.1 -> v8.1.3 (the rc is over a year old). - supernode: true on the ethrex participant. With Fulu at epoch 0, the package now requires at least one supernode, a node with 128+ validators, or perfect_peerdas_enabled.

…PDEST `build_jumpdest_step` was duplicating ~50 lines of conditional stack/memory/ return-data construction already present in `LevmOpcodeTracer::pre_step_capture`. Extract a shared `opcode_tracer::build_step` free function that takes the cfg and the raw inputs, leaving both call sites as thin "collect VM state, hand off" wrappers. Also documents that `opcode_name`'s table is intentionally fork-agnostic (matching geth's `opCodeToString`) — fork validity is enforced at dispatch via the VM's per-fork opcode table, so the name lookup never fires for invalid-for-this-fork bytes in practice. Addresses iovoid's review comments on PR #6595.

github-actions · 2026-05-15T17:58:20Z

⚠️ Known Issues — intentionally skipped tests

Source: docs/known_issues.md

Known Issues

Tests intentionally excluded from CI. Source of truth for the Known
Issues section the L1 workflow appends to each ef-tests job summary
and posts as a sticky PR comment.

EF Tests — Stateless coverage narrowed to EIP-8025 optional-proofs

make -C tooling/ef_tests/blockchain test calls test-stateless-zkevm
instead of test-stateless. The zkevm@v0.3.3 fixtures are filled against
bal@v5.6.1, out of sync with current bal spec; the broad target trips ~549
fixtures. Re-broaden once the zkevm bundle is regenerated.

Why and resolution path

PR #6527 broadened
test-stateless to extract the entire for_amsterdam/ tree from the
zkevm bundle and run all of it under --features stateless; combined with
this branch's bal-devnet-7 semantics that scope produces ~549
GasUsedMismatch / ReceiptsRootMismatch /
BlockAccessListHashMismatch failures.

test-stateless-zkevm filters cargo to the eip8025_optional_proofs
suite, which still validates the stateless harness without the bal-version
mismatch.

Re-broaden by switching test: back to test-stateless in
tooling/ef_tests/blockchain/Makefile once the zkevm bundle is regenerated
against the current bal spec.

github-actions · 2026-05-15T18:00:41Z

🤖 Kimi Code Review

Overall this is a well-structured PR that correctly implements the goevmlab differential fuzzing interface. The code follows Rust idioms and properly handles the Ethereum state test semantics.

Issues and Suggestions

1. Unnecessary database clone (Performance)

File: tooling/ef_tests/state_v2/src/modules/statetest.rs
Line: 115

let account_updates = backends::levm::LEVM::get_state_transitions(&mut vm.db.clone())

The vm.db.clone() creates a full copy of the database state. Since vm is not used after this point (the opcode tracer logs are accessed earlier at lines 100-104), you likely don't need to clone unless get_state_transitions consumes self by value.

Recommendation: If get_state_transitions only requires &mut self or &self, remove the .clone() to avoid the memory overhead. If it does consume self, add a comment explaining why the clone is necessary (to preserve vm for the trace access above).

2. Tuple return clarity (Readability)

File: tooling/ef_tests/state_v2/src/modules/statetest.rs
Line: 44

The fixture_path method returns Result<(&PathBuf, bool), RunnerError>. While the boolean represents emit_trace, this is not immediately obvious at call sites.

Recommendation: Consider returning a small struct or adding a type alias for clarity:

struct FixtureConfig<'a> {
    path: &'a PathBuf,
    emit_trace: bool,
}

3. Fork variant synchronization (Maintenance)

File: tooling/ef_tests/state_v2/src/modules/deserialize.rs
Lines: 143-168

The new fork names ("Osaka", "BPO1"–"BPO5", "Amsterdam") are added with a comment to keep them in sync with crates/common/types/genesis.rs.

Recommendation: Ensure the corresponding Fork enum in genesis.rs actually contains these variants (particularly the "BPO" series which appear to be temporary testnet identifiers). If any are missing, this will fail to compile.

4. Error handling for JSON serialization (Robustness)

File: tooling/ef_tests/state_v2/src/modules/statetest.rs
Line: 102-103

let line = serde_json::to_string(step)
    .map_err(|e| RunnerError::Custom(format!("failed to serialize trace step: {e}")))?;

This correctly handles serialization failures. However, if a single step fails to serialize, the entire test run fails.

Recommendation: This is acceptable for fuzzing

Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

github-actions · 2026-05-15T18:00:48Z

🤖 Claude Code Review

Here is my code review for PR #6663:

Code Review: `feat(l1): add statetest subcommand to ef_tests-statev2 for goevmlab fuzzing`

Overview

This PR adds a statetest subcommand to the ef_tests-statev2 binary for goevmlab differential fuzzing. The approach is sound and the implementation is clean overall. A few issues are worth addressing before merge.

`statetest.rs` — New File

Exit code semantics mismatch (design issue)

The module doc comment promises:

0 = all cases match
1 = at least one mismatch
other = crash

However, when run_case (or opts.fixture_path()) returns an Err, it propagates through main's Result<ExitCode, RunnerError>. Rust's Termination impl for Result exits with code 1 when Err is returned, printing the error to stderr. This means parse errors and VM setup errors are indistinguishable from a state-root mismatch to goevmlab — both exit with 1.

Only a bare panic! would give a non-1 exit code. If goevmlab relies on other codes to detect infrastructure failures versus legitimate mismatches, this is a real bug. Consider:

pub async fn run(opts: StatetestOptions) -> ExitCode {
    match inner_run(opts).await {
        Ok(false) => ExitCode::SUCCESS,
        Ok(true) => ExitCode::from(1),
        Err(e) => {
            eprintln!("error: {e}");
            ExitCode::from(2)  // distinguishable from mismatch
        }
    }
}

statetest.rs:37–44 — Missing required-arg constraint at the clap level

Neither --json nor --json-outcome is marked required. Users invoking statetest with no path get a generic RunnerError::Custom at runtime rather than a clap usage error. Consider using an ArgGroup:

#[derive(Args, Debug)]
#[group(required = true, multiple = false)]
pub struct StatetestOptions {
    #[arg(long, value_name = "PATH")]
    pub json: Option<PathBuf>,
    #[arg(long, value_name = "PATH")]
    pub json_outcome: Option<PathBuf>,
}

This gives a proper clap error message with --help guidance.

statetest.rs:96 — Potential leading-zero truncation in stateRoot output

eprintln!("{{\"stateRoot\":\"0x{computed_root:x}\"}}");

If computed_root is of a type whose LowerHex impl does not zero-pad to 64 characters (e.g., a plain U256 or [u8; 32] using a non-padded formatter), the output may silently drop leading zero bytes. goevmlab (and EIP-3155) expect the full 64-character hex representation. Verify that the LowerHex impl for computed_root's type always emits 64 hex characters, or force it explicitly:

eprintln!("{{\"stateRoot\":\"0x{computed_root:064x}\"}}");

The flag 64x would be wrong if the value is already 32 bytes printed as 64 nibbles by the type's formatter, so check what type post_state_root returns.

statetest.rs:91 — Unnecessary clone of the entire database

let account_updates = backends::levm::LEVM::get_state_transitions(&mut vm.db.clone())

This clones the full in-memory DB just to satisfy the &mut requirement. For state tests with large initial state, this is a significant allocation that is immediately discarded. If get_state_transitions cannot take an immutable reference (an API limitation), at least document why the clone is necessary. If it can be changed to &mut vm.db directly, prefer that.

statetest.rs:62–65 — Minor: prefer compound assignment

let mismatch = run_case(test, test_case, emit_trace).await?;
if mismatch {
    any_mismatch = true;
}

More idiomatic:

any_mismatch |= run_case(test, test_case, emit_trace).await?;

`main.rs`

Potential conflict between flattened RunnerOptions and statetest subcommand

RunnerOptions is #[command(flatten)]-ed into Cli alongside an Option<Command>. If RunnerOptions has any required fields (positional args or required --flags), clap will demand them even when the statetest subcommand is used, making the UX very awkward.

This may already work if all RunnerOptions fields are optional, but it deserves an explicit note or test. If they conflict, the cleaner design is to not flatten RunnerOptions into the top-level Cli struct, and instead only accept them in the None branch (perhaps via a separate Run subcommand).

`deserialize.rs`

The reordering of EIP150, EIP158, Byzantium to be in chronological order before Constantinople is a clean improvement. The new fork variants (Osaka, BPO1–BPO5, Amsterdam) and the comment explaining the "hard error" policy for unknown forks are both correct and well-documented. No issues here.

Summary

Severity	Location	Item
Medium	`statetest.rs:56` / `main.rs:31`	Exit codes 1 (mismatch) and 1 (returned error) are conflated
Medium	`statetest.rs:96`	`{:x}` may truncate leading zeros on stateRoot
Minor	`statetest.rs:37`	No clap-level `required` constraint on `--json` / `--json-outcome`
Minor	`statetest.rs:91`	Unnecessary full DB clone
Nit	`statetest.rs:62`	Use `
Risk	`main.rs:14`	Flattened `RunnerOptions` may force unwanted args when using `statetest`

The deserialize.rs and mod.rs changes are clean and correct. The overall structure is good — the main concerns are the exit-code semantics and the stateRoot hex formatting.

Automated review by Claude (Anthropic) · sonnet · custom prompt

github-actions · 2026-05-15T18:03:37Z

🤖 Codex Code Review

Findings

High: statetest.rs:103 discards vm.execute() entirely and statetest.rs:121 treats “matching stateRoot” as the only success criterion. That diverges from the existing EF semantics in result_check.rs:73 and result_check.rs:132: expectException cases must match the expected error, not just leave the same post-state. As written, a wrong rejection that leaves state untouched will false-pass.
High: the same swallowed result at statetest.rs:103 also suppresses genuine VM bugs. VMError::Internal is explicitly marked as something that should propagate in errors.rs:22, but this path downgrades it to a normal “root mismatch” outcome instead of the documented “actual crash” exit class.
Medium: deserialize.rs:160 adds Osaka/BPO1..5 parsing, but types.rs:30 and types.rs:87 still filter runnable forks to Merge/Shanghai/Cancun/Prague/Amsterdam. For an Osaka/BPO fixture, statetest.rs:59 will execute zero cases and return success with no stateRoot, which is a silent false green.

Open Question

Since this subcommand is meant for differential fuzzing, should statetest.rs:87 consume fixture txbytes from types.rs:398 / types.rs:530 via transaction.rs:1628 instead of reconstructing and re-signing a normalized tx? Otherwise malformed/RLP-edge fixtures are not exercised faithfully.

Compile verification was not possible in this sandbox because Cargo needed write access under /home/runner/.cargo and the libssz git dependency was not available locally.

Automated review by OpenAI Codex · gpt-5.4 · custom prompt

greptile-apps · 2026-05-15T18:03:51Z

Greptile Summary

This PR plugs ethrex into the goevmlab differential fuzzing harness by adding a statetest subcommand to ef_tests-statev2. It also rounds out the fixture deserializer with missing fork mappings (Osaka, Amsterdam, BPO1–BPO5).

statetest subcommand (statetest.rs, main.rs): accepts --json (full EIP-3155 per-opcode JSONL + stateRoot) or --json-outcome (stateRoot only) and emits everything to stderr; exit code 0 = all roots match, 1 = at least one mismatch.
Deserializer extension (deserialize.rs): adds seven new fork-name → Fork enum mappings and reorders the pre-existing EIP150/EIP158/Byzantium entries into chronological order; strictly additive with no behavior change for existing fixtures.

Confidence Score: 3/5

The core logic and goevmlab wire protocol are correct, but there is a concrete defect in how the subcommand signals errors back to the fuzzer.

When any RunnerError is returned via ? in run_case — a VM setup failure, a failed state-transition call, or a bad fixture argument — Rust's Termination trait exits with code 1, the same code deliberately chosen for state-root mismatches. goevmlab interprets exit code 1 as a tolerated differential result and would silently absorb real execution errors rather than flagging them as crashes. The rest of the PR (deserializer additions, clap wiring, trace emission order) is straightforward and low-risk.

tooling/ef_tests/state_v2/src/modules/statetest.rs — specifically the run() and run_case() error propagation paths.

Important Files Changed

Filename	Overview
tooling/ef_tests/state_v2/src/modules/statetest.rs	New file implementing the goevmlab statetest subcommand; exit code 1 is used for both state-root mismatches and propagated RunnerErrors, collapsing the crash/mismatch distinction the interface contract requires.
tooling/ef_tests/state_v2/src/main.rs	Clean addition of optional `statetest` subcommand via clap; backwards-compatible with the existing bulk runner path.
tooling/ef_tests/state_v2/src/modules/deserialize.rs	Adds Osaka, BPO1–5, and Amsterdam fork mappings; reorders EIP150/EIP158/Byzantium to appear in chronological order. No logic changes, strictly additive.
tooling/ef_tests/state_v2/src/modules/mod.rs	Trivial: exports the new `statetest` module.

Sequence Diagram

sequenceDiagram
    participant goevmlab
    participant ef_tests_binary as ef_tests-statev2
    participant statetest as statetest::run()
    participant parse as parse_file()
    participant run_case as run_case()
    participant VM as LEVM VM
    participant psr as post_state_root()

    goevmlab->>ef_tests_binary: statetest --json fixture
    ef_tests_binary->>statetest: run(StatetestOptions)
    statetest->>parse: parse_file(path, false)
    parse-->>statetest: "Vec<Test>"
    loop for each (test, test_case)
        statetest->>run_case: run_case(test, test_case, emit_trace)
        run_case->>VM: VM::new(env, db, tx, ...)
        run_case->>VM: "vm.opcode_tracer = LevmOpcodeTracer::new(...)"
        run_case->>VM: vm.execute()
        VM-->>run_case: execution result (ignored)
        Note over run_case: emit opcode JSONL steps to stderr
        run_case->>psr: post_state_root(account_updates, ...)
        psr-->>run_case: computed H256
        Note over run_case: eprintln stateRoot line to stderr
        run_case-->>statetest: mismatch: bool
    end
    statetest-->>ef_tests_binary: ExitCode 0 or 1
    ef_tests_binary-->>goevmlab: exit code + stderr JSONL + stateRoot lines

Prompt To Fix All With AI

Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
tooling/ef_tests/state_v2/src/modules/statetest.rs:55-73
**Exit code collision between errors and mismatches**

When `run_case` returns `Err(RunnerError)` — e.g., `VMError` from `VM::new()`, `FailedToGetAccountsUpdates` from `get_state_transitions`, or a `RunnerError::Custom` from `fixture_path()` — Rust's `Termination` impl for `Result<ExitCode, E: Debug>` emits `ExitCode::FAILURE` which is `1` on every platform. This is identical to the `ExitCode::from(1)` path for state-root mismatches. The PR documentation says exit code `1` means "mismatch (tolerated)" and any higher value means "actual crash," but actual crashes via `?`-propagated errors are silently folded into the mismatch bucket. goevmlab would treat a VM setup failure or state-transition error the same way it treats a normal differential result, hiding real bugs from the fuzzer.

### Issue 2 of 3
tooling/ef_tests/state_v2/src/modules/statetest.rs:48-50
When neither `--json` nor `--json-outcome` is supplied (both are `Option`al so clap won't reject the invocation), this error fires with a message that sounds like "both were provided." Clap's `conflicts_with` already prevents the `(Some, Some)` case, so the only real runtime scenario is `(None, None)`. A clearer message would help users.

```suggestion
            (None, None) => Err(RunnerError::Custom(
                "one of --json or --json-outcome must be provided".to_string(),
            )),
            _ => Err(RunnerError::Custom(
                "exactly one of --json or --json-outcome must be provided".to_string(),
            )),
```

### Issue 3 of 3
tooling/ef_tests/state_v2/src/modules/statetest.rs:115-117
**`get_state_transitions` called on a clone of `vm.db`**

`backends::levm::LEVM::get_state_transitions(&mut vm.db.clone())` clones the entire generalized database, then immediately passes the temporary clone by mutable reference and discards it. This mirrors the same pattern in `result_check::check_root`, so it is functionally correct, but the DB clone can be expensive for large pre-state fixtures. Worth noting as a candidate for a shared refactor once the `run_case` API stabilizes — no change needed now.

_{Reviews (1): Last reviewed commit: "feat(l1): add statetest subcommand to ef..." | Re-trigger Greptile}

greptile-apps · 2026-05-15T18:03:55Z

+pub async fn run(opts: StatetestOptions) -> Result<ExitCode, RunnerError> {
+    let (path, emit_trace) = opts.fixture_path()?;
+    let tests = parse_file(path, false)?;
+
+    let mut any_mismatch = false;
+    for test in &tests {
+        for test_case in &test.test_cases {
+            let mismatch = run_case(test, test_case, emit_trace).await?;
+            if mismatch {
+                any_mismatch = true;
+            }
+        }
+    }
+
+    Ok(if any_mismatch {
+        ExitCode::from(1)
+    } else {
+        ExitCode::SUCCESS
+    })


Exit code collision between errors and mismatches

When run_case returns Err(RunnerError) — e.g., VMError from VM::new(), FailedToGetAccountsUpdates from get_state_transitions, or a RunnerError::Custom from fixture_path() — Rust's Termination impl for Result<ExitCode, E: Debug> emits ExitCode::FAILURE which is 1 on every platform. This is identical to the ExitCode::from(1) path for state-root mismatches. The PR documentation says exit code 1 means "mismatch (tolerated)" and any higher value means "actual crash," but actual crashes via ?-propagated errors are silently folded into the mismatch bucket. goevmlab would treat a VM setup failure or state-transition error the same way it treats a normal differential result, hiding real bugs from the fuzzer.

Prompt To Fix With AI

This is a comment left during a code review. Path: tooling/ef_tests/state_v2/src/modules/statetest.rs Line: 55-73 Comment: **Exit code collision between errors and mismatches** When `run_case` returns `Err(RunnerError)` — e.g., `VMError` from `VM::new()`, `FailedToGetAccountsUpdates` from `get_state_transitions`, or a `RunnerError::Custom` from `fixture_path()` — Rust's `Termination` impl for `Result<ExitCode, E: Debug>` emits `ExitCode::FAILURE` which is `1` on every platform. This is identical to the `ExitCode::from(1)` path for state-root mismatches. The PR documentation says exit code `1` means "mismatch (tolerated)" and any higher value means "actual crash," but actual crashes via `?`-propagated errors are silently folded into the mismatch bucket. goevmlab would treat a VM setup failure or state-transition error the same way it treats a normal differential result, hiding real bugs from the fuzzer. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-05-15T18:03:56Z

+            _ => Err(RunnerError::Custom(
+                "exactly one of --json or --json-outcome must be provided".to_string(),
+            )),


When neither --json nor --json-outcome is supplied (both are Optional so clap won't reject the invocation), this error fires with a message that sounds like "both were provided." Clap's conflicts_with already prevents the (Some, Some) case, so the only real runtime scenario is (None, None). A clearer message would help users.

Suggested change

_ => Err(RunnerError::Custom(

"exactly one of --json or --json-outcome must be provided".to_string(),

)),

(None, None) => Err(RunnerError::Custom(

"one of --json or --json-outcome must be provided".to_string(),

)),

_ => Err(RunnerError::Custom(

"exactly one of --json or --json-outcome must be provided".to_string(),

)),

Prompt To Fix With AI

This is a comment left during a code review. Path: tooling/ef_tests/state_v2/src/modules/statetest.rs Line: 48-50 Comment: When neither `--json` nor `--json-outcome` is supplied (both are `Option`al so clap won't reject the invocation), this error fires with a message that sounds like "both were provided." Clap's `conflicts_with` already prevents the `(Some, Some)` case, so the only real runtime scenario is `(None, None)`. A clearer message would help users. ```suggestion (None, None) => Err(RunnerError::Custom( "one of --json or --json-outcome must be provided".to_string(), )), _ => Err(RunnerError::Custom( "exactly one of --json or --json-outcome must be provided".to_string(), )), ``` How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-05-15T18:03:57Z

+    let account_updates = backends::levm::LEVM::get_state_transitions(&mut vm.db.clone())
+        .map_err(|e| RunnerError::FailedToGetAccountsUpdates(e.to_string()))?;
+    let computed_root = post_state_root(&account_updates, initial_block_hash, storage);


get_state_transitions called on a clone of vm.db

backends::levm::LEVM::get_state_transitions(&mut vm.db.clone()) clones the entire generalized database, then immediately passes the temporary clone by mutable reference and discards it. This mirrors the same pattern in result_check::check_root, so it is functionally correct, but the DB clone can be expensive for large pre-state fixtures. Worth noting as a candidate for a shared refactor once the run_case API stabilizes — no change needed now.

Prompt To Fix With AI

This is a comment left during a code review. Path: tooling/ef_tests/state_v2/src/modules/statetest.rs Line: 115-117 Comment: **`get_state_transitions` called on a clone of `vm.db`** `backends::levm::LEVM::get_state_transitions(&mut vm.db.clone())` clones the entire generalized database, then immediately passes the temporary clone by mutable reference and discards it. This mirrors the same pattern in `result_check::check_root`, so it is functionally correct, but the DB clone can be expensive for large pre-state fixtures. Worth noting as a candidate for a shared refactor once the `run_case` API stabilizes — no change needed now. How can I resolve this? If you propose a fix, please make it concise.

ElFantasma · 2026-05-15T19:03:07Z

Follow-up commit fa34b00bc addresses the three concrete bugs flagged across the reviews:

Fixed

Exit-code collision between mismatches and internal errors (Greptile P1, Codex High, Claude Medium). main now returns ExitCode directly. Any RunnerError propagated from the statetest path maps to ExitCode::from(2); deliberate state-root mismatches stay at 1; success stays at 0. goevmlab can now distinguish a tolerated differential from an actual crash.
Silent false-green on Osaka/BPO fixtures (Codex Medium). types.rs::DEFAULT_FORKS only allows Merge/Shanghai/Cancun/Prague/Amsterdam, so adding Osaka/BPO1–5 to the deserializer made those entries parse but Tests::from silently dropped them, leaving test_cases empty and producing exit 0 with no stateRoot emitted. statetest::run now errors (→ exit 2) when every test_cases list is empty.
--json / --json-outcome required-one-of constraint (Claude Minor, Greptile P2). StatetestOptions now uses #[group(required = true, multiple = false)] so clap rejects the invocation at parse time with a clear usage message, instead of the previous runtime RunnerError::Custom whose message was also misleading for the (None, None) case.

Not addressed (with rationale)

stateRoot zero-padding (Claude Medium): non-bug. H256's LowerHex impl writes every byte as {:02x}, always 64 hex chars regardless of leading zeros. Confirmed in the smoke test outputs — every line emitted is exactly 0x + 64 chars.
vm.db.clone() before get_state_transitions (Kimi, Claude, Codex): pre-existing pattern, mirroring result_check::check_root. Greptile correctly noted "no change needed now". A real refactor target but unrelated to this PR's scope.
Compound |= assignment nit (Claude): adopted in the new commit alongside the other changes (any_mismatch |= run_case(...)?;).
expectException not checked (Codex High): valid concern for direct EF-fixture use, but out of scope for the goevmlab CLI. goevmlab compares state roots across implementations and never feeds us expectException-bearing fixtures (its generator produces simple txs). The bulk runner still enforces this check via result_check::check_exception.
txbytes vs re-signed tx (Codex Open Question): re-using the existing get_tx_from_test_case helper that re-signs matches what state_v2's bulk runner does and what the issue's spec requested. Re-using raw txbytes is a separate ticket about test-runner accuracy, independent of the goevmlab interface.
Flattened RunnerOptions conflict (Claude Risk): RunnerOptions has no required fields, both subcommand and no-subcommand invocations work in smoke tests. Speculative; not reproducible.

…codeStep

…ction across EL clients in kurtosis

…spect

…fers from geth/besu)

…er newtypes OpcodeStep and OpcodeTraceResult become pure data — no Serialize impl. Wire formats live in newtypes: - StructLoggerStep / StructLoggerResult — the geth-RPC structLogger shape used by debug_traceTransaction (string op, no opName, decimal gas, stack:null when disabled). Verified live against geth and besu. - Eip3155Step — strict EIP-3155 spec shape used by streaming sinks feeding EIP-3155-conformant tooling (numeric op + opName, hex gas, stack:[] when disabled). debug_traceTransaction now wraps with StructLoggerResult. The EIP-3155 wrapper will be used by the statetest CLI subcommand (#6575) feeding goevmlab.

…a StructLoggerEmit Adds an explicit emit-options struct passed alongside OpcodeStep/OpcodeTraceResult to the structLogger wire serializers. memSize, returnData, and refund are now emitted only when the caller opts in. Defaults match geth's empirical debug_traceTransaction output (all three suppressed unless the corresponding capture flag is on). RPC handler maps OpcodeTracerConfig → StructLoggerEmit, so requesting enableMemory turns on memSize emission and enableReturnData turns on returnData. This eliminates the extra-fields divergence vs geth/besu on the same debug_traceTransaction call observed in cross-client comparisons.

…itempty behavior Geth's debug_traceTransaction emits the refund field whenever it's non-zero and suppresses it when zero (omitempty-on-uint pattern). Our wire output now matches this: refund is always emitted on non-zero steps regardless of the emit flag; the flag now controls only whether a zero-valued refund is force- emitted. Closes the last remaining systematic per-step diff vs geth on tracer output.

Match geth's structLogger behavior: each SLOAD/SSTORE step's storage field carries the full set of slots touched in the transaction so far, not just the slot being accessed at that step. Implementation: LevmOpcodeTracer carries a cumulative BTreeMap<H256, H256> updated in pre_step_capture whenever storage_kv is Some. Each SLOAD/SSTORE step's storage field is overwritten with a clone of that map. Synthesized steps (e.g. fused JUMPDEST) pass storage_kv: None and are unaffected. Closes the storage-side cross-client diff vs geth observed via the trace comparison tooling; per Edgar the non-cumulative behavior was unintentional.

…ructLogger timing Geth's OnOpcode hook fires after the opcode's gas+refund accounting has been applied but before state mutation, so the refund counter shown on an SSTORE (or pre-London SELFDESTRUCT) step reflects the credit that opcode just added. LEVM's pre_step_capture snapshotted the refund counter entirely pre-opcode, which made SSTORE-credited refunds appear one step late vs geth. Fix: extend finalize_step (already used to patch gas_cost post-handler) to also patch refund with the post-execution value (substate.refunded_gas). For opcodes that don't mutate refund — every opcode except SSTORE and pre-London SELFDESTRUCT — post == pre so this is a no-op. Closes #6672.

…uzzing

…enforce --json/--json-outcome at clap level

…p wrapper The OpcodeStep refactor on feat/eip-3155-tracer split the data type from the wire format: StructLoggerStep (geth-RPC, used by debug_traceTransaction) and Eip3155Step (strict spec, what goevmlab's gencodec opLog unmarshaler expects). Switch the statetest CLI sink to the latter.

parser::parse_file and parse_dir unconditionally .unwrap()ed on File::open, read_dir, file_type, and serde_json::from_reader, so any missing fixture or malformed JSON triggered an exit-101 panic. Under the statetest CLI this looked to goevmlab indistinguishable from a real client crash. Replace each unwrap with a RunnerError::ParseFixture { path, source } that the existing main.rs error handler maps to ExitCode::from(2). goevmlab now sees '2' for operator errors and reserves the higher codes for real crashes.

ElFantasma · 2026-05-19T20:12:35Z

Verification + parser hardening update

Rebased on the latest feat/eip-3155-tracer tip and verified end-to-end that the wire output matches what goevmlab's gen_oplog.go unmarshaler expects.

Commits now on the branch

feat(l1): add statetest subcommand to ef_tests-statev2 for goevmlab fuzzing — the original implementation.
fix(l1): distinguish statetest internal errors from root mismatches, enforce --json/--json-outcome at clap level — exit-code semantics + clap ArgGroup.
feat(l1): emit EIP-3155 wire shape from statetest sink via Eip3155Step wrapper — switches the per-step serializer to the new Eip3155Step newtype landed in feat(l1): opcodeTracer (EIP-3155 per-opcode tracer) #6595. Strict EIP-3155 shape: numeric op byte + separate opName, "0xN" hex gas/gasCost/refund, stack: [] (never null) when disabled.
fix(l1): propagate fixture parse errors instead of panicking — parser::parse_file / parse_dir previously .unwrap()-ed on file-not-found, malformed JSON, etc., producing an exit-101 panic that goevmlab would read as a real client crash. Now returns RunnerError::ParseFixture { path, source } which the main handler maps to exit code 2 with a clear message.

Verification matrix

Run against cancun/eip5656_mcopy/test_mcopy_on_empty_memory.json (16 cases, 288 trace steps, 16 stateRoot lines):

Check	Result
`--json` emits per-op JSONL + stateRoot to stderr	exit 0, 304 lines
`--json-outcome` emits only stateRoot to stderr	exit 0, 16/16 stateRoot lines
stdout silent in both modes	✓
Field types per step: `pc int, op int (0-255), opName str, gas "0xN", gasCost "0xN", refund "0xN", memSize int, depth int, stack [], returnData "0x..."`	10/10 fields correct
goevmlab `gen_oplog.go`-style UnmarshalJSON parse (Python mimic of `HexOrDecimal64` and `vm.OpCode`)	every line round-trips
stateRoot lines match the literal byte pattern goevmlab scans for (`"stateRoot":"0x<64 hex>"`)	✓
Storage accumulation across SSTOREs (inherited from #6595)	SSTORE #0: 1 slot · #1: 2 slots · #2: 3 slots · …
Missing fixture file	exit 2 (was exit 101)
Malformed JSON	exit 2 (was exit 101)
No-flag invocation	clap rejects with exit 2 + helpful message

Exit-1 (post-state root mismatch) path is straightforward (any_mismatch |= run_case(...)) and exercised by its dual (root-match → exit 0) above. Provoking a real mismatch needs an intentionally wrong fixture which we don't have lying around; happy to add a synthetic regression test if reviewers want it.

Example output

$ cargo run --release -p ef_tests-statev2 -- statetest --json fixture.json 2> trace.jsonl
$ head -2 trace.jsonl
{"pc":0,"op":96,"gas":"0xeee94","gasCost":"0x3","memSize":0,"stack":[],"depth":1,"returnData":"0x","refund":"0x0","opName":"PUSH1"}
{"pc":2,"op":53,"gas":"0xeee91","gasCost":"0x3","memSize":0,"stack":["0x40"],"depth":1,"returnData":"0x","refund":"0x0","opName":"CALLDATALOAD"}
$ grep -c stateRoot trace.jsonl
16

cargo clippy --all-targets -- -D warnings and cargo fmt --check clean.

Adds three regression tests around the wire contract our subcommand depends on: - eip3155_step_matches_goevmlab_oplog_shape: pins that Eip3155Step emits 'op' as a numeric byte, 'gas'/'gasCost' as '0xN' hex strings, 'stack' as a non-null array — the field types gen_oplog.go expects to unmarshal. - eip3155_step_stack_disabled_renders_as_empty_array: pins the EIP-3155 MUST rule that disabled stack capture renders as [] (not null). - stateroot_line_pins_literal_goevmlab_scan_pattern: pins the literal '"stateRoot":"0x<64 lowercase hex>"' substring goevmlab byte-scans for, and that H256 LowerHex zero-pads even small values. Extracts the stateRoot line formatter into a small helper so the test can assert on the string without reaching into eprintln!.

edg-l and others added 17 commits May 14, 2026 09:15

refactor(l1): use strict EIP-3155 step format instead of geth structL…

f4b4045

…ogLegacy

chore(test): add trailing newlines to EIP-3155 fixtures

bab41c0

refactor: rename StructLog types to OpcodeStep (opcodeTracer)

d3773e7

chore(perf): drop opcode_tracer disabled-path microbench

7af79f1

The microbench (PUSH1/POP loop, no `main` baseline, no enabled-path variant) only confirmed the disabled-path branch is hot-path-clean. It served as dev scaffolding; not worth carrying long-term.

refactor(l1): opcodeTracer review fixes

4f8f3f9

- omit `opName` for unknown opcodes (geth-compatible, EIP-3155 allows) - SLOAD: omit storage entry on any read failure, not just non-AccountNotFound - drop unused `addr` from `read_storage_for_trace` tuple

Merge branch 'main' into feat/eip-3155-tracer

cb8f2be

Merge branch 'main' into feat/eip-3155-tracer

d058237

ElFantasma requested a review from a team as a code owner May 15, 2026 17:58

github-actions Bot assigned ElFantasma May 15, 2026

github-actions Bot added the L1 Ethereum client label May 15, 2026

github-project-automation Bot added this to ethrex_l1 May 15, 2026

greptile-apps Bot reviewed May 15, 2026

View reviewed changes

ElFantasma and others added 2 commits May 15, 2026 16:13

Merge branch 'main' into feat/eip-3155-tracer

450f5c2

fix(common): emit EIP-3155-compliant op/opName/gas/stack fields in Op…

dc11a20

…codeStep

ElFantasma and others added 13 commits May 18, 2026 11:55

feat(tooling): add trace_compare helper for diffing debug_traceTransa…

c157f7a

…ction across EL clients in kurtosis

fix(tooling): make trace_compare script work on macOS's bash 3.2

7576dec

fix(tooling): parse service name from column 2 of kurtosis enclave in…

0197fc7

…spect

fix(tooling): explicitly request opcodeTracer for ethrex (default dif…

92c00ff

…fers from geth/besu)

Merge branch 'main' into feat/eip-3155-tracer

52c717c

feat(l1): add statetest subcommand to ef_tests-statev2 for goevmlab f…

3c6f9b0

…uzzing

fix(l1): distinguish statetest internal errors from root mismatches, …

d0efbb1

…enforce --json/--json-outcome at clap level

ElFantasma force-pushed the feat/6575-statetest-cli branch from fa34b00 to 7d640ba Compare May 19, 2026 18:38

iovoid approved these changes May 20, 2026

View reviewed changes

github-project-automation Bot moved this to In Review in ethrex_l1 May 20, 2026

Base automatically changed from feat/eip-3155-tracer to main May 20, 2026 21:09

ElFantasma added 2 commits May 21, 2026 08:21

Merge branch 'main' into feat/6575-statetest-cli

37be064

Merge branch 'main' into feat/6575-statetest-cli

8f202b5

ilitteri approved these changes May 21, 2026

View reviewed changes

azteca1998 approved these changes May 21, 2026

View reviewed changes

ElFantasma added this pull request to the merge queue May 21, 2026

Merged via the queue into main with commit 5b33379 May 21, 2026
55 checks passed

ElFantasma deleted the feat/6575-statetest-cli branch May 21, 2026 21:29

github-project-automation Bot moved this from In Review to Done in ethrex_l1 May 21, 2026

ElFantasma mentioned this pull request May 22, 2026

fix(l1): statetest CLI follow-ups (optional _info, Osaka, BLOCKHASH convention) #6721

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(l1): add statetest subcommand to ef_tests-statev2 for goevmlab fuzzing#6663

feat(l1): add statetest subcommand to ef_tests-statev2 for goevmlab fuzzing#6663
ElFantasma merged 37 commits into
mainfrom
feat/6575-statetest-cli

ElFantasma commented May 15, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 15, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

greptile-apps Bot commented May 15, 2026

Important Files Changed

Uh oh!

greptile-apps Bot May 15, 2026

Uh oh!

greptile-apps Bot May 15, 2026

Uh oh!

greptile-apps Bot May 15, 2026

Uh oh!

ElFantasma commented May 15, 2026

Uh oh!

ElFantasma commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

ElFantasma commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Known Issues — intentionally skipped tests

Known Issues

EF Tests — Stateless coverage narrowed to EIP-8025 optional-proofs

Uh oh!

github-actions Bot commented May 15, 2026

🤖 Kimi Code Review

Issues and Suggestions

1. Unnecessary database clone (Performance)

2. Tuple return clarity (Readability)

3. Fork variant synchronization (Maintenance)

4. Error handling for JSON serialization (Robustness)

Uh oh!

github-actions Bot commented May 15, 2026

🤖 Claude Code Review

Code Review: feat(l1): add statetest subcommand to ef_tests-statev2 for goevmlab fuzzing

Overview

statetest.rs — New File

main.rs

deserialize.rs

Summary

Uh oh!

github-actions Bot commented May 15, 2026

🤖 Codex Code Review

Uh oh!

greptile-apps Bot commented May 15, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

ElFantasma commented May 15, 2026

Uh oh!

ElFantasma commented May 19, 2026

Verification + parser hardening update

Commits now on the branch

Verification matrix

Example output

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

ElFantasma commented May 15, 2026 •

edited

Loading

github-actions Bot commented May 15, 2026 •

edited

Loading

Code Review: `feat(l1): add statetest subcommand to ef_tests-statev2 for goevmlab fuzzing`

`statetest.rs` — New File

`main.rs`

`deserialize.rs`