Skip to content

feat(l1): add statetest subcommand to ef_tests-statev2 for goevmlab fuzzing#6663

Merged
ElFantasma merged 37 commits into
mainfrom
feat/6575-statetest-cli
May 21, 2026
Merged

feat(l1): add statetest subcommand to ef_tests-statev2 for goevmlab fuzzing#6663
ElFantasma merged 37 commits into
mainfrom
feat/6575-statetest-cli

Conversation

@ElFantasma
Copy link
Copy Markdown
Contributor

@ElFantasma ElFantasma commented May 15, 2026

Motivation

goevmlab is the standard differential-fuzzing harness for EL clients (geth, nethermind, besu, erigon, nimbus, evmone, eels, revm). It runs the same EF state test through every configured EVM and diffs their per-opcode (EIP-3155) traces and post-state roots. To plug ethrex into that fleet, goevmlab needs a tiny CLI surface on ethrex that emits EIP-3155 trace JSONL plus a stateRoot line on stderr. This PR provides that.

Description

Adds a statetest subcommand to the existing ef_tests-statev2 binary. Two modes:

  • statetest --json <fixture>: emits EIP-3155 per-opcode JSONL on stderr, followed by one {"stateRoot":"0x..."} line per (fork, post-index) case in the fixture.
  • statetest --json-outcome <fixture>: same but without the per-opcode trace — only the stateRoot line(s).

Output goes exclusively to stderr (stdout reserved for crash diagnostics, matching geth/revm convention). Exit code 0 means every case's computed post-state root matched the fixture's expected root; 1 means at least one mismatch (tolerated by goevmlab); any other non-zero / panic indicates an actual crash.

The subcommand reuses the existing fixture parser, env/tx builders, and post-state-root computation from state_v2. The per-opcode trace is generated by LevmOpcodeTracer (added in #6595) — for each test case we activate the tracer, execute the tx, and serialize each OpcodeStep as one JSONL line. The geth-structLogger wrapper used by the RPC adapter is sidestepped: goevmlab expects raw JSONL.

Also rounds out the state_v2 fixture deserializer with the fork mappings for Osaka, Amsterdam, and BPO1BPO5. These variants are already in the Fork enum but were missing from the deserializer's match arm, so any fixture's post map containing them aborted parsing (the unmodified strict-error behavior is preserved for truly unknown fork names — adding a new variant remains a build break that has to be addressed).

Backwards-compatible: running the binary without a subcommand still triggers the existing bulk runner. Branch stacked on top of #6595 (feat/eip-3155-tracer) since the implementation depends on its LevmOpcodeTracer.

Closes #6575

How to Test

cd tooling

# full per-opcode trace + stateRoot per case
cargo run --release -p ef_tests-statev2 -- statetest --json \
  ef_tests/state/vectors/state_tests/cancun/eip5656_mcopy/test_mcopy_on_empty_memory.json \
  2> trace.jsonl
echo "exit=$?"
head -3 trace.jsonl     # per-opcode JSONL: {"pc":..., "op":..., ...}
grep stateRoot trace.jsonl   # final stateRoot line(s)

# outcome only
cargo run --release -p ef_tests-statev2 -- statetest --json-outcome \
  ef_tests/state/vectors/state_tests/cancun/eip5656_mcopy/test_mcopy_on_empty_memory.json \
  2>&1

Checklist

edg-l and others added 17 commits May 14, 2026 09:15
Add per-opcode StructLog tracing wired through `debug_traceTransaction` and
`debug_traceBlockByNumber`. Output is byte-compatible with geth's
`structLogLegacy` JSON shape so goevmlab and EELS reference traces diff
cleanly.

- `crates/common/tracing.rs`: `StructLog`, `MemoryChunk`, `StructLogResult`
  with manual Serialize impls matching geth's `toLegacyJSON` field-by-field
  (decimal pc/gas/gasCost, opcode-name op string, geth uint256 hex stack,
  chunked memory, accumulated per-contract storage at SLOAD/SSTORE,
  omitempty refund/error/returnData).
- `crates/vm/levm/src/struct_log_tracer.rs`: `LevmStructLogTracer` with
  zero-cost `active: bool` gate. `pre_step_capture` + `finalize_step` keep
  the dispatch-loop hook to one branch when disabled.
- `crates/vm/levm/src/vm.rs`: dispatch-loop hook (pc captured pre-advance,
  stack reversed top→bottom for wire format), helper methods, end-of-tx
  capture using post-refund `gas_spent` (matches geth's `receipt.GasUsed`).
- `crates/vm/levm/src/opcode_handlers/system.rs`: explicit `gasCost` for
  CALL/CALLCODE/DELEGATECALL/STATICCALL/CREATE/CREATE2 = `intrinsic +
  callGasTemp`, matching geth.
- `crates/vm/backends/levm/tracing.rs`, `crates/vm/tracing.rs`,
  `crates/blockchain/tracing.rs`: `trace_tx_struct_log` plumbing.
- `crates/networking/rpc/tracing.rs`: `TracerType::StructLogger` variant
  (alias `structLog`), `StructLogTracerConfig` with five geth-aligned
  flags. Default tracer remains `CallTracer` for Blockscout compat;
  documented divergence from geth.
- Tests: 21 unit tests pinning every wire-format rule against geth source;
  2 LEVM unit tests for dispatch+SSTORE; 3 fixture-diff integration tests
  (`eip3155_sstore_basic.json`, `eip3155_mstore_memory.json`,
  `eip3155_identity_return_data.json`).
- `tooling/scripts/gen_structlog_fixtures.sh`: pinned-geth regen procedure.
- `crates/vm/levm/benches/struct_log_disabled.rs`: Criterion microbench for
  the disabled hot path (~7.7 µs per 2k-opcode loop on dev machine).
Place the three structLog fixture JSONs alongside their integration tests
in the project's standard test crate (test/tests/levm/fixtures/) instead
of cmd/ethrex/tests/fixtures/. Updates the loader path in
struct_log_tracer_tests.rs and the regen-script doc comments.
Wire-format rules and per-opcode capture semantics are already pinned by
the 26 unit tests in ethrex-common and the 2 LEVM unit tests. The three
JSON fixtures + gen helper + geth-targeted shell script added ~600 LoC
of snapshot machinery for end-to-end coverage that one focused smoke
test can provide.

The smoke test exercises the full RPC pipeline
(`LEVM::trace_tx_struct_log` -> `serde_json::to_value`) on a
`PUSH1 PUSH1 SSTORE STOP` program and asserts the resulting JSON has
the EIP-3155 strict wrapper (`pass`/`gasUsed`/`output`/`structLogs`)
and per-step shape (numeric `op`, `opName`, hex `gas`/`gasCost`/`refund`,
always-present `returnData`/`memSize`, single-entry `storage` on SSTORE).

Removes:
- test/tests/levm/fixtures/*.json (3 files)
- test/tests/levm/struct_log_fixture_gen.rs
- tooling/scripts/gen_structlog_fixtures.sh (geth comparison script,
  obsolete after the move to EIP-3155 strict)
"structLog" inherits geth's Go-type jargon and now misleads consumers:
since this PR moved to strict EIP-3155 output, clients sending
`tracer: "structLog"` expecting geth's structLogLegacy shape get a
different format. The new name is self-describing and sits naturally
beside `callTracer` and `prestateTracer`. No aliases — clients pass
`"opcodeTracer"` explicitly.
Match the prestateTracer/callTracer convention: all tracer tests live
under `test/tests/levm/`, none in `crates/`.

- Deletes the 26 unit tests in `crates/common/tracing.rs` (Serialize
  field-by-field assertions). The single-field tests were dev scaffolding;
  end-to-end coverage now lives in `opcode_tracer_tests.rs`.
- Deletes the 2 LEVM unit tests + in-tracer TestDb harness in
  `crates/vm/levm/src/opcode_tracer.rs`. Same coverage rebuilt as
  bytecode-driven e2e tests via the real `LEVM::trace_tx_opcodes` entry
  point and the shared `TestDatabase` fixture.
- Renames `struct_log_tracer_tests.rs` -> `opcode_tracer_tests.rs` to
  match the existing `prestate_tracer_tests.rs` naming.
- New e2e set covers: basic execution + wrapper shape, single-entry
  storage on SSTORE, memory capture under `enableMemory`, return-data
  capture under `enableReturnData`, stack=null under `disableStack`.
The microbench (PUSH1/POP loop, no `main` baseline, no enabled-path
variant) only confirmed the disabled-path branch is hot-path-clean.
It served as dev scaffolding; not worth carrying long-term.
Renaming `StructLogger` to `OpcodeTracer` made all three variants share
the `Tracer` suffix, which clippy flags via `enum_variant_names`. The
suffix is required because `rename_all = "camelCase"` derives the
externally-fixed wire names `callTracer` / `prestateTracer` /
`opcodeTracer` from the variant identifiers.
- Add CLZ (0x1E) and SLOTNUM (0x4B) names to opcode_name.
- Drop redundant OpcodeTracerRpcConfig; deserialize OpcodeTracerConfig directly.
- Replace total_size counter with last_step_captured flag so finalize_step
  doesn't clobber the last retained entry once the limit cap is hit.
- Use call frame `to` (storage context) instead of `code_address` for
  SLOAD/SSTORE capture so DELEGATECALL/CALLCODE record under the caller's
  account.
- Replace unsafe transmute-based U256->H256 with BigEndianHash::from_uint.
- Narrow SLOAD error fallback: only AccountNotFound returns zero; other
  DB errors omit the storage entry instead of fabricating a value.
- omit `opName` for unknown opcodes (geth-compatible, EIP-3155 allows)
- SLOAD: omit storage entry on any read failure, not just non-AccountNotFound
- drop unused `addr` from `read_storage_for_trace` tuple
Align debug_traceTransaction output with the cross-client structLogger shape:
{failed, gas, returnValue, structLogs}; per-step gas/gasCost/refund as numbers,
op as the mnemonic string (opName dropped). EIP-3155 step content preserved
(memSize, returnData, refund always emitted).

Fix: jump() fused JUMP/JUMPI with the destination JUMPDEST, dropping the
JUMPDEST step from the trace and inflating the parent's gasCost by 1. Now
synthesizes a JUMPDEST entry when the tracer is active; the disabled hot path
keeps the fusion. last_step_captured replaced with last_step_index so the
synthetic entry doesn't shadow the parent's finalize_step patch target.
…ernode

- ethereum-package pinned at e4b3305 (2025-04) -> 71b02f6 (current main).
  Required to pick up the besu launcher fix that drops CLIQUE from
  --rpc-http-api (besu 26.x removed the namespace).
- geth v1.15.2 -> v1.17.3; besu main-142a5e6 -> main-6d54451;
  lighthouse v8.0.0-rc.1 -> v8.1.3 (the rc is over a year old).
- supernode: true on the ethrex participant. With Fulu at epoch 0,
  the package now requires at least one supernode, a node with 128+
  validators, or perfect_peerdas_enabled.
…PDEST

`build_jumpdest_step` was duplicating ~50 lines of conditional stack/memory/
return-data construction already present in `LevmOpcodeTracer::pre_step_capture`.
Extract a shared `opcode_tracer::build_step` free function that takes the cfg
and the raw inputs, leaving both call sites as thin "collect VM state, hand
off" wrappers.

Also documents that `opcode_name`'s table is intentionally fork-agnostic
(matching geth's `opCodeToString`) — fork validity is enforced at dispatch
via the VM's per-fork opcode table, so the name lookup never fires for
invalid-for-this-fork bytes in practice.

Addresses iovoid's review comments on PR #6595.
@ElFantasma ElFantasma requested a review from a team as a code owner May 15, 2026 17:58
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 15, 2026

⚠️ Known Issues — intentionally skipped tests

Source: docs/known_issues.md

Known Issues

Tests intentionally excluded from CI. Source of truth for the Known
Issues
section the L1 workflow appends to each ef-tests job summary
and posts as a sticky PR comment.

EF Tests — Stateless coverage narrowed to EIP-8025 optional-proofs

make -C tooling/ef_tests/blockchain test calls test-stateless-zkevm
instead of test-stateless. The zkevm@v0.3.3 fixtures are filled against
bal@v5.6.1, out of sync with current bal spec; the broad target trips ~549
fixtures. Re-broaden once the zkevm bundle is regenerated.

Why and resolution path

PR #6527 broadened
test-stateless to extract the entire for_amsterdam/ tree from the
zkevm bundle and run all of it under --features stateless; combined with
this branch's bal-devnet-7 semantics that scope produces ~549
GasUsedMismatch / ReceiptsRootMismatch /
BlockAccessListHashMismatch failures.

test-stateless-zkevm filters cargo to the eip8025_optional_proofs
suite, which still validates the stateless harness without the bal-version
mismatch.

Re-broaden by switching test: back to test-stateless in
tooling/ef_tests/blockchain/Makefile once the zkevm bundle is regenerated
against the current bal spec.

@github-actions github-actions Bot added the L1 Ethereum client label May 15, 2026
@github-actions
Copy link
Copy Markdown

🤖 Kimi Code Review

Overall this is a well-structured PR that correctly implements the goevmlab differential fuzzing interface. The code follows Rust idioms and properly handles the Ethereum state test semantics.

Issues and Suggestions

1. Unnecessary database clone (Performance)

File: tooling/ef_tests/state_v2/src/modules/statetest.rs
Line: 115

let account_updates = backends::levm::LEVM::get_state_transitions(&mut vm.db.clone())

The vm.db.clone() creates a full copy of the database state. Since vm is not used after this point (the opcode tracer logs are accessed earlier at lines 100-104), you likely don't need to clone unless get_state_transitions consumes self by value.

Recommendation: If get_state_transitions only requires &mut self or &self, remove the .clone() to avoid the memory overhead. If it does consume self, add a comment explaining why the clone is necessary (to preserve vm for the trace access above).

2. Tuple return clarity (Readability)

File: tooling/ef_tests/state_v2/src/modules/statetest.rs
Line: 44

The fixture_path method returns Result<(&PathBuf, bool), RunnerError>. While the boolean represents emit_trace, this is not immediately obvious at call sites.

Recommendation: Consider returning a small struct or adding a type alias for clarity:

struct FixtureConfig<'a> {
    path: &'a PathBuf,
    emit_trace: bool,
}

3. Fork variant synchronization (Maintenance)

File: tooling/ef_tests/state_v2/src/modules/deserialize.rs
Lines: 143-168

The new fork names ("Osaka", "BPO1"–"BPO5", "Amsterdam") are added with a comment to keep them in sync with crates/common/types/genesis.rs.

Recommendation: Ensure the corresponding Fork enum in genesis.rs actually contains these variants (particularly the "BPO" series which appear to be temporary testnet identifiers). If any are missing, this will fail to compile.

4. Error handling for JSON serialization (Robustness)

File: tooling/ef_tests/state_v2/src/modules/statetest.rs
Line: 102-103

let line = serde_json::to_string(step)
    .map_err(|e| RunnerError::Custom(format!("failed to serialize trace step: {e}")))?;

This correctly handles serialization failures. However, if a single step fails to serialize, the entire test run fails.

Recommendation: This is acceptable for fuzzing


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions
Copy link
Copy Markdown

🤖 Claude Code Review

Here is my code review for PR #6663:


Code Review: feat(l1): add statetest subcommand to ef_tests-statev2 for goevmlab fuzzing

Overview

This PR adds a statetest subcommand to the ef_tests-statev2 binary for goevmlab differential fuzzing. The approach is sound and the implementation is clean overall. A few issues are worth addressing before merge.


statetest.rs — New File

Exit code semantics mismatch (design issue)

The module doc comment promises:

  • 0 = all cases match
  • 1 = at least one mismatch
  • other = crash

However, when run_case (or opts.fixture_path()) returns an Err, it propagates through main's Result<ExitCode, RunnerError>. Rust's Termination impl for Result exits with code 1 when Err is returned, printing the error to stderr. This means parse errors and VM setup errors are indistinguishable from a state-root mismatch to goevmlab — both exit with 1.

Only a bare panic! would give a non-1 exit code. If goevmlab relies on other codes to detect infrastructure failures versus legitimate mismatches, this is a real bug. Consider:

pub async fn run(opts: StatetestOptions) -> ExitCode {
    match inner_run(opts).await {
        Ok(false) => ExitCode::SUCCESS,
        Ok(true) => ExitCode::from(1),
        Err(e) => {
            eprintln!("error: {e}");
            ExitCode::from(2)  // distinguishable from mismatch
        }
    }
}

statetest.rs:37–44 — Missing required-arg constraint at the clap level

Neither --json nor --json-outcome is marked required. Users invoking statetest with no path get a generic RunnerError::Custom at runtime rather than a clap usage error. Consider using an ArgGroup:

#[derive(Args, Debug)]
#[group(required = true, multiple = false)]
pub struct StatetestOptions {
    #[arg(long, value_name = "PATH")]
    pub json: Option<PathBuf>,
    #[arg(long, value_name = "PATH")]
    pub json_outcome: Option<PathBuf>,
}

This gives a proper clap error message with --help guidance.


statetest.rs:96 — Potential leading-zero truncation in stateRoot output

eprintln!("{{\"stateRoot\":\"0x{computed_root:x}\"}}");

If computed_root is of a type whose LowerHex impl does not zero-pad to 64 characters (e.g., a plain U256 or [u8; 32] using a non-padded formatter), the output may silently drop leading zero bytes. goevmlab (and EIP-3155) expect the full 64-character hex representation. Verify that the LowerHex impl for computed_root's type always emits 64 hex characters, or force it explicitly:

eprintln!("{{\"stateRoot\":\"0x{computed_root:064x}\"}}");

The flag 64x would be wrong if the value is already 32 bytes printed as 64 nibbles by the type's formatter, so check what type post_state_root returns.


statetest.rs:91 — Unnecessary clone of the entire database

let account_updates = backends::levm::LEVM::get_state_transitions(&mut vm.db.clone())

This clones the full in-memory DB just to satisfy the &mut requirement. For state tests with large initial state, this is a significant allocation that is immediately discarded. If get_state_transitions cannot take an immutable reference (an API limitation), at least document why the clone is necessary. If it can be changed to &mut vm.db directly, prefer that.


statetest.rs:62–65 — Minor: prefer compound assignment

let mismatch = run_case(test, test_case, emit_trace).await?;
if mismatch {
    any_mismatch = true;
}

More idiomatic:

any_mismatch |= run_case(test, test_case, emit_trace).await?;

main.rs

Potential conflict between flattened RunnerOptions and statetest subcommand

RunnerOptions is #[command(flatten)]-ed into Cli alongside an Option<Command>. If RunnerOptions has any required fields (positional args or required --flags), clap will demand them even when the statetest subcommand is used, making the UX very awkward.

This may already work if all RunnerOptions fields are optional, but it deserves an explicit note or test. If they conflict, the cleaner design is to not flatten RunnerOptions into the top-level Cli struct, and instead only accept them in the None branch (perhaps via a separate Run subcommand).


deserialize.rs

The reordering of EIP150, EIP158, Byzantium to be in chronological order before Constantinople is a clean improvement. The new fork variants (Osaka, BPO1BPO5, Amsterdam) and the comment explaining the "hard error" policy for unknown forks are both correct and well-documented. No issues here.


Summary

Severity Location Item
Medium statetest.rs:56 / main.rs:31 Exit codes 1 (mismatch) and 1 (returned error) are conflated
Medium statetest.rs:96 {:x} may truncate leading zeros on stateRoot
Minor statetest.rs:37 No clap-level required constraint on --json / --json-outcome
Minor statetest.rs:91 Unnecessary full DB clone
Nit statetest.rs:62 Use `
Risk main.rs:14 Flattened RunnerOptions may force unwanted args when using statetest

The deserialize.rs and mod.rs changes are clean and correct. The overall structure is good — the main concerns are the exit-code semantics and the stateRoot hex formatting.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions
Copy link
Copy Markdown

🤖 Codex Code Review

Findings

  1. High: statetest.rs:103 discards vm.execute() entirely and statetest.rs:121 treats “matching stateRoot” as the only success criterion. That diverges from the existing EF semantics in result_check.rs:73 and result_check.rs:132: expectException cases must match the expected error, not just leave the same post-state. As written, a wrong rejection that leaves state untouched will false-pass.

  2. High: the same swallowed result at statetest.rs:103 also suppresses genuine VM bugs. VMError::Internal is explicitly marked as something that should propagate in errors.rs:22, but this path downgrades it to a normal “root mismatch” outcome instead of the documented “actual crash” exit class.

  3. Medium: deserialize.rs:160 adds Osaka/BPO1..5 parsing, but types.rs:30 and types.rs:87 still filter runnable forks to Merge/Shanghai/Cancun/Prague/Amsterdam. For an Osaka/BPO fixture, statetest.rs:59 will execute zero cases and return success with no stateRoot, which is a silent false green.

Open Question

Compile verification was not possible in this sandbox because Cargo needed write access under /home/runner/.cargo and the libssz git dependency was not available locally.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 15, 2026

Greptile Summary

This PR plugs ethrex into the goevmlab differential fuzzing harness by adding a statetest subcommand to ef_tests-statev2. It also rounds out the fixture deserializer with missing fork mappings (Osaka, Amsterdam, BPO1BPO5).

  • statetest subcommand (statetest.rs, main.rs): accepts --json (full EIP-3155 per-opcode JSONL + stateRoot) or --json-outcome (stateRoot only) and emits everything to stderr; exit code 0 = all roots match, 1 = at least one mismatch.
  • Deserializer extension (deserialize.rs): adds seven new fork-name → Fork enum mappings and reorders the pre-existing EIP150/EIP158/Byzantium entries into chronological order; strictly additive with no behavior change for existing fixtures.

Confidence Score: 3/5

The core logic and goevmlab wire protocol are correct, but there is a concrete defect in how the subcommand signals errors back to the fuzzer.

When any RunnerError is returned via ? in run_case — a VM setup failure, a failed state-transition call, or a bad fixture argument — Rust's Termination trait exits with code 1, the same code deliberately chosen for state-root mismatches. goevmlab interprets exit code 1 as a tolerated differential result and would silently absorb real execution errors rather than flagging them as crashes. The rest of the PR (deserializer additions, clap wiring, trace emission order) is straightforward and low-risk.

tooling/ef_tests/state_v2/src/modules/statetest.rs — specifically the run() and run_case() error propagation paths.

Important Files Changed

Filename Overview
tooling/ef_tests/state_v2/src/modules/statetest.rs New file implementing the goevmlab statetest subcommand; exit code 1 is used for both state-root mismatches and propagated RunnerErrors, collapsing the crash/mismatch distinction the interface contract requires.
tooling/ef_tests/state_v2/src/main.rs Clean addition of optional statetest subcommand via clap; backwards-compatible with the existing bulk runner path.
tooling/ef_tests/state_v2/src/modules/deserialize.rs Adds Osaka, BPO1–5, and Amsterdam fork mappings; reorders EIP150/EIP158/Byzantium to appear in chronological order. No logic changes, strictly additive.
tooling/ef_tests/state_v2/src/modules/mod.rs Trivial: exports the new statetest module.

Sequence Diagram

sequenceDiagram
    participant goevmlab
    participant ef_tests_binary as ef_tests-statev2
    participant statetest as statetest::run()
    participant parse as parse_file()
    participant run_case as run_case()
    participant VM as LEVM VM
    participant psr as post_state_root()

    goevmlab->>ef_tests_binary: statetest --json fixture
    ef_tests_binary->>statetest: run(StatetestOptions)
    statetest->>parse: parse_file(path, false)
    parse-->>statetest: "Vec<Test>"
    loop for each (test, test_case)
        statetest->>run_case: run_case(test, test_case, emit_trace)
        run_case->>VM: VM::new(env, db, tx, ...)
        run_case->>VM: "vm.opcode_tracer = LevmOpcodeTracer::new(...)"
        run_case->>VM: vm.execute()
        VM-->>run_case: execution result (ignored)
        Note over run_case: emit opcode JSONL steps to stderr
        run_case->>psr: post_state_root(account_updates, ...)
        psr-->>run_case: computed H256
        Note over run_case: eprintln stateRoot line to stderr
        run_case-->>statetest: mismatch: bool
    end
    statetest-->>ef_tests_binary: ExitCode 0 or 1
    ef_tests_binary-->>goevmlab: exit code + stderr JSONL + stateRoot lines
Loading
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
tooling/ef_tests/state_v2/src/modules/statetest.rs:55-73
**Exit code collision between errors and mismatches**

When `run_case` returns `Err(RunnerError)` — e.g., `VMError` from `VM::new()`, `FailedToGetAccountsUpdates` from `get_state_transitions`, or a `RunnerError::Custom` from `fixture_path()` — Rust's `Termination` impl for `Result<ExitCode, E: Debug>` emits `ExitCode::FAILURE` which is `1` on every platform. This is identical to the `ExitCode::from(1)` path for state-root mismatches. The PR documentation says exit code `1` means "mismatch (tolerated)" and any higher value means "actual crash," but actual crashes via `?`-propagated errors are silently folded into the mismatch bucket. goevmlab would treat a VM setup failure or state-transition error the same way it treats a normal differential result, hiding real bugs from the fuzzer.

### Issue 2 of 3
tooling/ef_tests/state_v2/src/modules/statetest.rs:48-50
When neither `--json` nor `--json-outcome` is supplied (both are `Option`al so clap won't reject the invocation), this error fires with a message that sounds like "both were provided." Clap's `conflicts_with` already prevents the `(Some, Some)` case, so the only real runtime scenario is `(None, None)`. A clearer message would help users.

```suggestion
            (None, None) => Err(RunnerError::Custom(
                "one of --json or --json-outcome must be provided".to_string(),
            )),
            _ => Err(RunnerError::Custom(
                "exactly one of --json or --json-outcome must be provided".to_string(),
            )),
```

### Issue 3 of 3
tooling/ef_tests/state_v2/src/modules/statetest.rs:115-117
**`get_state_transitions` called on a clone of `vm.db`**

`backends::levm::LEVM::get_state_transitions(&mut vm.db.clone())` clones the entire generalized database, then immediately passes the temporary clone by mutable reference and discards it. This mirrors the same pattern in `result_check::check_root`, so it is functionally correct, but the DB clone can be expensive for large pre-state fixtures. Worth noting as a candidate for a shared refactor once the `run_case` API stabilizes — no change needed now.

Reviews (1): Last reviewed commit: "feat(l1): add statetest subcommand to ef..." | Re-trigger Greptile

Comment on lines +55 to +73
pub async fn run(opts: StatetestOptions) -> Result<ExitCode, RunnerError> {
let (path, emit_trace) = opts.fixture_path()?;
let tests = parse_file(path, false)?;

let mut any_mismatch = false;
for test in &tests {
for test_case in &test.test_cases {
let mismatch = run_case(test, test_case, emit_trace).await?;
if mismatch {
any_mismatch = true;
}
}
}

Ok(if any_mismatch {
ExitCode::from(1)
} else {
ExitCode::SUCCESS
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Exit code collision between errors and mismatches

When run_case returns Err(RunnerError) — e.g., VMError from VM::new(), FailedToGetAccountsUpdates from get_state_transitions, or a RunnerError::Custom from fixture_path() — Rust's Termination impl for Result<ExitCode, E: Debug> emits ExitCode::FAILURE which is 1 on every platform. This is identical to the ExitCode::from(1) path for state-root mismatches. The PR documentation says exit code 1 means "mismatch (tolerated)" and any higher value means "actual crash," but actual crashes via ?-propagated errors are silently folded into the mismatch bucket. goevmlab would treat a VM setup failure or state-transition error the same way it treats a normal differential result, hiding real bugs from the fuzzer.

Prompt To Fix With AI
This is a comment left during a code review.
Path: tooling/ef_tests/state_v2/src/modules/statetest.rs
Line: 55-73

Comment:
**Exit code collision between errors and mismatches**

When `run_case` returns `Err(RunnerError)` — e.g., `VMError` from `VM::new()`, `FailedToGetAccountsUpdates` from `get_state_transitions`, or a `RunnerError::Custom` from `fixture_path()` — Rust's `Termination` impl for `Result<ExitCode, E: Debug>` emits `ExitCode::FAILURE` which is `1` on every platform. This is identical to the `ExitCode::from(1)` path for state-root mismatches. The PR documentation says exit code `1` means "mismatch (tolerated)" and any higher value means "actual crash," but actual crashes via `?`-propagated errors are silently folded into the mismatch bucket. goevmlab would treat a VM setup failure or state-transition error the same way it treats a normal differential result, hiding real bugs from the fuzzer.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +48 to +50
_ => Err(RunnerError::Custom(
"exactly one of --json or --json-outcome must be provided".to_string(),
)),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 When neither --json nor --json-outcome is supplied (both are Optional so clap won't reject the invocation), this error fires with a message that sounds like "both were provided." Clap's conflicts_with already prevents the (Some, Some) case, so the only real runtime scenario is (None, None). A clearer message would help users.

Suggested change
_ => Err(RunnerError::Custom(
"exactly one of --json or --json-outcome must be provided".to_string(),
)),
(None, None) => Err(RunnerError::Custom(
"one of --json or --json-outcome must be provided".to_string(),
)),
_ => Err(RunnerError::Custom(
"exactly one of --json or --json-outcome must be provided".to_string(),
)),
Prompt To Fix With AI
This is a comment left during a code review.
Path: tooling/ef_tests/state_v2/src/modules/statetest.rs
Line: 48-50

Comment:
When neither `--json` nor `--json-outcome` is supplied (both are `Option`al so clap won't reject the invocation), this error fires with a message that sounds like "both were provided." Clap's `conflicts_with` already prevents the `(Some, Some)` case, so the only real runtime scenario is `(None, None)`. A clearer message would help users.

```suggestion
            (None, None) => Err(RunnerError::Custom(
                "one of --json or --json-outcome must be provided".to_string(),
            )),
            _ => Err(RunnerError::Custom(
                "exactly one of --json or --json-outcome must be provided".to_string(),
            )),
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +115 to +117
let account_updates = backends::levm::LEVM::get_state_transitions(&mut vm.db.clone())
.map_err(|e| RunnerError::FailedToGetAccountsUpdates(e.to_string()))?;
let computed_root = post_state_root(&account_updates, initial_block_hash, storage);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 get_state_transitions called on a clone of vm.db

backends::levm::LEVM::get_state_transitions(&mut vm.db.clone()) clones the entire generalized database, then immediately passes the temporary clone by mutable reference and discards it. This mirrors the same pattern in result_check::check_root, so it is functionally correct, but the DB clone can be expensive for large pre-state fixtures. Worth noting as a candidate for a shared refactor once the run_case API stabilizes — no change needed now.

Prompt To Fix With AI
This is a comment left during a code review.
Path: tooling/ef_tests/state_v2/src/modules/statetest.rs
Line: 115-117

Comment:
**`get_state_transitions` called on a clone of `vm.db`**

`backends::levm::LEVM::get_state_transitions(&mut vm.db.clone())` clones the entire generalized database, then immediately passes the temporary clone by mutable reference and discards it. This mirrors the same pattern in `result_check::check_root`, so it is functionally correct, but the DB clone can be expensive for large pre-state fixtures. Worth noting as a candidate for a shared refactor once the `run_case` API stabilizes — no change needed now.

How can I resolve this? If you propose a fix, please make it concise.

@ElFantasma
Copy link
Copy Markdown
Contributor Author

Follow-up commit fa34b00bc addresses the three concrete bugs flagged across the reviews:

Fixed

  • Exit-code collision between mismatches and internal errors (Greptile P1, Codex High, Claude Medium). main now returns ExitCode directly. Any RunnerError propagated from the statetest path maps to ExitCode::from(2); deliberate state-root mismatches stay at 1; success stays at 0. goevmlab can now distinguish a tolerated differential from an actual crash.
  • Silent false-green on Osaka/BPO fixtures (Codex Medium). types.rs::DEFAULT_FORKS only allows Merge/Shanghai/Cancun/Prague/Amsterdam, so adding Osaka/BPO1–5 to the deserializer made those entries parse but Tests::from silently dropped them, leaving test_cases empty and producing exit 0 with no stateRoot emitted. statetest::run now errors (→ exit 2) when every test_cases list is empty.
  • --json / --json-outcome required-one-of constraint (Claude Minor, Greptile P2). StatetestOptions now uses #[group(required = true, multiple = false)] so clap rejects the invocation at parse time with a clear usage message, instead of the previous runtime RunnerError::Custom whose message was also misleading for the (None, None) case.

Not addressed (with rationale)

  • stateRoot zero-padding (Claude Medium): non-bug. H256's LowerHex impl writes every byte as {:02x}, always 64 hex chars regardless of leading zeros. Confirmed in the smoke test outputs — every line emitted is exactly 0x + 64 chars.
  • vm.db.clone() before get_state_transitions (Kimi, Claude, Codex): pre-existing pattern, mirroring result_check::check_root. Greptile correctly noted "no change needed now". A real refactor target but unrelated to this PR's scope.
  • Compound |= assignment nit (Claude): adopted in the new commit alongside the other changes (any_mismatch |= run_case(...)?;).
  • expectException not checked (Codex High): valid concern for direct EF-fixture use, but out of scope for the goevmlab CLI. goevmlab compares state roots across implementations and never feeds us expectException-bearing fixtures (its generator produces simple txs). The bulk runner still enforces this check via result_check::check_exception.
  • txbytes vs re-signed tx (Codex Open Question): re-using the existing get_tx_from_test_case helper that re-signs matches what state_v2's bulk runner does and what the issue's spec requested. Re-using raw txbytes is a separate ticket about test-runner accuracy, independent of the goevmlab interface.
  • Flattened RunnerOptions conflict (Claude Risk): RunnerOptions has no required fields, both subcommand and no-subcommand invocations work in smoke tests. Speculative; not reproducible.

ElFantasma and others added 13 commits May 18, 2026 11:55
…er newtypes

OpcodeStep and OpcodeTraceResult become pure data — no Serialize impl. Wire
formats live in newtypes:

- StructLoggerStep / StructLoggerResult — the geth-RPC structLogger shape used
  by debug_traceTransaction (string op, no opName, decimal gas, stack:null when
  disabled). Verified live against geth and besu.
- Eip3155Step — strict EIP-3155 spec shape used by streaming sinks feeding
  EIP-3155-conformant tooling (numeric op + opName, hex gas, stack:[] when
  disabled).

debug_traceTransaction now wraps with StructLoggerResult. The EIP-3155 wrapper
will be used by the statetest CLI subcommand (#6575) feeding goevmlab.
…a StructLoggerEmit

Adds an explicit emit-options struct passed alongside OpcodeStep/OpcodeTraceResult
to the structLogger wire serializers. memSize, returnData, and refund are now
emitted only when the caller opts in. Defaults match geth's empirical
debug_traceTransaction output (all three suppressed unless the corresponding
capture flag is on).

RPC handler maps OpcodeTracerConfig → StructLoggerEmit, so requesting
enableMemory turns on memSize emission and enableReturnData turns on returnData.

This eliminates the extra-fields divergence vs geth/besu on the same
debug_traceTransaction call observed in cross-client comparisons.
…itempty behavior

Geth's debug_traceTransaction emits the refund field whenever it's non-zero
and suppresses it when zero (omitempty-on-uint pattern). Our wire output now
matches this: refund is always emitted on non-zero steps regardless of the
emit flag; the flag now controls only whether a zero-valued refund is force-
emitted.

Closes the last remaining systematic per-step diff vs geth on tracer output.
Match geth's structLogger behavior: each SLOAD/SSTORE step's storage field
carries the full set of slots touched in the transaction so far, not just
the slot being accessed at that step.

Implementation: LevmOpcodeTracer carries a cumulative BTreeMap<H256, H256>
updated in pre_step_capture whenever storage_kv is Some. Each SLOAD/SSTORE
step's storage field is overwritten with a clone of that map. Synthesized
steps (e.g. fused JUMPDEST) pass storage_kv: None and are unaffected.

Closes the storage-side cross-client diff vs geth observed via the trace
comparison tooling; per Edgar the non-cumulative behavior was unintentional.
…ructLogger timing

Geth's OnOpcode hook fires after the opcode's gas+refund accounting has been
applied but before state mutation, so the refund counter shown on an SSTORE
(or pre-London SELFDESTRUCT) step reflects the credit that opcode just added.
LEVM's pre_step_capture snapshotted the refund counter entirely pre-opcode,
which made SSTORE-credited refunds appear one step late vs geth.

Fix: extend finalize_step (already used to patch gas_cost post-handler) to
also patch refund with the post-execution value (substate.refunded_gas). For
opcodes that don't mutate refund — every opcode except SSTORE and pre-London
SELFDESTRUCT — post == pre so this is a no-op.

Closes #6672.
…p wrapper

The OpcodeStep refactor on feat/eip-3155-tracer split the data type from
the wire format: StructLoggerStep (geth-RPC, used by debug_traceTransaction)
and Eip3155Step (strict spec, what goevmlab's gencodec opLog unmarshaler
expects). Switch the statetest CLI sink to the latter.
@ElFantasma ElFantasma force-pushed the feat/6575-statetest-cli branch from fa34b00 to 7d640ba Compare May 19, 2026 18:38
parser::parse_file and parse_dir unconditionally .unwrap()ed on File::open,
read_dir, file_type, and serde_json::from_reader, so any missing fixture or
malformed JSON triggered an exit-101 panic. Under the statetest CLI this looked
to goevmlab indistinguishable from a real client crash.

Replace each unwrap with a RunnerError::ParseFixture { path, source } that the
existing main.rs error handler maps to ExitCode::from(2). goevmlab now sees
'2' for operator errors and reserves the higher codes for real crashes.
@ElFantasma
Copy link
Copy Markdown
Contributor Author

Verification + parser hardening update

Rebased on the latest feat/eip-3155-tracer tip and verified end-to-end that the wire output matches what goevmlab's gen_oplog.go unmarshaler expects.

Commits now on the branch

  1. feat(l1): add statetest subcommand to ef_tests-statev2 for goevmlab fuzzing — the original implementation.
  2. fix(l1): distinguish statetest internal errors from root mismatches, enforce --json/--json-outcome at clap level — exit-code semantics + clap ArgGroup.
  3. feat(l1): emit EIP-3155 wire shape from statetest sink via Eip3155Step wrapper — switches the per-step serializer to the new Eip3155Step newtype landed in feat(l1): opcodeTracer (EIP-3155 per-opcode tracer) #6595. Strict EIP-3155 shape: numeric op byte + separate opName, "0xN" hex gas/gasCost/refund, stack: [] (never null) when disabled.
  4. fix(l1): propagate fixture parse errors instead of panickingparser::parse_file / parse_dir previously .unwrap()-ed on file-not-found, malformed JSON, etc., producing an exit-101 panic that goevmlab would read as a real client crash. Now returns RunnerError::ParseFixture { path, source } which the main handler maps to exit code 2 with a clear message.

Verification matrix

Run against cancun/eip5656_mcopy/test_mcopy_on_empty_memory.json (16 cases, 288 trace steps, 16 stateRoot lines):

Check Result
--json emits per-op JSONL + stateRoot to stderr exit 0, 304 lines
--json-outcome emits only stateRoot to stderr exit 0, 16/16 stateRoot lines
stdout silent in both modes
Field types per step: pc int, op int (0-255), opName str, gas "0xN", gasCost "0xN", refund "0xN", memSize int, depth int, stack [], returnData "0x..." 10/10 fields correct
goevmlab gen_oplog.go-style UnmarshalJSON parse (Python mimic of HexOrDecimal64 and vm.OpCode) every line round-trips
stateRoot lines match the literal byte pattern goevmlab scans for ("stateRoot":"0x<64 hex>")
Storage accumulation across SSTOREs (inherited from #6595) SSTORE #0: 1 slot · #1: 2 slots · #2: 3 slots · …
Missing fixture file exit 2 (was exit 101)
Malformed JSON exit 2 (was exit 101)
No-flag invocation clap rejects with exit 2 + helpful message

Exit-1 (post-state root mismatch) path is straightforward (any_mismatch |= run_case(...)) and exercised by its dual (root-match → exit 0) above. Provoking a real mismatch needs an intentionally wrong fixture which we don't have lying around; happy to add a synthetic regression test if reviewers want it.

Example output

$ cargo run --release -p ef_tests-statev2 -- statetest --json fixture.json 2> trace.jsonl
$ head -2 trace.jsonl
{"pc":0,"op":96,"gas":"0xeee94","gasCost":"0x3","memSize":0,"stack":[],"depth":1,"returnData":"0x","refund":"0x0","opName":"PUSH1"}
{"pc":2,"op":53,"gas":"0xeee91","gasCost":"0x3","memSize":0,"stack":["0x40"],"depth":1,"returnData":"0x","refund":"0x0","opName":"CALLDATALOAD"}
$ grep -c stateRoot trace.jsonl
16

cargo clippy --all-targets -- -D warnings and cargo fmt --check clean.

Adds three regression tests around the wire contract our subcommand depends on:

- eip3155_step_matches_goevmlab_oplog_shape: pins that Eip3155Step emits 'op'
  as a numeric byte, 'gas'/'gasCost' as '0xN' hex strings, 'stack' as a
  non-null array — the field types gen_oplog.go expects to unmarshal.
- eip3155_step_stack_disabled_renders_as_empty_array: pins the EIP-3155 MUST
  rule that disabled stack capture renders as [] (not null).
- stateroot_line_pins_literal_goevmlab_scan_pattern: pins the literal
  '"stateRoot":"0x<64 lowercase hex>"' substring goevmlab byte-scans for,
  and that H256 LowerHex zero-pads even small values.

Extracts the stateRoot line formatter into a small helper so the test can
assert on the string without reaching into eprintln!.
@github-project-automation github-project-automation Bot moved this to In Review in ethrex_l1 May 20, 2026
Base automatically changed from feat/eip-3155-tracer to main May 20, 2026 21:09
@ElFantasma ElFantasma added this pull request to the merge queue May 21, 2026
Merged via the queue into main with commit 5b33379 May 21, 2026
55 checks passed
@ElFantasma ElFantasma deleted the feat/6575-statetest-cli branch May 21, 2026 21:29
@github-project-automation github-project-automation Bot moved this from In Review to Done in ethrex_l1 May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

L1 Ethereum client

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Fuzzing] Add statetest CLI subcommand for goevmlab differential fuzzing

5 participants