feat(l1): bump zkevm EF-tests fixtures to v0.3.3 and extend stateless witness coverage to all for_amsterdam tests #6527
…5.6.1 base, so it lands cleanly on main) and widen the Makefile zkevm-vectors extraction from just eip8025_optional_proofs to the whole for_amsterdam tree. This lifts witness-based stateless coverage from the ~93 eip8025 fixtures to all 2763 for_amsterdam fixtures, since every zkevm fixture now carries executionWitness. Order the zkevm extraction after amsterdam-vectors so its witness-bearing JSONs overlay the non-witness copies produced by bal@v5.6.1, giving us the "free extra checks" the EEST maintainers suggested.

Teach run_stateless_from_fixture about the zkevm output format. The fixture's statelessOutputBytes packs new_payload_request_root (32 B) || valid (1 B) || padding, so byte 32 is the expected validity marker. parse_expected_valid_flag decodes it and the outcome match now accepts two cases as success: valid=1 with Ok and valid=0 with Err (the new tolerance/rejection tests rely on the latter). Without this, every zkevm@v0.3.3 "invalid executionWitness" test would fail with an inverted error.

Refactor the skip list in tests/all.rs so prover/stateless extras are feature-gated instead of living in SKIPPED_BASE. Add a new #[cfg(feature = "stateless")] EXTRA_SKIPS group covering nine validation_* tests the v0.3.3 bundle exposes: four "tolerance" cases (valid=1) where RpcExecutionWitness::into_execution_witness eagerly rejects entries the execution path never touches, and five "rejection" cases (valid=0) where our stateless path runs clean instead of noticing a missing code or ancestor entry. Both groups have TODOs pointing at the real follow-up work (lazy witness parsing and completeness checks in the guest program).

End state: test-stateless goes from 16 failures to 8720 passed / 0 failed / 9 skipped. test-levm behavior is unchanged.
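The outcome rule described above (valid=1 with Ok and valid=0 with Err both count as a pass) can be sketched as a small four-arm match. This is a minimal illustration, not the code in test_runner.rs: the function name `check_stateless_outcome` and the error strings are assumptions introduced here.

```rust
/// Hypothetical helper: compare the fixture's expected validity with the
/// result of the stateless run. Returns Ok(()) when the two agree.
fn check_stateless_outcome(
    expected_valid: bool,
    outcome: Result<(), String>,
) -> Result<(), String> {
    match (expected_valid, outcome) {
        // The two agreeing cases: valid block accepted, invalid witness rejected.
        (true, Ok(())) | (false, Err(_)) => Ok(()),
        // Fixture says valid, but the stateless run failed.
        (true, Err(e)) => Err(format!("expected success but execution failed: {e}")),
        // Fixture says invalid, but the stateless run succeeded.
        (false, Ok(())) => Err("expected rejection but execution succeeded".to_string()),
    }
}
```

Only the two disagreeing arms produce a test failure, which is what keeps the "invalid executionWitness" fixtures from being reported with an inverted error.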
Greptile Summary
This PR bumps the zkevm EF-test fixtures from v0.3.0 to v0.3.3, widens the Makefile extraction from a single subdirectory to the full […]
Confidence Score: 4/5
Safe to merge; the only finding is a P2 edge-case fallback in the validity-byte parser that is unlikely to affect real fixtures.
All P0/P1 paths look correct: the hex byte extraction (chars 64..66 = byte 32) is accurate, the match on (expected_valid, outcome) covers all four arms, the Makefile overlay order is correct, and the feature-gated EXTRA_SKIPS block is properly exhaustive. The single concern is a P2 silent-fallback when statelessOutputBytes is present but too short to contain the valid byte.
tooling/ef_tests/blockchain/test_runner.rs: the parse_expected_valid_flag fallback behaviour for truncated byte strings.

| Filename | Overview |
|---|---|
| tooling/ef_tests/blockchain/.fixtures_url_zkevm | Bumps zkevm fixture URL from v0.3.0 to v0.3.3 — single-line version pin change. |
| tooling/ef_tests/blockchain/Makefile | Adds amsterdam-vectors as a prerequisite of zkevm-vectors and widens the extracted path from eip8025_optional_proofs to the full for_amsterdam tree; overlay order is correct and phony-target deduplication prevents double extraction. |
| tooling/ef_tests/blockchain/test_runner.rs | Adds parse_expected_valid_flag to extract the validity byte from statelessOutputBytes and updates the stateless outcome matching logic to handle both pass (valid=1) and expected-rejection (valid=0) cases; the fallback for absent bytes is correct but a truncated-but-present string silently defaults to expected_valid = true. |
| tooling/ef_tests/blockchain/tests/all.rs | Adds a feature-gated EXTRA_SKIPS block for the stateless feature covering 9 known-broken validation fixtures (4 tolerance + 5 rejection), and correctly tightens the fallback empty-skips guard to not(any(sp1, stateless)). |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[run_stateless_from_fixture] --> B{block has executionWitness?}
B -- No --> NEXT[next block]
B -- Yes --> C[read statelessOutputBytes]
C --> D{bytes present?}
D -- No --> E[expected_valid = true]
D -- Yes --> F[parse_expected_valid_flag\nbyte index 32 of hex string]
F -- parse ok --> G[expected_valid = b != 0]
F -- parse fails / too short --> E
E & G --> H[run stateless execution\nRpcExecutionWitness → ProgramInput]
H --> I{outcome}
I -- Ok --> J{expected_valid?}
I -- Err --> K{expected_valid?}
J -- true --> PASS[✅ pass]
J -- false --> FAIL2[❌ expected failure but succeeded]
K -- false --> PASS
K -- true --> FAIL1[❌ expected success but failed]
    fn parse_expected_valid_flag(hex: &str) -> Option<bool> {
        let trimmed = hex.strip_prefix("0x").unwrap_or(hex);
        let byte_hex = trimmed.get(64..66)?;
        u8::from_str_radix(byte_hex, 16).ok().map(|b| b != 0)
    }
**Silent `true` fallback on truncated `statelessOutputBytes`**
`and_then(parse_expected_valid_flag).unwrap_or(true)` treats a present-but-too-short hex string (< 66 chars after stripping `0x`) identically to an absent field. If a fixture ships a `statelessOutputBytes` that is exactly 32 bytes (64 hex chars), omitting the `valid` byte, `get(64..66)` returns `None` and the test silently runs as "expected to succeed" instead of surfacing the ambiguous fixture. A targeted `None`-return for the case where bytes are present but the string is too short would make the failure explicit rather than silent.
The existing code already handles this correctly at the call site (`unwrap_or(true)`) for the absent-field case; the concern is that a malformed fixture with a truncated byte string would be silently treated as "expected valid" rather than triggering a parse warning.
Done in 5a597e6 — parse_expected_valid_flag now returns Result<bool, String>. Truncated input, non-hex bytes, and validity bytes other than 0x00/0x01 all become hard errors at the call site.
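A minimal sketch of what a `Result`-returning parser along these lines could look like. The actual code in 5a597e6 is not shown on this page, so the error messages and the `expected_valid` call-site helper below are assumptions for illustration only.

```rust
/// Sketch: decode the validity byte (byte index 32) from a hex string.
/// Truncated input, non-hex characters, and bytes other than 0x00/0x01
/// are all reported as errors instead of being defaulted.
fn parse_expected_valid_flag(hex: &str) -> Result<bool, String> {
    let trimmed = hex.strip_prefix("0x").unwrap_or(hex);
    // The 32-byte root takes 64 hex chars, so the validity byte is chars 64..66.
    let byte_hex = trimmed
        .get(64..66)
        .ok_or_else(|| format!("statelessOutputBytes too short: {} hex chars", trimmed.len()))?;
    // from_str_radix accepts a leading '+', so check the digits explicitly.
    if !byte_hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(format!("validity byte is not hex: {byte_hex:?}"));
    }
    let byte = u8::from_str_radix(byte_hex, 16).map_err(|e| e.to_string())?;
    match byte {
        0x00 => Ok(false),
        0x01 => Ok(true),
        other => Err(format!("unexpected validity byte 0x{other:02x}")),
    }
}

/// Hypothetical call-site helper: only an absent field defaults to `true`.
fn expected_valid(stateless_output_bytes: Option<&str>) -> Result<bool, String> {
    match stateless_output_bytes {
        None => Ok(true),
        Some(hex) => parse_expected_valid_flag(hex),
    }
}
```

With this shape, absent, malformed, 0x00, and 0x01 each map to a different result, so a damaged fixture cannot fall into the "expected to succeed" bucket.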
🤖 Claude Code Review
Now I have enough context to write a thorough review.
PR #6527 Review: Bump zkevm EF-tests to v0.3.3 and widen stateless witness coverage
Overview
This PR expands stateless test coverage from ~93 ( […]
Makefile
Overlay ordering is implicit and fragile
    zkevm-vectors: $(ZKEVM_ARTIFACT) $(SPECTEST_VECTORS_DIR) amsterdam-vectors
        tar -xzf $(ZKEVM_ARTIFACT) --strip-components=2 -C $(SPECTEST_VECTORS_DIR) fixtures/blockchain_tests/for_amsterdam
The correctness of this target depends entirely on […]
    # Run amsterdam-vectors first so zkevm can overlay witness-enriched JSONs on top of the base fixtures.
    zkevm-vectors: $(ZKEVM_ARTIFACT) $(SPECTEST_VECTORS_DIR) amsterdam-vectors
Since […]
Pull request overview
This PR updates the blockchain EF-test vector sourcing to zkevm fixture release v0.3.3 and expands the stateless runner to utilize the now-available executionWitness across all for_amsterdam fixtures, including honoring the valid marker encoded in statelessOutputBytes.
Changes:
- Bump zkevm fixtures download URL to zkevm@v0.3.3.
- Update vector extraction to unpack all for_amsterdam zkevm fixtures (overlaying the existing amsterdam vectors).
- Extend stateless fixture execution to interpret statelessOutputBytes validity and treat (valid=0, Err) as a passing expected-failure outcome; add stateless-only skip list for known problematic fixtures.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tooling/ef_tests/blockchain/tests/all.rs | Adds stateless-feature-gated EXTRA_SKIPS for known problematic validation fixtures. |
| tooling/ef_tests/blockchain/test_runner.rs | Parses statelessOutputBytes validity flag and adjusts pass/fail logic accordingly; introduces helper to extract the flag byte. |
| tooling/ef_tests/blockchain/Makefile | Expands zkevm-vectors extraction scope to all for_amsterdam and ensures base vectors are present first. |
| tooling/ef_tests/blockchain/.fixtures_url_zkevm | Bumps downloaded zkevm fixture release to v0.3.3. |
    // zkevm fixtures encode the expected stateless outcome in `statelessOutputBytes`
    // as `new_payload_request_root (32 bytes) ++ valid (1 byte) ++ trailing padding`.
    // When the fixture signals `valid = false` the witness is deliberately incomplete
    // and the stateless path must reject it; absent bytes means "expected to succeed".
    let expected_valid = block_data
        .stateless_output_bytes
        .as_deref()
        .and_then(parse_expected_valid_flag)
        .unwrap_or(true);

    let block: CoreBlock = block_data.clone().into();
    let block_number = block.header.number;
expected_valid defaults to true not only when stateless_output_bytes is absent, but also when it is present yet malformed/too short (because parse_expected_valid_flag returns None). That can silently treat corrupted fixtures as “expected to succeed”. Consider returning an explicit error when stateless_output_bytes is Some(_) but the validity byte cannot be parsed, and only default to true when the field is actually missing.
    - // zkevm fixtures encode the expected stateless outcome in `statelessOutputBytes`
    - // as `new_payload_request_root (32 bytes) ++ valid (1 byte) ++ trailing padding`.
    - // When the fixture signals `valid = false` the witness is deliberately incomplete
    - // and the stateless path must reject it; absent bytes means "expected to succeed".
    - let expected_valid = block_data
    -     .stateless_output_bytes
    -     .as_deref()
    -     .and_then(parse_expected_valid_flag)
    -     .unwrap_or(true);
    - let block: CoreBlock = block_data.clone().into();
    - let block_number = block.header.number;
    + let block: CoreBlock = block_data.clone().into();
    + let block_number = block.header.number;
    + // zkevm fixtures encode the expected stateless outcome in `statelessOutputBytes`
    + // as `new_payload_request_root (32 bytes) ++ valid (1 byte) ++ trailing padding`.
    + // When the fixture signals `valid = false` the witness is deliberately incomplete
    + // and the stateless path must reject it; absent bytes means "expected to succeed".
    + let expected_valid = match block_data.stateless_output_bytes.as_deref() {
    +     None => true,
    +     Some(stateless_output_bytes) => parse_expected_valid_flag(stateless_output_bytes)
    +         .ok_or_else(|| {
    +             format!(
    +                 "Malformed statelessOutputBytes for {test_key} block {block_number}"
    +             )
    +         })?,
    + };
Done in 5a597e6 — took your suggested shape: match on Some/None, and a present-but-malformed field returns an explicit error rather than defaulting to true.
    match (expected_valid, stateless_outcome) {
        (true, Ok(())) | (false, Err(_)) => {}
        (true, Err(e)) => {
When expected_valid is false, any error is currently treated as an acceptable pass ((false, Err(_)) => {}). That includes host-side failures like executionWitness JSON parsing and witness conversion, which can mask regressions and also means the guest may never be invoked. Consider distinguishing error stages (parse/conversion vs execution) and only accepting the expected-failure case when the stateless execution step actually runs and rejects the witness.
Done in 5a597e6 — split the IIFE. serde_json::from_value and into_execution_witness now propagate via ? unconditionally; only backend execution errors are matched against expected_valid.
🤖 Codex Code Review
Findings
Notes
Automated review by OpenAI Codex · gpt-5.4 · custom prompt
    #[cfg(not(feature = "sp1"))]
    #[cfg(feature = "stateless")]
    const EXTRA_SKIPS: &[&str] = &[
        // zkevm@v0.3.3 tolerance tests: the fixture's `statelessOutputBytes` declares `valid = 1`
        // because the executed path does not actually consume the malformed/extra/missing witness
        // entry, but our RpcExecutionWitness conversion eagerly validates the full witness and
        // rejects it. Re-enable once the witness conversion is lazy per EIP-8025 §Tolerance.
        "validation_headers_malformed_rlp_header",
        "validation_headers_missing_oldest_blockhash_ancestor",
        "validation_headers_missing_parent_header",
        "validation_state_extra_unused_trie_node",
        // zkevm@v0.3.3 rejection tests: `statelessOutputBytes` declares `valid = 0` so the guest
        // program must reject the deliberately-incomplete witness, but our stateless path runs
        // to completion instead of detecting the missing entry. Re-enable once the witness
        // completeness checks land (missing delegation/external-code bytecodes, non-contiguous
        // header chain detection).
        "validation_codes_missing_delegated_code_on_insufficient_balance_call",
        "validation_codes_missing_external_code_read_target",
        "validation_codes_missing_redelegation_old_marker",
        "validation_codes_missing_sender_delegation_marker",
        "validation_headers_non_contiguous_chain",
    ];
    #[cfg(not(any(feature = "sp1", feature = "stateless")))]
    const EXTRA_SKIPS: &[&str] = &[];
if sp1 is set but stateless isn't this won't compile, and if sp1 implies stateless then why specify both?
sp1 alone compiles (verified). The sp1 + stateless combo is rejected by the compile_error! at lines 6–7. They are not redundant: stateless uses the in-process Exec backend; sp1 runs the guest ELF inside the SP1 zkVM executor.
        res.map(|_| ()).map_err(|e| format!("execution: {e}"))
    })();

    match (expected_valid, stateless_outcome) {
Block on this: the (false, Err(_)) => {} arm is too permissive — it treats any host-side failure as a pass, including executionWitness JSON parse errors, into_execution_witness conversion errors, and program-input setup errors. None exercise the stateless rejection path the fixture is asserting. A regression in RpcExecutionWitness::deserialize would silently pass every valid=false fixture without ever invoking the guest. Split the closure so JSON/conversion errors propagate as real failures regardless of expected_valid. (Copilot flagged.)
Done in 5a597e6 — IIFE removed. JSON parse and witness conversion always fail the test; only Backend::execute errors satisfy (false, Err(_)).
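The staging described in this thread can be sketched as follows. This is a self-contained toy, not the runner's code: `Witness`, `parse_witness` (standing in for JSON parsing plus `into_execution_witness`) and `execute` (standing in for `Backend::execute`) are hypothetical stand-ins, and "empty witness" is just a placeholder for "incomplete witness".

```rust
/// Hypothetical witness type for the sketch.
struct Witness(Vec<u8>);

/// Stand-in for the host-side stages (JSON parse + witness conversion):
/// decodes an ASCII hex string into bytes.
fn parse_witness(raw: &str) -> Result<Witness, String> {
    if raw.len() % 2 != 0 {
        return Err("odd-length hex".to_string());
    }
    let mut bytes = Vec::new();
    for i in (0..raw.len()).step_by(2) {
        let pair = raw
            .get(i..i + 2)
            .ok_or_else(|| "witness is not ASCII hex".to_string())?;
        bytes.push(u8::from_str_radix(pair, 16).map_err(|e| e.to_string())?);
    }
    Ok(Witness(bytes))
}

/// Stand-in for backend execution: rejects an empty (incomplete) witness.
fn execute(witness: &Witness) -> Result<(), String> {
    if witness.0.is_empty() {
        Err("execution: incomplete witness".to_string())
    } else {
        Ok(())
    }
}

fn run_block(raw_witness: &str, expected_valid: bool) -> Result<(), String> {
    // Host-side failures always fail the test, whatever the fixture expects.
    let witness = parse_witness(raw_witness).map_err(|e| format!("witness parse: {e}"))?;
    // Only the execution result is compared against `expected_valid`.
    match (expected_valid, execute(&witness)) {
        (true, Ok(())) | (false, Err(_)) => Ok(()),
        (true, Err(e)) => Err(format!("expected success but failed: {e}")),
        (false, Ok(())) => Err("expected failure but succeeded".to_string()),
    }
}
```

The point of the split is visible in the third case below: a parse error with `expected_valid = false` is a failure, because the rejection path the fixture asserts was never exercised.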
    /// The output encoding is `new_payload_request_root (32 bytes) ++ valid (1 byte) ++ padding`,
    /// so byte index 32 carries the validity marker.
    #[cfg(feature = "stateless")]
    fn parse_expected_valid_flag(hex: &str) -> Option<bool> {
unwrap_or(true) (at the call site above) collapses three distinct cases — field absent, field present but malformed, field present with valid byte 0x00 — into the same default. Right fix: parse_expected_valid_flag returns Result (or at minimum an Option with the absent case kept distinct from the malformed case), so a bad fixture can't silently flip into the success bucket. With ~2670 new fixtures landing, having one silently misinterpreted is a real risk. (Greptile + Copilot both flagged.)
Done in 5a597e6 — parse_expected_valid_flag returns Result<bool, String>, the call site ?s the error, and only 0x00/0x01 are accepted. Absent / malformed / 0x00 / 0x01 are now all distinguishable.
        "static_Call1MB1024Calldepth",
    ];
    #[cfg(not(feature = "sp1"))]
    #[cfg(feature = "stateless")]
cfg(feature = "sp1") and cfg(feature = "stateless") aren't mutually exclusive at the cfg level: if both are enabled together, this file fails to compile (two definitions of EXTRA_SKIPS). Keep the third arm gated with cfg(all(not(feature="sp1"), not(feature="stateless"))) (which it already is via not(any(...))) and add a compile_error! under cfg(all(feature="sp1", feature="stateless")) to make the exclusivity explicit. Currently those features may be exclusive in practice, but the constraint isn't documented anywhere.
compile_error! is at lines 6–7. In ad031d5 I gated the sp1 arms with not(feature = "stateless") so the both-on combo now fails with only that diagnostic (previously two duplicate-definition errors rode along with it).
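The gating arrangement discussed here could look roughly like the sketch below. It is an illustration under assumptions: the list contents are shortened to one entry each, the placement of `static_Call1MB1024Calldepth` in the sp1 list is inferred from the truncated snippet above, and `is_skipped` is a hypothetical helper rather than a function from tests/all.rs.

```rust
// Both features on: this is the only diagnostic, because neither
// definition below is ambiguous in that configuration.
#[cfg(all(feature = "sp1", feature = "stateless"))]
compile_error!("features `sp1` and `stateless` are mutually exclusive");

// sp1-only skips; gated off when `stateless` is also enabled.
#[cfg(all(feature = "sp1", not(feature = "stateless")))]
const EXTRA_SKIPS: &[&str] = &["static_Call1MB1024Calldepth"];

// stateless skips; the sole definer in the (rejected) both-on case.
#[cfg(feature = "stateless")]
const EXTRA_SKIPS: &[&str] = &["validation_headers_non_contiguous_chain"];

// Neither feature: no extra skips.
#[cfg(not(any(feature = "sp1", feature = "stateless")))]
const EXTRA_SKIPS: &[&str] = &[];

/// Hypothetical helper: true if the test name matches any extra skip entry.
fn is_skipped(test_name: &str) -> bool {
    EXTRA_SKIPS.iter().any(|skip| test_name.contains(skip))
}
```

Exactly one `EXTRA_SKIPS` definition is active in every feature combination, so the duplicate-definition errors cannot appear alongside the `compile_error!` message.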
conversion, and execution stages so the first two always fail the test, instead of being absorbed by the (false, Err(_)) acceptance arm. With the previous IIFE, a regression in RpcExecutionWitness deserialization or into_execution_witness would have silently passed every valid=0 fixture without ever invoking the guest. Now only the backend execution result is matched against expected_valid.

Tighten parse_expected_valid_flag to return Result instead of Option so a present-but-malformed statelessOutputBytes is a hard error rather than silently defaulting to expected_valid=true. Also restrict the accepted byte values to 0x00 and 0x01; anything else (truncated input, invalid hex, byte != 0/1) now produces an explicit error message.

Move block into ProgramInput::new instead of cloning it, and document the amsterdam-vectors -> zkevm-vectors overlay ordering in the Makefile.
`not(feature = "stateless")` so that, when both features are mistakenly enabled together, the `compile_error!` at the top of the file is the only diagnostic that surfaces — not the two duplicate-definition errors that previously rode along with it. The stateless arms remain the sole definers in the both-on case; values are irrelevant since compilation already aborts.
Motivation
`zkevm@v0.3.3` ships `executionWitness` on every `for_amsterdam` fixture, but the Makefile only extracts `eip8025_optional_proofs`, leaving ~2670 witnessed fixtures unused by the stateless path.
Description
- `.fixtures_url_zkevm` to `zkevm@v0.3.3` (same `bal@v5.6.1` base).
- `zkevm-vectors` to extract all of `for_amsterdam`, overlaying the non-witness `bal@v5.6.1` JSONs so every fixture runs through `run_stateless_from_fixture`.
- `statelessOutputBytes` to honor the `valid` byte at offset 32: both `(valid=1, Ok)` and `(valid=0, Err)` now count as pass.
- `EXTRA_SKIPS`. 9 `validation_*` fixtures skipped pending follow-up (see below).
Follow-up
9 skipped `validation_*` tests will be re-enabled in a separate PR: drop malformed entries lazily in witness conversion, and error on missing bytecode / discontinuous header chains during stateless execution.
How to test
Expected: 8720 passed, ~160s. `test-levm` unchanged.
Checklist
`STORE_SCHEMA_VERSION`: N/A