tests(benchmark): add missing scenario and optimization #1768

LouisTsai-Csie · 2025-11-07T15:04:18Z

🗒️ Description

Enhance the code generator for BenchmarkTest wrapper:

Add initial balance setting for JumpLoopGenerator, which already supports in the other code generator, this could simplify test_create.
Add safety checks for ExtCallGenerator, when calculating the max iterations, it should deduct the stack delta that comes from setup section, or it might lead to stack overflow issue.
Most of the contract will be filled with repeated pattern, this PR further fills the remaining space with STOP opcode. This could help simplify some cases like test_codecopy & test_return_revert.
For ExtCallGenerator, there are two different account: target contract and loop contract. The loop contract will repeatedly calls into target contract via STATICCALL. For some operations that involves CALLDATA interactions, we should forward the CALLDATA from loop contract to target contract via (1) forward the CALLDATA from loop contract and (2) Configure the memory from CALLDATA in target contract.

Test Case Enhancement:

test_ext_account_query_warm: Add initial balance / initial storage parametrization.
test_blockhash: Add extra cases for fixed block index and dynamic block index
test_pc_op: Add missing benchmark test
test_keccak: Add this case to parametrize memory size / offset for gas repricing effort.
test_tload: Optimize the benchmark by switching to ExtCallGenerator
test_block_full_of_ether_transfers: Add scenario that the receiver is not an empty account.

🔗 Related Issues or PRs

N/A.

✅ Checklist

All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
uvx tox -e static
```
All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
All: Considered adding an entry to CHANGELOG.md.
All: Considered updating the online docs in the ./docs/ directory.
All: Set appropriate labels for the changes (only maintainers can apply labels).
Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.
Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Cute Animal Picture

LouisTsai-Csie · 2025-11-10T10:20:49Z

tests/benchmark/compute/instruction/test_control_flow.py

    )


+def test_pc_op(


We miss the benchmark for PC opcode

LouisTsai-Csie · 2025-11-10T10:21:58Z

tests/benchmark/compute/instruction/test_keccak.py

+@pytest.mark.parametrize("mem_alloc", [b"", b"ff", b"ff" * 32])
+@pytest.mark.parametrize("offset", [0, 31, 1024])
+def test_keccak(


Instead of finding the optimal length, add a new case that parametrized initial memory layout and access offset. Request by gas repricing effort.

LouisTsai-Csie · 2025-11-10T10:24:57Z

tests/benchmark/compute/scenario/test_transaction_types.py

    ],
 )
+@pytest.mark.parametrize("balance", [0, 1])
 def test_block_full_of_ether_transfers(


Add a new scenario for ether transfer benchmark: whether the receiver is an empty account.

tests/benchmark/compute/instruction/test_storage.py

LouisTsai-Csie · 2025-11-10T10:28:43Z

tests/benchmark/compute/instruction/test_block_context.py

    # Create 256 dummy blocks to fill the blockhash window.
    blocks = [Block()] * 256

+    block_number = Op.AND(Op.GAS, 0xFF) if index is None else index


Adding extra BLOCKHASH operation for benchmarking, as it was one of the slowest operations.
Cases:

Valid block index

Invalid block index

Dynamic block index

LouisTsai-Csie · 2025-11-10T10:30:11Z

tests/benchmark/compute/instruction/test_account_query.py

+    empty_account: bool,
+    initial_balance: bool,
+    initial_storage: bool,


Add new test scenario for gas repricing effort:

Accessing empty / non-empty account

Accessing account contains zero / non-zero balance

Accessing account contains zero / non-zero storage

marioevz

Thanks!

In general, I liked the refactoring and changes, I've just left comments on multiple files just to make everything more polished.

packages/testing/src/execution_testing/specs/benchmark.py

tests/benchmark/compute/instruction/test_account_query.py

tests/benchmark/compute/instruction/test_memory.py

tests/benchmark/compute/instruction/test_storage.py

tests/benchmark/compute/instruction/test_system.py

tests/benchmark/compute/scenario/test_transaction_types.py

LouisTsai-Csie · 2025-11-11T06:39:02Z

Sorry for adding complexity to this PR, I’ve added one more commit that optimizes the DUP and CALLDATASIZE operations.

test_dup: The setup stack delta is now handled in ExtCallGenerator, which simplifies the test logic.
test_calldatasize: This test is also simplified using ExtCallGenerator, as the generator now forwards calldata to the target contract.

codecov-commenter · 2025-11-11T12:21:22Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.08%. Comparing base (1078413) to head (2b6823a).
⚠️ Report is 2 commits behind head on forks/osaka.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@             Coverage Diff              @@
##           forks/osaka    #1768   +/-   ##
============================================
  Coverage        86.08%   86.08%           
============================================
  Files              743      743           
  Lines            44072    44072           
  Branches          3891     3891           
============================================
  Hits             37938    37938           
  Misses            5656     5656           
  Partials           478      478

Flag	Coverage Δ
unittests	`86.08% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

marioevz

Excellent job on all these optimizations. LGTM, thanks!

* feat: add max_stack_height method to BaseFork and Frontier classes * refactor: replace hardcode stack limit to fork configuration

feat: enhance benchmark code generator and wrapper

b5c2255

LouisTsai-Csie force-pushed the benchmark/add-missing-cases branch from f9fdb17 to 01e4d6b Compare November 10, 2025 10:02

LouisTsai-Csie self-assigned this Nov 10, 2025

LouisTsai-Csie added E-medium Experience: of moderate difficulty P-medium C-refactor Category: refactor A-test-benchmark Area: execution_testing.benchmark and tests/benchmark labels Nov 10, 2025

refactor: optimize benchmark cases

1b44dea

LouisTsai-Csie force-pushed the benchmark/add-missing-cases branch from 01e4d6b to 1b44dea Compare November 10, 2025 10:05

LouisTsai-Csie commented Nov 10, 2025

View reviewed changes

tests/benchmark/compute/instruction/test_storage.py Show resolved Hide resolved

LouisTsai-Csie commented Nov 10, 2025

View reviewed changes

refactor: add more scenario for warm account query

ac154c4

LouisTsai-Csie marked this pull request as ready for review November 10, 2025 10:38

marioevz self-requested a review November 10, 2025 17:03

marioevz reviewed Nov 10, 2025

View reviewed changes

LouisTsai-Csie added 2 commits November 11, 2025 14:31

refactor: apply suggested changes

ca5d64a

refactor: optimize dup and calldatasize benchmark

2b6823a

LouisTsai-Csie force-pushed the benchmark/add-missing-cases branch from c720a28 to 2b6823a Compare November 11, 2025 06:35

marioevz approved these changes Nov 11, 2025

View reviewed changes

marioevz merged commit dedec64 into ethereum:forks/osaka Nov 11, 2025
12 checks passed

LouisTsai-Csie mentioned this pull request Nov 12, 2025

Refactor(benchmark): analyze and optimize existing test to enhance opcode count #1575

Closed

4 tasks

This was referenced Nov 12, 2025

chore(ci): add sanity check for filling benchmark tests #1779

Merged

fix(benchmark): fix failing benchmark cases #1780

Merged

SamWilsn pushed a commit that referenced this pull request Dec 9, 2025

feat(forks): add stack limit in fork configuration (#1768)

58fdb04

* feat: add max_stack_height method to BaseFork and Frontier classes * refactor: replace hardcode stack limit to fork configuration

SamWilsn pushed a commit that referenced this pull request Dec 9, 2025

feat(forks): add stack limit in fork configuration (#1768)

a245563

* feat: add max_stack_height method to BaseFork and Frontier classes * refactor: replace hardcode stack limit to fork configuration

tests(benchmark): add missing scenario and optimization #1768

tests(benchmark): add missing scenario and optimization #1768

Uh oh!

Conversation

LouisTsai-Csie commented Nov 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🗒️ Description

🔗 Related Issues or PRs

✅ Checklist

Cute Animal Picture

Uh oh!

LouisTsai-Csie Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

LouisTsai-Csie Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

LouisTsai-Csie Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

LouisTsai-Csie Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

LouisTsai-Csie Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

marioevz left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

LouisTsai-Csie commented Nov 11, 2025

Uh oh!

codecov-commenter commented Nov 11, 2025

Codecov Report

Uh oh!

marioevz left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

LouisTsai-Csie commented Nov 7, 2025 •

edited

Loading