@spencer-tb spencer-tb commented Nov 14, 2025

🗒️ Description

This PR adds a CLI tool benchmark_parser to automatically scan benchmark tests and generate a configuration file .fixed_opcode_counts.json for the --fixed-opcode-count feature from #1747.

Key Changes

  • New CLI tool: uv run benchmark_parser
    • Uses Python AST to scan tests/benchmark/ for tests with @pytest.mark.repricing marker
    • Extracts opcode patterns from @pytest.mark.parametrize decorators
    • Generates .fixed_opcode_counts.json at repo root with opcode counts mapping
    • Supports --check mode for CI validation: uv run benchmark_parser --check
  • Config file format: .fixed_opcode_counts.json
    • Gitignored (user-local configuration)
    • All patterns default to [1] (1K opcodes)
    • Users can customize counts per pattern by manually editing the file, e.g. [1, 10, 100] for 1K, 10K, and 100K opcodes
    • Custom counts are preserved when re-running the parser.
  • Help text improvements:
    • Added the benchmark options to the output of fill --fill-help and execute remote --execute-remote-help
    • Simplified help text with examples for --gas-benchmark-values and --fixed-opcode-count
  • Test updates:
    • Renamed op parameters to opcode in test_arithmetic.py for consistency
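
To make the AST-based scan above more concrete, here is a minimal sketch of the idea. It is not the code added by this PR: the function names `scan_repricing_tests` and `_dotted_name`, the sample source, and the returned shape (test name mapped to opcode names) are all assumptions made for illustration. It parses a module's source without importing it, keeps only test functions carrying the repricing marker, and reads the opcode names out of a parametrize decorator on the `opcode` parameter.

```python
import ast

# Hypothetical sample module; it is only parsed, never executed.
SAMPLE_SOURCE = '''
import pytest

@pytest.mark.repricing
@pytest.mark.parametrize("opcode", [Op.ADD, Op.MUL])
def test_arithmetic(benchmark_test, opcode):
    pass

@pytest.mark.parametrize("opcode", [Op.SUB])
def test_unmarked(benchmark_test, opcode):
    pass
'''


def _dotted_name(node: ast.expr) -> str:
    """Return 'a.b.c' for a Name/Attribute chain (unwrapping a call), else ''."""
    if isinstance(node, ast.Call):
        node = node.func
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""


def scan_repricing_tests(source: str) -> dict[str, list[str]]:
    """Map each repricing-marked test function to its parametrized opcode names."""
    found: dict[str, list[str]] = {}
    for node in ast.parse(source).body:  # top-level definitions only
        if not isinstance(node, ast.FunctionDef) or not node.name.startswith("test_"):
            continue
        decorator_names = [_dotted_name(d) for d in node.decorator_list]
        if "pytest.mark.repricing" not in decorator_names:
            continue  # unmarked tests are skipped
        opcodes: list[str] = []
        for deco in node.decorator_list:
            if not isinstance(deco, ast.Call) or len(deco.args) < 2:
                continue
            if _dotted_name(deco) != "pytest.mark.parametrize":
                continue
            argnames, values = deco.args[0], deco.args[1]
            if (
                isinstance(argnames, ast.Constant)
                and argnames.value == "opcode"
                and isinstance(values, (ast.List, ast.Tuple))
            ):
                # Keep the attribute name of entries written as Op.<NAME>.
                opcodes.extend(v.attr for v in values.elts if isinstance(v, ast.Attribute))
        found[node.name] = opcodes
    return found


print(scan_repricing_tests(SAMPLE_SOURCE))
```

Working on the syntax tree rather than importing the test modules means the scan needs none of the test dependencies to be importable, which is presumably why an AST approach suits a standalone CLI tool.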

Usage

  1. Generate/update config (first time or after benchmark test changes): uv run benchmark_parser
  2. Customize counts by editing .fixed_opcode_counts.json:
{
  "scenario_configs": {
    "test_codecopy.*": [
      1
    ],
    ...
  }
}
  3. Run with configured opcode counts:
# Fill fixtures (useful for fast one shot checks if count is 1)
uv run fill --fixed-opcode-count --fork Prague -m repricing tests/benchmark

# Execute on remote RPC
uv run execute remote --fixed-opcode-count --fork Prague -m repricing tests/benchmark --rpc-seed-key <key> --rpc-endpoint <url> --chain-id <id>
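
The config handling described above (a default of [1] per pattern, custom counts preserved on re-run, and a --check mode) could be sketched roughly as follows. This is an assumption-laden illustration rather than the parser's real code: `merge_counts` and `is_up_to_date` are hypothetical names, the `test_selfbalance.*` pattern is made up, and dropping patterns that no longer match any test is my guess at the behavior. Only the `scenario_configs` key and the [1] default come from the description.

```python
import json

DEFAULT_COUNTS = [1]  # from the description: every pattern defaults to [1] (1K opcodes)


def merge_counts(existing: dict, patterns: list[str]) -> dict:
    """Build a config for `patterns`, keeping counts the user already customized."""
    old = existing.get("scenario_configs", {})
    # Assumption: patterns absent from the latest scan are dropped.
    merged = {p: old.get(p, list(DEFAULT_COUNTS)) for p in sorted(patterns)}
    return {"scenario_configs": merged}


def is_up_to_date(existing: dict, patterns: list[str]) -> bool:
    """Hypothetical --check logic: True when regenerating would change nothing."""
    return merge_counts(existing, patterns) == existing


# A user-edited config with custom counts for one pattern.
existing = {"scenario_configs": {"test_codecopy.*": [1, 10, 100]}}
# Patterns found by a fresh scan; the second one is new.
patterns = ["test_codecopy.*", "test_selfbalance.*"]

updated = merge_counts(existing, patterns)
print(json.dumps(updated, indent=2))
print(is_up_to_date(existing, patterns), is_up_to_date(updated, patterns))
```

In this sketch a CI job would fail when `is_up_to_date` returns False, signalling that the benchmark tests changed without the config being regenerated.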

Fill works correctly, and I tested execute remote on Hoodi with the opcode count set to 1K. The latest transactions are here: https://hoodi.etherscan.io/address/0x83fd666bfb2b345f932c3e4e04b6d85e5ed3568d

Future Items

  • Add CI for fill/execute with --fixed-opcode-count after generating the config file.
  • Verify --fixed-opcode-count with debug_traceTransaction using execute hive.
  • Add documentation & framework tests.

🔗 Related Issues or PRs

#1747

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx tox -e static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).

@spencer-tb added the C-feat (Category: an improvement or new feature), A-test-benchmark (Area: execution_testing.benchmark and tests/benchmark), and P-high labels Nov 15, 2025

@LouisTsai-Csie LouisTsai-Csie left a comment


Thanks a lot for this! I left some suggestions, but I’m happy to discuss further. I’ll share this with Kamil to confirm it aligns with their needs.

@spencer-tb spencer-tb force-pushed the enhance/benchmarking/fixed-opcode-count-config branch from 7ca99dc to 7c68d19 Compare December 4, 2025 16:34

codecov bot commented Dec 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.87%. Comparing base (2a6f9ee) to head (b98b267).
⚠️ Report is 18 commits behind head on forks/osaka.

Additional details and impacted files
@@               Coverage Diff               @@
##           forks/osaka    #1790      +/-   ##
===============================================
- Coverage        87.31%   83.87%   -3.45%     
===============================================
  Files              541      402     -139     
  Lines            32832    25101    -7731     
  Branches          3015     2285     -730     
===============================================
- Hits             28668    21053    -7615     
- Misses            3557     3609      +52     
+ Partials           607      439     -168     
Flag Coverage Δ
unittests 83.87% <ø> (-3.45%) ⬇️



@LouisTsai-Csie LouisTsai-Csie left a comment


Added some suggestions for the parser. Thanks!

@spencer-tb spencer-tb force-pushed the enhance/benchmarking/fixed-opcode-count-config branch 2 times, most recently from 5506d25 to 14b6b64 Compare December 5, 2025 19:15
@spencer-tb spencer-tb force-pushed the enhance/benchmarking/fixed-opcode-count-config branch from 14b6b64 to 697a7d0 Compare December 5, 2025 19:18
@spencer-tb spencer-tb marked this pull request as ready for review December 5, 2025 19:33

@LouisTsai-Csie LouisTsai-Csie left a comment


Thanks!! Only a small logic adjustment is needed.

Here is an example using this test:

@pytest.mark.repricing(contract_balance=0)
@pytest.mark.parametrize("contract_balance", [0, 1])
def test_selfbalance(
    benchmark_test: BenchmarkTestFiller,
    contract_balance: int,
) -> None:
    """Benchmark SELFBALANCE instruction."""
    benchmark_test(
        code_generator=ExtCallGenerator(
            attack_block=Op.SELFBALANCE,
            contract_balance=contract_balance,
        ),
    )

When running in pure --fixed-opcode-count or --gas-benchmark-values mode, both parameter values should be executed (contract_balance = 0 and 1).
This results in 4 tests being run, because each parameter value produces both a blockchain test and a blockchain engine test.

Example run (without repricing marker)
The full benchmark test suite should run, producing 4 tests:

fill -v tests/benchmark/compute/instruction/test_account_query.py::test_selfbalance --gas-benchmark-values 10 -m benchmark --clean

fill -v tests/benchmark/compute/instruction/test_account_query.py::test_selfbalance --fixed-opcode-count 1 -m benchmark --clean

With the repricing marker
Only the cases with contract_balance = 0 should run, so it should produce 2 tests:

fill -v tests/benchmark/compute/instruction/test_account_query.py::test_selfbalance --gas-benchmark-values 10 -m repricing --clean

fill -v tests/benchmark/compute/instruction/test_account_query.py::test_selfbalance --fixed-opcode-count 1 -m repricing --clean

Edit:

Regarding the check: if no marker is present, the test should still run when using the -m benchmark flag, but it should be ignored when using the -m repricing flag.

Taking test_selfbalance as the example again: if you remove the repricing marker, both of these commands should still run:

fill -v tests/benchmark/compute/instruction/test_account_query.py::test_selfbalance --gas-benchmark-values 10 -m benchmark --clean

fill -v tests/benchmark/compute/instruction/test_account_query.py::test_selfbalance --fixed-opcode-count 1 -m benchmark --clean
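
The selection rule described in this comment can be modeled with a small pure-Python sketch. It is only a model of the expected behavior, not the plugin's collection hook: `select_cases` is a hypothetical helper, and treating a bare marker with no keyword arguments as "keep every case" is an assumption.

```python
from typing import Optional


def select_cases(
    cases: list[dict],
    repricing_kwargs: Optional[dict],
    marker_filter: str,
) -> list[dict]:
    """
    Pick the parametrized cases that should run.

    cases: one dict of parameter values per parametrized case.
    repricing_kwargs: kwargs of @pytest.mark.repricing, {} for a bare marker,
        or None when the test has no repricing marker.
    marker_filter: the value passed to -m ("benchmark" or "repricing").
    """
    if marker_filter != "repricing":
        return list(cases)  # -m benchmark runs every case
    if repricing_kwargs is None:
        return []  # unmarked tests are ignored under -m repricing
    # Keep only the cases whose values match every marker keyword argument.
    return [
        case
        for case in cases
        if all(case.get(key) == value for key, value in repricing_kwargs.items())
    ]


cases = [{"contract_balance": 0}, {"contract_balance": 1}]
marker = {"contract_balance": 0}  # from @pytest.mark.repricing(contract_balance=0)

print(select_cases(cases, marker, "benchmark"))
print(select_cases(cases, marker, "repricing"))
print(select_cases(cases, None, "repricing"))
```

Each selected case is filled in two fixture formats, which is how two cases become the 4 tests and one case becomes the 2 tests mentioned above.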

@spencer-tb (Contributor Author) commented

Thanks for this! I've tried to address all your comments. Let me know if I missed anything:

1. Fix repricing filter to work with both benchmark options (0ca5109)

  • pytest_collection_modifyitems now checks for both --gas-benchmark-values and --fixed-opcode-count before applying the -m repricing filter.
  • Added a test to verify repricing filter works with both options.

2. Allow fixed-opcode-count for all benchmark tests (a18f800)

  • Removed the has_repricing check in pytest_generate_tests so --fixed-opcode-count works on all benchmark tests, not just repricing marked ones.

3. Warn when config file missing (a20f2b7)

  • Added a UserWarning when --fixed-opcode-count is provided without a value but .fixed_opcode_counts.json doesn't exist.

4. Update test to match new help text (dff1a46)

  • Fixed the failing CI test by updating the expected help text string.

5. Remove unnecessary generic_visit call in parser (33fa5ab)

  • Removed self.generic_visit(node) since we have early returns and don't use nested test functions.
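
As an illustration of item 3, a missing-config warning might look something like the sketch below. The helper name `load_opcode_counts`, the warning text, and the fallback of returning an empty mapping are assumptions, not taken from the PR. Only the file name, the `scenario_configs` key, and the use of a UserWarning come from the thread.

```python
import json
import tempfile
import warnings
from pathlib import Path

CONFIG_NAME = ".fixed_opcode_counts.json"


def load_opcode_counts(repo_root: Path) -> dict:
    """Load per-pattern opcode counts, warning if the config file is absent."""
    config_path = repo_root / CONFIG_NAME
    if not config_path.exists():
        warnings.warn(
            f"--fixed-opcode-count was given without a value, but {CONFIG_NAME} "
            "was not found; run `uv run benchmark_parser` to generate it.",
            UserWarning,
            stacklevel=2,
        )
        return {}  # assumption: fall back to an empty mapping
    return json.loads(config_path.read_text()).get("scenario_configs", {})


# Demonstrate the warning against an empty temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        counts = load_opcode_counts(Path(tmp))

print(counts, [str(w.message) for w in caught])
```

A warning rather than an error keeps the run going, which fits a user-local, gitignored file that may legitimately not exist yet.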


@LouisTsai-Csie LouisTsai-Csie left a comment


LGTM! Huge thanks for the effort!

@spencer-tb spencer-tb merged commit d1e7e6b into ethereum:forks/osaka Dec 9, 2025
13 of 14 checks passed
jochem-brouwer pushed a commit to jochem-brouwer/execution-specs that referenced this pull request Dec 11, 2025
…rios (ethereum#1790)

* enhance(test-benchmark): use config file for fixed opcode count scenarios

* chore(test-benchmark): update help messages for fixed opcode count and gas bench values

* chore(test-benchmark): fix repricing filter to work with both benchmark options

* chore(test-benchmark): allow fixed-opcode-count for all benchmark tests

* chore(test-benchmark): warn when config file missing for fixed-opcode-count

* chore(test-benchmark): update test to match new help text

* chore(test-benchmark): remove unnecessary generic_visit call in parser

* chore(test-benchmark): format test file
LouisTsai-Csie pushed a commit to LouisTsai-Csie/execution-specs that referenced this pull request Dec 12, 2025
…rios (ethereum#1790)

* enhance(test-benchmark): use config file for fixed opcode count scenarios

* chore(test-benchmark): update help messages for fixed opcode count and gas bench values

* chore(test-benchmark): fix repricing filter to work with both benchmark options

* chore(test-benchmark): allow fixed-opcode-count for all benchmark tests

* chore(test-benchmark): warn when config file missing for fixed-opcode-count

* chore(test-benchmark): update test to match new help text

* chore(test-benchmark): remove unnecessary generic_visit call in parser

* chore(test-benchmark): format test file