feat(test-benchmark): updates and fixes for fixed opcode count #1985

base: forks/amsterdam
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## forks/amsterdam    #1985   +/- ##
================================================
  Coverage    86.33%    86.33%
================================================
  Files          538       538
  Lines        34557     34557
  Branches      3222      3222
================================================
  Hits         29835     29835
  Misses        4148      4148
  Partials       574       574
================================================
LouisTsai-Csie left a comment:
Some initial comments; will continue with more feedback:

- The required precompile benchmarks are already labeled with the repricing marker; we only label the ones with the necessary configurations. So we could remove the repricing marker from the precompile benchmarks you added.
- We also need to update the `benchmark_parser`, or it would override some of the settings. For example, `test_push*` is valid in the current logic, but it would be removed by the parser logic, as the parser always finds all entries for fixed-opcode-count compatible tests.
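If helpful, the override concern could be addressed with a merge rule that lets user-supplied patterns win. This is a hypothetical sketch, not the actual `benchmark_parser` API; all names are illustrative:

```python
from fnmatch import fnmatch


def merge_configs(existing: dict, discovered: dict) -> dict:
    """Hypothetical merge rule: user entries (including glob patterns such as
    "test_push*") are preserved; auto-discovered tests are added only when no
    existing pattern already covers them."""
    merged = dict(existing)
    for test_name, counts in discovered.items():
        # Skip discovered tests already covered by a user-supplied pattern.
        if any(fnmatch(test_name, pattern) for pattern in existing):
            continue
        merged[test_name] = counts
    return merged
```

Under such a rule, regenerating the config would keep a `test_push*` entry instead of expanding it into individual per-test entries.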
@LouisTsai-Csie I updated the parser and removed some precompiles: Modexp, BLS381, BN128, as discussed on Discord.
LouisTsai-Csie left a comment:
Huge thanks for the help!! I have some comments below and want to discuss several points:

- For the `README` file, could we integrate it into the existing docs under the benchmark section?

I have one more file to review, `benchmark_parser.py`; will update soon.
> The fixed opcode count mode runs benchmark tests with a predetermined number of opcode iterations rather than gas-based limits. This approach enables rapid iteration when analyzing gas costs for repricing proposals, as you can directly compare execution times across different opcode counts.
>
> **Important:** Tests must be marked with `@pytest.mark.repricing` to be compatible with fixed opcode count mode. This marker identifies tests that have been specifically designed for gas repricing analysis with proper code generators.
This is only partially correct. The fixed-opcode-count feature is not limited to tests marked with the repricing marker.
Q: Which benchmark formats support the fixed-opcode-count feature?
A: Any test that uses the benchmark test wrapper (benchmark_test) together with a code generator (code_generator) can support the fixed-opcode-count feature, since we can generate the required contract logic dynamically.
Q: Why do we use the repricing marker?
A: During gas repricing analysis, Maria is typically interested in a specific subset of configurations. For example, in test_calldatasize, we assume that calldata size is the primary factor affecting performance, so we don’t focus on the zero_data parameter. The repricing marker allows us to configure and narrow down the relevant benchmark cases. We apply this same approach to other tests to limit the scope of benchmark runs.
The current implementation does not restrict the fixed-opcode-count feature to tests labeled with the repricing marker.
`pytest_collection_modifyitems` (line 210 in 180dcec)
Summary: at the moment, we typically run the fixed-opcode-count feature together with the repricing marker, but this is a usage choice, not a technical limitation. From an implementation perspective, fixed-opcode-count benchmarks can run independently of the repricing marker.
Example: both of the following work:

```bash
uv run fill -v --clean --gas-benchmark-values 30 tests/benchmark/compute/instruction/test_account_query.py::test_selfbalance -m benchmark
uv run fill -v --clean --gas-benchmark-values 30 tests/benchmark/compute/instruction/test_account_query.py::test_selfbalance -m repricing
```

> ### Generating the Config File
>
> The benchmark parser tool can automatically generate and update the configuration file by scanning your test modules:
>
> ```bash
> # Generate or update .fixed_opcode_counts.json
> uv run benchmark_parser
>
> # Validate that config is in sync
> uv run benchmark_parser --check
> ```
>
> The parser preserves any custom counts you've configured while adding new tests with default values.
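The thread doesn't show the schema of `.fixed_opcode_counts.json`. Judging from the patterns discussed in this review (e.g. `"test_bitwise.*": [1]` and `"test_bitwise.*AND.*": [100]`), a plausible shape for the file might be:

```json
{
  "test_bitwise.*": [1],
  "test_bitwise.*AND.*": [100]
}
```

This layout is an assumption for illustration; run `uv run benchmark_parser` and inspect the generated file for the authoritative format.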
Suggestions:

- Move this section before the `Config File Format` section, so readers are aware of the tool before they start manually adding configuration entries.
- Provide more details on the override rules used by `benchmark_parser`, so users can understand whether their existing configurations will be preserved or overridden.
```python
elif self.uses_config_file:
    # Config file mode, no existing params: match against function name
    metafunc.parametrize(
        self.parameter_name,
        self.get_test_parameters(test_name),
        scope="function",
    )
else:
    # CLI mode: use function name matching (original behavior)
    metafunc.parametrize(
        self.parameter_name,
        self.get_test_parameters(test_name),
        scope="function",
    )
```
Suggested change (remove the duplicated branch):

```diff
 elif self.uses_config_file:
     # Config file mode, no existing params: match against function name
     metafunc.parametrize(
         self.parameter_name,
         self.get_test_parameters(test_name),
         scope="function",
     )
-else:
-    # CLI mode: use function name matching (original behavior)
-    metafunc.parametrize(
-        self.parameter_name,
-        self.get_test_parameters(test_name),
-        scope="function",
-    )
```

The `else` branch duplicates the logic of the `elif` branch.
```python
# Remove the opcode count part from the test ID for pattern matching
# Pattern: -opcount_X.XK or -opcount_XK at the end before ]
import re
```
Suggested change:

```diff
-import re
```

`re` is already imported at the top of the file.
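The stripping-then-matching flow that the quoted comment describes might look like the following sketch. This is illustrative only, not the project's implementation: the regex follows the quoted comment ("-opcount_X.XK or -opcount_XK at the end before ]"), and the longest-pattern-wins tie-breaking is an assumption.

```python
import re
from typing import Optional

# Strip the opcode-count suffix from a parametrized test ID, then match the
# remainder against config patterns (hypothetical helper names).
OPCOUNT_RE = re.compile(r"-opcount_\d+(?:\.\d+)?K(?=\])")


def strip_opcount(test_id: str) -> str:
    """Remove a trailing -opcount_X.XK / -opcount_XK segment before the ]."""
    return OPCOUNT_RE.sub("", test_id)


def lookup_counts(test_id: str, config: dict) -> Optional[list]:
    """Find the configured counts for a test ID; the most specific
    (longest) matching pattern wins — an assumed tie-breaking rule."""
    bare_id = strip_opcount(test_id)
    best, best_len = None, -1
    for pattern, counts in config.items():
        if re.fullmatch(pattern, bare_id) and len(pattern) > best_len:
            best, best_len = counts, len(pattern)
    return best


config = {"test_bitwise.*": [1], "test_bitwise.*AND.*": [100]}
```

With this sketch, `test_bitwise[fork_Prague-opcode_AND-opcount_1.5K]` resolves via the more specific `AND` pattern, while other `test_bitwise` variants fall back to the general one.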
```python
# --- 2. Determine Outer Iterations (M) ---
# The Loop Contract's call count (M) is set to ensure the final total execution is consistent.
#
# 2a. If N is 1000: Set M = fixed_opcode_count. (Total ops: fixed_opcode_count * 1000)
# 2b. If N is 500: Set M = fixed_opcode_count * 2. (Total ops: (fixed_opcode_count * 2) * 500 = fixed_opcode_count * 1000)
```
After reviewing the logic below, I think the current fallback iteration count is not 500 but 250; it would be nice to be consistent here. It might be worth checking the entire comment again.
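The iteration arithmetic under discussion can be sketched as follows, with illustrative names rather than the project's API, and treating 250 as the fallback inner size per the review note:

```python
def iteration_counts(fixed_opcode_count: float, inner_n: int = 1000) -> tuple[int, int]:
    """Return (inner, outer) so the total executed opcodes stay consistent.

    Illustrative sketch: counts are in thousands of opcodes. Fractional
    counts (< 1.0) run the exact number of opcodes with outer = 1; otherwise
    outer is scaled so that inner * outer == fixed_opcode_count * 1000.
    """
    if fixed_opcode_count < 1.0:
        # e.g. 0.25 -> inner = 250, outer = 1 (precise low-count benchmark)
        return int(fixed_opcode_count * 1000), 1
    return inner_n, int(fixed_opcode_count * 1000) // inner_n
```

With `inner_n = 1000` this reproduces case 2a (`M = fixed_opcode_count`); with 500, case 2b (`M = 2 * fixed_opcode_count`); a 250 fallback would give `M = 4 * fixed_opcode_count`.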
🗒️ Description

This PR fixes several bugs and adds features for `--fixed-opcode-count`.

Features

- Adds the `-m repricing` marker to all precompile tests, apart from a select few that are not required or are broken.
- Removes `test_mod` from repricing, as it is not compatible with foc (fixed-opcode-count).
- `OpcodeCountsConfig` pattern matching and validation.
- Floats as inputs (`0.25` = 250 opcodes, `0.5` = 500 opcodes).

Bug Fixes

- Fixes a `UsageError` when the fixed opcode count config file (`.fixed_opcode_counts.json`) is missing.
- Patterns like `test_bitwise.*AND.*` now correctly match against full test names instead of matching `test_bitwise.*` only.

Floats As Inputs

For fixed opcode count values < 1.0 (e.g., `0.25` = 250 opcodes), the inner iterations are set to the exact count with `outer = 1`, enabling precise low-count benchmarks.

Correct Pattern Matching With Config
Pattern matching now works with full test names. Given both `"test_bitwise.*": [1]` and `"test_bitwise.*AND.*": [100]`:

- Before: all `test_bitwise` variants got 1K, since patterns matched the function name only.
- After: `test_bitwise[fork_Prague-opcode_AND]` gets 100K while others get 1K, since patterns match the simulated full test ID, so each parametrization can get a different count.

🔗 Related Issues or PRs
Follow-ups and fixes from #1790.
✅ Checklist

- Ran the `tox` checks to avoid unnecessary CI fails (see also Code Standards and Enabling Pre-commit Checks): `uvx tox -e static`.
- PR title adheres to repo standards: formatted as `type(scope):`.

Cute Animal Picture
Venusaur - 0003
