feat(test-benchmark): add worst-case depth attack benchmarks for Ethereum state tries using deterministic deploy #1976

marioevz · 2026-01-06T04:13:44Z

🗒️ Description

WIP: Based on #1937 but using #1934.

🔗 Related Issues or PRs

N/A.

✅ Checklist

All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
uvx tox -e static
```
All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
All: Considered adding an entry to CHANGELOG.md.
All: Considered updating the online docs in the ./docs/ directory.
All: Set appropriate labels for the changes (only maintainers can apply labels).
Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Cute Animal Picture

CPerezz · 2026-01-07T15:33:50Z

tests/benchmark/stateful/bloatnet/depth_benchmarks/test_deep_branch.py

- AttackOrchestrator.sol and Verifier.sol:
-  https://gist.github.com/CPerezz/8686da933fa5c045fbdf7c31e20e6c71


Why were those removed?? Just curious as they are indeed the contracts used within this test to perform the attack and verify the execution

marioevz

For AttackOrchestrator.sol, Solidity emitted more opcodes than necessary, which made it hard to see which ones were actually executed and, in turn, hard to estimate gas.
For Verifier.sol, the verification can be done via the post object in the test, so there is no need to deploy and call another contract, spending gas that could instead go to the attacks.

CPerezz

I have been thinking about some of the suggestions from @jochem-brouwer in #1976 and I think some of them apply here. Would you like me to address them?

jochem-brouwer · 2026-01-07T21:45:55Z

@CPerezz I think you are referring to #1961 and you are right, those indeed apply here. I'll write comments on it and will publish a review in an hour or so 😄 👍

jochem-brouwer

@CPerezz is I think referring to #1961. I think the format there cleans up the test and makes it easier to reason about. I left comments and pointers to there.

The main thing I think we should do ASAP is wrap this attack template in a helper file somewhere. Attacking specific contracts deployed through a CREATE2 factory is a common pattern; what differs is the attack itself (using an EVM opcode on them, or CALLing them with certain data).
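For reference, the target addresses in this pattern follow from the EIP-1014 CREATE2 formula. The sketch below is illustrative and not code from this PR: the name `create2_address` is hypothetical, and because Python's standard library has no Keccak-256 (`hashlib.sha3_256` uses different padding), the hash function is passed in by the caller.

```python
from typing import Callable

# Any function mapping bytes to a 32-byte Keccak-256 digest (supplied by the caller).
Keccak256 = Callable[[bytes], bytes]


def create2_address(
    deployer: bytes,
    salt: int,
    initcode_hash: bytes,
    keccak256: Keccak256,
) -> bytes:
    """Return keccak256(0xff ++ deployer ++ salt ++ initcode_hash)[12:] (EIP-1014)."""
    if len(deployer) != 20:
        raise ValueError("deployer must be a 20-byte address")
    if len(initcode_hash) != 32:
        raise ValueError("initcode_hash must be a 32-byte hash")
    # 1 + 20 + 32 + 32 = 85 bytes of preimage.
    preimage = b"\xff" + deployer + salt.to_bytes(32, "big") + initcode_hash
    # The address is the low 20 bytes of the 32-byte digest.
    return keccak256(preimage)[12:]
```

A shared helper could take the factory address and initcode hash as constants and expose only the salt, leaving the attack itself (opcode or CALL) as the part each test customizes.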

(If something is unclear let me know 😄 👍 Happy to help or give pointers)

docs/running_tests/execute/remote.md

packages/testing/src/execution_testing/test_types/helpers.py

packages/testing/src/execution_testing/cli/pytest_commands/plugins/execute/pre_alloc.py

jochem-brouwer · 2026-01-07T22:06:29Z

tests/benchmark/stateful/bloatnet/depth_benchmarks/test_deep_branch.py

+            self.value.to_bytes(32, "big")
+            + self.start.to_bytes(32, "big")
+            + self.end.to_bytes(32, "big")
+            + self.initcode_hash


I believe all 4 of these can be removed from calldata as with the method in #1961. This also raises the point for a further refactor short- to mid-term where we create a template for the following pattern:

We execute certain code with one input argument: the target address, which is an address deployed by a CREATE2 factory. This is done in a loop whose exit condition is that the remaining gas falls below some threshold. The salt is incremented on every iteration. After the loop, the next salt is written to storage; the initial salt is read from that same storage slot.

This has the following benefits:

We do not have to calculate the start salt and the end salt here. To calculate the start and end salt we need to know the gas costs of each loop (which are usually dynamic and hard to calculate)

One trade-off: encoding this data into calldata makes the transactions parallelizable. With storage this is not possible, because each transaction depends on the salt left in storage by the previous one.

Although we waste a little gas in the EVM, this is negligible compared to the operations we use in these tests (storage writes).
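The loop pattern described above can be modelled in a few lines of Python. This is a toy model rather than EVM bytecode or code from #1961: the class name, the threshold value, and the flat per-iteration gas cost are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AttackLoopModel:
    """Toy model of the storage-backed salt loop (hypothetical, not EVM code)."""

    storage: dict[int, int] = field(default_factory=dict)
    gas_threshold: int = 10_000  # assumed exit threshold

    def run_tx(self, gas_limit: int, gas_per_iteration: int) -> list[int]:
        """Simulate one transaction and return the salts it attacked."""
        if not 0 < gas_per_iteration <= self.gas_threshold:
            raise ValueError("threshold must cover at least one full iteration")
        salt = self.storage.get(0, 0)  # initial salt is read from slot 0
        gas_left = gas_limit
        attacked: list[int] = []
        while gas_left >= self.gas_threshold:  # exit once gas is below threshold
            attacked.append(salt)  # stand-in for "derive target, attack it"
            gas_left -= gas_per_iteration
            salt += 1
        self.storage[0] = salt  # next salt is stored for the following tx
        return attacked


model = AttackLoopModel()
first = model.run_tx(gas_limit=100_000, gas_per_iteration=8_050)
second = model.run_tx(gas_limit=100_000, gas_per_iteration=8_050)
print(first[0], first[-1], second[0], second[-1])
```

The second call resumes from the salt stored by the first, which is why no start or end salt has to be computed off-chain, and also why the transactions have to be applied in order.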

The value here can be read from the attack contract (this seems to be the target key to write). Small note: the name "value" is confusing here, because it could also refer to the tx value, e.g. CALLVALUE. Start/end bytes are handled by the contract (in EVM). initcode_hash is also a constant, so we can hardcode it in the contract (no need to put it in calldata).

The way to do this is to take the contract from #1961 (these lines: https://github.com/ethereum/execution-specs/pull/1961/changes#diff-88ac263a5a41126dcb0c95cc6939a105f972f0a9fd526ecaae4f085f01f96d0aR118-R152), edit it so that it hardcodes the initcode, and change the EXTCODESIZE in the loop to an Op.CALL that calls the attack(uint256) here.

Note: this CREATE2 attack pattern is something we have seen in many places (and also in some small variations, with the same end goal) - so we should template this attack at some point so we can re-use it and iterate faster using the same code 😄 👍

marioevz

The downside of the code created in build_attack_contract is that we have to deploy a fresh contract each time we run execute again: because it stores the last salt to storage, we cannot start from zero when the test is executed from the beginning.

This version can reuse the same contract not only across re-execution of the same test but in all tests (as long as the factory pre-deploy address is the same).

I think the slightly higher calldata cost is worth it in order to speed up test execution.

jochem-brouwer · 2026-01-07T22:07:44Z

tests/benchmark/stateful/bloatnet/depth_benchmarks/test_deep_branch.py

+            + self.initcode_hash
+        )
+
+    def calculate_inner_call_cost(self, fork: Fork) -> int:


Note: with the new format this can thus be removed as we let EVM handle the logic if we want to do "one more loop" or if we want to exit

jochem-brouwer · 2026-01-07T22:08:07Z

tests/benchmark/stateful/bloatnet/depth_benchmarks/test_deep_branch.py

+        return inner_call_cost
+
+    def calculate_gas(self, fork: Fork) -> int:
+        """Calculate the exact gas this attack transaction will use."""


Same here, gas calculations are not necessary anymore with this other logic

jochem-brouwer · 2026-01-07T22:08:59Z

tests/benchmark/stateful/bloatnet/depth_benchmarks/test_deep_branch.py

+            to=ATTACK_ORCHESTRATOR_ADDRESS,
+            gas_limit=self.calculate_tx_gas_limit(fork),
+            sender=sender,
+            data=self.calldata(),


If we switch to the new format here, calldata is empty, and we can therefore use BenchmarkTestFiller to handle the split logic.

jochem-brouwer · 2026-01-07T22:11:50Z

tests/benchmark/stateful/bloatnet/depth_benchmarks/test_deep_branch.py

+    def add_post_verification(
+        self, post: Alloc, mined_contract_file: MinedContractFile
+    ) -> None:
+        """Add the post-verification transaction to the post-state."""


The post verification from #1961:

- Writes the current salt to slot 0.
- Writes EXTCODESIZE of the target to slot 1.

The EXTCODESIZE check verifies that the target contract is deployed. Additionally, we can verify that slot 0 is "at least some number", so that we know at least X attacks have run.
(Note: if the tx fails or OOGs, these slots may have been written but are then reverted.)
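The two checks described above amount to simple assertions on the attack contract's post-state storage. The sketch below uses a plain `dict` in place of the framework's post-state types; the function name and the `min_attacks` parameter are hypothetical.

```python
def verify_post_storage(storage: dict[int, int], min_attacks: int) -> None:
    """Check the two slots written by the verification step (illustrative only)."""
    next_salt = storage.get(0, 0)  # slot 0: salt after the last attack
    target_code_size = storage.get(1, 0)  # slot 1: EXTCODESIZE of the target
    if target_code_size == 0:
        raise AssertionError("target contract is not deployed (EXTCODESIZE is 0)")
    if next_salt < min_attacks:
        raise AssertionError(
            f"only {next_salt} attacks ran, expected at least {min_attacks}"
        )


# Passes silently: the target has code and enough attacks were recorded.
verify_post_storage({0: 1_980, 1: 512}, min_attacks=1_000)
```

A reverted or out-of-gas transaction leaves both slots at their previous values, so a missing or too-small slot 0 fails the second check.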

This PR introduces comprehensive benchmarks to test Ethereum clients under worst-case scenarios involving extremely deep state and account tries.

The attack scenario:
- Pre-deployed contracts with deep storage tries (depth=9) maximizing traversal costs
- CREATE2-based deterministic addressing for reproducible benchmarks
- AttackOrchestrator contract that batches up to 2,510 attacks per transaction
- Tests measure state root recomputation impact when modifying deep slots

Key components:
- depth_9.sol, depth_10.sol: Contracts with deep storage tries
- s9_acc3.json: Pre-computed CREATE2 addresses and auxiliary accounts (15k contracts)
- AttackOrchestrator.sol: Optimized attack coordinator (3,650 gas per attack)
- deep_branch_testing.py: EEST test harness for pre-deployed contracts
- README.md: Complete documentation and setup instructions

Performance optimizations:
- Reduced gas forwarding from 50k to 3,650 per attack (8.3x throughput increase)
- MAX_ATTACKS_PER_TX increased from 303 to 2,510
- Precise EVM opcode cost analysis with safety margins
- Read init_code_hash directly from JSON instead of recompiling

Deployment setup and instructions available at: https://gist.github.com/CPerezz/44d521c0f9e6adf7d84187a4f2c11978

This benchmark helps identify performance bottlenecks in state trie handling and validates client implementations under extreme depth conditions.

The attack() call was forwarding only 3650 gas, which is insufficient for SSTORE operations on cold storage slots. SSTORE requires:
- 2100 gas for cold slot access
- 2900 gas for the write itself (the cost of updating an already non-zero slot; a zero-to-nonzero write costs 20000)
- Plus dispatch overhead (~200 gas)

Updated to forward 5300 gas to ensure SSTORE succeeds.
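The arithmetic behind the new forwarding value can be checked directly. The constants below are the figures quoted in this commit message (the dispatch overhead is approximate); the names are chosen for illustration only.

```python
# Figures quoted in the commit message; DISPATCH_OVERHEAD is an approximation.
COLD_SLOT_ACCESS = 2_100
STORAGE_WRITE = 2_900
DISPATCH_OVERHEAD = 200


def min_forwarded_gas() -> int:
    """Lower bound on the gas the inner attack() call needs."""
    return COLD_SLOT_ACCESS + STORAGE_WRITE + DISPATCH_OVERHEAD


def is_sufficient(forwarded: int) -> bool:
    """True if the forwarded gas covers the estimated requirement."""
    return forwarded >= min_forwarded_gas()


print(min_forwarded_gas())  # 5200
print(is_sufficient(3_650), is_sufficient(5_300))  # False True
```

The old value of 3650 falls short of the roughly 5200 gas required, while 5300 leaves about 100 gas of headroom.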

Adds a minimal Verifier contract that checks if a target contract's deepest storage slot was updated to the expected attack value. This enables the test to verify attack success without expensive post-state checks on all attacked contracts. The verify() function calls getDeepest() on the target and compares the returned value against the expected attack value.

… gas

Major refactor of the depth benchmark test for execute mode:
- Remove stubs dependency; derive contract addresses directly from init_code_hash + Nick's deployer using CREATE2 formula
- Deploy AttackOrchestrator and Verifier as part of test execution
- Dynamically compute NUM_CONTRACTS based on gas_benchmark_value
- Add verification transaction at end of block to confirm attack success
- Fix gas constants based on empirical measurements:
  - GAS_PER_ATTACK: 8014 -> 8050 (measured ~8042)
  - MAX_ATTACKS_PER_TX: 1990 -> 1980 (safety margin)
  - TX_OVERHEAD: 22900 -> 22600 (more accurate)

The previous gas constants caused all attack transactions to run out of gas, as the 28 gas/attack shortfall compounded over 1990 attacks to ~55k gas deficit.
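The deficit quoted above follows from multiplying the per-attack shortfall by the batch size. A small check, using only the numbers from this commit message (function names are illustrative):

```python
def total_shortfall(measured: int, budgeted: int, attacks: int) -> int:
    """Gas missing across a batch when each attack is under-budgeted."""
    return max(0, measured - budgeted) * attacks


def total_headroom(measured: int, budgeted: int, attacks: int) -> int:
    """Spare gas across a batch when each attack is over-budgeted."""
    return max(0, budgeted - measured) * attacks


# Old constants: 8014 budgeted vs ~8042 measured, 1990 attacks per tx.
print(total_shortfall(measured=8_042, budgeted=8_014, attacks=1_990))  # 55720
# New constants: 8050 budgeted vs ~8042 measured, 1980 attacks per tx.
print(total_headroom(measured=8_042, budgeted=8_050, attacks=1_980))  # 15840
```

With the old constants each transaction was about 55.7k gas short; with the new ones it carries roughly 15.8k gas of slack.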

- Embed AttackOrchestrator and Verifier bytecode directly in Python
- Add download_mined_asset() to fetch JSON/SOL files from GitHub
- Cache downloaded files locally in .cache/ directory
- Remove local .sol and .json asset files (now downloaded on demand)
- Update test parameters to use (10, 6) available from GitHub
- Add gist reference for contract sources

Contract sources: https://gist.github.com/CPerezz/8686da933fa5c045fbdf7c31e20e6c71
Mined assets: https://github.com/CPerezz/worst_case_miner/tree/master/mined_assets
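A download-on-demand helper with a local cache might look roughly like the following. This is a hedged sketch and not the PR's download_mined_asset(): the name `fetch_cached`, its parameters, and the choice to key the cache on the URL's file name are all assumptions.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def _http_get(url: str) -> bytes:
    """Default fetcher: plain HTTP(S) GET using the standard library."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def fetch_cached(
    url: str,
    cache_dir: Path,
    fetch: Callable[[str], bytes] = _http_get,
) -> Path:
    """Return a local path for `url`, downloading only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the last URL path segment is a unique file name.
    target = cache_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        target.write_bytes(fetch(url))
    return target
```

Passing the fetcher as a parameter keeps the helper testable without network access; a real implementation would probably also want integrity checks on the downloaded file.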

- Remove unused ATTACK_SELECTOR constant
- Extract magic numbers to named constants (gas limits, fees, etc.)
- Add zero contracts validation to prevent edge case bugs
- Fix unused fork parameter (rename to _fork)
- Replace print warning with warnings.warn
- Fix docstring math discrepancy (~2,742 not 2,750)
- Fix line length issues and add proper type annotations

codecov · 2026-01-09T18:55:33Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.33%. Comparing base (8c9e889) to head (fee6e2e).
⚠️ Report is 3 commits behind head on forks/amsterdam.

Additional details and impacted files

@@               Coverage Diff                @@
##           forks/amsterdam    #1976   +/-   ##
================================================
  Coverage            86.33%   86.33%           
================================================
  Files                  538      538           
  Lines                34557    34557           
  Branches              3222     3222           
================================================
  Hits                 29835    29835           
  Misses                4148     4148           
  Partials               574      574

Flag	Coverage Δ
unittests	`86.33% <ø> (ø)`


marioevz force-pushed the feat/depth-bench-without-deploys branch from 63f135d to c92cfae Compare January 6, 2026 22:28

CPerezz reviewed Jan 7, 2026

danceratopz assigned LouisTsai-Csie Jan 7, 2026

jochem-brouwer suggested changes Jan 7, 2026

marioevz force-pushed the feat/depth-bench-without-deploys branch from c92cfae to 47f4a63 Compare January 9, 2026 17:17

CPerezz added 8 commits January 9, 2026 18:05

style: run ruff format on deep_branch_testing.py

443f829

fix: add mypy type annotations for deep_branch_testing.py

f1782d6

marioevz force-pushed the feat/depth-bench-without-deploys branch from 8afabff to 42e4830 Compare January 9, 2026 18:06

feat(git): Add CPerezz/worst_case_miner submodule

7266f6e

feat(tests/benchmarking): Update deep branch tests

fee6e2e

marioevz force-pushed the feat/depth-bench-without-deploys branch from 42e4830 to fee6e2e Compare January 9, 2026 18:57

marioevz marked this pull request as ready for review January 9, 2026 19:15

LouisTsai-Csie self-requested a review January 19, 2026 13:27
