ethereum · marioevz · Feb 24, 2026 · Feb 4, 2026
diff --git a/docs/running_tests/releases.md b/docs/running_tests/releases.md
@@ -12,6 +12,71 @@
 | [Transaction Tests](./test_formats/transaction_test.md)             | - using a new simulator coming soon                                                                                                                                                                                                                                                       | None; executed directly from Python source,</br>using a release tag |
 | Blob Transaction Tests                                               | - using the [eels/execute-blobs Simulator](./execute/hive.md#the-eelsexecute-blobs-simulator) and                                                                                                                                                                                                                         | None; executed directly from Python source,</br>using a release tag |
 
+## Fixture Output Directory Structure
+
+Inside each format directory, fixtures are grouped by **target fork**.
+
+The top-level subdirectory, `for_{fork}`, identifies the fork **under test**.
+Below it, fixtures mirror the `./tests/` source layout: each directory
+corresponds to the fork in which the functionality was introduced. Because
+tests declare `valid_from`, a single target-fork directory also contains
+fixtures from every earlier fork whose tests are still valid at that fork.
+
+### Consensus fixture layout
+
+```text
+fixtures/
+└── blockchain_tests/
+    ├── for_prague/                   # filled targeting Prague
+    │   ├── istanbul/                 # tests introduced in Istanbul
+    │   │   └── eip1344_chainid/...
+    │   ├── cancun/                   # tests introduced in Cancun
+    │   │   └── eip4844_blobs/...
+    │   └── prague/                   # tests introduced in Prague
+    │       └── eip7702_set_code_tx/...
+    └── for_osaka/                    # filled targeting Osaka
+        ├── istanbul/
+        │   └── eip1344_chainid/...
+        ├── cancun/
+        │   └── eip4844_blobs/...
+        ├── prague/
+        │   └── eip7702_set_code_tx/...
+        └── osaka/                    # tests introduced in Osaka
+            └── eip7692_eof_v1/...
+```
+
+Other format directories (`state_tests/`, `blockchain_tests_engine/`)
+follow the same layout.
+
+### Benchmark fixture layout
+
+When filling with `--gas-benchmark-values`, benchmark tests additionally
+include the gas limit in the subdirectory name (`for_{fork}_at_{gas}M`,
+where `{gas}` is in millions, zero-padded to four digits), with one
+subdirectory per gas value:
+
+```text
+fixtures/
+└── blockchain_tests/
+    ├── for_osaka_at_0001M/           # 1M gas benchmark
+    │   └── benchmark/compute/...
+    └── for_osaka_at_0002M/           # 2M gas benchmark
+        └── benchmark/compute/...
+```
+
+When filling with `--fixed-opcode-count`, the opcode count replaces the
+gas limit in the subdirectory name (`for_{fork}_at_opcount_{N}K`, where
+`{N}` is in thousands and may include decimals):
+
+```text
+fixtures/
+└── blockchain_tests/
+    ├── for_osaka_at_opcount_10K/     # 10K opcodes
+    │   └── benchmark/compute/...
+    └── for_osaka_at_opcount_20K/     # 20K opcodes
+        └── benchmark/compute/...
+```
+
 ## Release URLs and Tarballs
 
 ### Versioning Scheme
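
The naming rules documented in the hunk above can be sketched as three small helpers. This is only an illustration of the documented patterns, not the filler's implementation: the function names are hypothetical, lowercasing the fork name is an assumption taken from the examples (`for_prague`, `for_osaka`), and the rendering of decimal opcode counts is a guess, since the docs only say the value "may include decimals".

```python
def consensus_subdir(fork: str) -> str:
    """Directory for consensus fixtures: for_{fork}."""
    # Assumption: fork names are lowercased, as in the documented examples.
    return f"for_{fork.lower()}"


def gas_benchmark_subdir(fork: str, gas_millions: int) -> str:
    """Directory for --gas-benchmark-values: for_{fork}_at_{gas}M."""
    # The gas value is in millions, zero-padded to four digits.
    return f"for_{fork.lower()}_at_{gas_millions:04d}M"


def opcount_subdir(fork: str, opcount_thousands: float) -> str:
    """Directory for --fixed-opcode-count: for_{fork}_at_opcount_{N}K."""
    count = float(opcount_thousands)
    # Assumption: whole numbers are written without a decimal point.
    text = str(int(count)) if count.is_integer() else str(count)
    return f"for_{fork.lower()}_at_opcount_{text}K"


if __name__ == "__main__":
    print(consensus_subdir("Prague"))
    print(gas_benchmark_subdir("Osaka", 1))
    print(opcount_subdir("Osaka", 10))
```

Zero-padding the gas value keeps the per-gas directories in numeric order under a plain lexicographic sort, at least for values below 10000M.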

diff --git a/docs/writing_tests/benchmarks.md b/docs/writing_tests/benchmarks.md
@@ -69,6 +69,36 @@ This mode is designed for gas limit testing, and gas repricing, where it enables
 - `--gas-benchmark-values 1,2,3` runs the test with 1M, 2M, and 3M block gas limits
 - `--fixed-opcode-count 4,5` runs the test with approximately 4K and 5K opcode executions
 
+**Output layout with gas benchmark values:** When `--gas-benchmark-values` is provided and benchmark tests are filled, fixtures are written into per-fork, per-gas-limit subdirectories under each format directory:
+
+```text
+<output>/
+  blockchain_tests/
+    for_osaka_at_0001M/...
+    for_osaka_at_0002M/...
+  blockchain_tests_engine/
+    for_osaka_at_0001M/...
+    for_osaka_at_0002M/...
+  blockchain_tests_engine_x/
+    pre_alloc/...
+    for_osaka_at_0001M/...
+    for_osaka_at_0002M/...
+```
+
+The subdirectory name follows the pattern `for_{fork}_at_{gas}M` (see [Fixture Output Directory Structure](../running_tests/releases.md#fixture-output-directory-structure) for details). Non-benchmark (consensus) fixtures use `for_{fork}` without the gas limit suffix.
+
+**Output layout with fixed opcode counts:** When `--fixed-opcode-count` is provided, the subdirectory name uses the opcode count instead of the gas limit (`for_{fork}_at_opcount_{N}K`):
+
+```text
+<output>/
+  blockchain_tests/
+    for_osaka_at_opcount_10K/...
+    for_osaka_at_opcount_20K/...
+  blockchain_tests_engine/
+    for_osaka_at_opcount_10K/...
+    for_osaka_at_opcount_20K/...
+```
+
 ## Developing Benchmarks
 
 Before writing benchmark-specific tests, please refer to the [general documentation](./writing_a_new_test.md) for the fundamentals of writing tests in the EELS framework.
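
For consumers of these fixtures, the directory names documented above can be parsed back into their parts. The following is a minimal sketch under stated assumptions: `TargetDir` and `parse_target_dir` are hypothetical names that do not exist in the repository, and fork names are assumed to consist of lowercase letters and digits only.

```python
import re
from typing import NamedTuple, Optional


class TargetDir(NamedTuple):
    """Parsed parts of a top-level fixture subdirectory name (hypothetical)."""

    fork: str
    gas_millions: Optional[int]
    opcount_thousands: Optional[float]


# Matches for_{fork}, for_{fork}_at_{gas}M and for_{fork}_at_opcount_{N}K.
# Assumption: fork names contain only lowercase letters and digits.
_TARGET_DIR = re.compile(
    r"^for_(?P<fork>[a-z0-9]+)"
    r"(?:_at_(?:(?P<gas>\d{4,})M|opcount_(?P<opcount>\d+(?:\.\d+)?)K))?$"
)


def parse_target_dir(name: str) -> Optional[TargetDir]:
    """Return the parsed parts, or None if the name is not a target dir."""
    match = _TARGET_DIR.match(name)
    if match is None:
        return None
    gas = match.group("gas")
    opcount = match.group("opcount")
    return TargetDir(
        fork=match.group("fork"),
        gas_millions=int(gas) if gas is not None else None,
        opcount_thousands=float(opcount) if opcount is not None else None,
    )


if __name__ == "__main__":
    for name in ("for_prague", "for_osaka_at_0002M", "for_osaka_at_opcount_10K"):
        print(name, parse_target_dir(name))
```

Names that do not follow any of the three patterns, such as the `pre_alloc` directory under `blockchain_tests_engine_x/`, yield `None`, so they can be skipped when walking a format directory.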

diff --git a/packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/__init__.py b/packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/__init__.py
@@ -1,6 +1,6 @@
 """A pytest plugin to fill tests and generate JSON fixtures."""
 
-from .fixture_output import FixtureOutput
+from ..shared.fixture_output import FixtureOutput
 
 __all__ = [
     "FixtureOutput",

diff --git a/packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/filler.py b/packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/filler.py
@@ -75,6 +75,10 @@
 )
 
 from ..shared.execute_fill import ALL_FIXTURE_PARAMETERS
+from ..shared.fixture_output import (
+    FixtureOutput,
+    resolve_fixture_subfolder,
+)
 from ..shared.helpers import (
     get_spec_format_for_item,
     is_help_or_collectonly_mode,
@@ -83,7 +87,6 @@
 from ..spec_version_checker.spec_version_checker import (
     get_ref_spec_from_module,
 )
-from .fixture_output import FixtureOutput
 from .pre_alloc import Alloc
 
 # Fixture output dir for keyboard interrupt cleanup (set in pytest_configure).
@@ -1770,9 +1773,14 @@ def __init__(self, *args: Any, **kwargs: Any) -> None:
                     _info_metadata=t8n._info_metadata,
                 )
 
+                output_subdir = resolve_fixture_subfolder(
+                    list(request.node.iter_markers("fixture_subfolder"))
+                )
+
                 fixture_path = fixture_collector.add_fixture(
                     node_to_test_info(request.node),
                     fixture,
+                    output_subdir=output_subdir,
                 )
 
                 # NOTE: Use str for compatibility with pytest-dist

diff --git a/packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/static_filler.py b/packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/static_filler.py
@@ -21,7 +21,7 @@
 from execution_testing.specs import BaseStaticTest, BaseTest
 from execution_testing.tools.tools_code.yul import Yul
 
-from ..forks.forks import ValidityMarker
+from ..forks.forks import ValidityMarker, fork_markers
 from ..shared.helpers import labeled_format_parameter_set
 
 
@@ -267,8 +267,8 @@ def collect(self: "FillerFile") -> Generator["FillerTestItem", None, None]:
                             fixturenames = [
                                 spec_parameter_name,
                             ]
-                            marks: List[pytest.Mark] = [
-                                mark  # type: ignore
+                            marks: List[pytest.Mark | pytest.MarkDecorator] = [
+                                mark
                                 for mark in fixture_format_parameter_set.marks
                                 if mark.name != "parametrize"
                             ]
@@ -298,6 +298,7 @@ def collect(self: "FillerFile") -> Generator["FillerTestItem", None, None]:
                                             if mark.name != "parametrize"
                                         ]
                                         + extra_function_marks
+                                        + fork_markers(fork=fork)
                                     )
                                     case_params = params.copy() | dict(
                                         zip(
@@ -319,6 +320,7 @@ def collect(self: "FillerFile") -> Generator["FillerTestItem", None, None]:
                                         marks=case_marks,
                                     )
                             else:
+                                case_marks = marks[:] + fork_markers(fork=fork)
                                 yield FillerTestItem.from_parent(
                                     self,
                                     original_name=key,
@@ -328,7 +330,7 @@ def collect(self: "FillerFile") -> Generator["FillerTestItem", None, None]:
                                     name=f"{key}[{test_id}]",
                                     fork=fork,
                                     fixture_format=fixture_format,
-                                    marks=marks,
+                                    marks=case_marks,
                                 )
             except Exception as e:
                 pytest.fail(f"Error loading file {self.path} as a test: {e}")