
@danceratopz
Member

@danceratopz danceratopz commented Jan 27, 2026

🗒️ Description

Add an in-memory LRU cache for transition tool outputs during fixture generation. When multiple fixture formats share the same t8n inputs (e.g., blockchain_test and blockchain_test_engine), the cache eliminates redundant t8n calls.

Changes

  • Add T8nOutputCache class with bounded LRU eviction.
  • Add strip_fixture_format_from_nodeid() to derive consistent cache keys.
  • Integrate cache lookup/store in BlockchainTest._generate_block_data().
  • Add xdist_group markers to keep related formats on the same worker.
  • Sort test items during collection for deterministic execution order.
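As a rough illustration of what "bounded LRU eviction" means here, the following is a minimal sketch built on `collections.OrderedDict`. The class name `T8nOutputCacheSketch`, the `get`/`put` methods, the `max_size` parameter and the hit/miss counters are assumptions for illustration, not the PR's actual `T8nOutputCache` interface.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class T8nOutputCacheSketch:
    """Hypothetical bounded LRU cache for t8n outputs."""

    def __init__(self, max_size: int = 128) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Optional[Any]:
        """Return a cached output and mark it as most recently used."""
        if key not in self._data:
            self.misses += 1
            return None
        self._data.move_to_end(key)
        self.hits += 1
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        """Store an output; evict the least recently used entry when full."""
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)

    def __len__(self) -> int:
        return len(self._data)


# Usage: with room for two entries, touching "a" makes "b" the eviction victim.
cache = T8nOutputCacheSketch(max_size=2)
cache.put("a", "t8n-output-a")
cache.put("b", "t8n-output-b")
cache.get("a")
cache.put("c", "t8n-output-c")
```

Bounding the size keeps memory flat during long fill runs; since related formats are sorted next to each other, a small window is enough for the second format to find the first format's output.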

Performance

Command (run with xdist); note that this also generates the state_test format, which does not benefit from caching:

hyperfine --warmup=1 --runs=4 "taskset -c 0,2,4,5,6,7 uv run fill tests/shanghai --output=fixtures-with-cache-without-xidst --clean -x --no-html --skip-index"

Benchmarked on tests/shanghai (1408 tests across all forks through Osaka) with 6 parallel workers:

Configuration            Time    Speed-up vs. baseline
Sequential, no cache     61.7s   baseline
Sequential, with cache   47.6s   1.3x faster
Parallel, no cache       38.7s   1.6x faster
Parallel, with cache     33.2s   1.9x faster

Sequential:

No cache   ████████████████████████████████████████  61.7s
+ Cache    ███████████████████████████████·········  47.6s  -23%

Parallel (6 workers):

No cache   █████████████████████████···············  38.7s  -37%
+ Cache    ██████████████████████··················  33.2s  -46%
  • Percentages relative to sequential baseline (61.7s).
  • With xdist, the cache alone gives a ~1.17x speed-up (38.7s to 33.2s), i.e. a ~14.2% reduction in wall time.
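The derived figures follow from the four measured timings: a speed-up is t_ref / t_new and the percentage is the reduction in wall time relative to the reference. The small script below (variable names are illustrative) reproduces the 1.3x/1.6x/1.9x and -23%/-37%/-46% values against the sequential baseline, plus the ~1.17x / ~14.2% cache effect under xdist.

```python
# Measured wall times in seconds, taken from the benchmark table.
timings = {
    "sequential_no_cache": 61.7,
    "sequential_cache": 47.6,
    "parallel_no_cache": 38.7,
    "parallel_cache": 33.2,
}
baseline = timings["sequential_no_cache"]


def speedup(reference: float, new: float) -> float:
    """How many times faster `new` is than `reference`."""
    return reference / new


def reduction_pct(reference: float, new: float) -> float:
    """Wall-time reduction of `new` relative to `reference`, in percent."""
    return 100.0 * (reference - new) / reference


for name, seconds in timings.items():
    print(f"{name:22s} {speedup(baseline, seconds):.2f}x  "
          f"-{reduction_pct(baseline, seconds):.0f}%")

# Cache effect when both runs use xdist.
print(f"cache under xdist: {speedup(38.7, 33.2):.2f}x, "
      f"-{reduction_pct(38.7, 33.2):.1f}%")
```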

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI failures, see also Code Standards and Enabling Pre-commit Checks:
    uvx tox -e static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.


@danceratopz danceratopz added C-feat Category: an improvement or new feature A-test-fill Area: execution_testing.cli.pytest_commands.plugins.filler A-test-specs Area: execution_testing.specs labels Jan 27, 2026
@marioevz marioevz self-requested a review January 27, 2026 14:15
@codecov

codecov bot commented Jan 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.07%. Comparing base (7f584cd) to head (f57d1d5).
⚠️ Report is 3 commits behind head on forks/amsterdam.

Additional details and impacted files
@@               Coverage Diff                @@
##           forks/amsterdam    #2084   +/-   ##
================================================
  Coverage            86.07%   86.07%           
================================================
  Files                  599      599           
  Lines                39472    39472           
  Branches              3780     3780           
================================================
  Hits                 33977    33977           
  Misses                4862     4862           
  Partials               633      633           
Flag Coverage Δ
unittests 86.07% <ø> (ø)

Flags with carried forward coverage won't be shown.


session_t8n.reset_traces()
session_t8n.call_counter = 0
session_t8n.debug_dump_dir = dump_dir_parameter_level
# TODO: Configure the transition tool to count opcodes only when required.
Member


Regarding this TODO: we currently count opcodes for every test, but we could do so only for tests that are marked as requiring it.

We could mark the tests that need this information with a new marker (pytest.mark.count_opcodes or similar). Perhaps we can set it at a folder level, e.g. for all tests in tests/benchmarking.

    # Count opcodes only for tests that request it via the marker.
    if request.node.get_closest_marker("count_opcodes"):
        session_t8n.reset_opcode_count()
    else:
        session_t8n.remove_opcode_count()

or similar.
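For the folder-level variant, one option is a helper that a `pytest_collection_modifyitems` hook could consult. This is a minimal sketch; `needs_opcode_count` and `OPCODE_COUNT_DIRS` are hypothetical names, and the `count_opcodes` marker is only the proposal above, not an existing marker.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Hypothetical: directories whose tests always need opcode counts.
OPCODE_COUNT_DIRS = (PurePosixPath("tests/benchmarking"),)


def needs_opcode_count(nodeid: str, markers: Iterable[str]) -> bool:
    """Return True if a test should count opcodes.

    A test qualifies if it carries the proposed `count_opcodes` marker or if
    its file lives under one of OPCODE_COUNT_DIRS.
    """
    if "count_opcodes" in markers:
        return True
    # The file path is everything before the first "::" in a pytest nodeid.
    test_path = PurePosixPath(nodeid.split("::", 1)[0])
    # Compare path components, so "tests/benchmarking_extra" does not match.
    return any(test_path.is_relative_to(d) for d in OPCODE_COUNT_DIRS)
```

Comparing path components instead of using a plain string prefix check avoids accidentally matching sibling directories that share a name prefix.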

Member


Created tracking issue so we can merge without this: #2102

Member

@marioevz marioevz left a comment


Running the latest version I see that we are adding @t8n-cache-<md5-hash> to the names of all tests for some reason. This seems incorrect to me, so I'd like to give this a proper re-review.

@danceratopz
Member Author

Running the latest version I see that we are adding @t8n-cache-<md5-hash> to the names of all tests for some reason. This seems incorrect to me, so I'd like to give this a proper re-review.

This ensures that all parametrized fixture formats (state_test, blockchain_test, blockchain_test_engine) of a single test case are distributed to the same xdist worker, so they share that worker's cache; the cache is per-worker, not global across workers.
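With pytest-xdist's `--dist=loadgroup`, every test carrying the same `xdist_group` name is sent to the same worker, and xdist appends `@<group>` to the nodeid, which is where the `@t8n-cache-<md5-hash>` suffix comes from. The sketch below derives such a group name from a base nodeid (the nodeid with the fixture format removed). The helper name and the 8-character digest length are assumptions.

```python
import hashlib


def t8n_cache_group(base_nodeid: str, digest_len: int = 8) -> str:
    """Return one xdist_group name shared by all formats of a test case.

    Hypothetical helper: hashing keeps the name short and free of the
    "[", "]" and "::" characters that appear in pytest nodeids.
    """
    digest = hashlib.md5(base_nodeid.encode("utf-8")).hexdigest()[:digest_len]
    return f"t8n-cache-{digest}"


# All three formats of this (hypothetical) test case share one base nodeid,
# so they receive the same group and land on the same worker's cache.
base = "tests/shanghai/test_example.py::test_case[fork_Shanghai]"
print(t8n_cache_group(base))
```

The hash only needs to be stable across processes, which `hashlib.md5` is (unlike Python's built-in `hash()` of a string, which is randomized per process).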

@danceratopz danceratopz force-pushed the t8n-call-cache-specs branch 2 times, most recently from a294da7 to 0b598ec Compare February 2, 2026 13:52
@danceratopz danceratopz marked this pull request as draft February 2, 2026 13:56
@SamWilsn
Contributor

SamWilsn commented Feb 3, 2026

Just a general comment (haven't looked at the code yet), but are the caches per-thread/per-process or global? If they aren't global, you might need to load group them to get the biggest benefit.

danceratopz and others added 14 commits February 9, 2026 12:16
- Enable pytest-xdist loadgroup distribution mode by default.
- Required for xdist_group markers to control worker assignment.
- Add strip_fixture_format_from_nodeid() to extract base nodeid.
- Add get_all_fixture_format_names() for format name lookup.
- Used to ensure related fixture formats share cache keys.
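A minimal sketch of deriving a base nodeid by dropping the fixture-format label from the parameter list. It assumes pytest's usual `test[param1-param2]` id layout and uses the format labels that appear in this PR; the real helper would take its names from `get_all_fixture_format_names()`, and its exact matching rules may differ.

```python
import re

# Assumed format labels (those mentioned in this PR); not the full real list.
FORMAT_NAMES = frozenset({
    "state_test",
    "blockchain_test",
    "blockchain_test_engine",
    "blockchain_test_from_state_test",
    "blockchain_test_engine_from_state_test",
    "blockchain_test_engine_x_from_state_test",
})

# "<prefix>[<params>]" with the parameter list at the end of the nodeid.
_PARAMS = re.compile(r"^(?P<prefix>.*?)\[(?P<params>.*)\]$")


def strip_fixture_format(nodeid: str) -> str:
    """Remove fixture-format labels from a nodeid's parameter list.

    Expects a nodeid without any xdist "@group" suffix.
    """
    match = _PARAMS.match(nodeid)
    if match is None:
        return nodeid  # not parametrized: nothing to strip
    # Exact token matching, so label ordering does not matter.
    kept = [p for p in match["params"].split("-") if p not in FORMAT_NAMES]
    if not kept:
        return match["prefix"]
    return f"{match['prefix']}[{'-'.join(kept)}]"
```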
- Add T8nOutputCache LRU cache class for storing t8n outputs.
- Add t8n_output_cache field to FillingSession.
- Add xdist_group markers during collection for --dist=loadgroup.
- Use t8n-cache-{hash} prefix to distinguish from user-defined groups.
- Strip cache-specific @t8n-cache-* suffix from nodeids in TestInfo.
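Because the cache groups use a fixed `t8n-cache-` prefix, the suffix can be removed precisely while user-defined xdist groups survive. This is a minimal sketch assuming a lowercase hex digest; the function name and regex are illustrative, and `@serial` is a made-up user group.

```python
import re

# Matches only a trailing "@t8n-cache-<hex>" suffix added for cache grouping.
_T8N_CACHE_SUFFIX = re.compile(r"@t8n-cache-[0-9a-f]+$")


def strip_t8n_cache_suffix(nodeid: str) -> str:
    """Drop the cache-group suffix, leaving other xdist groups intact."""
    return _T8N_CACHE_SUFFIX.sub("", nodeid)
```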
- Add cache key helpers to BaseTest (_get_base_nodeid, _get_t8n_cache_key).
- Add _get_filling_session() to access cache from test instances.
- Cache t8n outputs in _generate_block_data() for reuse across formats.
- Skip caching for engine_x and engine_sync variants (different execution).
- Test T8nOutputCache LRU behavior, eviction, and hit/miss tracking.
- Test strip_fixture_format_from_nodeid for various nodeid patterns.
- Test get_all_fixture_format_names ordering and contents.
- Test cache key consistency across fixture format variants.
- Test _strip_xdist_group_suffix preserves non-cache group markers.
Add tests to verify that test items are sorted during collection
to ensure deterministic cache hits. The tests demonstrate:

- Sorting groups related fixture formats by base nodeid.
- Without xdist, items are correctly sorted.
- With xdist, items are NOT sorted (BUG causing high variance).
- Expected vs actual behavior comparison.

The xfail test `test_xdist_sorting_required_for_cache_hits` asserts
the correct behavior (sorting with xdist) and fails until the fix
is applied.
- Add helper methods to TransitionToolCacheStats for serialization.
- Initialize aggregated stats on xdist controller in pytest_configure.
- Send worker stats via workeroutput in fixture teardown.
- Add pytest_testnodedown hook to aggregate stats from workers.
- Update pytest_terminal_summary to display aggregated stats.
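In pytest-xdist, a worker can place basic Python data in `config.workeroutput`, and the controller reads it as `node.workeroutput` in the `pytest_testnodedown(node, error)` hook. A minimal sketch of the serialization and aggregation side follows; the class name is hypothetical, and the counter names are borrowed from the log line shown in the last commit.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict


@dataclass
class CacheStatsSketch:
    """Hypothetical per-worker cache counters."""

    key_hits: int = 0
    key_misses: int = 0
    subkey_hits: int = 0
    subkey_misses: int = 0

    def to_dict(self) -> Dict[str, int]:
        # A plain dict of ints is safe to send through workeroutput.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "CacheStatsSketch":
        # Ignore unknown keys so older/newer workers do not break the merge.
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in names})

    def merge(self, other: "CacheStatsSketch") -> None:
        """Add another worker's counters into this aggregate."""
        for f in fields(self):
            setattr(self, f.name, getattr(self, f.name) + getattr(other, f.name))
```

On the controller, each `pytest_testnodedown` call would do something like `total.merge(CacheStatsSketch.from_dict(node.workeroutput.get("t8n_cache_stats", {})))`, where `"t8n_cache_stats"` is a hypothetical key.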
- Clear `_cache` in `remove_cache()` to prevent stale data leakage.
- Tests without `transition_tool_cache_key` (e.g., state_test) could
  previously retrieve cached results from prior tests via matching
  `call_counter` subkeys.
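To see why clearing matters, here is a hypothetical two-level cache: a test-level key plus per-call subkeys taken from `call_counter`. Lookups go by subkey only, so if `remove_cache()` merely dropped the key, the next test could read a previous test's entry at the same `call_counter`. The class and method names are illustrative, not the PR's `OutputCache`.

```python
from typing import Any, Dict, Optional


class OutputCacheSketch:
    """Hypothetical cache: one active test key, entries keyed by call index."""

    def __init__(self) -> None:
        self._key: Optional[str] = None
        self._cache: Dict[int, Any] = {}

    def set_cache(self, key: str) -> None:
        # Switching to a different test key invalidates older entries.
        if key != self._key:
            self._cache.clear()
            self._key = key

    def remove_cache(self) -> None:
        # The fix: drop the entries too, not only the key; otherwise get()
        # could still return another test's output for a matching subkey.
        self._key = None
        self._cache.clear()

    def get(self, call_counter: int) -> Optional[Any]:
        return self._cache.get(call_counter)

    def put(self, call_counter: int, value: Any) -> None:
        if self._key is not None:
            self._cache[call_counter] = value
```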
Sort test items by (is_slow, base_nodeid, nodeid) to optimize execution:
- Slow tests first (LPT scheduling for xdist load balance).
- Related fixture formats grouped together (cache locality).
- Deterministic order within groups.

If ANY fixture format variant of a test is marked slow, ALL variants
are treated as slow to keep them grouped together for cache hits.

Reuses the base_nodeid cache for xdist marker generation to avoid
redundant strip_fixture_format_from_nodeid() calls.
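A minimal sketch of the ordering described in this commit, including the rule that one slow variant makes the whole group slow. `FakeItem` stands in for a collected pytest item, and `base_of` stands in for the base-nodeid helper; both are hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple, Set, Tuple


class FakeItem(NamedTuple):
    """Stand-in for a collected pytest item (hypothetical)."""

    nodeid: str
    is_slow: bool


def sort_items(items: List[FakeItem], base_of: Callable[[str], str]) -> List[FakeItem]:
    """Order items slow-first, then by base nodeid, then by full nodeid."""
    base: Dict[str, str] = {item.nodeid: base_of(item.nodeid) for item in items}
    # If any variant of a test is slow, treat every variant as slow so the
    # group is not split between the slow and fast sections.
    slow_bases: Set[str] = {base[item.nodeid] for item in items if item.is_slow}

    def key(item: FakeItem) -> Tuple[bool, str, str]:
        b = base[item.nodeid]
        # False sorts before True, so "not slow" puts slow groups first.
        return (b not in slow_bases, b, item.nodeid)

    return sorted(items, key=key)
```

Running the longest tests first is the classic LPT heuristic: it leaves short tests to fill the gaps at the end, so workers tend to finish at about the same time.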
BlockchainEngineXFixture and BlockchainEngineSyncFixture had
can_use_cache=False which was dead code (never checked anywhere).
Replace with transition_tool_cache_key="" which is the actual mechanism
that controls caching — empty string means no caching.
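A minimal sketch of the lookup/store step, where an empty cache key opts a format out of caching. The function, the `(cache_key, call_index)` tuple key and the plain-dict cache are assumptions; the PR integrates this inside `_generate_block_data()`.

```python
from typing import Callable, Dict, Tuple

CacheKey = Tuple[str, int]


def cached_t8n_call(
    cache: Dict[CacheKey, str],
    cache_key: str,
    call_index: int,
    run_t8n: Callable[[], str],
) -> str:
    """Return a t8n result, reusing a stored one when caching is allowed.

    An empty cache_key means "do not cache" (e.g. engine_x / engine_sync
    variants whose execution differs), so the tool always runs.
    """
    if not cache_key:
        return run_t8n()
    full_key = (cache_key, call_index)
    if full_key not in cache:
        cache[full_key] = run_t8n()
    return cache[full_key]
```

Using the empty string as the opt-out signal keeps a single field as the source of truth, which is why the separate, never-checked `can_use_cache` flag could be dropped.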
For StateTest specs with --generate-all-formats, the _from_state_test
label suffixes cause alphabetical sort to interleave cacheable and
non-cacheable formats: blockchain_test_engine_from_state_test (cacheable)
→ blockchain_test_engine_x_from_state_test (non-cacheable, clears cache)
→ blockchain_test_from_state_test (cacheable, but cache is gone).

Add has_cache_key to the sort key so cacheable formats cluster together
within each base nodeid group, ensuring the second cacheable format hits
the warm cache before any non-cacheable format clears it.
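The interleaving problem and its fix can be reproduced with the three labels from this commit message. In this sketch, `variant_sort_key` is hypothetical, and placing cacheable variants first within a group is an assumption (the commit only requires that they be adjacent).

```python
from typing import Tuple


def variant_sort_key(base_nodeid: str, has_cache_key: bool, nodeid: str) -> Tuple[str, bool, str]:
    # Within a base group, cacheable variants (has_cache_key=True) sort first.
    return (base_nodeid, not has_cache_key, nodeid)


labels = {
    "blockchain_test_engine_from_state_test": True,
    "blockchain_test_engine_x_from_state_test": False,  # clears the cache
    "blockchain_test_from_state_test": True,
}
base = "t.py::test_x[fork_Osaka]"

# Plain alphabetical order puts the non-cacheable variant in the middle.
alphabetical = sorted(labels)
# With has_cache_key in the key, both cacheable variants run back to back.
clustered = sorted(labels, key=lambda lbl: variant_sort_key(base, labels[lbl], lbl))
```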
node_id_for_entropy strips fixture format and fork names from the node
ID before hashing it for deterministic address generation. However, it
did not strip the xdist @group_name suffix (e.g., @t8n-cache-abc12345),
causing different addresses when running with vs without xdist workers.

Strip the suffix so addresses are deterministic regardless of whether
xdist is active.
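The essence of the fix is to normalize the nodeid before hashing. This minimal sketch covers only the suffix-stripping part (the real `node_id_for_entropy` also strips fixture formats and fork names); the regex, the helper name and the use of SHA-256 are assumptions.

```python
import hashlib
import re

# A trailing "@<group>" suffix; "[" and "]" are excluded so an "@" inside
# the parameter list is not mistaken for an xdist group suffix.
_XDIST_GROUP_SUFFIX = re.compile(r"@[^@\[\]]+$")


def entropy_seed(nodeid: str) -> str:
    """Hypothetical: hash a nodeid after dropping any xdist group suffix."""
    stable = _XDIST_GROUP_SUFFIX.sub("", nodeid)
    return hashlib.sha256(stable.encode("utf-8")).hexdigest()
```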
Replace the raw hit/miss counts with an efficiency metric where 100%
means all tests that could have hit the cache did hit it. Track unique
cache keys to compute expected hits (total cacheable - unique keys).

Also filter subkey stats to only count cacheable tests, eliminating
phantom misses from non-cacheable tests that still interact with the
OutputCache after remove_cache().

Before: T8n cache: key_hits=6, key_misses=6 (50.0%), subkey_hits=6, subkey_misses=18 (25.0%)
After:  T8n cache: 100% hit rate (6/6 expected), 6 t8n calls saved
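The metric follows from the fact that the first test seen for each key must miss, so at most (cacheable tests - unique keys) lookups can hit. The "Before" numbers above (6 key hits, 6 key misses, so 12 cacheable tests with 6 unique keys) give 6 expected hits, and 6/6 is 100%. The function names below are hypothetical.

```python
from typing import Optional


def cache_efficiency(cacheable_tests: int, unique_keys: int, hits: int) -> Optional[float]:
    """Percentage of possible cache hits that actually happened."""
    expected = cacheable_tests - unique_keys
    if expected <= 0:
        return None  # no test could have hit the cache; metric undefined
    return 100.0 * hits / expected


def summary(cacheable_tests: int, unique_keys: int, hits: int) -> str:
    """Format a one-line summary in the style of the "After" line."""
    eff = cache_efficiency(cacheable_tests, unique_keys, hits)
    if eff is None:
        return "T8n cache: no reusable outputs"
    expected = cacheable_tests - unique_keys
    return f"T8n cache: {eff:.0f}% hit rate ({hits}/{expected} expected)"
```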


3 participants