fix(l1): bound RocksDB index and filter block memory by ilitteri · Pull Request #6735 · lambdaclass/ethrex

ilitteri · 2026-05-27T14:53:56Z

Motivation

ethrex's RocksDB backend ties resident memory to database size: as the on-disk state grows, so does the in-heap footprint, with no ceiling. On any long-running node this presents as resident memory that climbs indefinitely, operationally indistinguishable from a memory leak, and on a large enough database it will eventually exhaust the host. The mechanism behind this (RocksDB keeping every open SST file's index and filter blocks pinned in heap, outside its bounded LRU cache) is detailed below.

Description

This PR ships in two commits.

1. Store index and filter blocks in the shared block cache (67f3492f).
Enables cache_index_and_filter_blocks(true) + pin_l0_filter_and_index_blocks_in_cache(true) on every column family. With this change, RocksDB stops pinning every open SST's index and bloom-filter blocks in heap and instead routes them through its shared LRU cache. Total RocksDB resident memory now tracks the block cache size, not the database size.

2. Expose the block cache size as a CLI option (32ffd479).
Adds --rocksdb.block-cache-size <BYTES> (env ETHREX_ROCKSDB_BLOCK_CACHE_SIZE), default 20 GiB. Plumbed through a new StoreConfig struct and *_with_config constructor variants on Store and the init_store / load_store / open_store helpers; the existing zero-config constructors keep working with the default and are unchanged for tests, tools, and L2 callers.
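The plumbing described in item 2 could look roughly like the sketch below. It is not the PR's code: the field name `block_cache_size`, the constant `DEFAULT_BLOCK_CACHE_SIZE`, and the stand-in `Store` type are assumptions made for illustration. Only the `StoreConfig` name, the `*_with_config` naming pattern, and the 20 GiB default come from the description.

```rust
/// Default cache size from the PR description: 20 GiB, in bytes.
/// The constant name is hypothetical.
pub const DEFAULT_BLOCK_CACHE_SIZE: u64 = 20 * 1024 * 1024 * 1024;

/// Hypothetical shape of the config struct; the real fields may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StoreConfig {
    pub block_cache_size: u64,
}

impl Default for StoreConfig {
    fn default() -> Self {
        Self {
            block_cache_size: DEFAULT_BLOCK_CACHE_SIZE,
        }
    }
}

/// Stand-in for the real Store, reduced to what the example needs.
#[derive(Debug)]
pub struct Store {
    pub config: StoreConfig,
}

impl Store {
    /// Zero-config constructor: delegates to the configurable variant
    /// with the default, so existing callers keep their behaviour.
    pub fn new() -> Self {
        Self::new_with_config(StoreConfig::default())
    }

    /// Configurable variant used by callers that override the cache size.
    pub fn new_with_config(config: StoreConfig) -> Self {
        Self { config }
    }
}
```

Having the zero-config constructor delegate to the `_with_config` variant is one way to keep tests, tools, and L2 callers unchanged while giving the CLI a single place to inject the value.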

The cache size now governs the memory vs. block-import-throughput trade-off: filter and index blocks share the cache with data blocks, so a cache that is too small to hold the filter + index working set plus a useful amount of hot data will stall execution. The CLI help text states this explicitly and warns against lowering the value below the default.

Validation (live on mainnet, 60-block window of head-following, same chain segment)

Metric	Stock baseline	Fix @ 4 GiB	Fix @ 20 GiB (default)
Median block-import	35.4 ms	53.0 ms	31.5 ms
Mean block-import	38.1 ms	66.2 ms	36.4 ms
Median ratio vs baseline	—	1.50×	0.89×
Mean ratio vs baseline	—	1.74×	1.00×
RSS at ~500 GB DB	16 GB, climbing	7 GB, bounded	27 GB, bounded
RSS projected at 1 TB DB	~28–30 GB	~7 GB	~27 GB (unchanged)

A jemalloc heap profile of the unfixed baseline attributed ~92% of resident memory to RocksDB, dominated by ~8 GB of index and bloom-filter blocks (~6 GB of which are bloom filters). With the fix applied, the corresponding PrefetchIndexAndFilterBlocks allocations drop from ~8 GB to under 1 GB — the rest is now demand-loaded into the bounded cache via GetOrReadFilterBlock.

At the 20 GiB default, block-import is at parity with the unfixed baseline and resident memory is bounded forever regardless of database growth.

Trade-off worth noting

At today's ~500 GB mainnet database the default 20 GiB cache uses more memory than the unfixed baseline (~27 GB vs ~16 GB). The value of the fix is bounded memory forever — the unfixed baseline keeps climbing as the database grows (state DBs only grow); the crossover lands around a ~1 TB database. Operators who need a lower ceiling at the cost of throughput can lower the cache size; the help text documents this.
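The crossover estimate can be reproduced with a small calculation. The sketch below assumes the unfixed baseline's RSS grows linearly between the two table points (16 GB at a ~500 GB database, 28 to 30 GB projected at 1 TB) and finds where it reaches the bounded 27 GB; the linear model and the function name are illustrative assumptions, not something the PR states.

```rust
/// Database size (in GB) at which a linearly growing baseline RSS reaches
/// a fixed, bounded RSS. Linear growth between the two points is an
/// assumption of this sketch.
fn crossover_db_size_gb(
    db_a_gb: f64,
    rss_a_gb: f64,
    db_b_gb: f64,
    rss_b_gb: f64,
    bounded_rss_gb: f64,
) -> f64 {
    // GB of RSS gained per GB of database growth.
    let slope = (rss_b_gb - rss_a_gb) / (db_b_gb - db_a_gb);
    db_a_gb + (bounded_rss_gb - rss_a_gb) / slope
}
```

With the table's figures, `crossover_db_size_gb(500.0, 16.0, 1000.0, 30.0, 27.0)` gives roughly 893 GB and the same call with a 28 GB projection gives roughly 958 GB, which is consistent with a crossover near 1 TB.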

… instead of pinning them per open file. With max_open_files(-1) every SST stays open, and the RocksDB default (cache_index_and_filter_blocks = false) keeps each file's index and filter blocks in heap for the reader's lifetime, so table memory grows without bound with the number of SST files. On a 490 GB mainnet DB this reached ~8 GB of pinned index/filter blocks (~6 GB of it bloom filters), driving resident memory to ~20 GB. Enabling cache_index_and_filter_blocks moves index and filter blocks into the bounded block cache, capping total table memory at the cache size. pin_l0_filter_and_index_blocks_in_cache keeps the hottest level's metadata resident to avoid a read-latency cliff on the cache.
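A toy model makes the difference between the two policies concrete. This is not RocksDB code: it assumes a uniform, hypothetical amount of index and filter metadata per SST file and ignores the pinned L0 blocks, which is enough to show why one figure scales with file count and the other does not.

```rust
/// Index + filter bytes held in heap when every open SST pins its own
/// metadata (the RocksDB default): grows with the number of files.
fn pinned_table_memory(sst_files: u64, metadata_bytes_per_file: u64) -> u64 {
    sst_files * metadata_bytes_per_file
}

/// Index + filter bytes resident when metadata lives in the shared block
/// cache: never more than the cache capacity, whatever the file count.
fn cached_table_memory(
    sst_files: u64,
    metadata_bytes_per_file: u64,
    cache_capacity: u64,
) -> u64 {
    pinned_table_memory(sst_files, metadata_bytes_per_file).min(cache_capacity)
}
```

Doubling the number of files doubles the pinned figure but leaves the cached figure at the cache capacity once the metadata no longer fits; the cost moves from memory to cache misses, which is the throughput side of the trade-off.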

github-actions · 2026-05-27T14:54:15Z

⚠️ Known Issues — intentionally skipped tests

Source: docs/known_issues.md

Known Issues

Tests intentionally excluded from CI. Source of truth for the Known
Issues section the L1 workflow appends to each ef-tests job summary
and posts as a sticky PR comment.

EF Tests — Stateless coverage narrowed to EIP-8025 optional-proofs

make -C tooling/ef_tests/blockchain test calls test-stateless-zkevm
instead of test-stateless. The zkevm@v0.3.3 fixtures are filled against
bal@v5.6.1, out of sync with current bal spec; the broad target trips ~549
fixtures. Re-broaden once the zkevm bundle is regenerated.

Why and resolution path

PR #6527 broadened
test-stateless to extract the entire for_amsterdam/ tree from the
zkevm bundle and run all of it under --features stateless. Combined with
this branch's bal-devnet-7 semantics, that scope produces ~549
GasUsedMismatch / ReceiptsRootMismatch /
BlockAccessListHashMismatch failures.

test-stateless-zkevm filters cargo to the eip8025_optional_proofs
suite, which still validates the stateless harness without the bal-version
mismatch.

Re-broaden by switching test: back to test-stateless in
tooling/ef_tests/blockchain/Makefile once the zkevm bundle is regenerated
against the current bal spec.

github-actions · 2026-05-27T14:57:19Z

Lines of code report

Total lines added: 111
Total lines removed: 0
Total lines changed: 111

Detailed view

+------------------------------------------+-------+------+
| File                                     | Lines | Diff |
+------------------------------------------+-------+------+
| ethrex/cmd/ethrex/cli.rs                 | 1243  | +46  |
+------------------------------------------+-------+------+
| ethrex/cmd/ethrex/initializers.rs        | 676   | +21  |
+------------------------------------------+-------+------+
| ethrex/cmd/ethrex/l2/initializers.rs     | 386   | +5   |
+------------------------------------------+-------+------+
| ethrex/crates/storage/backend/rocksdb.rs | 334   | +5   |
+------------------------------------------+-------+------+
| ethrex/crates/storage/store.rs           | 2757  | +34  |
+------------------------------------------+-------+------+

(--rocksdb.block-cache-size, env ETHREX_ROCKSDB_BLOCK_CACHE_SIZE) with a default of 20 GiB. Because the previous commit moved index and bloom-filter blocks into the bounded block cache, the cache size now governs total RocksDB resident memory and significantly influences block-import throughput. Measured on a synced mainnet node: at a 4 GiB cache, filter blocks monopolize the cache and block exec is ~76% slower than the unbounded baseline; at 20 GiB the cache comfortably holds the filter + index working set plus the EVM's hot data and exec is at parity. The help text spells the trade-off out explicitly and only recommends lowering it on resource-constrained hosts. Plumbed through a new StoreConfig struct (exposed from ethrex-storage) and Store::new_with_config / new_from_genesis_with_config / {init,load,open}_store_with_config variants. The existing zero-config constructors continue to use the default and remain unchanged for tests and tools, so callers that don't need to override the cache size are unaffected.
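How the effective cache size might be resolved from the flag, the environment variable, and the default can be sketched as follows. The function name, the precedence order (flag, then environment, then default), and the plain-integer byte format are assumptions for illustration; the PR text only states the flag name, the environment variable name, the `<BYTES>` unit, and the 20 GiB default.

```rust
use std::num::ParseIntError;

/// 20 GiB in bytes, the default stated in the PR. The name is hypothetical.
const DEFAULT_BLOCK_CACHE_SIZE: u64 = 20 * 1024 * 1024 * 1024;

/// Resolve the block cache size in bytes.
/// `cli` is the raw value of --rocksdb.block-cache-size, if given;
/// `env` is the raw value of ETHREX_ROCKSDB_BLOCK_CACHE_SIZE, if set.
/// Assumed precedence: CLI value, then environment value, then default.
fn resolve_block_cache_size(
    cli: Option<&str>,
    env: Option<&str>,
) -> Result<u64, ParseIntError> {
    match cli.or(env) {
        // Assumes a plain integer number of bytes (no "GiB" suffix).
        Some(raw) => raw.trim().parse::<u64>(),
        None => Ok(DEFAULT_BLOCK_CACHE_SIZE),
    }
}
```

Under these assumptions, passing neither source yields 21474836480 bytes, and a value such as `20GiB` is rejected rather than silently misread.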

github-actions Bot assigned ilitteri May 27, 2026

github-actions Bot added the L1 Ethereum client label May 27, 2026

github-project-automation Bot added this to ethrex_l1 May 27, 2026

pablodeymo approved these changes May 27, 2026

ilitteri wants to merge 2 commits into main from fix/rocksdb-bounded-index-filter-memory
