perf(l1): add bloom filter to skip trie seeks for non-existent storage slots by ilitteri · Pull Request #6288 · lambdaclass/ethrex

ilitteri · 2026-03-03T01:17:26Z

Motivation

Storage reads (get_storage_value) always perform a full trie traversal, even for slots that were never written. On workloads with many SLOAD misses (e.g. first-touch patterns), this is a significant bottleneck.

Description

Add a StorageBloomFilter backed by fastbloom::AtomicBloomFilter that tracks every (address, storage_key) pair written with a non-zero value. Before traversing the trie, get_storage_value checks the bloom filter — if the slot was definitely never written, the trie lookup is skipped entirely.

Key design points:

Starts disabled (pass-through): The filter is created empty and might_contain always returns true until enable() is called. This prevents an unpopulated filter from incorrectly rejecting all lookups.
Populated on writes: insert() is called whenever a non-zero storage value is written to the trie.
1% false positive rate with 200M expected items, using FxBuildHasher for fast hashing.
Thread-safe: AtomicBloomFilter with AtomicBool enabled flag (Release/Acquire ordering).

Future work: scan existing trie storage on startup to populate the filter and call enable(), so the filter can actually start rejecting lookups for pre-existing data.

How to Test

Run the existing test suite — the filter is disabled by default (pass-through), so behavior is identical to main.
Once enable() is wired up after initial population, benchmark with import on l2-1k-erc20.rlp to measure trie-skip savings.

github-actions · 2026-03-03T01:17:59Z

🤖 Kimi Code Review

Security & Correctness Issues

Race condition in bloom filter initialization (bloom.rs:40-43, store.rs:1491)
The bloom filter is created with enabled: false but is immediately shared via Arc. Other threads could call might_contain() before enable() is called, causing unnecessary trie lookups. The initialization flow needs to be atomic.
Missing bloom filter population on startup (store.rs:1488-1491)
The bloom filter is initialized with capacity for 200M items but never populated with existing storage slots. This means:
- On restart, the filter will incorrectly report "never written" for existing storage
- Could cause consensus failures by returning None for existing storage values
No handling of storage deletion (store.rs:1732-1740, 1824-1832)
When storage values are set to zero (deleted), the bloom filter retains the "might contain" status forever. This causes:
- False positives accumulating over time
- Unnecessary trie lookups for deleted storage slots
- Potential performance degradation

Performance & Design Issues

Inefficient key serialization (bloom.rs:57-62)
The make_key function creates a new [u8; 52] array on every call. Consider:
- Using a stack-allocated buffer passed as mutable reference
- Or implementing a proper hasher that can process the two fields without allocation
Hard-coded capacity (store.rs:1491)
StorageBloomFilter::new(200_000_000) uses a fixed capacity that may not match actual chain state. This should be:
- Configurable via CLI/config
- Or calculated based on actual chain statistics
Memory usage concern
A bloom filter for 200M items at 1% false positive rate requires ~240MB of memory. This should be documented and made configurable.

Code Quality

Debug implementation is incomplete (bloom.rs:15-20)
The debug implementation doesn't show any useful state. Consider including:
- Enabled status
- Capacity
- Number of inserted items (if available from fastbloom)
Atomic ordering could be relaxed (bloom.rs:42, 51)
Ordering::Release/Acquire might be stronger than needed. Ordering::Relaxed is likely sufficient since there's no other shared state being synchronized.

Recommendations

Add bloom filter population during store initialization by scanning existing storage tries
Implement bloom filter cleanup when storage slots are deleted (consider counting bloom filter or periodic rebuild)
Make bloom filter capacity configurable
Document memory usage implications
Consider using a more sophisticated approach for handling deletions (e.g., counting bloom filter or epoch-based rebuilds)

Automated review by Kimi (Moonshot AI) · custom prompt

github-actions · 2026-03-03T01:19:23Z

🤖 Codex Code Review

Potential consensus/correctness break if bloom is ever enabled without full backfill
The read path can short-circuit to None before touching trie state (store.rs:2119).
But the bloom is only populated from new non-zero writes during account updates (store.rs:1738, store.rs:1830) and misses existing DB/genesis storage (store.rs:1877).
If enable() is later called (bloom.rs:36) without rebuilding from full state, historical/existing non-zero slots can become false negatives and return wrong storage values.
Feature appears inactive in this PR (performance benefit not realized)
StorageBloomFilter starts disabled (bloom.rs:30), and I don’t see any call site enabling it. That means might_contain() always returns true (bloom.rs:50), so the new fast path in get_storage_at_root is effectively dead code right now.
High fixed memory cost at startup
The filter is instantiated with expected_items(200_000_000) (store.rs:1491, bloom.rs:29). At 1% FP rate this is roughly hundreds of MB just for bloom bits, plus atomic structure overhead. For client deployments this is a significant resident-memory increase and should be justified/configurable.

If intended direction is to ship this optimization, I’d require:

deterministic backfill/rebuild strategy (including genesis + existing DB state),
explicit enable point only after backfill completes,
tests proving no false negatives for existing state and historical reads across restart.

Automated review by OpenAI Codex · custom prompt

github-actions · 2026-03-03T01:20:32Z

🤖 Claude Code Review

Now I have enough context to write a comprehensive review. Here it is:

PR Review: `perf(l1): add bloom filter to skip trie seeks for non-existent storage slots`

The overall approach is sound — using a bloom filter to short-circuit trie traversal on SLOAD misses is a well-known optimization. The pass-through-until-enabled design correctly avoids the corruption that was caught in the fixup commit. However, there are several issues worth addressing before merging.

Correctness

Critical: Genesis storage bypasses the bloom filter

setup_genesis_state_trie (store.rs:1877–1881) calls storage_trie.insert() for every non-zero genesis slot but never calls storage_bloom.insert():

for (storage_key, storage_value) in account.storage {
    if !storage_value.is_zero() {
        let hashed_key = hash_key(&H256(storage_key.to_big_endian()));
        storage_trie.insert(hashed_key, storage_value.encode_to_vec())?; // no bloom insert
    }
}

When enable() is eventually wired up after "scanning existing trie storage on startup", any genesis storage slot that was never subsequently written during block processing will be absent from the filter. At that point might_contain will return false for those slots and get_storage_at_root will return Ok(None) instead of the actual value — silently corrupting reads. This is the exact same class of bug that the fixup commit caught during development.

The PR description acknowledges this as "future work", but the code as written has no guard or comment at the enable() call site ensuring the filter is fully populated before activation. A TODO or debug_assert linking the genesis-population step to enable() would prevent the same mistake from recurring.

Concern: write_storage_trie_nodes_batch also bypasses the bloom filter

write_storage_trie_nodes_batch (store.rs:1164) writes raw trie node bytes directly to the database with no bloom update. If this path is used to ingest storage (e.g. during snap sync), those slots will be invisible to the filter after enable().

Memory

~240 MB always allocated, even in disabled (pass-through) mode

With 200M items and 1% FPR the underlying bit array is approximately:

m = -n * ln(p) / ln(2)^2  ≈  1.92 × 10^9 bits  ≈  240 MB

This is allocated unconditionally at Store::new time (store.rs:1491). An always-resident 240 MB allocation is a meaningful cost for nodes that never call enable(), and especially large for testnets, development, or L2 instances where the expected item count is orders of magnitude smaller.

The hardcoded 200_000_000 in Store::new gives no way for callers to tune the capacity to their workload. At minimum the constant deserves a comment explaining where 200M comes from (e.g. "approximate upper bound of distinct non-zero storage slots on mainnet"). A config option or lazy initialization would be better.

Semantic gap: zero-writes after non-zero writes

When SSTORE sets a slot to zero, the trie removes the entry (store.rs:1735–1736, 1827–1828) but the bloom filter is not and cannot be updated (bloom filters don't support deletion). This is intentional and correct — the false positive causes a trie lookup that correctly returns None. The comment on might_contain correctly describes this. No action needed, but the insert() doc comment at bloom.rs:42 says "Record that a non-zero value exists at (address, key)" — it should mention that once inserted, a slot remains in the filter even after the value is deleted.

Missing tests

bloom.rs has no unit tests. For a correctness-sensitive component that gates all storage reads, at least the following should be covered:

might_contain returns true (pass-through) before enable()
After enable(), might_contain returns false for never-inserted keys
After insert + enable(), might_contain returns true
make_key produces distinct values for (A, K1) vs (A, K2) and (A1, K) vs (A2, K)

Minor

bloom.rs:24: pub fn new(capacity: usize) — the capacity parameter is unused internally (the builder takes expected_items, which is correct), but naming it capacity vs expected_items may mislead callers about semantics.
The FxBuildHasher choice is fine here — inputs are addresses and keys, not adversarially controlled strings, so non-cryptographic hashing is appropriate.
Arc<StorageBloomFilter> is correctly shared across Store clones (since Store: Clone and Arc clone is shallow). This is good.
Ordering choices (Release on store, Acquire on load) are correct for establishing a happens-before relationship between population and enable.

Summary

The implementation is mechanically correct for the current (always-disabled) state, and the pass-through design safely prevents corruption. The main concern is that the path to activating the filter — the advertised "future work" — has a latent correctness trap: enable() must not be called until all existing storage has been inserted, including genesis state and any snap-synced data. This precondition is easy to violate accidentally. I'd recommend adding either an explicit populate_from_genesis step in the bloom module, or at minimum a prominent // SAFETY: comment at every future enable() call site enumerating what must be true before calling it.

Automated review by Claude (Anthropic) · custom prompt

Copilot

Pull request overview

This PR introduces a storage-level bloom filter to avoid expensive storage-trie lookups for (address, storage_key) pairs that have definitely never been written with a non-zero value, targeting workloads with many SLOAD misses.

Changes:

Add StorageBloomFilter (wrapper around fastbloom::AtomicBloomFilter) with an enabled pass-through mode.
Populate the bloom filter on non-zero storage writes and consult it in Store::get_storage_at_root to potentially skip trie access.
Wire the new module into the storage crate and store it on Store.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
`crates/storage/store.rs`	Adds bloom filter field, populates it on writes, and checks it before storage trie traversal.
`crates/storage/lib.rs`	Registers the new internal `bloom` module.
`crates/storage/bloom.rs`	Implements `StorageBloomFilter` with enable/pass-through semantics and key construction.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-03T01:21:18Z

            last_computed_flatkeyvalue: Arc::new(RwLock::new(last_written)),
            account_code_cache: Arc::new(Mutex::new(CodeCache::default())),
            code_metadata_cache: Arc::new(Mutex::new(rustc_hash::FxHashMap::default())),
+            storage_bloom: Arc::new(StorageBloomFilter::new(200_000_000)),


StorageBloomFilter::new(200_000_000) is created unconditionally during Store construction. For a 1% false-positive rate, a bloom sized for 200M items is on the order of hundreds of MB of resident memory, which can cause OOM/regressions (notably for the many EngineType::InMemory test stores). Consider making this capacity configurable, drastically smaller by default, or lazily allocating the backing filter only when the bloom is actually going to be used/populated.

Suggested change

storage_bloom: Arc::new(StorageBloomFilter::new(200_000_000)),

storage_bloom: Arc::new(StorageBloomFilter::new(2_000_000)),

Copilot · 2026-03-03T01:21:18Z

                    if storage_value.is_zero() {
                        storage_trie.remove(&hashed_key)?;
                    } else {
+                        self.storage_bloom.insert(update.address, *storage_key);


The PR description says the bloom is “disabled by default (pass-through), so behavior is identical”, but insert() is still called on every non-zero storage write even while disabled. That adds hashing + atomic updates to the write path with no read-side benefit until enable() is wired up. If the intent is truly zero-overhead until enabled, consider gating insert() behind a separate runtime/feature flag or deferring population until you’re ready to enable the filter.

Suggested change

self.storage_bloom.insert(update.address, *storage_key);

if self.storage_bloom.is_enabled() {

self.storage_bloom.insert(update.address, *storage_key);

}

Copilot · 2026-03-03T01:21:18Z

+                        self.storage_bloom.insert(update.address, *storage_key);
                        storage_trie.insert(hashed_key, storage_value.encode_to_vec())?;
                    }


storage_bloom is populated here for added_storage, but there are other code paths that write non-zero values into storage tries without calling storage_bloom.insert (e.g. setup_genesis_state_trie inserts directly into a storage trie). If enable() is ever called without a full scan/population step that covers those paths, this can introduce false negatives and incorrect get_storage_at_root results. Consider centralizing storage-trie writes or ensuring all non-zero writes go through a helper that also updates the bloom.

github-actions · 2026-03-03T01:21:49Z

Lines of code report

Total lines added: 101
Total lines removed: 0
Total lines changed: 101

Detailed view

+--------------------------------+-------+------+
| File                           | Lines | Diff |
+--------------------------------+-------+------+
| ethrex/crates/storage/bloom.rs | 92    | +92  |
+--------------------------------+-------+------+
| ethrex/crates/storage/lib.rs   | 13    | +1   |
+--------------------------------+-------+------+
| ethrex/crates/storage/store.rs | 2521  | +8   |
+--------------------------------+-------+------+

greptile-apps · 2026-03-03T01:22:06Z

Greptile Summary

This PR adds infrastructure for a StorageBloomFilter — backed by fastbloom::AtomicBloomFilter — that is intended to skip trie lookups for (address, storage_key) pairs that were never written. The filter is wired into get_storage_at_root and populated on every non-zero storage write.

The core mechanism is sound: AtomicBool enabled-flag with Release/Acquire ordering, a 52-byte concatenated key, and a 1 % false-positive rate over 200 M expected items.
The filter is never activated. enable() has no callers in the repository, so might_contain always returns true (pass-through). Every trie lookup still executes — there is no performance benefit today.
~240 MB of RAM is allocated at startup unconditionally, even though the filter is unused.
insert() adds hashing cost on every non-zero storage write with no corresponding benefit while disabled.
There are no unit tests for bloom.rs; the pass-through guarantee and the no-false-negatives invariant (critical for correctness) are untested.
The enable() function carries a silent correctness footgun: if called before the filter is fully populated from existing trie state, get_storage_at_root will silently return Ok(None) for slots that actually hold a value.

Confidence Score: 3/5

Safe to merge in terms of correctness (filter is disabled, so behaviour is identical to main), but introduces ~240 MB of unconditional memory overhead and per-write hashing cost with no active benefit.
Since enable() is never called, the filter acts as a pure pass-through and cannot produce incorrect results. However, landing infra that consumes significant memory/CPU with zero benefit, and with no tests, warrants attention before merging.
crates/storage/bloom.rs (missing tests, undocumented footgun on enable()), crates/storage/store.rs (large unconditional allocation, insert overhead while disabled).

Important Files Changed

Filename	Overview
crates/storage/bloom.rs	New `StorageBloomFilter` wrapping `fastbloom::AtomicBloomFilter`; correct thread-safety model and key construction, but no unit tests and the `enable()` path carries a silent correctness footgun if called before full trie population.
crates/storage/store.rs	Integrates `StorageBloomFilter` into `Store`: allocates ~240 MB upfront at node startup, inserts on every non-zero storage write, and short-circuits `get_storage_at_root`; however `enable()` is never called so the filter is pure overhead in its current state.
crates/storage/lib.rs	Trivial addition of `mod bloom` declaration; no issues.

Sequence Diagram

sequenceDiagram
    participant EVM as EVM / RPC Caller
    participant Store as Store
    participant Bloom as StorageBloomFilter
    participant Trie as Storage Trie

    EVM->>Store: get_storage_at_root(state_root, address, key)
    Store->>Bloom: might_contain(address, key)
    alt filter disabled (current state) OR false positive
        Bloom-->>Store: true
        Store->>Trie: open_state_trie → get account storage_root
        Store->>Trie: open_storage_trie → get(hashed_key)
        Trie-->>Store: Option<encoded_value>
        Store-->>EVM: Ok(Some(value)) / Ok(None)
    else filter enabled AND definitely absent
        Bloom-->>Store: false
        Store-->>EVM: Ok(None)  [trie skip]
    end

    note over Bloom: enable() is never called today,<br/>so the "trie skip" path is never taken

    EVM->>Store: apply_account_updates (non-zero write)
    Store->>Bloom: insert(address, key)
    Store->>Trie: trie.insert(hashed_key, value)

_{Last reviewed commit: f36d8cb}

greptile-apps

_{3 files reviewed, 4 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-03-03T01:22:09Z

            last_computed_flatkeyvalue: Arc::new(RwLock::new(last_written)),
            account_code_cache: Arc::new(Mutex::new(CodeCache::default())),
            code_metadata_cache: Arc::new(Mutex::new(rustc_hash::FxHashMap::default())),
+            storage_bloom: Arc::new(StorageBloomFilter::new(200_000_000)),


~240 MB allocated at startup before the filter is useful

StorageBloomFilter::new(200_000_000) with a 1 % false-positive rate requires roughly m = -n·ln(p) / ln(2)² bits ≈ 1.92 billion bits ≈ 240 MB of RAM. This allocation happens unconditionally at node startup, even though:

enable() is never called anywhere in the codebase, so might_contain always returns true (pass-through).

The "Future work" item (scan existing trie on startup then call enable()) has not yet been implemented.

The net effect today is a large fixed memory cost with zero benefit — every trie lookup still executes.

Consider either (a) deferring the allocation until enable() is about to be called, (b) making capacity configurable/feature-gated, or (c) landing this PR together with the startup-scan + enable() call so the optimisation is actually active from the first merge.

Prompt To Fix With AI

This is a comment left during a code review. Path: crates/storage/store.rs Line: 1491 Comment: **~240 MB allocated at startup before the filter is useful** `StorageBloomFilter::new(200_000_000)` with a 1 % false-positive rate requires roughly `m = -n·ln(p) / ln(2)²` bits ≈ **1.92 billion bits ≈ 240 MB** of RAM. This allocation happens unconditionally at node startup, even though: 1. `enable()` is never called anywhere in the codebase, so `might_contain` always returns `true` (pass-through). 2. The "Future work" item (scan existing trie on startup then call `enable()`) has not yet been implemented. The net effect today is a large fixed memory cost with **zero benefit** — every trie lookup still executes. Consider either (a) deferring the allocation until `enable()` is about to be called, (b) making `capacity` configurable/feature-gated, or (c) landing this PR together with the startup-scan + `enable()` call so the optimisation is actually active from the first merge. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-03-03T01:22:10Z

                    if storage_value.is_zero() {
                        storage_trie.remove(&hashed_key)?;
                    } else {
+                        self.storage_bloom.insert(update.address, *storage_key);


insert() incurs hashing cost even while the filter is disabled

insert() always computes make_key and hashes into the underlying AtomicBloomFilter, even when enabled == false. Since might_contain short-circuits to true while disabled, these inserts represent pure overhead on every non-zero storage write until enable() is eventually called.

If the motivation for inserting unconditionally is to "warm up" the filter so it's ready when enable() is called, that is a reasonable design — but it should be explicitly documented with a comment. Otherwise, a future reader may add an early-return guard in insert() that would actually break the warm-up intent.

The same applies to the second insert site at line 1830.

Prompt To Fix With AI

This is a comment left during a code review. Path: crates/storage/store.rs Line: 1738 Comment: **`insert()` incurs hashing cost even while the filter is disabled** `insert()` always computes `make_key` and hashes into the underlying `AtomicBloomFilter`, even when `enabled == false`. Since `might_contain` short-circuits to `true` while disabled, these inserts represent **pure overhead** on every non-zero storage write until `enable()` is eventually called. If the motivation for inserting unconditionally is to "warm up" the filter so it's ready when `enable()` is called, that is a reasonable design — but it should be explicitly documented with a comment. Otherwise, a future reader may add an early-return guard in `insert()` that would actually break the warm-up intent. The same applies to the second insert site at line 1830. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-03-03T01:22:11Z

+use std::fmt;
+use std::sync::atomic::{AtomicBool, Ordering};
+
+use ethrex_common::{Address, H256};
+use fastbloom::AtomicBloomFilter;
+use rustc_hash::FxBuildHasher;
+
+const FALSE_POSITIVE_RATE: f64 = 0.01;
+
+/// Bloom filter that tracks which (address, storage_key) pairs have non-zero
+/// storage values. Used to skip expensive trie lookups for slots that were
+/// never written to.
+pub struct StorageBloomFilter {
+    filter: AtomicBloomFilter<FxBuildHasher>,
+    enabled: AtomicBool,
+}
+
+impl fmt::Debug for StorageBloomFilter {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        f.debug_struct("StorageBloomFilter").finish()
+    }
+}
+
+impl StorageBloomFilter {
+    pub fn new(capacity: usize) -> Self {
+        Self {
+            filter: AtomicBloomFilter::with_false_pos(FALSE_POSITIVE_RATE)
+                .hasher(FxBuildHasher)
+                .expected_items(capacity),
+            enabled: AtomicBool::new(false),
+        }
+    }
+
+    /// Activate the bloom filter after it has been populated.
+    /// Before this is called, `might_contain` always returns `true` (pass-through).
+    pub fn enable(&self) {
+        self.enabled.store(true, Ordering::Release);
+    }
+
+    /// Record that a non-zero value exists at (address, key).
+    pub fn insert(&self, address: Address, key: H256) {
+        let bloom_key = Self::make_key(address, key);
+        self.filter.insert(&bloom_key);
+    }
+
+    /// Returns `true` if the slot *might* contain a non-zero value.
+    /// Returns `false` if the slot was definitely never written.
+    /// When the filter is not yet enabled, always returns `true` (pass-through).
+    pub fn might_contain(&self, address: Address, key: H256) -> bool {
+        if !self.enabled.load(Ordering::Acquire) {
+            return true;
+        }
+        let bloom_key = Self::make_key(address, key);
+        self.filter.contains(&bloom_key)
+    }
+
+    fn make_key(address: Address, key: H256) -> [u8; 52] {
+        let mut buf = [0u8; 52];
+        buf[..20].copy_from_slice(address.as_bytes());
+        buf[20..].copy_from_slice(key.as_bytes());
+        buf
+    }
+}


No unit tests for the new module

bloom.rs introduces a StorageBloomFilter with non-trivial behaviour: a disabled filter must be a transparent pass-through, and an enabled filter must never produce false negatives. Neither property is covered by a test.

Suggested minimal test cases:

A newly created filter (disabled) always returns true from might_contain.

After enable(), a key that was insert()-ed still returns true.

After enable(), a key that was never inserted returns false (no false negatives for a key set that is small relative to capacity).

make_key produces distinct bytes for (a, k₁) vs (a, k₂) and (a₁, k) vs (a₂, k).

Given that a correctness regression here (false negatives while enabled) would silently return None for slots that actually have a value, tests are especially important.

Prompt To Fix With AI

This is a comment left during a code review. Path: crates/storage/bloom.rs Line: 1-63 Comment: **No unit tests for the new module** `bloom.rs` introduces a `StorageBloomFilter` with non-trivial behaviour: a disabled filter must be a transparent pass-through, and an enabled filter must never produce false negatives. Neither property is covered by a test. Suggested minimal test cases: - A newly created filter (disabled) always returns `true` from `might_contain`. - After `enable()`, a key that was `insert()`-ed still returns `true`. - After `enable()`, a key that was **never** inserted returns `false` (no false negatives for a key set that is small relative to capacity). - `make_key` produces distinct bytes for `(a, k₁)` vs `(a, k₂)` and `(a₁, k)` vs `(a₂, k)`. Given that a correctness regression here (false negatives while enabled) would silently return `None` for slots that actually have a value, tests are especially important. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-03-03T01:22:12Z

+    pub fn enable(&self) {
+        self.enabled.store(true, Ordering::Release);
+    }


enable() is never called — silent correctness risk for future callers

enable() has no callers in the repository today. When someone eventually wires it up (the "Future work" mentioned in the PR), they must ensure the filter has been fully populated from existing trie storage before calling it; calling enable() prematurely on a partially-populated filter would cause might_contain to return false for slots that genuinely have a value, silently returning Ok(None) instead of the correct stored value.

Consider adding a doc-comment warning directly on enable() about this requirement so the footgun is visible at the call site:

/// Activate the bloom filter. /// /// # Safety / correctness /// The filter MUST have been fully populated (via `insert`) for all /// (address, storage_key) pairs that exist in the trie before this is called. /// Calling `enable()` on a partially-populated filter will cause /// `might_contain` to return `false` for real slots, producing silent /// incorrect `None` results from `get_storage_at_root`. pub fn enable(&self) { self.enabled.store(true, Ordering::Release); }

Prompt To Fix With AI

This is a comment left during a code review. Path: crates/storage/bloom.rs Line: 36-38 Comment: **`enable()` is never called — silent correctness risk for future callers** `enable()` has no callers in the repository today. When someone eventually wires it up (the "Future work" mentioned in the PR), they must ensure the filter has been **fully populated** from existing trie storage before calling it; calling `enable()` prematurely on a partially-populated filter would cause `might_contain` to return `false` for slots that genuinely have a value, silently returning `Ok(None)` instead of the correct stored value. Consider adding a doc-comment warning directly on `enable()` about this requirement so the footgun is visible at the call site: ```rust /// Activate the bloom filter. /// /// # Safety / correctness /// The filter MUST have been fully populated (via `insert`) for all /// (address, storage_key) pairs that exist in the trie before this is called. /// Calling `enable()` on a partially-populated filter will cause /// `might_contain` to return `false` for real slots, producing silent /// incorrect `None` results from `get_storage_at_root`. pub fn enable(&self) { self.enabled.store(true, Ordering::Release); } ``` How can I resolve this? If you propose a fix, please make it concise.

…abled The bloom filter was created empty and immediately used to reject ALL storage lookups, since an empty bloom filter returns false for every query. This caused SLOAD to return 0 for all pre-existing storage slots, producing a gas mismatch (4.6M vs 10.5M expected). Add an `enabled` flag (AtomicBool, defaults to false) so that `might_contain` returns true (pass-through) until `enable()` is called. The filter will only start rejecting lookups after it has been populated with existing storage data and explicitly activated.

github-actions · 2026-03-03T03:15:11Z

Benchmark Block Execution Results Comparison Against Main

Command	Mean [s]	Min [s]	Max [s]	Relative
`base`	61.141 ± 0.284	60.668	61.458	1.00
`head`	61.293 ± 0.234	60.968	61.598	1.00 ± 0.01

… mark flaky Hive Cancun ReOrg test (Transaction Nonce variant) as known-flaky

…nditions Use OnceLock to lazily allocate the ~240MB bloom filter on first insert() instead of at Store construction. This avoids upfront memory overhead when the filter is never used (dev mode, testnets). Also documents the warm-up insert pattern (inserts happen while disabled to populate the filter before enable()), and adds a precondition doc on enable() listing what must be true before calling it.

Genesis setup_genesis_state_trie inserts storage slots without updating the bloom filter, and write_storage_trie_nodes_batch (snap sync) bypasses it too. Document these as latent false-negative sources for when bloom is enabled.

Test pass-through when disabled, no false negatives after enable, rejection of unknown keys, and make_key distinctness.

The bloom filter only tracks writes during the current process lifetime. For RPC historical state queries, slots that were non-zero in older states but later zeroed won't be in the filter, causing false negatives.

iovoid · 2026-03-06T14:49:06Z

+    #[allow(dead_code)]
+    pub fn enable(&self) {


Why is the feature never enabled?

- Add §1.18 observability tooling (PR #6470) - Add §1.19 pivot update reliability (PR #6475, issue #6474) - Add §1.20 big-account within-trie parallelization (issue #6477) - Add §1.21 small-account batching (issue #6476) - Add §1.22 decoded TrieLayerCache (PR #6348) - Add §1.23 bloom filter for non-existent storage (PR #6288) - Add §1.24 adaptive request sizing + bisection (PR #6181) - Add §1.25 concurrent bytecode + storage (PR #6205) - Add §1.26 phase completion markers (PR #6189) - Add §2.18 StorageTrieTracker refactor (PR #6171) - Update current-state bottleneck table with small-account and pivot-update findings - Reprioritize timeline: pivot-update crash fix is now priority 0 - Add two risks (pivot crash masks perf work, DB corruption on every crash) - Bump doc version to 1.3

Copilot AI review requested due to automatic review settings March 3, 2026 01:17

ilitteri requested a review from a team as a code owner March 3, 2026 01:17

github-actions Bot assigned ilitteri Mar 3, 2026

github-actions Bot added L1 Ethereum client performance Block execution throughput and performance in general labels Mar 3, 2026

github-project-automation Bot added this to ethrex_performance and ethrex_l1 Mar 3, 2026

github-project-automation Bot moved this to Todo in ethrex_performance Mar 3, 2026

Copilot started reviewing on behalf of ilitteri March 3, 2026 01:17 View session

Copilot AI reviewed Mar 3, 2026

View reviewed changes

greptile-apps Bot reviewed Mar 3, 2026

View reviewed changes

ilitteri added 2 commits March 2, 2026 23:29

Add bloom filter to skip trie seeks for non-existent storage slots

f3b6e77

ilitteri force-pushed the perf/storage-bloom-filter branch from f36d8cb to 1629d73 Compare March 3, 2026 02:29

Fix CI: suppress dead_code lint on enable(), add changelog entry, and…

15133f8

… mark flaky Hive Cancun ReOrg test (Transaction Nonce variant) as known-flaky

ElFantasma requested changes Mar 3, 2026

View reviewed changes

Comment thread crates/storage/store.rs

Comment thread crates/storage/store.rs

github-project-automation Bot moved this to In Progress in ethrex_l1 Mar 3, 2026

ilitteri added 5 commits March 3, 2026 17:46

Add unit tests for StorageBloomFilter

b722ab1

Test pass-through when disabled, no false negatives after enable, rejection of unknown keys, and make_key distinctness.

Document historical state root false-negative risk in bloom check

b19f9a6

The bloom filter only tracks writes during the current process lifetime. For RPC historical state queries, slots that were non-zero in older states but later zeroed won't be in the filter, causing false negatives.

Fix import ordering in bloom.rs to satisfy cargo fmt

46110af

ilitteri requested a review from ElFantasma March 3, 2026 21:47

Resolve merge conflict in CHANGELOG.md with main

cb95ce9

edg-l approved these changes Mar 6, 2026

View reviewed changes

azteca1998 approved these changes Mar 6, 2026

View reviewed changes

iovoid requested changes Mar 6, 2026

View reviewed changes

ElFantasma approved these changes Mar 6, 2026

View reviewed changes

Merge branch 'main' into perf/storage-bloom-filter

713517a

ElFantasma mentioned this pull request Apr 15, 2026

docs(l1): snapsync roadmap #6112

Open

1 task

	storage_bloom: Arc::new(StorageBloomFilter::new(200_000_000)),
	storage_bloom: Arc::new(StorageBloomFilter::new(2_000_000)),

Conversation

ilitteri commented Mar 3, 2026

Motivation

Description

How to Test

Uh oh!

github-actions Bot commented Mar 3, 2026

🤖 Kimi Code Review

Security & Correctness Issues

Performance & Design Issues

Code Quality

Recommendations

Uh oh!

github-actions Bot commented Mar 3, 2026

🤖 Codex Code Review

Uh oh!

github-actions Bot commented Mar 3, 2026

🤖 Claude Code Review

PR Review: perf(l1): add bloom filter to skip trie seeks for non-existent storage slots

Correctness

Memory

Semantic gap: zero-writes after non-zero writes

Missing tests

Minor

Summary

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

ilitteri Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

ilitteri Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Lines of code report

Uh oh!

greptile-apps Bot commented Mar 3, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

ilitteri Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

ilitteri Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

ilitteri Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

PR Review: `perf(l1): add bloom filter to skip trie seeks for non-existent storage slots`

github-actions Bot commented Mar 3, 2026 •

edited

Loading

github-actions Bot commented Mar 3, 2026 •

edited

Loading