Skip to content

perf(l1): add bloom filter to skip trie seeks for non-existent storage slots#6288

Open
ilitteri wants to merge 10 commits into
mainfrom
perf/storage-bloom-filter
Open

perf(l1): add bloom filter to skip trie seeks for non-existent storage slots#6288
ilitteri wants to merge 10 commits into
mainfrom
perf/storage-bloom-filter

Conversation

@ilitteri
Copy link
Copy Markdown
Collaborator

@ilitteri ilitteri commented Mar 3, 2026

Motivation

Storage reads (get_storage_value) always perform a full trie traversal, even for slots that were never written. On workloads with many SLOAD misses (e.g. first-touch patterns), this is a significant bottleneck.

Description

Add a StorageBloomFilter backed by fastbloom::AtomicBloomFilter that tracks every (address, storage_key) pair written with a non-zero value. Before traversing the trie, get_storage_value checks the bloom filter — if the slot was definitely never written, the trie lookup is skipped entirely.

Key design points:

  • Starts disabled (pass-through): The filter is created empty and might_contain always returns true until enable() is called. This prevents an unpopulated filter from incorrectly rejecting all lookups.
  • Populated on writes: insert() is called whenever a non-zero storage value is written to the trie.
  • 1% false positive rate with 200M expected items, using FxBuildHasher for fast hashing.
  • Thread-safe: AtomicBloomFilter with AtomicBool enabled flag (Release/Acquire ordering).

Future work: scan existing trie storage on startup to populate the filter and call enable(), so the filter can actually start rejecting lookups for pre-existing data.

How to Test

  • Run the existing test suite — the filter is disabled by default (pass-through), so behavior is identical to main.
  • Once enable() is wired up after initial population, benchmark with import on l2-1k-erc20.rlp to measure trie-skip savings.

Copilot AI review requested due to automatic review settings March 3, 2026 01:17
@ilitteri ilitteri requested a review from a team as a code owner March 3, 2026 01:17
@github-actions github-actions Bot added L1 Ethereum client performance Block execution throughput and performance in general labels Mar 3, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 3, 2026

🤖 Kimi Code Review

Security & Correctness Issues

  1. Race condition in bloom filter initialization (bloom.rs:40-43, store.rs:1491)
    The bloom filter is created with enabled: false but is immediately shared via Arc. Other threads could call might_contain() before enable() is called, causing unnecessary trie lookups. The initialization flow needs to be atomic.

  2. Missing bloom filter population on startup (store.rs:1488-1491)
    The bloom filter is initialized with capacity for 200M items but never populated with existing storage slots. This means:

    • On restart, the filter will incorrectly report "never written" for existing storage
    • Could cause consensus failures by returning None for existing storage values
  3. No handling of storage deletion (store.rs:1732-1740, 1824-1832)
    When storage values are set to zero (deleted), the bloom filter retains the "might contain" status forever. This causes:

    • False positives accumulating over time
    • Unnecessary trie lookups for deleted storage slots
    • Potential performance degradation

Performance & Design Issues

  1. Inefficient key serialization (bloom.rs:57-62)
    The make_key function creates a new [u8; 52] array on every call. Consider:

    • Using a stack-allocated buffer passed as mutable reference
    • Or implementing a proper hasher that can process the two fields without allocation
  2. Hard-coded capacity (store.rs:1491)
    StorageBloomFilter::new(200_000_000) uses a fixed capacity that may not match actual chain state. This should be:

    • Configurable via CLI/config
    • Or calculated based on actual chain statistics
  3. Memory usage concern
    A bloom filter for 200M items at 1% false positive rate requires ~240MB of memory. This should be documented and made configurable.

Code Quality

  1. Debug implementation is incomplete (bloom.rs:15-20)
    The debug implementation doesn't show any useful state. Consider including:

    • Enabled status
    • Capacity
    • Number of inserted items (if available from fastbloom)
  2. Atomic ordering could be relaxed (bloom.rs:42, 51)
    Ordering::Release/Acquire might be stronger than needed. Ordering::Relaxed is likely sufficient since there's no other shared state being synchronized.

Recommendations

  1. Add bloom filter population during store initialization by scanning existing storage tries
  2. Implement bloom filter cleanup when storage slots are deleted (consider counting bloom filter or periodic rebuild)
  3. Make bloom filter capacity configurable
  4. Document memory usage implications
  5. Consider using a more sophisticated approach for handling deletions (e.g., counting bloom filter or epoch-based rebuilds)

Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 3, 2026

🤖 Codex Code Review

  1. Potential consensus/correctness break if bloom is ever enabled without full backfill
    The read path can short-circuit to None before touching trie state (store.rs:2119).
    But the bloom is only populated from new non-zero writes during account updates (store.rs:1738, store.rs:1830) and misses existing DB/genesis storage (store.rs:1877).
    If enable() is later called (bloom.rs:36) without rebuilding from full state, historical/existing non-zero slots can become false negatives and return wrong storage values.

  2. Feature appears inactive in this PR (performance benefit not realized)
    StorageBloomFilter starts disabled (bloom.rs:30), and I don’t see any call site enabling it. That means might_contain() always returns true (bloom.rs:50), so the new fast path in get_storage_at_root is effectively dead code right now.

  3. High fixed memory cost at startup
    The filter is instantiated with expected_items(200_000_000) (store.rs:1491, bloom.rs:29). At 1% FP rate this is roughly hundreds of MB just for bloom bits, plus atomic structure overhead. For client deployments this is a significant resident-memory increase and should be justified/configurable.

If intended direction is to ship this optimization, I’d require:

  1. deterministic backfill/rebuild strategy (including genesis + existing DB state),
  2. explicit enable point only after backfill completes,
  3. tests proving no false negatives for existing state and historical reads across restart.

Automated review by OpenAI Codex · custom prompt

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 3, 2026

🤖 Claude Code Review

Now I have enough context to write a comprehensive review. Here it is:


PR Review: perf(l1): add bloom filter to skip trie seeks for non-existent storage slots

The overall approach is sound — using a bloom filter to short-circuit trie traversal on SLOAD misses is a well-known optimization. The pass-through-until-enabled design correctly avoids the corruption that was caught in the fixup commit. However, there are several issues worth addressing before merging.


Correctness

Critical: Genesis storage bypasses the bloom filter

setup_genesis_state_trie (store.rs:1877–1881) calls storage_trie.insert() for every non-zero genesis slot but never calls storage_bloom.insert():

for (storage_key, storage_value) in account.storage {
    if !storage_value.is_zero() {
        let hashed_key = hash_key(&H256(storage_key.to_big_endian()));
        storage_trie.insert(hashed_key, storage_value.encode_to_vec())?; // no bloom insert
    }
}

When enable() is eventually wired up after "scanning existing trie storage on startup", any genesis storage slot that was never subsequently written during block processing will be absent from the filter. At that point might_contain will return false for those slots and get_storage_at_root will return Ok(None) instead of the actual value — silently corrupting reads. This is the exact same class of bug that the fixup commit caught during development.

The PR description acknowledges this as "future work", but the code as written has no guard or comment at the enable() call site ensuring the filter is fully populated before activation. A TODO or debug_assert linking the genesis-population step to enable() would prevent the same mistake from recurring.

Concern: write_storage_trie_nodes_batch also bypasses the bloom filter

write_storage_trie_nodes_batch (store.rs:1164) writes raw trie node bytes directly to the database with no bloom update. If this path is used to ingest storage (e.g. during snap sync), those slots will be invisible to the filter after enable().


Memory

~240 MB always allocated, even in disabled (pass-through) mode

With 200M items and 1% FPR the underlying bit array is approximately:

m = -n * ln(p) / ln(2)^2  ≈  1.92 × 10^9 bits  ≈  240 MB

This is allocated unconditionally at Store::new time (store.rs:1491). An always-resident 240 MB allocation is a meaningful cost for nodes that never call enable(), and especially large for testnets, development, or L2 instances where the expected item count is orders of magnitude smaller.

The hardcoded 200_000_000 in Store::new gives no way for callers to tune the capacity to their workload. At minimum the constant deserves a comment explaining where 200M comes from (e.g. "approximate upper bound of distinct non-zero storage slots on mainnet"). A config option or lazy initialization would be better.


Semantic gap: zero-writes after non-zero writes

When SSTORE sets a slot to zero, the trie removes the entry (store.rs:1735–1736, 1827–1828) but the bloom filter is not and cannot be updated (bloom filters don't support deletion). This is intentional and correct — the false positive causes a trie lookup that correctly returns None. The comment on might_contain correctly describes this. No action needed, but the insert() doc comment at bloom.rs:42 says "Record that a non-zero value exists at (address, key)" — it should mention that once inserted, a slot remains in the filter even after the value is deleted.


Missing tests

bloom.rs has no unit tests. For a correctness-sensitive component that gates all storage reads, at least the following should be covered:

  • might_contain returns true (pass-through) before enable()
  • After enable(), might_contain returns false for never-inserted keys
  • After insert + enable(), might_contain returns true
  • make_key produces distinct values for (A, K1) vs (A, K2) and (A1, K) vs (A2, K)

Minor

  • bloom.rs:24: pub fn new(capacity: usize) — the capacity parameter is unused internally (the builder takes expected_items, which is correct), but naming it capacity vs expected_items may mislead callers about semantics.
  • The FxBuildHasher choice is fine here — inputs are addresses and keys, not adversarially controlled strings, so non-cryptographic hashing is appropriate.
  • Arc<StorageBloomFilter> is correctly shared across Store clones (since Store: Clone and Arc clone is shallow). This is good.
  • Ordering choices (Release on store, Acquire on load) are correct for establishing a happens-before relationship between population and enable.

Summary

The implementation is mechanically correct for the current (always-disabled) state, and the pass-through design safely prevents corruption. The main concern is that the path to activating the filter — the advertised "future work" — has a latent correctness trap: enable() must not be called until all existing storage has been inserted, including genesis state and any snap-synced data. This precondition is easy to violate accidentally. I'd recommend adding either an explicit populate_from_genesis step in the bloom module, or at minimum a prominent // SAFETY: comment at every future enable() call site enumerating what must be true before calling it.


Automated review by Claude (Anthropic) · custom prompt

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a storage-level bloom filter to avoid expensive storage-trie lookups for (address, storage_key) pairs that have definitely never been written with a non-zero value, targeting workloads with many SLOAD misses.

Changes:

  • Add StorageBloomFilter (wrapper around fastbloom::AtomicBloomFilter) with an enabled pass-through mode.
  • Populate the bloom filter on non-zero storage writes and consult it in Store::get_storage_at_root to potentially skip trie access.
  • Wire the new module into the storage crate and store it on Store.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
crates/storage/store.rs Adds bloom filter field, populates it on writes, and checks it before storage trie traversal.
crates/storage/lib.rs Registers the new internal bloom module.
crates/storage/bloom.rs Implements StorageBloomFilter with enable/pass-through semantics and key construction.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/storage/store.rs
last_computed_flatkeyvalue: Arc::new(RwLock::new(last_written)),
account_code_cache: Arc::new(Mutex::new(CodeCache::default())),
code_metadata_cache: Arc::new(Mutex::new(rustc_hash::FxHashMap::default())),
storage_bloom: Arc::new(StorageBloomFilter::new(200_000_000)),
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StorageBloomFilter::new(200_000_000) is created unconditionally during Store construction. For a 1% false-positive rate, a bloom sized for 200M items is on the order of hundreds of MB of resident memory, which can cause OOM/regressions (notably for the many EngineType::InMemory test stores). Consider making this capacity configurable, drastically smaller by default, or lazily allocating the backing filter only when the bloom is actually going to be used/populated.

Suggested change
storage_bloom: Arc::new(StorageBloomFilter::new(200_000_000)),
storage_bloom: Arc::new(StorageBloomFilter::new(2_000_000)),

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread crates/storage/store.rs
if storage_value.is_zero() {
storage_trie.remove(&hashed_key)?;
} else {
self.storage_bloom.insert(update.address, *storage_key);
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says the bloom is “disabled by default (pass-through), so behavior is identical”, but insert() is still called on every non-zero storage write even while disabled. That adds hashing + atomic updates to the write path with no read-side benefit until enable() is wired up. If the intent is truly zero-overhead until enabled, consider gating insert() behind a separate runtime/feature flag or deferring population until you’re ready to enable the filter.

Suggested change
self.storage_bloom.insert(update.address, *storage_key);
if self.storage_bloom.is_enabled() {
self.storage_bloom.insert(update.address, *storage_key);
}

Copilot uses AI. Check for mistakes.
Comment thread crates/storage/store.rs
Comment on lines +1738 to 1740
self.storage_bloom.insert(update.address, *storage_key);
storage_trie.insert(hashed_key, storage_value.encode_to_vec())?;
}
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storage_bloom is populated here for added_storage, but there are other code paths that write non-zero values into storage tries without calling storage_bloom.insert (e.g. setup_genesis_state_trie inserts directly into a storage trie). If enable() is ever called without a full scan/population step that covers those paths, this can introduce false negatives and incorrect get_storage_at_root results. Consider centralizing storage-trie writes or ensuring all non-zero writes go through a helper that also updates the bloom.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 3, 2026

Lines of code report

Total lines added: 101
Total lines removed: 0
Total lines changed: 101

Detailed view
+--------------------------------+-------+------+
| File                           | Lines | Diff |
+--------------------------------+-------+------+
| ethrex/crates/storage/bloom.rs | 92    | +92  |
+--------------------------------+-------+------+
| ethrex/crates/storage/lib.rs   | 13    | +1   |
+--------------------------------+-------+------+
| ethrex/crates/storage/store.rs | 2521  | +8   |
+--------------------------------+-------+------+

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Mar 3, 2026

Greptile Summary

This PR adds infrastructure for a StorageBloomFilter — backed by fastbloom::AtomicBloomFilter — that is intended to skip trie lookups for (address, storage_key) pairs that were never written. The filter is wired into get_storage_at_root and populated on every non-zero storage write.

  • The core mechanism is sound: AtomicBool enabled-flag with Release/Acquire ordering, a 52-byte concatenated key, and a 1 % false-positive rate over 200 M expected items.
  • The filter is never activated. enable() has no callers in the repository, so might_contain always returns true (pass-through). Every trie lookup still executes — there is no performance benefit today.
  • ~240 MB of RAM is allocated at startup unconditionally, even though the filter is unused.
  • insert() adds hashing cost on every non-zero storage write with no corresponding benefit while disabled.
  • There are no unit tests for bloom.rs; the pass-through guarantee and the no-false-negatives invariant (critical for correctness) are untested.
  • The enable() function carries a silent correctness footgun: if called before the filter is fully populated from existing trie state, get_storage_at_root will silently return Ok(None) for slots that actually hold a value.

Confidence Score: 3/5

  • Safe to merge in terms of correctness (filter is disabled, so behaviour is identical to main), but introduces ~240 MB of unconditional memory overhead and per-write hashing cost with no active benefit.
  • Since enable() is never called, the filter acts as a pure pass-through and cannot produce incorrect results. However, landing infra that consumes significant memory/CPU with zero benefit, and with no tests, warrants attention before merging.
  • crates/storage/bloom.rs (missing tests, undocumented footgun on enable()), crates/storage/store.rs (large unconditional allocation, insert overhead while disabled).

Important Files Changed

Filename Overview
crates/storage/bloom.rs New StorageBloomFilter wrapping fastbloom::AtomicBloomFilter; correct thread-safety model and key construction, but no unit tests and the enable() path carries a silent correctness footgun if called before full trie population.
crates/storage/store.rs Integrates StorageBloomFilter into Store: allocates ~240 MB upfront at node startup, inserts on every non-zero storage write, and short-circuits get_storage_at_root; however enable() is never called so the filter is pure overhead in its current state.
crates/storage/lib.rs Trivial addition of mod bloom declaration; no issues.

Sequence Diagram

sequenceDiagram
    participant EVM as EVM / RPC Caller
    participant Store as Store
    participant Bloom as StorageBloomFilter
    participant Trie as Storage Trie

    EVM->>Store: get_storage_at_root(state_root, address, key)
    Store->>Bloom: might_contain(address, key)
    alt filter disabled (current state) OR false positive
        Bloom-->>Store: true
        Store->>Trie: open_state_trie → get account storage_root
        Store->>Trie: open_storage_trie → get(hashed_key)
        Trie-->>Store: Option<encoded_value>
        Store-->>EVM: Ok(Some(value)) / Ok(None)
    else filter enabled AND definitely absent
        Bloom-->>Store: false
        Store-->>EVM: Ok(None)  [trie skip]
    end

    note over Bloom: enable() is never called today,<br/>so the "trie skip" path is never taken

    EVM->>Store: apply_account_updates (non-zero write)
    Store->>Bloom: insert(address, key)
    Store->>Trie: trie.insert(hashed_key, value)
Loading

Last reviewed commit: f36d8cb

Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

Comment thread crates/storage/store.rs
last_computed_flatkeyvalue: Arc::new(RwLock::new(last_written)),
account_code_cache: Arc::new(Mutex::new(CodeCache::default())),
code_metadata_cache: Arc::new(Mutex::new(rustc_hash::FxHashMap::default())),
storage_bloom: Arc::new(StorageBloomFilter::new(200_000_000)),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~240 MB allocated at startup before the filter is useful

StorageBloomFilter::new(200_000_000) with a 1 % false-positive rate requires roughly m = -n·ln(p) / ln(2)² bits ≈ 1.92 billion bits ≈ 240 MB of RAM. This allocation happens unconditionally at node startup, even though:

  1. enable() is never called anywhere in the codebase, so might_contain always returns true (pass-through).
  2. The "Future work" item (scan existing trie on startup then call enable()) has not yet been implemented.

The net effect today is a large fixed memory cost with zero benefit — every trie lookup still executes.

Consider either (a) deferring the allocation until enable() is about to be called, (b) making capacity configurable/feature-gated, or (c) landing this PR together with the startup-scan + enable() call so the optimisation is actually active from the first merge.

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/storage/store.rs
Line: 1491

Comment:
**~240 MB allocated at startup before the filter is useful**

`StorageBloomFilter::new(200_000_000)` with a 1 % false-positive rate requires roughly `m = -n·ln(p) / ln(2)²` bits ≈ **1.92 billion bits ≈ 240 MB** of RAM. This allocation happens unconditionally at node startup, even though:

1. `enable()` is never called anywhere in the codebase, so `might_contain` always returns `true` (pass-through).
2. The "Future work" item (scan existing trie on startup then call `enable()`) has not yet been implemented.

The net effect today is a large fixed memory cost with **zero benefit** — every trie lookup still executes.

Consider either (a) deferring the allocation until `enable()` is about to be called, (b) making `capacity` configurable/feature-gated, or (c) landing this PR together with the startup-scan + `enable()` call so the optimisation is actually active from the first merge.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread crates/storage/store.rs
if storage_value.is_zero() {
storage_trie.remove(&hashed_key)?;
} else {
self.storage_bloom.insert(update.address, *storage_key);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

insert() incurs hashing cost even while the filter is disabled

insert() always computes make_key and hashes into the underlying AtomicBloomFilter, even when enabled == false. Since might_contain short-circuits to true while disabled, these inserts represent pure overhead on every non-zero storage write until enable() is eventually called.

If the motivation for inserting unconditionally is to "warm up" the filter so it's ready when enable() is called, that is a reasonable design — but it should be explicitly documented with a comment. Otherwise, a future reader may add an early-return guard in insert() that would actually break the warm-up intent.

The same applies to the second insert site at line 1830.

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/storage/store.rs
Line: 1738

Comment:
**`insert()` incurs hashing cost even while the filter is disabled**

`insert()` always computes `make_key` and hashes into the underlying `AtomicBloomFilter`, even when `enabled == false`. Since `might_contain` short-circuits to `true` while disabled, these inserts represent **pure overhead** on every non-zero storage write until `enable()` is eventually called.

If the motivation for inserting unconditionally is to "warm up" the filter so it's ready when `enable()` is called, that is a reasonable design — but it should be explicitly documented with a comment. Otherwise, a future reader may add an early-return guard in `insert()` that would actually break the warm-up intent.

The same applies to the second insert site at line 1830.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread crates/storage/bloom.rs
Comment on lines +1 to +63
use std::fmt;
use std::sync::atomic::{AtomicBool, Ordering};

use ethrex_common::{Address, H256};
use fastbloom::AtomicBloomFilter;
use rustc_hash::FxBuildHasher;

const FALSE_POSITIVE_RATE: f64 = 0.01;

/// Bloom filter that tracks which (address, storage_key) pairs have non-zero
/// storage values. Used to skip expensive trie lookups for slots that were
/// never written to.
pub struct StorageBloomFilter {
filter: AtomicBloomFilter<FxBuildHasher>,
enabled: AtomicBool,
}

impl fmt::Debug for StorageBloomFilter {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("StorageBloomFilter").finish()
}
}

impl StorageBloomFilter {
pub fn new(capacity: usize) -> Self {
Self {
filter: AtomicBloomFilter::with_false_pos(FALSE_POSITIVE_RATE)
.hasher(FxBuildHasher)
.expected_items(capacity),
enabled: AtomicBool::new(false),
}
}

/// Activate the bloom filter after it has been populated.
/// Before this is called, `might_contain` always returns `true` (pass-through).
pub fn enable(&self) {
self.enabled.store(true, Ordering::Release);
}

/// Record that a non-zero value exists at (address, key).
pub fn insert(&self, address: Address, key: H256) {
let bloom_key = Self::make_key(address, key);
self.filter.insert(&bloom_key);
}

/// Returns `true` if the slot *might* contain a non-zero value.
/// Returns `false` if the slot was definitely never written.
/// When the filter is not yet enabled, always returns `true` (pass-through).
pub fn might_contain(&self, address: Address, key: H256) -> bool {
if !self.enabled.load(Ordering::Acquire) {
return true;
}
let bloom_key = Self::make_key(address, key);
self.filter.contains(&bloom_key)
}

fn make_key(address: Address, key: H256) -> [u8; 52] {
let mut buf = [0u8; 52];
buf[..20].copy_from_slice(address.as_bytes());
buf[20..].copy_from_slice(key.as_bytes());
buf
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No unit tests for the new module

bloom.rs introduces a StorageBloomFilter with non-trivial behaviour: a disabled filter must be a transparent pass-through, and an enabled filter must never produce false negatives. Neither property is covered by a test.

Suggested minimal test cases:

  • A newly created filter (disabled) always returns true from might_contain.
  • After enable(), a key that was insert()-ed still returns true.
  • After enable(), a key that was never inserted returns false (no false negatives for a key set that is small relative to capacity).
  • make_key produces distinct bytes for (a, k₁) vs (a, k₂) and (a₁, k) vs (a₂, k).

Given that a correctness regression here (false negatives while enabled) would silently return None for slots that actually have a value, tests are especially important.

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/storage/bloom.rs
Line: 1-63

Comment:
**No unit tests for the new module**

`bloom.rs` introduces a `StorageBloomFilter` with non-trivial behaviour: a disabled filter must be a transparent pass-through, and an enabled filter must never produce false negatives. Neither property is covered by a test.

Suggested minimal test cases:
- A newly created filter (disabled) always returns `true` from `might_contain`.
- After `enable()`, a key that was `insert()`-ed still returns `true`.
- After `enable()`, a key that was **never** inserted returns `false` (no false negatives for a key set that is small relative to capacity).
- `make_key` produces distinct bytes for `(a, k₁)` vs `(a, k₂)` and `(a₁, k)` vs `(a₂, k)`.

Given that a correctness regression here (false negatives while enabled) would silently return `None` for slots that actually have a value, tests are especially important.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread crates/storage/bloom.rs
Comment on lines +36 to +38
pub fn enable(&self) {
self.enabled.store(true, Ordering::Release);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enable() is never called — silent correctness risk for future callers

enable() has no callers in the repository today. When someone eventually wires it up (the "Future work" mentioned in the PR), they must ensure the filter has been fully populated from existing trie storage before calling it; calling enable() prematurely on a partially-populated filter would cause might_contain to return false for slots that genuinely have a value, silently returning Ok(None) instead of the correct stored value.

Consider adding a doc-comment warning directly on enable() about this requirement so the footgun is visible at the call site:

/// Activate the bloom filter.
///
/// # Safety / correctness
/// The filter MUST have been fully populated (via `insert`) for all
/// (address, storage_key) pairs that exist in the trie before this is called.
/// Calling `enable()` on a partially-populated filter will cause
/// `might_contain` to return `false` for real slots, producing silent
/// incorrect `None` results from `get_storage_at_root`.
pub fn enable(&self) {
    self.enabled.store(true, Ordering::Release);
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/storage/bloom.rs
Line: 36-38

Comment:
**`enable()` is never called — silent correctness risk for future callers**

`enable()` has no callers in the repository today. When someone eventually wires it up (the "Future work" mentioned in the PR), they must ensure the filter has been **fully populated** from existing trie storage before calling it; calling `enable()` prematurely on a partially-populated filter would cause `might_contain` to return `false` for slots that genuinely have a value, silently returning `Ok(None)` instead of the correct stored value.

Consider adding a doc-comment warning directly on `enable()` about this requirement so the footgun is visible at the call site:

```rust
/// Activate the bloom filter.
///
/// # Safety / correctness
/// The filter MUST have been fully populated (via `insert`) for all
/// (address, storage_key) pairs that exist in the trie before this is called.
/// Calling `enable()` on a partially-populated filter will cause
/// `might_contain` to return `false` for real slots, producing silent
/// incorrect `None` results from `get_storage_at_root`.
pub fn enable(&self) {
    self.enabled.store(true, Ordering::Release);
}
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ilitteri added 2 commits March 2, 2026 23:29
…abled

The bloom filter was created empty and immediately used to reject ALL
storage lookups, since an empty bloom filter returns false for every
query. This caused SLOAD to return 0 for all pre-existing storage slots,
producing a gas mismatch (4.6M vs 10.5M expected).

Add an `enabled` flag (AtomicBool, defaults to false) so that
`might_contain` returns true (pass-through) until `enable()` is called.
The filter will only start rejecting lookups after it has been populated
with existing storage data and explicitly activated.
@ilitteri ilitteri force-pushed the perf/storage-bloom-filter branch from f36d8cb to 1629d73 Compare March 3, 2026 02:29
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 3, 2026

Benchmark Block Execution Results Comparison Against Main

Command Mean [s] Min [s] Max [s] Relative
base 61.141 ± 0.284 60.668 61.458 1.00
head 61.293 ± 0.234 60.968 61.598 1.00 ± 0.01

… mark

flaky Hive Cancun ReOrg test (Transaction Nonce variant) as known-flaky
Comment thread crates/storage/store.rs
Comment thread crates/storage/store.rs
@github-project-automation github-project-automation Bot moved this to In Progress in ethrex_l1 Mar 3, 2026
ilitteri added 5 commits March 3, 2026 17:46
…nditions

Use OnceLock to lazily allocate the ~240MB bloom filter on first insert()
instead of at Store construction. This avoids upfront memory overhead when
the filter is never used (dev mode, testnets).

Also documents the warm-up insert pattern (inserts happen while disabled
to populate the filter before enable()), and adds a precondition doc on
enable() listing what must be true before calling it.
Genesis setup_genesis_state_trie inserts storage slots without updating
the bloom filter, and write_storage_trie_nodes_batch (snap sync) bypasses
it too. Document these as latent false-negative sources for when bloom is
enabled.
Test pass-through when disabled, no false negatives after enable,
rejection of unknown keys, and make_key distinctness.
The bloom filter only tracks writes during the current process lifetime.
For RPC historical state queries, slots that were non-zero in older states
but later zeroed won't be in the filter, causing false negatives.
@ilitteri ilitteri requested a review from ElFantasma March 3, 2026 21:47
Comment thread crates/storage/bloom.rs
Comment on lines +48 to +49
#[allow(dead_code)]
pub fn enable(&self) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the feature never enabled?

ElFantasma added a commit that referenced this pull request Apr 15, 2026
- Add §1.18 observability tooling (PR #6470)
- Add §1.19 pivot update reliability (PR #6475, issue #6474)
- Add §1.20 big-account within-trie parallelization (issue #6477)
- Add §1.21 small-account batching (issue #6476)
- Add §1.22 decoded TrieLayerCache (PR #6348)
- Add §1.23 bloom filter for non-existent storage (PR #6288)
- Add §1.24 adaptive request sizing + bisection (PR #6181)
- Add §1.25 concurrent bytecode + storage (PR #6205)
- Add §1.26 phase completion markers (PR #6189)
- Add §2.18 StorageTrieTracker refactor (PR #6171)
- Update current-state bottleneck table with small-account and pivot-update findings
- Reprioritize timeline: pivot-update crash fix is now priority 0
- Add two risks (pivot crash masks perf work, DB corruption on every crash)
- Bump doc version to 1.3
@ElFantasma ElFantasma mentioned this pull request Apr 15, 2026
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

L1 Ethereum client performance Block execution throughput and performance in general

Projects

Status: In Progress
Status: Todo

Development

Successfully merging this pull request may close these issues.

6 participants