perf(l1): add bloom filter to skip trie seeks for non-existent storage slots#6288
ilitteri wants to merge 10 commits into
🤖 Kimi Code Review
Security & Correctness Issues
Performance & Design Issues
Code Quality
Recommendations
Automated review by Kimi (Moonshot AI) · custom prompt |
🤖 Codex Code Review
If intended direction is to ship this optimization, I’d require:
Automated review by OpenAI Codex · custom prompt |
🤖 Claude Code Review
Now I have enough context to write a comprehensive review. Here it is: PR Review:
Pull request overview
This PR introduces a storage-level bloom filter to avoid expensive storage-trie lookups for (address, storage_key) pairs that have definitely never been written with a non-zero value, targeting workloads with many SLOAD misses.
Changes:
- Add `StorageBloomFilter` (wrapper around `fastbloom::AtomicBloomFilter`) with an `enabled` pass-through mode.
- Populate the bloom filter on non-zero storage writes and consult it in `Store::get_storage_at_root` to potentially skip trie access.
- Wire the new module into the storage crate and store it on `Store`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `crates/storage/store.rs` | Adds bloom filter field, populates it on writes, and checks it before storage trie traversal. |
| `crates/storage/lib.rs` | Registers the new internal bloom module. |
| `crates/storage/bloom.rs` | Implements `StorageBloomFilter` with enable/pass-through semantics and key construction. |
```rust
last_computed_flatkeyvalue: Arc::new(RwLock::new(last_written)),
account_code_cache: Arc::new(Mutex::new(CodeCache::default())),
code_metadata_cache: Arc::new(Mutex::new(rustc_hash::FxHashMap::default())),
storage_bloom: Arc::new(StorageBloomFilter::new(200_000_000)),
```
StorageBloomFilter::new(200_000_000) is created unconditionally during Store construction. For a 1% false-positive rate, a bloom sized for 200M items is on the order of hundreds of MB of resident memory, which can cause OOM/regressions (notably for the many EngineType::InMemory test stores). Consider making this capacity configurable, drastically smaller by default, or lazily allocating the backing filter only when the bloom is actually going to be used/populated.
```diff
- storage_bloom: Arc::new(StorageBloomFilter::new(200_000_000)),
+ storage_bloom: Arc::new(StorageBloomFilter::new(2_000_000)),
```
```rust
if storage_value.is_zero() {
    storage_trie.remove(&hashed_key)?;
} else {
    self.storage_bloom.insert(update.address, *storage_key);
```
The PR description says the bloom is “disabled by default (pass-through), so behavior is identical”, but insert() is still called on every non-zero storage write even while disabled. That adds hashing + atomic updates to the write path with no read-side benefit until enable() is wired up. If the intent is truly zero-overhead until enabled, consider gating insert() behind a separate runtime/feature flag or deferring population until you’re ready to enable the filter.
```diff
- self.storage_bloom.insert(update.address, *storage_key);
+ if self.storage_bloom.is_enabled() {
+     self.storage_bloom.insert(update.address, *storage_key);
+ }
```
```rust
    self.storage_bloom.insert(update.address, *storage_key);
    storage_trie.insert(hashed_key, storage_value.encode_to_vec())?;
}
```
storage_bloom is populated here for added_storage, but there are other code paths that write non-zero values into storage tries without calling storage_bloom.insert (e.g. setup_genesis_state_trie inserts directly into a storage trie). If enable() is ever called without a full scan/population step that covers those paths, this can introduce false negatives and incorrect get_storage_at_root results. Consider centralizing storage-trie writes or ensuring all non-zero writes go through a helper that also updates the bloom.
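The "centralize storage-trie writes" suggestion can be illustrated with a minimal sketch. Everything here is hypothetical: `SlotStore`, `write_slot`, and `read_slot` are invented names, a `HashMap` stands in for the storage trie, and an exact `HashSet` stands in for the bloom filter. The point is only that a single write path makes it hard to update one structure without the other.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical plain-array stand-ins for `Address` and `H256`.
type Address = [u8; 20];
type Slot = [u8; 32];

/// Toy store: a map plays the storage trie, an exact set plays the bloom.
#[derive(Default)]
pub struct SlotStore {
    trie: HashMap<(Address, Slot), u64>,
    bloom: HashSet<(Address, Slot)>,
}

impl SlotStore {
    /// Single entry point for storage writes, so every non-zero write
    /// (genesis, sync, block execution) also reaches the filter.
    pub fn write_slot(&mut self, address: Address, slot: Slot, value: u64) {
        if value == 0 {
            // Zero means "delete"; the filter is never un-set.
            self.trie.remove(&(address, slot));
        } else {
            self.bloom.insert((address, slot));
            self.trie.insert((address, slot), value);
        }
    }

    /// Read path: skip the trie when the filter says "definitely absent".
    pub fn read_slot(&self, address: Address, slot: Slot) -> Option<u64> {
        if !self.bloom.contains(&(address, slot)) {
            return None;
        }
        self.trie.get(&(address, slot)).copied()
    }
}
```

If genesis setup or snap sync wrote into `trie` directly instead of going through `write_slot`, `read_slot` would return `None` for those slots, which is the false-negative scenario described in the comment above.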
Greptile Summary
This PR adds infrastructure for a
Confidence Score: 3/5
| Filename | Overview |
|---|---|
| crates/storage/bloom.rs | New StorageBloomFilter wrapping fastbloom::AtomicBloomFilter; correct thread-safety model and key construction, but no unit tests and the enable() path carries a silent correctness footgun if called before full trie population. |
| crates/storage/store.rs | Integrates StorageBloomFilter into Store: allocates ~240 MB upfront at node startup, inserts on every non-zero storage write, and short-circuits get_storage_at_root; however enable() is never called so the filter is pure overhead in its current state. |
| crates/storage/lib.rs | Trivial addition of mod bloom declaration; no issues. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant EVM as EVM / RPC Caller
    participant Store as Store
    participant Bloom as StorageBloomFilter
    participant Trie as Storage Trie
    EVM->>Store: get_storage_at_root(state_root, address, key)
    Store->>Bloom: might_contain(address, key)
    alt filter disabled (current state) OR false positive
        Bloom-->>Store: true
        Store->>Trie: open_state_trie → get account storage_root
        Store->>Trie: open_storage_trie → get(hashed_key)
        Trie-->>Store: Option<encoded_value>
        Store-->>EVM: Ok(Some(value)) / Ok(None)
    else filter enabled AND definitely absent
        Bloom-->>Store: false
        Store-->>EVM: Ok(None) [trie skip]
    end
    note over Bloom: enable() is never called today,<br/>so the "trie skip" path is never taken
    EVM->>Store: apply_account_updates (non-zero write)
    Store->>Bloom: insert(address, key)
    Store->>Trie: trie.insert(hashed_key, value)
```
Last reviewed commit: f36d8cb
```rust
last_computed_flatkeyvalue: Arc::new(RwLock::new(last_written)),
account_code_cache: Arc::new(Mutex::new(CodeCache::default())),
code_metadata_cache: Arc::new(Mutex::new(rustc_hash::FxHashMap::default())),
storage_bloom: Arc::new(StorageBloomFilter::new(200_000_000)),
```
~240 MB allocated at startup before the filter is useful
`StorageBloomFilter::new(200_000_000)` with a 1% false-positive rate requires roughly `m = -n·ln(p) / ln(2)²` bits ≈ **1.92 billion bits ≈ 240 MB** of RAM. This allocation happens unconditionally at node startup, even though:
1. `enable()` is never called anywhere in the codebase, so `might_contain` always returns `true` (pass-through).
2. The "Future work" item (scan existing trie on startup then call `enable()`) has not yet been implemented.
The net effect today is a large fixed memory cost with **zero benefit**: every trie lookup still executes.
Consider either (a) deferring the allocation until `enable()` is about to be called, (b) making `capacity` configurable/feature-gated, or (c) landing this PR together with the startup-scan + `enable()` call so the optimisation is actually active from the first merge.
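The sizing arithmetic above can be checked with a few lines of Rust. This is a sketch of the textbook formula only; `bloom_bits` and `bloom_megabytes` are hypothetical helper names, and the real `fastbloom` allocation may differ somewhat (for example through internal rounding).

```rust
/// Textbook optimal bit count for a Bloom filter holding `n` items at
/// false-positive rate `p`: m = -n * ln(p) / (ln 2)^2.
pub fn bloom_bits(n: u64, p: f64) -> f64 {
    let ln2 = std::f64::consts::LN_2;
    -(n as f64) * p.ln() / (ln2 * ln2)
}

/// The same quantity expressed in megabytes (10^6 bytes).
pub fn bloom_megabytes(n: u64, p: f64) -> f64 {
    bloom_bits(n, p) / 8.0 / 1_000_000.0
}
```

Under this formula, `bloom_megabytes(200_000_000, 0.01)` lands just under 240, while the 2,000,000-item capacity suggested in the earlier comment needs roughly a hundredth of that.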
```rust
if storage_value.is_zero() {
    storage_trie.remove(&hashed_key)?;
} else {
    self.storage_bloom.insert(update.address, *storage_key);
```
**`insert()` incurs hashing cost even while the filter is disabled**
`insert()` always computes `make_key` and hashes into the underlying `AtomicBloomFilter`, even when `enabled == false`. Since `might_contain` short-circuits to `true` while disabled, these inserts represent **pure overhead** on every non-zero storage write until `enable()` is eventually called.
If the motivation for inserting unconditionally is to "warm up" the filter so it's ready when `enable()` is called, that is a reasonable design, but it should be explicitly documented with a comment. Otherwise, a future reader may add an early-return guard in `insert()` that would actually break the warm-up intent.
The same applies to the second insert site at line 1830.
```rust
use std::fmt;
use std::sync::atomic::{AtomicBool, Ordering};

use ethrex_common::{Address, H256};
use fastbloom::AtomicBloomFilter;
use rustc_hash::FxBuildHasher;

const FALSE_POSITIVE_RATE: f64 = 0.01;

/// Bloom filter that tracks which (address, storage_key) pairs have non-zero
/// storage values. Used to skip expensive trie lookups for slots that were
/// never written to.
pub struct StorageBloomFilter {
    filter: AtomicBloomFilter<FxBuildHasher>,
    enabled: AtomicBool,
}

impl fmt::Debug for StorageBloomFilter {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("StorageBloomFilter").finish()
    }
}

impl StorageBloomFilter {
    pub fn new(capacity: usize) -> Self {
        Self {
            filter: AtomicBloomFilter::with_false_pos(FALSE_POSITIVE_RATE)
                .hasher(FxBuildHasher)
                .expected_items(capacity),
            enabled: AtomicBool::new(false),
        }
    }

    /// Activate the bloom filter after it has been populated.
    /// Before this is called, `might_contain` always returns `true` (pass-through).
    pub fn enable(&self) {
        self.enabled.store(true, Ordering::Release);
    }

    /// Record that a non-zero value exists at (address, key).
    pub fn insert(&self, address: Address, key: H256) {
        let bloom_key = Self::make_key(address, key);
        self.filter.insert(&bloom_key);
    }

    /// Returns `true` if the slot *might* contain a non-zero value.
    /// Returns `false` if the slot was definitely never written.
    /// When the filter is not yet enabled, always returns `true` (pass-through).
    pub fn might_contain(&self, address: Address, key: H256) -> bool {
        if !self.enabled.load(Ordering::Acquire) {
            return true;
        }
        let bloom_key = Self::make_key(address, key);
        self.filter.contains(&bloom_key)
    }

    fn make_key(address: Address, key: H256) -> [u8; 52] {
        let mut buf = [0u8; 52];
        buf[..20].copy_from_slice(address.as_bytes());
        buf[20..].copy_from_slice(key.as_bytes());
        buf
    }
}
```
**No unit tests for the new module**
`bloom.rs` introduces a `StorageBloomFilter` with non-trivial behaviour: a disabled filter must be a transparent pass-through, and an enabled filter must never produce false negatives. Neither property is covered by a test.
Suggested minimal test cases:
- A newly created filter (disabled) always returns `true` from `might_contain`.
- After `enable()`, a key that was `insert()`-ed still returns `true`.
- After `enable()`, a key that was **never** inserted returns `false` (false positives are unlikely for a key set that is small relative to capacity).
- `make_key` produces distinct bytes for `(a, k₁)` vs `(a, k₂)` and `(a₁, k)` vs `(a₂, k)`.
Given that a correctness regression here (false negatives while enabled) would silently return `None` for slots that actually have a value, tests are especially important.
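A rough sketch of how these four properties could be asserted follows. It deliberately avoids the PR's real types: `ToyStorageBloom` is a hypothetical std-only stand-in (a fixed 65,536-bit array probed with seeded `DefaultHasher` values), and `[u8; 20]` / `[u8; 32]` replace `Address` / `H256`. Against the real `StorageBloomFilter` the same assertions would sit inside `#[test]` functions in `bloom.rs`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

const BITS: usize = 1 << 16; // toy size, far smaller than the PR's filter
const HASHES: u64 = 4; // probe positions per key

/// Hypothetical stand-in for `StorageBloomFilter`; not the fastbloom-backed type.
pub struct ToyStorageBloom {
    words: Vec<AtomicU64>,
    enabled: AtomicBool,
}

impl ToyStorageBloom {
    pub fn new() -> Self {
        Self {
            words: (0..BITS / 64).map(|_| AtomicU64::new(0)).collect(),
            enabled: AtomicBool::new(false),
        }
    }

    pub fn enable(&self) {
        self.enabled.store(true, Ordering::Release);
    }

    /// Same layout as the PR's `make_key`: 20 address bytes, then 32 key bytes.
    pub fn make_key(address: [u8; 20], key: [u8; 32]) -> [u8; 52] {
        let mut buf = [0u8; 52];
        buf[..20].copy_from_slice(&address);
        buf[20..].copy_from_slice(&key);
        buf
    }

    /// Derive `HASHES` bit positions by hashing the key with different seeds.
    fn positions(key: [u8; 52]) -> impl Iterator<Item = usize> {
        (0..HASHES).map(move |seed| {
            let mut hasher = DefaultHasher::new();
            seed.hash(&mut hasher);
            key.hash(&mut hasher);
            (hasher.finish() as usize) % BITS
        })
    }

    pub fn insert(&self, address: [u8; 20], key: [u8; 32]) {
        for i in Self::positions(Self::make_key(address, key)) {
            self.words[i / 64].fetch_or(1u64 << (i % 64), Ordering::Relaxed);
        }
    }

    pub fn might_contain(&self, address: [u8; 20], key: [u8; 32]) -> bool {
        if !self.enabled.load(Ordering::Acquire) {
            return true; // pass-through while disabled
        }
        let words = &self.words;
        Self::positions(Self::make_key(address, key))
            .all(|i| (words[i / 64].load(Ordering::Relaxed) & (1u64 << (i % 64))) != 0)
    }
}

/// The four suggested properties, written as plain assertions.
pub fn check_bloom_properties() {
    let (a1, a2) = ([0x11u8; 20], [0x22u8; 20]);
    let (k1, k2) = ([0x01u8; 32], [0x02u8; 32]);
    let filter = ToyStorageBloom::new();

    // 1. A disabled filter is a pass-through, even for keys never inserted.
    assert!(filter.might_contain(a1, k1));

    filter.insert(a1, k1);
    filter.enable();

    // 2. No false negatives: an inserted key is still reported.
    assert!(filter.might_contain(a1, k1));
    // 3. A never-inserted key is rejected. A Bloom filter only makes this
    //    overwhelmingly likely (one key in 65,536 bits), not guaranteed.
    assert!(!filter.might_contain(a1, k2));
    // 4. Keys differ when either the address or the slot differs.
    assert_ne!(ToyStorageBloom::make_key(a1, k1), ToyStorageBloom::make_key(a1, k2));
    assert_ne!(ToyStorageBloom::make_key(a1, k1), ToyStorageBloom::make_key(a2, k1));
}
```

Property 3 is the one to treat with care in a real test suite: it is probabilistic by nature, so a test should keep the inserted set tiny relative to capacity, as the review comment notes.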
```rust
pub fn enable(&self) {
    self.enabled.store(true, Ordering::Release);
}
```
**`enable()` is never called: silent correctness risk for future callers**
`enable()` has no callers in the repository today. When someone eventually wires it up (the "Future work" mentioned in the PR), they must ensure the filter has been **fully populated** from existing trie storage before calling it; calling `enable()` prematurely on a partially-populated filter would cause `might_contain` to return `false` for slots that genuinely have a value, silently returning `Ok(None)` instead of the correct stored value.
Consider adding a doc-comment warning directly on `enable()` about this requirement so the footgun is visible at the call site:
```rust
/// Activate the bloom filter.
///
/// # Safety / correctness
/// The filter MUST have been fully populated (via `insert`) for all
/// (address, storage_key) pairs that exist in the trie before this is called.
/// Calling `enable()` on a partially-populated filter will cause
/// `might_contain` to return `false` for real slots, producing silent
/// incorrect `None` results from `get_storage_at_root`.
pub fn enable(&self) {
    self.enabled.store(true, Ordering::Release);
}
```
…abled
The bloom filter was created empty and immediately used to reject ALL storage lookups, since an empty bloom filter returns false for every query. This caused SLOAD to return 0 for all pre-existing storage slots, producing a gas mismatch (4.6M vs 10.5M expected).
Add an `enabled` flag (AtomicBool, defaults to false) so that `might_contain` returns true (pass-through) until `enable()` is called. The filter will only start rejecting lookups after it has been populated with existing storage data and explicitly activated.
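The populate-then-activate ordering described in this commit message could look roughly like the sketch below. It is illustrative only: `GatedFilter` and `populate_and_enable` are hypothetical names, an exact `HashSet` behind a `Mutex` stands in for the bloom bits, and the iterator of existing slots stands in for a startup scan of trie storage that the PR lists as future work.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

// Hypothetical (address, storage_key) pair using plain byte arrays.
type SlotKey = ([u8; 20], [u8; 32]);

pub struct GatedFilter {
    seen: Mutex<HashSet<SlotKey>>, // exact set standing in for the bloom bits
    enabled: AtomicBool,
}

impl GatedFilter {
    pub fn new() -> Self {
        Self {
            seen: Mutex::new(HashSet::new()),
            enabled: AtomicBool::new(false),
        }
    }

    pub fn insert(&self, key: SlotKey) {
        self.seen.lock().unwrap().insert(key);
    }

    pub fn might_contain(&self, key: &SlotKey) -> bool {
        // While disabled, never reject: an empty filter would otherwise
        // answer "absent" for every pre-existing slot.
        if !self.enabled.load(Ordering::Acquire) {
            return true;
        }
        self.seen.lock().unwrap().contains(key)
    }

    /// Feed in every pre-existing non-zero slot first, then flip the flag.
    /// The Release store pairs with the Acquire load in `might_contain`.
    pub fn populate_and_enable(&self, existing: impl IntoIterator<Item = SlotKey>) {
        for key in existing {
            self.insert(key);
        }
        self.enabled.store(true, Ordering::Release);
    }
}
```

Enabling before the loop (or skipping the loop) reproduces the bug from the commit message: lookups for pre-existing slots would be rejected.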
f36d8cb to 1629d73
Benchmark Block Execution Results Comparison Against Main
|
… mark flaky Hive Cancun ReOrg test (Transaction Nonce variant) as known-flaky
…nditions Use OnceLock to lazily allocate the ~240MB bloom filter on first insert() instead of at Store construction. This avoids upfront memory overhead when the filter is never used (dev mode, testnets). Also documents the warm-up insert pattern (inserts happen while disabled to populate the filter before enable()), and adds a precondition doc on enable() listing what must be true before calling it.
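As a rough illustration of the lazy-allocation idea in this commit message, the sketch below defers allocating a bit array until the first write by using `std::sync::OnceLock`. `LazyBits` and its methods are hypothetical names, and a plain atomic bitset stands in for the fastbloom filter; the commit's actual code is not shown on this page.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

/// Bit array whose backing memory is allocated on first `set`, not at construction.
pub struct LazyBits {
    words: OnceLock<Vec<AtomicU64>>,
    num_words: usize,
}

impl LazyBits {
    pub fn new(num_words: usize) -> Self {
        assert!(num_words > 0, "need at least one 64-bit word");
        Self { words: OnceLock::new(), num_words }
    }

    /// Allocates on first call; later calls reuse the same allocation.
    pub fn set(&self, bit: usize) {
        let words = self
            .words
            .get_or_init(|| (0..self.num_words).map(|_| AtomicU64::new(0)).collect());
        let bit = bit % (self.num_words * 64);
        words[bit / 64].fetch_or(1u64 << (bit % 64), Ordering::Relaxed);
    }

    /// Reading never allocates: an unallocated array has no bits set.
    pub fn test(&self, bit: usize) -> bool {
        match self.words.get() {
            None => false,
            Some(words) => {
                let bit = bit % (self.num_words * 64);
                (words[bit / 64].load(Ordering::Relaxed) & (1u64 << (bit % 64))) != 0
            }
        }
    }

    pub fn is_allocated(&self) -> bool {
        self.words.get().is_some()
    }
}
```

With this shape, a store that never writes a non-zero slot (for instance a short-lived in-memory test store) never pays for the allocation, and concurrent first writers are serialized by `get_or_init`.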
Genesis setup_genesis_state_trie inserts storage slots without updating the bloom filter, and write_storage_trie_nodes_batch (snap sync) bypasses it too. Document these as latent false-negative sources for when bloom is enabled.
Test pass-through when disabled, no false negatives after enable, rejection of unknown keys, and make_key distinctness.
The bloom filter only tracks writes during the current process lifetime. For RPC historical state queries, slots that were non-zero in older states but later zeroed won't be in the filter, causing false negatives.
```rust
#[allow(dead_code)]
pub fn enable(&self) {
```
Why is the feature never enabled?
- Add §1.18 observability tooling (PR #6470)
- Add §1.19 pivot update reliability (PR #6475, issue #6474)
- Add §1.20 big-account within-trie parallelization (issue #6477)
- Add §1.21 small-account batching (issue #6476)
- Add §1.22 decoded TrieLayerCache (PR #6348)
- Add §1.23 bloom filter for non-existent storage (PR #6288)
- Add §1.24 adaptive request sizing + bisection (PR #6181)
- Add §1.25 concurrent bytecode + storage (PR #6205)
- Add §1.26 phase completion markers (PR #6189)
- Add §2.18 StorageTrieTracker refactor (PR #6171)
- Update current-state bottleneck table with small-account and pivot-update findings
- Reprioritize timeline: pivot-update crash fix is now priority 0
- Add two risks (pivot crash masks perf work, DB corruption on every crash)
- Bump doc version to 1.3
Motivation
Storage reads (`get_storage_value`) always perform a full trie traversal, even for slots that were never written. On workloads with many SLOAD misses (e.g. first-touch patterns), this is a significant bottleneck.

Description
Add a `StorageBloomFilter` backed by `fastbloom::AtomicBloomFilter` that tracks every `(address, storage_key)` pair written with a non-zero value. Before traversing the trie, `get_storage_value` checks the bloom filter; if the slot was definitely never written, the trie lookup is skipped entirely.

Key design points:
- `might_contain` always returns `true` until `enable()` is called. This prevents an unpopulated filter from incorrectly rejecting all lookups.
- `insert()` is called whenever a non-zero storage value is written to the trie.
- `FxBuildHasher` for fast hashing.
- `AtomicBloomFilter` with `AtomicBool` enabled flag (Release/Acquire ordering).

Future work: scan existing trie storage on startup to populate the filter and call `enable()`, so the filter can actually start rejecting lookups for pre-existing data.

How to Test
- Once `enable()` is wired up after initial population, benchmark with `import` on `l2-1k-erc20.rlp` to measure trie-skip savings.