
perf(storage): add bloom filter for empty storage slots #21185

Draft
gakonst wants to merge 9 commits into main from tempo/storage-bloom-filter

Conversation

@gakonst
Member

@gakonst gakonst commented Jan 19, 2026

Summary

Implements a bloom filter to short-circuit storage reads for empty slots, potentially avoiding RocksDB lookups for 30-40% of storage reads.

Based on analysis from Nethermind's FlatDB work (PR #9854):

  • Mainnet has ~500M-1B non-empty storage slots
  • A bloom filter has no false negatives, so it can definitively report 'not present' for empty slots
  • Potential 10-20% mgas/s improvement on some block ranges

Architecture

  • New reth-storage-bloom crate with GrowableBloom filter
  • BloomStateProvider wrapper that checks bloom before DB access
  • Metrics for bloom hits, misses, and false positives
  • Persistence support for saving/loading bloom to disk
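
To make the wrapper idea concrete, here is a minimal, std-only sketch of the check-bloom-before-DB read path. It is not the code in this PR: `SimpleBloom`, `BloomBackedStore`, and `SlotKey` are hypothetical names, a `HashMap` stands in for the database, and the fixed-size filter stands in for `GrowableBloom`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical key: (20-byte address, 32-byte storage slot).
pub type SlotKey = ([u8; 20], [u8; 32]);

/// Fixed-size bloom filter using double hashing (illustrative only).
pub struct SimpleBloom {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u32,
}

impl SimpleBloom {
    pub fn new(num_bits: u64, num_hashes: u32) -> Self {
        let num_bits = num_bits.max(64);
        let words = ((num_bits + 63) / 64) as usize;
        Self { bits: vec![0u64; words], num_bits, num_hashes: num_hashes.max(1) }
    }

    /// Derive two base hashes; probe i uses a + i * b (mod num_bits).
    fn hash_pair(key: &SlotKey) -> (u64, u64) {
        let mut h1 = DefaultHasher::new();
        key.hash(&mut h1);
        let mut h2 = DefaultHasher::new();
        (key, 0x9e37_79b9_7f4a_7c15u64).hash(&mut h2);
        (h1.finish(), h2.finish() | 1) // odd stride so probes do not all collapse
    }

    fn bit_index(&self, a: u64, b: u64, i: u64) -> u64 {
        a.wrapping_add(i.wrapping_mul(b)) % self.num_bits
    }

    pub fn insert(&mut self, key: &SlotKey) {
        let (a, b) = Self::hash_pair(key);
        for i in 0..self.num_hashes as u64 {
            let bit = self.bit_index(a, b, i);
            self.bits[(bit / 64) as usize] |= 1u64 << (bit % 64);
        }
    }

    /// `false` means "definitely never inserted"; `true` means "maybe".
    pub fn may_contain(&self, key: &SlotKey) -> bool {
        let (a, b) = Self::hash_pair(key);
        (0..self.num_hashes as u64).all(|i| {
            let bit = self.bit_index(a, b, i);
            self.bits[(bit / 64) as usize] & (1u64 << (bit % 64)) != 0
        })
    }
}

/// Wrapper that consults the bloom filter before touching the backing store.
pub struct BloomBackedStore {
    bloom: SimpleBloom,
    db: HashMap<SlotKey, [u8; 32]>, // stand-in for the real database
    pub db_reads: u64,
}

impl BloomBackedStore {
    pub fn new(num_bits: u64, num_hashes: u32) -> Self {
        Self { bloom: SimpleBloom::new(num_bits, num_hashes), db: HashMap::new(), db_reads: 0 }
    }

    /// Every write also updates the filter (the write amplification cost).
    pub fn write(&mut self, key: SlotKey, value: [u8; 32]) {
        self.bloom.insert(&key);
        self.db.insert(key, value);
    }

    pub fn read(&mut self, key: &SlotKey) -> Option<[u8; 32]> {
        if !self.bloom.may_contain(key) {
            return None; // short-circuit: slot was never written
        }
        self.db_reads += 1; // "maybe present": fall through to the DB
        self.db.get(key).copied()
    }
}
```

In this sketch a read of a never-written slot usually returns `None` without incrementing `db_reads`; a false positive only costs one extra lookup and never changes the result.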

Trade-offs

From Nethermind's experience:

  • Memory: ~1-2GB for mainnet scale at 1% false positive rate
  • Write amplification: Every storage write updates bloom
  • Uncertain ROI: Depends heavily on workload characteristics
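
As a rough sanity check on the memory figure, the textbook sizing formulas for a fixed-size bloom filter are m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The sketch below evaluates them; the function names are hypothetical and this is not how `GrowableBloom` sizes itself. A growable filter may need more space than this optimum.

```rust
use std::f64::consts::LN_2;

/// Optimal number of bits for `n` items at false positive rate `p`
/// (classic formula for a fixed-size filter).
pub fn optimal_bits(n: u64, p: f64) -> u64 {
    assert!(n > 0 && p > 0.0 && p < 1.0, "need n > 0 and 0 < p < 1");
    let m = -(n as f64) * p.ln() / (LN_2 * LN_2);
    m.ceil() as u64
}

/// Optimal number of hash functions for `n` items in `m` bits.
pub fn optimal_hashes(n: u64, m: u64) -> u32 {
    ((m as f64 / n as f64) * LN_2).round().max(1.0) as u32
}

/// Convert a bit count to GiB.
pub fn bits_to_gib(bits: u64) -> f64 {
    bits as f64 / 8.0 / (1u64 << 30) as f64
}
```

At p = 0.01 this works out to about 9.6 bits per slot and 7 hash functions, so 500M to 1B slots need roughly 0.56 to 1.12 GiB before any growable-filter overhead.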

How to Test

This is feature-gated behind the storage-bloom flag. To enable it:

reth-provider = { features = ["storage-bloom"] }

Benchmark Request

Need to benchmark on mainnet block re-execution to measure:

  1. Storage read reduction (bloom hit rate)
  2. False positive rate
  3. Overall mgas/s impact
  4. Memory overhead

cc @brian @alexey - please review benchmark results when available

Closes

Closes #21184

Implements a bloom filter to short-circuit storage reads for empty slots,
potentially avoiding RocksDB lookups for 30-40% of storage reads.

Based on analysis from Nethermind's FlatDB work:
- Mainnet has ~500M-1B non-empty storage slots
- A bloom filter can definitively say 'not present' for empty slots
- Potential 10-20% mgas/s improvement on some block ranges

Architecture:
- New reth-storage-bloom crate with GrowableBloom filter
- BloomStateProvider wrapper that checks bloom before DB access
- Metrics for bloom hits, misses, and false positives
- Persistence support for saving/loading bloom to disk

Trade-offs:
- Memory: ~1-2GB for mainnet scale at 1% false positive rate
- Write amplification: Every storage write updates bloom
- Feature-gated behind 'storage-bloom' flag

Closes #21184

Amp-Thread-ID: https://ampcode.com/threads/T-019bd355-619a-769b-9eb0-40d1dc2d47ba
Co-authored-by: Amp <amp@ampcode.com>
@gakonst gakonst added the C-perf (A change motivated by improving speed, memory usage or disk footprint) label Jan 19, 2026
@github-project-automation github-project-automation bot moved this to Backlog in Reth Tracker Jan 19, 2026
@gakonst
Member Author

gakonst commented Jan 19, 2026

Benchmark Started 🚀

The CodSpeed benchmark workflow is running: https://github.com/paradigmxyz/reth/actions/runs/21133685427

Note: This is the standard bench workflow. For proper mainnet block re-execution benchmarks to measure the bloom filter impact, we'll need to run manual benchmarks on a reth box.

What to measure:

  1. Storage read reduction - bloom hit rate (how many reads avoided DB)
  2. False positive rate - bloom said 'maybe present' but DB returned empty
  3. Overall mgas/s impact - with vs without bloom
  4. Memory overhead - actual bloom filter size at scale
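
One way the first two measurements could be derived from counters is sketched below. The type and method names are hypothetical, not the metrics added in this PR, and the interpretation is an assumption: a "hit" is a read the bloom answered without the DB, a "miss" is a read that went to the DB, and a false positive is a miss where the DB returned empty.

```rust
/// Hypothetical counters for bloom-assisted storage reads.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct BloomCounters {
    pub hits: u64,            // bloom said "not present", DB skipped
    pub misses: u64,          // bloom said "maybe", DB was queried
    pub false_positives: u64, // subset of misses where the DB returned empty
}

impl BloomCounters {
    /// Record one read. `db_found` is only meaningful when the DB was queried,
    /// i.e. when `bloom_maybe` is true.
    pub fn record(&mut self, bloom_maybe: bool, db_found: bool) {
        if !bloom_maybe {
            self.hits += 1;
        } else {
            self.misses += 1;
            if !db_found {
                self.false_positives += 1;
            }
        }
    }

    /// Fraction of all reads that avoided the DB.
    pub fn hit_rate(&self) -> Option<f64> {
        let total = self.hits + self.misses;
        (total > 0).then(|| self.hits as f64 / total as f64)
    }

    /// Fraction of empty-slot reads that the bloom failed to filter out.
    pub fn false_positive_rate(&self) -> Option<f64> {
        let empty_reads = self.hits + self.false_positives;
        (empty_reads > 0).then(|| self.false_positives as f64 / empty_reads as f64)
    }
}
```

Both rates return `None` until at least one relevant read has been recorded, which avoids reporting a misleading 0% on an idle node.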

cc @brian @alexey - will need to coordinate a manual benchmark run on reth3/reth5 to get meaningful numbers.

gakonst and others added 3 commits January 19, 2026 10:24
- Replace FPR abbreviation with 'false positive rate'
- Merge match arms with identical bodies
- Apply nightly rustfmt formatting
Member

@mediocregopher mediocregopher left a comment


Integrate into the payload validator in engine tree

@github-project-automation github-project-automation bot moved this from Backlog to In Progress in Reth Tracker Jan 19, 2026
- Remove file persistence (load_from_file, save_to_file) from bloom filter
- Remove BloomError type and related test
- Integrate storage bloom filter into BasicEngineValidator in engine tree
- Add storage-bloom feature flag to engine-tree crate
- Remove file persistence code (load_from_file, save_to_file) per review
- Integrate bloom filter into payload validator in engine tree
- Add storage-bloom feature flag to reth-engine-tree
- Export BloomStateProvider from reth-provider

The bloom filter now starts fresh on each node restart and is integrated
into the payload validation path, where it wraps the state provider to
short-circuit empty storage slot reads.

This fixes the API breakage when the storage-bloom feature is enabled.
The bloom filter is now created inside the BasicEngineValidator::new
constructor with default configuration, keeping the constructor
signature stable regardless of feature flags.

Per review feedback, removed the feature flag and made the storage
bloom filter the default behavior. The bloom filter is now always
enabled and integrated into the payload validator.
@github-actions github-actions bot added the S-stale (This issue/PR is stale and will close with no further activity) label Feb 10, 2026

Labels

C-perf A change motivated by improving speed, memory usage or disk footprint S-stale This issue/PR is stale and will close with no further activity

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

perf(storage): Add bloom filter for empty storage slots

2 participants

Comments