Skip to content

Enable direct io for blockstore_db reads and writes#10907

Open
dachen0 wants to merge 2 commits into
anza-xyz:masterfrom
dachen0:blockstore-direct-io
Open

Enable direct io for blockstore_db reads and writes#10907
dachen0 wants to merge 2 commits into
anza-xyz:masterfrom
dachen0:blockstore-direct-io

Conversation

@dachen0
Copy link
Copy Markdown

@dachen0 dachen0 commented Mar 2, 2026

Problem

n/a

Summary of Changes

Add direct io support to blockstore_db reads and writes.

This uses check_direct_io_capability added by #10957 to check for dio support and then sets

options.set_use_direct_io_for_flush_and_compaction(true);
options.set_use_direct_reads(true);

if dio is supported on the blockstore path.

Fixes #

Copilot AI review requested due to automatic review settings March 2, 2026 17:43
@dachen0
Copy link
Copy Markdown
Author

dachen0 commented Mar 2, 2026

This is not merge ready. The direct io flag should be controlled by the user through a cli flag, similar to how snapshot direct io is controlled.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enables direct I/O for RocksDB reads and writes in the blockstore, which can improve performance by bypassing the OS page cache and reducing CPU overhead from cache management. The change is scoped to Linux only via a #[cfg(target_os = "linux")] attribute.

Changes:

  • Enables set_use_direct_reads for all RocksDB blockstore reads on Linux.
  • Enables set_use_direct_io_for_flush_and_compaction for all RocksDB flush and compaction I/O on Linux.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ledger/src/blockstore_db.rs
Comment thread ledger/src/blockstore_db.rs Outdated
// Direct io
#[cfg(target_os = "linux")]
{
options.set_use_direct_io_for_flush_and_compaction(true);
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_use_direct_io_for_flush_and_compaction controls the I/O mode for RocksDB's background flush and compaction operations, which are write operations. This option is irrelevant and has no effect when the database is opened with AccessType::ReadOnly, since no flushes or compactions are performed in read-only mode.

While this is not harmful, for consistency with how other write-specific options are guarded (e.g., set_disable_auto_compactions is gated on should_disable_auto_compactions), consider only setting set_use_direct_io_for_flush_and_compaction when access_type is Primary or PrimaryForMaintenance.

Copilot uses AI. Check for mistakes.
@dachen0 dachen0 force-pushed the blockstore-direct-io branch from 5601f71 to a7706e0 Compare March 9, 2026 18:27
@dachen0
Copy link
Copy Markdown
Author

dachen0 commented Mar 9, 2026

cc @kskalski @steviez

This should make shred insert faster due to less writes from the background compaction and flushes hitting the cache -> WAL writes are faster to the fs block cache. And overall the compacted/flushed entries don't need be in the cache(we only need recent shreds most of the time).

I'm not really sure how to bench this change so please let me know what if/how it should be benched :)

@dachen0
Copy link
Copy Markdown
Author

dachen0 commented Mar 9, 2026

@kskalski
Copy link
Copy Markdown

I suppose it will be better to have a flag that controls the mode directly and just check capability as a validation / better diagnostics as we did for --no-accounts-db-snapshot-direct-io, but I think Steve is best to advise on that.

@dachen0
Copy link
Copy Markdown
Author

dachen0 commented Mar 12, 2026

https://github.com/facebook/rocksdb/wiki/Block-Cache

I've been looking at blockstore_rocksdb_cf while using direct io and noticed that the block_cache_usage was non zero. So rocksdb has an internal block cache on top of the system page cache. So if direct io causes read performance degradation, then an easy solution is to tune the rocksdb read only block cache size.

EDIT: on a related note, should the block cache even be enabled? Since it's a RO LRU cache, it's not very useful when we already have memtables we can read from for recent writes. Shreds are inserted (mostly) in order, so the the block cache functions as a queue in our usecase.

@steviez
Copy link
Copy Markdown

steviez commented Mar 12, 2026

Hi @dachen0 - Appreciate the interest and you digging in here (and your other PR). I'm open to exploring these options; however, I'm currently focused on trying to land a change (& balance everything else) to avoid the WAL (#4132) for shred insertion that we can take in v4.0

Steady state replay hits memtables for reads, and skipping the WAL for shreds will ensure that any slowdowns for the WAL (flush happens at some set cadence IIRC) does not incur in the shred ingestion (Blockstore insertion) path.

After that, I'm down to explore more optimizations like this PR and your other one. However, can we defer that for ~2 weeks - I should have fewer things on my plate / more time to give actual responses, ideas on testing, etc then

@dachen0
Copy link
Copy Markdown
Author

dachen0 commented Mar 13, 2026

Thanks for taking the time to respond!

I will sit tight then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants