Enable direct io for blockstore_db reads and writes#10907
Conversation
|
This is not merge ready. The direct io flag should be controlled by the user through a cli flag, similar to how snapshot direct io is controlled. |
There was a problem hiding this comment.
Pull request overview
This PR enables direct I/O for RocksDB reads and writes in the blockstore, which can improve performance by bypassing the OS page cache and reducing CPU overhead from cache management. The change is scoped to Linux only via a #[cfg(target_os = "linux")] attribute.
Changes:
- Enables
set_use_direct_readsfor all RocksDB blockstore reads on Linux. - Enables
set_use_direct_io_for_flush_and_compactionfor all RocksDB flush and compaction I/O on Linux.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Direct io | ||
| #[cfg(target_os = "linux")] | ||
| { | ||
| options.set_use_direct_io_for_flush_and_compaction(true); |
There was a problem hiding this comment.
set_use_direct_io_for_flush_and_compaction controls the I/O mode for RocksDB's background flush and compaction operations, which are write operations. This option is irrelevant and has no effect when the database is opened with AccessType::ReadOnly, since no flushes or compactions are performed in read-only mode.
While this is not harmful, for consistency with how other write-specific options are guarded (e.g., set_disable_auto_compactions is gated on should_disable_auto_compactions), consider only setting set_use_direct_io_for_flush_and_compaction when access_type is Primary or PrimaryForMaintenance.
5601f71 to
a7706e0
Compare
|
This should make shred insert faster due to less writes from the background compaction and flushes hitting the cache -> WAL writes are faster to the fs block cache. And overall the compacted/flushed entries don't need be in the cache(we only need recent shreds most of the time). I'm not really sure how to bench this change so please let me know what if/how it should be benched :) |
|
I suppose it will be better to have a flag that controls the mode directly and just check capability as a validation / better diagnostics as we did for |
|
https://github.com/facebook/rocksdb/wiki/Block-Cache I've been looking at EDIT: on a related note, should the block cache even be enabled? Since it's a RO LRU cache, it's not very useful when we already have memtables we can read from for recent writes. Shreds are inserted (mostly) in order, so the the block cache functions as a queue in our usecase. |
|
Hi @dachen0 - Appreciate the interest and you digging in here (and your other PR). I'm open to exploring these options; however, I'm currently focused on trying to land a change (& balance everything else) to avoid the WAL (#4132) for shred insertion that we can take in v4.0 Steady state replay hits memtables for reads, and skipping the WAL for shreds will ensure that any slowdowns for the WAL (flush happens at some set cadence IIRC) does not incur in the shred ingestion (Blockstore insertion) path. After that, I'm down to explore more optimizations like this PR and your other one. However, can we defer that for ~2 weeks - I should have fewer things on my plate / more time to give actual responses, ideas on testing, etc then |
|
Thanks for taking the time to respond! I will sit tight then. |
Problem
n/a
Summary of Changes
Add direct io support to blockstore_db reads and writes.
This uses
check_direct_io_capabilityadded by #10957 to check for dio support and then setsif dio is supported on the blockstore path.
Fixes #