-
Notifications
You must be signed in to change notification settings - Fork 1k
Add support for direct io in SequentialFileReader #9395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -31,6 +31,9 @@ const DEFAULT_READ_SIZE: IoSize = 1024 * 1024; | |||||||||||||||||||||||||||||||||||||||||
| // For large file we don't really use workers as few regularly submitted requests get handled | ||||||||||||||||||||||||||||||||||||||||||
| // within sqpoll thread. Allow some workers just in case, but limit them. | ||||||||||||||||||||||||||||||||||||||||||
| const DEFAULT_MAX_IOWQ_WORKERS: u32 = 2; | ||||||||||||||||||||||||||||||||||||||||||
| // This is conservative read size alignment for use with direct IO, some block devices may have | ||||||||||||||||||||||||||||||||||||||||||
| // relaxed requirements, but detecting it is not trivial. | ||||||||||||||||||||||||||||||||||||||||||
| const DIRECT_IO_READ_LEN_ALIGNMENT: IoSize = 4096; | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| /// Utility for building `SequentialFileReader` with specified tuning options. | ||||||||||||||||||||||||||||||||||||||||||
| pub struct SequentialFileReaderBuilder<'sp> { | ||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -40,6 +43,8 @@ pub struct SequentialFileReaderBuilder<'sp> { | |||||||||||||||||||||||||||||||||||||||||
| shared_sqpoll_fd: Option<BorrowedFd<'sp>>, | ||||||||||||||||||||||||||||||||||||||||||
| /// Register buffer as fixed with the kernel | ||||||||||||||||||||||||||||||||||||||||||
| register_buffer: bool, | ||||||||||||||||||||||||||||||||||||||||||
| /// Toggle option for opening files with the O_DIRECT flag | ||||||||||||||||||||||||||||||||||||||||||
| use_direct_io: bool, | ||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| impl<'sp> SequentialFileReaderBuilder<'sp> { | ||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -50,6 +55,7 @@ impl<'sp> SequentialFileReaderBuilder<'sp> { | |||||||||||||||||||||||||||||||||||||||||
| ring_squeue_size: None, | ||||||||||||||||||||||||||||||||||||||||||
| shared_sqpoll_fd: None, | ||||||||||||||||||||||||||||||||||||||||||
| register_buffer: false, | ||||||||||||||||||||||||||||||||||||||||||
| use_direct_io: false, | ||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -70,6 +76,16 @@ impl<'sp> SequentialFileReaderBuilder<'sp> { | |||||||||||||||||||||||||||||||||||||||||
| self | ||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| /// Read files in direct-IO mode (disables kernel caching of read contents). | ||||||||||||||||||||||||||||||||||||||||||
| /// | ||||||||||||||||||||||||||||||||||||||||||
| /// Enabling requires the filesystem to support directio and `read_capacity` | ||||||||||||||||||||||||||||||||||||||||||
| /// to be a multiple of 4096. | ||||||||||||||||||||||||||||||||||||||||||
| #[cfg(test)] | ||||||||||||||||||||||||||||||||||||||||||
| pub fn use_direct_io(mut self, use_direct_io: bool) -> Self { | ||||||||||||||||||||||||||||||||||||||||||
| self.use_direct_io = use_direct_io; | ||||||||||||||||||||||||||||||||||||||||||
| self | ||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| /// Use (or remove) a shared kernel thread to drain submission queue for IO operations | ||||||||||||||||||||||||||||||||||||||||||
| pub fn shared_sqpoll(mut self, shared_sqpoll_fd: Option<BorrowedFd<'sp>>) -> Self { | ||||||||||||||||||||||||||||||||||||||||||
| self.shared_sqpoll_fd = shared_sqpoll_fd; | ||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -127,9 +143,29 @@ impl<'sp> SequentialFileReaderBuilder<'sp> { | |||||||||||||||||||||||||||||||||||||||||
| unsafe { IoBufferChunk::register(buf_slice_mut, &ring)? }; | ||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||
| if self.use_direct_io { | ||||||||||||||||||||||||||||||||||||||||||
| // O_DIRECT reads have size and alignment restrictions and must be into a sub-buffer of | ||||||||||||||||||||||||||||||||||||||||||
| // some multiple of the fs block size (see https://man7.org/linux/man-pages/man2/open.2.html#NOTES). | ||||||||||||||||||||||||||||||||||||||||||
| assert!( | ||||||||||||||||||||||||||||||||||||||||||
| self.read_capacity | ||||||||||||||||||||||||||||||||||||||||||
| .is_multiple_of(DIRECT_IO_READ_LEN_ALIGNMENT), | ||||||||||||||||||||||||||||||||||||||||||
| "read size is not aligned for direct IO({} is not a multiple of \ | ||||||||||||||||||||||||||||||||||||||||||
| {DIRECT_IO_READ_LEN_ALIGNMENT})", | ||||||||||||||||||||||||||||||||||||||||||
| self.read_capacity | ||||||||||||||||||||||||||||||||||||||||||
| ); | ||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+149
to
+155
|
||||||||||||||||||||||||||||||||||||||||||
| assert!( | |
| self.read_capacity | |
| .is_multiple_of(DIRECT_IO_READ_LEN_ALIGNMENT), | |
| "read size is not aligned for direct IO({} is not a multiple of \ | |
| {DIRECT_IO_READ_LEN_ALIGNMENT})", | |
| self.read_capacity | |
| ); | |
| if !self | |
| .read_capacity | |
| .is_multiple_of(DIRECT_IO_READ_LEN_ALIGNMENT) | |
| { | |
| return Err(io::Error::new( | |
| io::ErrorKind::InvalidInput, | |
| format!( | |
| "read size is not aligned for direct IO({} is not a multiple of \ | |
| {DIRECT_IO_READ_LEN_ALIGNMENT})", | |
| self.read_capacity | |
| ), | |
| )); | |
| } |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc inconsistency: here it says read_limit must be "less than the file size" for direct IO, but the earlier docs for add_owned_file_to_prefetch say "less than or equal". Please make these consistent (and ideally enforce the constraint in code if it’s required).
| /// `read_limit` must be less than the file size if using direct io. | |
| /// `read_limit` must be less than or equal to the file size if using direct io. |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal_read_len can be larger than the logical read_len to satisfy O_DIRECT alignment, but complete() and EOF handling still use read_len/is_last_read. This can expose more bytes than read_limit (and/or mark eof_pos beyond the intended limit) when internal_read_len > read_len, breaking the documented "read finishes after read_limit bytes" behavior. Consider tracking both the submitted length vs requested length and clamping the buffer’s readable length/EOF position to the requested read_len (discarding any over-read bytes) so fill_buf() never returns more than requested.
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_direct_io_read assumes the temp directory’s filesystem supports O_DIRECT. On common Linux setups where tempfile uses /tmp mounted as tmpfs, O_DIRECT open can fail with EINVAL, making this unit test flaky across CI environments. Consider skipping the test when opening with O_DIRECT fails, or creating the temp file in a directory/filesystem known to support direct IO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use_direct_iois gated behind#[cfg(test)], so production code (e.g.,large_file_buf_reader) cannot enable O_DIRECT at all andopen_file_flagswill always beO_NOATIMEin non-test builds. If this PR is intended to speed up snapshot reads in production, this method (or an equivalent config path) needs to be available outside tests and wired through the call sites that constructSequentialFileReaderBuilder.