Add io uring file writer #10105

Open
dachen0 wants to merge 6 commits into anza-xyz:master from dachen0:io_uring_file_writer
Conversation

@dachen0

@dachen0 dachen0 commented Jan 19, 2026

Problem

Writing to a file currently uses a plain BufWriter, without io_uring.

Summary of Changes

Add IoUringFileWriter, which allows writing to a single file using io_uring.

bench results:

files writer stats - num_direct_io_writes: 0 num_regular_io_writes: 5000 avg_num_buffers_free_during_write: 0.0028
files writer stats - num_direct_io_writes: 5000 num_regular_io_writes: 0 avg_num_buffers_free_during_write: 0.0042
io_uring: 1.133726434s
io_uring direct io: 1.051639206s
BufWriter: 1.506094476s

cc @kskalski

I copied most of the code from SequentialFileReader and IoUringFileCreator and didn't touch max_iowq_workers, ring_squeue_size, shared_sqpoll_fd.

@mergify mergify Bot requested a review from a team January 19, 2026 01:08
@kskalski

Ideally we would just use the file creator. But I do wonder how best to provide std::io::Write capability to the file creator; maybe that would deserve a separate module, but I hope not.

BTW, tiered_storage is not used right now, and I don't know of any plans to use it.

@kskalski

Basically we need to move on two efforts:

  • implement io-uring writer returned here
    pub fn large_file_buf_writer(path: impl AsRef<Path>) -> io::Result<impl io::Write + io::Seek> {
  • use above interface in more places

Currently writing new accounts storage is done with

write_buffer_to_file(file, data, pos as u64)?;

After the 4.0 cut we will remove the mmap variant for storage access, and then it should be easy to refactor the APIs to simply use std::io::Write and write the whole file from start to end instead of doing random-position writes.
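
To make the target interface concrete, here is a minimal sketch of a function with the signature quoted above. It is backed by a plain BufWriter<File> as a stand-in, so no io_uring is involved. The 1 MiB capacity and the write_sequentially helper are assumptions made for illustration, not code from the repository.

```rust
use std::{
    fs::File,
    io::{self, BufWriter, Seek, Write},
    path::Path,
};

/// Stand-in buffer capacity; an assumption for this sketch.
const SKETCH_BUFFER_SIZE: usize = 1024 * 1024;

/// Same signature as the function quoted above; the body is a plain
/// BufWriter<File> fallback rather than an io_uring writer.
pub fn large_file_buf_writer(path: impl AsRef<Path>) -> io::Result<impl io::Write + io::Seek> {
    let file = File::create(path)?;
    Ok(BufWriter::with_capacity(SKETCH_BUFFER_SIZE, file))
}

/// Hypothetical caller: writes all `parts` from start to end (no positioned
/// writes) and returns the final stream position.
pub fn write_sequentially(path: impl AsRef<Path>, parts: &[&[u8]]) -> io::Result<u64> {
    let mut writer = large_file_buf_writer(path)?;
    for part in parts {
        writer.write_all(part)?;
    }
    // stream_position() is provided by io::Seek.
    let end = writer.stream_position()?;
    writer.flush()?;
    Ok(end)
}
```

Because the caller only sees `impl io::Write + io::Seek`, the body could later be swapped for an io_uring-backed writer without touching call sites.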

@dachen0
Author

dachen0 commented Jan 19, 2026 via email

@dachen0
Author

dachen0 commented Jan 19, 2026 via email

@dachen0
Author

dachen0 commented Jan 20, 2026

Master

[2026-01-20T06:04:02.812789564Z INFO  solana_metrics::metrics] datapoint: serialize_account_storage_ms duration=14i num_entries=90275i
[2026-01-20T06:04:04.413420093Z INFO  solana_runtime::snapshot_utils] bank serialize took 1.9s for slot 383001061 at /solana/ledger/.tmp9cwzzQ/383001061/383001061
[2026-01-20T06:04:04.413455994Z INFO  agave_snapshots::archive] Generating snapshot archive for slot 383001061, kind: Full

PR

[2026-01-20T05:56:08.166241071Z INFO  solana_metrics::metrics] datapoint: serialize_account_storage_ms duration=15i num_entries=90275i
[2026-01-20T05:56:10.794343162Z INFO  solana_runtime::snapshot_utils] bank serialize took 3.1s for slot 383001061 at /solana/ledger/.tmpmntQ5x/383001061/383001061
[2026-01-20T05:56:10.794383793Z INFO  agave_snapshots::archive] Generating snapshot archive for slot 383001061, kind: Full

So...

The PR does something. But unfortunately that something is making snapshot writing take about 1.6x as long (bank serialize: 3.1s vs. 1.9s on master). Will look into this.

@dachen0
Author

dachen0 commented Jan 20, 2026

It's ~15% faster on sequential writes. Does the compressor for snapshots seek a lot? My current implementation always flushes on seek.
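
One way to answer the seek question empirically is to wrap the writer in a counter and run the consumer against it. The sketch below is only an illustration: CallCounter and write_with_backpatch are hypothetical names, and the consumer is a toy that back-patches a length header, not the actual snapshot code.

```rust
use std::io::{self, Cursor, Seek, SeekFrom, Write};

/// Hypothetical diagnostic wrapper: counts write and seek calls made by a consumer.
pub struct CallCounter<W> {
    inner: W,
    pub writes: u64,
    pub seeks: u64,
}

impl<W> CallCounter<W> {
    pub fn new(inner: W) -> Self {
        Self { inner, writes: 0, seeks: 0 }
    }
}

impl<W: Write> Write for CallCounter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        self.inner.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

impl<W: Seek> Seek for CallCounter<W> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        self.seeks += 1;
        self.inner.seek(pos)
    }
}

/// Toy consumer: writes a placeholder header and the body, then seeks back
/// to patch the header and returns to the end of the stream.
pub fn write_with_backpatch<W: Write + Seek>(out: &mut W, body: &[u8]) -> io::Result<()> {
    out.write_all(&[0u8; 8])?;
    out.write_all(body)?;
    out.seek(SeekFrom::Start(0))?;
    out.write_all(&(body.len() as u64).to_le_bytes())?;
    out.seek(SeekFrom::End(0))?;
    Ok(())
}

/// Runs the toy consumer against an in-memory cursor and reports (writes, seeks).
pub fn count_backpatch_calls(body: &[u8]) -> io::Result<(u64, u64)> {
    let mut counter = CallCounter::new(Cursor::new(Vec::new()));
    write_with_backpatch(&mut counter, body)?;
    Ok((counter.writes, counter.seeks))
}
```

Note that the default `stream_position` is implemented as a seek, so it would show up in the seek count too; with a writer that flushes on every seek, even position queries could be costly.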

bench for sequential writes

    #[test]
    #[ignore]
    fn bench_io_uring_vs_bufwriter() {
        use std::{fs::File, io::BufWriter, time::Instant};

        const FILE_SIZE: usize = 5000 * 1024 * 1024; // 5000 MB
        const CHUNK_SIZE: usize = 4096;
        const DEFAULT_BUFFER_WRITE_SIZE: usize = 2 * 1024 * 1024;

        let data: Vec<u8> = (0..CHUNK_SIZE).map(|i| i as u8).collect();

        // io_uring writer
        let path = "/home/ra/test-garbage-data.dat";
        File::create(path).unwrap();
        let buf = PageAlignedMemory::new(DEFAULT_BUFFER_WRITE_SIZE).unwrap();
        let mut writer = IoUringFileWriterBuilder::new()
            .build_with_buffer(path, buf)
            .unwrap();
        let start = Instant::now();
        for _ in 0..FILE_SIZE / CHUNK_SIZE {
            writer.write_all(&data).unwrap();
        }
        writer.flush().unwrap();
        let io_uring_time = start.elapsed();
        std::fs::remove_file(path).unwrap();

        // BufWriter
        let path = "/home/ra/test-garbage-data.dat";
        let file = File::create(path).unwrap();
        let mut writer = BufWriter::with_capacity(DEFAULT_BUFFER_WRITE_SIZE, file);
        let start = Instant::now();
        for _ in 0..FILE_SIZE / CHUNK_SIZE {
            writer.write_all(&data).unwrap();
        }
        writer.flush().unwrap();
        let bufwriter_time = start.elapsed();
        std::fs::remove_file(path).unwrap();

        println!("io_uring: {:?}", io_uring_time);
        println!("BufWriter: {:?}", bufwriter_time);
        println!(
            "io_uring is {:.2}x faster",
            bufwriter_time.as_secs_f64() / io_uring_time.as_secs_f64()
        );
    }

@dachen0
Author

dachen0 commented Jan 20, 2026

Making the chunk size smaller makes IoUringFileWriter much slower.
But if we wrap IoUringFileWriter in a BufWriter, it becomes much faster than a BufWriter with a large buffer on its own. And on large writes it's still ~15% faster. This is absolutely stupid but it works lmao.

@dachen0
Author

dachen0 commented Jan 20, 2026

BufWriter + IoUringFileWriter is 60% faster than a plain BufWriter at CHUNK_SIZE=100, and the advantage shrinks to ~15% when the chunk size is in the kilobyte range.
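
A plausible explanation for the small-chunk result is that BufWriter coalesces many tiny writes into a few large ones, so the inner writer's per-call overhead is paid far less often. The sketch below demonstrates only that coalescing effect, using a hypothetical CountingSink in place of the io_uring writer; it says nothing about actual io_uring timings.

```rust
use std::io::{self, BufWriter, Write};

/// Stand-in for a sink where every `write` call carries a fixed overhead.
#[derive(Default)]
pub struct CountingSink {
    pub write_calls: usize,
    pub bytes: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.write_calls += 1;
        self.bytes += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `chunks` chunks of `chunk_size` bytes through a BufWriter with the
/// given capacity and returns the sink so its counters can be inspected.
pub fn write_through_bufwriter(
    chunk_size: usize,
    chunks: usize,
    capacity: usize,
) -> io::Result<CountingSink> {
    let chunk = vec![0u8; chunk_size];
    let mut writer = BufWriter::with_capacity(capacity, CountingSink::default());
    for _ in 0..chunks {
        writer.write_all(&chunk)?;
    }
    writer.flush()?;
    // into_inner() hands back the sink; convert its error type to io::Error.
    writer.into_inner().map_err(|e| e.into_error())
}
```

For example, `write_through_bufwriter(100, 1000, 8192)` issues 1000 `write_all` calls on the BufWriter, but the sink sees only a small number of large writes.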

@dachen0 dachen0 force-pushed the io_uring_file_writer branch from 3900e80 to df92c43 Compare January 20, 2026 20:56
@dachen0
Author

dachen0 commented Jan 21, 2026

Added direct I/O and benched using the following. Direct I/O is slightly faster for the test data. I also changed the default write size to 1024 * 1024 and got ~10% more performance out of that. Maybe the file creator's default write size should also be looked at.

    #[test]
    #[ignore]
    fn bench_io_uring_vs_bufwriter() {
        use std::{fs::File, io::BufWriter, time::Instant};

        const FILE_SIZE: usize = 5000 * 1024 * 1024; // 5000 MB
        const CHUNK_SIZE: usize = 800;
        const DEFAULT_BUFFER_WRITE_SIZE: usize = 2 * 1024 * 1024;

        let data: Vec<u8> = (0..CHUNK_SIZE).map(|i| i as u8).collect();

        // io_uring writer
        let path = "/home/ra/test-garbage-data.dat";
        File::create(path).unwrap();
        let buf = PageAlignedMemory::new(DEFAULT_BUFFER_WRITE_SIZE).unwrap();
        let mut writer = BufWriter::new(
            IoUringFileWriterBuilder::new()
                .build_with_buffer(path, buf)
                .unwrap(),
        );
        let start = Instant::now();
        for _ in 0..FILE_SIZE / CHUNK_SIZE {
            writer.write_all(&data).unwrap();
        }
        writer.flush().unwrap();
        let io_uring_time = start.elapsed();
        std::fs::remove_file(path).unwrap();


        // io_uring writer with direct io
        let path = "/home/ra/test-garbage-data.dat";
        File::create(path).unwrap();
        let buf = PageAlignedMemory::new(DEFAULT_BUFFER_WRITE_SIZE).unwrap();
        let mut writer = BufWriter::new(
            IoUringFileWriterBuilder::new()
                .use_direct_io(true)
                .build_with_buffer(path, buf)
                .unwrap(),
        );
        let start = Instant::now();
        for _ in 0..FILE_SIZE / CHUNK_SIZE {
            writer.write_all(&data).unwrap();
        }
        writer.flush().unwrap();
        let io_uring_direct_io_time = start.elapsed();
        std::fs::remove_file(path).unwrap();

        // BufWriter
        let path = "/home/ra/test-garbage-data.dat";
        let file = File::create(path).unwrap();
        let mut writer = BufWriter::with_capacity(DEFAULT_BUFFER_WRITE_SIZE, file);
        let start = Instant::now();
        for _ in 0..FILE_SIZE / CHUNK_SIZE {
            writer.write_all(&data).unwrap();
        }
        writer.flush().unwrap();
        let bufwriter_time = start.elapsed();
        std::fs::remove_file(path).unwrap();

        println!("io_uring: {:?}", io_uring_time);
        println!("io_uring direct io: {:?}", io_uring_direct_io_time);
        println!("BufWriter: {:?}", bufwriter_time);
    }

@dachen0
Author

dachen0 commented Jan 21, 2026

bench test output:

files writer stats - num_direct_io_writes: 0 num_regular_io_writes: 5000 avg_num_buffers_free_during_write: 0.0028
files writer stats - num_direct_io_writes: 5000 num_regular_io_writes: 0 avg_num_buffers_free_during_write: 0.0042
io_uring: 1.133726434s
io_uring direct io: 1.051639206s
BufWriter: 1.506094476s

@kskalski I think this pr is ready for review.

@dachen0 dachen0 marked this pull request as ready for review January 21, 2026 18:04
Copilot AI review requested due to automatic review settings January 21, 2026 18:04

Copilot AI left a comment

Pull request overview

This PR adds IoUringFileWriter, a new io_uring-based file writer for improved write performance, particularly for accounts storage operations. The writer supports asynchronous writes with configurable buffer sizes and includes optional direct I/O support.

Changes:

  • Added new IoUringFileWriter and IoUringFileWriterBuilder with comprehensive test coverage
  • Extended IoBufferChunk with AsMut<[u8]> trait implementation for buffer manipulation
  • Integrated io_uring writer into large_file_buf_writer() replacing the previous BufWriter<File> implementation

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 11 comments.

File | Description
fs/src/io_uring/file_writer.rs | New module implementing IoUringFileWriter with builder pattern, write/seek support, and comprehensive tests
fs/src/io_uring/memory.rs | Added AsMut<[u8]> implementation for IoBufferChunk to support mutable buffer access
fs/src/io_uring/file_creator.rs | Changed CHECK_PROGRESS_AFTER_SUBMIT_TIMEOUT visibility to pub(crate) for reuse in file_writer
fs/src/io_uring/mod.rs | Added file_writer module export
fs/src/buffered_writer.rs | Replaced direct file writing with IoUringFileWriter wrapped in BufWriter for improved performance

Comment thread fs/src/io_uring/file_writer.rs Outdated
Comment thread fs/src/io_uring/file_writer.rs Outdated
Comment thread fs/src/io_uring/file_writer.rs
Comment thread fs/src/io_uring/file_writer.rs Outdated
Comment thread fs/src/io_uring/file_writer.rs Outdated
Comment on lines +82 to +83
/// Enabling requires the filesystem to support directio and subbuffers to be a multiple
/// of the fs block size
Copilot AI Jan 21, 2026

The documentation refers to "subbuffers" which is unclear. This should clarify that it refers to the write_size chunks, e.g., "Enabling requires the filesystem to support directio and the write_size to be a multiple of the fs block size".

Suggested change
- /// Enabling requires the filesystem to support directio and subbuffers to be a multiple
- /// of the fs block size
+ /// Enabling requires the filesystem to support directio and the `write_size` to be a
+ /// multiple of the filesystem block size.
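
The size constraint in that doc comment can be expressed as a small check. This is a sketch only: the helper names are hypothetical, and the 4096-byte block size is an assumption (real code would query the filesystem rather than hard-code it).

```rust
/// Assumed filesystem block size for this sketch; not taken from the PR.
pub const ASSUMED_FS_BLOCK_SIZE: usize = 4096;

/// Returns true if `write_size` is a non-zero multiple of `block_size`.
pub fn is_valid_direct_io_write_size(write_size: usize, block_size: usize) -> bool {
    block_size != 0 && write_size != 0 && write_size % block_size == 0
}

/// Rounds `size` up to the next multiple of `block_size`.
/// Returns None if `block_size` is zero or the result would overflow.
pub fn round_up_to_block(size: usize, block_size: usize) -> Option<usize> {
    if block_size == 0 {
        return None;
    }
    let remainder = size % block_size;
    if remainder == 0 {
        Some(size)
    } else {
        size.checked_add(block_size - remainder)
    }
}
```

Under the assumed block size, a 1024 * 1024 write size passes the check, while an 800-byte one does not and would round up to 4096.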

Comment thread fs/src/io_uring/file_writer.rs Outdated
Comment thread fs/src/io_uring/file_writer.rs
Comment thread fs/src/io_uring/file_writer.rs Outdated
Comment thread fs/src/io_uring/file_writer.rs Outdated
@dachen0 dachen0 force-pushed the io_uring_file_writer branch from 9ff1615 to d3b0fb6 Compare January 21, 2026 18:17

@kskalski kskalski left a comment

I was wondering if this could be implemented as a wrapper on IoUringFileCreator - the creator's internal representation should allow the required operations, e.g.:

  • initialize the file state on construction (creator's open)
  • obtain the buffer (wait_free_buf) and store it locally
  • keep filling the current buffer as writes come
  • once the buffer is full, run the part of write_and_close that schedules a new op, and replace the buffer
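
The steps above can be sketched roughly as follows. This is not the IoUringFileCreator API: ChunkSink, RecordingSink and ChunkedWriter are hypothetical names, submission is synchronous, and a fresh Vec stands in for waiting on a free buffer.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for the creator's submission path: receives a
/// filled buffer and the file offset it belongs at.
pub trait ChunkSink {
    fn submit(&mut self, offset: u64, data: Vec<u8>) -> io::Result<()>;
}

/// Records submitted chunks in memory instead of scheduling real writes.
#[derive(Default)]
pub struct RecordingSink {
    pub chunks: Vec<(u64, Vec<u8>)>,
}

impl ChunkSink for RecordingSink {
    fn submit(&mut self, offset: u64, data: Vec<u8>) -> io::Result<()> {
        self.chunks.push((offset, data));
        Ok(())
    }
}

pub struct ChunkedWriter<S: ChunkSink> {
    pub sink: S,
    buf: Vec<u8>,    // buffer currently being filled
    buf_size: usize, // size at which the buffer is submitted
    offset: u64,     // file offset at which `buf` starts
}

impl<S: ChunkSink> ChunkedWriter<S> {
    pub fn new(sink: S, buf_size: usize) -> Self {
        assert!(buf_size > 0, "buffer size must be non-zero");
        Self { sink, buf: Vec::with_capacity(buf_size), buf_size, offset: 0 }
    }

    /// Submits the current buffer (if non-empty) and replaces it with a new one.
    fn submit_current(&mut self) -> io::Result<()> {
        if self.buf.is_empty() {
            return Ok(());
        }
        let filled = std::mem::replace(&mut self.buf, Vec::with_capacity(self.buf_size));
        let len = filled.len() as u64;
        self.sink.submit(self.offset, filled)?;
        self.offset += len;
        Ok(())
    }
}

impl<S: ChunkSink> Write for ChunkedWriter<S> {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        // Copy as much as fits; the buffer is never left full between calls.
        let n = data.len().min(self.buf_size - self.buf.len());
        self.buf.extend_from_slice(&data[..n]);
        if self.buf.len() == self.buf_size {
            self.submit_current()?;
        }
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.submit_current()
    }
}
```

With a 4-byte buffer, writing 10 bytes and flushing yields chunks at offsets 0, 4 and 8, the last one partial.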

@dachen0
Author

dachen0 commented Jan 22, 2026

I think it's better to do it this way -- either completely rewrite file creator to support regular writes or just make a standalone. This feels like the type of project where it's a trap to merge both of them even though they're similar.

totally vibes based take though

@dachen0 dachen0 closed this Jan 22, 2026
@dachen0 dachen0 reopened this Jan 22, 2026
@kskalski

File creator has everything that is needed for implementing io::Write, we just need an API to populate buffers from provided slice without finalizing the file and track offset as part of state.
io::Seek is probably not much harder.

I created a refactor PR (#10157) that should make this all pretty straightforward. I expect the writer to basically contain { creator: IoUringFileCreator, offset: FileSize }.
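
For the io::Seek part, most of the work with an offset-tracking writer is resolving a SeekFrom into an absolute offset. A minimal sketch of that arithmetic follows; the resolve_seek name is hypothetical, and FileSize is assumed here to be an alias for u64. A real writer would presumably also have to flush or submit pending buffers before moving the offset.

```rust
use std::io::{self, SeekFrom};

/// Assumed alias for this sketch; the actual FileSize type may differ.
pub type FileSize = u64;

/// Resolves `pos` into an absolute offset, given the writer's current offset
/// and the current file length. Rejects negative or overflowing targets.
pub fn resolve_seek(current: FileSize, file_len: FileSize, pos: SeekFrom) -> io::Result<FileSize> {
    let target = match pos {
        SeekFrom::Start(offset) => Some(offset),
        SeekFrom::Current(delta) => current.checked_add_signed(delta),
        SeekFrom::End(delta) => file_len.checked_add_signed(delta),
    };
    target.ok_or_else(|| {
        io::Error::new(
            io::ErrorKind::InvalidInput,
            "seek to a negative or overflowing position",
        )
    })
}
```

Seeking past the end is allowed by this function, matching how std::fs::File behaves; whether the writer should permit it is a separate design decision.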

@dachen0
Author

dachen0 commented Jan 23, 2026

The refactor does make it quite easy. I'll do a rewrite using FileCreator's internals once that gets merged.


3 participants