I/O: Read::bytes() is very slow, even on a BufReader #69
Comments
I found there's a bug in your third program, which causes it to give different answers to the first two programs for any file with more than 64 KiB chars. The problem is that on the final iteration Here's a version that uses

use std::io::Read;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let test_file_path = std::env::args_os().nth(1).expect("No input file specified!");
    let mut test_file = std::fs::File::open(test_file_path)?;

    // Instead of reading the whole file in memory, read it in small-ish chunks that fit into CPU
    // cache. This also dramatically reduces memory usage because we never keep more than 64 KiB in
    // memory.
    let mut chunk = vec![0; 65536];
    let mut sum: u64 = 0;
    loop {
        let bytes_read = test_file.read(&mut chunk)?;
        if bytes_read == 0 {
            break;
        }
        // Only the first `bytes_read` bytes of the buffer are valid on this iteration.
        sum += chunk.iter().take(bytes_read).map(|i| *i as u64).sum::<u64>();
    }
    println!("{sum}");
    Ok(())
}
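The exact bug is not spelled out above, so the following is only a guess at the kind of mistake involved: a chunked loop that sums the whole buffer instead of just the bytes returned by the last read() will count stale bytes left over from the previous iteration. The function names sum_chunked and sum_chunked_stale and the chunk_size parameter are made up for this illustration.

```rust
use std::io::{self, Read};

/// Correct: only the first `bytes_read` bytes of the buffer are summed.
fn sum_chunked<R: Read>(mut reader: R, chunk_size: usize) -> io::Result<u64> {
    let mut chunk = vec![0u8; chunk_size];
    let mut sum = 0u64;
    loop {
        let bytes_read = reader.read(&mut chunk)?;
        if bytes_read == 0 {
            return Ok(sum);
        }
        sum += chunk[..bytes_read].iter().map(|&b| u64::from(b)).sum::<u64>();
    }
}

/// Deliberately wrong (hypothetical): sums the entire buffer every time, so a
/// short final read also counts bytes left over from the previous read.
fn sum_chunked_stale<R: Read>(mut reader: R, chunk_size: usize) -> io::Result<u64> {
    let mut chunk = vec![0u8; chunk_size];
    let mut sum = 0u64;
    loop {
        let bytes_read = reader.read(&mut chunk)?;
        if bytes_read == 0 {
            return Ok(sum);
        }
        sum += chunk.iter().map(|&b| u64::from(b)).sum::<u64>();
    }
}
```

With an input that is not a multiple of the chunk size, the two functions disagree, which matches the symptom of results diverging only once a file exceeds a single chunk.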
Good catch! Thanks! There may be some bugs like these lurking still, I didn't do any in-depth testing yet. The fully manual line reader is definitely buggy, but probably not in ways that matter for performance. I've adopted your version of the manual read loop, only changing
I filed rust-lang/rust#116651 which speeds up
Oh, that's amazing! I didn't expect my observations to lead to actual speedups! I cannot wrap my head around why that transformation makes the code faster. How does reading into a struct member instead of a local variable speed things up?
As per the commit log, there's no need to initialize it to zero on every iteration. But that was only a small improvement, inlining was most of the initial improvement. And I just added another improvement using
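To make the two ideas in this exchange concrete, here is a rough sketch of a byte iterator that keeps its one-byte buffer as a struct field (zeroed once at construction rather than on every call) and marks next() as #[inline]. ByteIter is a hypothetical type written for illustration; it is not the standard library's implementation and is not taken from the linked pull requests.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical stand-in for a byte iterator over any reader.
struct ByteIter<R> {
    inner: R,
    // Lives in the struct, so it is initialized once in `new`
    // instead of being zeroed on every call to `next`.
    byte: u8,
}

impl<R: Read> ByteIter<R> {
    fn new(inner: R) -> Self {
        ByteIter { inner, byte: 0 }
    }
}

impl<R: Read> Iterator for ByteIter<R> {
    type Item = io::Result<u8>;

    // The inline hint lets the compiler fold this call into the caller's loop.
    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // View the single field as a one-element slice and read into it.
            return match self.inner.read(std::slice::from_mut(&mut self.byte)) {
                Ok(0) => None,
                Ok(_) => Some(Ok(self.byte)),
                // Retry if the read was interrupted by a signal.
                Err(e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(e) => Some(Err(e)),
            };
        }
    }
}
```

Wrapping a BufReader in ByteIter::new(...) and iterating it yields the same bytes as the underlying reader; whether it is actually faster would need to be measured.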
Now that rust-lang/rust#116775 and rust-lang/rust#116785 have landed,
The performance of the Read::bytes() iterator is atrocious, even over a BufReader.
Here's a sample program calculating a sum of all bytes in the file with various approaches:
Read::read()
In this test Read::bytes() is 10x slower than reading the whole file into memory up front and 20x slower than manually reading into a smallish buffer over and over. The performance gap is so large that even reading line-by-line with UTF-8 validation is still multiple times faster!
Benchmarked using hyperfine on this public domain book in plain text format repeated 50 times.
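The sample programs themselves are not reproduced in this text, so the following is only a sketch of what three of the compared approaches might look like: Read::bytes() over a BufReader, reading the whole input up front, and reading line by line with UTF-8 validation. The function names are made up, and each function takes any reader rather than opening a file, so these are not the benchmarked programs.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Sum every byte using the `Read::bytes()` iterator over a `BufReader`.
fn sum_with_bytes<R: Read>(reader: R) -> io::Result<u64> {
    let mut sum = 0u64;
    for byte in BufReader::new(reader).bytes() {
        sum += u64::from(byte?);
    }
    Ok(sum)
}

/// Read the whole input into memory up front, then sum it.
fn sum_read_to_end<R: Read>(mut reader: R) -> io::Result<u64> {
    let mut data = Vec::new();
    reader.read_to_end(&mut data)?;
    Ok(data.iter().map(|&b| u64::from(b)).sum())
}

/// Read line by line into a reused `String`. `read_line` validates UTF-8 and
/// keeps the trailing newline, so no bytes are dropped from the sum.
fn sum_by_lines<R: Read>(reader: R) -> io::Result<u64> {
    let mut reader = BufReader::new(reader);
    let mut line = String::new();
    let mut sum = 0u64;
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break;
        }
        sum += line.bytes().map(u64::from).sum::<u64>();
    }
    Ok(sum)
}
```

Passing std::fs::File::open(path)? to each function and timing the runs with hyperfine would approximate the comparison described above. Note that sum_by_lines returns an error on input that is not valid UTF-8.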