Speed up Read::bytes
#116651
Conversation
This greatly increases its speed.
We can reuse this in `next`, avoiding the need to zero-initialize a local variable every time, for a small speed win.
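As a rough illustration of that change, here is a minimal sketch (my own, mirroring the shape of std's `Bytes` rather than the exact PR diff): the scratch byte lives in the struct, so `next` reuses it instead of zero-initializing a fresh local on every call.

```rust
use std::io::{self, ErrorKind, Read};
use std::slice;

// Stand-in for std's `Bytes<R>`, carrying the reusable scratch byte.
struct Bytes<R> {
    inner: R,
    byte: u8,
}

impl<R: Read> Iterator for Bytes<R> {
    type Item = io::Result<u8>;

    fn next(&mut self) -> Option<io::Result<u8>> {
        loop {
            // Read exactly one byte into the reused field.
            return match self.inner.read(slice::from_mut(&mut self.byte)) {
                Ok(0) => None, // EOF
                Ok(..) => Some(Ok(self.byte)),
                Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(e) => Some(Err(e)),
            };
        }
    }
}
```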
r? @cuviper (rustbot has picked a reviewer for you, use r? to override)
This is much faster.
I added another commit that reduces the time further, from 215ms to 140ms.
(force-pushed from 1c12a4e to ed2a587)
Have you tried using specialization and reaching into the `BufReader` internals?
You can read into a `MaybeUninit` buffer via `Read::read_buf`.
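If that refers to the nightly `read_buf` machinery, a rough sketch might look like the following; the feature names and exact unstable signatures are my assumption and may have changed.

```rust
#![feature(read_buf, core_io_borrowed_buf)]

use std::io::{BorrowedBuf, Read};
use std::mem::MaybeUninit;

// Read into uninitialized storage without zeroing it first.
fn read_chunk<R: Read>(reader: &mut R) -> std::io::Result<Vec<u8>> {
    let mut storage = [MaybeUninit::<u8>::uninit(); 64];
    let mut buf = BorrowedBuf::from(&mut storage[..]);
    reader.read_buf(buf.unfilled())?; // fills without prior zeroing
    Ok(buf.filled().to_vec()) // only the initialized prefix is exposed
}
```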
These optimizations apply regardless of the underlying reader. Specialization could be worth trying, but that is orthogonal to this PR.
Specialization was a good idea, and I managed to get it working. But even after trying a dozen different formulations I couldn't get it any faster than the non-specialized version. Surprising! I thought a lot of the code handling variable lengths would boil away when specialized for single byte reads, making a big difference, but no. And the specialized version was more complicated. So I think this can be reviewed in its current state.
Another thing: implementing internal iteration (e.g. `Iterator::try_fold`) could help here.
Indeed. This should be faster because most of the time we only need one check per byte read (whether we should stop iterating); the check for whether the buffer needs to be refilled only happens once per buffer's worth of bytes.
Here's a sketch of such a function:

```rust
#![feature(try_trait_v2)] // for naming the `Try` trait in the return type

use std::io::BufRead;
use std::ops::{ControlFlow, Try};

// Internal iteration over a `BufRead`: process one buffered chunk at a time,
// so the "does the buffer need refilling?" check happens once per chunk
// rather than once per byte.
fn custom_try_fold<B, F, I>(mut reader: I, init: B, mut f: F) -> impl Try<Output = B>
where
    I: BufRead,
    F: FnMut(B, u8) -> ControlFlow<B, B>,
{
    let mut accum = init;
    loop {
        match reader.fill_buf() {
            Ok(chunk) => {
                // check for EOF
                if chunk.is_empty() {
                    return Ok(accum);
                }
                let mut iterator = chunk.iter().copied();
                let result = iterator.try_fold(accum, &mut f);
                // Only mark as consumed the bytes that `f` actually saw.
                let consumed = chunk.len() - iterator.len();
                reader.consume(consumed);
                match result {
                    ControlFlow::Continue(a) => accum = a,
                    ControlFlow::Break(a) => {
                        accum = a;
                        return Ok(accum);
                    }
                }
            }
            Err(e) => return Err(e),
        }
    }
}
```

It's not fully generic because I don't really understand the nightly-only `Try` trait. Here is the same test program as before rewritten using this function, which is what I used for benchmarks: https://gist.github.com/Shnatsel/c13021180a8378a8970fd89006b2dd09
But the internal iteration implementation is a lot more complex, so it is probably best to follow up on this in a separate PR. The changes proposed here are valuable, and will not be superseded by an optimized internal iteration if/when it materializes.
There is a bit of a semantic issue with implementing internal iteration by reading ahead into a buffer inside the iterator: bytes pulled from the underlying reader but never yielded are effectively lost. This means that filtering the result of `bytes()` or otherwise stopping iteration early can leave the underlying reader in a surprising position. Also, if the closure exits early partway through a chunk, the rest of that chunk has already been taken from the reader.
I have pulled out the inlining patch to #116775, because it's a simple change that gives a big win that I think is worth merging while we work through the other more complicated changes.
Another option is to implement `try_fold` on `Bytes` itself.
It seems that internal iteration isn't always used. I find that surprising. Shouldn't internal iteration be used whenever possible, since it optimizes better?
I tried to implement `try_fold` directly on `Bytes`:

```rust
impl<R: Read> Iterator for Bytes<R> {
    type Item = Result<u8>;

    fn try_fold<B, F, Res>(&mut self, init: B, mut f: F) -> Res
    where
        F: FnMut(B, Self::Item) -> Res,
        Res: Try<Output = B>,
    {
        let mut buf = [0u8; 256]; // njn: size?
        let mut acc = init;
        loop {
            match self.inner.read(&mut buf) {
                Ok(0) => return Res::from_output(acc),
                Ok(n) => {
                    // njn: need to protect against `f` panicking
                    acc = buf[..n].iter().map(|&i| Ok(i)).try_fold(acc, &mut f)?;
                }
                Err(ref e) if e.is_interrupted() => continue,
                // njn: impossible? no way to convert an io::Error into a
                // generic residual, because there's no `Try::from_error` method
                Err(e) => ...
            }
        }
    }
}
```
On Zulip, @the8472 made a further suggestion. I agree with the first paragraph of it, but I don't see how the rest would help here.
I've done some specialization in #116785.
@nnethercote I believe you are looking for `FromResidual::from_residual`. This method is defined on a supertrait of `Try`.
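A minimal nightly-only illustration of that trait relationship (the function name here is made up for the example):

```rust
#![feature(try_trait_v2)]

use std::convert::Infallible;
use std::io;
use std::ops::FromResidual;

// `FromResidual` is how a `Try` type absorbs an early-exit value, so an
// `io::Error` can be turned into any `Res` that accepts io's residual type.
fn give_up<Res>(e: io::Error) -> Res
where
    Res: FromResidual<Result<Infallible, io::Error>>,
{
    Res::from_residual(Err(e))
}

fn main() {
    let r: io::Result<u8> = give_up(io::Error::new(io::ErrorKind::Other, "boom"));
    assert!(r.is_err());
}
```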
Regarding the buffering: I don't think buffering inside an iterator is correct for an arbitrary `R: Read`, because bytes read into the iterator's own buffer but never yielded are lost to the underlying reader. Buffering is feasible for any `R: BufRead`, since `fill_buf`/`consume` keep unconsumed bytes in the reader itself.
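If I'm reading that concern correctly, here is a small stand-alone illustration (my own example, not from the thread):

```rust
use std::io::{BufRead, Cursor, Read};

fn main() {
    // Plain `Read`: pulling several bytes into a private buffer takes them
    // away from the underlying reader, even if only one was ever yielded.
    let mut reader = Cursor::new(b"abcdef".to_vec());
    let mut scratch = [0u8; 4];
    let n = reader.read(&mut scratch).unwrap();
    assert_eq!(&scratch[..n], b"abcd");
    // If an iterator had done this read internally and only yielded 'a'
    // before being dropped, 'b'..'d' would be gone for good:
    let mut rest = Vec::new();
    reader.read_to_end(&mut rest).unwrap();
    assert_eq!(&rest[..], b"ef");

    // `BufRead`: `fill_buf`/`consume` let the iterator mark only what it
    // actually yielded as consumed, so nothing is lost.
    let mut buffered = Cursor::new(b"abcdef".to_vec());
    let chunk = buffered.fill_buf().unwrap();
    assert_eq!(chunk, b"abcdef");
    buffered.consume(1); // only 'a' was yielded
    let mut rest = Vec::new();
    buffered.read_to_end(&mut rest).unwrap();
    assert_eq!(&rest[..], b"bcdef");
}
```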
r=me after adding a comment about the `byte` field (unless you're still experimenting?)
```diff
@@ -2772,24 +2772,23 @@ impl<T> SizeHint for Take<T> {
 #[derive(Debug)]
 pub struct Bytes<R> {
     inner: R,
+    byte: u8,
```
The commit explains why this is here, but that deserves a comment too.
@cuviper: sorry, this ended up being superseded by #116775 and #116785. Plus @LegionMammal978 had a good comment above about the semantics of buffering inside the iterator. I will close this. It's been an interesting ride! I've learned about specialization and various interesting corners of `Iterator` and `Try`.
nnethercote/perf-book#69 explains that `Read::bytes` in combination with a `BufReader` is very slow. This PR speeds it up quite a bit -- on a simple test program the runtime dropped from 320ms to 215ms -- but it's still a lot slower than alternatives. This is basically because `BufReader` has a certain amount of overhead for each `read` call, and so a configuration where every single byte requires a `read` is just a bad one for it.
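For reference, the slow configuration being described has essentially this shape (the file name and the newline-counting loop are made up for illustration, not the PR's actual test program):

```rust
use std::fs::File;
use std::io::{BufReader, Read};

fn main() -> std::io::Result<()> {
    // Every byte yielded by `bytes()` becomes one `read` call on the
    // `BufReader`, so its per-call overhead is paid for every single byte.
    let reader = BufReader::new(File::open("data.bin")?);
    let mut newlines = 0usize;
    for byte in reader.bytes() {
        if byte? == b'\n' {
            newlines += 1;
        }
    }
    println!("newlines: {newlines}");
    Ok(())
}
```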