Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterator::fold is a little slow compared to bare loop #76725

Closed
mlodato517 opened this issue Sep 15, 2020 · 4 comments
Closed

Iterator::fold is a little slow compared to bare loop #76725

mlodato517 opened this issue Sep 15, 2020 · 4 comments
Labels
A-iterators Area: Iterators I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@mlodato517
Copy link
Contributor

Background

I was writing a small project to compare the memory impact of iterators. I decided to also investigate the runtime impact and see how close Rust came to providing various zero cost abstractions to iteration. I compared three types of iteration:

fn filter_map_filter(nums: &[u64]) -> Vec<u64> {
  nums.iter().filter(...).map(...).filter(...).collect()
}

fn fold(nums: &[u64]) -> Vec<u64> {
  nums.iter().fold(Vec::new(), ...)
}

fn raw(nums: &[u64]) -> Vec<u64> {
  let mut result = Vec::new();
  for n in nums { ... }
  result
}

and saw this Criterion output plot:
criterion-plot

The outliers to the right are the timings of the fold functions while the two pairs on the left are the filter_map_filter and raw versions. Rust did a great job ensuring .filter.map.filter was the same speed as a raw loop but .fold seemed to be lacking.

Quick Investigation

Looking at the source for .fold the accum is reassigned with the result of each invocation of f. I quickly tested if this could be improved with a &mut instead in this PR. The result was surprising (to me):

criterion-plot-mut-ref

The "custom fold" method was faster than all the other options (which doesn't make a ton of sense to me but that's what y'all are here for!).

Path Forward

I initially was going to suggest adding some sort of fold_mut or some better named method to allow for this faster fold iterator. This could be a performance improvement in some areas and could also improve the syntax when the closure couldn't "easily" return the new accumulator:

iter.fold(Vec::new(), |v, x| {
  v.push(x);
  v // this line is a little weird
})

I made a branch for this if we want to head in that direction (the tests are slim, the benchmarks are probably overkill, the stability is missing, and the docs are probably slim and improperly formatted but it's a start!) and saw some improvements in the benchmarks I added:

benchmarks

Now I'm not sure if this "fold_mut" path is the right way to go - I'm not sure if it's awkward or dangerous. It seems similar to Ruby's each_with_object so there's maybe something there. It could also be that with some compiler witchcraft we can just make fold a "true" zero cost abstraction.

In any case, thought I'd post here instead of making a PR so we could decided if there should be any PR and I'm happy to help with whatever path forward we choose!

@jyn514 jyn514 added A-iterators Area: Iterators I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Sep 15, 2020
@jyn514
Copy link
Member

jyn514 commented Sep 15, 2020

For adding new APIs like this, I would recommend just making the PR directly, especially since you already wrote the code. Please change #[stable] to #[unstable], though.

@jyn514 jyn514 added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Sep 15, 2020
@workingjubilee
Copy link
Member

I thought update-in-place semantics were supposed to be equivalent to mut in simple cases like this? This PR should definitely be made but simultaneously shouldn't have to be, since fold isn't the only place where update-in-place reasoning is used extensively.

@mlodato517
Copy link
Contributor Author

I hoped for something like that! I wanted these to be "the same" and that's why I wasn't sure if we wanted to add fold_mut or just figure out why fold wasn't optimizing the way we wanted it to. fold_mut may have some merit as a stop-gap and also it has (occasionally) nicer semantics, but it'd be (probably) better if fold could do the same work as fold_mut is doing.

I hope it's not bad microbenchmarks either - no idea if they're being optimized differently just for that reason or anything. No idea how that works

@workingjubilee
Copy link
Member

I don't think it's an issue with the microbenchmarks.

And PRs are not always accepted! And can feature discussion on their appropriateness. But it's often better to have that in one place, if plausible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-iterators Area: Iterators I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants