Iterator::fold is a little slow compared to bare loop #76725

mlodato517 · 2020-09-15T00:59:39Z

Background

I was writing a small project to compare the memory impact of iterators. I decided to also investigate the runtime impact and see how close Rust came to providing various zero cost abstractions to iteration. I compared three types of iteration:

fn filter_map_filter(nums: &[u64]) -> Vec<u64> {
  nums.iter().filter(...).map(...).filter(...).collect()
}

fn fold(nums: &[u64]) -> Vec<u64> {
  nums.iter().fold(Vec::new(), ...)
}

fn raw(nums: &[u64]) -> Vec<u64> {
  let mut result = Vec::new();
  for n in nums { ... }
  result
}

and saw this Criterion output plot:

The outliers to the right are the timings of the fold functions while the two pairs on the left are the filter_map_filter and raw versions. Rust did a great job ensuring .filter.map.filter was the same speed as a raw loop but .fold seemed to be lacking.

Quick Investigation

Looking at the source for .fold the accum is reassigned with the result of each invocation of f. I quickly tested if this could be improved with a &mut instead in this PR. The result was surprising (to me):

The "custom fold" method was faster than all the other options (which doesn't make a ton of sense to me but that's what y'all are here for!).

Path Forward

I initially was going to suggest adding some sort of fold_mut or some better named method to allow for this faster fold iterator. This could be a performance improvement in some areas and could also improve the syntax when the closure couldn't "easily" return the new accumulator:

iter.fold(Vec::new(), |v, x| {
  v.push(x);
  v // this line is a little weird
})

I made a branch for this if we want to head in that direction (the tests are slim, the benchmarks are probably overkill, the stability is missing, and the docs are probably slim and improperly formatted but it's a start!) and saw some improvements in the benchmarks I added:

Now I'm not sure if this "fold_mut" path is the right way to go - I'm not sure if it's awkward or dangerous. It seems similar to Ruby's each_with_object so there's maybe something there. It could also be that with some compiler witchcraft we can just make fold a "true" zero cost abstraction.

In any case, thought I'd post here instead of making a PR so we could decided if there should be any PR and I'm happy to help with whatever path forward we choose!

The text was updated successfully, but these errors were encountered:

jyn514 · 2020-09-15T04:37:53Z

For adding new APIs like this, I would recommend just making the PR directly, especially since you already wrote the code. Please change #[stable] to #[unstable], though.

workingjubilee · 2020-09-15T15:26:41Z

I thought update-in-place semantics were supposed to be equivalent to mut in simple cases like this? This PR should definitely be made but simultaneously shouldn't have to be, since fold isn't the only place where update-in-place reasoning is used extensively.

mlodato517 · 2020-09-15T15:29:24Z

I hoped for something like that! I wanted these to be "the same" and that's why I wasn't sure if we wanted to add fold_mut or just figure out why fold wasn't optimizing the way we wanted it to. fold_mut may have some merit as a stop-gap and also it has (occasionally) nicer semantics, but it'd be (probably) better if fold could do the same work as fold_mut is doing.

I hope it's not bad microbenchmarks either - no idea if they're being optimized differently just for that reason or anything. No idea how that works

workingjubilee · 2020-09-15T18:19:43Z

I don't think it's an issue with the microbenchmarks.

And PRs are not always accepted! And can feature discussion on their appropriateness. But it's often better to have that in one place, if plausible.

jyn514 added A-iterators Area: Iterators I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Sep 15, 2020

jyn514 added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Sep 15, 2020

mlodato517 mentioned this issue Sep 15, 2020

Add fold_mut alternative to Iterator trait #76746

Closed

scottmcm mentioned this issue Sep 16, 2020

Lint for fold closure that never moves the accumulator rust-lang/rust-clippy#6053

Open

mlodato517 closed this as completed Sep 18, 2020

mlodato517 mentioned this issue Sep 21, 2020

Add fold_mut alternative to fold rust-itertools/itertools#481

Closed

scottmcm mentioned this issue Apr 10, 2021

Add find_or_last method to Itertools trait rust-itertools/itertools#535

Merged

aetelani mentioned this issue Oct 31, 2023

Rust stable: Changed fold to foreach niklas-heer/speed-comparison#122

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Iterator::fold is a little slow compared to bare loop #76725

Iterator::fold is a little slow compared to bare loop #76725

mlodato517 commented Sep 15, 2020

jyn514 commented Sep 15, 2020

workingjubilee commented Sep 15, 2020

mlodato517 commented Sep 15, 2020

workingjubilee commented Sep 15, 2020

Iterator::fold is a little slow compared to bare loop #76725

Iterator::fold is a little slow compared to bare loop #76725

Comments

mlodato517 commented Sep 15, 2020

Background

Quick Investigation

Path Forward

jyn514 commented Sep 15, 2020

workingjubilee commented Sep 15, 2020

mlodato517 commented Sep 15, 2020

workingjubilee commented Sep 15, 2020