Skip to content

Conversation

RobWalt
Copy link
Contributor

@RobWalt RobWalt commented Oct 20, 2023

I was just reading through the docs when I noticed something odd. It's probably just a typo:

There is a recommendation to use reduce instead of tree_fold1 when f is associative. Since tree_fold1 won't really work as expected (unless you know what you're doing) in non-associative cases, this would imply that we don't recommend the use of tree_fold1 at all?

I just took a guess and adjusted that part of the docs to what it most likely was supposed to be. Feel free to close this PR if I'm completely wrong here 🙈

@Philippe-Cholet
Copy link
Member

@scottmcm I believe you wrote Itertools::tree_fold1, can you shed some light here?

@scottmcm
Copy link
Contributor

Since tree_fold1 won't really work as expected (unless you know what you're doing) in non-associative cases

I would say the different behaviour for non-associative cases is entirely the point of tree_fold1 (and of rfold, for that matter).

If you have something truely associative, like u32::wrapping_add, then there's no point in running the materially-more-complicated tree_fold1 since .reduce(u32::wrapping_add) will do the same thing and probably do it much faster.

But even with something only slightly non-associative like f32::add, that's where the good of the non-linear folding structure comes in. Simple demo: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=b6a1c419176ec7b02fb6af3539708da0

See also the string format bench (

pub fn string_format(c: &mut Criterion) {
) for a place where the non-associativity of performance of the operation really matters.

@RobWalt
Copy link
Contributor Author

RobWalt commented Oct 21, 2023

Ok yeah, I see what you mean. Thanks for the explanation. I think I missed some things before. Let's take a closer look at all the possible cases:

  • associative:
    • f is simple: e.g. cases like u32::wrapping_add. I guess you wanted to say that this case is so simple that it's better to use the more simplistic reduce since it's probably better for optimizations, code gen etc, right? Some evidence that this is true would be cool
      • I can try to benchmark this a little to shine some light on it
    • f is complex, e.g. the format case. Here we recommend the use of tree_fold1 since it reduces the amount of required operations from n to ln2(n). I thought this is the main use case of tree_fold1 and this is where my confusions and "improvements" of the docs come from
      • As mentioned, this is imo the main use case and should be highlighted. Writing

        If f is associative, prefer the normal [Iterator::reduce] instead.

        seems misleading.

  • non-associative:
    • in general: e.g. the i32::sub case from the docs. tree_fold1 will most likely produce different results than reduce in an "unintuitive" way and this is why it shouldn't be recommended in this case
    • in special cases: e.g. your example with f32::add. If you really know what you're doing and if you can estimate the data you're dealing with then it might make sense that you use it in this case. I think your example is a bit constructed in a way since the result of the f32::add operations also depends on the order of the elements in the array and not only on the folding function used. So yeah, in a way tree_fold1 can produce a more exact result, but you need to ensure that the data is ordered so that it works out. Take this example where we shuffle the array first https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=31ea48fe708bf3778358c48a930963b5

Both of the non-associative cases should be neglacted in my opinion.

@scottmcm
Copy link
Contributor

One particular place where this can be really useful is in building binary trees. Compare the outputs in https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f6a580a37e0a01b1c94767680dd444ca -- using tree_fold1 gives an O(log n)-depth tree vs the O(n)-depth tree from reduce, and the lower depth tends to be better, since it means things like a recursive DFS will be much happier.

I don't know a great way to fit that example into a doc comment, though :/

@RobWalt
Copy link
Contributor Author

RobWalt commented Oct 23, 2023

Just did a very basic test on compiler explorer and both reduce and tree_fold1 generate the exact same machine code for the simple case of u32::add as can be seen here, which implies that there is no benefit of using one over the other.

To see this you need to scroll down all the way on the right hand side window. The generated assembly code is color coded the same way as the related rust code.

https://rust.godbolt.org/z/b7K5EenEE

@scottmcm
Copy link
Contributor

Never use + over a Range as your test, because when LLVM notices that's what you're doing it optimizes it to the closed-form solution.

I suggest comparing summing up a slice, where you'll see, for example, that the normal reduce gets vectorized but the tree_fold does not: https://rust.godbolt.org/z/6hb13ae4T

They're very very different things once you're not doing something that's not just a constant.

@RobWalt
Copy link
Contributor Author

RobWalt commented Oct 23, 2023

I guess my understanding of all of this is just lacking then. I'll just close the issue. Thanks a lot for the extra explanation, that was helpful! And sorry for the needless discussion

@RobWalt RobWalt closed this Oct 23, 2023
@scottmcm
Copy link
Contributor

No, the discussion is good! It'd be great if you could find a nice way to reflect this difference into the documentation, to help the next person who's wondering why the method exists.

@RobWalt
Copy link
Contributor Author

RobWalt commented Oct 24, 2023

Thanks for the positive feedback :) Ok, then let me know which parts I should put the highlights on! The different use cases of tree_fold1 and reduce based on the complexity of f?

@RobWalt RobWalt reopened this Oct 24, 2023
@RobWalt
Copy link
Contributor Author

RobWalt commented Oct 25, 2023

@scottmcm I adjusted the docs once again. Does this come closer to an acceptable explanation? 😅

@RobWalt
Copy link
Contributor Author

RobWalt commented Nov 1, 2023

bump to not forget this issue as it's also not that big of an issue 😅

@jswrenn jswrenn added this pull request to the merge queue Nov 13, 2023
@jswrenn jswrenn added this to the next milestone Nov 13, 2023
Merged via the queue into rust-itertools:master with commit b76172b Nov 13, 2023
@RobWalt RobWalt deleted the chore/improve-docs branch November 13, 2023 21:16
@jswrenn jswrenn mentioned this pull request Nov 14, 2023
@TomFryersMidsummer
Copy link

TomFryersMidsummer commented Jan 12, 2024

Some of this doesn't seem right to me, and the docs now seem a bit confusing, if not inaccurate. Specifically:

if f is non-trivial like format!, you should use tree_fold1 since it reduces the number of operations from O(n) to O(ln(n))

Both reduce and tree_fold1 call f n − 1 times, which is O(n).

I think what this is trying to get at is that if f is O(n), the iterator elements are of similar sizes*, and size(f(x, y)) = size(x) + size(y), then reduce(f) is O(n2) and tree_fold1 is O(n log n).

format! does satisfy these condition, but it may not be the best example. format!("{x}{y}") and format!("{x}foo{y}") are the only obvious non-trivial associative cases, and these are concat and join.

* Not necessary, but sufficient. reduce gets faster if you put the largest items at the end.

@RobWalt
Copy link
Contributor Author

RobWalt commented Jan 16, 2024

I'm sorry if I made the docs less correct. I can look into this when I find the time again. Otherwise feel free to open a PR yourself

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants