Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mapAccumulate to Traverse #4209

Merged
merged 2 commits into from
May 28, 2022
Merged

Conversation

BalmungSan
Copy link
Contributor

@BalmungSan BalmungSan commented May 21, 2022

As promised in discord :)

@BalmungSan BalmungSan force-pushed the add-map-accum branch 4 times, most recently from 797337f to eb1f1e1 Compare May 21, 2022 03:10
@armanbilge armanbilge added this to the 2.8.0 milestone May 21, 2022
@BalmungSan
Copy link
Contributor Author

Okay, I think I made all the requested changes; please let me know if I forgot something or if there is something else.

@johnynek
Copy link
Contributor

BTW: also I think you can override this for list, vector, lazy list and stream that doesn't use state and is likely far faster.

@johnynek
Copy link
Contributor

One more thing: I think this is a very nice function.

@BalmungSan
Copy link
Contributor Author

While trying to override the current implementation for LazyList I realized that the current interface is broken for lazy collections, it forces the evaluation of them, which is not required at all. And the problem is for trying to return the final state... such value may not be computed after the call to mapAccumulate; actually such value may never exist at all; since the collection could be infinite.
Thus, I think the best is to just return F[B]

What do you all think @satorg @armanbilge @johnynek?

@satorg
Copy link
Contributor

satorg commented May 22, 2022

BTW: also I think you can override this for list, vector, lazy list and stream that doesn't use state and is likely far faster.

Don't forget Seq please. As well as non-empty counterparts of all from the above: NonEmptyList, NonEmptyVector, etc. :)

But it is going to be quite a big chunk of additional work. I would suggest to do it in a separate PR, if possible.

@johnynek
Copy link
Contributor

Traverse already forces a LazyList doesn't it? Otherwise we would need fs2 as much. Am I mistaken?

@satorg
Copy link
Contributor

satorg commented May 22, 2022

Traverse already forces a LazyList doesn't it?

That's correct. But in fact, traverse itself cannot be processed lazily, e.g.:

val lzy: LazyList[Int] = ???

val traversedLzy: Option[LazyList[Int]] = lzy.traverse(a => if (someCondition(a)) Some(a) else None)

So we cannot emit the outer Option as a result without forcing the LazyList.

But seems that it would not be the case for mapAccumulate if we did not have to return the accumulator value itself...
So yeah, I am kind of hesitant here again...

@satorg
Copy link
Contributor

satorg commented May 22, 2022

Just a note: seems that traverse for LazyList fully forces it even in this case:

val lzy: LazyList[Int] = ???

val traversedLzy: LazyList[LazyList[Int]] = lzy.traverse(_ #:: LazyList.empty)

Not sure if in such a case it should really be enforced...
I would expect it should remain "untraversed" at least before we get the very first item of the outer list.
But in fact, it gets traversed right inside the traverse method.

@satorg
Copy link
Contributor

satorg commented May 22, 2022

I can think about several options on how we could address it.

  1. Just do nothing (besides we have in this PR already). Perhaps that having mapAccumulate that behaves truly lazily for LazyList is not that important.
  2. Make mapAccumulate returning (Eval[S], F[B]) instead. In that case the LazyList could be optimized in the way that it is enforced on either fetching items from F[B] or attempting to get S out of Eval only. Could be quite difficult to implement, though.
  3. Create 2 functions instead of just one mapAccumulate: the first one returning just F[B] but allowing lazy traversal where applicable and the second one returning (S, F[B]) but always eager.

@BalmungSan
Copy link
Contributor Author

Create 2 functions instead of just one mapAccumulate: the first one returning just F[B] but allowing lazy traversal where applicable and the second one returning (S, F[B]) but always eager.

The more I think about this the less I like it, I think the safest is just to not leak the S in the final signature.
As we said, if someone needs it they can just make it part of the B
Another argument for not doing that is that the current mapAccumulate of fs2 doesn't return that final S (for obvious reasons), thus I think it would be best to be consistent.

@johnynek
Copy link
Contributor

But there is no Traverse instance for fs2.Stream right? There can't be for the same reason you can't have one for IO.

I'm disappointed that the more theoretic argument about what the signature should be (the one that composes and does lifting of the F[_]) takes a back seat to the ideas that:

  1. Even though for Traverse forces LazyList and Stream, we want to change a type signature so those two instances could not force for this one method (even though all other traverse derived functions will force).

  2. We want to be consistent with another method in fs2 despite that type not being Traverse.

@johnynek
Copy link
Contributor

johnynek commented May 22, 2022

Regarding composition I mentioned in my last comment. Recall the functions here: #4209 (comment)

So if we have:

val mapList: [X, Y] => (X => Y) => (List[X] => List[Y]) = ...
val mapOption: [X, Y] => (X => Y) => (Option[X] => Option[Y]) = ...

val mapListOption: [X, Y] => (X => Y) => (List[Option[X]] => List[Option[Y]]) =
  [X, Y] => (fn: X => Y) => mapList(mapOption(fn)) 

so this is what we mean when we say Functor composes.

If you look at the shapes of the functions in #4209 (comment) , you can see which ones are kind of "shape preserving" meaning the results are the same shape as the original with an F[_] added or removed. We can see map, foldLeft, traverse all have that property, which is due to the fact that those type classes all compose: if you have Functor[F] and Functor[G] you can make Functor[[X] =>> F[G[X]]] and so on with Foldable and Traverse.

Now, look at the case we have here:

val mapAccList: [X, S, Y] => ((X, S) => (Y, S)) => ((List[X], S) => (List[Y], S) ...
val mapAccOption: [X, S, Y] => ((X, S) => (Y, S)) => ((Option[X], S) => (Option[Y], S)) = ...

val mapAccListOption: [X, S, Y] => ((X, S) => (Y, S)) => ((List[Option[X]], S) => (List[Option[Y], S)) =
  [X, Y] => (fn: (X, S) => (Y, S)) => mapAccList(mapAccOption(fn))

So, in this way, if we have a mapAccumulate optimized for Vector and one optimized for List we can keep our optimizations for [X] =>> Vector[List[X]] since the function composes.

We can't keep our optimizations under compositions if we discard the final S.

I can see the motivations to discard the S in many cases, but I think if we really think we want that we could add a second function that just drops the S off.

But to me, the theoretical reasons should be the highest priority here since they tend to guide us to universal principles, not just the aesthetics of one person or another.

@BalmungSan
Copy link
Contributor Author

@johnynek @satorg @armanbilge

Okay, thank you all for the feedback, I think the definition of the method is ready to go.
I also override the implementation of mapAccumulate, mapWithIndex, and zipWithIndex for List, Vector, Seq, Chain and their non-empty counterparts; I made all that in a second commit thus if you prefer I can move that to another PR.

Let me know what you all think and if there is any other detail to discuss :)

@johnynek
Copy link
Contributor

can we also override mapAccumulate here:

private[cats] trait ComposedTraverse[F[_], G[_]]

and:

override def traverse[H[_]: Applicative, A, B](fga: Nested[F, G, A])(f: A => H[B]): H[Nested[F, G, B]] =

@BalmungSan BalmungSan force-pushed the add-map-accum branch 2 times, most recently from 9eccf9d to 7707e2b Compare May 23, 2022 18:49
Copy link
Contributor

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Two minor issues.

core/src/main/scala/cats/Composed.scala Outdated Show resolved Hide resolved
core/src/main/scala/cats/Composed.scala Outdated Show resolved Hide resolved
core/src/main/scala/cats/data/Nested.scala Outdated Show resolved Hide resolved
core/src/main/scala/cats/data/NonEmptySeq.scala Outdated Show resolved Hide resolved
core/src/main/scala/cats/instances/seq.scala Outdated Show resolved Hide resolved
@satorg
Copy link
Contributor

satorg commented May 23, 2022

Not saying we should do it here, but I wonder if there should be something like

trait StrictTraverse[F[_]] extends Traverse[F] {
  def mapAccumulate... // put strict-only implementation directly to here
}

Then let strict collections only implement this typeclass.

This would allow us to represent the idea of "strictness" it a typefull way (not just by hiding it somewhere inside Cats).

@johnynek
Copy link
Contributor

@satorg but traverse itself is strict isn't it? How can we lazily implement traverse without any constraints on G[_]: Applicative?

johnynek
johnynek previously approved these changes May 23, 2022
Copy link
Contributor

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is pretty great.

Very nice generalization of mapWithIndex.

johnynek
johnynek previously approved these changes May 23, 2022
@satorg
Copy link
Contributor

satorg commented May 23, 2022

but traverse itself is strict isn't it? How can we lazily implement traverse without any constraints on G[_]: Applicative?

Actually, this is not what I meant – I didn't think it through enough. Yes, you're right – it is strict itself.
(Although such function as mapWithIndex still does not have to be strict, does it?)

What I really meant: seems we could define an abstract internal mix-in trait StrictTraverse with pre-defined implementations for mapAccumulate and mapWithIndex (and perhaps something else) and then mix it into the implementations of Traverse for List, Vector, maybe Chain. It is just a matter of the way how the code is organized within Cats. But I think it is a really minor thing, not worth long discussion.

Comment on lines 263 to 275
def mapAccumulate[S, B](init: S)(f: (S, A) => (S, B)): (S, NonEmptySeq[B]) = {
var state = init

val fb = iterator.map { a =>
val (newState, b) = f(state, a)

state = newState

b
}.toIndexedSeq

(state, new NonEmptySeq(fb))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that .toIndexedSeq still 100% safe here? I mean, IndexedSeq is an interface. Although all IndexedSeq-s from the Scala library might be strict collections, but we cannot completely exclude a case when someone can try to create a NonEmptySeq out of their own IndexedSeq implementation that have iterator.toIndexedSeq returning a non-strict collection. E.g. imagine some storage-efficient sparse-array implementation. If it happens then state won't be properly calculated.

I'd suggest to iterate the iterator directly via hasNext/next methods – that will definitely guarantee the enforced evaluation. Also it will probably make the implementation even more efficient since there will be no closure involved anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there is one in the stdlib IndexedSeqView which means yeah this is unsafe.

Comment on lines 138 to 144
val fb = fa.iterator.map { a =>
val (newState, b) = f(state, a)

state = newState

b
}.toIndexedSeq
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same comment as for the NonEmptySeq above.

Actually, I think that the only implementation should be put here and then re-used from those for Traverse[NonEmptySeq].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I still think that it would be better to preserve the original Seq's concrete type. The only thing, it should be a bit different for Scala 2.12 and 2.13+:

val builder = fa.companion.newBuilder[B] // 2.12
OR
val builder = fa.iterableFactory.newBuilder[B] // 2.13+

Seq#companion
Seq#iterableFactory

@BalmungSan
Copy link
Contributor Author

@satorg I am sorry, but I really don't have the mental energy to deal with Seq shenanigans, I just reverted most of the changes to those and replaced a previous usage of toIndexedSeq on mapWithIndex with a version that does two traversals but preserves the underlying class. This is the best I can do right now.

Copy link
Member

@armanbilge armanbilge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question, otherwise LGTM! Thanks for all your work :)

core/src/main/scala/cats/instances/seq.scala Show resolved Hide resolved
@johnynek johnynek merged commit 8d4cf28 into typelevel:main May 28, 2022
@BalmungSan BalmungSan deleted the add-map-accum branch May 28, 2022 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants