Add mapAccumulate to Traverse #4209

BalmungSan · 2022-05-21T01:56:45Z

As promised in discord :)

core/src/main/scala/cats/Traverse.scala

BalmungSan · 2022-05-22T04:19:46Z

Okay, I think I made all the requested changes; please let me know if I forgot something or if there is something else.

tests/shared/src/test/scala/cats/tests/TraverseSuite.scala

core/src/main/scala/cats/Traverse.scala

johnynek · 2022-05-22T05:07:34Z

BTW: also I think you can override this for list, vector, lazy list and stream that doesn't use state and is likely far faster.

johnynek · 2022-05-22T05:08:05Z

One more thing: I think this is a very nice function.

BalmungSan · 2022-05-22T05:37:42Z

While trying to override the current implementation for LazyList I realized that the current interface is broken for lazy collections, it forces the evaluation of them, which is not required at all. And the problem is for trying to return the final state... such value may not be computed after the call to mapAccumulate; actually such value may never exist at all; since the collection could be infinite.
Thus, I think the best is to just return F[B]

What do you all think @satorg @armanbilge @johnynek?

satorg · 2022-05-22T05:49:43Z

BTW: also I think you can override this for list, vector, lazy list and stream that doesn't use state and is likely far faster.

Don't forget Seq please. As well as non-empty counterparts of all from the above: NonEmptyList, NonEmptyVector, etc. :)

But it is going to be quite a big chunk of additional work. I would suggest to do it in a separate PR, if possible.

johnynek · 2022-05-22T06:15:06Z

Traverse already forces a LazyList doesn't it? Otherwise we would need fs2 as much. Am I mistaken?

satorg · 2022-05-22T06:31:48Z

Traverse already forces a LazyList doesn't it?

That's correct. But in fact, traverse itself cannot be processed lazily, e.g.:

val lzy: LazyList[Int] = ???

val traversedLzy: Option[LazyList[Int]] = lzy.traverse(a => if (someCondition(a)) Some(a) else None)

So we cannot emit the outer Option as a result without forcing the LazyList.

But seems that it would not be the case for mapAccumulate if we did not have to return the accumulator value itself...
So yeah, I am kind of hesitant here again...

satorg · 2022-05-22T06:36:02Z

Just a note: seems that traverse for LazyList fully forces it even in this case:

val lzy: LazyList[Int] = ???

val traversedLzy: LazyList[LazyList[Int]] = lzy.traverse(_ #:: LazyList.empty)

Not sure if in such a case it should really be enforced...
I would expect it should remain "untraversed" at least before we get the very first item of the outer list.
But in fact, it gets traversed right inside the traverse method.

satorg · 2022-05-22T09:53:51Z

I can think about several options on how we could address it.

Just do nothing (besides we have in this PR already). Perhaps that having mapAccumulate that behaves truly lazily for LazyList is not that important.
Make mapAccumulate returning (Eval[S], F[B]) instead. In that case the LazyList could be optimized in the way that it is enforced on either fetching items from F[B] or attempting to get S out of Eval only. Could be quite difficult to implement, though.
Create 2 functions instead of just one mapAccumulate: the first one returning just F[B] but allowing lazy traversal where applicable and the second one returning (S, F[B]) but always eager.

BalmungSan · 2022-05-22T14:54:19Z

Create 2 functions instead of just one mapAccumulate: the first one returning just F[B] but allowing lazy traversal where applicable and the second one returning (S, F[B]) but always eager.

The more I think about this the less I like it, I think the safest is just to not leak the S in the final signature.
As we said, if someone needs it they can just make it part of the B
Another argument for not doing that is that the current mapAccumulate of fs2 doesn't return that final S (for obvious reasons), thus I think it would be best to be consistent.

johnynek · 2022-05-22T16:54:32Z

But there is no Traverse instance for fs2.Stream right? There can't be for the same reason you can't have one for IO.

I'm disappointed that the more theoretic argument about what the signature should be (the one that composes and does lifting of the F[_]) takes a back seat to the ideas that:

Even though for Traverse forces LazyList and Stream, we want to change a type signature so those two instances could not force for this one method (even though all other traverse derived functions will force).
We want to be consistent with another method in fs2 despite that type not being Traverse.

johnynek · 2022-05-22T17:15:05Z

Regarding composition I mentioned in my last comment. Recall the functions here: #4209 (comment)

So if we have:

val mapList: [X, Y] => (X => Y) => (List[X] => List[Y]) = ...
val mapOption: [X, Y] => (X => Y) => (Option[X] => Option[Y]) = ...

val mapListOption: [X, Y] => (X => Y) => (List[Option[X]] => List[Option[Y]]) =
  [X, Y] => (fn: X => Y) => mapList(mapOption(fn))

so this is what we mean when we say Functor composes.

If you look at the shapes of the functions in #4209 (comment) , you can see which ones are kind of "shape preserving" meaning the results are the same shape as the original with an F[_] added or removed. We can see map, foldLeft, traverse all have that property, which is due to the fact that those type classes all compose: if you have Functor[F] and Functor[G] you can make Functor[[X] =>> F[G[X]]] and so on with Foldable and Traverse.

Now, look at the case we have here:

val mapAccList: [X, S, Y] => ((X, S) => (Y, S)) => ((List[X], S) => (List[Y], S) ...
val mapAccOption: [X, S, Y] => ((X, S) => (Y, S)) => ((Option[X], S) => (Option[Y], S)) = ...

val mapAccListOption: [X, S, Y] => ((X, S) => (Y, S)) => ((List[Option[X]], S) => (List[Option[Y], S)) =
  [X, Y] => (fn: (X, S) => (Y, S)) => mapAccList(mapAccOption(fn))

So, in this way, if we have a mapAccumulate optimized for Vector and one optimized for List we can keep our optimizations for [X] =>> Vector[List[X]] since the function composes.

We can't keep our optimizations under compositions if we discard the final S.

I can see the motivations to discard the S in many cases, but I think if we really think we want that we could add a second function that just drops the S off.

But to me, the theoretical reasons should be the highest priority here since they tend to guide us to universal principles, not just the aesthetics of one person or another.

BalmungSan · 2022-05-23T12:29:58Z

@johnynek @satorg @armanbilge

Okay, thank you all for the feedback, I think the definition of the method is ready to go.
I also override the implementation of mapAccumulate, mapWithIndex, and zipWithIndex for List, Vector, Seq, Chain and their non-empty counterparts; I made all that in a second commit thus if you prefer I can move that to another PR.

Let me know what you all think and if there is any other detail to discuss :)

core/src/main/scala/cats/data/NonEmptyList.scala

core/src/main/scala/cats/instances/list.scala

johnynek · 2022-05-23T17:50:42Z

can we also override mapAccumulate here:

cats/core/src/main/scala/cats/Composed.scala

Line 118 in 708cbab

private[cats] trait ComposedTraverse[F[_], G[_]]

and:

cats/core/src/main/scala/cats/data/Nested.scala

Line 348 in 708cbab

    
           override def traverse[H[_]: Applicative, A, B](fga: Nested[F, G, A])(f: A => H[B]): H[Nested[F, G, B]] =

johnynek

Looking good. Two minor issues.

core/src/main/scala/cats/Composed.scala

core/src/main/scala/cats/data/Nested.scala

core/src/main/scala/cats/data/NonEmptySeq.scala

core/src/main/scala/cats/instances/seq.scala

satorg · 2022-05-23T19:23:45Z

Not saying we should do it here, but I wonder if there should be something like

trait StrictTraverse[F[_]] extends Traverse[F] {
  def mapAccumulate... // put strict-only implementation directly to here
}

Then let strict collections only implement this typeclass.

This would allow us to represent the idea of "strictness" it a typefull way (not just by hiding it somewhere inside Cats).

johnynek · 2022-05-23T20:02:17Z

@satorg but traverse itself is strict isn't it? How can we lazily implement traverse without any constraints on G[_]: Applicative?

johnynek

I think this is pretty great.

Very nice generalization of mapWithIndex.

satorg · 2022-05-23T22:02:33Z

but traverse itself is strict isn't it? How can we lazily implement traverse without any constraints on G[_]: Applicative?

Actually, this is not what I meant – I didn't think it through enough. Yes, you're right – it is strict itself.
(Although such function as mapWithIndex still does not have to be strict, does it?)

What I really meant: seems we could define an abstract internal mix-in trait StrictTraverse with pre-defined implementations for mapAccumulate and mapWithIndex (and perhaps something else) and then mix it into the implementations of Traverse for List, Vector, maybe Chain. It is just a matter of the way how the code is organized within Cats. But I think it is a really minor thing, not worth long discussion.

satorg · 2022-05-23T23:52:32Z

core/src/main/scala/cats/data/NonEmptySeq.scala

+  def mapAccumulate[S, B](init: S)(f: (S, A) => (S, B)): (S, NonEmptySeq[B]) = {
+    var state = init
+
+    val fb = iterator.map { a =>
+      val (newState, b) = f(state, a)
+
+      state = newState
+
+      b
+    }.toIndexedSeq
+
+    (state, new NonEmptySeq(fb))
+  }


Is that .toIndexedSeq still 100% safe here? I mean, IndexedSeq is an interface. Although all IndexedSeq-s from the Scala library might be strict collections, but we cannot completely exclude a case when someone can try to create a NonEmptySeq out of their own IndexedSeq implementation that have iterator.toIndexedSeq returning a non-strict collection. E.g. imagine some storage-efficient sparse-array implementation. If it happens then state won't be properly calculated.

I'd suggest to iterate the iterator directly via hasNext/next methods – that will definitely guarantee the enforced evaluation. Also it will probably make the implementation even more efficient since there will be no closure involved anymore.

Actually, there is one in the stdlib IndexedSeqView which means yeah this is unsafe.

satorg · 2022-05-23T23:55:51Z

core/src/main/scala/cats/instances/seq.scala

+        val fb = fa.iterator.map { a =>
+          val (newState, b) = f(state, a)
+
+          state = newState
+
+          b
+        }.toIndexedSeq


The same comment as for the NonEmptySeq above.

Actually, I think that the only implementation should be put here and then re-used from those for Traverse[NonEmptySeq].

Also I still think that it would be better to preserve the original Seq's concrete type. The only thing, it should be a bit different for Scala 2.12 and 2.13+:

val builder = fa.companion.newBuilder[B] // 2.12 OR val builder = fa.iterableFactory.newBuilder[B] // 2.13+

Seq#companion
Seq#iterableFactory

…ions

BalmungSan · 2022-05-24T16:42:54Z

@satorg I am sorry, but I really don't have the mental energy to deal with Seq shenanigans, I just reverted most of the changes to those and replaced a previous usage of toIndexedSeq on mapWithIndex with a version that does two traversals but preserves the underlying class. This is the best I can do right now.

armanbilge

One question, otherwise LGTM! Thanks for all your work :)

core/src/main/scala/cats/instances/seq.scala

BalmungSan force-pushed the add-map-accum branch 4 times, most recently from 797337f to eb1f1e1 Compare May 21, 2022 03:10

armanbilge added this to the 2.8.0 milestone May 21, 2022

BalmungSan force-pushed the add-map-accum branch from eb1f1e1 to f707496 Compare May 21, 2022 03:22

armanbilge reviewed May 21, 2022

View reviewed changes