
Add conflate operations to Stream #3401

Merged
merged 6 commits into from Mar 8, 2024
Conversation

@mpilquist (Member) commented Feb 28, 2024

Similar to conflate/conflateWithSeed from akka-streams.

This PR adds:

  • conflateChunks
  • conflate
  • conflate1
  • conflateSemigroup
  • conflateMap

Recommended by @seigert on r/scala here: https://www.reddit.com/r/scala/comments/1ayqcx0/comment/krx6nr6/

@@ -568,6 +568,56 @@ final class Stream[+F[_], +O] private[fs2] (private[fs2] val underlying: Pull[F,
Stream.eval(fstream)
}

def conflate[F2[x] >: F[x], O2 >: O](implicit
@mpilquist (Member, Author):
I don't love this signature. Feels very adhoc.

Contributor:

I was thinking about the Channel-based implementation you suggested on Reddit and came up with some method signatures that I think would feel better:

  1. def conflateChunks(implicit F: Concurrent[F]): Stream[F, Chunk[O]] -- the base method for all the others.
    As Channel.stream emits all collected data in a single chunk, it feels natural to give the user
    access to those chunks without any additional operations on elements.
    It's also worth doing even for the current implementation, as pulling chunks is cheaper
    than pulling elements and chunk concatenation is O(1).
  2. def conflate[O2](zero: O2)(f: (O2, O) => O2)(implicit F: Concurrent[F]): Stream[F, O2] -- I think
    the conflate series of methods should mirror the signatures of fold/scan.
  3. def conflate1(f: (O, O) => O)(implicit F: Concurrent[F]): Stream[F, O] -- again, there are fold1/scan1,
    and we have simply 'swapped' Akka's conflate/conflateWithSeed.
  4. def conflateSemigroup(implicit F: Concurrent[F], O: Semigroup[O]): Stream[F, O] -- once more,
    like foldMonoid/scanMonoid, except that we don't need an empty value.
  5. def conflateMap -- I'm not sure about that one, but why not?
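To make the relationship between these variants concrete, here is a plain-Scala sketch (no fs2; all helper names are hypothetical) that models one conflated batch -- the elements that accumulated while downstream was busy -- as a List, and shows how each proposed method would collapse it:

```scala
// Model: one "conflated batch" is the list of elements that piled up
// while downstream was busy. Each proposed method collapses it differently.

// conflate(zero)(f): fold the batch from a seed, mirroring fold/scan
def conflateBatch[O, O2](batch: List[O])(zero: O2)(f: (O2, O) => O2): O2 =
  batch.foldLeft(zero)(f)

// conflate1(f): combine pairwise, mirroring fold1/scan1 (batch must be non-empty)
def conflate1Batch[O](batch: List[O])(f: (O, O) => O): O =
  batch.reduceLeft(f)

// conflateSemigroup: conflate1 specialized to a Semigroup's combine
def conflateSemigroupBatch[O](batch: List[O])(combine: (O, O) => O): O =
  conflate1Batch(batch)(combine)
```

This is only a model of the combining step; the real methods also have to interleave pulling from the source with downstream demand.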

On a side note, the Cambridge Dictionary suggests 'combine', 'fuse', 'meld', and 'merge' as synonyms. 'Merge' is taken, 'fuse' and 'meld' are meh, but 'combine' is a-ok, especially because we use Semigroup. On the other hand, conflate would be better known to people with Akka/Reactive Streams experience, much like switchMap.

@mpilquist (Member, Author):

If we do something like conflateChunks, we're getting very close to the behavior of prefetchN -- which might be a sign that we should just offer a prefetchN variant that combines all chunks when dequeuing. Using conflateWithSeed as the base operation allows the conflation to drop/summarize data instead of storing each element. On the other hand, conflate suffers from loss of backpressure as-is -- there's no limit to how much is pulled from the source stream, so if downstream isn't expedient, we could run out of memory. prefetchN doesn't have this issue, as it uses a bounded channel to transfer elements.

@mpilquist (Member, Author):

On second thought, prefetchN already does the right thing -- it accumulates chunks that arrive while downstream is processing. So conflate is as simple as:

  def conflate[F2[x] >: F[x], O2 >: O](implicit
      F: Concurrent[F2],
      O: Semigroup[O2]
  ): Stream[F2, O2] =
    prefetchN(Int.MaxValue).chunks.map(_.combineAll)

@mpilquist (Member, Author) commented Feb 29, 2024:

I pushed some new implementations using your suggested signatures plus the addition of a chunkLimit param on each. Please take a look.

While Channel does the right thing, prefetchN doesn't, as it maintains the source chunk structure. The version of conflateChunks I just pushed uses a Channel directly and inlines the chunk conflation logic using unconsFlatMap (edit: removed the micro optimization).
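The behavior described here can be modeled without fs2. The toy class below (hypothetical name, not the actual implementation) captures the two essentials: chunks accumulate up to a bound while the consumer is busy, and the consumer drains everything at once as a single concatenated chunk, so the source chunk structure is not preserved:

```scala
import scala.collection.mutable.ArrayBuffer

// Toy model (not the fs2 implementation) of Channel-based conflateChunks:
// a producer appends chunks to a buffer bounded by chunkLimit; the consumer
// drains everything present at once and concatenates, so a slow consumer
// observes fewer, larger chunks instead of the source chunk structure.
class ConflatingBuffer[O](chunkLimit: Int) {
  private val buf = ArrayBuffer.empty[List[O]]

  // Returns false when the bound is reached, modeling backpressure on the producer.
  def offer(chunk: List[O]): Boolean =
    if (buf.size >= chunkLimit) false
    else { buf += chunk; true }

  // Drain all accumulated chunks as one concatenated chunk.
  def drain(): List[O] = {
    val out = buf.toList.flatten
    buf.clear()
    out
  }
}
```

The `offer` bound plays the role of chunkLimit in the PR: it caps the number of source chunks buffered, though not the number of elements inside them.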

Contributor:

I like it very much!

One nitpick -- I'm not sure about chunkLimit; maybe it's easier to use Channel.unbounded? We still have no control over the number of elements in each conflated chunk, so it's not as if we have any control over consumed memory.

@mpilquist (Member, Author):

I'd prefer to force folks to make a decision to use Int.MaxValue here instead of, by default, allowing unbounded memory usage. It's a bit of a speed bump to get folks to think about memory usage. Not sure chunkLimit is the right name, given it's really a limit on the number of chunks pulled from the source stream and the max size of chunks emitted downstream.

def conflateMap[F2[x] >: F[x]: Concurrent, O2: Semigroup](chunkLimit: Int)(
    f: O => O2
): Stream[F2, O2] =
  map(f).conflateSemigroup[F2, O2](chunkLimit)
Contributor:

Maybe move the mapping function application to after the conflated chunks are pulled?

def conflateMap[F2[x] >: F[x]: Concurrent, O2: Semigroup](chunkLimit: Int)(
    f: O => O2
): Stream[F2, O2] =
  conflateChunks[F2](chunkLimit).map { c => 
    c.drop(1).foldLeft(f(c(0)))((x, y) => Semigroup[O2].combine(x, f(y)))
  }
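The two formulations produce the same result for any semigroup, since combining mapped elements pairwise is the same as mapping first and then combining. A quick plain-Scala check (hypothetical helper names; Int addition as the semigroup):

```scala
// conflateMap as in the PR: map every element first, then combine the batch
def mapThenCombine[O, O2](batch: List[O])(f: O => O2)(combine: (O2, O2) => O2): O2 =
  batch.map(f).reduceLeft(combine)

// The suggested alternative: apply f element-by-element while folding
// the pulled chunk, mirroring c.drop(1).foldLeft(f(c(0)))(...)
def combineWhileFolding[O, O2](batch: List[O])(f: O => O2)(combine: (O2, O2) => O2): O2 =
  batch.drop(1).foldLeft(f(batch.head))((acc, o) => combine(acc, f(o)))
```

The difference is purely operational: where and when f runs, not what it produces.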

@mpilquist (Member, Author):

Any rationale? No strong preference either way, just curious about the motivation.

Contributor:

Just that this way we apply f only when data has actually been pulled. Thus, if downstream decides to stop/cancel, we haven't done any unnecessary transformations, consuming CPU and memory for allocations.
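This laziness argument can be demonstrated outside fs2: when batches are consumed lazily, mapping after the pull means f never runs for batches downstream abandons (toy model; the counter and names are illustrative only):

```scala
// Count how many times f runs when only the first of three batches is consumed.
var applied = 0
val f: Int => Int = x => { applied += 1; x * 2 }

val batches = List(List(1, 2), List(3, 4), List(5, 6))

// Map-before-conflate: f runs eagerly on every element of every batch.
val eager = batches.map(_.map(f).sum)
val eagerCount = applied

applied = 0
// Map-after-pull: an iterator defers work, so f runs only for batches
// the downstream actually pulls before cancelling.
val lazyOut = batches.iterator.map(_.map(f).sum)
val first = lazyOut.next() // downstream stops after one batch
val lazyCount = applied
```

Here the eager formulation applies f six times, while the lazy one applies it only twice before the "cancellation".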

@mpilquist mpilquist marked this pull request as ready for review February 29, 2024 21:18
@mpilquist mpilquist changed the title Add initial implementation of conflate and conflateWithSeed Add conflate operations to Stream Feb 29, 2024
@mpilquist mpilquist merged commit e0cdf07 into main Mar 8, 2024
31 checks passed
@mpilquist mpilquist deleted the topic/conflate branch March 8, 2024 12:48
@Jasper-M (Contributor):
Maybe also interesting to know that this operation was called bufferIntrospective in Monix.
