
Allow optional seed in Stream.scan? #1822

Closed
dmitriz opened this issue Apr 28, 2017 · 12 comments
Labels
Type: Enhancement For any feature request or suggestion that isn't a bug fix

Comments

@dmitriz
Contributor

dmitriz commented Apr 28, 2017

Right now, when called with 2 arguments, it throws a nondescript error:

// current behaviour
s = Stream(11)
t = Stream.scan((x,y)=>x+y, s)
//=> stream.js:106 Uncaught TypeError: Cannot read property '_state' of undefined(…)

It is, however, annoying to be forced to provide a seed when you just want to use the current stream value. It would be nice to make the seed optional, defaulting to the current stream value when missing:

// This would be nice to have!
s = Stream(11)
t = Stream.scan((x,y)=>x+y, s) //=> 11
s(1), t()  //=> 12

By comparison, RxJS does allow an optional seed in its similar method:
https://github.com/Reactive-Extensions/RxJS/blob/master/doc/api/core/operators/scan.md

@JAForbes
Collaborator

@dmitriz
Usually when I run into this, it's a sign that a stream is receiving data when it shouldn't.

e.g.

Why initialize s with 11? You could leave s as Stream() and pass 11 to scan.

Maybe that's just for this example, but even in a real world case it boils down to the same decision.

I think it's better to have a single entry point than multiple entry points (where possible).

In the rare case this is needed, you can just pass in s() as the initial value. So it's not a lot of boilerplate.

I'm personally against this change: I think variadic APIs and optional arguments lead to less clear code. I wouldn't want to mimic Rx's API or JS's reduce. It's tiny changes like this that, little by little, make an API harder to use. In isolation they seem convenient, but overall the implementation requires more (implicit or explicit) branching, which makes bugs more likely and means more tests and more documentation.

Another subtle point I'd like to make re: scan: providing the initial argument is a great opportunity to document in code what the shape of your resulting data structure is or will be. It's often handy to see it in full inline (or nearby), so that you have a reference when you write your update function. In this case it's just a number, but in a larger application it may be quite a large object, and each case in a reducer may never show the complete object; instead it just assigns to it piecemeal.
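
For illustration, a hypothetical sketch of that point (update, actions, and the state fields are made up for this example, not taken from any real app): the seed spells out the complete state shape in one place, even though no single branch of the reducer ever does.

var initialState = {
  hotels: [],        // full list, fetched from the server
  selectedId: null,  // hotel currently being viewed, if any
  loading: false
}
// The seed doubles as inline documentation of the state's shape.
var states = Stream.scan(update, initialState, actions)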

@dmitriz
Contributor Author

dmitriz commented Apr 29, 2017

Thank you @JAForbes for an excellent explanation!

Not only do I agree with everything you said, but I have even found a serious flaw in my proposal -- in some cases you may not even be able to pass s() as the initial value, because it might not be of the right type!

I should have picked a better, more realistic example. In real use cases, you supply a stream of actions and scan over it to get a stream of states. Also, the choice of the sum as the reducer is terrible, because it has 2 special features: the accumulator and the feed are of the same type, and the operation is commutative. That makes it a horrible choice for an example, because it deceives you into the wrong intuition.

So let us pick a better reducer that has neither of these "flaws":

const reducer = (state: Array<number>, action: number) => state.concat(action) 

And now we can initialise our action stream and scan it with the reducer to get the state stream:

var actions = Stream()
var states = Stream.scan(reducer, [10], actions)
actions() //=> undefined
states() //=> [10] 

After this setup, all we need to do is pipe our actions into the actions stream, and our states stream will update itself via the reducer with no extra work!

actions(1)
states() //=> [10, 1] 
actions(2)
states() //=> [10, 1, 2]
... 

And now we can pipe that state stream into our view and render, to keep the view updated.

But what if the action stream already had some data before the scan? Like the last action?

var actions = Stream(1)
var states = Stream.scan(reducer, [10], actions)
actions() //=> 1
states() //=> [10, 1] 

So the last action had been taken into account, which might be unwanted. Say your last action was to view a specific hotel, but now you pass the state representing the full hotel list as the seed to the scan. Then the new state will be computed by applying that last action to the seed -- showing that last hotel -- instead of being the seed itself, which is what you want.

So this looks like a flaw in the scan implementation, doesn't it? We really want the seed to be the current state until the next action is dispatched, don't we? In the last example that means we really want [10], not [10, 1].

And what about passing actions() as the seed? You get a type error!
Which must also happen with RxJS, if I understand it correctly.
But sure enough, both of their examples are addition and multiplication -- commutative, with equal types for both args. And you won't discover the problem in those cases. So I'm really puzzled how their implementation works in general.
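
For comparison, here is a sketch of the RxJS behaviour as I understand it from their docs (not verified against the source): with no seed, scan uses the first emitted value as the initial accumulator, which only works when the accumulator and feed types agree.

// RxJS sketch, per their docs: without a seed, the first value becomes
// the accumulator, so the reducer never mixes types.
Rx.Observable.of(1, 2, 3)
  .scan((acc, x) => acc + x) // emits 1, 3, 6
  .subscribe(x => console.log(x))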

@JAForbes
Collaborator

It's not necessarily a type error, but in the case of patterns like the Elm architecture it is.

But what if the action stream already had some data before the scan? Like the last action?

This only happens if components seed their own streams instead of letting a parent context do that (which is usually better). But this isn't a problem: your update will be called with the seed and the value from your actions stream.

And you won't discover the problem in those cases. So I'm really puzzled how their implementation works in general.

This is exactly right: if you want to fold without a seed in a type-safe way, you need a notion of empty for that type. The name for a type that has both concat and empty is a Monoid.
Without an empty value we can't fold without a seed in a type-safe way.

Because Stream.scan doesn't know what type we are folding into, we need to provide an initial value; otherwise we're relying on runtime context to make a decision (that's a very Javascript way of doing things), and that's a lot less simple.
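
To make the Monoid point concrete, a minimal sketch in plain JavaScript (illustrative only, not the flyd/mithril API):

// A Monoid bundles concat with an empty value; only then can a fold
// start without an explicit seed.
var ArrayMonoid = {
  empty: function() { return [] },
  concat: function(a, b) { return a.concat(b) }
}

function fold(M, xs) {
  return xs.reduce(M.concat, M.empty()) // the seed comes from the type itself
}

fold(ArrayMonoid, [[1], [2], [3]]) //=> [1, 2, 3]

// Stream.scan can't know the right empty for an arbitrary accumulator
// type, so the caller has to pass the initial value explicitly.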

So the last action had been taken into account, which might be unwanted.

If the last action is unwanted, then it's beyond Stream.scan's responsibility to dismiss it. Its behaviour is correct and useful. If you want to dismiss that value, you can decorate your data somehow so your update function can disregard it, perhaps with a session increment or some other solution. e.g.

function update(state, action){
  if( state.session > action.session ){
    return state
  } else {
    ...
  }
}

But that's beyond the responsibility of scan; that's an application concern. So I don't see how scan is flawed, apologies if I'm misunderstanding you. 😄

@dmitriz
Contributor Author

dmitriz commented Apr 29, 2017

Because Stream.scan doesn't know what type we are folding into, we need to provide an initial value; otherwise we're relying on runtime context to make a decision (that's a very Javascript way of doing things), and that's a lot less simple.

Indeed, so it seems RxJS has this safety "problem". Interesting...

But what if the action stream already had some data before the scan? Like the last action?

This only happens if components seed their own streams instead of letting a parent context do that (which is usually better). But this isn't a problem: your update will be called with the seed and the value from your actions stream.

I mean the opposite: the new component would re-use the existing action stream from the parent, which means it has some old actions from the past that we don't necessarily want to replay on the new component, do we?

We might even expect new action types to be accounted for in the new reducer, but not necessarily the old actions, so we might end up with a type error. And it feels weird to patch the update/reducer function just to accommodate the single last action that I don't even want :)

I'll probably rather patch the scan method, but perhaps I am missing something, and would love to see a real example of that situation actually occurring (scan over a non-empty action stream).

@dmitriz
Contributor Author

dmitriz commented Apr 29, 2017

Not a single test with nonempty stream before scan!

@dead-claudia
Member

Not a single test with nonempty stream before scan!

Something that needs to be rectified as per the result of this issue. 😉

@dead-claudia dead-claudia added the Type: Enhancement For any feature request or suggestion that isn't a bug fix label Apr 29, 2017
@JAForbes
Collaborator

has some old actions from the past that we don't necessarily want to replay on the new component, do we?

This is really up to the application. In many cases this is exactly the behaviour we want, in other cases it would never happen because we only initialise streams at the single entry point to the entire application.

We may want this behaviour when we don't want to worry about race conditions from separate systems merging (e.g. a value sent down a stream from a url change vs. a value coming from a server). Maybe one module is loaded after the other; we shouldn't care. The semantics of emitting the existing value make streams really easy to predict: they abstract over the async and sync case. Whichever it is, scan and map will behave in the same way and emit the same number of times, whether the value arrives later or was already there at the time the other module is initialised.

And it feels weird to patch the update/reducer function just to accommodate the single last action that I don't even want :)

This is an application concern, so it makes sense to have the reducer handle application specific behaviour.

Not a single test with nonempty stream before scan!

It's worth testing, but it also shouldn't be divergent behaviour in flyd/mithril, so the test should just state that both cases emit the same number of times for an existing value and a value that comes later.
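
A sketch of what such a test could assert (hypothetical: it assumes mithril/stream's current scan behaviour and uses Node's assert; it is not an actual test from the suite):

var Stream = require("mithril/stream")
var assert = require("assert")

var reducer = (state, action) => state.concat(action)
var count1 = 0
var count2 = 0

// Case 1: the value is already in the stream when scan is called.
var pre = Stream(1)
var statesPre = Stream.scan((s, a) => (count1++, reducer(s, a)), [10], pre)

// Case 2: the value arrives after scan is called.
var post = Stream()
var statesPost = Stream.scan((s, a) => (count2++, reducer(s, a)), [10], post)
post(1)

// Both cases should emit the same number of times with the same result.
assert.strictEqual(count1, count2)
assert.deepStrictEqual(statesPre(), statesPost()) //=> both [10, 1]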

Right now, when called with 2 arguments, it throws a nondescript error:

I agree we could emit a more descriptive error. But I don't agree the behaviour should be changed.

@dmitriz
Contributor Author

dmitriz commented Apr 30, 2017

has some old actions from the past that we don't necessarily want to replay on the new component, do we?

This is really up to the application. In many cases this is exactly the behaviour we want, in other cases it would never happen because we only initialise streams at the single entry point to the entire application.

Ok, here is my conceptual attempt. I might be wrong of course.

I consider a stream x conceptually as just a collection of values spread over discrete time moments. It is not the same as the function x(t) defined for all times t and constant on the intervals between events. Why? Because, e.g., a constant function with value 0 over the whole time axis is not the same as that same value 0 being fired at specific time moments. We care about when the events are fired, rather than when their value changes.

Now the scan method creates a new child stream from the parent. When does it emit? Precisely when the parent stream does! But that does not happen at the time of creation of the child. So the new value should be computed only at the moments when the parent emits, yet at the time of creation it is computed anyway, even though the parent emits nothing then. That just feels conceptually irregular. It might be just me, of course.

I am not saying the current implementation is not useful. It surely can be in some cases, and not in others. However, you can't argue simply from usefulness. Not only because it depends on the use case and is subjective, but also because usefulness comes from the way a method is applied, which is not necessarily the method's own responsibility. The method itself, in my view, should behave in the most natural and simplest possible way.

For instance, you define map(f) over an array by applying f to every value. Why? Because it is the simplest rule possible. But you could define it differently, e.g. apply f to all array elements except the last one. Or except the first one. Or both. And in each of these other ways, you still get a functor satisfying all the laws! So functoriality is not enough; it is simplicity, or the lowest complexity, e.g. the Kolmogorov complexity if you like :) The algorithm defining the standard map is shorter, hence its complexity is lower.
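
To spell that out (a hypothetical illustration; in untyped JS both definitions satisfy the identity and composition laws):

// The alternative "map" that skips the last element.
var weirdMap = (f, xs) =>
  xs.map((x, i) => i === xs.length - 1 ? x : f(x))

// Identity law: weirdMap(x => x, xs) equals xs.
// Composition law: weirdMap(x => f(g(x)), xs) equals
//                  weirdMap(f, weirdMap(g, xs)).
// Both hold, yet the definition is longer than xs.map(f),
// i.e. its complexity is higher.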

Category theory is simply not enough to tell us the difference. We have to use complexity theory :)

Now, in the case of scan, saying that the reducer is computed exactly whenever the parent stream fires has lower complexity than saying it does so BUT is also computed at creation time, even if the parent stream does not emit anything at that moment.

Finally, there is yet another argument. If you really want your reducer to run on creation, nothing could be easier:

scan(reducer, reducer(seed, actions()), actions)

The dead simple composition!
Also very clear and expressive: you apply your reducer to the last stored action and pass the result as the new seed.

However, going back the other way is much harder and requires considerable hacks and added code for no clear benefit. That argument alone would, for me, be the reason to prefer the other behaviour.
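
To make this concrete, a worked example under the proposed semantics (hypothetical: it assumes scan would not run the reducer at creation time, which is not the current behaviour):

// Proposed plain scan: states() would stay [10] until the next action.
// The composition opts back in to today's behaviour when you want it:
var actions = Stream(1)
var states = Stream.scan(reducer, reducer([10], actions()), actions)
states() //=> [10, 1], the same result the current implementation gives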

@dmitriz dmitriz closed this as completed Apr 30, 2017
@dmitriz dmitriz reopened this Apr 30, 2017
@JAForbes
Collaborator

JAForbes commented Apr 30, 2017

I consider a stream x conceptually as just a collection of values spread over discrete time moments.

I've ranted about this interpretation of "observables", marble diagrams, and "arrays over time" at length in the past. I think we shouldn't think of discrete points in time but of a curve, a function, a formula: an infinite continuous curve described by a function. Why? Because it's more powerful to model our programs using an abstraction that is unaware of discrete points in time. It's more powerful to take advantage of the abstraction provided instead of focusing on the machinery behind the scenes necessary to create it. If we think of streams as "arrays but over time", or "a Promise but for multiple values", we're defeating the entire point of abstracting over time.

We shouldn't think of individual values, when they arrive, and watching or observing them, reacting to changes and modifying things as a result. We should think of relationships.

Now back to the problem at hand. We've got a relationship between a stream a and a scanned stream b. Our update function can be thought of as a perfect representation of the relationship between a and the resulting b stream. We shouldn't concern ourselves with individual values, but with the constraint formed by our update function. That constraint is all that matters.

In reality, we can only represent points, single moments and values. We can't represent the true nature of the relationship, which holds for all values.

What does this update function need? Let's look at its signature: ( a, b ) -> a. It needs two inputs. Well, if our stream a already has an input, and our seed value has also been provided, then we have enough information to emit a value, so it should: otherwise it isn't representing the curve, it's representing a slice of the curve, and its behaviour diverges from its definition.

Now when we say things like "this stream should only emit from this point in time", or "previous value", or even "current value", we're thinking about it all wrong. We're still thinking about time, but that's exactly what we want to abstract over when working with streams.

As I said before, the current behaviour prevents ordering problems and race conditions, and it behaves the same for both sync and async. The case you're seeking to prevent would only happen if we're not following the best practice of starting our streams separately from the definitions of their relationships.

Note this is all my opinion, and most in the Observable world would disagree with me. But it's these small, simple points of divergence aggregated that I think make mithril / flyd streams better than the alternatives. It's not a perfect interface by any means, and there are things I'd like to be changed. But I'm against any core behavior that is aware of time. That stuff is great in modules for special cases, but it's so much simpler if we can abstract away time and definition order completely.

@dmitriz
Contributor Author

dmitriz commented Apr 30, 2017

Note this is all my opinion, and most in the Observable world would disagree with me. But it's these small, simple points of divergence aggregated that I think make mithril / flyd streams better than the alternatives. It's not a perfect interface by any means, and there are things I'd like to be changed. But I'm against any core behavior that is aware of time. That stuff is great in modules for special cases, but it's so much simpler if we can abstract away time and definition order completely.

To comment on this: actually, that is exactly the direction I am trying to head. Abstracting away the time! Only possibly in a different way :)

Take the discrete event model. Forget the times when the events fired; just keep the order. Now the time is gone and you get a pure array. That is exactly it: the time is abstracted away. We can do everything we do with arrays. No stream, no observable magic.

@dead-claudia
Member

@JAForbes You might be interested in at least looking at this ES strawman of mine, although that's been put on the back burner while I've been busy with other things, and just rethinking the paradigm in general (I also could use some help making better comparisons).


(n.b. using CoffeeScript here to avoid 100s of parentheses)

I've also been looking into what data types I need for a programming language that's time-independent (it's going to be a GUI-based language for a closed-source project), so I've been thinking about this a lot. There are several things I've since figured out that really simplify the model:

  1. There conceptually doesn't need to be an observable difference between a single value and multiple values over time.

You can model constants as just a single value over time, and most stream/observable libraries, and even the ES proposal itself, offer something like Observable.of(value) to emit a single value.

Also, consider the common "end" event in Node.js event-based APIs. That's only fired once 99% of the time, and usually, if that's the only conceptual "event", a plain callback is accepted.

  2. All values can be viewed as time-independent within some context.

Any time you have an immutable binding, it's guaranteed to be constant for the lifetime of the binding. Same with immutable values and their lifetimes. And when you read a mutable binding not by reference, you will get an immutable value out of it.

  3. All mutable values can be viewed as time-dependent within some context.

Any time you have a mutable value, it can change between reads by the very definition of mutability. By corollary, time-dependent values may change between reads, and this is one common way you get race conditions in multithreaded C/C++ programs.

If you also let the setter become observable, and give a subscription access to the binding (even indirectly), you could recursively change it, resulting in similar time-dependent behavior.

  4. You could build most of the common stream operators from a small number of primitives:
  • Source stream

The basic stream data type.

  • Closeable stream with source stream that emits once on close

This allows the ability to close a stream and observe that closure to, e.g., release acquired resources.

  • Pipe

Join a source to a transformer.

  • Read/write cells with source stream that emits the new value on change

This is effectively how mithril/stream and flyd work. You have a single mutable cell that emits the new value on change.
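
A minimal sketch of such a cell in plain JavaScript (illustrative only; this is not the actual mithril/stream or flyd source):

// cell() reads the current value; cell(v) writes it and notifies observers.
function cell(initial) {
  var value = initial
  var observers = []
  function c(v) {
    if (arguments.length > 0) {
      value = v
      observers.forEach(function(f) { f(v) }) // emit the new value on change
    }
    return value
  }
  c.observe = function(f) { observers.push(f) }
  return c
}

var c = cell(1)
c.observe(function(v) { console.log("changed to", v) })
c(2) // logs "changed to 2"; c() now returns 2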

  • If/Else

Stream that accepts a source stream and on each emit, runs a block and emits to one stream if the block returns true, and another if it's false. In observable terms, it'd be implemented as this:

cond = (observable, test) ->
    o1 = o2 = undefined

    res1 = new Observable (observer) ->
        o1 = observer
        return () ->
            o1 = undefined
            subscription.unsubscribe() unless o2?

    res2 = new Observable (observer) ->
        o2 = observer
        return () ->
            o2 = undefined
            subscription.unsubscribe() unless o1?

    # Soaks (?.) guard against the source emitting before both output
    # streams have been subscribed to.
    subscription = observable.subscribe
        next: (v) -> (if test(v) then o1 else o2)?.next(v)
        error: (v) -> o1?.error(v); o2?.error(v)
        complete: (v) -> o1?.complete(v); o2?.complete(v)

    return [res1, res2]
  • Map

Transform a value on change. In observable terms, it'd be implemented as this:

map = (observable, func) ->
    return new Observable (observer) ->
        return observable.subscribe
            next: (v) -> observer.next(func(v))
            error: (v) -> observer.error(v)
            complete: (v) -> observer.complete(v)
