
Create an io.js streams team #97

Closed
rvagg opened this issue Dec 31, 2014 · 39 comments

@rvagg
Member

rvagg commented Dec 31, 2014

The Streams Team

@chrisdickinson and I have been charged with splitting streams concerns out of io.js core such that the authoritative source of code, docs and issues for streams comes from this repo and flows back into iojs/io.js. @isaacs has been kind enough to gift this repo to the iojs org and the first order of business is to create a streams team that can be responsible for maintaining this repo, dealing with issues, updating docs, deciding the future directions for streams in io.js (and potentially joyent/node, but we'll see where that goes), and actually implementing it.

Ideally, those involved will have some clue how the current implementation (streams3, mainly) works. The biggest problem io.js (and node) faces wrt streams is the limited number of people who understand how they work; certainly most of the TC doesn't have in-depth knowledge.

As a starting point, I'd like to include people that have made contributions to readable-stream since they are the obvious candidates for forming a core team:

Please indicate your willingness to be included here and then we can start to move forward with some more practical concerns about how this repo & the flow of code, docs & issues is going to work.

@mafintosh
Member

Great initiative! I'd be more than happy to help out.

@chrisdickinson
Contributor

The WHATWG streams spec is shaping up nicely; it might be good to see if @domenic would be interested in helping / peering over what we're working on here depending on his availability.

@calvinmetcalf
Contributor

Excited to help (though will not be around much the next week)


@sonewman
Contributor

I would definitely be happy to be involved 😃

👍 to inviting @domenic, I think that is a good move.

@Fishrock123

👍 to @domenic

@timgestson
Contributor

Would be happy to help.

@rvagg
Member Author

rvagg commented Jan 4, 2015

I'll schedule a Hangout On Air in a week or two so we can get started, I know that a lot of people have powered down for holidays just at the moment. We can start with the initial list of people here but that doesn't mean it's fixed in stone, we really just need to get some direction and momentum.

@bjouhier

bjouhier commented Jan 4, 2015

I am interested if the group is ready to consider some radical departures from the current design, more specifically moving from a device-centric approach to a data-processing approach. See nodejs/node#188

But otherwise userland is a comfortable place.

@sonewman
Contributor

sonewman commented Jan 4, 2015

@bjouhier I think there are some interesting ideas and discussions floating around in nodejs/node#89 & nodejs/node#188.

The stream module has had a level 2 unstable rating in joyent/node.

The main issue I see with radical API changes is that a lot of core is built on these constructs; changes to semantics could significantly affect the behaviour of those components and should be approached with caution, not on a whim (and how would this affect any move to merge upstream with joyent/node?).

Despite the unstable rating, there is also a large and growing ecosystem of streams built on top of the current implementation.

IMO the current streams implementations are quite good, although we would benefit from moving closer to ideas from whatwg.

Any changes made would ideally need to be gradual, since they could have significant implications (for core and userland).

As discussed in nodejs/node#89, core's main use cases for streams are handling binary and text protocols. Userland's main use cases are primarily objectMode streams.

For this reason I would advocate making streams as simple, flexible and unbiased as possible.

However, this is a topic for further discussion; we should take it to another thread.

@Raynos
Contributor

Raynos commented Jan 4, 2015

You cannot make breaking changes to the streams interface.

WHATWG streams or what @bjouhier proposed (which seems close to simple-streams or pull-streams from creationix & dominictarr) are completely untested in production and will have all kinds of bugs.

Bugs in the streams interface will ripple through everything, including TCP and HTTP. This would devastate the stability of all existing servers in node.

The streams1 to streams2 migration went reasonably smoothly because it had full back compat, but that still introduced a bunch of edge cases.

That being said, count me in for a streams team 👍 !

@bjouhier

bjouhier commented Jan 4, 2015

@sonewman The principles that I am proposing also apply to text and binary streams. There are a few text parsers and formatters in ez-streams but we also used the approach to implement a binary TCP/IP protocol converter in our ERP product (and believe me, it was a pretty hairy one).

Separation of concerns applies here: it is easier to implement parsers and formatters with a simple callback API because you don't have to deal with backpressure explicitly.
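To illustrate the idea (a hypothetical sketch only; this is not the actual ez-streams API, and the wrapper and names here are invented):

```js
// Hypothetical callback-style reader, in the spirit of ez-streams
// (NOT the actual ez-streams API -- names here are illustrative).
// A transform written this way never handles backpressure explicitly:
// nothing is produced until a consumer asks for the next chunk.
function upperCaser(reader) {
  return {
    read(cb) {
      reader.read(function (err, chunk) {
        if (err || chunk === undefined) return cb(err, chunk); // undefined = end of stream
        cb(null, String(chunk).toUpperCase());
      });
    }
  };
}
```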

@Raynos I agree with you about backward compat. Given the complexity of today's streams it seems very risky to reimplement the current interface with a different approach. But streams API could also evolve with a side-by-side approach: keep the current implementation "as is" for backward compat and start fresh on the side.

Saying that simple-streams, pull-streams or ez-streams will have all kinds of bugs is an easy way to rule them out. FWIW we are using ez-streams in our ERP product, which was released commercially last June. I would not claim that it is bug-free, but my experience is that bugs have been very easy to fix so far because the API is very simple.

@mikeal

mikeal commented Jan 4, 2015

Any differences between io.js streams and WHATWG streams will have to be reconciled at some point. By that I do not mean that io.js streams will become WHATWG streams or vice versa, but that we need to iron out any irreconcilable differences while WHATWG streams are being standardized so that either io.js streams become a compatibility API between legacy streams and WHATWG stream or transpilers like browserify can solve this in the future.

The thing I'm most worried about is a situation similar to what we have with Buffer vs TypedArray.

@sonewman
Contributor

sonewman commented Jan 5, 2015

@mikeal I agree completely, that seems like a perfect direction for discussion and development.

@sonewman
Contributor

This would be a good thing to get consensus on: nodejs/node#288

@Fishrock123

If we are going to be moving towards the whatwg spec, but haven't reached that yet, it sounds unstable.

@sonewman
Contributor

@Fishrock123 I linked that issue because people have strong opinions about this, so I thought it best to bring it to wider attention.

The way I have interpreted the ideas with regards to WHATWG are that we should ensure we align internal behaviour with that of WHATWG (so there are no surprises) and discuss ideas around compatibility between the two types.

There has been very little discussion about this yet, but it's important that there is collaboration and understanding of implications from all angles.

@calvinmetcalf
Contributor

Part of what to figure out is how close to align with whatwg streams, off the top of my head some major differences

  • whatwg streams are constructors intended to be used to create an object, while node streams are objects meant to be subclassed (though you can subclass whatwg streams)
  • whatwg streams have no duplex stream object but use object literals with readable and writable members, as opposed to node, which has a duplex object with methods from both readable and writable streams
  • there are 2 types of piping based on error forwarding: pipeTo corresponds to the node pipe method, while pipeThrough forwards errors down the pipe chain, based on whether you are piping to a duplex stream or a writable stream
  • whatwg streams are promise based while node ones are event based
  • whatwg streams have a queue api that may be swapped out; the queue api has methods for customizing chunk size calculations and deciding whether a stream is over the high water mark, whereas node streams have 2 built-in queue strategies with no way to define more
  • whatwg streams have a concept of an exclusive reader, which gives an object exclusive access to a stream
  • node streams have 2 modes, a flowing mode and a non-flowing mode, the difference being based on whether the reader is exerting back pressure (edit: this is true for v2 streams)
  • node streams can't enqueue null values, as null is used to signify end of stream
  • node streams have a concept of 'corking' the stream, writing to it and then uncorking it
  • node streams put internal methods like push on the prototype, while whatwg streams pass the equivalent function as an argument
  • node streams seem to be built with binary data in mind first and object streams second, while whatwg streams seem to be built with object streams first
  • both have synchronous read methods, but whatwg streams throw when no data is available while node.js streams return null; node.js streams also take a size parameter to signify how much to read
  • both have async write methods, but the node.js one takes a callback and the whatwg one returns a promise

In many ways whatwg streams are more generic than node.js streams, and node streams could be built as subclasses of whatwg streams, but there would be compatibility issues with the similar-but-slightly-different read and write methods (mainly the read method; write could return a promise if no callback was provided).

Edit: corrected some errors
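To make the construction-style difference concrete, a rough sketch (the WHATWG half follows the spec drafts of the time, whose exact constructor shape was still in flux):

```js
// Node: subclass and implement _read, pushing from the prototype method.
var Readable = require('stream').Readable;
var util = require('util');

function Counter() {
  Readable.call(this);
  this.n = 0;
}
util.inherits(Counter, Readable);
Counter.prototype._read = function () {
  this.push(this.n < 3 ? String(this.n++) : null); // null doubles as EOF
};

// WHATWG: hand behavior to the constructor; enqueue/close are passed in.
var rs = new ReadableStream({
  start: function (enqueue, close) {
    ['0', '1', '2'].forEach(function (c) { enqueue(c); });
    close(); // EOF is a separate signal, not a sentinel value
  }
});
```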

@sonewman
Contributor

@calvinmetcalf this is a very useful, well-documented comparison. It is quite evident there are a lot of differences in API.

It would be quite good to decouple the enqueue strategy from node streams.

In addition, since we are taking a step back from the idea of providing promise-based APIs for other methods, is this something we would want to consider for io.js streams?

In particular, it would require adding a dependency on some ES6 promise polyfill library, which would be an additional load on Browserify. I also wonder what effect a promise-based approach would have on performance, since core infrastructural components of io.js currently rely on streams for internals (net, http, etc.).

Passing internal methods to the constructor of a stream could be a simple addition.

I think the fact that io.js streams have both flowing and non-flowing modes in readable-stream is a great advantage, as is cork and uncork for writables (although these lack real exposure given the lack of documentation, due to there being no v0.12 release). Could these features be implemented with WHATWG streams via a custom enqueue strategy?

We could potentially alias .pipe to .pipeTo. Implementing .pipeThrough error forwarding downstream would probably require some thought, though: there is a lot of logic in the current pipe which could be utilised, but the error-handler management in particular would need care.

Anyway as I said, that is some great information ☺️

@chrisdickinson
Contributor

Thanks for the detailed reply, @calvinmetcalf! I've reordered your points a bit:

  • whatwg streams are constructors intended to be used to create an object, while node streams are objects meant to be subclassed (though you can subclass whatwg streams)
  • node streams put internal methods like push on the prototype, while whatwg streams pass the equivalent function as an argument

I think this is an improvement on Node streams -- by limiting the number of ways to interact with the API, it becomes a lot easier to make sure users are oriented properly. Right now it's possible to push into a readable stream "out-of-band," or attach the _read method a bunch of different ways. It's hard to communicate to users what the "desired" way to use the API is. Optimization-wise, re-using the same function isn't doing us a lot of favors: it very quickly becomes megamorphic. Adopting a revealing-constructor-style pattern for Node streams would be a quick win, I think (we would retain the old ._read and push, but over time remove them from the docs / deprecate them). This would also let us address the next point...

  • node streams can't enqueue null values as they are used to signify end of stream
  • node streams seem to be built with binary data in mind first and object streams second, while whatwg streams seem to be built with object streams first.

By separating enqueue functionality from eof functionality, we can avoid having "out of alphabet" values like null. If we move towards a revealing constructor pattern, it would be natural to introduce this behavior there (while retaining the old, unfortunate behavior in the push method until it's fully deprecated.) In general I'd like to see buffers be the specialization instead of generic objects -- I think node has the priorities reversed at the moment: the buffering strategy is the only thing that should have to examine the items being passed through the stream -- the core streams API shouldn't concern itself with the values being queued.
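A hypothetical sketch of what that could look like layered over today's Readable (createReadable and its names are invented for illustration, not an existing or proposed API; push(null) still does the work underneath until it could be deprecated):

```js
// Hypothetical revealing-constructor wrapper over Node's Readable.
// EOF gets its own signal instead of overloading push(null).
var Readable = require('stream').Readable;

function createReadable(source, opts) {
  var r = new Readable(opts);
  var started = false;
  r._read = function () {
    if (started) return;
    started = true;
    source({
      enqueue: function (chunk) { r.push(chunk); }, // still push() underneath...
      close: function () { r.push(null); }          // ...but EOF is now an explicit call
    });
  };
  return r;
}

// usage
var r = createReadable(function (c) {
  c.enqueue('a');
  c.enqueue('b');
  c.close();
});
```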

whatwg streams have a queue api that may be swapped out. The queue api has methods for customizing chunk size calculations and whether a stream is over the high water mark, node streams have 2 built in queue strategies with no way to define more.

This is one of my favorite features of WHATWG streams. The buffering strategy that's baked into Node streams is one of the hardest things to reason about for the least amount of benefit for userland streams. It exists because transitioning the state machine between reading and paused (or writing and waiting) is expensive for the kinds of streams built into node. This isn't usually the case for userland streams. What ends up happening when this concept is layered (via pipe chains) is that there's a big disconnect between what the user expects to happen and what reads and writes actually happen underneath the hood. There's no reason the stream buffering behavior shouldn't be configurable in this fashion.
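Roughly what a user-supplied strategy looked like in the spec drafts of the time (byteStrategy is an illustrative name, and exactly where the strategy slotted in varied between drafts):

```js
// A swappable queuing strategy, per the early WHATWG drafts: chunk-size
// accounting and the backpressure threshold live in a small user-supplied
// object rather than in the stream core.
var byteStrategy = {
  size: function (chunk) { return chunk.byteLength; },
  shouldApplyBackpressure: function (queueSize) { return queueSize > 16384; }
};

var rs = new ReadableStream({
  start: function (enqueue, close) { /* underlying source */ },
  strategy: byteStrategy
});
```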

  • node streams have a concept of 'corking' the stream, writing to it and then uncorking it

This can be roughly emulated using a writableBufferingStrategy, if I understand it correctly.
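For reference, the node-side feature being emulated (cork/uncork landed with streams3; the no-op sink here is a stand-in):

```js
// Node's cork/uncork: batch several small writes so they flush together.
var Writable = require('stream').Writable;
var w = new Writable();
w._write = function (chunk, enc, cb) { cb(); }; // stand-in sink

w.cork();
w.write('header ');
w.write('body ');
w.write('footer');
process.nextTick(function () { w.uncork(); }); // buffered writes flush here
```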

  • whatwg streams have no duplex stream object but use object literals with readable and writable members, as opposed to node, which has a duplex object with methods from readable and writable streams

Another good thing, I think. It reduces the number of "true streams" to just Readable and Writable -- Transforms and Duplexes are just objects that have-a readable and writable side. Diagnosing problems with Node's Duplex and Transform is a dizzying experience because they inherit from both Readable and Writable.

  • there are 2 types of piping based on error forwarding: pipeTo corresponds to the node pipe method, while pipeThrough forwards errors down the pipe chain.

That doesn't match my understanding of pipeTo vs. pipeThrough. pipeTo corresponds to the node pipe method but only applies to Readable -> Writable pipes. pipeThrough is just sugar for readable.pipeTo(duplex.writable); return duplex.readable;. The error handling is provided by returning a Promise from pipeTo, which pipeThrough discards at present -- I don't think errors are forwarded through pipelines.
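In code, that reading amounts to:

```js
// pipeThrough as sugar over pipeTo, per the reading above.
// The promise returned by pipeTo is discarded at present.
function pipeThrough(src, pair, options) {
  src.pipeTo(pair.writable, options);
  return pair.readable;
}
```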

  • node streams have 2 modes, a flowing mode and a non flowing mode, the difference based on whether the reader is exerting back pressure.

In streams2 the modes were way different -- to the point of monkeypatching the stream instance. As of streams3, the modes are unified, though they're a tri-state: paused, flowing, or "uninitialized" -- the "uninitialized" state standing in so that we can implicitly start the stream .on("readable" | "data").
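The tri-state can be observed by peeking at the internal flag (internal state, shown purely for illustration):

```js
// streams3 tri-state, observed via the internal flowing flag.
var Readable = require('stream').Readable;
var r = new Readable();
r._read = function () {};

console.log(r._readableState.flowing); // null  -- "uninitialized"
r.on('data', function () {});          // attaching 'data' implicitly starts the stream
console.log(r._readableState.flowing); // true  -- flowing
r.pause();
console.log(r._readableState.flowing); // false -- paused
```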

  • whatwg streams have a concept of an exclusive reader which gives an object exclusive access to a stream
  • both have synchronous read methods but whatwg streams throw when no data is available while node.js streams return null, node.js streams also take a size parameter to signify how much to read.

I think it's a little unfortunate that streams expose .read, .write, and .end publicly. It makes for some weirdness -- if a stream is part of an existing pipeline, and a user also reads from it, what should happen? What if the stream is currently paused, or otherwise waiting for a downstream drain? The flipside is also odd: by exposing .write, backpressure from the write side can only ever be advisory, since the user can call .write whenever they like. The exclusive reader is interesting; it sets WHATWG streams up for some optimizations (e.g., all of the streams could agree that they can communicate off-main-thread), but I'm a little sad that the public APIs still exist for this reason.

Re: the size parameter in read -- I think that's ultimately more trouble than it's worth (due to having to slice or copy or concat buffers to fulfill the requested size). Repeated reads are probably better than single reads with buffer copies.

  • both have async write methods but the node.js one takes a callback and the whatwg one returns a promise
  • whatwg streams are promise based while node ones are event based [emphasis mine -cd]

I think this is going to be the biggest issue for interop: the underlying communication mechanism is different. Node streams probably can't abandon events without breaking all sorts of things, so we'll have to tread very carefully when approaching this problem.

@domenic

domenic commented Jan 12, 2015

there are 2 types of piping based on error forwarding: pipeTo corresponds to the node pipe method, while pipeThrough forwards errors down the pipe chain.

This isn't quite accurate. pipeTo does error forwarding as well. pipeThrough(x) is literally just pipeTo(x.writable); return x.readable.

node streams have a concept of 'corking' the stream, writing to it and then uncorking it

This is still a desired feature that'll definitely make it.

node streams seem to be built with binary data in mind first and object streams second, while whatwg streams seem to be built with object streams first.

The current thinking for binary data is captured in https://github.com/whatwg/streams/blob/master/BinaryExtension.md

@domenic

domenic commented Jan 12, 2015

if a stream is part of an existing pipeline, and a user also reads from it, what should happen?

This is answered by using the exclusive reader to get a lock during piping. Relatively recent innovation.

Re: the size parameter in read -- I think that's ultimately more trouble than it's worth (due to having to slice or copy or concat buffers to fulfill the requested size).

See the readInto for the BinaryReadableStream

@sonewman
Contributor

Thanks @chrisdickinson & @domenic for clearing up some of that (my knowledge of WHATWG is a little hairy).

I agree for the most part with your observations. Particularly things like: when calling read, you can never have two listeners listening to the readable event and get the same data. Also, in comparison to the data event, readable is emitted asynchronously and so may cover more than one chunk.

As you say, the read(n) functionality is a little awkward, and could probably be determined in the inner queuing API; however, the current API does allow the consumer of the stream to dictate this outcome rather than it being predetermined.
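For reference, the canonical streams3 consumption pattern that behavior implies:

```js
// 'readable' fires asynchronously and may cover several buffered chunks,
// so the consumer drains the queue with a read() loop.
var Readable = require('stream').Readable;
var readable = new Readable();
readable._read = function () {};
readable.push('a');
readable.push('b');
readable.push(null);

readable.on('readable', function () {
  var chunk;
  while ((chunk = readable.read()) !== null) {
    console.log('got %d bytes', chunk.length); // both chunks may arrive in one event
  }
});
```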

You are correct though, I think there are challenges ahead, but I have every bit of faith that the outcome will be awesome 😸

@calvinmetcalf
Contributor

Updated my previous comment to correct some errors.

I don't think events vs promises are going to be as major an issue, as you can make a stream which fulfills both APIs fairly easily.

I think the main issue that would prevent an ambidextrous stream is handling of reading from an empty queue. If I have time in the next couple of days I might try to figure it out.
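A minimal sketch of the write half of such an ambidextrous stream, using a hypothetical helper named writeDual (note that node historically surfaced write errors via the 'error' event rather than the flush callback, which is part of what a real shim would need to figure out):

```js
// Hypothetical dual-mode write: callback if given, promise otherwise.
function writeDual(stream, chunk, cb) {
  if (typeof cb === 'function') {
    stream.write(chunk, cb);
    return;
  }
  return new Promise(function (resolve, reject) {
    stream.write(chunk, function (err) {
      if (err) reject(err);
      else resolve();
    });
  });
}
```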


@chrisdickinson
Contributor

@domenic

This is answered by using the exclusive reader to get a lock during piping. Relatively recent innovation.

I've been watching the exclusive reader api with interest -- great work landing it. I still have the vague feeling that exposing read / write / end publicly makes streams a TIMTOWTDI API -- that the aforementioned methods could be exposed by a subclass that explicitly provides them at either end of a stream, instead of being available on all Readables/Writables. That said, it's a vague feeling, so feel free to dismiss :)

See the readInto for the BinaryReadableStream

This looks great; I'm a big fan of the way this keeps these concerns separate from the core streams implementation.

@sonewman
Contributor

if a stream is part of an existing pipeline, and a user also reads from it, what should happen?

This is answered by using the exclusive reader to get a lock during piping. Relatively recent innovation.

@domenic I am very intrigued by this, are you referring to: whatwg/streams/...experimental/readable-byte-stream.js?

See the readInto for the BinaryReadableStream

This looks great; I'm a big fan of the way this keeps these concerns separate from the core streams implementation.

@chrisdickinson I am very much in agreement with you on this.

@calvinmetcalf
Contributor

So, looking through the code: the use of the exclusive stream reader for piping is going to cause compatibility issues, as piping to 2 different streams is totally valid in node (I want to say it pipes at the speed of the slowest stream, but I'm not positive) but an error in whatwg.

@domenic

domenic commented Jan 13, 2015

@sonewman not quite, check out:

@calvinmetcalf agreed, although I have heard it is "buggy" in Node. (As in, a few times I have heard of people trying that, and getting unexpected results even for common cases.) But since pipeTo has (very carefully!) been designed to only use public APIs, it can always be re-implemented to not use readers, and to not propagate errors, and anything else.

@chrisdickinson

I still have the vague feeling that exposing read / write / end publicly makes streams a TIMTOWTDI API

Although I can appreciate general uneasiness about exposing stream "internals" in this way, at least on the TIMTOWTDI front, pipeTo is just sugar over the more basic operations, so there's really only one way to do it plus a helper function for a common case.

@calvinmetcalf
Contributor

@domenic agreed, it would be pretty easy to write a pipe method that acted more like node's (didn't care about target, allowed multiple, etc.).

@max-mapper
Contributor

I pipe multiple readers into one writer all the time. You have to use the end: false option or else the first one that ends will end the writable. This puts the burden on you to keep track of the end state of your readables and then end your writable when you want it to end, but it's not that bad.
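Sketched out, with sources and dest as stand-in names:

```js
// N readables into one writable: { end: false } keeps early 'end's from
// closing dest; we end it ourselves once every source has finished.
var pending = sources.length;
sources.forEach(function (src) {
  src.pipe(dest, { end: false });
  src.on('end', function () {
    if (--pending === 0) dest.end();
  });
});
```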

@domenic

domenic commented Jan 13, 2015

Yeah the case we're talking about is a single readable stream into multiple writable streams.

@calvinmetcalf
Contributor

I do have some code that does that: I pipe data into a database, but I also pipe it into analytics. It might make sense for me to just file an issue about this and discuss it over at whatwg.
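That is, the fan-out case under discussion (stand-in names for this setup):

```js
// One readable piped to two writables -- valid in node; the source then
// flows at the pace of the slowest destination.
source.pipe(dbWriter);
source.pipe(analyticsWriter);
```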

@chrisdickinson
Contributor

@domenic re:

pipeTo is just sugar over the more basic operations

Interesting – I've come at it from the opposite angle. I probably got this from min-streams, where the equivalents of read, write, and end are internal operations in service of the larger pipeline.
Toh-may-to, toh-mah-to, I suppose. 🍅


I'm excited to get started! In the interest of forward motion, here are some smaller actions we can take to get the ball rolling:

  • Backport iojs's lib/_{{read,writ}able,duplex,transform}_stream.js to this repository.
  • PR a script similar to tools/upgrade-npm.sh into iojs.
  • Start cleaning up / documenting internal stream API usage in iojs core modules. For example, net.Socket directly manipulates the flowing flag. This will likely make future changes we make more difficult than they need to be!

@sonewman
Contributor

@domenic Thanks for the links 👍

@calvinmetcalf That is interesting, because if one writer applies back pressure, would that apply the same buffering to all the streams being written to?

@chrisdickinson they seem like reasonable preliminary tasks to me. Excitement is what we need! 💥

@calvinmetcalf
Contributor

@sonewman yes, it pipes at the rate of the slowest reader.

@sonewman
Contributor

@calvinmetcalf Sorry, I think I missed that part in one of the above messages.
That's probably the only nice solution to that problem, since you would otherwise need separate internal buffer arrays for each listening writer, which would certainly increase complexity.

@chrisdickinson
Contributor

Here's the PR backporting iojs streams. Please review! Thanks all. I think it might be a bit trickier than expected to start flowing changes from readable-stream into iojs core -- we may want to consider adopting a preprocessing step of some sort so we can conditionally select what dependencies we're bringing in.

@chrisdickinson
Contributor

Hey all! As an update, the following people have been added to the io.js streams team:

@iojs/streams: Thanks for your time! I will have an issue up early this coming week detailing what's coming up and we can start to divvy tasks up / talk problems out / schedule meetings if necessary/desired from there.

If @rvagg mentioned you and you did not get an invite, please email me and I'd be happy to add you.

To folks interested in joining this effort: the best way to do so is to contribute to this repository, iojs/io.js, or joyent/node with a focus on solving specific issues with streams. Contributions need not necessarily be code: helping debug streams issues on the platform repos is super helpful as well. If you are interested in contributing, please reach out to @rvagg or me and we would be happy to point you in the right direction.

@chrisdickinson
Contributor

Updated: added @isaacs to the team.

@sonewman
Contributor

@chrisdickinson ⬆️ 👍
