Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stream: introduce Body #39483

Closed
wants to merge 6 commits into from
Closed

stream: introduce Body #39483

wants to merge 6 commits into from

Conversation

ronag
Copy link
Member

@ronag ronag commented Jul 22, 2021

This introduces a new stream primitive called Body which helps
with performance, ergonomics and compatibility when working
with different types of data producers and consumers.

Using Body it will be possible to delay converting streamlike
objects as long as possible and enable some optimizations where
we can avoid e.g. intermediate node streams.

The current implementation does not yet take advantage of possible optimizations. However, I think that is something we can add and improve on in the future.

@ronag ronag added the stream Issues and PRs related to the stream subsystem. label Jul 22, 2021
@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Jul 22, 2021
@ronag
Copy link
Member Author

ronag commented Jul 22, 2021

@nodejs/streams

@ronag
Copy link
Member Author

ronag commented Jul 22, 2021

This needs more work on docs and tests but the basics should be there. Would appreciate some general feedback on the concept.

@ronag
Copy link
Member Author

ronag commented Jul 22, 2021

This also helps the ecosystem with consuming streamlike objects, e.g:

async function myProduceApi() {
  return compose(myReadableOfChoice())
}
async function myConsumeApi(body) {
  await pipeline(body, myWritableOfChoice())
}
function myTransformApi(body) {
  return compose(body, myTransformerOfChoice())
}

e.g. in undici we currently need to have quite complicated and brittle code in order to support as many different input types as possible, this PR would allow the ecosystem to unify this.

@ronag ronag added wip Issues and PRs that are still a work in progress. and removed lib / src Issues and PRs related to general changes in the lib or src directory. labels Jul 22, 2021
@ronag ronag force-pushed the stream-body branch 2 times, most recently from 39a70c4 to d4c3a1f Compare July 22, 2021 08:42
@ronag
Copy link
Member Author

ronag commented Jul 22, 2021

FileHandle already implements readableWebStream and could easily implement the rest of this API using a mixin.

@ronag
Copy link
Member Author

ronag commented Jul 22, 2021

@jasnell you might want to consider this for quic and/or the new http APi's.

@mcollina
Copy link
Member

Did compose already ship into a release? If not, it would be better to avoid shipping it or flagging it as experimental, otherwise we will need to go through a deprecation cycle.

@ronag
Copy link
Member Author

ronag commented Jul 22, 2021

Did compose already ship into a release? If not, it would be better to avoid shipping it or flagging it as experimental, otherwise we will need to go through a deprecation cycle.

It hasn't shipped. Right now it can only ship in v17 as it depends on semver major.

@ronag ronag force-pushed the stream-body branch 3 times, most recently from 39027d9 to babe5ec Compare July 22, 2021 11:32
This introduce a new stream primitive called Body which helps
with performance, ergonomics and compatibility when working
with different types of data producers and consumers.

Using Body it will be possible to delay converting streamlike
objects as long as possible and enable some optimizations where
we can avoid e.g. intermediate node streams.
@ronag

This comment has been minimized.

doc/api/stream.md Outdated Show resolved Hide resolved
@ronag ronag added request-ci Add this label to start a Jenkins CI on a PR. and removed needs-ci PRs that need a full CI run. wip Issues and PRs that are still a work in progress. labels Jul 22, 2021
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Jul 22, 2021
@nodejs-github-bot
Copy link
Collaborator

added: REPLACEME
-->

* Returns: {Readable}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Returns: {Readable}
* Returns: {Writable}

@@ -1722,6 +1722,141 @@ const cleanup = finished(rs, (err) => {
// ...
});
```
#### Class: `stream.Body`
Copy link
Member

@jasnell jasnell Jul 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it makes things a bit more complicated, I would generally prefer to separate out the Body mixin pieces here from the Node.js specific additional APIs. Or, if not that, let's not call this Body to avoid confusion. stream.Consumers perhaps?

Further, I think an API with static methods similar to would be nicer here... it gives us more options for wrapping the methods in various API specific ways.

const ab = await stream.consumers.arrayBuffer(readable);

// ...

const ab = await stream.consumers.blob(stream.compose(async function()* { yield 'hello'; }));

I really don't want to introduce yet another top level data encapsulation object given how many we already have.


Returns a promise that fulfills with an {Buffer} containing a copy of the body data.

### `body.nodeStream()`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we already have the node-to-web adapters, and your new compose function hopefully landing soon, I don't think we need these. Let's keep this focused on the accumulator funtions.

@mcollina
Copy link
Member

It might make sense to mark both stream.Body and stream.compose as experimental for at least one release?

Let's do it.

Copy link
Member

@jasnell jasnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make it explicit. I'm -1 on introducing a new Body object. This should provide utilities for making it possible to implement the Body mixin pattern but having it as a separate new kind of object is counter productive.

@ronag
Copy link
Member Author

ronag commented Jul 22, 2021

@jasnell The primary problem this tries to resolve is to convert to a specific consumer as late as possible as to avoid intermediate glue streams. The problem with e.g. compose is that it always returns an intermediate Duplex even though it might be consumed later as something else which in most cases will result in 2 unnecessary glue instances, one introduced by compose and one by the consumer.

consider e.g.

const composed = compose(createMyWebStream(), ...transforms, createMyWebTransformStream())
myWebApi(composed.toWeb())

Which ends up with:

TransformStream => Duplex (glue) => ReadableStream (glue) => consumer

If we instead had a "placeholder" e.g. Body this could become:

TransformStream => consumer

If we want to achieve this in terms of performance then IMO we do need another top level data structure (sorry).

Even without compose this becomes the problem as you often have to decide early in what form you receive an data stream, even if that form might not fit the final consumer, e.g. FileHandle could implement a native variant of each (node stream, web stream, async generator etc.) providing maximum performance without glue regardless of what way it's consumed.

In a way I guess this is just extending the BodyMixin?

The other ergonomic parts of this PR I would like but I don't feel so strongly about it.

@jasnell
Copy link
Member

jasnell commented Jul 22, 2021

In a way I guess this is just extending the BodyMixin?

Except the Body mixin is not something that exists independently, and the existence of Body can make things more difficult when adapting to standard APIs. For instance, for fetch response on top of QUIC, this Body object doesn't actually help anything. Yes, we need the accumulator functions like arrayBuffer() and text() but those are easily implemented directly on top of the QUIC stream.readableWebStream() API with very good performance... e.g.

async function text() {
  let text = '';
  for await (const chunk of stream.readableWebStream().pipeThrough(new TextDecoderStream()))
    text += chunk;
  return chunk;
}

Having the additional intermediate Body class doesn't really help as it would just be an adapter for an adapter (both the readableWebStream() and readableNodeStream() methods on the QUIC Stream object just return adapters for the underlying native level data buffer. Body would just be an additional allocation

What does make sense to me is providing the underlying accumulator functions so they can be used in multiple places...

For instance, in the QUIC stream class I could provide methods...

const {
  text,
  json,
  arrayBuffer,
} = require('stream/accumulators');
// ..

class Stream {
  // ...
  async function text() {
    return text(this.readableWebStream());
  }

  async function json() {
    return json(this.readableWebStream());
  }

  async function arrayBuffer() {
    return arrayBuffer(this.readableWebStream());
  }
}

Or, perhaps using the mixin model... something like...

const {
  kGetReadableWebStream,
  mixinBody,
} = require('stream/accumulators');
// ..

class Stream {
  get [kGetReadableWebStream]() {
    return this.readableWebStream();
  }
}

mixinBody(Stream.prototype);

// Such that mixinBody adds the common set of Body mixin methods to the `Stream` prototype.

@ronag
Copy link
Member Author

ronag commented Jul 23, 2021

In a way I guess this is just extending the BodyMixin?

Except the Body mixin is not something that exists independently, and the existence of Body can make things more difficult when adapting to standard APIs. For instance, for fetch response on top of QUIC, this Body object doesn't actually help anything. Yes, we need the accumulator functions like arrayBuffer() and text() but those are easily implemented directly on top of the QUIC stream.readableWebStream() API with very good performance... e.g.

async function text() {
  let text = '';
  for await (const chunk of stream.readableWebStream().pipeThrough(new TextDecoderStream()))
    text += chunk;
  return chunk;
}

You still need to run through a lot of glue here. Just writing directly to a buffer or string would still be faster than quic -> webstream -> transform stream -> text decoder -> async generator -> promise. But you can choose to implement text() directly on the quic stream however you wish so these details are not that relevant to this PR.

e.g. in undici we could totally skip all glue and do:

class Body {
  [kPush](chunk) {
    if (this[kType] === CONSUME_STRING_TYPE) {
      if (chunk) {
        this[kString] += chunk
      } else {
        this[kResolve]()
      }
    }
  }

  text () {
    if (this[kType]) throw new TypeError('disturbed')
    this[kType] = CONSUME_STRING_TYPE
    return new Promise((resolve, reject) => {
      this[kResolve] = resolve
      this[kReject] = reject
    })
  }
}

Please see https://github.com/nodejs/undici/pull/898/files for a complete example.

Having the additional intermediate Body class doesn't really help as it would just be an adapter for an adapter (both the readableWebStream() and readableNodeStream() methods on the QUIC Stream object just return adapters for the underlying native level data buffer. Body would just be an additional allocation

What does make sense to me is providing the underlying accumulator functions so they can be used in multiple places...

For instance, in the QUIC stream class I could provide methods...

// Such that mixinBody adds the common set of Body mixin methods to the `Stream` prototype.

That misses the point that we should try to avoid consuming streams multiple times. But if we skip that concern then yes, guess that is an option for the stream + Body part. Let's not get to stuck on this part right now as it's secondary. What about the disturbed part of body?

Again the primary point is to avoid glue when consuming data streams, i.e. a top level "lazy stream type" data structure.

@ronag
Copy link
Member Author

ronag commented Jul 23, 2021

So there are 2 parts of this, i.e. we usually have two glue steps every time we consume data, e.g.

undici native -> node streams (1) -> web streams (2) -> consumer

  1. Choosing a "native" API when implementing a data source. This is the primary problem with e.g. quic, undici etc.
  2. Transforming into the API expected by the consumer. This is the primary problem with compose.

1, Is about implementing some kind of body mixin where the different ways of consuming can be implemented without glue.
2, Is about implementing some kind of "lazy" intermediate stream representation.

So basically what we want is:

undici native -> lazy -> web streams which collapses to undici native -> webstreams -> consumer.

@ronag ronag mentioned this pull request Jul 23, 2021
@ronag
Copy link
Member Author

ronag commented Jul 24, 2021

@nodejs/streams I think we need more opinions and feedback here to get further in discussion.

@benjamingr
Copy link
Member

I've read the thread and see the arguments but I feel like in order to hold an informed opinion I'd need to spend ±10-20 good hours on this. This is why I've also been avoiding most of the compose discussions - I see the merit but I feel like coming in with an uninformed opinion because I don't have the capacity to be informed would just hinder the people contributing.

If you'd like I'm happy to take a look in ~a month when hopefully Daniel (the 👶) sleeps through the night, I'm done with the move and I'm less as new at Microsoft :)

@ronag
Copy link
Member Author

ronag commented Jul 27, 2021

Let's wait for what happens with #39520 and whether or not that can be used or affect this PR. I'll wait with pushing for this PR until then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stream Issues and PRs related to the stream subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants