
ReadableStream.prototype.arrayBuffer() #1019

Open
ricea opened this issue Oct 15, 2019 · 34 comments

@ricea
Collaborator

ricea commented Oct 15, 2019

It would be good to have a convenience method to concatenate the contents of a ReadableStream into a (promise for a) Uint8Array.

It is unclear what should happen if the stream contains chunks that are not ArrayBuffers or ArrayBufferViews.

Other similar methods we might consider are .blob() and .text().

@domenic
Member

domenic commented Oct 15, 2019

A more generic method might be stream.toArray() which returns a promise for an array of chunks. (If https://github.com/tc39/proposal-iterator-helpers lands this would basically become an alias for stream.values().toArray(), although potentially easier to optimize since it's more direct.)

If we had that, then we'd just need a way of turning an array of Uint8Arrays into a single Uint8Array. Surprisingly I can't find any easy way to do this, especially not performantly. So, maybe that is not the right path.
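For the record, it can be done by hand at the cost of one copy per chunk; a minimal sketch (concatUint8Arrays is not an existing API, just a made-up helper name):

```javascript
// Concatenate an array of Uint8Arrays into a single Uint8Array.
// Two passes: one to sum the lengths, one to copy each chunk once.
function concatUint8Arrays(chunks) {
  let total = 0;
  for (const chunk of chunks) total += chunk.length;
  const result = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    result.set(chunk, offset);
    offset += chunk.length;
  }
  return result;
}
```

Every byte still gets copied out of its original chunk, which is exactly the allocation cost a built-in method could avoid.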

@annevk
Member

annevk commented Dec 13, 2019

Probably also want to consider an equivalent API where the buffer is already allocated (similar to encodeInto(), BYOB).

@MattiasBuelens
Collaborator

Probably also want to consider an equivalent API where the buffer is already allocated (similar to encodeInto(), BYOB).

I like the idea of a method that fills up an entire ArrayBuffer (or ArrayBufferView). I think it would work better as a reader method, for example ReadableBYOBReader.readFully(view) (taken from Java's DataInput.readFully()). Optionally, we could also add a shorthand method on ReadableStream, similar to how ReadableStream.cancel() is a shorthand for ReadableStreamDefaultReader.cancel(). The readInto() function in this example seems like a good starting point.

It would be different from ReadableStream.arrayBuffer() and/or ReadableStream.toArray() though. Those two methods would always read the entire contents of the stream, whereas a readFully() method may leave some bytes on the stream if the given buffer is not large enough to fit the entire contents.

@jakearchibald

How about:

  • array() - Array of chunks
  • blob(BlobPropertyBag) - Blob. Rejects and cancels stream if any chunks are not BlobPart.
  • arrayBuffer({ into }) - Rejects and cancels stream if any chunks are not buffer sources, or if into is provided and isn't big enough to hold the data. Resolves with an arraybuffer, which will be a slice of into if it's provided.

Maybe we can look at text() later, but (await stream.array()).join('') doesn't seem too bad.

@jimmywarting

jimmywarting commented May 24, 2021

The new Response(blob).body hack was the reason why we got blob.arrayBuffer(), blob.text() & blob.stream()
You can use the same hack here ... new Response(stream).blob()
Could you imagine what the fetch would look like instead if we had done response.body.blob()? then we would probably not have response.blob() in the first place

It would be nice to have all of array/blob/text/arrayBuffer. Hopefully Node streams would follow suit; then you could use express.js to do stuff like

app.post('/route', async (req, res) => {
  const buffer = await req.arrayBuffer()
  // ...
})

Rejects and cancels stream if any chunks are not BlobPart.

Not sure what you mean by this

The Blob constructor can take any kind of object you throw at it. The default fallback behavior is to cast anything it can't handle into a string...
If you do new Blob([{}]) then it will dump "[object Object]" as the content. Maybe this should do the same thing.

@jakearchibald

jakearchibald commented May 24, 2021

Could you imagine what the fetch would look like instead if we had done response.body.blob()? then we would probably not have response.blob() in the first place

That would have delayed shipping fetch and therefore service workers for many years. Also, the methods on response are still useful, because:

  • Space can be pre-allocated using the content-length
  • The mime type on Blob can be filled in from the request/response content-type

The Blob constructor can take any kind of object you throw at it. The default fallback behavior is to cast anything it can't handle into a string...
If you do new Blob([{}]) then it will dump "[object Object]" as the content. Maybe this should do the same thing.

Yeah, agreed. My intention was: Chunks can be anything that can appear in the sequence used in the blob constructor. As you say, almost everything is acceptable, with a few exceptions like detached buffers.

@taralx

taralx commented May 30, 2021

FWIW, +1 to a readFully method on BYOB readers. Maybe a new issue for it?

@ricea
Collaborator Author

ricea commented Jul 12, 2021

An interesting use of .blob() that I hadn't thought of is to efficiently handle streams that would use excessive amounts of memory to load completely, and so would ideally be stored on disk. The platform doesn't provide a primitive to stream data into a Blob, but it appears implementations do have this as part of their implementation of the blob() method on Response.

See https://stackoverflow.com/questions/67362823/progress-for-a-fetch-blob-javascript/67364010#67364010 for an example we could make much cleaner.

@o-t-w

o-t-w commented Aug 19, 2021

Looks like Node has just implemented this for arrayBuffer, blob, json, and text, but the API is a bit different.
https://nodejs.org/api/webstreams.html#webstreams_utility_consumers

@wffurr

wffurr commented Sep 20, 2023

Any chance of this ever landing? It's amazing to me that there's no "stream.collect" equivalent.

@jimmywarting

Just figured this could be part of something like iterator helpers.
Since streams are iterable, this should be essentially free, and it would then also work for Node streams and for any kind of iterator that could yield any form of ArrayBufferView.

@bakkot
Contributor

bakkot commented Oct 27, 2023

just figured if this could be part of something like iterator helpers

As mentioned above, iterator helpers have a .toArray method, and async iterator helpers will have the same. The helper will let you do await stream.values().toArray(), which is nicer than what you currently do - but getting all the bytes into a single buffer after is still going to be awkward, and even with a helper like this you'd still have a choice between either copying every chunk after reading all of them or giving up locality. This use case really wants a more specific "put all the bytes from a readable byte stream into a single buffer" helper.

And iterator helpers are intended as fully general utilities, so they're probably not the right place to have a method specifically for producers which yield chunks of bytes. That's a pretty uncommon thing to do with iterators anyway; the ReadableStream machinery is much better suited to handle such producers efficiently.

@datner

datner commented Nov 9, 2023

I believe that if native streaming instruments had a better API they would be used more, which should not be a controversial opinion.
Even if we are collectively 'stuck' on this issue, letting it rot will not move this forward. Streaming is a great concept, and is becoming more and more relevant in the browser.
Do we not think that this should be given another inspection?

@jasnell

jasnell commented Mar 23, 2024

Related: #1238

@Jarred-Sumner

Jarred-Sumner commented Mar 24, 2024

We would like to implement this in Bun, along with ReadableStream.prototype.text, ReadableStream.prototype.blob, and ReadableStream.prototype.json.

We also would be interested in implementing ReadableStream.prototype.array and ReadableStream.prototype.formData but feel less strongly about those. formData() could accept an optional Content-Type argument: if provided, parse as a multipart form; otherwise parse as a URL-encoded form, rejecting the promise (and canceling the stream) if invalid data is passed.

@bakkot
Contributor

bakkot commented Mar 24, 2024

Can I suggest instead having a .bytes() method which gives you a Uint8Array? See w3ctag/design-principles#463, whatwg/fetch#1724 (comment), etc. My proposal for base64 support in JS also operates on Uint8Arrays.

@jasnell

jasnell commented Mar 24, 2024

I'm happy with this approach also (in contrast to what I suggest in #1238).

A key challenge I have with having these methods directly on ReadableStream is that they are not useful with ReadableStreams that are not byte-oriented. Putting these on a Reader instead makes it a bit nicer to limit them to just byte-oriented streams, similar to how ReadableStreamBYOBReader does. If we do add these directly to ReadableStream.prototype then they should only work on byte-oriented streams.

ReadableStream.prototype.array() really isn't necessary given that Array.fromAsync(...) (https://github.com/tc39/proposal-array-from-async) is coming.

const chunks = await Array.fromAsync(readable);

ReadableStream.prototype.text() is a bit tricky if we can't specify the text encoding. With response.text() we can use the content-type to identify the encoding, and at that point there's a bit of overlap with TextDecoderStream that we'd need to reconcile.

Honestly, the more I think about it, the more I'd really prefer the ...fromAsync(...) pattern for multiple things...

const u8 = await Uint8Array.fromAsync(iterable);  // iterable must provide BufferSources
const str = await String.fromAsync(iterable);  // iterable must provide stringifiable chunks
const val = await JSON.fromAsync(iterable);  // iterable must provide stringifiable chunks
// etc

For text encoding, the pattern is pretty natural:

const str = await String.fromAsync(readable.pipeThrough(new TextDecoderStream("...")));
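String.fromAsync does not exist today, but the shape of that pattern can be approximated with shipped primitives; a rough stand-in (joining the decoded chunks takes the place of the hypothetical String.fromAsync):

```javascript
// Approximation of the String.fromAsync(...) idea using primitives
// that exist today: pipe the byte stream through a UTF-8 decoder
// transform and join the resulting string chunks.
async function stringFromStream(readable) {
  let out = '';
  for await (const part of readable.pipeThrough(new TextDecoderStream())) {
    out += part;
  }
  return out;
}
```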

@Jarred-Sumner

If we do add these directly to ReadableStream.prototype then they should only work on byte-oriented streams.

If the streams are passing strings that should be fine too for json and text, right? Particularly if they're internally buffering rather than returning the next chunk of valid data

Honestly, the more I think about it, the more I'd really prefer the ...fromAsync(...) pattern for multiple things...

From an implementer perspective, it sounds nice.

From a JS developer perspective, I have trouble imagining developers prefer this over .text(), .bytes(), .json().

My gut here is that most JS developers would prefer the idea of both, but probably 98% of all usage would be one of .text(), .json(), .arrayBuffer(), or .bytes() rather than the Constructor.fromAsync(iterable) variant.

People are used to fetch-like .text() instead of fromAsync and fromAsync is more complicated to explain.

This chain of reasoning is more complicated:

  1. it's a ReadableStream
  2. therefore it's an async iterable
  3. therefore to get the value as a String you call String.fromAsync

Compared to:

  1. call .text() on the ReadableStream

if we can't specify the text encoding

Isn't there precedent for defaulting to UTF-8? Or is that Windows-1252?

For text encoding, the pattern is pretty natural:
const str = await String.fromAsync(readable.pipeThrough(new TextDecoderStream("...")));

I think this pattern composes well, but would only feel natural to developers who understand a lot of streams and spec authors. It would feel complicated for less experienced developers

@bakkot
Contributor

bakkot commented Mar 25, 2024

I would guess that something like Uint8Array.fromAsync would be unlikely to be optimized well: that is, it would probably end up actually allocating and reifying each chunk of bytes as a JS object. It's possible to optimize that away under some circumstances, but it would require special cases for each producer/consumer pair, and there's a lot of fiddly details which would need to get implemented. Whereas something like ReadableStream.prototype.bytes() could trivially avoid those intermediate allocations.

@annevk
Member

annevk commented Mar 25, 2024

There is precedent for enforcing UTF-8 for text() and I'd object to doing anything else here. Folks can write their own wrapper if they can't be bothered to fix the legacy content.

@ricea
Collaborator Author

ricea commented Mar 25, 2024

It looks like we have a plan here, and we just need someone to do the work. I don't have time to commit to this in the near future, but if someone has time to draft a PR, I'd be happy to look at it.

I believe we have rough consensus on the following issues:

  1. Chunks should be of the same types as accepted by the Blob constructor. As an implementer, I'm a little unhappy with implementing Blob chunks because reading it is async and will make my life hard, but I can live with it. I also have some ergonomics concerns that accepting USVString will result in a lot of [object Object] in the output, but people have been living with that with Blob and it isn't the end of the world.
  2. We should only handle UTF-8 for input and output. TextDecoderStream already exists for people who need more encoding support.
  3. text(), arrayBuffer() and blob() seem uncontroversial. json() is easy to do if we spec it as parsing at the end like fetch does but maybe there isn't much demand? Other methods seem to require further discussion.
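As a rough illustration of points 2 and 3, a UTF-8-only text() consumer could look like this sketch (streamToText is a hypothetical standalone helper, assuming the stream yields Uint8Array chunks):

```javascript
// Sketch of a UTF-8-only text() consumer. TextDecoder in streaming
// mode correctly handles multi-byte sequences split across chunks.
async function streamToText(stream) {
  const decoder = new TextDecoder('utf-8');
  let text = '';
  for await (const chunk of stream) {
    text += decoder.decode(chunk, { stream: true });
  }
  return text + decoder.decode(); // flush any trailing partial sequence
}
```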

@annevk
Member

annevk commented Mar 25, 2024

I think it's worth considering doing bytes() (Uint8Array) instead of arrayBuffer() as @bakkot suggested. We should probably do bytes() on the other objects that offer such methods.

@ricea
Collaborator Author

ricea commented Mar 25, 2024

I like bytes() but I thought it might be controversial.

@Jarred-Sumner

bytes() seems nice. I think it’s unusual for developers to want to create an ArrayBuffer; they’re almost always going to either pass it somewhere else or create a Uint8Array wrapping it. In JSC, iirc, creating an ArrayBuffer ends up being more expensive since it becomes a WastefulTypedArray internally. As a sidenote, are there plans for BodyMixin / Response / Request & Blob to get a bytes()?

@annevk
Member

annevk commented Mar 25, 2024

No concrete plans, but PRs will be accepted.

@jasnell

jasnell commented Mar 25, 2024

All SGTM!

@bakkot
Contributor

bakkot commented Apr 3, 2024

As a sidenote, are there plans for BodyMixin / Response / Request & Blob to get a bytes()?

I have whatwg/fetch#1732 but there's no expressed implementer interest yet.

@saschanaz
Member

I believe we have rough consensus on the following issues:

  1. Chunks should be of the same types as accepted by the Blob constructor

Maybe I'm missing something but I don't see this part in this thread, has there been discussion about this somewhere else? new Response(stream).arrayBuffer() throws if the chunk is not Uint8Array, so this is a bit surprising to me.

@ricea
Collaborator Author

ricea commented Apr 8, 2024

Maybe I'm missing something but I don't see this part in this thread, has there been discussion about this somewhere else? new Response(stream).arrayBuffer() throws if the chunk is not Uint8Array, so this is a bit surprising to me.

This was based on @jimmywarting
and @jakearchibald's comments: #1019 (comment)

However, on re-reading I realised that @jakearchibald was only saying this should work for the blob() method. This comes for free if you create a Blob by converting the ReadableStream to an array and then calling the Blob(array) constructor, but I would like to stream data into the Blob if at all possible.

Anyway, it looks like we don't have consensus on this item, so let's discuss it a bit more.

First question: should all of text(), blob() and bytes() accept the same chunk types?

Arguments for:

  • Easy to explain.
  • Avoids people doing hacks like await (await response.body.blob()).text() just to accept the largest possible set of chunk types.

Arguments against:

  • Implicit conversions between text and binary are nasty.
  • Blob chunks are annoying to implement because the contents may only be available asynchronously.

Anything else?

@saschanaz
Member

First question: should all of text(), blob() and bytes() accept the same chunk types?

I vote yes.

My second question: do we really want to accept everything possible in non-byte ReadableStream instead of simply requiring ArrayBufferView as ReadableByteStreamControllerEnqueue does? If yes, should we also accept everything possible in the byte stream controller?

@taralx

taralx commented Apr 8, 2024

Is there an equivalent for getting the endings option if we allow strings?

If I had a vote I'd say "yes, but only buffer sources". Allowing Blobs and strings in a constructor is not the same as allowing them in a stream.

@jasnell

jasnell commented Apr 9, 2024

I've made a first attempt at drafting some spec around this in #1311 ... it intentionally limits to just text, Uint8Array, and Blob, ignoring json() for now. Still likely a number of rough edges but it's a start

@ricea
Collaborator Author

ricea commented Apr 10, 2024

#1311 allows cancellation of blob(), text() and bytes() with an AbortSignal. It cancels the ReadableStream if abort is signalled. An alternative would be to leave data in the stream. Personally I prefer the cancellation approach since it is more likely to be consistent between implementations, but if anyone feels differently please speak up.
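In userland terms, the cancel-on-abort behavior described above might look like this sketch (bytesWithSignal is a made-up name, and Uint8Array chunks are assumed; #1311 would do this natively):

```javascript
// Sketch of cancel-on-abort semantics: if the signal fires, the
// underlying stream is cancelled and the promise rejects.
async function bytesWithSignal(stream, { signal } = {}) {
  signal?.throwIfAborted();
  const reader = stream.getReader();
  const onAbort = () => reader.cancel(signal.reason);
  signal?.addEventListener('abort', onAbort, { once: true });
  try {
    const chunks = [];
    let total = 0;
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      chunks.push(value);
      total += value.byteLength;
    }
    signal?.throwIfAborted(); // a cancelled stream reports done, so re-check
    const out = new Uint8Array(total);
    let offset = 0;
    for (const chunk of chunks) {
      out.set(chunk, offset);
      offset += chunk.byteLength;
    }
    return out;
  } finally {
    signal?.removeEventListener('abort', onAbort);
    reader.releaseLock();
  }
}
```

A preventCancel-style option would simply skip the reader.cancel() call and leave unread data on the stream.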

@jasnell

jasnell commented Apr 10, 2024

Yeah, definitely worth talking about that point. I had considered making it so that it would only cancel the read loop and not the underlying stream. A preventCancel option like with pipeTo/pipeThrough might be worthwhile?
