make both node & web stream example the same #1
Wanted to open up a discussion, but that was closed.

NodeJS streams as well as web streams are both async iterable, and both yield Uint8Arrays. both environments also have `TextDecoderStream` in the global namespace, so there is no need to use an EventEmitter and write different code for both.

so this:

[…]

and this:

[…]

could be written as:
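a minimal sketch of that unified version (an assumption on my part, since the inline snippets were lost in the export; it follows the `ReadableStream.from` approach from the first reply, so it needs node v20.6+, with `filePath`, `initParser`, and `inferSchema` as in the examples below):

```js
import fs from "node:fs";

// node: wrap the fs stream as a web ReadableStream.
// in a browser, start from resp.body after a fetch() instead; the loop is identical.
let textStream = ReadableStream.from(fs.createReadStream(filePath))
  .pipeThrough(new TextDecoderStream());

// one loop for both environments: any async-iterable stream of string chunks
let parser = null;
for await (const strChunk of textStream) {
  parser ??= initParser(inferSchema(strChunk));
  parser.chunk(strChunk, parser.stringArrs);
}
let result = parser.end();
```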
---

but i would change the node example to use `ReadableStream.from`. i would have used:

```js
let stream = fs.createReadStream(filePath);
let webStream = ReadableStream.from(stream);
let textStream = webStream.pipeThrough(new TextDecoderStream());

let parser = null;
for await (const strChunk of textStream) {
  parser ??= initParser(inferSchema(strChunk));
  parser.chunk(strChunk, parser.stringArrs);
}
let result = parser.end();
```
---

there is also this new `fs.openAsBlob()`:

```js
let blob = await fs.openAsBlob(filePath);
let textStream = blob.stream().pipeThrough(new TextDecoderStream());

let parser = null;
for await (const strChunk of textStream) {
  parser ??= initParser(inferSchema(strChunk));
  parser.chunk(strChunk, parser.stringArrs);
}
let result = parser.end();
```

but it requires node v20+
---

maybe you could also get some perf boost by moving the first chunk read (and the parser init) out of the loop:

```js
let blob = await fs.openAsBlob(filePath);
let stream = blob.stream().pipeThrough(new TextDecoderStream());
let iterator = stream.values();

// read the first chunk up front so the loop body avoids the null check
let firstChunk = await iterator.next();
let parser = initParser(inferSchema(firstChunk.value));
parser.chunk(firstChunk.value, parser.stringArrs);

// continue on the same iterator.
for await (const strChunk of iterator) {
  parser.chunk(strChunk, parser.stringArrs);
}
let result = parser.end();
```
---

could also use two loops as described by this: https://stackoverflow.com/a/51020535/1008999

```js
let blob = await fs.openAsBlob(filePath);
let stream = blob.stream().pipeThrough(new TextDecoderStream());
let iterator = stream.values();

// this outer loop will only run 1 time.
for await (const firstChunk of iterator) {
  const parser = initParser(inferSchema(firstChunk));
  parser.chunk(firstChunk, parser.stringArrs);

  // continue on the same iterator by consuming the rest of it.
  for await (const strChunk of iterator) {
    parser.chunk(strChunk, parser.stringArrs);
  }

  let result = parser.end();
}
```
---

hey, yeah we can try these out and see. in my testing, the older/more mature apis tend to be faster. for example, the […]

all these variations would be good for a wiki page or something, but i'd like to keep the fastest option in the main readme. this thread is kind of nice affirmation that i made the right choice to keep the chunk-feeding pipeline external to the core.

the chunk size is fixed to 64KB [1], so you have 16 of these per MB. if we're maxing out at about 300MB/s, then that's 4,800 null checks per second, which should be noise-level (even if the JIT doesn't optimize them out, which looks like it should be easy). i'll bet that nullish coalescing on its own is on the order of millions/sec. i'm happy to be proven wrong... with data ;)
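for what it's worth, a rough micro-benchmark sketch (hypothetical, not from the thread) of the steady-state `??=` cost; the JIT may well eliminate the check entirely, which would only strengthen the point:

```js
// hypothetical micro-benchmark: nullish-coalescing assignment on a value
// that is already non-null (the steady state after the first chunk).
let parser = {};        // stand-in for an initialized parser
const N = 100_000_000;

const t0 = performance.now();
for (let i = 0; i < N; i++) {
  parser ??= {};        // always short-circuits; the RHS is never evaluated
}
const secs = (performance.now() - t0) / 1000;

console.log(`~${(N / secs / 1e6).toFixed(0)}M checks/sec`);
```

even a pessimistic result here would dwarf the ~4,800 checks/sec implied by 64KB chunks at 300MB/s.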
---

it looks like nodejs/node#49089 just landed in v20.6.0: https://nodejs.org/en/blog/release/v20.6.0 👀
---

Not really worth opening as its own issue, especially if these examples are getting reworked anyhow, but there is a typo in the node example here: that variable should be called […]
---

oops!