Readable.toWeb seems to load file contents to memory #46347
Comments
Can you post a complete, ready-to-run test case? Maybe you've hit a bug, maybe you haven't, but it's impossible to tell right now. |
// since it's ESM, save it as .mjs
import fs from 'node:fs'
import process from 'node:process'
import {Readable} from 'node:stream'
// we initialize a stream, but do not start consuming it
const randomNodeStream = fs.createReadStream('/dev/urandom')
// after 10 seconds, it'll get converted to a web stream
let randomWebStream
// we check memory usage every second
// since it's a stream, it shouldn't be higher than the chunk size
const reportMemoryUsage = () => {
const {arrayBuffers} = process.memoryUsage()
console.log(
`Array buffers memory usage is ${Math.round(
arrayBuffers / 1024 / 1024
)} MiB`
)
if (arrayBuffers > 256 * 1024 * 1024) {
// streaming should not lead to such a memory increase
// therefore, if it happens => bail
console.log('Over 256 MiB taken, exiting')
process.exit(0)
}
}
setInterval(reportMemoryUsage, 1000)
// after 10 seconds we use Readable.toWeb
// memory usage should stay pretty much the same since it's still a stream
setTimeout(() => {
console.log('converting node stream to web stream')
randomWebStream = Readable.toWeb(randomNodeStream)
}, 10000)
// after 15 seconds we start consuming the stream
// memory usage will grow, but the old chunks should be garbage-collected pretty quickly
setTimeout(async () => {
console.log('reading the chunks')
for await (const chunk of randomWebStream) {
// do nothing, just let the stream flow
}
}, 15000)
This produces the same behavior on macOS and Linux:
Immediately after using Readable.toWeb, the memory usage starts to grow.
You can compare it to a third-party library, readable-stream-node-to-web:
// since it's ESM, save it as .mjs
import fs from 'node:fs'
import process from 'node:process'
import nodeToWebStream from 'readable-stream-node-to-web'
// we initialize a stream, but do not start consuming it
const randomNodeStream = fs.createReadStream('/dev/urandom')
// after 10 seconds, it'll get converted to a web stream
let randomWebStream
// we check memory usage every second
// since it's a stream, it shouldn't be higher than the chunk size
const reportMemoryUsage = () => {
const {arrayBuffers} = process.memoryUsage()
console.log(
`Array buffers memory usage is ${Math.round(
arrayBuffers / 1024 / 1024
)} MiB`
)
if (arrayBuffers > 256 * 1024 * 1024) {
// streaming should not lead to such a memory increase
// therefore, if it happens => bail
console.log('Over 256 MiB taken, exiting')
process.exit(0)
}
}
setInterval(reportMemoryUsage, 1000)
// after 10 seconds we use nodeToWebStream
// memory usage should stay pretty much the same since it's still a stream
setTimeout(() => {
console.log('converting node stream to web stream')
randomWebStream = nodeToWebStream(randomNodeStream)
}, 10000)
// after 15 seconds we start consuming the stream
// memory usage will grow, but the old chunks should be garbage-collected pretty quickly
setTimeout(async () => {
console.log('reading the chunks')
for await (const chunk of randomWebStream) {
// do nothing, just let the stream flow
}
}, 15000)
In that case, the memory usage is fine:
|
Thanks, I see what you mean. The stream starts reading before there's something consuming it. @nodejs/whatwg-stream this is a legitimate bug. You can see it even more clearly when you switch from /dev/urandom to /dev/zero. edit: bug also exists in v19.6.0. |
Hello @Dzieni, I think this issue is solved if you pass a strategy while converting the node stream, something like:
randomWebStream = Readable.toWeb(randomNodeStream, {
  strategy: new CountQueuingStrategy({ highWaterMark: 100 }),
});
I tried this on my local machine and memory doesn't seem to be overflowing.
I think the behavior here is somewhat expected: when the readable stream is created, the pull function (as described in https://nodejs.org/api/webstreams.html#new-readablestreamunderlyingsource--strategy) is called continuously, and as soon as streamReadable.resume() is reached (node/lib/internal/webstreams/adapters.js, line 462 in 23effb2) it tries to consume the whole file, causing the memory overflow. When we pass a strategy, it ensures backpressure is applied.
Maybe the docs here could be updated to include this scenario: https://nodejs.org/api/stream.html#streamreadabletowebstreamreadable-options @bnoordhuis |
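To make the effect of that strategy concrete, here is a sketch (not code from this thread; it reuses randomNodeStream from the snippets above, and the 64 KiB figure is fs.createReadStream's default chunk size, not something measured here):
import fs from 'node:fs'
import { Readable } from 'node:stream'
import { CountQueuingStrategy } from 'node:stream/web'

const randomNodeStream = fs.createReadStream('/dev/urandom')

// With CountQueuingStrategy every chunk counts as size 1, so highWaterMark: 100
// means "at most 100 queued chunks". With fs's default 64 KiB chunks that is
// roughly 100 * 64 KiB, about 6 MiB of buffered data before backpressure kicks in,
// instead of an unbounded queue.
const randomWebStream = Readable.toWeb(randomNodeStream, {
  strategy: new CountQueuingStrategy({ highWaterMark: 100 }),
})
Whether 100 chunks is the right bound depends on the chunk size, which is why the byte-based strategy discussed later in the thread can be a better fit. |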
I think this is a bug. The whole point of streams is to manage the flow of data. |
So something like a default highWaterMark while doing Readable.toWeb? |
No, I think there is an actual bug somewhere. Instead of resume(), this should call read(). |
@debadree25 |
Hey guys and gals, could you please confirm my findings? I've looked into this briefly, and I believe I have traced it to the adapter in lib/internal/webstreams/adapters.js. I am not entirely sure if it's by design, but refusing to pull when a stream has no reader seems to alleviate the issue; something along those lines seems to do the trick (see the sketch after the benchmark below). Testing against @Dzieni's benchmark, slightly modified to convert to the web stream sooner, here's before the change is applied:
Here's after the change is applied:
For completeness, I'm attaching @Dzieni's benchmark with the slight modification that I mentioned along with this message.
// since it's ESM, save it as .mjs
import fs from 'node:fs'
import process from 'node:process'
import {Readable} from 'node:stream'
// we initialize a stream, but do not start consuming it
const randomNodeStream = fs.createReadStream('/dev/urandom')
// in a few seconds, it'll get converted to a web stream
let randomWebStream
// we check memory usage every second
// since it's a stream, it shouldn't be higher than the chunk size
const reportMemoryUsage = () => {
const {arrayBuffers} = process.memoryUsage()
console.log(
`Array buffers memory usage is ${Math.round(
arrayBuffers / 1024 / 1024
)} MiB`
)
if (arrayBuffers > 256 * 1024 * 1024) {
// streaming should not lead to such a memory increase
// therefore, if it happens => bail
console.log('Over 256 MiB taken, exiting')
process.exit(0)
}
}
setInterval(reportMemoryUsage, 1000)
// after 3 seconds we use Readable.toWeb
// memory usage should stay pretty much the same since it's still a stream
setTimeout(() => {
console.log('converting node stream to web stream')
randomWebStream = Readable.toWeb(randomNodeStream)
}, 3000)
// after 30 seconds we start consuming the stream
// memory usage will grow, but the old chunks should be garbage-collected pretty quickly
setTimeout(async () => {
console.log('reading the chunks')
for await (const chunk of randomWebStream) {
// do nothing, just let the stream flow
}
}, 30000)
[Edit, Feb 14]: updated the draft solution link to point to a specific commit. |
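The idea referenced above, as a rough sketch: a hand-rolled adapter (a hypothetical helper, not Node's internal code and not the linked draft commit) that only reads from the Node stream when the web stream's queue has room, instead of resuming the source unconditionally:
import { createReadStream } from 'node:fs'
import { ReadableStream, ByteLengthQueuingStrategy } from 'node:stream/web'

// Hypothetical helper: data is pulled on demand with read(), so the source
// stays paused until something actually consumes the web stream.
function toWebOnDemand(nodeReadable, highWaterMark = 64 * 1024) {
  return new ReadableStream(
    {
      start(controller) {
        nodeReadable.once('error', (err) => controller.error(err))
      },
      // pull() is only invoked while the queue is below highWaterMark,
      // so buffering stays near that limit instead of growing without bound.
      pull(controller) {
        return new Promise((resolve, reject) => {
          const tryRead = () => {
            const chunk = nodeReadable.read()
            if (chunk !== null) {
              cleanup()
              controller.enqueue(chunk)
              resolve()
            } else if (nodeReadable.readableEnded) {
              cleanup()
              controller.close()
              resolve()
            }
            // otherwise keep waiting for 'readable' or 'end'
          }
          const onEnd = () => {
            cleanup()
            controller.close()
            resolve()
          }
          const onError = (err) => {
            cleanup()
            reject(err)
          }
          const cleanup = () => {
            nodeReadable.off('readable', tryRead)
            nodeReadable.off('end', onEnd)
            nodeReadable.off('error', onError)
          }
          nodeReadable.on('readable', tryRead)
          nodeReadable.on('end', onEnd)
          nodeReadable.on('error', onError)
          tryRead()
        })
      },
      cancel() {
        nodeReadable.destroy()
      },
    },
    new ByteLengthQueuingStrategy({ highWaterMark })
  )
}

// Usage: nothing is read from /dev/urandom until the web stream is consumed.
const webStream = toWebOnDemand(createReadStream('/dev/urandom'))
This only illustrates the shape of the fix (pull and read() instead of resume()); the actual changes landed via the PRs referenced below. |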
@jasnell @KhafraDev could you take a look? |
It looks like node implements |
@debadree25 I've been looking into this on and off and I may need a clarification. Could you please clarify for me whether, in the code snippet below, the resulting web stream's highWaterMark is measured in bytes or in chunks:
const randomNodeStream = fs.createReadStream('/dev/urandom')
const randomWebStream = Readable.toWeb(randomNodeStream) |
From what I understand, the high watermark would be 65536 "chunks": here (node/lib/internal/webstreams/adapters.js, line 424 in 132c383) we don't set any size function, and by default the size function just returns 1 (ref: node/lib/internal/webstreams/util.js, line 73 in 132c383). So it would be 65536 "chunks", each chunk regarded as size 1. The comment in node/lib/internal/webstreams/adapters.js, lines 420 to 423 in 132c383, mentions ByteLengthQueuingStrategy as unnecessary, but maybe it indeed is necessary?
|
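A quick way to see the difference between the two accounting modes (an illustration only, not the adapter's code):
import { Buffer } from 'node:buffer'
import { CountQueuingStrategy, ByteLengthQueuingStrategy } from 'node:stream/web'

const chunk = Buffer.alloc(64 * 1024) // a typical 64 KiB fs.createReadStream chunk

// Count-based accounting (what a size function that returns 1 amounts to):
// every chunk has size 1, so a highWaterMark of 65536 allows 65536 chunks,
// i.e. up to ~4 GiB of 64 KiB chunks in the queue.
const countStrategy = new CountQueuingStrategy({ highWaterMark: 65536 })
console.log(countStrategy.size(chunk)) // 1

// Byte-based accounting: a chunk's size is its byteLength, so the same
// highWaterMark of 65536 allows only ~64 KiB in the queue.
const byteStrategy = new ByteLengthQueuingStrategy({ highWaterMark: 65536 })
console.log(byteStrategy.size(chunk)) // 65536
With the default size function the 65536 limit is effectively a chunk count, not a byte count, which matches the behavior described above. |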
@debadree25 Thank you for your reply. As I was inspecting the code yesterday, I just found it somewhat odd, hence I had to clarify it with someone. Another odd thing I found: if you have code like this:
const randomNodeStream = fs.createReadStream('/dev/urandom', {
  highWaterMark: 5556
})
const randomWebStream = Readable.toWeb(randomNodeStream)
Upon inspection with a debugger, I found that the resulting web stream still ends up with the default highWaterMark of 65536 rather than anything derived from the 5556 passed to createReadStream. That means if a user does not explicitly pass an hwm to Readable.toWeb, the node stream's own hwm is effectively ignored. Is this a bug or am I misunderstanding it? |
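If that observation is correct, one way to keep the two limits in sync today is to pass the Node stream's own highWaterMark through explicitly when converting. A sketch (assuming byte-counted chunks; not something verified in this thread):
import fs from 'node:fs'
import { Readable } from 'node:stream'
import { ByteLengthQueuingStrategy } from 'node:stream/web'

const randomNodeStream = fs.createReadStream('/dev/urandom', {
  highWaterMark: 5556,
})

// Reuse the source stream's buffering limit (5556 bytes here) for the
// web stream's queue instead of the 65536-chunk default.
const randomWebStream = Readable.toWeb(randomNodeStream, {
  strategy: new ByteLengthQueuingStrategy({
    highWaterMark: randomNodeStream.readableHighWaterMark,
  }),
})
readableHighWaterMark reflects whatever was passed to createReadStream, so the two limits stay aligned. |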
This isn't a "good first issue"; I don't think "streams" and "good first issue" really mix, except for tests/docs. |
@lilsweetcaligula tbf even I am confused here; this would need deeper investigation 😅😅 |
@benjamingr I think this might be as simple as it gets for streams. The problem is that in node/lib/internal/webstreams/adapters.js, line 462 in 0093fd3, we call resume(), while in node/lib/internal/webstreams/adapters.js, lines 436 to 437 in 0093fd3, we call pause() only under certain conditions (which I think are never met, or similar). This is somewhat dangerous and can cause exactly what we are seeing here.
Note that this should be calling read() instead, as suggested above. |
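For context on why an unconditional resume() is risky, here is a self-contained illustration (not the adapter's code): resume() switches the stream into flowing mode, so 'data' fires as fast as the source can produce chunks, regardless of whether anything downstream keeps up:
import fs from 'node:fs'
import process from 'node:process'

const stream = fs.createReadStream('/dev/urandom')
const queued = []

// Simulate an adapter that queues every chunk while nothing drains the queue:
// in flowing mode this grows without bound.
stream.on('data', (chunk) => queued.push(chunk))
stream.resume() // redundant here (the 'data' listener already resumes), but mirrors the adapter

setInterval(() => {
  const mib = queued.reduce((sum, c) => sum + c.length, 0) / 1024 / 1024
  console.log(`queued ${Math.round(mib)} MiB`)
  if (mib > 256) {
    console.log('Over 256 MiB queued, exiting')
    process.exit(0)
  }
}, 1000)
A pull-based approach, calling read() only when the consumer asks for more, avoids exactly this, which is what the suggested change aims at. |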
Refs: #46347 PR-URL: #46617 Reviewed-By: Antoine du Hamel <[email protected]> Reviewed-By: Harshitha K P <[email protected]> Reviewed-By: James M Snell <[email protected]>
Hey @mcollina I think @debadree25 is right in his comment that we should use the nevertheless, I did try using the |
BLQS = ByteLengthQueuingStrategy Fixes: nodejs#46347
Any updates on this? |
In the meantime I found another workaround - |
Fixes: #46347 PR-URL: #48847 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
I had the same issue with the memory leak, and after installing Node v22.2 it went away, but not entirely. I still get a massive slowdown/memory leak when I use the ReadableStream in fetch (which many probably try with this toWeb function):
await fetch(url, {
method: 'POST',
body: Readable.toWeb(data),
...({ duplex: "half" })
});
I had to switch to |
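One mitigation worth trying for the fetch case (untested here, and it may or may not address the slowdown reported above) is to bound the converted body explicitly, i.e. the same strategy workaround discussed earlier in the thread; url and data are the same values as in the snippet above:
import { Readable } from 'node:stream'
import { ByteLengthQueuingStrategy } from 'node:stream/web'

// Same upload, but the converted body can buffer at most ~1 MiB ahead of
// the socket before backpressure is applied to the source stream.
await fetch(url, {
  method: 'POST',
  body: Readable.toWeb(data, {
    strategy: new ByteLengthQueuingStrategy({ highWaterMark: 1024 * 1024 }),
  }),
  duplex: 'half',
});
If the leak is inside fetch itself rather than in the conversion, this will not help. |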
I want to note that I'm still experiencing a serious memory leak here as of recent node versions. I refactored the handler to the following:
async function requestHandler(request, response){
const upstreamUrl = decodeURIComponent(request.query.url);
const upstreamResponse = await fetch(upstreamUrl, {
method: request.method,
duplex: 'half',
// Both approaches cause memory leaks
body: Readable.toWeb(request),
// body: ReadableStream.from(request),
});
await upstreamResponse.body.pipeTo(Writable.toWeb(response));
} |
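If the conversions themselves are the problem, one alternative is to keep such a proxy entirely on Node streams and skip toWeb/fromWeb altogether. A sketch, assuming the same Express-style request.query as above and a plain-HTTP upstream:
import http from 'node:http'
import { pipeline } from 'node:stream'

function requestHandler(request, response) {
  const upstreamUrl = decodeURIComponent(request.query.url)

  const upstreamRequest = http.request(
    upstreamUrl,
    { method: request.method },
    (upstreamResponse) => {
      response.writeHead(upstreamResponse.statusCode, upstreamResponse.headers)
      // pipeline() propagates backpressure and destroys both sides on error.
      pipeline(upstreamResponse, response, (err) => {
        if (err) response.destroy(err)
      })
    }
  )

  // Stream the incoming request body straight to the upstream request.
  pipeline(request, upstreamRequest, (err) => {
    if (err) upstreamRequest.destroy(err)
  })
}
This sidesteps the web-stream conversion entirely, so it says nothing about whether the fetch path itself still leaks. |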
Version
v19.5.0
Platform
Darwin dzieni 21.6.0 Darwin Kernel Version 21.6.0: Wed Aug 10 14:28:35 PDT 2022; root:xnu-8020.141.5~2/RELEASE_ARM64_T8101 arm64
Subsystem
stream
What steps will reproduce the bug?
const fs = require('fs')
const nodeStream = fs.createReadStream('/some/large/file')
// check memory usage before the conversion
console.log(process.memoryUsage())
const webStream = (require('stream').Readable).toWeb(nodeStream)
// check memory usage again after the conversion
console.log(process.memoryUsage())
How often does it reproduce? Is there a required condition?
At all times
What is the expected behavior?
Memory usage does not grow significantly.
What do you see instead?
Memory usage (precisely the arrayBuffers section) grows by a few orders of magnitude. It seems to be correlated with the file size.
Additional information
No response