Reduce memory consumption of streams for large amounts of data #6078
That doesn't seem too atypical. Newer versions of V8 like to keep the allocations they make for as long as possible. If it's actually an issue in streams like you imply, we could look into using an existing buffer, I suppose. cc @nodejs/streams?
Streams were designed for use within Node.js programs where they are one of many parts of the program, so relying on garbage collection isn't that big a deal for them: if they didn't, some other part of the program would, so there wouldn't be a net gain. I'm not discounting that streams are allocation heavy, but I'm not sure how much they add to the memory usage of a real program where they are one of many things allocating objects. In a real program a stream will often be one of several in a chain where asynchronous actions are being done in the middle. This is the use case where streams excel: while they may use hundreds of megabytes of memory, they will consistently use only that much memory no matter how much data is being piped through them.

Another thing to bear in mind when calling something (like memory usage) wasteful is that it might actually be buying something else; for instance, your version which does frequent garbage collection probably runs slower and uses more CPU cycles. Streams are a compromise between being lazy and being performant, so for instance they all have input buffers so that different parts don't have to wait around for others; I'd guess those buffers might be what takes up a more significant amount of memory in real world programs. Streams are a good abstraction, but they are not all things to all users, so if you need the absolute minimum of external memory usage, streams have some features that might make that hard and you might be better off using fs.read.

It would be pretty hard to fix some of these things without breaking changes; for instance you can currently keep a copy of a buffer you got from a stream and not do anything with it, e.g. this not terribly uncommon code:

var chunks = [];
process.stdin.on('data', d => chunks.push(d));
process.stdin.once('end', function () {
  // do something with all the chunks
});

So I guess my questions for you are:
First of all: are you getting an out of memory error? V8's GC takes into account the amount of physical memory that is available, and it performs best given the amount of memory available on the system. Because of this, allocating 300MB of RAM is probably the best thing to do to read that file as quickly as possible. On the other hand, I completely agree that streams are not the best API for specific use cases, but they work great 99% of the time. Also, streams are built to slow things down: they are built so that the producer goes at the speed of the consumer, and they offer a "simple" API for doing so. Implementing it manually takes a lot of code and skill.
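For reference, here is a hedged sketch of what that manual producer/consumer pacing looks like with the core stream API (the write() return value plus the 'drain' event); 'in' and 'out' are placeholder file paths:

var fs = require('fs');
var src = fs.createReadStream('in');    // placeholder source
var dst = fs.createWriteStream('out');  // placeholder destination

src.on('data', function (chunk) {
  if (!dst.write(chunk)) {              // write() returns false when dst's buffer is full
    src.pause();                        // stop producing...
    dst.once('drain', function () {
      src.resume();                     // ...and resume once the consumer catches up
    });
  }
});
src.on('end', function () {
  dst.end();
});

src.pipe(dst) does all of this (plus error handling and cleanup) in one call.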
Actually, that's not true: calling a scavenge on each 50th data event speeds things up.
In my tests, calling gc() every once in a while reduces the memory consumption considerably. Here is my code:

var fs = require('fs')
var stream = fs.createReadStream('out')
// create out by doing
// Mac:
// dd if=/dev/zero bs=1048576 count=1024 > out
// Linux:
// dd if=/dev/zero bs=1M count=1024 > out
var before = process.memoryUsage().rss
var counter = 0
// uncomment to see the impact of calling gc()
// needs to be run with --expose-gc
// stream.on('data', () => counter++ % 50 === 0 && gc())
stream.once('end', function() {
console.log('memory increased by', Math.round((process.memoryUsage().rss - before) / 1024 / 1024), 'MB');
});
stream.resume()

I am still not understanding what problem having free or uncollected memory creates. What is the use case we are discussing, and what are the implications? What could Node do better if the memory consumption of streams were lower? Anyway, the fact that calling gc() every once in a while reduces the memory consumption is extremely good, because it means there is probably a V8 setting that can be configured to behave that way.
@mcollina Scavenge is the collection that matters here. Nevertheless, manual mark-sweep calls on each 1000th data event also speed things up a bit, but the memory usage is ~150 MiB.
@ChALkeR you are right! Here is what is happening: the objects are short-lived, and an (early) collection via scavenge solves the issue. Basically, we keep hitting mark-and-sweep.
Thank you for all the replies! I am not getting any out of memory errors, or even remotely hitting 2GB; this is just an observation I've made, having seen rather simple scripts consume a lot of system RAM. So basically, things are working :) I understand that for most "real" programs there are usually other allocations besides streams, and those are affected by GC too. It's just that most variables typically consume bytes of memory, whilst stream Buffers often consume tens of thousands of bytes, so it's much easier for the latter to overwhelm memory than the former, and this can happen in fairly trivial cases. I disagree with the claim that consuming 300MB of RAM somehow improves efficiency. Allocations force all sorts of performance penalties, from overwhelming CPU caches, to forcing more context switches (due to increased page faults), to forcing the GC to work more, etc. See my example script below.
There are many good reasons for keeping the amount of external memory usage low: making more memory available to other applications, reducing swap usage, and enabling the OS to use disk buffering/caching more efficiently. I don't believe V8 is intelligent enough to know how a user may be using their system, and hence to effectively judge what amount of RAM is 'available' (other than less = better) for optimal overall system performance.
I don't believe a change necessarily has to break client code, though I don't think a solution is necessarily easy either. Below is a demonstration for calculating the MD5 hash of a file; toggle the if condition to see the effect of the stream vs manually reading into two buffers (I use two buffers here to offset the stalling that synchronous hashing induces).
On the system I have here (running node v0.10.29), using streams:
Using fs.read:
The latter is both faster and keeps memory usage low, as somewhat expected. Thanks again for the responses!
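The benchmark script itself isn't reproduced above; a minimal sketch of the fs.read side of that comparison (a single preallocated Buffer reused for every read, hashing the 'out' file from the earlier example) might look like this:

var fs = require('fs');
var crypto = require('crypto');

var hash = crypto.createHash('md5');
var buf = new Buffer(64 * 1024);        // one fixed-size buffer, reused for every read

fs.open('out', 'r', function (err, fd) {
  if (err) throw err;
  (function readNext() {
    fs.read(fd, buf, 0, buf.length, null, function (err, bytesRead) {
      if (err) throw err;
      if (bytesRead === 0) {            // EOF
        return fs.close(fd, function () {
          console.log(hash.digest('hex'));
        });
      }
      hash.update(buf.slice(0, bytesRead));  // slice() is a view, not a copy
      readNext();
    });
  })();
});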
I think @animetosho is spot on. It's one of the reasons I usually try to avoid streams. I prefer to do as much static allocation of buffers and as few memory copies as possible. It's a simple and powerful idea, mechanical sympathy in action. It improves cache locality and reduces CPU usage, blocking, GC pressure, etc. Node can be fast, one just needs to work with the GC, not against it. For example, @jasnell and @trevnorris are currently working on an improved Buffer.compare which accepts offsets into an existing buffer, rather than forcing a new object to be created using slice. These kinds of small changes really add up, and there are plenty of places in Node which could use them. If there were a streams3 that encouraged static allocation, Node would be all the better for it.
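To illustrate the Buffer.compare point: comparing a sub-range currently means allocating a throwaway slice, whereas the offset-taking form (the signature that later landed in Node; treat the exact arguments as an assumption if you are on an older release) avoids the extra object:

var a = new Buffer('hello world');
var b = new Buffer('world');

// today: slice() creates a new Buffer object just to express the range
var equal = a.slice(6, 11).compare(b) === 0;

// offset-taking form: compare(target, targetStart, targetEnd, sourceStart, sourceEnd)
var equalNoAlloc = a.compare(b, 0, b.length, 6, 11) === 0;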
@animetosho running your scripts on node v5.10: Stream:
Read:
I recognize the problem, but I do not see a solution apart from improving the V8 garbage collector, as discussed in #1671. What is happening is that buffers that pass through the stream harness do not trigger a scavenge. At some point the GC passes from scavenge to mark-and-sweep:
Does anyone have ideas why this happens? Or any tools/flags/methods to track these problems?
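Not something suggested in the thread, but one readily available way to watch which collector runs is V8's GC tracing, either by starting node with --trace-gc or by flipping the flag at runtime:

var v8 = require('v8');

v8.setFlagsFromString('--trace_gc');
// From here on, every collection prints a line labelled "Scavenge" or
// "Mark-sweep" together with heap sizes, which makes the switchover visible.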
They can, or
@ChALkeR sorry, that is what I meant. I've edited the main comment.
One side note: the two branches of this example behave in the same way (with memory ballooning):

var fs = require('fs');
var chunkSize = 65536;
var bigfile = 'out';
var hash = require('crypto').createHash('md5');
var read = require('stream').Readable()
read._count = 0
read._read = function () {
if (++read._count === 10000) {
this.push(null)
} else {
var buf = new Buffer(chunkSize)
buf.fill('42')
this.push(buf)
}
}
console.log(process.memoryUsage());
(function(cb) {
if(false) {
console.log('stream')
read.on('data', function (buf) {
hash.update(buf);
})
read.on('end', cb)
} else {
console.log('read')
for (var i = 0; i < 10000; i++) {
var buf = new Buffer(chunkSize)
buf.fill('42')
hash.update(buf);
}
cb()
}
})(function(err, out) {
console.log(process.memoryUsage());
if (out) {
console.log(out)
} else {
console.log(hash.digest('hex'));
}
});
Oh, oops. If the GC can clean up the objects earlier, then that will be a big improvement. Not as efficient as static memory allocation, but it's probably sufficient most of the time.
@SourceBoy can you upload a barebones script to reproduce your problem using just core modules?
I apologize. After conducting further tests, it may have been the underlying transport implementation in userland that led me to believe it was the stream.
In my use case (passing a large amount of data through from a process's stdout to a socket) it would be helpful if the underlying implementation of .pipe() checked whether both the source and destination streams are file-descriptor based; it could then relay the data without routing it through V8's memory management...
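That is not something .pipe() does today, but for this particular shape (a child process's stdout going straight to a socket) the copy through JavaScript can be avoided by wiring the file descriptors together at spawn time. A hedged sketch, assuming socket is an already-connected net.Socket and the command is a placeholder:

var spawn = require('child_process').spawn;

function streamCommandToSocket(socket) {
  // Passing the socket as the child's stdout duplicates its file descriptor
  // into the child, so the data never flows through this process at all.
  var child = spawn('cat', ['bigfile'], {           // placeholder command
    stdio: ['ignore', socket, 'inherit']            // stdin, stdout, stderr
  });
  child.on('exit', function () {
    socket.end();
  });
}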
I believe we may be hitting this issue, but I am having a lot of trouble isolating it to figure out what the root cause is. We run 100 or so Node applications per server; in general they use about 100MB of memory each. We have a few of them where the RSS is all the way up to 3-5GB, while the heap is only 130MB or so. As an example, here is a dump of the following:

var v8 = require('v8');
console.log({
memoryUsage : process.memoryUsage(),
heapStatistics : v8.getHeapStatistics(),
heapSpaceStatistics : v8.getHeapSpaceStatistics()
});
{
"memoryUsage": {
"rss": 5089964032,
"heapTotal": 119304192,
"heapUsed": 88738968,
"external": 35846497
},
"heapStatistics": {
"total_heap_size": 119304192,
"total_heap_size_executable": 9437184,
"total_physical_size": 114067920,
"total_available_size": 1397227056,
"used_heap_size": 88741064,
"heap_size_limit": 1501560832,
"malloced_memory": 21024,
"peak_malloced_memory": 9687264,
"does_zap_garbage": 0
},
"heapSpaceStatistics": [
{
"space_name": "new_space",
"space_size": 8388608,
"space_used_size": 4776,
"space_available_size": 4119896,
"physical_space_size": 4263896
},
{
"space_name": "old_space",
"space_size": 75862016,
"space_used_size": 66552480,
"space_available_size": 2213064,
"physical_space_size": 75861048
},
{
"space_name": "code_space",
"space_size": 7397376,
"space_used_size": 5162560,
"space_available_size": 1422080,
"physical_space_size": 6814528
},
{
"space_name": "map_space",
"space_size": 4214784,
"space_used_size": 2076888,
"space_available_size": 2059560,
"physical_space_size": 3687040
},
{
"space_name": "large_object_space",
"space_size": 23441408,
"space_used_size": 14957624,
"space_available_size": 1387400704,
"physical_space_size": 23441408
}
]
}

Looking at the dump you can see the RSS is absolutely massive, but the heap isn't. We have been poring over heap snapshots for about a week now, and we can't find any evidence of anything leaking; based on the fact that the heap isn't growing, this makes sense. The above process has been running for about a week, and the heap hasn't really changed size in that time. The issue is that we have 100 apps running similar code, but it's clear something in a few of them is causing the RSS to balloon, while others are fine. Looking at the Node issues there is this one and there is #13917; both are entirely possible as we have a few routes which do some key encrypt/decrypt. We also upload large files to Cloudinary. We accept POST requests from users. Let's say the problem we face is this issue; is there some way we could locate it? Right now it's basically impossible to try and find every piece of code that could be using Buffers, since they are everywhere...
@owenallenaz those are extremely hard to debug, but I would say that you are not being hit by this issue. Generally, some piece of code that you are running is making the V8 garbage collector balloon the RSS. This typically happens for long-running requests: you are accumulating results (maybe using a stream) and then rendering them to the end users. If your data survives two GC scavenge cycles, it is going to end up in old space, which will then be collected by a full mark-and-sweep; but this happens slowly, and V8 will keep bumping the RSS if the problem persists. BTW, which version of Node are you running? I have tested this in Node 8 and the problem described here was fixed, so I am closing this bug.
We are running Node 7.10.0. On average our mean response time is 300ms; we have some responses that vary between 2000-5000ms when we are communicating with external services (slow form submissions to legacy infrastructure). We ran a test over the weekend where one set of our processes was moved down to 7.4.0 while the rest remained at 7.10.0. The ones that we moved down to 7.4.0 did not balloon the RSS. When I run the scripts from @animetosho on both 7.4.0 and 7.10.0, I can see that 7.4.0 (148MB) balloons the RSS worse on the stream case than 7.10.0 (88MB) does. So that pretty much says that this isn't my problem, unfortunately.

I'm still trying to understand how the different memory spaces work. I've read this post (http://jayconrod.com/posts/55/a-tour-of-v8-garbage-collection) pretty thoroughly, which is old but I'm hoping still accurate. Unfortunately he doesn't really connect the dots between old space and RSS, so if you could help explain that would be awesome. With that in mind, let's say something survives two GC scavenge cycles and gets promoted to old space. According to my tests, as the old space increases so does the RSS, and when the old space decreases the RSS decreases as well. The odd thing is that if you look at the dumps above, the RSS is 5GB and yet the old space is only 66MB. Is this a case of memory fragmentation, where there is still some in-use memory on a page in the RSS, preventing it from being deallocated even though the old space has been? I have not been able to figure out a way to get my RSS to keep growing while my old space stays the same, which tells me I must be missing a key piece of understanding of how these mechanics work. Is there a simplistic script which could be run to see this in action? If I'm able to reproduce it in isolation it may give a clue as to what we're doing in production to cause the issue.
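One simple way to watch RSS diverge from the JS heap (an illustration, not a claim about what your apps are doing): Buffer contents live outside the V8 heap, so churning large Buffers can leave the RSS high while heapUsed and old space stay small. Exact behaviour depends on platform, allocator and Node version:

var buffers = [];

function report(label) {
  var m = process.memoryUsage();
  console.log(label,
    'rss:', Math.round(m.rss / 1048576) + 'MB',
    'heapUsed:', Math.round(m.heapUsed / 1048576) + 'MB',
    'external:', Math.round((m.external || 0) / 1048576) + 'MB');
}

report('start');

for (var i = 0; i < 300; i++) {
  buffers.push(Buffer.alloc(1048576));   // ~300MB of external (non-heap) memory
}
report('allocated');

buffers = null;                           // drop all references
setTimeout(function () {
  report('released');                     // RSS often stays elevated for a while
}, 5000);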
Extremely slowly, and over a long period of time. So when old space increases, the corresponding RSS increase might be there for good. Also, Node 8 solves this problem, so I recommend upgrading.
@owenallenaz I think we are having the same issue as you, and on Node 8, see #1671 (comment). There have been a few memory and fragmentation related issues in the past few months.
@jorangreef Is there a chance you are using WeakMap?
We're not using WeakMap anywhere.
Is there some reason streams couldn't work more like Linux pipe buffers, where the maximum buffer size is closer to 4 MB? If the source tries to write to the pipe faster than it is read, Linux prefers giving CPU time to the process that reads from the pipe until the buffer is no longer full. This allows the buffer to often fit in the CPU's L2 cache, and performance is much better than going via system RAM. If a stream could buffer up to 4 MB, the GC wouldn't need to run as often just because of lots of stream data, which should improve overall performance too.
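If the goal is simply a bigger per-stream buffer, closer to the ~4 MB pipe buffer described above, the existing highWaterMark option already allows it (shown here on the 'out' test file from earlier):

var fs = require('fs');
var stream = fs.createReadStream('out', { highWaterMark: 4 * 1024 * 1024 });

stream.on('data', function (chunk) {
  // chunks now arrive as ~4 MB Buffers instead of the default 64 KB,
  // meaning far fewer allocations and far fewer 'data' events per file
});

Larger chunks trade fewer allocations and events for more memory held per chunk; whether that actually reduces GC pressure is the open question in this thread.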
Node streams can (and usually do) eat up a lot of system memory if a lot of data passes through them. Here's a simple demo script to show what I'm talking about:
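The demo script itself is not included above; a minimal sketch matching the description (consume stdin and report RSS growth at the end) would be something like:

var before = process.memoryUsage().rss;

process.stdin.once('end', function () {
  var grown = process.memoryUsage().rss - before;
  console.log('rss grew by', Math.round(grown / 1024 / 1024), 'MB');
});

process.stdin.on('data', function () {});   // just consume the data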
Run using
dd if=/dev/zero bs=1M count=1024|node script.js
On my machine, simply reading this data from stdin consumes a horrendous 340MB of RAM!
This is obviously due to the fact that streams work by constantly creating new Buffers, and rely on the GC to eventually clean things up. Whilst this may be the 'GC way of doing things', it's extremely wasteful considering that Buffers can quickly consume a lot of memory.
Unfortunately, there doesn't appear to be any easy way around this (that I know of). Even trying to hint the GC into running more often (reducing max_old_space_size) doesn't seem to help much. But if we change the last line above to process.stdin.on('data', gc); (and use expose_gc), the memory used to read the stream drops to a sane 1.6MB. Whilst this confirms that it is a GC related issue, it's not a good solution, especially for applications where GC is slow.

For files, we can work around this with the fs module by directly invoking read into a single fixed-size Buffer. Doing so prevents this growth of memory usage, and is significantly more efficient than using a ReadStream (in terms of memory usage, less pressure on the GC and fewer memory (de)allocations). Unfortunately, Node uses streams all over the place, so this strategy doesn't work everywhere, making it difficult/impossible to work around this inefficiency.

I don't have any good solutions to this, but I think that it should be considered, for those who wish to keep system memory usage low (or rather, not be wasteful about it). Solutions obviously need to stay efficient with smaller amounts of data, since I expect many streams to be of that nature.

Some ideas I have:

- Allow a fixed buffer to be supplied to the stream (like fs.read). The stream would likely have its own internal buffer that never gets reallocated. Writes to the stream copy into the buffer, and reads copy out of it. Memory copies may seem wasteful in some circumstances, so it would have to be optional somehow.
- Make Buffers handed out via the data event or read function volatile. Users need to ensure that the data is consumed before returning from the callback (or before resuming after pausing). This enables the stream (and the write component) to internally use a fixed Buffer without constantly having to allocate new ones whilst reading.

I'm hoping that others may be able to come up with better ideas :)
Thank you for your consideration!