Memory Leak #134

mattmcla · 2014-03-22T16:50:17Z

I've been hunting memory leaks for the past 4 days but this one stumped me for a bit. While the application was idle, it was consuming a consistant amount of memory. That's when I noticed new relic appeared to be crashing and not recovering.

Logs:
https://gist.github.com/mattmcla/958c26fb8e8374981016

packages:

"dependencies": {
"express": "~3.2.5",
"jade": "~0.35.0",
"versionator": "~0.4.0",
"oauth": "~0.9.10",
"pg": "~2.8.2",
"hat": "0.0.3",
"knox": "~0.8.8",
"mime": "~1.2.11",
"slugify": "~0.1.0",
"dateutil": "~0.1.0",
"newrelic": "~1.4.0",
"orchestrate": "0.0.4",
"bluebird": "~1.0.4",
"dotenv": "~0.2.4",
"less-middleware": "~0.2.0-beta"
},
"engines": {
"node": "0.10.22",
"npm": "1.3.x"
}

It's only been a hour or so since I removed new relic but I'm not seeing any leaky behavior.

rwky · 2014-03-24T13:11:11Z

I've been experiencing the same issue

groundwater · 2014-03-24T18:39:17Z

Hi @mattmcla we definitely do not want the module causing memory leaks. Our tracing module patches node core pretty deeply, so it's totally possible that we're causing a leak.

I would like to gather a bit more information here. It is expected that overhead will increase when using our module, but are you seeing a steady growth of memory over time? I have tried to reproduce the leak locally based on descriptions, but have been unable to show any significant growth of memory, even when placing a high load on our sample apps.

I am fairly sure we're capable of causing a leak, I just don't have enough information to reproduce the issue. If you have some sample code, I would be happy to try running it.

Alternatively, if you are able to core-dump the leaky process, I should be able to inspect the core for leaks. This would only work with Linux or SmartOS cores.

I hope this isn't causing too much extra overhead on your application, but I really appreciate any help you can provide here.

mattmcla · 2014-03-24T19:20:31Z

Here's a chart from you guys that outlines what was going on. As you can see after each restart memory consumption would go through the roof. The app consumes 42M just after start and typically settles at around 82M per instance after it's warmed up. With NR installed it would just consume memory until Heroku would restart the application, the site would get sluggish, etc. All the things that speak to a memory leak.

Our application is very new at this point and there are long periods of idle time but during those times memory would just get consumed and never let go.

Here is how we're kicking our express app off:

https://gist.github.com/mattmcla/b82b064a639efa4b7e00

Other than that it's a pretty straight forward expressjs app. I did try using it with just one CPU (still using cluster but overriding the numCPUs to 1) to the same effect. As for getting a core dump, that would be a bit of work as we're on Heroku. I'll look into it if it's necessary.

What I can tell you, is with new relic removed, we're running smooth.

mattmcla · 2014-03-24T19:24:27Z

Also, in my first comment I linked to a gist of our logs. It does show a stack trace coming from the new relic module. When those errors occur is when the memory starts climbing until there is no memory left.

groundwater · 2014-03-24T20:46:42Z

@mattmcla this is great information, thank you!

The errors in those logs are enough to suggest that there is a leaky way we're handling re-connection attempts. This may or may not be related to SSL, but given that this coincides with 1.4.0, I'm going to start looking there.

I still need to generate a reproducible case, but I believe I have enough information to go on. If you have any other information, logs or code samples I would gladly accept them. You can email me directly at [email protected].

@rwky if you have any logs to share, I would also love to see them.

Sorry for the spikes! I hope we can get this sorted out soon, and I appreciate all the detailed information you're providing.

mattmcla · 2014-03-24T20:53:14Z

Cool, yeah, I can replicate it every time, but it sometimes takes an hour before the failure happens. It lead me astray a number of times. Let me know if there is anything else I can do.

rwky · 2014-03-25T09:27:50Z

@groundwater here's our trace:

New Relic for Node.js was unable to start due to an error:
 Error: socket hang up
 at createHangUpError (http.js:1472:15)
 at CleartextStream.socketCloseListener (http.js:1522:23)
 at CleartextStream.EventEmitter.emit (events.js:117:20)
 at tls.js:696:10
 at node_modules/newrelic/node_modules/continuation-local-storage/node_modules/async-listener/glue.js:177:31
 at process._tickDomainCallback (node.js:459:13)
 at process.<anonymous> (node_modules/newrelic/node_modules/continuation-local-storage/node_modules/async-listener/index.js:18:15)

The same error is repeated throughout the logs, it only happened since upgrading to 1.4.0 (I've now pinned it at 1.3.2) and it takes a couple of hours to see a problem.

groundwater · 2014-03-31T22:50:18Z

I just wanted to update everyone here with that's been going on this last week.

We are focusing in on the socket hang up error that certain customers have been experiencing, but it has been difficult to reproduce. This may be intimately related to the network where the application runs, so it may be situational. Through some artificial means, I have been able to trigger the above hang up errors intermittently however, and I am using that to debug the problem. I will try to keep this thread updated.

This may or may not be related to the memory leak issues reported; I don't yet have a sample app that shows the same kind of leaks. Given the lack of data in this area, I'm focusing on fixing the socket error first. Once that's cleared, I will be diving deeper into the memory leak issue.

In the mean time, if anyone has an app they can share which reliably leaks memory with the newrelic module installed, I would be grateful for a solid repro case.

rwky · 2014-04-01T09:02:18Z

Our app is far too large (and confidential) to share but once you've fixed the socket hang up I can upgrade new relic and see if the issue persists.

mattmcla · 2014-04-01T14:55:20Z

Our app is our secret sauce so it's not something I'm willing to share. As mentioned by rwky, if you fix the socket hang up issue I'll be more than happy to give NR another shot.

groundwater · 2014-04-01T15:58:58Z

@rwky @mattmcla much appreciated! I'll keep you updated.

sebastianhoitz · 2014-04-02T10:32:55Z

I just wanted to jump in and say that we are experiencing the same issues.
Memory consumption is consistently getting bigger.

Edit: We are running on dedicated servers. No Heroku or similar platform.

brettcave · 2014-04-04T10:53:38Z

We've also experiencing this issue. Our express app dies with the following error message:

Allocation failed - process out of memory

We've just adopted NR into our node app. I'm busy trying to replicate the issue. Previously, we've been running our application for over a year, and it has been stable in terms of memory and CPU usage and general availability. Since adding in NR, we have noticed instability.

We're using express 2.5.x, jade 0.25.0 and nr 1.3.x with node 0.10.x and a handful of other modules. We're running in AWS, and using NodeJS clustering.

Update: I've just run some small load tests against our app: 1 with NR and 1 without. Difference in memory and CPU and memory is negligible, and I was not able to replicate the out of memory error.

brettcave · 2014-04-05T13:45:33Z

System memory usage per host. Hosts change after each deployment. We deployed NR integration on March 31. The app crashed after about 54h with Allocation failed error.

brettcave · 2014-04-07T15:56:13Z

have just boosted nodejs memory from 512m for more stability, with --max-old-space-size=1024

edit I have upgraded to NR 1.4.0, and it's more unstable than 1.3.x: crashes within 20 minutes of starting up the application consistently.

groundwater · 2014-04-11T23:16:39Z

We just released version 1.5.0 of the newrelic module, which includes a fix for the socket hangup error I've noticed in a number of user logs.

Unfortunately I do not know if this was causing the memory leak or not. I very much believe we have caused a leak somewhere, but I don't know where yet. We have not been able to reproduce the problem, which makes tracking it down difficult.

I would be grateful to anyone here would can try version 1.5.0 and let us know if the memory leak is still happening. At the very least, we will know definitively that it is not related to the socket hangup error. At best, we will have fixed the leak issue.

Thanks again for your wonderful support!

brettcave · 2014-04-12T05:32:05Z

Hi groundwater,

Thanks for the release. I've been running 1.5.0 for over 5 hours now and have not had a crash. 1.4.0 was crashing within 20 minutes, so on first look, the new version looks a lot more stable. I am going to add some load to the app and see how it handles over the next couple of days.

rwky · 2014-04-12T16:25:02Z

Unfortunately we're still experiencing memory leaks with 1.5.0. The socket hangup has been fixed though.

mattmcla · 2014-04-13T16:47:39Z

Memory still leaks with 1.5.0

This happened well after the memory started leaking

07:35:48.416  2014-04-13 14:35:48.217255+00:00 app web.1     - - {"name":"newrelic","hostname":"62cfab7c-9e03-4611-ba47-c92a5acc3305","pid":8,"component":"collector_api","level":50,"err":{"message":"write ECONNRESET","name":"Error","stack":"Error: write ECONNRESET\n    at errnoException (net.js:901:11)\n    at Object.afterWrite (net.js:718:19)","code":"ECONNRESET","errno":"ECONNRESET","syscall":"write"},"msg":"Calling metric_data on New Relic failed unexpectedly. Data will be held until it can be submitted:","time":"2014-04-13T14:35:48.199Z","v":0}
{"name":"newrelic","hostname":"62cfab7c-9e03-4611-ba47-c92a5acc3305","pid":8,"level":30,"err":{"message":"write ECONNRESET","name":"Error","stack":"Error: write ECONNRESET\n    at errnoException (net.js:901:11)\n    at Object.afterWrite (net.js:718:19)","code":"ECONNRESET","errno":"ECONNRESET","syscall":"write"},"msg":"Error on submission to New Relic (data held for redelivery):","time":"2014-04-13T14:35:48.217Z","v":0}
{"name":"newrelic","hostname":"62cfab7c-9e03-4611-ba47-c92a5acc3305","pid":8,"component":"collector_api","level":50,"err":{"message":"write ECONNRESET","name":"Error","stack":"Error: write ECONNRESET\n    at errnoException (net.js:901:11)\n    at Object.afterWrite (net.js:718:19)","code":"ECONNRESET","errno":"ECONNRESET","syscall":"write"},"msg":"Calling metric_data on New Relic failed unexpectedly. Data will be held until it can be submitted:","time":"2014-04-13T14:35:48.228Z","v":0}
{"name":"newrelic","hostname":"62cfab7c-9e03-4611-ba47-c92a5acc3305","pid":8,"level":30,"err":{"message":"write ECONNRESET","name":"Error","stack":"Error: write ECONNRESET\n    at errnoException (net.js:901:11)\n    at Object.afterWrite (net.js:718:19)","code":"ECONNRESET","errno":"ECONNRESET","syscall":"write"},"msg":"Error on submission to New Relic (data held for redelivery):","time":"2014-04-13T14:35:48.229Z","v":0}

In the picture you can see the line plateau. This is due to throttling on Heroku. All of this also happened during the night and we were receiving no traffic. It's also worth noting we're running 4 instances on 2 dyno's.

jmdobry · 2014-04-14T20:03:14Z

My Node app would run stable in memory in the 70-80 MB range. The last couple of weeks I've noticed that the memory usage has started to grow steadily until it caps the server, requiring a restart of the app. I've been debugging memory leaks for days, banging my head against this wall. I just recently tried removing newrelic-node from my app, and memory is stable again.

newrelic_agent.log looks fine except for the following error which appears periodically:

{"name":"newrelic","hostname":"www","pid":13499,"component":"collector_api","level":50,"err":{"message":"socket hang up","name":"Error","stack":"Error: socket hang up
  at createHangUpError (http.js:1472:15)
  at CleartextStream.socketCloseListener (http.js:1522:23)
  at CleartextStream.EventEmitter.emit (events.js:117:20)
  at tls.js:692:10
  at /var/www/app/node_modules/newrelic/node_modules/continuation-local-storage/node_modules/async-listener/glue.js:177:31
  at process._tickCallback (node.js:415:13)
  ","code":"ECONNRESET"},"msg":"Calling metric_data on New Relic failed unexpectedly. Data will be held until it can be submitted:","time":"2014-04-14T17:04:23.872Z","v":0}
{"name":"newrelic","hostname":"www","pid":13499,"level":30,"err":{"message":"socket hang up","name":"Error","stack":"Error: socket hang up
  at createHangUpError (http.js:1472:15)
  at CleartextStream.socketCloseListener (http.js:1522:23)
  at CleartextStream.EventEmitter.emit (events.js:117:20)
  at tls.js:692:10
  at /var/www/app/node_modules/newrelic/node_modules/continuation-local-storage/node_modules/async-listener/glue.js:177:31
  at process._tickCallback (node.js:415:13)
  ","code":"ECONNRESET"},"msg":"Error on submission to New Relic (data held for redelivery):","time":"2014-04-14T17:04:23.873Z","v":0}

othiym23 · 2014-04-14T20:05:19Z

@jmdobry that issue, at least, is fixed in v1.5.0 of New Relic for Node, which we released on Friday (see @groundwater's comment upthread), but for at least a few people this hasn't fixed the memory leak issue, which means the two probably aren't correlated. We're actively investigating this issue, but it's only happening for some people, and we haven't been able to reproduce the problem locally. Sorry, and thanks for your report and your patience while we figure this out!

jmdobry · 2014-04-14T20:10:17Z

@othiym23 Good to hear about the socket hangup. Still have the leak though. For what it's worth, the leak seemed worse the more verbose the logging level was.

brettcave · 2014-04-15T09:11:13Z

we're also still seeing the memory leak + instability (though not sure whether it's related to socket hangup). We're using a local port monitor that restarts the service when it stops responding (3s timeout), and we start experiencing non-responsiveness within an hour. However, we run 2 Node apps side by side, and only 1 of the apps ever becomes unresponsive, even though they have similar architectures.

Here's some info that might help with reproducing. As with others, we are unfortunately unable to share our app, but maybe something in here can be used to help narrow down the issue. Some libraries that are in the unstable app, but not in the stable app:

handlebars
events
requirejs
prettyjson
mandrill-api

Middleware configured in unstable app, not in stable app:

i18n (0.3.x)

The graph below shows host memory usage with NR enabled until about 5:30pm, after which we disabled it and restarted the services due to unresponsiveness.

sebastianhoitz · 2014-04-15T13:58:44Z

Just want to jump in here and say that we are also using

handlebars
i18n

Do the other people experiencing this also use these modules?

On Tue, Apr 15, 2014 at 11:11 AM, Brett cave [email protected]:

we're also still seeing the memory leak + instability (though not sure
whether it's related to socket hangup). We're using a local port monitor
that restarts the service when it stops responding (3s timeout), and we
start experiencing non-responsiveness within an hour. However, we run 2
Node apps side by side, and only 1 of the apps ever becomes unresponsive,
even though they have similar architectures.

Here's some info that might help with reproducing. As with others, we are
unfortunately unable to share our app, but maybe something in here can be
used to help narrow down the issue. Some libraries that are in the unstable
app, but not in the stable app:

handlebars

events

requirejs

prettyjson

mandrill-api

Middleware configured in unstable app, not in stable app:

i18n (0.3.x)

The graph below shows host memory usage with NR enabled until about
5:30pm, after which we disabled it and restarted the services due to
unresponsiveness.

[image: newrelic-memusage]https://cloud.githubusercontent.com/assets/129494/2705296/b7084846-c47b-11e3-9991-7e3dc2c00d27.png

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/134#issuecomment-40460516
.

Sebastian Hoitz

Geschäftsführer, Entwickler, Coffee 2 Code converter

komola GmbH
Rebenring 33
38106 Braunschweig

Telefon: +49 531 3804200
Mobil: +49 175 2517038

Geschäftsführer: Sebastian Hoitz, Thomas Schaaf
Amtsgericht Braunschweig | HRB 201595

jfeust · 2014-04-18T03:48:29Z

We were in communcation with @groundwater as soon as 1.4.0 was release because we were experiencing this same memory leak. We've been running 1.3.2 due to this issue. I just want to throw our voice in that this is a major issue for us because business is starting to require RUM data. I just tried 1.5.0 and still have the leak. In about an hour we are reaching the heroku dyno limit.

"dependencies": {
"analytics-node": "0.5.0",
"connect-flash": "0.1.1",
"express": "3.4.6",
"jade": "~1.3.1",
"logfmt": "0.18.1",
"mongodb": "1.4.x",
"mongoose": "3.8.3",
"request": "2.30.0",
"underscore": "1.5.2",
"newrelic": "1.3.2",
"express-cdn": "https://registry.npmjs.org/express-cdn/-/express-cdn-0.1.9.tgz"
},

We're running over https on heroku and doing a lot of https api requests using the request http client. We're also using cluster.

jfeust · 2014-04-18T03:49:47Z

@sebastianhoitz nope, we're not using handlebars or i18n and are still experiencing the memory leak with 1.5.0

groundwater · 2014-04-18T04:54:47Z

I think there is a memory leak, but we still haven't been able to reproduce it. I think the fastest path to a solution from here is getting our hands on some hard evidence.

I completely understand if you cannot share you app, but perhaps there are other solutions.

We can do a screen share, and go hunting. There are several directions we can take here depending on what you're comfortable with.
You can send a heapdump of the application after it's been leaking memory for a while (the longer the better)
You can send a Linux or SmartOS core dump, which we can inspect using Joyent's handy mdb tool.

If you'd like to email me directly at [email protected] we can talk about details.

Thanks to everyone for their great help so far. It sucks when we cause problems for your apps, and we really really appreciate you helping us fix these things.

secretfader · 2014-04-25T15:06:18Z

I just finished preparing heapdumps that should help resolve this issue. I'm emailing them to you, @groundwater, and the New Relic support team.

secretfader · 2014-04-25T22:02:49Z

The latest theory that I've heard indicates that the memory leak has something to do with wrapping MongoDB queries. I created a simple proof-of-concept app that seems to verify this. If anyone has changes or tweaks that might help them shake out this bug, feel free to fork it.

https://github.com/nicholaswyoung/new-relic-leak

groundwater · 2014-04-29T22:59:53Z

@nicholaswyoung I ran your demo app, and drove 500k requests to it. I did not get a memory leak. Neither memwatch nor my external metrics indicating any problems, and the memory did not grow beyond about 80mb peak. The memory promptly dropped when I stopped driving traffic to the application.

I've asked my colleague to look at it, just in case there is something about system setup involved. Can you give me the exact command you used to drive traffic, and how long before the leak occurred?

txase · 2014-07-02T17:31:51Z

@Chuwiey, hrm, forgot that I can't see your email address through github. Can you email me at [email protected] so we can have a more direct dialog?

Thanks!

Chuwiey · 2014-07-02T18:39:22Z

Responding via email...

Rowno · 2014-07-05T13:06:11Z

This memory leak is pretty insane. Here's the memory usage I'm seeing before and after newrelic is included. The only difference in the code is the newrelic module being required, no other code is changed. The log level is set to warn.

framerate · 2014-07-08T06:14:05Z

Not sure if it's the same issue, but running with stats set to "trace" OR "info" I'm seeing a ~50MB increase every 30 minutes. Disabling this module (but still using new relic on the server) reports no increase in RAM over time.

Emailed heather@newrelic (my contact) more info, but I wanted to post here as well.

txase · 2014-07-08T15:48:02Z

@Rowno Without y-axis labels, your chart doesn't tell us much. Could you provide the amount of memory usage you are seeing before and after?

We're in the middle of doing some deep inspection of core dumps and other data to try to gather as much information as we can. Stay tuned for more info.

txase · 2014-07-08T15:52:59Z

@framerate Yes, using a verbose logging level can cause a serious increase in memory usage. We used to default to "trace" level, but have backed that off to just "info" level at this point.

That said, we believed "info" was unlikely to cause a noticeable memory usage. Can you double check that "info" level is still problematic?

The reason verbose log levels are problematic for memory usage is due to garbage collection. Normally, one would expect log message data to be ephemeral and be collected quickly during garbage collection scavenge cycles. However, we are finding log data persisting into the old-generational space of memory. The end result is that a lot of log messages end up sitting around in memory waiting for a slower, less frequent mark and sweep cycle to be collected. This means the memory usage in the steady state for a given app is higher than if all log messages were collected immediately by scavenge cycles.

We're still investigating. Stay tuned!

framerate · 2014-07-08T17:34:28Z

@txase I let it run over night and sadly found an actual memory leak in my API so my data is corrupted, but it still appears with a small sample size that even with 'info' logging I go up ~2MB every 5 minutes with newrelic running at app level. Initial tests you'll see running overnight the slight "slope". Then the drop off when I restarted without running new relic on the application and it baselines.

txase · 2014-07-08T17:54:28Z

@framerate The small slope could be an indication of a leak, or simply a rise over time that hasn't plateaued yet. Due to how the agent works, we need to isolate memory usage due to a leak versus usage due to higher request throughput. For your particular environment, it might be useful to let the app run a few days (going through day/night peak cycles), and then check for continually rising memory usage indicating a memory leak. When we've asked other customers to try this, they eventually see memory usage level-off.

Thanks for following up!

framerate · 2014-07-08T17:57:22Z

@txase - This is running on a micro AWS instance. The reason this is on my radar is because my server seems to hit 100% Ram (micro has no swap) and become non-responsive. Could be related to this issue, could not be, but it seems to be :(.

So running a few days and watching becomes an issue since the 100% RAM never drops back down. Granted some of the leaks were mine, but the above graph is running a clean app with/without newrelic agent.

I'm going to keep investigating, but I have to turn off newrelic until I have time to circle back next sprint.

txase · 2014-07-08T18:29:33Z

@framerate We're preparing a document with things you can do to mitigate memory usage. The gist is that you can try one of the following:

If your app doesn't do any crypto (HTTPS, SSL, etc.), reduce tls.SLAB_BUFFER_SIZE to 128 KB
Make sure you log at "info" level or higher
Reduce number of cluster workers if you use them (cluster probably isn't helpful on AWS micro instances anyways)
Upgrade to an instance with more memory

To a certain extent, we simply record a lot of data in order to provide our customers with as much info as possible. Using our product will entail a certain overhead in memory, and you may need to increate the available memory.

Thanks!

framerate · 2014-07-08T18:31:12Z

Thanks Chase! I'm upgrading to a medium soon. I don't mind the overheard, I
just need to make sure my app stays stable and obviously a micro instance
is part of the problem.

I look forward to seeing this doc! Thanks!

*- justin *| @framerate http://twitter.com/framerate | framerate.info

On Tue, Jul 8, 2014 at 11:29 AM, Chase Douglas [email protected]
wrote:

@framerate https://github.com/framerate We're preparing a document with
things you can do to mitigate memory usage. The gist is that you can try
one of the following:

If your app doesn't do any crypto (HTTPS, SSL, etc.), reduce
tls.SLAB_BUFFER_SIZE to 128 KB

Make sure you log at "info" level or higher

Reduce number of cluster workers if you use them (cluster probably
isn't helpful on AWS micro instances anyways)

Upgrade to an instance with more memory

To a certain extent, we simply record a lot of data in order to provide
our customers with as much info as possible. Using our product will entail
a certain overhead in memory, and you may need to increate the available
memory.

Thanks!

—
Reply to this email directly or view it on GitHub
#134 (comment)
.

supergrilo · 2014-07-17T20:19:03Z

I've been experiencing the same issue

I've two server with no requests, on server with newrelic module, memory leak is visible.

#Server with newrelic module

#Server without newrelic module

$ node --version
v0.10.29

├─┬ [email protected]
│ ├── [email protected]
│ ├─┬ [email protected]
│ │ ├─┬ [email protected]
│ │ │ └── [email protected]
│ │ └─┬ [email protected]
│ │ └── [email protected]
│ └── [email protected]

framerate · 2014-07-17T20:22:21Z

Thanks @supergrilo. I had to remove newrelic for now. They suggested that I run it on a server with more ram and it'll eventually plateau. How much ram is on your machine? (mine was ~600MB micro AWS instance)

supergrilo · 2014-07-17T20:38:51Z

@framerate

My machine has 4G of memory ram, but v8 are using only 1.4G for default.

jmaitrehenry · 2014-07-23T14:09:34Z

Hello, I have the same problem on my node.js apps.
My apps run withotu SSL/TLS stuff, I have a load balancer in front of nodejs apps.
My logs are set in 'info'.
I have 2 process in my cluster and run in a 2G instance server.

One of my node have newrelic, the other one not.
The first one use 1.3G of memory for the nodejs process, and the other one 300M.

$ node --version
v0.10.26

$ npm list
https://gist.github.com/Precea/702a4f0bb62ed110acd5

etiennea · 2014-07-23T14:56:46Z

Still have this!

rictorres · 2014-07-23T20:49:14Z

Yep, same here. Running Ghost 0.4.

At first I thought it was related to pm2, but then I tried with forever and also node filename.js

ericsantos · 2014-07-23T21:07:11Z

Hi, I have the same problem on my node.js apps.
Please increase the prioritization of this issue ;)

xavigil · 2014-07-23T21:13:04Z

+1

Some weeks ago I rolled back the new relic deployment and I am watching this issue since then. I understand it is not an easy problem to solve, but hopefully you can give it higher priority.

Thanks.

wraithan · 2014-07-23T21:15:54Z

The best way to get the priority bumped on a problem you are having is to
contact our support: https://support.newrelic.com/

The github issue tracker is being phased out, and all of our internal tools
for tracking issue priority depend on going through the support system.

On Wed, Jul 23, 2014 at 2:07 PM, ericsantos [email protected]
wrote:

Hi, I have the same problem on my node.js apps.

Please increase the prioritize of this issue ;)

—
Reply to this email directly or view it on GitHub
#134 (comment)
.

ruimarinho · 2014-07-23T21:44:06Z

I was experiencing the same issue running on node 0.11. I had to remove it because the memory increase was huge - from ~170mb to ~2gb. Like @rictorres, I originally thought this could be related to pm2, but the issue still presented itself when running with node --harmony.

@wraithan I don't think that at this point it should be up to us, as users and/or customers, to report this issue on yet another tracker, considering that a discussion is already on-going here.

wraithan · 2014-07-23T21:54:46Z

@ruimarinho I get that. But without account data, module lists, being able to correlate things across those, etc. It makes our job harder. On top of that, product management uses support tickets to push what is important.

@ruimarinho Also of note, we don't really support --harmony

We can't reproduce an unbounded leak. We can find cases of higher than user desired memory, but nothing that actually shows a leak. Most of the memory usage appears to be in flight objects, especially in the higher memory cases. That large number of in flight objects cause V8 to allocate a lot of memory (it is greedy) and puts pressure on the GC.

ruimarinho · 2014-07-23T23:29:38Z

I understand that node --harmony is not fully supported yet and that is one of the reasons why I was following this issue closely but had not reported anything back yet. My only purpose, like many other users involved in this discussion, is to find a fix that will allow us to continue using the newrelic service as part of our instrumentation tools.

Indeed, I can't really reproduce a memory leak but the observed memory usage is much, much higher than what you would normally get without the module installed. Like you said, this may not exactly be a bug in the module, but it is a tradeoff that some of us are not willing to make.

Nevertheless, the workarounds mentioned above to limit this issue did not work for me, but I'll gladly try other suggestions if you have any available. My help may be limited to node with the harmony flag enabled, but right now the symptoms are similar to what others are experience in node 0.10.

victorquinn · 2014-07-28T23:02:30Z

+1 here, we are seeing the same memory leak.

Hope this is resolved quickly, but honestly just glad to know about it -- I've been waking up at odd hours of the night for a couple of months to restart our Node processes to prevent the memory leak from overwhelming our servers and we didn't know the culprit until today. We have a few clusters of servers running on AWS with a handful of different Node apps, all with NewRelic with sawtooth memory usage graphs. Disabling NewRelic's Node module solved it immediately.

Just submitted a NewRelic support request as @wraithan suggested. Looking forward to a fix here.

etiennea · 2014-07-30T01:19:33Z

A big warning should be put somewhere! This agent should not be used in production as it will almost certainly decrease the performance substantially of your app due to this leak!!!! The stable version is 1.3.2

zamiang · 2014-07-31T16:25:05Z

@wraithan will followup with newrelic support but do want to add some learnings to this more public forum since it is very active.

In short, we have tried newrelic 1.9 with RUM enabled, 1.3.2 with RUM disabled (not supported) and sans newrelic. We did see some improvements when going from 1.9 to 1.3.2 but the when removing newrelic entirely we saw a significant drop in memory usage over time.

Here is a screenshot of our heroku dashboard with newrelic-node 1.9 installed vs no newrelic. Note that throughput is about the same. We are an https only app serving mostly web pages. Any sudden drops in app memory are from a deploy which restarts the app. I understand that monitoring isn't cheap but we saw significant improvements across the board when removing newrelic and are looking at other smaller monitoring solutions now.

txase · 2014-08-01T18:05:22Z

Hi folks,

First off, we've received a lot of very helpful information in this thread. We appreciate the amount of time and effort people have put into helping us determine potential issues in our agent. Our greatest concern is ensuring that we do not negatively impact our customers, so we take this issue very seriously.

We've spent a lot of time behind the scenes looking into the memory usage of our agent. We worked with a small number of customers who provided core dumps of their apps, and this has led to a few discoveries:

http://docs.newrelic.com/docs/agents/nodejs-agent/troubleshooting/large-memory-usage

However, continuing this issue on GitHub will not help us. If, after consulting the documentation above, you continue to experience memory usage issues, please follow up with us at [email protected]. If possible, please contact us using the email address you use to log into New Relic, and include your account # and application name(s). This is a temporary address specifically set up to help us create a direct support ticket for you. Creating a dedicated support ticket will allow us to work with you on an individual basis to gather the information we need. We highly encourage you to follow up there, and we will be locking this issue.

Relatedly, we are winding down our use of GitHub issues. It can be difficult to support our customers through GitHub because we can’t share confidential information. Instead, please contact us through our dedicated portal at http://support.newrelic.com for any other issues you encounter. We are better equipped to support you there, and issues filed there are resolved more quickly.

Towards this end, we will soon be turning off GitHub issues. Once we flip the switch, all access to issues, both active and closed, will be gone. This is an unfortunate limitation of how GitHub handles issues once the feature is disabled.

Thank you for your understanding as we undergo this transition.

zamiang mentioned this issue Jul 28, 2014

Does not track swap memory #168

Closed

txase closed this as completed Aug 1, 2014

newrelic locked and limited conversation to collaborators Aug 1, 2014

Memory Leak #134

Memory Leak #134

Comments

mattmcla commented Mar 22, 2014

rwky commented Mar 24, 2014

groundwater commented Mar 24, 2014

mattmcla commented Mar 24, 2014

mattmcla commented Mar 24, 2014

groundwater commented Mar 24, 2014

mattmcla commented Mar 24, 2014

rwky commented Mar 25, 2014

groundwater commented Mar 31, 2014

rwky commented Apr 1, 2014

mattmcla commented Apr 1, 2014

groundwater commented Apr 1, 2014

sebastianhoitz commented Apr 2, 2014

brettcave commented Apr 4, 2014

brettcave commented Apr 5, 2014

brettcave commented Apr 7, 2014

groundwater commented Apr 11, 2014

brettcave commented Apr 12, 2014

rwky commented Apr 12, 2014

mattmcla commented Apr 13, 2014

jmdobry commented Apr 14, 2014

othiym23 commented Apr 14, 2014

jmdobry commented Apr 14, 2014

brettcave commented Apr 15, 2014

sebastianhoitz commented Apr 15, 2014

jfeust commented Apr 18, 2014

jfeust commented Apr 18, 2014

groundwater commented Apr 18, 2014

secretfader commented Apr 25, 2014

secretfader commented Apr 25, 2014

groundwater commented Apr 29, 2014

txase commented Jul 2, 2014

Chuwiey commented Jul 2, 2014

Rowno commented Jul 5, 2014

framerate commented Jul 8, 2014

txase commented Jul 8, 2014

txase commented Jul 8, 2014

framerate commented Jul 8, 2014

txase commented Jul 8, 2014

framerate commented Jul 8, 2014

txase commented Jul 8, 2014

framerate commented Jul 8, 2014

supergrilo commented Jul 17, 2014

framerate commented Jul 17, 2014

supergrilo commented Jul 17, 2014

jmaitrehenry commented Jul 23, 2014

etiennea commented Jul 23, 2014

rictorres commented Jul 23, 2014

ericsantos commented Jul 23, 2014

xavigil commented Jul 23, 2014

wraithan commented Jul 23, 2014

ruimarinho commented Jul 23, 2014

wraithan commented Jul 23, 2014

ruimarinho commented Jul 23, 2014

victorquinn commented Jul 28, 2014

etiennea commented Jul 30, 2014

zamiang commented Jul 31, 2014

txase commented Aug 1, 2014