Memory usage not accurate on Heroku #7

schneems · 2014-04-07T04:39:51Z

From Heroku:

2014-04-07T04:16:14.403204+00:00 app[web.1]: PumaAutoTune (358s): All is well measure#puma.resource_ram_mb=488.5341796875 measure#puma.current_cluster_size=6 measure#puma.max_worker_limit=Infinity
2014-04-07T04:16:15.416413+00:00 heroku[web.1]: source=web.1 dyno=heroku.5385926.fb0a01d2-dbc7-46f1-aae9-d2eddc627596 sample#load_avg_1m=0.15 sample#load_avg_5m=0.06
2014-04-07T04:16:15.416714+00:00 heroku[web.1]: source=web.1 dyno=heroku.5385926.fb0a01d2-dbc7-46f1-aae9-d2eddc627596 sample#memory_total=516.06MB sample#memory_rss=510.52MB sample#memory_cache=0.02MB sample#memory_swap=5.52MB sample#memory_pgpgin=153581pages sample#memory_pgpgout=22883pages

Note that puma.resource_ram_mb=488.5341796875 while memory_total=516.06. I can't find docs on the behavior of Pss and whether swap memory gets reported.

For PumaAutoTune. I'm thinking we'll need to add a fudge factor similar to that in PumaWorkerKiller, though I would like to get this library as accurate as possible.

This may help a bit: https://www.kernel.org/doc/Documentation/filesystems/proc.txt

Related: schneems/puma_auto_tune#9

The text was updated successfully, but these errors were encountered:

schneems · 2014-04-07T04:43:48Z

I attempted to include swap from smaps and it doesn't look like it does anything:

2014-04-07T04:39:05.280486+00:00 heroku[web.1]: source=web.1 dyno=heroku.5385926.369b952b-5f15-4655-8c40-3cf56ca2de1b sample#load_avg_1m=0.12
2014-04-07T04:39:05.280975+00:00 heroku[web.1]: source=web.1 dyno=heroku.5385926.369b952b-5f15-4655-8c40-3cf56ca2de1b sample#memory_total=573.77MB sample#memory_rss=511.93MB sample#memory_cache=0.00MB sample#memory_swap=61.84MB sample#memory_pgpgin=176571pages sample#memory_pgpgout=45515pages
2014-04-07T04:39:05.281567+00:00 heroku[web.1]: Process running mem=573M(112.1%)
2014-04-07T04:39:05.281777+00:00 heroku[web.1]: Error R14 (Memory quota exceeded)
2014-04-07T04:39:08.011804+00:00 app[web.1]: PumaAutoTune (248s): All is well measure#puma.resource_ram_mb=500.4775390625 measure#puma.current_cluster_size=5 measure#puma.max_worker_limit=Infinity
2014-04-07T04:39:08.143982+00:00 app[web.2]: PumaAutoTune (248s): All is well measure#puma.resource_ram_mb=500.267578125 measure#puma.current_cluster_size=5 measure#puma.max_worker_limit=Infinity

Here's my the commit on my branch to add in swap: 17e6b9a

schneems · 2014-04-07T18:04:25Z

As an experiment I tried putting in a task that looped through all smaps files to get the RAM usage, but this comes out much higher than what Heroku reports:

2014-04-07T18:02:36.494027+00:00 app[web.2]: Experiment: measure#memory_total=268.771484375
2014-04-07T18:02:37.428483+00:00 heroku[web.2]: source=web.2 dyno=heroku.5385926.7c03cf14-af7a-4549-94cf-9bbd4af6e198 sample#memory_total=211.09MB sample#memory_rss=211.09MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=74027pages sample#memory_pgpgout=19988pages

Here's the file: https://gist.github.com/schneems/10025798

jeremyhaile · 2014-04-29T16:38:45Z

I guess there is no way to get this information directly from heroku, other than using log drains - right?

I took a look at their API and don't see any memory reporting. I wish heroku offered their "runtime-metrics" via their API for each dyno.

jeremyhaile · 2014-04-29T16:39:28Z

Have you tried reaching out to heroku support to see if there is a command you could run to get the same value they are using?

jeremyhaile · 2014-04-29T16:43:17Z

Have you looked at how oink is doing it? Do any of these approaches match heroku?

https://github.com/noahd1/oink/blob/master/lib/oink/instrumentation/memory_snapshot.rb

schneems · 2014-04-29T18:57:35Z

I work for Heroku 😀

So this is actually a linux concern http://fabiokung.com/2014/03/13/memory-inside-linux-containers/ basically linux is not container aware when it comes to containers like LXC (which Heroku uses). While you can get the memory from outside the container easily, getting it from within, accurately, is really really hard (perhaps impossible?). As you mentioned we do drain the information via log-runtime-metrics, unfortunately they are getting this info from outside of the container.

Another suggestion someone gave me at Railsconf was to take a look at how Passenger measures memory as they do something similar to puma_auto_tune (so i've been told).

jeremyhaile · 2014-04-29T19:09:55Z

Ah - I see. Well what I think would be nice is if there was a way to use the heroku gem to query for the latest memory check. For example, if the values that log-runtime-metrics is displaying in the log could be stored somewhere and then retrieved via the API (i.e. the heroku gem)

Make sense? Obviously this isn't a tiny change, but would really help with things like puma_auto_tune

ksoderstrom · 2014-04-30T16:58:02Z

I briefly checked the Passenger code, and while they seem to have several memory measurements, this looks interesting: https://github.com/phusion/passenger/blob/master/ext/common/Utils/ProcessMetricsCollector.h#L495

They're using PSS + private dirty RSS + swap. I've been quite busy lately, but will try to get some time to test and see how that would work with puma_auto_tune/puma_worker_killer.

schneems · 2014-04-30T17:33:48Z

Thanks for the tip. I'm running on a mac, if you want an example snap check
out test/fixtures. looks like Private_Dirty is the key we would be
referencing. Heres some quick googling for links

http://stackoverflow.com/questions/17594183/what-does-private-dirty-memory-mean-in-smaps

http://bmaurer.blogspot.com/2006/03/memory-usage-with-smaps.html

On Wed, Apr 30, 2014 at 11:58 AM, Karl Söderström
[email protected]:

I briefly checked the Passenger code, and while they seem to have several
memory measurements, this looks interesting:
https://github.com/phusion/passenger/blob/master/ext/common/Utils/ProcessMetricsCollector.h#L495

They're using PSS + private dirty RSS + swap. I've been quite busy lately,
but will try to get some time to test and see how that would work with
puma_auto_tune/puma_worker_killer.

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/7#issuecomment-41821276
.

ksoderstrom · 2014-04-30T22:32:40Z

I tested with PSS + dirty + swap, but the results are far from accurate.

2014-04-30T22:26:41.747623+00:00 heroku[web.1]: source=web.1 dyno=heroku.23103076.0038ac92-bc5d-4a8b-84d3-798ead3adfa7 sample#load_avg_1m=0.00
2014-04-30T22:26:41.747883+00:00 heroku[web.1]: source=web.1 dyno=heroku.23103076.0038ac92-bc5d-4a8b-84d3-798ead3adfa7 sample#memory_total=314.22MB sample#memory_rss=314.21MB sample#memory_cache=0.01MB sample#memory_swap=0.00MB sample#memory_pgpgin=271176pages sample#memory_pgpgout=190736pages
2014-04-30T22:26:44.006693+00:00 app[web.1]: PumaAutoTune (179s): All is well puma.resource_ram_mb=509.7822265625 puma.current_cluster_size=5

I've also tested a bit with different calculations for master and child processes. I believe that RSS + dirty + swap for the puma master process, and dirty + swap for the workers should give a pretty accurate result, but so far no luck...

schneems · 2014-05-01T20:05:43Z

I played with this for a bit. It looks like they're using real_memory = swap + private_dirty (https://github.com/phusion/passenger/blob/master/ext/common/Utils/ProcessMetricsCollector.h#L126).

When i try the same thing https://gist.github.com/schneems/4ac2983f0abf3e200ec9

2014-05-01T19:47:38.792533+00:00 app[web.1]: Experiment: measure#puma_memory_total=271.8359375
2014-05-01T19:47:39.230994+00:00 app[web.1]: Experiment: measure#memory_total=367.96875
14-05-01T19:47:39.226449+00:00 heroku[web.1]: source=web.1 dyno=heroku.5385926.f178766a-0af7-4379-8dc9-f030f419af94 
sample#memory_total=320.13MB sample#memory_rss=320.00MB sample#memory_cache=0.07MB sample#memory_swap=0.06MB sample#memory_pgpgin=102888pages sample#memory_pgpgout=20950pages

Neither 272.17 or 367.96 are close to the Heroku reported 320mb. It's interesting though that they're both about 15% off and in opposite directions. So if you average them together you get 319 which is pretty freaking close. This is voodoo math right now as I have no good reason why we should be doing this average other than it just happens to get close. Also this method only works when using a container as we're grabbing the entire memory of the whole OS.

ksoderstrom · 2014-05-01T20:19:28Z

But only using swap + private_dirty doesn't make sense, except for comparison between workers. To calculate the total memory size we need to include the shared memory somewhere, either as RSS (once?) or PSS per process. I still don't understand why including RSS in the master process doesn't give better results.

ksoderstrom · 2014-05-01T23:29:32Z

I did some more testing, and found something interesting while checking the memory of an instance only running bash:

Running `bash` attached to terminal... up, run.3434
~ $ ps ax
  PID TTY      STAT   TIME COMMAND
    1 ?        S      0:00 ps-run
    2 ?        S      0:00 bash
    3 ?        R+     0:00 ps ax
~ $ cat /proc/2/smaps | grep 'Rss' | awk '{ s+= $2 } END { print s }'
2052
~ $ cat /proc/2/smaps | grep 'Shared_Clean\|Shared_Dirty' | awk '{ s+= $2 } END { print s }'
1336
~ $ cat /proc/2/smaps | grep 'Private_Clean\|Private_Dirty' | awk '{ s+= $2 } END { print s }'
716
~ $

RSS should be Shared_Clean + Shared_Dirty + Private_Clean + Private_Dirty. Private_* is private to that process, and Shared_* is of course shared between multiple processes. In this case, RSS for pid 2 (bash) is 2052 kb, the shared parts are 1336 kb, and the memory private to this process is 716 kb.

My guess would be that this instance would use about 2mb of memory. Sounds reasonable since the RSS for bash is 2 mb, right?

2014-05-01T23:18:40.900237+00:00 heroku[run.3434]: source=run.3434 dyno=heroku.23103076.d44dfb2a-d74e-43a8-8b7c-307fc462ac68 sample#memory_total=1.01MB sample#memory_rss=0.73MB sample#memory_cache=0.28MB sample#memory_swap=0.00MB sample#memory_pgpgin=1395pages sample#memory_pgpgout=1137pages

Turns out that memory_rss is closer to only the private memory of the bash. Does this mean that the shared memory might actually be shared with other containers or the host? I've tried to find info about this, but haven't been able to yet.

If memory can be shared between containers and/or host, it will be even harder to figure out the memory use for a specific process...

allaire · 2014-06-12T00:30:43Z

@schneems Would I hit this nasty bug on something else than Heroku? Let see a basic digital ocean VPS running Ubuntu 12.04/14.04 ?

schneems · 2014-06-12T00:41:27Z

Basically measuring effective memory on any linux system seems to be impossible :( If you know you have the whole machine you could watch swap usage and minimize that, but it's not ideal.

bdmac · 2014-06-25T23:40:38Z

Any update on this lately? I switched back to puma_worker_killer from puma_auto_tune (not that that is all that relevant) and am seeing PWK report memory usage of 819mb when heroku reports 307mb.

schneems · 2014-06-26T01:33:43Z

I changed the calculation back to use RSS which is what Unicorn Worker Killer uses. RSS is a better stop gap measure as it reports that more memory is used as opposed to less. It's still not perfect.

We're working on rolling out some tools internally to better expose memory. It's a surprisingly hard problem. It will likely take awhile.
—
Richard Schneeman

On Wed, Jun 25, 2014 at 4:40 PM, Brian McManus [email protected]
wrote:

Any update on this lately? I switched back to puma_worker_killer from puma_auto_tune (not that that is all that relevant) and am seeing PWK report memory usage of 819mb when heroku reports 307mb.

Reply to this email directly or view it on GitHub:
#7 (comment)

bdmac · 2014-06-26T01:37:04Z

Somewhat strange that it's so tough when Heroku reports on this in the logs (via log_runtime_metrics) and obviously knows when to issue R12 warnings as well. As a potentially terrible solution could you just periodically parse it out of the logs haha? Another idea would be to have whatever is monitoring memory write the current memory usage to an ENV var whenever it outputs to the log?

schneems · 2014-06-26T15:46:01Z

The way memory is exposed outside the container is very different than how it's exposed inside. More info http://fabiokung.com/2014/03/13/memory-inside-linux-containers/

nathany · 2014-06-26T20:09:22Z

@schneems Would this gem still be useful for relative memory usage? Had memory jump by 369MB on one dyno today. I'm finding it surprisingly difficult to determine which action was the culprit. A lot can happen in the 20-seconds worth of requests between log-runtime-metrics output. Thinking of adding memory_total to our lograge logs.

igorbernstein · 2014-06-27T03:56:15Z

@nathany check out https://github.com/noahd1/oink

maxim · 2014-07-18T01:26:57Z

Could you folks clarify whether this issue is irrelevant if hosting without containers (simple VPS like DigitalOcean/Linode)?

bdmac · 2014-07-18T02:00:08Z

Would also be interested in knowing if this affects PX dynos on heroku since they are not multitenant?~ Brian

On Thu, Jul 17, 2014 at 6:26 PM, Maxim Chernyak aka hakunin
[email protected] wrote:

Could you folks clarify whether this issue is irrelevant if hosting without containers (simple VPS like DigitalOcean/Linode)?

Reply to this email directly or view it on GitHub:
#7 (comment)

schneems · 2014-07-18T19:13:20Z

@maxim YMMV. RSS or PSS are both approximate measures. Honestly it doesn't matter where you run they're only going to be indicators of if you're using more or less memory, but not ever a byte-by-byte measure of exact memory.

@bdmac a PX dyno is a container that takes up the ENTIRE instance, so it will be better...but not perfect. This buys you the ability to use existing OS tools like top to get an accurate view of resources being consumed. It's impossible to know if you're in a container if the container has access to the WHOLE machine, so right now this library doesn't have any special case code for the PX case.

The issue comes really from the nebulous nature of memory rather than if it's run in a container or not. What number does DigitalOcean/Linode use to start throttling your app, i.e. how do you know when your app has started swapping to disk because no more physical memory is available?

Hard as it is to believe, memory measurements when it comes to an OS is almost always an approximation at best. Trying to match our approximation to the host's approximation is the hard part. I've never run this anywhere in production other than Heroku, would be interested in experiences.

maxim · 2014-07-25T18:56:28Z

@igorbernstein btw was looking into oink, and found this, so probably not going to work well for puma.

@schneems I believe newrelic also uses rss for memory measurements, and their graphs always correlate perfectly with me getting "cannot allocate memory" errors, so perhaps my mileage will be pretty spot on? It's not like the function of "restart puma riiight about nnnow" can ever be precise, false positives are not really an issue as long as all the true positives are prevented. So considering the above, is this usable in production?

maxim · 2014-07-27T22:23:11Z

@schneems Heh, I forgot that this isn't the puma_auto_tune repo, but hopefully it was clear that that's what I was referring to.

maxrosecollins · 2015-06-22T08:30:38Z

Has this been fixed yet?

schneems · 2015-06-22T14:17:57Z

Has this been fixed yet?

It can't really be "fixed" from inside of a container until containers expose that information.

fernandokosh · 2015-09-22T13:21:43Z

And about XEN? Any issues?

pratik60 · 2016-03-13T05:12:13Z

@schneems The first line on your readme for puma auto-tune, says don't use until this is fixed. Should we just ignore this and use it?

schneems · 2016-03-13T14:16:58Z

Use at your own risk. I recommend puma worker killer rolling restarts instead.

watsonian · 2016-04-11T22:25:47Z

Just as an update, I was playing around with this some and it appears as though memory usage is accurate until you start swapping. After that, it fails to account for the swap and constantly shows up as being below the max memory limit because everything over that is getting swapped out.

As such, if you have a very regular memory leak that happens over the course of several minutes, it should be possible to catch it and restart with a fairly decent level of accuracy so long as you leave enough headroom in the percentage (e.g., 0.9 rather than 0.98) and a short enough frequency.

It's also worth noting that up until Oct 28, 2015, the file system cache was being included in the memory total reported by the platform (see here) which could have thrown off the calculation before as well.

jjb · 2017-01-21T00:48:16Z

What about an approach which has an app process its own logs using a middleware?

simplest approach: set up a log drain to send the app's logs back to itself. the middleware picks it up, looks at the memory info, discards everything. drawbacks:

generally adding complexity to production app
adding load to production app (although even with enormous log volume if it's the first middleware i bet it wouldn't affect the app's performance much).
log would be posted to random dynos so a dyno might go several minutes without know its own memory info.

more complicated approach to deal with problem 2: make standalone app which accepts log drain and nothing else. it filters out the memory info and sends it back to the production app. now the production app just needs to deal with a couple POSTs/minute.

even more complicated approach to deal with problems 2 and 3: same standalone app above, but instead of sending logs back to production app, the production app can query the standalone app on a schedule. this might offer other interesting opportunities for coordination. although it's getting into territory that is probably anathema to 12-factor.

schneems · 2017-01-24T07:58:08Z

Check out https://github.com/noahd1/oink it sounds similar to that proposal

ghost · 2018-05-06T04:39:52Z

@schneems any reason why PSS was totally deleted? Before the gem had an option which type to choose GetProcessMem.new(Process.pid, mem_type: 'rss') (https://github.com/schneems/get_process_mem/tree/edd767cff8a7463e756bac323742e711d445380f).

There are a lot libraries which uses PSS as well: https://github.com/propella/multimem (ruby, it does the thing but slower then this one), https://www.selenic.com/smem/ (python), https://github.com/ColinIanKing/smemstat (Python).

There is no problem with PSS. It is not an inaccurate. It's just PSS. For some processes it's better to use RSS, but for others - PSS. I really don't understand why PSS was deleted.

schneems · 2018-05-06T12:46:41Z

@vfreely this would be a better question for a new issue, it’s not related to the core issue here other than I was using pss to measure. The short answer is that I don’t remember. I think maybe it was because it wasn’t supported in all OSes but I’m not 100% positive.

ghost · 2018-05-06T13:54:31Z

@schneems got it, thanks for answer

iGEL · 2018-08-17T09:42:57Z

@schneems I was wondering whether the recent addition (Docker 1.13, January 2017) of the statistics /sys/fs/cgroup/memory might be of use here?

schneems · 2018-08-17T14:00:41Z

Heroku runs on LXC containers do you know if that cgroup interface was implemented there as well?

iGEL · 2018-08-17T14:27:20Z

@schneems No, sorry, I don't know, as we don't use it. However, it would be useful as it seems to become a defacto standard.

In general, this problem applies also to Docker and all other cgroup based containers, doesn't it?

schneems · 2018-08-17T21:05:10Z

The somewhat canonical issue I've been watching is moby/moby#8427 (comment). If that is indeed a defacto standard then I would post there to see what other people think of its use.

cesarizu · 2019-08-20T19:40:14Z

Is this related?

https://help.heroku.com/TWBM7DL0/how-do-i-measure-current-memory-use-and-max-available-memory-on-a-dyno-in-a-private-space

schneems · 2019-08-22T15:23:16Z

Kinda, but get_process_mem's job is to find the memory of a specific process while that file tells us the memory of the whole system. We could maybe add helper methods like GetProcessMem.heroku_private_space_mem and GetProcessMem.heroku_private_space? (so we can know when we're in a private space), i'm not sure what the cannonical way of knowing is, though I guess presence of that file would be a pretty good indication.

ioquatix · 2020-03-25T01:55:36Z

@schneems How does Heroku determine the space used by a container?

schneems · 2020-08-25T18:13:42Z

How does Heroku determine the space used by a container?

It's reported externally to the host OS. We don't document the exact mechanism anywhere.

I want to close this issue in the interest of cleaning up issues. It's a much bigger problem than can be solved in just this thread.

schneems mentioned this issue Apr 28, 2014

Memory usage probably wrong schneems/puma_auto_tune#6

Closed

schneems mentioned this issue May 6, 2014

Not compatible with get_process_mem 0.1.1 schneems/puma_auto_tune#13

Closed

schneems added a commit that referenced this issue Jun 16, 2014

revert back to RSS until #7 is solved

ed81d8f

schneems mentioned this issue Aug 15, 2014

Significantly incorrect memory usage reported zombocom/puma_worker_killer#6

Closed

schneems mentioned this issue Nov 5, 2014

workers being constantly booted zombocom/puma_worker_killer#8

Closed

schneems changed the title ~~PSS Memory usage not accurate~~ Memory usage not accurate May 7, 2018

schneems changed the title ~~Memory usage not accurate~~ Memory usage not accurate on Heroku May 7, 2018

wjordan mentioned this issue May 13, 2020

Report Proportional Set Size [PSS] when available instead of Resident Set Size [RSS] #40

Closed

schneems closed this as completed Aug 25, 2020

kevinhq mentioned this issue Apr 24, 2021

The memory reported doesn't match with the memory in Heroku metric zombocom/puma_worker_killer#95

Closed

Memory usage not accurate on Heroku #7

Memory usage not accurate on Heroku #7

Comments

schneems commented Apr 7, 2014

schneems commented Apr 7, 2014

schneems commented Apr 7, 2014

jeremyhaile commented Apr 29, 2014

jeremyhaile commented Apr 29, 2014

jeremyhaile commented Apr 29, 2014

schneems commented Apr 29, 2014

jeremyhaile commented Apr 29, 2014

ksoderstrom commented Apr 30, 2014

schneems commented Apr 30, 2014

ksoderstrom commented Apr 30, 2014

schneems commented May 1, 2014

ksoderstrom commented May 1, 2014

ksoderstrom commented May 1, 2014

allaire commented Jun 12, 2014

schneems commented Jun 12, 2014

bdmac commented Jun 25, 2014

schneems commented Jun 26, 2014

Any update on this lately? I switched back to puma_worker_killer from puma_auto_tune (not that that is all that relevant) and am seeing PWK report memory usage of 819mb when heroku reports 307mb.

bdmac commented Jun 26, 2014

schneems commented Jun 26, 2014

nathany commented Jun 26, 2014

igorbernstein commented Jun 27, 2014

maxim commented Jul 18, 2014

bdmac commented Jul 18, 2014

Could you folks clarify whether this issue is irrelevant if hosting without containers (simple VPS like DigitalOcean/Linode)?

schneems commented Jul 18, 2014

maxim commented Jul 25, 2014

maxim commented Jul 27, 2014

maxrosecollins commented Jun 22, 2015

schneems commented Jun 22, 2015

fernandokosh commented Sep 22, 2015

pratik60 commented Mar 13, 2016

schneems commented Mar 13, 2016

watsonian commented Apr 11, 2016

jjb commented Jan 21, 2017 • edited Loading

schneems commented Jan 24, 2017

ghost commented May 6, 2018 • edited by ghost Loading

schneems commented May 6, 2018

ghost commented May 6, 2018

iGEL commented Aug 17, 2018

schneems commented Aug 17, 2018

iGEL commented Aug 17, 2018

schneems commented Aug 17, 2018

cesarizu commented Aug 20, 2019

schneems commented Aug 22, 2019

ioquatix commented Mar 25, 2020

schneems commented Aug 25, 2020

jjb commented Jan 21, 2017 •

edited

Loading

ghost commented May 6, 2018 •

edited by ghost

Loading