TraceEvent can cause OOMs in simulation #2218

Closed
jzhou77 opened this issue Oct 8, 2019 · 0 comments

jzhou77 commented Oct 8, 2019

We have seen very rare OOMs in simulation for a while, and they are very hard to debug. I previously had some evidence that they might be related to TraceEvent, but not enough proof, because most of the memory used in FDB is allocated via FastAllocator, which can't be tracked.

With #2217, I could for the first time profile the whole memory usage, including FastAllocator. Here is what I did:

  1. Install gperftools if needed, e.g., yum install -y gperftools-devel gperftools-libs gperftools
  2. Compile with gperftools: cmake -DUSE_GPERFTOOLS=1 ../foundationdb -G Ninja; ninja
  3. Run with heap profiling enabled: HEAPPROFILE=/tmp/fdbserver fdbserver [args...]
  4. Analyze the heap profile: pprof-symbolize gperf-build/bin/fdbserver fdbserver.0065.heap
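
A long simulation run can leave many dumps behind, so it may help to pick the newest one before running step 4. The helper below is only a sketch: `latest_heap_dump` is a hypothetical name, and it relies on gperftools naming dumps `<prefix>.NNNN.heap` with a zero-padded counter.

```shell
#!/bin/sh
# Hypothetical helper (not part of FDB or gperftools): print the newest heap
# dump written under a HEAPPROFILE prefix.
# Assumption: dumps are named <prefix>.NNNN.heap with a zero-padded counter,
# so a lexical sort of the file names puts the latest dump last.
latest_heap_dump() {
    prefix=$1
    # If no dump exists yet, ls fails quietly and nothing is printed.
    ls "$prefix".*.heap 2>/dev/null | sort | tail -n 1
}

# Example usage with the paths from the steps above:
#   dump=$(latest_heap_dump /tmp/fdbserver)
#   pprof-symbolize gperf-build/bin/fdbserver "$dump"
```

The lexical sort only matches numeric order while the counter stays within its zero-padded width.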

The resulting profile can be downloaded here. It shows that 5914 MB is allocated for TraceEvent, out of a total usage of 6318 MB.
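
To put those two numbers in proportion, the share can be computed directly. This is just arithmetic on the figures quoted above; `share_pct` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: percentage of `total` taken up by `part`, one decimal.
share_pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", part * 100 / total }'
}

# TraceEvent allocation vs. total usage, both in MB, from the profile above.
share_pct 5914 6318   # prints 93.6
```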

[Image: Screen Shot 2019-10-08 at 10 10 19 AM]
