Add first of the new diagnostics articles to guides #1444

Merged
merged 5 commits into from
Feb 20, 2019
121 changes: 121 additions & 0 deletions locale/en/docs/guides/diagnostics-flamegraph.md
@@ -0,0 +1,121 @@
---
title: Diagnostics - Flame Graphs
layout: docs.hbs
---

# Flame Graphs

## What's a flame graph useful for?
Contributor Author

Explaining why before explaining how - part of my intended pattern for those guides.


Flame graphs are a way of visualizing CPU time spent in functions. They can help you pin down where you spend too much time doing synchronous operations.

## How to create a flame graph

You might have heard creating a flame graph for Node.js is difficult, but that's not true (anymore).
Member

I support this a lot as technical documentation for how to make a flame graph using only basic primitives. However, I wouldn't call this easy. If someone asked me how to make flame graphs easily I would defer to 0x.

Contributor Author

Hehe, easy in comparison. It's a second mention of 0x, so it's time I added a mention. thx

Contributor

Someone who hasn't heard it's difficult now has; maybe this will make people nervous before they've even started? I third the 0x reference.

Solaris VMs are no longer needed for flame graphs!

Flame graphs are generated from `perf` output, which is not a Node.js-specific tool. While it's the most powerful way to visualize CPU time spent, it may have issues with how JavaScript code is optimized in Node.js 8 and above. See the [perf output issues](#perf-output-issues) section below.

### Use a pre-packaged tool

If you want a single step that produces a flame graph locally, try [0x](https://www.npmjs.com/package/0x).

For diagnosing production deployments, read these notes: [0x production servers](https://github.com/davidmarkclements/0x/blob/master/docs/production-servers.md)

### Create a flame graph with system perf tools

The purpose of this guide is to show the steps involved in creating a flame graph and keep you in control of each step.

If you want to understand each step better, take a look at the sections that follow, where we go into more detail.

Now let's get to work.

1. Install `perf` (usually available through the linux-tools-common package, if not already installed).
2. Try running `perf`. It might complain about missing kernel modules; install them too.
3. Run node with perf enabled (see [perf output issues](#perf-output-issues) for tips specific to Node.js versions):
```bash
perf record -e cycles:u -g -- node --perf-basic-prof app.js
```
4. Disregard warnings unless they say you can't run perf due to missing packages. You may get some warnings about not being able to access kernel module samples, which you're not after anyway.
5. Run `perf script > perfs.out` to generate the data file you'll visualize in a moment. It's useful to [apply some cleanup](#filtering-out-node-internal-functions) for a more readable graph.
6. Install stackvis if not yet installed: `npm i -g stackvis`.
7. Run `stackvis perf < perfs.out > flamegraph.htm`.
Member

I tried this today and the result does not display nicely for me in the browser. Does it still work ok for you?

Contributor

Maybe we should suggest using Brendan Gregg's flamegraph scripts instead? Stackvis result is quite unreadable here as well:

`stackvis perf < perfs.out > flamegraph.htm`:

[image]

`./FlameGraph/stackcollapse-perf.pl < ./perfs.out | ./FlameGraph/flamegraph.pl --color=js > ./flamegraph.svg`:

[image]

Member

I did follow that suggestion in the demo I was putting together for my NodeConfEU presentation. FlameGraph worked for me. Once we close on the content we should see if we can figure out some testing to make sure our recommendation continues to work properly.

Contributor Author

@naugtur naugtur Oct 2, 2018

That's exactly why I'm not saying this is ready to publish - it started showing me the same issue you have on the screenshot. It seemed to be an issue with perf output, so I upgraded my OS and retried, got the same thing, ran out of time. I'll check other options, but it's good to see the output from perf seems ok and it's about the visualization tool.

I'm hoping to look for reasons in stackvis and explore other options. IMHO it'd be nice to use something one can install from npm after all.

Member

@naugtur thanks for the update.

Contributor Author

@naugtur naugtur Oct 27, 2018

I've compared various flamegraph generators and the difference in readability is mostly due to `overflow:hidden` on long trace names, but the input data is still incorrect. I switched to Ubuntu 18.04.1 and it still produces bad stack traces.

Planning to check on a few systems and compare.

And BTW, 0x didn't work for me either.

Contributor

> but the input data is still incorrect

Incorrect how? If you could share an example I can happily take a look :)

Contributor Author

https://naugtur.egnyte.com/fl/6GGszZUQFy
Every tool I used to generate a flame graph on my machine had the same issues


Now open the flame graph file in your favorite browser and watch it burn. It's color-coded so you can focus on the most saturated orange bars first. They're likely to represent CPU-heavy functions.

Worth mentioning: if you click an element of a flame graph, a zoom-in of its surroundings will be displayed above the graph.
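
Before walking through the steps above, it can help to check which of the required commands are actually on your `PATH`. The following is a minimal sketch, not part of the guide's toolchain: `missing_tools` is a hypothetical helper name, and the list of tools passed to it simply mirrors the commands used in the steps.

```bash
# Sketch: print the name of every command that is NOT found on PATH.
# missing_tools is a hypothetical helper, introduced here for illustration.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the commands used in the steps above; prints nothing if all exist.
missing_tools perf node stackvis
```

Anything it prints still needs installing (steps 1 and 6) before `perf record` and `stackvis` will run.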

### Using perf to sample a running process

This is great for recording flame graph data from an already running process that you don't want to interrupt. Imagine a production process with a hard-to-reproduce issue.

```bash
perf record -F99 -p `pgrep -n node` -g -- sleep 3
```

Wait, what is that `sleep 3` for? It's there to keep perf running. Even though the `-p` option points at a different pid, `perf record` still needs a command to run and stops when that command exits.
perf runs for the life of the command you pass to it, whether or not you're actually profiling that command. `sleep 3` ensures that perf runs for 3 seconds.

Why is `-F` (profiling frequency) set to 99? It's a reasonable default. You can adjust it if you want.
`-F99` tells perf to take 99 samples per second. For more precision, increase the value. Lower values should produce less output with less precise results. The precision you need depends on how long your CPU-intensive functions really run. If you're looking for the reason for a noticeable slowdown, 99 samples per second should be more than enough.

After you get that 3-second perf record, proceed with generating the flame graph: run `perf script` and then `stackvis`, as in steps 5 to 7 above.
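
To get a feel for how `-F` and the `sleep` duration interact, here is a rough back-of-the-envelope sketch. It assumes a single thread that stays on the CPU for the whole recording, so the number is an upper bound per thread rather than what you will necessarily see. `max_samples` is a hypothetical helper name.

```bash
# Sketch: upper bound on samples for one fully busy thread,
# computed as frequency (samples per second) times duration (seconds).
# max_samples is a hypothetical helper, not a perf feature.
max_samples() {
  echo $(( $1 * $2 ))
}

max_samples 99 3
```

With the values from the command above this gives 297, which is usually plenty to make a dominant hot function stand out. A process that is mostly idle will yield far fewer samples.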

### Filtering out Node internal functions


Usually you just want to look at the performance of your own calls, so filtering out Node and V8 internal functions can make the graph much easier to read. You can clean up your perf file with:

```bash
# -E enables extended regular expressions, needed for the (a|b|...) alternation
sed -i -E \
  -e "/( __libc_start| LazyCompile | v8::internal::| Builtin:| Stub:| LoadIC:|\[unknown\]| LoadPolymorphicIC:)/d" \
  -e 's/ LazyCompile:[*~]?/ /' \
  perfs.out
```

If you read your flame graph and it seems odd, as if something is missing in the key function taking up most time, try generating your flame graph without the filters - maybe you got a rare case of an issue with Node itself.
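
To see what the filter does, you can try it on a few lines first. The sketch below runs the same two expressions on a made-up sample shaped like `perf script` output (the addresses, paths and function names are invented for illustration). It assumes extended regular expressions (`sed -E`) so the alternation group behaves as intended, and it omits `-i` so nothing is edited in place.

```bash
# Made-up sample in the shape of `perf script` output (not real profiler data).
sample="$(mktemp)"
cat > "$sample" <<'EOF'
node 1234 100.000000: 1 cycles:u:
    7f00 LazyCompile:*heavy /app/app.js:10 (/tmp/perf-1234.map)
    7f01 v8::internal::Execution::Call (/usr/bin/node)
    7f02 [unknown] ([unknown])
    7f03 Builtin:ArgumentsAdaptorTrampoline (/tmp/perf-1234.map)
EOF

# First expression drops internal frames; second strips the LazyCompile: prefix.
filtered="$(sed -E \
  -e '/( __libc_start| LazyCompile | v8::internal::| Builtin:| Stub:| LoadIC:|\[unknown\]| LoadPolymorphicIC:)/d' \
  -e 's/ LazyCompile:[*~]?/ /' \
  "$sample")"

printf '%s\n' "$filtered"
rm -f "$sample"
```

Only the sample header and the `heavy` frame remain, with the `LazyCompile:*` prefix removed from the frame's label.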

### Node's profiling options

`--perf-basic-prof-only-functions` and `--perf-basic-prof` are the two that are useful for debugging your JavaScript code. Other options are used for profiling Node itself, which is outside the scope of this guide.

`--perf-basic-prof-only-functions` produces less output, so it's the option with the least overhead.

Why do I need them at all?

Well, without these options you'll still get a flame graph, but with most bars labeled `v8::Function::Call`.
Member

To be more detailed: they contain a map between the code addresses and the JS function names, collected by walking the stack. I have not tried it, but do you really get more v8::Function::Call frames? I would suspect that you simply don't see the JS function frames, as perf doesn't know how to resolve the symbols for those code addresses.

Contributor Author

I did generate a flame graph with all javascript functions labeled v8::Function::Call if I didn't pass --perf-* - that's what it does.

--perf-basic-prof is probably doing more than just exposing the function names from JS, but I don't know enough about it to elaborate.


## perf output issues

### Node.js 8.x V8 pipeline changes

Node.js 8.x and above ship with new optimizations to the JavaScript compilation pipeline in the V8 engine (called TurboFan), which sometimes make function names/references unreachable for perf.

The result is you might not get your function names right in the flame graph.

You'll notice `ByteCodeHandler:` where you'd expect function names.

[0x](https://www.npmjs.com/package/0x) has some mitigations for that built in.

For details see:
- https://github.com/nodejs/benchmarking/issues/168
- https://github.com/nodejs/diagnostics/issues/148#issuecomment-369348961

### Node.js 10+

Node.js 10.x addresses the issue with TurboFan using the `--interpreted-frames-native-stack` flag.

Run `node --interpreted-frames-native-stack --perf-basic-prof-only-functions` to get function names in the flame graph regardless of which pipeline V8 used to compile your JavaScript.
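
If you profile on machines with different Node.js versions, the choice of flags described above can be wrapped in a small helper. This is only a sketch: `node_perf_flags` is a hypothetical function name, and it assumes the `--interpreted-frames-native-stack` flag is available in whatever Node.js 10+ build you run.

```bash
# Sketch: pick profiling flags for a given Node.js major version.
# node_perf_flags is a hypothetical helper, not part of Node.js or perf.
node_perf_flags() {
  major="$1"
  flags="--perf-basic-prof-only-functions"
  if [ "$major" -ge 10 ]; then
    # Assumption: this flag exists in the Node.js 10+ build in use.
    flags="--interpreted-frames-native-stack $flags"
  fi
  printf '%s\n' "$flags"
}

# Possible usage (left commented out so the sketch has no side effects):
# major="$(node -p 'process.versions.node.split(".")[0]')"
# perf record -e cycles:u -g -- node $(node_perf_flags "$major") app.js
```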

### Broken labels in the flame graph

If you're seeing labels looking like this:
```
node`_ZN2v88internal11interpreter17BytecodeGenerator15VisitStatementsEPNS0_8ZoneListIPNS0_9StatementEEE
```
it means the Linux perf you're using was not compiled with demangle support. See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1396654 for an example.
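
A quick way to check whether your output is affected is to count the lines that still carry mangled C++ symbols. The sketch below assumes the mangled names start with `_ZN` followed by a digit, as in the label above; `count_mangled` is a hypothetical helper name.

```bash
# Sketch: count lines in perf script output that still contain mangled
# C++ symbols such as _ZN2v8... (count_mangled is a hypothetical helper).
count_mangled() {
  # grep -c prints the count; "|| true" keeps a zero count from being an error.
  grep -c '_ZN[0-9]' "$1" || true
}

# If the count is non-zero and c++filt (from binutils) is installed,
# piping the file through it is one possible workaround:
#   c++filt < perfs.out > perfs.demangled.out
```

Running `count_mangled perfs.out` and getting `0` suggests your perf build demangles symbols already.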


## Examples

Practice capturing flame graphs yourself with [a flame graph exercise](https://github.com/naugtur/node-example-flamegraph)!