-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add first of the new diagnostics articles to guides #1444
Changes from 2 commits
d63a83f
7126572
ed48f71
3c6f665
79618d3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
--- | ||
title: Diagnostics - Flame Graphs | ||
layout: docs.hbs | ||
--- | ||
|
||
# Flame Graphs | ||
|
||
## What's a flame graph useful for? | ||
|
||
Flame graphs are a way of visualizing CPU time spent in functions. They can help you pin down where you spend too much time doing synchronous operations. | ||
|
||
## How to create a flame graph | ||
|
||
You might have heard creating a flame graph for Node.js is difficult, but that's not true (anymore). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I support this a lot as a technical documentation for how to make a flame graph using only basic primitives. However, I wouldn't call this easy. I someone asked me how to make flame graphs easily I would defer to 0x. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hehe, easy in comparison. It's a second mention of 0x, so it's time I added a mention. thx There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. someone who hasn't heard its difficult now has, maybe this will make people nervous before they even started? I third the |
||
Solaris vms are no longer needed for flame graphs on linux! | ||
|
||
1. install linux-tools-common - the package containing `perf` (eg. with apt-get) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe:
|
||
1. try running `perf` - it might complain about missing kernel modules, install them too | ||
1. install stackvis `npm i -g stackvis` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Might be helpful to explain why:
|
||
1. run node with perf enabled `perf record -e cycles:u -g -- node --perf-basic-prof app.js` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess the number at the beginning reflect a step in the process. If so they should be incremented at every step. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's how markdown works. Generated HTML will have an ordered list here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I didn't know 😮. Always used There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tend to spend a lot of time reading markdown files in the raw (not least in PR review), so I'd appreciate incremental numbering. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would be nice to break this out too: 1. Run your script through `perf` to ensure everything is setup correctly:
```bash
perf record -e cycles:u -g -- node --perf-basic-prof app.js
```
|
||
1. disregard warnings unless they're saying you can't run perf due to missing packages; you may get some warnings about not being able to access kernel module samples which you're not after anyway. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe simplify to:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The errors you get did sound pretty fatal to me, but they only matter if they say perf is missing some dependency. I went through figuring it out and I don't think I would understand what you mean by "non-fatal" before I figured it out. |
||
1. run `perf script`, but pipe it through some cleanup: `perf script | egrep -v "( __libc_start| LazyCompile | v8::internal::| Builtin:| Stub:| LoadIC:|\[unknown\]| LoadPolymorphicIC:)" | sed 's/ LazyCompile:[*~]\?/ /' > perfs.out` [explanation](#about-perf-script-filtering) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would be good to have the part you want to cut and paste on its own line. In this case it I included the 'explanation' part in my cut and paste and of course if failed. So while people can figure that out making it easier to get it right the first time makes sense to me. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 on a code block, and I think it's worth repeating the whole perf line: 1. Run your script through `perf` again, removing V8 internals and redirecting output to a file.
```bash
perf record -e cycles:u -g -- node --perf-basic-prof app.js > perfs.out
```
I'd suggest moving the grep and sed into your paragraph below. I think it's a good idea, and worth mentioning, but it makes the initial example more complex, and as you note later it isn't always useful. |
||
1. run `stackvis perf < perfs.out > flamegraph.htm` | ||
|
||
Now open the flame graph file in your favorite browser and watch it burn. It's color-coded so you can focus on the most saturated orange bars first. They're likely to represent CPU heavy functions. | ||
|
||
Worth mentioning - if you click an element of a flame graph a zoom-in of its surroundings will get displayed above the graph. | ||
|
||
### Use perf to sample a running process | ||
|
||
``` | ||
perf record -F99 -p `pgrep -n node` -g -- sleep 3 | ||
``` | ||
|
||
Wait, what is that `sleep 3` for? It's there to keep the perf running - despite `-p` option pointing to a different pid, the command needs to be executed on a process and end with it. Just a convenient way to keep perf going for a given time. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This wasn't immediately clear to me. Maybe something like:
|
||
|
||
Why is `-F` (profiling frequency) set to 99? It's a reasonable default. You can adjust if you want. Lower values should produce less output with less precise results. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe:
|
||
|
||
After you get that 3 second perf record, proceed with generating the flame graph with last two steps from above. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
|
||
### About `perf script` filtering | ||
|
||
The long line of greps and seds after `perf script` call is there to remove some V8 and node internals so that it's easier for you to focus on your JavaScript calls. These are only important if you are looking for an issue with Node internals. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As I mentioned above, I'd suggest putting the grep/sed in here: ### Filtering out Node internal functions
Usually you just want to look at the performance of your own calls, so filtering out Node and V8 internal functions can make the graph much easier to read. You can clean up your perf file with:
```bash
sed -i \
-e "s/( __libc_start| LazyCompile | v8::internal::| Builtin:| Stub:| LoadIC:|\[unknown\]| LoadPolymorphicIC:)/d" \
-e 's/ LazyCompile:[*~]\?/ /' \
perfs.out
```
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not a grammatician, but I read the "These" of the second sentence to refer to the most recently mentioned thing, "JavaScript calls", maybe be more explicit: The internals are only important if you are .... |
||
|
||
If you read your flame graph and it seems odd, as if something is missing in the key function taking up most time, try generating your flame graph without the filters - maybe you got a rare case of an issue with Node itself. | ||
|
||
### Node's profiling options | ||
|
||
`--perf-basic-prof-only-functions` and `--perf-basic-prof` seem like the only two you might be initially interested in for debugging your JavaScript code. Other options should only be useful for profiling Node itself, which is outside the scope of this howto. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe:
to
|
||
|
||
`--perf-basic-prof-only-functions` produces less output, so it's the option with least overhead. | ||
|
||
Why do I need them at all? | ||
|
||
Well, without these options you'll still get a flame graph, but with most bars labeled `v8::Function::Call`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To be more detailed. They contain a map between the code addresses and the JS function names, collected by walking stack. I have not tried it, but do you really get more There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did generate a flame graph with all javascript functions labeled
|
||
|
||
## Node.js 8.x V8 pipeline changes | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe move this warning to the begin of the guide? Otherwise, people will read the entire guide to find out this technique may not work for them. Furthermore, they may get "weird" |
||
|
||
Node.js 8.x ships with new optimizations to JavaScript compilation pipeline in V8 engine which makes function names/references unreachable for perf. The result is you might not get your function names right in the flame graph. See this thread for details: https://github.com/nodejs/benchmarking/issues/168 | ||
|
||
## Examples | ||
|
||
Practice capturing a flame graph yourself with [a flame graph exercise](https://github.com/naugtur/node-example-flamegraph) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explaining why before explaining how - part of my intended pattern for those guides.