Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add first of the new diagnostics articles to guides #1444

Merged
merged 5 commits into from
Feb 20, 2019
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
63 changes: 63 additions & 0 deletions locale/en/docs/guides/diagnostics-flamegraph.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
---
title: Diagnostics - Flame Graphs
layout: docs.hbs
---

# Flame Graphs

## What's a flame graph useful for?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explaining why before explaining how - part of my intended pattern for those guides.


Flame graphs are a way of visualizing CPU time spent in functions. They can help you pin down where you spend too much time doing synchronous operations.

## How to create a flame graph

You might have heard creating a flame graph for Node.js is difficult, but that's not true (anymore).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support this a lot as a technical documentation for how to make a flame graph using only basic primitives. However, I wouldn't call this easy. I someone asked me how to make flame graphs easily I would defer to 0x.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe, easy in comparison. It's a second mention of 0x, so it's time I added a mention. thx

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

someone who hasn't heard its difficult now has, maybe this will make people nervous before they even started? I third the 0x reference

Solaris vms are no longer needed for flame graphs on linux!

1. install linux-tools-common - the package containing `perf` (eg. with apt-get)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

Install perf (usually available through the linux-tools-common package if not already installed).

1. try running `perf` - it might complain about missing kernel modules, install them too
1. install stackvis `npm i -g stackvis`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be helpful to explain why:

Install stackvis with npm i -g stackvis, to generate html flamegraphs from the perf results.

1. run node with perf enabled `perf record -e cycles:u -g -- node --perf-basic-prof app.js`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the number at the beginning reflect a step in the process. If so they should be incremented at every step.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's how markdown works. Generated HTML will have an ordered list here.
I could switch to incremental so the MD file is more readable, but it's not meant to be read without rendering.

Copy link
Member

@lpinca lpinca Nov 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know 😮. Always used 1. 2. 3. .... Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to spend a lot of time reading markdown files in the raw (not least in PR review), so I'd appreciate incremental numbering.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to break this out too:

1. Run your script through `perf` to ensure everything is setup correctly:
```bash
perf record -e cycles:u -g -- node --perf-basic-prof app.js
```

1. disregard warnings unless they're saying you can't run perf due to missing packages; you may get some warnings about not being able to access kernel module samples which you're not after anyway.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe simplify to:

Disregard non-fatal warnings, e.g. those about accessing kernel module samples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The errors you get did sound pretty fatal to me, but they only matter if they say perf is missing some dependency. I went through figuring it out and I don't think I would understand what you mean by "non-fatal" before I figured it out.

1. run `perf script`, but pipe it through some cleanup: `perf script | egrep -v "( __libc_start| LazyCompile | v8::internal::| Builtin:| Stub:| LoadIC:|\[unknown\]| LoadPolymorphicIC:)" | sed 's/ LazyCompile:[*~]\?/ /' > perfs.out` [explanation](#about-perf-script-filtering)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have the part you want to cut and paste on its own line. In this case it I included the 'explanation' part in my cut and paste and of course if failed. So while people can figure that out making it easier to get it right the first time makes sense to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on a code block, and I think it's worth repeating the whole perf line:

1. Run your script through `perf` again, removing V8 internals and redirecting output to a file.
```bash
perf record -e cycles:u -g -- node --perf-basic-prof app.js > perfs.out
```

I'd suggest moving the grep and sed into your paragraph below. I think it's a good idea, and worth mentioning, but it makes the initial example more complex, and as you note later it isn't always useful.

1. run `stackvis perf < perfs.out > flamegraph.htm`

Now open the flame graph file in your favorite browser and watch it burn. It's color-coded so you can focus on the most saturated orange bars first. They're likely to represent CPU heavy functions.

Worth mentioning - if you click an element of a flame graph a zoom-in of its surroundings will get displayed above the graph.

### Use perf to sample a running process

```
perf record -F99 -p `pgrep -n node` -g -- sleep 3
```

Wait, what is that `sleep 3` for? It's there to keep the perf running - despite `-p` option pointing to a different pid, the command needs to be executed on a process and end with it. Just a convenient way to keep perf going for a given time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't immediately clear to me. Maybe something like:

perf runs for the life of the command you pass to it, whether or not you're actually profiling that command. sleep 3 ensures that perf runs for 3 seconds.


Why is `-F` (profiling frequency) set to 99? It's a reasonable default. You can adjust if you want. Lower values should produce less output with less precise results.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

-F99 tells perf to take 99 samples per second, for more precision increase the value.


After you get that 3 second perf record, proceed with generating the flame graph with last two steps from above.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last two -> the last two


### About `perf script` filtering

The long line of greps and seds after `perf script` call is there to remove some V8 and node internals so that it's easier for you to focus on your JavaScript calls. These are only important if you are looking for an issue with Node internals.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned above, I'd suggest putting the grep/sed in here:

### Filtering out Node internal functions

Usually you just want to look at the performance of your own calls, so filtering out Node and V8 internal functions can make the graph much easier to read. You can clean up your perf file with:

```bash
sed -i \
  -e "s/( __libc_start| LazyCompile | v8::internal::| Builtin:| Stub:| LoadIC:|\[unknown\]| LoadPolymorphicIC:)/d" \
  -e 's/ LazyCompile:[*~]\?/ /' \
  perfs.out
```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a grammatician, but I read the "These" of the second sentence to refer to the most recently mentioned thing, "JavaScript calls", maybe be more explicit: The internals are only important if you are ....


If you read your flame graph and it seems odd, as if something is missing in the key function taking up most time, try generating your flame graph without the filters - maybe you got a rare case of an issue with Node itself.

### Node's profiling options

`--perf-basic-prof-only-functions` and `--perf-basic-prof` seem like the only two you might be initially interested in for debugging your JavaScript code. Other options should only be useful for profiling Node itself, which is outside the scope of this howto.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

--perf-basic-prof-only-functions and --perf-basic-prof seem like the only two you might be initially interested in for debugging your JavaScript code. Other options should only be useful for profiling Node itself, which is outside the scope of this howto.

to

--perf-basic-prof-only-functions and --perf-basic-prof are the two that are useful for debugging your JavaScript code. Other options are used for profiling Node itself, which is outside the scope of this howto.


`--perf-basic-prof-only-functions` produces less output, so it's the option with least overhead.

Why do I need them at all?

Well, without these options you'll still get a flame graph, but with most bars labeled `v8::Function::Call`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more detailed. They contain a map between the code addresses and the JS function names, collected by walking stack. I have not tried it, but do you really get more v8::Function::Call frames? I would suspect that you simply don't see the JS function frames, as pref doesn't know how to resolve the symbols for those code address.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did generate a flame graph with all javascript functions labeled v8::Function::Call if I didn't pass --perf-* - that's what it does.

--perf-basic-prof is probably doing more than just exposing the function names from JS, but I don't know enough about it to elaborate.


## Node.js 8.x V8 pipeline changes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this warning to the begin of the guide? Otherwise, people will read the entire guide to find out this technique may not work for them. Furthermore, they may get "weird" BytecodeHandler::* stuff in the Flame Graph while following the guide (see nodejs/benchmarking#168 (comment))


Node.js 8.x ships with new optimizations to JavaScript compilation pipeline in V8 engine which makes function names/references unreachable for perf. The result is you might not get your function names right in the flame graph. See this thread for details: https://github.com/nodejs/benchmarking/issues/168

## Examples

Practice capturing a flame graph yourself with [a flame graph exercise](https://github.com/naugtur/node-example-flamegraph)