Flame Graphs visualize hot-CPU code-paths.
Using DTrace, see: http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs/
Using perf_events or SystemTap, see: http://dtrace.org/blogs/brendan/2012/03/17/linux-kernel-performance-flame-graphs/
Using Xcode Instruments, see: http://schani.wordpress.com/2012/11/16/flame-graphs-for-instruments/
Flame graphs can be created in three steps:
1. Capture stacks
2. Fold stacks
3. flamegraph.pl
1. Capture stacks
=================
Stack samples can be captured using DTrace, perf_events or SystemTap.
Using DTrace to capture 60 seconds of kernel stacks at 997 Hertz:
# dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' -o out.kern_stacks
Using DTrace to capture 60 seconds of user-level stacks for PID 12345 at 97 Hertz:
# dtrace -x ustackframes=100 -n 'profile-97 /pid == 12345 && arg1/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks
Using DTrace to capture 60 seconds of user-level stacks, including while time is spent in the kernel, for PID 12345 at 97 Hertz:
# dtrace -x ustackframes=100 -n 'profile-97 /pid == 12345/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks
If the application has a ustack helper, switch ustack() for jstack() to include translated frames (e.g., node.js frames; see: http://dtrace.org/blogs/dap/2012/01/05/where-does-your-node-program-spend-its-time/). The sampling rate for user-level stacks is deliberately lower than for kernel stacks. This matters most when using jstack(), as it performs additional work to translate frames.
2. Fold stacks
==============
Use the stackcollapse programs to fold stack samples into single lines. The programs provided are:
- stackcollapse.pl: for DTrace stacks
- stackcollapse-perf.pl: for perf_events "perf script" output
- stackcollapse-stap.pl: for SystemTap stacks
- stackcollapse-instruments.pl: for Xcode Instruments
Usage example:
$ ./stackcollapse.pl out.kern_stacks > out.kern_folded
The output looks like this:
unix`_sys_sysenter_post_swapgs 1401
unix`_sys_sysenter_post_swapgs;genunix`close 5
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf 85
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;c2audit`audit_closef 26
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;c2audit`audit_setf 5
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`audit_getstate 6
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`audit_unfalloc 2
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`closef 48
[...]
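To make the folded format concrete, here is a minimal Python sketch of the folding step. It is not the code in stackcollapse.pl. It assumes the usual DTrace aggregation layout: one frame per line with the leaf first, then a line holding only the sample count. The function name fold_stacks and the sample offsets are made up for illustration.

```python
import re
from collections import Counter

def fold_stacks(text):
    """Fold DTrace-style stack blocks into {"root;...;leaf": count}."""
    folded = Counter()
    frames = []  # frames of the current block, leaf first as printed
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.isdigit():
            # A bare number closes the block: it is the sample count.
            if frames:
                folded[";".join(reversed(frames))] += int(line)
            frames = []
        else:
            # Drop the trailing "+0x..." offset, keep module`function.
            frames.append(re.sub(r"\+0x[0-9a-fA-F]+$", "", line))
    return folded

# Illustrative input only; the offsets are invented.
sample = """
              genunix`closef+0x48
              genunix`closeandsetf+0x3b4
              genunix`close+0x13
              unix`_sys_sysenter_post_swapgs+0x149
               48

              genunix`close+0x20
              unix`_sys_sysenter_post_swapgs+0x149
                5
"""

for stack, count in sorted(fold_stacks(sample).items()):
    print(stack, count)
```

Each printed line has the same shape as the folded output shown above: frames joined by ";" from root to leaf, followed by a space and the count.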
3. flamegraph.pl
================
Use flamegraph.pl to render an SVG:
$ ./flamegraph.pl out.kern_folded > kernel.svg
An advantage of having the folded input file (and the reason folding is separate from flamegraph.pl) is that you can use grep to select functions of interest. For example:
$ grep cpuid out.kern_folded | ./flamegraph.pl > cpuid.svg
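Because each folded line is just a stack and a count, the same file can also be used to quantify a function without rendering anything. The following Python sketch sums the counts of stacks that mention a function, using the folded lines shown in step 2. The helper names parse_folded and share_of are hypothetical and not part of this repository.

```python
def parse_folded(lines):
    """Yield (stack, count) from folded lines of the form 'a;b;c 123'."""
    for line in lines:
        line = line.strip()
        if not line or line == "[...]":
            continue
        # The count is the last space-separated field on the line.
        stack, _, count = line.rpartition(" ")
        yield stack, int(count)

def share_of(function, lines):
    """Return (matching samples, total samples) for a function name."""
    total = matched = 0
    for stack, count in parse_folded(lines):
        total += count
        if function in stack:  # substring match, like grep
            matched += count
    return matched, total

folded_lines = """\
unix`_sys_sysenter_post_swapgs 1401
unix`_sys_sysenter_post_swapgs;genunix`close 5
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf 85
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;c2audit`audit_closef 26
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;c2audit`audit_setf 5
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`audit_getstate 6
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`audit_unfalloc 2
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`closef 48
""".splitlines()

matched, total = share_of("closeandsetf", folded_lines)
print(f"closeandsetf: {matched} of {total} samples ({100 * matched / total:.1f}%)")
```

The total here covers only the excerpt above, so the percentage is illustrative rather than a result from the full capture.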
Provided Example
================
An example output from DTrace is included, both the captured stacks and
the resulting Flame Graph. You can generate it yourself using:
$ ./stackcollapse.pl example-stacks.txt | ./flamegraph.pl > example.svg
This was from a particular performance investigation: the Flame Graph
identified that CPU time was spent in the lofs module, and quantified
that time.
Options
=======
See the USAGE message (--help) for options:
USAGE: ./flamegraph.pl [options] infile > outfile.svg
    --titletext    # change title text
    --width        # width of image (default 1200)
    --height       # height of each frame (default 16)
    --minwidth     # omit smaller functions (default 0.1 pixels)
    --fonttype     # font type (default "Verdana")
    --fontsize     # font size (default 12)
    --countname    # count type label (default "samples")
    --nametype     # name type label (default "Function:")
e.g.,
    ./flamegraph.pl --titletext="Flame Graph: malloc()" trace.txt > graph.svg
As the example suggests, flame graphs can visualize traces of any event,
such as malloc() calls, provided stack traces are gathered for that event.
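As an illustration of tracing an arbitrary event, the Python sketch below counts stacks inside a program and prints them in the folded format. It assumes the event can be instrumented by hand. The names record_event, allocate, parse and render are hypothetical, and the weights stand in for whatever quantity is being measured (here, bytes requested).

```python
import traceback
from collections import Counter

stacks = Counter()

def record_event(weight=1):
    """Count the caller's stack; call this wherever the event occurs."""
    # extract_stack() lists frames root first; the last entry is
    # record_event itself, so it is dropped.
    frames = [f.name for f in traceback.extract_stack()[:-1]]
    stacks[";".join(frames)] += weight

def allocate(n):  # hypothetical event source
    record_event(weight=n)

def parse():
    allocate(64)

def render():
    allocate(16)
    allocate(16)

parse()
render()

# One folded line per distinct stack: "root;...;leaf count"
for stack, count in sorted(stacks.items()):
    print(stack, count)
```

Output in this form can be saved to a file and passed to flamegraph.pl, with --countname set to a label such as bytes so the counts are not described as samples.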