-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stackcollapse-perf.pl ignores period of samples #165
Comments
Also ran into the same issue. @hardc0der did you somehow resolve it? |
FWIW, practically all the sampling we do is using cpu-clock since cpu-cycles typically isn't available in cloud environments (perf warns: "The cycles event is not supported, trying to fall back to cpu-clock-ticks"), and with cpu-clock that field shows the frequency and can be ignored. If there's a problem with cpu-cycles then I may be unaware as I rarely get a chance to use it, so please let me know. What do you think period means, anyway? E.g., I just tried on a bare metal system; what does this mean?:
(No, bash was not hot on-CPU.) |
From https://perf.wiki.kernel.org/index.php/Tutorial#Event-based_sampling_overview:
And later on the same page:
I think it's the dynamically chosen sampling period, expressed in terms of number of events, for this sample. In your example, the requested sampling frequency is 1 Hz, and the period turned out to be 2358327118 cycles. Does that match the clock frequency of your machine? |
Sure, that roughly matches the clock frequency. But what do we do with this information? bash was not in this function for 1 second. |
I take that to mean that this sample was taken after that many cycle events fired, not that the IP stayed there for that long. |
Right, so what do we do with this information? What's the proposed change to stackcollapse-perf.pl? |
I guess we should weight the samples by this amount. I'm going to check how |
Yeah |
So in the same profile:
Currently it weights them equally since it ignores the period. But you're saying you'd weight the bash sample as 2358327118 and this sample as 1? I don't think bash consumed 2.3 billion times more CPU in security_inode_permission() than this sample in native_write_msr(). For this example that seems even more wrong than just ignoring the period. If all the samples were roughly 2358327118, then I could understand that this is a problem of sampling granularity (since I choose -F 1). But that's not the case. I get periods from 1 through to 6281124533. Usually perf is a great reference of doing things right, but in this case I don't understand it. |
I'm looking at other profiles where if you sum the period, it shows more than >10x the real CPU time in the process (as shown by other tools). I choose the -F 1 to magnify sampling and period issues. If someone can 100% explain the periods in an -F 1 profile, including why they vary from 1 to 1B+, and why their sum greatly exceeds measured CPU, then we should be in a position to fix stackcollapse-perf.pl if needed (and the fix may just be to use the period as a weight). I'm nervous about fixing something I don't completely understand. |
Yeah I'm getting astronomical numbers that clearly don't make sense in the I still think the period field is meant to give weights to samples, but it seems the field is currently not reliable. I think for me the solution is to use |
It strikes me if I ignore the period=1 samples, then perhaps the other periods makes sense. Perhaps the period=1 events are due to some odd kernel code. And by weighting on period, they are effectively ignored anyway. I'm still thinking about the consequences. Given mixed periods, it means the flame graph will have mixed resolution. You may be able to zoom into some towers and see detail, but other towers and not see detail where the frames are elided (and wrongly assume the CPU time is in the parent frame). I don't know of a good solution though, other than documenting it (or artificially ignoring extra resolution). |
Yeah I think that's fair. Maybe recommend setting |
I may be totally wrong here, but just a note:
The kernel's automatic frequency adjustment seems to be designed for counters that mostly tick at a constant rate, and I think the behavior of |
@Knio thanks for the kernel code link: that I think we understand this enough to justify the change, and pay attention to the period as @hardc0der suggested. Someone want to do the PR? (plus I have other PRs to catch up on, now I've finished another book.) |
Summary: Disclaimer - I am not an expert on Perl, perf or FlameGraph. This is a hack for brendangregg#165 By hack I mean that it works for me but I didn't update all of the test files. The problem is that stackcollapse assumes that all stack traces have an equal weight but they don't based on based on current perf output. For example, there can be lines like this where 90689, 32298287 and 1 are the weights but stackcollapse assumes they all have weight 1, and by weight I mean the equivalent of the number of times this stack trace happens. Thus, the relative frequency of xpl_accept-3 is 1 / (90689 + 32289287 + 1). xpl_accept-1 2777710 12891973.429160: 90689 cycles: xpl_accept-2 2777710 12891975.431112: 32289287 cycles: xpl_accept-3 2777710 12891975.431112: 1 cycles: This is based on brendangregg#250 but I am not sure that PR is correct, it is also ~20 diffs behind current FlameGraph. Test Plan: works for me Reviewers: Subscribers: Tasks: Tags:
Are you open to another PR to fix this? Because I get extremely misleading flamegraphs today with:
I am not sure the PR is correct and it is behind latest by ~20 diffs. My hack is here. It works for me and is current with latest. |
I found this issue because the SVG produced by Digging into this was very confusing, so I thought it would be useful to others to summarize. Please correct me if I'm wrong:
I don't have an intuitive understanding of what it means to sample based on CPU cycles executed:
If the answer to the last question is "unclear," then perhaps #250 should be reverted (given the confusing nature of the "samples" count and its interpretation), and users should just be encouraged, for this use case of FlameGraphs, to always explicitly specify |
Not so simple as
Might be related to using |
Seems to have be because of this bug: brendangregg/FlameGraph#165. One workaround there was to record data with `-e cpu-clock`. I don't know what that means but that is what this commit does.
It seems like
stackcollapse-perf.pl
ignores the period associated with each sample record so all recorded samples are given the same weight. This ends up in the generated flamegraph not being consistent with whatperf report
outputs for the overhead of the different call stacks. I usually sample by specifying a frequency and the hardware CPU cycles event with something likeperf record -e cpu-cycles -F 997
and I can get pretty different output. I was wondering if you run into the same thing.The text was updated successfully, but these errors were encountered: