-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add stack sampling profiler for flamegraphs #11373
base: master
Are you sure you want to change the base?
Conversation
a5af020
to
92bf85d
Compare
Here is an example trace: https://github.com/Mic92/nix-1/releases/download/assets/nixos-trace.zst You can download and decompress it with zstd:
Than visit https://www.speedscope.app/ and import it. |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: |
auto now = std::chrono::high_resolution_clock::now(); | ||
if (now - state.lastStackSample > SAMPLE_INTERVAL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This technically slows down all calls, but probably not in a way that biases the result in a significant way.
Nonetheless, we'll need to know the performance impact of this.
Possible mitigations:
- add a conditional
if (evalSettings.profiling)
; a predictable branch is cheaper than getting the time - conditional compilation: worse UX but guaranteed to work
So, specifically what we need is benchmarks of
- the control:
git merge-base HEAD upstream/master
- the current unconditional implementation
- an implementation with a conditional based on a setting, with the setting disabled
- not critical, but good to know: a measurement with the setting enabled
@@ -364,6 +364,36 @@ EvalState::EvalState( | |||
|
|||
EvalState::~EvalState() | |||
{ | |||
auto profileFile = getEnv("NIX_PROFILE_FILE"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a setting.
EDIT: and so should NIX_SHOW_STATS
actually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
@@ -364,6 +364,36 @@ EvalState::EvalState( | |||
|
|||
EvalState::~EvalState() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not super happy about doing this in a destructor, though it probably works fine.
Please have a look at NIX_SHOW_STATS
. You could move this logic out of the destructor and into a method, and then call it in the same way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it needs access to eval state to symbolize the stack, NIX_SHOW_STATS probably can access other global state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIX_SHOW_STATS
counters are also in EvalState
.
@@ -360,6 +361,10 @@ private: | |||
|
|||
public: | |||
|
|||
std::vector<PosIdx> stack = {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be good to have primops as well. They all have an internal name besides their builtins
attribute, so this can work:
std::vector<PosIdx> stack = {}; | |
std::vector<std::variant<PosIdx, std::string>> stack = {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a more general solution is to allow primops to have their own type of Pos
, but that may or may not be complicated because of Pos
's other uses, so I'd be fine with either solution.
Thanks a bunch for this! I tried going through your flame graph and have some feedback:
|
The issue is that function in nix don't really have names. Often enough they are assigned to variables but not always i.e. if they are passed to other functions.
If a builtin calls a function than we don't have a position at the moment for example. I might be able to provide the string name of builtins, but I don't know if I can get the position in this case as well. |
True, but
This may be the position of the call instead of the position of the function. |
92bf85d
to
e4a1e6a
Compare
This is code is a bit rough in the sense that I have taken shortcuts to quickly iterate. This is not ready for merging yet, but I think it's already useful for people that want to generate flamegraphs for things as large as NixOS. Usage: NIX_PROFILE_FILE=/tmp/nixos-trace nix eval -v --no-eval-cache .#nixosConfigurations.turingmachine.config.system.build.toplevel The result can be imported in tools that support folded stacks i.e. https://www.speedscope.app/ or the original flamegraph script (https://github.com/brendangregg/FlameGraph) The profiler records stack trace of the nix evaluation every 10ms (100Hz). The resulting file is 1.9GB uncompressed and 5.4MB compressed with zstd. Change-Id: I3484d3bd832e612747d02c251f2763e0c133a5dc
e4a1e6a
to
794007c
Compare
Rebased + macOS fix. I haven't addressed any of the comments yet. |
This is code is a bit rough in the sense that I have taken shortcuts to quickly iterate.
This is not ready for merging yet, but I think it's already useful for people that want to generate flamegraphs for things as large as NixOS.
Usage:
The result is in this example stored in /tmp/nixos-trace. It can be imported in tools that support folded stacks i.e. https://www.speedscope.app/ or the original flamegraph script (https://github.com/brendangregg/FlameGraph)
The profiler records stack trace of the nix evaluation every 10ms (100Hz).
The resulting file compresses well with zstd:
Motivation
Context
Priorities and Process
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.