Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add stack sampling profiler for flamegraphs #11373

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Mic92
Copy link
Member

@Mic92 Mic92 commented Aug 26, 2024

This is code is a bit rough in the sense that I have taken shortcuts to quickly iterate.
This is not ready for merging yet, but I think it's already useful for people that want to generate flamegraphs for things as large as NixOS.

Usage:

$ NIX_PROFILE_FILE=/tmp/nixos-trace nix run github:mic92/nix-1/sampling-profiler eval -v --no-eval-cache github:mic92/dotfiles#nixosConfigurations.turingmachine.config.system.build.toplevel    

The result is in this example stored in /tmp/nixos-trace. It can be imported in tools that support folded stacks i.e. https://www.speedscope.app/ or the original flamegraph script (https://github.com/brendangregg/FlameGraph)

The profiler records stack trace of the nix evaluation every 10ms (100Hz).

The resulting file compresses well with zstd:

/tmp/nixos-trace     :  0.27%   (  2.15 GiB =>   5.95 MiB, /tmp/nixos-trace.zst) 

Motivation

Context

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@Mic92
Copy link
Member Author

Mic92 commented Aug 26, 2024

This is a screen shot when evaluating my NixOS machine:

nixos-trace - speedscope

@Mic92
Copy link
Member Author

Mic92 commented Aug 26, 2024

Here is an example trace: https://github.com/Mic92/nix-1/releases/download/assets/nixos-trace.zst

You can download and decompress it with zstd:

zstd -d /tmp/nixos-trace.zst

Than visit https://www.speedscope.app/ and import it.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/flamegraphs-for-nixos/51183/1

@Mic92
Copy link
Member Author

Mic92 commented Aug 26, 2024

cc @picnoir @Atemu who were involved in the tracy based profiler: #9967

Comment on lines +1528 to +1532
auto now = std::chrono::high_resolution_clock::now();
if (now - state.lastStackSample > SAMPLE_INTERVAL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This technically slows down all calls, but probably not in a way that biases the result in a significant way.
Nonetheless, we'll need to know the performance impact of this.

Possible mitigations:

  • add a conditional if (evalSettings.profiling); a predictable branch is cheaper than getting the time
  • conditional compilation: worse UX but guaranteed to work

So, specifically what we need is benchmarks of

  • the control: git merge-base HEAD upstream/master
  • the current unconditional implementation
  • an implementation with a conditional based on a setting, with the setting disabled
  • not critical, but good to know: a measurement with the setting enabled

@@ -364,6 +364,36 @@ EvalState::EvalState(

EvalState::~EvalState()
{
auto profileFile = getEnv("NIX_PROFILE_FILE");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a setting.

EDIT: and so should NIX_SHOW_STATS actually

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

@@ -364,6 +364,36 @@ EvalState::EvalState(

EvalState::~EvalState()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super happy about doing this in a destructor, though it probably works fine.

Please have a look at NIX_SHOW_STATS. You could move this logic out of the destructor and into a method, and then call it in the same way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it needs access to eval state to symbolize the stack, NIX_SHOW_STATS probably can access other global state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIX_SHOW_STATS counters are also in EvalState.

@@ -360,6 +361,10 @@ private:

public:

std::vector<PosIdx> stack = {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to have primops as well. They all have an internal name besides their builtins attribute, so this can work:

Suggested change
std::vector<PosIdx> stack = {};
std::vector<std::variant<PosIdx, std::string>> stack = {};

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a more general solution is to allow primops to have their own type of Pos, but that may or may not be complicated because of Pos's other uses, so I'd be fine with either solution.

@Atemu
Copy link
Member

Atemu commented Aug 26, 2024

Thanks a bunch for this! I tried going through your flame graph and have some feedback:

  1. File + line isn't very useful info at a glance. To actually get an idea of what's going on, you'd need the name of the binding on that line in addition to the file+line number
  2. There's a bunch of «none»s in the output, what do those represent?

@Mic92
Copy link
Member Author

Mic92 commented Aug 26, 2024

Thanks a bunch for this! I tried going through your flame graph and have some feedback:

1. File + line isn't very useful info at a glance. To actually get an idea of what's going on, you'd need the name of the binding on that line in addition to the file+line number

The issue is that function in nix don't really have names. Often enough they are assigned to variables but not always i.e. if they are passed to other functions.

2. There's a bunch of `«none»`s in the output, what do those represent?

If a builtin calls a function than we don't have a position at the moment for example. I might be able to provide the string name of builtins, but I don't know if I can get the position in this case as well.

@roberth
Copy link
Member

roberth commented Aug 26, 2024

The issue is that function in nix don't really have names.

True, but ExprLambda has a name that's based on its context, as it is often part of a binding. It's imperfect information, but it works well.

  1. There's a bunch of «none»s in the output, what do those represent?

This may be the position of the call instead of the position of the function. ExprLambda has its own position which I believe is always available.

This is code is a bit rough in the sense that I have taken shortcuts to
quickly iterate.
This is not ready for merging yet, but I think it's already useful for
people that want to generate flamegraphs for things as large as NixOS.

Usage:

NIX_PROFILE_FILE=/tmp/nixos-trace nix eval -v --no-eval-cache
.#nixosConfigurations.turingmachine.config.system.build.toplevel

The result can be imported in tools that support folded stacks i.e.
https://www.speedscope.app/ or the original flamegraph script
(https://github.com/brendangregg/FlameGraph)

The profiler records stack trace of the nix evaluation every 10ms (100Hz).
The resulting file is 1.9GB uncompressed and 5.4MB compressed with zstd.

Change-Id: I3484d3bd832e612747d02c251f2763e0c133a5dc
@Mic92
Copy link
Member Author

Mic92 commented Sep 9, 2024

Rebased + macOS fix. I haven't addressed any of the comments yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants