Skip to content

libexpr: allocate the Exprs themselves in Exprs::alloc#14472

Merged
xokdvium merged 1 commit intoNixOS:masterfrom
Radvendii:exprs-alloc
Nov 6, 2025
Merged

libexpr: allocate the Exprs themselves in Exprs::alloc#14472
xokdvium merged 1 commit intoNixOS:masterfrom
Radvendii:exprs-alloc

Conversation

@Radvendii
Copy link
Contributor

Motivation

for big-picture motivation, see the tracking issue

This PR moves the Expr objects into Exprs::alloc. In truth, eventually we want them to live outside of the allocator in vectors.

However, we're in a bit of a catch 22. Part of the reason we want to move the Exprs into vectors is so they don't get leaked so that we can do fuzz testing. But it's a big enough change that we're going to want to be able to fuzz-test it!

This is my way of snipping that loop. It gets rid of the leak so we can fuzz test the bigger changes.

Context


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@Radvendii Radvendii requested a review from edolstra as a code owner November 4, 2025 14:53
@Radvendii
Copy link
Contributor Author

CC @Ericson2314 @xokdvium

@Radvendii
Copy link
Contributor Author

Radvendii commented Nov 4, 2025

I don't know how to make sense of these performance numbers:

For parsing, we see

  • -7.6% memory usage
  • +5.6% instructions
  • -13.1% cache misses
  • -1.3% branch misses

The only number that I have an explanation for is -13% cache misses. That could make sense since we're now allocating Exprs (a) near each other, and (b) near the data that comprises them.

$ poop -d 60000 '../master/result/bin/nix-instantiate --parse ../../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix' '../feature/result/bin/nix-instantiate --parse ../../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix'
Benchmark 1 (198 runs): ../master/result/bin/nix-instantiate --parse ../../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           304ms ± 6.22ms     293ms …  355ms          9 ( 5%)        0%
  peak_rss            101MB ±  190KB     101MB …  102MB          1 ( 1%)        0%
  cpu_cycles         1.22G  ± 24.1M     1.20G  … 1.40G          17 ( 9%)        0%
  instructions       3.42G  ± 8.79K     3.42G  … 3.42G           2 ( 1%)        0%
  cache_references   27.5M  ± 3.01M     26.0M  … 51.4M          32 (16%)        0%
  cache_misses       1.78M  ± 18.1K     1.72M  … 1.95M           8 ( 4%)        0%
  branch_misses      3.86M  ± 24.8K     3.82M  … 4.02M           7 ( 4%)        0%
Benchmark 2 (201 runs): ../feature/result/bin/nix-instantiate --parse ../../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           300ms ± 3.16ms     294ms …  316ms          9 ( 4%)        ⚡-  1.3% ±  0.3%
  peak_rss           93.7MB ±  188KB    93.2MB … 94.1MB          0 ( 0%)        ⚡-  7.6% ±  0.0%
  cpu_cycles         1.21G  ± 11.3M     1.19G  … 1.30G           9 ( 4%)          -  0.9% ±  0.3%
  instructions       3.61G  ± 8.76K     3.61G  … 3.61G           0 ( 0%)        💩+  5.6% ±  0.0%
  cache_references   26.9M  ± 1.30M     26.0M  … 39.2M          17 ( 8%)          -  2.2% ±  1.7%
  cache_misses       1.54M  ± 14.6K     1.52M  … 1.60M           6 ( 3%)        ⚡- 13.1% ±  0.2%
  branch_misses      3.81M  ± 47.5K     3.77M  … 4.08M          17 ( 8%)        ⚡-  1.3% ±  0.2%

Overall performance numbers when we look at evaluation of a NixOS configuration are

  • -1.9% (20MB) memory usage
  • +2.6% instructions
$ poop '../master/result/bin/nix eval /path/to/my/nixos/flake#nixosConfigurations.smudge.config.system.build.toplevel' '../feature/result/bin/nix eval /path/to/my/nixos/flake#nixosConfigurations.smudge.config.system.build.toplevel'
Benchmark 1 (3 runs): ../master/result/bin/nix eval /path/to/my/nixos/flake#nixosConfigurations.smudge.config.system.build.toplevel
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          7.40s  ± 54.1ms    7.34s  … 7.45s           0 ( 0%)        0%
  peak_rss           1.17GB ±  101KB    1.17GB … 1.17GB          0 ( 0%)        0%
  cpu_cycles         36.6G  ±  161M     36.4G  … 36.8G           0 ( 0%)        0%
  instructions       74.1G  ± 2.89M     74.1G  … 74.1G           0 ( 0%)        0%
  cache_references   1.92G  ± 12.2M     1.91G  … 1.93G           0 ( 0%)        0%
  cache_misses        276M  ± 6.69M      268M  …  282M           0 ( 0%)        0%
  branch_misses       218M  ± 4.00M      214M  …  222M           0 ( 0%)        0%
Benchmark 2 (3 runs): ../feature/result/bin/nix eval /path/to/my/nixos/flake#nixosConfigurations.smudge.config.system.build.toplevel
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          7.34s  ±  301ms    7.07s  … 7.66s           0 ( 0%)          -  0.8% ±  6.6%
  peak_rss           1.15GB ±  223KB    1.15GB … 1.15GB          0 ( 0%)        ⚡-  1.9% ±  0.0%
  cpu_cycles         37.1G  ±  714M     36.4G  … 37.8G           0 ( 0%)          +  1.3% ±  3.2%
  instructions       76.1G  ± 34.7M     76.0G  … 76.1G           0 ( 0%)        💩+  2.6% ±  0.1%
  cache_references   1.92G  ± 15.7M     1.90G  … 1.94G           0 ( 0%)          +  0.2% ±  1.7%
  cache_misses        280M  ± 8.59M      271M  …  288M           0 ( 0%)          +  1.4% ±  6.3%
  branch_misses       213M  ± 5.02M      208M  …  218M           0 ( 0%)          -  2.3% ±  4.7%

@Radvendii
Copy link
Contributor Author

Radvendii commented Nov 4, 2025

I tried putting [[gnu::always_inline]] in front of the add functions but it didn't change anything. I thought that might have been the source of the added instructions.

@Radvendii
Copy link
Contributor Author

I forgot parser-state.hh. good catch.

@Ericson2314
Copy link
Member

@Radvendii as to the instruction count:

  1. If the cash misses speed things up enough, who cares! :)

  2. One way to bisect is to introduce the add function, and make everything use that, but make add use regular heap just like before. Than we can separate allocator cost vs new infra cost. (Cause it does look like the new infra ought to be essential free, after inlining, I agree!)

@Radvendii
Copy link
Contributor Author

  1. If the cash misses speed things up enough, who cares! :)

Maybe we could get even more performance!

  1. One way to bisect is to introduce the add function, and make everything use that, but make add use regular heap just like before. Than we can separate allocator cost vs new infra cost. (Cause it does look like the new infra ought to be essential free, after inlining, I agree!)

Good thought. I'll try this later today.

@Ericson2314
Copy link
Member

Good thought. I'll try this later today.

And do be clear, feel free to just keep the 3-commit history, and I don't mind merging the first 2 steps (current first commit and the new in-between commit) in the meantime.

@Radvendii
Copy link
Contributor Author

Okay. Things are now in three commmits. Only the last one causes the performance changes.

@Ericson2314
Copy link
Member

Great, thanks @Radvendii. Make a PR with the first two commits (with the good perf numbers), and I am happy merging it right away.

@Radvendii
Copy link
Contributor Author

This is proving quite hard to debug (increased instruction count isn't even reproducible with debugoptimized mode), and as @Ericson2314 pointed out, it's not even a problem.

So I think it's best to just move on.

@xokdvium xokdvium added this pull request to the merge queue Nov 6, 2025
Merged via the queue into NixOS:master with commit d596b97 Nov 6, 2025
16 checks passed
@Ericson2314
Copy link
Member

Glad it's merged!

@edolstra edolstra mentioned this pull request Dec 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants