Skip to content

libstore: Use boost::regex for GC root discovery#13142

Merged
Mic92 merged 3 commits intoNixOS:masterfrom
xokdvium:gc-root-boost-regex
May 18, 2025
Merged

libstore: Use boost::regex for GC root discovery#13142
Mic92 merged 3 commits intoNixOS:masterfrom
xokdvium:gc-root-boost-regex

Conversation

@xokdvium
Copy link
Contributor

@xokdvium xokdvium commented May 6, 2025

Motivation

As it turns out using std::regex is actually the bottleneck for root discovery. Just substituting std:: -> boost:: makes root discovery twice as fast (3x if counting only userspace time).

Some rather ad-hoc measurements to motivate the switch:

(On master)

nix build github:nixos/nix/1e822bd4149a8bce1da81ee2ad9404986b07914c#nix-cli --out-link result-1e822bd4149a8bce1da81ee2ad9404986b07914c
taskset -c 2,3 hyperfine "result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run --max 0"
Benchmark 1: result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run --max 0
  Time (mean ± σ):     481.6 ms ±   3.9 ms    [User: 336.2 ms, System: 142.0 ms]
  Range (min … max):   474.6 ms … 487.7 ms    10 runs

(After this patch)

taskset -c 2,3 hyperfine "result/bin/nix store gc --dry-run --max 0"
Benchmark 1: result/bin/nix store gc --dry-run --max 0
  Time (mean ± σ):     254.7 ms ±   9.7 ms    [User: 111.1 ms, System: 141.3 ms]
  Range (min … max):   246.5 ms … 281.3 ms    10 runs

boost::regex is a drop-in replacement for std::regex, but much faster. Doing a simple before/after comparison doesn't surface any change in behavior:

result/bin/nix store gc --dry-run -vvvvv --max 0 |& grep "got additional" | wc -l
result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run -vvvvv --max 0 |& grep "got additional" | wc -l

Note

This really doesn't make a dent in the actual runtime of GC itself.

Context


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

As it turns out using `std::regex` is actually the bottleneck
for root discovery. Just substituting `std::` -> `boost::`
makes root discovery twice as fast (3x if counting only userspace time).

Some rather ad-hoc measurements to motivate the switch:

(On master)

```
nix build github:nixos/nix/1e822bd4149a8bce1da81ee2ad9404986b07914c#nix-cli --out-link result-1e822bd4149a8bce1da81ee2ad9404986b07914c
taskset -c 2,3 hyperfine "result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run --max 0"
Benchmark 1: result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run --max 0
  Time (mean ± σ):     481.6 ms ±   3.9 ms    [User: 336.2 ms, System: 142.0 ms]
  Range (min … max):   474.6 ms … 487.7 ms    10 runs
```

(After this patch)

```
taskset -c 2,3 hyperfine "result/bin/nix store gc --dry-run --max 0"
Benchmark 1: result/bin/nix store gc --dry-run --max 0
  Time (mean ± σ):     254.7 ms ±   9.7 ms    [User: 111.1 ms, System: 141.3 ms]
  Range (min … max):   246.5 ms … 281.3 ms    10 runs
```

`boost::regex` is a drop-in replacement for `std::regex`, but much faster.
Doing a simple before/after comparison doesn't surface any change in behavior:

```
result/bin/nix store gc --dry-run -vvvvv --max 0 |& grep "got additional" | wc -l
result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run -vvvvv --max 0 |& grep "got additional" | wc -l
```
@xokdvium xokdvium requested a review from Ericson2314 as a code owner May 6, 2025 22:14
@edolstra
Copy link
Member

edolstra commented May 7, 2025

Please check that this doesn't cause a dependency on libicu (see #7762).

@Mic92
Copy link
Member

Mic92 commented May 13, 2025

Please check that this doesn't cause a dependency on libicu (see #7762).

@xokdvium have you checked this?

@xokdvium
Copy link
Contributor Author

Hmm, libnixstore.so already depends on libboost_regex.so.1.87.0 as well as libicu*so.76 on master's ed52176:

$ nix build github:nixos/nix/ed521760bc92a93c27b8e0a69b5e89f610cef697#nix-store --out-link result-ed521760bc92a93c27b8e0a69b5e89f610cef697
$ ldd result-ed521760bc92a93c27b8e0a69b5e89f610cef697/lib/libnixstore.so
....
        libboost_regex.so.1.87.0 => /nix/store/z4k2zqmkpb7byagshlxv1ik7iijp5b9h-boost-1.87.0/lib/libboost_regex.so.1.87.0 (0x00007f4db3e8e000)
        libicudata.so.76 => /nix/store/6pr4vfwd4s376sfa784d2ad0b82gdd2d-icu4c-76.1/lib/libicudata.so.76 (0x00007f4db202b000)
        libicui18n.so.76 => /nix/store/6pr4vfwd4s376sfa784d2ad0b82gdd2d-icu4c-76.1/lib/libicui18n.so.76 (0x00007f4db1c78000)
        libicuuc.so.76 => /nix/store/6pr4vfwd4s376sfa784d2ad0b82gdd2d-icu4c-76.1/lib/libicuuc.so.76 (0x00007f4db1a47000)
        libunistring.so.5 => /nix/store/mbx9ii53lzjlrsnlrfmzpwm33ynljwdn-libunistring-1.3/lib/libunistring.so.5 (0x00007f4db185a000)
....

I guess this is boost's auto-link at work and some boost libraries that we are already using transitively include boost::regex headers https://github.com/boostorg/regex/blob/f851a08050dd6d4bdea552d2cf614c69d8d3cff9/include/boost/regex/v5/cregex.hpp#L28-L39.

Should I also modify packaging/dependencies.nix to disable enableIcu?

xokdvium added 2 commits May 13, 2025 08:47
This reduces the closure size on master by 40MiB.

```
$ nix build github:nixos/nix/1e822bd4149a8bce1da81ee2ad9404986b07914c#nix-store --out-link closure-on-master
$ nix build .#nix-store -L --out-link closure-without-icu
$ nix path-info --closure-size -h ./closure-on-master
/nix/store/8gwr38m5h6p7245ji9jv28a2a11w1isx-nix-store-2.29.0pre  124.4 MiB
$ nix path-info --closure-size -h ./closure-without-icu
/nix/store/k0gwfykjqpnmaqbwh23nk55lhanc9g24-nix-store-2.29.0pre   86.6 MiB
```
@xokdvium xokdvium requested a review from edolstra as a code owner May 13, 2025 08:52
@xokdvium
Copy link
Contributor Author

xokdvium commented May 13, 2025

@edolstra @Mic92 Seems like master already links to libicu somehow. #13142 (comment). I've disabled locale support in packaging/dependencies in f3090ef. Does that make sense? Should that be split off into a different PR?
AFAICT this is a closure-size regression from meson migration.

Copy link
Member

@Mic92 Mic92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me.

@xokdvium
Copy link
Contributor Author

@edolstra ping wrt libicu dependency on master. If you'd like I could split f3090ef into a separate PR if that closure size reduction warrants a backport.

"--with-coroutine"
"--with-iostreams"
];
enableIcu = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could in a different PR probably use disallowedReferences to make sure we never reference to icu in nix.

@Mic92 Mic92 merged commit e7078d4 into NixOS:master May 18, 2025
12 checks passed
@Mic92
Copy link
Member

Mic92 commented May 18, 2025

@edolstra ping wrt libicu dependency on master. If you'd like I could split f3090ef into a separate PR if that closure size reduction warrants a backport.

Unless there is someone else interested, I would just backport this to 2.29.

@Mic92 Mic92 added the backport 2.29-maintenance Automatically creates a PR against the branch label May 18, 2025
@Mic92
Copy link
Member

Mic92 commented May 18, 2025

@xokdvium I wonder if this is also a candidate for builtins.match if it that much faster. I remember the gcc version of std::regex also would stackoverflow on slightly larger input.

@xokdvium
Copy link
Contributor Author

@Mic92 that's the plan. There's been several attempts at this (see #7762) and also Lix did the migration as well (lix-project/lix@447212f). Before we do that though I think we need a good test suite. One possible way to do that is to extract builtins.match calls from nixpkgs by instrumenting the match primop and produce gtest/language level regression suite in-tree from that.

@xokdvium xokdvium deleted the gc-root-boost-regex branch May 18, 2025 19:54
Mic92 added a commit that referenced this pull request May 18, 2025
…3142

libstore: Use `boost::regex` for GC root discovery (backport #13142)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.29-maintenance Automatically creates a PR against the branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants