libstore: Use `boost::regex` for GC root discovery #13142
As it turns out using `std::regex` is actually the bottleneck for root discovery. Just substituting `std::` -> `boost::` makes root discovery twice as fast (3x if counting only userspace time).

Some rather ad-hoc measurements to motivate the switch:

(On master)

```
nix build github:nixos/nix/1e822bd4149a8bce1da81ee2ad9404986b07914c#nix-cli --out-link result-1e822bd4149a8bce1da81ee2ad9404986b07914c
taskset -c 2,3 hyperfine "result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run --max 0"
Benchmark 1: result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run --max 0
  Time (mean ± σ):     481.6 ms ±   3.9 ms    [User: 336.2 ms, System: 142.0 ms]
  Range (min … max):   474.6 ms … 487.7 ms    10 runs
```

(After this patch)

```
taskset -c 2,3 hyperfine "result/bin/nix store gc --dry-run --max 0"
Benchmark 1: result/bin/nix store gc --dry-run --max 0
  Time (mean ± σ):     254.7 ms ±   9.7 ms    [User: 111.1 ms, System: 141.3 ms]
  Range (min … max):   246.5 ms … 281.3 ms    10 runs
```

`boost::regex` is a drop-in replacement for `std::regex`, but much faster. Doing a simple before/after comparison doesn't surface any change in behavior:

```
result/bin/nix store gc --dry-run -vvvvv --max 0 |& grep "got additional" | wc -l
result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run -vvvvv --max 0 |& grep "got additional" | wc -l
```
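To illustrate what "drop-in replacement" means here, the following is a minimal sketch, not the code from this patch. It scans a blob of text for store-path-like strings through a namespace alias, so the regex engine can be swapped by changing two lines. The names `re`, `storePathRegex`, and `findStorePaths` are hypothetical, and the pattern is a simplified assumption rather than the expression Nix actually uses.

```cpp
#include <regex>
#include <set>
#include <string>

// Switching engines should only need these two lines changed, e.g.
//   #include <boost/regex.hpp>
//   namespace re = boost;
// (assumes Boost.Regex is available; this sketch builds with the standard library only)
namespace re = std;

// Hypothetical, simplified pattern: /nix/store/<32 lowercase alphanumerics>-<name>.
// This is an illustration, not the regex used by libstore.
static const re::regex storePathRegex(
    R"(/nix/store/[0-9a-z]{32}-[0-9a-zA-Z+._?=-]+)");

// Collect every distinct store-path-like substring found in `text`.
std::set<std::string> findStorePaths(const std::string & text)
{
    std::set<std::string> found;
    re::sregex_iterator it(text.begin(), text.end(), storePathRegex);
    const re::sregex_iterator end;
    for (; it != end; ++it)
        found.insert(it->str());
    return found;
}
```

Because `regex` and `sregex_iterator` exist under the same names in both `std` and `boost`, the function body stays untouched when the alias changes. Whether every pattern behaves identically under both engines is a separate question that a before/after comparison like the one above is meant to check.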
|
Please check that this doesn't cause a dependency on |
|
Hmm, I guess this is boost's Should I also modify |
This reduces the closure size on master by roughly 38 MiB (124.4 MiB down to 86.6 MiB).

```
$ nix build github:nixos/nix/1e822bd4149a8bce1da81ee2ad9404986b07914c#nix-store --out-link closure-on-master
$ nix build .#nix-store -L --out-link closure-without-icu
$ nix path-info --closure-size -h ./closure-on-master
/nix/store/8gwr38m5h6p7245ji9jv28a2a11w1isx-nix-store-2.29.0pre 124.4 MiB
$ nix path-info --closure-size -h ./closure-without-icu
/nix/store/k0gwfykjqpnmaqbwh23nk55lhanc9g24-nix-store-2.29.0pre 86.6 MiB
```
|
@edolstra @Mic92 Seems like master already links to |
      "--with-coroutine"
      "--with-iostreams"
    ];
    enableIcu = false;
In a separate PR, we could probably use `disallowedReferences` to make sure Nix never ends up referencing ICU.
|
@xokdvium I wonder if this is also a candidate for `builtins.match` if it is that much faster. I remember the GCC version of `std::regex` would also overflow the stack on slightly larger inputs.
|
@Mic92 that's the plan. There have been several attempts at this (see #7762), and Lix did the migration as well (lix-project/lix@447212f). Before we do that, though, I think we need a good test suite. One possible way to do that is to extract |
…3142 libstore: Use `boost::regex` for GC root discovery (backport #13142)
Note: This really doesn't make a dent in the actual runtime of GC itself.