Skip to content

Switch from std::regex to boost::regex#7762

Merged
thufschmitt merged 2 commits intoNixOS:masterfrom
yorickvP:boost-regex
Nov 27, 2023
Merged

Switch from std::regex to boost::regex#7762
thufschmitt merged 2 commits intoNixOS:masterfrom
yorickvP:boost-regex

Conversation

@yorickvP
Copy link
Contributor

@yorickvP yorickvP commented Feb 6, 2023

Motivation

Requested in #7336

Context

Fixes #2147, fixes #4758
See also #3826.

  • s/std::regex/regex/g, added aliases to enable switching back and forth in the future.
  • Disabled enableIcu on boost dep, but this hasn't taken effect yet because this change isn't in nixpkgs 22.11.

Checklist for maintainers

Maintainers: tick if completed or explain if not relevant

  • agreed on idea
  • agreed on implementation strategy
  • tests, as appropriate
    • functional tests - tests/**.sh
    • unit tests - src/*/tests
    • integration tests - tests/nixos/*
  • documentation in the manual
  • code and comments are self-explanatory
  • commit message explains why the change was made
  • new feature or bug fix: updated release notes

@edolstra
Copy link
Member

edolstra commented Feb 6, 2023

I guess we should merge this after the enableIcu change has landed in stable Nixpkgs.

Copy link
Member

@thufschmitt thufschmitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice :)

Can you confirm that the closure size doesn't increase too much (with a nixpkgs that has the enableIcu patch)?

Other than that, I agree with Eelco, let's just wait for enableIcu to land in nixpkgs (can you link to the PR for that?), and then we can merge

@yorickvP
Copy link
Contributor Author

yorickvP commented Feb 7, 2023

enableIcu is NixOS/nixpkgs#205166, but not backported to 22.11

Sizes:

$ nix path-info ./master
/nix/store/dlzwlll6d5k5yjfg98q51szgzxgqhmf7-nix-2.14.0pre20230203_895dfc6                  	  16.7M	  88.1M

$ nix path-info ./boost-regex
/nix/store/r3x96j3kmcs8dv4l02rrjmbhm535jycy-icu4c-72.1                                     	  36.8M	  76.7M
/nix/store/xyxm8cp19vrqnfssvhkm6k50ry0gaqhj-nix-2.14.0pre20230207_dirty                    	  17.3M	 125.5M

$ nix path-info ./boost-regex-without-icu
/nix/store/74jhsx372zq72hga6l2llfks700p3bam-nix-2.14.0pre20230207_dirty                    	  17.3M	  88.7M

@thufschmitt
Copy link
Member

thufschmitt commented Feb 10, 2023

(answering #7336 (comment) here since it's where the boost implementation takes place):

@SuperSandro2000 that's a very valid point. @yorickvP do you think you could benchmark this quickly to see whether it has an impact? Something like hyperfine --warmup 1 "nix build nix/895dfc656a21f6252ddf48df0d1f215effa04ecb --rebuild" "nix build nix/d06b144dc38f509bb35f940b38d810a94ad5b852 --rebuild" on a quiet machine should do it

@yorickvP
Copy link
Contributor Author

boost::regex:

$ hyperfine --warmup 1 'make -j32 --assume-new src/libexpr/primops.cc'
Benchmark 1: make -j32 --assume-new src/libexpr/primops.cc
  Time (mean ± σ):     23.082 s ±  0.447 s    [User: 23.026 s, System: 1.366 s]
  Range (min … max):   22.213 s … 23.520 s    10 runs

std::regex:

$ hyperfine --warmup 1 'make -j32 --assume-new src/libexpr/primops.cc'
Benchmark 1: make -j32 --assume-new src/libexpr/primops.cc
  Time (mean ± σ):     21.586 s ±  0.192 s    [User: 21.564 s, System: 1.331 s]
  Range (min … max):   21.355 s … 21.934 s    10 runs

The above also tests linking time.

If compilation speed is a consideration, I'd suggest going with PCRE ;).

@thufschmitt
Copy link
Member

I think a 2s difference is quite OK. I benchmarked the time of a full nix build (it's cold out there, needed to heat my office a bit), and the difference is neglegible

Summary
  'nix build nix/d06b144dc38f509bb35f940b38d810a94ad5b852 --rebuild' ran
    1.00 ± 0.01 times faster than 'nix build nix/895dfc656a21f6252ddf48df0d1f215effa04ecb --rebuild'

@SuperSandro2000
Copy link
Member

The difference should be much bigger when boost would be fully removed which is a design decision completely out of scope of this PR.

See https://www.factorio.com/blog/post/fff-206#:~:text=step%203%20-%20getting%20rid%20of%20boost for an example and some stats

@yorickvP
Copy link
Contributor Author

@SuperSandro2000 Factorio looks like it was using a lot more boost than Nix.

it might be worth it to remove boost::lexical_cast from util.hh or boost::format from logging, but the rest of the boost usage seems to be very contained and not likely to matter much for compile times.

I don't predict more than a 10% compile time increase if we replace it.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-44/25546/1

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-04-10-nix-team-meeting-minutes-47/27357/1

@yorickvP
Copy link
Contributor Author

Nix team meeting notes:

@regnat had talked to @yorickvP, will continue after next NixOS stable so we can use for next Nix release.

I think we should merge it so that it ends up in 23.05, as well. Would that be possible?

@con-f-use
Copy link

Any reason why this never got merged?

@yorickvP
Copy link
Contributor Author

Needed to wait for enableIcu to land on the boost dependency, which I think is the case now.

@fricklerhandwerk
Copy link
Contributor

fricklerhandwerk commented Oct 27, 2023

Revisited in Nix team meeting:

  • Both fixes a bug and makes regex support consistent
  • We want this, but needs to be rebased
  • Should be compatible, std::regex is iirc derived from boost::regex
  • Good test suite? Probably not but don't want to block on that.

@yorickvP since the preconditions are met, can you rebase?

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-10-27-nix-team-meeting-minutes-98/34695/1

@iFreilicht
Copy link
Contributor

As far as I understand, this would also fix #8768 (at least the inconsistency part), #3047 and #1537.

@yorickvP
Copy link
Contributor Author

Rebased!

@thufschmitt
Copy link
Member

Thanks, @yorickvP !

Set to auto-merge since it was already approved on principle.

@thufschmitt thufschmitt merged commit d46230e into NixOS:master Nov 27, 2023
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-11-27-nix-team-meeting-minutes-107/36112/1

@9999years
Copy link
Contributor

This broke lib.escapeRegex from nixpkgs. Before:

$ nix repl nixpkgs
Welcome to Nix 2.18.1. Type :? for help.

Loading installable 'flake:nixpkgs#'...
Added 5 variables.
nix-repl> builtins.match (lib.escapeRegex "{}") "{}"
[ ]

After:

$ ~/nix/result/bin/nix repl nixpkgs
Welcome to Nix 2.20.0pre20231130_dirty. Type :? for help.

Loading installable 'flake:nixpkgs#'...
Added 5 variables.
nix-repl> builtins.match (lib.escapeRegex "{}") "{}"
error:
       … while calling the 'match' builtin

         at «string»:1:1:

            1| builtins.match (lib.escapeRegex "{}") "{}"
             | ^

       error: invalid regular expression '\{}'

9999years added a commit to 9999years/nixpkgs that referenced this pull request Nov 30, 2023
NixOS/nix#7762 switched `nix` to use the
`boost::regex` implementation, which expects that all literal braces are
escaped. That is, the regex `\{crate}` no longer parses. We modify
`lib.escapeRegex` to accommodate this change. Fortunately, the old regex
implementation doesn't mind if closing braces are escaped as well, so
this is backwards compatible and we don't need to worry about
version-gating it.

Without this patch:

Old regex implementation:

    $ nix repl nixpkgs
    Welcome to Nix 2.18.1. Type :? for help.

    Loading installable 'flake:nixpkgs#'...
    Added 5 variables.
    nix-repl> builtins.match (lib.escapeRegex "{}") "{}"
    [ ]

    $ ~/nix/result/bin/nix repl nixpkgs
    Welcome to Nix 2.20.0pre20231130_dirty. Type :? for help.

`boost::regex` implementation:

    $ ~/nix/result/bin/nix repl nixpkgs
    Welcome to Nix 2.20.0pre20231130_dirty. Type :? for help.

    Loading installable 'flake:nixpkgs#'...
    Added 5 variables.
    nix-repl> builtins.match (lib.escapeRegex "{}") "{}"
    error:
           … while calling the 'match' builtin

             at «string»:1:1:

                1| builtins.match (lib.escapeRegex "{}") "{}"
                 | ^

           error: invalid regular expression '\{}'
@infinisil
Copy link
Member

This should be reverted, I opened #9508 to do that

@thufschmitt
Copy link
Member

Thanks for the revert @infinisil !

Could have been caught earlier?. Is there a nixpkgs lib test suite that we can use? (We already test that the output of nix-env -qaP --drv-path -f <someFixedNixpkgsVersion> doesn't change, but that's apparently not enough)

@roberth
Copy link
Member

roberth commented Dec 1, 2023

We have nixpkgsLibTests on Hydra. It runs ${nixpkgs}/lib/tests/release.nix with the fresh Nix.

@yorickvP
Copy link
Contributor Author

yorickvP commented Dec 1, 2023

Hrm, sad. I hope it just needs boost::regex::extended. Should test.

@thufschmitt
Copy link
Member

We have nixpkgsLibTests on Hydra. It runs ${nixpkgs}/lib/tests/release.nix with the fresh Nix.

Oh, thanks, I missed that one.
It looks like it didn't catch that precise error though. Is it just a case of the tests not covering it?

@roberth
Copy link
Member

roberth commented Dec 1, 2023

lib is only responsible for testing itself, but we use it here as a cheap way to hopefully extend our own coverage a bit.

lf- pushed a commit to lix-project/lix that referenced this pull request Aug 22, 2024
This avoids C++'s standard library regexes, which aren't the same
across platforms, and have many other issues, like using stack
so much that they stack overflow when processing a lot of data.

To avoid backwards and forward compatibility issues, regexes are
processed using a function converting libstdc++ regexes into Boost
regexes, escaping characters that Boost needs to have escaped, and
rejecting features that Boost has and libstdc++ doesn't.

Related context:

- Original failed attempt to use `boost::regex` in CppNix, failed due to
  boost icu dependency being large (disabling ICU is no longer necessary
  because linking ICU requires using a different header file,
  `boost/regex/icu.hpp`): NixOS/nix#3826

- An attempt to use PCRE, rejected due to providing less backwards
  compatibility with `std::regex` than `boost::regex`:
  NixOS/nix#7336

- Second attempt to use `boost::regex`, failed due to `}` regex failing
  to compile (dealt with by writing a wrapper that parses a regular
  expression and escapes `}` characters):
  NixOS/nix#7762

Closes #34. Closes #476.

Change-Id: Ieb0eb9e270a93e4c7eed412ba4f9f96cb00a5fa4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

builtins.match regex support "broken" on aarch64-darwin builtins.match regression: stack overflow on large strings