Skip to content

switch from std::regex to boost::regex#3826

Closed
Mic92 wants to merge 1 commit intoNixOS:masterfrom
Mic92:fix-match
Closed

switch from std::regex to boost::regex#3826
Mic92 wants to merge 1 commit intoNixOS:masterfrom
Mic92:fix-match

Conversation

@Mic92
Copy link
Member

@Mic92 Mic92 commented Jul 17, 2020

  1. std::regex is not consistent across different platforms (libcxx vs libstdc++)
  2. libstdc++ implementation is braindead:
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93502
    and overflows the stack if the input string gets too long.

If this change is accepted I will also replace all other instances.

1. std::regex is not consistent across different platforms (libcxx vs libstdc++)
2. libstdc++ implementation is braindead:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93502
   and overflows the stack if the input string gets too long.

If this change is accepted I will also replace all other instances.
@edolstra
Copy link
Member

I'm not really in favor of this. boost::regex is just the predecessor of std::regex so going back to boost::regex seems like a step back. We'd be pulling in an external dependency for something that's already provided by the standard library.

BTW what string sizes are we talking about here?

@Mic92
Copy link
Member Author

Mic92 commented Jul 17, 2020

Depends on the platform's default stacksize limit but on x86_64-linux this leads to reliable crashes: #3804

@edolstra
Copy link
Member

Looks like this may get fixed upstream: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86164#c8

@Mic92
Copy link
Member Author

Mic92 commented Jul 17, 2020

I'm not really in favor of this. boost::regex is just the predecessor of std::regex so going back to boost::regex seems like a step back. We'd be pulling in an external dependency for something that's already provided by the standard library.

BTW what string sizes are we talking about here?

It does not really increase the closure size though in Nix. There might be some distributions such as Debian that package liboost_regex separately: https://packages.debian.org/sid/amd64/libboost-regex1.71-dev/filelist

What is your proposed fix instead? fixing libstdc++'s regex is beyond my capabilities.

@Mic92
Copy link
Member Author

Mic92 commented Jul 17, 2020

Looks like this may get fixed upstream: gcc.gnu.org/bugzilla/show_bug.cgi?id=86164#c8

Not really the commenter gave up.

@Mic92
Copy link
Member Author

Mic92 commented Jul 17, 2020

I'm not really in favor of this. boost::regex is just the predecessor of std::regex so going back to boost::regex seems like a step back. We'd be pulling in an external dependency for something that's already provided by the standard library.

BTW what string sizes are we talking about here?

28k in my case.

@edolstra
Copy link
Member

Note that libboost_regex.so is about 1.3 MB, but worse, it pulls in libicu4c which is 34 MB.

@Mic92 Mic92 closed this Jul 17, 2020
@Mic92
Copy link
Member Author

Mic92 commented Jul 19, 2020

I think it used regcomp before: b05b98d#diff-52d20285486e90d564ca2535d4d0fe55L11

It could be reverted to that. However its too much guessing from my side what the maintainers actually want as a acceptable solution so I just leave it as it is given that the one bug in nixpkgs itself is fixed...

@Mic92 Mic92 deleted the fix-match branch July 19, 2020 07:28
@zimbatm
Copy link
Member

zimbatm commented Jul 20, 2020

+1 for PCRE, oniguruma or any other regexp engine that has a consistent evaluation across platforms.

@adisbladis
Copy link
Member

@zimbatm While I would absolutely love PCRE the problem is that Nix is already using POSIX ERE and migrating to something else would break builtins.match.

lf- pushed a commit to lix-project/lix that referenced this pull request Aug 22, 2024
This avoids C++'s standard library regexes, which aren't the same
across platforms, and have many other issues, like using stack
so much that they stack overflow when processing a lot of data.

To avoid backwards and forward compatibility issues, regexes are
processed using a function converting libstdc++ regexes into Boost
regexes, escaping characters that Boost needs to have escaped, and
rejecting features that Boost has and libstdc++ doesn't.

Related context:

- Original failed attempt to use `boost::regex` in CppNix, failed due to
  boost icu dependency being large (disabling ICU is no longer necessary
  because linking ICU requires using a different header file,
  `boost/regex/icu.hpp`): NixOS/nix#3826

- An attempt to use PCRE, rejected due to providing less backwards
  compatibility with `std::regex` than `boost::regex`:
  NixOS/nix#7336

- Second attempt to use `boost::regex`, failed due to `}` regex failing
  to compile (dealt with by writing a wrapper that parses a regular
  expression and escapes `}` characters):
  NixOS/nix#7762

Closes #34. Closes #476.

Change-Id: Ieb0eb9e270a93e4c7eed412ba4f9f96cb00a5fa4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants