Replace current regex engine with PCRE2 #4033
- Please send it from dist- branch from rizinorg account, to check that it builds and runs fine on all supported platforms and configurations
- Once it's done, I think it is worth removing the old implementation as well.
Would be nice to also allow PCRE (v1) for older distros.
Currently, for every distro which has no pcre2, it is compiled as a subproject. Shouldn't this be enough?
Yes, I think it should be enough. We have the same strategy for the Capstone dependency as well.
PCRE2 defines a huge number of options and flags. Is it fine if we do not add an
you should only use
558652e to 4168046
Tests/build still fail, but I'd like to know your opinion on the API first, because there are now some changes in the actual code.
Please also fix test\db\archos\windows-x64\dbg_dts: remove \n or just don't include it in the capture.
macOS (Darwin) regexes should be fixed too, in particular the whitespace and newline handling: https://ci.rizin.re/repos/27/pipeline/4083/7#L616
OpenBSD error message is puzzling:
ERROR: Regex compilation failed at 0: no more memory
Same happens on NetBSD too.
Yep, aware of it. I'll just wait and bundle it with other fixes, so the CI is not triggered again and again.
The BSD problems seem to be a bug in PCRE2 (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276252). Hopefully it only affects the JIT compiler? I'm going to make a minimal example and open an issue later. If it is indeed a BSD-only problem, I would just exclude BSD from JIT/
@Rot127 yes, though FreeBSD works just fine. The problem occurs only on OpenBSD and NetBSD, so please disable JIT only for those, but not for FreeBSD.
Once this PR is green after the rebase, please send one from the
Absolutely. The commit history is a mess.
5ff6680 to 5a5d5af
Should be fixed now.
PCRE2 has way better performance than the OpenBSD library (something around 20 times faster).

The following flags are enabled for every pattern:
- PCRE2_UTF
- PCRE2_MATCH_INVALID_UTF
- PCRE2_NO_UTF_CHECK

All the others are optional.

Changes made:
- Adds PCRE2 as subproject.
- Changes the API away from POSIX to PCRE2.
- Edits many regex patterns because:
  - ' ' is skipped in patterns, if the EXTENDED flag is set for matching. '\s' must be set now.
  - '.' doesn't match newlines by default.
- Changes the API so matches and their groups are bundled into PVectors.
- Moves the regex component to rz_util.
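The two pattern changes listed above (literal spaces being skipped in extended mode, and '.' not matching newlines by default) can be tried out with a small sketch. This is not the PR's code and does not use PCRE2: it assumes Python's standard re module as a stand-in, where re.VERBOSE plays the role of the EXTENDED flag and re.DOTALL makes '.' match newlines. The sample strings are made up for illustration.

```python
import re

# Assumption: re.VERBOSE stands in for PCRE2's EXTENDED flag. Both ignore
# unescaped literal whitespace in the pattern (outside character classes).
extended = re.compile(r"mov eax", re.VERBOSE)    # the space is dropped: behaves like "moveax"
explicit = re.compile(r"mov\seax", re.VERBOSE)   # '\s' survives extended mode

# Hypothetical input line, chosen only for the demonstration.
line = "mov eax"
space_pattern_matches = extended.search(line) is not None   # False
class_pattern_matches = explicit.search(line) is not None   # True

# '.' does not match a newline unless a dot-all option is requested.
dot_default = re.compile(r"a.b")
dot_all = re.compile(r"a.b", re.DOTALL)

text = "a\nb"
dot_default_matches = dot_default.search(text) is not None  # False
dot_all_matches = dot_all.search(text) is not None          # True
```

A pattern that used to rely on a literal space therefore needs '\s' (or an escaped space) once extended matching is on, and a pattern that relied on '.' crossing line boundaries needs either a dot-all option or an explicit newline in the pattern.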
@Rot127 please send a new PR from inside the
Your checklist for this pull request
Detailed description
Replace the current regex engine with PCRE2.
Test plan
All green.
Closing issues
closes #3730
Partially addresses #4055
Todo
- Add a rz_vector_pop_ptr() function which doesn't use memcpy() and use it in match_all_flat(). Or add a rz_vector_concat() function with the same functionality.
- __asm
- rz_regex_get_match_name