Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unnecessary PREG_OFFSET_CAPTURE usage (?) #110

Open
ViliusS opened this issue Apr 18, 2024 · 3 comments
Open

unnecessary PREG_OFFSET_CAPTURE usage (?) #110

ViliusS opened this issue Apr 18, 2024 · 3 comments

Comments

@ViliusS
Copy link

ViliusS commented Apr 18, 2024

Just wondering is PREG_OFFSET_CAPTURE is needed here

$count = preg_match_all($this->lexerRegex, $text, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

I don't see offset used anywhere in the following code however during tests I found that it uses up memory pretty heavily. If I did my testing right, in one case preg_match_all() uses up 80 MB with PREG_OFFSET_CAPTURE and 23 MB without.

@thunderer
Copy link
Owner

Interesting, it looks like a leftover from the time I actually used the offset, now replaced with calculation via mb_strlen(). Would you be interested in preparing a PR for this?

@ViliusS
Copy link
Author

ViliusS commented Apr 18, 2024

I've actually tried that, but the logic in the following switch statement is a little bit over my head. What I could share is why I have investigated this, and minimal test cases which shows memory issues with that preg_match_all() statement https://1drv.ms/f/s!AgnMn-haWyFrkcN50beuEO4A0m6PQw?e=GjwcEV . Ideally I wanted to find out why these content cases uses so much memory, but other cases don't. PREG_OFFSET_CAPTURE looked like a low hanging fruit, but there could be other issues still.

@thunderer
Copy link
Owner

thunderer commented Dec 20, 2024

@ViliusS @funkjedi the PR is merged now, and I also got another idea to make it even better using PCRE MARK verb. Unfortunately this will work in PHP 7+ only, so I'll merge it after figuring out the strategy to bump the required PHP version and modernize the codebase. My plan so far is v0.8.0 for PHP 5.3+, v1.0.0 for PHP 7.1+ and v2.0.0 for PHP 8.1+, with dedicated branches for each major version - master stays for 0.x, then v1.x, and v2.x respectively. PR is here: #116

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants