Regex stack overflow in Neko/HL #10867

Closed
nspitko opened this issue Dec 1, 2022 · 5 comments

@nspitko

nspitko commented Dec 1, 2022

In my quest to write the most cursed regexes, I found one that breaks on moderately sized strings.

This looks to be an issue in the regex library itself, though I'm filing here as a first point of contact, since I don't have a good way to test whether it's due to some shenanigans on our end.

Test case:

class Test {
  static function main() {
    var r = ~/\/\*(((?!\*\/).)*)$/s;
    var a = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";
    
    var test = "/*";
    for( i in 0 ... 10 )
    {
      test += a;
      trace( 'iter $i: ${r.match(test)}');    
    }
    
    trace( "Finished");
  }
}

Or if you prefer, a try.haxe link: https://try.haxe.org/#cAfb5b04

This code works fine in JavaScript and Python; I've not tested other platforms yet. However, both Neko and HL fail (Neko simply aborts, HashLink throws an unrecoverable stack overflow exception).

If you reduce the number of iterations it succeeds, so the failure only shows up with longer strings.
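
Until the regex engine copes with long inputs, one possible workaround is to avoid the regex for this particular check. The sketch below assumes the goal of the pattern is only to detect a "/*" that is never closed by a later "*/". `OpenCommentCheck` and `hasUnterminatedComment` are hypothetical names, not part of the Haxe standard library, and the helper returns only a Bool rather than the captured comment body.

```haxe
class OpenCommentCheck {
  // Hypothetical helper (an assumption for illustration, not a std API).
  // Returns true when `s` contains a "/*" with no "*/" after it.
  public static function hasUnterminatedComment(s:String):Bool {
    // Index of the last "*/" in the string, or -1 if there is none.
    var lastClose = s.lastIndexOf("*/");
    // A "/*" at index i is unterminated when no "*/" starts at i + 2
    // or later, i.e. when i >= lastClose - 1.
    var from = lastClose < 1 ? 0 : lastClose - 1;
    return s.indexOf("/*", from) != -1;
  }

  static function main() {
    var a = "Lorem ipsum dolor sit amet. ";
    var test = "/*";
    for (i in 0...1000)
      test += a;
    trace(hasUnterminatedComment(test));        // true
    trace(hasUnterminatedComment(test + "*/")); // false
  }
}
```

This uses only `String.indexOf` and `String.lastIndexOf`, so it does not depend on which regex engine a target uses.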

@tobil4sk
Member

tobil4sk commented Dec 1, 2022

I tested with a nightly build of neko, where the regex engine has been updated to pcre2, and it seems to fix the issue. pcre2 was designed to avoid stack overflow issues like this.

The migration is still not complete for HashLink; however, it can be tested with this PR, which also seems to fix it for me: HaxeFoundation/hashlink#515

It's also likely to fail on Eval and Lua, which still use pcre1.

@Uzume

Uzume commented Dec 3, 2022

This is just another reason to make the pcre2 migration.

Lua should not be that hard to update, but Eval will be considerably harder (since it requires a currently nonexistent OCaml binding for pcre2).

Simn added this to the Release 4.3 milestone Mar 24, 2023
@Uzume

Uzume commented Mar 25, 2023

The Lua issue seems to be fixed by #11030

Simn self-assigned this Mar 25, 2023
Simn added a commit that referenced this issue Mar 25, 2023
@tobil4sk
Member

> I tested with a nightly build of neko, [...] and it seems to fix the issue

My bad, I must have been using a build of neko from this PR: HaxeFoundation/neko#262, so it will have to be merged for the neko tests to pass. This looks like it could have the same root cause as #8829.

Simn added a commit that referenced this issue Mar 26, 2023
@Simn
Member

Simn commented Mar 26, 2023

That helped, thanks!

Simn closed this as completed Mar 26, 2023