-
Notifications
You must be signed in to change notification settings - Fork 92
Add limit for regex length #75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Arthur Chan <[email protected]>
|
@arthurscchan I think this is a good idea but can you investigate making this a configurable item. joni has a lot of items you can set. You can look at #67 for how to do this. |
|
@enebo Thanks for the suggestion. I will fix the PR accordingly. |
Signed-off-by: Arthur Chan <[email protected]>
|
@enebo Fixed accordingly, thanks again for your suggestion. |
|
shouldnt there be end - p instead of bytes.length ? |
Signed-off-by: Arthur Chan <[email protected]>
|
@arthurscchan Thanks for the PR. I think this will add another lever to control time spent. |
|
believe logstash's grok-ing patterns might end up with regular expressions even in the 1000s (although they still should not be that complicated and usually do not blow up the stack) ... hopefully not |
|
@kares @arthurscchan @lopex I did hit this the other month while running Ruby unit tests so I increased it....BUT...This is not a minor release change. Any existing user may stop working just because they have a big regexp they intend to have. So I changed the default to -1 and you need to set the value to opt into it. We can open an issue for this if we want to change this to opt out with a reasonable default. The commit is 03ac9a5. |
|
@enebo JRuby 10 might be the right time to attempt a hard limit. |
|
@headius I am wondering if this should ever have a hard limit or should just be available as an option. Thus far it has just hit two projects with reasonable but big regexps. regexp instruction count does not imply performance issue but I can if you are building up something from a dsl or something perhaps you will want this. |
|
@enebo Ahh I read more of the context and I agree this probably should only be an opt-in feature, at least until someone reports a CVE that we need to address. 😀 I would guess that CRuby has a similar problem in Onigmo, but perhaps nobody has tried. @arthurscchan Is there some way we could reproduce the issue you found and test other Ruby Regexp implementations? |
|
@headius Because this issue was reported almost a year ago, I don't have the "actual" input in hand anymore. But in my understanding and what has been discovered from OSS-Fuzz fuzzing, the input byte array that triggers the problem is a very long byte array with a complex structure, for example, a high dimension nested array or some invalid byte array that keeps opening nested elements until the heap memory is overflow. The length of the byte array reached more than 100,000 bytes in there as I recall. |
|
@arthurscchan Ok thanks. I guess we'll just let it ride for now. |
This fixes a possible out of stack memory problem in src/org/joni/Regex.java for parsing long and complicated regex structures.
In the constructors of the Regex class, it will take in a string or a byte array as for generating a regex matcher. The received string or byte array will be stored and go through the
compile()method of the Analyser class. The Analyser class will then use different parsing methods in the Parser class to parse the regular expression. Those parse methods will call each other recursively depending on the structure of the given string or byte array. If the input is too long or contains so many complicated or levelled structures, the depth of the recursion will be high. As each of the recursive calls will occupy part of the stack, the high depth of recursion will use up the stack quickly and result in a stack memory overflow problem.This PR reduces the problem by adding an additional length limit as a private class variable. If the provided regex string or byte array is longer than the maximum length. The method will simply throw a ValueException.
We found this bug using fuzzing by way of OSS-Fuzz, where we recently integrated joni (google/oss-fuzz#10680). OSS-Fuzz is a free service run by Google for fuzzing important open source software. If you'd like to know more about this then I'm happy to go into detail and also set up things so you can receive emails and detailed reports when bugs are found.