-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix chained strings #1107
Fix chained strings #1107
Conversation
The function yr_re_ast_split_at_chaining_point was not splitting correctly regexps and hex strings that contained more than one large jump (a.k.a gap). For example, { 01 02 03 [0-1000] 04 05 06 [500-2000] 07 08 09 } was not splitted in three parts as expected, but only in two parts: { 01 02 03 } and { 04 05 06 [500-2000] 07 08 09 } . This didn't affected the correctness of the matches, but it caused ERROR_TOO_MANY_RE_FIBERS in some cases while trying to match { 04 05 06 [500-2000] 07 08 09 } because of the large jump in the pattern. The occurrence of the error depended on the data being scanned. This commit fixes the aforementioned issue, improves the documentation of related functions and simplifies some portions of the code.
This fixes a small issue introduced in #1096. The same string could be put more than once in the matching_strings_arena, this doesn't causes any major error nor affect the correctness of the matching, but it could have a negative impact on performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank for the the extensive comments here! This has certainly helped me understand things better. :)
@@ -520,10 +520,10 @@ int _yr_atoms_trim( | |||
} | |||
|
|||
// If the number of unknown bytes is >= than the number of known bytes | |||
// it doesn't make sense the to use this atom, so we use the a single byte | |||
// atom with the first known byte. If YR_MAX_ATOM_LENGTH == 4 this happens | |||
// it doesn't make sense the to use this atom, so we use a single byte atpm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some typos here. I think it should be:
// it doesn't make sense to use this atom, so we use a single byte atom
int64_t fixed_offset; | ||
|
||
// Each item in the "matches" array represents a list of matches, there's one | ||
// list of matches for each thread currently using the compiled rules. Up to | ||
// YR_MAX_THREADS can use the compiled rules simultaneosly, that's why the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor typo: simultaneously
// chain is also matching. For example, if the string S was splitted and | ||
// converted in a chain S1 <- S2 <- S3 (see yr_re_ast_split_at_chaining_point), | ||
// and a match for S3 was found, this functions finds out if there are matches | ||
// for S1 and S2 that together with the match found for S3 conform a match for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
confirm
if (matching_string->matches[tidx].count == 0 && | ||
matching_string->private_matches[tidx].count == 0 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Sorry for missing this. :(
@@ -461,17 +461,18 @@ int yr_parser_reduce_string_declaration( | |||
SIZED_STRING* str, | |||
YR_STRING** string) | |||
{ | |||
int min_atom_quality = YR_MIN_ATOM_QUALITY; | |||
int min_atom_quality_aux = YR_MIN_ATOM_QUALITY; | |||
int min_atom_quality = YR_MAX_ATOM_QUALITY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be YR_MIN_ATOM_QUALITY
?
* Add test case. * Fix issue with regexps and hex strings that were not splitted correctly. The function yr_re_ast_split_at_chaining_point was not splitting correctly regexps and hex strings that contained more than one large jump (a.k.a gap). For example, { 01 02 03 [0-1000] 04 05 06 [500-2000] 07 08 09 } was not splitted in three parts as expected, but only in two parts: { 01 02 03 } and { 04 05 06 [500-2000] 07 08 09 } . This didn't affected the correctness of the matches, but it caused ERROR_TOO_MANY_RE_FIBERS in some cases while trying to match { 04 05 06 [500-2000] 07 08 09 } because of the large jump in the pattern. The occurrence of the error depended on the data being scanned. This commit fixes the aforementioned issue, improves the documentation of related functions and simplifies some portions of the code. * Avoid putting the same string multiple times in matching_strings_arena. This fixes a small issue introduced in VirusTotal#1096. The same string could be put more than once in the matching_strings_arena, this doesn't causes any major error nor affect the correctness of the matching, but it could have a negative impact on performance. * Add more comments. * Improve documentation for _yr_scan_verify_chained_string_match.
The function yr_re_ast_split_at_chaining_point was not splitting correctly regexps and hex strings that contained more than one large jump (a.k.a gap). For example, { 01 02 03 [0-1000] 04 05 06 [500-2000] 07 08 09 } was not splitted in three parts as expected, but only in two parts: { 01 02 03 } and { 04 05 06 [500-2000] 07 08 09 } . This didn't affected the correctness of the matches, but it caused ERROR_TOO_MANY_RE_FIBERS in some cases while trying to match { 04 05 06 [500-2000] 07 08 09 } because of the large jump in the pattern. The occurrence of the error depended on the data being scanned.
This PR fixes the aforementioned issue, improves the documentation of related functions and simplifies some portions of the code.