Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix chained strings #1107

Merged
merged 5 commits into from
Aug 12, 2019
Merged

Fix chained strings #1107

merged 5 commits into from
Aug 12, 2019

Conversation

plusvic
Copy link
Member

@plusvic plusvic commented Aug 9, 2019

The function yr_re_ast_split_at_chaining_point was not splitting correctly regexps and hex strings that contained more than one large jump (a.k.a gap). For example, { 01 02 03 [0-1000] 04 05 06 [500-2000] 07 08 09 } was not splitted in three parts as expected, but only in two parts: { 01 02 03 } and { 04 05 06 [500-2000] 07 08 09 } . This didn't affected the correctness of the matches, but it caused ERROR_TOO_MANY_RE_FIBERS in some cases while trying to match { 04 05 06 [500-2000] 07 08 09 } because of the large jump in the pattern. The occurrence of the error depended on the data being scanned.

This PR fixes the aforementioned issue, improves the documentation of related functions and simplifies some portions of the code.

The function yr_re_ast_split_at_chaining_point was not splitting correctly regexps and hex strings that contained more than one large jump (a.k.a gap). For example, { 01 02 03 [0-1000] 04 05 06 [500-2000] 07 08 09 } was not splitted in three parts as expected, but only in two parts: { 01 02 03 } and { 04 05 06 [500-2000] 07 08 09 } . This didn't affected the correctness of the matches, but it caused ERROR_TOO_MANY_RE_FIBERS in some cases while trying to match { 04 05 06 [500-2000] 07 08 09 } because of the large jump in the pattern. The occurrence of the error depended on the data being scanned.

This commit fixes the aforementioned issue, improves the documentation of related functions and simplifies some portions of the code.
This fixes a small issue introduced in #1096. The same string could be put more than once in the matching_strings_arena, this doesn't causes any major error nor affect the correctness of the matching, but it could have a negative impact on performance.
@plusvic plusvic requested a review from wxsBSD August 9, 2019 07:56
Copy link
Collaborator

@wxsBSD wxsBSD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank for the the extensive comments here! This has certainly helped me understand things better. :)

@@ -520,10 +520,10 @@ int _yr_atoms_trim(
}

// If the number of unknown bytes is >= than the number of known bytes
// it doesn't make sense the to use this atom, so we use the a single byte
// atom with the first known byte. If YR_MAX_ATOM_LENGTH == 4 this happens
// it doesn't make sense the to use this atom, so we use a single byte atpm
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some typos here. I think it should be:

// it doesn't make sense to use this atom, so we use a single byte atom

int64_t fixed_offset;

// Each item in the "matches" array represents a list of matches, there's one
// list of matches for each thread currently using the compiled rules. Up to
// YR_MAX_THREADS can use the compiled rules simultaneosly, that's why the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor typo: simultaneously

// chain is also matching. For example, if the string S was splitted and
// converted in a chain S1 <- S2 <- S3 (see yr_re_ast_split_at_chaining_point),
// and a match for S3 was found, this functions finds out if there are matches
// for S1 and S2 that together with the match found for S3 conform a match for
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confirm

if (matching_string->matches[tidx].count == 0 &&
matching_string->private_matches[tidx].count == 0 &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Sorry for missing this. :(

@@ -461,17 +461,18 @@ int yr_parser_reduce_string_declaration(
SIZED_STRING* str,
YR_STRING** string)
{
int min_atom_quality = YR_MIN_ATOM_QUALITY;
int min_atom_quality_aux = YR_MIN_ATOM_QUALITY;
int min_atom_quality = YR_MAX_ATOM_QUALITY;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be YR_MIN_ATOM_QUALITY?

@plusvic plusvic merged commit 5aa70a0 into master Aug 12, 2019
@plusvic plusvic deleted the fix_chained_strings branch October 10, 2019 10:55
tarterp pushed a commit to mandiant/yara that referenced this pull request Mar 31, 2022
* Add test case.

* Fix issue with regexps and hex strings that were not splitted correctly.

The function yr_re_ast_split_at_chaining_point was not splitting correctly regexps and hex strings that contained more than one large jump (a.k.a gap). For example, { 01 02 03 [0-1000] 04 05 06 [500-2000] 07 08 09 } was not splitted in three parts as expected, but only in two parts: { 01 02 03 } and { 04 05 06 [500-2000] 07 08 09 } . This didn't affected the correctness of the matches, but it caused ERROR_TOO_MANY_RE_FIBERS in some cases while trying to match { 04 05 06 [500-2000] 07 08 09 } because of the large jump in the pattern. The occurrence of the error depended on the data being scanned.

This commit fixes the aforementioned issue, improves the documentation of related functions and simplifies some portions of the code.

* Avoid putting the same string multiple times in matching_strings_arena.

This fixes a small issue introduced in VirusTotal#1096. The same string could be put more than once in the matching_strings_arena, this doesn't causes any major error nor affect the correctness of the matching, but it could have a negative impact on performance.

* Add more comments.

* Improve documentation for _yr_scan_verify_chained_string_match.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants