Fix chained strings #1107

plusvic · 2019-08-09T07:54:30Z

The function yr_re_ast_split_at_chaining_point was not splitting correctly regexps and hex strings that contained more than one large jump (a.k.a gap). For example, { 01 02 03 [0-1000] 04 05 06 [500-2000] 07 08 09 } was not splitted in three parts as expected, but only in two parts: { 01 02 03 } and { 04 05 06 [500-2000] 07 08 09 } . This didn't affected the correctness of the matches, but it caused ERROR_TOO_MANY_RE_FIBERS in some cases while trying to match { 04 05 06 [500-2000] 07 08 09 } because of the large jump in the pattern. The occurrence of the error depended on the data being scanned.

This PR fixes the aforementioned issue, improves the documentation of related functions and simplifies some portions of the code.

The function yr_re_ast_split_at_chaining_point was not splitting correctly regexps and hex strings that contained more than one large jump (a.k.a gap). For example, { 01 02 03 [0-1000] 04 05 06 [500-2000] 07 08 09 } was not splitted in three parts as expected, but only in two parts: { 01 02 03 } and { 04 05 06 [500-2000] 07 08 09 } . This didn't affected the correctness of the matches, but it caused ERROR_TOO_MANY_RE_FIBERS in some cases while trying to match { 04 05 06 [500-2000] 07 08 09 } because of the large jump in the pattern. The occurrence of the error depended on the data being scanned. This commit fixes the aforementioned issue, improves the documentation of related functions and simplifies some portions of the code.

This fixes a small issue introduced in #1096. The same string could be put more than once in the matching_strings_arena, this doesn't causes any major error nor affect the correctness of the matching, but it could have a negative impact on performance.

wxsBSD

Thank for the the extensive comments here! This has certainly helped me understand things better. :)

wxsBSD · 2019-08-09T12:49:33Z

libyara/atoms.c

@@ -520,10 +520,10 @@ int _yr_atoms_trim(
  }

  // If the number of unknown bytes is >= than the number of known bytes
-  // it doesn't make sense the to use this atom, so we use the a single byte
-  // atom with the first known byte. If YR_MAX_ATOM_LENGTH == 4 this happens
+  // it doesn't make sense the to use this atom, so we use a single byte atpm


Some typos here. I think it should be:

// it doesn't make sense to use this atom, so we use a single byte atom

wxsBSD · 2019-08-09T12:53:55Z

libyara/include/yara/types.h

  int64_t fixed_offset;

+  // Each item in the "matches" array represents a list of matches, there's one
+  // list of matches for each thread currently using the compiled rules. Up to
+  // YR_MAX_THREADS can use the compiled rules simultaneosly, that's why the


Minor typo: simultaneously

wxsBSD · 2019-08-09T13:39:07Z

libyara/scan.c

+// chain is also matching. For example, if the string S was splitted and
+// converted in a chain S1 <- S2 <- S3 (see yr_re_ast_split_at_chaining_point),
+// and a match for S3 was found, this functions finds out if there are matches
+// for S1 and S2 that together with the match found for S3 conform a match for


wxsBSD · 2019-08-09T13:48:51Z

libyara/scan.c

      if (matching_string->matches[tidx].count == 0 &&
+          matching_string->private_matches[tidx].count == 0 &&


Good catch! Sorry for missing this. :(

wxsBSD · 2019-08-09T14:01:54Z

libyara/parser.c

@@ -461,17 +461,18 @@ int yr_parser_reduce_string_declaration(
    SIZED_STRING* str,
    YR_STRING** string)
 {
-  int min_atom_quality = YR_MIN_ATOM_QUALITY;
-  int min_atom_quality_aux = YR_MIN_ATOM_QUALITY;
+  int min_atom_quality = YR_MAX_ATOM_QUALITY;


Shouldn't this be YR_MIN_ATOM_QUALITY?

* Add test case. * Fix issue with regexps and hex strings that were not splitted correctly. The function yr_re_ast_split_at_chaining_point was not splitting correctly regexps and hex strings that contained more than one large jump (a.k.a gap). For example, { 01 02 03 [0-1000] 04 05 06 [500-2000] 07 08 09 } was not splitted in three parts as expected, but only in two parts: { 01 02 03 } and { 04 05 06 [500-2000] 07 08 09 } . This didn't affected the correctness of the matches, but it caused ERROR_TOO_MANY_RE_FIBERS in some cases while trying to match { 04 05 06 [500-2000] 07 08 09 } because of the large jump in the pattern. The occurrence of the error depended on the data being scanned. This commit fixes the aforementioned issue, improves the documentation of related functions and simplifies some portions of the code. * Avoid putting the same string multiple times in matching_strings_arena. This fixes a small issue introduced in VirusTotal#1096. The same string could be put more than once in the matching_strings_arena, this doesn't causes any major error nor affect the correctness of the matching, but it could have a negative impact on performance. * Add more comments. * Improve documentation for _yr_scan_verify_chained_string_match.

plusvic added 5 commits August 8, 2019 20:35

Add test case.

6d0f9ac

Add more comments.

919d67c

Improve documentation for _yr_scan_verify_chained_string_match.

48f930f

plusvic requested a review from wxsBSD August 9, 2019 07:56

wxsBSD reviewed Aug 9, 2019

View reviewed changes

plusvic merged commit 5aa70a0 into master Aug 12, 2019

plusvic deleted the fix_chained_strings branch October 10, 2019 10:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix chained strings #1107

Fix chained strings #1107

plusvic commented Aug 9, 2019 •

edited

Loading

wxsBSD left a comment

wxsBSD Aug 9, 2019

wxsBSD Aug 9, 2019

wxsBSD Aug 9, 2019

wxsBSD Aug 9, 2019

wxsBSD Aug 9, 2019

		if (matching_string->matches[tidx].count == 0 &&
		matching_string->private_matches[tidx].count == 0 &&

Fix chained strings #1107

Fix chained strings #1107

Conversation

plusvic commented Aug 9, 2019 • edited Loading

wxsBSD left a comment

Choose a reason for hiding this comment

wxsBSD Aug 9, 2019

Choose a reason for hiding this comment

wxsBSD Aug 9, 2019

Choose a reason for hiding this comment

wxsBSD Aug 9, 2019

Choose a reason for hiding this comment

wxsBSD Aug 9, 2019

Choose a reason for hiding this comment

wxsBSD Aug 9, 2019

Choose a reason for hiding this comment

plusvic commented Aug 9, 2019 •

edited

Loading