-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: skip bytecode evaluation for some rules without string matches #1927
Conversation
Optimize a common case where YARA conditions are formed like e.g. "... and 1 of them and ...", in other words, requiring a string match to ever be true. By noting these cases and recording in a bitmap if a string match occurred, the condition evaluation for these rules can be skipped entirely in most cases.
In case it's relevant, some background information that led to this PR: Some additional optimizations (like: tracking which strings actually have to match for the condition to possibly be true, tracking the number of string matches, ...) could be implemented if considered worthwhile. |
libyara/scanner.c
Outdated
@@ -498,6 +508,17 @@ YR_API int yr_scanner_scan_mem_blocks( | |||
if (result != ERROR_SUCCESS) | |||
goto _exit; | |||
|
|||
for (i = 0; i < scanner->rules->num_rules; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern is this loop. The loop gets executed with every scanned file, and it may take a while when you have hundreds of thousands of rules as we have in VirusTotal. I think this could be donde more efficiently if we store the original bitmask in scanner->rules
, and with every scanned file rule_evaluate_condition_flags
is initialized by making a copy of the original bitmask using memcpy
. This should be much more efficient as memcpy will copy multiple bits per CPU instruction, instead of having multiple instructions per bit, like in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
PR #1927 didn't take into account that as a side-effect of skipping the evaluation of global rules, the corresponding namespace was not flagged as unsatisfied (i.e: a global rule that evaluates to false prevents all other rules in the namespace from evaluating to true). The issue was not caught by any test case because the test case for global rules were using rules that didn't required any strings to match, and therefore were unaffected by the changes introduced in #1927. This change makes sure that any global rule that is skipped is considered false, and therefore its namespace is flagged as unsatisfied. Closes #2022.
Optimize a common case where YARA conditions are formed like e.g. "... and 1 of them and ...", in other words, requiring a string match to ever be true.
By noting these cases and recording in a bitmap if a string match occurred, the condition evaluation for these rules can be skipped entirely in most cases.