A new too slow scanning callback #1921

Merged: 1 commit, May 25, 2023

regeciovad
Contributor

The goal was to create a deterministic way to detect scanning that is likely to be slow because of low-quality rules.
The first version measured the actual scanning speed. However, other factors, such as CPU usage, could influence that measurement.
In this version, I focused on indicators in the rules themselves.

The first indicator is YARA using 0-length atoms, which basically means testing the input byte by byte. This problem is partially addressed by the existing warnings about low-quality atoms (the famous "slowing down scanning" warning). Still, because the heuristics behind these calculations keep changing, it is sometimes hard to conclude that this is the case.
However, I did not want to generate a callback when the scanned input is relatively small, because the slowdown is then not that significant. I tested how the slow rules behave on inputs of different sizes. The slowdown was more noticeable when the files were bigger than 0.2 MB. For that reason, I generate a callback only for files larger than that.
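
The first indicator can be sketched roughly as follows. This is only an illustration of the logic described above, not YARA's actual code: the function name, the way the atom length is passed in, and the exact byte value used for 0.2 MB are all assumptions.

```python
# Hypothetical sketch of the first indicator; not YARA's implementation.

# The 0.2 MB threshold comes from the description above. Expressing it as
# 200,000 bytes (decimal megabytes) is an assumption.
SLOW_SCAN_MIN_FILE_SIZE = 200_000


def zero_length_atom_is_slow(atom_length: int, file_size: int) -> bool:
    """Return True when a 0-length atom is used on an input above the size threshold."""
    if atom_length < 0 or file_size < 0:
        raise ValueError("atom_length and file_size must be non-negative")
    # A 0-length atom gives the scanner nothing to anchor on, so the string
    # has to be tried at every byte offset of the input.
    uses_zero_length_atom = atom_length == 0
    # Small inputs are skipped because the slowdown is not significant there.
    input_is_large = file_size > SLOW_SCAN_MIN_FILE_SIZE
    return uses_zero_length_atom and input_is_large
```

With these assumed values, a 0-length atom on a 1 MB input would trigger the callback, while the same atom on a 100 KB input would not.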

The second indicator is the number of potential matches. If the count is higher than one million, ERROR_TOO_MANY_MATCHES is returned. However, even a count below that limit can indicate that something is wrong.
I tested some additional factors, but these two turned out to be the simplest and most effective so far.
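
A minimal sketch of the second indicator, under stated assumptions: the one-million limit is taken from the description above, but the lower warning threshold is a placeholder value chosen for illustration, and the function name and return values are hypothetical. It is not YARA's actual code.

```python
# Hypothetical sketch of the second indicator; not YARA's implementation.

# Hard limit stated in the description: more than one million matches
# results in ERROR_TOO_MANY_MATCHES.
TOO_MANY_MATCHES_LIMIT = 1_000_000

# ASSUMED placeholder: the description only says that a count below the
# hard limit can already be suspicious, it does not give the value.
SLOW_SCAN_MATCH_WARNING = 100_000


def classify_match_count(match_count: int) -> str:
    """Classify a string's match count as 'ok', 'warning', or 'error'."""
    if match_count < 0:
        raise ValueError("match_count must be non-negative")
    if match_count > TOO_MANY_MATCHES_LIMIT:
        return "error"    # corresponds to ERROR_TOO_MANY_MATCHES
    if match_count > SLOW_SCAN_MATCH_WARNING:
        return "warning"  # would trigger the slow-scanning callback
    return "ok"
```

Keeping the warning threshold well below the hard limit means a rule author hears about an overly general string before the scan fails outright.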

Example:

$ cat rule.yar
rule rule_com {
  strings:
    $com = /.{1,2}\.com/
  condition:
    $com
}
$ ./yara rule.yar top-1m.csv
warning: rule "rule_com": scanning with string $com is taking a very long time, it is either too general or very common.
rule_com top-1m.csv
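
To see why a string like `$com` is "too general or very common" on a list of domains, the regex can be tried in Python on a few made-up lines. The sample data and its `rank,domain` layout are assumptions for illustration, and Python's `re` counts non-overlapping matches only, so the number may differ from what YARA reports.

```python
import re

# Same pattern as $com in the rule above, applied to bytes.
COM_PATTERN = re.compile(rb".{1,2}\.com")

# Made-up sample; the real top-1m.csv is not reproduced here.
SAMPLE = b"1,google.com\n2,youtube.com\n3,example.org\n"


def count_com_matches(data: bytes) -> int:
    """Count non-overlapping matches of the pattern in data."""
    return sum(1 for _ in COM_PATTERN.finditer(data))


if __name__ == "__main__":
    print(count_com_matches(SAMPLE))
```

Every line ending in `.com` produces at least one match, so on a file with many such domains the match count grows with the number of lines.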

@plusvic
Member

plusvic commented May 10, 2023

It looks like the test cases are failing due to some heap overflow detected with --enable-address-sanitizer.

https://github.com/VirusTotal/yara/actions/runs/4927239541/jobs/8803939475?pr=1921

@regeciovad
Contributor Author

I am sorry for the late reply. The PR should be fixed now.

plusvic merged commit 7f46c88 into VirusTotal:master on May 25, 2023