A new too slow scanning callback #1921

regeciovad · 2023-05-09T14:54:26Z

The goal was to create a deterministic way to detect potentially slow scanning caused by low-quality rules.
The first version measured the actual scanning speed. However, other factors, such as CPU load, could influence the result.
In this version, I focused on indicators in the rules themselves.

The first indicator is the use of 0-length atoms, where YARA basically tests the input byte by byte. This problem is partially addressed by the existing warnings about low-quality atoms (the well-known "slowing down scanning" warning). Still, because the heuristics behind these calculations keep changing, it is sometimes hard to conclude that this is the case.
However, I did not want to generate a callback when the scanned input is relatively small, because the slowdown is then not significant. I tested how slow rules behave on inputs of different sizes, and the slowdown became noticeable once files were bigger than 0.2 MB. For that reason, I generate the callback only for files larger than that.
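
As a rough illustration of the first indicator, the check could be sketched as below. This is not the code from the PR: the function name, the constant name, and the choice of 1 MB = 1024 * 1024 bytes are assumptions made only for this example.

```python
# Illustrative sketch only; names and the byte conversion are assumptions,
# not taken from the YARA sources.

# 0.2 MB threshold from the description, assuming 1 MB = 1024 * 1024 bytes.
SIZE_THRESHOLD_BYTES = int(0.2 * 1024 * 1024)


def should_report_zero_length_atom(atom_length: int, input_size: int) -> bool:
    """Return True when a 0-length atom is used on an input large enough
    for the byte-by-byte scanning to matter."""
    return atom_length == 0 and input_size > SIZE_THRESHOLD_BYTES


if __name__ == "__main__":
    # A 0-length atom on a 1 MB input is worth reporting.
    print(should_report_zero_length_atom(0, 1024 * 1024))
    # The same atom on a 1 KB input is not.
    print(should_report_zero_length_atom(0, 1024))
```

Gating on input size keeps the check deterministic while avoiding callbacks for inputs where the slowdown is negligible.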

The second indicator is the number of potential matches. If the count exceeds one million, ERROR_TOO_MANY_MATCHES is returned. However, even a count below that limit can indicate that something is wrong.
I tested some additional factors, but these two turned out to be the simplest and the most effective so far.
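
The second indicator can be pictured as a two-level threshold on the match count. The sketch below is only an illustration: the one-million limit comes from the description, while the lower warning threshold of 10,000 and all names are hypothetical, since the description does not state the actual value.

```python
# Illustrative sketch only; not the code from the PR.

# Hard limit from the description: above this, ERROR_TOO_MANY_MATCHES is returned.
MAX_MATCHES = 1_000_000
# Hypothetical lower threshold, chosen only for this example.
WARN_MATCHES = 10_000


def classify_match_count(count: int) -> str:
    """Classify the number of potential matches for a single string."""
    if count > MAX_MATCHES:
        return "error"  # corresponds to ERROR_TOO_MANY_MATCHES
    if count > WARN_MATCHES:
        return "warn"   # suspiciously many matches: report slow scanning
    return "ok"


if __name__ == "__main__":
    for n in (500, 50_000, 2_000_000):
        print(n, classify_match_count(n))
```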

Example:

$ cat rule.yar
rule rule_com {
  strings:
    $com = /.{1,2}\.com/
  condition:
    $com
}
$ ./yara rule.yar top-1m.csv
warning: rule "rule_com": scanning with string $com is taking a very long time, it is either too general or very common.
rule_com top-1m.csv

plusvic · 2023-05-10T07:39:20Z

It looks like the test cases are failing due to a heap overflow detected with --enable-address-sanitizer.

https://github.com/VirusTotal/yara/actions/runs/4927239541/jobs/8803939475?pr=1921

regeciovad · 2023-05-23T13:02:49Z

I am sorry for the late reply. The PR should be fixed now.

regeciovad force-pushed the too_slow branch from d747f8e to c9865f7 Compare May 23, 2023 09:05

a new too slow scanning callback

f759ca9

regeciovad force-pushed the too_slow branch from c9865f7 to f759ca9 Compare May 23, 2023 12:36

plusvic approved these changes May 25, 2023

plusvic merged commit 7f46c88 into VirusTotal:master May 25, 2023
