Support reporting of fuzzing-time stats (by fuzzer.py) #648

Merged 21 commits into master from stats on Oct 21, 2020
Conversation

@jonathanmetzman (Contributor) commented Aug 10, 2020

Currently the only stats FuzzBench reports per cycle/snapshot are those recorded by the measurer. This means we cannot report some useful fuzzing-time stats such as executions/sec, stability, CPU/RAM or I/O usage etc.

This PR adds an optional API for fuzzer.py to report extra stats in a uniform way that allows fuzzers to be compared on these stats.
For now it only supports executions per second, a float that we call avg_execs.
We can use the same pipeline in runner.py to report fuzz-time stats that are fuzzer-independent, such as RAM, disk, I/O, and CPU.
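To make the shape of this API concrete, here is a minimal sketch of what an AFL-based fuzzer.py could report. The hook name, signature, and stats-file parsing below are illustrative assumptions (the stat key follows the execs_per_sec schema discussed in the review below):

```python
import json


def get_stats(output_corpus, fuzzer_log):
    """Optional hook: report extra fuzzing-time stats as a JSON string.

    Hypothetical signature. AFL writes a 'fuzzer_stats' file of
    'key : value' lines, so an AFL-based fuzzer can parse it directly.
    """
    parsed = {}
    with open(fuzzer_log) as stats_file:
        for line in stats_file:
            key, _, value = line.partition(':')
            parsed[key.strip()] = value.strip()
    # Report only stats the pipeline understands; everything is optional.
    return json.dumps({'execs_per_sec': float(parsed['execs_per_sec'])})
```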
TODO before this is ready for review:

  1. Rename avg_execs to execs_sec. It's more descriptive this way.
  2. Reuse AFL's reporting API implementation in AFL-based fuzzers.
  3. Implement stats reporting in (some) other fuzzers like libFuzzer.
  4. Support at least one fuzzer-independent stat like CPU usage.
  5. Document this API for external devs to use.
  6. Figure out if we want each stat to be its own column or if one JSON column for all of them is fine. I think a JSON field makes more sense, since each stat is optional and adding new ones will be easier this way (no need for DB migrations). The downside is that when generating reports we will have to collapse this field to create a dataframe with a column for each supported stat. It may also make querying harder for SQL wizards.

We also need to figure out how to report this. I think we could maybe link to another page where we generate stats in aggregate and for each benchmark. I think the exact way we report it can wait though.

@jonathanmetzman (Contributor, Author)

I fixed a lot of the more important issues; others (like reusing the stats implementation in other AFL-based fuzzers) can be done later.
@inferno-chromium WDYT?

@jonathanmetzman marked this pull request as ready for review on September 29, 2020 16:19
@inferno-chromium (Collaborator)

> I fixed a lot of the more important issues; others (like reusing the stats implementation in other AFL-based fuzzers) can be done later.
> @inferno-chromium WDYT?

So we won't have stats for the libFuzzer and honggfuzz fuzzers? Starting somewhere is fine, but we need to add those as well over time.

@jonathanmetzman (Contributor, Author)

> So we won't have stats for the libFuzzer and honggfuzz fuzzers? Starting somewhere is fine, but we need to add those as well over time.

Not in this patch, but they can definitely be added. This API is meant to be supportable by most fuzzers, and it's meant to be optional so we don't increase onboarding complexity.

@inferno-chromium (Collaborator)

> Not in this patch, but they can definitely be added. This API is meant to be supportable by most fuzzers, and it's meant to be optional so we don't increase onboarding complexity.

Sounds perfect; we'll definitely add this incrementally, and not everything needs to integrate with this.

@jonathanmetzman (Contributor, Author)

I think this is ready to land; I'm just going to test it with a local experiment first.

@@ -62,3 +68,4 @@ class Snapshot(Base):
trial_id = Column(Integer, ForeignKey('trial.id'), primary_key=True)
trial = sqlalchemy.orm.relationship('Trial', back_populates='snapshots')
edges_covered = Column(Integer, nullable=False)
fuzzer_stats = Column(JSON, nullable=True)
@lszekeres (Member)

Are we planning to keep this? If yes, what is the benefit of having a JSON here?

IMO having JSONs here will be a large source of complexity (e.g. in queries) and bugs.

Consider having a number_of_executions column (Integer, nullable) instead for this PR, which would be closer to normal form than {'execs_per_sec': float} JSONs. Later, other metrics should get their own columns too (like CPU/mem usage at the time); see the sketch after the links below.

Related:
https://www.thomascerqueus.fr/json-fields-relational-databases-postgresql/
https://stackoverflow.com/questions/15367696/storing-json-in-database-vs-having-a-new-column-for-each-key
https://softwareengineering.stackexchange.com/questions/215429/is-it-wise-to-store-a-big-lump-of-json-on-a-database-row
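For illustration, a minimal sketch of this normalized alternative (the standalone model below is an assumption based on the suggestion above, not merged code; the real Snapshot model has more columns):

```python
from sqlalchemy import Column, Float, Integer
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Snapshot(Base):
    """Sketch: one typed, nullable column per stat instead of a JSON blob."""
    __tablename__ = 'snapshot'
    time = Column(Integer, primary_key=True)  # Stand-in for the real keys.
    number_of_executions = Column(Integer, nullable=True)  # From fuzzer.py.
    cpu_usage = Column(Float, nullable=True)  # Could be measured by FuzzBench.
    mem_usage = Column(Float, nullable=True)  # Could be measured by FuzzBench.
```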

@inferno-chromium (Collaborator)

I think it depends on whether we will have custom stats and whether we want to allow fuzzers to use those. If we go with Laszlo's approach, I don't think we should add all these columns to the Snapshot table; maybe some other table type. Should we have a FuzzerStats table object here, so we can add any columns to it over time?

@jonathanmetzman (Contributor, Author)

> Are we planning to keep this?

I think so.

> If yes, what is the benefit of having a JSON here?

It offers a lot more flexibility. It means that stats can be added and removed without changing the SQL schema.

> IMO having JSONs here will be a large source of complexity (e.g. in queries) and bugs.

I suspect this won't actually be the case, since we tend to do most analysis on dataframes; once we clean things up so that JSON fields work the same as other fields in dataframes, we won't have a problem.

In https://stackoverflow.com/questions/15367696/storing-json-in-database-vs-having-a-new-column-for-each-key they describe a case for JSON fields that I think is applicable:

> Saving arbitrary key/value user preferences (where the value can be boolean, textual, or numeric, and you don't want to have separate columns for different data types)

I think I want to do something similar in the next iteration: there I envision allowing fuzzers to store arbitrary key/value pairs as stats.

Personally, I don't think I will ever want to use these stats in an SQL query. For me, using pandas is easier, and that is what most users will do as well. Thus I don't think we lose anything by using a JSON field.
We do gain flexibility, though, so on balance I think there is a stronger case for JSON fields.
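A sketch of that collapse step, assuming each row's fuzzer_stats deserializes to a dict like {'execs_per_sec': 1234.5} (or None when a fuzzer reports nothing):

```python
import pandas as pd

snapshots = pd.DataFrame({
    'trial_id': [1, 2],
    'edges_covered': [100, 150],
    'fuzzer_stats': [{'execs_per_sec': 1234.5}, None],  # Stats are optional.
})

# Expand each dict into one column per stat; rows without stats become NaN.
stats = pd.json_normalize([s or {} for s in snapshots['fuzzer_stats']])
flat = snapshots.drop(columns='fuzzer_stats').join(stats)
print(flat)  # Columns: trial_id, edges_covered, execs_per_sec.
```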

@jonathanmetzman (Contributor, Author)

I think in person you preferred using number of execs rather than avg execs, since the latter can be derived from the former (which conveys more info). I think it's probably better to store stats in the format in which they will be consumed. So, assuming avg execs is more useful to users than total execs (and maybe I'm wrong), it's better to store the average rather than store the total and compute the average before graphing. What do you think?

@lszekeres (Member) commented Oct 12, 2020

> It means that stats can be added and removed without changing the SQL schema.

That is what I think we should avoid. Would we allow arbitrary JSON, or enforce a schema? If the former, how are we going to use that? The PR suggests the latter, as it implements manual schema checking in common/fuzzer_stats.py (SCHEMA = {'execs_per_sec': float}). In that case what's not clear is: why would we implement this manually, adding all this code and complexity, when this is what the DB schema gives us out of the box?
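(For reference, a manual check against that SCHEMA constant could plausibly look like the sketch below; only the SCHEMA constant itself is quoted from the PR, the rest is illustrative.)

```python
import json

SCHEMA = {'execs_per_sec': float}  # The constant quoted above.


def validate_fuzzer_stats(stats_json):
    """Raise ValueError if stats_json doesn't conform to SCHEMA (sketch)."""
    stats = json.loads(stats_json)
    for key, value in stats.items():
        if key not in SCHEMA:
            raise ValueError('Unknown stat: %s.' % key)
        if not isinstance(value, SCHEMA[key]):
            raise ValueError('Stat %s must be a %s.' %
                             (key, SCHEMA[key].__name__))
```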

> > IMO having JSONs here will be a large source of complexity (e.g. in queries) and bugs.
>
> I suspect this won't actually be the case, since we tend to do most analysis on dataframes; once we clean things up so that JSON fields work the same as other fields in dataframes, we won't have a problem.

Yes, this is what I was referring to: using the DB schema, we get the dataframe automatically, no additional code needed. With JSON, we need to write custom code for validating and converting arbitrary schemaless JSONs to dataframe columns manually, handling all possible errors and so on, which I think will be a massive source of complexity and bugs.

> I think I want to do something similar in the next iteration: there I envision allowing fuzzers to store arbitrary key/value pairs as stats.

What do you mean by arbitrary? Does arbitrary mean that it's relevant only to a particular fuzzer? If yes, how would we use that? Or is it just for the developer of a particular engine? If it's a fuzzer-engine-specific stat, used only by that engine's developer in their private scripts and not by FuzzBench code, then JSON indeed sounds good.

However, for common stats (those relevant to all engines) we should carefully choose the ones we want to allow engines to provide (and have a schema for them) to ensure uniformity and normalized data, so we can compare engines along those metrics (e.g. in the reports). To me, these would basically be:

  • number of execs,
  • CPU usage,
  • memory usage.

(Is there anything else?)

> arbitrary key/value pairs as stats.

Why key/value? What key would you use other than time or number_of_execs?

@inferno-chromium (Collaborator)

Having "specific" in the column name is weird; I think fuzzer_stats would already imply fuzzer-specific stats. And CPU and memory columns will probably come later as we add capturing hooks? For now, I think just JSON is fine, since we are adding fuzzer stats.

@lszekeres (Member) commented Oct 13, 2020

> Having "specific" in the column name is weird; I think fuzzer_stats would already imply fuzzer-specific stats. And CPU and memory columns will probably come later as we add capturing hooks?

SGTM. We might still consider an alternative name that differentiates a bit better between "common" and "specific" stats: e.g., edges_covered is also a "fuzzer statistic", but it's a "common" one, as opposed to, say, the AFL-specific stability. Maybe something like extra_stats, custom_stats, or user_defined_stats would capture this a bit better.

> For now, I think just JSON is fine, since we are adding fuzzer stats.

Which stats are we adding first?

@inferno-chromium (Collaborator)

I mean we will have JSON stats for now, since more common stats like CPU and memory are not hooked and will require custom capturing code. num_execs can come as a common one, but that would need code across all fuzzers, and I think this PR is just doing AFL as a start. Once we add it to all fuzzers, they can add code to set this "num_execs" column.

@jonathanmetzman (Contributor, Author)

Yeah, I'm of two minds about this question, since you (@lszekeres) raised valid points.
But I think it would be simpler to handle all of the stats that use custom capturing in the same way (JSON).
I also realized there is a pretty good reason we don't want to rely on the schema for type validation: we won't save stats to the DB in the runner, we will do this in the measurer. So I don't think we can wait until saving to the DB to validate, because that happens too far from the error.

@lszekeres (Member) commented Oct 14, 2020

Do we agree to separate common metrics from fuzzer-specific ones?

If yes, then re: type/schema validation, let's consider the two cases:

  • Fuzzer-specific metrics wouldn't have a schema; users would be allowed to provide arbitrary JSONs, right? I.e., if json.dumps() succeeds, it's valid and good to go. (We could also statically pytype-check that fuzzer.py's get_custom_stats() returns a dict.)

  • For common metrics, let's look at them case by case:

    • Stuff like cpu_usage and mem_usage wouldn't be provided by fuzzer.py, but by FuzzBench itself. No special type/schema validation is necessary.
    • For num_execs we need to rely on fuzzer.py. I'd collect this data by letting fuzzers define a def get_num_execs() -> int function; early type/schema validation is done by pytype (-> int) at linting time (see the sketch below).
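A sketch of that proposed typed hook, assuming an AFL-style fuzzer_stats file with an execs_done line (a hypothetical shape, not code from this PR):

```python
def get_num_execs() -> int:
    """Proposed per-fuzzer hook; pytype checks the '-> int' at lint time."""
    # AFL records total executions as 'execs_done' in its stats file;
    # the path here is an assumption.
    with open('output/fuzzer_stats') as stats_file:
        for line in stats_file:
            key, _, value = line.partition(':')
            if key.strip() == 'execs_done':
                return int(value.strip())
    return 0
```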

@jonathanmetzman (Contributor, Author)

> I think this is ready to land; I'm just going to test it with a local experiment first.

Local experiment worked fine.

@@ -76,3 +79,14 @@ def test_fuzz_function_errors():
# so fail if we see one. If that is not the case then this assert
# should be removed.
assert not isinstance(error.value, TypeError)


def test_afl_get_stats(tmp_path):
@jonathanmetzman (Contributor, Author)

What do we think about putting the test for each fuzzer's get_stats here?
The more intuitive alternative is to create fuzzers/afl/test_fuzzer.py and put it there.
But I don't really love the idea of polluting fuzzer directories with this (and with test files as well).
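For concreteness, a sketch of what such a test could look like if it lives here (the fuzzers.afl import path and the get_stats signature are assumptions):

```python
import json

from fuzzers.afl import fuzzer as afl_fuzzer  # Assumed import path.


def test_afl_get_stats(tmp_path):
    """get_stats should report execs_per_sec parsed from AFL's stats file."""
    stats_file = tmp_path / 'fuzzer_stats'
    stats_file.write_text('execs_done : 99\nexecs_per_sec : 1234.56\n')
    stats_json = afl_fuzzer.get_stats(str(tmp_path), str(stats_file))
    assert json.loads(stats_json)['execs_per_sec'] == 1234.56
```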

@jonathanmetzman merged commit 7c02168 into master on Oct 21, 2020
@jonathanmetzman deleted the stats branch on October 21, 2020 15:04