Support reporting of fuzzing-time stats (by fuzzer.py) #648

Merged 21 commits into master from stats on Oct 21, 2020
Conversation

@jonathanmetzman (Contributor) commented Aug 10, 2020

Currently the only stats FuzzBench reports per cycle/snapshot are those recorded by the measurer. This means we cannot report some useful fuzzing-time stats such as executions/sec, stability, CPU/RAM or I/O usage etc.

This PR adds an optional API for fuzzer.py to report extra stats in a uniform way that allows fuzzers to be compared on these stats.
For now it only supports executions per second, a float that we call avg_execs.
We can use the same pipeline in runner.py to report fuzz-time stats that are fuzzer-independent, such as RAM, disk, I/O, and CPU.
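To make the shape of this API concrete, here is a minimal sketch of what an AFL-based fuzzer.py could report. The hook name, signature, and stats-file parsing below are illustrative assumptions (the stat key follows the execs_per_sec schema discussed in the review below):

```python
import json


def get_stats(output_corpus, fuzzer_log):
    """Optional hook: report extra fuzzing-time stats as a JSON string.

    Hypothetical signature. AFL writes a 'fuzzer_stats' file of
    'key : value' lines, so an AFL-based fuzzer can parse it directly.
    """
    parsed = {}
    with open(fuzzer_log) as stats_file:
        for line in stats_file:
            key, _, value = line.partition(':')
            parsed[key.strip()] = value.strip()
    # Report only stats the pipeline understands; everything is optional.
    return json.dumps({'execs_per_sec': float(parsed['execs_per_sec'])})
```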
TODO before this is ready for review:

  1. Rename avg_execs to execs_sec. It's more descriptive this way.
  2. Reuse AFL's reporting API implementation in AFL-based fuzzers.
  3. Implement stats reporting in (some) other fuzzers like libFuzzer.
  4. Support at least one fuzzer-independent stat like CPU usage.
  5. Document this API for external devs to use.
  6. Figure out if we want each stat to be its own column or if one JSON column for all of them is fine. I think a JSON field makes more sense, since each stat is optional and adding new ones will be easier this way (no need for DB migrations). The downside is that when generating reports we will have to collapse this field to create a dataframe with a column for each supported stat. It may also make querying harder for SQL wizards.

We also need to figure out how to report this. I think we could maybe link to another page where we generate stats in aggregate and for each benchmark. I think the exact way we report it can wait though.

@jonathanmetzman (Contributor, Author)

I fixed a lot of the more important issues; others (like reusing the stats implementation in other AFL-based fuzzers) can be done later.
@inferno-chromium WDYT?

@jonathanmetzman marked this pull request as ready for review on September 29, 2020 16:19
@inferno-chromium (Collaborator)

> I fixed a lot of the more important issues; others (like reusing the stats implementation in other AFL-based fuzzers) can be done later.
> @inferno-chromium WDYT?

So we won't have stats for the libFuzzer and honggfuzz fuzzers? Starting somewhere is fine, but we need to add those as well over time.

@jonathanmetzman (Contributor, Author)

> So we won't have stats for the libFuzzer and honggfuzz fuzzers? Starting somewhere is fine, but we need to add those as well over time.

Not in this patch, but they can definitely be added. This API is meant to be supportable by most fuzzers, and it's meant to be optional so we don't increase onboarding complexity.

@inferno-chromium (Collaborator)

> Not in this patch, but they can definitely be added. This API is meant to be supportable by most fuzzers, and it's meant to be optional so we don't increase onboarding complexity.

Sounds perfect; we'll definitely add this incrementally, and not everything needs to integrate with this.

@jonathanmetzman (Contributor, Author)

I think this is ready to land; I'm just going to test it with a local experiment first.

@@ -62,3 +68,4 @@ class Snapshot(Base):
trial_id = Column(Integer, ForeignKey('trial.id'), primary_key=True)
trial = sqlalchemy.orm.relationship('Trial', back_populates='snapshots')
edges_covered = Column(Integer, nullable=False)
fuzzer_stats = Column(JSON, nullable=True)
@lszekeres (Member)

Are we planning to keep this? If yes, what is the benefit of having a JSON here?

IMO having JSONs here will be a large source of complexity (e.g. in queries) and bugs.

Consider having a number_of_executions column (Integer, nullable) instead for this PR, which would be closer to normal form than {'execs_per_sec': float} JSONs. Later, other metrics should get their own columns too (like CPU/mem usage at the time); see the sketch after the links below.

Related:
https://www.thomascerqueus.fr/json-fields-relational-databases-postgresql/
https://stackoverflow.com/questions/15367696/storing-json-in-database-vs-having-a-new-column-for-each-key
https://softwareengineering.stackexchange.com/questions/215429/is-it-wise-to-store-a-big-lump-of-json-on-a-database-row
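For illustration, a minimal sketch of this normalized alternative (the standalone model below is an assumption based on the suggestion above, not merged code; the real Snapshot model has more columns):

```python
from sqlalchemy import Column, Float, Integer
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Snapshot(Base):
    """Sketch: one typed, nullable column per stat instead of a JSON blob."""
    __tablename__ = 'snapshot'
    time = Column(Integer, primary_key=True)  # Stand-in for the real keys.
    number_of_executions = Column(Integer, nullable=True)  # From fuzzer.py.
    cpu_usage = Column(Float, nullable=True)  # Could be measured by FuzzBench.
    mem_usage = Column(Float, nullable=True)  # Could be measured by FuzzBench.
```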

@inferno-chromium (Collaborator)

I think it depends on whether we will have custom stats and whether we want to allow fuzzers to use those. If we go with Laszlo's approach, I don't think we should add all these columns to the Snapshot table; maybe some other table type. Should we have a FuzzerStats table object here, so we can add any columns to it over time?

@jonathanmetzman (Contributor, Author)

> Are we planning to keep this?

I think so.

> If yes, what is the benefit of having a JSON here?

It offers a lot more flexibility. It means that stats can be added and removed without changing the SQL schema.

> IMO having JSONs here will be a large source of complexity (e.g. in queries) and bugs.

I suspect this won't actually be the case, since we tend to do most analysis on dataframes; once we clean things up so that JSON fields work the same as other fields in dataframes, we won't have a problem.

In https://stackoverflow.com/questions/15367696/storing-json-in-database-vs-having-a-new-column-for-each-key they describe a case for JSON fields that I think is applicable:

> Saving arbitrary key/value user preferences (where the value can be boolean, textual, or numeric, and you don't want to have separate columns for different data types)

I think I want to do something similar in the next iteration: there I envision allowing fuzzers to store arbitrary key/value pairs as stats.

Personally, I don't think I will ever want to use these stats in an SQL query. For me, using pandas is easier, and that is what most users will do as well. Thus I don't think we lose anything by using a JSON field.
We do gain flexibility, though, so on balance I think there is a stronger case for JSON fields.
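A sketch of that collapse step, assuming each row's fuzzer_stats deserializes to a dict like {'execs_per_sec': 1234.5} (or None when a fuzzer reports nothing):

```python
import pandas as pd

snapshots = pd.DataFrame({
    'trial_id': [1, 2],
    'edges_covered': [100, 150],
    'fuzzer_stats': [{'execs_per_sec': 1234.5}, None],  # Stats are optional.
})

# Expand each dict into one column per stat; rows without stats become NaN.
stats = pd.json_normalize([s or {} for s in snapshots['fuzzer_stats']])
flat = snapshots.drop(columns='fuzzer_stats').join(stats)
print(flat)  # Columns: trial_id, edges_covered, execs_per_sec.
```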

@jonathanmetzman (Contributor, Author)

I think in person you preferred using number of execs rather than avg execs, since the latter can be derived from the former (which conveys more info). I think it's probably better to store stats in the format in which they will be consumed. So, assuming avg execs is more useful to users than total execs (and maybe I'm wrong), it's better to store the average rather than store the total and compute the average before graphing. What do you think?

@lszekeres (Member) commented Oct 12, 2020

> It means that stats can be added and removed without changing the SQL schema.

That is what I think we should avoid. Would we allow arbitrary JSON, or enforce a schema? If the former, how are we going to use that? The PR suggests the latter, as it implements manual schema checking in common/fuzzer_stats.py (SCHEMA = {'execs_per_sec': float}). In that case what's not clear is: why would we implement this manually, adding all this code and complexity, when this is what the DB schema gives us out of the box?
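(For reference, a manual check against that SCHEMA constant could plausibly look like the sketch below; only the SCHEMA constant itself is quoted from the PR, the rest is illustrative.)

```python
import json

SCHEMA = {'execs_per_sec': float}  # The constant quoted above.


def validate_fuzzer_stats(stats_json):
    """Raise ValueError if stats_json doesn't conform to SCHEMA (sketch)."""
    stats = json.loads(stats_json)
    for key, value in stats.items():
        if key not in SCHEMA:
            raise ValueError('Unknown stat: %s.' % key)
        if not isinstance(value, SCHEMA[key]):
            raise ValueError('Stat %s must be a %s.' %
                             (key, SCHEMA[key].__name__))
```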

> > IMO having JSONs here will be a large source of complexity (e.g. in queries) and bugs.
>
> I suspect this won't actually be the case, since we tend to do most analysis on dataframes; once we clean things up so that JSON fields work the same as other fields in dataframes, we won't have a problem.

Yes, this is what I was referring to: using the DB schema, we get the dataframe automatically, no additional code needed. With JSON, we need to write custom code for validating and converting arbitrary schemaless JSONs to dataframe columns manually, handling all possible errors and so on, which I think will be a massive source of complexity and bugs.

> I think I want to do something similar in the next iteration: there I envision allowing fuzzers to store arbitrary key/value pairs as stats.

What do you mean by arbitrary? Does arbitrary mean that it's relevant only to a particular fuzzer? If yes, how would we use that? Or is it just for the developer of a particular engine? If it's a fuzzer-engine-specific stat, used only by that engine's developer in their private scripts and not by FuzzBench code, then JSON indeed sounds good.

However, for common stats (those relevant to all engines) we should carefully choose the ones we want to allow engines to provide (and have a schema for them) to ensure uniformity and normalized data, so we can compare engines along those metrics (e.g. in the reports). To me, these would basically be:

  • number of execs,
  • CPU usage,
  • memory usage.

(Is there anything else?)

> arbitrary key/value pairs as stats.

Why key/value? What key would you use other than time or number_of_execs?

@inferno-chromium (Collaborator)

Having "specific" in the column name is weird; I think fuzzer_stats would already imply fuzzer-specific stats. And CPU and memory columns will probably come later as we add capturing hooks? For now, I think just JSON is fine, since we are adding fuzzer stats.

@lszekeres (Member) commented Oct 13, 2020

> Having "specific" in the column name is weird; I think fuzzer_stats would already imply fuzzer-specific stats. And CPU and memory columns will probably come later as we add capturing hooks?

SGTM. We might still consider an alternative name that differentiates a bit better between "common" and "specific" stats: e.g., edges_covered is also a "fuzzer statistic", but it's a "common" one, as opposed to, say, the AFL-specific stability. Maybe something like extra_stats, custom_stats, or user_defined_stats would capture this a bit better.

> For now, I think just JSON is fine, since we are adding fuzzer stats.

Which stats are we adding first?

@inferno-chromium (Collaborator)

I mean we will have JSON stats for now, since more common stats like CPU and memory are not hooked and will require custom capturing code. num_execs can come as a common one, but that would need code across all fuzzers, and I think this PR is just doing AFL as a start. Once we add it to all fuzzers, they can add code to set this "num_execs" column.

@jonathanmetzman (Contributor, Author)

Yeah, I'm of two minds about this question, since you (@lszekeres) raised valid points.
But I think it would be simpler to handle all of the stats that use custom capturing in the same way (JSON).
I also realized there is a pretty good reason we don't want to rely on the schema for type validation: we won't save stats to the DB in the runner, we will do this in the measurer. So I don't think we can wait until saving to the DB to validate, because that happens too far from the error.

@lszekeres (Member) commented Oct 14, 2020

Do we agree to separate common metrics from fuzzer-specific ones?

If yes, then re: type/schema validation, let's consider the two cases:

  • Fuzzer-specific metrics wouldn't have a schema; users would be allowed to provide arbitrary JSONs, right? I.e., if json.dumps() succeeds, it's valid and good to go. (We could also statically pytype-check that fuzzer.py's get_custom_stats() returns a dict.)

  • For common metrics, let's look at them case by case:

    • Stuff like cpu_usage and mem_usage wouldn't be provided by fuzzer.py, but by FuzzBench itself. No special type/schema validation is necessary.
    • For num_execs we need to rely on fuzzer.py. I'd collect this data by letting fuzzers define a def get_num_execs() -> int function; early type/schema validation is done by pytype (-> int) at linting time (see the sketch below).
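A sketch of that proposed typed hook, assuming an AFL-style fuzzer_stats file with an execs_done line (a hypothetical shape, not code from this PR):

```python
def get_num_execs() -> int:
    """Proposed per-fuzzer hook; pytype checks the '-> int' at lint time."""
    # AFL records total executions as 'execs_done' in its stats file;
    # the path here is an assumption.
    with open('output/fuzzer_stats') as stats_file:
        for line in stats_file:
            key, _, value = line.partition(':')
            if key.strip() == 'execs_done':
                return int(value.strip())
    return 0
```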

@jonathanmetzman (Contributor, Author)

> I think this is ready to land; I'm just going to test it with a local experiment first.

Local experiment worked fine.

@@ -76,3 +79,14 @@ def test_fuzz_function_errors():
# so fail if we see one. If that is not the case then this assert
# should be removed.
assert not isinstance(error.value, TypeError)


def test_afl_get_stats(tmp_path):
@jonathanmetzman (Contributor, Author)

What do we think about putting the test for each fuzzer's get_stats here?
The more intuitive alternative is to create fuzzers/afl/test_fuzzer.py and put it there.
But I don't really love the idea of polluting fuzzer directories with this (and with test files as well).
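For concreteness, a sketch of what such a test could look like if it lives here (the fuzzers.afl import path and the get_stats signature are assumptions):

```python
import json

from fuzzers.afl import fuzzer as afl_fuzzer  # Assumed import path.


def test_afl_get_stats(tmp_path):
    """get_stats should report execs_per_sec parsed from AFL's stats file."""
    stats_file = tmp_path / 'fuzzer_stats'
    stats_file.write_text('execs_done : 99\nexecs_per_sec : 1234.56\n')
    stats_json = afl_fuzzer.get_stats(str(tmp_path), str(stats_file))
    assert json.loads(stats_json)['execs_per_sec'] == 1234.56
```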

@jonathanmetzman merged commit 7c02168 into master on Oct 21, 2020
@jonathanmetzman deleted the stats branch on October 21, 2020 15:04