
Support fuzzing encoder callbacks in UberFilterFuzzer#11209

Merged
mattklein123 merged 3 commits into envoyproxy:master from nareddyt:fuzz-upstream-data on May 20, 2020

Conversation

@nareddyt
Contributor

@nareddyt nareddyt commented May 15, 2020

Commit Message: Support fuzzing encoder callbacks in UberFilterFuzzer

Additional Description:

Minimized code duplication by using templates and template specialization. The six cases for sending headers/data/trailers in decoders/encoders each have their own function. Templating lets us enforce compile-time checks and keep the fuzzing runtime fast. The one downside is the overhead of declaring the functions, but this seems like the cleanest solution to me.

Alternatives considered:

  • If/else statements: This became messy quickly, as we also have different validation logic for request/response headers. And we need to pass extra data around to keep track of whether we were fuzzing decoders or encoders.
  • Lambdas: Similar to this approach, with potentially less code for defining the functions. But lambdas do not enforce compile-time checking, require passing more data around, and potentially require dynamic casts (which are expensive and bad for fuzzing).

Unfortunately, I was not able to make these functions static (as discussed with @asraa) due to lifetime issues. The request/response headers must be kept alive for the duration of the filter (as decodeData may use headers from decodeHeaders). The best option was to use member variables to preserve them. This is OK; ESPv2 can work around it (see the follow-ups below).

A few other minor changes:

  • Add a small max limit to the number of chunks the proto data is broken into. Sometimes the fuzzer would fail in absl::ByLength, which I assume is due to a large chunk size.

Risk Level: None

Testing:

  • Added a new entry to the in-repo corpus that exercises both decoders and encoders.
  • Ran against the in-repo corpus and manually inspected the debug logs.
  • Ran the _with_libfuzzer target and ensured crashes did not occur in UberFilterFuzzer.

Follow-up PRs:

  • Support fuzzing access logs in a similar fashion. This should be fairly straightforward and requires no new templates.
  • Split the UberFilterFuzzer into two classes: One for constructing the filter using the factory, and one for actually fuzzing the filter. In ESPv2, we have fuzz tests per filter that already set all the callbacks needed. So we can ignore all the mocks in the UberFilterFuzzer factory, and just directly use the UberFilterFuzzer filter fuzzer.

Signed-off-by: Teju Nareddy <nareddyt@google.com>

@nareddyt
Contributor Author

cc @asraa, please read over the "alternatives considered" and let me know what you think about the templating here

@jmarantz
Contributor

@nareddyt can you fix format so we can see what CI looks like? I think a failing format check causes some of the other checks to be deferred.

Contributor

@asraa asraa left a comment

Of the alternatives considered, this one probably seems clearest to me. It's easy to follow and understand the flow. This is great!

Were you able to run the fuzzer for a few minutes locally? It looks like you did catch the absl::ByLength() issue that came up over the weekend: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=22246, so you should be able to run for a few minutes. Please mark that issue as fixed!

Coverage works too: FUZZ_COVERAGE=true test/run_envoy_bazel_coverage.sh //test/extensions/filters/http/common/fuzz:filter_fuzz_test_with_libfuzzer. If you can get a decent number for the http filter subdirectory, that would be good. The FUZZ_CORPUS_TIME environment variable controls how long the fuzzer runs while collecting coverage.

nareddyt added 2 commits May 18, 2020 09:02
Signed-off-by: Teju Nareddy <nareddyt@google.com>
Signed-off-by: Teju Nareddy <nareddyt@google.com>
@nareddyt
Contributor Author

Format fixed

@asraa:

  • Thanks for sending me the issue; this change should fix it. Let's keep it open and let OSS-Fuzz close it after this change is in.
  • I ran locally for a few minutes multiple times. There were a few crashes, but they were in various filters; I don't think they're related to this change.
  • I checked coverage. Encoders are definitely being covered, but they aren't doing anything interesting due to missing gRPC headers. How about we let this run in OSS-Fuzz for a while and address it if OSS-Fuzz cannot select those headers correctly? I'll also add those headers to our corpus.
  • Also added better validation of the :status header as discussed offline.
  • Disabled the log silencer for debugging fuzz tests as discussed offline.

@nareddyt nareddyt requested a review from asraa May 18, 2020 20:38
Contributor

@asraa asraa left a comment

Thank you! One final nit but this looks great. Excited to iterate on this.

@asraa
Contributor

asraa commented May 20, 2020

Did you find fuzzing the JSON string to be a blocker for some codepaths?

  • Is it worth adding structure to the JSON strings? (If this filter is the JSON transcoder, use the fuzzed HTTP data to select keys/values and construct a JSON string that makes sense for the bookstore service?)

@nareddyt
Contributor Author

Did you find fuzzing the JSON string to be a blocker for some codepaths?

  • Is it worth adding structure to the JSON strings? (If this filter is the JSON transcoder, use the fuzzed HTTP data to select keys/values and construct a JSON string that makes sense for the bookstore service?)

Good point! I didn't let it run long enough locally to have enough data, I'd prefer to let OSS run this and generate a coverage report we can inspect.

If we do find this to be a blocker, we can add in a 3rd structured type of input for the body. Similar to how we added proto body in #10796, just generate a google.protobuf.Struct and marshal it into JSON. That would be a fairly simple code change, I'll follow up on this in another PR if needed.

@mattklein123 mattklein123 merged commit 2b0633a into envoyproxy:master May 20, 2020
@nareddyt nareddyt deleted the fuzz-upstream-data branch May 20, 2020 23:22

// The size (in bytes) of each buffer when forming the requests.
- uint64 chunk_size = 2 [(validate.rules).uint64.gt = 0];
+ uint64 chunk_size = 2 [(validate.rules).uint64 = {gt: 0, lt: 8192}];
Contributor

@nareddyt Is this meant to be 8192? I think I'm seeing a regression.

Contributor Author

Right, I increased it to be more realistic. 20 was too small, and Envoy's buffer should handle this much data.

It was still working when running with libfuzzer for me. Could you share more details?

Contributor

@asraa asraa May 21, 2020

#11045 runs each fuzzer for a minute: https://circleci.com/gh/envoyproxy/envoy/347829?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

str_split.cc : 123] RAW: Check length > 0 failed: 
AddressSanitizer:DEADLYSIGNAL
=================================================================
==20698==ERROR: AddressSanitizer: ABRT on unknown address 0x0000000050da (pc 0x7f06f772be97 bp 0x7ffd0fe621b0 sp 0x7ffd0fe61070 T0)
    #0 0x7f06f772be96 in raise (/lib/x86_64-linux-gnu/libc.so.6+0x3ee96)
    #1 0x7f06f772d800 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x40800)
    #2 0x469e2e23 in (anonymous namespace)::RawLogVA(absl::LogSeverity, char const*, int, char const*, __va_list_tag*) (/build/tmp/_bazel_bazel/b570b5ccd0454dc9af9f65ab1833764d/execroot/envoy/bazel-out/k8-fastbuild/bin/test/extensions/filters/http/common/fuzz/filter_fuzz_test_with_libfuzzer+0x469e2e23)

Maybe it should be

data_chunks = absl::StrSplit(serialized, absl::ByLength(std::max<uint64_t>(1, data.proto_body().chunk_size())));

Contributor

I think you can repro by making any of your corpus entries a 0 chunk size

Contributor Author

Thanks, I'll take a look and make a PR to fix

Contributor Author

I cannot reproduce this. I assume it's failing on this line here: https://chromium.googlesource.com/external/github.com/abseil/abseil-cpp/+/HEAD/absl/strings/str_split.cc#123

But the proto has validation to prevent this:

uint64 chunk_size = 2 [(validate.rules).uint64 = {gt: 0, lt: 8192}];

I confirmed that setting this to 0 in the corpus results in a validation failure:

[2020-05-21 20:01:05.178][12][debug][misc] [source/common/protobuf/utility.cc:255] Proto validation error; throwing Proto constraint validation failed (FilterFuzzTestCaseValidationError.Data: ["embedded message failed validation"] | caused by HttpDataValidationError.ProtoBody: ["embedded message failed validation"] | caused by ProtoBodyValidationError.ChunkSize: ["value must be inside range (" '\x00' ", " '\u2000' ")"]): config {

I also tested with:

  • an empty google.protobuf.Any
  • an Any with a type URL but no value

Neither crashes

Contributor

Ahhh! Thank you, and I'm really sorry for causing trouble; with those validations it definitely can't be empty or zero either. I think the reason might be that the failing run didn't have your PR merged. I'm waiting for CI to pass after the merge. Thank you so much, and sorry about that!

Contributor Author

No problem! Good to double check edge cases :)
