better fuzzing #68

pascalkuthe · 2022-10-27T12:03:50Z

While working on #66 I fuzzed the hash implementation.
I noticed that the fuzzer took extremely long to find any results, because chunk boundaries are quite rare (so huge strings are required).
By using cfg(fuzzing) in addition to cfg(test) to enable a much smaller chunk size, chunk boundaries become much more common and the fuzzer finds problems much more quickly.
For example, with this change the bug from #67 is quickly detected.
I have removed the medium.txt case from the fuzzer because it slows fuzzing to a crawl due to the huge number of chunks.
I think the main point of the file was to test chunk boundaries anyway, which now also occur for small.txt and randomly generated text.
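
To make the argument above concrete, here is a minimal sketch of why a smaller chunk size helps the fuzzer. The constant name `MAX_CHUNK`, its two values, and the helper `boundary_count` are all hypothetical and are not taken from the crate; the only outside fact relied on is that cargo-fuzz builds with `--cfg fuzzing`.

```rust
// Hypothetical chunk-size constant (name and values are assumptions for
// illustration only). The small value is selected for unit tests and for
// builds made by cargo-fuzz, which passes `--cfg fuzzing` to rustc.
#[cfg(any(test, fuzzing))]
pub const MAX_CHUNK: usize = 8;
#[cfg(not(any(test, fuzzing)))]
pub const MAX_CHUNK: usize = 1024;

/// Number of boundaries between chunks for a text of `text_len` bytes,
/// assuming every chunk except the last is filled to `max_chunk` bytes.
pub fn boundary_count(text_len: usize, max_chunk: usize) -> usize {
    assert!(max_chunk > 0, "chunk size must be non-zero");
    if text_len == 0 {
        0
    } else {
        (text_len - 1) / max_chunk
    }
}
```

Under these assumed values, a 100-byte fuzz input contains no chunk boundary at all with 1024-byte chunks, but twelve of them with 8-byte chunks, so short inputs start exercising the boundary-handling code.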

cessen · 2022-11-01T16:40:25Z

I'd like to keep the fuzzing code running as similar to a release build as possible. My rationale is that memory safety issues are often subtle, and although it's unlikely that different tree constants would cause any issues, it's not impossible.

And as I mentioned in #66, the randomized tests for the Hash impl belong in the property tests anyway, which already build with the smaller tree constants.

pascalkuthe · 2022-11-01T16:57:04Z

> I'd like to keep the fuzzing code running as similar to a release build as possible. My rationale is that memory safety issues are often subtle, and although it's unlikely that different tree constants would cause any issues, it's not impossible.
>
> And as I mentioned in #66, the randomized tests for the Hash impl belong in the property tests anyway, which already build with the smaller tree constants.

The reason I submitted this as a PR at all is that it was able to catch #67, which can cause memory issues in release builds too; the existing fuzzer was just unable to find the case (or did not run long enough). It still took a while even with this fuzzer, so a proptest would not have found that bug. Maybe we could put this behind a feature flag so we can fuzz both ways?

cessen · 2022-11-01T20:33:51Z

I think I need to come back to this PR later. My covid brain + jet lag is leaving me in a weird emotional state, where I just really really want to keep the separation between fuzz testing and property testing very strict. But I don't think I'm evaluating things rationally right now. I'll revisit this when I have my head on straight again.

cessen · 2022-11-07T09:41:37Z

Okay, so now that my head is on straight again, I agree that this is fine as a fuzz test. However, I do still want the main fuzz testing to use release tree constants, for the reasons I outlined above.

> Maybe we could put this behind a feature flag so we can fuzz both ways?

I think something along these lines makes sense. I'm wondering if the fuzz testing framework allows using different feature flags for different fuzz executables? That would be ideal, because then we wouldn't even have to think about it: the fuzz tests that need the smaller tree constants will always use them, and the rest will always use the release constants.

Also: it looks like this PR includes the changes from #67? Can you split that out, so we can handle them separately? Edit: I mean, make this the PR that adds the fuzz testing, and leave #67 to just add the fix.

pascalkuthe · 2022-11-07T19:39:14Z

Yeah, I added the change from #67 originally because I assumed that you required fuzzing to pass for merging a PR (and fuzzing fails almost immediately without that fix when using the smaller chunk size). I assumed that fix would get merged first. I have dropped the commit now.

I have now put the smaller chunk size behind a feature flag and restored the old fuzz target. I added my modified version as a new fuzz target that requires the small_chunks feature flag. To run this new fuzz target you need to use:

cargo +nightly fuzz run mutation_small_chunks --features="small_chunks"

It's a bit ugly, but at least cargo will refuse to run the fuzzer if you don't specify this feature flag (while keeping the old fuzz target unchanged).
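
As a rough illustration of the feature-flag approach described above, the sketch below gates a chunk-size constant on the `small_chunks` feature. Only the feature name comes from this thread; the constant `MAX_BYTES`, its values, and the `split_chunks` helper are hypothetical and are not the crate's actual code. The "cargo will refuse to run" behaviour is presumably obtained with something like Cargo's `required-features` key on the fuzz target, which is also an assumption here.

```rust
// Hypothetical constant; the real crate's names and values may differ.
// The small value is used in unit tests or when the `small_chunks`
// feature is enabled; otherwise the release-sized value is used.
#[cfg(any(test, feature = "small_chunks"))]
pub const MAX_BYTES: usize = 9;
#[cfg(not(any(test, feature = "small_chunks")))]
pub const MAX_BYTES: usize = 1024;

/// Split `text` into chunks of at most `MAX_BYTES` bytes without
/// cutting a UTF-8 code point in half.
pub fn split_chunks(text: &str) -> Vec<&str> {
    let mut chunks = Vec::new();
    let mut rest = text;
    while rest.len() > MAX_BYTES {
        // Back up from MAX_BYTES to the nearest char boundary. A code
        // point is at most 4 bytes, so with MAX_BYTES >= 4 this never
        // reaches zero.
        let mut split = MAX_BYTES;
        while !rest.is_char_boundary(split) {
            split -= 1;
        }
        let (head, tail) = rest.split_at(split);
        chunks.push(head);
        rest = tail;
    }
    if !rest.is_empty() {
        chunks.push(rest);
    }
    chunks
}
```

With a gate like this, the existing fuzz target keeps building against the release-sized constant, and only the target run with `--features="small_chunks"` sees the small one.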

cessen · 2022-11-09T09:54:44Z

Cargo.toml

@@ -17,6 +17,11 @@ cr_lines = [] # Enable recognizing carriage returns as line breaks.
 unicode_lines = ["cr_lines"] # Enable recognizing all Unicode line breaks.
 simd = ["str_indices/simd"]

+# Internal feature: Not part of public stable API
+# enables a much smaller chunk size that makes it
+# esier to catch bugs without requiring huge text sizes during fuzzing


Typos:

esier -> easier

Missing period at the end of the sentence.

cessen · 2022-11-09T09:57:51Z

Once the typos are fixed, this looks ready to merge. Thanks!

pascalkuthe · 2022-11-09T15:54:10Z

> Once the typos are fixed, this looks ready to merge. Thanks!

I have fixed the typo and added the period. Should be ready to merge now :)

cessen · 2022-11-09T16:23:25Z

Awesome, thanks!

I think that leaves just the lines iterator PR. That one's going to take a bit, because I'd like to make sure I fully understand the implementation.

pascalkuthe · 2022-11-09T17:00:16Z

> Awesome, thanks!
>
> I think that leaves just the lines iterator PR. That one's going to take a bit, because I'd like to make sure I fully understand the implementation.

I am currently in the process of fixing documentation there and moving the lines iterator back.

I get that the lines iterator is super hard to review.
If it would help with the review, I would be happy to discuss implementation details over on Matrix as well.
It might help to just casually chat about some details of the implementation. Not as a replacement for actual review comments, of course, just so that I can help you understand the implementation faster and you have an easier time reviewing it.
Of course, if you'd rather not do that, that's perfectly fine too; I just wanted to offer since I usually find these discussions helpful.

cessen · 2022-11-09T20:20:58Z

Ah, thanks for the offer! I'll let you know if I think that'll be helpful. But I haven't properly dived into the code yet, so I'll wait until at least trying first. :-)

pascalkuthe force-pushed the better_fuzzing branch from 9ef71cb to eebe674 Compare November 7, 2022 19:33

cessen reviewed Nov 9, 2022

user smaller chunk size during fuzzing to detect bugs more reliably

9be20bf

pascalkuthe force-pushed the better_fuzzing branch from eebe674 to 9be20bf Compare November 9, 2022 15:49

cessen merged commit 56679d8 into cessen:master Nov 9, 2022
