Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

better fuzzing #68

Merged
merged 1 commit into from
Nov 9, 2022
Merged

better fuzzing #68

merged 1 commit into from
Nov 9, 2022

Conversation

pascalkuthe
Copy link
Collaborator

While working on #66 I fuzzed the hash implementation.
I noticed that the fuzzer took extremely long to find any results, because chunk boundaries are quite rare (so huge strings are required).
By using cfg(fuzzing) in addition to cfg(test) for enabling a much smaller fuzz size, chunk sizes are much more common and the
fuzzer finds problems much quicker.
For example with the change the bug from #67 is quickly detected.
I have removed the medium.txt case from the fuzzer because it slows the fuzzing to a crawl due to the huge amount of chunks.
I think the main point of the file was to test chunk boundaries anyway which now also occur for small.txt and randomly generated text.

@cessen
Copy link
Owner

cessen commented Nov 1, 2022

I'd like to keep the fuzzing code running as similar to a release build as possible. My rationale is that memory safety issues are often subtle, and although it's unlikely that different tree constants would cause any issues, it's not impossible.

And as I mentioned in #66, the randomized tests for the Hash impl belong in the property tests anyway, which already build with the smaller tree constants.

@pascalkuthe
Copy link
Collaborator Author

I'd like to keep the fuzzing code running as similar to a release build as possible. My rationale is that memory safety issues are often subtle, and although it's unlikely that different tree constants would cause any issues, it's not impossible.

And as I mentioned in #66, the randomized tests for the Hash impl belong in the property tests anyway, which already build with the smaller tree constants.

The reason I submitted this as a PR at all was because it was able to catch #67 which can cause memory issues in release builds too, the fuzzer was just unable to find the case (or did not run long enough). It still took a while even with this fuzzer so a proptest would not have found that bug. Maybe we could put this behind a feature flag so we can fuzz both ways?

@cessen
Copy link
Owner

cessen commented Nov 1, 2022

I think I need to come back to this PR later. My covid brain + jet lag is leaving me in a weird emotional state, where I just really really want to keep the separation between fuzz testing and property testing very strict. But I don't think I'm evaluating things rationally right now. I'll revisit this when I have my head on straight again.

@cessen
Copy link
Owner

cessen commented Nov 7, 2022

Okay, so now that my head is on straight again, I agree that this is fine as a fuzz test. However, I do still want the main fuzz testing to use release tree constants, for the reasons I outlined above.

Maybe we could put this behind a feature flag so we can fuzz both ways?

I think something along these lines makes sense. I'm wondering if the fuzz testing framework allows using different feature flags for different fuzz executables? That would be ideal, because then we wouldn't even have to think about it: the fuzz tests that need the smaller tree constants will always use them, and the rest will always use the release constants.

Also: it looks like this PR includes the changes from #67? Can you split that out, so we can handle them separately? Edit: I mean, make this the PR that adds the fuzz testing, and leave #67 to just add the fix.

@pascalkuthe
Copy link
Collaborator Author

Yeah I added the change from #67 originally because I assumed that you required fuzzing to pass for merging a PR (and fuzzing fails almost imminently with that fix with the smaller chunk size). I assumed that fix would get merged first. I have dropped the commit now.

I have now put the smaller chunk size behind a feature flag and restored the old fuzz target. I added my modified versin as a new fuzz target that requires the small_chunks feature flag. To run this new fuzz target you need to use:

cargo +nightly fuzz run mutation_small_chunks --features="small_chunks"

Its a bit ugly but at least cargo will refuse to run the fuzzer if you don't specify this feature flag (while keeping the old fuzz target unchanged).

Cargo.toml Outdated
@@ -17,6 +17,11 @@ cr_lines = [] # Enable recognizing carriage returns as line breaks.
unicode_lines = ["cr_lines"] # Enable recognizing all Unicode line breaks.
simd = ["str_indices/simd"]

# Internal feature: Not part of public stable API
# enables a much smaller chunk size that makes it
# esier to catch bugs without requiring huge text sizes during fuzzing
Copy link
Owner

@cessen cessen Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typos:

  • esier - > easier
  • Missing period at the end of the sentence.

@cessen
Copy link
Owner

cessen commented Nov 9, 2022

Once the typos are fixed, this looks ready to merge. Thanks!

@pascalkuthe
Copy link
Collaborator Author

Once the typos are fixed, this looks ready to merge. Thanks!

I have fixed the typo and added the period. Should be ready to now :)

@cessen
Copy link
Owner

cessen commented Nov 9, 2022

Awesome, thanks!

I think that leaves just the lines iterator PR. That one's going to take a bit, because I'd like to make sure I fully understand the implementation.

@cessen cessen merged commit 56679d8 into cessen:master Nov 9, 2022
@pascalkuthe
Copy link
Collaborator Author

Awesome, thanks!

I think that leaves just the lines iterator PR. That one's going to take a bit, because I'd like to make sure I fully understand the implementation.

I am currently in the process of fixing documentation there and moving the lines iterator back.

I get that the lines iterator is super hard to review.
If that might help with the review I would be happy to discuss implementation details over on matrix aswell.
It might help to just casually chat about some details of the implementation. Not as a replacement for actual review comments of course just so that I can help with understanding the implementation faster and you have an easier time reviewing it.
Ofourse if you rather prefer not to do that that'ss perfectly fine too, I just wanted to offer since I usually find these discussions helpful

@cessen
Copy link
Owner

cessen commented Nov 9, 2022

Ah, thanks for the offer! I'll let you know if I think that'll be helpful. But I haven't properly dived into the code yet, so I'll wait until at least trying first. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants