Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out a solution for the stdlib #5

Open
Manishearth opened this issue Feb 21, 2017 · 15 comments
Open

Figure out a solution for the stdlib #5

Manishearth opened this issue Feb 21, 2017 · 15 comments

Comments

@Manishearth
Copy link
Member

Libstd isn't built with sanitizer support. Perhaps we should release a version that is.

@Manishearth
Copy link
Member Author

@nagisa
Copy link
Member

nagisa commented Feb 21, 2017

It is likely that instrumenting libstd itself is counterproductive. Instrumenting libstd and all dependent libraries will increase the workload of sanitizer greatly and may take longer than necessary to yield useful results.

Instead we should instrument the target crate only by default and provide options to increase fuzzing scope (instrumenting dependencies, instrumenting libstd, etc.)

@Manishearth
Copy link
Member Author

idk, lots of panics originate from the stdlib (indexing, etc). While it's true that the branches from these usually get inlined (even though the panic code doesn't), which should be enough for fuzzing, it would be nice to have an instrumented stdlib.

rust-san has instructions for setting up an instrumented stdlib with xargo, we could probably just provide links to that and perhaps add a cargo fuzz command for setting that up. You're right that this doesn't have to be the default.

@nagisa
Copy link
Member

nagisa commented Feb 21, 2017

Some clarification to my comment above:

libstd does not need to be instrumented for panics to be understood by the fuzzer. With that out of the way…

By only instrumenting the crate of interest, what is essentially done is search for bugs within the crate of interest alone (± some inlined code). By instrumenting the dependencies (incl. libstd) as well, fuzzer now effectively searches for bugs within the dependencies at the same time.

This may sound desirable, but it is not. The dependencies can be fuzzed on their own. Fuzzing the crates separately both reduces the scope of fuzzing greatly (i.e. fuzzing is faster to find bugs, and if no bugs are found, there’s more confidence about crate being correct for the same investment of time), but also verifies the correctness of the dependency crate independently, which is strictly better than checking correctness of dependency within context of your own crate.

So it seems to me that fuzzing crates separately and in isolation is ideal. I recognise that not everybody wants to spend effort to fuzz their dependencies and that fuzzing everything at once is less effort – that’s where the options to increase the scope come in.

@Manishearth
Copy link
Member Author

libstd does not need to be instrumented for panics to be understood by the fuzzer. With that out of the way…

Well, yeah, but the branches in libstd will be opaque to the fuzzer, no?

Interesting point about fuzzing dependencies. We're currently using RUSTFLAGS but an option to use cargo rustc would be nice.

@emk
Copy link
Contributor

emk commented Mar 5, 2017

Well, yeah, but the branches in libstd will be opaque to the fuzzer, no?

I suspect this is why I can't usefully fuzz the vobsub crate, which does lots of tricky, low-level parsing. The very first thing I do with the input bytes is search for an MPEG-2 Program Stream header:

            // Search for the start of a ProgramStream packet.
            let needle = &[0x00, 0x00, 0x01, 0xba];
            let start = self.remaining.windows(needle.len())
                .position(|window| needle == window);

When I fuzz this, cargo fuzz appears to be endlessly fuzzing single-byte inputs:

INFO: Seed: 577350870
INFO: Loaded 0 modules (0 guards): 
Loading corpus dir: corpus
INFO: -max_len is not provided, using 64
INFO: A corpus is not provided, starting from an empty corpus
#0	READ units: 1
#1	INITED cov: 63 corp: 1/1b exec/s: 0 rss: 70Mb
#1048576	pulse  cov: 63 corp: 1/1b exec/s: 524288 rss: 136Mb
#2097152	pulse  cov: 63 corp: 1/1b exec/s: 419430 rss: 194Mb
#4194304	pulse  cov: 63 corp: 1/1b exec/s: 419430 rss: 311Mb
#8388608	pulse  cov: 63 corp: 1/1b exec/s: 399457 rss: 544Mb
#16777216	pulse  cov: 63 corp: 1/1b exec/s: 409200 rss: 746Mb
#33554432	pulse  cov: 63 corp: 1/1b exec/s: 419430 rss: 747Mb
#67108864	pulse  cov: 63 corp: 1/1b exec/s: 422068 rss: 747Mb

I'm going to try reading the "san" docs to see if I can rebuild std, and see if that helps any.

@emk
Copy link
Contributor

emk commented Mar 5, 2017

OK, there's some useful information in the Reddit thread announcing cargo fuzz. This links to Better backtraces, which explains how to rebuild rust-src using xargo. But I'm not quite sure how to make this work under cargo fuzz without modifying cargo fuzz itself. I'll try poking at this later if I have time.

I'd really like to fuzz vobsub; it's parsing all sorts of ancient, ugly MPEG-2 data, and I would love to provide it as part of a web service someday.

@nagisa
Copy link
Member

nagisa commented Mar 5, 2017

@emk consider providing at least one starter file to the fuzzer (put into fuzz/corpus).


Well, yeah, but the branches in libstd will be opaque to the fuzzer, no?

That’s only for functions monomorphised within libstd. They are few and far between.

@emk
Copy link
Contributor

emk commented Mar 5, 2017

@nagisa Thank you for your advice! It turned out I had two problems:

  1. The MPEG2 packet formats are complex enough that I really did need to start with a valid input to get good results.
  2. My driver was accidentally ignoring certain important code paths.

Now I've found 5 issues and I'm looking for more (I'll submit a PR for the trophy case shortly). These kinds of low-level binary decoders are excellent hunting grounds for bugs, even in safe mode.

I've provided documentation on how to fuzz-test vobsub here, in case anybody wants to experiment with it. Thank you for your help and for a great tool!

@whitequark
Copy link
Member

Triage: documentation issue, excerpt from above:

rust-san has instructions for setting up an instrumented stdlib with xargo, we could probably just provide links to that and perhaps add a cargo fuzz command for setting that up. You're right that this doesn't have to be the default.

@elichai
Copy link

elichai commented Jan 20, 2020

I think this is kinda solved in nightly? https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std

@Manishearth
Copy link
Member Author

Yeah, you can also use xargo on stable. I'd accept PRs adding support for a build-std mode

@Shnatsel
Copy link
Member

Shnatsel commented Jun 5, 2020

FWIW this is strictly required for Memory Sanitizer to work. Right now enabling msan in cargo-fuzz makes any binary immediately abort with a MSAN error. This is documented in more detail here: https://doc.rust-lang.org/nightly/unstable-book/compiler-flags/sanitizer.html#memorysanitizer

A recipe for MSAN build that also enables MSAN for C dependencies that are linked into the binary can be found here: rust-lang/rust#39610 (comment)

@Shnatsel
Copy link
Member

This is now as simple as cargo fuzz run target_name -Z build-std

Also, #233 automatically enables this option if Memory Sanitizer is enabled.

@saethlin
Copy link
Contributor

Should fuzz targets always use -Zbuild-std? The standard library has a lot of debug assertions and I'm sure there are Rust-level UB that the sanitizers won't understand but which could be caught by having debug_assertions turned on in the standard library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants