
MSAN not working? Accepted methods of testing? #419

Open
agoodm88 opened this issue Jul 5, 2022 · 16 comments

Comments

@agoodm88

agoodm88 commented Jul 5, 2022

Hi,

As an idle-time project/hobby I've spent some time fuzzing various image-processing applications and libraries. I've found some interesting bugs; some I was confident were interesting enough to report, and some have been fixed (one potentially nasty-looking). I am experienced with working on Linux systems, compiling software, etc., but I am not a developer per se. In the past week I've turned my attention to libpng.

I've successfully compiled zlib with afl or honggfuzz instrumentation and then compiled an instrumented libpng linked against it. This all seems to work and the results seem plausible. For example, zlib:
export CC=/home/alan/honggfuzz/hfuzz_cc/hfuzz-clang CXX=/home/alan/honggfuzz/hfuzz_cc/hfuzz-clang++
./configure --static
make -j24
Libpng:
./autogen.sh
./configure --with-zlib-prefix=/home/alan/zlib
make -j24
Then I compile the persistent-mode harness provided in the honggfuzz examples:
/home/alan/honggfuzz/hfuzz_cc/hfuzz-clang -I/home/alan/libpng persistent-png.c /home/alan/libpng/.libs/*.a /home/alan/zlib/*.a
Again, this all works as expected.

I used a similar process with afl-clang-fast, setting CC=afl-clang-fast CXX=afl-clang-fast++, and it all went well. Things got more complicated when I tried to create instrumented binaries with MSAN or ASAN/UBSAN added. I repeated the steps above with CFLAGS="-fsanitize=memory" LDFLAGS="-fsanitize=memory" added at the configure stage (swapping out memory as appropriate for ASAN/UBSAN), being careful to use fresh copies of the zlib/libpng code for each iteration. This all worked as expected until I tried to test my honggfuzz + MSAN build:

convert -size 1x1 xc:black black1x1.png
alan@fuzz:~$ ./libpng_msanhfuzz/a.out < black1x1.png
Accepting input from '[STDIN]'
Usage for fuzzing: honggfuzz -P [flags] -- ./libpng_msanhfuzz/a.out
Uninitialized bytes in __interceptor_write at offset 1 inside [0x701000000020, 6)
==3006822==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x55e2277f1ca3 in LLVMFuzzerTestOneInput (/home/alan/libpng_msanhfuzz/a.out+0xa6ca3) (BuildId: cccb707311f27fd7ab1dae87897625c0f93ec9d3)
#1 0x55e22788eaf7 in HonggfuzzMain (/home/alan/libpng_msanhfuzz/a.out+0x143af7) (BuildId: cccb707311f27fd7ab1dae87897625c0f93ec9d3)
#2 0x7f0270ed2d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#3 0x7f0270ed2e3f in __libc_start_main csu/../csu/libc-start.c:392:3
#4 0x55e22776a6c4 in _start (/home/alan/libpng_msanhfuzz/a.out+0x1f6c4) (BuildId: cccb707311f27fd7ab1dae87897625c0f93ec9d3)

SUMMARY: MemorySanitizer: use-of-uninitialized-value (/home/alan/libpng_msanhfuzz/a.out+0xa6ca3) (BuildId: cccb707311f27fd7ab1dae87897625c0f93ec9d3) in LLVMFuzzerTestOneInput

This seems like an implausible result to me, so I started looking for the libpng equivalent of dwebp (libwebp) or djpeg (libjpeg). I didn't find much discussion online apart from one bug report here, where a bug was apparently reproduced with 'pngtest', so I tried that and got another implausible-looking result:

alan@fuzz:~$ ./libpng_msan/pngtest black1x1.png

Testing libpng version 1.6.38.git
with zlib version 1.2.11

libpng version 1.6.38.git
Copyright (c) 2018-2020 Cosmin Truta
Copyright (c) 1998-2002,2004,2006-2018 Glenn Randers-Pehrson
Copyright (c) 1996-1997 Andreas Dilger
Copyright (c) 1995-1996 Guy Eric Schalnat, Group 42, Inc.
library (10638): libpng version 1.6.38.git

pngtest (10638): libpng version 1.6.38.git

==3006831==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x4ab305 in count_zero_samples /home/alan/libpng_msan/pngtest.c:255:17
#1 0x7f8b6521937b in png_do_write_transformations /home/alan/libpng_msan/pngwtran.c:511:10
#2 0x7f8b6521033f in png_write_row /home/alan/libpng_msan/pngwrite.c:857:7
#3 0x7f8b6520fd4e in png_write_rows /home/alan/libpng_msan/pngwrite.c:588:7
#4 0x4a7695 in test_one_file /home/alan/libpng_msan/pngtest.c:1507:10
#5 0x4a2a0e in main /home/alan/libpng_msan/pngtest.c:2036:19
#6 0x7f8b64e4ad8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#7 0x7f8b64e4ae3f in __libc_start_main csu/../csu/libc-start.c:392:3
#8 0x423854 in _start (/home/alan/libpng_msan/.libs/pngtest+0x423854)

SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/alan/libpng_msan/pngtest.c:255:17 in count_zero_samples
Exiting

Are there any accepted 'normal' methods of testing libpng? If so, what are they? I tried various combinations of configure options, including --disable-shared, without luck. As a last-ditch attempt I compiled libwebp with MSAN and PNG support, linked to my MSAN-compiled binaries, and it also returned implausible results. Am I doing something wrong in my compilation process, or is it simply not possible to build libpng/zlib with MSAN?

I guess if it's impossible, I'll just revert to the slower method of running valgrind on a debug-enabled build without instrumentation, but otherwise built using the method above?

Alan

@thealberto
Contributor

Hi Alan,
your issue is interesting. Have you spent more time trying to understand which variable is uninitialized?

Recently I started to work on libpng as well and in my opinion the best way to fuzz libpng is via oss-fuzz.

I have even begun improving the current fuzzer, and hopefully it can be integrated in the future: #274

Happy to collaborate if you want

@agoodm88
Author

agoodm88 commented Jul 9, 2022

Since I'm operating with limited resources (up to 100 cores, but typically 12-48), I've also been trying to avoid using oss-fuzz, since Google is already running it at scale.

I've been trying to build a corpus that I think exercises code oss-fuzz might be missing, and then fuzzing with it. This worked for LibRaw, for example. Typically I instrument a binary with only basic instrumentation to get maximum performance, and then periodically run the entire queue (afl) or corpus (other fuzz engines) through binaries instrumented with ASAN/UBSAN/MSAN to catch anything exciting.

This is where I came unstuck with libpng: my MSAN builds report issues on every PNG file that decodes successfully, which feels suspicious to me. Surely 'pngtest' is a properly written tool that would not choke on a properly formed 1x1 black image?!
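For illustration only: the replay step described above (running a saved afl queue or corpus through separately built ASAN/UBSAN/MSAN binaries) can be sketched as a small standalone C driver. The helper names (replay_file, replay_corpus, demo_target) are hypothetical and not part of honggfuzz, afl or libpng; the only assumption is that the harness exposes the libFuzzer-style entry point int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size), which is what the honggfuzz harness in the first post calls.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Same shape as the libFuzzer-style entry point used by the harness:
 * int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size). */
typedef int (*fuzz_target_fn)(const uint8_t *data, size_t size);

/* Hypothetical stand-in target, only here to demonstrate the helper; a real
 * replay build would pass LLVMFuzzerTestOneInput from the harness instead. */
static size_t demo_last_size;
static int demo_target(const uint8_t *data, size_t size)
{
    (void)data;
    demo_last_size = size;
    return 0;
}

/* Read one corpus file fully into memory and hand it to the target once.
 * Returns 0 on success, -1 if the file could not be opened or read. */
static int replay_file(const char *path, fuzz_target_fn target)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    uint8_t *buf = NULL;
    size_t len = 0, cap = 0;
    for (;;) {
        if (len == cap) {                      /* grow the buffer geometrically */
            size_t new_cap = cap ? cap * 2 : 4096;
            uint8_t *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                fclose(f);
                return -1;
            }
            buf = tmp;
            cap = new_cap;
        }
        size_t n = fread(buf + len, 1, cap - len, f);
        len += n;
        if (n == 0)                            /* EOF or read error */
            break;
    }
    int failed = ferror(f);
    fclose(f);
    if (!failed)
        target(buf, len);                      /* any sanitizer report fires here */
    free(buf);
    return failed ? -1 : 0;
}

/* Replay every path in turn; returns how many files were actually replayed. */
static int replay_corpus(int count, char **paths, fuzz_target_fn target)
{
    int replayed = 0;
    for (int i = 0; i < count; i++) {
        fprintf(stderr, "replaying %s\n", paths[i]);
        if (replay_file(paths[i], target) == 0)
            replayed++;
        else
            fprintf(stderr, "could not read %s\n", paths[i]);
    }
    return replayed;
}
```

In a replay build, the harness's own LLVMFuzzerTestOneInput would replace demo_target, with a main() along the lines of return replay_corpus(argc - 1, argv + 1, LLVMFuzzerTestOneInput) == argc - 1 ? 0 : 1;. Such a build would presumably be compiled with plain clang plus the chosen -fsanitize flag rather than hfuzz-clang, so that no fuzzer-provided main() is linked in.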

@thealberto
Contributor

Hi,

> Since I'm operating with limited resources (up to 100 cores, but typically 12-48), I've also been trying to avoid using oss-fuzz, since Google is already running it at scale.

> I've been trying to build a corpus that I think exercises code oss-fuzz might be missing, and then fuzzing with it. This worked for LibRaw, for example. Typically I instrument a binary with only basic instrumentation to get maximum performance, and then periodically run the entire queue (afl) or corpus (other fuzz engines) through binaries instrumented with ASAN/UBSAN/MSAN to catch anything exciting.

This is a good idea IMO. Recently I have been working on a fuzzer that gives better coverage, which allowed me to find a heap overflow using just my laptop :)

I'd like to know how you create the new corpus, and maybe propose integrating it into the current one. Have you tried to verify the level of coverage via oss-fuzz?

> This is where I came unstuck with libpng: my MSAN builds report issues on every PNG file that decodes successfully, which feels suspicious to me. Surely 'pngtest' is a properly written tool that would not choke on a properly formed 1x1 black image?!

My knowledge of pngtest is still limited. Have you tried to understand which variable is uninitialized and why? Could you share the image as well? We could work on a fix if you'd like.

Thanks

@agoodm88
Author

To create the corpus, I started with a large cache of PNG files left over from when I used to run a free image-hosting service in the early 2010s. I then downloaded basically every PNG file I could find on the internet until I had 200GB worth, and ran it all through afl-tmin.

For coverage, I simply compile with CFLAGS=--coverage, run the corpus through the standard tool (currently I'm using pngtest, but I'm not sure that's the accepted method), and then run lcov/genhtml as appropriate.

I've got some (very hacky) scripts that compare the coverage of my own test cases against the ones used to seed fuzzing in oss-fuzz. I then pick out just the test cases that hit untouched code and build a corpus from them. Where I go from there depends basically on how I'm feeling at that moment.

I haven't really tried to understand why my MSAN build fails in the way shown in the first post, because it feels like I've done something wrong. Surely the default test tool initialises its variables so as to avoid undefined behaviour?!
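As a rough sketch of the "pick out just the test cases that hit untouched code" step, assuming each coverage report has already been parsed into one boolean per source line for a given file (that parsing step and the function names below are hypothetical, not part of lcov or oss-fuzz):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical representation of per-line coverage for one source file:
 * covered[i] is true when line i+1 was executed at least once. */

/* True if the test case executes at least one line the baseline never hits. */
static bool hits_untouched_code(const bool *testcase, const bool *baseline,
                                size_t n_lines)
{
    for (size_t i = 0; i < n_lines; i++)
        if (testcase[i] && !baseline[i])
            return true;
    return false;
}

/* Write the indices of test cases that reach baseline-untouched lines into
 * keep[] (capacity n_cases) and return how many were kept. */
static size_t select_untouched(const bool *const *testcases, size_t n_cases,
                               const bool *baseline, size_t n_lines,
                               size_t *keep)
{
    size_t kept = 0;
    for (size_t t = 0; t < n_cases; t++)
        if (hits_untouched_code(testcases[t], baseline, n_lines))
            keep[kept++] = t;
    return kept;
}
```

A greedy variant would also OR each kept test case into the baseline, so that later files which only repeat the same new lines are dropped; the sketch keeps the plain filter described in the comment.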

@agoodm88
Author

P.S. You can generate the test image yourself using ImageMagick; it's just a single black pixel:
convert -size 1x1 xc:black black1x1.png
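Where ImageMagick isn't at hand, a similar one-pixel test file can be produced with a few lines of C that need neither libpng nor zlib. The sketch below (make_black_1x1_png and crc32_png are hypothetical names) writes an 8-bit greyscale, non-interlaced 1x1 PNG whose single scanline sits in an uncompressed "stored" deflate block; it is an illustration under those assumptions, and the result is probably not byte-identical to the file convert produces.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Standard reflected CRC-32 (polynomial 0xEDB88320), as used by PNG chunks. */
static uint32_t crc32_png(const uint8_t *buf, size_t len)
{
    uint32_t c = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        c ^= buf[i];
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
    }
    return c ^ 0xFFFFFFFFu;
}

static size_t put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return 4;
}

/* Append one chunk: length, type, data, then CRC over type + data. */
static size_t put_chunk(uint8_t *out, const char *type,
                        const uint8_t *data, uint32_t len)
{
    size_t n = put_be32(out, len);
    memcpy(out + n, type, 4);
    if (len)
        memcpy(out + n + 4, data, len);
    n += 4 + len;
    n += put_be32(out + n, crc32_png(out + 4, 4 + (size_t)len));
    return n;
}

/* Build a 1x1 black greyscale PNG into out (needs 70 bytes); returns its size. */
static size_t make_black_1x1_png(uint8_t *out)
{
    static const uint8_t sig[8] = {0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};
    uint8_t ihdr[13];
    put_be32(ihdr, 1);                          /* width */
    put_be32(ihdr + 4, 1);                      /* height */
    ihdr[8] = 8;                                /* bit depth */
    ihdr[9] = 0;                                /* colour type 0 = greyscale */
    ihdr[10] = ihdr[11] = ihdr[12] = 0;         /* compression, filter, interlace */

    /* zlib stream with one stored deflate block holding the only scanline. */
    static const uint8_t idat[13] = {
        0x78, 0x01,                             /* zlib header (CMF, FLG) */
        0x01,                                   /* BFINAL=1, BTYPE=00 (stored) */
        0x02, 0x00, 0xFD, 0xFF,                 /* LEN=2, NLEN=~LEN, little-endian */
        0x00, 0x00,                             /* filter type 0, pixel value 0 */
        0x00, 0x02, 0x00, 0x01                  /* Adler-32 of {0,0}, big-endian */
    };

    size_t n = 0;
    memcpy(out, sig, sizeof sig);
    n += sizeof sig;
    n += put_chunk(out + n, "IHDR", ihdr, sizeof ihdr);
    n += put_chunk(out + n, "IDAT", idat, sizeof idat);
    n += put_chunk(out + n, "IEND", NULL, 0);
    return n;
}
```

To write the file: uint8_t png[70]; size_t n = make_black_1x1_png(png); FILE *f = fopen("black1x1.png", "wb"); fwrite(png, 1, n, f); fclose(f);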

@thealberto
Contributor

thealberto commented Jul 12, 2022

Hi @agoodm88

> To create the corpus, I started with a large cache of PNG files left over from when I used to run a free image-hosting service in the early 2010s. I then downloaded basically every PNG file I could find on the internet until I had 200GB worth, and ran it all through afl-tmin.

This is interesting. What is the file size of the set of images that you obtained via afl-tmin?

> For coverage, I simply compile with CFLAGS=--coverage, run the corpus through the standard tool (currently I'm using pngtest, but I'm not sure that's the accepted method), and then run lcov/genhtml as appropriate.

> I've got some (very hacky) scripts that compare the coverage of my own test cases against the ones used to seed fuzzing in oss-fuzz. I then pick out just the test cases that hit untouched code and build a corpus from them. Where I go from there depends basically on how I'm feeling at that moment.

Do you think it would be possible for you to check the coverage using oss-fuzz? I would love to compare it with the coverage... Could you share an image at the end? I think you should be able to do it via the following commands from inside the oss-fuzz directory:

sudo python infra/helper.py build_image libpng
sudo python infra/helper.py build_fuzzers --sanitizer=coverage libpng
sudo python infra/helper.py coverage --fuzz-target=<fuzzer_name> --corpus-dir=<path_to_the_corpus> libpng

> I haven't really tried to understand why my MSAN build fails in the way shown in the first post, because it feels like I've done something wrong. Surely the default test tool initialises its variables so as to avoid undefined behaviour?!

I built pngtest and was able to reproduce the error. I'll work on it a little.

I'm not sure in this case, I'm curious to check it as well :)

Thanks for your comment

@agoodm88
Copy link
Author

> This is interesting. What is the file size of the set of images that you obtained via afl-tmin?

After the tmin run, the original files that cause unique execution paths (or something along those lines) are copied to the output directory. I've had mixed luck with test case minimization. It's a necessary step that brings huge benefits, so I've made a custom script (very hacky, but a lot better than the afl-ptmin script someone else published) to run minimization in parallel, but large/complex test cases can literally take weeks to process. Usually I just discard the slow test cases and make do with whatever minimized within 'a few days'. I process them in file-size order to get the best bang for the buck.

I will attempt to get coverage stats from oss-fuzz later on.
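Processing the candidates smallest-first, as described above, amounts to a stat() per file plus a sort. The sketch below is hypothetical (struct testcase, fill_sizes and sort_smallest_first are illustrative names, and stat() assumes a POSIX system); the actual minimization with afl-tmin or honggfuzz would then be run over the files in the resulting order.

```c
#include <limits.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical record for one corpus file awaiting minimization. */
struct testcase {
    const char *path;
    long long size;           /* bytes; LLONG_MAX if stat() failed */
};

/* Look up each file's size; unreadable entries get LLONG_MAX so they sort last. */
static void fill_sizes(struct testcase *tc, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct stat st;
        tc[i].size = (stat(tc[i].path, &st) == 0) ? (long long)st.st_size
                                                  : LLONG_MAX;
    }
}

/* qsort comparator: smaller files first, written to avoid subtraction overflow. */
static int by_size_ascending(const void *a, const void *b)
{
    const struct testcase *x = a, *y = b;
    return (x->size > y->size) - (x->size < y->size);
}

static void sort_smallest_first(struct testcase *tc, size_t n)
{
    qsort(tc, n, sizeof *tc, by_size_ascending);
}
```

Calling fill_sizes followed by sort_smallest_first yields the queue order; dropping the tail of the list once a time budget runs out matches the "make do with whatever minimized within a few days" approach.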

@agoodm88
Author

Easier to get going than I expected to be honest.

With a corpus created from the aforementioned cache of PNG files, shoved through honggfuzz in corpus-minimization mode and allowed to run for some days:
lines 37.98% (5017/13208), functions 43.88% (172/392), regions 30.66% (4779/15587)

With the corpus provided by libpng:
lines 28.54% (3769/13208), functions 35.97% (141/392), regions 23.46% (3657/15587)

With a corpus I am currently running (10 hours in) in honggfuzz, which focuses on interlaced PNG files:
lines 36.38% (4805/13208), functions 43.11% (169/392), regions 29.68% (4627/15587)

All of the above files combined into one:
lines 38.60% (5098/13208), functions 43.88% (172/392), regions 30.97% (4827/15587)

My previous comments about obtaining a corpus were not terribly representative of the work I've already completed on libpng, because I haven't been able to get an afl-instrumented binary that appears to work 100%. I've completed the file-gathering part, but stalled on afl-tmin and afl-cmin... hence this issue report. It also looks like my current attempt at targeting interlaced PNG files (lots of interlace-handling lines aren't hit in the oss-fuzz coverage report) is still not hitting those lines, so more investigation is needed, I guess...

@thealberto
Contributor

thealberto commented Jul 12, 2022

Hi,

> Easier to get going than I expected to be honest.

> With a corpus created from the aforementioned cache of PNG files shoved through honggfuzz in corpus minimization mode and allowed to run for some days: 37.98% (5017/13208) 43.88% (172/392) 30.66% (4779/15587)

> With the corpus provided by libpng: 28.54% (3769/13208) 35.97% (141/392) 23.46% (3657/15587)

> With a corpus I am currently running (10 hours in) in honggfuzz which focuses on interlaced PNG files: 36.38% (4805/13208) 43.11% (169/392) 29.68% (4627/15587)

> All of the above files combined into one: 38.60% (5098/13208) 43.88% (172/392) 30.97% (4827/15587)

> The above are line / function / region in order for each set.

I think your corpus brings improvements. Are you able to compare it with the oss-fuzz results? It would be superb if you could paste a screenshot of the actual result here so we can keep track of progress.

With my small corpus and improved fuzzer I obtain this:
#1048576 pulse cov: 417 ft: 418 corp: 2/9b lim: 4096 exec/s: 19418 rss: 241Mb
Pretty limited IMO.

> My previous comments about obtaining a corpus were not terribly representative of the work I've already completed on libpng, because I haven't been able to get an afl-instrumented binary that appears to work 100%. I've completed the file-gathering part, but stalled on afl-tmin and afl-cmin... hence this issue report. It also looks like my current attempt at targeting interlaced PNG files (lots of interlace-handling lines aren't hit in the oss-fuzz coverage report) is still not hitting those lines, so more investigation is needed, I guess...

I also noticed that interlaced images cover part of the code that was not fuzzed before. Please keep going and keep me posted :)

@agoodm88
Author

The (extremely hacky) way I compare coverage on a per-file basis between a corpus I've created and the oss-fuzz results is to take the HTML coverage files and pass them through lynx -dump file.html > file.html.dec. Then you can diff their results against yours and grep for "< 0" or "> 0", which shows lines you're hitting that oss-fuzz is missing and vice versa. I've made a really hacky script that automates this process and can create comparisons per input file.

I just ran my results and Google's results through this process, and the only line I hit that they miss is pngrutil.c:923, which isn't terribly exciting.

Based on this, I've paused my fuzzing of libpng. I will try to work out how I can hit the deinterlace-related code.
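The core of the lynx/diff/grep comparison above is a per-line check of two hit counts. A minimal C sketch, assuming the HTML reports have already been reduced to one execution count per source line for a given file (that parsing step and the function name diff_line_coverage are hypothetical); non-executable lines can simply carry a count of 0 in both arrays, so they produce no output.

```c
#include <stddef.h>
#include <stdio.h>

/* mine[i] / theirs[i]: execution counts of line i+1 in the local report and
 * in the oss-fuzz report. Prints each line covered by only one side and
 * returns how many lines only the local corpus covers. */
static size_t diff_line_coverage(const char *file,
                                 const unsigned long *mine,
                                 const unsigned long *theirs,
                                 size_t n_lines, FILE *out)
{
    size_t only_mine = 0;
    for (size_t i = 0; i < n_lines; i++) {
        if (mine[i] > 0 && theirs[i] == 0) {
            fprintf(out, "%s:%zu only hit by local corpus\n", file, i + 1);
            only_mine++;
        } else if (mine[i] == 0 && theirs[i] > 0) {
            fprintf(out, "%s:%zu only hit by oss-fuzz\n", file, i + 1);
        }
    }
    return only_mine;
}
```

With toy counts such as mine = {3, 0, 1} and theirs = {7, 2, 0}, it reports line 3 as hit only by the local corpus and line 2 as hit only by oss-fuzz, and returns 1.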

@thealberto
Copy link
Contributor

Hi @agoodm88 ,
is it possible for you to share your corpus please? How big is it? I would like to try it with my version of the fuzzer.

Thanks

@thealberto
Copy link
Contributor

Hi @agoodm88 ,
unfortunately I did not see any improvement as well :(

My current coverage is much worse than the one provided by oss-fuzz. I really hope I can get access to their corpus soon.

This is my coverage ATM
[screenshot of the coverage report]

@agoodm88
Copy link
Author

Their corpus will be the 'rolling' one with the source files (per the ones in the oss-fuzz folder on here) plus all the broken files hitting every error handler in the code.

I'm happy to throw 12-24 cores at it for a few days if you are willing to share it?

@thealberto
Copy link
Contributor

Hi,
I think it is something different, because I copied all those images but cannot reproduce the same results. Honestly, for the coverage you only need a few seconds rather than all those cores, but thanks for offering. I'm sure they would help in the future.

@kcc could you help us to understand how to achieve the same level of coverage?

Anyway, I think we are going a bit off topic, and we should continue here.

@jbowler
Copy link
Contributor

jbowler commented Sep 28, 2024

@ctruta: close; the issue was solved by oss-fuzz.
