Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic when parsing fuzzed file with malformed HTML in it #521

Closed
5225225 opened this issue Mar 22, 2021 · 5 comments
Closed

Panic when parsing fuzzed file with malformed HTML in it #521

5225225 opened this issue Mar 22, 2021 · 5 comments
Labels

Comments

@5225225
Copy link
Contributor

5225225 commented Mar 22, 2021

There's already the support for fuzzing here, but I found this issue through https://github.com/rust-fuzz/targets and wrote a simple cargo-fuzz fuzzing target that only checks that parsing arbitrary HTML doesn't panic. (Might be worth having a cargo-fuzz target in addition to the existing one, simply because cargo fuzz is easier to get set up and seems to be able to run test cases faster? Though I know testing for panics isn't the main purpose of the existing fuzzer)

#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &str| {
    let mut output = String::new();
    let parser = pulldown_cmark::Parser::new(data);
    pulldown_cmark::html::push_html(&mut output, parser);
});

Below's a standalone demo program. I tested that it crashes both with 0.8.0 from crates.io, as well as the latest version from git (d99667b, if just going to the directory in ~/.cargo/git/checkouts is correct).

fn main() {
    let mut output = String::new();
    let parser = pulldown_cmark::Parser::new(">\n >>><N\n");
    pulldown_cmark::html::push_html(&mut output, parser);
}

Backtrace is below

thread 'main' panicked at 'range start index 4 out of range for slice of length 3', /home/jess/.cargo/git/checkouts/pulldown-cmark-0e08cefe6e291038/d99667b/src/scanners.rs:1061:45
stack backtrace:
   0: rust_begin_unwind
             at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/std/src/panicking.rs:493:5
   1: core::panicking::panic_fmt
             at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/core/src/panicking.rs:92:14
   2: core::slice::index::slice_start_index_len_fail
             at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/core/src/slice/index.rs:34:5
   3: <core::ops::range::RangeFrom<usize> as core::slice::index::SliceIndex<[T]>>::index
             at /home/jess/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/index.rs:322:13
   4: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index
             at /home/jess/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/index.rs:15:9
   5: pulldown_cmark::scanners::scan_html_block_inner
             at /home/jess/.cargo/git/checkouts/pulldown-cmark-0e08cefe6e291038/d99667b/src/scanners.rs:1061:45
   6: pulldown_cmark::parse::Parser::scan_inline_html
             at /home/jess/.cargo/git/checkouts/pulldown-cmark-0e08cefe6e291038/d99667b/src/parse.rs:841:29
   7: pulldown_cmark::parse::Parser::handle_inline_pass1::{{closure}}
             at /home/jess/.cargo/git/checkouts/pulldown-cmark-0e08cefe6e291038/d99667b/src/parse.rs:243:29
   8: core::option::Option<T>::and_then
             at /home/jess/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/option.rs:724:24
   9: pulldown_cmark::parse::Parser::handle_inline_pass1
             at /home/jess/.cargo/git/checkouts/pulldown-cmark-0e08cefe6e291038/d99667b/src/parse.rs:242:43
  10: pulldown_cmark::parse::Parser::handle_inline
             at /home/jess/.cargo/git/checkouts/pulldown-cmark-0e08cefe6e291038/d99667b/src/parse.rs:196:9
  11: <pulldown_cmark::parse::Parser as core::iter::traits::iterator::Iterator>::next
             at /home/jess/.cargo/git/checkouts/pulldown-cmark-0e08cefe6e291038/d99667b/src/parse.rs:1453:21
  12: pulldown_cmark::html::HtmlWriter<I,W>::run
             at /home/jess/.cargo/git/checkouts/pulldown-cmark-0e08cefe6e291038/d99667b/src/html.rs:87:33
  13: pulldown_cmark::html::push_html
             at /home/jess/.cargo/git/checkouts/pulldown-cmark-0e08cefe6e291038/d99667b/src/html.rs:419:5
  14: scratchNTsHONsYl::main
             at ./main.rs:4:5
  15: core::ops::function::FnOnce::call_once
             at /home/jess/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
@marcusklaas
Copy link
Collaborator

marcusklaas commented Mar 25, 2021

Dang, I thought we had caught all panics by now. It's great that you found this, because it almost certainly means there is a bug in the crate. I will try to reproduce this using your code snippet, and subsequently track down the offending code.

Concerning the fuzzing: this is something that I've been thinking about for a while. I had actually considered to move the fuzzing code out of crate and make it a proper project of its own, because a good fuzzer could be useful for other markdown parsers as well. I haven't actually brought this up with any other contributors, so it's just to say that our direction with regards to fuzzing isn't quite clear yet.

In any case, thanks for opening this issue. This will help make pulldown more reliable!

@5225225
Copy link
Contributor Author

5225225 commented Mar 25, 2021

If it helps, I can't seem to get it to panic on any input shorter than 8 bytes.

And there seems to be multiple bugs, the input " \u{b}\\\r- " (not raw, that's a rust string literal) looks to panic on

thread '<unnamed>' panicked at 'begin <= end (1 <= 0) when slicing ` 
-   `', /home/jess/src/pulldown-cmark/src/parse.rs:1389:46           \
stack backtrace:
   0:     0x55d924e08930 - std::backtrace_rs::backtrace::libunwind::trace::h4dee703919bfd40a
                               at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/std/src/../../backtrace/src/backtrace/libunwind.rs:90:5
   1:     0x55d924e08930 - std::backtrace_rs::backtrace::trace_unsynchronized::h457e839f1a563e20
                               at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x55d924e08930 - std::sys_common::backtrace::_print_fmt::h86a55fb30f8393c8
                               at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x55d924e08930 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h7b3d6cac46d277e1
                               at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/std/src/sys_common/backtrace.rs:46:22
   4:     0x55d924e75f0f - core::fmt::write::h127419eb46f2ecc9
                               at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/core/src/fmt/mod.rs:1092:17
   5:     0x55d924dfca42 - std::io::Write::write_fmt::h6010cfbb4726588b
                               at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/std/src/io/mod.rs:1572:15
   6:     0x55d924e0c665 - std::sys_common::backtrace::_print::h79b4f9652330cc9d
                               at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/std/src/sys_common/backtrace.rs:49:5
   7:     0x55d924e0c665 - std::sys_common::backtrace::print::h330bb326a76af8cf
                               at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/std/src/sys_common/backtrace.rs:36:9
   8:     0x55d924e0c665 - std::panicking::default_hook::{{closure}}::heb6a42a7d50a472e
                               at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/std/src/panicking.rs:208:50
   9:     0x55d924e0c113 - std::panicking::default_hook::h17e521ba6d68d6e1
                               at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/std/src/panicking.rs:225:9
  10:     0x55d924d8c141 - libfuzzer_sys::initialize::{{closure}}::h58ed087a4bfb2657
  11:     0x55d924e0cdd0 - std::panicking::rust_panic_with_hook::h70db735e3a6e70cb
                               at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/std/src/panicking.rs:595:17
  12:     0x55d924e0c947 - std::panicking::begin_panic_handler::{{closure}}::h777c71c8e5a7e25c
                               at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/std/src/panicking.rs:497:13
  13:     0x55d924e08dec - std::sys_common::backtrace::__rust_end_short_backtrace::h3e9bf30168899554
                               at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/std/src/sys_common/backtrace.rs:141:18
  14:     0x55d924e0c8a9 - rust_begin_unwind
                               at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/std/src/panicking.rs:493:5
  15:     0x55d924b839d1 - core::panicking::panic_fmt::h5322a082d19786c3
                               at /rustc/61edfd591cedff66fca639c02f66984f6271e5a6/library/core/src/panicking.rs:92:14
  16:     0x55d924b83dfe - core::str::slice_error_fail::h4e664243c3c6a204
  17:     0x55d924d1e568 - pulldown_cmark::parse::item_to_event::h7dd71974418fa7a8
  18:     0x55d924d1ef31 - <pulldown_cmark::parse::Parser as core::iter::traits::iterator::Iterator>::next::hc6621a6ea07be5e0
  19:     0x55d924c32725 - pulldown_cmark::html::HtmlWriter<I,W>::run::h26f81a3cb529ddce
  20:     0x55d924c4b15b - pulldown_cmark::html::push_html::h8f9aab55c03c56c0
  21:     0x55d924c6dab4 - rust_fuzzer_test_input
  22:     0x55d924d8c191 - __rust_try
  23:     0x55d924d8bdf0 - LLVMFuzzerTestOneInput
  24:     0x55d924d82645 - _ZN6fuzzer6Fuzzer15ExecuteCallbackEPKhm
  25:     0x55d924d76b5b - _ZN6fuzzer10RunOneTestEPNS_6FuzzerEPKcm
  26:     0x55d924d7aaf7 - _ZN6fuzzer12FuzzerDriverEPiPPPcPFiPKhmE
  27:     0x55d924b840f3 - main
  28:     0x7f6fd062ab25 - __libc_start_main
  29:     0x55d924b8429e - _start
  30:                0x0 - <unknown>

(in item_to_event).

So my guess is that there's more than one bug, but it'll be difficult to find any more without fixing this main one.

I can just re-run the fuzzer after checking out the fix and report any more bugs either here or in new issues, depending on whatever whoever fixes this wants me to do.

@evverx
Copy link

evverx commented Jun 6, 2021

FWIW there is another panic the fuzz target triggers almost as soon as it starts (with a dictionary and seed corpus borrowed from https://github.com/google/oss-fuzz/blob/master/projects/cmark/build.sh):

INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 3405351740
INFO: Loaded 1 modules   (15097 inline 8-bit counters): 15097 [0x55c9cef8ac82, 0x55c9cef8e77b),
INFO: Loaded 1 PC tables (15097 PCs): 15097 [0x55c9cef8e780,0x55c9cefc9710),
./fuzz/target/x86_64-unknown-linux-gnu/release/fuzz_pulldown_cmark_read: Running 1 inputs 1 time(s) each.
Running: crash-fbbc8735058bdad5a1bce72cab76f7cb7bc91419
thread '<unnamed>' panicked at 'range start index 7 out of range for slice of length 6', /home/vagrant/RUST/pulldown-cmark/src/scanners.rs:871:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
==190440== ERROR: libFuzzer: deadly signal
    #0 0x55c9cebdf291 in __sanitizer_print_stack_trace /rustc/llvm/src/llvm-project/compiler-rt/lib/asan/asan_stack.cpp:86:3
    #1 0x55c9cec36430 in fuzzer::PrintStackTrace() (/home/vagrant/RUST/pulldown-cmark/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz_pulldown_cmark_read+0x1d8430)

it's just to say that our direction with regards to fuzzing isn't quite clear yet.

I think until it's decided another option would be to keep the "fuzz" directory in the OSS-Fuzz repository so that the project could be fuzzed continuously there. OSS-Fuzz automatically reports issues and closes them once they're fixed. I've opened google/oss-fuzz#5880. I'd appreciate it if you could let me know what you think. Thanks!

@loiclec
Copy link

loiclec commented Oct 8, 2021

Hi! I was testing pulldown-cmark to try my fuzzer, fuzzcheck, and also discovered three panics. I have added three minimal test cases below to reproduce them. The first two are already mentioned in this thread but the third one is new, I think.

fn parse(input: &str) {
    let parser = Parser::new(input);
    for event in parser {
        std::hint::black_box(event);
    }
}
#[test]
fn reproduce_crash_1() {
    let s = "><a\n";
    parse(s);
    // panicked at 'range start index 4 out of range for slice of length 3', src/scanners.rs:1061:45
}
#[test]
fn reproduce_crash2() {
    let s = "><a a\n";
    parse(s);
    // panicked at 'range start index 6 out of range for slice of length 5', src/scanners.rs:871:17
}
#[test]
fn reproduce_crash3() {
    let s = "><a a=\n毿>";
    parse(s);
    // panicked at 'invalid utf8: FromUtf8Error { bytes: [60, 97, 32, 97, 61, 10, 175, 191, 62], error: Utf8Error { valid_up_to: 6, error_len: Some(1) } }', src/parse.rs:252:61
}

@marcusklaas
Copy link
Collaborator

The two test cases you provided which indeed did crash pulldown in the past seem to no longer do that. I suspect it's due to the introduction of header attribute parsing done by @lo48576. I added two regression tests for this here, and reran your fuzzer for 10 minutes, but found nothing new.

Thanks a bunch for bringing this issue to our attention, and for fixing a bunch of them! I'll close this for now as there are no more known panics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants