diff --git a/crates/biome_markdown_parser/tests/fuzz_corpus/seed.jsonl b/crates/biome_markdown_parser/tests/fuzz_corpus/seed.jsonl
new file mode 100644
index 000000000000..ec11860bda6e
--- /dev/null
+++ b/crates/biome_markdown_parser/tests/fuzz_corpus/seed.jsonl
@@ -0,0 +1,102 @@
+{"markdown":"[valid-title]: /url \"title\"\n\n- outer\n - nested\n lazy line\n- Bar\n ---\n\n> first line\nlazy continuation\n\n- Foo\n ===\n\n","html":"\n
\n

first line\nlazy continuation

\n
\n\n"} +{"markdown":"> `x` *em* text\n> baz *em* **bold**\n\n indented\n> *italic* word\n\nbar content world bar bar\n\n> [text](url)\n> **strong**\n> `x` foo `code`\n\n","html":"
\n

x em text\nbaz em bold

\n
\n
indented\n
\n
\n

italic word

\n
\n

bar content world bar bar

\n
\n

text\nstrong\nx foo code

\n
\n"} +{"markdown":"> first line\n> continued\n\n code line\n\n- item one\n- item two\n- item three\n+ `x` `x`\n\nBar\n===\n\n","html":"
\n

first line\ncontinued

\n
\n
code line\n
\n\n\n

Bar

\n"} +{"markdown":" let x = 1;\n\n","html":"
let x = 1;\n
\n"} +{"markdown":"1. tag *em* `x` `x`\n2. **bold** **strong** [text](url) baz\n\n> foo\n> [link](url) foo *italic* tag\n\n###### Part\n\n- item\n\n ```\n code\n ```\n","html":"
    \n
  1. tag em x x
  2. \n
  3. bold strong text baz
  4. \n
\n
\n

foo\nlink foo italic tag

\n
\n
Part
\n\n"} +{"markdown":"***\n- Foo\n ===\n> first line\n> continued\n","html":"
\n\n
\n

first line\ncontinued

\n
\n"} +{"markdown":"- outer\n - nested\n lazy line\n","html":"\n"} +{"markdown":"text ok end.\n\n- Foo\n ===\n___\n\n- item\n\n ```\n code\n ```\n\n","html":"

text ok end.

\n
    \n
  • \n

    Foo

    \n
  • \n
\n
\n
    \n
  • \n

    item

    \n
    code\n
    \n
  • \n
\n"} +{"markdown":"- Foo\n ===\n","html":"
    \n
  • \n

    Foo

    \n
  • \n
\n"} +{"markdown":"[link]: /path\n\n> first line\n> continued\n\n> first line\nlazy continuation\nHeading\n===\n\n","html":"
\n

first line\ncontinued

\n
\n
\n

first line\nlazy continuation\nHeading\n===

\n
\n"} +{"markdown":"bar\nworld foo foo content\n\n","html":"

bar\nworld foo foo content

\n"} +{"markdown":"text ok end.\n\n> first line\n> continued\n[angle]: trailing\n> first line\nlazy continuation\n\n```md\nlet x = 1;\n```\n\n","html":"

text ok end.

\n
\n

first line\ncontinued\n[angle]: trailing\nfirst line\nlazy continuation

\n
\n
let x = 1;\n
\n"} +{"markdown":"> first line\n> continued\n1. `x` *italic*\n##### Section\n","html":"
\n

first line\ncontinued

\n
\n
    \n
  1. x italic
  2. \n
\n
Section
\n"} +{"markdown":"#### Heading\n\n- outer\n * nested\n lazy line\n\nbar world test test hello\n","html":"

Heading

\n
    \n
  • outer\n
      \n
    • nested\nlazy line
    • \n
    \n
  • \n
\n

bar world test test hello

\n"} +{"markdown":"- item\n\n ~~~\n code\n ~~~\n\n***\n\n* item one\n\n* item two\n\n* item three\n\n","html":"
    \n
  • \n

    item

    \n
    code\n
    \n
  • \n
\n
\n
    \n
  • \n

    item one

    \n
  • \n
  • \n

    item two

    \n
  • \n
  • \n

    item three

    \n
  • \n
\n"} +{"markdown":"- Foo\n ===\n\n code line\n\n","html":"
    \n
  • \n

    Foo

    \n

    code line

    \n
  • \n
\n"} +{"markdown":"- item\n\n ```\n code\n ```\n[bar]: https://example.com\n> first line\nlazy continuation\n\ntest bar hello\n\nhello baz foo\n\n","html":"
    \n
  • \n

    item

    \n
    code\n
    \n
  • \n
\n
\n

first line\nlazy continuation

\n
\n

test bar hello

\n

hello baz foo

\n"} +{"markdown":"> tag foo **strong** **strong**\n\n- outer\n * nested\n lazy line\n","html":"
\n

tag foo strong strong

\n
\n
    \n
  • outer\n
      \n
    • nested\nlazy line
    • \n
    \n
  • \n
\n"} +{"markdown":"~~~\nfn main() {}\n~~~\n","html":"
fn main() {}\n
\n"} +{"markdown":"> bar\n\n[bar]: https://example.com\nworld\n\ncontent content\n\n* foo [link](url)\n* [link](url) *em* tag\n* *italic* bar *em*\n* foo [link](url) bar\n\n","html":"
\n

bar

\n
\n

world

\n

content content

\n\n"} +{"markdown":"* text text bar\n* text foo tag\n","html":"
    \n
  • text text bar
  • \n
  • text foo tag
  • \n
\n"} +{"markdown":"* outer\n - nested\n lazy line\n\n> *em* tag
\n> [text](url)\n> text **strong** tag\n\n- Bar\n ---\n","html":"
    \n
  • outer\n
      \n
    • nested\nlazy line
    • \n
    \n
  • \n
\n
\n

em tag\ntext\ntext strong tag

\n
\n
    \n
  • \n

    Bar

    \n
  • \n
\n"} +{"markdown":"```\ncode here\n```\n\n- Bar\n ===\n\n","html":"
code here\n
\n
    \n
  • \n

    Bar

    \n
  • \n
\n"} +{"markdown":"- # Foo\n\n___\n","html":"
    \n
  • \n

    Foo

    \n
  • \n
\n
\n"} +{"markdown":"- outer\n * nested\n lazy line\n\n","html":"
    \n
  • outer\n
      \n
    • nested\nlazy line
    • \n
    \n
  • \n
\n"} +{"markdown":"text ok end.\n\nFoo\n---\n\n","html":"

text ok end.

\n

Foo

\n"} +{"markdown":"***\n\n indented\n","html":"
\n
indented\n
\n"} +{"markdown":"baz content\n- ## Bar\n\n[foo]: https://example.com \"my title\"\n1. tag
`x`\n\n","html":"

baz content

\n
    \n
  • \n

    Bar

    \n
  • \n
\n
    \n
  1. tag x
  2. \n
\n"} +{"markdown":"* item one\n\n* item two\n\n* item three\n* item one\n\n* item two\n\n* item three\n\n- Foo\n ===\n","html":"
    \n
  • \n

    item one

    \n
  • \n
  • \n

    item two

    \n
  • \n
  • \n

    item three

    \n
  • \n
  • \n

    item one

    \n
  • \n
  • \n

    item two

    \n
  • \n
  • \n

    item three

    \n
  • \n
\n
    \n
  • \n

    Foo

    \n
  • \n
\n"} +{"markdown":"- outer\n - nested\n lazy line\n\ntext ok end.\n\n","html":"
    \n
  • outer\n
      \n
    • nested\nlazy line
    • \n
    \n
  • \n
\n

text ok end.

\n"} +{"markdown":"## Title\n~~~md\nfn main() {}\n~~~\n","html":"

Title

\n
fn main() {}\n
\n"} +{"markdown":"[angle]: trailing\n\n","html":"

[angle]: trailing

\n"} +{"markdown":"hello\n\n- Foo\n ===\n> first line\nlazy continuation\n\n~~~rust\nfn main() {}\n~~~\n[angle]: trailing\n","html":"

hello

\n
    \n
  • \n

    Foo

    \n
  • \n
\n
\n

first line\nlazy continuation

\n
\n
fn main() {}\n
\n

[angle]: trailing

\n"} +{"markdown":"* item one\n* item two\n* item three\n```\nlet x = 1;\n```\n\n","html":"
    \n
  • item one
  • \n
  • item two
  • \n
  • item three
  • \n
\n
let x = 1;\n
\n"} +{"markdown":"content content baz foo baz\n- item\n\n ~~~\n code\n ~~~\n\n> baz *italic*\n> **strong** [link](url) `x`\nFoo\n---\n\n","html":"

content content baz foo baz

\n
    \n
  • \n

    item

    \n
    code\n
    \n
  • \n
\n
\n

baz italic\nstrong link x\nFoo

\n
\n
\n"} +{"markdown":"* `code` `code` [text](url) *em*\n* *italic* foo\n* tag **strong**\n\n[valid-title]: /url \"title\"\n","html":"
    \n
  • code code text em
  • \n
  • italic foo
  • \n
  • tag strong
  • \n
\n"} +{"markdown":"- Bar\n ===\n> first line\nlazy continuation\n\n- item one\n\n- item two\n\n- item three\n","html":"
    \n
  • \n

    Bar

    \n
  • \n
\n
\n

first line\nlazy continuation

\n
\n
    \n
  • \n

    item one

    \n
  • \n
  • \n

    item two

    \n
  • \n
  • \n

    item three

    \n
  • \n
\n"} +{"markdown":"###### Part ######\n\n- item\n\n ```\n code\n ```\n","html":"
Part
\n
    \n
  • \n

    item

    \n
    code\n
    \n
  • \n
\n"} +{"markdown":"- item\n\n ~~~\n code\n ~~~\n> [link](url) `code`\n> `code` bar `x`\n> *italic* bar **strong** [link](url)\n\n* foo [text](url) **strong** [text](url)\n* word **bold** tag bar\n\n","html":"
    \n
  • \n

    item

    \n
    code\n
    \n
  • \n
\n
\n

link code\ncode bar x\nitalic bar strong link

\n
\n
    \n
  • foo text strong text
  • \n
  • word bold tag bar
  • \n
\n"} +{"markdown":"- ### Foo\n1. `code`\n\n","html":"
    \n
  • \n

    Foo

    \n
  • \n
\n
    \n
  1. code
  2. \n
\n"} +{"markdown":"world hello bar test bar\n\n***\n\n#### Heading ####\n\n- # Bar\n\n> first line\n> continued\n\n","html":"

world hello bar test bar

\n
\n

Heading

\n
    \n
  • \n

    Bar

    \n
  • \n
\n
\n

first line\ncontinued

\n
\n"} +{"markdown":"Bar\n===\n\nworld foo\n\n- ## Foo\n\n","html":"

Bar

\n

world foo

\n
    \n
  • \n

    Foo

    \n
  • \n
\n"} +{"markdown":"[valid]: /url\n\n- Foo\n ===\n- Bar\n ---\n- Bar\n ===\n\n","html":"
    \n
  • \n

    Foo

    \n
  • \n
  • \n

    Bar

    \n
  • \n
  • \n

    Bar

    \n
  • \n
\n"} +{"markdown":" indented\n\n[valid-title]: /url \"title\"\n___\n","html":"
indented\n
\n
\n"} +{"markdown":"- item one\n- item two\n- item three\n[valid]: /url\n let x = 1;\n\n","html":"
    \n
  • item one
  • \n
  • item two
  • \n
  • item three\n[valid]: /url\nlet x = 1;
  • \n
\n"} +{"markdown":"* [link](url)\n* **bold**\n* `code` **strong**\n* [link](url) *italic* *em* *em*\n> first line\nlazy continuation\ntext ok end.\n\n","html":"
    \n
  • link
  • \n
  • bold
  • \n
  • code strong
  • \n
  • link italic em em
  • \n
\n
\n

first line\nlazy continuation\ntext ok end.

\n
\n"} +{"markdown":"[angle]: trailing\n","html":"

[angle]: trailing

\n"} +{"markdown":"Heading\n---\n\n","html":"

Heading

\n"} +{"markdown":"[angle]: trailing\n\nworld\n","html":"

[angle]: trailing

\n

world

\n"} +{"markdown":"~~~js\nfn main() {}\n~~~\n\n","html":"
fn main() {}\n
\n"} +{"markdown":"- item one\n- item two\n- item three\n","html":"
    \n
  • item one
  • \n
  • item two
  • \n
  • item three
  • \n
\n"} +{"markdown":"+ bar *italic*\n+ `x`\n\n","html":"
    \n
  • bar italic
  • \n
  • x
  • \n
\n"} +{"markdown":"- item one\n- item two\n- item three\n```\ncode here\n```\n[valid-title]: /url \"title\"\n\n- outer\n * nested\n lazy line\n1. bar\n2. **strong** foo **strong**\n3. bar *italic* `x`\n\n","html":"
    \n
  • item one
  • \n
  • item two
  • \n
  • item three
  • \n
\n
code here\n
\n
    \n
  • outer\n
      \n
    • nested\nlazy line
    • \n
    \n
  • \n
\n
    \n
  1. bar
  2. \n
  3. strong foo strong
  4. \n
  5. bar italic x
  6. \n
\n"} +{"markdown":" let x = 1;\n\n- `x`\n- *em* tag `x` text\n- baz *em*\n- `x` **strong** *italic* foo\n\n> first line\nlazy continuation\n","html":"
let x = 1;\n
\n
    \n
  • x
  • \n
  • em tag x text
  • \n
  • baz em
  • \n
  • x strong italic foo
  • \n
\n
\n

first line\nlazy continuation

\n
\n"} +{"markdown":"- item\n\n ```\n code\n ```\n\n> `code` bar bar\n","html":"
    \n
  • \n

    item

    \n
    code\n
    \n
  • \n
\n
\n

code bar bar

\n
\n"} +{"markdown":"> first line\nlazy continuation\n let x = 1;\n\n1. bar *italic* **strong** [text](url)\n2. tag
\n","html":"
\n

first line\nlazy continuation\nlet x = 1;

\n
\n
    \n
  1. bar italic strong text
  2. \n
  3. tag
  4. \n
\n"} +{"markdown":"* `x` [link](url)\n* *em* bar tag
\n","html":"
    \n
  • x link
  • \n
  • em bar tag
  • \n
\n"} +{"markdown":"[valid]: /url\n\n- baz *italic*\n","html":"
    \n
  • baz italic
  • \n
\n"} +{"markdown":"- Bar\n ===\n* `x` *italic* tag bar\n* `code` **strong** foo\n\n","html":"
    \n
  • \n

    Bar

    \n
  • \n
\n
    \n
  • x italic tag bar
  • \n
  • code strong foo
  • \n
\n"} +{"markdown":"+ `x` **bold**\n+ bar `code` [text](url) text\n+ bar\n\n[valid]: /url\n- Bar\n ===\n- Bar\n ===\n\n###### Heading\n","html":"
    \n
  • x bold
  • \n
  • bar code text text
  • \n
  • bar
  • \n
\n
    \n
  • \n

    Bar

    \n
  • \n
  • \n

    Bar

    \n
  • \n
\n
Heading
\n"} +{"markdown":"1. bar tag `x` word\n2. `code` **bold** [link](url)\n\n1. *em*\n2. tag tag [link](url)\n> first line\nlazy continuation\n let x = 1;\n","html":"
    \n
  1. \n

    bar tag x word

    \n
  2. \n
  3. \n

    code bold link

    \n
  4. \n
  5. \n

    em

    \n
  6. \n
  7. \n

    tag tag link

    \n
  8. \n
\n
\n

first line\nlazy continuation\nlet x = 1;

\n
\n"} +{"markdown":"- Bar\n ===\n","html":"
    \n
  • \n

    Bar

    \n
  • \n
\n"} +{"markdown":"###### Part\n\n> word tag\n\n> `code` *italic*\n> foo tag\n","html":"
Part
\n
\n

word tag

\n
\n
\n

code italic\nfoo tag

\n
\n"} +{"markdown":"[foo]: /path\n\nBar\n---\n\n- item\n\n ```\n code\n ```\n\n","html":"

Bar

\n
    \n
  • \n

    item

    \n
    code\n
    \n
  • \n
\n"} +{"markdown":"* item one\n\n* item two\n\n* item three\n","html":"
    \n
  • \n

    item one

    \n
  • \n
  • \n

    item two

    \n
  • \n
  • \n

    item three

    \n
  • \n
\n"} +{"markdown":"- Bar\n ===\n\n","html":"
    \n
  • \n

    Bar

    \n
  • \n
\n"} +{"markdown":"> [link](url) [text](url) `x` bar\n> tag **strong** foo\n> tag tag `code` **bold**\n let x = 1;\n\n","html":"
\n

link text x bar\ntag strong foo\ntag tag code bold\nlet x = 1;

\n
\n"} +{"markdown":"> first line\nlazy continuation\n\n","html":"
\n

first line\nlazy continuation

\n
\n"} +{"markdown":"1. **bold** [text](url) **strong**\n2. [text](url) *italic* `x`\n> tag bar\n- ## Foo\n\n","html":"
    \n
  1. bold text strong
  2. \n
  3. text italic x
  4. \n
\n
\n

tag bar

\n
\n
    \n
  • \n

    Foo

    \n
  • \n
\n"} +{"markdown":" indented\n\n let x = 1;\n\n> tag
\n> text bar\n\n> first line\n> continued\n- item\n\n ~~~\n code\n ~~~\n\n","html":"
indented\n\nlet x = 1;\n
\n
\n

tag\ntext bar

\n
\n
\n

first line\ncontinued

\n
\n
    \n
  • \n

    item

    \n
    code\n
    \n
  • \n
\n"} +{"markdown":"- item\n\n ~~~\n code\n ~~~\n","html":"
    \n
  • \n

    item

    \n
    code\n
    \n
  • \n
\n"} +{"markdown":"text ok end.\n\n___\n> *em* [text](url) *em* **strong**\n> [link](url) **bold**\n> **bold**\n\nhello bar foo\n- item\n\n ~~~\n code\n ~~~\n\n","html":"

text ok end.

\n
\n
\n

em text em strong\nlink bold\nbold

\n
\n

hello bar foo

\n
    \n
  • \n

    item

    \n
    code\n
    \n
  • \n
\n"} +{"markdown":"* outer\n - nested\n lazy line\n> **bold** `code`\n> `x` `x` [text](url)\n> tag
\n* item one\n\n* item two\n\n* item three\n\n","html":"
    \n
  • outer\n
      \n
    • nested\nlazy line
    • \n
    \n
  • \n
\n
\n

bold code\nx x text\ntag

\n
\n
    \n
  • \n

    item one

    \n
  • \n
  • \n

    item two

    \n
  • \n
  • \n

    item three

    \n
  • \n
\n"} +{"markdown":"#### Heading ####\n> first line\nlazy continuation\n[valid]: /url\n- ## Foo\n","html":"

Heading

\n
\n

first line\nlazy continuation\n[valid]: /url

\n
\n
    \n
  • \n

    Foo

    \n
  • \n
\n"} +{"markdown":"* outer\n - nested\n lazy line\n\n[valid]: /url\n\n- ### Foo\n+ baz tag\n","html":"
    \n
  • outer\n
      \n
    • nested\nlazy line
    • \n
    \n
  • \n
\n
    \n
  • \n

    Foo

    \n
  • \n
\n
    \n
  • baz tag
  • \n
\n"} +{"markdown":"* outer\n * nested\n lazy line\n\n","html":"
    \n
  • outer\n
      \n
    • nested\nlazy line
    • \n
    \n
  • \n
\n"} +{"markdown":"[bar]: /path \"my title\"\n\n","html":""} +{"markdown":"> first line\n> continued\n> first line\nlazy continuation\n\n code line\n\n##### Part\n\n","html":"
\n

first line\ncontinued\nfirst line\nlazy continuation

\n
\n
code line\n
\n
Part
\n"} +{"markdown":"- item\n\n ```\n code\n ```\n\nworld test foo\n```\nlet x = 1;\n```\n\n- item one\n\n- item two\n\n- item three\n\n- Foo\n ---\n\n","html":"
    \n
  • \n

    item

    \n
    code\n
    \n
  • \n
\n

world test foo

\n
let x = 1;\n
\n
    \n
  • \n

    item one

    \n
  • \n
  • \n

    item two

    \n
  • \n
  • \n

    item three

    \n
  • \n
  • \n

    Foo

    \n
  • \n
\n"} +{"markdown":"> **bold** *italic*\n\n* outer\n - nested\n lazy line\n\n# Heading\n\n> first line\nlazy continuation\n[invalid]: /url trailing text\n\n","html":"
\n

bold italic

\n
\n
    \n
  • outer\n
      \n
    • nested\nlazy line
    • \n
    \n
  • \n
\n

Heading

\n
\n

first line\nlazy continuation\n[invalid]: /url trailing text

\n
\n"} +{"markdown":"bar test\n","html":"

bar test

\n"} +{"markdown":"---\n- item\n\n ~~~\n code\n ~~~\n\n","html":"
\n
    \n
  • \n

    item

    \n
    code\n
    \n
  • \n
\n"} +{"markdown":"# Heading #\n\n- Foo\n ===\n\n[valid-title]: /url \"title\"\n\n","html":"

Heading

\n
    \n
  • \n

    Foo

    \n
  • \n
\n"} +{"markdown":"> first line\n> continued\n##### Title\n\n","html":"
\n

first line\ncontinued

\n
\n
Title
\n"} +{"markdown":"- item\n\n ```\n code\n ```\n\n[foo]: https://example.com\n\n","html":"
    \n
  • \n

    item

    \n
    code\n
    \n
  • \n
\n"} +{"markdown":"```rust\ncode here\n```\n\n* item one\n\n* item two\n\n* item three\n* outer\n - nested\n lazy line\n\n","html":"
code here\n
\n
    \n
  • \n

    item one

    \n
  • \n
  • \n

    item two

    \n
  • \n
  • \n

    item three

    \n
  • \n
  • \n

    outer

    \n
      \n
    • nested\nlazy line
    • \n
    \n
  • \n
\n"} +{"markdown":"hello content bar\n","html":"

hello content bar

\n"} +{"markdown":"foo\n\n* tag [text](url)\n* [text](url) text *italic*\n* foo\n* bar `code`\n+ tag tag\n+ bar `x`\n\n+ bar tag bar *em*\n1. tag tag tag\n2. tag\n3. [text](url) **strong**\n\n","html":"

foo

\n
    \n
  • tag text
  • \n
  • text text italic
  • \n
  • foo
  • \n
  • bar code
  • \n
\n
    \n
  • \n

    tag tag

    \n
  • \n
  • \n

    bar x

    \n
  • \n
  • \n

    bar tag bar em

    \n
  • \n
\n
    \n
  1. tag tag tag
  2. \n
  3. tag
  4. \n
  5. text strong
  6. \n
\n"} +{"markdown":"- item\n\n ```\n code\n ```\n\n> first line\nlazy continuation\n\n- item\n\n ~~~\n code\n ~~~\n\n- # Foo\n\n","html":"
    \n
  • \n

    item

    \n
    code\n
    \n
  • \n
\n
\n

first line\nlazy continuation

\n
\n
    \n
  • \n

    item

    \n
    code\n
    \n
  • \n
  • \n

    Foo

    \n
  • \n
\n"} +{"markdown":"1. foo\n2. bar [link](url) `code` word\n3. tag tag
`code` [text](url)\n> [text](url)\n> tag `code` *italic* bar\n> bar foo foo [link](url)\n","html":"
    \n
  1. foo
  2. \n
  3. bar link code word
  4. \n
  5. tag tag code text
  6. \n
\n
\n

text\ntag code italic bar\nbar foo foo link

\n
\n"} +{"markdown":"[foo]: /url\n+ [text](url)\n+ *em*\n+ **strong**\n+ foo\n\ntext ok end.\n\nbaz\n","html":"
    \n
  • text
  • \n
  • em
  • \n
  • strong
  • \n
  • foo
  • \n
\n

text ok end.

\n

baz

\n"} +{"markdown":"[valid]: /url\n\n* outer\n * nested\n lazy line\n> first line\nlazy continuation\n[valid-title]: /url \"title\"\n","html":"
    \n
  • outer\n
      \n
    • nested\nlazy line
    • \n
    \n
  • \n
\n
\n

first line\nlazy continuation\n[valid-title]: /url "title"

\n
\n"} +{"markdown":"[valid-title]: /url \"title\"\n\n___\n* outer\n - nested\n lazy line\n\n[valid]: /url\n> tag
\n> **strong**\n> bar\n","html":"
\n
    \n
  • outer\n
      \n
    • nested\nlazy line
    • \n
    \n
  • \n
\n
\n

tag\nstrong\nbar

\n
\n"} +{"markdown":"- outer\n * nested\n lazy line\n> **bold**\n\n let x = 1;\n\n- item\n\n ```\n code\n ```\n\n","html":"
    \n
  • outer\n
      \n
    • nested\nlazy line
    • \n
    \n
  • \n
\n
\n

bold

\n
\n
let x = 1;\n
\n
    \n
  • \n

    item

    \n
    code\n
    \n
  • \n
\n"} +{"markdown":"- outer\n - nested\n lazy line\n\n","html":"
    \n
  • outer\n
      \n
    • nested\nlazy line
    • \n
    \n
  • \n
\n"} +{"markdown":"- # Foo\n\n> first line\nlazy continuation\n\n> **bold** **strong** bar bar\n\n","html":"
    \n
  • \n

    Foo

    \n
  • \n
\n
\n

first line\nlazy continuation

\n
\n
\n

bold strong bar bar

\n
\n"} +{"markdown":"text ok end.\n+ [text](url) bar\n+ *em* `code` [link](url)\n***\n\n","html":"

text ok end.

\n\n
\n"} +{"markdown":"Foo\n---\n###### Title\n","html":"

Foo

\n
Title
\n"} +{"markdown":"~~~rust\nfn main() {}\n~~~\n","html":"
fn main() {}\n
\n"} +{"markdown":"Foo\n===\n\n- ## Foo\n\n> bar *em* tag\n\n","html":"

Foo

\n
    \n
  • \n

    Foo

    \n
  • \n
\n
\n

bar em tag

\n
\n"} +{"markdown":"```js\ncode here\n```\n\n","html":"
code here\n
\n"} +{"markdown":"- ## Foo\n> *em* `x` [link](url) *em*\n\n","html":"
    \n
  • \n

    Foo

    \n
  • \n
\n
\n

em x link em

\n
\n"}
diff --git a/crates/biome_markdown_parser/tests/fuzz_differential.rs b/crates/biome_markdown_parser/tests/fuzz_differential.rs
new file mode 100644
index 000000000000..669aeedad137
--- /dev/null
+++ b/crates/biome_markdown_parser/tests/fuzz_differential.rs
@@ -0,0 +1,167 @@
+//! Differential fuzzer: compares Biome's markdown HTML output against
+//! commonmark.js reference output from a pre-generated corpus.
+//!
+//! The checked-in seed corpus (`seed.jsonl`) contains only passing cases.
+//! Any failure is either a regression or a newly discovered mismatch.
+//!
+//! Run with: cargo test -p biome_markdown_parser --test fuzz_differential -- --ignored --nocapture
+
+use biome_markdown_parser::{document_to_html, parse_markdown};
+use biome_markdown_syntax::MdDocument;
+use biome_rowan::AstNode;
+use std::fs;
+use std::path::{Path, PathBuf};
+
+/// Normalize HTML for comparison, preserving whitespace inside `<pre>` blocks.
+/// Matches the normalization in `xtask/coverage/src/markdown/commonmark.rs`.
+fn normalize_html(html: &str) -> String {
+    let mut result = Vec::new();
+    let mut in_pre = false;
+
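+    for line in html.lines() {
+        if line.contains("<pre") {
+            in_pre = true;
+        }
+        if in_pre {
+            result.push(line.to_string());
+        } else {
+            let trimmed = line.trim();
+            if !trimmed.is_empty() {
+                result.push(trimmed.to_string());
+            }
+        }
+        if line.contains("</pre>") {
+            in_pre = false;
+        }
+    }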
+
+    result.join("\n").trim().to_string() + "\n"
+}
+
+/// FNV-1a 64-bit hash — deterministic across Rust toolchain versions.
+fn content_hash(s: &str) -> String {
+    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
+    for byte in s.as_bytes() {
+        hash ^= *byte as u64;
+        hash = hash.wrapping_mul(0x0100_0000_01b3);
+    }
+    format!("{hash:016x}")
+}
+
+#[derive(serde::Deserialize)]
+struct SeedCase {
+    markdown: String,
+    html: String,
+}
+
+struct Failure {
+    hash: String,
+    markdown: String,
+    expected: String,
+    actual: String,
+}
+
+fn run_corpus(path: &Path) -> (Vec<Failure>, usize) {
+    let content = fs::read_to_string(path)
+        .unwrap_or_else(|e| panic!("Failed to read corpus {}: {e}", path.display()));
+
+    let mut failures = vec![];
+    let mut total = 0usize;
+
+    for (i, line) in content.lines().enumerate() {
+        if line.trim().is_empty() {
+            continue;
+        }
+
+        let entry: SeedCase = serde_json::from_str(line)
+            .unwrap_or_else(|e| panic!("Malformed JSON at {}:{}: {e}", path.display(), i + 1));
+
+        let markdown = &entry.markdown;
+        let expected_html = &entry.html;
+        total += 1;
+
+        let parsed = parse_markdown(markdown);
+        let Some(doc) = MdDocument::cast(parsed.syntax()) else {
+            failures.push(Failure {
+                hash: content_hash(markdown),
+                markdown: markdown.clone(),
+                expected: expected_html.clone(),
+                actual: "".to_string(),
+            });
+            continue;
+        };
+
+        let actual = document_to_html(
+            &doc,
+            parsed.list_tightness(),
+            parsed.list_item_indents(),
+            parsed.quote_indents(),
+        );
+
+        let expected_normalized = normalize_html(expected_html);
+        let actual_normalized = normalize_html(&actual);
+
+        if expected_normalized != actual_normalized {
+            failures.push(Failure {
+                hash: content_hash(markdown),
+                markdown: markdown.clone(),
+                expected: expected_html.clone(),
+                actual,
+            });
+        }
+    }
+
+    (failures, total)
+}
+
+#[test]
+#[ignore]
+fn differential_fuzz_against_commonmark_js() {
+    let manifest_dir = Path::new(env!("CARGO_MANIFEST_DIR"));
+    let corpus_dir = manifest_dir.join("tests/fuzz_corpus");
+
+    // Always run the checked-in seed corpus (passing cases only)
+    let seed_path = corpus_dir.join("seed.jsonl");
+    let (mut all_failures, mut total) = run_corpus(&seed_path);
+
+    // Optionally run an extended corpus if FUZZ_CORPUS env var is set
+    if let Ok(extra_path) = std::env::var("FUZZ_CORPUS") {
+        let (extra_failures, extra_total) = run_corpus(Path::new(&extra_path));
+        all_failures.extend(extra_failures);
+        total += extra_total;
+    }
+
+    // Write failure artifacts if FUZZ_FAILURES_DIR is set
+    if let Ok(failures_dir) = std::env::var("FUZZ_FAILURES_DIR") {
+        let dir = PathBuf::from(&failures_dir);
+        fs::create_dir_all(&dir).expect("Failed to create failures directory");
+
+        for failure in &all_failures {
+            let base = dir.join(&failure.hash);
+            fs::write(base.with_extension("md"), &failure.markdown).ok();
+            fs::write(base.with_extension("expected.html"), &failure.expected).ok();
+            fs::write(base.with_extension("actual.html"), &failure.actual).ok();
+        }
+    }
+
+    // Print summary
+    let passed = total - all_failures.len();
+    eprintln!(
+        "\nDifferential fuzz: {total} cases, {passed} passed, {} failed",
+        all_failures.len()
+    );
+
+    if !all_failures.is_empty() {
+        eprintln!("\n=== {} differential failures ===\n", all_failures.len());
+        for (i, f) in all_failures.iter().enumerate().take(10) {
+            eprintln!("--- Failure {} [{}] ---", i + 1, f.hash);
+            eprintln!("Input:\n{}", f.markdown);
+            eprintln!("Expected:\n{}", f.expected);
+            eprintln!("Actual:\n{}", f.actual);
+            eprintln!();
+        }
+        if all_failures.len() > 10 {
+            eprintln!("... and {} more", all_failures.len() - 10);
+        }
+        panic!("{} differential mismatches found", all_failures.len());
+    }
+
+    eprintln!("All cases passed.");
+}
diff --git a/crates/biome_markdown_parser/tests/fuzz_generate_corpus.cjs b/crates/biome_markdown_parser/tests/fuzz_generate_corpus.cjs
new file mode 100644
index 000000000000..8e02bfd092ce
--- /dev/null
+++ b/crates/biome_markdown_parser/tests/fuzz_generate_corpus.cjs
@@ -0,0 +1,295 @@
+#!/usr/bin/env node
+// Differential fuzzer corpus generator for Biome's markdown parser.
+// Generates random markdown inputs from construct combinators and renders
+// reference HTML via commonmark.js.
+//
+// Usage:
+//   node fuzz_generate_corpus.cjs [--count=N] [--seed=N] [--output=path]
+//
+// Requires `pnpm install` from the repo root (commonmark is a root devDependency).
+
+"use strict";
+
+const { writeFileSync } = require("node:fs");
+
+// Parse CLI args
+const args = Object.fromEntries(
+  process.argv.slice(2).map((a) => {
+    const [k, v] = a.replace(/^--/, "").split("=");
+    return [k, v];
+  })
+);
+
+const count = parseInt(args.count || "1000", 10);
+const seed = parseInt(args.seed || "42", 10);
+const outputPath = args.output || "corpus.jsonl";
+
+// Load commonmark via require(). Node resolves the package from this script's
+// directory upward, so it relies on node_modules/commonmark at the repo root.
+const { Parser, HtmlRenderer } = require("commonmark");
+
+const parser = new Parser();
+const renderer = new HtmlRenderer();
+
+function render(md) {
+  return renderer.render(parser.parse(md));
+}
+
+// #region Seeded PRNG (xorshift32)
+let rngState = seed === 0 ? 1 : seed;
+function rand() {
+  rngState ^= rngState << 13;
+  rngState ^= rngState >> 17;
+  rngState ^= rngState << 5;
+  return (rngState >>> 0) / 0x100000000;
+}
+function randInt(min, max) {
+  return min + Math.floor(rand() * (max - min + 1));
+}
+function pick(arr) {
+  return arr[randInt(0, arr.length - 1)];
+}
+function maybe(prob = 0.5) {
+  return rand() < prob;
+}
+// #endregion
+
+// #region Construct combinators
+
+function genParagraph() {
+  const words = ["foo", "bar", "baz", "hello", "world", "test", "content"];
+  const len = randInt(1, 5);
+  return Array.from({ length: len }, () => pick(words)).join(" ") + "\n";
+}
+
+function genAtxHeading() {
+  const level = randInt(1, 6);
+  const text = pick(["Heading", "Title", "Section", "Part"]);
+  const trailing = maybe(0.3) ? " " + "#".repeat(level) : "";
+  return "#".repeat(level) + " " + text + trailing + "\n";
+}
+
+function genSetextHeading() {
+  const text = pick(["Foo", "Bar", "Heading"]);
+  const marker = maybe(0.5) ? "---" : "===";
+  return text + "\n" + marker + "\n";
+}
+
+function genThematicBreak() {
+  return pick(["---", "***", "___"]) + "\n";
+}
+
+function genBulletList() {
+  const items = randInt(1, 4);
+  const marker = pick(["-", "*", "+"]);
+  let result = "";
+  for (let i = 0; i < items; i++) {
+    result += marker + " " + genInlineContent() + "\n";
+  }
+  return result;
+}
+
+function genOrderedList() {
+  const items = randInt(1, 3);
+  let result = "";
+  for (let i = 0; i < items; i++) {
+    result += (i + 1) + ". " + genInlineContent() + "\n";
+  }
+  return result;
+}
+
+function genBlockquote() {
+  const lines = randInt(1, 3);
+  let result = "";
+  for (let i = 0; i < lines; i++) {
+    result += "> " + genInlineContent() + "\n";
+  }
+  return result;
+}
+
+function genFencedCode() {
+  const fence = maybe(0.5) ? "```" : "~~~";
+  const lang = maybe(0.5) ? pick(["js", "rust", "md", ""]) : "";
+  const body = pick(["let x = 1;", "code here", "fn main() {}"]);
+  return fence + lang + "\n" + body + "\n" + fence + "\n";
+}
+
+function genIndentedCode() {
+  return "    " + pick(["code line", "let x = 1;", "indented"]) + "\n";
+}
+
+function genLinkRefDef() {
+  const label = pick(["foo", "bar", "link"]);
+  const url = pick(["/url", "https://example.com", "/path"]);
+  const title = maybe(0.3) ? ' "' + pick(["title", "my title"]) + '"' : "";
+  return "[" + label + "]: " + url + title + "\n";
+}
+
+function genInlineContent() {
+  const parts = [];
+  const len = randInt(1, 4);
+  for (let i = 0; i < len; i++) {
+    const kind = randInt(0, 6);
+    switch (kind) {
+      case 0: parts.push(pick(["foo", "bar", "baz", "text", "word"])); break;
+      case 1: parts.push("*" + pick(["em", "italic"]) + "*"); break;
+      case 2: parts.push("**" + pick(["bold", "strong"]) + "**"); break;
+      case 3: parts.push("`" + pick(["code", "x"]) + "`"); break;
+      case 4: parts.push("[" + pick(["link", "text"]) + "](url)"); break;
+      case 5: parts.push("<" + pick(["span", "b", "i"]) + ">tag"); break;
+      case 6: parts.push(pick(["foo", "bar"])); break;
+    }
+  }
+  return parts.join(" ");
+}
+
+// #endregion
+
+// #region Interaction combinators (the high-value generators)
+
+function genHeadingInList() {
+  const heading = maybe(0.5)
+    ? "#".repeat(randInt(1, 3)) + " " + pick(["Foo", "Bar"])
+    : pick(["Foo", "Bar"]) + "\n  " + pick(["---", "==="]);
+  return "- " + heading + "\n";
+}
+
+function genSetextInBlockquote() {
+  const text = pick(["Foo", "Bar", "Content"]);
+  const marker = maybe(0.5) ? "---" : "===";
+  return "> " + text + "\n> " + marker + "\n";
+}
+
+function genCodeInList() {
+  const fence = maybe(0.5) ? "```" : "~~~";
+  const indent = "  ";
+  return "- item\n\n" + indent + fence + "\n" + indent + "code\n" + indent + fence + "\n";
+}
+
+function genInlineHtmlNearBlockquote() {
+  // Valid multiline tag (attr on next line, not starting with >)
+  const valid = "text ok end.\n";
+  // Invalid multiline tag (> at line start = blockquote)
+  const invalid = "text 
ok
end.\n";
+  return maybe(0.5) ? valid : invalid;
+}
+
+function genMixedListMarkers() {
+  const m1 = pick(["-", "*", "+"]);
+  let m2 = pick(["-", "*", "+"]);
+  while (m2 === m1) m2 = pick(["-", "*", "+"]);
+  return m1 + " item one\n\n" + m2 + " item two\n";
+}
+
+function genNestedListLazyContinuation() {
+  const outer = pick(["-", "*"]);
+  const inner = pick(["-", "*"]);
+  return outer + " outer\n " + inner + " nested\n lazy line\n";
+}
+
+function genLinkDefWithTrailing() {
+  return pick([
+    "[valid]: /url\n",
+    "[valid-title]: /url \"title\"\n",
+    "[invalid]: /url trailing text\n",
+    "[angle]: trailing\n",
+  ]);
+}
+
+function genListWithBlankLines() {
+  const marker = pick(["-", "*"]);
+  const tight = maybe(0.5);
+  let result = marker + " item one\n";
+  if (!tight) result += "\n";
+  result += marker + " item two\n";
+  if (!tight) result += "\n";
+  result += marker + " item three\n";
+  return result;
+}
+
+function genBlockquoteWithContinuation() {
+  const lazy = maybe(0.5);
+  let result = "> first line\n";
+  if (lazy) {
+    result += "lazy continuation\n";
+  } else {
+    result += "> continued\n";
+  }
+  return result;
+}
+
+// #endregion
+
+// #region Document generator
+
+const blockGenerators = [
+  { fn: genParagraph, weight: 2 },
+  { fn: genAtxHeading, weight: 2 },
+  { fn: genSetextHeading, weight: 1 },
+  { fn: genThematicBreak, weight: 1 },
+  { fn: genBulletList, weight: 2 },
+  { fn: genOrderedList, weight: 1 },
+  { fn: genBlockquote, weight: 2 },
+  { fn: genFencedCode, weight: 1 },
+  { fn: genIndentedCode, weight: 1 },
+  { fn: genLinkRefDef, weight: 1 },
+  // Interaction combinators — higher weight to bias toward interaction bugs
+  { fn: genHeadingInList, weight: 3 },
+  { fn: genSetextInBlockquote, weight: 3 },
+  { fn: genCodeInList, weight: 2 },
+  { fn: genInlineHtmlNearBlockquote, weight: 2 },
+  { fn: genMixedListMarkers, weight: 2 },
+  { fn: genNestedListLazyContinuation, weight: 2 },
+  { fn: genLinkDefWithTrailing, weight: 2 },
+  { fn: genListWithBlankLines, weight: 2 },
+  { fn: genBlockquoteWithContinuation, weight: 2 },
+];
+
+const totalWeight = blockGenerators.reduce((sum, g) => sum + g.weight, 0);
+
+function pickWeighted() {
+  let r = rand() * totalWeight;
+  for (const g of blockGenerators) {
+    r -= g.weight;
+    if (r <= 0) return g.fn;
+  }
+  return blockGenerators[blockGenerators.length - 1].fn;
+}
+
+function genDocument() {
+  const blocks = randInt(1, 5);
+  let result = "";
+  for (let i = 0; i < blocks; i++) {
+    const gen = pickWeighted();
+    result += gen();
+    if (maybe(0.6)) result += "\n"; // blank line between blocks
+  }
+  return result;
+}
+
+// #endregion
+
+// #region Main
+
+const output = [];
+const seen = new Set();
+
+for (let i = 0; i < count; i++) {
+  const md = genDocument();
+
+  // Deduplicate
+  if (seen.has(md)) continue;
+  seen.add(md);
+
+  try {
+    const html = render(md);
+    output.push(JSON.stringify({ markdown: md, html }));
+  } catch {
+    // Skip inputs that crash commonmark.js (shouldn't happen)
+    continue;
+  }
+}
+
+writeFileSync(outputPath, output.join("\n") + "\n");
+console.log(`Generated ${output.length} test cases (seed=${seed}) → ${outputPath}`);
+
+// #endregion
diff --git a/justfile b/justfile
index 453b23c53906..aaa5f4c0eab7 100644
--- a/justfile
+++ b/justfile
@@ -237,6 +237,25 @@ test-doc:
 test-markdown-conformance:
   cargo run -p xtask_coverage -- --suites=markdown/commonmark
 
+# Generate differential fuzz corpus for the markdown parser using commonmark.js
+# Requires `pnpm install` from the repo root (commonmark is a root devDependency).
+fuzz-markdown-generate count="1000" seed="42":
+  node crates/biome_markdown_parser/tests/fuzz_generate_corpus.cjs \
+    --count={{count}} --seed={{seed}} \
+    --output=crates/biome_markdown_parser/tests/fuzz_corpus/corpus.jsonl
+
+# Run differential fuzzer comparing Biome markdown output against commonmark.js
+# Runs the checked-in seed corpus plus any generated corpus.jsonl
+fuzz-markdown-differential:
+  #!/usr/bin/env bash
+  set -euo pipefail
+  CORPUS="$(pwd)/crates/biome_markdown_parser/tests/fuzz_corpus/corpus.jsonl"
+  if [ -f "$CORPUS" ]; then
+    FUZZ_CORPUS="$CORPUS" cargo test -p biome_markdown_parser --test fuzz_differential -- --ignored --nocapture
+  else
+    cargo test -p biome_markdown_parser --test fuzz_differential -- --ignored --nocapture
+  fi
+
 # Update the CommonMark spec.json to a specific version
 update-commonmark-spec version:
   ./scripts/update-commonmark-spec.sh {{version}}
diff --git a/package.json b/package.json
index 3234381d5849..16564cdc559b 100644
--- a/package.json
+++ b/package.json
@@ -23,6 +23,7 @@
 		"@changesets/changelog-github": "0.6.0",
 		"@changesets/cli": "2.30.0",
 		"@types/node": "24.12.0",
+		"commonmark": "0.31.2",
 		"tombi": "0.9.13"
 	}
 }
diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml
index abac457adba5..2d156e18a0cc 100644
--- a/pnpm-lock.yaml
+++ b/pnpm-lock.yaml
@@ -20,6 +20,9 @@ importers:
       '@types/node':
         specifier: 24.12.0
         version: 24.12.0
+      commonmark:
+        specifier: 0.31.2
+        version: 0.31.2
       tombi:
         specifier: 0.9.13
         version: 0.9.13
@@ -1224,6 +1227,10 @@ packages:
   color-name@1.1.4:
     resolution: {integrity: sha512-dOy+3AuW3a2wNbZHIuMZpTcgjGuLU/uBL/ubcZF9OXbDo8ff4O8yVp5Bf0efS8uEoYo5q4Fx7dY9OgQGXgAsQA==}
 
+  commonmark@0.31.2:
+    resolution: {integrity: sha512-2fRLTyb9r/2835k5cwcAwOj0DEc44FARnMp5veGsJ+mEAZdi52sNopLu07ZyElQUz058H43whzlERDIaaSw4rg==}
+    hasBin: true
+
   concat-map@0.0.1:
     resolution: {integrity: sha512-/Srv4dswyQNBfohGpz9o6Yb3Gz3SrUDqBH5rTuhGR7ahtlbYKnVxw2bCFMRljaA7EXHaXZ8wsHdodFvbkhKmqg==}
 
@@ -1279,6 +1286,10 @@ packages:
     resolution: {integrity: sha512-rRqJg/6gd538VHvR3PSrdRBb/1Vy2YfzHqzvbhGIQpDRKIa4FgV/54b5Q1xYSxOOwKvjXweS26E0Q+nAMwp2pQ==}
     engines: {node: '>=8.6'}
 
+  entities@3.0.1:
+    resolution: {integrity: sha512-WiyBqoomrwMdFG1e0kqvASYfnlb0lp8M5o5Fw2OFq1hNZxxcNk8Ik0Xm7LxzBhuidnZB/UtBqVCgUz3kBOP51Q==}
+    engines: {node: '>=0.12'}
+
   entities@7.0.1:
     resolution: {integrity: sha512-TWrgLOFUQTH994YUyl1yT4uyavY5nNB5muff+RtWaqNVCAK408b5ZnnbNAUEWLTCpum9w6arT70i1XdQ4UeOPA==}
     engines: {node: '>=0.12'}
@@ -1602,6 +1613,9 @@ packages:
     engines: {node: '>= 20'}
     hasBin: true
 
+  mdurl@1.0.1:
+    resolution: {integrity: sha512-/sKlQJCBYVY9Ers9hqzKou4H6V5UWc/M59TH2dvkt+84itfnq7uFOMLpOiOS4ujvHP4etln18fmIxA5R5fll0g==}
+
   merge2@1.4.1:
     resolution: {integrity: sha512-8q7VEgMJW4J8tcfVPy8g09NcQwZdbwFEqhe/WZkoIzjn/3TGDwtOCYtXGxA3O8tPzpczCCDgv+P2P5y00ZJOOg==}
     engines: {node: '>= 8'}
@@ -1622,6 +1636,9 @@ packages:
     resolution: {integrity: sha512-G6T0ZX48xgozx7587koeX9Ys2NYy6Gmv//P89sEte9V9whIapMNF4idKxnW2QtCcLiTWlb/wfCabAtAFWhhBow==}
     engines: {node: '>=16 || 14 >=14.17'}
 
+  minimist@1.2.8:
+    resolution: {integrity: sha512-2yyAR8qBkN3YuheJanUpWC5U3bb5osDywNB8RzDVlDwDHbocAJveqqj1u8+SVD7jkWT4yvsHCpWqqWqAxb0zCA==}
+
   mri@1.2.0:
     resolution: {integrity: sha512-tzzskb3bG8LvYGFF/mDTpq3jpI6Q9wc3LEmBaghu+DdCssd1FakN7Bc0hVNmEyGq1bq3RgfkCb3cmQLpNPOroA==}
     engines: {node: '>=4'}
@@ -3269,6 +3286,12 @@ snapshots:
 
   color-name@1.1.4: {}
 
+  commonmark@0.31.2:
+    dependencies:
+      entities: 3.0.1
+      mdurl: 1.0.1
+      minimist: 1.2.8
+
   concat-map@0.0.1: {}
 
   convert-source-map@2.0.0: {}
@@ -3320,6 +3343,8 @@ snapshots:
       ansi-colors: 4.1.3
       strip-ansi: 6.0.1
 
+  entities@3.0.1: {}
+
   entities@7.0.1: {}
 
   es-module-lexer@2.0.0: {}
@@ -3692,6 +3717,8 @@ snapshots:
 
   marked@17.0.1: {}
 
+  mdurl@1.0.1: {}
+
   merge2@1.4.1: {}
 
   micromatch@4.0.8:
@@ -3709,6 +3736,8 @@ snapshots:
     dependencies:
       brace-expansion: 2.0.1
 
+  minimist@1.2.8: {}
+
   mri@1.2.0: {}
 
   ms@2.1.3: {}