diff --git a/crates/biome_markdown_parser/tests/fuzz_corpus/seed.jsonl b/crates/biome_markdown_parser/tests/fuzz_corpus/seed.jsonl
new file mode 100644
index 000000000000..ec11860bda6e
--- /dev/null
+++ b/crates/biome_markdown_parser/tests/fuzz_corpus/seed.jsonl
@@ -0,0 +1,102 @@
+[102 JSONL entries, one {"markdown": ..., "html": ...} pair per line; the entry bodies are garbled in this extraction and are elided]
diff --git a/crates/biome_markdown_parser/tests/fuzz_differential.rs b/crates/biome_markdown_parser/tests/fuzz_differential.rs
new file mode 100644
index 000000000000..669aeedad137
--- /dev/null
+++ b/crates/biome_markdown_parser/tests/fuzz_differential.rs
@@ -0,0 +1,167 @@
+//! Differential fuzzer: compares Biome's markdown HTML output against
+//! commonmark.js reference output from a pre-generated corpus.
+//!
+//! The checked-in seed corpus (`seed.jsonl`) contains only passing cases.
+//! Any failure is either a regression or a newly discovered mismatch.
+//!
+//! Run with: cargo test -p biome_markdown_parser --test fuzz_differential -- --ignored --nocapture
+
+use biome_markdown_parser::{document_to_html, parse_markdown};
+use biome_markdown_syntax::MdDocument;
+use biome_rowan::AstNode;
+use std::fs;
+use std::path::{Path, PathBuf};
+
+/// Normalize HTML for comparison, preserving whitespace inside `<pre>` blocks.
+/// Matches the normalization in `xtask/coverage/src/markdown/commonmark.rs`.
+fn normalize_html(html: &str) -> String {
+ let mut result = Vec::new();
+ let mut in_pre = false;
+
+    for line in html.lines() {
+        if line.contains("<pre") {
+            in_pre = true;
+        }
+        result.push(if in_pre { line.to_string() } else { line.trim().to_string() });
+        if line.contains("</pre>") {
+            in_pre = false;
+        }
+    }
+
+ result.join("\n").trim().to_string() + "\n"
+}
+
+/// FNV-1a 64-bit hash — deterministic across Rust toolchain versions.
+fn content_hash(s: &str) -> String {
+ let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
+ for byte in s.as_bytes() {
+ hash ^= *byte as u64;
+ hash = hash.wrapping_mul(0x0100_0000_01b3);
+ }
+ format!("{hash:016x}")
+}
+
+#[derive(serde::Deserialize)]
+struct SeedCase {
+ markdown: String,
+ html: String,
+}
+
+struct Failure {
+ hash: String,
+ markdown: String,
+ expected: String,
+ actual: String,
+}
+
+fn run_corpus(path: &Path) -> (Vec, usize) {
+ let content = fs::read_to_string(path)
+ .unwrap_or_else(|e| panic!("Failed to read corpus {}: {e}", path.display()));
+
+ let mut failures = vec![];
+ let mut total = 0usize;
+
+ for (i, line) in content.lines().enumerate() {
+ if line.trim().is_empty() {
+ continue;
+ }
+
+ let entry: SeedCase = serde_json::from_str(line)
+ .unwrap_or_else(|e| panic!("Malformed JSON at {}:{}: {e}", path.display(), i + 1));
+
+ let markdown = &entry.markdown;
+ let expected_html = &entry.html;
+ total += 1;
+
+ let parsed = parse_markdown(markdown);
+ let Some(doc) = MdDocument::cast(parsed.syntax()) else {
+ failures.push(Failure {
+ hash: content_hash(markdown),
+ markdown: markdown.clone(),
+ expected: expected_html.clone(),
+ actual: "".to_string(),
+ });
+ continue;
+ };
+
+ let actual = document_to_html(
+ &doc,
+ parsed.list_tightness(),
+ parsed.list_item_indents(),
+ parsed.quote_indents(),
+ );
+
+ let expected_normalized = normalize_html(expected_html);
+ let actual_normalized = normalize_html(&actual);
+
+ if expected_normalized != actual_normalized {
+ failures.push(Failure {
+ hash: content_hash(markdown),
+ markdown: markdown.clone(),
+ expected: expected_html.clone(),
+ actual,
+ });
+ }
+ }
+
+ (failures, total)
+}
+
+#[test]
+#[ignore]
+fn differential_fuzz_against_commonmark_js() {
+ let manifest_dir = Path::new(env!("CARGO_MANIFEST_DIR"));
+ let corpus_dir = manifest_dir.join("tests/fuzz_corpus");
+
+ // Always run the checked-in seed corpus (passing cases only)
+ let seed_path = corpus_dir.join("seed.jsonl");
+ let (mut all_failures, mut total) = run_corpus(&seed_path);
+
+ // Optionally run an extended corpus if FUZZ_CORPUS env var is set
+ if let Ok(extra_path) = std::env::var("FUZZ_CORPUS") {
+ let (extra_failures, extra_total) = run_corpus(Path::new(&extra_path));
+ all_failures.extend(extra_failures);
+ total += extra_total;
+ }
+
+ // Write failure artifacts if FUZZ_FAILURES_DIR is set
+ if let Ok(failures_dir) = std::env::var("FUZZ_FAILURES_DIR") {
+ let dir = PathBuf::from(&failures_dir);
+ fs::create_dir_all(&dir).expect("Failed to create failures directory");
+
+ for failure in &all_failures {
+ let base = dir.join(&failure.hash);
+ fs::write(base.with_extension("md"), &failure.markdown).ok();
+ fs::write(base.with_extension("expected.html"), &failure.expected).ok();
+ fs::write(base.with_extension("actual.html"), &failure.actual).ok();
+ }
+ }
+
+ // Print summary
+ let passed = total - all_failures.len();
+ eprintln!(
+ "\nDifferential fuzz: {total} cases, {passed} passed, {} failed",
+ all_failures.len()
+ );
+
+ if !all_failures.is_empty() {
+ eprintln!("\n=== {} differential failures ===\n", all_failures.len());
+ for (i, f) in all_failures.iter().enumerate().take(10) {
+ eprintln!("--- Failure {} [{}] ---", i + 1, f.hash);
+ eprintln!("Input:\n{}", f.markdown);
+ eprintln!("Expected:\n{}", f.expected);
+ eprintln!("Actual:\n{}", f.actual);
+ eprintln!();
+ }
+ if all_failures.len() > 10 {
+ eprintln!("... and {} more", all_failures.len() - 10);
+ }
+ panic!("{} differential mismatches found", all_failures.len());
+ }
+
+ eprintln!("All cases passed.");
+}
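Failure artifacts are named by the FNV-1a hash above (`<hash>.md`, `<hash>.expected.html`, `<hash>.actual.html`), so JS-side tooling may want to compute the same hash to locate an artifact for a given input. A sketch of the same FNV-1a 64-bit hash in Node.js (the `contentHash` helper name is mine, not part of the patch; `BigInt` supplies the wrapping 64-bit multiply that Rust's `wrapping_mul` performs):

```javascript
// FNV-1a 64-bit, mirroring content_hash() in fuzz_differential.rs.
function contentHash(s) {
  const MASK = (1n << 64n) - 1n; // keep the state at 64 bits
  let hash = 0xcbf29ce484222325n; // FNV-1a 64-bit offset basis
  for (const byte of Buffer.from(s, "utf8")) {
    hash ^= BigInt(byte);
    hash = (hash * 0x100000001b3n) & MASK; // FNV prime, wrapping multiply
  }
  return hash.toString(16).padStart(16, "0");
}

// The empty input hashes to the offset basis itself:
console.log(contentHash("")); // "cbf29ce484222325"
```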
diff --git a/crates/biome_markdown_parser/tests/fuzz_generate_corpus.cjs b/crates/biome_markdown_parser/tests/fuzz_generate_corpus.cjs
new file mode 100644
index 000000000000..8e02bfd092ce
--- /dev/null
+++ b/crates/biome_markdown_parser/tests/fuzz_generate_corpus.cjs
@@ -0,0 +1,295 @@
+#!/usr/bin/env node
+// Differential fuzzer corpus generator for Biome's markdown parser.
+// Generates random markdown inputs from construct combinators and renders
+// reference HTML via commonmark.js.
+//
+// Usage:
+// node fuzz_generate_corpus.cjs [--count=N] [--seed=N] [--output=path]
+//
+// Requires `pnpm install` from the repo root (commonmark is a root devDependency).
+
+"use strict";
+
+const { writeFileSync } = require("node:fs");
+
+// Parse CLI args
+const args = Object.fromEntries(
+ process.argv.slice(2).map((a) => {
+ const [k, v] = a.replace(/^--/, "").split("=");
+ return [k, v];
+ })
+);
+
+const count = parseInt(args.count || "1000", 10);
+const seed = parseInt(args.seed || "42", 10);
+const outputPath = args.output || "corpus.jsonl";
+
+// Load commonmark via require() — relies on cwd having node_modules/commonmark
+const { Parser, HtmlRenderer } = require("commonmark");
+
+const parser = new Parser();
+const renderer = new HtmlRenderer();
+
+function render(md) {
+ return renderer.render(parser.parse(md));
+}
+
+// #region Seeded PRNG (xorshift32)
+let rngState = seed === 0 ? 1 : seed;
+function rand() {
+  rngState ^= rngState << 13;
+  rngState ^= rngState >>> 17; // logical shift: arithmetic >> breaks xorshift32
+  rngState ^= rngState << 5;
+  return (rngState >>> 0) / 0x100000000;
+}
+function randInt(min, max) {
+ return min + Math.floor(rand() * (max - min + 1));
+}
+function pick(arr) {
+ return arr[randInt(0, arr.length - 1)];
+}
+function maybe(prob = 0.5) {
+ return rand() < prob;
+}
+// #endregion
+
+// #region Construct combinators
+
+function genParagraph() {
+ const words = ["foo", "bar", "baz", "hello", "world", "test", "content"];
+ const len = randInt(1, 5);
+ return Array.from({ length: len }, () => pick(words)).join(" ") + "\n";
+}
+
+function genAtxHeading() {
+ const level = randInt(1, 6);
+ const text = pick(["Heading", "Title", "Section", "Part"]);
+ const trailing = maybe(0.3) ? " " + "#".repeat(level) : "";
+ return "#".repeat(level) + " " + text + trailing + "\n";
+}
+
+function genSetextHeading() {
+ const text = pick(["Foo", "Bar", "Heading"]);
+ const marker = maybe(0.5) ? "---" : "===";
+ return text + "\n" + marker + "\n";
+}
+
+function genThematicBreak() {
+ return pick(["---", "***", "___"]) + "\n";
+}
+
+function genBulletList() {
+ const items = randInt(1, 4);
+ const marker = pick(["-", "*", "+"]);
+ let result = "";
+ for (let i = 0; i < items; i++) {
+ result += marker + " " + genInlineContent() + "\n";
+ }
+ return result;
+}
+
+function genOrderedList() {
+ const items = randInt(1, 3);
+ let result = "";
+ for (let i = 0; i < items; i++) {
+ result += (i + 1) + ". " + genInlineContent() + "\n";
+ }
+ return result;
+}
+
+function genBlockquote() {
+ const lines = randInt(1, 3);
+ let result = "";
+ for (let i = 0; i < lines; i++) {
+ result += "> " + genInlineContent() + "\n";
+ }
+ return result;
+}
+
+function genFencedCode() {
+ const fence = maybe(0.5) ? "```" : "~~~";
+ const lang = maybe(0.5) ? pick(["js", "rust", "md", ""]) : "";
+ const body = pick(["let x = 1;", "code here", "fn main() {}"]);
+ return fence + lang + "\n" + body + "\n" + fence + "\n";
+}
+
+function genIndentedCode() {
+ return " " + pick(["code line", "let x = 1;", "indented"]) + "\n";
+}
+
+function genLinkRefDef() {
+ const label = pick(["foo", "bar", "link"]);
+ const url = pick(["/url", "https://example.com", "/path"]);
+ const title = maybe(0.3) ? ' "' + pick(["title", "my title"]) + '"' : "";
+ return "[" + label + "]: " + url + title + "\n";
+}
+
+function genInlineContent() {
+ const parts = [];
+ const len = randInt(1, 4);
+ for (let i = 0; i < len; i++) {
+ const kind = randInt(0, 6);
+ switch (kind) {
+ case 0: parts.push(pick(["foo", "bar", "baz", "text", "word"])); break;
+ case 1: parts.push("*" + pick(["em", "italic"]) + "*"); break;
+ case 2: parts.push("**" + pick(["bold", "strong"]) + "**"); break;
+ case 3: parts.push("`" + pick(["code", "x"]) + "`"); break;
+ case 4: parts.push("[" + pick(["link", "text"]) + "](url)"); break;
+      case 5: parts.push("<" + pick(["span", "b", "i"]) + ">tag</" + pick(["span", "b", "i"]) + ">"); break;
+ case 6: parts.push(pick(["foo", "bar"])); break;
+ }
+ }
+ return parts.join(" ");
+}
+
+// #endregion
+
+// #region Interaction combinators (the high-value generators)
+
+function genHeadingInList() {
+ const heading = maybe(0.5)
+ ? "#".repeat(randInt(1, 3)) + " " + pick(["Foo", "Bar"])
+ : pick(["Foo", "Bar"]) + "\n " + pick(["---", "==="]);
+ return "- " + heading + "\n";
+}
+
+function genSetextInBlockquote() {
+ const text = pick(["Foo", "Bar", "Content"]);
+ const marker = maybe(0.5) ? "---" : "===";
+ return "> " + text + "\n> " + marker + "\n";
+}
+
+function genCodeInList() {
+ const fence = maybe(0.5) ? "```" : "~~~";
+ const indent = " ";
+ return "- item\n\n" + indent + fence + "\n" + indent + "code\n" + indent + fence + "\n";
+}
+
+function genInlineHtmlNearBlockquote() {
+ // Valid multiline tag (attr on next line, not starting with >)
+  const valid = "text <span\nattr=\"v\">ok</span> end.\n";
+ // Invalid multiline tag (> at line start = blockquote)
+  const invalid = "text <span\n> attr=\"v\">ok</span> end.\n";
+ return maybe(0.5) ? valid : invalid;
+}
+
+function genMixedListMarkers() {
+ const m1 = pick(["-", "*", "+"]);
+ let m2 = pick(["-", "*", "+"]);
+ while (m2 === m1) m2 = pick(["-", "*", "+"]);
+ return m1 + " item one\n\n" + m2 + " item two\n";
+}
+
+function genNestedListLazyContinuation() {
+ const outer = pick(["-", "*"]);
+ const inner = pick(["-", "*"]);
+ return outer + " outer\n " + inner + " nested\n lazy line\n";
+}
+
+function genLinkDefWithTrailing() {
+ return pick([
+ "[valid]: /url\n",
+ "[valid-title]: /url \"title\"\n",
+ "[invalid]: /url trailing text\n",
+    "[angle]: </url> trailing\n",
+ ]);
+}
+
+function genListWithBlankLines() {
+ const marker = pick(["-", "*"]);
+ const tight = maybe(0.5);
+ let result = marker + " item one\n";
+ if (!tight) result += "\n";
+ result += marker + " item two\n";
+ if (!tight) result += "\n";
+ result += marker + " item three\n";
+ return result;
+}
+
+function genBlockquoteWithContinuation() {
+ const lazy = maybe(0.5);
+ let result = "> first line\n";
+ if (lazy) {
+ result += "lazy continuation\n";
+ } else {
+ result += "> continued\n";
+ }
+ return result;
+}
+
+// #endregion
+
+// #region Document generator
+
+const blockGenerators = [
+ { fn: genParagraph, weight: 2 },
+ { fn: genAtxHeading, weight: 2 },
+ { fn: genSetextHeading, weight: 1 },
+ { fn: genThematicBreak, weight: 1 },
+ { fn: genBulletList, weight: 2 },
+ { fn: genOrderedList, weight: 1 },
+ { fn: genBlockquote, weight: 2 },
+ { fn: genFencedCode, weight: 1 },
+ { fn: genIndentedCode, weight: 1 },
+ { fn: genLinkRefDef, weight: 1 },
+ // Interaction combinators — higher weight to bias toward interaction bugs
+ { fn: genHeadingInList, weight: 3 },
+ { fn: genSetextInBlockquote, weight: 3 },
+ { fn: genCodeInList, weight: 2 },
+ { fn: genInlineHtmlNearBlockquote, weight: 2 },
+ { fn: genMixedListMarkers, weight: 2 },
+ { fn: genNestedListLazyContinuation, weight: 2 },
+ { fn: genLinkDefWithTrailing, weight: 2 },
+ { fn: genListWithBlankLines, weight: 2 },
+ { fn: genBlockquoteWithContinuation, weight: 2 },
+];
+
+const totalWeight = blockGenerators.reduce((sum, g) => sum + g.weight, 0);
+
+function pickWeighted() {
+ let r = rand() * totalWeight;
+ for (const g of blockGenerators) {
+ r -= g.weight;
+ if (r <= 0) return g.fn;
+ }
+ return blockGenerators[blockGenerators.length - 1].fn;
+}
+
+function genDocument() {
+ const blocks = randInt(1, 5);
+ let result = "";
+ for (let i = 0; i < blocks; i++) {
+ const gen = pickWeighted();
+ result += gen();
+ if (maybe(0.6)) result += "\n"; // blank line between blocks
+ }
+ return result;
+}
+
+// #endregion
+
+// #region Main
+
+const output = [];
+const seen = new Set();
+
+for (let i = 0; i < count; i++) {
+ const md = genDocument();
+
+ // Deduplicate
+ if (seen.has(md)) continue;
+ seen.add(md);
+
+ try {
+ const html = render(md);
+ output.push(JSON.stringify({ markdown: md, html }));
+ } catch {
+ // Skip inputs that crash commonmark.js (shouldn't happen)
+ continue;
+ }
+}
+
+writeFileSync(outputPath, output.join("\n") + "\n");
+console.log(`Generated ${output.length} test cases (seed=${seed}) → ${outputPath}`);
+
+// #endregion
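Reproducibility of `--seed` rests on the xorshift32 PRNG in the generator above: the same seed must replay the same corpus. A minimal standalone sketch (the `makeRng` factory form is mine) demonstrating that equal seeds yield equal streams, assuming the logical-shift (`>>>`) variant of xorshift32:

```javascript
// Seeded xorshift32, as in the corpus generator: same seed => same stream.
function makeRng(seed) {
  let state = seed === 0 ? 1 : seed; // xorshift state must never be 0
  return function rand() {
    state ^= state << 13;
    state ^= state >>> 17; // logical shift keeps the update invertible
    state ^= state << 5;
    return (state >>> 0) / 0x100000000; // value in [0, 1)
  };
}

const a = makeRng(42);
const b = makeRng(42);
const same = Array.from({ length: 5 }, () => a()).every((v) => v === b());
console.log(same); // true
```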
diff --git a/justfile b/justfile
index 453b23c53906..aaa5f4c0eab7 100644
--- a/justfile
+++ b/justfile
@@ -237,6 +237,25 @@ test-doc:
test-markdown-conformance:
cargo run -p xtask_coverage -- --suites=markdown/commonmark
+# Generate differential fuzz corpus for the markdown parser using commonmark.js
+# Requires `pnpm install` from the repo root (commonmark is a root devDependency).
+fuzz-markdown-generate count="1000" seed="42":
+ node crates/biome_markdown_parser/tests/fuzz_generate_corpus.cjs \
+ --count={{count}} --seed={{seed}} \
+ --output=crates/biome_markdown_parser/tests/fuzz_corpus/corpus.jsonl
+
+# Run differential fuzzer comparing Biome markdown output against commonmark.js
+# Runs the checked-in seed corpus plus any generated corpus.jsonl
+fuzz-markdown-differential:
+ #!/usr/bin/env bash
+ set -euo pipefail
+ CORPUS="$(pwd)/crates/biome_markdown_parser/tests/fuzz_corpus/corpus.jsonl"
+ if [ -f "$CORPUS" ]; then
+ FUZZ_CORPUS="$CORPUS" cargo test -p biome_markdown_parser --test fuzz_differential -- --ignored --nocapture
+ else
+ cargo test -p biome_markdown_parser --test fuzz_differential -- --ignored --nocapture
+ fi
+
# Update the CommonMark spec.json to a specific version
update-commonmark-spec version:
./scripts/update-commonmark-spec.sh {{version}}
diff --git a/package.json b/package.json
index 3234381d5849..16564cdc559b 100644
--- a/package.json
+++ b/package.json
@@ -23,6 +23,7 @@
"@changesets/changelog-github": "0.6.0",
"@changesets/cli": "2.30.0",
"@types/node": "24.12.0",
+ "commonmark": "0.31.2",
"tombi": "0.9.13"
}
}
diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml
index abac457adba5..2d156e18a0cc 100644
--- a/pnpm-lock.yaml
+++ b/pnpm-lock.yaml
@@ -20,6 +20,9 @@ importers:
'@types/node':
specifier: 24.12.0
version: 24.12.0
+ commonmark:
+ specifier: 0.31.2
+ version: 0.31.2
tombi:
specifier: 0.9.13
version: 0.9.13
@@ -1224,6 +1227,10 @@ packages:
color-name@1.1.4:
resolution: {integrity: sha512-dOy+3AuW3a2wNbZHIuMZpTcgjGuLU/uBL/ubcZF9OXbDo8ff4O8yVp5Bf0efS8uEoYo5q4Fx7dY9OgQGXgAsQA==}
+ commonmark@0.31.2:
+ resolution: {integrity: sha512-2fRLTyb9r/2835k5cwcAwOj0DEc44FARnMp5veGsJ+mEAZdi52sNopLu07ZyElQUz058H43whzlERDIaaSw4rg==}
+ hasBin: true
+
concat-map@0.0.1:
resolution: {integrity: sha512-/Srv4dswyQNBfohGpz9o6Yb3Gz3SrUDqBH5rTuhGR7ahtlbYKnVxw2bCFMRljaA7EXHaXZ8wsHdodFvbkhKmqg==}
@@ -1279,6 +1286,10 @@ packages:
resolution: {integrity: sha512-rRqJg/6gd538VHvR3PSrdRBb/1Vy2YfzHqzvbhGIQpDRKIa4FgV/54b5Q1xYSxOOwKvjXweS26E0Q+nAMwp2pQ==}
engines: {node: '>=8.6'}
+ entities@3.0.1:
+ resolution: {integrity: sha512-WiyBqoomrwMdFG1e0kqvASYfnlb0lp8M5o5Fw2OFq1hNZxxcNk8Ik0Xm7LxzBhuidnZB/UtBqVCgUz3kBOP51Q==}
+ engines: {node: '>=0.12'}
+
entities@7.0.1:
resolution: {integrity: sha512-TWrgLOFUQTH994YUyl1yT4uyavY5nNB5muff+RtWaqNVCAK408b5ZnnbNAUEWLTCpum9w6arT70i1XdQ4UeOPA==}
engines: {node: '>=0.12'}
@@ -1602,6 +1613,9 @@ packages:
engines: {node: '>= 20'}
hasBin: true
+ mdurl@1.0.1:
+ resolution: {integrity: sha512-/sKlQJCBYVY9Ers9hqzKou4H6V5UWc/M59TH2dvkt+84itfnq7uFOMLpOiOS4ujvHP4etln18fmIxA5R5fll0g==}
+
merge2@1.4.1:
resolution: {integrity: sha512-8q7VEgMJW4J8tcfVPy8g09NcQwZdbwFEqhe/WZkoIzjn/3TGDwtOCYtXGxA3O8tPzpczCCDgv+P2P5y00ZJOOg==}
engines: {node: '>= 8'}
@@ -1622,6 +1636,9 @@ packages:
resolution: {integrity: sha512-G6T0ZX48xgozx7587koeX9Ys2NYy6Gmv//P89sEte9V9whIapMNF4idKxnW2QtCcLiTWlb/wfCabAtAFWhhBow==}
engines: {node: '>=16 || 14 >=14.17'}
+ minimist@1.2.8:
+ resolution: {integrity: sha512-2yyAR8qBkN3YuheJanUpWC5U3bb5osDywNB8RzDVlDwDHbocAJveqqj1u8+SVD7jkWT4yvsHCpWqqWqAxb0zCA==}
+
mri@1.2.0:
resolution: {integrity: sha512-tzzskb3bG8LvYGFF/mDTpq3jpI6Q9wc3LEmBaghu+DdCssd1FakN7Bc0hVNmEyGq1bq3RgfkCb3cmQLpNPOroA==}
engines: {node: '>=4'}
@@ -3269,6 +3286,12 @@ snapshots:
color-name@1.1.4: {}
+ commonmark@0.31.2:
+ dependencies:
+ entities: 3.0.1
+ mdurl: 1.0.1
+ minimist: 1.2.8
+
concat-map@0.0.1: {}
convert-source-map@2.0.0: {}
@@ -3320,6 +3343,8 @@ snapshots:
ansi-colors: 4.1.3
strip-ansi: 6.0.1
+ entities@3.0.1: {}
+
entities@7.0.1: {}
es-module-lexer@2.0.0: {}
@@ -3692,6 +3717,8 @@ snapshots:
marked@17.0.1: {}
+ mdurl@1.0.1: {}
+
merge2@1.4.1: {}
micromatch@4.0.8:
@@ -3709,6 +3736,8 @@ snapshots:
dependencies:
brace-expansion: 2.0.1
+ minimist@1.2.8: {}
+
mri@1.2.0: {}
ms@2.1.3: {}