test(markdown_parser): add differential fuzzer against commonmark.js by jfmcdowell · Pull Request #9784 · biomejs/biome

jfmcdowell · 2026-04-03T01:45:37Z

Note

This PR was created with AI assistance (Claude Code).

Summary

Adds a differential fuzzer that generates random markdown from construct combinators and compares Biome's document_to_html output against commonmark.js reference output.

The generator is biased toward interaction patterns that have produced parser bugs: headers inside lists, setext headings in blockquotes, inline HTML near blockquote markers, mixed list markers, and lazy continuation at various indent levels.

The checked-in seed corpus contains only passing cases, so any failure is either a regression or a newly discovered mismatch worth triaging. This makes the differential test viable for CI against the seed corpus. Larger corpora can be generated locally, or on a schedule, for discovery, and cases that get fixed can be promoted into the seed.
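
As a rough illustration of the differential check described in this summary, the sketch below runs every corpus case through a renderer and collects the mismatches. The `Case` and `Mismatch` types and the `render` closure are hypothetical stand-ins introduced for this example; the real test calls Biome's `document_to_html`, whose signature is not shown here.

```rust
/// One corpus entry: markdown input plus the reference HTML (hypothetical type).
struct Case {
    markdown: String,
    html: String,
}

/// A case whose rendered HTML differs from the reference (hypothetical type).
struct Mismatch {
    index: usize,
    expected: String,
    actual: String,
}

/// Render every case and return the ones that disagree with the reference.
/// `render` stands in for the implementation under test.
fn run_differential<F>(cases: &[Case], render: F) -> Vec<Mismatch>
where
    F: Fn(&str) -> String,
{
    cases
        .iter()
        .enumerate()
        .filter_map(|(index, case)| {
            let actual = render(&case.markdown);
            if actual == case.html {
                None
            } else {
                Some(Mismatch {
                    index,
                    expected: case.html.clone(),
                    actual,
                })
            }
        })
        .collect()
}
```

With a seed corpus that contains only passing cases, a non-empty result from a function like this would indicate either a regression or a new mismatch.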

Test Plan

cargo test -p biome_markdown_parser --test fuzz_differential -- --ignored --nocapture
just test-crate biome_markdown_parser
just f
just l

Docs

N/A

changeset-bot · 2026-04-03T01:45:43Z

⚠️ No Changeset found

Latest commit: f5bdd2c

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

codspeed-hq · 2026-04-03T01:52:49Z

Merging this PR will not alter performance

✅ 28 untouched benchmarks
⏩ 228 skipped benchmarks¹

Comparing jfmcdowell:test/md-differential-fuzzer (f5bdd2c) with main (f3d60a6)²

¹ 228 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
² No successful run was found on main (d5ca672) during the generation of this report, so f3d60a6 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

jfmcdowell · 2026-04-03T02:31:00Z

I'd recommend adding commonmark as a root devDependency for this workflow. Biome already has root Node tooling and root devDependencies, so this wouldn't be introducing a new kind of project dependency, and commonmark is unusually justified here because it is the reference implementation the differential test compares against. It would also make the generator workflow simpler and easier to run manually. @ematipico, does that dependency choice seem acceptable here?

For now the PR uses a temp-dir npm install approach to avoid forcing that decision, but it adds ceremony to the justfile recipe that a root devDependency would eliminate.

coderabbitai · 2026-04-03T02:53:25Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 854ebbe2-1b92-4476-a586-3c2a7e5155dd

📥 Commits

Reviewing files that changed from the base of the PR and between 97dadc2 and f5bdd2c.

⛔ Files ignored due to path filters (1)

pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml and included by **

📒 Files selected for processing (5)

crates/biome_markdown_parser/tests/fuzz_corpus/seed.jsonl
crates/biome_markdown_parser/tests/fuzz_differential.rs
crates/biome_markdown_parser/tests/fuzz_generate_corpus.cjs
justfile
package.json

✅ Files skipped from review due to trivial changes (5)

package.json
justfile
crates/biome_markdown_parser/tests/fuzz_corpus/seed.jsonl
crates/biome_markdown_parser/tests/fuzz_generate_corpus.cjs
crates/biome_markdown_parser/tests/fuzz_differential.rs

Walkthrough

Adds fuzzing infrastructure for the markdown parser: a checked‑in JSONL seed corpus (crates/biome_markdown_parser/tests/fuzz_corpus/seed.jsonl) containing 102 { markdown, html } pairs; an ignored differential test (fuzz_differential.rs) that parses each markdown, renders HTML via the crate renderer, normalises lines (preserving <pre> blocks), compares against the reference HTML, records failures using a deterministic 64‑bit FNV‑1a hash, and can write per‑failure artifacts. Adds a Node.js corpus generator (fuzz_generate_corpus.cjs) that uses commonmark to produce references, a devDependency on commonmark, and two just recipes to generate the corpus and run the differential test.
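
The two helpers the walkthrough mentions, line normalisation that leaves `<pre>` blocks alone and a 64-bit FNV-1a hash, could look roughly like the sketch below. The normalisation rules (trim each line and drop blank lines outside `<pre>`) are an assumption made for illustration, as is the name `normalize_lines`; the PR's actual helper may differ. The FNV-1a constants are the standard 64-bit offset basis and prime.

```rust
/// Normalise rendered HTML for comparison (assumed rules): outside <pre>
/// blocks, trim each line and drop blank ones; inside <pre> blocks, keep
/// every line verbatim because whitespace is significant there.
fn normalize_lines(html: &str) -> String {
    let mut out: Vec<String> = Vec::new();
    let mut in_pre = false;
    for line in html.lines() {
        let last_open = line.rfind("<pre");
        let last_close = line.rfind("</pre>");
        if in_pre || last_open.is_some() {
            out.push(line.to_string());
        } else {
            let trimmed = line.trim();
            if !trimmed.is_empty() {
                out.push(trimmed.to_string());
            }
        }
        // Whichever tag appears last on the line decides the state afterwards.
        in_pre = match (last_open, last_close) {
            (Some(open), Some(close)) => open > close,
            (Some(_), None) => true,
            (None, Some(_)) => false,
            (None, None) => in_pre,
        };
    }
    out.join("\n")
}

/// 64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime.
/// Deterministic across runs and platforms, unlike a randomly seeded hasher.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV 64-bit prime
    }
    hash
}
```

A stable hash of the failing input (assumed here to be the markdown text) gives each failure an identifier that stays the same between runs, which helps when comparing reports.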

Suggested reviewers

ematipico
dyc3

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarises the main addition: a differential fuzzer for the markdown parser comparing against commonmark.js, which aligns directly with the changeset.
Description check	✅ Passed	The description clearly explains the purpose, approach, and testing plan of the differential fuzzer, relating directly to the changeset.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/tests/fuzz_differential.rs`:
- Around line 55-61: run_corpus currently swallows I/O and JSON errors which
lets a corrupted seed.jsonl hide as “All cases passed”; change run_corpus to
fail fast by returning or panicking on unreadable files and malformed JSON
instead of skipping them, and replace ad-hoc serde_json::Value handling with a
typed deserialisable struct (e.g., SeedCase with markdown and html fields) when
parsing each line so missing or wrong fields cause an immediate error; update
the parsing logic (the code that reads lines and the block referenced around
lines 72–82) to deserialize into SeedCase and propagate or surface any
deserialization/I/O errors rather than continuing silently.
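
A minimal sketch of the fail-fast loading this comment asks for is shown below, using only the standard library. The `load_cases` function and its `parse_line` parameter are hypothetical; in the real test that role would presumably be played by `serde_json::from_str` into a `SeedCase` deriving `Deserialize`. The point is that I/O and parse errors are returned with a line number instead of being skipped.

```rust
use std::io::BufRead;

/// Typed seed case: both fields are required, so a missing one is an error.
#[derive(Debug, PartialEq)]
struct SeedCase {
    markdown: String,
    html: String,
}

/// Read one case per line and stop at the first problem.
/// `parse_line` is a stand-in for the JSON deserialisation step.
fn load_cases<R, F>(reader: R, parse_line: F) -> Result<Vec<SeedCase>, String>
where
    R: BufRead,
    F: Fn(&str) -> Result<SeedCase, String>,
{
    let mut cases = Vec::new();
    for (index, line) in reader.lines().enumerate() {
        let number = index + 1;
        // An unreadable line is an error, not something to skip.
        let line = line.map_err(|e| format!("line {}: read error: {}", number, e))?;
        // A malformed line is also an error and reports where it occurred.
        let case = parse_line(&line).map_err(|e| format!("line {}: {}", number, e))?;
        cases.push(case);
    }
    Ok(cases)
}
```

A caller in a test could then `expect` the result, so a corrupted corpus fails the run rather than reporting that all cases passed.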

In `@crates/biome_markdown_parser/tests/fuzz_generate_corpus.cjs`:
- Around line 24-25: The PRNG can get stuck when seeded with 0 and can produce
rand() === 1.0 causing randInt() to return max+1 and make pick() yield
undefined; to fix, ensure the xorshift32 state is never initialized to zero (if
seed === 0 set it to 1) and change the floating-point normalization in rand() to
divide by 0x100000000 (4294967296) instead of 0xffffffff so rand() is always in
[0,1). Update the implementations of xorshift32 (state init), rand(), randInt(),
and pick() accordingly so randInt() uses the corrected rand() and pick() cannot
index out of bounds.
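
The two PRNG fixes described above can be sketched as follows. This is written in Rust for illustration, while the generator itself is a `.cjs` script, and the shift triple (13, 17, 5) is the classic xorshift32 choice, assumed here rather than taken from the PR. The type and method names are hypothetical.

```rust
/// Small xorshift32 generator with the two guards from the review comment.
struct XorShift32 {
    state: u32,
}

impl XorShift32 {
    /// A zero state would stay zero forever, so map seed 0 to 1.
    fn new(seed: u32) -> XorShift32 {
        XorShift32 { state: if seed == 0 { 1 } else { seed } }
    }

    fn next_u32(&mut self) -> u32 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        x
    }

    /// Divide by 2^32 (0x100000000), not 0xffffffff, so the result is in [0, 1).
    fn rand(&mut self) -> f64 {
        f64::from(self.next_u32()) / 4294967296.0
    }

    /// Inclusive range [min, max]; relies on rand() never returning 1.0.
    fn rand_int(&mut self, min: u32, max: u32) -> u32 {
        let span = f64::from(max - min) + 1.0;
        min + (self.rand() * span) as u32
    }

    /// Pick one element; panics on an empty slice instead of misindexing.
    fn pick<'a, T>(&mut self, items: &'a [T]) -> &'a T {
        assert!(!items.is_empty(), "pick() needs at least one item");
        let last = (items.len() - 1) as u32;
        &items[self.rand_int(0, last) as usize]
    }
}
```

With a divisor of 0xffffffff, a state of 0xffffffff would make `rand()` return exactly 1.0, and `rand_int` would then yield `max + 1`; dividing by 2^32 rules that out.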

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: febdfe05-717f-4f2e-aa6a-7e2b8384d080

📥 Commits

Reviewing files that changed from the base of the PR and between b22f31a and e692a76.

📒 Files selected for processing (4)

crates/biome_markdown_parser/tests/fuzz_corpus/seed.jsonl
crates/biome_markdown_parser/tests/fuzz_differential.rs
crates/biome_markdown_parser/tests/fuzz_generate_corpus.cjs
justfile

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/tests/fuzz_differential.rs`:
- Around line 136-140: The loop over all_failures silently swallows fs::write
errors (base.with_extension(...).ok()) and reuses hash-only basenames (base =
dir.join(&failure.hash)) which can hide missing artefacts and cause overwrites;
change the loop to enumerate(all_failures) to generate a unique basename per
failure (e.g., combine failure.hash with the index or timestamp) and replace the
.ok() calls with error-propagating writes (use ? or expect with a clear message,
or collect and panic on errors) for the three writes that use
base.with_extension("md"), base.with_extension("expected.html"), and
base.with_extension("actual.html") so failures surface loudly and filenames are
unique.
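
One way to address both points, unique basenames and loud write failures, is sketched below. The `Failure` struct, the `write_artifacts` name, and the `index-hash` naming scheme are assumptions made for this example, not the code in the PR.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// One recorded mismatch (hypothetical shape).
struct Failure {
    hash: String,
    markdown: String,
    expected: String,
    actual: String,
}

/// Write three files per failure and return the first I/O error, if any.
fn write_artifacts(dir: &Path, failures: &[Failure]) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    for (index, failure) in failures.iter().enumerate() {
        // Prefixing the index keeps names unique even if two hashes collide.
        let stem = format!("{:03}-{}", index, failure.hash);
        fs::write(dir.join(format!("{}.md", stem)), &failure.markdown)?;
        fs::write(dir.join(format!("{}.expected.html", stem)), &failure.expected)?;
        fs::write(dir.join(format!("{}.actual.html", stem)), &failure.actual)?;
    }
    Ok(())
}
```

Building the file names with `format!` rather than `Path::with_extension` is a deliberate choice in this sketch: `with_extension` replaces whatever follows the last dot in the file name, which is easy to trip over once the stem itself contains a dot.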

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d1e887e6-2076-4c70-af2c-95d8555c29c8

📥 Commits

Reviewing files that changed from the base of the PR and between 133c81c and 7d72775.

📒 Files selected for processing (1)

crates/biome_markdown_parser/tests/fuzz_differential.rs

ematipico · 2026-04-03T06:19:23Z

Oh yeah that's totally reasonable, go ahead.

Also, I'm not familiar with the terminology here: what do "seed corpus" and "corpora" mean?

jfmcdowell · 2026-04-03T14:17:57Z

Good question.

Seed corpus: the checked-in baseline inputs the differential test always runs. Ours is seed.jsonl, and it should contain only passing { markdown, html } pairs.
Corpus / corpora: the full collection of inputs. Besides the seed corpus, you can generate a larger disposable corpus locally to look for new mismatches.

So: seed corpus = stable baseline, generated corpus = exploratory fuzz input.

Also pushed a commit adding commonmark as a root devDependency and simplifying the just recipe.

Generates random markdown from construct combinators (lists, blockquotes, headings, inline HTML, fenced code, link definitions) biased toward interaction patterns that have produced parser bugs. Compares rendered HTML from document_to_html against commonmark.js reference output.

- Generator: tests/fuzz_generate_corpus.mjs (seeded, reproducible)
- Seed corpus: tests/fuzz_corpus/seed.jsonl (102 passing cases)
- Differential test: tests/fuzz_differential.rs (#[ignore] by default)
- Justfile: fuzz-markdown-generate, fuzz-markdown-differential

- fuzz_differential.rs: panic on unreadable files and malformed JSON instead of silently skipping; use typed SeedCase struct so missing markdown/html fields fail at deserialization.
- fuzz_generate_corpus.cjs: guard seed=0 (xorshift32 gets stuck at zero) and use 0x100000000 divisor so rand() returns [0,1) not [0,1].

github-actions bot added A-Parser Area: parser L-Markdown Language: Markdown labels Apr 3, 2026

jfmcdowell force-pushed the test/md-differential-fuzzer branch from adc9177 to e692a76 Compare April 3, 2026 02:30

jfmcdowell marked this pull request as ready for review April 3, 2026 02:42

coderabbitai bot reviewed Apr 3, 2026

Comment thread crates/biome_markdown_parser/tests/fuzz_differential.rs Outdated

Comment thread crates/biome_markdown_parser/tests/fuzz_generate_corpus.cjs

coderabbitai bot reviewed Apr 3, 2026

Comment thread crates/biome_markdown_parser/tests/fuzz_differential.rs

jfmcdowell added 4 commits April 6, 2026 09:17

fix(test): use clone() instead of to_string() for String refs

304dd77

Fixes clippy::implicit_clone warnings.

fix(review): add commonmark as root devDependency, simplify justfile …

f5bdd2c

…recipe

jfmcdowell force-pushed the test/md-differential-fuzzer branch from 97dadc2 to f5bdd2c Compare April 6, 2026 13:23

ematipico approved these changes Apr 6, 2026

ematipico merged commit 626344e into biomejs:main Apr 6, 2026
16 checks passed

coderabbitai bot mentioned this pull request Apr 6, 2026

refactor(markdown_parser): deduplicate normalize_html test helper #9833

Merged

jfmcdowell deleted the test/md-differential-fuzzer branch April 13, 2026 12:29
