Skip to content

test(markdown_parser): add reference HTML validation and structural invariants#9779

Merged
ematipico merged 4 commits intobiomejs:mainfrom
jfmcdowell:test/md-reference-html-validation
Apr 4, 2026
Merged

test(markdown_parser): add reference HTML validation and structural invariants#9779
ematipico merged 4 commits intobiomejs:mainfrom
jfmcdowell:test/md-reference-html-validation

Conversation

@jfmcdowell
Copy link
Copy Markdown
Contributor

@jfmcdowell jfmcdowell commented Apr 2, 2026

Note

This PR was created with AI assistance (Claude Code).

Summary

Adds reference HTML validation for eligible markdown parser ok/ fixtures and two structural invariants in spec_test.rs to catch known CST regressions.

Also adds dedicated fixtures for ownership-sensitive cases:

  • blockquote_inside_list.md
  • setext_heading_inside_list.md
  • indented_code_in_list_vs_continuation.md

Test Plan

  • just test-crate biome_markdown_parser
  • just test-markdown-conformance
  • just f
  • just l

Docs

N/A.

…nvariants

Add two new validation layers to the markdown parser test suite:

1. Optional reference HTML comparison: ok/ fixtures with a .html
   sidecar are rendered via document_to_html and compared against
   commonmark.js reference output. 38 of 75 fixtures are covered;
   the rest are excluded because they test Biome-specific CST
   structure or have known behavioral mismatches.

2. Targeted structural invariants for known bug classes:
   - MdContinuationIndent children must all be MdIndentToken
   - MdHeader inside list items must have a present indent slot

Also adds 3 ownership-sensitive fixtures:
   - blockquote_inside_list.md
   - setext_heading_inside_list.md
   - indented_code_in_list_vs_continuation.md
@changeset-bot
Copy link
Copy Markdown

changeset-bot bot commented Apr 2, 2026

⚠️ No Changeset found

Latest commit: e11220b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@github-actions github-actions bot added A-Parser Area: parser L-Markdown Language: Markdown labels Apr 2, 2026
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Apr 2, 2026

Merging this PR will not alter performance

✅ 58 untouched benchmarks
⏩ 196 skipped benchmarks1


Comparing jfmcdowell:test/md-reference-html-validation (e11220b) with main (35305c9)

Open in CodSpeed

Footnotes

  1. 196 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@jfmcdowell jfmcdowell marked this pull request as ready for review April 2, 2026 19:51
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 2, 2026

Walkthrough

Adds 40+ new Markdown test fixtures under crates/biome_markdown_parser/tests/md_test_suite/ok covering headings, emphasis, links, inline HTML, fenced/indented code, blockquotes, lists, lazy continuation and many edge cases. Updates crates/biome_markdown_parser/tests/spec_test.rs to optionally compare rendered output against sibling .html files (with HTML normalisation) and to enforce two runtime invariants: MD_CONTINUATION_INDENT must contain only MD_INDENT_TOKEN children, and headers inside list items must not have an empty first slot. Adds a dev-dependency on similar-asserts for test comparisons.

Possibly related PRs

Suggested reviewers

  • dyc3
  • ematipico
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding reference HTML validation and structural invariants to the markdown parser tests.
Description check ✅ Passed The description is well-related to the changeset, clearly explaining what was added (HTML validation, structural invariants, and new test fixtures) with a test plan and context.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/tests/spec_test.rs`:
- Around line 123-125: The current conditional (checking html_path.exists() &&
let Some(doc) = MdDocument::cast(parsed.syntax())) can silently do nothing and
hide a vanished reference check; update this branch so that when a .html sidecar
exists and MdDocument::cast(parsed.syntax()) yields Some(doc) the test performs
the sidecar validation instead of skipping — either invoke the existing sidecar
validation helper (the function used elsewhere to compare/validate .html
sidecars) with doc and html_path, or explicitly fail the test (panic/assert)
with a clear message that the fixture's sidecar must be validated; ensure you
modify the block surrounding html_path.exists(),
MdDocument::cast(parsed.syntax()), and the doc variable accordingly so the test
no longer silently passes.
- Around line 143-157: When iterating parsed.syntax().descendants() and
encountering a node of kind MD_CONTINUATION_INDENT, first assert that the node
contains the wrapper child of kind MD_INDENT_TOKEN_LIST (e.g. find a child where
child.kind() == MarkdownSyntaxKind::MD_INDENT_TOKEN_LIST and assert it exists
and there are no unexpected sibling kinds), then iterate that wrapper's children
and assert each is MD_INDENT_TOKEN as currently done (use the same
MarkdownSyntaxKind symbols MD_CONTINUATION_INDENT, MD_INDENT_TOKEN_LIST,
MD_INDENT_TOKEN and reuse file_name in assertion messages).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f3c172f0-a7a4-4977-aff4-a803d80d48d4

📥 Commits

Reviewing files that changed from the base of the PR and between 35305c9 and c2d26a4.

⛔ Files ignored due to path filters (3)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/blockquote_inside_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indented_code_in_list_vs_continuation.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading_inside_list.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (44)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/atx_heading_trailing_hash.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/autolinks.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote_grouping.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/blockquote_inside_list.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/emphasis_complex.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/emphasis_crossing.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/emphasis_edge_cases.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/emphasis_flanking.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/emphasis_link_text.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/entity_references.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_advanced.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_in_blockquote.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_in_list.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_indentation.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_info_backtick.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/hard_line_break.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/header.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/html_block.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/html_block_in_list.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indented_code_blank_lines.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indented_code_in_list_vs_continuation.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indented_code_in_list_vs_continuation.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_html.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_link_destination_title.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_link_whitespace.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/lazy_continuation.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/link_definition.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/list_blank_lines_between_items.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/list_tightness.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/multi_backtick_code.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/nested_list_interrupt_after_newline.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/nested_quote.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph_interruption.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/reference_links.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading_inside_list.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading_inside_list.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/thematic_break_block.html
  • crates/biome_markdown_parser/tests/spec_test.rs

Comment thread crates/biome_markdown_parser/tests/spec_test.rs Outdated
Comment thread crates/biome_markdown_parser/tests/spec_test.rs
…L sidecar tests

The previous `if let` chain silently skipped HTML reference validation
when MdDocument::cast returned None. Now panics with a clear message.
Comment on lines +137 to +140
assert_eq!(
expected_normalized, actual_normalized,
"HTML mismatch for {file_name}.\nExpected:\n{expected}\nActual:\n{actual}"
);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use the similar and similar-asserts crates for a better look and feel of differences between the strings.

The crates are already in the workspace and used somewhere else

assert_eq!(
token_node.kind(),
MarkdownSyntaxKind::MD_INDENT_TOKEN,
"MdContinuationIndent invariant: expected MD_INDENT_TOKEN, got {:?} in {file_name} at {:?}",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should a developer do when it encounters such an error? We should at least provide an actionable suggestion (not the solution of course)

Comment thread crates/biome_markdown_parser/tests/spec_test.rs
- Switch HTML sidecar comparison to `similar_asserts::assert_eq!` for
  colored unified diffs on failure.
- Make panic/assert messages actionable with next-step guidance.
- Fix duplicate format arg in structural invariant assertion.
- Include Cargo.lock update for new dev-dependency.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/biome_markdown_parser/tests/spec_test.rs (1)

170-175: Improve the assertion message with useful debug info and actionable guidance.

node.kind() is always MD_HEADER here (we just checked), so printing it isn't helpful. The text range would be more useful for debugging. Also, per ematipico's earlier feedback, an actionable suggestion would help developers fix the issue.

♻️ Suggested improvement
                     assert!(
                         !matches!(node.slots().next(), Some(SyntaxSlot::Empty { .. })),
-                        "MdHeader indent invariant: MD_HEADER inside list item is missing indent slot in {file_name} at {:?}",
-                        node.kind()
+                        "MdHeader indent invariant: MD_HEADER inside list item is missing indent slot in {file_name} at {:?}. Check the parser's header indent handling in syntax/header.rs.",
+                        node.text_range()
                     );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/tests/spec_test.rs` around lines 170 - 175, The
assertion message is unhelpful because node.kind() is always MD_HEADER; change
the failure message to include the node's text range (use node.text_range()) and
file_name and give actionable guidance like "ensure MD_HEADER inside list item
has an indent slot (non-empty SyntaxSlot) or adjust parser to insert one" so
maintainers know what to fix; update the assert! for the slot check around
node.slots() / SyntaxSlot::Empty to print file_name, node.text_range(), and the
expected invariant (e.g., "expected non-empty indent slot for MD_HEADER inside
list item") to make debugging and remediation clear.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@crates/biome_markdown_parser/tests/spec_test.rs`:
- Around line 170-175: The assertion message is unhelpful because node.kind() is
always MD_HEADER; change the failure message to include the node's text range
(use node.text_range()) and file_name and give actionable guidance like "ensure
MD_HEADER inside list item has an indent slot (non-empty SyntaxSlot) or adjust
parser to insert one" so maintainers know what to fix; update the assert! for
the slot check around node.slots() / SyntaxSlot::Empty to print file_name,
node.text_range(), and the expected invariant (e.g., "expected non-empty indent
slot for MD_HEADER inside list item") to make debugging and remediation clear.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 85ad39b1-ab37-4411-8f71-d52a46bb026e

📥 Commits

Reviewing files that changed from the base of the PR and between f2f7655 and 58d3e5f.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock and included by **
📒 Files selected for processing (2)
  • crates/biome_markdown_parser/Cargo.toml
  • crates/biome_markdown_parser/tests/spec_test.rs

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/biome_markdown_parser/tests/spec_test.rs (1)

170-176: Add an actionable suggestion to the MdHeader invariant error message.

Per ematipico's earlier feedback: developers encountering this should know what to do. Also, node.kind() in the message is redundant (we already know it's MD_HEADER); the text range would be more useful for locating the issue.

Suggested improvement
                     if in_list {
                         assert!(
                             !matches!(node.slots().next(), Some(SyntaxSlot::Empty { .. })),
-                            "MdHeader indent invariant: MD_HEADER inside list item is missing indent slot in {file_name} at {:?}",
-                            node.kind()
+                            "MdHeader indent invariant: MD_HEADER inside list item is missing indent slot in {file_name} at {:?}. Check the parser's header-inside-list logic in syntax/list.rs or syntax/header.rs.",
+                            node.text_range()
                         );
                     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/tests/spec_test.rs` around lines 170 - 176, The
MdHeader indent invariant assertion message is unhelpful and uses node.kind()
redundantly; update the assertion in the in_list branch so it tells the
developer what to do (e.g., "MD_HEADER inside list item is missing leading
indent slot; ensure an indent SyntaxSlot is present") and include the node's
text range for locating the error (use node.text_range() or the equivalent span
accessor) along with file_name to aid debugging; modify the assert! message
attached to the check of node.slots().next() / SyntaxSlot::Empty accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@crates/biome_markdown_parser/tests/spec_test.rs`:
- Around line 170-176: The MdHeader indent invariant assertion message is
unhelpful and uses node.kind() redundantly; update the assertion in the in_list
branch so it tells the developer what to do (e.g., "MD_HEADER inside list item
is missing leading indent slot; ensure an indent SyntaxSlot is present") and
include the node's text range for locating the error (use node.text_range() or
the equivalent span accessor) along with file_name to aid debugging; modify the
assert! message attached to the check of node.slots().next() / SyntaxSlot::Empty
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 43a30771-8bd1-4593-95cb-d1b542e8b8af

📥 Commits

Reviewing files that changed from the base of the PR and between 58d3e5f and e11220b.

📒 Files selected for processing (2)
  • crates/biome_markdown_parser/Cargo.toml
  • crates/biome_markdown_parser/tests/spec_test.rs
✅ Files skipped from review due to trivial changes (1)
  • crates/biome_markdown_parser/Cargo.toml

@jfmcdowell
Copy link
Copy Markdown
Contributor Author

@ematipico: ACK on your feedback. Resolved in 58d3e5f

@ematipico ematipico merged commit 6ef0ffc into biomejs:main Apr 4, 2026
35 checks passed
@jfmcdowell jfmcdowell deleted the test/md-reference-html-validation branch April 13, 2026 12:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Parser Area: parser L-Markdown Language: Markdown

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants