test(markdown_parser): add reference HTML validation and structural invariants#9779
Conversation
…nvariants Add two new validation layers to the markdown parser test suite: 1. Optional reference HTML comparison: ok/ fixtures with a .html sidecar are rendered via document_to_html and compared against commonmark.js reference output. 38 of 75 fixtures are covered; the rest are excluded because they test Biome-specific CST structure or have known behavioral mismatches. 2. Targeted structural invariants for known bug classes: - MdContinuationIndent children must all be MdIndentToken - MdHeader inside list items must have a present indent slot Also adds 3 ownership-sensitive fixtures: - blockquote_inside_list.md - setext_heading_inside_list.md - indented_code_in_list_vs_continuation.md
|
Merging this PR will not alter performance
Comparing Footnotes
|
WalkthroughAdds 40+ new Markdown test fixtures under crates/biome_markdown_parser/tests/md_test_suite/ok covering headings, emphasis, links, inline HTML, fenced/indented code, blockquotes, lists, lazy continuation and many edge cases. Updates crates/biome_markdown_parser/tests/spec_test.rs to optionally compare rendered output against sibling Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_markdown_parser/tests/spec_test.rs`:
- Around line 123-125: The current conditional (checking html_path.exists() &&
let Some(doc) = MdDocument::cast(parsed.syntax())) can silently do nothing and
hide a vanished reference check; update this branch so that when a .html sidecar
exists and MdDocument::cast(parsed.syntax()) yields Some(doc) the test performs
the sidecar validation instead of skipping — either invoke the existing sidecar
validation helper (the function used elsewhere to compare/validate .html
sidecars) with doc and html_path, or explicitly fail the test (panic/assert)
with a clear message that the fixture's sidecar must be validated; ensure you
modify the block surrounding html_path.exists(),
MdDocument::cast(parsed.syntax()), and the doc variable accordingly so the test
no longer silently passes.
- Around line 143-157: When iterating parsed.syntax().descendants() and
encountering a node of kind MD_CONTINUATION_INDENT, first assert that the node
contains the wrapper child of kind MD_INDENT_TOKEN_LIST (e.g. find a child where
child.kind() == MarkdownSyntaxKind::MD_INDENT_TOKEN_LIST and assert it exists
and there are no unexpected sibling kinds), then iterate that wrapper's children
and assert each is MD_INDENT_TOKEN as currently done (use the same
MarkdownSyntaxKind symbols MD_CONTINUATION_INDENT, MD_INDENT_TOKEN_LIST,
MD_INDENT_TOKEN and reuse file_name in assertion messages).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f3c172f0-a7a4-4977-aff4-a803d80d48d4
⛔ Files ignored due to path filters (3)
crates/biome_markdown_parser/tests/md_test_suite/ok/blockquote_inside_list.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/indented_code_in_list_vs_continuation.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading_inside_list.md.snapis excluded by!**/*.snapand included by**
📒 Files selected for processing (44)
crates/biome_markdown_parser/tests/md_test_suite/ok/atx_heading_trailing_hash.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/autolinks.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/block_quote_grouping.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/blockquote_inside_list.mdcrates/biome_markdown_parser/tests/md_test_suite/ok/emphasis_complex.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/emphasis_crossing.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/emphasis_edge_cases.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/emphasis_flanking.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/emphasis_link_text.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/entity_references.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_advanced.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_in_blockquote.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_in_list.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_indentation.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_info_backtick.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/hard_line_break.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/header.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/html_block.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/html_block_in_list.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/indented_code_blank_lines.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/indented_code_in_list_vs_continuation.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/indented_code_in_list_vs_continuation.mdcrates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/inline_html.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/inline_link_destination_title.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/inline_link_whitespace.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/lazy_continuation.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/link_definition.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/list_blank_lines_between_items.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/list_tightness.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/multi_backtick_code.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/nested_list_interrupt_after_newline.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/nested_quote.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/paragraph_interruption.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/reference_links.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading_inside_list.htmlcrates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading_inside_list.mdcrates/biome_markdown_parser/tests/md_test_suite/ok/thematic_break_block.htmlcrates/biome_markdown_parser/tests/spec_test.rs
…L sidecar tests The previous `if let` chain silently skipped HTML reference validation when MdDocument::cast returned None. Now panics with a clear message.
| assert_eq!( | ||
| expected_normalized, actual_normalized, | ||
| "HTML mismatch for {file_name}.\nExpected:\n{expected}\nActual:\n{actual}" | ||
| ); |
There was a problem hiding this comment.
We should use the similar and similar-asserts crates for a better look and feel of differences between the strings.
The crates are already in the workspace and used somewhere else
| assert_eq!( | ||
| token_node.kind(), | ||
| MarkdownSyntaxKind::MD_INDENT_TOKEN, | ||
| "MdContinuationIndent invariant: expected MD_INDENT_TOKEN, got {:?} in {file_name} at {:?}", |
There was a problem hiding this comment.
What should a developer do when it encounters such an error? We should at least provide an actionable suggestion (not the solution of course)
- Switch HTML sidecar comparison to `similar_asserts::assert_eq!` for colored unified diffs on failure. - Make panic/assert messages actionable with next-step guidance. - Fix duplicate format arg in structural invariant assertion. - Include Cargo.lock update for new dev-dependency.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/biome_markdown_parser/tests/spec_test.rs (1)
170-175: Improve the assertion message with useful debug info and actionable guidance.
node.kind()is alwaysMD_HEADERhere (we just checked), so printing it isn't helpful. The text range would be more useful for debugging. Also, per ematipico's earlier feedback, an actionable suggestion would help developers fix the issue.♻️ Suggested improvement
assert!( !matches!(node.slots().next(), Some(SyntaxSlot::Empty { .. })), - "MdHeader indent invariant: MD_HEADER inside list item is missing indent slot in {file_name} at {:?}", - node.kind() + "MdHeader indent invariant: MD_HEADER inside list item is missing indent slot in {file_name} at {:?}. Check the parser's header indent handling in syntax/header.rs.", + node.text_range() );🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/tests/spec_test.rs` around lines 170 - 175, The assertion message is unhelpful because node.kind() is always MD_HEADER; change the failure message to include the node's text range (use node.text_range()) and file_name and give actionable guidance like "ensure MD_HEADER inside list item has an indent slot (non-empty SyntaxSlot) or adjust parser to insert one" so maintainers know what to fix; update the assert! for the slot check around node.slots() / SyntaxSlot::Empty to print file_name, node.text_range(), and the expected invariant (e.g., "expected non-empty indent slot for MD_HEADER inside list item") to make debugging and remediation clear.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/biome_markdown_parser/tests/spec_test.rs`:
- Around line 170-175: The assertion message is unhelpful because node.kind() is
always MD_HEADER; change the failure message to include the node's text range
(use node.text_range()) and file_name and give actionable guidance like "ensure
MD_HEADER inside list item has an indent slot (non-empty SyntaxSlot) or adjust
parser to insert one" so maintainers know what to fix; update the assert! for
the slot check around node.slots() / SyntaxSlot::Empty to print file_name,
node.text_range(), and the expected invariant (e.g., "expected non-empty indent
slot for MD_HEADER inside list item") to make debugging and remediation clear.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 85ad39b1-ab37-4411-8f71-d52a46bb026e
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lockand included by**
📒 Files selected for processing (2)
crates/biome_markdown_parser/Cargo.tomlcrates/biome_markdown_parser/tests/spec_test.rs
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/biome_markdown_parser/tests/spec_test.rs (1)
170-176: Add an actionable suggestion to the MdHeader invariant error message.Per ematipico's earlier feedback: developers encountering this should know what to do. Also,
node.kind()in the message is redundant (we already know it'sMD_HEADER); the text range would be more useful for locating the issue.Suggested improvement
if in_list { assert!( !matches!(node.slots().next(), Some(SyntaxSlot::Empty { .. })), - "MdHeader indent invariant: MD_HEADER inside list item is missing indent slot in {file_name} at {:?}", - node.kind() + "MdHeader indent invariant: MD_HEADER inside list item is missing indent slot in {file_name} at {:?}. Check the parser's header-inside-list logic in syntax/list.rs or syntax/header.rs.", + node.text_range() ); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/tests/spec_test.rs` around lines 170 - 176, The MdHeader indent invariant assertion message is unhelpful and uses node.kind() redundantly; update the assertion in the in_list branch so it tells the developer what to do (e.g., "MD_HEADER inside list item is missing leading indent slot; ensure an indent SyntaxSlot is present") and include the node's text range for locating the error (use node.text_range() or the equivalent span accessor) along with file_name to aid debugging; modify the assert! message attached to the check of node.slots().next() / SyntaxSlot::Empty accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/biome_markdown_parser/tests/spec_test.rs`:
- Around line 170-176: The MdHeader indent invariant assertion message is
unhelpful and uses node.kind() redundantly; update the assertion in the in_list
branch so it tells the developer what to do (e.g., "MD_HEADER inside list item
is missing leading indent slot; ensure an indent SyntaxSlot is present") and
include the node's text range for locating the error (use node.text_range() or
the equivalent span accessor) along with file_name to aid debugging; modify the
assert! message attached to the check of node.slots().next() / SyntaxSlot::Empty
accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 43a30771-8bd1-4593-95cb-d1b542e8b8af
📒 Files selected for processing (2)
crates/biome_markdown_parser/Cargo.tomlcrates/biome_markdown_parser/tests/spec_test.rs
✅ Files skipped from review due to trivial changes (1)
- crates/biome_markdown_parser/Cargo.toml
|
@ematipico: ACK on your feedback. Resolved in 58d3e5f |
Note
This PR was created with AI assistance (Claude Code).
Summary
Adds reference HTML validation for eligible markdown parser
ok/fixtures and two structural invariants inspec_test.rsto catch known CST regressions.Also adds dedicated fixtures for ownership-sensitive cases:
blockquote_inside_list.mdsetext_heading_inside_list.mdindented_code_in_list_vs_continuation.mdTest Plan
just test-crate biome_markdown_parserjust test-markdown-conformancejust fjust lDocs
N/A.