refactor(markdown): replace skipped trivia with formatter-safe recovery#9746
Conversation
|
Merging this PR will not alter performance
Comparing Footnotes
|
WalkthroughAdded Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/biome_markdown_parser/src/parser.rs (1)
396-445:⚠️ Potential issue | 🟠 MajorPlease add parser fixtures for the new whitespace-trivia path.
This flips normal parse paths from
SkippedtoWhitespace, but the diff ships no parser cases to pin the new CST/trivia shape. I’d want at least# h \n,# h \n,# h ## \n, and one blank-line/list continuation case in the same PR; these are exactly the sort of gremlins that regress quietly.As per coding guidelines, "All code changes MUST include appropriate tests: lint rules require snapshot tests in 'tests/specs/{group}/{rule}/', formatters require snapshot tests with valid/invalid cases, parsers require test files covering valid and error cases, and bug fixes require tests that reproduce and validate the fix."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/parser.rs` around lines 396 - 445, Add parser snapshot tests that exercise the new whitespace-trivia path so we pin the CST/trivia shape produced by skip_line_indent and consume_as_whitespace_trivia: create tests that parse the input lines "# h \n", "# h \n", "# h ## \n", and a blank-line/list-continuation case, then assert the produced CST/trivia contains TriviaPieceKind::Whitespace (not Skipped) attached in the Regular MarkdownLexContext; run/update snapshots to capture the new shape and ensure the parser fixtures fail if skip_line_indent stops consuming or attaches Skipped trivia instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_markdown_parser/src/lexer/mod.rs`:
- Around line 271-278: The HeadingContent case still collapses 3+ trailing
spaces because consume_textual() bails on is_potential_hard_line_break() even
inside headings; update the mid-token hard-break guard in consume_textual() to
skip that bail when the current context is MarkdownLexContext::HeadingContent
(i.e., only treat is_potential_hard_line_break() as a hard-break bail-out when
context != MarkdownLexContext::HeadingContent), so that headings will consume
spaces normally; reference consume_textual(), is_potential_hard_line_break(),
and MarkdownLexContext::HeadingContent to locate and change the conditional
logic accordingly.
---
Outside diff comments:
In `@crates/biome_markdown_parser/src/parser.rs`:
- Around line 396-445: Add parser snapshot tests that exercise the new
whitespace-trivia path so we pin the CST/trivia shape produced by
skip_line_indent and consume_as_whitespace_trivia: create tests that parse the
input lines "# h \n", "# h \n", "# h ## \n", and a
blank-line/list-continuation case, then assert the produced CST/trivia contains
TriviaPieceKind::Whitespace (not Skipped) attached in the Regular
MarkdownLexContext; run/update snapshots to capture the new shape and ensure the
parser fixtures fail if skip_line_indent stops consuming or attaches Skipped
trivia instead.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bb9dbab5-3f2c-4065-bc58-18c018ba635a
⛔ Files ignored due to path filters (7)
crates/biome_markdown_parser/tests/md_test_suite/ok/atx_heading_trailing_hash.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_in_list.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/header.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/html_block_in_list.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/list_continuation_edge_cases.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/list_indentation.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading_edge_cases.md.snapis excluded by!**/*.snapand included by**
📒 Files selected for processing (7)
crates/biome_markdown_parser/src/lexer/mod.rscrates/biome_markdown_parser/src/parser.rscrates/biome_markdown_parser/src/syntax/header.rscrates/biome_markdown_parser/src/syntax/list.rscrates/biome_markdown_parser/src/syntax/mod.rscrates/biome_markdown_parser/src/token_source.rscrates/biome_parser/src/token_source.rs
Refs biomejs#9742. `Skipped` trivia now only appears in genuine error-recovery paths (quote/list nesting depth limits). Spec-driven structural whitespace now uses `Whitespace` trivia instead. Add `skip_as_trivia_of_kind_with_context` to `BumpWithContext` in `biome_parser` so parsers can consume tokens as a chosen trivia kind. Markdown uses this to emit `Whitespace` trivia for structural whitespace the CommonMark spec says to strip. Add `MarkdownLexContext::HeadingContent` to split `MD_HARD_LINE_LITERAL` (trailing spaces + newline) into separate tokens in heading parsing. This lets the parser consume only the whitespace as trivia while keeping the newline visible as a real token, avoiding syntax-significant trivia. Converted normal-path sites from `Skipped` to `Whitespace`: - `parser.rs`: `skip_line_indent` - `header.rs`: whitespace before/after trailing `#` - `header.rs`: trailing spaces in heading content via re-lexing - `mod.rs`: blank-line whitespace and post-hard-break spaces - `list.rs`: blank-line whitespace in lists Also convert `list.rs:skip_list_marker_indent` from `Skipped` to `Whitespace`; this helper is only used in nesting-depth recovery paths. Retain `Skipped` only for recovery: - `quote.rs`: quote nesting depth exceeded - `list.rs`: list nesting depth exceeded Formatter-facing CST structure for quote prefixes, list prefixes, and header indent/hash nodes was already explicit and did not need grammar changes.
00430d0 to
f2e7e03
Compare
Add MdBogusBlock to the Markdown grammar as a valid AnyMdBlock child. Use it in quote and list depth-overflow recovery paths to wrap the consumed marker tokens instead of routing them through parse_as_skipped_trivia_tokens. Previously, recovery tokens (e.g. an over-nested `>` marker) were attached as Skipped trivia on the next normal paragraph content. This made them visible to the formatter on normal nodes. Now they are contained inside MdBogusBlock, which the formatter skips. With this change, zero Skipped trivia remains anywhere in the Markdown parser — not on normal paths, and not on recovery paths. Remove the now-unused skip_optional_marker_space function.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/biome_markdown_parser/src/syntax/quote.rs (1)
89-93: Reusetry_bump_quote_markerin this recovery branch.This branch duplicates marker-consumption logic already centralised in
try_bump_quote_marker, which can drift over time.♻️ Small refactor
- if p.at(T![>]) { - p.bump(T![>]); - } else if p.at(MD_TEXTUAL_LITERAL) && p.cur_text() == ">" { - p.bump_remap(T![>]); - } + let _ = try_bump_quote_marker(p);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/quote.rs` around lines 89 - 93, The recovery branch duplicates the quote-marker consumption logic; replace the conditional that checks p.at(T![>]) / p.at(MD_TEXTUAL_LITERAL) and calls p.bump or p.bump_remap with a single call to the existing helper try_bump_quote_marker so the centralized logic in try_bump_quote_marker is reused; locate the recovery branch in quote.rs and call try_bump_quote_marker(p) (or the appropriate receiver) instead of duplicating the two p.at/... branches, preserving existing control flow and any surrounding error/recovery handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/biome_markdown_parser/src/syntax/quote.rs`:
- Around line 89-93: The recovery branch duplicates the quote-marker consumption
logic; replace the conditional that checks p.at(T![>]) /
p.at(MD_TEXTUAL_LITERAL) and calls p.bump or p.bump_remap with a single call to
the existing helper try_bump_quote_marker so the centralized logic in
try_bump_quote_marker is reused; locate the recovery branch in quote.rs and call
try_bump_quote_marker(p) (or the appropriate receiver) instead of duplicating
the two p.at/... branches, preserving existing control flow and any surrounding
error/recovery handling.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 48462d35-7339-4024-a346-a165d4623af6
⛔ Files ignored due to path filters (6)
crates/biome_markdown_factory/src/generated/node_factory.rsis excluded by!**/generated/**,!**/generated/**and included by**crates/biome_markdown_factory/src/generated/syntax_factory.rsis excluded by!**/generated/**,!**/generated/**and included by**crates/biome_markdown_parser/tests/md_test_suite/error/quote_nesting_too_deep.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_syntax/src/generated/kind.rsis excluded by!**/generated/**,!**/generated/**and included by**crates/biome_markdown_syntax/src/generated/macros.rsis excluded by!**/generated/**,!**/generated/**and included by**crates/biome_markdown_syntax/src/generated/nodes.rsis excluded by!**/generated/**,!**/generated/**and included by**
📒 Files selected for processing (8)
crates/biome_markdown_formatter/src/generated.rscrates/biome_markdown_formatter/src/markdown/any/block.rscrates/biome_markdown_formatter/src/markdown/bogus/bogus_block.rscrates/biome_markdown_formatter/src/markdown/bogus/mod.rscrates/biome_markdown_parser/src/syntax/list.rscrates/biome_markdown_parser/src/syntax/quote.rsxtask/codegen/markdown.ungramxtask/codegen/src/markdown_kinds_src.rs
✅ Files skipped from review due to trivial changes (2)
- crates/biome_markdown_formatter/src/markdown/bogus/mod.rs
- xtask/codegen/src/markdown_kinds_src.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/biome_markdown_parser/src/syntax/list.rs
Note
This PR was created with AI assistance (Claude Code).
Summary
Refs #9742.
Replace
Skippedtrivia in Markdown parsing with formatter-safe representations. Normal parsing paths now useWhitespacetrivia for structural whitespace, and quote/list depth-overflow recovery now wraps content inMdBogusBlockinstead of attaching skipped trivia to normal nodes.Test Plan
just test-crate biome_markdown_parser— all passedjust test-crate biome_markdown_formatter— all passedjust test-markdown-conformance— 652/652just fjust lDocs
N/A.