refactor(markdown-parser): promote thematic break skipped trivia to explicit CST nodes #9337
Walkthrough
This PR converts thematic breaks from a single literal token to a parts-based representation. It adds new grammar kinds (MdThematicBreakChar, AnyMdThematicBreakPart, MdThematicBreakPartList), updates the lexer with a ThematicBreakParts context that emits marker characters and indent tokens individually, changes the parser to re-lex and parse thematic-break parts, and wires formatter implementations for the new node types. Control flow mostly delegates to existing formatting and parsing paths; the new parts-based re-lexing path handles the complex cases.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_markdown_parser/src/lexer/mod.rs`:
- Around line 233-240: The numbered comment describing whitespace handling is
stale: the code now checks ThematicBreakParts before CodeSpan but the list still
shows the old order; update the comment block that documents the dispatch order
(the multiline comment above the whitespace handling logic) to reflect the
current sequence used by the lexer—mention ThematicBreakParts prior to CodeSpan
and keep the rest of steps consistent with the implementation around the
whitespace dispatch in mod.rs (look for references to ThematicBreakParts and
CodeSpan in the surrounding code).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c522adf5-a34d-4adf-8d15-77ecdc3ef68b
⛔ Files ignored due to path filters (10)
- crates/biome_markdown_factory/src/generated/node_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
- crates/biome_markdown_factory/src/generated/syntax_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
- crates/biome_markdown_parser/tests/md_test_suite/ok/lazy_continuation.md.snap is excluded by !**/*.snap and included by **
- crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph_interruption.md.snap is excluded by !**/*.snap and included by **
- crates/biome_markdown_parser/tests/md_test_suite/ok/thematic_break_block.md.snap is excluded by !**/*.snap and included by **
- crates/biome_markdown_parser/tests/md_test_suite/ok/thematic_break_in_list.md.snap is excluded by !**/*.snap and included by **
- crates/biome_markdown_syntax/src/generated/kind.rs is excluded by !**/generated/**, !**/generated/** and included by **
- crates/biome_markdown_syntax/src/generated/macros.rs is excluded by !**/generated/**, !**/generated/** and included by **
- crates/biome_markdown_syntax/src/generated/nodes.rs is excluded by !**/generated/**, !**/generated/** and included by **
- crates/biome_markdown_syntax/src/generated/nodes_mut.rs is excluded by !**/generated/**, !**/generated/** and included by **
📒 Files selected for processing (15)
- crates/biome_markdown_formatter/src/generated.rs
- crates/biome_markdown_formatter/src/markdown/any/mod.rs
- crates/biome_markdown_formatter/src/markdown/any/thematic_break_part.rs
- crates/biome_markdown_formatter/src/markdown/auxiliary/mod.rs
- crates/biome_markdown_formatter/src/markdown/auxiliary/thematic_break_char.rs
- crates/biome_markdown_formatter/src/markdown/lists/mod.rs
- crates/biome_markdown_formatter/src/markdown/lists/thematic_break_part_list.rs
- crates/biome_markdown_parser/src/lexer/mod.rs
- crates/biome_markdown_parser/src/parser.rs
- crates/biome_markdown_parser/src/syntax/list.rs
- crates/biome_markdown_parser/src/syntax/thematic_break_block.rs
- crates/biome_markdown_parser/src/token_source.rs
- crates/biome_markdown_parser/tests/md_test_suite/ok/thematic_break_in_list.md
- xtask/codegen/markdown.ungram
- xtask/codegen/src/markdown_kinds_src.rs
🧹 Nitpick comments (1)
crates/biome_markdown_parser/src/lexer/mod.rs (1)
706-715: Consider de-duplicating marker→token mapping.
The same * / _ / - mapping is repeated in a few places; a tiny helper would reduce drift risk.
Possible tidy-up

    +    #[inline]
    +    fn marker_to_token(marker: u8) -> MarkdownSyntaxKind {
    +        match marker {
    +            b'*' => STAR,
    +            b'_' => UNDERSCORE,
    +            b'-' => MINUS,
    +            _ => MINUS,
    +        }
    +    }
    +
    @@
             if matches!(context, MarkdownLexContext::ThematicBreakParts) {
                 self.advance(1);
    -            return match start_char {
    -                b'*' => STAR,
    -                b'_' => UNDERSCORE,
    -                _ => MINUS,
    -            };
    +            return Self::marker_to_token(start_char);
             }
    @@
             if matches!(context, MarkdownLexContext::EmphasisInline) {
                 self.advance(1);
    -            return match start_char {
    -                b'*' => STAR,
    -                b'_' => UNDERSCORE,
    -                b'-' => MINUS,
    -                _ => unreachable!(),
    -            };
    +            return Self::marker_to_token(start_char);
             }
    @@
    -        match start_char {
    -            b'*' => STAR,
    -            b'_' => UNDERSCORE,
    -            b'-' => MINUS,
    -            _ => unreachable!(),
    -        }
    +        Self::marker_to_token(start_char)

Also applies to: 773-779, 794-799
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/lexer/mod.rs` around lines 706-715, create a small helper function (e.g., marker_to_token(start_char: u8) -> MarkdownSyntaxKind) that maps b'*'→STAR, b'_'→UNDERSCORE, and default→MINUS, and replace the duplicated match expressions in MarkdownLexContext::ThematicBreakParts and the other locations (the matches around lines 773-779 and 794-799) with a call to this helper; ensure the existing advance(1) logic and return semantics are preserved so you only swap the match block for marker_to_token(start_char).
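One detail worth weighing in this nitpick: the three duplicated matches do not share a fallback. The ThematicBreakParts branch defaults to MINUS, while the other two use unreachable!(). A minimal sketch of a helper that keeps both behaviours follows, assuming a stand-in `Kind` enum in place of the real MarkdownSyntaxKind. The wrapper names `thematic_break_token` and `emphasis_token` are hypothetical and are not taken from the PR.

```rust
/// Hypothetical stand-in for the real `MarkdownSyntaxKind` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Star,
    Underscore,
    Minus,
}

/// Maps a marker byte to its token kind, or `None` for any other byte.
fn marker_to_token(marker: u8) -> Option<Kind> {
    match marker {
        b'*' => Some(Kind::Star),
        b'_' => Some(Kind::Underscore),
        b'-' => Some(Kind::Minus),
        _ => None,
    }
}

/// Call-site shape for the branch that previously fell back to `MINUS`.
fn thematic_break_token(marker: u8) -> Kind {
    marker_to_token(marker).unwrap_or(Kind::Minus)
}

/// Call-site shape for the branches that previously used `unreachable!()`.
/// Panics if the caller dispatched on a byte that is not a marker.
fn emphasis_token(marker: u8) -> Kind {
    marker_to_token(marker).expect("caller dispatches only on *, _ or -")
}
```

Returning `Option` leaves the fallback decision at each call site, so swapping in the helper would not silently turn an `unreachable!()` into a `MINUS` token.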
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8bc6e3e4-3dce-487f-99e8-fd5092727d98
📒 Files selected for processing (1)
crates/biome_markdown_parser/src/lexer/mod.rs
…xplicit CST nodes

Replace all `parse_as_skipped_trivia_tokens` calls in thematic break parsing with explicit `MdThematicBreakChar` and `MdIndentToken` CST nodes. Break characters (*, -, _) and inter-marker whitespace are now structurally represented via `MdThematicBreakPartList` instead of being hidden in trivia.

- Add `MdThematicBreakChar`, `AnyMdThematicBreakPart`, and `MdThematicBreakPartList` to the ungram grammar
- Add `ThematicBreakParts` lexer context for single-char token emission
- Implement `parse_thematic_break_parts` with re-lex happy path and fallback path for list-item contexts
- Fix infinite loop in list parser when thematic break detection succeeds but parsing returns Absent
- Add `thematic_break_in_list.md` test fixture for fallback path coverage
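The infinite-loop item above describes a common parser hazard: a loop keeps running while detection says "a break starts here", but the parse step declines and consumes nothing. The following is a minimal sketch of that shape and one way to guard against it. It is not the actual change made in this PR. `Parsed`, `Cursor`, and `count_breaks` are hypothetical names invented for illustration, and the detection and parsing rules are deliberately simplified.

```rust
/// Hypothetical result type: whether the construct was produced.
#[derive(Debug, PartialEq)]
enum Parsed {
    Present,
    Absent,
}

/// Toy cursor over byte input; stands in for real parser state.
struct Cursor<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    /// "Detection": the next byte looks like a break marker.
    fn at_break(&self) -> bool {
        matches!(
            self.input.get(self.pos).copied(),
            Some(b'*' | b'-' | b'_')
        )
    }

    /// "Parsing": succeeds and consumes only on three identical bytes.
    fn parse_break(&mut self) -> Parsed {
        let rest = &self.input[self.pos..];
        if rest.len() >= 3 && rest[..3].iter().all(|b| *b == rest[0]) {
            self.pos += 3;
            Parsed::Present
        } else {
            Parsed::Absent // nothing consumed
        }
    }
}

/// Counts leading breaks; leaves the loop when parsing declines.
fn count_breaks(input: &[u8]) -> usize {
    let mut cursor = Cursor { input, pos: 0 };
    let mut found = 0;
    while cursor.at_break() {
        match cursor.parse_break() {
            Parsed::Present => found += 1,
            // Without this `break`, `at_break` would stay true forever
            // because the cursor never moved.
            Parsed::Absent => break,
        }
    }
    found
}
```

The general rule the sketch illustrates is that any loop driven by a lookahead check needs either guaranteed progress or an explicit exit when the parse step reports nothing was consumed.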
Add ThematicBreakParts to the numbered comment documenting the whitespace handling dispatch order in the lexer.
…ext after upstream rename
Note
AI Assistance Disclosure: This PR was developed with assistance from Claude Code.
Summary
- Add `MdThematicBreakChar`, `AnyMdThematicBreakPart`, and `MdThematicBreakPartList` to the ungram grammar, replacing the single `md_thematic_break_literal` token model with a structured parts list.
- Add `ThematicBreakParts` lexer context that emits single-char `STAR`/`MINUS`/`UNDERSCORE` tokens and `MD_INDENT_CHAR` for inter-marker whitespace, instead of aggregating into `MD_THEMATIC_BREAK_LITERAL`.
- Implement `parse_thematic_break_parts` with re-lex happy path (decomposing `MD_THEMATIC_BREAK_LITERAL`) and fallback path for list-item contexts where tokens are already individual.
- Add `FormatNodeRule` stubs for `MdThematicBreakChar`, `AnyMdThematicBreakPart`, and `MdThematicBreakPartList`.
- Fix infinite loop in list parser when thematic break detection succeeds but parsing returns `Absent` (pre-existing bug exposed by new test fixture).
- Add `thematic_break_in_list.md` test fixture for fallback-path coverage (thematic breaks inside list items).

Follow-up to #9321. All three `parse_as_skipped_trivia_tokens` call sites in thematic break parsing have been eliminated. Break characters (`*`, `-`, `_`) and inter-marker whitespace are now real CST nodes visible to the formatter harness.

No user-facing behavior change. Parsed semantics are preserved; only the internal CST representation changes.
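To make the parts-based model concrete, here is a minimal sketch of splitting a candidate line into break characters and inter-marker whitespace. It assumes the CommonMark rules for thematic breaks (at most three leading spaces, then three or more matching `*`, `-`, or `_` characters with optional spaces or tabs between them). `Part` and `thematic_break_parts` are hypothetical names and do not reflect the actual lexer or parser code in this PR. Leading indentation is dropped here for brevity, and the input is assumed to have no trailing newline.

```rust
/// Hypothetical stand-in for the CST parts described above.
#[derive(Debug, PartialEq)]
enum Part {
    /// One break character: `*`, `-`, or `_`.
    BreakChar(char),
    /// One space or tab between (or after) break characters.
    Indent(char),
}

/// Splits a candidate line into parts, or returns `None` when the line
/// is not a thematic break under the CommonMark rules.
fn thematic_break_parts(line: &str) -> Option<Vec<Part>> {
    // Up to three leading spaces are allowed; four start indented code.
    let body = line.trim_start_matches(' ');
    if line.len() - body.len() > 3 {
        return None;
    }
    // The first remaining character fixes the marker for the whole line.
    let marker = body
        .chars()
        .next()
        .filter(|c| matches!(c, '*' | '-' | '_'))?;
    let mut parts = Vec::new();
    let mut count = 0;
    for c in body.chars() {
        if c == marker {
            count += 1;
            parts.push(Part::BreakChar(c));
        } else if c == ' ' || c == '\t' {
            parts.push(Part::Indent(c));
        } else {
            // Any other character, including a different marker,
            // disqualifies the line.
            return None;
        }
    }
    // At least three marker characters are required.
    (count >= 3).then_some(parts)
}
```

With this shape, `* * *` yields five parts (three break characters and two indents), which is the kind of structure a formatter can inspect once the characters are no longer hidden in trivia.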
Test Plan
- just test-crate biome_markdown_parser
- cargo insta test -p biome_markdown_parser
- cargo clippy -p biome_markdown_parser -p biome_markdown_formatter
- cargo test -p biome_cli
- just test-markdown-conformance

Docs
N/A — internal structural change, no new user-facing features.