refactor(markdown-parser): promote thematic break skipped trivia to explicit CST nodes by jfmcdowell · Pull Request #9337 · biomejs/biome

jfmcdowell · 2026-03-05T03:51:41Z

Note

AI Assistance Disclosure: This PR was developed with assistance from Claude Code.

Summary

Add MdThematicBreakChar, AnyMdThematicBreakPart, and MdThematicBreakPartList to the ungram grammar, replacing the single md_thematic_break_literal token model with a structured parts list.
Add ThematicBreakParts lexer context that emits single-char STAR/MINUS/UNDERSCORE tokens and MD_INDENT_CHAR for inter-marker whitespace, instead of aggregating into MD_THEMATIC_BREAK_LITERAL.
Implement parse_thematic_break_parts with re-lex happy path (decomposing MD_THEMATIC_BREAK_LITERAL) and fallback path for list-item contexts where tokens are already individual.
Add FormatNodeRule stubs for MdThematicBreakChar, AnyMdThematicBreakPart, and MdThematicBreakPartList.
Fix infinite loop in list parser when thematic break detection succeeds but parsing returns Absent (pre-existing bug exposed by new test fixture).
Add thematic_break_in_list.md test fixture for fallback-path coverage (thematic breaks inside list items).
Update all thematic break parser snapshots to reflect the new CST shape.

Follow-up to #9321. All three parse_as_skipped_trivia_tokens call sites in thematic break parsing have been eliminated. Break characters (*, -, _) and inter-marker whitespace are now real CST nodes visible to the formatter harness.

No user-facing behavior change. Parsed semantics are preserved; only the internal CST representation changes.

Test Plan

just test-crate biome_markdown_parser
cargo insta test -p biome_markdown_parser
cargo clippy -p biome_markdown_parser -p biome_markdown_formatter
cargo test -p biome_cli
just test-markdown-conformance

Docs

N/A — internal structural change, no new user-facing features.

changeset-bot · 2026-03-05T03:51:46Z

⚠️ No Changeset found

Latest commit: 9d16839

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

coderabbitai · 2026-03-05T13:37:20Z

Walkthrough

This PR converts thematic breaks from a single literal token to a parts-based representation. It adds new grammar kinds (MdThematicBreakChar, AnyMdThematicBreakPart, MdThematicBreakPartList), updates the lexer to emit a ThematicBreakParts context and tokenisation (marker chars and indent tokens), changes parser logic to re-lex and parse thematic-break parts, and wires formatter implementations for the new node types. Control flow primarily delegates to existing formatting and parsing paths with a parts-based parsing/re-lexing path for complex cases.

Possibly related PRs

refactor(markdown-parser): extract magic numbers to named constants #9228 — Modifies thematic_break_block parsing and lexer/parser indentation handling; touches the same detection/counting logic updated here.
refactor(markdown-parser): promote list structural tokens from skipped trivia to explicit CST nodes #9274 — Adds MdIndentToken formatting plumbing that this PR reuses when emitting/formatting thematic-break parts.
refactor(markdown-parser): promote fenced code block skipped trivia to explicit CST nodes #9321 — Changes lexer behaviour to emit indent/marker tokens in specific contexts; closely related to the new ThematicBreakParts lexing.

Suggested reviewers

ematipico

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately reflects the main objective: refactoring thematic break parsing by promoting skipped trivia (whitespace/markers) to explicit CST nodes.
Description check	✅ Passed	The description comprehensively documents the changeset, covering grammar changes, lexer contexts, parser implementations, formatting rules, bug fixes, test fixtures, and the rationale for promoting skipped trivia to explicit nodes.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/src/lexer/mod.rs`:
- Around line 233-240: The numbered comment describing whitespace handling is
stale: the code now checks ThematicBreakParts before CodeSpan but the list still
shows the old order; update the comment block that documents the dispatch order
(the multiline comment above the whitespace handling logic) to reflect the
current sequence used by the lexer—mention ThematicBreakParts prior to CodeSpan
and keep the rest of steps consistent with the implementation around the
whitespace dispatch in mod.rs (look for references to ThematicBreakParts and
CodeSpan in the surrounding code).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c522adf5-a34d-4adf-8d15-77ecdc3ef68b

📥 Commits

Reviewing files that changed from the base of the PR and between 20a64c4 and 50cddf6.

⛔ Files ignored due to path filters (10)

crates/biome_markdown_factory/src/generated/node_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
crates/biome_markdown_factory/src/generated/syntax_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/lazy_continuation.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph_interruption.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/thematic_break_block.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/thematic_break_in_list.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_syntax/src/generated/kind.rs is excluded by !**/generated/**, !**/generated/** and included by **
crates/biome_markdown_syntax/src/generated/macros.rs is excluded by !**/generated/**, !**/generated/** and included by **
crates/biome_markdown_syntax/src/generated/nodes.rs is excluded by !**/generated/**, !**/generated/** and included by **
crates/biome_markdown_syntax/src/generated/nodes_mut.rs is excluded by !**/generated/**, !**/generated/** and included by **

📒 Files selected for processing (15)

crates/biome_markdown_formatter/src/generated.rs
crates/biome_markdown_formatter/src/markdown/any/mod.rs
crates/biome_markdown_formatter/src/markdown/any/thematic_break_part.rs
crates/biome_markdown_formatter/src/markdown/auxiliary/mod.rs
crates/biome_markdown_formatter/src/markdown/auxiliary/thematic_break_char.rs
crates/biome_markdown_formatter/src/markdown/lists/mod.rs
crates/biome_markdown_formatter/src/markdown/lists/thematic_break_part_list.rs
crates/biome_markdown_parser/src/lexer/mod.rs
crates/biome_markdown_parser/src/parser.rs
crates/biome_markdown_parser/src/syntax/list.rs
crates/biome_markdown_parser/src/syntax/thematic_break_block.rs
crates/biome_markdown_parser/src/token_source.rs
crates/biome_markdown_parser/tests/md_test_suite/ok/thematic_break_in_list.md
xtask/codegen/markdown.ungram
xtask/codegen/src/markdown_kinds_src.rs

crates/biome_markdown_parser/src/lexer/mod.rs

coderabbitai

🧹 Nitpick comments (1)

crates/biome_markdown_parser/src/lexer/mod.rs (1)

706-715: Consider de-duplicating marker→token mapping.

The same * / _ / - mapping is repeated in a few places; a tiny helper would reduce drift risk.

Possible tidy-up

+    #[inline]
+    fn marker_to_token(marker: u8) -> MarkdownSyntaxKind {
+        match marker {
+            b'*' => STAR,
+            b'_' => UNDERSCORE,
+            b'-' => MINUS,
+            _ => MINUS,
+        }
+    }
+
@@
         if matches!(context, MarkdownLexContext::ThematicBreakParts) {
             self.advance(1);
-            return match start_char {
-                b'*' => STAR,
-                b'_' => UNDERSCORE,
-                _ => MINUS,
-            };
+            return Self::marker_to_token(start_char);
         }
@@
         if matches!(context, MarkdownLexContext::EmphasisInline) {
             self.advance(1);
-            return match start_char {
-                b'*' => STAR,
-                b'_' => UNDERSCORE,
-                b'-' => MINUS,
-                _ => unreachable!(),
-            };
+            return Self::marker_to_token(start_char);
         }
@@
-        match start_char {
-            b'*' => STAR,
-            b'_' => UNDERSCORE,
-            b'-' => MINUS,
-            _ => unreachable!(),
-        }
+        Self::marker_to_token(start_char)

Also applies to: 773-779, 794-799

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/lexer/mod.rs` around lines 706 - 715, Create
a small helper function (e.g., marker_to_token(start_char: u8) -> TokenType)
that maps b'*'→STAR, b'_'→UNDERSCORE, and default→MINUS, and replace the
duplicated match expressions in MarkdownLexContext::ThematicBreakParts and the
other locations (the matches around lines 773–779 and 794–799) with a call to
this helper; ensure the existing advance(1) logic and return semantics are
preserved so you only swap the match block for marker_to_token(start_char).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@crates/biome_markdown_parser/src/lexer/mod.rs`:
- Around line 706-715: Create a small helper function (e.g.,
marker_to_token(start_char: u8) -> TokenType) that maps b'*'→STAR,
b'_'→UNDERSCORE, and default→MINUS, and replace the duplicated match expressions
in MarkdownLexContext::ThematicBreakParts and the other locations (the matches
around lines 773–779 and 794–799) with a call to this helper; ensure the
existing advance(1) logic and return semantics are preserved so you only swap
the match block for marker_to_token(start_char).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8bc6e3e4-3dce-487f-99e8-fd5092727d98

📥 Commits

Reviewing files that changed from the base of the PR and between 50cddf6 and d498012.

📒 Files selected for processing (1)

crates/biome_markdown_parser/src/lexer/mod.rs

…xplicit CST nodes Replace all `parse_as_skipped_trivia_tokens` calls in thematic break parsing with explicit `MdThematicBreakChar` and `MdIndentToken` CST nodes. Break characters (*, -, _) and inter-marker whitespace are now structurally represented via `MdThematicBreakPartList` instead of being hidden in trivia. - Add `MdThematicBreakChar`, `AnyMdThematicBreakPart`, and `MdThematicBreakPartList` to the ungram grammar - Add `ThematicBreakParts` lexer context for single-char token emission - Implement `parse_thematic_break_parts` with re-lex happy path and fallback path for list-item contexts - Fix infinite loop in list parser when thematic break detection succeeds but parsing returns Absent - Add `thematic_break_in_list.md` test fixture for fallback path coverage

Add ThematicBreakParts to the numbered comment documenting the whitespace handling dispatch order in the lexer.

…ext after upstream rename

github-actions bot added A-Parser Area: parser A-Formatter Area: formatter A-Tooling Area: internal tools labels Mar 5, 2026

jfmcdowell force-pushed the refactor/thematic-break-skipped-trivia branch 4 times, most recently from 78cab77 to 50cddf6 Compare March 5, 2026 12:51

jfmcdowell marked this pull request as ready for review March 5, 2026 13:26

coderabbitai bot reviewed Mar 5, 2026

View reviewed changes

crates/biome_markdown_parser/src/lexer/mod.rs Outdated Show resolved Hide resolved

coderabbitai bot reviewed Mar 5, 2026

View reviewed changes

jfmcdowell added 3 commits March 9, 2026 10:14

fix(markdown-parser): update stale whitespace dispatch order comment

765ff34

Add ThematicBreakParts to the numbered comment documenting the whitespace handling dispatch order in the lexer.

fix(markdown-formatter): update MarkdownFormatContext to MdFormatCont…

9d16839

…ext after upstream rename

jfmcdowell force-pushed the refactor/thematic-break-skipped-trivia branch from d498012 to 9d16839 Compare March 9, 2026 14:21

dyc3 approved these changes Mar 9, 2026

View reviewed changes

dyc3 merged commit 5978824 into biomejs:main Mar 9, 2026
14 checks passed

jfmcdowell mentioned this pull request Mar 10, 2026

refactor(markdown-parser): promote remaining skipped trivia to explicit CST nodes #9427

Merged

jfmcdowell deleted the refactor/thematic-break-skipped-trivia branch March 10, 2026 01:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(markdown-parser): promote thematic break skipped trivia to explicit CST nodes#9337

refactor(markdown-parser): promote thematic break skipped trivia to explicit CST nodes#9337
dyc3 merged 3 commits intobiomejs:mainfrom
jfmcdowell:refactor/thematic-break-skipped-trivia

jfmcdowell commented Mar 5, 2026 •

edited

Loading

Uh oh!

changeset-bot bot commented Mar 5, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Mar 5, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

jfmcdowell commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

Docs

Uh oh!

changeset-bot bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ No Changeset found

Uh oh!

coderabbitai bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jfmcdowell commented Mar 5, 2026 •

edited

Loading

changeset-bot bot commented Mar 5, 2026 •

edited

Loading

coderabbitai bot commented Mar 5, 2026 •

edited

Loading