fix(markdown_parser): prefer list item over thematic break for `- ---` by jfmcdowell · Pull Request #9946 · biomejs/biome

jfmcdowell · 2026-04-12T18:18:39Z

Note

This PR was created with AI assistance (Claude Code).

Summary

When the lexer produces `MD_THEMATIC_BREAK_LITERAL` for a line like `- ---`, the thematic break check in the block dispatcher fires before the list item check, so the line is parsed as a top-level `<hr />` instead of a list item containing `<hr />`.

Per CommonMark §5.2/§4.1 (verified against commonmark.js + markdown-it): when stripping a bullet marker + space from the token text leaves a consecutive run of 3+ matching break characters, the list item interpretation wins:

`- ---` → list item containing `<hr />` (consecutive `---` after marker)
`* ***` → list item containing `<hr />`
`+ ___` → list item containing `<hr />`
`- - -` → thematic break (spaced chars after marker; stays a break)
`* * *` → thematic break

The fix adds a parser-side guard (`thematic_break_hides_list_item`) that inspects the token text. When triggered, the token is re-lexed via the `ThematicBreakParts` context to expose the individual marker tokens, then list item parsing proceeds normally.
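
For readers who want to see the guard end to end, the following is a minimal, standalone sketch of a predicate with the behavior described above. It is assembled from the diff fragment quoted in the review comment further down this page, not copied from the PR. The value of `THEMATIC_BREAK_MIN_CHARS`, the length check at the top, and the final "all bytes match" check are assumptions. Note that the author later withdrew this approach after re-reading CommonMark §4.1, so treat it as an illustration of the proposal only.

```rust
/// Assumed value, matching the "3+" wording in the summary.
const THEMATIC_BREAK_MIN_CHARS: usize = 3;

/// Sketch of the proposed guard: true when `text` is a bullet marker,
/// then one space or tab, then 3+ consecutive identical break characters
/// (optionally followed by trailing spaces or tabs).
fn thematic_break_hides_list_item(text: &str) -> bool {
    let bytes = text.as_bytes();
    // Assumed bounds check so the indexing below cannot panic.
    if bytes.len() < 2 {
        return false;
    }
    if !matches!(bytes[0], b'-' | b'*' | b'+') {
        return false;
    }
    if !matches!(bytes[1], b' ' | b'\t') {
        return false;
    }

    // Both leading bytes are ASCII, so index 2 is a valid char boundary.
    let payload = text[2..].trim_end_matches([' ', '\t']);
    let payload_bytes = payload.as_bytes();
    if payload_bytes.len() < THEMATIC_BREAK_MIN_CHARS {
        return false;
    }
    let break_char = payload_bytes[0];
    if !matches!(break_char, b'-' | b'*' | b'_') {
        return false;
    }
    // Assumed completion: every remaining byte must equal the first one,
    // which rejects spaced forms such as `- - -`.
    payload_bytes.iter().all(|&b| b == break_char)
}
```

Under these assumptions the predicate accepts `- ---`, `* ***`, and `+ ___`, and rejects `- - -` and `* * *`, mirroring the list in the summary.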

Also fixes 2 pre-existing CommonMark conformance failures (examples 53, 54) — conformance is now 652/652 (100%).

Test Plan

just test-crate biome_markdown_parser
just test-markdown-conformance
9 targeted disambiguation tests added to spec_test.rs
1 CST snapshot updated (thematic_break_in_list.md)

Docs

N/A

changeset-bot · 2026-04-12T18:18:49Z

⚠️ No Changeset found

Latest commit: bd64d69

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


codspeed-hq · 2026-04-12T18:44:12Z

Merging this PR will not alter performance

✅ 28 untouched benchmarks
⏩ 228 skipped benchmarks¹

Comparing jfmcdowell:fix/md-thematic-break-list-precedence (81aaa78) with main (bcd6508)

¹ 228 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

coderabbitai · 2026-04-12T19:30:48Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 104187e8-0452-43b3-a6f3-8d0e0b5f59c6

📥 Commits

Reviewing files that changed from the base of the PR and between 13597b9 and bd64d69.

⛔ Files ignored due to path filters (1)

crates/biome_markdown_parser/tests/md_test_suite/ok/thematic_break_in_list.md.snap is excluded by !**/*.snap and included by **

📒 Files selected for processing (3)

crates/biome_markdown_parser/src/syntax/mod.rs
crates/biome_markdown_parser/src/syntax/thematic_break_block.rs
crates/biome_markdown_parser/tests/spec_test.rs

🚧 Files skipped from review as they are similar to previous changes (2)

crates/biome_markdown_parser/tests/spec_test.rs
crates/biome_markdown_parser/src/syntax/thematic_break_block.rs

Walkthrough

This PR changes thematic-break recognition and parsing to disambiguate cases where a bullet marker (-, *, +) + space is immediately followed by a run of three or more identical thematic characters. It adds a thematic_break_hides_list_item predicate and byte-oriented checks for break markers, and updates lexer/parse dispatch so such sequences force re-lexing and are parsed as list items containing an <hr /> instead of a top-level thematic break. Tests were updated to reflect the new precedence.

Possibly related PRs

fix(markdown_parser): recognize setext heading inside blockquote #9782: Introduced force-relex handling for thematic-break tokens after blockquote prefixes and changed dispatch between thematic-break and block parsing (similar re-lex/dispatch pattern).

Suggested reviewers

ematipico
dyc3

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarises the main fix: establishing correct parsing precedence where list items take priority over thematic breaks for patterns like `- ---`.
Description check	✅ Passed	The description is well-related to the changeset, explaining the CommonMark precedence issue, the fix strategy, conformance improvements, and test coverage in detail.


ematipico · 2026-04-13T05:13:16Z

+    if !matches!(bytes[0], b'-' | b'*' | b'+') {
+        return false;
+    }
+    if !matches!(bytes[1], b' ' | b'\t') {
+        return false;
+    }
+
+    // The payload (after marker + space) must be 3+ consecutive matching
+    // break characters, optionally followed by trailing whitespace only.
+    let payload = text[2..].trim_end_matches([' ', '\t']);
+    let payload_bytes = payload.as_bytes();
+    if payload_bytes.len() < THEMATIC_BREAK_MIN_CHARS {
+        return false;
+    }
+    let break_char = payload_bytes[0];
+    if !matches!(break_char, b'-' | b'*' | b'_') {


Remember to use the lookup table for known characters

When the lexer produces `MD_THEMATIC_BREAK_LITERAL` for a line like `- ---`, the thematic break interpretation won because it was checked before list items in the block dispatcher.

Per CommonMark §5.2/§4.1 (and verified against commonmark.js + markdown-it), when stripping a bullet marker + space from the token leaves content that is itself a valid thematic break (3+ matching chars), the list item interpretation should win. E.g.:

- `- ---` → list item containing <hr /> (3 chars remain)
- `- - -` → thematic break (only 2 chars remain after marker)

The fix adds a parser-side guard (`thematic_break_hides_list_item`) that inspects the token text. When triggered, the token is re-lexed via `ThematicBreakParts` context to expose the individual marker tokens, then list item parsing proceeds normally.

…classification

Route `*`, `-`, and `_` classification through `biome_unicode_table::lookup_byte` via a shared `is_break_marker` helper, following the project convention. Whitespace checks (`' '`/`'\t'`) are kept explicit since `WHS` is semantically broader than what CommonMark requires here.
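
To illustrate the idea behind this commit, here is a minimal sketch of table-driven byte classification. It does not use `biome_unicode_table`: the `ByteClass` enum, the `BYTE_CLASSES` table, and `lookup_byte_sketch` are hypothetical stand-ins invented for this example. Only the helper name `is_break_marker` comes from the commit message, and its body here is an assumption.

```rust
/// Hypothetical byte classes for this sketch (not Biome's dispatch table).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ByteClass {
    Other,
    Dash,
    Star,
    Underscore,
}

/// Builds a 256-entry classification table at compile time.
const fn build_table() -> [ByteClass; 256] {
    let mut table = [ByteClass::Other; 256];
    table[b'-' as usize] = ByteClass::Dash;
    table[b'*' as usize] = ByteClass::Star;
    table[b'_' as usize] = ByteClass::Underscore;
    table
}

static BYTE_CLASSES: [ByteClass; 256] = build_table();

/// Stand-in for a table lookup: one array index per byte, no branching.
fn lookup_byte_sketch(byte: u8) -> ByteClass {
    BYTE_CLASSES[byte as usize]
}

/// Shared helper: true for the three thematic break characters.
fn is_break_marker(byte: u8) -> bool {
    lookup_byte_sketch(byte) != ByteClass::Other
}

/// Whitespace stays an explicit check, as the commit message describes,
/// because only space and tab are relevant at this position.
fn is_marker_whitespace(byte: u8) -> bool {
    matches!(byte, b' ' | b'\t')
}
```

The design point is that every call site classifies break characters through one helper, while the narrower space/tab check is kept separate from any broader whitespace class.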

jfmcdowell · 2026-04-13T13:17:15Z

After re-examining the CommonMark spec (§4.1 Thematic breaks) this is the wrong approach.

github-actions bot added A-Parser Area: parser L-Markdown Language: Markdown labels Apr 12, 2026

jfmcdowell force-pushed the fix/md-thematic-break-list-precedence branch from 54712fd to efd3e6a Compare April 12, 2026 18:37

jfmcdowell marked this pull request as ready for review April 12, 2026 19:26

ematipico reviewed Apr 13, 2026

jfmcdowell and others added 3 commits April 13, 2026 08:32

[autofix.ci] apply automated fixes

a5f958e

jfmcdowell force-pushed the fix/md-thematic-break-list-precedence branch from 024a7e4 to bd64d69 Compare April 13, 2026 12:32

chore: update module_graph snapshot after upstream rebase

81aaa78

github-actions bot added the A-Project Area: project label Apr 13, 2026

jfmcdowell marked this pull request as draft April 13, 2026 13:08

jfmcdowell closed this Apr 13, 2026

jfmcdowell deleted the fix/md-thematic-break-list-precedence branch April 13, 2026 13:18

jfmcdowell wants to merge 4 commits into biomejs:main from jfmcdowell:fix/md-thematic-break-list-precedence


2 participants
