fix(markdown_parser): prefer list item over thematic break for `- ---` #9946
jfmcdowell wants to merge 4 commits into biomejs:main
54712fd to efd3e6a
Merging this PR will not alter performance
No actionable comments were generated in the recent review.
Walkthrough
This PR changes thematic-break recognition and parsing to disambiguate cases where a bullet marker (…
if !matches!(bytes[0], b'-' | b'*' | b'+') {
    return false;
}
if !matches!(bytes[1], b' ' | b'\t') {
    return false;
}

// The payload (after marker + space) must be 3+ consecutive matching
// break characters, optionally followed by trailing whitespace only.
let payload = text[2..].trim_end_matches([' ', '\t']);
let payload_bytes = payload.as_bytes();
if payload_bytes.len() < THEMATIC_BREAK_MIN_CHARS {
    return false;
}
let break_char = payload_bytes[0];
if !matches!(break_char, b'-' | b'*' | b'_') {
Remember to use the lookup table for known characters
When the lexer produces `MD_THEMATIC_BREAK_LITERAL` for a line like `- ---`, the thematic break interpretation won because it was checked before list items in the block dispatcher.

Per CommonMark §5.2/§4.1 (and verified against commonmark.js + markdown-it), when stripping a bullet marker + space from the token leaves content that is itself a valid thematic break (3+ matching chars), the list item interpretation should win. E.g.:

- `- ---` → list item containing `<hr />` (3 chars remain)
- `- - -` → thematic break (only 2 chars remain after marker)

The fix adds a parser-side guard (`thematic_break_hides_list_item`) that inspects the token text. When triggered, the token is re-lexed via `ThematicBreakParts` context to expose the individual marker tokens, then list item parsing proceeds normally.
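To make the guard concrete, here is a minimal standalone sketch of the check described in this commit message. It is an assumption-laden reconstruction, not the PR's code: the `&str` signature, the length check up front, the value `3` for `THEMATIC_BREAK_MIN_CHARS`, and the final "all bytes match" step are inferred from the reviewed snippet and the "3+ matching chars" wording. It illustrates the rule this commit proposes, not settled CommonMark behavior.

```rust
/// Minimum run length for a thematic break (the "3+" in the commit message).
/// Assumed value; the real constant lives elsewhere in the parser.
const THEMATIC_BREAK_MIN_CHARS: usize = 3;

/// Hypothetical standalone version of the guard: returns true when `text`
/// is a bullet marker, one space or tab, and then a run of 3+ identical
/// break characters followed by nothing but trailing spaces or tabs.
fn thematic_break_hides_list_item(text: &str) -> bool {
    let bytes = text.as_bytes();
    // Need at least a marker and a separator before indexing.
    if bytes.len() < 2 {
        return false;
    }
    if !matches!(bytes[0], b'-' | b'*' | b'+') {
        return false;
    }
    if !matches!(bytes[1], b' ' | b'\t') {
        return false;
    }

    // The first two bytes are ASCII, so slicing at 2 is on a char boundary.
    let payload = text[2..].trim_end_matches([' ', '\t']);
    let payload_bytes = payload.as_bytes();
    if payload_bytes.len() < THEMATIC_BREAK_MIN_CHARS {
        return false;
    }
    let break_char = payload_bytes[0];
    if !matches!(break_char, b'-' | b'*' | b'_') {
        return false;
    }
    // Every remaining byte must repeat the same break character, so a
    // spaced payload such as "- -" is rejected here.
    payload_bytes.iter().all(|&b| b == break_char)
}
```

Under these assumptions the guard fires for `- ---` and stays quiet for `- - -`, matching the two examples listed in the commit message.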
…classification

Route `*`, `-`, and `_` classification through `biome_unicode_table::lookup_byte` via a shared `is_break_marker` helper, following the project convention. Whitespace checks (`' '`/`'\t'`) are kept explicit since `WHS` is semantically broader than what CommonMark requires here.
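The table-lookup idea can be sketched without the real crate. Everything below is a stand-in: `ByteClass`, `build_table`, and this local `lookup_byte` are hypothetical and do not reproduce the types or signatures of `biome_unicode_table`. The sketch only shows the shape of the technique: classify each byte once through a 256-entry table, derive `is_break_marker` from the class, and keep the space-or-tab check explicit because the table's whitespace class is broader.

```rust
/// Stand-in byte classes (hypothetical, not the `biome_unicode_table` types).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ByteClass {
    Other,
    Minus,      // b'-'
    Star,       // b'*'
    Underscore, // b'_'
    Whitespace, // broad class: space, tab, and line endings
}

/// Build the 256-entry classification table at compile time.
const fn build_table() -> [ByteClass; 256] {
    let mut table = [ByteClass::Other; 256];
    table[b'-' as usize] = ByteClass::Minus;
    table[b'*' as usize] = ByteClass::Star;
    table[b'_' as usize] = ByteClass::Underscore;
    table[b' ' as usize] = ByteClass::Whitespace;
    table[b'\t' as usize] = ByteClass::Whitespace;
    table[b'\n' as usize] = ByteClass::Whitespace;
    table[b'\r' as usize] = ByteClass::Whitespace;
    table
}

static TABLE: [ByteClass; 256] = build_table();

/// Local stand-in for a table lookup: one array index per byte.
fn lookup_byte(byte: u8) -> ByteClass {
    TABLE[byte as usize]
}

/// Shared helper: true for the three thematic break characters.
fn is_break_marker(byte: u8) -> bool {
    matches!(
        lookup_byte(byte),
        ByteClass::Minus | ByteClass::Star | ByteClass::Underscore
    )
}

/// Kept explicit on purpose: the table's whitespace class also covers
/// line endings, which is wider than "space or tab".
fn is_space_or_tab(byte: u8) -> bool {
    matches!(byte, b' ' | b'\t')
}
```

The split between the two helpers mirrors the commit message: break characters go through the table, while the narrower whitespace test stays a direct comparison.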
024a7e4 to bd64d69
After re-examining the CommonMark spec (§4.1 Thematic breaks), this is the wrong approach.
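The comment does not say which part of §4.1 prompted the reversal. As a reading aid, here is a minimal sketch of the §4.1 line rule: up to three spaces of indentation, then three or more matching `-`, `_`, or `*` characters, each optionally followed by spaces or tabs, with nothing else on the line. The name `is_thematic_break_line` is hypothetical and the function is not taken from biome.

```rust
/// Sketch of the CommonMark §4.1 line rule (hypothetical helper).
fn is_thematic_break_line(line: &str) -> bool {
    let line = line.trim_end_matches(['\n', '\r']);

    // At most three leading spaces; four or more is an indented code block.
    let indent = line.bytes().take_while(|&b| b == b' ').count();
    if indent > 3 {
        return false;
    }

    let mut marker: Option<u8> = None;
    let mut count = 0usize;
    for b in line.bytes().skip(indent) {
        match b {
            // Spaces and tabs are allowed only after the first marker.
            b' ' | b'\t' if marker.is_some() => {}
            b'-' | b'_' | b'*' => {
                // All markers on the line must be the same character.
                if let Some(m) = marker {
                    if m != b {
                        return false;
                    }
                }
                marker = Some(b);
                count += 1;
            }
            // Any other byte (including a leading tab) disqualifies the line.
            _ => return false,
        }
    }
    count >= 3
}
```

Under this sketch, `- ---` and `* ***` already satisfy the thematic break rule as whole lines, since interior spaces between matching characters are permitted, while `+ ___` does not because `+` is not a break character. The spec also states that when a line can be read as either a thematic break or a list item, the thematic break takes precedence.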
Note
This PR was created with AI assistance (Claude Code).
Summary
When the lexer produces `MD_THEMATIC_BREAK_LITERAL` for a line like `- ---`, the thematic break check in the block dispatcher fires before the list item check, so the line is parsed as a top-level `<hr />` instead of a list item containing `<hr />`.

Per CommonMark §5.2/§4.1 (verified against commonmark.js + markdown-it): when stripping a bullet marker + space from the token text leaves a consecutive run of 3+ matching break characters, the list item interpretation wins:

- `- ---` → list item containing `<hr />` (consecutive `---` after marker)
- `* ***` → list item containing `<hr />`
- `+ ___` → list item containing `<hr />`
- `- - -` → thematic break (spaced chars after marker, so it stays a break)
- `* * *` → thematic break

The fix adds a parser-side guard (`thematic_break_hides_list_item`) that inspects the token text. When triggered, the token is re-lexed via `ThematicBreakParts` context to expose the individual marker tokens, then list item parsing proceeds normally.

Also fixes 2 pre-existing CommonMark conformance failures (examples 53, 54); conformance is now 652/652 (100%).
Test Plan
- `just test-crate biome_markdown_parser`
- `just test-markdown-conformance`
- `spec_test.rs` (`thematic_break_in_list.md`)

Docs
N/A