refactor(markdown-parser): align newline/prescan paragraph-break checks by jfmcdowell · Pull Request #9197 · biomejs/biome

jfmcdowell · 2026-02-23T01:49:39Z

Note

AI Assistance Disclosure: This PR was developed with assistance from Claude Code.

Summary

- Extract a shared at_paragraph_break predicate to consolidate the duplicate setext, thematic break, fence, block interrupt, and textual list marker checks between handle_inline_newline and inline_list_source_len.
- Add a setext/thematic check after list-indent stripping in inline_list_source_len, tightening parse/prescan parity for list-indented continuation lines.
- Align classify_quote_break_after_newline so both callers consistently consider textual list markers, then remove the now-redundant include_textual_markers parameter.
- Add a focused regression fixture and snapshot (quote_textual_marker_parity.md) covering quote-newline handling for textual - and 1. markers and normal quote continuation.

No user-facing behavior change is intended; this aligns parse/prescan decision points and adds regression coverage to lock parity.
Scope is limited to crates/biome_markdown_parser/src/syntax/mod.rs plus crates/biome_markdown_parser/tests/md_test_suite/ok/quote_textual_marker_parity.md and its snapshot.
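To make "textual list marker" concrete, here is a minimal line-based sketch of the marker shapes the regression fixture exercises. `looks_like_list_marker` is a hypothetical helper, not Biome's `textual_looks_like_list_marker` (which inspects parser tokens). It follows the CommonMark marker shapes (a bullet `-`, `+`, `*`, or 1 to 9 digits followed by `.` or `)`), and it deliberately ignores the extra CommonMark rules about when a list item may interrupt a paragraph.

```rust
/// Hypothetical, line-based stand-in for a textual list-marker check.
/// Real parser code works on tokens; this only illustrates the marker shapes.
fn looks_like_list_marker(line: &str) -> bool {
    // Up to 3 spaces of indentation are allowed; 4+ is not a list marker here.
    let indent = line.len() - line.trim_start_matches(' ').len();
    if indent > 3 {
        return false;
    }
    let bytes = line[indent..].as_bytes();
    let marker_len = match bytes.first() {
        // Bullet markers.
        Some(b'-' | b'+' | b'*') => 1,
        // Ordered markers: 1-9 digits followed by `.` or `)`.
        Some(b) if b.is_ascii_digit() => {
            let digits = bytes.iter().take_while(|b| b.is_ascii_digit()).count();
            if digits > 9 {
                return false;
            }
            match bytes.get(digits) {
                Some(b'.' | b')') => digits + 1,
                _ => return false,
            }
        }
        _ => return false,
    };
    // The marker must be followed by whitespace or the end of the line.
    matches!(bytes.get(marker_len), None | Some(b' ' | b'\t'))
}
```

Note that `*emphasis*` is rejected because the `*` is not followed by whitespace, which is exactly the distinction a textual check has to make.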

Test Plan

cargo test -p biome_markdown_parser
cargo insta test -p biome_markdown_parser
just test-markdown-conformance
just f && just l

Docs

N/A

Consolidate duplicate setext underline, thematic break, fence, block interrupt, and textual list marker checks from handle_inline_newline and inline_list_source_len into a shared at_paragraph_break predicate. This also adds the previously missing textual_looks_like_list_marker check to the prescan (inline_list_source_len), aligning it with the parse path.
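The consolidation can be pictured with a small, self-contained model. Everything below is hypothetical: the real at_paragraph_break takes a parser and inspects tokens, whereas this sketch takes a line of text, ignores indentation limits, and uses simplified CommonMark checks. Gating the setext/thematic checks on `has_content` mirrors the review discussion, not a confirmed detail of the merged code. The point is the shape: one predicate called by both the parse path and the prescan, so the two cannot drift apart.

```rust
/// Simplified, line-based stand-ins for the parser's token-level checks (hypothetical).
fn is_setext_underline(line: &str) -> bool {
    let t = line.trim();
    !t.is_empty() && (t.bytes().all(|b| b == b'=') || t.bytes().all(|b| b == b'-'))
}

fn is_thematic_break(line: &str) -> bool {
    let marks: Vec<u8> = line.bytes().filter(|b| !b.is_ascii_whitespace()).collect();
    marks.len() >= 3
        && matches!(marks[0], b'-' | b'*' | b'_')
        && marks.iter().all(|&b| b == marks[0])
}

fn is_fence_start(line: &str) -> bool {
    let t = line.trim_start();
    t.starts_with("```") || t.starts_with("~~~")
}

/// Other block starts that can interrupt a paragraph: block quote or ATX heading.
fn is_block_interrupt(line: &str) -> bool {
    let t = line.trim_start();
    let hashes = t.bytes().take_while(|&b| b == b'#').count();
    t.starts_with('>')
        || ((1..=6).contains(&hashes)
            && matches!(t.as_bytes().get(hashes), None | Some(b' ' | b'\t')))
}

fn is_textual_list_marker(line: &str) -> bool {
    let t = line.trim_start();
    let digits = t.bytes().take_while(|b| b.is_ascii_digit()).count();
    let marker_len = if matches!(t.as_bytes().first(), Some(b'-' | b'+' | b'*')) {
        1
    } else if (1..=9).contains(&digits) && matches!(t.as_bytes().get(digits), Some(b'.' | b')')) {
        digits + 1
    } else {
        return false;
    };
    matches!(t.as_bytes().get(marker_len), None | Some(b' ' | b'\t'))
}

/// One shared predicate, used by every caller that needs to decide
/// whether the current line ends the paragraph.
fn at_paragraph_break(line: &str, has_content: bool) -> bool {
    (has_content && (is_setext_underline(line) || is_thematic_break(line)))
        || is_fence_start(line)
        || is_block_interrupt(line)
        || is_textual_list_marker(line)
}

/// Prescan-style caller: how many lines belong to the paragraph starting at `lines[0]`.
fn paragraph_line_count(lines: &[&str]) -> usize {
    let mut has_content = false;
    let mut count = 0;
    for line in lines {
        if count > 0 && at_paragraph_break(line, has_content) {
            break;
        }
        has_content |= !line.trim().is_empty();
        count += 1;
    }
    count
}
```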

The prescan (inline_list_source_len) was missing a setext underline check after stripping the list item's required indent. This parity gap with handle_inline_newline was harmless — the prescan caught it on the next iteration — but made the code harder to reason about. Add a direct check after indent stripping and document the remaining block-interrupt parity gap (handled on the next iteration).
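Why the check has to run *after* indent stripping can be shown with a hypothetical model. `at_setext_or_dash_break` only looks at what sits at the current position, like a token-level check, so `"  ---"` is not recognised while the cursor is still on the indent. `continuation_ends_paragraph` strips the list item's required indent first and then re-checks. The `has_content` gate reflects the follow-up raised later in this review, not the check as first added in this commit.

```rust
/// Looks only at the current position (no leading-whitespace skip), like a token check.
/// True for a setext underline (`===`, `---`) or a dash-only thematic break (`- - -`).
fn at_setext_or_dash_break(text: &str) -> bool {
    let t = text.trim_end();
    if !(t.starts_with('=') || t.starts_with('-')) {
        return false;
    }
    let setext = t.bytes().all(|b| b == b'=') || t.bytes().all(|b| b == b'-');
    let dash_thematic = t.bytes().filter(|&b| b == b'-').count() >= 3
        && t.bytes().all(|b| matches!(b, b'-' | b' ' | b'\t'));
    setext || dash_thematic
}

/// Hypothetical list-item continuation handling: strip the required indent,
/// then re-check the remainder for setext/thematic markers.
fn continuation_ends_paragraph(line: &str, required_indent: usize, has_content: bool) -> bool {
    let indent = line.bytes().take_while(|&b| b == b' ').count();
    if indent < required_indent {
        // Under-indented lines (lazy continuation, end of item) are handled elsewhere.
        return false;
    }
    let remainder = &line[required_indent..];
    has_content && at_setext_or_dash_break(remainder)
}
```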

Both callers of classify_quote_break_after_newline now pass true for include_textual_markers (the prescan was previously passing false, but at_paragraph_break already checks textual_looks_like_list_marker in the subsequent step). Remove the parameter entirely and always check textual markers, simplifying the API.
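A hypothetical sketch of the resulting API shape: once every caller wants textual markers considered, the boolean flag carries no information and can be dropped. The names `QuoteLine`, `classify_quote_line`, and `starts_with_textual_marker` are illustrative only, and the real function classifies parser state rather than a string.

```rust
/// How a line after a newline inside a block quote is classified (hypothetical model).
#[derive(Debug, PartialEq)]
enum QuoteLine {
    /// Plain text continuing the quoted paragraph.
    Continuation,
    /// A textual list marker (`- `, `1. `) starting a new block inside the quote.
    ListMarker,
}

/// Before: `classify(line, include_textual_markers: bool)`.
/// After: textual markers are always considered, so the flag is gone.
/// Returns `None` when the line has no `>` prefix (handled by other code paths).
fn classify_quote_line(line: &str) -> Option<QuoteLine> {
    let rest = line.trim_start_matches(' ').strip_prefix('>')?;
    let rest = rest.strip_prefix(' ').unwrap_or(rest);
    if starts_with_textual_marker(rest) {
        Some(QuoteLine::ListMarker)
    } else {
        Some(QuoteLine::Continuation)
    }
}

fn starts_with_textual_marker(text: &str) -> bool {
    let bytes = text.as_bytes();
    let digits = bytes.iter().take_while(|b| b.is_ascii_digit()).count();
    let marker_len = match bytes.first() {
        Some(b'-' | b'+' | b'*') => 1,
        _ if (1..=9).contains(&digits) && matches!(bytes.get(digits), Some(b'.' | b')')) => {
            digits + 1
        }
        _ => return false,
    };
    matches!(bytes.get(marker_len), None | Some(b' ' | b'\t'))
}
```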

changeset-bot · 2026-02-23T01:49:43Z

⚠️ No Changeset found

Latest commit: 1971de8

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


coderabbitai · 2026-02-23T01:59:24Z

Walkthrough

Removed the include_textual_markers parameter from classify_quote_break_after_newline and updated its call sites. Added a new crate-visible predicate at_paragraph_break(p, has_content) to centralise paragraph-interruption checks (setext underline, thematic break, fence starts, block interrupts, textual list markers). Updated handle_inline_newline and inline_list_source_len to use the new predicate and revised quote-break logic; re-checks for setext/thematic markers after stripping list indentation were added. Adjusted fence handling for textual fences and added a new Markdown test covering quoted blocks with nested lists and paragraphs.

Suggested labels: A-Parser

Suggested reviewers

ematipico
dyc3

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarises the main refactoring objective: aligning paragraph-break checks between newline and prescan logic paths in the markdown parser.
Description check	✅ Passed	The description clearly relates to the changeset, detailing the extracted at_paragraph_break predicate, parameter removal, setext/thematic checks, and the regression test fixture.


coderabbitai

🧹 Nitpick comments (2)

crates/biome_markdown_parser/src/syntax/mod.rs (2)
851-867: Doc comment is missing textual list markers.

Line 866 returns true for textual_looks_like_list_marker, but the doc only lists setext underline, thematic break, fence, and block interrupt.
📝 Proposed doc fix
-/// Check if the current position is a paragraph break (setext underline,
-/// thematic break, fence, or block interrupt).
+/// Check if the current position is a paragraph break (setext underline,
+/// thematic break, fence, block interrupt, or textual list marker).
As per coding guidelines: "Use inline rustdoc documentation for rules, assists, and their options."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/mod.rs` around lines 851 - 867, The
doc comment for function at_paragraph_break omits "textual list markers" even
though the function checks textual_looks_like_list_marker; update the Rustdoc to
list all break conditions including setext underline, thematic break, fence,
block interrupt, and textual list markers (or "textual list marker" detection)
so the documentation matches the implementation in at_paragraph_break and
references the textual_looks_like_list_marker check.
1235-1235: Prescan always treats has_content as true — minor parity gap with parse path.

Two related spots:

Line 1235 — at_paragraph_break(p, true): setext/thematic-break detection is always active, whereas handle_inline_newline passes the actual has_content value (and the parse path at lines 965, 1068–1082 gates on has_content).

Lines 1273–1278 — the post-indent setext/thematic re-check also has no has_content guard (cf. parse path at line 965: if is_setext && has_content).

Net effect: the prescan may terminate earlier than the parse path for continuation lines that lead with a setext/thematic marker without preceding paragraph content, producing a shorter emphasis-context span. Nothing catastrophic, but it's worth tracking as a known parity gap.

Also applies to: 1266-1278
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/mod.rs` at line 1235, Prescan
currently calls at_paragraph_break(p, true) and re-checks setext/thematic
markers unconditionally, causing prescan to always treat has_content as true and
potentially stop earlier than the parse path; modify the prescan logic so it
uses the actual has_content flag (propagate the same has_content value used by
handle_inline_newline) when calling at_paragraph_break and add the same guard
around the post-indent setext/thematic re-check (mirror the parse-path
condition: only check is_setext/thematic if has_content is true) so prescan
parity with the parse path (symbols: at_paragraph_break, handle_inline_newline,
is_setext, has_content) is restored.
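The divergence described above can be reproduced with a hypothetical token model. `prescan_len` counts source bytes until a setext underline stops the scan; `hardcode_has_content = true` mimics calling at_paragraph_break(p, true), while `false` tracks `has_content` the way the parse path does. Per the review, inline lists never start with a NEWLINE in practice, so the leading-newline input here is an artificial trigger used only to expose the difference.

```rust
/// Minimal token model for an inline run (hypothetical; real tokens come from the lexer).
enum Tok<'a> {
    Text(&'a str),
    Newline,
}

fn is_setext_underline(text: &str) -> bool {
    let t = text.trim();
    !t.is_empty() && (t.bytes().all(|b| b == b'=') || t.bytes().all(|b| b == b'-'))
}

/// Source length a prescan would cover before stopping at a setext underline.
fn prescan_len(tokens: &[Tok<'_>], hardcode_has_content: bool) -> usize {
    let mut len = 0;
    // With the hardcoded variant this starts (and stays) true.
    let mut has_content = hardcode_has_content;
    let mut iter = tokens.iter().peekable();
    while let Some(tok) = iter.next() {
        match tok {
            Tok::Newline => {
                if let Some(Tok::Text(next)) = iter.peek() {
                    if has_content && is_setext_underline(next) {
                        break;
                    }
                }
                len += 1; // count the newline byte
            }
            Tok::Text(text) => {
                len += text.len();
                has_content = true;
            }
        }
    }
    len
}
```

With ordinary input (`foo`, newline, `===`) both variants stop at the underline; only when no content precedes the marker does the hardcoded variant stop early.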


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

crates/biome_markdown_parser/src/syntax/mod.rs (2)
1275-1275: Nit: prefer is_dash_only_thematic_break(p) for consistency.

Every other call site in this file uses the thin wrapper is_dash_only_thematic_break(p); only line 1275 reaches into is_dash_only_thematic_break_text(p.cur_text()) directly.
🔧 Trivial fix
-                        && is_dash_only_thematic_break_text(p.cur_text()))
+                        && is_dash_only_thematic_break(p))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/mod.rs` at line 1275, Replace the
direct call to is_dash_only_thematic_break_text(p.cur_text()) with the existing
wrapper is_dash_only_thematic_break(p) for consistency with other call sites;
update the expression that currently references p.cur_text() so it instead
passes the parser/token context object p into is_dash_only_thematic_break to
match usages elsewhere in this module.
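The wrapper-versus-text-function split suggested above can be sketched as follows. The bodies are assumptions (only the two function names come from the review): the text-level function decides whether a string is a thematic break made only of `-`, and the wrapper takes the parser so call sites never reach into `cur_text()` themselves. `Parser` here is a hypothetical stand-in exposing just the current token text.

```rust
/// Hypothetical text-level check: at least three `-`, optionally separated by
/// spaces or tabs, and nothing else.
fn is_dash_only_thematic_break_text(text: &str) -> bool {
    let mut dashes = 0;
    for b in text.bytes() {
        match b {
            b'-' => dashes += 1,
            b' ' | b'\t' | b'\r' | b'\n' => {}
            _ => return false,
        }
    }
    dashes >= 3
}

/// Hypothetical parser stand-in exposing only the current token's text.
struct Parser<'a> {
    cur: &'a str,
}

impl<'a> Parser<'a> {
    fn cur_text(&self) -> &'a str {
        self.cur
    }
}

/// Thin wrapper: call sites pass the parser instead of calling `cur_text()` directly.
fn is_dash_only_thematic_break(p: &Parser<'_>) -> bool {
    is_dash_only_thematic_break_text(p.cur_text())
}
```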
856-867: at_paragraph_break omits the < 4 indent guard seen in the inline loop, but this is safe.

The inline-loop checks at lines 1068–1084 guard with real_line_indent_from_source(p) < INDENT_CODE_BLOCK_SPACES; at_paragraph_break does not. In practice this isn't reachable with 4+ indent because (a) for required_indent == 0, the earlier setext check at line 942 fires first, and when it falls through, the parser is still positioned at the whitespace token rather than the underline; (b) for required_indent > 0, allow_setext_heading returns false when indent < required_indent. A brief comment explaining why the guard is deliberately absent here would stop the next reader from adding it unnecessarily.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/mod.rs` around lines 856 - 867, The
function at_paragraph_break omits the real_line_indent_from_source(p) <
INDENT_CODE_BLOCK_SPACES guard present in the inline loop, which could confuse
future readers; add a brief comment inside or just above at_paragraph_break
explaining that the <4-indent check is intentionally omitted because (1) when
required_indent == 0 the earlier setext check handles the case and the parser is
positioned at the whitespace token, and (2) when required_indent > 0
allow_setext_heading already returns false for indent < required_indent, so the
guard is unnecessary—mention at_paragraph_break and allow_setext_heading by name
so maintainers know why not to reintroduce the <4 check.
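For readers unfamiliar with what the `< INDENT_CODE_BLOCK_SPACES` guard measures, here is a hypothetical sketch of an indent-width computation. `line_indent` is an illustrative stand-in for something like real_line_indent_from_source; it expands tabs to the next multiple of 4, as CommonMark's tab-stop rule does. The constant's value of 4 matches the "< 4 indent guard" described in the comment above.

```rust
/// Columns of leading indentation, with tabs advancing to the next multiple of 4.
fn line_indent(line: &str) -> usize {
    let mut width = 0;
    for c in line.chars() {
        match c {
            ' ' => width += 1,
            '\t' => width += 4 - width % 4,
            _ => break,
        }
    }
    width
}

const INDENT_CODE_BLOCK_SPACES: usize = 4;

/// A setext underline may be indented at most 3 columns; with 4 or more the line
/// is ordinary paragraph continuation text (indented code cannot interrupt a paragraph).
fn indent_allows_setext(line: &str) -> bool {
    line_indent(line) < INDENT_CODE_BLOCK_SPACES
}
```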


coderabbitai · 2026-02-23T02:15:30Z

crates/biome_markdown_parser/src/syntax/mod.rs

-                }
-
-                if at_block_interrupt(p) {
+                if at_paragraph_break(p, true) {


⚠️ Potential issue | 🟡 Minor

Prescan doesn't track has_content; both setext/thematic guards are unconditionally enabled.

at_paragraph_break(p, true) at line 1235 hardcodes has_content = true. The parse path (handle_inline_newline) gates the setext/thematic break on the actual has_content flag accumulated during parse. The re-check added at lines 1273–1278 (post-list-indent stripping) has the same gap — its parse-path mirror at lines 965–969 is guarded by if is_setext && has_content.

In practice inline_list_source_len is never called when the inline list starts with a NEWLINE (paragraphs can't start with one), so the risk is limited to emphasis-context sizing in edge cases. But this is a direct parity gap contrary to the PR's stated goal of aligning decision points.

A has_content local flag (mirroring how parse_inline_item_list tracks it) would close the gap:

💡 Sketch — add `has_content` tracking to the prescan

 fn inline_list_source_len(p: &mut MarkdownParser) -> usize {
     p.lookahead(|p| {
         let mut len = 0usize;
+        let mut has_content = false;
         loop {
             // …
             if p.at(NEWLINE) {
                 // …
-                if at_paragraph_break(p, true) {
+                if at_paragraph_break(p, has_content) {
                     break;
                 }
                 // …
                 // After stripping list indent, re-check setext/thematic markers
-                if p.at(MD_SETEXT_UNDERLINE_LITERAL)
-                    || (p.at(MD_THEMATIC_BREAK_LITERAL)
-                        && is_dash_only_thematic_break_text(p.cur_text()))
-                {
-                    break;
-                }
+                if has_content
+                    && (p.at(MD_SETEXT_UNDERLINE_LITERAL)
+                        || (p.at(MD_THEMATIC_BREAK_LITERAL) && is_dash_only_thematic_break(p)))
+                {
+                    break;
+                }
                 continue;
             }
             // …
             len += p.cur_text().len();
             p.bump(p.cur());
+            // Mirror has_content tracking from parse_inline_item_list
+            // (simplified: any non-NEWLINE token counts as content for prescan purposes).
+            has_content = true;
         }
         len
     })
 }

Also applies to: 1273-1278

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/mod.rs` at line 1235, The prescan currently hardcodes has_content=true in calls like at_paragraph_break(p, true), causing setext/thematic guards to be unconditionally enabled; add a local mutable has_content flag (initialized false) in the prescan loop (mirror parse_inline_item_list) and set it true whenever you encounter a non-newline content token during the prescan, then replace the hardcoded true in at_paragraph_break and the later re-check (the post-list-indent strip) with this has_content variable and ensure the same is_setext && has_content gating used in the parse path (e.g., handle_inline_newline / inline_list_source_len checks) is applied so setext/thematic logic only runs when real content was seen.

Is this relevant, @jfmcdowell?

Yes, this issue was real and is fixed in #9313: the prescan mismatch in inline_list_source_len is now gated with has_content for setext/thematic checks (aligned with parse_inline_item_list). Revalidated with parser tests and CommonMark conformance passing. It slipped past the keeper.

jfmcdowell added 4 commits February 22, 2026 18:43

test(markdown): add quote textual-marker parity regression

57651cd

github-actions bot added the A-Parser Area: parser label Feb 23, 2026

coderabbitai bot reviewed Feb 23, 2026


docs(markdown-parser): clarify paragraph-break rustdoc

1971de8

coderabbitai bot reviewed Feb 23, 2026


ematipico approved these changes Mar 3, 2026


ematipico merged commit 1e82da1 into biomejs:main Mar 3, 2026
13 checks passed

jfmcdowell deleted the refactor/newline-prescan-parity branch March 3, 2026 20:46

coderabbitai bot mentioned this pull request Mar 11, 2026

refactor(markdown-parser): simplify inline newline handling #9446

Merged

ematipico merged 5 commits into biomejs:main from jfmcdowell:refactor/newline-prescan-parity
