refactor(markdown-parser): align newline/prescan paragraph-break checks#9197
Conversation
Consolidate duplicate setext underline, thematic break, fence, block interrupt, and textual list marker checks from handle_inline_newline and inline_list_source_len into a shared at_paragraph_break predicate. This also adds the previously missing textual_looks_like_list_marker check to the prescan (inline_list_source_len), aligning it with the parse path.
The prescan (inline_list_source_len) was missing a setext underline check after stripping the list item's required indent. This parity gap with handle_inline_newline was harmless — the prescan caught it on the next iteration — but made the code harder to reason about. Add a direct check after indent stripping and document the remaining block-interrupt parity gap (handled on the next iteration).
Both callers of classify_quote_break_after_newline now pass true for include_textual_markers (the prescan was previously passing false, but at_paragraph_break already checks textual_looks_like_list_marker in the subsequent step). Remove the parameter entirely and always check textual markers, simplifying the API.
|
WalkthroughRemoved the Suggested labels: Suggested reviewers
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
crates/biome_markdown_parser/src/syntax/mod.rs (2)
851-867: Doc comment is missing textual list markers.Line 866 returns
truefortextual_looks_like_list_marker, but the doc only lists setext underline, thematic break, fence, and block interrupt.📝 Proposed doc fix
-/// Check if the current position is a paragraph break (setext underline, -/// thematic break, fence, or block interrupt). +/// Check if the current position is a paragraph break (setext underline, +/// thematic break, fence, block interrupt, or textual list marker).As per coding guidelines: "Use inline rustdoc documentation for rules, assists, and their options."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/mod.rs` around lines 851 - 867, The doc comment for function at_paragraph_break omits "textual list markers" even though the function checks textual_looks_like_list_marker; update the Rustdoc to list all break conditions including setext underline, thematic break, fence, block interrupt, and textual list markers (or "textual list marker" detection) so the documentation matches the implementation in at_paragraph_break and references the textual_looks_like_list_marker check.
1235-1235: Prescan always treatshas_contentastrue— minor parity gap with parse path.Two related spots:
- Line 1235 —
at_paragraph_break(p, true): setext/thematic-break detection is always active, whereashandle_inline_newlinepasses the actualhas_contentvalue (and the parse path at lines 965, 1068–1082 gates onhas_content).- Lines 1273–1278 — the post-indent setext/thematic re-check also has no
has_contentguard (cf. parse path at line 965:if is_setext && has_content).Net effect: the prescan may terminate earlier than the parse path for continuation lines that lead with a setext/thematic marker without preceding paragraph content, producing a shorter emphasis-context span. Nothing catastrophic, but it's worth tracking as a known parity gap.
Also applies to: 1266-1278
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/mod.rs` at line 1235, Prescan currently calls at_paragraph_break(p, true) and re-checks setext/thematic markers unconditionally, causing prescan to always treat has_content as true and potentially stop earlier than the parse path; modify the prescan logic so it uses the actual has_content flag (propagate the same has_content value used by handle_inline_newline) when calling at_paragraph_break and add the same guard around the post-indent setext/thematic re-check (mirror the parse-path condition: only check is_setext/thematic if has_content is true) so prescan parity with the parse path (symbols: at_paragraph_break, handle_inline_newline, is_setext, has_content) is restored.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/biome_markdown_parser/src/syntax/mod.rs`:
- Around line 851-867: The doc comment for function at_paragraph_break omits
"textual list markers" even though the function checks
textual_looks_like_list_marker; update the Rustdoc to list all break conditions
including setext underline, thematic break, fence, block interrupt, and textual
list markers (or "textual list marker" detection) so the documentation matches
the implementation in at_paragraph_break and references the
textual_looks_like_list_marker check.
- Line 1235: Prescan currently calls at_paragraph_break(p, true) and re-checks
setext/thematic markers unconditionally, causing prescan to always treat
has_content as true and potentially stop earlier than the parse path; modify the
prescan logic so it uses the actual has_content flag (propagate the same
has_content value used by handle_inline_newline) when calling at_paragraph_break
and add the same guard around the post-indent setext/thematic re-check (mirror
the parse-path condition: only check is_setext/thematic if has_content is true)
so prescan parity with the parse path (symbols: at_paragraph_break,
handle_inline_newline, is_setext, has_content) is restored.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
crates/biome_markdown_parser/src/syntax/mod.rs (2)
1275-1275: Nit: preferis_dash_only_thematic_break(p)for consistency.Every other call site in this file uses the thin wrapper
is_dash_only_thematic_break(p); only line 1275 reaches intois_dash_only_thematic_break_text(p.cur_text())directly.🔧 Trivial fix
- && is_dash_only_thematic_break_text(p.cur_text())) + && is_dash_only_thematic_break(p))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/mod.rs` at line 1275, Replace the direct call to is_dash_only_thematic_break_text(p.cur_text()) with the existing wrapper is_dash_only_thematic_break(p) for consistency with other call sites; update the expression that currently references p.cur_text() so it instead passes the parser/token context object p into is_dash_only_thematic_break to match usages elsewhere in this module.
856-867:at_paragraph_breakomits the< 4indent guard seen in the inline loop, but this is safe.The inline-loop checks at lines 1068–1084 guard with
real_line_indent_from_source(p) < INDENT_CODE_BLOCK_SPACES;at_paragraph_breakdoes not. In practice this isn't reachable with 4+ indent because (a) forrequired_indent == 0the earlier setext check at line 942 fires first and when it falls through the parser is still positioned at the whitespace token, not the underline; (b) forrequired_indent > 0,allow_setext_headingreturnsfalsewhenindent < required_indent. Worth a brief comment to explain why the guard is deliberately absent here, so the next reader doesn't add it unnecessarily.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/mod.rs` around lines 856 - 867, The function at_paragraph_break omits the real_line_indent_from_source(p) < INDENT_CODE_BLOCK_SPACES guard present in the inline loop, which could confuse future readers; add a brief comment inside or just above at_paragraph_break explaining that the <4-indent check is intentionally omitted because (1) when required_indent == 0 the earlier setext check handles the case and the parser is positioned at the whitespace token, and (2) when required_indent > 0 allow_setext_heading already returns false for indent < required_indent, so the guard is unnecessary—mention at_paragraph_break and allow_setext_heading by name so maintainers know why not to reintroduce the <4 check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_markdown_parser/src/syntax/mod.rs`:
- Line 1235: The prescan currently hardcodes has_content=true in calls like
at_paragraph_break(p, true), causing setext/thematic guards to be
unconditionally enabled; add a local mutable has_content flag (initialized
false) in the prescan loop (mirror parse_inline_item_list) and set it true
whenever you encounter a non-newline content token during the prescan, then
replace the hardcoded true in at_paragraph_break and the later re-check (the
post-list-indent strip) with this has_content variable and ensure the same
is_setext && has_content gating used in the parse path (e.g.,
handle_inline_newline / inline_list_source_len checks) is applied so
setext/thematic logic only runs when real content was seen.
---
Nitpick comments:
In `@crates/biome_markdown_parser/src/syntax/mod.rs`:
- Line 1275: Replace the direct call to
is_dash_only_thematic_break_text(p.cur_text()) with the existing wrapper
is_dash_only_thematic_break(p) for consistency with other call sites; update the
expression that currently references p.cur_text() so it instead passes the
parser/token context object p into is_dash_only_thematic_break to match usages
elsewhere in this module.
- Around line 856-867: The function at_paragraph_break omits the
real_line_indent_from_source(p) < INDENT_CODE_BLOCK_SPACES guard present in the
inline loop, which could confuse future readers; add a brief comment inside or
just above at_paragraph_break explaining that the <4-indent check is
intentionally omitted because (1) when required_indent == 0 the earlier setext
check handles the case and the parser is positioned at the whitespace token, and
(2) when required_indent > 0 allow_setext_heading already returns false for
indent < required_indent, so the guard is unnecessary—mention at_paragraph_break
and allow_setext_heading by name so maintainers know why not to reintroduce the
<4 check.
| } | ||
|
|
||
| if at_block_interrupt(p) { | ||
| if at_paragraph_break(p, true) { |
There was a problem hiding this comment.
Prescan doesn't track has_content; both setext/thematic guards are unconditionally enabled.
at_paragraph_break(p, true) at line 1235 hardcodes has_content = true. The parse path (handle_inline_newline) gates the setext/thematic break on the actual has_content flag accumulated during parse. The re-check added at lines 1273–1278 (post-list-indent stripping) has the same gap — its parse-path mirror at lines 965–969 is guarded by if is_setext && has_content.
In practice inline_list_source_len is never called when the inline list starts with a NEWLINE (paragraphs can't start with one), so the risk is limited to emphasis-context sizing in edge cases. But this is a direct parity gap contrary to the PR's stated goal of aligning decision points.
A has_content local flag (mirroring how parse_inline_item_list tracks it) would close the gap:
💡 Sketch — add `has_content` tracking to the prescan
fn inline_list_source_len(p: &mut MarkdownParser) -> usize {
p.lookahead(|p| {
let mut len = 0usize;
+ let mut has_content = false;
loop {
// …
if p.at(NEWLINE) {
// …
- if at_paragraph_break(p, true) {
+ if at_paragraph_break(p, has_content) {
break;
}
// …
// After stripping list indent, re-check setext/thematic markers
- if p.at(MD_SETEXT_UNDERLINE_LITERAL)
- || (p.at(MD_THEMATIC_BREAK_LITERAL)
- && is_dash_only_thematic_break_text(p.cur_text()))
- {
- break;
- }
+ if has_content
+ && (p.at(MD_SETEXT_UNDERLINE_LITERAL)
+ || (p.at(MD_THEMATIC_BREAK_LITERAL) && is_dash_only_thematic_break(p)))
+ {
+ break;
+ }
continue;
}
// …
len += p.cur_text().len();
p.bump(p.cur());
+ // mirror has_content tracking from parse_inline_item_list
+ let inline_end: usize = p.cur_range().start().into();
+ // (simplified: any non-NEWLINE token counts as content for prescan purposes)
+ has_content = true;
}
len
})
}Also applies to: 1273-1278
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/biome_markdown_parser/src/syntax/mod.rs` at line 1235, The prescan
currently hardcodes has_content=true in calls like at_paragraph_break(p, true),
causing setext/thematic guards to be unconditionally enabled; add a local
mutable has_content flag (initialized false) in the prescan loop (mirror
parse_inline_item_list) and set it true whenever you encounter a non-newline
content token during the prescan, then replace the hardcoded true in
at_paragraph_break and the later re-check (the post-list-indent strip) with this
has_content variable and ensure the same is_setext && has_content gating used in
the parse path (e.g., handle_inline_newline / inline_list_source_len checks) is
applied so setext/thematic logic only runs when real content was seen.
There was a problem hiding this comment.
Yes, this issue was real and is fixed in #9313: the prescan mismatch in inline_list_source_len is now gated with has_content for setext/thematic checks (aligned with parse_inline_item_list). Revalidated with parser tests and CommonMark conformance passing. It slipped past the keeper.
Note
AI Assistance Disclosure: This PR was developed with assistance from Claude Code.
Summary
at_paragraph_breakpredicate to consolidate duplicate setext, thematic break, fence, block interrupt, and textual list marker checks betweenhandle_inline_newlineandinline_list_source_len.inline_list_source_len, tightening parse/prescan parity for list-indented continuation lines.classify_quote_break_after_newlineso both callers consistently consider textual list markers, then remove the now-redundantinclude_textual_markersparameter.No user-facing behavior change is intended; this aligns parse/prescan decision points and adds regression coverage to lock parity.
Scope is limited to
crates/biome_markdown_parser/src/syntax/mod.rspluscrates/biome_markdown_parser/tests/md_test_suite/ok/quote_textual_marker_parity.mdand its snapshot.Test Plan
cargo test -p biome_markdown_parsercargo insta test -p biome_markdown_parserjust test-markdown-conformancejust f && just lDocs
N/A