fix(markdown-parser): promote blockquote prefix markers from skipped trivia to explicit CST nodes#9219
Conversation
|
WalkthroughThis change introduces MdQuotePrefix into the Markdown grammar, parser, HTML extractor and formatter. The parser now emits MdQuotePrefix tokens and uses new helpers for prefix emission and post-marker spacing, with nesting-depth checks. Code generation and kind literals are updated to include MD_QUOTE_PREFIX and related literals. The formatter gains a quote_prefix module and implements formatting rules and match arms to render MdQuotePrefix as verbatim prefix nodes. Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…ST nodes Replace parse_as_skipped_trivia_tokens() calls in quote.rs with explicit MdQuotePrefix node construction. The > marker and post-marker space are now real CST nodes visible to the formatter harness instead of hidden skipped trivia. - Add MdQuotePrefix to grammar with pre_marker_indent, marker, and post_marker_space tokens - Register MD_QUOTE_PRE_MARKER_INDENT and MD_QUOTE_POST_MARKER_SPACE token kinds - MdQuotePrefix appears in both AnyMdBlock and AnyMdInline for source-ordered interleaving at block and inline levels - Lazy continuation represented by absence of MdQuotePrefix after a line break - Move line_has_quote_prefix from mod.rs into quote.rs as line_has_quote_prefix_at_current to co-locate quote logic - All parser snapshot tests updated to reflect new CST shape
abb65ee to
f10b8e3
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/biome_markdown_parser/src/syntax/quote.rs (1)
97-107:⚠️ Potential issue | 🟡 Minor
marker_space = falseis ambiguous; missingdebug_assert!for the impossibleNonepath.
emit_quote_prefix_nodereturnsfalsein two distinct cases:
- No
>token was found (the prefix wasn't emitted at all).>was found but no trailing space.Both collapse to
indent = 1, but case 1 produces anMD_QUOTEnode with noMdQuotePrefixchild, violating the grammar. The guard isat_quotehaving already passed, which is a silent invariant. Adebug_assert!at the call site makes that invariant explicit:🛡️ Proposed defensive assert
- let marker_space = emit_quote_prefix_node(p); + let marker_space = emit_quote_prefix_node(p); + debug_assert!( + // If emit_quote_prefix_node returned false solely because no space + // followed the '>', the prefix WAS emitted. If it returned false + // because no '>' was found at all (impossible here since at_quote + // already confirmed one), the node would be incomplete. + // This assert catches the latter regression at dev time. + at_quote(p) == false, // parser has advanced past '>' already + );Or simpler — return
Option<bool>fromemit_quote_prefix_nodeand handleNoneexplicitly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/quote.rs` around lines 97 - 107, emit_quote_prefix_node can return false for two different reasons (no '>' token vs '>' without space), which can create an MD_QUOTE node with no MdQuotePrefix child; add a defensive check at the call site in the function around emit_quote_prefix_node(p) (before computing indent and recording quote indent) to make the invariant explicit: either change emit_quote_prefix_node to return Option<bool> and handle None by treating it as a parsing error, or keep the bool return and add a debug_assert! that at_quote is true / that a prefix node was emitted (i.e., assert the None/impossible path cannot occur), then compute indent and call p.record_quote_indent(range, indent) as before; refer to emit_quote_prefix_node, parse_quote_block_list, MD_QUOTE, MdQuotePrefix, and p.record_quote_indent when making the change.
🧹 Nitpick comments (3)
crates/biome_markdown_parser/src/syntax/quote.rs (3)
130-131: Link the TODO to a tracking issue.The deferred
MD_QUOTE_PRE_MARKER_INDENTemission will be easy to forget once this PR merges. A reference to a GitHub issue would keep it on the radar.Want me to draft an issue description for the follow-up task?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/quote.rs` around lines 130 - 131, Replace the TODO comment in quote.rs about deferring MD_QUOTE_PRE_MARKER_INDENT emission with a short note linking to a tracking GitHub issue; update the comment that currently reads "TODO: Emit MD_QUOTE_PRE_MARKER_INDENT directly..." to include the issue number or URL and mention the migration step (Step 1b) and symbol MD_QUOTE_PRE_MARKER_INDENT so the follow-up task is discoverable.
452-459: Partial CST emission on mid-loop failure is unrecoverable — invariant should be documented.If
emit_quote_prefix_tokensreturnsNoneat depth iteration i, the i already-completedMdQuotePrefixnodes can't be rolled back, yetfalseis returned. This is safe only because every caller checkshas_quote_prefix(p, depth)first. That invariant is currently implicit.A doc comment on
consume_quote_prefix_implnoting this precondition (or adebug_assert!) would prevent future callers from breaking it silently:📝 Proposed doc comment
+/// # Precondition +/// Callers MUST verify `has_quote_prefix(p, depth)` before calling this +/// function when `set_virtual_line_start = true`. If `emit_quote_prefix_tokens` +/// returns `None` mid-loop, already-emitted `MdQuotePrefix` nodes cannot be +/// rolled back, leaving the CST partially modified. fn consume_quote_prefix_impl(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/quote.rs` around lines 452 - 459, The loop in consume_quote_prefix_impl can leave partially-completed MD_QUOTE_PREFIX nodes if emit_quote_prefix_tokens returns None mid-loop; document the precondition and/or add a runtime debug check to prevent misuse. Add a doc comment on consume_quote_prefix_impl stating callers must ensure has_quote_prefix(p, depth) is true (or equivalent precondition), and add a debug_assert!(has_quote_prefix(p, depth)) at the start of consume_quote_prefix_impl (or validate before entering the loop) so future callers of emit_quote_prefix_tokens, consume_quote_prefix_impl, and functions creating MdQuotePrefix can't accidentally rely on this implicit invariant.
160-181:emit_post_marker_spaceis a misnomer in thepreserve_tab = truebranch.When
preserve_tab = trueand the current token is"\t", the function returnstrue(indicating a post-marker separator exists) but emits nothing — the tab stays in the stream for the child block to claim. The name "emit" implies a token was always produced, which isn't the case here, making callers (and future readers) believeMD_QUOTE_POST_MARKER_SPACEwas always emitted when the return value istrue.A more accurate name would be
check_and_consume_post_marker_space, or the return type could be an enumPostMarkerSpace { Consumed, Preserved, Absent }to distinguish the three cases. At minimum, the doc-comment on the function should reflect this nuance.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/quote.rs` around lines 160 - 181, The function emit_post_marker_space() is misleading because when preserve_tab is true it returns true for a tab without emitting MD_QUOTE_POST_MARKER_SPACE; rename the function to check_and_consume_post_marker_space (or alternatively update its doc-comment) and clearly document the three outcomes: consumed (emitted MD_QUOTE_POST_MARKER_SPACE), preserved (tab present but not remapped when preserve_tab==true), and absent (no post-marker). Update the function signature/name and all call sites that rely on the old meaning (or adjust callers to only assume emission when the function indicates "consumed") so callers do not assume MD_QUOTE_POST_MARKER_SPACE was emitted whenever the function returns true; reference emit_post_marker_space / MD_QUOTE_POST_MARKER_SPACE / preserve_tab when making these changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@crates/biome_markdown_parser/src/syntax/quote.rs`:
- Around line 97-107: emit_quote_prefix_node can return false for two different
reasons (no '>' token vs '>' without space), which can create an MD_QUOTE node
with no MdQuotePrefix child; add a defensive check at the call site in the
function around emit_quote_prefix_node(p) (before computing indent and recording
quote indent) to make the invariant explicit: either change
emit_quote_prefix_node to return Option<bool> and handle None by treating it as
a parsing error, or keep the bool return and add a debug_assert! that at_quote
is true / that a prefix node was emitted (i.e., assert the None/impossible path
cannot occur), then compute indent and call p.record_quote_indent(range, indent)
as before; refer to emit_quote_prefix_node, parse_quote_block_list, MD_QUOTE,
MdQuotePrefix, and p.record_quote_indent when making the change.
---
Nitpick comments:
In `@crates/biome_markdown_parser/src/syntax/quote.rs`:
- Around line 130-131: Replace the TODO comment in quote.rs about deferring
MD_QUOTE_PRE_MARKER_INDENT emission with a short note linking to a tracking
GitHub issue; update the comment that currently reads "TODO: Emit
MD_QUOTE_PRE_MARKER_INDENT directly..." to include the issue number or URL and
mention the migration step (Step 1b) and symbol MD_QUOTE_PRE_MARKER_INDENT so
the follow-up task is discoverable.
- Around line 452-459: The loop in consume_quote_prefix_impl can leave
partially-completed MD_QUOTE_PREFIX nodes if emit_quote_prefix_tokens returns
None mid-loop; document the precondition and/or add a runtime debug check to
prevent misuse. Add a doc comment on consume_quote_prefix_impl stating callers
must ensure has_quote_prefix(p, depth) is true (or equivalent precondition), and
add a debug_assert!(has_quote_prefix(p, depth)) at the start of
consume_quote_prefix_impl (or validate before entering the loop) so future
callers of emit_quote_prefix_tokens, consume_quote_prefix_impl, and functions
creating MdQuotePrefix can't accidentally rely on this implicit invariant.
- Around line 160-181: The function emit_post_marker_space() is misleading
because when preserve_tab is true it returns true for a tab without emitting
MD_QUOTE_POST_MARKER_SPACE; rename the function to
check_and_consume_post_marker_space (or alternatively update its doc-comment)
and clearly document the three outcomes: consumed (emitted
MD_QUOTE_POST_MARKER_SPACE), preserved (tab present but not remapped when
preserve_tab==true), and absent (no post-marker). Update the function
signature/name and all call sites that rely on the old meaning (or adjust
callers to only assume emission when the function indicates "consumed") so
callers do not assume MD_QUOTE_POST_MARKER_SPACE was emitted whenever the
function returns true; reference emit_post_marker_space /
MD_QUOTE_POST_MARKER_SPACE / preserve_tab when making these changes.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (14)
crates/biome_markdown_factory/src/generated/node_factory.rsis excluded by!**/generated/**,!**/generated/**and included by**crates/biome_markdown_factory/src/generated/syntax_factory.rsis excluded by!**/generated/**,!**/generated/**and included by**crates/biome_markdown_parser/tests/md_test_suite/error/quote_nesting_too_deep.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote_grouping.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/lazy_continuation.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/list_continuation_edge_cases.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/nested_quote.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph_interruption.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading_edge_cases.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_syntax/src/generated/kind.rsis excluded by!**/generated/**,!**/generated/**and included by**crates/biome_markdown_syntax/src/generated/macros.rsis excluded by!**/generated/**,!**/generated/**and included by**crates/biome_markdown_syntax/src/generated/nodes.rsis excluded by!**/generated/**,!**/generated/**and included by**crates/biome_markdown_syntax/src/generated/nodes_mut.rsis excluded by!**/generated/**,!**/generated/**and included by**
📒 Files selected for processing (10)
crates/biome_markdown_formatter/src/generated.rscrates/biome_markdown_formatter/src/markdown/any/block.rscrates/biome_markdown_formatter/src/markdown/any/inline.rscrates/biome_markdown_formatter/src/markdown/auxiliary/mod.rscrates/biome_markdown_formatter/src/markdown/auxiliary/quote_prefix.rscrates/biome_markdown_parser/src/syntax/mod.rscrates/biome_markdown_parser/src/syntax/quote.rscrates/biome_markdown_parser/src/to_html.rsxtask/codegen/markdown.ungramxtask/codegen/src/markdown_kinds_src.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/biome_markdown_formatter/src/markdown/auxiliary/mod.rs
Note
AI Assistance Disclosure: This PR was developed with assistance from Claude Code.
Summary
MdQuotePrefixnode to the grammar withpre_marker_indent,marker(>), andpost_marker_spacefields, registered as a variant of bothAnyMdBlockandAnyMdInline.MD_QUOTE_PRE_MARKER_INDENTandMD_QUOTE_POST_MARKER_SPACEtoken kinds tomarkdown_kinds_src.rs.parse_as_skipped_trivia_tokenscalls for>markers inquote.rswith explicitMdQuotePrefixnode emission, making every blockquote>a real CST node visible to the formatter harness.MdQuotegrammar frommarker: '>' content: AnyMdBlocktoprefix: MdQuotePrefix content: MdBlockList.line_has_quote_prefixfrommod.rsintoquote.rsasline_has_quote_prefix_at_currentto co-locate quote logic.MdQuotePrefixhandling toto_html.rsalt-text extraction and formatter dispatch.This is Phase 1 (Blockquotes) of a multi-phase effort to promote structurally significant tokens from skipped trivia to explicit CST nodes. The
>markers were previously stored as skipped trivia, bypassing grammar codegen and making them invisible to the formatter's token-tracking harness.
Pre-marker indent emission (
MD_QUOTE_PRE_MARKER_INDENT) is deferred to a follow-up to keep scope narrow.No user-facing behavior change is intended. The parsed semantics are preserved; only the internal CST representation changes.
Test Plan
cargo test -p biome_markdown_parser— 64 tests passcargo insta test -p biome_markdown_parser— all snapshots acceptedrg -n "SKIPPED_TOKEN_TRIVIA.*>" crates/biome_markdown_parser/testsrg -n "MdQuotePrefix \\{[\\s\\S]{0,300}pre_marker_indent_token:[\\s\\S]{0,300}marker_token:[\\s\\S]{0,300}post_marker_space_token:" crates/ biome_markdown_parser/tests/md_test_suite/**/*.snapDocs
N/A — internal structural change, no new user-facing features.