
fix(markdown-parser): promote blockquote prefix markers from skipped trivia to explicit CST nodes#9219

Merged
ematipico merged 1 commit intobiomejs:mainfrom
jfmcdowell:fix/md-quote-prefix-nodes
Feb 24, 2026

Conversation

@jfmcdowell
Contributor

Note

AI Assistance Disclosure: This PR was developed with assistance from Claude Code.

Summary

  • Add MdQuotePrefix node to the grammar with pre_marker_indent, marker (>), and post_marker_space fields, registered as a variant of both AnyMdBlock and AnyMdInline.
  • Add MD_QUOTE_PRE_MARKER_INDENT and MD_QUOTE_POST_MARKER_SPACE token kinds to markdown_kinds_src.rs.
  • Replace parse_as_skipped_trivia_tokens calls for > markers in quote.rs with explicit MdQuotePrefix node emission, making every blockquote > a real CST node visible to the formatter harness.
  • Migrate MdQuote grammar from marker: '>' content: AnyMdBlock to prefix: MdQuotePrefix content: MdBlockList.
  • Move line_has_quote_prefix from mod.rs into quote.rs as line_has_quote_prefix_at_current to co-locate quote logic.
  • Add MdQuotePrefix handling to to_html.rs alt-text extraction and formatter dispatch.
  • Update all blockquote parser snapshots to reflect the new CST shape.
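The three MdQuotePrefix fields line up with the parts of a CommonMark block quote marker: up to three spaces of indentation, the `>` character, and one optional following space. The sketch below is a simplified, standalone model under that assumption; `QuotePrefix` and `split_quote_prefix` are hypothetical names rather than Biome's API, and tab handling is deliberately omitted.

```rust
/// Hypothetical, simplified model of the three MdQuotePrefix fields.
/// Field names mirror the grammar; the types are illustrative only.
#[derive(Debug, PartialEq)]
struct QuotePrefix<'a> {
    pre_marker_indent: &'a str,         // 0-3 spaces before '>'
    marker: &'a str,                    // always ">"
    post_marker_space: Option<&'a str>, // one optional space after '>'
    rest: &'a str,                      // remaining line content
}

/// Splits one line into quote-prefix parts, following the CommonMark rule
/// that a block quote marker is up to three spaces of indentation, a `>`,
/// and an optional following space. Tabs are ignored for simplicity.
fn split_quote_prefix(line: &str) -> Option<QuotePrefix<'_>> {
    let indent_len = line.bytes().take_while(|&b| b == b' ').count();
    if indent_len > 3 {
        // Four or more spaces would start an indented code block instead.
        return None;
    }
    let after_indent = &line[indent_len..];
    if !after_indent.starts_with('>') {
        return None;
    }
    let after_marker = &after_indent[1..];
    let (post_marker_space, rest) = if after_marker.starts_with(' ') {
        (Some(&after_marker[..1]), &after_marker[1..])
    } else {
        (None, after_marker)
    };
    Some(QuotePrefix {
        pre_marker_indent: &line[..indent_len],
        marker: &after_indent[..1],
        post_marker_space,
        rest,
    })
}
```

For a nested quote such as `>> nested`, the first call yields a prefix with no post-marker space and a remainder of `> nested`, which itself parses as a second prefix; that mirrors why one MdQuotePrefix per nesting level is needed.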

This is Phase 1 (Blockquotes) of a multi-phase effort to promote structurally significant tokens from skipped trivia to explicit CST nodes. The > markers were previously stored as skipped trivia, bypassing grammar codegen and making them invisible to the formatter's token-tracking harness.

Pre-marker indent emission (MD_QUOTE_PRE_MARKER_INDENT) is deferred to a follow-up to keep scope narrow.

No user-facing behavior change is intended. The parsed semantics are preserved; only the internal CST representation changes.
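The difference between the two representations can be illustrated with a toy tree. In this hypothetical model (not Biome's actual rowan-based CST; the `TEXT` and `MARKER` kinds are placeholders), both shapes of `> hi` reproduce the source text losslessly, which is why parsed semantics are unchanged. Only the explicit-node shape, however, exposes `>` and the following space as tokens that a walker over tokens, standing in for the token-tracking harness, would visit.

```rust
/// Hypothetical toy CST. A token may carry leading trivia: source text kept
/// for losslessness but not attached to any grammar slot.
#[derive(Debug)]
struct Token {
    kind: &'static str,
    text: &'static str,
    leading_trivia: Vec<&'static str>,
}

#[derive(Debug)]
enum Element {
    Token(Token),
    Node { kind: &'static str, children: Vec<Element> },
}

/// Reconstructs the source text, trivia included (the lossless property).
fn source_text(el: &Element, out: &mut String) {
    match el {
        Element::Token(t) => {
            for trivia in &t.leading_trivia {
                out.push_str(trivia);
            }
            out.push_str(t.text);
        }
        Element::Node { children, .. } => {
            for child in children {
                source_text(child, out);
            }
        }
    }
}

/// Collects the kinds of tokens a token-only walker would visit.
/// Trivia is never visited as a token.
fn visible_token_kinds(el: &Element, out: &mut Vec<&'static str>) {
    match el {
        Element::Token(t) => out.push(t.kind),
        Element::Node { children, .. } => {
            for child in children {
                visible_token_kinds(child, out);
            }
        }
    }
}

fn tok(kind: &'static str, text: &'static str, trivia: Vec<&'static str>) -> Element {
    Element::Token(Token { kind, text, leading_trivia: trivia })
}

/// `> hi` with the marker hidden in skipped trivia (the old shape).
fn old_shape() -> Element {
    Element::Node {
        kind: "MD_QUOTE",
        children: vec![tok("TEXT", "hi", vec!["> "])],
    }
}

/// `> hi` with an explicit prefix node (the shape described in this PR).
fn new_shape() -> Element {
    Element::Node {
        kind: "MD_QUOTE",
        children: vec![
            Element::Node {
                kind: "MD_QUOTE_PREFIX",
                children: vec![
                    tok("MARKER", ">", vec![]),
                    tok("MD_QUOTE_POST_MARKER_SPACE", " ", vec![]),
                ],
            },
            tok("TEXT", "hi", vec![]),
        ],
    }
}
```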

Test Plan

  • cargo test -p biome_markdown_parser — 64 tests pass
  • cargo insta test -p biome_markdown_parser — all snapshots accepted
  • rg -n "SKIPPED_TOKEN_TRIVIA.*>" crates/biome_markdown_parser/tests
  • rg -n "MdQuotePrefix \\{[\\s\\S]{0,300}pre_marker_indent_token:[\\s\\S]{0,300}marker_token:[\\s\\S]{0,300}post_marker_space_token:" crates/biome_markdown_parser/tests/md_test_suite/**/*.snap

Docs

N/A — internal structural change, no new user-facing features.

@changeset-bot

changeset-bot bot commented Feb 24, 2026

⚠️ No Changeset found

Latest commit: f10b8e3

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@github-actions github-actions bot added A-Parser Area: parser A-Formatter Area: formatter A-Tooling Area: internal tools labels Feb 24, 2026
@jfmcdowell jfmcdowell marked this pull request as draft February 24, 2026 00:45
@coderabbitai
Contributor

coderabbitai bot commented Feb 24, 2026

Walkthrough

This change introduces MdQuotePrefix into the Markdown grammar, parser, HTML extractor and formatter. The parser now emits MdQuotePrefix nodes and uses new helpers for prefix emission and post-marker spacing, with nesting-depth checks. Code generation and kind literals are updated to include MD_QUOTE_PREFIX and related literals. The formatter gains a quote_prefix module and implements formatting rules and match arms to render MdQuotePrefix as verbatim prefix nodes.


Suggested reviewers

  • ematipico
  • dyc3
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: promoting blockquote prefix markers from skipped trivia to explicit CST nodes, which is the core objective of this PR.
Description check ✅ Passed The description is thorough and directly related to the changeset, covering the grammar changes, node additions, parser refactoring, and test plan for the blockquote prefix promotion effort.


…ST nodes

Replace parse_as_skipped_trivia_tokens() calls in quote.rs with explicit
MdQuotePrefix node construction. The > marker and post-marker space are
now real CST nodes visible to the formatter harness instead of hidden
skipped trivia.

- Add MdQuotePrefix to grammar with pre_marker_indent, marker, and
  post_marker_space tokens
- Register MD_QUOTE_PRE_MARKER_INDENT and MD_QUOTE_POST_MARKER_SPACE
  token kinds
- MdQuotePrefix appears in both AnyMdBlock and AnyMdInline for
  source-ordered interleaving at block and inline levels
- Lazy continuation represented by absence of MdQuotePrefix after
  a line break
- Move line_has_quote_prefix from mod.rs into quote.rs as
  line_has_quote_prefix_at_current to co-locate quote logic
- All parser snapshot tests updated to reflect new CST shape
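The idea that lazy continuation shows up as a line with no MdQuotePrefix can be sketched by counting `>` markers per line and flagging a non-blank, marker-free line that follows an open quote. This is a simplified approximation under stated assumptions: `QuoteLine`, `quote_depth`, and `classify` are hypothetical, CommonMark only permits lazy continuation for paragraph text, and lines with fewer markers than the open depth are not treated as lazy here.

```rust
/// Simplified per-line view of a block quote (hypothetical, not Biome's parser).
#[derive(Debug, PartialEq)]
enum QuoteLine {
    /// The line starts with `depth` quote prefixes.
    Prefixed { depth: usize },
    /// No prefix, but a quote is open: a lazy continuation candidate.
    Lazy,
    /// Outside any quote (including the blank line that closes one).
    Outside,
}

/// Counts leading quote prefixes: up to three spaces, `>`, optional space.
fn quote_depth(mut line: &str) -> usize {
    let mut depth = 0;
    loop {
        let indent = line.bytes().take_while(|&b| b == b' ').count();
        if indent > 3 || !line[indent..].starts_with('>') {
            return depth;
        }
        line = &line[indent + 1..];
        line = line.strip_prefix(' ').unwrap_or(line);
        depth += 1;
    }
}

fn classify(lines: &[&str]) -> Vec<QuoteLine> {
    let mut open_depth = 0;
    lines
        .iter()
        .map(|line| {
            let depth = quote_depth(line);
            if depth > 0 {
                open_depth = depth;
                QuoteLine::Prefixed { depth }
            } else if line.trim().is_empty() {
                open_depth = 0; // a blank line ends the quote
                QuoteLine::Outside
            } else if open_depth > 0 {
                QuoteLine::Lazy
            } else {
                QuoteLine::Outside
            }
        })
        .collect()
}
```

For the input lines `> a`, `b`, an empty line, `c`, and `>> d`, this yields Prefixed (depth 1), Lazy, Outside, Outside, and Prefixed (depth 2).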
@jfmcdowell jfmcdowell force-pushed the fix/md-quote-prefix-nodes branch from abb65ee to f10b8e3 Compare February 24, 2026 01:30
@jfmcdowell jfmcdowell marked this pull request as ready for review February 24, 2026 09:56
Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/biome_markdown_parser/src/syntax/quote.rs (1)

97-107: ⚠️ Potential issue | 🟡 Minor

marker_space = false is ambiguous; missing debug_assert! for the impossible None path.

emit_quote_prefix_node returns false in two distinct cases:

  1. No > token was found (the prefix wasn't emitted at all).
  2. > was found but no trailing space.

Both collapse to indent = 1, but case 1 produces an MD_QUOTE node with no MdQuotePrefix child, violating the grammar. The only guard is that at_quote has already passed, which is an implicit invariant. A debug_assert! at the call site makes that invariant explicit:

🛡️ Proposed defensive assert
-    let marker_space = emit_quote_prefix_node(p);
+    // Invariant: the caller has already confirmed a '>' marker via at_quote,
+    // so emit_quote_prefix_node always emits an MdQuotePrefix here and a
+    // `false` result can only mean "no space after '>'". Assert before
+    // consuming: afterwards the parser may legitimately sit on another '>'
+    // (nested quotes such as `> > a`), so `!at_quote(p)` would misfire.
+    debug_assert!(at_quote(p), "expected a '>' quote marker");
+    let marker_space = emit_quote_prefix_node(p);

Or, more simply, return Option<bool> from emit_quote_prefix_node and handle None explicitly.
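The `Option<bool>` alternative could look roughly like the sketch below. `Cursor` and `emit_quote_prefix` are hypothetical stand-ins for the parser and emit_quote_prefix_node, and `MARKER` is a placeholder token kind; the point is only that `None` (nothing emitted) is no longer conflated with `Some(false)` (prefix emitted, no trailing space).

```rust
/// Hypothetical stand-in for the parser: a cursor over one line of input
/// that records the kinds of tokens it emits.
struct Cursor<'a> {
    input: &'a [u8],
    pos: usize,
    emitted: Vec<&'static str>,
}

impl<'a> Cursor<'a> {
    fn new(input: &'a str) -> Self {
        Cursor { input: input.as_bytes(), pos: 0, emitted: Vec::new() }
    }

    fn peek(&self) -> Option<u8> {
        self.input.get(self.pos).copied()
    }

    fn bump(&mut self, kind: &'static str) {
        self.pos += 1;
        self.emitted.push(kind);
    }
}

/// Three distinct outcomes instead of an ambiguous `bool`:
/// - `None`: no `>` here and nothing emitted (caller must not build MD_QUOTE)
/// - `Some(true)`: prefix emitted, including a post-marker space
/// - `Some(false)`: prefix emitted, no space after `>`
fn emit_quote_prefix(c: &mut Cursor<'_>) -> Option<bool> {
    if c.peek() != Some(b'>') {
        return None;
    }
    c.bump("MARKER");
    if c.peek() == Some(b' ') {
        c.bump("MD_QUOTE_POST_MARKER_SPACE");
        Some(true)
    } else {
        Some(false)
    }
}
```

A caller can then bind the result with `let ... else` and report a parse error on `None`, rather than silently building an MD_QUOTE node without a prefix.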

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/quote.rs` around lines 97 - 107,
emit_quote_prefix_node can return false for two different reasons (no '>' token
vs '>' without space), which can create an MD_QUOTE node with no MdQuotePrefix
child; add a defensive check at the call site in the function around
emit_quote_prefix_node(p) (before computing indent and recording quote indent)
to make the invariant explicit: either change emit_quote_prefix_node to return
Option<bool> and handle None by treating it as a parsing error, or keep the bool
return and add a debug_assert! that at_quote is true / that a prefix node was
emitted (i.e., assert the None/impossible path cannot occur), then compute
indent and call p.record_quote_indent(range, indent) as before; refer to
emit_quote_prefix_node, parse_quote_block_list, MD_QUOTE, MdQuotePrefix, and
p.record_quote_indent when making the change.
🧹 Nitpick comments (3)
crates/biome_markdown_parser/src/syntax/quote.rs (3)

130-131: Link the TODO to a tracking issue.

The deferred MD_QUOTE_PRE_MARKER_INDENT emission will be easy to forget once this PR merges. A reference to a GitHub issue would keep it on the radar.

Want me to draft an issue description for the follow-up task?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/quote.rs` around lines 130 - 131,
Replace the TODO comment in quote.rs about deferring MD_QUOTE_PRE_MARKER_INDENT
emission with a short note linking to a tracking GitHub issue; update the
comment that currently reads "TODO: Emit MD_QUOTE_PRE_MARKER_INDENT directly..."
to include the issue number or URL and mention the migration step (Step 1b) and
symbol MD_QUOTE_PRE_MARKER_INDENT so the follow-up task is discoverable.

452-459: Partial CST emission on mid-loop failure is unrecoverable — invariant should be documented.

If emit_quote_prefix_tokens returns None at depth iteration i, the i already-completed MdQuotePrefix nodes can't be rolled back, yet false is returned. This is safe only because every caller checks has_quote_prefix(p, depth) first. That invariant is currently implicit.

A doc comment on consume_quote_prefix_impl noting this precondition (or a debug_assert!) would prevent future callers from breaking it silently:

📝 Proposed doc comment
+/// # Precondition
+/// Callers MUST verify `has_quote_prefix(p, depth)` before calling this
+/// function when `set_virtual_line_start = true`.  If `emit_quote_prefix_tokens`
+/// returns `None` mid-loop, already-emitted `MdQuotePrefix` nodes cannot be
+/// rolled back, leaving the CST partially modified.
 fn consume_quote_prefix_impl(
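One way to make that precondition structural rather than implicit is to separate a pure lookahead from the consuming step and assert the lookahead in debug builds, so no prefix is recorded until every level is known to exist. The sketch works on plain strings; `has_quote_prefix` reuses the reviewer's name, but its body, along with `strip_prefixes` and `consume_quote_prefix`, is hypothetical.

```rust
/// Returns the rest of `line` after `depth` quote prefixes, or `None` if any
/// level is missing. Pure lookahead: nothing is consumed or emitted.
fn strip_prefixes(mut line: &str, depth: usize) -> Option<&str> {
    for _ in 0..depth {
        let indent = line.bytes().take_while(|&b| b == b' ').count();
        if indent > 3 {
            return None;
        }
        line = line[indent..].strip_prefix('>')?;
        line = line.strip_prefix(' ').unwrap_or(line);
    }
    Some(line)
}

fn has_quote_prefix(line: &str, depth: usize) -> bool {
    strip_prefixes(line, depth).is_some()
}

/// Consumes `depth` prefixes, recording the text of each one as a "node".
/// Precondition: `has_quote_prefix(line, depth)` holds.
fn consume_quote_prefix<'a>(line: &'a str, depth: usize, emitted: &mut Vec<&'a str>) -> &'a str {
    // Checked once, before anything is recorded, in debug builds.
    debug_assert!(
        has_quote_prefix(line, depth),
        "callers must check has_quote_prefix before consuming"
    );
    let mut rest = line;
    for _ in 0..depth {
        let before = rest;
        rest = strip_prefixes(rest, 1).expect("guaranteed by the precondition");
        // The slice removed for this level is one prefix "node".
        emitted.push(&before[..before.len() - rest.len()]);
    }
    rest
}
```

In release builds a violated precondition panics at the `expect` instead of returning a silently truncated result, although prefixes for earlier levels may already have been recorded by then.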
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/quote.rs` around lines 452 - 459, The
loop in consume_quote_prefix_impl can leave partially-completed MD_QUOTE_PREFIX
nodes if emit_quote_prefix_tokens returns None mid-loop; document the
precondition and/or add a runtime debug check to prevent misuse. Add a doc
comment on consume_quote_prefix_impl stating callers must ensure
has_quote_prefix(p, depth) is true (or equivalent precondition), and add a
debug_assert!(has_quote_prefix(p, depth)) at the start of
consume_quote_prefix_impl (or validate before entering the loop) so future
callers of emit_quote_prefix_tokens, consume_quote_prefix_impl, and functions
creating MdQuotePrefix can't accidentally rely on this implicit invariant.

160-181: emit_post_marker_space is a misnomer in the preserve_tab = true branch.

When preserve_tab = true and the current token is "\t", the function returns true (indicating a post-marker separator exists) but emits nothing — the tab stays in the stream for the child block to claim. The name "emit" implies a token was always produced, which isn't the case here, making callers (and future readers) believe MD_QUOTE_POST_MARKER_SPACE was always emitted when the return value is true.

A more accurate name would be check_and_consume_post_marker_space, or the return type could be an enum PostMarkerSpace { Consumed, Preserved, Absent } to distinguish the three cases. At minimum, the doc-comment on the function should reflect this nuance.
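The three-way enum suggested above might look like the following sketch. `PostMarkerSpace`, `has_separator`, and `post_marker_space` are hypothetical; in particular, treating a tab as `Consumed` when `preserve_tab` is false is an assumption drawn from the review's description of the tab being remapped in that case.

```rust
/// Hypothetical three-way result for the text right after a `>` marker.
#[derive(Debug, PartialEq, Clone, Copy)]
enum PostMarkerSpace {
    /// A separator was consumed (would be emitted as MD_QUOTE_POST_MARKER_SPACE).
    Consumed,
    /// A tab follows `>` but is left in the stream for the child block.
    Preserved,
    /// Nothing separates `>` from the content.
    Absent,
}

impl PostMarkerSpace {
    /// Mirrors the old boolean: "is there any separator after `>`?"
    fn has_separator(self) -> bool {
        !matches!(self, PostMarkerSpace::Absent)
    }
}

/// Classifies the text after a `>` marker and returns the unconsumed rest.
/// `preserve_tab` mirrors the flag named in the review (semantics assumed).
fn post_marker_space(after_marker: &str, preserve_tab: bool) -> (PostMarkerSpace, &str) {
    match after_marker.as_bytes().first().copied() {
        Some(b' ') => (PostMarkerSpace::Consumed, &after_marker[1..]),
        Some(b'\t') if preserve_tab => (PostMarkerSpace::Preserved, after_marker),
        // Assumption: without preserve_tab, the tab is consumed as the separator.
        Some(b'\t') => (PostMarkerSpace::Consumed, &after_marker[1..]),
        _ => (PostMarkerSpace::Absent, after_marker),
    }
}
```

Callers that only care whether a separator exists can keep using `has_separator()`, while code that assumes a token was emitted can match on `Consumed` explicitly.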

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/quote.rs` around lines 160 - 181, The
function emit_post_marker_space() is misleading because when preserve_tab is
true it returns true for a tab without emitting MD_QUOTE_POST_MARKER_SPACE;
rename the function to check_and_consume_post_marker_space (or alternatively
update its doc-comment) and clearly document the three outcomes: consumed
(emitted MD_QUOTE_POST_MARKER_SPACE), preserved (tab present but not remapped
when preserve_tab==true), and absent (no post-marker). Update the function
signature/name and all call sites that rely on the old meaning (or adjust
callers to only assume emission when the function indicates "consumed") so
callers do not assume MD_QUOTE_POST_MARKER_SPACE was emitted whenever the
function returns true; reference emit_post_marker_space /
MD_QUOTE_POST_MARKER_SPACE / preserve_tab when making these changes.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abb65ee and f10b8e3.

⛔ Files ignored due to path filters (14)
  • crates/biome_markdown_factory/src/generated/node_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_factory/src/generated/syntax_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/error/quote_nesting_too_deep.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote_grouping.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/lazy_continuation.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/list_continuation_edge_cases.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/nested_quote.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph_interruption.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading_edge_cases.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_syntax/src/generated/kind.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_syntax/src/generated/macros.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_syntax/src/generated/nodes.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_syntax/src/generated/nodes_mut.rs is excluded by !**/generated/**, !**/generated/** and included by **
📒 Files selected for processing (10)
  • crates/biome_markdown_formatter/src/generated.rs
  • crates/biome_markdown_formatter/src/markdown/any/block.rs
  • crates/biome_markdown_formatter/src/markdown/any/inline.rs
  • crates/biome_markdown_formatter/src/markdown/auxiliary/mod.rs
  • crates/biome_markdown_formatter/src/markdown/auxiliary/quote_prefix.rs
  • crates/biome_markdown_parser/src/syntax/mod.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/to_html.rs
  • xtask/codegen/markdown.ungram
  • xtask/codegen/src/markdown_kinds_src.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/biome_markdown_formatter/src/markdown/auxiliary/mod.rs

Member

@ematipico ematipico left a comment


Thank you

@ematipico ematipico merged commit d60f438 into biomejs:main Feb 24, 2026
13 checks passed

Labels

A-Formatter Area: formatter A-Parser Area: parser A-Tooling Area: internal tools

2 participants