refactor(markdown-parser): promote fenced code block skipped trivia to explicit CST nodes #9321

Merged
ematipico merged 10 commits into biomejs:main from jfmcdowell:refactor/fenced-code-block-prefix on Mar 4, 2026

Conversation

@jfmcdowell
Contributor

@jfmcdowell jfmcdowell commented Mar 3, 2026

Note

AI Assistance Disclosure: This PR was developed with assistance from Claude Code.

Summary

  • Add MdIndentToken to AnyMdInline in the grammar for fence indent stripping tokens.
  • Replace 4 parse_as_skipped_trivia_tokens() call sites in fenced_code_block.rs with explicit CST node emission:
    • Sites 1-3: Blockquote > prefixes on continuation lines within fenced code blocks now emit MdQuotePrefix nodes (with MdQuoteIndentList, marker, and optional post-marker space).
    • Site 4: Fence indent stripping per CommonMark §4.5 now emits MdIndentToken nodes with MD_INDENT_CHAR tokens.
  • Add MdIndentToken no-op arm in to_html.rs extract_alt_text_inline exhaustive match.
  • Regenerate codegen output (biome_markdown_syntax, biome_markdown_formatter).
  • Add error fixture fenced_code_in_blockquote.md documenting pre-existing limitation where fenced code blocks inside blockquotes produce unterminated fence diagnostics.
  • Update fenced_code_advanced.md snapshot to reflect new CST shape.

Continues the skipped trivia promotion series (#9219, #9274, #9313). Sites 1-3 (quote prefixes in code content) are structurally correct but exercised only via the pre-existing blockquote+fenced-code path which has a known limitation — the error fixture documents current behavior until a follow-up fix lands.

No user-facing behavior change. Parsed semantics are preserved; only the internal CST representation changes.
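
As a rough illustration of the indent stripping behind Site 4: CommonMark §4.5 removes up to N spaces of indentation from each content line when the opening fence is indented N spaces. The sketch below is not the parser's code. `split_fence_indent` is a hypothetical helper that works on plain strings and returns the stripped indent (the part this PR now represents as MdIndentToken nodes instead of skipped trivia) separately from the remaining content.

```rust
/// Hypothetical helper, not part of biome_markdown_parser.
/// Splits one content line into (stripped indent, remaining text),
/// removing at most `fence_indent` leading spaces.
fn split_fence_indent(line: &str, fence_indent: usize) -> (&str, &str) {
    // Look at no more than `fence_indent` bytes and stop at the first
    // byte that is not a space.
    let strip = line
        .bytes()
        .take(fence_indent)
        .take_while(|b| *b == b' ')
        .count();
    // Spaces are single-byte, so `strip` is always a valid char boundary.
    line.split_at(strip)
}
```

For a fence indented by two spaces, `split_fence_indent("    deep", 2)` yields `("  ", "  deep")`: only two of the four leading spaces are removed, and a line with less indentation than the fence loses only what it has.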

Test Plan

  • just test-crate biome_markdown_parser
  • just test-markdown-conformance
  • just f && just l

Docs

N/A — internal structural change, no new user-facing features.

@changeset-bot

changeset-bot bot commented Mar 3, 2026

⚠️ No Changeset found

Latest commit: cf5d8f5

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@github-actions github-actions bot added A-Parser Area: parser A-Formatter Area: formatter A-Tooling Area: internal tools labels Mar 3, 2026
…o explicit CST nodes

Replace 4 parse_as_skipped_trivia_tokens() call sites in fenced_code_block.rs:
- Sites 1-3: blockquote > prefixes on continuation lines emit MdQuotePrefix nodes
- Site 4: fence indent stripping emits MdIndentToken nodes

Add MdIndentToken to AnyMdInline in the grammar and regenerate codegen.
Add MdIndentToken no-op arm in to_html.rs extract_alt_text_inline.
Add error fixture documenting pre-existing fenced-code-in-blockquote limitation.
Extract try_bump_quote_marker as pub(crate) to deduplicate marker-bumping logic.
@jfmcdowell jfmcdowell force-pushed the refactor/fenced-code-block-prefix branch from ccff014 to 041e82b on March 4, 2026 01:51
@jfmcdowell jfmcdowell marked this pull request as ready for review March 4, 2026 02:26
@coderabbitai
Contributor

coderabbitai bot commented Mar 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds MdIndentToken to the inline grammar and wires it through the formatter and HTML alt-text extraction. Refactors fenced-code parsing into a stateful loop with helpers for quote-prefix handling, virtual-line-start semantics and earlier closing-fence detection. Reorganises quote parsing (introducing emit_quote_prefix_tokens, try_bump_quote_marker, virtual-line-start helpers, and improved indent/prefix handling). Adds tests for fenced code blocks inside blockquotes. No public API signatures changed.

Suggested reviewers

  • ematipico
  • dyc3
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main structural refactoring: promoting fenced code block skipped trivia (indentation and quote prefixes) to explicit CST nodes (MdIndentToken). |
| Description check | ✅ Passed | The description comprehensively details the changes across multiple files, the rationale for the refactoring, and links to related work; it is clearly related to the changeset throughout. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/src/syntax/fenced_code_block.rs`:
- Around line 370-373: The call to try_bump_quote_marker(p) is inside
debug_assert! so it is skipped in release builds and the parser state won't be
updated; replace the debug_assert! invocation with an unconditional call to
try_bump_quote_marker(p) (so the marker is always consumed) and keep an optional
debug-only check if desired (e.g., call try_bump_quote_marker(p) and then
debug_assert!(result, "guard above guarantees marker present")); update the code
around the debug_assert! to call try_bump_quote_marker(p) unconditionally and
handle a false result only via debug assertion or by panicking with the same
message.
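
The hazard described in this finding can be sketched outside the parser. `debug_assert!` only evaluates its argument when debug assertions are enabled, so a call with side effects placed inside it is skipped in a typical release build. The `Cursor` type and both `consume_marker_*` functions below are hypothetical stand-ins, and the toy `try_bump_quote_marker` only mimics the real helper's shape.

```rust
/// Hypothetical stand-in for the parser: a cursor over the bytes of one line.
struct Cursor<'a> {
    bytes: &'a [u8],
    pos: usize,
}

/// Toy analogue of `try_bump_quote_marker`: consumes one `>` if present
/// and reports whether it did.
fn try_bump_quote_marker(c: &mut Cursor<'_>) -> bool {
    if c.bytes.get(c.pos) == Some(&b'>') {
        c.pos += 1;
        true
    } else {
        false
    }
}

/// Anti-pattern: the call is only executed when debug assertions are on,
/// so a release build leaves the cursor where it was.
#[allow(dead_code)]
fn consume_marker_buggy(c: &mut Cursor<'_>) {
    debug_assert!(try_bump_quote_marker(c), "marker expected");
}

/// Fix: perform the side effect unconditionally, then assert on the result.
fn consume_marker_fixed(c: &mut Cursor<'_>) -> bool {
    let bumped = try_bump_quote_marker(c);
    debug_assert!(bumped, "marker expected");
    bumped
}
```

Returning the boolean from the fixed variant also leaves room for a non-panicking fallback in the caller.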

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 02a409d7-7cbc-4b18-996b-0f5447559e8c

📥 Commits

Reviewing files that changed from the base of the PR and between 1022662 and 8d084b9.

⛔ Files ignored due to path filters (3)
  • crates/biome_markdown_parser/tests/md_test_suite/error/fenced_code_in_blockquote.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_advanced.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_syntax/src/generated/nodes.rs is excluded by !**/generated/**, !**/generated/** and included by **
📒 Files selected for processing (6)
  • crates/biome_markdown_formatter/src/markdown/any/inline.rs
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/to_html.rs
  • crates/biome_markdown_parser/tests/md_test_suite/error/fenced_code_in_blockquote.md
  • xtask/codegen/markdown.ungram

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/src/syntax/fenced_code_block.rs`:
- Around line 314-335: The code currently sets at_line_start = false immediately
after consume_quote_prefixes_in_code_content, which prevents the later
fence-indent stripping block (skip_fenced_content_indent and at_closing_fence)
from running for lines inside blockquotes; update the loop in
fenced_code_block.rs so fence-indent stripping runs after quote prefix
consumption: after calling consume_quote_prefixes_in_code_content (function
name) and before or regardless of resetting at_line_start, call
skip_fenced_content_indent when fence_indent > 0 and then re-check
at_closing_fence (function name) — or alternatively handle blockquote-nested
indentation explicitly by adding a branch that strips fence_indent even when
at_line_start was just true and quote prefixes were consumed; ensure
CodeContentLoopAction semantics and the at_line_start flag are preserved.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7e51518c-9db4-4808-9efb-2a5a282df782

📥 Commits

Reviewing files that changed from the base of the PR and between 8d084b9 and 582f51a.

📒 Files selected for processing (1)
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs

Use virtual_line_start in line_has_closing_fence so fence detection
starts after consumed quote prefixes instead of seeing `>` as
non-whitespace. Set virtual_line_start after quote prefix consumption
and allow fence-indent stripping to run on blockquote lines.
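
To illustrate the idea in this commit message, here is a minimal string-based sketch of closing-fence detection that starts scanning at a virtual line start. The names `line_has_closing_fence` and `virtual_line_start` come from the commit message, but the signature and body are assumptions for illustration, not the parser's implementation. The rules encoded (at most three spaces of indentation, same fence character, at least as long as the opening fence, only spaces or tabs afterwards) follow CommonMark's description of closing fences.

```rust
/// Sketch only: returns true when `line`, read from byte offset
/// `virtual_line_start`, is a closing fence for an opening fence made of
/// `fence_len` copies of `fence_char`. `line` has no trailing newline.
fn line_has_closing_fence(
    line: &str,
    virtual_line_start: usize,
    fence_char: char,
    fence_len: usize,
) -> bool {
    // Skip whatever was already consumed as quote prefixes.
    let Some(rest) = line.get(virtual_line_start..) else {
        return false;
    };
    // A closing fence may be indented by at most three spaces.
    let trimmed = rest.trim_start_matches(' ');
    if rest.len() - trimmed.len() > 3 {
        return false;
    }
    // It must use the same character and be at least as long as the opener.
    let run = trimmed.chars().take_while(|c| *c == fence_char).count();
    if run < fence_len {
        return false;
    }
    // Only spaces or tabs may follow the fence characters.
    trimmed[run * fence_char.len_utf8()..]
        .chars()
        .all(|c| c == ' ' || c == '\t')
}
```

With the line `> ` followed by three backticks, scanning from offset 0 sees `>` as non-whitespace and reports no fence, while scanning from offset 2 (after the consumed prefix) finds it.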
@ematipico
Copy link
Member

It's weird that the coverage job isn't triggered by these changes.

indent_list_m.complete(p, MD_QUOTE_INDENT_LIST);

let marker_bumped = try_bump_quote_marker(p);
debug_assert!(marker_bumped, "guard above guarantees marker present");
Member

We usually try to use messages to understand what went wrong and/or how to fix it. For example, if a developer lands here, the message should tell what caused the problem, and where to look at for possible fixes (if applicable)

Contributor Author

Fixed — replaced unreachable!() with a safe fallback (prefix_m.abandon(p); return false), and improved the debug_assert! message to explain the root cause and where to look:

"consume_quote_prefix_in_code_content: quote marker not found after guard confirmed `>` token — check that force_relex_regular and the guard condition are in sync"

let marker_bumped = try_bump_quote_marker(p);
debug_assert!(marker_bumped, "guard above guarantees marker present");
if !marker_bumped {
unreachable!("guard above guarantees marker present");
Member

No code that panics in production. Let's find a safer approach

Contributor Author

Fixed — replaced unreachable!() with prefix_m.abandon(p); return false. The empty MD_QUOTE_INDENT_LIST that was already completed gets reparented to the parent via abandon, which is harmless in this theoretically unreachable path.

Comment on lines +289 to +290
CodeContentLoopAction::Continue => continue,
CodeContentLoopAction::ConsumeText => {}
Member

Suggested change
- CodeContentLoopAction::Continue => continue,
- CodeContentLoopAction::ConsumeText => {}
+ CodeContentLoopAction::Continue |
+ CodeContentLoopAction::ConsumeText => continue,

Contributor Author

@jfmcdowell jfmcdowell Mar 4, 2026

These arms have different semantics — ConsumeText falls through to bump_code_textual(p) + at_line_start = false, while Continue skips both. Merging them would cause an infinite loop (parser position never advances).

Open to restructuring if you have a different approach in mind — what would you prefer here?

Contributor Author

Maybe something like this that inverts the enum?

enum CodeContentTokenAction {
    Break,
    Skip,  // renamed from Continue
    Consume,  // renamed from ConsumeText
}

fn parse_code_content(...) {
    // ...
    while !p.at(T![EOF]) {
        match prepare_next_code_content_token(...) {
            CodeContentTokenAction::Break => break,
            CodeContentTokenAction::Skip => continue,
            CodeContentTokenAction::Consume => {
                bump_code_textual(p);
                at_line_start = false;
            }
        }
    }
}

Member

Much better yes!

Contributor Author

Resolved: I've refactored the control flow to make all three code paths explicit within the match statement (d44b297)
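
The three-way control flow settled on in this thread can be exercised with a toy token stream. This is a sketch under assumptions: the `Tok` enum, `prepare_next_toy_token`, and `count_code_text` are hypothetical, and the real helper decides between the actions based on parser state rather than a slice of tokens. It shows why `Skip` and `Consume` cannot share an arm: on `Skip` the helper has already advanced, while on `Consume` the loop itself must advance.

```rust
/// What the loop should do at the current position.
enum CodeContentTokenAction {
    Break,
    Skip,
    Consume,
}

/// Hypothetical toy tokens standing in for the real lexer output.
enum Tok {
    Prefix, // e.g. a blockquote `>` at the start of a line
    Text,   // ordinary code content
    Fence,  // closing fence
}

/// Toy decision helper: it consumes prefixes itself and leaves text
/// for the caller to consume.
fn prepare_next_toy_token(tokens: &[Tok], pos: &mut usize) -> CodeContentTokenAction {
    match tokens.get(*pos) {
        None | Some(Tok::Fence) => CodeContentTokenAction::Break,
        Some(Tok::Prefix) => {
            *pos += 1; // the helper already advanced past the prefix
            CodeContentTokenAction::Skip
        }
        Some(Tok::Text) => CodeContentTokenAction::Consume,
    }
}

/// Counts text tokens consumed before the closing fence or end of input.
fn count_code_text(tokens: &[Tok]) -> usize {
    let mut pos = 0;
    let mut consumed = 0;
    loop {
        match prepare_next_toy_token(tokens, &mut pos) {
            CodeContentTokenAction::Break => break,
            CodeContentTokenAction::Skip => {}
            CodeContentTokenAction::Consume => {
                // Without this advance the loop would never terminate.
                pos += 1;
                consumed += 1;
            }
        }
    }
    consumed
}
```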

true
}

fn consume_code_textual(p: &mut MarkdownParser) {
Member

There's a bit of misalignment among these new functions. Some return a boolean, some don't, but they all start with consume_*. I would look for a better alignment in naming

Contributor Author

Fixed: renamed consume_code_textual → bump_code_textual since it unconditionally bumps and doesn't return a bool. The consume_* functions all follow the try-consume pattern (return bool), while bump_* is unconditional.

- Add docstrings to prepare_next_code_content_token,
  consume_quote_prefixes_in_code_content, and
  consume_quote_prefix_in_code_content
- Replace unreachable!() with safe fallback (abandon + return false)
- Improve debug_assert! message with actionable diagnostic
- Rename consume_code_textual → bump_code_textual for naming alignment
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/src/syntax/fenced_code_block.rs`:
- Around line 350-360: The function consume_quote_prefixes_in_code_content
currently mutates parser state (virtual_line_start and by calling
skip_line_indent and consume_quote_prefix_in_code_content) as it iterates and
returns false on first failure, which can leave the parser mid-line; change it
to perform a preflight check or snapshot-and-restore: save the parser state (via
p.state()/p.state_mut() snapshot) before attempting to consume prefixes,
simulate or loop calling consume_quote_prefix_in_code_content on a
temporary/simulated parser (or perform the checks without mutating real state)
and only if all quote_depth prefixes succeed apply the real mutations
(virtual_line_start update, skip_line_indent and the actual
consume_quote_prefix_in_code_content calls); ensure that if any prefix fails no
real parser state is changed so outer-container parsing is not corrupted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 76b463cf-10b6-47ec-b826-cc403def3478

📥 Commits

Reviewing files that changed from the base of the PR and between 870df65 and 4dbe781.

⛔ Files ignored due to path filters (2)
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/blockquote/code.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/blockquote/ignore-code.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (1)
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs

…tion

Prevents partial consumption of outer blockquote markers when an inner
prefix is missing (quote_depth > 1). Without this, the first `>` would
be consumed into the code block content, stealing it from the outer
blockquote and corrupting subsequent parsing.
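
The all-or-nothing behaviour described in this commit message can be sketched on plain strings. This is an assumption-laden illustration rather than the actual fix, which may use parser lookahead or a checkpoint instead. `consume_quote_prefixes` is a hypothetical function that advances a scratch cursor and only reports a position once every expected prefix has been found. Tabs are ignored for brevity.

```rust
/// Sketch only: tries to consume `quote_depth` blockquote prefixes from the
/// start of `line`. Returns the byte offset after the last prefix, or `None`
/// if any prefix is missing, in which case nothing counts as consumed.
fn consume_quote_prefixes(line: &str, quote_depth: usize) -> Option<usize> {
    let bytes = line.as_bytes();
    let mut pos = 0; // scratch cursor; the caller commits it only on success
    for _ in 0..quote_depth {
        // Up to three spaces of indentation may precede each marker.
        let mut indent = 0;
        while indent < 3 && bytes.get(pos) == Some(&b' ') {
            pos += 1;
            indent += 1;
        }
        // A missing marker aborts the whole attempt.
        if bytes.get(pos) != Some(&b'>') {
            return None;
        }
        pos += 1;
        // One optional space after the marker belongs to the prefix.
        if bytes.get(pos) == Some(&b' ') {
            pos += 1;
        }
    }
    Some(pos)
}
```

At depth 2, `"> > code"` yields `Some(4)`, while `"> code"` yields `None`, so the single `>` stays available to the outer blockquote instead of being absorbed into the code block.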
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/src/syntax/fenced_code_block.rs`:
- Around line 340-341: prepare_next_code_content_token can advance the parser to
EOF but still return CodeContentLoopAction::ConsumeText, causing later code
(around the MD_TEXTUAL_LITERAL remapping at the block that currently handles
p.at(T![EOF]) on lines ~293-295) to mis-handle EOF; update
prepare_next_code_content_token to check p.at(T![EOF]) immediately after
consuming the in-loop prefix/indent and return CodeContentLoopAction::Break
instead of ConsumeText when EOF is reached, and also modify the downstream logic
that remaps EOF to MD_TEXTUAL_LITERAL so it no longer treats EOF as text (i.e.,
ensure the EOF check precedes any remapping to MD_TEXTUAL_LITERAL).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eacb4ca6-5a18-472f-990d-3fc24636b9ad

📥 Commits

Reviewing files that changed from the base of the PR and between 4dbe781 and a8280e0.

📒 Files selected for processing (1)
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs

Adds EOF check in prepare_next_code_content_token before returning
ConsumeText. Prevents bump_code_textual from remapping EOF as
MD_TEXTUAL_LITERAL when quote prefix or indent consumption advances
the parser to end-of-input mid-iteration.
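
A compact sketch of the guard this commit message describes, using a byte slice in place of the parser. The `Action` enum and `prepare_after_prefix` are hypothetical names; the point is only that end of input is re-checked after a prefix has been consumed, so the caller is never told to consume text that does not exist.

```rust
/// Hypothetical two-way action for this sketch.
#[derive(Debug, PartialEq)]
enum Action {
    Break,
    Consume,
}

/// Consumes an optional `>` prefix at line start, then decides whether
/// there is any content left for the caller to consume.
fn prepare_after_prefix(bytes: &[u8], pos: &mut usize, at_line_start: bool) -> Action {
    if *pos >= bytes.len() {
        return Action::Break;
    }
    if at_line_start && bytes[*pos] == b'>' {
        *pos += 1;
        if bytes.get(*pos) == Some(&b' ') {
            *pos += 1;
        }
        // Re-check: consuming the prefix may have reached end of input.
        if *pos >= bytes.len() {
            return Action::Break;
        }
    }
    Action::Consume
}
```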
Renamed `CodeContentLoopAction` to `CodeContentTokenAction` and moved all
control flow logic into explicit match arms, eliminating the fall-through
pattern that was causing review confusion.

Changes:
- Renamed enum: `CodeContentLoopAction` → `CodeContentTokenAction`
- Renamed variants: `Continue` → `Skip`, `ConsumeText` → `Consume`
- Moved `bump_code_textual(p)` and `at_line_start = false` into the
  `Consume` match arm for clarity

All tests pass. Behavior unchanged.
Clippy flagged the continue as redundant since nothing executes after the
match. Using an empty block achieves the same result without the warning.
Labels

A-Formatter Area: formatter A-Parser Area: parser A-Tooling Area: internal tools

2 participants