refactor(markdown-parser): promote fenced code block skipped trivia to explicit CST nodes#9321
Conversation
|
…o explicit CST nodes Replace 4 parse_as_skipped_trivia_tokens() call sites in fenced_code_block.rs: - Sites 1-3: blockquote > prefixes on continuation lines emit MdQuotePrefix nodes - Site 4: fence indent stripping emits MdIndentToken nodes Add MdIndentToken to AnyMdInline in the grammar and regenerate codegen. Add MdIndentToken no-op arm in to_html.rs extract_alt_text_inline. Add error fixture documenting pre-existing fenced-code-in-blockquote limitation. Extract try_bump_quote_marker as pub(crate) to deduplicate marker-bumping logic.
ccff014 to
041e82b
Compare
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds MdIndentToken to the inline grammar and wires it through the formatter and HTML alt-text extraction. Refactors fenced-code parsing into a stateful loop with helpers for quote-prefix handling, virtual-line-start semantics and earlier closing-fence detection. Reorganises quote parsing (introducing emit_quote_prefix_tokens, try_bump_quote_marker, virtual-line-start helpers, and improved indent/prefix handling). Adds tests for fenced code blocks inside blockquotes. No public API signatures changed. Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_markdown_parser/src/syntax/fenced_code_block.rs`:
- Around line 370-373: The call to try_bump_quote_marker(p) is inside
debug_assert! so it is skipped in release builds and the parser state won't be
updated; replace the debug_assert! invocation with an unconditional call to
try_bump_quote_marker(p) (so the marker is always consumed) and keep an optional
debug-only check if desired (e.g., call try_bump_quote_marker(p) and then
debug_assert!(result, "guard above guarantees marker present")); update the code
around the debug_assert! to call try_bump_quote_marker(p) unconditionally and
handle a false result only via debug assertion or by panicking with the same
message.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 02a409d7-7cbc-4b18-996b-0f5447559e8c
⛔ Files ignored due to path filters (3)
crates/biome_markdown_parser/tests/md_test_suite/error/fenced_code_in_blockquote.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_advanced.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_syntax/src/generated/nodes.rsis excluded by!**/generated/**,!**/generated/**and included by**
📒 Files selected for processing (6)
crates/biome_markdown_formatter/src/markdown/any/inline.rscrates/biome_markdown_parser/src/syntax/fenced_code_block.rscrates/biome_markdown_parser/src/syntax/quote.rscrates/biome_markdown_parser/src/to_html.rscrates/biome_markdown_parser/tests/md_test_suite/error/fenced_code_in_blockquote.mdxtask/codegen/markdown.ungram
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_markdown_parser/src/syntax/fenced_code_block.rs`:
- Around line 314-335: The code currently sets at_line_start = false immediately
after consume_quote_prefixes_in_code_content, which prevents the later
fence-indent stripping block (skip_fenced_content_indent and at_closing_fence)
from running for lines inside blockquotes; update the loop in
fenced_code_block.rs so fence-indent stripping runs after quote prefix
consumption: after calling consume_quote_prefixes_in_code_content (function
name) and before or regardless of resetting at_line_start, call
skip_fenced_content_indent when fence_indent > 0 and then re-check
at_closing_fence (function name) — or alternatively handle blockquote-nested
indentation explicitly by adding a branch that strips fence_indent even when
at_line_start was just true and quote prefixes were consumed; ensure
CodeContentLoopAction semantics and the at_line_start flag are preserved.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7e51518c-9db4-4808-9efb-2a5a282df782
📒 Files selected for processing (1)
crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
Use virtual_line_start in line_has_closing_fence so fence detection starts after consumed quote prefixes instead of seeing `>` as non-whitespace. Set virtual_line_start after quote prefix consumption and allow fence-indent stripping to run on blockquote lines.
|
It's weird that the coverage job isn't triggered by these changes. |
| indent_list_m.complete(p, MD_QUOTE_INDENT_LIST); | ||
|
|
||
| let marker_bumped = try_bump_quote_marker(p); | ||
| debug_assert!(marker_bumped, "guard above guarantees marker present"); |
There was a problem hiding this comment.
We usually try to use messages to understand what went wrong and/or how to fix it. For example, if a developer lands here, the message should tell what caused the problem, and where to look at for possible fixes (if applicable)
There was a problem hiding this comment.
Fixed — replaced unreachable!() with a safe fallback (prefix_m.abandon(p); return false), and improved the debug_assert! message to explain the root cause and where to look:
"consume_quote_prefix_in_code_content: quote marker not found after guard confirmed `>` token — check that force_relex_regular and the guard condition are in sync"
| let marker_bumped = try_bump_quote_marker(p); | ||
| debug_assert!(marker_bumped, "guard above guarantees marker present"); | ||
| if !marker_bumped { | ||
| unreachable!("guard above guarantees marker present"); |
There was a problem hiding this comment.
No code that panics in production. Let's find a safer approach
There was a problem hiding this comment.
Fixed — replaced unreachable!() with prefix_m.abandon(p); return false. The empty MD_QUOTE_INDENT_LIST that was already completed gets reparented to the parent via abandon, which is harmless in this theoretically unreachable path.
| CodeContentLoopAction::Continue => continue, | ||
| CodeContentLoopAction::ConsumeText => {} |
There was a problem hiding this comment.
| CodeContentLoopAction::Continue => continue, | |
| CodeContentLoopAction::ConsumeText => {} | |
| CodeContentLoopAction::Continue | | |
| CodeContentLoopAction::ConsumeText => continue, |
There was a problem hiding this comment.
These arms have different semantics — ConsumeText falls through to bump_code_textual(p) + at_line_start = false, while Continue skips both. Merging them would cause an infinite loop (parser position never advances).
Open to restructuring if you have a different approach in mind — what would you prefer here?
There was a problem hiding this comment.
Maybe something like this that inverts the enum?
enum CodeContentTokenAction {
Break,
Skip, // renamed from Continue
Consume, // renamed from ConsumeText
}
fn parse_code_content(...) {
// ...
while !p.at(T![EOF]) {
match prepare_next_code_content_token(...) {
CodeContentTokenAction::Break => break,
CodeContentTokenAction::Skip => continue,
CodeContentTokenAction::Consume => {
bump_code_textual(p);
at_line_start = false;
}
}
}
}There was a problem hiding this comment.
Resolved: I've refactored the control flow to make all three code paths explicit within the match statement (d44b297)
| true | ||
| } | ||
|
|
||
| fn consume_code_textual(p: &mut MarkdownParser) { |
There was a problem hiding this comment.
There's a bit of misalignment among these new functions. Some return a boolean, some don't, but they all start with consume_*. I would look for a better alignment in naming
There was a problem hiding this comment.
Fixed — renamed consume_code_textual → bump_code_textual since it unconditionally bumps and doesn't return a bool. The consume_* functions all follow the try-consume pattern (return bool), while bump_* is unconditional.
- Add docstrings to prepare_next_code_content_token, consume_quote_prefixes_in_code_content, and consume_quote_prefix_in_code_content - Replace unreachable!() with safe fallback (abandon + return false) - Improve debug_assert! message with actionable diagnostic - Rename consume_code_textual → bump_code_textual for naming alignment
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_markdown_parser/src/syntax/fenced_code_block.rs`:
- Around line 350-360: The function consume_quote_prefixes_in_code_content
currently mutates parser state (virtual_line_start and by calling
skip_line_indent and consume_quote_prefix_in_code_content) as it iterates and
returns false on first failure, which can leave the parser mid-line; change it
to perform a preflight check or snapshot-and-restore: save the parser state (via
p.state()/p.state_mut() snapshot) before attempting to consume prefixes,
simulate or loop calling consume_quote_prefix_in_code_content on a
temporary/simulated parser (or perform the checks without mutating real state)
and only if all quote_depth prefixes succeed apply the real mutations
(virtual_line_start update, skip_line_indent and the actual
consume_quote_prefix_in_code_content calls); ensure that if any prefix fails no
real parser state is changed so outer-container parsing is not corrupted.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 76b463cf-10b6-47ec-b826-cc403def3478
⛔ Files ignored due to path filters (2)
crates/biome_markdown_formatter/tests/specs/prettier/markdown/blockquote/code.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_formatter/tests/specs/prettier/markdown/blockquote/ignore-code.md.snapis excluded by!**/*.snapand included by**
📒 Files selected for processing (1)
crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
…tion Prevents partial consumption of outer blockquote markers when an inner prefix is missing (quote_depth > 1). Without this, the first `>` would be consumed into the code block content, stealing it from the outer blockquote and corrupting subsequent parsing.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_markdown_parser/src/syntax/fenced_code_block.rs`:
- Around line 340-341: prepare_next_code_content_token can advance the parser to
EOF but still return CodeContentLoopAction::ConsumeText, causing later code
(around the MD_TEXTUAL_LITERAL remapping at the block that currently handles
p.at(T![EOF]) on lines ~293-295) to mis-handle EOF; update
prepare_next_code_content_token to check p.at(T![EOF]) immediately after
consuming the in-loop prefix/indent and return CodeContentLoopAction::Break
instead of ConsumeText when EOF is reached, and also modify the downstream logic
that remaps EOF to MD_TEXTUAL_LITERAL so it no longer treats EOF as text (i.e.,
ensure the EOF check precedes any remapping to MD_TEXTUAL_LITERAL).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: eacb4ca6-5a18-472f-990d-3fc24636b9ad
📒 Files selected for processing (1)
crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
Adds EOF check in prepare_next_code_content_token before returning ConsumeText. Prevents bump_code_textual from remapping EOF as MD_TEXTUAL_LITERAL when quote prefix or indent consumption advances the parser to end-of-input mid-iteration.
Renamed `CodeContentLoopAction` to `CodeContentTokenAction` and moved all control flow logic into explicit match arms, eliminating the fall-through pattern that was causing review confusion. Changes: - Renamed enum: `CodeContentLoopAction` → `CodeContentTokenAction` - Renamed variants: `Continue` → `Skip`, `ConsumeText` → `Consume` - Moved `bump_code_textual(p)` and `at_line_start = false` into the `Consume` match arm for clarity All tests pass. Behavior unchanged.
Clippy flagged the continue as redundant since nothing executes after the match. Using an empty block achieves the same result without the warning.
Note
AI Assistance Disclosure: This PR was developed with assistance from Claude Code.
Summary
MdIndentTokentoAnyMdInlinein the grammar for fence indent stripping tokens.parse_as_skipped_trivia_tokens()call sites infenced_code_block.rswith explicit CST node emission:>prefixes on continuation lines within fenced code blocks now emitMdQuotePrefixnodes (withMdQuoteIndentList, marker, and optional post-marker space).MdIndentTokennodes withMD_INDENT_CHARtokens.MdIndentTokenno-op arm into_html.rsextract_alt_text_inlineexhaustive match.biome_markdown_syntax,biome_markdown_formatter).fenced_code_in_blockquote.mddocumenting pre-existing limitation where fenced code blocks inside blockquotes produce unterminated fence diagnostics.fenced_code_advanced.mdsnapshot to reflect new CST shape.Continues the skipped trivia promotion series (#9219, #9274, #9313). Sites 1-3 (quote prefixes in code content) are structurally correct but exercised only via the pre-existing blockquote+fenced-code path which has a known limitation — the error fixture documents current behavior until a follow-up fix lands.
No user-facing behavior change. Parsed semantics are preserved; only the internal CST representation changes.
Test Plan
just test-crate biome_markdown_parserjust test-markdown-conformancejust f && just lDocs
N/A — internal structural change, no new user-facing features.