Skip to content

refactor(markdown): replace skipped trivia with formatter-safe recovery#9746

Merged
dyc3 merged 2 commits intobiomejs:mainfrom
jfmcdowell:refactor/markdown-remove-skipped-trivia
Mar 31, 2026
Merged

refactor(markdown): replace skipped trivia with formatter-safe recovery#9746
dyc3 merged 2 commits intobiomejs:mainfrom
jfmcdowell:refactor/markdown-remove-skipped-trivia

Conversation

@jfmcdowell
Copy link
Copy Markdown
Contributor

@jfmcdowell jfmcdowell commented Mar 31, 2026

Note

This PR was created with AI assistance (Claude Code).

Summary

Refs #9742.

Replace Skipped trivia in Markdown parsing with formatter-safe representations. Normal parsing paths now use Whitespace trivia for structural whitespace, and quote/list depth-overflow recovery now wraps content in MdBogusBlock instead of attaching skipped trivia to normal nodes.

Test Plan

  • just test-crate biome_markdown_parser — all passed
  • just test-crate biome_markdown_formatter — all passed
  • just test-markdown-conformance — 652/652
  • just f
  • just l

Docs

N/A.

@changeset-bot
Copy link
Copy Markdown

changeset-bot bot commented Mar 31, 2026

⚠️ No Changeset found

Latest commit: 66269f9

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@github-actions github-actions bot added A-Parser Area: parser L-Markdown Language: Markdown labels Mar 31, 2026
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Mar 31, 2026

Merging this PR will not alter performance

✅ 58 untouched benchmarks
⏩ 196 skipped benchmarks1


Comparing jfmcdowell:refactor/markdown-remove-skipped-trivia (66269f9) with main (8837bf3)

Open in CodSpeed

Footnotes

  1. 196 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@jfmcdowell jfmcdowell marked this pull request as ready for review March 31, 2026 10:21
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 31, 2026

Walkthrough

Added MarkdownLexContext::HeadingContent plus a force_relex_heading_content path so heading text is lexed without treating trailing spaces as hard line breaks. Introduced consume_as_whitespace_trivia() and a BumpWithContext::skip_as_trivia_of_kind_with_context hook to classify leading/trailing indentation and blank-line spaces as Whitespace trivia instead of Skipped. Introduced an MdBogusBlock grammar node, wired its CST formatter, and updated parser recovery paths (lists, quotes, headers) to emit bogus blocks or attach whitespace as trivia. Tests for heading trailing-space behaviour were added.

Possibly related PRs

Suggested labels

A-Formatter, A-Tooling

Suggested reviewers

  • dyc3
  • ematipico
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed The PR description clearly relates to the changeset, explaining the refactoring of Skipped trivia handling in Markdown parsing with specific technical details about the changes.
Title check ✅ Passed The title accurately summarises the main refactoring: replacing skipped trivia with formatter-safe recovery mechanisms throughout the markdown parser.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/biome_markdown_parser/src/parser.rs (1)

396-445: ⚠️ Potential issue | 🟠 Major

Please add parser fixtures for the new whitespace-trivia path.

This flips normal parse paths from Skipped to Whitespace, but the diff ships no parser cases to pin the new CST/trivia shape. I’d want at least # h \n, # h \n, # h ## \n, and one blank-line/list continuation case in the same PR; these are exactly the sort of gremlins that regress quietly.

As per coding guidelines, "All code changes MUST include appropriate tests: lint rules require snapshot tests in 'tests/specs/{group}/{rule}/', formatters require snapshot tests with valid/invalid cases, parsers require test files covering valid and error cases, and bug fixes require tests that reproduce and validate the fix."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/parser.rs` around lines 396 - 445, Add
parser snapshot tests that exercise the new whitespace-trivia path so we pin the
CST/trivia shape produced by skip_line_indent and consume_as_whitespace_trivia:
create tests that parse the input lines "# h  \n", "# h   \n", "# h ##  \n", and
a blank-line/list-continuation case, then assert the produced CST/trivia
contains TriviaPieceKind::Whitespace (not Skipped) attached in the Regular
MarkdownLexContext; run/update snapshots to capture the new shape and ensure the
parser fixtures fail if skip_line_indent stops consuming or attaches Skipped
trivia instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/src/lexer/mod.rs`:
- Around line 271-278: The HeadingContent case still collapses 3+ trailing
spaces because consume_textual() bails on is_potential_hard_line_break() even
inside headings; update the mid-token hard-break guard in consume_textual() to
skip that bail when the current context is MarkdownLexContext::HeadingContent
(i.e., only treat is_potential_hard_line_break() as a hard-break bail-out when
context != MarkdownLexContext::HeadingContent), so that headings will consume
spaces normally; reference consume_textual(), is_potential_hard_line_break(),
and MarkdownLexContext::HeadingContent to locate and change the conditional
logic accordingly.

---

Outside diff comments:
In `@crates/biome_markdown_parser/src/parser.rs`:
- Around line 396-445: Add parser snapshot tests that exercise the new
whitespace-trivia path so we pin the CST/trivia shape produced by
skip_line_indent and consume_as_whitespace_trivia: create tests that parse the
input lines "# h  \n", "# h   \n", "# h ##  \n", and a
blank-line/list-continuation case, then assert the produced CST/trivia contains
TriviaPieceKind::Whitespace (not Skipped) attached in the Regular
MarkdownLexContext; run/update snapshots to capture the new shape and ensure the
parser fixtures fail if skip_line_indent stops consuming or attaches Skipped
trivia instead.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bb9dbab5-3f2c-4065-bc58-18c018ba635a

📥 Commits

Reviewing files that changed from the base of the PR and between 8837bf3 and 00430d0.

⛔ Files ignored due to path filters (7)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/atx_heading_trailing_hash.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_in_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/header.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/html_block_in_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/list_continuation_edge_cases.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/list_indentation.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading_edge_cases.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (7)
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/parser.rs
  • crates/biome_markdown_parser/src/syntax/header.rs
  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/syntax/mod.rs
  • crates/biome_markdown_parser/src/token_source.rs
  • crates/biome_parser/src/token_source.rs

Comment thread crates/biome_markdown_parser/src/lexer/mod.rs
Refs biomejs#9742. `Skipped` trivia now only appears in genuine
error-recovery paths (quote/list nesting depth limits). Spec-driven
structural whitespace now uses `Whitespace` trivia instead.

Add `skip_as_trivia_of_kind_with_context` to `BumpWithContext` in
`biome_parser` so parsers can consume tokens as a chosen trivia kind.
Markdown uses this to emit `Whitespace` trivia for structural
whitespace the CommonMark spec says to strip.

Add `MarkdownLexContext::HeadingContent` to split
`MD_HARD_LINE_LITERAL` (trailing spaces + newline) into separate
tokens in heading parsing. This lets the parser consume only the
whitespace as trivia while keeping the newline visible as a real token,
avoiding syntax-significant trivia.

Converted normal-path sites from `Skipped` to `Whitespace`:
- `parser.rs`: `skip_line_indent`
- `header.rs`: whitespace before/after trailing `#`
- `header.rs`: trailing spaces in heading content via re-lexing
- `mod.rs`: blank-line whitespace and post-hard-break spaces
- `list.rs`: blank-line whitespace in lists

Also convert `list.rs:skip_list_marker_indent` from `Skipped` to
`Whitespace`; this helper is only used in nesting-depth recovery paths.

Retain `Skipped` only for recovery:
- `quote.rs`: quote nesting depth exceeded
- `list.rs`: list nesting depth exceeded

Formatter-facing CST structure for quote prefixes, list prefixes, and
header indent/hash nodes was already explicit and did not need grammar
changes.
@jfmcdowell jfmcdowell force-pushed the refactor/markdown-remove-skipped-trivia branch from 00430d0 to f2e7e03 Compare March 31, 2026 10:56
Add MdBogusBlock to the Markdown grammar as a valid AnyMdBlock child.
Use it in quote and list depth-overflow recovery paths to wrap the
consumed marker tokens instead of routing them through
parse_as_skipped_trivia_tokens.

Previously, recovery tokens (e.g. an over-nested `>` marker) were
attached as Skipped trivia on the next normal paragraph content. This
made them visible to the formatter on normal nodes. Now they are
contained inside MdBogusBlock, which the formatter skips.

With this change, zero Skipped trivia remains anywhere in the Markdown
parser — not on normal paths, and not on recovery paths.

Remove the now-unused skip_optional_marker_space function.
@github-actions github-actions bot added A-Formatter Area: formatter A-Tooling Area: internal tools labels Mar 31, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/biome_markdown_parser/src/syntax/quote.rs (1)

89-93: Reuse try_bump_quote_marker in this recovery branch.

This branch duplicates marker-consumption logic already centralised in try_bump_quote_marker, which can drift over time.

♻️ Small refactor
-        if p.at(T![>]) {
-            p.bump(T![>]);
-        } else if p.at(MD_TEXTUAL_LITERAL) && p.cur_text() == ">" {
-            p.bump_remap(T![>]);
-        }
+        let _ = try_bump_quote_marker(p);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/quote.rs` around lines 89 - 93, The
recovery branch duplicates the quote-marker consumption logic; replace the
conditional that checks p.at(T![>]) / p.at(MD_TEXTUAL_LITERAL) and calls p.bump
or p.bump_remap with a single call to the existing helper try_bump_quote_marker
so the centralized logic in try_bump_quote_marker is reused; locate the recovery
branch in quote.rs and call try_bump_quote_marker(p) (or the appropriate
receiver) instead of duplicating the two p.at/... branches, preserving existing
control flow and any surrounding error/recovery handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@crates/biome_markdown_parser/src/syntax/quote.rs`:
- Around line 89-93: The recovery branch duplicates the quote-marker consumption
logic; replace the conditional that checks p.at(T![>]) /
p.at(MD_TEXTUAL_LITERAL) and calls p.bump or p.bump_remap with a single call to
the existing helper try_bump_quote_marker so the centralized logic in
try_bump_quote_marker is reused; locate the recovery branch in quote.rs and call
try_bump_quote_marker(p) (or the appropriate receiver) instead of duplicating
the two p.at/... branches, preserving existing control flow and any surrounding
error/recovery handling.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 48462d35-7339-4024-a346-a165d4623af6

📥 Commits

Reviewing files that changed from the base of the PR and between f2e7e03 and 66269f9.

⛔ Files ignored due to path filters (6)
  • crates/biome_markdown_factory/src/generated/node_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_factory/src/generated/syntax_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/error/quote_nesting_too_deep.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_syntax/src/generated/kind.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_syntax/src/generated/macros.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_syntax/src/generated/nodes.rs is excluded by !**/generated/**, !**/generated/** and included by **
📒 Files selected for processing (8)
  • crates/biome_markdown_formatter/src/generated.rs
  • crates/biome_markdown_formatter/src/markdown/any/block.rs
  • crates/biome_markdown_formatter/src/markdown/bogus/bogus_block.rs
  • crates/biome_markdown_formatter/src/markdown/bogus/mod.rs
  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • xtask/codegen/markdown.ungram
  • xtask/codegen/src/markdown_kinds_src.rs
✅ Files skipped from review due to trivial changes (2)
  • crates/biome_markdown_formatter/src/markdown/bogus/mod.rs
  • xtask/codegen/src/markdown_kinds_src.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/biome_markdown_parser/src/syntax/list.rs

@jfmcdowell jfmcdowell changed the title refactor(markdown): eliminate Skipped trivia from normal parsing paths refactor(markdown): replace skipped trivia with formatter-safe recovery Mar 31, 2026
@jfmcdowell jfmcdowell mentioned this pull request Mar 31, 2026
1 task
Copy link
Copy Markdown
Contributor

@dyc3 dyc3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snapshots look good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Formatter Area: formatter A-Parser Area: parser A-Tooling Area: internal tools L-Markdown Language: Markdown

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants