refactor(markdown-parser): promote pre-marker indent to explicit CST #9224

Merged
ematipico merged 2 commits into biomejs:main from jfmcdowell:refactor/md-parser-pre-marker-indent on Feb 27, 2026
@jfmcdowell
Contributor

Note

AI Assistance Disclosure: This PR was developed with assistance from Claude Code.

Summary

  • Add MdQuoteIndent and MdQuoteIndentList to the grammar, following the MdHash/MdHashList wrapper pattern for repeating over raw tokens.
  • Change MdQuotePrefix.pre_marker_indent from a single optional token slot to MdQuoteIndentList, so each space before > gets its own MdQuoteIndent node.
  • Register MD_QUOTE_INDENT and MD_QUOTE_INDENT_LIST in markdown_kinds_src.rs.
  • Replace skip_line_indent(3) in emit_quote_prefix_tokens with an explicit loop that emits MdQuoteIndentList > MdQuoteIndent nodes with MD_QUOTE_PRE_MARKER_INDENT tokens.
  • Add FormatNodeRule stubs for MdQuoteIndent and FormatRule for MdQuoteIndentList.
  • Add test cases for 1-space, 2-space, 3-space, tab, and nested pre-marker indentation.
  • Update all blockquote parser snapshots to reflect the new CST shape.

This is the follow-up to #9219, completing Phase 1 parser-side work. Pre-marker indentation (0-3 spaces before >) was the last remaining skipped trivia in blockquote parsing. Each indent space is now a real CST node visible to the formatter harness.

No user-facing behavior change. Parsed semantics are preserved; only the internal CST representation changes.
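
To make the new CST shape concrete, here is a minimal sketch of the prefix after this change. The types `QuoteIndent` and `QuotePrefix` and the helper `prefix_with_spaces` are hypothetical stand-ins for illustration, not Biome's generated `MdQuoteIndent`, `MdQuotePrefix`, or any real parser API. The point is only that the indent becomes a list with one node per space, which is always present and may be empty.

```rust
// Hypothetical, simplified model of the CST change.
// These are NOT Biome's generated syntax types.

/// One unit of indentation before `>` (stands in for `MdQuoteIndent`).
#[derive(Debug, Clone, PartialEq)]
struct QuoteIndent {
    /// Text of the underlying `MD_QUOTE_PRE_MARKER_INDENT` token.
    token_text: String,
}

/// Stands in for `MdQuotePrefix` after this PR.
#[derive(Debug, Clone, PartialEq)]
struct QuotePrefix {
    /// Stands in for `MdQuoteIndentList`: always present, possibly empty.
    pre_marker_indent: Vec<QuoteIndent>,
    /// The `>` marker itself.
    marker: char,
}

impl QuotePrefix {
    /// Rebuild the source text covered by this prefix from its nodes.
    fn to_source(&self) -> String {
        let mut out = String::new();
        for indent in &self.pre_marker_indent {
            out.push_str(&indent.token_text);
        }
        out.push(self.marker);
        out
    }
}

/// Build the prefix for a line that starts with `spaces` spaces followed by `>`.
fn prefix_with_spaces(spaces: usize) -> QuotePrefix {
    QuotePrefix {
        pre_marker_indent: (0..spaces)
            .map(|_| QuoteIndent { token_text: " ".to_string() })
            .collect(),
        marker: '>',
    }
}
```

In this model, `prefix_with_spaces(2)` holds two indent nodes and `to_source()` reproduces the original two spaces plus `>`, so nothing that was previously skipped trivia is lost from the tree.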

Test Plan

  • cargo test -p biome_markdown_parser — 66 tests pass (65 existing + 1 new)
  • cargo insta test -p biome_markdown_parser
  • rg -n "pre_marker_indent: MdQuoteIndentList|MD_QUOTE_INDENT_LIST|MD_QUOTE_INDENT" crates/biome_markdown_parser/tests/md_test_suite/**/*.snap — verifies snapshots contain explicit pre-marker indent nodes
  • Tab pre-marker indent correctly rejected (tab = 4 columns > 3 max, parsed as indented code block)
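
The tab case in the last item comes down to simple column arithmetic. The sketch below assumes a tab counts as a flat 4 columns, as in the snippet under review in this PR, and a maximum of 3 columns before `>`. The constant names echo `TAB_STOP_SPACES` and `MAX_BLOCK_PREFIX_INDENT` from the review discussion, but the functions `indent_width` and `is_valid_pre_marker_indent` are hypothetical helpers, not parser code.

```rust
/// Columns a tab counts for (assumed to mirror the parser's `TAB_STOP_SPACES`).
const TAB_STOP_SPACES: usize = 4;
/// Maximum indentation before `>` (assumed to mirror `MAX_BLOCK_PREFIX_INDENT`).
const MAX_BLOCK_PREFIX_INDENT: usize = 3;

/// Width in columns of a run of whitespace, counting every tab as a full tab stop.
fn indent_width(ws: &str) -> usize {
    ws.chars()
        .map(|c| if c == '\t' { TAB_STOP_SPACES } else { 1 })
        .sum()
}

/// True if `ws` is only spaces/tabs and fits within the allowed pre-marker indent.
fn is_valid_pre_marker_indent(ws: &str) -> bool {
    ws.chars().all(|c| c == ' ' || c == '\t')
        && indent_width(ws) <= MAX_BLOCK_PREFIX_INDENT
}
```

Three spaces give a width of 3 and are accepted; a single tab gives a width of 4, exceeds the limit, and is left for the indented code block path.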

Docs

N/A — internal structural change, no new user-facing features.

…nodes

Replace skip_line_indent(3) in emit_quote_prefix_tokens with explicit
MdQuoteIndentList > MdQuoteIndent node emission, following the
MdHash/MdHashList pattern. Each space before '>' is now a real CST
node visible to the formatter harness instead of skipped trivia.

Grammar: add MdQuoteIndent, MdQuoteIndentList; change MdQuotePrefix
pre_marker_indent from optional token to list field.
@changeset-bot

changeset-bot bot commented Feb 24, 2026

⚠️ No Changeset found

Latest commit: 6cf94b3

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@github-actions github-actions bot added the A-Parser (Area: parser), A-Formatter (Area: formatter), and A-Tooling (Area: internal tools) labels on Feb 24, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb74be5 and 6cf94b3.

📒 Files selected for processing (2)
  • crates/biome_markdown_parser/src/syntax/mod.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs

Walkthrough

The PR introduces explicit handling of quote pre‑marker indentation across parser, codegen and formatter. The parser adds MAX_BLOCK_PREFIX_INDENT and new token kinds (MD_QUOTE_PRE_MARKER_INDENT, MD_QUOTE_INDENT, MD_QUOTE_INDENT_LIST) and emits bounded indentation tokens. Codegen adds MdQuoteIndent and MdQuoteIndentList nodes. The formatter gains modules and crate‑scoped formatters (FormatMdQuoteIndent, FormatMdQuoteIndentList) plus AsFormat/IntoFormat/FormatRule wiring and tests for varied pre‑marker indent patterns.

Suggested reviewers

  • ematipico
  • dyc3

🚥 Pre-merge checks: ✅ 2 passed

  • Title check: ✅ Passed. The title accurately summarises the main change: promoting pre-marker indent to explicit CST nodes in the markdown parser.
  • Description check: ✅ Passed. The description is comprehensive and directly related to the changeset, detailing grammar changes, parser updates, and test coverage.

Member

@ematipico ematipico left a comment

Great work! I left a couple of comments. I'll merge once you add the comment I asked for.

Comment on lines 143 to 149
if text.is_empty() || !text.chars().all(|c| c == ' ' || c == '\t') {
    break;
}
let indent: usize = text.chars().map(|c| if c == '\t' { 4 } else { 1 }).sum();
if consumed + indent > 3 {
    break;
}
Member

You might want to add some comments that explain this logic, mostly because the magic numbers don't give enough context about the business logic.

Contributor Author

@jfmcdowell jfmcdowell Feb 24, 2026

I’ll address the review comments here and replace the semantic/spec magic numbers that are in scope for this PR.

For the broader cleanup, I’ll open a follow-up PR to standardize the remaining semantic constants across the markdown parser in one pass, since that ended up being larger than expected.

Contributor Author

fixed

    p.bump_remap(MD_QUOTE_PRE_MARKER_INDENT);
    indent_m.complete(p, MD_QUOTE_INDENT);
}
indent_list_m.complete(p, MD_QUOTE_INDENT_LIST);
Member

Is there a particular reason why we don't use ParseList for this? Just curious

Contributor Author

My thinking: this is a tiny bounded scan (<= 3 cols per CommonMark’s 0–3 indent before >), immediately followed by > validation, so a direct loop felt simpler than ParseNodeList. It also keeps this path strictly no-recovery/no-diagnostic.

Happy to switch to ParseNodeList if you prefer consistency.
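
For readers following the thread, the bounded scan described above can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the code in quote.rs: tokens are modeled as plain string slices, `scan_quote_prefix` is a hypothetical function, and the constants are assumed to match `TAB_STOP_SPACES` (4) and `MAX_BLOCK_PREFIX_INDENT` (3).

```rust
/// Assumed to mirror the parser's constants of the same names.
const TAB_STOP_SPACES: usize = 4;
const MAX_BLOCK_PREFIX_INDENT: usize = 3;

/// Scan whitespace tokens at the start of a line. Returns how many tokens form
/// the pre-marker indent, or `None` if the indent is not followed by `>`.
/// No recovery and no diagnostics: a failed scan simply reports "not a quote".
fn scan_quote_prefix(tokens: &[&str]) -> Option<usize> {
    let mut consumed = 0; // columns of indent accepted so far
    let mut count = 0; // whitespace tokens accepted so far

    for text in tokens {
        // Stop at the first token that is not pure spaces/tabs.
        if text.is_empty() || !text.chars().all(|c| c == ' ' || c == '\t') {
            break;
        }
        let indent: usize = text
            .chars()
            .map(|c| if c == '\t' { TAB_STOP_SPACES } else { 1 })
            .sum();
        // Stop before exceeding the 0-3 column limit for the indent before `>`.
        if consumed + indent > MAX_BLOCK_PREFIX_INDENT {
            break;
        }
        consumed += indent;
        count += 1;
    }

    // The token right after the accepted indent must be the `>` marker.
    if tokens.get(count).copied() == Some(">") {
        Some(count)
    } else {
        None
    }
}
```

With one token per space, `[" ", " ", ">"]` yields two indent tokens, while four spaces or a leading tab stop the scan before `>` is reached and the line is not treated as a quote. Because the loop can run at most three iterations that accept a token, there is little for list-recovery machinery to do here.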

Member

No, I think it's fine for now. Could you leave a comment explaining the reasoning?

Contributor Author

fixed

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
crates/biome_markdown_parser/src/syntax/quote.rs (1)

139-162: Comments and reasoning look good — past feedback addressed.

The bounded scan is well-documented, the tab-expansion logic matches the existing pattern in fenced_code_block.rs and parser.rs, and the rationale for not using ParseNodeList is clearly stated. The always-emitted (possibly empty) MD_QUOTE_INDENT_LIST is consistent with how other list nodes work in biome.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/quote.rs` around lines 139 - 162, No
change required: the bounded scan in quote handling is correct — keep the
tab-expansion logic and the always-emitted MD_QUOTE_INDENT_LIST as-is; verify
the existing symbols indent_list_m, indent_m, TAB_STOP_SPACES and
MAX_BLOCK_PREFIX_INDENT remain used exactly as shown and leave the
MD_QUOTE_PRE_MARKER_INDENT remap and completion to MD_QUOTE_INDENT untouched.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/src/syntax/quote.rs`:
- Around line 44-46: Define a new constant named MAX_BLOCK_PREFIX_INDENT in the
constants section of the syntax module (next to INDENT_CODE_BLOCK_SPACES and
TAB_STOP_SPACES) with visibility pub(crate), type usize, and value 3 so imports
of MAX_BLOCK_PREFIX_INDENT in quote.rs and other files resolve; ensure the
constant is declared alongside the existing constants in mod.rs.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 86ff35c and cb74be5.

📒 Files selected for processing (1)
  • crates/biome_markdown_parser/src/syntax/quote.rs

@ematipico ematipico merged commit ce67318 into biomejs:main Feb 27, 2026
14 checks passed
Labels

A-Formatter (Area: formatter), A-Parser (Area: parser), A-Tooling (Area: internal tools)

2 participants