feat(parser/markdown): parser implementation #8525

Merged
dyc3 merged 12 commits into biomejs:next from jfmcdowell:feat/markdown-support-next
Jan 23, 2026

Conversation


@jfmcdowell jfmcdowell commented Dec 20, 2025

Note

AI Assistance Disclosure: This PR was developed with assistance from Claude Code.

Summary

A CommonMark-compliant Markdown parser achieving ~69.8% spec compliance (455/652 examples). Addresses #3718.

Compliance Highlights

| Perfect (100%) | Strong (80%+) | Needs Work |
| --- | --- | --- |
| Block quotes (25/25) | Thematic breaks (95%) | Emphasis (77%) |
| Indented code (12/12) | ATX headings (89%) | List items (77%) |
| Soft breaks (2/2) | Paragraphs (88%) | Link ref defs (74%) |
| Blank lines (1/1) | Fenced code (86%) | Setext headings (70%) |
| Textual content (3/3) | Backslash escapes (85%) | HTML blocks (68%) |
| | Tabs (82%) | Images (68%) |
| | Lists (81%) | Entity refs (65%) |
| | | Links (49%) |
| | | Code spans (46%) |
| | | Hard breaks (40%) |
| | | Raw HTML (40%) |
| | | Autolinks (21%) |

Note: Compliance dropped from ~75% to ~70% during ParseNodeList refactoring. The parser is functionally correct; some HTML output edge cases need refinement.

Implemented

  • Blocks: ATX/Setext headers, fenced/indented code, HTML blocks, blockquotes, thematic breaks
  • Lists: Bullet (-, +, *) and ordered (1., 1)) with nesting and tightness detection
  • Inlines: Emphasis, links, images, code spans, autolinks, inline HTML, escapes, entities
  • Infrastructure: Lossless CST, test_utils feature for HTML rendering, spec test harness
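
As a rough illustration of the tightness rule mentioned above (this is a sketch of the CommonMark concept, not the PR's actual detection code): a list is *loose* when a blank line separates two of its items, or separates block content inside an item; otherwise it is *tight*.

```rust
// Hypothetical sketch of CommonMark list-tightness detection for a flat
// bullet list. Names and structure are illustrative, not the PR's API.
fn is_tight_list(lines: &[&str]) -> bool {
    let mut seen_item = false;
    let mut pending_blank = false;
    for line in lines {
        let trimmed = line.trim_start();
        let is_item = trimmed.starts_with("- ")
            || trimmed.starts_with("* ")
            || trimmed.starts_with("+ ");
        if trimmed.is_empty() {
            // A blank line only matters if more list content follows it.
            if seen_item {
                pending_blank = true;
            }
        } else if pending_blank {
            // Blank line followed by another item (or by continuation
            // content inside an item) makes the whole list loose.
            return false;
        } else if is_item {
            seen_item = true;
        }
    }
    // A trailing blank line after the last item does not make it loose.
    true
}

fn main() {
    assert!(is_tight_list(&["- a", "- b", "- c"]));
    assert!(!is_tight_list(&["- a", "", "- b"]));
    assert!(is_tight_list(&["- a", "- b", ""]));
    println!("tightness checks passed");
}
```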

Test Plan

# Run CommonMark compliance report (652 examples)
just test-markdown-conformance

# Run all tests
cargo test -p biome_markdown_parser

  • 69.8% CommonMark spec compliance
  • 59 targeted edge-case tests
  • just f and just l pass

Future PRs

  • Remaining compliance gaps (links, code spans, emphasis edge cases)
  • Move conformance tests to xtask/coverage per project conventions


changeset-bot bot commented Dec 20, 2025

⚠️ No Changeset found

Latest commit: eae8e0f

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@github-actions github-actions bot added A-Parser Area: parser A-Tooling Area: internal tools labels Dec 20, 2025
@jfmcdowell jfmcdowell changed the title feat(parser/markdown): complete Phase 1 parser implementation feat(parser/markdown): parser implementation Dec 20, 2025

coderabbitai bot commented Dec 20, 2025

Walkthrough

The PR converts the crate into a Markdown parser:

  • Lexer rewritten for Markdown tokens (headers, fences, emphasis, block quotes, lists, escapes, textual grouping), with new private helpers for backticks, tildes, hashes, escapes, and thematic-break vs emphasis logic.
  • Parser gains trivia inspection APIs and new modules (fenced_code_block, header, inline, list, quote) implementing fenced/indented code blocks, ATX headers, block quotes, bullet lists, paragraphs, and inline elements.
  • Grammar/kind updates and a large suite of lexer and fixture tests added.
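
The thematic-break vs emphasis disambiguation the walkthrough mentions follows a simple CommonMark rule, sketched here for illustration (this is not the PR's lexer code): a line of three or more matching `-`, `_`, or `*` characters, with only spaces or tabs between them and at most three spaces of indentation, is a thematic break; anything else (e.g. `**bold**`) falls through to emphasis or text.

```rust
// Illustrative sketch of CommonMark thematic-break recognition.
fn is_thematic_break(line: &str) -> bool {
    let trimmed = line.trim_start_matches(' ');
    // Four or more spaces of indentation means indented code, not a break.
    if line.len() - trimmed.len() > 3 {
        return false;
    }
    let mut marker: Option<char> = None;
    let mut count = 0;
    for ch in trimmed.chars() {
        match ch {
            ' ' | '\t' => continue, // interior whitespace is allowed
            '-' | '_' | '*' => match marker {
                None => {
                    marker = Some(ch);
                    count = 1;
                }
                Some(m) if m == ch => count += 1,
                _ => return false, // mixed marker characters
            },
            _ => return false, // any other character => not a break
        }
    }
    count >= 3
}

fn main() {
    assert!(is_thematic_break("***"));
    assert!(is_thematic_break(" - - -"));
    assert!(!is_thematic_break("**bold**"));
    assert!(!is_thematic_break("    ---")); // indented code, not a break
    println!("thematic break checks passed");
}
```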

Suggested reviewers

  • dyc3
  • ematipico
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: a Markdown parser implementation, which aligns with the extensive parser infrastructure and syntax modules added across the codebase.
Description check ✅ Passed The PR description clearly outlines a CommonMark-compliant Markdown parser implementation addressing issue #3718, with detailed compliance metrics, implemented features, test coverage, and future work deferred.





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
crates/biome_markdown_parser/src/syntax/header.rs (1)

151-155: Redundant wrapper function.

has_newline_since just delegates to p.has_newline_since. Consider inlining calls directly.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ab88099 and 5b90681.

⛔ Files ignored due to path filters (13)
  • crates/biome_markdown_factory/src/generated/node_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_factory/src/generated/syntax_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/header.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_syntax/src/generated/kind.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_syntax/src/generated/macros.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_syntax/src/generated/nodes.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_syntax/src/generated/nodes_mut.rs is excluded by !**/generated/**, !**/generated/** and included by **
📒 Files selected for processing (21)
  • .changeset/curly-donkeys-show.md (1 hunks)
  • crates/biome_markdown_parser/src/lexer/mod.rs (4 hunks)
  • crates/biome_markdown_parser/src/lexer/tests.rs (1 hunks)
  • crates/biome_markdown_parser/src/parser.rs (1 hunks)
  • crates/biome_markdown_parser/src/syntax.rs (3 hunks)
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs (1 hunks)
  • crates/biome_markdown_parser/src/syntax/header.rs (1 hunks)
  • crates/biome_markdown_parser/src/syntax/inline.rs (1 hunks)
  • crates/biome_markdown_parser/src/syntax/list.rs (1 hunks)
  • crates/biome_markdown_parser/src/syntax/quote.rs (1 hunks)
  • crates/biome_markdown_parser/src/token_source.rs (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/header.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md (1 hunks)
  • crates/biome_markdown_parser/tests/spec_test.rs (1 hunks)
  • xtask/codegen/markdown.ungram (4 hunks)
  • xtask/codegen/src/markdown_kinds_src.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

  • crates/biome_markdown_parser/src/parser.rs
  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/syntax/header.rs
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/token_source.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/tests/spec_test.rs
  • crates/biome_markdown_parser/src/syntax.rs
.changeset/*.md

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Write changesets that are concise (1-3 sentences), user-focused, use past tense for actions taken and present tense for Biome behavior, include code examples for rules, and end sentences with periods

Files:

  • .changeset/curly-donkeys-show.md
🧠 Learnings (37)
📓 Common learnings
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : The first paragraph of rule documentation must be a single line describing what the rule does

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-04T13:29:49.287Z
Learnt from: dyc3
Repo: biomejs/biome PR: 8291
File: crates/biome_html_formatter/tests/specs/prettier/vue/html-vue/elastic-header.html:10-10
Timestamp: 2025-12-04T13:29:49.287Z
Learning: Files under `crates/biome_html_formatter/tests/specs/prettier` are test fixtures synced from Prettier and should not receive detailed code quality reviews (e.g., HTTP vs HTTPS, formatting suggestions, etc.). These files are test data meant to validate formatter behavior and should be preserved as-is.

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md
  • xtask/codegen/markdown.ungram
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

  • crates/biome_markdown_parser/src/parser.rs
  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/token_source.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : A parser struct must implement the `Parser` trait and save the token source, parser context, and optional parser options

Applied to files:

  • crates/biome_markdown_parser/src/parser.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/tests/specs/**/*.jsonc : Use `.jsonc` files to contain arrays of code snippet strings for snapshot tests

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/language_kind.rs : Add a new language prefix to the `LANGUAGE_PREFIXES` constant in `language_kind.rs` file

Applied to files:

  • xtask/codegen/src/markdown_kinds_src.rs
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/language_kind.rs : Add a new variant to `LanguageKind` enum in `language_kind.rs` file and implement all methods for the new language variant

Applied to files:

  • xtask/codegen/src/markdown_kinds_src.rs
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/*_kinds_src.rs : Create a file called `src/{language}_kinds_src.rs` that returns a static `KindSrc` for new language parsers

Applied to files:

  • xtask/codegen/src/markdown_kinds_src.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `ParseSeparatedList` and `ParseNodeList` for parsing lists with error recovery to avoid infinite loops

Applied to files:

  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Applied to files:

  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found

Applied to files:

  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/lint/nursery/**/*.rs : Add `issue_number` field to `declare_lint_rule!` macro for work-in-progress rules

Applied to files:

  • crates/biome_markdown_parser/src/syntax/list.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/Cargo.toml : Include development dependencies in `Cargo.toml` for formatter tests: `biome_formatter_test`, `biome_<language>_factory`, `biome_<language>_parser`, `biome_parser`, `biome_service`, `countme`, `iai`, `quickcheck`, `quickcheck_macros`, and `tests_macros`

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/tests/language.rs : Implement `TestFormatLanguage` trait in `tests/language.rs` for the formatter's test language

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Use `ignore` code block property to exclude documentation code examples from automatic validation

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Lines prefixed with `#` in rule documentation code examples will be hidden from output

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to .changeset/*.md : Write changesets that are concise (1-3 sentences), user-focused, use past tense for actions taken and present tense for Biome behavior, include code examples for rules, and end sentences with periods

Applied to files:

  • .changeset/curly-donkeys-show.md
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Create a changeset for user-facing bug fixes and features using `just new-changeset`, selecting appropriate packages and change type (`major`, `minor`, or `patch`)

Applied to files:

  • .changeset/curly-donkeys-show.md
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks must have a language defined for syntax highlighting

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks should be ordered as language, expect_diagnostic, options/full_options/use_options, ignore, file

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rules must take a mutable reference to the parser as their only parameter and return a `ParsedSyntax`

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `ConditionalParsedSyntax` for syntax that is only valid in specific contexts (e.g., strict mode, file types, language versions) and call `or_invalid_to_bogus()` to convert to a bogus node if not supported

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rule functions must be prefixed with `parse_` and use the name defined in the grammar file, e.g., `parse_for_statement` or `parse_expression`

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rules must return `ParsedSyntax::Absent` if the rule can't predict by the next token(s) if they form the expected node, and must not progress the parser in this case

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes

Applied to files:

  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Use `full_options` code block property for complete biome.json configuration snippets in documentation

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Invalid code examples in rule documentation must be marked with `expect_diagnostic` code block property

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/tests/spec_tests.rs : Use the `tests_macros::gen_tests!` macro in `spec_tests.rs` to generate test functions for each specification file matching the pattern `tests/specs/<language>/**/*.<ext>`

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-11-24T18:06:12.048Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_service/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:12.048Z
Learning: Applies to crates/biome_service/src/workspace/watcher.tests.rs : Implement watcher tests for workspace methods in watcher.tests.rs and end-to-end tests in LSP tests

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Create test infrastructure with `tests/specs` folder structure and `spec_test.rs`, `spec_tests.rs`, and `language.rs` files in test directories

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:05:27.810Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:27.810Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : When formatting AST nodes, use mandatory tokens from the AST instead of hardcoding token strings (e.g., use `node.l_paren_token().format()` instead of `token("(")`)

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-09T12:47:46.298Z
Learnt from: ematipico
Repo: biomejs/biome PR: 8031
File: crates/biome_html_parser/src/syntax/svelte.rs:140-147
Timestamp: 2025-11-09T12:47:46.298Z
Learning: In the Biome HTML parser, `expect` and `expect_with_context` consume the current token and then lex the next token. The context parameter in `expect_with_context` controls how the next token (after the consumed one) is lexed, not the current token being consumed. For example, in Svelte parsing, after `bump_with_context(T!["{:"], HtmlLexContext::Svelte)`, the next token is already lexed in the Svelte context, so `expect(T![else])` is sufficient unless the token after `else` also needs to be lexed in a specific context.

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Valid code examples in rule documentation should not trigger any diagnostics

Applied to files:

  • crates/biome_markdown_parser/tests/spec_test.rs
🧬 Code graph analysis (7)
crates/biome_markdown_parser/src/parser.rs (2)
crates/biome_markdown_parser/src/token_source.rs (2)
  • has_blank_line_since (93-122)
  • has_newline_since (126-133)
crates/biome_markdown_parser/src/syntax/header.rs (1)
  • has_newline_since (153-155)
crates/biome_markdown_parser/src/syntax/list.rs (2)
crates/biome_markdown_parser/src/syntax.rs (1)
  • parse_any_inline (160-162)
crates/biome_markdown_parser/src/syntax/inline.rs (1)
  • parse_any_inline (244-258)
crates/biome_markdown_parser/src/syntax/inline.rs (2)
crates/biome_markdown_parser/src/lexer/mod.rs (1)
  • source (74-76)
crates/biome_markdown_parser/src/syntax.rs (2)
  • parse_any_inline (160-162)
  • parse_textual (171-181)
crates/biome_markdown_parser/src/syntax/quote.rs (1)
crates/biome_markdown_parser/src/syntax.rs (1)
  • parse_any_inline (160-162)
crates/biome_markdown_parser/src/token_source.rs (2)
crates/biome_markdown_parser/src/parser.rs (2)
  • has_blank_line_since (40-42)
  • has_newline_since (45-47)
crates/biome_markdown_parser/src/syntax/header.rs (1)
  • has_newline_since (153-155)
crates/biome_markdown_parser/src/lexer/mod.rs (2)
crates/biome_unicode_table/src/lib.rs (1)
  • lookup_byte (106-109)
crates/biome_yaml_parser/src/lexer/mod.rs (1)
  • start_position (872-874)
crates/biome_markdown_parser/src/syntax.rs (5)
crates/biome_markdown_parser/src/syntax/fenced_code_block.rs (2)
  • at_fenced_code_block (9-11)
  • parse_fenced_code_block (21-55)
crates/biome_markdown_parser/src/syntax/header.rs (2)
  • at_header (15-17)
  • parse_header (27-53)
crates/biome_markdown_parser/src/syntax/list.rs (2)
  • at_bullet_list_item (17-30)
  • parse_bullet_list_item (42-72)
crates/biome_markdown_parser/src/syntax/quote.rs (2)
  • at_quote (14-16)
  • parse_quote (24-58)
crates/biome_markdown_parser/src/syntax/inline.rs (1)
  • parse_any_inline (244-258)
🪛 LanguageTool
crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md

[style] ~4-~4: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...This is italic and this is bold. This is also italic and also bold. He...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~5-~5: You’ve already used the word ‘also’ once in your sentence, so using it again may be redundant.
Context: ... bold. This is also italic and also bold. Here is a [link](https://example.co...

(REDUNDANT_FILLER)

🔇 Additional comments (53)
xtask/codegen/src/markdown_kinds_src.rs (1)

19-20: LGTM!

The TRIPLE_TILDE token follows the established pattern alongside TRIPLE_BACKTICK, neatly supporting CommonMark tilde-fenced code blocks.

crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md (1)

1-4: LGTM!

Solid test fixture covering multi-line block quotes and blank-line separation between distinct quote blocks.

crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md (1)

1-7: LGTM!

Comprehensive coverage of inline elements: code spans, both emphasis syntaxes (*/_), links, and images. The static analysis grumbles about repetitive phrasing, but that's rather the point of a test fixture.

crates/biome_markdown_parser/tests/md_test_suite/ok/header.md (1)

1-17: LGTM!

Excellent coverage of ATX headers (H1–H6) and trailing-hash edge cases. The "Mixed # content ## with ### trailing ####" line is a nice stress test for the lookahead-based detection.

Consider adding edge-case tests for invalid headers in a separate fixture (e.g., ####### Seven hashes which shouldn't parse as a header, or #NoSpace without the required space). These would exercise error recovery paths.

crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md (1)

1-6: LGTM!

Covers both bullet markers (- and *) with clean separation. Given multi-line list items are deferred to future work, this is appropriate Phase 1 coverage.

crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md (1)

1-8: LGTM!

Test fixture appropriately exercises indented code block parsing with proper 4-space indentation and mixed content.

crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md (1)

1-3: LGTM!

Clean test fixture for paragraph parsing with blank line separation.

crates/biome_markdown_parser/tests/spec_test.rs (1)

138-141: LGTM!

Test input updated with realistic indented code block content, aligning with the new test fixtures.

crates/biome_markdown_parser/src/lexer/tests.rs (1)

165-422: LGTM!

Comprehensive test coverage for the new Markdown lexer capabilities. The tests are well-structured, consistently use the assert_lex! macro, and appropriately exercise ATX headers, fenced code blocks, block quotes, emphasis, links, images, and escape sequences.

crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md (1)

1-15: LGTM!

Test fixture properly exercises fenced code block parsing with both backtick and tilde syntax, including language specifiers.

.changeset/curly-donkeys-show.md (1)

7-16: LGTM!

Clear, detailed description of the implemented features with appropriate caveats about scope limitations.

crates/biome_markdown_parser/src/token_source.rs (3)

86-89: LGTM!

Simple trivia length getter appropriately exposes the trivia list size for position tracking.


91-122: LGTM!

The blank line detection logic is correct: it properly counts consecutive newlines (ignoring whitespace between them), resets on trailing trivia or non-newline/whitespace trivia, and returns true when 2+ consecutive newlines are found.
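As a rough illustration of the rule described above (a sketch, not the actual token-source code — names here are illustrative, not Biome's API):

```rust
// Blank-line rule: a blank line exists once two or more newlines occur with
// only spaces/tabs between them; any other character resets the count.
fn has_blank_line(trivia: &str) -> bool {
    let mut newlines = 0;
    for c in trivia.chars() {
        match c {
            '\n' => {
                newlines += 1;
                if newlines >= 2 {
                    return true;
                }
            }
            ' ' | '\t' => {} // whitespace between newlines is ignored
            _ => newlines = 0, // any other trivia resets the run
        }
    }
    false
}
```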


124-133: LGTM!

Simple and correct detection of any non-trailing newline since the given position.

crates/biome_markdown_parser/src/parser.rs (1)

34-53: LGTM!

All four methods are clean delegations to the token source with clear documentation. They appropriately expose trivia tracking capabilities for block-level construct detection and paragraph boundary handling.

crates/biome_markdown_parser/src/syntax/quote.rs (2)

1-11: Module structure looks good.

Clean imports and module documentation. The module is correctly scoped as pub(crate).


13-16: Detection logic is correct.

Properly checks for > at line start using has_preceding_line_break() or at_start_of_input().

crates/biome_markdown_parser/src/syntax/fenced_code_block.rs (4)

8-11: Detection function is correct.

Properly identifies both backtick and tilde fence openings.


21-55: Well-structured fenced code block parsing.

Good handling of fence type matching per CommonMark spec. The optional closing fence is correctly handled.


62-82: Language name parsing is correct.

Sensibly checks for preceding line break to detect empty language strings and parses until newline.


86-105: Content parsing handles fence matching correctly.

Appropriately stops at matching fence type or EOF. Each token is wrapped as MD_TEXTUAL which aligns with the grammar.

crates/biome_markdown_parser/src/syntax/header.rs (4)

10-11: Good use of named constant.

MAX_HEADER_HASHES = 6 makes the CommonMark constraint clear.


57-70: Hash list parsing is straightforward.

Correctly wraps each # in MD_HASH and the sequence in MD_HASH_LIST.


126-149: Trailing hash detection uses checkpoint correctly.

The lookahead pattern with checkpoint/rewind is the right approach for this detection.
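A minimal sketch of the pattern over a plain byte cursor — biome_parser's checkpoint API is analogous but richer; this is illustrative, not the real parser code:

```rust
// Speculative lookahead: save a checkpoint, scan ahead, then rewind so the
// lookahead consumes nothing. Mirrors the trailing-hash detection above.
struct Scanner<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Scanner<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        Self { bytes, pos: 0 }
    }

    /// True if the remaining input is a run of '#' followed only by
    /// spaces/tabs up to the end of line, e.g. the "##" in "# Title ##".
    fn at_trailing_hashes(&mut self) -> bool {
        let checkpoint = self.pos; // save
        let mut saw_hash = false;
        while self.bytes.get(self.pos) == Some(&b'#') {
            self.pos += 1;
            saw_hash = true;
        }
        let ok = saw_hash
            && self.bytes[self.pos..]
                .iter()
                .take_while(|&&b| b != b'\n')
                .all(|&b| b == b' ' || b == b'\t');
        self.pos = checkpoint; // rewind: lookahead must not consume input
        ok
    }
}
```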


27-42: No issue here — the parser state is properly cleaned up.

parse_header is called only from within try_parse, which saves a checkpoint before the call. When rewind() is triggered on parse failure, it truncates the entire events list back to the checkpoint position, discarding the MD_HASH_LIST completion and all subsequent work. The function signature correctly returns Absent and relies on the caller's checkpoint/rewind mechanism, which is the standard pattern in biome_parser.

Likely an incorrect or invalid review comment.

crates/biome_markdown_parser/src/syntax.rs (6)

1-5: Good module organisation.

Clean separation of concerns with dedicated submodules for each block/inline type.


21-22: Named constant improves readability.

INDENT_CODE_BLOCK_SPACES = 4 documents the CommonMark requirement clearly.


27-28: EOF bump is required by the grammar.

Good addition with clear comment explaining the rationale.


41-77: Block dispatch logic is well-structured.

Good fallback chain with try_parse for constructs that may fail validation (thematic breaks, headers). The fallback to paragraph parsing is sensible.


90-109: Indented code block parsing is correct.

Properly checks for 4+ spaces and consumes content while indentation holds.


133-155: Inline item list parsing handles blank lines correctly.

Unlike quote.rs, the trivia position is captured before the loop (line 137), so blank-line detection works as intended.

crates/biome_markdown_parser/src/syntax/list.rs (4)

17-30: Bullet detection logic is sound.

Correctly requires line-start position for bullet markers.


52-68: Error recovery prevents infinite loops.

Good defensive programming with position tracking.


101-122: Bullet content parsing stops at newline correctly.

The has_preceding_line_break() check before parsing ensures single-line items.


124-137: Ordered list stubs are appropriately marked.

Using #[expect(dead_code)] is cleaner than #[allow(dead_code)] as it will warn when the code is eventually used.

crates/biome_markdown_parser/src/lexer/mod.rs (5)

13-14: Import refactoring is cleaner.

Importing Dispatch module and lookup_byte separately improves clarity.


182-196: Dispatch table extended correctly.

Good coverage of Markdown-specific tokens: hash, backtick, tilde, brackets, escape sequences.


199-250: Escape handling follows CommonMark spec.

All escapable ASCII punctuation characters are correctly enumerated.
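For reference, the escapable set is exactly ASCII punctuation, which collapses to four contiguous byte ranges (a sketch, not the crate's code):

```rust
// CommonMark's escapable characters are exactly the ASCII punctuation set:
// !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
fn is_escapable(byte: u8) -> bool {
    matches!(byte,
        b'!'..=b'/'   // ! " # $ % & ' ( ) * + , - . /
        | b':'..=b'@' // : ; < = > ? @
        | b'['..=b'`' // [ \ ] ^ _ `
        | b'{'..=b'~' // { | } ~
    )
}
```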


336-399: Thematic break vs emphasis disambiguation is well-handled.

The save/restore position pattern for speculative parsing is correct. Good handling of the edge case where IDT can match letters or underscore.


401-436: Hash, backtick, and tilde consumers are straightforward.

Clean implementation differentiating single vs triple tokens where applicable.

crates/biome_markdown_parser/src/syntax/inline.rs (8)

1-10: Clean module setup with clear documentation.

Good overview of supported inline elements.


15-40: Inline code parsing handles unclosed spans gracefully.

Correctly parses content until closing backtick, newline, or EOF. The optional closing backtick handling is appropriate for error tolerance.


45-78: Emphasis parsing is correct.

Good use of opener variable to ensure matching close marker.


80-116: Italic parsing mirrors emphasis structure.

Structurally similar to parse_inline_emphasis. If this pattern repeats further, consider extracting a helper, but for two cases it's fine.


118-165: Link parsing handles partial syntax gracefully.

Good error tolerance - incomplete [text] or missing (url) portions are handled without panicking.


167-223: Image parsing correctly requires ![ prefix.

The two-token lookahead (BANG then L_BRACK) is the right approach.


225-241: Nested inline avoids emphasis recursion - good design.

The comment at lines 228-230 explains the intentional limitation clearly.


243-258: Dispatcher routes correctly with textual fallback.

Complete coverage of inline constructs with sensible default to parse_textual.

xtask/codegen/markdown.ungram (5)

109-113: Grammar allows mismatched fence characters.

The grammar permits a block opened with backticks to close with tildes (or vice versa). This is fine since the grammar defines structure rather than semantics — the parser is responsible for enforcing that the closing fence matches the opening fence per CommonMark.

If this is intentional for maximum error tolerance, brilliant. Just worth confirming the parser validates matching fence types.


163-167: Optional closing tokens for error tolerance — sensible approach.

Making r_fence optional here (and similarly for emphasis, code spans, links, and images) allows the parser to produce a valid AST even when the user hasn't finished typing. Good call for editor integration.


188-195: MdInlineLink: Consider requiring '[' to pair with ']' in the required set.

Currently '[' is required whilst ']', '(', and ')' are all optional. This means a valid link node could be just [ followed by content — maximally error-tolerant, but lenient enough that stray brackets may parse as links.

If you find false positives during testing (stray [ characters being parsed as links), you might tighten this. For now, this is a design choice rather than a defect.


200-207: MdInlineImage structure looks good.

The '!' and '[' are required (correctly identifying intent to create an image), with everything else optional for error tolerance. This mirrors the link structure and is consistent.


7-19: Legend comment present — nicely done.

As per the contribution guidelines, the legend explaining grammar syntax is included. Appreciate the clarity.

"@biomejs/biome": patch
---

Add Markdown parser support for inline elements, block quotes, lists, headers, and code blocks.
Contributor

⚠️ Potential issue | 🟡 Minor

Use past tense for the summary.

Per coding guidelines, changesets should use past tense for actions taken.

🔎 Proposed fix
-Add Markdown parser support for inline elements, block quotes, lists, headers, and code blocks.
+Added Markdown parser support for inline elements, block quotes, lists, headers, and code blocks.


Contributor Author

Added support for + bullet list marker per CommonMark spec.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
crates/biome_markdown_parser/src/syntax/quote.rs (2)

36-40: Consider explaining the rationale for remapping the quote marker.

The code remaps > to MD_TEXTUAL_LITERAL, making it part of the paragraph content rather than a separate quote marker node. Whilst this works, the design choice is non-obvious. Expanding the comment to explain why this approach was chosen (e.g., to preserve the marker for formatting or due to AST design constraints) would help future maintainers.


1-5: Document Phase 1 scope limitations.

The module documentation mentions "nested block elements", but the current implementation only parses inline content within a single paragraph and doesn't support nested quotes or multiple blocks. Adding a note about Phase 1 limitations (deferred: nesting, multi-line quote consolidation, multiple blocks) would set correct expectations.

Also applies to: 18-23

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b90681 and ae590a3.

⛔ Files ignored due to path filters (1)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (1)
  • crates/biome_markdown_parser/src/syntax/quote.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
🧠 Learnings (6)
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rules must take a mutable reference to the parser as their only parameter and return a `ParsedSyntax`

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks must have a language defined for syntax highlighting

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks should be ordered as language, expect_diagnostic, options/full_options/use_options, ignore, file

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : A parser struct must implement the `Parser` trait and save the token source, parser context, and optional parser options

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
🧬 Code graph analysis (1)
crates/biome_markdown_parser/src/syntax/quote.rs (1)
crates/biome_markdown_parser/src/syntax.rs (1)
  • parse_any_inline (160-162)
🔇 Additional comments (1)
crates/biome_markdown_parser/src/syntax/quote.rs (1)

24-57: Multi-line quote handling requires clarification.

The implementation treats consecutive lines with > markers as separate block quotes rather than as a single multi-line quote. This is confirmed by the test snapshot for block_quote.md, which shows the second line's > being parsed as inline textual content within the first quote block, rather than as a quote continuation marker.

In the test input:

> This is a quote
> It continues here

The expected AST includes the second > as MD_TEXTUAL_LITERAL within the inline item list of the first quote, not as a block-level marker. This differs from CommonMark handling, where each > at line start represents a block quote marker.

Is this an intentional Phase 1 limitation for basic quote support, or should continuation lines with > be recognised as proper quote structure?
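For reference, CommonMark's continuation rule can be sketched roughly as follows (illustrative only, not the parser's code — per the spec, each marked line's leading '>' plus one optional space is stripped, so consecutive marked lines form a single quote):

```rust
// Strip a block-quote marker from a line, if present. CommonMark allows up
// to 3 leading spaces before '>'; this sketch simplifies and trims any run.
fn strip_quote_marker(line: &str) -> Option<&str> {
    let trimmed = line.trim_start_matches(' ');
    let rest = trimmed.strip_prefix('>')?;
    // One space after '>' belongs to the marker, not the content.
    Some(rest.strip_prefix(' ').unwrap_or(rest))
}
```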

@jfmcdowell jfmcdowell requested a review from dyc3 December 20, 2025 18:54
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
crates/biome_markdown_parser/src/lexer/mod.rs (1)

200-251: Consider using rustdoc comments for documentation.

The escape handling logic correctly implements CommonMark's ASCII punctuation escape rules. However, the function uses standard // comments rather than /// rustdoc comments. Per coding guidelines, inline rustdoc documentation should be updated when adding new features.

🔎 Suggested documentation style
-    /// Consume a backslash escape sequence.
-    /// Per CommonMark spec, a backslash before ASCII punctuation makes it literal.
-    /// Escapable: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
+    /// Consume a backslash escape sequence.
+    ///
+    /// Per CommonMark spec, a backslash before ASCII punctuation makes it literal.
+    /// Escapable: `!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~`
+    ///
+    /// If the backslash is not followed by an escapable character, only the backslash
+    /// is consumed and returned as a textual literal.
     fn consume_escape(&mut self) -> MarkdownSyntaxKind {

Based on learnings, inline rustdoc documentation should be maintained for features in Rust crates.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e179019 and e75b98f.

⛔ Files ignored due to path filters (3)
  • crates/biome_markdown_factory/src/generated/syntax_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_syntax/src/generated/kind.rs is excluded by !**/generated/**, !**/generated/** and included by **
📒 Files selected for processing (6)
  • crates/biome_markdown_parser/src/lexer/mod.rs (4 hunks)
  • crates/biome_markdown_parser/src/lexer/tests.rs (2 hunks)
  • crates/biome_markdown_parser/src/syntax/list.rs (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md (1 hunks)
  • xtask/codegen/markdown.ungram (5 hunks)
  • xtask/codegen/src/markdown_kinds_src.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
🧰 Additional context used
📓 Path-based instructions (1)
crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
🧠 Learnings (13)
📚 Learning: 2025-12-04T13:29:49.287Z
Learnt from: dyc3
Repo: biomejs/biome PR: 8291
File: crates/biome_html_formatter/tests/specs/prettier/vue/html-vue/elastic-header.html:10-10
Timestamp: 2025-12-04T13:29:49.287Z
Learning: Files under `crates/biome_html_formatter/tests/specs/prettier` are test fixtures synced from Prettier and should not receive detailed code quality reviews (e.g., HTTP vs HTTPS, formatting suggestions, etc.). These files are test data meant to validate formatter behavior and should be preserved as-is.

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/language_kind.rs : Add a new variant to `LanguageKind` enum in `language_kind.rs` file and implement all methods for the new language variant

Applied to files:

  • xtask/codegen/src/markdown_kinds_src.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/language_kind.rs : Add a new language prefix to the `LANGUAGE_PREFIXES` constant in `language_kind.rs` file

Applied to files:

  • xtask/codegen/src/markdown_kinds_src.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
  • xtask/codegen/markdown.ungram
📚 Learning: 2025-11-24T18:05:27.810Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:27.810Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : When formatting AST nodes, use mandatory tokens from the AST instead of hardcoding token strings (e.g., use `node.l_paren_token().format()` instead of `token("(")`)

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
  • xtask/codegen/markdown.ungram
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-09T12:47:46.298Z
Learnt from: ematipico
Repo: biomejs/biome PR: 8031
File: crates/biome_html_parser/src/syntax/svelte.rs:140-147
Timestamp: 2025-11-09T12:47:46.298Z
Learning: In the Biome HTML parser, `expect` and `expect_with_context` consume the current token and then lex the next token. The context parameter in `expect_with_context` controls how the next token (after the consumed one) is lexed, not the current token being consumed. For example, in Svelte parsing, after `bump_with_context(T!["{:"], HtmlLexContext::Svelte)`, the next token is already lexed in the Svelte context, so `expect(T![else])` is sufficient unless the token after `else` also needs to be lexed in a specific context.

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
  • xtask/codegen/markdown.ungram
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements

Applied to files:

  • xtask/codegen/markdown.ungram
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to .changeset/*.md : Write changesets that are concise (1-3 sentences), user-focused, use past tense for actions taken and present tense for Biome behavior, include code examples for rules, and end sentences with periods

Applied to files:

  • xtask/codegen/markdown.ungram
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Preserve valid tree structure during error recovery by marking invalid parts as 'bogus' nodes rather than invalidating parent nodes

Applied to files:

  • xtask/codegen/markdown.ungram
🧬 Code graph analysis (1)
crates/biome_markdown_parser/src/lexer/mod.rs (2)
crates/biome_unicode_table/src/lib.rs (1)
  • lookup_byte (106-109)
crates/biome_yaml_parser/src/lexer/mod.rs (1)
  • start_position (872-874)
🔇 Additional comments (13)
xtask/codegen/src/markdown_kinds_src.rs (1)

16-16: LGTM! Token and node additions align with CommonMark support.

The PLUS token enables + bullet lists, TRIPLE_TILDE supports tilde-fenced code blocks (~~~), and MD_BOGUS_BULLET provides error recovery for malformed bullets. All additions are consistent with the PR objectives.

Also applies to: 21-21, 43-43

xtask/codegen/markdown.ungram (6)

105-113: Approve tilde fence support; verify content type is addressed.

The addition of tilde-fenced code blocks (~~~) aligns with CommonMark. However, the content type issue flagged separately must be resolved.


115-115: LGTM! Simplified grammar.

Removing the label simplifies the grammar without losing functionality.


134-134: LGTM! Plus bullet support added.

The + bullet marker aligns with CommonMark and is consistent with the PLUS token addition.


144-144: Reasonable to make hard line optional.

Not all paragraphs end with explicit hard line breaks, so making this optional is sensible.


193-202: LGTM! Simplified inline image structure.

Removing intermediate nodes (MdInlineImageAlt, MdInlineImageSource, MdInlineImageLink) and using MdInlineItemList directly simplifies the grammar whilst maintaining the necessary structure for image alt text and source.


crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md (1)

1-9: Test fixture looks good.

The bullet list test data appropriately covers the three common Markdown bullet markers (-, *, +) with proper formatting and blank line separation. This should provide good coverage for bullet list parsing validation.

crates/biome_markdown_parser/src/lexer/mod.rs (5)

1-14: Import structure looks solid.

The updated imports correctly bring in the dispatch mechanism and lookup_byte function needed for the expanded token routing. Documentation update appropriately reflects the Markdown-specific implementation.


180-198: Token dispatch routing is well-structured.

The dispatch logic properly routes each token type to its appropriate handler, with simple single-character tokens using the consume_byte helper and complex constructs (thematic breaks, escapes) delegated to dedicated parsing functions.


402-437: Token consumption logic is sound.

The three token consumers (hash, backtick, tilde) correctly handle their respective characters. The backtick and tilde handlers appropriately distinguish between single and triple variants using lookahead, which aligns with CommonMark's fenced code block syntax.


503-507: Helper function is fit for purpose.

The consume_byte helper cleanly handles single-byte token consumption and is actively used throughout the dispatch logic.


382-390: No action required—the implementation correctly reflects the CommonMark specification.

The code tokenises ** and __ as DOUBLE_STAR and DOUBLE_UNDERSCORE respectively for strong emphasis, whilst -- is correctly tokenised as MINUS. This is not an inconsistency; it reflects the Markdown spec: dashes are not emphasis markers; only asterisks and underscores are. The comment at line 387 is accurate.

@ematipico
Copy link
Member

Thank you @jfmcdowell

Please review the implementation of the parser. While I understand you used AI assistance, human intervention is always required.

Here's an example: check the emitted CST in the snapshots. For each letter inside a paragraph, we create a node. This implementation will produce an enormous CST for large documents. That's not acceptable for us or for the users. Please review the parsing logic so that the emitted CST groups text into a single node.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
crates/biome_markdown_parser/src/lexer/tests.rs (1)

166-184: Consider consolidating hash tests or testing the full header use case.

The hash_token and multiple_hashes tests verify that hashes are emitted individually, which is correct for ATX header parsing. However, it would be valuable to add a test covering a complete header scenario (e.g., "# Title") to ensure the combination works as expected in practice.

crates/biome_markdown_parser/src/lexer/mod.rs (1)

409-437: Code fences support 3+ characters per CommonMark spec—consider extending beyond the current hard-coded triple.

CommonMark requires code fences to be at least three consecutive backticks or tildes, and using four backticks to wrap three is perfectly valid for nesting. The current implementation handles exactly 1 or 3 characters; supporting variable lengths (4, 5, etc.) would allow proper code block nesting without resorting to tildes.

For Phase 1 this is acceptable, but worth addressing in a follow-up.
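The variable-length fence handling suggested here can be sketched in a few lines. This is a minimal sketch under the assumptions of the comment above, not Biome's lexer API: it simply counts the run of identical marker bytes and requires at least three, so a four-backtick fence can wrap a three-backtick block.

```rust
// Count the run of identical fence-marker bytes (` or ~) at the start
// of `input`. CommonMark requires at least three for an opening fence.
fn fence_length(input: &[u8], marker: u8) -> usize {
    input.iter().take_while(|&&b| b == marker).count()
}

fn is_opening_fence(input: &[u8], marker: u8) -> bool {
    fence_length(input, marker) >= 3
}

fn main() {
    assert_eq!(fence_length(b"````rust", b'`'), 4);
    assert!(is_opening_fence(b"```", b'`'));
    assert!(!is_opening_fence(b"``x", b'`'));
    // A 4-backtick opener nests a 3-backtick block: the closing fence
    // must be at least as long as the opener.
    assert!(fence_length(b"`````", b'`') >= fence_length(b"````", b'`'));
}
```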

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e75b98f and a7c2a67.

⛔ Files ignored due to path filters (7)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/header.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (2)
  • crates/biome_markdown_parser/src/lexer/mod.rs (5 hunks)
  • crates/biome_markdown_parser/src/lexer/tests.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
🧠 Learnings (18)
📓 Common learnings
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/tests/language.rs : Implement `TestFormatLanguage` trait in `tests/language.rs` for the formatter's test language

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:12.048Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_service/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:12.048Z
Learning: Applies to crates/biome_service/src/workspace/watcher.tests.rs : Implement watcher tests for workspace methods in watcher.tests.rs and end-to-end tests in LSP tests

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/tests/spec_tests.rs : Use the `tests_macros::gen_tests!` macro in `spec_tests.rs` to generate test functions for each specification file matching the pattern `tests/specs/<language>/**/*.<ext>`

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:05:27.810Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:27.810Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : When formatting AST nodes, use mandatory tokens from the AST instead of hardcoding token strings (e.g., use `node.l_paren_token().format()` instead of `token("(")`)

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-04T13:29:49.287Z
Learnt from: dyc3
Repo: biomejs/biome PR: 8291
File: crates/biome_html_formatter/tests/specs/prettier/vue/html-vue/elastic-header.html:10-10
Timestamp: 2025-12-04T13:29:49.287Z
Learning: Files under `crates/biome_html_formatter/tests/specs/prettier` are test fixtures synced from Prettier and should not receive detailed code quality reviews (e.g., HTTP vs HTTPS, formatting suggestions, etc.). These files are test data meant to validate formatter behavior and should be preserved as-is.

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Create test infrastructure with `tests/specs` folder structure and `spec_test.rs`, `spec_tests.rs`, and `language.rs` files in test directories

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Avoid string allocations by comparing against `&str` or using `TokenText`

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-09T12:47:46.298Z
Learnt from: ematipico
Repo: biomejs/biome PR: 8031
File: crates/biome_html_parser/src/syntax/svelte.rs:140-147
Timestamp: 2025-11-09T12:47:46.298Z
Learning: In the Biome HTML parser, `expect` and `expect_with_context` consume the current token and then lex the next token. The context parameter in `expect_with_context` controls how the next token (after the consumed one) is lexed, not the current token being consumed. For example, in Svelte parsing, after `bump_with_context(T!["{:"], HtmlLexContext::Svelte)`, the next token is already lexed in the Svelte context, so `expect(T![else])` is sufficient unless the token after `else` also needs to be lexed in a specific context.

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:05:27.810Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:27.810Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : For tokens that are not mandatory, use helper functions instead of hardcoding

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : A parser struct must implement the `Parser` trait and save the token source, parser context, and optional parser options

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
🧬 Code graph analysis (1)
crates/biome_markdown_parser/src/lexer/mod.rs (1)
crates/biome_unicode_table/src/lib.rs (1)
  • lookup_byte (106-109)
🔇 Additional comments (7)
crates/biome_markdown_parser/src/lexer/tests.rs (3)

364-392: LGTM!

The escape sequence tests correctly verify CommonMark escape handling: punctuation escapes are grouped with their backslash, while non-punctuation escapes leave the backslash as a standalone token.


232-240: Good coverage for textual grouping.

These tests directly verify that consecutive text characters are batched into single MD_TEXTUAL_LITERAL tokens rather than per-character nodes — precisely what was requested in the PR feedback regarding CST size.

Also applies to: 262-273, 299-308


143-164: LGTM!

Thematic break tests cover the key CommonMark variations including spaced markers, confirming that internal whitespace is included in the literal token.

crates/biome_markdown_parser/src/lexer/mod.rs (4)

200-251: LGTM!

The escape handling correctly implements CommonMark's backslash escape rules with the full set of escapable ASCII punctuation characters.


493-540: Core fix for CST bloat — well implemented.

This grouping logic directly addresses the reviewer concern about per-letter nodes. By batching consecutive non-special characters into single MD_TEXTUAL_LITERAL tokens, the CST size becomes proportional to the number of text runs rather than the number of characters.

The dispatch-based boundary detection is thorough and correctly handles multi-byte UTF-8 via advance_char_unchecked().


13-14: LGTM!

The dispatch-based token routing is clean and efficient. Using lookup_byte for character classification keeps the lexer fast while supporting the full Markdown token set.

Also applies to: 180-198


345-352: LGTM!

Good defensive check — IDT dispatch covers both letters and underscore, so explicitly filtering for b'_' prevents letters from being mishandled as emphasis markers.

Comment on lines 381 to 390
// Check for double emphasis markers (**, __, --)
if self.peek_byte() == Some(start_char) {
self.advance(2);
return match start_char {
b'*' => DOUBLE_STAR,
b'_' => DOUBLE_UNDERSCORE,
b'-' => MINUS, // No DOUBLE_MINUS in grammar, use MINUS
_ => unreachable!(),
};
}
Contributor


⚠️ Potential issue | 🟡 Minor

Potential inconsistency: -- returns MINUS with length 2.

When encountering --, this advances by 2 bytes but returns MINUS, which is typically a single-character token. This creates an inconsistency where the same token kind can have different lengths (1 or 2), which may confuse downstream parsing or source reconstruction.

Consider either:

  1. Adding a DOUBLE_MINUS token to the grammar, or
  2. Returning two separate MINUS tokens by only advancing 1 byte here.
🔎 Option 2: Return single MINUS for consistency
-        // Check for double emphasis markers (**, __, --)
-        if self.peek_byte() == Some(start_char) {
-            self.advance(2);
-            return match start_char {
-                b'*' => DOUBLE_STAR,
-                b'_' => DOUBLE_UNDERSCORE,
-                b'-' => MINUS, // No DOUBLE_MINUS in grammar, use MINUS
-                _ => unreachable!(),
-            };
-        }
+        // Check for double emphasis markers (**, __)
+        // Note: -- is not doubled; let it fall through to single MINUS
+        if self.peek_byte() == Some(start_char) && start_char != b'-' {
+            self.advance(2);
+            return match start_char {
+                b'*' => DOUBLE_STAR,
+                b'_' => DOUBLE_UNDERSCORE,
+                _ => unreachable!(),
+            };
+        }
🤖 Prompt for AI Agents
In crates/biome_markdown_parser/src/lexer/mod.rs around lines 381–390, the lexer
treats `--` by advancing 2 bytes but returns a single-character MINUS token,
causing token-length inconsistency; change the logic so that when start_char ==
b'-' you only advance(1) and return MINUS (while keeping advance(2) for `*` and
`_`), ensuring `--` will be tokenized as two separate MINUS tokens; update or
add lexer tests to cover `-` vs `--` cases and adjust any downstream assumptions
about MINUS token length if necessary.

@jfmcdowell
Contributor Author

Thank you @jfmcdowell

Please review the implementation of the parser. While I understand you used AI assistance, human intervention is always required.

Here's an example: check the emitted CST in the snapshots. For each letter inside a paragraph, we create a node. This implementation will create an excessively large CST for large documents. That's not acceptable for us and the users. Please review the parsing logic so that the emitted CST groups text into a single node.

Fixed the issue where each character in a paragraph created a separate MdTextual node. The lexer's
consume_textual() now groups consecutive non-special characters into a single MD_TEXTUAL_LITERAL token.
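The batching idea can be sketched as follows. This is a minimal illustration of the technique, not Biome's actual `consume_textual()`: the byte set treated as "special" and the function names are hypothetical.

```rust
// Sketch of textual batching: consecutive non-special bytes are consumed
// as one text token instead of one token per character.
fn is_special(b: u8) -> bool {
    matches!(b, b'*' | b'_' | b'`' | b'#' | b'[' | b']' | b'\\' | b'\n')
}

/// Returns the byte length of a single textual token starting at `pos`.
fn textual_len(src: &[u8], pos: usize) -> usize {
    src[pos..].iter().take_while(|&&b| !is_special(b)).count()
}

fn main() {
    let src = b"hello world*em*";
    // "hello world" becomes one 11-byte token, not 11 one-byte tokens;
    // the space stays inside the text run.
    assert_eq!(textual_len(src, 0), 11);
}
```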

Member

@ematipico ematipico left a comment


That's better, but I think we need to change things.

Compared to other languages, markdown is purely markup. This means that trivia don't exist, except in some rare cases.

The markdown lexer needs to treat whitespace and newlines as part of the language:

  • whitespace is part of the text
  • newlines become relevant during parsing

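The model proposed above can be sketched as a toy lexer. This is a hedged illustration of the suggested design, with hypothetical names, not Biome's implementation: whitespace stays inside text tokens, while newlines are emitted as standalone tokens for the parser to consume.

```rust
// Toy lexer for the proposed model: no trivia; whitespace lives inside
// Text tokens, and each newline becomes its own token.
#[derive(Debug, PartialEq)]
enum Tok<'a> {
    Text(&'a str),
    Newline,
}

fn lex(src: &str) -> Vec<Tok<'_>> {
    let mut out = Vec::new();
    for (i, line) in src.split('\n').enumerate() {
        if i > 0 {
            out.push(Tok::Newline);
        }
        if !line.is_empty() {
            out.push(Tok::Text(line));
        }
    }
    out
}

fn main() {
    let toks = lex("foo bar\nbaz");
    // "foo bar" keeps its interior space; the newline is a real token.
    assert_eq!(toks, vec![Tok::Text("foo bar"), Tok::Newline, Tok::Text("baz")]);
}
```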
@jfmcdowell jfmcdowell force-pushed the feat/markdown-support-next branch from a7c2a67 to 1102636 Compare December 21, 2025 12:04
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
crates/biome_markdown_parser/src/lexer/mod.rs (2)

211-243: Consider refactoring the escapable character match for maintainability.

The 33-line match against escapable ASCII punctuation is difficult to read and maintain. Consider extracting this into a helper method or using a lookup approach.

🔎 Example refactor using a helper function

Add a helper method:

/// Returns true if the byte is an ASCII punctuation character that can be escaped
/// per the CommonMark spec: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
#[inline]
fn is_escapable_ascii_punctuation(byte: u8) -> bool {
    matches!(
        byte,
        b'!' | b'"' | b'#' | b'$' | b'%' | b'&' | b'\'' 
        | b'(' | b')' | b'*' | b'+' | b',' | b'-' | b'.' | b'/'
        | b':' | b';' | b'<' | b'=' | b'>' | b'?' | b'@'
        | b'[' | b'\\' | b']' | b'^' | b'_' | b'`'
        | b'{' | b'|' | b'}' | b'~'
    )
}

Then simplify consume_escape:

     // Check if next character is escapable ASCII punctuation
-    if let Some(next) = self.current_byte()
-        && matches!(
-            next,
-            b'!' | b'"'
-                | b'#'
-                | b'$'
-                | b'%'
-                | b'&'
-                | b'\''
-                | b'('
-                | b')'
-                | b'*'
-                | b'+'
-                | b','
-                | b'-'
-                | b'.'
-                | b'/'
-                | b':'
-                | b';'
-                | b'<'
-                | b'='
-                | b'>'
-                | b'?'
-                | b'@'
-                | b'['
-                | b'\\'
-                | b']'
-                | b'^'
-                | b'_'
-                | b'`'
-                | b'{'
-                | b'|'
-                | b'}'
-                | b'~'
-        )
-    {
+    if let Some(next) = self.current_byte() && Self::is_escapable_ascii_punctuation(next) {
         // Consume the escaped character too
         self.advance(1);
     }

345-352: IDT dispatch handling could be more maintainable.

The IDT dispatch matches both letters (A-Z, a-z) and underscore, but only underscore is a Markdown emphasis marker. This special-case handling is duplicated in consume_textual (lines 526-531), making the pattern less obvious and potentially error-prone during future maintenance.

Consider adding a comment explaining why IDT requires special handling, or extracting the underscore check into a helper method for clarity.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a7c2a67 and 1102636.

⛔ Files ignored due to path filters (7)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/header.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (2)
  • crates/biome_markdown_parser/src/lexer/mod.rs (5 hunks)
  • crates/biome_markdown_parser/src/lexer/tests.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/biome_markdown_parser/src/lexer/tests.rs
🧰 Additional context used
📓 Path-based instructions (1)
crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
🧠 Learnings (11)
📓 Common learnings
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:05:27.810Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:27.810Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : When formatting AST nodes, use mandatory tokens from the AST instead of hardcoding token strings (e.g., use `node.l_paren_token().format()` instead of `token("(")`)

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Avoid string allocations by comparing against `&str` or using `TokenText`

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-09T12:47:46.298Z
Learnt from: ematipico
Repo: biomejs/biome PR: 8031
File: crates/biome_html_parser/src/syntax/svelte.rs:140-147
Timestamp: 2025-11-09T12:47:46.298Z
Learning: In the Biome HTML parser, `expect` and `expect_with_context` consume the current token and then lex the next token. The context parameter in `expect_with_context` controls how the next token (after the consumed one) is lexed, not the current token being consumed. For example, in Svelte parsing, after `bump_with_context(T!["{:"], HtmlLexContext::Svelte)`, the next token is already lexed in the Svelte context, so `expect(T![else])` is sufficient unless the token after `else` also needs to be lexed in a specific context.

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
🧬 Code graph analysis (1)
crates/biome_markdown_parser/src/lexer/mod.rs (1)
crates/biome_unicode_table/src/lib.rs (1)
  • lookup_byte (106-109)
🔇 Additional comments (7)
crates/biome_markdown_parser/src/lexer/mod.rs (7)

1-14: Documentation and imports updated appropriately.

The module documentation now accurately describes the Markdown lexer, and the import changes support the dispatch-based token handling.


180-198: Dispatch table correctly routes Markdown tokens.

The expanded dispatcher appropriately handles all Markdown syntax characters, routing them to dedicated consumption methods.


381-390: The -- token-length inconsistency has been resolved.

The past review concern about -- advancing 2 bytes whilst returning a single-character MINUS token has been addressed. Line 383 now explicitly excludes - from double-marker treatment, so each hyphen in -- is lexed separately as an individual MINUS token with consistent length.


493-540: Textual batching successfully addresses CST bloat concern.

This implementation groups consecutive textual characters into a single MD_TEXTUAL_LITERAL token, directly addressing the issue raised in the PR objectives where each letter was previously emitted as a separate node. The dispatch-based boundary detection correctly identifies when to stop batching, maintaining accurate parsing whilst improving efficiency.


258-262: Inconsistency with AI-generated summary.

The AI summary states "Removed the dead_code attribute from the eat_byte method", but the #[expect(dead_code)] attribute is still present at line 258. If this method is now used, the attribute should be removed; otherwise, the summary is incorrect.


409-422: Convert comment to proper rustdoc format.

The comment at line 409 should use /// for rustdoc instead of // to be included in generated documentation.

As per coding guidelines, update inline rustdoc documentation when adding new features.

⛔ Skipped due to learnings
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Lines prefixed with `#` in rule documentation code examples will be hidden from output
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks should be ordered as language, expect_diagnostic, options/full_options/use_options, ignore, file
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : The first paragraph of rule documentation must be a single line describing what the rule does
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Use `options` code block property for rule-specific configuration snippets in documentation
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks must have a language defined for syntax highlighting
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Invalid code examples in rule documentation must be marked with `expect_diagnostic` code block property
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation must include `## Options` section if the rule has options

493-494: Convert comment to proper rustdoc format.

The comments at lines 493-494 should use /// for rustdoc instead of // to be included in generated documentation.

As per coding guidelines, update inline rustdoc documentation when adding new features.

⛔ Skipped due to learnings

Comment on lines 200 to 202
```rust
/// Consume a backslash escape sequence.
/// Per CommonMark spec, a backslash before ASCII punctuation makes it literal.
/// Escapable: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
```
Contributor

🛠️ Refactor suggestion | 🟠 Major

Add rustdoc documentation.

This method lacks documentation. Per coding guidelines, rustdoc should be added when introducing new features. Consider documenting the CommonMark escape behaviour and which characters are escapable.

As per coding guidelines, update inline rustdoc documentation when adding new features.


Comment on lines 337 to 339
```rust
/// Consumes thematic break literal or returns emphasis marker tokens.
/// Called when we see *, -, or _.
fn consume_thematic_break_or_emphasis(&mut self, dispatched: Dispatch) -> MarkdownSyntaxKind {
```
Contributor

🛠️ Refactor suggestion | 🟠 Major

Add rustdoc documentation.

This method lacks documentation explaining when it returns a thematic break versus emphasis tokens, and the special handling for the IDT dispatch case.

As per coding guidelines, update inline rustdoc documentation when adding new features.
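For context, the distinction this comment asks to document can be sketched simply (an illustrative simplification of CommonMark §4.1, not the PR's lexer code; it ignores the leading-indentation limit): a line consisting solely of three or more identical `*`, `-`, or `_` markers, optionally interleaved with spaces or tabs, is a thematic break; otherwise the markers are emphasis candidates.

```rust
// Illustrative only: simplified CommonMark §4.1 thematic break test.
// Ignores the "at most 3 spaces of indentation" rule from the spec.
fn is_thematic_break(line: &str) -> bool {
    let mut marker = None;
    let mut count = 0;
    for c in line.chars() {
        match c {
            ' ' | '\t' => {} // interior whitespace is allowed
            '*' | '-' | '_' => {
                // all markers must be the same character
                if *marker.get_or_insert(c) != c {
                    return false;
                }
                count += 1;
            }
            _ => return false, // any other character disqualifies the line
        }
    }
    count >= 3
}

fn main() {
    assert!(is_thematic_break("***"));
    assert!(is_thematic_break("- - -"));
    assert!(!is_thematic_break("**strong**"));
    assert!(!is_thematic_break("--"));
}
```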

Comment on lines 402 to 407
```rust
/// Consume a single hash character for ATX headers
fn consume_hash(&mut self) -> MarkdownSyntaxKind {
    self.assert_at_char_boundary();
    self.advance(1);
    HASH
}
```
Contributor

🛠️ Refactor suggestion | 🟠 Major

Add rustdoc documentation.

Brief documentation explaining that this consumes a single hash character for ATX headers would improve code clarity.

As per coding guidelines, update inline rustdoc documentation when adding new features.


Comment on lines 424 to 437
```rust
/// Consume tilde(s) - either single for other uses or triple for fenced code blocks
fn consume_tilde(&mut self) -> MarkdownSyntaxKind {
    self.assert_at_char_boundary();

    // Check for triple tilde
    if self.peek_byte() == Some(b'~') && self.byte_at(2) == Some(b'~') {
        self.advance(3);
        return TRIPLE_TILDE;
    }

    // Single tilde
    self.advance(1);
    TILDE
}
```
Contributor

🛠️ Refactor suggestion | 🟠 Major

Convert comment to proper rustdoc format.

The comment at line 424 should use /// for rustdoc instead of // to be included in generated documentation.

As per coding guidelines, update inline rustdoc documentation when adding new features.


@jfmcdowell jfmcdowell requested a review from ematipico December 21, 2025 16:40
@jfmcdowell jfmcdowell force-pushed the feat/markdown-support-next branch from 7d3d2b9 to 2634098 Compare December 30, 2025 01:36
@codspeed-hq

codspeed-hq bot commented Dec 30, 2025

Merging this PR will not alter performance

✅ 58 untouched benchmarks
⏩ 95 skipped benchmarks¹


Comparing jfmcdowell:feat/markdown-support-next (eae8e0f) with next (375792e)

Open in CodSpeed

Footnotes

  1. 95 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@jfmcdowell
Contributor Author

@ematipico , holding here to make sure this is directionally correct before finishing this out.

Comment on lines 47 to 48
````rust
let is_backtick_fence = rest.starts_with("```");
let is_tilde_fence = rest.starts_with("~~~");
````
Contributor

Little-known feature about markdown: you can use more than 3 backticks to start a code block, so you can have 3 backticks inside the code block.

From "4.5 Fenced code blocks" (emphasis my own):

A code fence is a sequence of at least three consecutive backtick characters (`) or tildes (~).

Not sure if you want to resolve that in this PR or a new one.

Contributor Author

Thanks for flagging this. The implementation does handle 3+ backticks correctly, but looking at the code again I can see why it wasn't clear.

You can verify the behavior with CommonMark spec examples 124-127:
Edit debug_single_example() to set example_num = 124, then run:

cargo test -p biome_markdown_parser --test commonmark_spec debug_single_example -- --nocapture --ignored

I've refactored the code to make the rule self-documenting. The previous starts_with("```") approach worked but wasn't obvious at a glance. Now it explicitly counts fence characters and validates against a MIN_FENCE_LENGTH constant with CommonMark 4.5 cited in the docs:

```rust
/// Minimum number of fence characters required per CommonMark 4.5.
const MIN_FENCE_LENGTH: usize = 3;

/// Per CommonMark 4.5: "A code fence is a sequence of at least three
/// consecutive backtick characters (`) or tildes (~)."
fn detect_fence(s: &str) -> Option<(char, usize)> { ... }
```

Appreciate the review — this is much clearer now. Commit: fb5b4ef
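For reference, the counting rule is small enough to sketch in full. This is a hypothetical standalone version of the idea; the PR's actual body may differ:

```rust
/// Minimum number of fence characters required per CommonMark 4.5.
const MIN_FENCE_LENGTH: usize = 3;

/// Hypothetical sketch: return the fence character and run length if the
/// string starts with at least MIN_FENCE_LENGTH backticks or tildes.
fn detect_fence(s: &str) -> Option<(char, usize)> {
    let first = s.chars().next()?;
    if first != '`' && first != '~' {
        return None;
    }
    // Count the run of identical fence characters at the start.
    let len = s.chars().take_while(|&c| c == first).count();
    (len >= MIN_FENCE_LENGTH).then_some((first, len))
}

fn main() {
    assert_eq!(detect_fence("~~~"), Some(('~', 3)));
    assert_eq!(detect_fence("~~~~ info"), Some(('~', 4)));
    assert_eq!(detect_fence("~~x"), None); // only two fence characters
}
```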

Comment on lines 106 to 121
```rust
/// Parse a list of hash tokens as MdHashList containing MdHash nodes.
/// Returns the number of hashes parsed.
fn parse_hash_list(p: &mut MarkdownParser) -> usize {
    let m = p.start();
    let mut count = 0;

    while p.at(T![#]) {
        let hash_m = p.start();
        p.bump(T![#]);
        hash_m.complete(p, MD_HASH);
        count += 1;
    }

    m.complete(p, MD_HASH_LIST);
    count
}
```
Contributor

This feels like it would be better handled in the lexer. The way it's currently set up, we are going to be allocating a bunch of nodes for a header (for an h6, it would be 6 MdHash + 1 MdHashList node). Plus, this implementation looks like it could allow whitespace in between the hashes (which is not allowed).

Instead, it would be way easier to emit one token for the entire header, and then consumers can tell which level the header is by checking its length.

BTW, this is not how we parse lists in our parsers. We usually have a struct that implements the ParseNodeList trait.

Contributor Author

@jfmcdowell jfmcdowell Jan 19, 2026

Thanks for the feedback. I've reworked the header parsing:

Node allocations: Reduced from N+1 nodes to just 2 nodes (MdHash + MdHashList) regardless of heading level.
The lexer already emits consecutive # as a single token - now we use p.cur_text().len() to get the heading level.

Whitespace between hashes: Not possible - the lexer only groups consecutive # into one token. Whitespace breaks the sequence, so # # # foo correctly becomes <h1># # foo</h1>.

Commit: bf3a261
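With the consolidated token, the heading level falls out of the token text's length. A hypothetical sketch of that check (`heading_level` is illustrative, not the parser's actual helper):

```rust
// Illustrative helper: derive an ATX heading level from the consolidated
// HASH token's text. CommonMark allows levels 1-6; longer runs are not headings.
fn heading_level(hash_token_text: &str) -> Option<u8> {
    let n = hash_token_text.len();
    (1..=6).contains(&n).then_some(n as u8)
}

fn main() {
    assert_eq!(heading_level("#"), Some(1));
    assert_eq!(heading_level("######"), Some(6));
    assert_eq!(heading_level("#######"), None); // 7 hashes: paragraph text
}
```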

Comment on lines 1 to 11
```rust
//! HTML block parsing for Markdown (CommonMark §4.6).
//!
//! Per CommonMark §4.6, there are 7 types of HTML blocks:
//!
//! 1. `<script`, `<pre`, `<style`, `<textarea` - ends at closing tag
//! 2. `<!--` - ends at `-->`
//! 3. `<?` - ends at `?>`
//! 4. `<!` + uppercase letter - ends at `>`
//! 5. `<![CDATA[` - ends at `]]>`
//! 6. Block-level HTML tags (div, p, table, etc.) - ends at blank line
//! 7. Other tags - ends at blank line
```
Contributor

Not entirely sure how I feel about this. On one hand, it is literally defined like this in the commonmark spec, but on the other hand, we already have an html parser.

Member

I believe we should treat the HTML as a special node, but then everything as text. Then we can use snippets capabilities via workspace. Essentially, same logic we have for <script> and <style> contents in our HTML parser.

Contributor Author

@jfmcdowell jfmcdowell Jan 19, 2026

Thanks for the feedback. I simplified HTML block parsing per your suggestion. HTML content is now captured as raw text rather than implementing all 7 CommonMark 4.6 termination rules. Full HTML parsing can be added later via workspace snippets integration.

Trade-off: All HTML blocks terminate at blank lines (~70% CommonMark HTML blocks compliance).

Commit: cb3888d
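Under the simplified rule, block termination reduces to a blank-line scan. A rough sketch of that behavior (illustrative only, not the PR's code):

```rust
// Illustrative only: with the simplified rule, an HTML block beginning at
// `start` is raw text that ends at the first blank line (or end of input).
fn html_block_end(lines: &[&str], start: usize) -> usize {
    lines[start..]
        .iter()
        .position(|line| line.trim().is_empty())
        .map_or(lines.len(), |offset| start + offset)
}

fn main() {
    let doc = ["<div>", "  <p>hi</p>", "", "a paragraph"];
    assert_eq!(html_block_end(&doc, 0), 2); // block covers lines 0 and 1
    assert_eq!(html_block_end(&doc, 3), 4); // no blank line: runs to the end
}
```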

Comment on lines 118 to 123
```rust
fn parse_quote_block_list(p: &mut MarkdownParser) {
    let m = p.start();
    let mut first_line = true;
    let depth = p.state().block_quote_depth;
    let mut last_block_was_paragraph = false;
```

Contributor

This should be a ParseNodeList impl

Contributor Author

@jfmcdowell jfmcdowell Jan 20, 2026

Thanks for the feedback. I refactored block parsing to return ParsedSyntax instead of ParsedBlockKind enum, enabling proper ParseNodeList implementation for QuoteBlockList. Commit: a3af0c2

I also extended the pattern to root-level document parsing with DocumentBlockList. Commit: 753adcc

The markdown parser now uses ParseNodeList consistently for all major list structures:

  • DocumentBlockList (root document blocks)
  • QuoteBlockList (block quote content)
  • BulletList / OrderedList (list items)

Comment on lines +407 to +414
```rust
/// Check if text at current position matches a valid entity reference pattern.
/// Returns the length of the entity if valid, None otherwise.
///
/// Patterns per CommonMark §6.2:
/// - Named: `&name;` where name is 2-31 alphanumeric chars starting with letter
/// - Decimal: `&#digits;` where digits is 1-7 decimal digits
/// - Hex: `&#xhex;` or `&#Xhex;` where hex is 1-6 hex digits
fn match_entity_reference(&self) -> Option<usize> {
```
Contributor

Not sure if we care about actually parsing these? We don't care about them in the HTML parser, we just treat them as regular text.

Contributor Author

@jfmcdowell jfmcdowell Jan 20, 2026

Good point, they were added for CommonMark 6.2 compliance but happy to remove if treating them as plain text is the preferred approach. Would simplify the lexer.
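For what it's worth, the three patterns from the doc comment can be matched without a regex. A hypothetical standalone sketch (`match_entity` is illustrative, not the lexer's actual method):

```rust
/// Illustrative sketch of the §6.2 patterns described in the doc comment:
/// named `&name;` (2-31 alphanumerics, starting with a letter),
/// decimal `&#digits;` (1-7 digits), hex `&#xhex;`/`&#Xhex;` (1-6 hex digits).
/// Returns the total matched length, including `&` and `;`.
fn match_entity(s: &str) -> Option<usize> {
    let rest = s.strip_prefix('&')?;
    if let Some(num) = rest.strip_prefix('#') {
        let (is_hex, digits) = match num.strip_prefix(|c| c == 'x' || c == 'X') {
            Some(hex) => (true, hex),
            None => (false, num),
        };
        let n = digits
            .chars()
            .take_while(|c| if is_hex { c.is_ascii_hexdigit() } else { c.is_ascii_digit() })
            .count();
        let max = if is_hex { 6 } else { 7 };
        if (1..=max).contains(&n) && digits[n..].starts_with(';') {
            // '&' + '#' + optional 'x'/'X' + digits + ';'
            return Some(2 + usize::from(is_hex) + n + 1);
        }
        return None;
    }
    if !rest.starts_with(|c: char| c.is_ascii_alphabetic()) {
        return None;
    }
    let n = rest.chars().take_while(|c| c.is_ascii_alphanumeric()).count();
    ((2..=31).contains(&n) && rest[n..].starts_with(';')).then_some(1 + n + 1)
}

fn main() {
    assert_eq!(match_entity("&amp; rest"), Some(5));
    assert_eq!(match_entity("&#35;"), Some(5));
    assert_eq!(match_entity("&#x22;"), Some(6));
    assert_eq!(match_entity("&notanentity"), None);
}
```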

Member

@ematipico ematipico left a comment

Great, thank you. I think we can merge it, however I want to understand if you @jfmcdowell are willing to continue to help or not

````
unclosed_code_span.md:1:10 parse ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

× Unclosed code span, expected closing ```.
````
Member

This is incorrect. The suggestion should say that only one back tick should be required

Contributor Author

Fixed in commit 5992b2e. The format strings were wrapping {backticks} with literal backticks, adding 2 extra characters. Now correctly shows the actual number of opening backticks in the error message.


```
× Unclosed link, expected `]` to close link text.

> 1 │ This has [unclosed link
```
Member

nit: ideally the range should span the text too

Contributor Author

Fixed in c1c1fe1. The error range now spans the entire unclosed content (e.g., [unclosed link or ![unclosed image) instead of just the opening bracket.


```
× Unclosed image, expected `]` to close alt text.

> 1 │ This has ![unclosed image
```
Member

nit: same here regarding the span

Contributor Author

Also fixed in c1c1fe1. The error range now spans the entire unclosed content (e.g., [unclosed link or ![unclosed image) instead of just the opening bracket.

@@ -0,0 +1,32 @@
use biome_markdown_parser::parse_markdown;
Member

Conformance tests should go inside the proper folder https://github.com/biomejs/biome/tree/main/xtask/coverage/src

Not a blocker, but we should address it

Contributor Author

Acknowledged. I'd like to address this in a separate PR.

@ematipico
Member

ematipico commented Jan 21, 2026

One more thing.

Please create a just task to run the conformance, and rebase your branch so that CI passes

@jfmcdowell
Contributor Author

> Great, thank you. I think we can merge it, however I want to understand if you @jfmcdowell are willing to continue to help or not

Absolutely! Happy to keep contributing where I can.

jfmcdowell and others added 6 commits January 22, 2026 20:48
…harness

CommonMark 0.31.2 parser achieving 75.2% spec compliance (490/652 examples).

Block elements:
- ATX/setext headings, fenced/indented code blocks, HTML blocks
- Block quotes with lazy continuation, thematic breaks
- Ordered/unordered lists with nesting and tightness detection
- Link reference definitions

Inline elements:
- Emphasis via delimiter stack algorithm (§6.4)
- Links, images, autolinks, code spans
- Entity references, backslash escapes, inline HTML
- Hard/soft line breaks

Testing:
- CommonMark spec test harness with HTML comparison
- Feature-gated to_html utility, 59 edge-case tests
Add unconditional no-progress recovery in block parsing and replace unsafe unreachable_unchecked() with safe unreachable!().

Introduce an upfront oversized-input guard in the lexer and make max nesting depth configurable via parse options.

Update list/quote handling and related snapshots to match the new recovery behavior and configurable limits.
…menting

Add explicit `detect_fence()` helper with `MIN_FENCE_LENGTH` constant
that clearly documents the CommonMark §4.5 requirement: "A code fence
is a sequence of at least three consecutive backtick or tilde characters."

The previous `starts_with("```")` approach worked correctly for 3+ backticks
but wasn't self-documenting. The new implementation explicitly counts
consecutive fence characters and validates against the minimum length.

Also improve the spec test harness to gracefully handle bogus nodes
instead of panicking.
…uote grammar

Address maintainer feedback about header parsing allocating too many nodes.
Previously, `###### Heading 6` would create 7 nodes (6 MdHash + 1 MdHashList).
Now it creates only 2 nodes (1 MdHash + 1 MdHashList).

Changes:
- Lexer: emit all consecutive `#` characters as a single HASH token
- Parser: determine heading level from token text length instead of counting nodes
- Grammar: fix MdQuote to use `content: MdBlockList` (matches MdBullet pattern)
- Update helper functions to work with consolidated tokens

The grammar fix restores CommonMark compliance from 69.3% back to 75.2%,
with Block quotes returning to 100% (25/25) and HTML blocks at 100% (44/44).
Capture HTML content as raw text rather than implementing all 7 CommonMark
§4.6 termination rules. This keeps the parser simple; full HTML parsing can
be added via workspace snippets integration in the future.

Trade-off: All HTML blocks terminate at blank lines, so not 100% CommonMark
§4.6 compliant (types 1-5 have specific terminators like -->, ?>, etc.).
Block parsing functions now return ParsedSyntax instead of ParsedBlockKind
enum, enabling proper ParseNodeList trait implementation for QuoteBlockList.

- parse_paragraph, parse_indent_code_block return ParsedSyntax
- parse_any_block_with_indent_code_policy returns ParsedSyntax
- Added is_paragraph_like() helper for lazy continuation logic
- Updated call sites in quote.rs and list.rs
Root-level document block parsing now uses ParseNodeList trait instead of
manual while loop, aligning with project conventions used by QuoteBlockList,
BulletList, and OrderedList.
File size limits are already enforced outside of parsers, so the
lexer-level MAX_MARKDOWN_SOURCE_SIZE check would never be hit.
The error message for unclosed inline code spans was incorrectly showing
triple backticks (```) instead of the actual number of opening backticks.

For input like `This has `unclosed code`, the error said "expected closing
```" but should say "expected closing `".

The bug was in the format strings in `unclosed_code_span()` which wrapped
the `{backticks}` variable with literal backticks, adding 2 extra characters.
Since the variable already contained the correct backticks, this resulted in
3 backticks when only 1 was expected.

Also includes justfile fix from PR biomejs#8789 to remove errant `run` argument
from test commands.
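The off-by-two described above is easy to reproduce in isolation (a minimal reconstruction of the bug, not the actual diagnostic code):

```rust
// Minimal reconstruction of the bug: `backticks` already holds the opening
// delimiter, so wrapping the placeholder in literal backticks adds two more.
fn main() {
    let backticks = "`"; // a single opening backtick was lexed
    let buggy = format!("expected closing `{backticks}`.");
    let fixed = format!("expected closing {backticks}.");
    assert_eq!(buggy, "expected closing ```.");
    assert_eq!(fixed, "expected closing `.");
}
```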
…ge text

The error diagnostic for unclosed links and images now spans the entire
unclosed content (e.g., `[unclosed link`) instead of just pointing at the
opening bracket (`[`).

This provides better context by highlighting exactly what content is
missing its closing delimiter, making the error easier to understand
and fix.

Affects:
- Unclosed links: `[text` now highlights `[text` instead of just `[`
- Unclosed images: `![alt` now highlights `![alt` instead of just `![`
- Unclosed reference labels in both links and images
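The range widening can be sketched like this (a hypothetical helper, not the PR's exact code): the diagnostic span starts at the opening bracket and extends to the end of the line rather than covering only the bracket.

```rust
// Hypothetical helper: widen an unclosed-link diagnostic from the opening
// bracket alone to the whole unclosed content, stopping at the line end.
fn unclosed_range(src: &str, open: usize) -> (usize, usize) {
    let end = src[open..].find('\n').map_or(src.len(), |i| open + i);
    (open, end)
}

fn main() {
    let src = "This has [unclosed link";
    let (start, end) = unclosed_range(src, 9);
    assert_eq!(&src[start..end], "[unclosed link");
}
```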
Adds a just task to run CommonMark spec conformance tests for the
markdown parser. Currently runs 652 spec examples with ~69.8% compliance.

Usage: just test-markdown-conformance
@jfmcdowell jfmcdowell force-pushed the feat/markdown-support-next branch from c1c1fe1 to eae8e0f Compare January 23, 2026 01:50
@jfmcdowell
Contributor Author

> One more thing.
>
> Please create a just task to run the conformance, and rebase your branch so that CI passes

Added just test-markdown-conformance to run the CommonMark spec tests. This command will be updated when conformance tests move to xtask/coverage.

@jfmcdowell jfmcdowell requested a review from ematipico January 23, 2026 02:39
Member

@ematipico ematipico left a comment

We just made great steps towards markdown parsing, thank you @jfmcdowell !

Apologies for the slow reviews, and thank you for bearing with us.

@ematipico ematipico requested a review from dyc3 January 23, 2026 08:57

Labels

A-Parser Area: parser A-Tooling Area: internal tools

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants