feat(parser/markdown): parser implementation #8525

Merged
dyc3 merged 12 commits into biomejs:next from jfmcdowell:feat/markdown-support-next
Jan 23, 2026

Conversation


@jfmcdowell jfmcdowell commented Dec 20, 2025

Note

AI Assistance Disclosure: This PR was developed with assistance from Claude Code.

Summary

A CommonMark-compliant Markdown parser achieving ~69.8% spec compliance (455/652 examples). Addresses #3718.

Compliance Highlights

| Perfect (100%) | Strong (80%+) | Needs Work |
| --- | --- | --- |
| Block quotes (25/25) | Thematic breaks (95%) | Emphasis (77%) |
| Indented code (12/12) | ATX headings (89%) | List items (77%) |
| Soft breaks (2/2) | Paragraphs (88%) | Link ref defs (74%) |
| Blank lines (1/1) | Fenced code (86%) | Setext headings (70%) |
| Textual content (3/3) | Backslash escapes (85%) | HTML blocks (68%) |
| | Tabs (82%) | Images (68%) |
| | Lists (81%) | Entity refs (65%) |
| | | Links (49%) |
| | | Code spans (46%) |
| | | Hard breaks (40%) |
| | | Raw HTML (40%) |
| | | Autolinks (21%) |

Note: Compliance dropped from ~75% to ~70% during ParseNodeList refactoring. The parser is functionally correct; some HTML output edge cases need refinement.

Implemented

  • Blocks: ATX/Setext headers, fenced/indented code, HTML blocks, blockquotes, thematic breaks
  • Lists: Bullet (-, +, *) and ordered (1., 1)) with nesting and tightness detection
  • Inlines: Emphasis, links, images, code spans, autolinks, inline HTML, escapes, entities
  • Infrastructure: Lossless CST, test_utils feature for HTML rendering, spec test harness
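
As a rough illustration of the tightness rule mentioned above (this is a sketch of the CommonMark concept, not the PR's actual detection code): a list is *loose* when a blank line separates two of its items, or separates block content inside an item; otherwise it is *tight*.

```rust
// Hypothetical sketch of CommonMark list-tightness detection for a flat
// bullet list. Names and structure are illustrative, not the PR's API.
fn is_tight_list(lines: &[&str]) -> bool {
    let mut seen_item = false;
    let mut pending_blank = false;
    for line in lines {
        let trimmed = line.trim_start();
        let is_item = trimmed.starts_with("- ")
            || trimmed.starts_with("* ")
            || trimmed.starts_with("+ ");
        if trimmed.is_empty() {
            // A blank line only matters if more list content follows it.
            if seen_item {
                pending_blank = true;
            }
        } else if pending_blank {
            // Blank line followed by another item (or by continuation
            // content inside an item) makes the whole list loose.
            return false;
        } else if is_item {
            seen_item = true;
        }
    }
    // A trailing blank line after the last item does not make it loose.
    true
}

fn main() {
    assert!(is_tight_list(&["- a", "- b", "- c"]));
    assert!(!is_tight_list(&["- a", "", "- b"]));
    assert!(is_tight_list(&["- a", "- b", ""]));
    println!("tightness checks passed");
}
```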

Test Plan

# Run CommonMark compliance report (652 examples)
just test-markdown-conformance

# Run all tests
cargo test -p biome_markdown_parser

  • 69.8% CommonMark spec compliance
  • 59 targeted edge-case tests
  • just f and just l pass

Future PRs

  • Remaining compliance gaps (links, code spans, emphasis edge cases)
  • Move conformance tests to xtask/coverage per project conventions


changeset-bot bot commented Dec 20, 2025

⚠️ No Changeset found

Latest commit: eae8e0f

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@github-actions github-actions bot added A-Parser Area: parser A-Tooling Area: internal tools labels Dec 20, 2025
@jfmcdowell jfmcdowell changed the title feat(parser/markdown): complete Phase 1 parser implementation feat(parser/markdown): parser implementation Dec 20, 2025

coderabbitai bot commented Dec 20, 2025

Walkthrough

The PR converts the crate into a Markdown parser:

  • Lexer rewritten for Markdown tokens (headers, fences, emphasis, block quotes, lists, escapes, textual grouping), with new private helpers for backticks, tildes, hashes, escapes, and thematic-break vs emphasis logic.
  • Parser gains trivia inspection APIs and new modules (fenced_code_block, header, inline, list, quote) implementing fenced/indented code blocks, ATX headers, block quotes, bullet lists, paragraphs, and inline elements.
  • Grammar/kind updates and a large suite of lexer and fixture tests added.
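
The thematic-break vs emphasis disambiguation the walkthrough mentions follows a simple CommonMark rule, sketched here for illustration (this is not the PR's lexer code): a line of three or more matching `-`, `_`, or `*` characters, with only spaces or tabs between them and at most three spaces of indentation, is a thematic break; anything else (e.g. `**bold**`) falls through to emphasis or text.

```rust
// Illustrative sketch of CommonMark thematic-break recognition.
fn is_thematic_break(line: &str) -> bool {
    let trimmed = line.trim_start_matches(' ');
    // Four or more spaces of indentation means indented code, not a break.
    if line.len() - trimmed.len() > 3 {
        return false;
    }
    let mut marker: Option<char> = None;
    let mut count = 0;
    for ch in trimmed.chars() {
        match ch {
            ' ' | '\t' => continue, // interior whitespace is allowed
            '-' | '_' | '*' => match marker {
                None => {
                    marker = Some(ch);
                    count = 1;
                }
                Some(m) if m == ch => count += 1,
                _ => return false, // mixed marker characters
            },
            _ => return false, // any other character => not a break
        }
    }
    count >= 3
}

fn main() {
    assert!(is_thematic_break("***"));
    assert!(is_thematic_break(" - - -"));
    assert!(!is_thematic_break("**bold**"));
    assert!(!is_thematic_break("    ---")); // indented code, not a break
    println!("thematic break checks passed");
}
```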

Suggested reviewers

  • dyc3
  • ematipico
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: a Markdown parser implementation, which aligns with the extensive parser infrastructure and syntax modules added across the codebase.
Description check ✅ Passed The PR description clearly outlines a CommonMark-compliant Markdown parser implementation addressing issue #3718, with detailed compliance metrics, implemented features, test coverage, and future work deferred.





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
crates/biome_markdown_parser/src/syntax/header.rs (1)

151-155: Redundant wrapper function.

has_newline_since just delegates to p.has_newline_since. Consider inlining calls directly.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ab88099 and 5b90681.

⛔ Files ignored due to path filters (13)
  • crates/biome_markdown_factory/src/generated/node_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_factory/src/generated/syntax_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/header.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_syntax/src/generated/kind.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_syntax/src/generated/macros.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_syntax/src/generated/nodes.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_syntax/src/generated/nodes_mut.rs is excluded by !**/generated/**, !**/generated/** and included by **
📒 Files selected for processing (21)
  • .changeset/curly-donkeys-show.md (1 hunks)
  • crates/biome_markdown_parser/src/lexer/mod.rs (4 hunks)
  • crates/biome_markdown_parser/src/lexer/tests.rs (1 hunks)
  • crates/biome_markdown_parser/src/parser.rs (1 hunks)
  • crates/biome_markdown_parser/src/syntax.rs (3 hunks)
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs (1 hunks)
  • crates/biome_markdown_parser/src/syntax/header.rs (1 hunks)
  • crates/biome_markdown_parser/src/syntax/inline.rs (1 hunks)
  • crates/biome_markdown_parser/src/syntax/list.rs (1 hunks)
  • crates/biome_markdown_parser/src/syntax/quote.rs (1 hunks)
  • crates/biome_markdown_parser/src/token_source.rs (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/header.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md (1 hunks)
  • crates/biome_markdown_parser/tests/spec_test.rs (1 hunks)
  • xtask/codegen/markdown.ungram (4 hunks)
  • xtask/codegen/src/markdown_kinds_src.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

  • crates/biome_markdown_parser/src/parser.rs
  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/syntax/header.rs
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/token_source.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/tests/spec_test.rs
  • crates/biome_markdown_parser/src/syntax.rs
.changeset/*.md

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Write changesets that are concise (1-3 sentences), user-focused, use past tense for actions taken and present tense for Biome behavior, include code examples for rules, and end sentences with periods

Files:

  • .changeset/curly-donkeys-show.md
🧠 Learnings (37)
📓 Common learnings
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : The first paragraph of rule documentation must be a single line describing what the rule does

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-04T13:29:49.287Z
Learnt from: dyc3
Repo: biomejs/biome PR: 8291
File: crates/biome_html_formatter/tests/specs/prettier/vue/html-vue/elastic-header.html:10-10
Timestamp: 2025-12-04T13:29:49.287Z
Learning: Files under `crates/biome_html_formatter/tests/specs/prettier` are test fixtures synced from Prettier and should not receive detailed code quality reviews (e.g., HTTP vs HTTPS, formatting suggestions, etc.). These files are test data meant to validate formatter behavior and should be preserved as-is.

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md
  • xtask/codegen/markdown.ungram
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

  • crates/biome_markdown_parser/src/parser.rs
  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/token_source.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : A parser struct must implement the `Parser` trait and save the token source, parser context, and optional parser options

Applied to files:

  • crates/biome_markdown_parser/src/parser.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/tests/specs/**/*.jsonc : Use `.jsonc` files to contain arrays of code snippet strings for snapshot tests

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/language_kind.rs : Add a new language prefix to the `LANGUAGE_PREFIXES` constant in `language_kind.rs` file

Applied to files:

  • xtask/codegen/src/markdown_kinds_src.rs
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/language_kind.rs : Add a new variant to `LanguageKind` enum in `language_kind.rs` file and implement all methods for the new language variant

Applied to files:

  • xtask/codegen/src/markdown_kinds_src.rs
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/*_kinds_src.rs : Create a file called `src/{language}_kinds_src.rs` that returns a static `KindSrc` for new language parsers

Applied to files:

  • xtask/codegen/src/markdown_kinds_src.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `ParseSeparatedList` and `ParseNodeList` for parsing lists with error recovery to avoid infinite loops

Applied to files:

  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Applied to files:

  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found

Applied to files:

  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/lint/nursery/**/*.rs : Add `issue_number` field to `declare_lint_rule!` macro for work-in-progress rules

Applied to files:

  • crates/biome_markdown_parser/src/syntax/list.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/Cargo.toml : Include development dependencies in `Cargo.toml` for formatter tests: `biome_formatter_test`, `biome_<language>_factory`, `biome_<language>_parser`, `biome_parser`, `biome_service`, `countme`, `iai`, `quickcheck`, `quickcheck_macros`, and `tests_macros`

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/tests/language.rs : Implement `TestFormatLanguage` trait in `tests/language.rs` for the formatter's test language

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Use `ignore` code block property to exclude documentation code examples from automatic validation

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Lines prefixed with `#` in rule documentation code examples will be hidden from output

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to .changeset/*.md : Write changesets that are concise (1-3 sentences), user-focused, use past tense for actions taken and present tense for Biome behavior, include code examples for rules, and end sentences with periods

Applied to files:

  • .changeset/curly-donkeys-show.md
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Create a changeset for user-facing bug fixes and features using `just new-changeset`, selecting appropriate packages and change type (`major`, `minor`, or `patch`)

Applied to files:

  • .changeset/curly-donkeys-show.md
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks must have a language defined for syntax highlighting

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks should be ordered as language, expect_diagnostic, options/full_options/use_options, ignore, file

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rules must take a mutable reference to the parser as their only parameter and return a `ParsedSyntax`

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `ConditionalParsedSyntax` for syntax that is only valid in specific contexts (e.g., strict mode, file types, language versions) and call `or_invalid_to_bogus()` to convert to a bogus node if not supported

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rule functions must be prefixed with `parse_` and use the name defined in the grammar file, e.g., `parse_for_statement` or `parse_expression`

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rules must return `ParsedSyntax::Absent` if the rule can't predict by the next token(s) if they form the expected node, and must not progress the parser in this case

Applied to files:

  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/syntax/quote.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes

Applied to files:

  • crates/biome_markdown_parser/src/syntax/inline.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Use `full_options` code block property for complete biome.json configuration snippets in documentation

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Invalid code examples in rule documentation must be marked with `expect_diagnostic` code block property

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/syntax.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/tests/spec_tests.rs : Use the `tests_macros::gen_tests!` macro in `spec_tests.rs` to generate test functions for each specification file matching the pattern `tests/specs/<language>/**/*.<ext>`

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-11-24T18:06:12.048Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_service/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:12.048Z
Learning: Applies to crates/biome_service/src/workspace/watcher.tests.rs : Implement watcher tests for workspace methods in watcher.tests.rs and end-to-end tests in LSP tests

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/tests/spec_test.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Create test infrastructure with `tests/specs` folder structure and `spec_test.rs`, `spec_tests.rs`, and `language.rs` files in test directories

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:05:27.810Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:27.810Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : When formatting AST nodes, use mandatory tokens from the AST instead of hardcoding token strings (e.g., use `node.l_paren_token().format()` instead of `token("(")`)

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-09T12:47:46.298Z
Learnt from: ematipico
Repo: biomejs/biome PR: 8031
File: crates/biome_html_parser/src/syntax/svelte.rs:140-147
Timestamp: 2025-11-09T12:47:46.298Z
Learning: In the Biome HTML parser, `expect` and `expect_with_context` consume the current token and then lex the next token. The context parameter in `expect_with_context` controls how the next token (after the consumed one) is lexed, not the current token being consumed. For example, in Svelte parsing, after `bump_with_context(T!["{:"], HtmlLexContext::Svelte)`, the next token is already lexed in the Svelte context, so `expect(T![else])` is sufficient unless the token after `else` also needs to be lexed in a specific context.

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Valid code examples in rule documentation should not trigger any diagnostics

Applied to files:

  • crates/biome_markdown_parser/tests/spec_test.rs
🧬 Code graph analysis (7)
crates/biome_markdown_parser/src/parser.rs (2)
crates/biome_markdown_parser/src/token_source.rs (2)
  • has_blank_line_since (93-122)
  • has_newline_since (126-133)
crates/biome_markdown_parser/src/syntax/header.rs (1)
  • has_newline_since (153-155)
crates/biome_markdown_parser/src/syntax/list.rs (2)
crates/biome_markdown_parser/src/syntax.rs (1)
  • parse_any_inline (160-162)
crates/biome_markdown_parser/src/syntax/inline.rs (1)
  • parse_any_inline (244-258)
crates/biome_markdown_parser/src/syntax/inline.rs (2)
crates/biome_markdown_parser/src/lexer/mod.rs (1)
  • source (74-76)
crates/biome_markdown_parser/src/syntax.rs (2)
  • parse_any_inline (160-162)
  • parse_textual (171-181)
crates/biome_markdown_parser/src/syntax/quote.rs (1)
crates/biome_markdown_parser/src/syntax.rs (1)
  • parse_any_inline (160-162)
crates/biome_markdown_parser/src/token_source.rs (2)
crates/biome_markdown_parser/src/parser.rs (2)
  • has_blank_line_since (40-42)
  • has_newline_since (45-47)
crates/biome_markdown_parser/src/syntax/header.rs (1)
  • has_newline_since (153-155)
crates/biome_markdown_parser/src/lexer/mod.rs (2)
crates/biome_unicode_table/src/lib.rs (1)
  • lookup_byte (106-109)
crates/biome_yaml_parser/src/lexer/mod.rs (1)
  • start_position (872-874)
crates/biome_markdown_parser/src/syntax.rs (5)
crates/biome_markdown_parser/src/syntax/fenced_code_block.rs (2)
  • at_fenced_code_block (9-11)
  • parse_fenced_code_block (21-55)
crates/biome_markdown_parser/src/syntax/header.rs (2)
  • at_header (15-17)
  • parse_header (27-53)
crates/biome_markdown_parser/src/syntax/list.rs (2)
  • at_bullet_list_item (17-30)
  • parse_bullet_list_item (42-72)
crates/biome_markdown_parser/src/syntax/quote.rs (2)
  • at_quote (14-16)
  • parse_quote (24-58)
crates/biome_markdown_parser/src/syntax/inline.rs (1)
  • parse_any_inline (244-258)
🪛 LanguageTool
crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md

[style] ~4-~4: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...This is italic and this is bold. This is also italic and also bold. He...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~5-~5: You’ve already used the word ‘also’ once in your sentence, so using it again may be redundant.
Context: ... bold. This is also italic and also bold. Here is a [link](https://example.co...

(REDUNDANT_FILLER)

🔇 Additional comments (53)
xtask/codegen/src/markdown_kinds_src.rs (1)

19-20: LGTM!

The TRIPLE_TILDE token follows the established pattern alongside TRIPLE_BACKTICK, neatly supporting CommonMark tilde-fenced code blocks.

crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md (1)

1-4: LGTM!

Solid test fixture covering multi-line block quotes and blank-line separation between distinct quote blocks.

crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md (1)

1-7: LGTM!

Comprehensive coverage of inline elements: code spans, both emphasis syntaxes (*/_), links, and images. The static analysis grumbles about repetitive phrasing, but that's rather the point of a test fixture.

crates/biome_markdown_parser/tests/md_test_suite/ok/header.md (1)

1-17: LGTM!

Excellent coverage of ATX headers (H1–H6) and trailing-hash edge cases. The "Mixed # content ## with ### trailing ####" line is a nice stress test for the lookahead-based detection.

Consider adding edge-case tests for invalid headers in a separate fixture (e.g., ####### Seven hashes which shouldn't parse as a header, or #NoSpace without the required space). These would exercise error recovery paths.

crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md (1)

1-6: LGTM!

Covers both bullet markers (- and *) with clean separation. Given multi-line list items are deferred to future work, this is appropriate Phase 1 coverage.

crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md (1)

1-8: LGTM!

Test fixture appropriately exercises indented code block parsing with proper 4-space indentation and mixed content.

crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md (1)

1-3: LGTM!

Clean test fixture for paragraph parsing with blank line separation.

crates/biome_markdown_parser/tests/spec_test.rs (1)

138-141: LGTM!

Test input updated with realistic indented code block content, aligning with the new test fixtures.

crates/biome_markdown_parser/src/lexer/tests.rs (1)

165-422: LGTM!

Comprehensive test coverage for the new Markdown lexer capabilities. The tests are well-structured, consistently use the assert_lex! macro, and appropriately exercise ATX headers, fenced code blocks, block quotes, emphasis, links, images, and escape sequences.

crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md (1)

1-15: LGTM!

Test fixture properly exercises fenced code block parsing with both backtick and tilde syntax, including language specifiers.

.changeset/curly-donkeys-show.md (1)

7-16: LGTM!

Clear, detailed description of the implemented features with appropriate caveats about scope limitations.

crates/biome_markdown_parser/src/token_source.rs (3)

86-89: LGTM!

Simple trivia length getter appropriately exposes the trivia list size for position tracking.


91-122: LGTM!

The blank line detection logic is correct: it properly counts consecutive newlines (ignoring whitespace between them), resets on trailing trivia or non-newline/whitespace trivia, and returns true when 2+ consecutive newlines are found.
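As a rough illustration of the rule described above (a sketch, not the actual token-source code — names here are illustrative, not Biome's API):

```rust
// Blank-line rule: a blank line exists once two or more newlines occur with
// only spaces/tabs between them; any other character resets the count.
fn has_blank_line(trivia: &str) -> bool {
    let mut newlines = 0;
    for c in trivia.chars() {
        match c {
            '\n' => {
                newlines += 1;
                if newlines >= 2 {
                    return true;
                }
            }
            ' ' | '\t' => {} // whitespace between newlines is ignored
            _ => newlines = 0, // any other trivia resets the run
        }
    }
    false
}
```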


124-133: LGTM!

Simple and correct detection of any non-trailing newline since the given position.

crates/biome_markdown_parser/src/parser.rs (1)

34-53: LGTM!

All four methods are clean delegations to the token source with clear documentation. They appropriately expose trivia tracking capabilities for block-level construct detection and paragraph boundary handling.

crates/biome_markdown_parser/src/syntax/quote.rs (2)

1-11: Module structure looks good.

Clean imports and module documentation. The module is correctly scoped as pub(crate).


13-16: Detection logic is correct.

Properly checks for > at line start using has_preceding_line_break() or at_start_of_input().

crates/biome_markdown_parser/src/syntax/fenced_code_block.rs (4)

8-11: Detection function is correct.

Properly identifies both backtick and tilde fence openings.


21-55: Well-structured fenced code block parsing.

Good handling of fence type matching per CommonMark spec. The optional closing fence is correctly handled.


62-82: Language name parsing is correct.

Sensibly checks for preceding line break to detect empty language strings and parses until newline.


86-105: Content parsing handles fence matching correctly.

Appropriately stops at matching fence type or EOF. Each token is wrapped as MD_TEXTUAL which aligns with the grammar.

crates/biome_markdown_parser/src/syntax/header.rs (4)

10-11: Good use of named constant.

MAX_HEADER_HASHES = 6 makes the CommonMark constraint clear.


57-70: Hash list parsing is straightforward.

Correctly wraps each # in MD_HASH and the sequence in MD_HASH_LIST.


126-149: Trailing hash detection uses checkpoint correctly.

The lookahead pattern with checkpoint/rewind is the right approach for this detection.
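A minimal sketch of the pattern over a plain byte cursor — biome_parser's checkpoint API is analogous but richer; this is illustrative, not the real parser code:

```rust
// Speculative lookahead: save a checkpoint, scan ahead, then rewind so the
// lookahead consumes nothing. Mirrors the trailing-hash detection above.
struct Scanner<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Scanner<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        Self { bytes, pos: 0 }
    }

    /// True if the remaining input is a run of '#' followed only by
    /// spaces/tabs up to the end of line, e.g. the "##" in "# Title ##".
    fn at_trailing_hashes(&mut self) -> bool {
        let checkpoint = self.pos; // save
        let mut saw_hash = false;
        while self.bytes.get(self.pos) == Some(&b'#') {
            self.pos += 1;
            saw_hash = true;
        }
        let ok = saw_hash
            && self.bytes[self.pos..]
                .iter()
                .take_while(|&&b| b != b'\n')
                .all(|&b| b == b' ' || b == b'\t');
        self.pos = checkpoint; // rewind: lookahead must not consume input
        ok
    }
}
```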


27-42: No issue here — the parser state is properly cleaned up.

parse_header is called only from within try_parse, which saves a checkpoint before the call. When rewind() is triggered on parse failure, it truncates the entire events list back to the checkpoint position, discarding the MD_HASH_LIST completion and all subsequent work. The function signature correctly returns Absent and relies on the caller's checkpoint/rewind mechanism, which is the standard pattern in biome_parser.

Likely an incorrect or invalid review comment.

crates/biome_markdown_parser/src/syntax.rs (6)

1-5: Good module organisation.

Clean separation of concerns with dedicated submodules for each block/inline type.


21-22: Named constant improves readability.

INDENT_CODE_BLOCK_SPACES = 4 documents the CommonMark requirement clearly.


27-28: EOF bump is required by the grammar.

Good addition with clear comment explaining the rationale.


41-77: Block dispatch logic is well-structured.

Good fallback chain with try_parse for constructs that may fail validation (thematic breaks, headers). The fallback to paragraph parsing is sensible.


90-109: Indented code block parsing is correct.

Properly checks for 4+ spaces and consumes content while indentation holds.


133-155: Inline item list parsing handles blank lines correctly.

Unlike quote.rs, the trivia position is captured before the loop (line 137), so blank-line detection works as intended.

crates/biome_markdown_parser/src/syntax/list.rs (4)

17-30: Bullet detection logic is sound.

Correctly requires line-start position for bullet markers.


52-68: Error recovery prevents infinite loops.

Good defensive programming with position tracking.


101-122: Bullet content parsing stops at newline correctly.

The has_preceding_line_break() check before parsing ensures single-line items.


124-137: Ordered list stubs are appropriately marked.

Using #[expect(dead_code)] is cleaner than #[allow(dead_code)] as it will warn when the code is eventually used.

crates/biome_markdown_parser/src/lexer/mod.rs (5)

13-14: Import refactoring is cleaner.

Importing Dispatch module and lookup_byte separately improves clarity.


182-196: Dispatch table extended correctly.

Good coverage of Markdown-specific tokens: hash, backtick, tilde, brackets, escape sequences.


199-250: Escape handling follows CommonMark spec.

All escapable ASCII punctuation characters are correctly enumerated.
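For reference, the escapable set is exactly ASCII punctuation, which collapses to four contiguous byte ranges (a sketch, not the crate's code):

```rust
// CommonMark's escapable characters are exactly the ASCII punctuation set:
// !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
fn is_escapable(byte: u8) -> bool {
    matches!(byte,
        b'!'..=b'/'   // ! " # $ % & ' ( ) * + , - . /
        | b':'..=b'@' // : ; < = > ? @
        | b'['..=b'`' // [ \ ] ^ _ `
        | b'{'..=b'~' // { | } ~
    )
}
```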


336-399: Thematic break vs emphasis disambiguation is well-handled.

The save/restore position pattern for speculative parsing is correct. Good handling of the edge case where IDT can match letters or underscore.


401-436: Hash, backtick, and tilde consumers are straightforward.

Clean implementation differentiating single vs triple tokens where applicable.

crates/biome_markdown_parser/src/syntax/inline.rs (8)

1-10: Clean module setup with clear documentation.

Good overview of supported inline elements.


15-40: Inline code parsing handles unclosed spans gracefully.

Correctly parses content until closing backtick, newline, or EOF. The optional closing backtick handling is appropriate for error tolerance.


45-78: Emphasis parsing is correct.

Good use of opener variable to ensure matching close marker.


80-116: Italic parsing mirrors emphasis structure.

Structurally similar to parse_inline_emphasis. If this pattern repeats further, consider extracting a helper, but for two cases it's fine.


118-165: Link parsing handles partial syntax gracefully.

Good error tolerance - incomplete [text] or missing (url) portions are handled without panicking.


167-223: Image parsing correctly requires ![ prefix.

The two-token lookahead (BANG then L_BRACK) is the right approach.


225-241: Nested inline avoids emphasis recursion - good design.

The comment at lines 228-230 explains the intentional limitation clearly.


243-258: Dispatcher routes correctly with textual fallback.

Complete coverage of inline constructs with sensible default to parse_textual.

xtask/codegen/markdown.ungram (5)

109-113: Grammar allows mismatched fence characters.

The grammar permits a block opened with backticks to close with tildes (or vice versa). This is fine since the grammar defines structure rather than semantics — the parser is responsible for enforcing that the closing fence matches the opening fence per CommonMark.

If this is intentional for maximum error tolerance, brilliant. Just worth confirming the parser validates matching fence types.


163-167: Optional closing tokens for error tolerance — sensible approach.

Making r_fence optional here (and similarly for emphasis, code spans, links, and images) allows the parser to produce a valid AST even when the user hasn't finished typing. Good call for editor integration.


188-195: MdInlineLink: Consider requiring '[' to pair with ']' in the required set.

Currently '[' is required whilst ']', '(', and ')' are all optional. This means a valid link node could be just [ followed by content — maximally error-tolerant, but lenient enough that stray brackets may parse as links.

If you find false positives during testing (stray [ characters being parsed as links), you might tighten this. For now, this is a design choice rather than a defect.


200-207: MdInlineImage structure looks good.

The '!' and '[' are required (correctly identifying intent to create an image), with everything else optional for error tolerance. This mirrors the link structure and is consistent.


7-19: Legend comment present — nicely done.

As per the contribution guidelines, the legend explaining grammar syntax is included. Appreciate the clarity.

"@biomejs/biome": patch
---

Add Markdown parser support for inline elements, block quotes, lists, headers, and code blocks.
Contributor

⚠️ Potential issue | 🟡 Minor

Use past tense for the summary.

Per coding guidelines, changesets should use past tense for actions taken.

🔎 Proposed fix
-Add Markdown parser support for inline elements, block quotes, lists, headers, and code blocks.
+Added Markdown parser support for inline elements, block quotes, lists, headers, and code blocks.


Contributor Author

Added support for + bullet list marker per CommonMark spec.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
crates/biome_markdown_parser/src/syntax/quote.rs (2)

36-40: Consider explaining the rationale for remapping the quote marker.

The code remaps > to MD_TEXTUAL_LITERAL, making it part of the paragraph content rather than a separate quote marker node. Whilst this works, the design choice is non-obvious. Expanding the comment to explain why this approach was chosen (e.g., to preserve the marker for formatting or due to AST design constraints) would help future maintainers.


1-5: Document Phase 1 scope limitations.

The module documentation mentions "nested block elements", but the current implementation only parses inline content within a single paragraph and doesn't support nested quotes or multiple blocks. Adding a note about Phase 1 limitations (deferred: nesting, multi-line quote consolidation, multiple blocks) would set correct expectations.

Also applies to: 18-23

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b90681 and ae590a3.

⛔ Files ignored due to path filters (1)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (1)
  • crates/biome_markdown_parser/src/syntax/quote.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
🧠 Learnings (6)
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rules must take a mutable reference to the parser as their only parameter and return a `ParsedSyntax`

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks must have a language defined for syntax highlighting

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks should be ordered as language, expect_diagnostic, options/full_options/use_options, ignore, file

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : A parser struct must implement the `Parser` trait and save the token source, parser context, and optional parser options

Applied to files:

  • crates/biome_markdown_parser/src/syntax/quote.rs
🧬 Code graph analysis (1)
crates/biome_markdown_parser/src/syntax/quote.rs (1)
crates/biome_markdown_parser/src/syntax.rs (1)
  • parse_any_inline (160-162)
🔇 Additional comments (1)
crates/biome_markdown_parser/src/syntax/quote.rs (1)

24-57: Multi-line quote handling requires clarification.

The implementation treats consecutive lines with > markers as separate block quotes rather than as a single multi-line quote. This is confirmed by the test snapshot for block_quote.md, which shows the second line's > being parsed as inline textual content within the first quote block, rather than as a quote continuation marker.

In the test input:

> This is a quote
> It continues here

The expected AST includes the second > as MD_TEXTUAL_LITERAL within the inline item list of the first quote, not as a block-level marker. This differs from CommonMark handling, where each > at line start represents a block quote marker.

Is this an intentional Phase 1 limitation for basic quote support, or should continuation lines with > be recognised as proper quote structure?
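For reference, CommonMark's continuation rule can be sketched roughly as follows (illustrative only, not the parser's code — per the spec, each marked line's leading '>' plus one optional space is stripped, so consecutive marked lines form a single quote):

```rust
// Strip a block-quote marker from a line, if present. CommonMark allows up
// to 3 leading spaces before '>'; this sketch simplifies and trims any run.
fn strip_quote_marker(line: &str) -> Option<&str> {
    let trimmed = line.trim_start_matches(' ');
    let rest = trimmed.strip_prefix('>')?;
    // One space after '>' belongs to the marker, not the content.
    Some(rest.strip_prefix(' ').unwrap_or(rest))
}
```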

@jfmcdowell jfmcdowell requested a review from dyc3 December 20, 2025 18:54
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
crates/biome_markdown_parser/src/lexer/mod.rs (1)

200-251: Consider using rustdoc comments for documentation.

The escape handling logic correctly implements CommonMark's ASCII punctuation escape rules. However, the function uses standard // comments rather than /// rustdoc comments. Per coding guidelines, inline rustdoc documentation should be updated when adding new features.

🔎 Suggested documentation style
-    /// Consume a backslash escape sequence.
-    /// Per CommonMark spec, a backslash before ASCII punctuation makes it literal.
-    /// Escapable: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
+    /// Consume a backslash escape sequence.
+    ///
+    /// Per CommonMark spec, a backslash before ASCII punctuation makes it literal.
+    /// Escapable: `!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~`
+    ///
+    /// If the backslash is not followed by an escapable character, only the backslash
+    /// is consumed and returned as a textual literal.
     fn consume_escape(&mut self) -> MarkdownSyntaxKind {

Based on learnings, inline rustdoc documentation should be maintained for features in Rust crates.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e179019 and e75b98f.

⛔ Files ignored due to path filters (3)
  • crates/biome_markdown_factory/src/generated/syntax_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_syntax/src/generated/kind.rs is excluded by !**/generated/**, !**/generated/** and included by **
📒 Files selected for processing (6)
  • crates/biome_markdown_parser/src/lexer/mod.rs (4 hunks)
  • crates/biome_markdown_parser/src/lexer/tests.rs (2 hunks)
  • crates/biome_markdown_parser/src/syntax/list.rs (1 hunks)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md (1 hunks)
  • xtask/codegen/markdown.ungram (5 hunks)
  • xtask/codegen/src/markdown_kinds_src.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/lexer/tests.rs
🧰 Additional context used
📓 Path-based instructions (1)
crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
🧠 Learnings (13)
📚 Learning: 2025-12-04T13:29:49.287Z
Learnt from: dyc3
Repo: biomejs/biome PR: 8291
File: crates/biome_html_formatter/tests/specs/prettier/vue/html-vue/elastic-header.html:10-10
Timestamp: 2025-12-04T13:29:49.287Z
Learning: Files under `crates/biome_html_formatter/tests/specs/prettier` are test fixtures synced from Prettier and should not receive detailed code quality reviews (e.g., HTTP vs HTTPS, formatting suggestions, etc.). These files are test data meant to validate formatter behavior and should be preserved as-is.

Applied to files:

  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/language_kind.rs : Add a new variant to `LanguageKind` enum in `language_kind.rs` file and implement all methods for the new language variant

Applied to files:

  • xtask/codegen/src/markdown_kinds_src.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/language_kind.rs : Add a new language prefix to the `LANGUAGE_PREFIXES` constant in `language_kind.rs` file

Applied to files:

  • xtask/codegen/src/markdown_kinds_src.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
  • xtask/codegen/markdown.ungram
📚 Learning: 2025-11-24T18:05:27.810Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:27.810Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : When formatting AST nodes, use mandatory tokens from the AST instead of hardcoding token strings (e.g., use `node.l_paren_token().format()` instead of `token("(")`)

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
  • xtask/codegen/markdown.ungram
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-09T12:47:46.298Z
Learnt from: ematipico
Repo: biomejs/biome PR: 8031
File: crates/biome_html_parser/src/syntax/svelte.rs:140-147
Timestamp: 2025-11-09T12:47:46.298Z
Learning: In the Biome HTML parser, `expect` and `expect_with_context` consume the current token and then lex the next token. The context parameter in `expect_with_context` controls how the next token (after the consumed one) is lexed, not the current token being consumed. For example, in Svelte parsing, after `bump_with_context(T!["{:"], HtmlLexContext::Svelte)`, the next token is already lexed in the Svelte context, so `expect(T![else])` is sufficient unless the token after `else` also needs to be lexed in a specific context.

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
  • xtask/codegen/markdown.ungram
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements

Applied to files:

  • xtask/codegen/markdown.ungram
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to .changeset/*.md : Write changesets that are concise (1-3 sentences), user-focused, use past tense for actions taken and present tense for Biome behavior, include code examples for rules, and end sentences with periods

Applied to files:

  • xtask/codegen/markdown.ungram
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Preserve valid tree structure during error recovery by marking invalid parts as 'bogus' nodes rather than invalidating parent nodes

Applied to files:

  • xtask/codegen/markdown.ungram
🧬 Code graph analysis (1)
crates/biome_markdown_parser/src/lexer/mod.rs (2)
crates/biome_unicode_table/src/lib.rs (1)
  • lookup_byte (106-109)
crates/biome_yaml_parser/src/lexer/mod.rs (1)
  • start_position (872-874)
🔇 Additional comments (13)
xtask/codegen/src/markdown_kinds_src.rs (1)

16-16: LGTM! Token and node additions align with CommonMark support.

The PLUS token enables + bullet lists, TRIPLE_TILDE supports tilde-fenced code blocks (~~~), and MD_BOGUS_BULLET provides error recovery for malformed bullets. All additions are consistent with the PR objectives.

Also applies to: 21-21, 43-43

xtask/codegen/markdown.ungram (6)

105-113: Approve tilde fence support; verify content type is addressed.

The addition of tilde-fenced code blocks (~~~) aligns with CommonMark. However, the content type issue flagged separately must be resolved.


115-115: LGTM! Simplified grammar.

Removing the label simplifies the grammar without losing functionality.


134-134: LGTM! Plus bullet support added.

The + bullet marker aligns with CommonMark and is consistent with the PLUS token addition.


144-144: Reasonable to make hard line optional.

Not all paragraphs end with explicit hard line breaks, so making this optional is sensible.


193-202: LGTM! Simplified inline image structure.

Removing intermediate nodes (MdInlineImageAlt, MdInlineImageSource, MdInlineImageLink) and using MdInlineItemList directly simplifies the grammar whilst maintaining the necessary structure for image alt text and source.


crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md (1)

1-9: Test fixture looks good.

The bullet list test data appropriately covers the three common Markdown bullet markers (-, *, +) with proper formatting and blank line separation. This should provide good coverage for bullet list parsing validation.

crates/biome_markdown_parser/src/lexer/mod.rs (5)

1-14: Import structure looks solid.

The updated imports correctly bring in the dispatch mechanism and lookup_byte function needed for the expanded token routing. Documentation update appropriately reflects the Markdown-specific implementation.


180-198: Token dispatch routing is well-structured.

The dispatch logic properly routes each token type to its appropriate handler, with simple single-character tokens using the consume_byte helper and complex constructs (thematic breaks, escapes) delegated to dedicated parsing functions.


402-437: Token consumption logic is sound.

The three token consumers (hash, backtick, tilde) correctly handle their respective characters. The backtick and tilde handlers appropriately distinguish between single and triple variants using lookahead, which aligns with CommonMark's fenced code block syntax.


503-507: Helper function is fit for purpose.

The consume_byte helper cleanly handles single-byte token consumption and is actively used throughout the dispatch logic.


382-390: No action required—the implementation correctly reflects the CommonMark specification.

The code tokenises ** and __ as DOUBLE_STAR and DOUBLE_UNDERSCORE respectively for strong emphasis, whilst -- is correctly tokenised as MINUS. This is not an inconsistency; it reflects the Markdown spec: dashes are not emphasis markers; only asterisks and underscores are. The comment at line 387 is accurate.

@ematipico
Copy link
Member

Thank you @jfmcdowell

Please review the implementation of the parser. While I understand you used AI assistance, human intervention is always required.

Here's an example: check the emitted CST in the snapshots. For each letter inside a paragraph, we create a node. This implementation will produce an enormous CST for large documents. That's not acceptable for us or for the users. Please review the parsing logic so that the emitted CST groups text into a single node.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
crates/biome_markdown_parser/src/lexer/tests.rs (1)

166-184: Consider consolidating hash tests or testing the full header use case.

The hash_token and multiple_hashes tests verify that hashes are emitted individually, which is correct for ATX header parsing. However, it would be valuable to add a test covering a complete header scenario (e.g., "# Title") to ensure the combination works as expected in practice.

crates/biome_markdown_parser/src/lexer/mod.rs (1)

409-437: Code fences support 3+ characters per CommonMark spec—consider extending beyond the current hard-coded triple.

CommonMark requires code fences to be at least three consecutive backticks or tildes, and using four backticks to wrap three is perfectly valid for nesting. The current implementation handles exactly 1 or 3 characters; supporting variable lengths (4, 5, etc.) would allow proper code block nesting without resorting to tildes.

For Phase 1 this is acceptable, but worth addressing in a follow-up.
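The variable-length fence handling suggested here can be sketched in a few lines. This is a minimal sketch under the assumptions of the comment above, not Biome's lexer API: it simply counts the run of identical marker bytes and requires at least three, so a four-backtick fence can wrap a three-backtick block.

```rust
// Count the run of identical fence-marker bytes (` or ~) at the start
// of `input`. CommonMark requires at least three for an opening fence.
fn fence_length(input: &[u8], marker: u8) -> usize {
    input.iter().take_while(|&&b| b == marker).count()
}

fn is_opening_fence(input: &[u8], marker: u8) -> bool {
    fence_length(input, marker) >= 3
}

fn main() {
    assert_eq!(fence_length(b"````rust", b'`'), 4);
    assert!(is_opening_fence(b"```", b'`'));
    assert!(!is_opening_fence(b"``x", b'`'));
    // A 4-backtick opener nests a 3-backtick block: the closing fence
    // must be at least as long as the opener.
    assert!(fence_length(b"`````", b'`') >= fence_length(b"````", b'`'));
}
```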

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e75b98f and a7c2a67.

⛔ Files ignored due to path filters (7)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/header.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (2)
  • crates/biome_markdown_parser/src/lexer/mod.rs (5 hunks)
  • crates/biome_markdown_parser/src/lexer/tests.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
🧠 Learnings (18)
📓 Common learnings
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/tests/language.rs : Implement `TestFormatLanguage` trait in `tests/language.rs` for the formatter's test language

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:12.048Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_service/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:12.048Z
Learning: Applies to crates/biome_service/src/workspace/watcher.tests.rs : Implement watcher tests for workspace methods in watcher.tests.rs and end-to-end tests in LSP tests

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Applies to crates/biome_formatter/**/biome_*_formatter/tests/spec_tests.rs : Use the `tests_macros::gen_tests!` macro in `spec_tests.rs` to generate test functions for each specification file matching the pattern `tests/specs/<language>/**/*.<ext>`

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:05:27.810Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:27.810Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : When formatting AST nodes, use mandatory tokens from the AST instead of hardcoding token strings (e.g., use `node.l_paren_token().format()` instead of `token("(")`)

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-04T13:29:49.287Z
Learnt from: dyc3
Repo: biomejs/biome PR: 8291
File: crates/biome_html_formatter/tests/specs/prettier/vue/html-vue/elastic-header.html:10-10
Timestamp: 2025-12-04T13:29:49.287Z
Learning: Files under `crates/biome_html_formatter/tests/specs/prettier` are test fixtures synced from Prettier and should not receive detailed code quality reviews (e.g., HTTP vs HTTPS, formatting suggestions, etc.). These files are test data meant to validate formatter behavior and should be preserved as-is.

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:05:20.371Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:20.371Z
Learning: Create test infrastructure with `tests/specs` folder structure and `spec_test.rs`, `spec_tests.rs`, and `language.rs` files in test directories

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Avoid string allocations by comparing against `&str` or using `TokenText`

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-09T12:47:46.298Z
Learnt from: ematipico
Repo: biomejs/biome PR: 8031
File: crates/biome_html_parser/src/syntax/svelte.rs:140-147
Timestamp: 2025-11-09T12:47:46.298Z
Learning: In the Biome HTML parser, `expect` and `expect_with_context` consume the current token and then lex the next token. The context parameter in `expect_with_context` controls how the next token (after the consumed one) is lexed, not the current token being consumed. For example, in Svelte parsing, after `bump_with_context(T!["{:"], HtmlLexContext::Svelte)`, the next token is already lexed in the Svelte context, so `expect(T![else])` is sufficient unless the token after `else` also needs to be lexed in a specific context.

Applied to files:

  • crates/biome_markdown_parser/src/lexer/tests.rs
  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:05:27.810Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:27.810Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : For tokens that are not mandatory, use helper functions instead of hardcoding

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : A parser struct must implement the `Parser` trait and save the token source, parser context, and optional parser options

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
🧬 Code graph analysis (1)
crates/biome_markdown_parser/src/lexer/mod.rs (1)
crates/biome_unicode_table/src/lib.rs (1)
  • lookup_byte (106-109)
🔇 Additional comments (7)
crates/biome_markdown_parser/src/lexer/tests.rs (3)

364-392: LGTM!

The escape sequence tests correctly verify CommonMark escape handling: punctuation escapes are grouped with their backslash, while non-punctuation escapes leave the backslash as a standalone token.


232-240: Good coverage for textual grouping.

These tests directly verify that consecutive text characters are batched into single MD_TEXTUAL_LITERAL tokens rather than per-character nodes — precisely what was requested in the PR feedback regarding CST size.

Also applies to: 262-273, 299-308


143-164: LGTM!

Thematic break tests cover the key CommonMark variations including spaced markers, confirming that internal whitespace is included in the literal token.

crates/biome_markdown_parser/src/lexer/mod.rs (4)

200-251: LGTM!

The escape handling correctly implements CommonMark's backslash escape rules with the full set of escapable ASCII punctuation characters.


493-540: Core fix for CST bloat — well implemented.

This grouping logic directly addresses the reviewer concern about per-letter nodes. By batching consecutive non-special characters into single MD_TEXTUAL_LITERAL tokens, the CST size becomes proportional to the number of text runs rather than the number of characters.

The dispatch-based boundary detection is thorough and correctly handles multi-byte UTF-8 via advance_char_unchecked().


13-14: LGTM!

The dispatch-based token routing is clean and efficient. Using lookup_byte for character classification keeps the lexer fast while supporting the full Markdown token set.

Also applies to: 180-198


345-352: LGTM!

Good defensive check — IDT dispatch covers both letters and underscore, so explicitly filtering for b'_' prevents letters from being mishandled as emphasis markers.

Comment on lines 381 to 390
// Check for double emphasis markers (**, __, --)
if self.peek_byte() == Some(start_char) {
self.advance(2);
return match start_char {
b'*' => DOUBLE_STAR,
b'_' => DOUBLE_UNDERSCORE,
b'-' => MINUS, // No DOUBLE_MINUS in grammar, use MINUS
_ => unreachable!(),
};
}
Contributor


⚠️ Potential issue | 🟡 Minor

Potential inconsistency: -- returns MINUS with length 2.

When encountering --, this advances by 2 bytes but returns MINUS, which is typically a single-character token. This creates an inconsistency where the same token kind can have different lengths (1 or 2), which may confuse downstream parsing or source reconstruction.

Consider either:

  1. Adding a DOUBLE_MINUS token to the grammar, or
  2. Returning two separate MINUS tokens by only advancing 1 byte here.
🔎 Option 2: Return single MINUS for consistency
-        // Check for double emphasis markers (**, __, --)
-        if self.peek_byte() == Some(start_char) {
-            self.advance(2);
-            return match start_char {
-                b'*' => DOUBLE_STAR,
-                b'_' => DOUBLE_UNDERSCORE,
-                b'-' => MINUS, // No DOUBLE_MINUS in grammar, use MINUS
-                _ => unreachable!(),
-            };
-        }
+        // Check for double emphasis markers (**, __)
+        // Note: -- is not doubled; let it fall through to single MINUS
+        if self.peek_byte() == Some(start_char) && start_char != b'-' {
+            self.advance(2);
+            return match start_char {
+                b'*' => DOUBLE_STAR,
+                b'_' => DOUBLE_UNDERSCORE,
+                _ => unreachable!(),
+            };
+        }
🤖 Prompt for AI Agents
In crates/biome_markdown_parser/src/lexer/mod.rs around lines 381–390, the lexer
treats `--` by advancing 2 bytes but returns a single-character MINUS token,
causing token-length inconsistency; change the logic so that when start_char ==
b'-' you only advance(1) and return MINUS (while keeping advance(2) for `*` and
`_`), ensuring `--` will be tokenized as two separate MINUS tokens; update or
add lexer tests to cover `-` vs `--` cases and adjust any downstream assumptions
about MINUS token length if necessary.

@jfmcdowell
Contributor Author

Thank you @jfmcdowell

Please review the implementation of the parser. While I understand you used AI assistance, human intervention is always required.

Here's an example: check the emitted CST in the snapshots. For each letter inside a paragraph, we create a node. This implementation will create an excessively large CST for large documents. That's not acceptable for us and the users. Please review the parsing logic so that the emitted CST groups text into a single node.

Fixed the issue where each character in a paragraph created a separate MdTextual node. The lexer's
consume_textual() now groups consecutive non-special characters into a single MD_TEXTUAL_LITERAL token.
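The batching idea can be sketched as follows. This is a minimal illustration of the technique, not Biome's actual `consume_textual()`: the byte set treated as "special" and the function names are hypothetical.

```rust
// Sketch of textual batching: consecutive non-special bytes are consumed
// as one text token instead of one token per character.
fn is_special(b: u8) -> bool {
    matches!(b, b'*' | b'_' | b'`' | b'#' | b'[' | b']' | b'\\' | b'\n')
}

/// Returns the byte length of a single textual token starting at `pos`.
fn textual_len(src: &[u8], pos: usize) -> usize {
    src[pos..].iter().take_while(|&&b| !is_special(b)).count()
}

fn main() {
    let src = b"hello world*em*";
    // "hello world" becomes one 11-byte token, not 11 one-byte tokens;
    // the space stays inside the text run.
    assert_eq!(textual_len(src, 0), 11);
}
```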

Member

@ematipico ematipico left a comment


That's better, but I think we need to change things.

Compared to other languages, markdown is purely markup. This means that trivia don't exist, except in some rare cases.

The markdown lexer needs to treat whitespace and newlines as part of the language:

  • whitespace is part of the text
  • newlines become relevant during parsing

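The model proposed above can be sketched as a toy lexer. This is a hedged illustration of the suggested design, with hypothetical names, not Biome's implementation: whitespace stays inside text tokens, while newlines are emitted as standalone tokens for the parser to consume.

```rust
// Toy lexer for the proposed model: no trivia; whitespace lives inside
// Text tokens, and each newline becomes its own token.
#[derive(Debug, PartialEq)]
enum Tok<'a> {
    Text(&'a str),
    Newline,
}

fn lex(src: &str) -> Vec<Tok<'_>> {
    let mut out = Vec::new();
    for (i, line) in src.split('\n').enumerate() {
        if i > 0 {
            out.push(Tok::Newline);
        }
        if !line.is_empty() {
            out.push(Tok::Text(line));
        }
    }
    out
}

fn main() {
    let toks = lex("foo bar\nbaz");
    // "foo bar" keeps its interior space; the newline is a real token.
    assert_eq!(toks, vec![Tok::Text("foo bar"), Tok::Newline, Tok::Text("baz")]);
}
```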
@jfmcdowell jfmcdowell force-pushed the feat/markdown-support-next branch from a7c2a67 to 1102636 Compare December 21, 2025 12:04
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
crates/biome_markdown_parser/src/lexer/mod.rs (2)

211-243: Consider refactoring the escapable character match for maintainability.

The 33-line match against escapable ASCII punctuation is difficult to read and maintain. Consider extracting this into a helper method or using a lookup approach.

🔎 Example refactor using a helper function

Add a helper method:

/// Returns true if the byte is an ASCII punctuation character that can be escaped
/// per the CommonMark spec: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
#[inline]
fn is_escapable_ascii_punctuation(byte: u8) -> bool {
    matches!(
        byte,
        b'!' | b'"' | b'#' | b'$' | b'%' | b'&' | b'\'' 
        | b'(' | b')' | b'*' | b'+' | b',' | b'-' | b'.' | b'/'
        | b':' | b';' | b'<' | b'=' | b'>' | b'?' | b'@'
        | b'[' | b'\\' | b']' | b'^' | b'_' | b'`'
        | b'{' | b'|' | b'}' | b'~'
    )
}

Then simplify consume_escape:

     // Check if next character is escapable ASCII punctuation
-    if let Some(next) = self.current_byte()
-        && matches!(
-            next,
-            b'!' | b'"'
-                | b'#'
-                | b'$'
-                | b'%'
-                | b'&'
-                | b'\''
-                | b'('
-                | b')'
-                | b'*'
-                | b'+'
-                | b','
-                | b'-'
-                | b'.'
-                | b'/'
-                | b':'
-                | b';'
-                | b'<'
-                | b'='
-                | b'>'
-                | b'?'
-                | b'@'
-                | b'['
-                | b'\\'
-                | b']'
-                | b'^'
-                | b'_'
-                | b'`'
-                | b'{'
-                | b'|'
-                | b'}'
-                | b'~'
-        )
-    {
+    if let Some(next) = self.current_byte() && Self::is_escapable_ascii_punctuation(next) {
         // Consume the escaped character too
         self.advance(1);
     }

345-352: IDT dispatch handling could be more maintainable.

The IDT dispatch matches both letters (A-Z, a-z) and underscore, but only underscore is a Markdown emphasis marker. This special-case handling is duplicated in consume_textual (lines 526-531), making the pattern less obvious and potentially error-prone during future maintenance.

Consider adding a comment explaining why IDT requires special handling, or extracting the underscore check into a helper method for clarity.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a7c2a67 and 1102636.

⛔ Files ignored due to path filters (7)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/fenced_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/header.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/indent_code_block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/inline_elements.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/paragraph.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (2)
  • crates/biome_markdown_parser/src/lexer/mod.rs (5 hunks)
  • crates/biome_markdown_parser/src/lexer/tests.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/biome_markdown_parser/src/lexer/tests.rs
🧰 Additional context used
📓 Path-based instructions (1)
crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
🧠 Learnings (11)
📓 Common learnings
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-12T10:11:05.564Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:05:27.810Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:05:27.810Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : When formatting AST nodes, use mandatory tokens from the AST instead of hardcoding token strings (e.g., use `node.l_paren_token().format()` instead of `token("(")`)

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Avoid string allocations by comparing against `&str` or using `TokenText`

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
📚 Learning: 2025-11-09T12:47:46.298Z
Learnt from: ematipico
Repo: biomejs/biome PR: 8031
File: crates/biome_html_parser/src/syntax/svelte.rs:140-147
Timestamp: 2025-11-09T12:47:46.298Z
Learning: In the Biome HTML parser, `expect` and `expect_with_context` consume the current token and then lex the next token. The context parameter in `expect_with_context` controls how the next token (after the consumed one) is lexed, not the current token being consumed. For example, in Svelte parsing, after `bump_with_context(T!["{:"], HtmlLexContext::Svelte)`, the next token is already lexed in the Svelte context, so `expect(T![else])` is sufficient unless the token after `else` also needs to be lexed in a specific context.

Applied to files:

  • crates/biome_markdown_parser/src/lexer/mod.rs
🧬 Code graph analysis (1)
crates/biome_markdown_parser/src/lexer/mod.rs (1)
crates/biome_unicode_table/src/lib.rs (1)
  • lookup_byte (106-109)
🔇 Additional comments (7)
crates/biome_markdown_parser/src/lexer/mod.rs (7)

1-14: Documentation and imports updated appropriately.

The module documentation now accurately describes the Markdown lexer, and the import changes support the dispatch-based token handling.


180-198: Dispatch table correctly routes Markdown tokens.

The expanded dispatcher appropriately handles all Markdown syntax characters, routing them to dedicated consumption methods.


381-390: The -- token-length inconsistency has been resolved.

The past review concern about -- advancing 2 bytes whilst returning a single-character MINUS token has been addressed. Line 383 now explicitly excludes - from double-marker treatment, so each hyphen in -- is lexed separately as an individual MINUS token with consistent length.


493-540: Textual batching successfully addresses CST bloat concern.

This implementation groups consecutive textual characters into a single MD_TEXTUAL_LITERAL token, directly addressing the issue raised in the PR objectives where each letter was previously emitted as a separate node. The dispatch-based boundary detection correctly identifies when to stop batching, maintaining accurate parsing whilst improving efficiency.


258-262: Inconsistency with AI-generated summary.

The AI summary states "Removed the dead_code attribute from the eat_byte method", but the #[expect(dead_code)] attribute is still present at line 258. If this method is now used, the attribute should be removed; otherwise, the summary is incorrect.


409-422: Convert comment to proper rustdoc format.

The comment at line 409 should use /// for rustdoc instead of // to be included in generated documentation.

As per coding guidelines, update inline rustdoc documentation when adding new features.

⛔ Skipped due to learnings
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets
Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2025-12-12T10:11:05.564Z
Learning: Applies to crates/**/*.rs : Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Lines prefixed with `#` in rule documentation code examples will be hidden from output
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks should be ordered as language, expect_diagnostic, options/full_options/use_options, ignore, file
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : The first paragraph of rule documentation must be a single line describing what the rule does
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/*.ungram : Add a legend comment to `.ungram` files explaining the grammar syntax including comments, non-terminals, tokens, sequences, alternations, repetitions, and optional elements
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Use `options` code block property for rule-specific configuration snippets in documentation
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation code blocks must have a language defined for syntax highlighting
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Invalid code examples in rule documentation must be marked with `expect_diagnostic` code block property
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Rule documentation must include `## Options` section if the rule has options

493-494: Convert comment to proper rustdoc format.

The comments at lines 493-494 should use /// for rustdoc instead of // to be included in generated documentation.

As per coding guidelines, update inline rustdoc documentation when adding new features.

⛔ Skipped due to learnings

Comment on lines 200 to 202
```rust
/// Consume a backslash escape sequence.
/// Per CommonMark spec, a backslash before ASCII punctuation makes it literal.
/// Escapable: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
```
Contributor

🛠️ Refactor suggestion | 🟠 Major

Add rustdoc documentation.

This method lacks documentation. Per coding guidelines, rustdoc should be added when introducing new features. Consider documenting the CommonMark escape behaviour and which characters are escapable.

As per coding guidelines, update inline rustdoc documentation when adding new features.


Comment on lines 337 to 339
```rust
/// Consumes thematic break literal or returns emphasis marker tokens.
/// Called when we see *, -, or _.
fn consume_thematic_break_or_emphasis(&mut self, dispatched: Dispatch) -> MarkdownSyntaxKind {
```
Contributor

🛠️ Refactor suggestion | 🟠 Major

Add rustdoc documentation.

This method lacks documentation explaining when it returns a thematic break versus emphasis tokens, and the special handling for the IDT dispatch case.

As per coding guidelines, update inline rustdoc documentation when adding new features.
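For context, the distinction this comment asks to document can be sketched simply (an illustrative simplification of CommonMark §4.1, not the PR's lexer code; it ignores the leading-indentation limit): a line consisting solely of three or more identical `*`, `-`, or `_` markers, optionally interleaved with spaces or tabs, is a thematic break; otherwise the markers are emphasis candidates.

```rust
// Illustrative only: simplified CommonMark §4.1 thematic break test.
// Ignores the "at most 3 spaces of indentation" rule from the spec.
fn is_thematic_break(line: &str) -> bool {
    let mut marker = None;
    let mut count = 0;
    for c in line.chars() {
        match c {
            ' ' | '\t' => {} // interior whitespace is allowed
            '*' | '-' | '_' => {
                // all markers must be the same character
                if *marker.get_or_insert(c) != c {
                    return false;
                }
                count += 1;
            }
            _ => return false, // any other character disqualifies the line
        }
    }
    count >= 3
}

fn main() {
    assert!(is_thematic_break("***"));
    assert!(is_thematic_break("- - -"));
    assert!(!is_thematic_break("**strong**"));
    assert!(!is_thematic_break("--"));
}
```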

Comment on lines 402 to 407
```rust
/// Consume a single hash character for ATX headers
fn consume_hash(&mut self) -> MarkdownSyntaxKind {
    self.assert_at_char_boundary();
    self.advance(1);
    HASH
}
```
Contributor

🛠️ Refactor suggestion | 🟠 Major

Add rustdoc documentation.

Brief documentation explaining that this consumes a single hash character for ATX headers would improve code clarity.

As per coding guidelines, update inline rustdoc documentation when adding new features.


Comment on lines 424 to 437
```rust
/// Consume tilde(s) - either single for other uses or triple for fenced code blocks
fn consume_tilde(&mut self) -> MarkdownSyntaxKind {
    self.assert_at_char_boundary();

    // Check for triple tilde
    if self.peek_byte() == Some(b'~') && self.byte_at(2) == Some(b'~') {
        self.advance(3);
        return TRIPLE_TILDE;
    }

    // Single tilde
    self.advance(1);
    TILDE
}
```
Contributor

🛠️ Refactor suggestion | 🟠 Major

Convert comment to proper rustdoc format.

The comment at line 424 should use /// for rustdoc instead of // to be included in generated documentation.

As per coding guidelines, update inline rustdoc documentation when adding new features.


@jfmcdowell jfmcdowell requested a review from ematipico December 21, 2025 16:40
@jfmcdowell jfmcdowell force-pushed the feat/markdown-support-next branch from 7d3d2b9 to 2634098 Compare December 30, 2025 01:36
@codspeed-hq

codspeed-hq bot commented Dec 30, 2025

Merging this PR will not alter performance

✅ 58 untouched benchmarks
⏩ 95 skipped benchmarks¹


Comparing jfmcdowell:feat/markdown-support-next (eae8e0f) with next (375792e)

Open in CodSpeed

Footnotes

  1. 95 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@jfmcdowell
Contributor Author

@ematipico , holding here to make sure this is directionally correct before finishing this out.

Comment on lines 47 to 48
````rust
let is_backtick_fence = rest.starts_with("```");
let is_tilde_fence = rest.starts_with("~~~");
````
Contributor

Little-known feature about markdown: you can use more than 3 backticks to start a code block, so you can have 3 backticks inside the code block.

From "4.5 Fenced code blocks" (emphasis my own):

A code fence is a sequence of at least three consecutive backtick characters (`) or tildes (~).

Not sure if you want to resolve that in this PR or a new one.

Contributor Author

Thanks for flagging this. The implementation does handle 3+ backticks correctly, but looking at the code again I can see why it wasn't clear.

You can verify the behavior with CommonMark spec examples 124-127:
Edit debug_single_example() to set example_num = 124, then run:

cargo test -p biome_markdown_parser --test commonmark_spec debug_single_example -- --nocapture --ignored

I've refactored the code to make the rule self-documenting. The previous starts_with("```") approach worked but wasn't obvious at a glance. Now it explicitly counts fence characters and validates against a MIN_FENCE_LENGTH constant with CommonMark 4.5 cited in the docs:

```rust
/// Minimum number of fence characters required per CommonMark 4.5.
const MIN_FENCE_LENGTH: usize = 3;

/// Per CommonMark 4.5: "A code fence is a sequence of at least three
/// consecutive backtick characters (`) or tildes (~)."
fn detect_fence(s: &str) -> Option<(char, usize)> { ... }
```

Appreciate the review — this is much clearer now. Commit: fb5b4ef
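For reference, the counting rule is small enough to sketch in full. This is a hypothetical standalone version of the idea; the PR's actual body may differ:

```rust
/// Minimum number of fence characters required per CommonMark 4.5.
const MIN_FENCE_LENGTH: usize = 3;

/// Hypothetical sketch: return the fence character and run length if the
/// string starts with at least MIN_FENCE_LENGTH backticks or tildes.
fn detect_fence(s: &str) -> Option<(char, usize)> {
    let first = s.chars().next()?;
    if first != '`' && first != '~' {
        return None;
    }
    // Count the run of identical fence characters at the start.
    let len = s.chars().take_while(|&c| c == first).count();
    (len >= MIN_FENCE_LENGTH).then_some((first, len))
}

fn main() {
    assert_eq!(detect_fence("~~~"), Some(('~', 3)));
    assert_eq!(detect_fence("~~~~ info"), Some(('~', 4)));
    assert_eq!(detect_fence("~~x"), None); // only two fence characters
}
```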

Comment on lines 106 to 121
```rust
/// Parse a list of hash tokens as MdHashList containing MdHash nodes.
/// Returns the number of hashes parsed.
fn parse_hash_list(p: &mut MarkdownParser) -> usize {
    let m = p.start();
    let mut count = 0;

    while p.at(T![#]) {
        let hash_m = p.start();
        p.bump(T![#]);
        hash_m.complete(p, MD_HASH);
        count += 1;
    }

    m.complete(p, MD_HASH_LIST);
    count
}
```
Contributor

This feels like it would be better handled in the lexer. The way it's currently set up, we are going to be allocating a bunch of nodes for a header (for an h6, it would be 6 MdHash + 1 MdHashList node). Plus, this implementation looks like it could allow whitespace in between the hashes (which is not allowed).

Instead, it would be way easier to emit one token for the entire header, and then consumers can tell which level the header is by checking its length.

BTW, this is not how we parse lists in our parsers. We usually have a struct that implements the ParseNodeList trait.

Contributor Author

@jfmcdowell jfmcdowell Jan 19, 2026

Thanks for the feedback. I've reworked the header parsing:

Node allocations: Reduced from N+1 nodes to just 2 nodes (MdHash + MdHashList) regardless of heading level.
The lexer already emits consecutive # as a single token - now we use p.cur_text().len() to get the heading level.

Whitespace between hashes: Not possible - the lexer only groups consecutive # into one token. Whitespace breaks the sequence, so # # # foo correctly becomes <h1># # foo</h1>.

Commit: bf3a261
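With the consolidated token, the heading level falls out of the token text's length. A hypothetical sketch of that check (`heading_level` is illustrative, not the parser's actual helper):

```rust
// Illustrative helper: derive an ATX heading level from the consolidated
// HASH token's text. CommonMark allows levels 1-6; longer runs are not headings.
fn heading_level(hash_token_text: &str) -> Option<u8> {
    let n = hash_token_text.len();
    (1..=6).contains(&n).then_some(n as u8)
}

fn main() {
    assert_eq!(heading_level("#"), Some(1));
    assert_eq!(heading_level("######"), Some(6));
    assert_eq!(heading_level("#######"), None); // 7 hashes: paragraph text
}
```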

Comment on lines 1 to 11
```rust
//! HTML block parsing for Markdown (CommonMark §4.6).
//!
//! Per CommonMark §4.6, there are 7 types of HTML blocks:
//!
//! 1. `<script`, `<pre`, `<style`, `<textarea` - ends at closing tag
//! 2. `<!--` - ends at `-->`
//! 3. `<?` - ends at `?>`
//! 4. `<!` + uppercase letter - ends at `>`
//! 5. `<![CDATA[` - ends at `]]>`
//! 6. Block-level HTML tags (div, p, table, etc.) - ends at blank line
//! 7. Other tags - ends at blank line
```
Contributor

Not entirely sure how I feel about this. On one hand, it is literally defined like this in the commonmark spec, but on the other hand, we already have an html parser.

Member

I believe we should treat the HTML as a special node, but then everything as text. Then we can use snippets capabilities via workspace. Essentially, same logic we have for <script> and <style> contents in our HTML parser.

Contributor Author

@jfmcdowell jfmcdowell Jan 19, 2026

Thanks for the feedback. I simplified HTML block parsing per your suggestion. HTML content is now captured as raw text rather than implementing all 7 CommonMark 4.6 termination rules. Full HTML parsing can be added later via workspace snippets integration.

Trade-off: All HTML blocks terminate at blank lines (~70% CommonMark HTML blocks compliance).

Commit: cb3888d
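Under the simplified rule, block termination reduces to a blank-line scan. A rough sketch of that behavior (illustrative only, not the PR's code):

```rust
// Illustrative only: with the simplified rule, an HTML block beginning at
// `start` is raw text that ends at the first blank line (or end of input).
fn html_block_end(lines: &[&str], start: usize) -> usize {
    lines[start..]
        .iter()
        .position(|line| line.trim().is_empty())
        .map_or(lines.len(), |offset| start + offset)
}

fn main() {
    let doc = ["<div>", "  <p>hi</p>", "", "a paragraph"];
    assert_eq!(html_block_end(&doc, 0), 2); // block covers lines 0 and 1
    assert_eq!(html_block_end(&doc, 3), 4); // no blank line: runs to the end
}
```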

Comment on lines 118 to 123
```rust
fn parse_quote_block_list(p: &mut MarkdownParser) {
    let m = p.start();
    let mut first_line = true;
    let depth = p.state().block_quote_depth;
    let mut last_block_was_paragraph = false;
```

Contributor

This should be a ParseNodeList impl

Contributor Author

@jfmcdowell jfmcdowell Jan 20, 2026

Thanks for the feedback. I refactored block parsing to return ParsedSyntax instead of ParsedBlockKind enum, enabling proper ParseNodeList implementation for QuoteBlockList. Commit: a3af0c2

I also extended the pattern to root-level document parsing with DocumentBlockList. Commit: 753adcc

The markdown parser now uses ParseNodeList consistently for all major list structures:

  • DocumentBlockList (root document blocks)
  • QuoteBlockList (block quote content)
  • BulletList / OrderedList (list items)

Comment on lines +407 to +414
```rust
/// Check if text at current position matches a valid entity reference pattern.
/// Returns the length of the entity if valid, None otherwise.
///
/// Patterns per CommonMark §6.2:
/// - Named: `&name;` where name is 2-31 alphanumeric chars starting with letter
/// - Decimal: `&#digits;` where digits is 1-7 decimal digits
/// - Hex: `&#xhex;` or `&#Xhex;` where hex is 1-6 hex digits
fn match_entity_reference(&self) -> Option<usize> {
```
Contributor

Not sure if we care about actually parsing these? We don't care about them in the HTML parser, we just treat them as regular text.

Contributor Author

@jfmcdowell jfmcdowell Jan 20, 2026

Good point, they were added for CommonMark 6.2 compliance but happy to remove if treating them as plain text is the preferred approach. Would simplify the lexer.
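For what it's worth, the three patterns from the doc comment can be matched without a regex. A hypothetical standalone sketch (`match_entity` is illustrative, not the lexer's actual method):

```rust
/// Illustrative sketch of the §6.2 patterns described in the doc comment:
/// named `&name;` (2-31 alphanumerics, starting with a letter),
/// decimal `&#digits;` (1-7 digits), hex `&#xhex;`/`&#Xhex;` (1-6 hex digits).
/// Returns the total matched length, including `&` and `;`.
fn match_entity(s: &str) -> Option<usize> {
    let rest = s.strip_prefix('&')?;
    if let Some(num) = rest.strip_prefix('#') {
        let (is_hex, digits) = match num.strip_prefix(|c| c == 'x' || c == 'X') {
            Some(hex) => (true, hex),
            None => (false, num),
        };
        let n = digits
            .chars()
            .take_while(|c| if is_hex { c.is_ascii_hexdigit() } else { c.is_ascii_digit() })
            .count();
        let max = if is_hex { 6 } else { 7 };
        if (1..=max).contains(&n) && digits[n..].starts_with(';') {
            // '&' + '#' + optional 'x'/'X' + digits + ';'
            return Some(2 + usize::from(is_hex) + n + 1);
        }
        return None;
    }
    if !rest.starts_with(|c: char| c.is_ascii_alphabetic()) {
        return None;
    }
    let n = rest.chars().take_while(|c| c.is_ascii_alphanumeric()).count();
    ((2..=31).contains(&n) && rest[n..].starts_with(';')).then_some(1 + n + 1)
}

fn main() {
    assert_eq!(match_entity("&amp; rest"), Some(5));
    assert_eq!(match_entity("&#35;"), Some(5));
    assert_eq!(match_entity("&#x22;"), Some(6));
    assert_eq!(match_entity("&notanentity"), None);
}
```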

Member

@ematipico ematipico left a comment

Great, thank you. I think we can merge it, however I want to understand if you @jfmcdowell are willing to continue to help or not

````
unclosed_code_span.md:1:10 parse ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

× Unclosed code span, expected closing ```.
````
Member

This is incorrect. The suggestion should say that only one back tick should be required

Contributor Author

Fixed in commit 5992b2e. The format strings were wrapping {backticks} with literal backticks, adding 2 extra characters. Now correctly shows the actual number of opening backticks in the error message.


```
× Unclosed link, expected `]` to close link text.

> 1 │ This has [unclosed link
```
Member

nit: ideally the range should span the text too

Contributor Author

Fixed in c1c1fe1. The error range now spans the entire unclosed content (e.g., [unclosed link or ![unclosed image) instead of just the opening bracket.


```
× Unclosed image, expected `]` to close alt text.

> 1 │ This has ![unclosed image
```
Member

nit: same here regarding the span

Contributor Author

Also fixed in c1c1fe1. The error range now spans the entire unclosed content (e.g., [unclosed link or ![unclosed image) instead of just the opening bracket.

@@ -0,0 +1,32 @@
use biome_markdown_parser::parse_markdown;
Member

Conformance tests should go inside the proper folder https://github.com/biomejs/biome/tree/main/xtask/coverage/src

Not a blocker, but we should address it

Contributor Author

Acknowledged. I'd like to address this in a separate PR.

@ematipico
Member

ematipico commented Jan 21, 2026

One more thing.

Please create a just task to run the conformance, and rebase your branch so that CI passes

@jfmcdowell
Contributor Author

> Great, thank you. I think we can merge it, however I want to understand if you @jfmcdowell are willing to continue to help or not

Absolutely! Happy to keep contributing where I can.

jfmcdowell and others added 6 commits January 22, 2026 20:48
…harness

CommonMark 0.31.2 parser achieving 75.2% spec compliance (490/652 examples).

Block elements:
- ATX/setext headings, fenced/indented code blocks, HTML blocks
- Block quotes with lazy continuation, thematic breaks
- Ordered/unordered lists with nesting and tightness detection
- Link reference definitions

Inline elements:
- Emphasis via delimiter stack algorithm (§6.4)
- Links, images, autolinks, code spans
- Entity references, backslash escapes, inline HTML
- Hard/soft line breaks

Testing:
- CommonMark spec test harness with HTML comparison
- Feature-gated to_html utility, 59 edge-case tests
Add unconditional no-progress recovery in block parsing and replace unsafe unreachable_unchecked() with safe unreachable!().

Introduce an upfront oversized-input guard in the lexer and make max nesting depth configurable via parse options.

Update list/quote handling and related snapshots to match the new recovery behavior and configurable limits.
…menting

Add explicit `detect_fence()` helper with `MIN_FENCE_LENGTH` constant
that clearly documents the CommonMark §4.5 requirement: "A code fence
is a sequence of at least three consecutive backtick or tilde characters."

The previous `starts_with("```")` approach worked correctly for 3+ backticks
but wasn't self-documenting. The new implementation explicitly counts
consecutive fence characters and validates against the minimum length.

Also improve the spec test harness to gracefully handle bogus nodes
instead of panicking.
…uote grammar

Address maintainer feedback about header parsing allocating too many nodes.
Previously, `###### Heading 6` would create 7 nodes (6 MdHash + 1 MdHashList).
Now it creates only 2 nodes (1 MdHash + 1 MdHashList).

Changes:
- Lexer: emit all consecutive `#` characters as a single HASH token
- Parser: determine heading level from token text length instead of counting nodes
- Grammar: fix MdQuote to use `content: MdBlockList` (matches MdBullet pattern)
- Update helper functions to work with consolidated tokens

The grammar fix restores CommonMark compliance from 69.3% back to 75.2%,
with Block quotes returning to 100% (25/25) and HTML blocks at 100% (44/44).
Capture HTML content as raw text rather than implementing all 7 CommonMark
§4.6 termination rules. This keeps the parser simple; full HTML parsing can
be added via workspace snippets integration in the future.

Trade-off: All HTML blocks terminate at blank lines, so not 100% CommonMark
§4.6 compliant (types 1-5 have specific terminators like -->, ?>, etc.).
Block parsing functions now return ParsedSyntax instead of ParsedBlockKind
enum, enabling proper ParseNodeList trait implementation for QuoteBlockList.

- parse_paragraph, parse_indent_code_block return ParsedSyntax
- parse_any_block_with_indent_code_policy returns ParsedSyntax
- Added is_paragraph_like() helper for lazy continuation logic
- Updated call sites in quote.rs and list.rs
Root-level document block parsing now uses ParseNodeList trait instead of
manual while loop, aligning with project conventions used by QuoteBlockList,
BulletList, and OrderedList.
File size limits are already enforced outside of parsers, so the
lexer-level MAX_MARKDOWN_SOURCE_SIZE check would never be hit.
The error message for unclosed inline code spans was incorrectly showing
triple backticks (```) instead of the actual number of opening backticks.

For input like `This has `unclosed code`, the error said "expected closing
```" but should say "expected closing `".

The bug was in the format strings in `unclosed_code_span()` which wrapped
the `{backticks}` variable with literal backticks, adding 2 extra characters.
Since the variable already contained the correct backticks, this resulted in
3 backticks when only 1 was expected.

Also includes justfile fix from PR biomejs#8789 to remove errant `run` argument
from test commands.
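The off-by-two described above is easy to reproduce in isolation (a minimal reconstruction of the bug, not the actual diagnostic code):

```rust
// Minimal reconstruction of the bug: `backticks` already holds the opening
// delimiter, so wrapping the placeholder in literal backticks adds two more.
fn main() {
    let backticks = "`"; // a single opening backtick was lexed
    let buggy = format!("expected closing `{backticks}`.");
    let fixed = format!("expected closing {backticks}.");
    assert_eq!(buggy, "expected closing ```.");
    assert_eq!(fixed, "expected closing `.");
}
```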
…ge text

The error diagnostic for unclosed links and images now spans the entire
unclosed content (e.g., `[unclosed link`) instead of just pointing at the
opening bracket (`[`).

This provides better context by highlighting exactly what content is
missing its closing delimiter, making the error easier to understand
and fix.

Affects:
- Unclosed links: `[text` now highlights `[text` instead of just `[`
- Unclosed images: `![alt` now highlights `![alt` instead of just `![`
- Unclosed reference labels in both links and images
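The range widening can be sketched like this (a hypothetical helper, not the PR's exact code): the diagnostic span starts at the opening bracket and extends to the end of the line rather than covering only the bracket.

```rust
// Hypothetical helper: widen an unclosed-link diagnostic from the opening
// bracket alone to the whole unclosed content, stopping at the line end.
fn unclosed_range(src: &str, open: usize) -> (usize, usize) {
    let end = src[open..].find('\n').map_or(src.len(), |i| open + i);
    (open, end)
}

fn main() {
    let src = "This has [unclosed link";
    let (start, end) = unclosed_range(src, 9);
    assert_eq!(&src[start..end], "[unclosed link");
}
```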
Adds a just task to run CommonMark spec conformance tests for the
markdown parser. Currently runs 652 spec examples with ~69.8% compliance.

Usage: just test-markdown-conformance
@jfmcdowell jfmcdowell force-pushed the feat/markdown-support-next branch from c1c1fe1 to eae8e0f Compare January 23, 2026 01:50
@jfmcdowell
Contributor Author

> One more thing.
>
> Please create a just task to run the conformance, and rebase your branch so that CI passes

Added just test-markdown-conformance to run the CommonMark spec tests. This command will be updated when conformance tests move to xtask/coverage.

@jfmcdowell jfmcdowell requested a review from ematipico January 23, 2026 02:39
Member

@ematipico ematipico left a comment

We just made great steps towards markdown parsing, thank you @jfmcdowell !

Apologies for the slow reviews, and thank you for bearing with us.

@ematipico ematipico requested a review from dyc3 January 23, 2026 08:57

Labels

A-Parser Area: parser A-Tooling Area: internal tools

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants