fix(markdown_parser): recognize setext heading inside blockquote by jfmcdowell · Pull Request #9782 · biomejs/biome

jfmcdowell · 2026-04-03T00:02:13Z

Note

This PR was created with AI assistance (Claude Code).

Summary

Fixes setext heading detection inside blockquotes.

After consuming a blockquote prefix (`> `), the lexer no longer considered the following token to be at line start, so `---` was lexed as MINUS tokens instead of MD_THEMATIC_BREAK_LITERAL. As a result, input like:

> Foo
> ---

was parsed as a paragraph instead of a setext heading. Per CommonMark §5.1 and §4.3, blockquote content should still participate in setext heading parsing after the quote prefix is removed.
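For reference, the CommonMark behaviour the fix targets can be sketched line by line: strip the block quote marker from each line, then check whether the second line's remaining text is a setext underline. This is a minimal sketch under stated assumptions, not Biome's token-based implementation. `strip_quote_prefix`, `setext_level`, and `quoted_setext_heading` are hypothetical names, and tabs after `>`, lazy continuation lines, and multi-line heading content are deliberately ignored.

```rust
/// Hypothetical helper: strips a block quote marker (up to three spaces of
/// indentation, `>`, and one optional space). Returns `None` if not quoted.
/// Tabs after `>` are not handled in this sketch.
fn strip_quote_prefix(line: &str) -> Option<&str> {
    let indent = line.len() - line.trim_start_matches(' ').len();
    if indent > 3 {
        return None;
    }
    let rest = line[indent..].strip_prefix('>')?;
    Some(rest.strip_prefix(' ').unwrap_or(rest))
}

/// Hypothetical helper: returns 1 for a `=` underline, 2 for a `-` underline.
/// An underline is up to three spaces of indentation, a run of one repeated
/// character, then optional trailing spaces or tabs.
fn setext_level(line: &str) -> Option<u8> {
    let indent = line.len() - line.trim_start_matches(' ').len();
    if indent > 3 {
        return None;
    }
    let body = line[indent..].trim_end_matches(|c: char| c == ' ' || c == '\t');
    let first = body.chars().next()?;
    // Every character must equal the first one; `- -` is not an underline.
    if (first != '=' && first != '-') || !body.chars().all(|c| c == first) {
        return None;
    }
    Some(if first == '=' { 1 } else { 2 })
}

/// Returns `(level, text)` when two consecutive quoted lines form a setext
/// heading once their `>` prefixes are removed.
fn quoted_setext_heading(first: &str, second: &str) -> Option<(u8, String)> {
    let text = strip_quote_prefix(first)?.trim();
    if text.is_empty() {
        return None;
    }
    let level = setext_level(strip_quote_prefix(second)?)?;
    Some((level, text.to_string()))
}
```

With this sketch, `> Foo` followed by `> ---` yields a level-2 heading with text `Foo`, matching the expected parse described above.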

This adds force_relex_at_line_start to re-lex the current token as if it were at line start, and uses it in the blockquote/setext detection path.
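The re-lex idea can be illustrated with a toy single-line lexer. This is a hypothetical sketch, not the actual `BufferedLexer::force_relex_at_line_start`: it assumes the lexer tracks a line-start flag and that re-lexing means rewinding to the current token's start with that flag set. `ToyLexer`, `Kind`, and `is_dash_break` are illustrative names only.

```rust
/// Token kinds for this toy lexer (hypothetical, not Biome's syntax kinds).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    QuoteMarker,
    Minus,
    ThematicBreak,
    Text,
    Eof,
}

/// A single-line toy lexer with a line-start flag.
struct ToyLexer<'a> {
    src: &'a str,
    pos: usize,
    after_line_break: bool,
}

impl<'a> ToyLexer<'a> {
    fn new(src: &'a str) -> Self {
        Self { src, pos: 0, after_line_break: true }
    }

    /// Lexes one token and returns its kind and start offset.
    fn next_token(&mut self) -> (Kind, usize) {
        let start = self.pos;
        // Only the first token after a line break sees line-start semantics.
        let at_line_start = std::mem::replace(&mut self.after_line_break, false);
        let rest = &self.src[start..];
        let c = match rest.chars().next() {
            Some(c) => c,
            None => return (Kind::Eof, start),
        };
        // Line-start-gated token: only produced when the flag was set.
        if at_line_start && is_dash_break(rest) {
            self.pos = self.src.len();
            return (Kind::ThematicBreak, start);
        }
        self.pos += c.len_utf8();
        let kind = match c {
            '>' => {
                // Swallow one optional space after the marker.
                if self.src[self.pos..].starts_with(' ') {
                    self.pos += 1;
                }
                Kind::QuoteMarker
            }
            '-' => Kind::Minus,
            _ => Kind::Text,
        };
        (kind, start)
    }

    /// Rewinds to `token_start` and re-lexes as if a line break preceded it.
    fn force_relex_at_line_start(&mut self, token_start: usize) -> (Kind, usize) {
        self.pos = token_start;
        self.after_line_break = true;
        self.next_token()
    }
}

/// Simplified check: three or more `-` and nothing else (ignores `*`, `_`,
/// internal spaces and indentation for brevity).
fn is_dash_break(rest: &str) -> bool {
    let t = rest.trim_end();
    t.len() >= 3 && t.bytes().all(|b| b == b'-')
}
```

In the sketch, lexing `> ---` normally yields a quote marker and then `Minus` at offset 2; calling `force_relex_at_line_start(2)` turns that into a single `ThematicBreak`. As the review discussion notes, re-lexing unconditionally also changes how ordinary quoted text is tokenized, which is why the PR applies it only at specific call sites.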

Test Plan

New lexer unit test: force_relex_at_line_start_produces_thematic_break
New fixture: setext_heading_in_blockquote.md
just test-crate biome_markdown_parser
just test-markdown-conformance
just f
just l

Docs

N/A

changeset-bot · 2026-04-03T00:02:17Z

⚠️ No Changeset found

Latest commit: e9eca12

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


codspeed-hq · 2026-04-03T00:09:11Z

Merging this PR will not alter performance

✅ 58 untouched benchmarks
⏩ 196 skipped benchmarks¹

Comparing jfmcdowell:fix/md-setext-heading-in-blockquote (e9eca12) with main (1d09f0f)

¹ 196 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.

coderabbitai · 2026-04-03T00:38:46Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


Walkthrough

This change adds a "re-lex as if at line start" capability used by the markdown lexer and parser. It implements BufferedLexer::force_relex_at_line_start, exposes MarkdownTokenSource::force_relex_at_line_start and MarkdownParser::force_relex_at_line_start, and invokes that re-lexing at specific points after consuming > quote prefixes so line-start‑gated tokens (for example MD_THEMATIC_BREAK_LITERAL / setext underlines) are recognised inside blockquotes. Tests and fixtures for setext headings and thematic breaks in blockquotes were added.

Possibly related PRs

refactor(markdown): cleanup nits from #9746 #9751 — Modifies parse_quote in crates/biome_markdown_parser/src/syntax/quote.rs; strong code-level overlap around quote-prefix handling and marker consumption.

Suggested reviewers

dyc3
ematipico

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: fixing setext heading detection inside blockquotes, which is the core problem addressed by the PR.
Description check	✅ Passed	The description clearly explains the problem (setext headings not recognised in blockquotes due to lexer re-lexing), the solution (force_relex_at_line_start), and provides test coverage details.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/src/syntax/mod.rs`:
- Around line 871-875: The quote-entry path in parse_quote_block_list (after
emit_quote_prefix_node / after consume_quote_prefix) does not call
p.force_relex_at_line_start(), so a quoted line like "> ---" is tokenized as
MINUS tokens instead of MD_THEMATIC_BREAK_LITERAL; add a call to
p.force_relex_at_line_start() immediately after
emit_quote_prefix_node()/consume_quote_prefix in parse_quote_block_list (or in
the first-block dispatch that handles the entry path) so the
paragraph/thematic-break lexer runs at line start, and add a unit test asserting
that a standalone blockquote thematic break (e.g., "> ---") is parsed as a
thematic break inside the blockquote to prevent regressions.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: aff0ab95-7158-4eab-a64f-ca5c0b4a4555

📥 Commits

Reviewing files that changed from the base of the PR and between b22f31a and ad20351.

⛔ Files ignored due to path filters (2)

crates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading_edge_cases.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading_in_blockquote.md.snap is excluded by !**/*.snap and included by **

📒 Files selected for processing (7)

crates/biome_markdown_parser/src/lexer/tests.rs
crates/biome_markdown_parser/src/parser.rs
crates/biome_markdown_parser/src/syntax/mod.rs
crates/biome_markdown_parser/src/token_source.rs
crates/biome_markdown_parser/tests/md_test_suite/ok/setext_heading_in_blockquote.md
crates/biome_markdown_parser/tests/spec_test.rs
crates/biome_parser/src/lexer.rs

ematipico · 2026-04-03T05:43:09Z

+        || (p.at(MD_TEXTUAL_LITERAL)
+            && p.cur_text()
+                .chars()
+                .all(|c| c == ' ' || c == '\t' || c == '-' || c == '*' || c == '_'));


Let's use the lookup table for faster access to bytes.

Also, I believe this logic is incorrect: we're checking a union of characters, which means text like _*- passes the all() check, which I don't believe is correct.
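To make the reviewer's point concrete: a per-character union check accepts any mix of `-`, `*`, and `_`, and also accepts whitespace-only text and the empty string (since `all()` on an empty iterator returns `true`). A thematic break instead needs three or more of the *same* marker, optionally separated by spaces or tabs, after at most three spaces of indentation. The sketch below contrasts the two and uses a 256-entry byte-class table in the spirit of the suggested lookup table. `ByteClass`, `BYTE_CLASS`, `union_check`, and `is_thematic_break_line` are hypothetical and do not reflect the `biome_unicode_table` API.

```rust
/// Hypothetical byte classes for thematic-break scanning.
#[derive(Clone, Copy)]
enum ByteClass {
    Other,
    Space,
    Marker,
}

/// Builds a 256-entry table mapping each byte to its class at compile time.
const fn build_byte_class_table() -> [ByteClass; 256] {
    let mut table = [ByteClass::Other; 256];
    table[b' ' as usize] = ByteClass::Space;
    table[b'\t' as usize] = ByteClass::Space;
    table[b'-' as usize] = ByteClass::Marker;
    table[b'*' as usize] = ByteClass::Marker;
    table[b'_' as usize] = ByteClass::Marker;
    table
}

static BYTE_CLASS: [ByteClass; 256] = build_byte_class_table();

/// The shape of the check quoted in the review: every character belongs to
/// the union {space, tab, '-', '*', '_'}. Accepts mixes like "_*-".
fn union_check(text: &str) -> bool {
    text.chars()
        .all(|c| c == ' ' || c == '\t' || c == '-' || c == '*' || c == '_')
}

/// Thematic break per CommonMark: up to 3 spaces of indentation, then 3+ of
/// one marker character, optionally separated by spaces or tabs.
fn is_thematic_break_line(text: &str) -> bool {
    let bytes = text.as_bytes();
    let indent = bytes.iter().take_while(|&&b| b == b' ').count();
    if indent > 3 {
        return false;
    }
    let mut marker: Option<u8> = None;
    let mut count = 0;
    for &b in &bytes[indent..] {
        match BYTE_CLASS[b as usize] {
            ByteClass::Space => {}
            ByteClass::Marker => {
                // The first marker fixes the character; any different one fails.
                if *marker.get_or_insert(b) != b {
                    return false;
                }
                count += 1;
            }
            ByteClass::Other => return false,
        }
    }
    count >= 3
}
```

Here `union_check("_*-")` is `true` while `is_thematic_break_line("_*-")` is `false`; `- - -` and `***` are accepted by both.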

ematipico · 2026-04-03T05:46:24Z

+/// After consuming a quote prefix, selectively re-lex the current token as if
+/// it were at line start when the remaining line could form a thematic break.
+///
+/// Re-lexing unconditionally perturbs ordinary quoted text tokenization by
+/// splitting leading spaces into separate tokens. We only need line-start
+/// semantics here for thematic-break candidates like `> ---`.


While I appreciate the detailed technical comment, it doesn't actually explain the criteria we check for a thematic break.

I suggest rewording the docstring more concretely, or adding inline comments in the unusual parts of the code. For example, the all() usage looks odd to me and is probably wrong (I might be mistaken, but that's why I'm asking for a more down-to-earth comment).

jfmcdowell · 2026-04-03T13:53:09Z

@ematipico feedback addressed in c83efe6. After morning coffee, I can confirm all() was buggy; it has been replaced.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/src/syntax/quote.rs`:
- Line 100: The re-lex for thematic breaks after consuming a quote prefix
(currently in force_relex_thematic_break_after_quote_prefix(p)) misses cases
where parse_code_block_newline() consumes the quote prefix and returns parked at
the next line, so create a shared helper (e.g.,
mark_quote_prefix_consumed_and_relex(p)) and call it whenever any code path
consumes a quote prefix — replace direct calls to
force_relex_thematic_break_after_quote_prefix(p) and add a call from
parse_code_block_newline() (and the other spot referenced around 328) so the
next token is re-lexed and indented-code hand-off (e.g., `>     code\n> ---`)
runs through the same re-lex hook.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e9c205ae-4952-4b5e-b5dc-3d577f7a37d0

📥 Commits

Reviewing files that changed from the base of the PR and between c83efe6 and ce5b6a4.

📒 Files selected for processing (1)

crates/biome_markdown_parser/src/syntax/quote.rs

After consuming a blockquote prefix (`> `), the lexer's `after_newline` flag is false, so `---` is lexed as MINUS tokens instead of MD_THEMATIC_BREAK_LITERAL. This prevented setext heading detection inside blockquotes. Add `force_relex_at_line_start` to the buffered lexer which re-lexes the current token with `after_line_break = true`. Use it in `classify_quote_break_after_newline` (lookahead) and `break_for_quote_prefix_after_inline_newline` (parse path) so the lexer produces the correct block-level tokens after a quote prefix.


github-actions bot added A-Parser Area: parser L-Markdown Language: Markdown labels Apr 3, 2026

jfmcdowell marked this pull request as ready for review April 3, 2026 00:24

coderabbitai bot reviewed Apr 3, 2026

Comment thread crates/biome_markdown_parser/src/syntax/mod.rs

ematipico reviewed Apr 3, 2026

coderabbitai bot reviewed Apr 3, 2026

Comment thread crates/biome_markdown_parser/src/syntax/quote.rs Outdated

ematipico reviewed Apr 4, 2026

Comment thread crates/biome_markdown_parser/src/syntax/quote.rs Outdated

jfmcdowell and others added 7 commits April 5, 2026 19:42

fix(markdown_parser): parse quoted thematic breaks at line start

2df3196

fix(review): use lookup table, fix mixed-char bug in thematic break candidate, add tests

23de06d

[autofix.ci] apply automated fixes

e753161

fix(markdown): relex quoted thematic breaks after indented code

633b0c5

fix(markdown): stop quoted code before thematic break

71f073b

refactor: use dispatch table for thematic break char matching

e9eca12

Address review feedback: use `biome_unicode_table` dispatch variants (MIN, MUL, IDT) instead of raw byte literals for thematic break character matching in `is_thematic_break_candidate_text`.

jfmcdowell force-pushed the fix/md-setext-heading-in-blockquote branch from 831dca1 to e9eca12 Compare April 5, 2026 23:43

ematipico approved these changes Apr 6, 2026

ematipico merged commit b897832 into biomejs:main Apr 6, 2026
31 checks passed

jfmcdowell deleted the fix/md-setext-heading-in-blockquote branch April 6, 2026 13:48
