fix(markdown_parser): handle tab-separated container markers by jfmcdowell · Pull Request #9965 · biomejs/biome

jfmcdowell · 2026-04-13T14:53:08Z

Note

This PR was created with AI assistance (Codex).

Summary

Fixes the tab-separated container marker cases from the markdown fuzzer without changing thematic-break precedence. The lexer now splits post-marker whitespace only when tabs are involved, preserving ordinary list CST shape while letting the parser distinguish marker separator space from real content indent.

This fixes > > foo, - foo, and - foo, and preserves the correct CommonMark behavior for - --- while updating the stale test comments around that case.

Test Plan

cargo test -p biome_markdown_parser
just test-markdown-conformance
Added targeted regressions in spec_test.rs for the tab-separated quote/list cases

Docs

N/A

changeset-bot · 2026-04-13T14:53:14Z

⚠️ No Changeset found

Latest commit: 19c23a4

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

codspeed-hq · 2026-04-13T15:00:58Z

Merging this PR will not alter performance

✅ 28 untouched benchmarks
⏩ 228 skipped benchmarks¹

_{Comparing jfmcdowell:fix/md-tab-separated-container-markers (19c23a4) with main (9956f1d)}

228 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

coderabbitai · 2026-04-13T17:07:26Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b287425e-04e1-4204-8fcc-8daeaaa13518

📥 Commits

Reviewing files that changed from the base of the PR and between 24baa1e and 19c23a4.

⛔ Files ignored due to path filters (3)

crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote_tab_separated.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list_space_tab_space.md.snap is excluded by !**/*.snap and included by **
crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list_tab_separated.md.snap is excluded by !**/*.snap and included by **

📒 Files selected for processing (3)

crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote_tab_separated.md
crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list_space_tab_space.md
crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list_tab_separated.md

✅ Files skipped from review due to trivial changes (3)

crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list_tab_separated.md
crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list_space_tab_space.md
crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote_tab_separated.md

Walkthrough

This PR refines Markdown list and blockquote parsing by adding lexer-level whitespace controls and tighter ordered-list marker validation. It introduces a MAX_ORDERED_LIST_MARKER_DIGITS cap, is_space_or_tab_byte and is_ascii_digit_byte helpers, and is_in_list_marker_whitespace to detect whitespace immediately after top-level list markers. The lexer now emits single-space/tab textual-literal tokens in that region. Parser changes conditionally preserve or remap tab post-marker whitespace for lists and quotes and constrain first-line indent consumption for indented-code detection. New tests cover tab-separated nested blockquotes and mixed space/tab list markers.

Possibly related PRs

fix(markdown): preserve nested list indent tokens #9717: Modifies list-marker whitespace classification and indentation handling across lexer and parser layers.
test(markdown): add regression tests for mixed list markers across blank lines #9732: Changes parse_bullet / parse_ordered_bullet and post-marker whitespace handling in syntax/list.rs.
fix(markdown_parser): incorrect tight/loose list classification at marker boundaries #9787: Overlaps list parsing updates in syntax/list.rs, including post-marker tab handling and indentation behaviour.

Suggested reviewers

ematipico

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly describes the primary change: fixing how the markdown parser handles tabs as separators in container markers (blockquotes and lists).
Description check	✅ Passed	The description clearly relates to the changeset, explaining the fix for tab-separated container markers, the specific cases addressed, test coverage, and preserved CommonMark behaviour.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

jfmcdowell · 2026-04-17T01:06:55Z

Quick triage pass: let's keep this review pinned to the listed reproducer cases.

The PR description is well-scoped, but the main unblock for review is proof that the lexer-side change is contained to those tab-after-marker behaviors. Most of the churn is in crates/biome_markdown_parser/src/lexer/mod.rs, so I want the testcase evidence to be very explicit before deeper review.

Could you either point to the exact spec_test.rs / conformance cases or paste the expected assertions for:

>\t>\tfoo

-\tfoo

- \t foo

the preserved CommonMark/thematic-break behavior for - ---

one control case showing ordinary space-separated > foo / - foo is unchanged

one case showing a tab that should remain real content indent still does so

If those control cases already exist in the current conformance suite, linking the case names is enough.

Also, if the missing changeset is intentional for this parser fix, please confirm that explicitly.

Head SHA: 24baa1e

Thanks for the review. On 24baa1e here are the requested pointers:

>\t>\tfoo: example 99923 (spec_test.rs:278)
-\tfoo: example 10014 (spec_test.rs:299)
- \t foo: example 10015 (spec_test.rs:300)
preserved - --- behavior: example 30013 and fuzz_mixed_markers_thematic_break (spec_test.rs:424, spec_test.rs:499)
unchanged space-separated controls: > foo in example 93 (spec_test.rs:247), - foo in example 9993 (spec_test.rs:283)
tab still treated as real content indent where it should be: -\t\tfoo in example 7 (spec_test.rs:220)

So the coverage is pinned to the tab-after-marker cases plus the requested controls.

No changeset is intentional here. This markdown parser/formatter area is still under active development, so per prior maintainer guidance I didn’t add one for this fix.

ematipico

Can we add some tests to snapshot? We should add them for regression

…tainer markers Add CST/AST snapshot fixtures for the three tab-separated cases covered by the preceding fix: nested blockquote (`>\t>\tfoo`), list marker with tab separator (`-\tfoo`), and list marker with space-tab-space separator (`- \t foo`).

ematipico

Thank you! Really appreciated

jfmcdowell · 2026-04-17T12:59:37Z

Thank you! Really appreciated

It was my oversight.

fix(markdown_parser): handle tab-separated container markers

24baa1e

github-actions bot added A-Parser Area: parser L-Markdown Language: Markdown labels Apr 13, 2026

jfmcdowell marked this pull request as ready for review April 13, 2026 17:03

This comment has been minimized.

Sign in to view

ematipico reviewed Apr 17, 2026

View reviewed changes

ematipico approved these changes Apr 17, 2026

View reviewed changes

ematipico merged commit 0b5a9c4 into biomejs:main Apr 17, 2026
28 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(markdown_parser): handle tab-separated container markers#9965

fix(markdown_parser): handle tab-separated container markers#9965
ematipico merged 2 commits intobiomejs:mainfrom
jfmcdowell:fix/md-tab-separated-container-markers

jfmcdowell commented Apr 13, 2026 •

edited

Loading

Uh oh!

changeset-bot bot commented Apr 13, 2026 •

edited

Loading

Uh oh!

codspeed-hq bot commented Apr 13, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Apr 13, 2026 •

edited

Loading

Uh oh!

This comment has been minimized.

jfmcdowell commented Apr 17, 2026

Uh oh!

ematipico left a comment

Uh oh!

ematipico left a comment

Uh oh!

Uh oh!

jfmcdowell commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

jfmcdowell commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

Docs

Uh oh!

changeset-bot bot commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ No Changeset found

Uh oh!

codspeed-hq bot commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Footnotes

Uh oh!

coderabbitai bot commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Possibly related PRs

Suggested reviewers

Uh oh!

This comment has been minimized.

jfmcdowell commented Apr 17, 2026

Uh oh!

ematipico left a comment

Choose a reason for hiding this comment

Uh oh!

ematipico left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jfmcdowell commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jfmcdowell commented Apr 13, 2026 •

edited

Loading

changeset-bot bot commented Apr 13, 2026 •

edited

Loading

codspeed-hq bot commented Apr 13, 2026 •

edited

Loading

coderabbitai bot commented Apr 13, 2026 •

edited

Loading