Skip to content

fix(markdown_parser): handle tab-separated container markers#9965

Merged
ematipico merged 2 commits intobiomejs:mainfrom
jfmcdowell:fix/md-tab-separated-container-markers
Apr 17, 2026
Merged

fix(markdown_parser): handle tab-separated container markers#9965
ematipico merged 2 commits intobiomejs:mainfrom
jfmcdowell:fix/md-tab-separated-container-markers

Conversation

@jfmcdowell
Copy link
Copy Markdown
Contributor

@jfmcdowell jfmcdowell commented Apr 13, 2026

Note

This PR was created with AI assistance (Codex).

Summary

Fixes the tab-separated container marker cases from the markdown fuzzer without changing thematic-break precedence. The lexer now splits post-marker whitespace only when tabs are involved, preserving ordinary list CST shape while letting the parser distinguish marker separator space from real content indent.

This fixes > > foo, - foo, and - foo, and preserves the correct CommonMark behavior for - --- while updating the stale test comments around that case.

Test Plan

  • cargo test -p biome_markdown_parser
  • just test-markdown-conformance
  • Added targeted regressions in spec_test.rs for the tab-separated quote/list cases

Docs

N/A

@changeset-bot
Copy link
Copy Markdown

changeset-bot bot commented Apr 13, 2026

⚠️ No Changeset found

Latest commit: 19c23a4

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@github-actions github-actions bot added A-Parser Area: parser L-Markdown Language: Markdown labels Apr 13, 2026
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Apr 13, 2026

Merging this PR will not alter performance

✅ 28 untouched benchmarks
⏩ 228 skipped benchmarks1


Comparing jfmcdowell:fix/md-tab-separated-container-markers (19c23a4) with main (9956f1d)

Open in CodSpeed

Footnotes

  1. 228 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@jfmcdowell jfmcdowell marked this pull request as ready for review April 13, 2026 17:03
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b287425e-04e1-4204-8fcc-8daeaaa13518

📥 Commits

Reviewing files that changed from the base of the PR and between 24baa1e and 19c23a4.

⛔ Files ignored due to path filters (3)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote_tab_separated.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list_space_tab_space.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list_tab_separated.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (3)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote_tab_separated.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list_space_tab_space.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list_tab_separated.md
✅ Files skipped from review due to trivial changes (3)
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list_tab_separated.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list_space_tab_space.md
  • crates/biome_markdown_parser/tests/md_test_suite/ok/block_quote_tab_separated.md

Walkthrough

This PR refines Markdown list and blockquote parsing by adding lexer-level whitespace controls and tighter ordered-list marker validation. It introduces a MAX_ORDERED_LIST_MARKER_DIGITS cap, is_space_or_tab_byte and is_ascii_digit_byte helpers, and is_in_list_marker_whitespace to detect whitespace immediately after top-level list markers. The lexer now emits single-space/tab textual-literal tokens in that region. Parser changes conditionally preserve or remap tab post-marker whitespace for lists and quotes and constrain first-line indent consumption for indented-code detection. New tests cover tab-separated nested blockquotes and mixed space/tab list markers.

Possibly related PRs

Suggested reviewers

  • ematipico
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title directly describes the primary change: fixing how the markdown parser handles tabs as separators in container markers (blockquotes and lists).
Description check ✅ Passed The description clearly relates to the changeset, explaining the fix for tab-separated container markers, the specific cases addressed, test coverage, and preserved CommonMark behaviour.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@auscaster

This comment has been minimized.

@jfmcdowell
Copy link
Copy Markdown
Contributor Author

Quick triage pass: let's keep this review pinned to the listed reproducer cases.

The PR description is well-scoped, but the main unblock for review is proof that the lexer-side change is contained to those tab-after-marker behaviors. Most of the churn is in crates/biome_markdown_parser/src/lexer/mod.rs, so I want the testcase evidence to be very explicit before deeper review.

Could you either point to the exact spec_test.rs / conformance cases or paste the expected assertions for:

  • >\t>\tfoo
  • -\tfoo
  • - \t foo
  • the preserved CommonMark/thematic-break behavior for - ---
  • one control case showing ordinary space-separated > foo / - foo is unchanged
  • one case showing a tab that should remain real content indent still does so

If those control cases already exist in the current conformance suite, linking the case names is enough.

Also, if the missing changeset is intentional for this parser fix, please confirm that explicitly.

Head SHA: 24baa1e

Thanks for the review. On 24baa1e here are the requested pointers:

  • >\t>\tfoo: example 99923 (spec_test.rs:278)
  • -\tfoo: example 10014 (spec_test.rs:299)
  • - \t foo: example 10015 (spec_test.rs:300)
  • preserved - --- behavior: example 30013 and fuzz_mixed_markers_thematic_break (spec_test.rs:424, spec_test.rs:499)
  • unchanged space-separated controls: > foo in example 93 (spec_test.rs:247), - foo in example 9993 (spec_test.rs:283)
  • tab still treated as real content indent where it should be: -\t\tfoo in example 7 (spec_test.rs:220)

So the coverage is pinned to the tab-after-marker cases plus the requested controls.

No changeset is intentional here. This markdown parser/formatter area is still under active development, so per prior maintainer guidance I didn’t add one for this fix.

Copy link
Copy Markdown
Member

@ematipico ematipico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some tests to snapshot? We should add them for regression

…tainer markers

Add CST/AST snapshot fixtures for the three tab-separated cases covered by
the preceding fix: nested blockquote (`>\t>\tfoo`), list marker with tab
separator (`-\tfoo`), and list marker with space-tab-space separator
(`- \t foo`).
Copy link
Copy Markdown
Member

@ematipico ematipico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Really appreciated

@ematipico ematipico merged commit 0b5a9c4 into biomejs:main Apr 17, 2026
28 checks passed
@jfmcdowell
Copy link
Copy Markdown
Contributor Author

Thank you! Really appreciated

It was my oversight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Parser Area: parser L-Markdown Language: Markdown

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants