fix(markdown): preserve nested list indent tokens #9717

Merged
dyc3 merged 1 commit into biomejs:main from jfmcdowell:fix-md-nested-list-indent-9715
Mar 30, 2026

Conversation

@jfmcdowell
Contributor

@jfmcdowell jfmcdowell commented Mar 30, 2026

Note

This PR was created with AI assistance (Codex).

Summary

Fixes #9715.

Preserve nested list marker indentation as structural MD_INDENT_TOKEN_LIST nodes instead of skipped trivia.

This also fixes nested list continuation ownership: parent-indented continuation lines are no longer absorbed into child list item paragraphs via lazy continuation, and now correctly return to the parent item.
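
The continuation-ownership behavior described above can be sketched as follows. This is a simplified model, not the parser's actual code: `ListLevel` and `owner_of_continuation` are hypothetical names, and the sketch only looks at indentation, ignoring blank lines and the cases where lazy continuation legitimately applies.

```rust
/// Hypothetical model of one open list item (not a type from biome_markdown_parser).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ListLevel {
    /// Columns of whitespace before the list marker.
    pub marker_indent: usize,
    /// Column where the item's content starts (typically marker indent + marker width + 1).
    pub content_indent: usize,
}

/// Returns the index of the deepest open item whose content indent is met by a
/// continuation line indented `line_indent` columns, or `None` if the line is
/// indented less than every open item's content.
///
/// Assumes `levels` is ordered from outermost to innermost item.
pub fn owner_of_continuation(levels: &[ListLevel], line_indent: usize) -> Option<usize> {
    levels
        .iter()
        .rposition(|level| line_indent >= level.content_indent)
}
```

With a parent item at content indent 2 and a child at content indent 4, a line indented 2 columns resolves to the parent (index 0) rather than the child, which is the ownership the summary describes.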

Test Plan

  • just test-crate biome_markdown_parser
  • just test-markdown-conformance
  • just f
  • just l

Docs

N/A.

@changeset-bot

changeset-bot bot commented Mar 30, 2026

⚠️ No Changeset found

Latest commit: e28a5ef

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@github-actions github-actions bot added A-Parser Area: parser L-Markdown Language: Markdown labels Mar 30, 2026
@jfmcdowell jfmcdowell changed the title fix(markdown): preserve nested list indent tokens 🤖🤖🤖 fix(markdown): preserve nested list indent tokens Mar 30, 2026
@codspeed-hq

codspeed-hq bot commented Mar 30, 2026

Merging this PR will degrade performance by 13.13%

❌ 2 regressed benchmarks
✅ 26 untouched benchmarks
⏩ 228 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| synthetic/nested-lists.md[uncached] | 4.3 ms | 5 ms | -13.13% |
| synthetic/nested-lists.md[cached] | 4.3 ms | 4.9 ms | -13% |

Comparing jfmcdowell:fix-md-nested-list-indent-9715 (e28a5ef) with main (61b7ec5)

Footnotes

  1. 228 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@jfmcdowell jfmcdowell force-pushed the fix-md-nested-list-indent-9715 branch 4 times, most recently from 5a9f0c7 to e058e9a Compare March 30, 2026 15:37
@github-actions github-actions bot added the A-Formatter Area: formatter label Mar 30, 2026
@jfmcdowell jfmcdowell force-pushed the fix-md-nested-list-indent-9715 branch from e058e9a to 24d2e49 Compare March 30, 2026 16:35
@jfmcdowell jfmcdowell force-pushed the fix-md-nested-list-indent-9715 branch from 24d2e49 to e28a5ef Compare March 30, 2026 16:58
@jfmcdowell
Contributor Author

The ~13% regression on synthetic/nested-lists.md is expected: indent-aware list-end detection requires deeper lookaheads than the old non-indent-aware version, which produced wrong trees for nested list continuation. Happy to investigate further if needed.

@jfmcdowell jfmcdowell marked this pull request as ready for review March 30, 2026 17:00
@coderabbitai
Contributor

coderabbitai bot commented Mar 30, 2026

Walkthrough

This pull request refactors list parsing to correctly handle indentation in nested bullet and ordered lists. The changes address incorrect token skipping in nested list markers by introducing indent-aware lookahead detection, proper virtual-line-start handling for marker position calculation, and state propagation for blank-line-terminated lists. The parser now tracks marker indentation at each list level and uses this context to determine list continuation boundaries and paragraph breaks, ensuring that leading whitespace before nested markers is parsed as indent tokens rather than being skipped.
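
As a rough illustration of "parsed as indent tokens rather than being skipped", the sketch below splits a single bullet line into tokens and keeps the leading whitespace as its own token. Everything here is hypothetical: `LineToken` and `tokenize_bullet_line` are illustrative names, and the real parser works on its own token stream and produces MD_INDENT_TOKEN_LIST nodes, which this toy does not model.

```rust
/// Hypothetical token kinds for one bullet line (illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum LineToken<'a> {
    /// Leading spaces, kept as a structural token instead of being dropped.
    Indent(&'a str),
    /// The bullet marker: `-`, `*` or `+`.
    BulletMarker(&'a str),
    /// Everything after the marker, untouched.
    Rest(&'a str),
}

/// Splits a line such as `"  - child"` into tokens, or returns `None`
/// if the first non-space character is not a bullet marker.
pub fn tokenize_bullet_line(line: &str) -> Option<Vec<LineToken<'_>>> {
    // Length in bytes of the leading run of spaces.
    let marker_start = line.len() - line.trim_start_matches(' ').len();
    let (indent, after_indent) = line.split_at(marker_start);

    // `get(..1)` is `None` for an empty rest or a multi-byte first character.
    let marker = after_indent.get(..1)?;
    if !matches!(marker, "-" | "*" | "+") {
        return None;
    }

    let mut tokens = Vec::new();
    if !indent.is_empty() {
        tokens.push(LineToken::Indent(indent));
    }
    tokens.push(LineToken::BulletMarker(marker));
    tokens.push(LineToken::Rest(&after_indent[1..]));
    Some(tokens)
}
```

Because every byte of the line ends up in exactly one token, concatenating the token texts reproduces the original line, which is the property a skipped-trivia approach gives up for the indent.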

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title precisely describes the main change: preserving nested list indent tokens instead of treating them as skipped trivia. |
| Description check | ✅ Passed | The description clearly relates to the changeset, referencing issue #9715 and explaining the core fixes: indent token preservation and continuation ownership. |
| Linked Issues check | ✅ Passed | The PR fully addresses #9715's requirements: indent tokens are now parsed into MD_INDENT_TOKEN_LIST nodes [#9715], skipped tokens on nested markers are eliminated [#9715], and continuation ownership is corrected [#9715]. |
| Out of Scope Changes check | ✅ Passed | All changes (parser state tracking, marker indent detection, list-end logic, and test additions) directly support the core objective of preserving nested list indent tokens and fixing continuation ownership. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/biome_markdown_parser/src/syntax/list.rs (1)

2110-2120: ⚠️ Potential issue | 🟠 Major

Clear the blank-line latch before each block parse.

finish_list() now leaves last_list_ends_with_blank set for every list, including unrelated top-level ones. This take() then runs after any later paragraph/code/quote block, so an earlier loose list can falsely mark the current item as having seen a blank line. Please reset the latch before parse_any_block_with_indent_code_policy() so this read only reflects the block we just parsed.

Possible fix
     let allow_indent_code_block = !state.last_block_was_paragraph || prev_was_blank;
+    let _ = p.take_last_list_ends_with_blank();
     let parsed = parse_any_block_with_indent_code_policy(p, allow_indent_code_block);
     state.last_block_was_paragraph = if let Present(ref marker) = parsed {
         is_paragraph_like(marker.kind(p))
     } else {
         false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/list.rs` around lines 2110 - 2120,
The bug is that last_list_ends_with_blank is only consumed after parsing, so a
previously set latch can incorrectly mark the current block as blank; before
calling parse_any_block_with_indent_code_policy(p, allow_indent_code_block)
clear/reset the latch by calling p.clear_last_list_ends_with_blank() (or the
equivalent method that clears last_list_ends_with_blank) so
take_last_list_ends_with_blank() only reflects the block just parsed; adjust the
code around parse_any_block_with_indent_code_policy, finish_list, and
take_last_list_ends_with_blank to ensure the latch is cleared prior to parsing
each block and only consumed immediately afterward, and keep updating
state.has_blank_line/state.last_was_blank as before.
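
The stale-latch problem in this comment can be reproduced in a small standalone model. This is a sketch under stated assumptions, not the parser's code: `LatchState`, `Block` and `parse_block` are hypothetical, and only the names `finish_list`, `last_list_ends_with_blank` and `take_last_list_ends_with_blank` come from the review text.

```rust
/// Hypothetical stand-in for the parser state that holds the blank-line latch.
#[derive(Default)]
pub struct LatchState {
    last_list_ends_with_blank: bool,
}

impl LatchState {
    /// Called when a list finishes; records whether it ended with a blank line.
    pub fn finish_list(&mut self, ended_with_blank: bool) {
        self.last_list_ends_with_blank = ended_with_blank;
    }

    /// Reads the latch and resets it to `false`.
    pub fn take_last_list_ends_with_blank(&mut self) -> bool {
        std::mem::take(&mut self.last_list_ends_with_blank)
    }
}

/// The kinds of block this toy model can "parse".
pub enum Block {
    List { ends_with_blank: bool },
    Paragraph,
}

/// Parses one block and returns the latch value read afterwards.
/// With `clear_first == false`, a value set by an earlier, unrelated list
/// can leak into the result for the current block.
pub fn parse_block(state: &mut LatchState, block: Block, clear_first: bool) -> bool {
    if clear_first {
        // Drop any stale value so the read below reflects only this block.
        let _ = state.take_last_list_ends_with_blank();
    }
    if let Block::List { ends_with_blank } = block {
        state.finish_list(ends_with_blank);
    }
    state.take_last_list_ends_with_blank()
}
```

If an earlier list calls `finish_list(true)` and nothing consumes the latch, a later `parse_block(&mut state, Block::Paragraph, false)` reports `true` even though the paragraph has nothing to do with that list; passing `clear_first = true` yields `false`, which mirrors the effect of the suggested one-line fix.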

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 86eb2d38-631c-4188-acaf-effd15429d29

📥 Commits

Reviewing files that changed from the base of the PR and between 9e2ef0d and e28a5ef.

⛔ Files ignored due to path filters (7)
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/list/issue-17652.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/list_continuation_edge_cases.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/list_indentation.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/list_tightness.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/multiline_list.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/nested_bullet_indent_tokens.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/nested_list_interrupt_after_newline.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (5)
  • crates/biome_markdown_parser/src/parser.rs
  • crates/biome_markdown_parser/src/syntax/list.rs
  • crates/biome_markdown_parser/src/syntax/mod.rs
  • crates/biome_markdown_parser/tests/md_test_suite/ok/nested_bullet_indent_tokens.md
  • crates/biome_markdown_parser/tests/spec_test.rs

Comment on lines +846 to 847
|p| has_ordered_item_after_blank_lines_at_indent(p, self.marker_indent),
&mut self.is_tight,
Contributor

⚠️ Potential issue | 🟠 Major

Keep the ordered-list delimiter in the indent-aware lookahead.

These branches now only answer “is there an ordered item here?” and lose whether it was . or ). The newline path also asks current_ordered_delim(p) at the pre-NEWLINE position, so the marker_delim check can be skipped entirely. That can merge mixed-delimiter items into one list when this continuation path is taken.

Also applies to: 855-882

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/list.rs` around lines 846 - 847, The
lookahead used in has_ordered_item_after_blank_lines_at_indent (and the similar
branches around the 855-882 region) currently only returns a boolean so it loses
the ordered-item delimiter ('.' vs ')'), causing mixed-delimiter items to be
merged; update the lookahead API and its call sites (e.g., where
has_ordered_item_after_blank_lines_at_indent is invoked alongside
self.marker_indent and &mut self.is_tight) to return or propagate the actual
marker delimiter (or accept marker_delim as an input) and use that delimiter
when validating continuations (instead of calling current_ordered_delim at the
pre-NEWLINE position or skipping marker_delim checks). Ensure functions that
decide continuation branches (the newline path and other ordered-item checks)
receive and compare the marker_delim so delimiter type is preserved across the
lookahead.
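
One way to picture the suggestion is a lookahead that returns the delimiter it found instead of a bare boolean. The sketch below is an assumption-laden illustration that works on a plain string line rather than the parser's token stream; `OrderedDelim`, `ordered_delim_at_indent` and `continues_list` are hypothetical names, and "at indent" is simplified to mean exactly `marker_indent` leading spaces.

```rust
/// Delimiter that follows the digits of an ordered list marker.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OrderedDelim {
    Period,
    Paren,
}

/// Hypothetical lookahead: if `line` starts with exactly `marker_indent` spaces
/// followed by an ordered list marker, returns that marker's delimiter.
pub fn ordered_delim_at_indent(line: &str, marker_indent: usize) -> Option<OrderedDelim> {
    let rest = line.trim_start_matches(' ');
    if line.len() - rest.len() != marker_indent {
        return None;
    }

    // CommonMark allows 1 to 9 digits in an ordered list marker.
    let digits = rest.bytes().take_while(u8::is_ascii_digit).count();
    if digits == 0 || digits > 9 {
        return None;
    }

    let delim = match rest.as_bytes().get(digits).copied()? {
        b'.' => OrderedDelim::Period,
        b')' => OrderedDelim::Paren,
        _ => return None,
    };

    // The marker must be followed by a space or the end of the line.
    match rest.as_bytes().get(digits + 1).copied() {
        None | Some(b' ') => Some(delim),
        _ => None,
    }
}

/// A following item continues the current list only if its delimiter matches.
pub fn continues_list(next_line: &str, marker_indent: usize, list_delim: OrderedDelim) -> bool {
    ordered_delim_at_indent(next_line, marker_indent) == Some(list_delim)
}
```

Returning `Option<OrderedDelim>` lets the caller compare against the list's own delimiter, so a `2)` item following a `1.` list is treated as the start of a different list rather than merged into it.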

Labels

A-Formatter Area: formatter A-Parser Area: parser L-Markdown Language: Markdown

Development

Successfully merging this pull request may close these issues.

🐛 Skipped tokens in bullet lists

2 participants