fix(markdown): preserve nested list indent tokens#9717
Conversation
|
Merging this PR will degrade performance by 13.13%
Performance Changes
Comparing Footnotes
|
5a9f0c7 to
e058e9a
Compare
e058e9a to
24d2e49
Compare
24d2e49 to
e28a5ef
Compare
|
The ~13% regression on |
WalkthroughThis pull request refactors list parsing to correctly handle indentation in nested bullet and ordered lists. The changes address incorrect token skipping in nested list markers by introducing indent-aware lookahead detection, proper virtual-line-start handling for marker position calculation, and state propagation for blank-line-terminated lists. The parser now tracks marker indentation at each list level and uses this context to determine list continuation boundaries and paragraph breaks, ensuring that leading whitespace before nested markers is parsed as indent tokens rather than being skipped. 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/biome_markdown_parser/src/syntax/list.rs (1)
2110-2120:⚠️ Potential issue | 🟠 MajorClear the blank-line latch before each block parse.
finish_list()now leaveslast_list_ends_with_blankset for every list, including unrelated top-level ones. Thistake()then runs after any later paragraph/code/quote block, so an earlier loose list can falsely mark the current item as having seen a blank line. Please reset the latch beforeparse_any_block_with_indent_code_policy()so this read only reflects the block we just parsed.Possible fix
let allow_indent_code_block = !state.last_block_was_paragraph || prev_was_blank; + let _ = p.take_last_list_ends_with_blank(); let parsed = parse_any_block_with_indent_code_policy(p, allow_indent_code_block); state.last_block_was_paragraph = if let Present(ref marker) = parsed { is_paragraph_like(marker.kind(p)) } else { false🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/list.rs` around lines 2110 - 2120, The bug is that last_list_ends_with_blank is only consumed after parsing, so a previously set latch can incorrectly mark the current block as blank; before calling parse_any_block_with_indent_code_policy(p, allow_indent_code_block) clear/reset the latch by calling p.clear_last_list_ends_with_blank() (or the equivalent method that clears last_list_ends_with_blank) so take_last_list_ends_with_blank() only reflects the block just parsed; adjust the code around parse_any_block_with_indent_code_policy, finish_list, and take_last_list_ends_with_blank to ensure the latch is cleared prior to parsing each block and only consumed immediately afterward, and keep updating state.has_blank_line/state.last_was_blank as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_markdown_parser/src/syntax/list.rs`:
- Around line 846-847: The lookahead used in
has_ordered_item_after_blank_lines_at_indent (and the similar branches around
the 855-882 region) currently only returns a boolean so it loses the
ordered-item delimiter ('.' vs ')'), causing mixed-delimiter items to be merged;
update the lookahead API and its call sites (e.g., where
has_ordered_item_after_blank_lines_at_indent is invoked alongside
self.marker_indent and &mut self.is_tight) to return or propagate the actual
marker delimiter (or accept marker_delim as an input) and use that delimiter
when validating continuations (instead of calling current_ordered_delim at the
pre-NEWLINE position or skipping marker_delim checks). Ensure functions that
decide continuation branches (the newline path and other ordered-item checks)
receive and compare the marker_delim so delimiter type is preserved across the
lookahead.
---
Outside diff comments:
In `@crates/biome_markdown_parser/src/syntax/list.rs`:
- Around line 2110-2120: The bug is that last_list_ends_with_blank is only
consumed after parsing, so a previously set latch can incorrectly mark the
current block as blank; before calling
parse_any_block_with_indent_code_policy(p, allow_indent_code_block) clear/reset
the latch by calling p.clear_last_list_ends_with_blank() (or the equivalent
method that clears last_list_ends_with_blank) so
take_last_list_ends_with_blank() only reflects the block just parsed; adjust the
code around parse_any_block_with_indent_code_policy, finish_list, and
take_last_list_ends_with_blank to ensure the latch is cleared prior to parsing
each block and only consumed immediately afterward, and keep updating
state.has_blank_line/state.last_was_blank as before.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 86eb2d38-631c-4188-acaf-effd15429d29
⛔ Files ignored due to path filters (7)
crates/biome_markdown_formatter/tests/specs/prettier/markdown/list/issue-17652.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/list_continuation_edge_cases.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/list_indentation.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/list_tightness.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/multiline_list.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/nested_bullet_indent_tokens.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/nested_list_interrupt_after_newline.md.snapis excluded by!**/*.snapand included by**
📒 Files selected for processing (5)
crates/biome_markdown_parser/src/parser.rscrates/biome_markdown_parser/src/syntax/list.rscrates/biome_markdown_parser/src/syntax/mod.rscrates/biome_markdown_parser/tests/md_test_suite/ok/nested_bullet_indent_tokens.mdcrates/biome_markdown_parser/tests/spec_test.rs
| |p| has_ordered_item_after_blank_lines_at_indent(p, self.marker_indent), | ||
| &mut self.is_tight, |
There was a problem hiding this comment.
Keep the ordered-list delimiter in the indent-aware lookahead.
These branches now only answer “is there an ordered item here?” and lose whether it was . or ). The newline path also asks current_ordered_delim(p) at the pre-NEWLINE position, so the marker_delim check can be skipped entirely. That can merge mixed-delimiter items into one list when this continuation path is taken.
Also applies to: 855-882
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/biome_markdown_parser/src/syntax/list.rs` around lines 846 - 847, The
lookahead used in has_ordered_item_after_blank_lines_at_indent (and the similar
branches around the 855-882 region) currently only returns a boolean so it loses
the ordered-item delimiter ('.' vs ')'), causing mixed-delimiter items to be
merged; update the lookahead API and its call sites (e.g., where
has_ordered_item_after_blank_lines_at_indent is invoked alongside
self.marker_indent and &mut self.is_tight) to return or propagate the actual
marker delimiter (or accept marker_delim as an input) and use that delimiter
when validating continuations (instead of calling current_ordered_delim at the
pre-NEWLINE position or skipping marker_delim checks). Ensure functions that
decide continuation branches (the newline path and other ordered-item checks)
receive and compare the marker_delim so delimiter type is preserved across the
lookahead.
Note
This PR was created with AI assistance (Codex).
Summary
Fixes #9715.
Preserve nested list marker indentation as structural
MD_INDENT_TOKEN_LISTnodes instead of skipped trivia.This also fixes nested list continuation ownership: parent-indented continuation lines are no longer absorbed into child list item paragraphs via lazy continuation, and now correctly return to the parent item.
Test Plan
just test-crate biome_markdown_parserjust test-markdown-conformancejust fjust lDocs
N/A.