fix(markdown_parser): incorrect tight/loose list classification at marker boundaries#9787
Conversation
|
Merging this PR will not alter performance
Comparing Footnotes
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThe parser now records parent-list marker metadata in Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/biome_markdown_parser/src/syntax/list.rs (1)
523-554:⚠️ Potential issue | 🟠 MajorWhitespace-only separator lines still dodge the new bullet split check.
The new marker-change handling only runs from the
NEWLINEbranch. If the item parser leaves you on a whitespace token instead — for example-\n \n* nextafter a marker-only item — the earlierat_blank_line_start()path still keeps* nextin the same list and makes it loose.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/list.rs` around lines 523 - 554, The closure passed into is_at_list_end_common only runs marker_changes_after_newline when the lookahead sees a NEWLINE, but it must also detect whitespace-only separator lines (e.g., a blank line containing only spaces/tabs) and run the same marker-change logic; update the closure in is_at_list_end_common (the anonymous |p, marker_kind| closure) so that when the NEWLINE branch is not taken you additionally peek/skip optional whitespace-only content with p.lookahead (similar to how has_bullet_item_after_blank_lines_at_indent is used) and call marker_changes_after_blank_lines(p, marker_kind) before returning; ensure this mirrors the NEWLINE branch behavior (use at_bullet_list_item_with_base_indent and has_bullet_item_after_blank_lines_at_indent as appropriate) so marker_changes_after_blank_lines is considered for whitespace-only blank lines as well as for explicit NEWLINEs.
🧹 Nitpick comments (1)
justfile (1)
241-251: Pincommonmarkhere, or the corpus will drift.
pnpm add commonmarkinstalls whatever the registry serves that day. That makesjust fuzz-markdown-generatenon-reproducible and can change the reference HTML without any Biome change. A tiny pinned temp manifest keeps the fuzzer honest.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@justfile` around lines 241 - 251, The recipe fuzz-markdown-generate currently runs pnpm add commonmark which installs whatever version is in the registry and makes generated corpus non-reproducible; update that line to pin an explicit version (e.g., replace pnpm add commonmark > /dev/null 2>&1 with pnpm add commonmark@0.30.0 > /dev/null 2>&1 or introduce a COMMONMARK_VERSION variable and use pnpm add commonmark@${COMMONMARK_VERSION}) inside the fuzz-markdown-generate recipe so the temp manifest is deterministic and the corpus won’t drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_markdown_parser/src/syntax/list.rs`:
- Around line 1594-1600: The quote branch in classify_blank_line still calls
classify_blank_line_in_quote without passing the threaded context
(parent_marker_kind and parent_ordered_delim), so quoted lists cannot
distinguish loose items vs new boundaries; update the quote-branch call site in
classify_blank_line to forward state.parent_marker_kind and
state.parent_ordered_delim into classify_blank_line_in_quote (and any other
helper invoked for quoted blanks at the other location around lines ~1692-1698),
adjust classify_blank_line_in_quote signature to accept and use these params to
decide loose vs boundary, and add a regression test exercising a quoted case
like "> - one\n>\n> * two" to ensure the behavior is fixed.
In `@crates/biome_markdown_parser/tests/fuzz_differential.rs`:
- Around line 72-81: The test currently swallows malformed corpus lines and
silently defaults missing fields, so change the logic to fail fast: replace the
forgiving match on serde_json::from_str(line) with a fallible call that
returns/panics on parse error (e.g. propagate the error or unwrap), and require
presence of the fields by asserting or returning an error if
entry["markdown"].as_str() or entry["html"].as_str() is None instead of using
unwrap_or(""), so malformed JSON or missing `markdown`/`html` fields cause the
test to fail immediately.
In `@crates/biome_markdown_parser/tests/fuzz_generate_corpus.mjs`:
- Around line 45-56: The PRNG currently divides by 0xffffffff in rand(),
allowing a result of 1.0 and causing randInt()/pick() to index out of bounds;
update the divisor in the rand() function (which uses rngState and is called by
randInt and pick) to 0x100000000 so rand() returns values in [0,1) and
randInt/pick cannot exceed the array bounds.
---
Outside diff comments:
In `@crates/biome_markdown_parser/src/syntax/list.rs`:
- Around line 523-554: The closure passed into is_at_list_end_common only runs
marker_changes_after_newline when the lookahead sees a NEWLINE, but it must also
detect whitespace-only separator lines (e.g., a blank line containing only
spaces/tabs) and run the same marker-change logic; update the closure in
is_at_list_end_common (the anonymous |p, marker_kind| closure) so that when the
NEWLINE branch is not taken you additionally peek/skip optional whitespace-only
content with p.lookahead (similar to how
has_bullet_item_after_blank_lines_at_indent is used) and call
marker_changes_after_blank_lines(p, marker_kind) before returning; ensure this
mirrors the NEWLINE branch behavior (use at_bullet_list_item_with_base_indent
and has_bullet_item_after_blank_lines_at_indent as appropriate) so
marker_changes_after_blank_lines is considered for whitespace-only blank lines
as well as for explicit NEWLINEs.
---
Nitpick comments:
In `@justfile`:
- Around line 241-251: The recipe fuzz-markdown-generate currently runs pnpm add
commonmark which installs whatever version is in the registry and makes
generated corpus non-reproducible; update that line to pin an explicit version
(e.g., replace pnpm add commonmark > /dev/null 2>&1 with pnpm add
commonmark@0.30.0 > /dev/null 2>&1 or introduce a COMMONMARK_VERSION variable
and use pnpm add commonmark@${COMMONMARK_VERSION}) inside the
fuzz-markdown-generate recipe so the temp manifest is deterministic and the
corpus won’t drift.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f1403a26-7158-4f25-b53a-53ec7aff5f26
⛔ Files ignored due to path filters (6)
crates/biome_markdown_parser/tests/md_test_suite/ok/bullet_list.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/header_in_list.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/list_marker_trailing_spaces.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/multiline_list.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/ordered_list.md.snapis excluded by!**/*.snapand included by**crates/biome_markdown_parser/tests/md_test_suite/ok/thematic_break_in_list.md.snapis excluded by!**/*.snapand included by**
📒 Files selected for processing (8)
crates/biome_markdown_parser/src/parser.rscrates/biome_markdown_parser/src/syntax/list.rscrates/biome_markdown_parser/src/to_html.rscrates/biome_markdown_parser/tests/fuzz_corpus/seed.jsonlcrates/biome_markdown_parser/tests/fuzz_differential.rscrates/biome_markdown_parser/tests/fuzz_generate_corpus.mjscrates/biome_markdown_parser/tests/spec_test.rsjustfile
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/biome_markdown_parser/src/syntax/list.rs (1)
644-684: Minor duplication in blank-line skipping loops.The loops at lines 644–656 and 669–680 are near-identical. Consider extracting a small
skip_blank_lines_in_lookaheadhelper if this pattern appears elsewhere or grows — but fine for now.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/list.rs` around lines 644 - 684, The two lookahead blocks in marker_changes_after_blank_lines and delim_changes_after_blank_lines duplicate the "skip blank lines" loop; extract that repeated logic into a small helper function (e.g., skip_blank_lines_in_lookahead) that accepts &mut MarkdownParser and performs the loop that bumps NEWLINE and skips MD_TEXTUAL_LITERAL whitespace-only lines, then call that helper from both lookahead closures before invoking current_bullet_marker or current_ordered_delim; update references to marker_changes_after_blank_lines and delim_changes_after_blank_lines to use the new helper and keep behavior identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/biome_markdown_parser/src/syntax/list.rs`:
- Around line 644-684: The two lookahead blocks in
marker_changes_after_blank_lines and delim_changes_after_blank_lines duplicate
the "skip blank lines" loop; extract that repeated logic into a small helper
function (e.g., skip_blank_lines_in_lookahead) that accepts &mut MarkdownParser
and performs the loop that bumps NEWLINE and skips MD_TEXTUAL_LITERAL
whitespace-only lines, then call that helper from both lookahead closures before
invoking current_bullet_marker or current_ordered_delim; update references to
marker_changes_after_blank_lines and delim_changes_after_blank_lines to use the
new helper and keep behavior identical.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7fa1488d-18c4-4671-992f-e99b392d0167
📒 Files selected for processing (1)
crates/biome_markdown_parser/src/syntax/list.rs
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/biome_markdown_parser/src/syntax/list.rs (1)
529-556:⚠️ Potential issue | 🟡 MinorAdd tests for bullet ↔ ordered list transitions across blank lines.
Tests in
spec_test.rscover bullet-marker changes (examples 10006, 20008) and ordered-delimiter changes (example 10005), but miss the bullet ↔ ordered transitions. Addtest_exampleentries for:
- Bullet list → ordered list across blank line
- Ordered list → bullet list across blank line
This ensures
marker_changes_after_newline()andmarker_changes_after_blank_lines()are exercised for all four marker-type pairs.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/list.rs` around lines 529 - 556, Add two new test_example entries in the test harness (spec_test.rs) that cover the missing bullet ↔ ordered list transitions across blank lines: one where a bullet list is followed after a blank line by an ordered list, and one where an ordered list is followed after a blank line by a bullet list. These tests should use the existing test_example pattern so they run with the suite and assert the parser behavior, thereby exercising marker_changes_after_newline() and marker_changes_after_blank_lines() for the bullet→ordered and ordered→bullet cases.
🧹 Nitpick comments (2)
crates/biome_markdown_parser/src/syntax/list.rs (2)
2662-2684: Duplicated marker-change detection logic.The logic at lines 2667–2683 is nearly identical to lines 2559–2579 in
classify_blank_line. If you're feeling tidy, you could extract a small helper:♻️ Optional: Extract marker-change check
/// Returns true if the next list item has a different marker/type than the current parent. fn is_marker_type_change( p: &mut MarkdownParser, marker_indent: usize, parent_marker_kind: Option<MarkdownSyntaxKind>, parent_ordered_delim: Option<char>, ) -> bool { let next_is_bullet = at_bullet_list_item_with_base_indent(p, marker_indent); let next_is_ordered = at_order_list_item_with_base_indent(p, marker_indent); if let Some(current) = parent_marker_kind { if next_is_ordered { return true; } if next_is_bullet { let next = current_bullet_marker(p); return matches!(next, Some(next) if current != next); } } else if let Some(current_delim) = parent_ordered_delim { if next_is_bullet { return true; } if next_is_ordered { let next_delim = current_ordered_delim(p); return matches!(next_delim, Some(next) if current_delim != next); } } false }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/list.rs` around lines 2662 - 2684, The duplicate marker-change detection in classify_blank_line (and the block using marker_indent/parent_marker_kind/parent_ordered_delim) should be extracted into a small helper, e.g. is_marker_type_change(p, marker_indent, parent_marker_kind, parent_ordered_delim); move the logic that calls at_bullet_list_item_with_base_indent, at_order_list_item_with_base_indent, current_bullet_marker, and current_ordered_delim into that helper and return a bool, then replace the duplicated branches with calls to this helper and use its result to choose BlankLineAction::EndItemBeforeBlank vs EndItemAfterBlank.
634-684: Minor duplication in blank-line skipping loops.The loop pattern at lines 644–656 and 669–680 is identical. Could extract a small helper to skip blank lines in lookahead context if you fancy tidying this up later.
♻️ Optional: Extract blank-line skipping helper
+/// Skip blank lines in a lookahead context. +/// Consumes NEWLINE tokens and any whitespace-only MD_TEXTUAL_LITERAL tokens between them. +fn skip_blank_lines_in_lookahead(p: &mut MarkdownParser) { + loop { + if !p.at(NEWLINE) { + break; + } + p.bump(NEWLINE); + while p.at(MD_TEXTUAL_LITERAL) && p.cur_text().chars().all(|c| c == ' ' || c == '\t') { + p.bump(MD_TEXTUAL_LITERAL); + } + if !p.at(NEWLINE) { + break; + } + } +}Then use it in both
marker_changes_after_blank_linesanddelim_changes_after_blank_lines.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/list.rs` around lines 634 - 684, Extract the duplicated blank-line skipping loop into a small helper like skip_blank_lines_in_lookahead(p: &mut MarkdownParser) that runs the exact loop body used in the two lookahead closures (consume NEWLINE, then consume MD_TEXTUAL_LITERAL lines that are only spaces/tabs, breaking when no more NEWLINE), and then call this helper from the lookahead closures inside marker_changes_after_blank_lines and delim_changes_after_blank_lines before calling current_bullet_marker(p) and current_ordered_delim(p) respectively so behavior remains identical; keep existing Option handling (marker_kind/marker_delim) and the matches! checks unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@crates/biome_markdown_parser/src/syntax/list.rs`:
- Around line 529-556: Add two new test_example entries in the test harness
(spec_test.rs) that cover the missing bullet ↔ ordered list transitions across
blank lines: one where a bullet list is followed after a blank line by an
ordered list, and one where an ordered list is followed after a blank line by a
bullet list. These tests should use the existing test_example pattern so they
run with the suite and assert the parser behavior, thereby exercising
marker_changes_after_newline() and marker_changes_after_blank_lines() for the
bullet→ordered and ordered→bullet cases.
---
Nitpick comments:
In `@crates/biome_markdown_parser/src/syntax/list.rs`:
- Around line 2662-2684: The duplicate marker-change detection in
classify_blank_line (and the block using
marker_indent/parent_marker_kind/parent_ordered_delim) should be extracted into
a small helper, e.g. is_marker_type_change(p, marker_indent, parent_marker_kind,
parent_ordered_delim); move the logic that calls
at_bullet_list_item_with_base_indent, at_order_list_item_with_base_indent,
current_bullet_marker, and current_ordered_delim into that helper and return a
bool, then replace the duplicated branches with calls to this helper and use its
result to choose BlankLineAction::EndItemBeforeBlank vs EndItemAfterBlank.
- Around line 634-684: Extract the duplicated blank-line skipping loop into a
small helper like skip_blank_lines_in_lookahead(p: &mut MarkdownParser) that
runs the exact loop body used in the two lookahead closures (consume NEWLINE,
then consume MD_TEXTUAL_LITERAL lines that are only spaces/tabs, breaking when
no more NEWLINE), and then call this helper from the lookahead closures inside
marker_changes_after_blank_lines and delim_changes_after_blank_lines before
calling current_bullet_marker(p) and current_ordered_delim(p) respectively so
behavior remains identical; keep existing Option handling
(marker_kind/marker_delim) and the matches! checks unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4d1bcb35-06a6-4623-8be0-d3d39da6ddae
📒 Files selected for processing (1)
crates/biome_markdown_parser/src/syntax/list.rs
66e44b2 to
9091c12
Compare
|
Conflicts |
9091c12 to
38cb82c
Compare
mischief managed as of 0957 ET |
…rker boundaries Lists separated by blank lines with different markers (bullet type, ordered delimiter, or list kind) were incorrectly classified as loose. The blank line at the inter-list boundary was attributed to the item's tightness rather than treated as a list separator. Root cause: classify_blank_line only detected bullet marker changes but missed ordered delimiter changes, cross-type transitions (bullet↔ordered), and the first-item case where the marker identity wasn't yet known. Changes: - Add list_item_ordered_delim to parser state for delimiter tracking - Extend classify_blank_line to detect all list boundary types - Fix OrderedList::is_at_list_end NEWLINE handler to check delimiters after bumping past the newline (was checking at NEWLINE position) - Detect marker/delimiter eagerly in parse_element so first-item content parsing knows its parent list identity - Mirror bullet path in OrderedList::parse_element: save/restore both list_item_marker_kind and list_item_ordered_delim
…d corpus, fix PRNG divisor
…rker boundary detection
Trailing newline between bullet and ordered list sections now correctly sits outside the bullet list item node.
38cb82c to
5cbf41c
Compare
Note
This PR was created with AI assistance (Claude Code).
Summary
Fixes incorrect tight/loose list classification when blank lines separate lists with different markers.
Per CommonMark §5.3, blank lines between different lists should not make either list loose. The parser was attributing those blank lines to the preceding item, so cases like mixed bullet markers, mixed ordered delimiters, and bullet↔ordered transitions rendered with
<p>wrappers when they should stay tight.This updates list boundary detection so inter-list blank lines are treated as list boundaries rather than item-internal blank lines.
Test Plan
cargo test -p biome_markdown_parserspec_test.rsexpectationsto_html.rstests for marker split, cross-type split, and ordered delimiter split tightnessDocs
N/A