fix(md-formatter): remove last hard line break #9856
Conversation
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (3)
📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
WalkthroughThe formatter for Markdown hard lines now examines surrounding syntax (including Suggested reviewers
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| no hard line | ||
| here | ||
|
|
||
| foo |
There was a problem hiding this comment.
This is incorrect but the bug is in the parser.
CST for this block is currently
6: MD_PARAGRAPH@43..115
0: MD_INLINE_ITEM_LIST@43..115
0: MD_TEXTUAL@43..46
0: MD_TEXTUAL_LITERAL@43..46 "foo" [] []
1: MD_HARD_LINE@46..49
0: MD_HARD_LINE_LITERAL@46..49 " \n" [] []
2: MD_TEXTUAL@49..74
0: MD_TEXTUAL_LITERAL@49..74 "bar with empty line after" [] []
3: MD_HARD_LINE@74..77
0: MD_HARD_LINE_LITERAL@74..77 " \n" [] []
4: MD_TEXTUAL@77..78
0: MD_TEXTUAL_LITERAL@77..78 "\n" [] []
5: MD_TEXTUAL@78..81
0: MD_TEXTUAL_LITERAL@78..81 "foo" [] []
6: MD_HARD_LINE@81..84
0: MD_HARD_LINE_LITERAL@81..84 " \n" [] []
7: MD_TEXTUAL@84..112
0: MD_TEXTUAL_LITERAL@84..112 "bar without empty line after" [] []
8: MD_HARD_LINE@112..115
0: MD_HARD_LINE_LITERAL@112..115 " \n" [] []
1: (empty)
The expected behavior is that there should be two MD_PARAGRAPH nodes delimited by the empty line.
There was a problem hiding this comment.
@coderabbitai file an issue for this parser bug
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
gh issue create \
--repo biomejs/biome \
--title "biome_markdown_parser: paragraph containing hard line breaks incorrectly absorbs following empty line and paragraph" \
--body "## Description
When a Markdown paragraph ends with a hard line break (\` \n\`) and is followed by an empty line and another paragraph, the parser incorrectly merges both paragraphs into a single \`MD_PARAGRAPH\` node rather than producing two separate \`MD_PARAGRAPH\` nodes delimited by the empty line.
## Steps to Reproduce
Input:
\`\`\`md
foo
bar with empty line after
foo
bar without empty line after
\`\`\`
## Incorrect CST Output
\`\`\`
6: MD_PARAGRAPH@43..115
0: MD_INLINE_ITEM_LIST@43..115
0: MD_TEXTUAL@43..46
0: MD_TEXTUAL_LITERAL@43..46 \"foo\" [] []
1: MD_HARD_LINE@46..49
0: MD_HARD_LINE_LITERAL@46..49 \" \\n\" [] []
2: MD_TEXTUAL@49..74
0: MD_TEXTUAL_LITERAL@49..74 \"bar with empty line after\" [] []
3: MD_HARD_LINE@74..77
0: MD_HARD_LINE_LITERAL@74..77 \" \\n\" [] []
4: MD_TEXTUAL@77..78
0: MD_TEXTUAL_LITERAL@77..78 \"\\n\" [] [] <-- empty line absorbed into paragraph
5: MD_TEXTUAL@78..81
0: MD_TEXTUAL_LITERAL@78..81 \"foo\" [] []
6: MD_HARD_LINE@81..84
0: MD_HARD_LINE_LITERAL@81..84 \" \\n\" [] []
7: MD_TEXTUAL@84..112
0: MD_TEXTUAL_LITERAL@84..112 \"bar without empty line after\" [] []
8: MD_HARD_LINE@112..115
0: MD_HARD_LINE_LITERAL@112..115 \" \\n\" [] []
1: (empty)
\`\`\`
## Expected Behaviour
Two separate \`MD_PARAGRAPH\` nodes should be produced, as per the CommonMark spec. See the [CommonMark dingus AST](https://spec.commonmark.org/dingus/?text=foo%20%20%20%20%20%20%20%0Abaz%0A%0Abackslash%5C%0Aform%0A%0Ano%20hard%20line%0Ahere%0A%0Afoo%20%20%0Abar%20with%20empty%20line%20after%20%20%0A%0Afoo%20%20%0Abar%20without%20empty%20line%20after%20%20) for reference.
## Impact
This parser bug causes the formatter to incorrectly omit the blank line between the two paragraphs, as surfaced in PR `#9856`.
## Related
- Discovered in `#9856`
- Requested by `@tidefield`
"
Length of output: 44
@tidefield, done! Filed as #9857 — it covers the incorrect single-MD_PARAGRAPH parse when a hard-line-terminated paragraph is followed by an empty line and a second paragraph, with a link back to this PR.
🧠 Learnings used
Learnt from: ematipico
Repo: biomejs/biome PR: 7467
File: crates/biome_service/src/file_handlers/html.rs:456-466
Timestamp: 2025-09-13T16:16:06.459Z
Learning: In biome_formatter, consecutive LineMode::Hard elements are automatically collapsed to a single newline by the printer. The printer has a test "it_prints_consecutive_hard_lines_as_one" that demonstrates this behavior - multiple consecutive hard line breaks result in only one newline in the output, not extra blank lines.
Learnt from: tidefield
Repo: biomejs/biome PR: 9480
File: crates/biome_markdown_formatter/src/markdown/auxiliary/hard_line.rs:11-31
Timestamp: 2026-03-14T11:30:13.488Z
Learning: In `crates/biome_markdown_formatter/src/markdown/auxiliary/hard_line.rs`, `FormatMdHardLine::fmt_fields` is responsible for emitting `hard_line_break()` itself (not the caller). As of PR `#9480` (building on `#9331` by ematipico), the function branches on the token text: if it ends with `\` it emits `format_removed(&token)` + `text("\\", ...)` + `hard_line_break()`; otherwise it normalises to two trailing spaces via `format_removed(&token)` + `text(" ", ...)` + `hard_line_break()`. The callers in `inline_item_list.rs` and `paragraph.rs` do NOT add an extra `hard_line_break()`.
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2026-02-05T16:31:32.222Z
Learning: Applies to crates/biome_formatter/**/tests/specs/**/* : Create test spec files in `tests/specs/<language>/` directory as input files for formatter testing; accept snapshots only after verifying correctness
Learnt from: jfmcdowell
Repo: biomejs/biome PR: 0
File: :0-0
Timestamp: 2026-03-30T19:37:09.297Z
Learning: In `crates/biome_markdown_parser/src/syntax/fenced_code_block.rs`, there is an intentional asymmetry between opening and closing fence indent stripping inside list items:
- **Opening fence**: `check_continuation_indent` (called before the fenced code block parser runs) already consumes the list continuation indent, so the opener only needs to emit the remaining 0–3 CommonMark-allowed fence spaces (`MAX_BLOCK_PREFIX_INDENT`). Stripping `list_item_required_indent` in the opener is NOT needed.
- **Closing fence**: processes the raw line without prior consumption, so it must strip `list_item_required_indent + MAX_BLOCK_PREFIX_INDENT` itself (the fix introduced in PR `#9730`).
Learnt from: jfmcdowell
Repo: biomejs/biome PR: 9779
File: crates/biome_markdown_parser/tests/spec_test.rs:143-157
Timestamp: 2026-04-03T00:16:33.809Z
Learning: In `crates/biome_markdown_parser/tests/spec_test.rs`, the `MdContinuationIndent` structural invariant (checking that children are `MD_INDENT_TOKEN`) intentionally does NOT assert that the `MD_INDENT_TOKEN_LIST` wrapper is present, because `has_bogus_nodes_or_empty_slots(&syntax)` runs earlier in the `ExpectedOutcome::Pass` branch and already catches a missing or empty wrapper child. The invariant is purposefully layered on top of that prior check.
Learnt from: tidefield
Repo: biomejs/biome PR: 9693
File: crates/biome_markdown_formatter/src/markdown/auxiliary/bullet.rs:24-24
Timestamp: 2026-04-02T09:34:17.898Z
Learning: In `crates/biome_markdown_formatter/` (and biome formatter crates generally), `format_verbatim_node` is only acceptable as initial scaffolding boilerplate when a formatter is first set up. Once active formatting features are being implemented, no new calls to `format_verbatim_node` should be introduced — new feature code must format the node's fields explicitly instead. Per ematipico: its use in feature code is considered a bug. Per tidefield: this applies to all new formatting feature work in biome_markdown_formatter.
Merging this PR will not alter performance
Comparing Footnotes
|
|
Tip For best results, initiate chat on the files or code changes.
🧠 Learnings used |
| no hard line | ||
| here | ||
|
|
||
| foo |
|
@ematipico oops. I reverted the change. |
Summary
By default, we keep all the hard line breaks to preserve the original semantic. To match prettier's behavior, this PR removes the last hard line break because
MD_INLINE_ITEM_LISTalways ends withMD_TEXTUAL\n.See the AST in https://spec.commonmark.org/dingus/?text=foo%20%20%0Abaz%20%20%20%0A%0Afoo for a better demonstration.
I added more test coverage which yields a bug in the parser. I'll comment below and file an issue.
Test Plan
Docs