Skip to content

fix(md-formatter): remove last hard line break #9856

Merged
ematipico merged 2 commits intobiomejs:mainfrom
tidefield:fix-9770
Apr 8, 2026
Merged

fix(md-formatter): remove last hard line break #9856
ematipico merged 2 commits intobiomejs:mainfrom
tidefield:fix-9770

Conversation

@tidefield
Copy link
Copy Markdown
Contributor

@tidefield tidefield commented Apr 8, 2026

Summary

By default, we keep all the hard line breaks to preserve the original semantic. To match prettier's behavior, this PR removes the last hard line break because MD_INLINE_ITEM_LIST always ends with MD_TEXTUAL \n.

See the AST in https://spec.commonmark.org/dingus/?text=foo%20%20%0Abaz%20%20%20%0A%0Afoo for a better demonstration.

I added more test coverage which yields a bug in the parser. I'll comment below and file an issue.

Test Plan

$ cargo test -p biome_markdown_formatter

Docs

@changeset-bot
Copy link
Copy Markdown

changeset-bot bot commented Apr 8, 2026

⚠️ No Changeset found

Latest commit: 39dc847

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@github-actions github-actions bot added A-Formatter Area: formatter L-Markdown Language: Markdown labels Apr 8, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8fc11f0b-1d87-4d61-aae9-ace97b1496d2

📥 Commits

Reviewing files that changed from the base of the PR and between 2d041a8 and 395d479.

⛔ Files ignored due to path filters (3)
  • crates/biome_markdown_formatter/tests/specs/markdown/hard_line.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/spec/example-186.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/spec/example-609.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (3)
  • crates/biome_markdown_formatter/src/markdown/auxiliary/hard_line.rs
  • crates/biome_markdown_formatter/tests/quick_test.rs
  • crates/biome_markdown_formatter/tests/specs/markdown/hard_line.md
✅ Files skipped from review due to trivial changes (1)
  • crates/biome_markdown_formatter/tests/specs/markdown/hard_line.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/biome_markdown_formatter/tests/quick_test.rs
  • crates/biome_markdown_formatter/src/markdown/auxiliary/hard_line.rs

Walkthrough

The formatter for Markdown hard lines now examines surrounding syntax (including MarkdownSyntaxKind and Direction) to decide if an MdHardLine is the last in its paragraph. If there is no next sibling or the next sibling is textual but trims to empty, the formatter emits format_removed(&token) followed by a single hard_line_break() (omitting the reinserted two-space marker). In all other cases it retains re-inserting exactly two spaces before the hard line break. Tests and fixtures were updated to cover backslash continuations, double-space hard breaks, and cases with or without intervening blank lines.

Suggested reviewers

  • dyc3
  • ematipico
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title directly describes the main change: removing the last hard line break in the Markdown formatter to match Prettier's behaviour.
Description check ✅ Passed The description clearly explains the motivation (matching Prettier behaviour), references the technical reason (MD_INLINE_ITEM_LIST ending), and provides test instructions.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

no hard line
here

foo
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect but the bug is in the parser.

CST for this block is currently

    6: MD_PARAGRAPH@43..115
      0: MD_INLINE_ITEM_LIST@43..115
        0: MD_TEXTUAL@43..46
          0: MD_TEXTUAL_LITERAL@43..46 "foo" [] []
        1: MD_HARD_LINE@46..49
          0: MD_HARD_LINE_LITERAL@46..49 "  \n" [] []
        2: MD_TEXTUAL@49..74
          0: MD_TEXTUAL_LITERAL@49..74 "bar with empty line after" [] []
        3: MD_HARD_LINE@74..77
          0: MD_HARD_LINE_LITERAL@74..77 "  \n" [] []
        4: MD_TEXTUAL@77..78
          0: MD_TEXTUAL_LITERAL@77..78 "\n" [] []
        5: MD_TEXTUAL@78..81
          0: MD_TEXTUAL_LITERAL@78..81 "foo" [] []
        6: MD_HARD_LINE@81..84
          0: MD_HARD_LINE_LITERAL@81..84 "  \n" [] []
        7: MD_TEXTUAL@84..112
          0: MD_TEXTUAL_LITERAL@84..112 "bar without empty line after" [] []
        8: MD_HARD_LINE@112..115
          0: MD_HARD_LINE_LITERAL@112..115 "  \n" [] []
      1: (empty)

The expected behavior is that there should be two MD_PARAGRAPH nodes delimited by the empty line.

See AST tab in https://spec.commonmark.org/dingus/?text=foo%20%20%20%20%20%20%20%0Abaz%0A%0Abackslash%5C%0Aform%0A%0Ano%20hard%20line%0Ahere%0A%0Afoo%20%20%0Abar%20with%20empty%20line%20after%20%20%0A%0Afoo%20%20%0Abar%20without%20empty%20line%20after%20%20.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai file an issue for this parser bug

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
gh issue create \
  --repo biomejs/biome \
  --title "biome_markdown_parser: paragraph containing hard line breaks incorrectly absorbs following empty line and paragraph" \
  --body "## Description

When a Markdown paragraph ends with a hard line break (\`  \n\`) and is followed by an empty line and another paragraph, the parser incorrectly merges both paragraphs into a single \`MD_PARAGRAPH\` node rather than producing two separate \`MD_PARAGRAPH\` nodes delimited by the empty line.

## Steps to Reproduce

Input:
\`\`\`md
foo  
bar with empty line after  

foo  
bar without empty line after  
\`\`\`

## Incorrect CST Output

\`\`\`
6: MD_PARAGRAPH@43..115
  0: MD_INLINE_ITEM_LIST@43..115
    0: MD_TEXTUAL@43..46
      0: MD_TEXTUAL_LITERAL@43..46 \"foo\" [] []
    1: MD_HARD_LINE@46..49
      0: MD_HARD_LINE_LITERAL@46..49 \"  \\n\" [] []
    2: MD_TEXTUAL@49..74
      0: MD_TEXTUAL_LITERAL@49..74 \"bar with empty line after\" [] []
    3: MD_HARD_LINE@74..77
      0: MD_HARD_LINE_LITERAL@74..77 \"  \\n\" [] []
    4: MD_TEXTUAL@77..78
      0: MD_TEXTUAL_LITERAL@77..78 \"\\n\" [] []   <-- empty line absorbed into paragraph
    5: MD_TEXTUAL@78..81
      0: MD_TEXTUAL_LITERAL@78..81 \"foo\" [] []
    6: MD_HARD_LINE@81..84
      0: MD_HARD_LINE_LITERAL@81..84 \"  \\n\" [] []
    7: MD_TEXTUAL@84..112
      0: MD_TEXTUAL_LITERAL@84..112 \"bar without empty line after\" [] []
    8: MD_HARD_LINE@112..115
      0: MD_HARD_LINE_LITERAL@112..115 \"  \\n\" [] []
  1: (empty)
\`\`\`

## Expected Behaviour

Two separate \`MD_PARAGRAPH\` nodes should be produced, as per the CommonMark spec. See the [CommonMark dingus AST](https://spec.commonmark.org/dingus/?text=foo%20%20%20%20%20%20%20%0Abaz%0A%0Abackslash%5C%0Aform%0A%0Ano%20hard%20line%0Ahere%0A%0Afoo%20%20%0Abar%20with%20empty%20line%20after%20%20%0A%0Afoo%20%20%0Abar%20without%20empty%20line%20after%20%20) for reference.

## Impact

This parser bug causes the formatter to incorrectly omit the blank line between the two paragraphs, as surfaced in PR `#9856`.

## Related

- Discovered in `#9856`
- Requested by `@tidefield`
"

Length of output: 44


@tidefield, done! Filed as #9857 — it covers the incorrect single-MD_PARAGRAPH parse when a hard-line-terminated paragraph is followed by an empty line and a second paragraph, with a link back to this PR.


🧠 Learnings used
Learnt from: ematipico
Repo: biomejs/biome PR: 7467
File: crates/biome_service/src/file_handlers/html.rs:456-466
Timestamp: 2025-09-13T16:16:06.459Z
Learning: In biome_formatter, consecutive LineMode::Hard elements are automatically collapsed to a single newline by the printer. The printer has a test "it_prints_consecutive_hard_lines_as_one" that demonstrates this behavior - multiple consecutive hard line breaks result in only one newline in the output, not extra blank lines.

Learnt from: tidefield
Repo: biomejs/biome PR: 9480
File: crates/biome_markdown_formatter/src/markdown/auxiliary/hard_line.rs:11-31
Timestamp: 2026-03-14T11:30:13.488Z
Learning: In `crates/biome_markdown_formatter/src/markdown/auxiliary/hard_line.rs`, `FormatMdHardLine::fmt_fields` is responsible for emitting `hard_line_break()` itself (not the caller). As of PR `#9480` (building on `#9331` by ematipico), the function branches on the token text: if it ends with `\` it emits `format_removed(&token)` + `text("\\", ...)` + `hard_line_break()`; otherwise it normalises to two trailing spaces via `format_removed(&token)` + `text("  ", ...)` + `hard_line_break()`. The callers in `inline_item_list.rs` and `paragraph.rs` do NOT add an extra `hard_line_break()`.

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2026-02-05T16:31:32.222Z
Learning: Applies to crates/biome_formatter/**/tests/specs/**/* : Create test spec files in `tests/specs/<language>/` directory as input files for formatter testing; accept snapshots only after verifying correctness

Learnt from: jfmcdowell
Repo: biomejs/biome PR: 0
File: :0-0
Timestamp: 2026-03-30T19:37:09.297Z
Learning: In `crates/biome_markdown_parser/src/syntax/fenced_code_block.rs`, there is an intentional asymmetry between opening and closing fence indent stripping inside list items:
- **Opening fence**: `check_continuation_indent` (called before the fenced code block parser runs) already consumes the list continuation indent, so the opener only needs to emit the remaining 0–3 CommonMark-allowed fence spaces (`MAX_BLOCK_PREFIX_INDENT`). Stripping `list_item_required_indent` in the opener is NOT needed.
- **Closing fence**: processes the raw line without prior consumption, so it must strip `list_item_required_indent + MAX_BLOCK_PREFIX_INDENT` itself (the fix introduced in PR `#9730`).

Learnt from: jfmcdowell
Repo: biomejs/biome PR: 9779
File: crates/biome_markdown_parser/tests/spec_test.rs:143-157
Timestamp: 2026-04-03T00:16:33.809Z
Learning: In `crates/biome_markdown_parser/tests/spec_test.rs`, the `MdContinuationIndent` structural invariant (checking that children are `MD_INDENT_TOKEN`) intentionally does NOT assert that the `MD_INDENT_TOKEN_LIST` wrapper is present, because `has_bogus_nodes_or_empty_slots(&syntax)` runs earlier in the `ExpectedOutcome::Pass` branch and already catches a missing or empty wrapper child. The invariant is purposefully layered on top of that prior check.

Learnt from: tidefield
Repo: biomejs/biome PR: 9693
File: crates/biome_markdown_formatter/src/markdown/auxiliary/bullet.rs:24-24
Timestamp: 2026-04-02T09:34:17.898Z
Learning: In `crates/biome_markdown_formatter/` (and biome formatter crates generally), `format_verbatim_node` is only acceptable as initial scaffolding boilerplate when a formatter is first set up. Once active formatting features are being implemented, no new calls to `format_verbatim_node` should be introduced — new feature code must format the node's fields explicitly instead. Per ematipico: its use in feature code is considered a bug. Per tidefield: this applies to all new formatting feature work in biome_markdown_formatter.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Apr 8, 2026

Merging this PR will not alter performance

✅ 28 untouched benchmarks
⏩ 228 skipped benchmarks1


Comparing tidefield:fix-9770 (39dc847) with main (b7134d9)

Open in CodSpeed

Footnotes

  1. 228 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@tidefield
Copy link
Copy Markdown
Contributor Author

@coderabbitai

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 8, 2026

Tip

For best results, initiate chat on the files or code changes.

@tidefield Hey! Looks like your message came through empty. What can I help you with? If you'd like me to review the PR, just say the word! 🐰


🧠 Learnings used
Learnt from: tidefield
Repo: biomejs/biome PR: 9480
File: crates/biome_markdown_formatter/src/markdown/auxiliary/hard_line.rs:11-31
Timestamp: 2026-03-14T11:30:13.488Z
Learning: In `crates/biome_markdown_formatter/src/markdown/auxiliary/hard_line.rs`, `FormatMdHardLine::fmt_fields` is responsible for emitting `hard_line_break()` itself (not the caller). As of PR `#9480` (building on `#9331` by ematipico), the function branches on the token text: if it ends with `\` it emits `format_removed(&token)` + `text("\\", ...)` + `hard_line_break()`; otherwise it normalises to two trailing spaces via `format_removed(&token)` + `text("  ", ...)` + `hard_line_break()`. The callers in `inline_item_list.rs` and `paragraph.rs` do NOT add an extra `hard_line_break()`.

Learnt from: jfmcdowell
Repo: biomejs/biome PR: 0
File: :0-0
Timestamp: 2026-03-30T19:37:09.297Z
Learning: In `crates/biome_markdown_parser/src/syntax/fenced_code_block.rs`, there is an intentional asymmetry between opening and closing fence indent stripping inside list items:
- **Opening fence**: `check_continuation_indent` (called before the fenced code block parser runs) already consumes the list continuation indent, so the opener only needs to emit the remaining 0–3 CommonMark-allowed fence spaces (`MAX_BLOCK_PREFIX_INDENT`). Stripping `list_item_required_indent` in the opener is NOT needed.
- **Closing fence**: processes the raw line without prior consumption, so it must strip `list_item_required_indent + MAX_BLOCK_PREFIX_INDENT` itself (the fix introduced in PR `#9730`).

Learnt from: dyc3
Repo: biomejs/biome PR: 8291
File: crates/biome_html_formatter/tests/specs/prettier/vue/html-vue/elastic-header.html:10-10
Timestamp: 2025-12-04T13:29:49.287Z
Learning: Files under `crates/biome_html_formatter/tests/specs/prettier` are test fixtures synced from Prettier and should not receive detailed code quality reviews (e.g., HTTP vs HTTPS, formatting suggestions, etc.). These files are test data meant to validate formatter behavior and should be preserved as-is.

Learnt from: ematipico
Repo: biomejs/biome PR: 7467
File: crates/biome_service/src/file_handlers/html.rs:456-466
Timestamp: 2025-09-13T16:16:06.459Z
Learning: In biome_formatter, consecutive LineMode::Hard elements are automatically collapsed to a single newline by the printer. The printer has a test "it_prints_consecutive_hard_lines_as_one" that demonstrates this behavior - multiple consecutive hard line breaks result in only one newline in the output, not extra blank lines.

Learnt from: ematipico
Repo: biomejs/biome PR: 7467
File: crates/biome_service/src/file_handlers/html.rs:456-466
Timestamp: 2025-09-13T16:16:06.459Z
Learning: The biome_formatter printer deduplicates consecutive LineMode::Hard elements by checking if the current line is already empty (line_width > 0) before printing a newline. This prevents multiple consecutive hard line breaks from creating extra blank lines in the output.

Learnt from: tidefield
Repo: biomejs/biome PR: 9693
File: crates/biome_markdown_formatter/src/markdown/auxiliary/bullet.rs:24-24
Timestamp: 2026-04-02T09:34:17.898Z
Learning: In `crates/biome_markdown_formatter/` (and biome formatter crates generally), `format_verbatim_node` is only acceptable as initial scaffolding boilerplate when a formatter is first set up. Once active formatting features are being implemented, no new calls to `format_verbatim_node` should be introduced — new feature code must format the node's fields explicitly instead. Per ematipico: its use in feature code is considered a bug. Per tidefield: this applies to all new formatting feature work in biome_markdown_formatter.

Learnt from: ematipico
Repo: biomejs/biome PR: 9584
File: .changeset/fix-vue-directive-expression-parsing.md:5-5
Timestamp: 2026-03-22T14:48:22.401Z
Learning: In biomejs/biome, the bug fix changeset format `Fixed [`#NUMBER`](issue link): ...` is only required when a GitHub issue exists for the bug. When no issue is filed, a free-form description of the fix is acceptable for the changeset entry.

Learnt from: CR
Repo: biomejs/biome PR: 0
File: CONTRIBUTING.md:0-0
Timestamp: 2026-04-06T21:31:01.848Z
Learning: Applies to crates/biome_formatter/**/*.rs : Follow technical guidelines and patterns documented in the biome_formatter CONTRIBUTING.md for formatter implementation and testing

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2026-04-07T12:58:59.307Z
Learning: Applies to crates/biome_analyze/**/biome_*_analyze/lib/src/lint/**/*.rs : Use `#` prefix to hide lines from code block output in rule documentation while keeping them in validation

Learnt from: jfmcdowell
Repo: biomejs/biome PR: 9779
File: crates/biome_markdown_parser/tests/spec_test.rs:143-157
Timestamp: 2026-04-03T00:16:33.809Z
Learning: In `crates/biome_markdown_parser/tests/spec_test.rs`, the `MdContinuationIndent` structural invariant (checking that children are `MD_INDENT_TOKEN`) intentionally does NOT assert that the `MD_INDENT_TOKEN_LIST` wrapper is present, because `has_bogus_nodes_or_empty_slots(&syntax)` runs earlier in the `ExpectedOutcome::Pass` branch and already catches a missing or empty wrapper child. The invariant is purposefully layered on top of that prior check.

@tidefield
Copy link
Copy Markdown
Contributor Author

Copy link
Copy Markdown
Member

@ematipico ematipico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment to address

Comment thread crates/biome_markdown_formatter/tests/quick_test.rs Outdated
no hard line
here

foo
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tidefield
Copy link
Copy Markdown
Contributor Author

@ematipico oops. I reverted the change.

@ematipico ematipico merged commit 66be88f into biomejs:main Apr 8, 2026
15 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Formatter Area: formatter L-Markdown Language: Markdown

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants