Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
use crate::prelude::*;
use crate::shared::TextPrintMode;
use biome_formatter::{FormatRuleWithOptions, write};
use biome_markdown_syntax::MdHardLine;
use biome_markdown_syntax::{MarkdownSyntaxKind, MdHardLine};
use biome_rowan::Direction;

#[derive(Debug, Clone, Default)]
pub(crate) struct FormatMdHardLine {
Expand All @@ -28,6 +29,20 @@ impl FormatNodeRule<MdHardLine> for FormatMdHardLine {
]
)
} else {
// Detect if the hard line break is the last one of the paragraph.
let is_last_hard_line = match node.syntax().siblings(Direction::Next).nth(1) {
None => true,
Some(s) => {
s.kind() == MarkdownSyntaxKind::MD_TEXTUAL && s.text_trimmed().is_empty()
}
};

if is_last_hard_line {
// Drop the two-space marker but keep a single newline so the
// paragraph still terminates on its own line.
return write!(f, [format_removed(&token), hard_line_break()]);
}

// Given two or more spaces in MdHardLine, only two spaces has semantic meaning
// so we are adding back two spaces as required by the spec
// https://spec.commonmark.org/0.31.2/#hard-line-break
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,3 +6,9 @@ form

no hard line
here

foo
bar with empty line after

foo
bar without empty line after
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,11 @@ form
no hard line
here

foo
bar with empty line after

foo
bar without empty line after
```


Expand All @@ -30,4 +35,8 @@ form
no hard line
here

foo
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect but the bug is in the parser.

CST for this block is currently

    6: MD_PARAGRAPH@43..115
      0: MD_INLINE_ITEM_LIST@43..115
        0: MD_TEXTUAL@43..46
          0: MD_TEXTUAL_LITERAL@43..46 "foo" [] []
        1: MD_HARD_LINE@46..49
          0: MD_HARD_LINE_LITERAL@46..49 "  \n" [] []
        2: MD_TEXTUAL@49..74
          0: MD_TEXTUAL_LITERAL@49..74 "bar with empty line after" [] []
        3: MD_HARD_LINE@74..77
          0: MD_HARD_LINE_LITERAL@74..77 "  \n" [] []
        4: MD_TEXTUAL@77..78
          0: MD_TEXTUAL_LITERAL@77..78 "\n" [] []
        5: MD_TEXTUAL@78..81
          0: MD_TEXTUAL_LITERAL@78..81 "foo" [] []
        6: MD_HARD_LINE@81..84
          0: MD_HARD_LINE_LITERAL@81..84 "  \n" [] []
        7: MD_TEXTUAL@84..112
          0: MD_TEXTUAL_LITERAL@84..112 "bar without empty line after" [] []
        8: MD_HARD_LINE@112..115
          0: MD_HARD_LINE_LITERAL@112..115 "  \n" [] []
      1: (empty)

The expected behavior is that there should be two MD_PARAGRAPH nodes delimited by the empty line.

See AST tab in https://spec.commonmark.org/dingus/?text=foo%20%20%20%20%20%20%20%0Abaz%0A%0Abackslash%5C%0Aform%0A%0Ano%20hard%20line%0Ahere%0A%0Afoo%20%20%0Abar%20with%20empty%20line%20after%20%20%0A%0Afoo%20%20%0Abar%20without%20empty%20line%20after%20%20.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai file an issue for this parser bug

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
gh issue create \
  --repo biomejs/biome \
  --title "biome_markdown_parser: paragraph containing hard line breaks incorrectly absorbs following empty line and paragraph" \
  --body "## Description

When a Markdown paragraph ends with a hard line break (\`  \n\`) and is followed by an empty line and another paragraph, the parser incorrectly merges both paragraphs into a single \`MD_PARAGRAPH\` node rather than producing two separate \`MD_PARAGRAPH\` nodes delimited by the empty line.

## Steps to Reproduce

Input:
\`\`\`md
foo  
bar with empty line after  

foo  
bar without empty line after  
\`\`\`

## Incorrect CST Output

\`\`\`
6: MD_PARAGRAPH@43..115
  0: MD_INLINE_ITEM_LIST@43..115
    0: MD_TEXTUAL@43..46
      0: MD_TEXTUAL_LITERAL@43..46 \"foo\" [] []
    1: MD_HARD_LINE@46..49
      0: MD_HARD_LINE_LITERAL@46..49 \"  \\n\" [] []
    2: MD_TEXTUAL@49..74
      0: MD_TEXTUAL_LITERAL@49..74 \"bar with empty line after\" [] []
    3: MD_HARD_LINE@74..77
      0: MD_HARD_LINE_LITERAL@74..77 \"  \\n\" [] []
    4: MD_TEXTUAL@77..78
      0: MD_TEXTUAL_LITERAL@77..78 \"\\n\" [] []   <-- empty line absorbed into paragraph
    5: MD_TEXTUAL@78..81
      0: MD_TEXTUAL_LITERAL@78..81 \"foo\" [] []
    6: MD_HARD_LINE@81..84
      0: MD_HARD_LINE_LITERAL@81..84 \"  \\n\" [] []
    7: MD_TEXTUAL@84..112
      0: MD_TEXTUAL_LITERAL@84..112 \"bar without empty line after\" [] []
    8: MD_HARD_LINE@112..115
      0: MD_HARD_LINE_LITERAL@112..115 \"  \\n\" [] []
  1: (empty)
\`\`\`

## Expected Behaviour

Two separate \`MD_PARAGRAPH\` nodes should be produced, as per the CommonMark spec. See the [CommonMark dingus AST](https://spec.commonmark.org/dingus/?text=foo%20%20%20%20%20%20%20%0Abaz%0A%0Abackslash%5C%0Aform%0A%0Ano%20hard%20line%0Ahere%0A%0Afoo%20%20%0Abar%20with%20empty%20line%20after%20%20%0A%0Afoo%20%20%0Abar%20without%20empty%20line%20after%20%20) for reference.

## Impact

This parser bug causes the formatter to incorrectly omit the blank line between the two paragraphs, as surfaced in PR `#9856`.

## Related

- Discovered in `#9856`
- Requested by `@tidefield`
"

Length of output: 44


@tidefield, done! Filed as #9857 — it covers the incorrect single-MD_PARAGRAPH parse when a hard-line-terminated paragraph is followed by an empty line and a second paragraph, with a link back to this PR.


🧠 Learnings used
Learnt from: ematipico
Repo: biomejs/biome PR: 7467
File: crates/biome_service/src/file_handlers/html.rs:456-466
Timestamp: 2025-09-13T16:16:06.459Z
Learning: In biome_formatter, consecutive LineMode::Hard elements are automatically collapsed to a single newline by the printer. The printer has a test "it_prints_consecutive_hard_lines_as_one" that demonstrates this behavior - multiple consecutive hard line breaks result in only one newline in the output, not extra blank lines.

Learnt from: tidefield
Repo: biomejs/biome PR: 9480
File: crates/biome_markdown_formatter/src/markdown/auxiliary/hard_line.rs:11-31
Timestamp: 2026-03-14T11:30:13.488Z
Learning: In `crates/biome_markdown_formatter/src/markdown/auxiliary/hard_line.rs`, `FormatMdHardLine::fmt_fields` is responsible for emitting `hard_line_break()` itself (not the caller). As of PR `#9480` (building on `#9331` by ematipico), the function branches on the token text: if it ends with `\` it emits `format_removed(&token)` + `text("\\", ...)` + `hard_line_break()`; otherwise it normalises to two trailing spaces via `format_removed(&token)` + `text("  ", ...)` + `hard_line_break()`. The callers in `inline_item_list.rs` and `paragraph.rs` do NOT add an extra `hard_line_break()`.

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2026-02-05T16:31:32.222Z
Learning: Applies to crates/biome_formatter/**/tests/specs/**/* : Create test spec files in `tests/specs/<language>/` directory as input files for formatter testing; accept snapshots only after verifying correctness

Learnt from: jfmcdowell
Repo: biomejs/biome PR: 0
File: :0-0
Timestamp: 2026-03-30T19:37:09.297Z
Learning: In `crates/biome_markdown_parser/src/syntax/fenced_code_block.rs`, there is an intentional asymmetry between opening and closing fence indent stripping inside list items:
- **Opening fence**: `check_continuation_indent` (called before the fenced code block parser runs) already consumes the list continuation indent, so the opener only needs to emit the remaining 0–3 CommonMark-allowed fence spaces (`MAX_BLOCK_PREFIX_INDENT`). Stripping `list_item_required_indent` in the opener is NOT needed.
- **Closing fence**: processes the raw line without prior consumption, so it must strip `list_item_required_indent + MAX_BLOCK_PREFIX_INDENT` itself (the fix introduced in PR `#9730`).

Learnt from: jfmcdowell
Repo: biomejs/biome PR: 9779
File: crates/biome_markdown_parser/tests/spec_test.rs:143-157
Timestamp: 2026-04-03T00:16:33.809Z
Learning: In `crates/biome_markdown_parser/tests/spec_test.rs`, the `MdContinuationIndent` structural invariant (checking that children are `MD_INDENT_TOKEN`) intentionally does NOT assert that the `MD_INDENT_TOKEN_LIST` wrapper is present, because `has_bogus_nodes_or_empty_slots(&syntax)` runs earlier in the `ExpectedOutcome::Pass` branch and already catches a missing or empty wrapper child. The invariant is purposefully layered on top of that prior check.

Learnt from: tidefield
Repo: biomejs/biome PR: 9693
File: crates/biome_markdown_formatter/src/markdown/auxiliary/bullet.rs:24-24
Timestamp: 2026-04-02T09:34:17.898Z
Learning: In `crates/biome_markdown_formatter/` (and biome formatter crates generally), `format_verbatim_node` is only acceptable as initial scaffolding boilerplate when a formatter is first set up. Once active formatting features are being implemented, no new calls to `format_verbatim_node` should be introduced — new feature code must format the node's fields explicitly instead. Per ematipico: its use in feature code is considered a bug. Per tidefield: this applies to all new formatting feature work in biome_markdown_formatter.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bar with empty line after
foo
bar without empty line after
```

This file was deleted.

This file was deleted.

Loading