fix(md): code info string, and fmt advancement #9979

Merged
ematipico merged 3 commits into main from fix/md-fmt-and-parse
Apr 15, 2026

Member

@ematipico ematipico commented Apr 14, 2026

Summary

This PR does the following:

  • Removes more verbatim formatting from the Markdown formatter.
  • Patches the formatting of code blocks to match Prettier:
    • Tilde fences are normalised to backticks.
    • Non-closed code blocks are closed automatically by the formatter. This is an exception, and it matches Prettier's behaviour.
  • Info strings in code blocks were incorrectly parsed. I added a new relex context to fix that. However, there's still some logic missing, which I decided not to do here. As per spec, the info string of a backtick-fenced code block mustn't contain backticks, but we don't emit an error if that's the case. cc @jfmcdowell
  • Added a quick_test, which was missing from the parser testing infra.
  • @jfmcdowell I noticed we have some markdown context variants we don't use (i.e. [dead_code]). Can we delete them?
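
The fence handling described in the list above can be sketched on plain strings. This is only an illustration, not the formatter's actual code: `longest_backtick_run` and `format_fenced_block` are hypothetical helpers, and the rule of making the fence one backtick longer than the longest run in the body is an assumption modelled on how Prettier picks its fence.

```rust
/// Length of the longest run of consecutive backticks in `text`.
/// Hypothetical helper for illustration only.
fn longest_backtick_run(text: &str) -> usize {
    let mut longest = 0;
    let mut current = 0;
    for ch in text.chars() {
        if ch == '`' {
            current += 1;
            longest = longest.max(current);
        } else {
            current = 0;
        }
    }
    longest
}

/// Rebuilds a fenced code block with a backtick fence and a closing fence,
/// whatever the original fence character was and whether or not the
/// original block was closed. Hypothetical helper for illustration only.
fn format_fenced_block(info: &str, content: &str) -> String {
    // At least three backticks, and strictly longer than any backtick run
    // in the body, so the body cannot end the block early.
    let fence = "`".repeat((longest_backtick_run(content) + 1).max(3));

    let mut out = String::new();
    out.push_str(&fence);
    out.push_str(info.trim());
    out.push('\n');
    if !content.is_empty() {
        out.push_str(content);
        if !content.ends_with('\n') {
            out.push('\n');
        }
    }
    // The closing fence is always emitted.
    out.push_str(&fence);
    out.push('\n');
    out
}
```

Calling `format_fenced_block("rust", "let x = 1;")` yields a three-backtick block with both fences, even if the input came from an unclosed `~~~rust` block.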

Test Plan

Added new tests. Some Prettier snapshots are now gone.

Docs

changeset-bot bot commented Apr 14, 2026

⚠️ No Changeset found

Latest commit: ff9c215

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@github-actions github-actions bot added labels A-Parser (Area: parser), A-Formatter (Area: formatter), L-Markdown (Language: Markdown) Apr 14, 2026
@github-actions
Contributor

Parser conformance results on

js/262

| Test result | main count | This PR count | Difference |
| --- | --- | --- | --- |
| Total | 53178 | 53178 | 0 |
| Passed | 51958 | 51958 | 0 |
| Failed | 1178 | 1178 | 0 |
| Panics | 42 | 42 | 0 |
| Coverage | 97.71% | 97.71% | 0.00% |

jsx/babel

| Test result | main count | This PR count | Difference |
| --- | --- | --- | --- |
| Total | 38 | 38 | 0 |
| Passed | 37 | 37 | 0 |
| Failed | 1 | 1 | 0 |
| Panics | 0 | 0 | 0 |
| Coverage | 97.37% | 97.37% | 0.00% |

markdown/commonmark

| Test result | main count | This PR count | Difference |
| --- | --- | --- | --- |
| Total | 652 | 652 | 0 |
| Passed | 652 | 652 | 0 |
| Failed | 0 | 0 | 0 |
| Panics | 0 | 0 | 0 |
| Coverage | 100.00% | 100.00% | 0.00% |

symbols/microsoft

| Test result | main count | This PR count | Difference |
| --- | --- | --- | --- |
| Total | 5467 | 5467 | 0 |
| Passed | 1915 | 1915 | 0 |
| Failed | 3552 | 3552 | 0 |
| Panics | 0 | 0 | 0 |
| Coverage | 35.03% | 35.03% | 0.00% |

ts/babel

| Test result | main count | This PR count | Difference |
| --- | --- | --- | --- |
| Total | 640 | 640 | 0 |
| Passed | 569 | 569 | 0 |
| Failed | 71 | 71 | 0 |
| Panics | 0 | 0 | 0 |
| Coverage | 88.91% | 88.91% | 0.00% |

ts/microsoft

| Test result | main count | This PR count | Difference |
| --- | --- | --- | --- |
| Total | 18876 | 18876 | 0 |
| Passed | 13014 | 13014 | 0 |
| Failed | 5861 | 5861 | 0 |
| Panics | 1 | 1 | 0 |
| Coverage | 68.94% | 68.94% | 0.00% |

codspeed-hq bot commented Apr 14, 2026

Merging this PR will not alter performance

✅ 28 untouched benchmarks
⏩ 228 skipped benchmarks [1]

Comparing fix/md-fmt-and-parse (ff9c215) with main (3f89810)

Footnotes

  1. 228 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Contributor

coderabbitai bot commented Apr 14, 2026

Walkthrough

Updates Markdown formatter and parser handling for fenced code blocks and related auxiliary tokens. Formatter changes replace verbatim-node emissions with structured .format() calls across several rules, normalise fences to use backticks and always emit a closing fence, and migrate trim/trimming helpers to TextPrintMode helpers. Parser changes add a CodeInfoString lex/re-lex context, introduce MarkdownParser::re_lex, and invoke re-lexing when parsing fenced-code info strings. Tests and fixtures updated for new fenced-code variants.
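
The re-lexing idea in the walkthrough can be illustrated with a toy lexer. This is a minimal sketch under stated assumptions, not Biome's implementation: `LexContext`, `Token`, and `Lexer` here are hypothetical stand-ins for `MarkdownLexContext` and `MarkdownParser::re_lex`, and the token set is deliberately tiny. The point is that the parser first sees a token lexed in the regular context, then rewinds and lexes the same position again once it knows it is inside an info string.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum LexContext {
    Regular,
    CodeInfoString,
}

#[derive(Debug, PartialEq)]
enum Token<'a> {
    /// A single fence character, either a backtick or a tilde.
    Punct(char),
    Text(&'a str),
    InfoString(&'a str),
    Newline,
    Eof,
}

struct Lexer<'a> {
    source: &'a str,
    /// Byte offset where the current token starts.
    token_start: usize,
    /// Byte offset just past the current token.
    position: usize,
}

impl<'a> Lexer<'a> {
    fn new(source: &'a str) -> Self {
        Self { source, token_start: 0, position: 0 }
    }

    /// Lexes the next token under `context`.
    fn next_token(&mut self, context: LexContext) -> Token<'a> {
        self.token_start = self.position;
        let source: &'a str = self.source;
        let rest = &source[self.position..];
        let first = match rest.chars().next() {
            Some(ch) => ch,
            None => return Token::Eof,
        };
        if first == '\n' {
            self.position += 1;
            return Token::Newline;
        }
        // Early return: inside an info string nothing is special, so the
        // rest of the line becomes one token.
        if context == LexContext::CodeInfoString {
            let len = rest.find('\n').unwrap_or(rest.len());
            self.position += len;
            return Token::InfoString(&rest[..len]);
        }
        if first == '`' || first == '~' {
            self.position += 1;
            return Token::Punct(first);
        }
        let len = rest
            .find(|c: char| c == '`' || c == '~' || c == '\n')
            .unwrap_or(rest.len());
        self.position += len;
        Token::Text(&rest[..len])
    }

    /// Rewinds to the start of the current token and lexes it again
    /// under a different context.
    fn re_lex(&mut self, context: LexContext) -> Token<'a> {
        self.position = self.token_start;
        self.next_token(context)
    }
}
```

In this sketch, lexing a tilde-fenced opening line whose info string contains a backtick first splits the info string at that backtick; calling `re_lex(LexContext::CodeInfoString)` on the first piece replaces it with a single `InfoString` token covering the whole line.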

Possibly related PRs

Suggested reviewers

  • dyc3

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly addresses the main changes: code info string parsing fixes and formatter advancement improvements across multiple markdown formatting modules. |
| Description check | ✅ Passed | The description covers the PR's core objectives: removing verbatim formatting, normalising fences to backticks, fixing code block info string parsing, and adding tests. |

Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (2)
crates/biome_markdown_formatter/src/markdown/auxiliary/fenced_code_block.rs (1)

28-30: Tilde-to-backtick normalisation: theoretical edge case.

The formatter unconditionally normalises all fence characters to backticks. Per CommonMark, backtick-fenced info strings cannot contain backticks, but tilde-fenced ones can. Normalising a tilde fence with backticks in its info string would produce invalid Markdown.

That said, the parser already validates this (rejects backticks in backtick fence info strings), and no test cases with this scenario exist. If it becomes an issue in practice, adding a test case and handling would be straightforward.
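
The guard this comment suggests could look roughly like the following. It is a sketch only: `choose_fence_char` is a hypothetical helper, and it takes the original fence character and the info string as plain values rather than reading them from the syntax node as the real formatter would.

```rust
/// Picks the fence character for the formatted block.
/// Hypothetical helper, not part of Biome's API.
fn choose_fence_char(original_fence: char, info: &str) -> char {
    // A backtick fence may not carry backticks in its info string, so a
    // tilde fence whose info string contains one is kept as it is.
    if original_fence == '~' && info.contains('`') {
        '~'
    } else {
        '`'
    }
}
```

With this guard, a tilde fence whose info string is free of backticks is still normalised to backticks, so the common case keeps matching the behaviour described in the PR summary.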

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_formatter/src/markdown/auxiliary/fenced_code_block.rs`
around lines 28 - 30, The current code in fenced_code_block.rs always normalizes
fences to backticks (using longest_fence_char_sequence(node, '`') and building
normalized_fence), which can produce invalid Markdown if the original fence was
'~' and the info string contains backticks; change the logic to preserve the
original fence character when the original fence is '~' and the info string
contains '`' (i.e., inspect node to determine the original fence char and the
info string, and only normalize to backticks when doing so will not introduce
backticks into the info string), otherwise proceed with the existing
longest_fence_char_sequence(...) and repeat_n(...) logic to build
normalized_fence.
crates/biome_markdown_parser/src/lexer/mod.rs (1)

275-280: Drop unreachable CodeInfoString branch in WHS arm.

The early return in consume_token already handles CodeInfoString, so this extra branch cannot run and just adds noise.

Suggested tidy-up
                 } else if matches!(context, MarkdownLexContext::LinkDefinition) {
                     // In link definition context, whitespace separates tokens.
                     // We consume it as textual literal so it's not treated as trivia by the parser.
                     self.consume_link_definition_whitespace()
-                } else if matches!(context, MarkdownLexContext::CodeInfoString) {
-                    return if current == b'\n' || current == b'\r' {
-                        self.consume_newline()
-                    } else {
-                        self.consume_code_info_string()
-                    };
                 } else if self.after_newline && matches!(current, b' ' | b'\t') {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/lexer/mod.rs` around lines 275 - 280, The
WHS arm in consume_token contains an unreachable branch checking
matches!(context, MarkdownLexContext::CodeInfoString) and returning
consume_newline() or consume_code_info_string(); remove this conditional block
from the WHS match arm (leaving the existing WHS behavior intact) because
consume_token already returns early for MarkdownLexContext::CodeInfoString;
update any surrounding match/else layout as needed to compile without changing
semantics of consume_token, consume_newline, or consume_code_info_string.

📥 Commits

Reviewing files that changed from the base of the PR and between 3f89810 and feaf08f.

⛔ Files ignored due to path filters (16)
  • crates/biome_markdown_formatter/tests/specs/markdown/fenced_code_block_info_string.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/multiparser-js/meta-in-code-block.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/spec/example-104.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/spec/example-106.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/spec/example-108.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/spec/example-110.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/spec/example-197.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/spec/example-292.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/spec/example-297.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/spec/example-307.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/spec/example-88.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/spec/example-90.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/spec/example-92.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/spec/example-93.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/spec/example-94.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_formatter/tests/specs/prettier/markdown/spec/example-95.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (11)
  • crates/biome_markdown_formatter/src/markdown/auxiliary/fenced_code_block.rs
  • crates/biome_markdown_formatter/src/markdown/auxiliary/hard_line.rs
  • crates/biome_markdown_formatter/src/markdown/auxiliary/header.rs
  • crates/biome_markdown_formatter/src/markdown/auxiliary/indent_token.rs
  • crates/biome_markdown_formatter/src/markdown/lists/code_name_list.rs
  • crates/biome_markdown_formatter/tests/quick_test.rs
  • crates/biome_markdown_formatter/tests/specs/markdown/fenced_code_block_info_string.md
  • crates/biome_markdown_parser/src/lexer/mod.rs
  • crates/biome_markdown_parser/src/parser.rs
  • crates/biome_markdown_parser/src/syntax/fenced_code_block.rs
  • crates/biome_markdown_parser/tests/quick_test.rs

Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
crates/biome_markdown_parser/src/lexer/mod.rs (1)

74-75: Tiny docstring polish.

The comment reads “strings doesn't have particular meaning”; consider “strings don't have special meaning” for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/lexer/mod.rs` around lines 74 - 75, Update
the doc comment for the enum variant CodeInfoString: replace the phrase "Inside
the code block list, where strings doesn't have particular meaning" with
grammatically correct wording such as "Inside the code block list, where strings
don't have special meaning" in the comment above CodeInfoString in lexer/mod.rs.

📥 Commits

Reviewing files that changed from the base of the PR and between feaf08f and 01f6b79.

⛔ Files ignored due to path filters (1)
  • crates/biome_markdown_formatter/tests/specs/markdown/fenced_code_block.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (1)
  • crates/biome_markdown_parser/src/lexer/mod.rs

@ematipico ematipico requested review from a team April 14, 2026 09:12
Contributor

this looks like a regression

Member Author

Good catch, fixed it

@ematipico ematipico requested a review from dyc3 April 14, 2026 14:57
Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
crates/biome_markdown_formatter/src/markdown/auxiliary/textual.rs (1)

72-80: Consider using .trim() for brevity.

trim_start().trim_end() is equivalent to .trim() — the latter is slightly more idiomatic.
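
The equivalence is easy to check in isolation. The helper below is a hypothetical one written only for this check; both spellings use `str` methods from the standard library and strip the same Unicode whitespace from both ends.

```rust
/// Returns true when the chained and the single-call spelling produce
/// the same slice for `input`. Hypothetical helper for illustration.
fn trims_match(input: &str) -> bool {
    input.trim_start().trim_end() == input.trim()
}
```

Since both forms return a subslice of the input without allocating, the change is purely about readability.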

♻️ Suggested simplification
         } else if self.print_mode.is_all() {
-            let trimmed_text = value_token.text().trim_start().trim_end();
+            let trimmed_text = value_token.text().trim();
             write!(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_formatter/src/markdown/auxiliary/textual.rs` around
lines 72 - 80, Replace the verbose .trim_start().trim_end() call with .trim()
where the trimmed_text is computed (inside the branch that checks
self.print_mode.is_all()), i.e. update the assignment to trimmed_text that
currently uses value_token.text().trim_start().trim_end() to use
value_token.text().trim(); leave the surrounding write!( ...,
format_replaced(&value_token, &text(trimmed_text,
value_token.text_trimmed_range().start())) ) logic unchanged.

📥 Commits

Reviewing files that changed from the base of the PR and between 01f6b79 and ff9c215.

📒 Files selected for processing (10)
  • crates/biome_markdown_formatter/src/markdown/auxiliary/inline_image.rs
  • crates/biome_markdown_formatter/src/markdown/auxiliary/inline_link.rs
  • crates/biome_markdown_formatter/src/markdown/auxiliary/link_destination.rs
  • crates/biome_markdown_formatter/src/markdown/auxiliary/link_title.rs
  • crates/biome_markdown_formatter/src/markdown/auxiliary/reference_link.rs
  • crates/biome_markdown_formatter/src/markdown/auxiliary/setext_header.rs
  • crates/biome_markdown_formatter/src/markdown/auxiliary/textual.rs
  • crates/biome_markdown_formatter/src/markdown/lists/code_name_list.rs
  • crates/biome_markdown_formatter/src/markdown/lists/inline_item_list.rs
  • crates/biome_markdown_formatter/src/shared.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/biome_markdown_formatter/src/markdown/lists/code_name_list.rs

@jfmcdowell
Contributor

@ematipico, yes the markdown dead code variants can be deleted. As for the backticks, we can add a diagnostic for friendlier feedback.

@ematipico ematipico merged commit 43bb1ed into main Apr 15, 2026
28 checks passed
@ematipico ematipico deleted the fix/md-fmt-and-parse branch April 15, 2026 04:55
Labels

A-Formatter Area: formatter A-Parser Area: parser L-Markdown Language: Markdown

3 participants