Format markdown code blocks with line-by-line regex parse#22996

Merged
amyreese merged 8 commits into main from amy/ruffen-docs-parser
Feb 3, 2026
@amyreese
Member

Format markdown with line-by-line regex parse

  • Uses basic regex crate, so no backtracking or backreferences needed
  • Supports ~~~ and arbitrary length code fences
  • Supports <!-- ruff:off --> to skip formatting code blocks
  • Includes test cases from previous PRs, as well as new ones

Obviates #22962 and #22937
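
To illustrate what recognizing a fence of either style and of arbitrary length involves, here is a minimal sketch. It is not the code from this PR: the PR uses the `regex` crate, while this stand-in uses only the standard library so it compiles on its own. The `Fence` type and the `parse_fence` function are hypothetical names, and the rules are assumed to be CommonMark's (at most three spaces of indentation, a run of at least three identical `` ` `` or `~` characters, and no backticks in a backtick fence's info string).

```rust
/// A parsed fence line. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
struct Fence<'a> {
    /// Number of leading spaces (0 to 3).
    indent: usize,
    /// The fence character: '`' or '~'.
    marker: char,
    /// Length of the fence run (3 or more).
    len: usize,
    /// Text after the run, e.g. "python"; empty for a bare fence.
    info: &'a str,
}

/// Parses one line as a code fence, assuming CommonMark's fence rules.
fn parse_fence(line: &str) -> Option<Fence<'_>> {
    let trimmed = line.trim_start_matches(' ');
    let indent = line.len() - trimmed.len();
    if indent > 3 {
        // Four or more spaces would be an indented code block instead.
        return None;
    }
    // The first non-space character must be a fence character.
    let marker = trimmed.chars().next().filter(|c| matches!(c, '`' | '~'))?;
    // Count the run of identical fence characters; any length >= 3 counts.
    let len = trimmed.chars().take_while(|c| *c == marker).count();
    if len < 3 {
        return None;
    }
    // Both fence characters are ASCII, so `len` is also a byte offset.
    let info = trimmed[len..].trim();
    // A backtick fence cannot have backticks in its info string.
    if marker == '`' && info.contains('`') {
        return None;
    }
    Some(Fence { indent, marker, len, info })
}
```

Keeping `marker` and `len` in the result matters because a closing fence is only meaningful relative to the fence that opened the block.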

@astral-sh-bot

astral-sh-bot bot commented Jan 31, 2026

Typing conformance results

No changes detected ✅

@astral-sh-bot

astral-sh-bot bot commented Jan 31, 2026

mypy_primer results

Changes were detected when running on open source projects
static-frame (https://github.com/static-frame/static-frame)
+ static_frame/core/bus.py:645:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemLocReduces[Bus[Any], object_]`, found `InterGetItemLocReduces[Bottom[Bus[Any]] | Bottom[Series[Any, Any]] | TypeBlocks | ... omitted 6 union elements, object_]`
- static_frame/core/bus.py:649:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[Bus[Any], object_]`, found `InterGetItemILocReduces[Self@iloc, Self@iloc]`
+ static_frame/core/bus.py:649:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[Bus[Any], object_]`, found `InterGetItemILocReduces[Bottom[Bus[Any]] | IndexHierarchy | TypeBlocks | ... omitted 7 union elements, Self@iloc]`
+ static_frame/core/series.py:772:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[Series[Any, Any], TVDtype@Series]`, found `InterGetItemILocReduces[Bottom[Series[Any, Any]] | IndexHierarchy | TypeBlocks | ... omitted 7 union elements, TVDtype@Series]`
+ static_frame/core/series.py:4071:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[SeriesHE[Any, Any], TVDtype@SeriesHE]`, found `InterGetItemILocReduces[Bottom[Series[Any, Any]] | Bottom[Index[Any]] | TypeBlocks | ... omitted 7 union elements, TVDtype@SeriesHE]`
+ static_frame/core/yarn.py:418:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[Yarn[Any], object_]`, found `InterGetItemILocReduces[Bottom[Yarn[Any]] | Bottom[Index[Any]] | Bottom[Series[Any, Any]] | ... omitted 7 union elements, object_]`
- Found 1828 diagnostics
+ Found 1832 diagnostics

No memory usage changes detected ✅

@amyreese amyreese added the formatter (Related to the formatter) and preview (Related to preview mode features) labels Jan 31, 2026
@amyreese amyreese requested review from dylwil3 and ntBre January 31, 2026 02:06
@astral-sh-bot

astral-sh-bot bot commented Jan 31, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@amyreese
Member Author

Performance comparison of current main (without these features) and this PR branch:

amethyst@lunatone ~/workspace/ruff amy/ruffen-docs-parser » hf "format --preview --no-cache --isolated --line-length 130 --check crates/ty_python_semantic/resources/mdtest/**/*.md"
   Compiling ruff v0.14.14 (/Users/amethyst/workspace/ruff/crates/ruff)
    Finished `release` profile [optimized] target(s) in 1m 40s
Benchmark 1: target/release/ruff-main format --preview --no-cache --isolated --line-length 130 --check crates/ty_python_semantic/resources/mdtest/**/*.md
  Time (mean ± σ):      40.7 ms ±   0.9 ms    [User: 83.7 ms, System: 33.1 ms]
  Range (min … max):    39.0 ms …  43.2 ms    69 runs

  Warning: Ignoring non-zero exit code.

Benchmark 2: target/release/ruff-feat format --preview --no-cache --isolated --line-length 130 --check crates/ty_python_semantic/resources/mdtest/**/*.md
  Time (mean ± σ):      44.8 ms ±   1.0 ms    [User: 158.5 ms, System: 31.5 ms]
  Range (min … max):    41.4 ms …  46.9 ms    63 runs

  Warning: Ignoring non-zero exit code.

This PR has a faster wall time than #22937 (~45 ms vs ~54 ms) while also including the skip feature, and it uses half the "User" time (~160 ms vs ~320 ms).

Contributor

@ntBre ntBre left a comment

Nice, this looks great!

Comment on lines 75 to 76
for code_line in lines.by_ref() {
    if let Some(closing_capture) = MARKDOWN_CODE_FENCE.captures(&code_line) {
Contributor

You could maybe save some indentation by using a let-else here. Alternatively, if you're only looking for this final line, I think you could maybe do something like:

let Some(closing_capture) = lines.by_ref().find_map(|code_line| MARKDOWN_CODE_FENCE.captures(&code_line)) else {
    continue;
};

Member Author

@amyreese amyreese Feb 3, 2026

Unfortunately it needs to be more complicated than that: it has to keep looking at lines that match the regex until it finds one that also meets the conditions on lines 79-85. That handles code blocks that contain embedded code blocks (e.g., non-Python blocks, or Python code whose docstrings contain code blocks 😅).

But the let-else to remove a layer of indentation is useful.
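
The scan described above can be sketched roughly as follows. This is not the PR's implementation: the conditions on lines 79-85 are not shown in this thread, so the sketch assumes they amount to CommonMark's closing-fence rule (same fence character as the opener, at least as long, and nothing after the run). `fence_run` and `take_code_block` are hypothetical names, and plain string handling stands in for the `MARKDOWN_CODE_FENCE` regex so the example needs only the standard library.

```rust
/// Returns the fence character, the run length, and the trimmed remainder
/// if `line` starts with a fence after at most three spaces of indentation.
/// Hypothetical helper; the PR matches fences with a regex instead.
fn fence_run(line: &str) -> Option<(char, usize, &str)> {
    let trimmed = line.trim_start_matches(' ');
    if line.len() - trimmed.len() > 3 {
        return None;
    }
    let marker = trimmed.chars().next().filter(|c| matches!(c, '`' | '~'))?;
    let len = trimmed.chars().take_while(|c| *c == marker).count();
    // Fence characters are ASCII, so `len` is a valid byte offset.
    (len >= 3).then(|| (marker, len, trimmed[len..].trim()))
}

/// Consumes lines up to and including the closing fence and returns the
/// block body. Returns `None` if the input ends before the block closes.
fn take_code_block(
    lines: &mut impl Iterator<Item = String>,
    open_marker: char,
    open_len: usize,
) -> Option<Vec<String>> {
    let mut body = Vec::new();
    for line in lines.by_ref() {
        // let-else keeps the common "not a fence" path at one indent level.
        let Some((marker, len, rest)) = fence_run(&line) else {
            body.push(line);
            continue;
        };
        // Assumed closing rule: same character, at least as long as the
        // opener, and no info string after the run.
        if marker == open_marker && len >= open_len && rest.is_empty() {
            return Some(body);
        }
        // A fence-looking line that fails the rule is an embedded fence,
        // so it stays in the body and the scan continues.
        body.push(line);
    }
    None
}
```

A plain `find_map` over fence matches would stop at the first embedded fence, which is why the loop has to test each candidate against the opener before treating it as the end of the block.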

Contributor

Ah right, makes sense!

@ntBre ntBre mentioned this pull request Feb 3, 2026
@amyreese amyreese enabled auto-merge (squash) February 3, 2026 16:55
@amyreese amyreese merged commit c0c1b98 into main Feb 3, 2026
48 checks passed
@amyreese amyreese deleted the amy/ruffen-docs-parser branch February 3, 2026 16:56
Development

Successfully merging this pull request may close these issues.

Handle ~~~ and arbitrary length code fences
Skip directives for markdown formatting

2 participants