Skip to content

Support arbitrary length code fences in markdown#22937

Closed
amyreese wants to merge 1 commit intomainfrom
amy/ruffen-docs-fancy-regex
Closed

Support arbitrary length code fences in markdown#22937
amyreese wants to merge 1 commit intomainfrom
amy/ruffen-docs-fancy-regex

Conversation

@amyreese
Copy link
Member

  • Use fancy_regex crate
  • Use backreference in regex to match arbitrary length code fences
  • Support ~~~ style code fences as well

@astral-sh-bot
Copy link

astral-sh-bot bot commented Jan 29, 2026

Typing conformance results

No changes detected ✅

@astral-sh-bot
Copy link

astral-sh-bot bot commented Jan 29, 2026

mypy_primer results

Changes were detected when running on open source projects
pydantic (https://github.com/pydantic/pydantic)
- pydantic/_internal/_core_metadata.py:87:54: error[invalid-assignment] Invalid assignment to key "pydantic_js_extra" with declared type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | ((dict[str, Divergent], type[Any], /) -> None)` on TypedDict `CoreMetadata`: value of type `dict[object, object]`
+ pydantic/_internal/_core_metadata.py:87:54: error[invalid-assignment] Invalid assignment to key "pydantic_js_extra" with declared type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | ((dict[str, int | float | str | ... omitted 3 union elements], type[Any], /) -> None)` on TypedDict `CoreMetadata`: value of type `dict[object, object]`
- pydantic/fields.py:949:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:949:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:989:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:989:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1032:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:1032:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1072:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:1072:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1115:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:1115:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1154:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:1154:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1194:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:1194:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1573:13: error[invalid-argument-type] Argument is incorrect: Expected `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`, found `Top[dict[Unknown, Unknown]] | (((dict[str, Divergent], /) -> None) & ~Top[dict[Unknown, Unknown]]) | None`
+ pydantic/fields.py:1573:13: error[invalid-argument-type] Argument is incorrect: Expected `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`, found `Top[dict[Unknown, Unknown]] | (((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) & ~Top[dict[Unknown, Unknown]]) | None`

prefect (https://github.com/PrefectHQ/prefect)
- src/integrations/prefect-dbt/prefect_dbt/core/settings.py:94:28: error[invalid-assignment] Object of type `dict[Any, Any] | int | dict[str, Any] | ... omitted 4 union elements` is not assignable to `dict[str, Any]`
- src/integrations/prefect-dbt/prefect_dbt/core/settings.py:99:28: error[invalid-assignment] Object of type `int | dict[Any, Any] | float | ... omitted 3 union elements` is not assignable to `dict[str, Any]`
- src/prefect/cli/deploy/_core.py:86:21: error[invalid-assignment] Object of type `dict[Any, Any] | int | dict[str, Any] | ... omitted 4 union elements` is not assignable to `dict[str, Any]`
- src/prefect/cli/deploy/_core.py:87:21: error[invalid-assignment] Object of type `int | dict[Any, Any] | float | ... omitted 3 union elements` is not assignable to `dict[str, Any]`
- src/prefect/deployments/steps/core.py:137:38: error[invalid-argument-type] Argument is incorrect: Argument type `dict[Any, Any] | int | dict[str, Any] | ... omitted 4 union elements` does not satisfy constraints (`str`, `int`, `int | float`, `bool`, `dict[Any, Any]`, `list[Any]`, `None`) of type variable `T`
- src/prefect/utilities/templating.py:320:13: error[invalid-assignment] Invalid subscript assignment with key of type `object` and value of type `Unknown | int | dict[str, Any] | ... omitted 4 union elements` on object of type `dict[str, Any]`
+ src/prefect/utilities/templating.py:320:13: error[invalid-assignment] Invalid subscript assignment with key of type `object` and value of type `Unknown | dict[str, Any]` on object of type `dict[str, Any]`
- src/prefect/utilities/templating.py:323:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_block_document_references | dict[str, Any]`, found `list[Unknown | int | dict[str, Any] | ... omitted 4 union elements]`
+ src/prefect/utilities/templating.py:323:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_block_document_references | dict[str, Any]`, found `list[Unknown | dict[str, Any]]`
- src/prefect/utilities/templating.py:437:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_variables`, found `dict[object, Unknown | int | float | ... omitted 4 union elements]`
+ src/prefect/utilities/templating.py:437:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_variables`, found `dict[object, Unknown]`
- src/prefect/utilities/templating.py:442:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_variables`, found `list[Unknown | int | float | ... omitted 4 union elements]`
+ src/prefect/utilities/templating.py:442:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_variables`, found `list[Unknown]`
+ src/prefect/workers/base.py:232:13: error[invalid-argument-type] Argument is incorrect: Argument type `str | dict[str, Any]` does not satisfy constraints (`str`, `int`, `int | float`, `bool`, `dict[Any, Any]`, `list[Any]`, `None`) of type variable `T`
- src/prefect/workers/base.py:232:13: error[invalid-argument-type] Argument is incorrect: Argument type `str | int | dict[str, Any] | ... omitted 3 union elements` does not satisfy constraints (`str`, `int`, `int | float`, `bool`, `dict[Any, Any]`, `list[Any]`, `None`) of type variable `T`
- src/prefect/workers/base.py:234:20: error[invalid-argument-type] Argument expression after ** must be a mapping type: Found `int | Unknown | float | ... omitted 4 union elements`
- Found 5374 diagnostics
+ Found 5368 diagnostics

scikit-build-core (https://github.com/scikit-build/scikit-build-core)
- src/scikit_build_core/build/wheel.py:99:20: error[no-matching-overload] No overload of bound method `__init__` matches arguments
- Found 47 diagnostics
+ Found 46 diagnostics

No memory usage changes detected ✅

@astral-sh-bot
Copy link

astral-sh-bot bot commented Jan 29, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@amyreese amyreese requested a review from ntBre January 29, 2026 02:13
Copy link
Contributor

@ntBre ntBre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add some new tests? This looks reasonable to me, but I'm curious if @BurntSushi has any thoughts on fancy-regex.


insta = { workspace = true }
regex = { workspace = true }
fancy-regex = "0.17.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we usually add dependencies at the workspace level and then use { workspace = true } in the crates, but I could be wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's common practice yeah.

@BurntSushi
Copy link
Member

fancy-regex is a solid crate if you want regex features not available in the regex crate. They mostly try to match Oniguruma.

My only real concern here is using backreferences if we don't really need them, because this does mean that the regex will need to do some kind of backtracking. I grant it's convenient here. The main failure mode is if there are inputs that can cause the regex to catastrophically backtrack. And if so, what those inputs look like. If they're pathological, then maybe we don't care. But it's notoriously difficult to know what specifically will cause catastrophic backtracking. (Hence why the regex crate provides a strict linear time guarantee.)

@amyreese
Copy link
Member Author

amyreese commented Jan 29, 2026

The backreferences are necessary in order to support arbitrary length code fences of either backticks or tildes, to ensure that we correctly match cases of code fences inside code fences, like something a code block that contains an example of a code block containing a code block (yo dawg):

````markdown
```py
print('hello')
```
````

Or a python code block that contains a multiline string with a markdown code block:

````py
print("""
```rust
fn foo() {}
```
""")
````

The alternative is replacing the single regex-with-backreference with a line-by-line approach where we check every line looking for a code fence, and then manually build up the code blocks from there. But IMO the performance risk is pretty low given that a python code block that isn't fenced correctly is already an edge case, and IMO better to have less code to maintain.

Performance wise, I tested against crates/ty_python_semantic/resources/mdtest/generics/pep695/functions.md, which had the most code fences out of all the md test files, and it was ~11.4ms on main vs ~12.2ms for this PR. I have not built/tested a line-by-line approach though, so I don't know where that would end up on performance comparison.

@BurntSushi
Copy link
Member

But IMO the performance risk is pretty low given that a python code block that isn't fenced correctly is already an edge case, and IMO better to have less code to maintain.

To clarify here, the risk of catastrophic backtracking isn't limited just to what the backreferences are doing. Once backtracking is used, it can generally occur anywhere else in the regex, particularly with repetition operators.

It might be fine. Running it on more big test files is probably enough to confirm that it's fine for inputs we care about.

amyreese added a commit that referenced this pull request Jan 31, 2026
- Uses basic `regex` crate, so no backtracking or backreferences needed
- Supports `~~~` and arbitrary length code fences
- Supports `<!-- ruff:off -->` to skip formatting code blocks

Obviates #22962 and #22937
amyreese added a commit that referenced this pull request Feb 3, 2026
Format markdown with line-by-line regex parse

- Uses basic `regex` crate, so no backtracking or backreferences needed
- Supports `~~~` and arbitrary length code fences
- Supports `<!-- fmt:off -->` to skip formatting code blocks
- Includes test cases from previous PRs, as well as new ones

Obviates #22962 and #22937
@amyreese amyreese closed this Feb 3, 2026
@amyreese amyreese deleted the amy/ruffen-docs-fancy-regex branch February 5, 2026 00:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants