Skip to content

Conversation

@Gankra
Copy link
Contributor

@Gankra Gankra commented Nov 20, 2025

Summary

This introduces a very bad and naive python-docstring-flavoured-reStructuredText to github-flavor-markdown translator. The main goal is to try to preserve a lot of the formatting and plaintext, progressively enhance the content when we find things we know about, and escape the text when we find things that might get corrupt.

Previously I'd broken this out into rendering each different format, but with this approach you don't really need to?

Test Plan

Lots of snapshot tests, also messed around in some random stdlib modules.

Comment on lines +166 to +173
let mut output = String::new();
let mut first_line = true;
let mut block_indent = 0;
let mut in_doctest = false;
let mut starting_literal = false;
let mut in_literal = false;
let mut in_any_code = false;
for untrimmed_line in docstring.lines() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the worst parser I've written in a long while so if you tell me "Aria for the love of god make a Parser type with methods and enums" I will. Writing it this way was conducive to experimentation and thinking about the format.

@astral-sh-bot
Copy link

astral-sh-bot bot commented Nov 20, 2025

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅

@Gankra
Copy link
Contributor Author

Gankra commented Nov 20, 2025

I spent a while salivating over the ruff docstring formatter as a basis for this, but I ultimately concluded it would be a nightmare for both codebases to try to share that logic, and it's not that much logic.

@Gankra Gankra added server Related to the LSP server ty Multi-file analysis & type inference labels Nov 20, 2025
@Gankra Gankra changed the title [ty] implement docstring rendering to markdown [ty] Implement docstring rendering to markdown Nov 20, 2025
@astral-sh-bot
Copy link

astral-sh-bot bot commented Nov 20, 2025

mypy_primer results

No ecosystem changes detected ✅

No memory usage changes detected ✅

@Gankra
Copy link
Contributor Author

Gankra commented Nov 20, 2025

The main risk of shipping this implementation is that the previous implementation was so wildly conservative that it "couldn't" ever mess up whereas insufficient escaping or getting confused could complete garble things and eat the docs.

On balance though this just looks so much better I think the risk is worth it (and shipping it will get me a lot of bug reports about what to fix :)

let mut starting_literal = false;
let mut in_literal = false;
let mut in_any_code = false;
for untrimmed_line in docstring.lines() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want to use .universal_newlines here to get proper \r support

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to signpost that better in the comments but this function assumes documentation_trim (in the same file) was already run, so our leading whitespace is only spaces, and our newlines are only \n (and unlike ruff we don't need to care about authentically preserving the flavour of whitespace).

///
/// This code currently assumes docstrings are reStructuredText with significant
/// line-breaks and indentation. Some constructs like *italic*, **bold**, and `inline code`
/// are shared between the two formats. Other things like _italic_ are only markdown
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the two formats that you're referring to here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reStructuredText and markdown

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's not clear to me from the comment

// Add this line's indentation.
// We could subtract the block_indent here but in practice it's uglier
for _ in 0..line_indent {
// If we're not in a codeblock use non-breaking spaces to preserve the indent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you say why this is necessary, given that it gets rendered as markdown?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes so the idea here is that docstrings really like using indentation for structure that we do not (care to) parse but we also don't want markdown turning into code-blocks. By replacing leading whitespace with   outside of codeblocks, we preserve the native indent of the docstring and stuff we don't understand still is rendered clearly and pretty.

Screenshot 2025-11-21 at 7 40 49 AM

Copy link
Contributor Author

@Gankra Gankra Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Even if I had a perfect parser for this format, the only clear way to render this to markdown, imo, is by rendering these blocks as indented)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think I misread it as in a code block. This makes a ton of sense for indents outside code blocks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should use the unicode code point directly so that we don't rely on the markdown renderer supporting HTML?


// Add this line's indentation.
// We could subtract the block_indent here but in practice it's uglier
for _ in 0..line_indent {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work for code using tab indentation (which our formatter supports).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(another thing handled by documentation_trim)

///
/// This is by no means perfect but it handles a lot of common formats well.
///
/// This code currently assumes docstrings are reStructuredText with significant
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens for docstrings using the numpy or google format? will they still be rendered correctly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2025-11-21 at 7 44 21 AM Screenshot 2025-11-21 at 7 44 48 AM Screenshot 2025-11-21 at 7 45 11 AM


// First thing's first, add a newline to start the new line
if !first_line {
// If we're not in a codeblock, add trailing space to the line to authentically wrap it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand this. Why is it necessary to add trailing whitespace to any line? This seems very undesired?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's part of the approach I'm taking where I want to preserve the wrapping/indentation of the original docstring.

It prevents e.g. param1 and param2 from being merged together as a single line in:

'param1' -- The first parameter description
'param2' -- The second parameter description
            This is a continuation of param2 description.
'param3' -- A parameter without type annotation

/// line-breaks and indentation. Some constructs like *italic*, **bold**, and `inline code`
/// are shared between the two formats. Other things like _italic_ are only markdown
/// and should be escaped.
fn render_markdown(docstring: &str) -> String {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a link to reStructuredText

in_literal = true;
in_any_code = true;
block_indent = line_indent;
// TODO: should we not be this aggressive? Let it autodetect?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#directive-code-block

Defaulting to Python doesn't seem unreasonable but we should respect any override

}

// If we're not in a codeblock and we see something that signals a literal block, start one
if !in_any_code && let Some(without_lit) = line.strip_suffix("::") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have to play with restructuredText but the way I read this paragraph

Literal code blocks (ref) are introduced by ending a paragraph with the special marker ::. The literal block must be indented

is that what comes after is only a literal block if there's at least one blank line, and text is indented.

I think your code correctly handles the empty line. But it ends up pushing an empty markdown code block if the next paragraph isn't correctly indented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha yes you caught me this part of the parser is distinctly janky. I'll tweak it to make it Spec Compliant (although I can't help but wonder if being sloppy will be the name of the game since we need to render whatever we find).

Comment on lines +166 to +172
let mut output = String::new();
let mut first_line = true;
let mut block_indent = 0;
let mut in_doctest = false;
let mut starting_literal = false;
let mut in_literal = false;
let mut in_any_code = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about mixing the parsing of docstrings with the formatting logic. It seems harder to follow. I'd be inclined to parse the docstring into an intermediate representation and then do a second pass to format it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dang, ok if the captain of the performance team wants more intermediate allocations and representations who am I to complain 😄

(I was half expecting to be told to remove some intermediate Strings)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol. I'm not too concerned here, given that we don't render markdown strings that frequently and having a more structured representation would ultimately be useful for Ruff too. Although I don't think we should aim to make it reusable for now.

@AlexWaygood AlexWaygood removed their request for review November 21, 2025 12:48
if !in_any_code {
// This is plain text, apply some escapes for reST that don't map to markdown:
// * underscores aren't used for italics/bold in reST (don't mess up __dunders__)
output.push_str(&line.replace('_', "\\_"));
Copy link
Contributor Author

@Gankra Gankra Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found an in-retrospect-obvious place where this is broken: this mangles inline code __init__ (I kept seeing it without any code markup, so that was my primary concern)

@Gankra Gankra merged commit 6292582 into main Nov 21, 2025
40 checks passed
@Gankra Gankra deleted the gankra/doc-render branch November 21, 2025 15:47
dcreager added a commit that referenced this pull request Nov 24, 2025
…d-typevar

* origin/main: (24 commits)
  [ty] Remove brittle constraint set reveal tests (#21568)
  [`ruff`] Catch more dummy variable uses (`RUF052`) (#19799)
  [ty] Use the same snapshot handling as other tests (#21564)
  [ty] suppress autocomplete suggestions during variable binding (#21549)
  Set severity for non-rule diagnostics (#21559)
  [ty] Add `with_type` convenience to display code (#21563)
  [ty] Implement docstring rendering to markdown (#21550)
  [ty] Reduce indentation of `TypeInferenceBuilder::infer_attribute_load` (#21560)
  Bump 0.14.6 (#21558)
  [ty] Improve debug messages when imports fail (#21555)
  [ty] Add support for relative import completions
  [ty] Refactor detection of import statements for completions
  [ty] Use dedicated collector for completions
  [ty] Attach subdiagnostics to `unresolved-import` errors for relative imports as well as absolute imports (#21554)
  [ty] support PEP 613 type aliases (#21394)
  [ty] More low-hanging fruit for inlay hint goto-definition (#21548)
  [ty] implement `TypedDict` structural assignment (#21467)
  [ty] Add more random TypeDetails and tests (#21546)
  [ty] Add goto for `Unknown` when it appears in an inlay hint (#21545)
  [ty] Add type definitions for `Type::SpecialForm`s (#21544)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

server Related to the LSP server ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants