RFC: rustdoc meta tags #1713

Closed

jimmycuadra

@oli-obk
Contributor

oli-obk commented Aug 11, 2016

One alternative for function arguments and return values is to add doc comments directly to the argument/return type.

@killercup
Member

killercup commented Aug 11, 2016

Another alternative is to go the Swift-way, i.e. use special headings and formatting that rustdoc will have knowledge of. Which has the nice property that this will still look fine when seen as plain markdown and rustdoc doesn't have to care about it to render it nicely.

For example:

/// Fooify a `Foo` with a label
///
/// # Parameters
///
/// - `label`: A string labeling the foo
/// - `magic`: A `Foo` that will be labeled
///
/// # Returns
///
/// A `Result` which is:
///
/// - `Ok`: A `Bar` that is the labeled `Foo` and thus lives as long as the
///     `Foo` given in `magic`.
/// - `Err`: Returns the number of gravely appalled people (per half-century
///     per country) if you were to use that label *and* `Foo`'s acceptance
///     indicator is less than it.
///
/// # Lifetimes
///
/// - `floof`: The lifetime of the given foo as determined by the floof source
///     it was originally loaded from.
///
/// # Examples
///
/// ```rust
/// assert_eq!(fooify("lorem", Foo::extract_from_global_floof_resource()).label(),
///            Bar::with_label("lorem"))
/// ```
///
fn fooify<'floof>(label: String, magic: Foo<'floof>) -> Result<Bar<'floof>, i32> {
    unimplemented!();
}

Validating and parsing this might be more difficult than using new, explicit syntax, though.

(Updated example to use Result return type and be even more complicated.)

A few more notes: https://scribbles.pascalhertleif.de/machine-readable-inline-markdown-code-cocumentation.html

@sfackler
Member

sfackler commented Aug 11, 2016

Javadoc has a similar setup to what's proposed:

/**
 * @brief some short description
 * @param foo A foobar
 * @returns blah
 */
Blah doFoo(Foobar foo) { ... }

nrc added the T-dev-tools (development tools team) and T-doc (documentation team) labels Aug 12, 2016
@nrc
Member

nrc commented Aug 12, 2016

I know many other languages offer such a facility (e.g., Javadoc as mentioned by sfackler). I think this RFC would benefit from a survey of what other languages offer in this space and what we should/shouldn't copy.

@killercup
Member

This might go a bit too far, but: Is there a plan to expose this kind of structured documentation in a machine-readable way?

I assume the AST currently only gives you one #[doc="..."] token for the complete thing. When we introduce a way to parse doc strings in a special way (regardless which syntax we decide to use), consumers of the AST (e.g., rustdoc, editor plugins, lints) might not want to (maybe: should not) implement these parsing rules themselves.

There are a few things we can do:

  • Add a 'structured docs' attribute to each function (in addition to the regular 'doc' attribute), that contains a simple data structure (an AST itself) representing the documentation data.
  • Add doc attributes to the items described by the structured documentation, i.e., add the parameter documentation as a doc attribute to the function parameter item.

Another point: I would also expect there to be a clever lint that can find errors in those doc strings (like misspelt parameter names).


This lint and AST extension I described would need to be part of rustc (not rustdoc) to be useful, so it couples the doc syntax with the compiler. This is another drawback that will need to be mentioned in the RFC.
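
To make the lint idea concrete, here is a rough sketch of what a consumer could do once the docs are available as data. `StructuredDocs` and `unknown_param_docs` are hypothetical names made up for this illustration; nothing like them exists in rustc or rustdoc as far as this thread says.

```rust
/// Hypothetical structured form of a function's docs; not an existing rustc type.
struct StructuredDocs {
    /// `(name, description)` pairs taken from a parameters section.
    params: Vec<(String, String)>,
}

/// Returns the documented names that do not match any name in the signature.
/// A lint could report each of them as a likely misspelling.
fn unknown_param_docs<'a>(docs: &'a StructuredDocs, signature: &[&str]) -> Vec<&'a str> {
    docs.params
        .iter()
        .map(|(name, _)| name.as_str())
        .filter(|name| !signature.contains(name))
        .collect()
}
```

Given docs for `label` and a misspelt `magci`, checked against the signature names `label` and `magic`, this would flag only `magci`.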

@JustAPerson

Another alternative is to go the Swift-way, i.e. use special headings and formatting that rustdoc will have knowledge of. Which has the nice property that this will still look fine when seen as plain markdown and rustdoc doesn't have to care about it to render it nicely.

This is similar to the idiom in the standard library today. I personally find it painfully verbose to navigate in a text editor.

@ExpHP

ExpHP commented Aug 14, 2016

How might the future addition of keyword, default, or variadic parameters affect the @param meta tag proposed here?

There already are some tricky bits about dealing with arguments, since they can be patterns; there can be dummy arguments:

fn foo(_: i32, _: i32) -> i32 { 0 }

As far as I can surmise these seem unlikely to be a major concern, since if an argument is worth documenting, then it must also be worth naming. (with a _prefixed name if necessary to avoid warnings)

Another thing is that the most comfortable way to write arguments for implementation may differ from the most comfortable way to document them:

type Point = (i32,i32);
type Vector = (i32,i32);

fn displacement((x1,y1): Point, (x2,y2): Point) -> Vector;

fn displacement(from: Point, to: Point) -> Vector;

The first function above is more comfortable to implement---but when it comes to documentation (either writing or reading it), I would sincerely hope for the latter!
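
One way to get both, sketched here as an assumption rather than anything the RFC proposes, is to keep the descriptive names in the signature and destructure in the body:

```rust
type Point = (i32, i32);
type Vector = (i32, i32);

/// Returns the vector pointing from `from` to `to`.
fn displacement(from: Point, to: Point) -> Vector {
    // Destructure inside the body so the signature keeps documentable names.
    let (x1, y1) = from;
    let (x2, y2) = to;
    (x2 - x1, y2 - y1)
}
```

The cost is two extra `let` lines; the signature then carries names that a parameter list in the docs can refer to directly.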

@ExpHP

ExpHP commented Aug 14, 2016

Also, do we call them "arguments" or "parameters?" I think after a while of using Rust I have come to associate the word "parameter" with "type parameters."

@Mark-Simulacrum
Member

Regarding the pattern-matched arguments and the difficulty in naming them, I see two options: naming them by position, and/or using named parameters (a potential feature that has been neither implemented nor RFC'd yet).

Regarding the RFC itself, I think I prefer the Swift way over Java's @param and similar. I find the @param syntax too restrictive and rather unexpected; the more free-form nature of Swift's lists is nicer. I don't entirely like the idea of restricting people (as mentioned by killercup above: Validating and parsing this might be more difficult than using new, explicit syntax, though) by explicitly defining the syntax for specifying parameters. I can't think of a better syntax yet, and I support the need to survey the available options as nrc said.

@ExpHP

ExpHP commented Aug 14, 2016

Hm, reading it more closely now I see the proposal is quite specifically about rustdoc. In that case, for the purposes of generating documentation, I guess it doesn't really matter how the names of the documented parameters actually correspond to the function arguments.

That said, I imagine that other tooling such as IDEs would also be interested in metadata like this, and could probably benefit significantly from strict requirements on the syntax.

@strega-nil

@ExpHP on parameter vs argument

There's actually a technical definition :)

http://stackoverflow.com/questions/156767/whats-the-difference-between-an-argument-and-a-parameter

fn foo<'a, T>(t: &'a T) {
    ...
}

'a is a lifetime parameter, T is a type parameter, t is a... normal parameter.

foo::<String>(&String::new())

the lifetime of the call expression is the (hidden) lifetime argument, String is the type argument, and &String::new() is the normal argument.

@killercup
Member

killercup commented Aug 14, 2016

I personally find [the formatting killercup suggested] painfully verbose to navigate in a text editor.

That's a good point! I feel like it's only a minor inconvenience, though. Many editors/IDEs allow you to jump to a symbol (and Rust code is very grep-able) and collapse functions/doc comments; I would also say that parsing a doc comment in the style I proposed is far easier for a human (you can visually move from headline to headline as they are separated by empty lines, instead of the keywords after an '@').

Another thing is that the most comfortable way to write arguments for implementation may differ from the most comfortable way to document them

Indeed. I would expect the doc syntax to support basically everything you can write on the left hand side of the parameter, though, as long as it's unique. So, e.g., Point(a, b) and Point(c, d) are valid parameter names. I would also expect it to be exactly the same as in the function definition (and checked with a lint).

@eddyb
Member

eddyb commented Aug 14, 2016

@ubsan But we already have formal vs actual for that, which are much more precise. I prefer to use "argument" for "(non-constant) value parameter" instead, which is orthogonal to formal/actual.

@strega-nil

strega-nil commented Aug 14, 2016

@eddyb That's not how it's defined. Argument and parameter are both very well defined terms...

https://msdn.microsoft.com/en-us/library/9kewt1b3.aspx

To communicate this information to the procedure, the procedure defines a parameter, and the calling code passes an argument to that parameter. You can think of the parameter as a parking space and the argument as an automobile. Just as different automobiles can park in a parking space at different times, the calling code can pass a different argument to the same parameter every time that it calls the procedure.

https://en.wikipedia.org/wiki/Parameter_(computer_programming)

Just as in standard mathematical usage, the argument is thus the actual input passed to a function, procedure, or routine, whereas the parameter is the variable inside the implementation of the subroutine. For example, if one defines the add subroutine as def add(x, y): return x + y, then x, y are parameters, while if this is called as add(2, 3), then 2, 3 are the arguments. Note that variables from the calling context can be arguments: if the subroutine is called as a = 2; b = 3; add(a, b) then the variables a, b are the arguments, not only the values 2, 3.

@oblitum

oblitum commented Aug 17, 2016

I agree that Swift has improved a lot in this space. Its meta tags aim to help not only with standard docs but also with IDE tooling that provides information for code completion, and they are Markdown-based.

@strega-nil

I like the alternative @killercup has proposed far more than the original.

@steveklabnik
Member

@jimmycuadra so we talked about this at the docs team meeting today.

It seems like most people are not concerned about this idea in general, but aren't super happy with the @ style. However, both in the meeting and in this thread, a lot of people did like @killercup's style. What do you think? Is the semantic bit what's most important to you, or do you feel extremely strongly about the @ style?

@steveklabnik
Member

The higher order bit is this: if we can get something that's closer to standard markdown, yet semantic enough through a convention, that would be preferable to inventing some kind of new syntax, basically.

@jimmycuadra
Author

I'm not married to the @ syntax, personally. Using a more free-form style like @killercup's example seems fine to me, too. The only concern that comes to mind is whether it would be difficult or error-prone for rustdoc to interpret that style correctly. The @ syntax seems easier for a program to parse, but of course documentation should be optimized for people, not machines.

@killercup
Member

The only concern that comes to mind is whether it would be difficult or error-prone for rustdoc to interpret that style correctly.

I think it will be more difficult to parse than @KEYWORD <name> <description>, but not much: Parse markdown, divide into sections by headline, skip unknown headlines, parse lists as described here. I think it makes sense that initially, rustdoc will be strict on what it accepts as valid structured docs, but might get more lenient over time.
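
A minimal sketch of those steps, using only the standard library. It treats any line starting with `# ` as a headline, which is a simplification: a real implementation would presumably use a proper Markdown parser, and this version would misread hidden `# ` lines inside code examples. The function names are made up for illustration.

```rust
use std::collections::BTreeMap;

/// Splits a doc string into sections keyed by their `# Heading` text.
/// Text before the first heading is dropped in this sketch.
fn split_sections(doc: &str) -> BTreeMap<String, Vec<String>> {
    let mut sections: BTreeMap<String, Vec<String>> = BTreeMap::new();
    let mut current: Option<String> = None;
    for line in doc.lines() {
        if let Some(title) = line.strip_prefix("# ") {
            let title = title.trim().to_string();
            sections.entry(title.clone()).or_default();
            current = Some(title);
        } else if let Some(title) = &current {
            sections.get_mut(title).unwrap().push(line.to_string());
        }
    }
    sections
}

/// Parses one list item of the form "- `name`: description".
/// Wrapped continuation lines are not joined in this sketch.
fn parse_item(line: &str) -> Option<(String, String)> {
    let rest = line.trim_start().strip_prefix("- `")?;
    let (name, desc) = rest.split_once("`:")?;
    Some((name.to_string(), desc.trim().to_string()))
}

/// Collects the parameter list, skipping every section other than "Parameters".
fn parameters(doc: &str) -> Vec<(String, String)> {
    split_sections(doc)
        .get("Parameters")
        .map(|lines| lines.iter().filter_map(|l| parse_item(l)).collect())
        .unwrap_or_default()
}
```

Unknown headlines simply end up as map entries nobody asks for, which is one cheap way to "skip" them.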

@jimmycuadra By the way, feel free to use anything I wrote here (and in this post) in your RFC (if you want to rewrite it) – consider it licensed under CC0!

@jimmycuadra
Author

What's the procedure for revising an RFC that drastically? Do I just push a commit that changes it, or create a new PR and close this one?

@Centril
Contributor

Centril commented Aug 18, 2016

Why not support both? The two formats do not have any conflicts...

Many are used to the Java, JavaScript, PHP (https://www.phpdoc.org/), C++ (http://www.stack.nl/~dimitri/doxygen/manual/docblocks.html), and Ruby (http://yardoc.org/features.html) way of documentation, and these languages make up a large share of the languages developers use.

@safety should perhaps be split into @invariant, @requires, @ensures.

@steveklabnik
Member

What's the procedure for revising an RFC that drastically? Do I just push a commit that changes it, or create a new PR and close this one?

So, interestingly enough, this is actually something that the new changes to the RFC process are hoping to address: people come together around a problem (the motivation), then figure out in a broad sense what strategy should be taken, and then end up submitting an RFC. Anyway, we're not there yet, but just a thought.

You can do either of those options; I personally would open a new RFC and link to this one as a comment on it.

Why not support both? The two formats do not have any conflicts...

Because supporting both is not free. You have to support and maintain both kinds of parsers, as well as splitting the ecosystem.

@Centril
Contributor

Centril commented Aug 18, 2016

Because supporting both is not free. You have to support and maintain both kinds of parsers, as well as splitting the ecosystem.

Granted that is not free - but is the cost really that great? Looking for a limited set of predefined strings all beginning with @ and then extracting text until two consecutive newlines are found doesn't seem that demanding.
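
Roughly what that scan could look like, as a sketch only. The tag set in `KNOWN_TAGS` is an assumption picked for illustration, and the code follows the description literally: one tag per paragraph, with paragraphs separated by a blank line.

```rust
/// Tags recognized by this sketch; the set is an assumption for illustration.
const KNOWN_TAGS: [&str; 3] = ["@param", "@return", "@safety"];

/// Returns `(tag, text)` pairs, where each text runs from the tag to the next blank line.
fn extract_tags(doc: &str) -> Vec<(String, String)> {
    let mut found = Vec::new();
    // Paragraphs are separated by two consecutive newlines.
    for paragraph in doc.split("\n\n") {
        let paragraph = paragraph.trim();
        for tag in KNOWN_TAGS.iter() {
            if let Some(rest) = paragraph.strip_prefix(*tag) {
                // Require whitespace after the tag so "@parameter" is not read as "@param".
                if rest.starts_with(char::is_whitespace) {
                    found.push((tag.to_string(), rest.trim().to_string()));
                }
            }
        }
    }
    found
}
```

Several tags on consecutive lines, as in the Javadoc example earlier in the thread, would need more than this.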

"Splitting the ecosystem" is one way to put it, "freedom of choice" is another... Why enforce one way over another when it is quite subjective?

The Swift way of documenting seems nice, though... most IDEs and text editors can auto-close each header section, so that's positive.

@frewsxcv
Member

Why enforce one way over another when it is quite subjective?

Having one format means:

  • New Rust users don't have to learn a new flavor of Markdown with unofficial syntax extensions (the current documentation format is vanilla Markdown)
  • New Rust users don't need to learn about the pros and cons of each documentation format to decide which one to use
  • Contributors to a project don't have to learn what documentation format the project author prefers

Even though this is Rust, I personally enjoy languages that follow Python's principle:

There should be one-- and preferably only one --obvious way to do it.
-- The Zen of Python

@ExpHP

ExpHP commented Aug 18, 2016

Granted that is not free - but is the cost really that great? Looking for a limited set of predefined strings all beginning with @ and then extracting text until two consecutive newlines are found doesn't seem that demanding.

  1. Having to support both will constrain possible extensions to the system, because they would need to be able to be added in both formats.
  2. The above can be avoided by constraining the formats instead (making them much more similar)---but inevitably, one will feel half-baked compared to the other.
  3. The differences between the two go beyond just differences in parsing. They also encourage different practices in how the documentation is phrased, styled, and organized.

@jimmycuadra
Author

I will revise my RFC and create a new PR with the new approach and then close this one. If people have more thoughts on the topic, feel free to keep commenting here until then, of course!

@jimmycuadra
Author

Thinking about this more, I'm less sure how rustdoc would behave if any of the special sections contained content it didn't know how to format. For example, what would it do if the parameters section contained two different lists separated by some free-form text? What would it do if there was one list, but one or more of the list items did not begin with something it could identify as a parameter name?

@killercup
Member

killercup commented Aug 19, 2016

For example, what would it do if the parameters section contained two different lists separated by some free-form text? What would it do if there was one list, but one or more of the list items did not begin with something it could identify as a parameter name?

My opinion – just one way of dealing with this, as written above:

I think it makes sense that initially, rustdoc will be strict in what it accepts as valid structured docs, but might get more lenient over time.

Concretely: Don't treat it as a valid section, i.e., ignore it. Have rust{c,doc} print warnings using the lints I mentioned above.
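
As a sketch of that strict behaviour (the function name and the warning text are invented for this example): any line that is not a well-formed item rejects the whole section, and the caller decides how to surface the message.

```rust
/// Parses a parameters section strictly: every non-blank line must be a
/// "- `name`: description" item, otherwise the whole section is rejected.
/// Wrapped continuation lines are also rejected in this sketch.
fn parse_parameters_strict(section: &str) -> Result<Vec<(String, String)>, String> {
    let mut items = Vec::new();
    for line in section.lines().filter(|l| !l.trim().is_empty()) {
        let parsed = line
            .trim_start()
            .strip_prefix("- `")
            .and_then(|rest| rest.split_once("`:"));
        match parsed {
            Some((name, desc)) => items.push((name.to_string(), desc.trim().to_string())),
            // The caller would print this as a warning and treat the section as plain Markdown.
            None => return Err(format!("ignoring parameters section: unexpected line {:?}", line)),
        }
    }
    Ok(items)
}
```

Two lists separated by free-form text come back as `Err`, so the section is ignored rather than half-parsed.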

@jimmycuadra
Author

@killercup In the "returns" section example in your first comment, which part of that format do you imagine being required by rustdoc? I'm referring specifically to how it starts with the freeform phrase "A Result which is:". Maybe all special sections should consist only of unordered lists so that it's easy for both people and rustdoc to distinguish each point/item/variant/whatever for that section? That'd be more akin to having multiple of the same meta tag, for example to document multiple params, multiple examples or multiple safety issues. Examples might be difficult, too, since they may or may not contain freeform text before and/or after each code block explaining it, and rustdoc couldn't know which paragraphs went with which code block. Without the developer having some way of explicitly enumerating things in each section, I can see there being more difficulty with the ambiguity of the freeform approach.

@killercup
Member

killercup commented Aug 19, 2016

@jimmycuadra, very good questions! I'll try to address them one by one. I wrote a semi-formal description of each section (and how to parse lists) here, by the way.

In the "returns" section example in your first comment, which part of that format do you imagine being required by rustdoc?

The "Returns" section is the only one that does not need to contain a list, as any function or method can only ever return one (generic) type. Thus, only the freeform text is required (an empty section would be useless).

The only limitation on that freeform text is that it can't contain a list, as the only list that is allowed to follow it has to be an enumeration of enum variants of the return type. I felt like this was the most common case, but thinking about it now makes me want to suggest a better, more general way: Instead of matching the enum variant names, we should allow arbitrary patterns. This way, it will be possible to describe all possible variants of a return type like Box<Result<Option<String>, CustomError>>. Whether we want to actually do that is another thing, but re-using the pattern match syntax could be beneficial.
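
To illustrate what pattern-based list items might look like for such a type, here is a made-up function; `nickname` and its behaviour are purely hypothetical, and only the shape of the "Returns" section matters.

```rust
#[derive(Debug, PartialEq)]
struct CustomError;

/// Looks up a nickname for `id`.
///
/// # Returns
///
/// A boxed `Result` which is:
///
/// - `Ok(Some(name))`: The nickname stored for `id`.
/// - `Ok(None)`: `id` is valid but has no nickname.
/// - `Err(CustomError)`: `id` is zero, which is never valid.
fn nickname(id: u32) -> Box<Result<Option<String>, CustomError>> {
    Box::new(match id {
        0 => Err(CustomError),
        1 => Ok(Some("lorem".to_string())),
        _ => Ok(None),
    })
}
```

Each list item reads like a `match` arm, so a tool could in principle check the patterns against the return type.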

Maybe all special sections should consist only of unordered lists so that it's easy for both people and rustdoc to distinguish each point/item/variant/whatever for that section? That'd be more akin to having multiple of the same meta tag, for example to document multiple params, multiple examples or multiple safety issues.

I don't think this is a good idea for human readability, but I can see where you are coming from. As I mentioned above, of the 4 sections I proposed, only "Returns" doesn't need to have a list.

It might make sense to use a list style for "Panics" as well, but for all other sections I would not enforce it.

Examples might be difficult, too, since they may or may not contain freeform text before and/or after each code block explaining it, and rustdoc couldn't know which paragraphs went with which code block.

For "Examples" specifically, I would use subheadlines instead of lists as nesting code blocks in lists is pretty weird. I don't see a use case for mapping descriptions to examples; what did you have in mind?

Without the developer having some way of explicitly enumerating things in each section, I can see there being more difficulty with the ambiguity of the freeform approach.

I think it would be helpful to define a set of goals/properties we want to extract from doc comments. It might also be useful to write a POC markdown parser and combine that with a bunch of example files to see how hard it actually is to extract this information.

@pyfisch
Contributor

pyfisch commented Aug 20, 2016

I am curious why nobody has yet mentioned RFC 1574 which introduces "certain conventions around documenting Rust projects". It defines some general conventions like using line comments and English as the language but also introduces six well known top-level headings:

  • Examples
  • Panics
  • Errors
  • Safety
  • Aborts
  • Undefined Behavior

These match @killercup's proposal and could be extended with headings for parameters, returns and lifetimes. Some sections may have a special syntax like a list for parameters. By the way it would be great if rustdoc highlighted the already defined sections in the output.
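
For tooling, recognizing these headings could be as small as the following sketch. The `KnownSection` enum and `known_section` function are hypothetical helpers, not part of rustdoc; the heading strings are the six listed above.

```rust
/// The six headings listed above; the enum itself is a hypothetical helper.
#[derive(Debug, PartialEq, Clone, Copy)]
enum KnownSection {
    Examples,
    Panics,
    Errors,
    Safety,
    Aborts,
    UndefinedBehavior,
}

/// Maps heading text to a known section, or `None` for any other heading.
fn known_section(heading: &str) -> Option<KnownSection> {
    match heading.trim() {
        "Examples" => Some(KnownSection::Examples),
        "Panics" => Some(KnownSection::Panics),
        "Errors" => Some(KnownSection::Errors),
        "Safety" => Some(KnownSection::Safety),
        "Aborts" => Some(KnownSection::Aborts),
        "Undefined Behavior" => Some(KnownSection::UndefinedBehavior),
        _ => None,
    }
}
```

Headings for parameters, returns and lifetimes would be extra variants on the same enum.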

@killercup
Member

@pyfisch Oh, I meant to link to #1574, but I see now that I only did that in my post, not in a comment here.

By the way it would be great if rustdoc highlighted the already defined sections in the output.

What do you mean by this? Highlight how? The headlines are rendered as, well, headlines. Would you like a special style for sections with special names?

I think this would be a bit confusing; we should enhance the output where it makes sense (e.g., render parameter lists in a special way to include the type names), but not render stuff differently 'just because.'

Actually, I don't have a good idea how to change rustdoc's output based on this structured information (aside from the parameter list, and maybe adding tooltips). It will be great to see editors/IDEs use this information for contextual information, though.

@nagisa
Member

nagisa commented Aug 20, 2016

I’m personally strongly against this RFC¹. I will not use any of these attributes, will reject all PRs to my own projects using any of them and will go as far as forking rustdoc if using such things becomes a necessity with rustdoc.

¹: Having used JavaDoc and Doxygen, I hate both of them (or rather the format they use).

Let me explain why:

@example, for code examples

Examples, more often than not, are free form, and this formalisation completely ignores that. Does this tag apply to the paragraph it is in? What about cases where there are multiple paragraphs in the example? If it applies to all the text after the tag till the next tag, what about getting back to free form documentation from the “example” environment? It seems to provide absolutely no benefit over a simple # Examples.

@safety and @panic and @error

More often than not (except when you’re doing formal proofs) you cannot or do not want to formalise any of these. Then again, the same questions about what the tag applies to come up as above. Also, what about this is better than a plain # Safety or # Panic?

@param and @return

The core of this proposal are these two tags. Currently there’s no way to apply documentation specifically to the parameters and return value, so this RFC is adding something over the status quo.

However, I can’t see the need to document separate parameters as anything other than an anti-pattern. If there’s a genuine need to do it, your function is overcomplicated and needs to be refactored into something else (builder, struct+impl, etc). Otherwise your freeform text combined with parameter names already, and naturally, documents these. There’s no need to provide people tools to make anti-patterns viable.

For example:

pub fn copy<P: AsRef<Path>, Q: AsRef<Path>>(from: P, to: Q) -> Result<u64>

Copies the contents of one file to another. This function will also copy the permission bits of the original file to the destination file.

If your function still does not make that obvious, you could instead write:

Copies the contents of `from` file to `to`. This function will also copy the permission bits of the `from` file to the `to` file.

Documenting the return value on the function is even more obnoxious, especially when rustdoc already links (quite conveniently) to the documentation of the returned type. Are you returning a u32 which has some semantic purpose other than just unsigned integer? You need a newtype. Document that newtype instead.
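
A small sketch of that newtype approach. `BytesCopied` and `copy_into` are hypothetical names invented for this example; the point is only that the meaning of the number is documented once, on the type.

```rust
/// Number of bytes copied by a single copy operation.
///
/// Hypothetical newtype: callers read this documentation by following the
/// return type link instead of a per-function "Returns" section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BytesCopied(pub u64);

/// Appends the contents of `src` to `dst`.
pub fn copy_into(src: &[u8], dst: &mut Vec<u8>) -> BytesCopied {
    dst.extend_from_slice(src);
    BytesCopied(src.len() as u64)
}
```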


By the time I finished writing and proof-reading the above I started reflecting (yet again) on rustdoc making a great mistake by having gone the way of markdown for documentation. Using something like ReST would’ve made something like this RFC not happen in the first place, because using directives for all these things (except maybe for the anti-patterns) would’ve been the natural course of action.

In the end, if we’re sticking with markdown, please do stick with markdown. If markdown seems not enough, perhaps consider using a format which has the desired features in the first place?

@LaylBongers

LaylBongers commented Aug 20, 2016

I strongly agree with @nagisa that these attributes wouldn't be the way to go. I've had quite a few unpleasant experiences with Javadoc/XMLDoc/Doxygen where I was just going through the motions adding "file is the file that's being processed" and "returns the resulting value" to everything.

I think that the current situation is working fine and there's no need to add meta tags. I haven't seen or personally had a need to define documentation on a more fine-grained level than what rustdoc currently provides. The one situation I can see where these would be of use is in IDE integration, but in that case the IDE can just provide the function's doc.

The big question I really have is, what problem does this solve? The RFC mentions consensus on how to provide docs, but we have a decent consensus so far that's been used throughout the community of providing a brief explanation of the function.

@oblitum

oblitum commented Aug 21, 2016

I disagree with @nagisa on param/return. As an illustration, this was on my Twitter timeline at random, today:


Source: https://twitter.com/NoBugsHare/status/767008291218071552

Besides docs, what I'd really like to see is the opportunity to have code assistance that contains docs. These tags (or any solution along these lines) help with that: they avoid round trips to the docs just to write correct code. Code completion, IntelliSense, etc. aim to deliver that while you're coding. Free form documentation of the entire function is unnecessarily longer than the much shorter documentation of the actual parameter you're about to fill in. I'm not sure whether anyone has attained the best solution in this realm, but for me as a whole, this kind of feature is useful.

Beyond that, Swift goes further, making code even more documented at the call site by actually having named parameters, so when you read the code you can see what is from: and what is to: without invoking tooling.

One may argue that by having such features a language is welcoming anti-patterns (the sole one I can think of is long parameter lists) and hence, to prevent that, a language should avoid them altogether, just to make it more difficult to have long parameter lists. I don't like long parameter lists either, but, like in the image above, even for a short prototype of two parameters, if you don't know what's from: and to:, you just don't know, and you have to check the docs. So by having many short prototypes as a means to avoid a long one, you go from having to check the docs or remember a bunch of parameters for a single prototype to having to check the docs or remember a bunch of individual small prototypes in the end. Keeping it small is better but doesn't completely solve this problem of the cognitive load of maintaining code; it just helps. One may also extract the parameters to a data structure, but this is another problem: often they don't fit in a single data structure.

alexcrichton removed the T-dev-tools label Aug 23, 2016
@killercup
Member

killercup commented Sep 3, 2016

FYI, I was bored and after playing around with pulldown-cmark wrote a simple parser for doc strings in the format I described above. Nothing really fancy, just as a proof of concept. You can see the code here: https://github.com/killercup/rust-docstrings

@strega-nil

ping @jimmycuadra @steveklabnik

status?

@jimmycuadra
Author

I don't think I'm going to have time to work on a revised RFC for this anytime soon. What's the protocol? Just close the PR?

@steveklabnik
Member

If you don't have time to work on a revised RFC any time soon, I vote that we should close as "postponed." This isn't a judgement on the validity of the RFC, and if anyone else wants to champion this at a later time, they can do so. Thanks!
