-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(derive): add markdown parsing of doc comments in derive #4444
base: master
Are you sure you want to change the base?
Conversation
And after all that I forgot to include the example. The example from the included test: Source
Rendered(Note: the underlining is being filtered out of the HTML by github here.) A remarkable tool This tool is remarkable because its help is generated from markdown! It has some italic text. This is a link. (But not in the help output.) This paragraph has inline code. This is also a paragraph, but unlike the other paragraphs, this one is spread across multiple lines. Or at least it is in the markdown. Usage: clap [OPTIONS] Options: --foo Markdown is also handled here. -h, --help Print help information (use `-h` for a summary) Obviously it has some problems but it is help output styled with markdown. :D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to work out more details on StyledStr
before we can have conversations on what to do about this generally and about how to map markdown to the terminal
@@ -42,6 +42,7 @@ quote = "1.0.9" | |||
proc-macro2 = "1.0.42" | |||
heck = "0.4.0" | |||
proc-macro-error = "1" | |||
pulldown-cmark = "0.9" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minimad doesn't parse blocks and only parses some inlines. To be able to use this we'd have the write the first half of a markdown parser anyway.
I wonder how much of a problem that will actually be, especially compared to what we do now. The build times are significantly different than pulldown
https://github.com/rosetta-rs/md-rosetta-rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I get that. Unfortunately, the way that it accomplishes that is by not really being a markdown parser. It's a parser for something that sorta looks like a subset of markdown.
Honestly I think that not supporting markdown is a better option than supporting minimad
's version of "markdown".
There's some things that look like they're just bugs that we could fix that I can only assume would be accepted, like:
Inlines aren't parsed in lists:
* some bullet item
* some _bullet_ item
* some bullet item
Normal(Composite { style: ListItem, compounds: ["some bullet item"] })
Normal(Composite { style: ListItem, compounds: ["some _bullet_ item"] })
Normal(Composite { style: ListItem, compounds: ["some bullet item"] })
There's some things that I'm not sure if they would consider them to be a bug, but we could technically workaround:
Each line of a paragraph is considered a paragraph.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Dictum at
tempor commodo ullamcorper a lacus vestibulum sed arcu.
Normal(Composite { style: Paragraph, compounds: ["Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do"] })
Normal(Composite { style: Paragraph, compounds: ["eiusmod tempor incididunt ut labore et dolore magna aliqua. Dictum at"] })
Normal(Composite { style: Paragraph, compounds: ["tempor commodo ullamcorper a lacus vestibulum sed arcu."] })
Sub-lists (bonus points for parsing a single unconnected *
as the start of an italic block)
* some bullet item
* some bullet item
* some bullet item
Normal(Composite { style: ListItem, compounds: ["some bullet item"] })
Normal(Composite { style: Paragraph, compounds: [" ", I" some bullet item"] })
Normal(Composite { style: ListItem, compounds: ["some bullet item"] })
Only one block level is supported:
> # foobar
Normal(Composite { style: Quote, compounds: ["# foobar"] })
There's way more, these were just the most obviously wrong of the few things I tried. Honestly I think it'd be easier and faster to write a new markdown parser than to pre-parse and re-parse around this crate to get reasonable output of any sort of basic markdown document. That would put a major dent in its lead in your stats too.
tl;dr: We should heed the warning on that crate that says:
If you're looking for a Markdown parser, this one is probably not the one you want
What you want to use it for isn't what it's meant to be used for. It's not a general purpose markdown parser. IMO, it's not even a "general purpose (when your purpose is outputting to the terminal) markdown parser". It's good for when you want some minimal style in terminal output of static, non-user-supplied, text.
@@ -28,38 +28,56 @@ impl StyledStr { | |||
AnsiDisplay { styled: self } | |||
} | |||
|
|||
pub(crate) fn header(&mut self, msg: impl Into<String>) { | |||
/// Display string as header. | |||
pub fn header(&mut self, msg: impl Into<String>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StyledStr
's details can't be made public until after
- Allow good/warning/error/hint color to be customized #3234
- Allow styled text in
template
#1433 - The header for my examples are not colored like the others #3108
At that point, we'll have a better idea of what can and should be public
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Though, at least in my understanding of how StyledStr gets used, having a public interface would be a prerequsite for #1433 and #3108.
Would it be helpful to you for me to write up a separate issue that pulls out my random scattered comments about StyledStr
above (and makes them a lot more cohesive)?
Not with the current API for
The doc comments? The ones I saw basically repeat the function name so not the highest priority /value |
I don't understand. Is there another way to create a
Not doc comments, comments as in "the things I commented on earlier in this conversation". I just meant, would if be helpful to pull out my thoughts on |
My plan for the future API is nothing like the current internal API and the design of those features will help guide what that API looks like, so they will need to be developed in tandem.
Yes, this is why I prefer more of those requirement / design discussions to happen in the original issue so they don't get lost in PRs that come and go. |
This is an early draft of a PR that will resolve #2389. I wanted to get some feedback on intent in a few places to make sure I'm doing the right thing.
This largely takes the tack that was laid out in #2389.
Markdown parsing is done within derive.
This was all but explicitly stated, and is, I believe, the right place to do the parsing.
Then a
StyledStr
is output into the user's expanded code.StyledStr
isn't explicitly mentioned, but it's what the generated function takes and whatColorizer
uses internally, so I assume this is right.Though,
StyledStr
doesn't have things like bold and italic itself directly and I'm not sure if it's the right place to add it. I know you also haveanstyle
, but I'm not sure that makes sense to use here sinceStyledStr
seems to denote more semantic contexts. The biggest problem there is I'm not sure what to do when some text is bold and italic or worse a heading that is partly styled or any other combination of things since it only tracks a single enum variant for the style.This PR also makes the basic styling functions on
StyledStr
pub
so that they can be used from the derived code. I'm not sure if that's acceptable or not. We could hide them from the docs if you'd prefer they weren't fully public.Markdown is parsed with
pulldown-cmark
pulldown-cmark
isn't perfectly suited to what I'm doing here, but I think it's the only real option that will work.pulldown-cmark
seems to assume you'd only ever want to output html, and omits telling us a lot of information that would be helpful for outputting back to psuedo markdown. Doesn't let me do everything I want, but is good enough.comrak
parses as github flavored markdown, but doc comments in rust are parsed by Rust as commonmark markdown. It's much heavier thanpulldown-cmark
. Though it does have a somewhat better API for what we're trying to accomplish. Didn't try it though, don't know what other problems we'd run into.minimad
doesn't parse blocks and only parses some inlines. To be able to use this we'd have the write the first half of a markdown parser anyway.mini_markdown
, as mentioned in your rosetta repo, doesn't have a stable API yet and from a cursory glance doesn't seem complete enough. May be useful in the future.Formatting
As far as I'm aware there's not canonical representation of formatted markdown in the terminal. As part of this PR I'm essentially inventing one. A lot of things can't be represented in any native way in terminal output and will "fallback to markdown" for which I just mean output text that looks like markdown. Markdown was based on how people were already writings things in plaintext files after all.
Most of these aren't done, I wanted to get feedback before I go about implementing some of the harder ones. There are some that require a lot of other probably wide-reaching changes, pretty much anything that requires nested styling.
Block Elements
Thematic breaks
Fallback to markdown, insert a
---
line.Could potentially write
-
s across the whole terminal/written area if I have that info available at the time. This would require aStyledStr
with a newStyle
and no actual str.Headings
Unsure. Probably set to
StyledStr::heading()
, and ignore level.In the future could add levels to
StyledStr::heading
, not sure how headings other than the first level should even be styled though.Could also explicitly disallow more than a certain number of heading levels.
Code Blocks
Fallback to markdown but always output in the style of an indented code block.
Could consider syntax highlighting in the future, that would require setting colors on segments within the codeblock.
HTML Blocks
Probably just drop them and all of their content. I can't think of anything else to do with them.
Paragraphs
Merge into a single line of text that is wrapped as appropriate during output. (Though see also Hard/Soft Line Break.)
Blank Lines
Ignored. I believe this is essentially what already happens to doc comments.
Quotes
Fallback to markdown, prefix lines of block with
>
.This is the big problem child with styling. it probably needs to be a single
StyledStr::quote()
, but that then means that nothing inside it can be styled in any way. This would have to be done this way so that at output time the quoted things can wrap, but still be prepended. Having any internal styling would depend on aStyledStr
that can contain other internalStyledStr
.Lists
Fall back to markdown, but with proper item numbering & unified bullets.
Output plain text markdown so that inner styling works. This might cause minor formatting problems with wrapping, I haven't looked yet at how exactly that works.
Inlines
Code Span
Make bold, this seems to be how "code" in manpages gets formatted for the terminal.
Emphasis and strong emphasis
Emphasis -> italic, Strong Emphasis -> Bold
(Note: don't currently have a way to represent both at the same time, currently will just choose the innermost one. This could be solved with a special case
BoldItalic
style if we wanted to fix it short term.)strikethroughShow striked.
(Though strikethrough isn't supported in
termcolor
, but there is an open PR for it. This would have to wait for that or a change over toanstyle
. For now just fallback to markdownLinks
I don't have a good answer for this. Currently this just removed the link syntax and URL completely so that help text doesn't get too verbose.
Could possibly keep that for short help text & for manpages convert all links to ref-style links. Long help could go either way.
Images
Removed.
Could consider keeping the alt/title text.
Autolinks
Fallback to markdown. This is just a link in
<
>
so some terminals will detect it as a link anyway.HTML
Again, remove.
Hard Line Break
Insert a
\n
into the text.Soft Line Break
Join the lines with a space.
Text
Is text, show text.
Stubs
This code also contains (or will contain) some stubs for future expansion:
Manual Sections
Right now the doc comment is parsed into
short_about
andlong_about
implicitly based on the first blank line. This keeps that (assuming the first non-blank line is text and not something else), but also adds splitting out anything with a heading as manual content. (Though nothing is done with that content at the moment, I'm assuming it would make sense to make it available toclap_mangen
in the future.)Markdown Manuals
There's a few things that only make sense in this context. We want to render our manpages in our reference docs site. Eventually I want to be able to build a "markdown manpage" out of the doc comment markdown without round tripping through roff first.
Other Open Questions
Make This the Default or Require Opt-In
Right now I just have this basically hacked in to be the default formatting.
I think this should be safe to become the default. That's based on the assumption that anyone who was trying to do anything funky would be using
verbatim_doc_comment
.The only possible exception would be cutting off the long help text at the first valid heading. I imagine it would be unlikely to cause problems, but there's definitely a possibility there. That might be something that would have to be opt-in, at least until there's another breaking release.
This is probably a question that can remain open until this to closer to done. I'll have a better idea of other problems by then too.
Phew, that was a long one.
tl;dr
StyledStr
? Should it morph to support nested styles?TODO