RFC: Add support for display blocks & spans (for Math and diagrams) #745
This point, to me, precisely illustrates that no syntax extension is needed. GH uses code (with backticks). No new dollar support is needed. CM already allows:

```math
\frac{1}{2}
```

~~~math
\frac{1}{2}
~~~

Why is an additional:

$$$math
\frac{1}{2}
$$$

needed?
The current code blocks allow for syntax highlighting but do not specify that a particular syntax highlighting library is implemented. They do this by exposing the info string as a class. This “display blocks” PR seems to require that every markdown -> HTML compiler implements a particular LaTeX math -> MathML transform. I think this means in particular that every markdown compiler now ships a very big and heavy transform that not every user of markdown might want, with TeX math as input, which not everyone might want, and MathML as output, which not everyone might want. To illustrate, the smallest CM-compliant markdown parser that I am aware of is 15kb minzipped. Adding support for roughly this PR with KaTeX adds 75kb minzipped.
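To make the "info string as a class" point concrete, here is a minimal sketch of how a renderer might emit a fenced code block. The function name `render_fenced_code` is hypothetical; the `language-` class prefix follows the example output in the CommonMark spec, which shows it but does not mandate it.

```python
from html import escape


def render_fenced_code(info: str, body: str) -> str:
    """Render a fenced code block as HTML (illustrative sketch only).

    The first word of the info string, if any, is exposed as a
    `language-...` class. No highlighting or rendering library is
    involved; downstream tools may act on the class or ignore it.
    """
    words = info.split()
    class_attr = ""
    if words:
        # Escape the word so it is safe inside an attribute value.
        class_attr = f' class="language-{escape(words[0], quote=True)}"'
    return f"<pre><code{class_attr}>{escape(body)}</code></pre>"
```

With this approach, a block tagged `math` costs the parser nothing extra: it only labels the block, and a separate tool decides whether to typeset it.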
Do you have an example of how LaTeX -> MathML is trivial? Some more Qs:
The idea of distinguishing code blocks meant to be displayed as code (possibly highlighted) and code blocks meant to be interpreted (e.g. executed or rendered) was discussed extensively here: https://talk.commonmark.org/t/mermaid-generation-of-diagrams-and-flowcharts-from-text-in-a-similar-manner-as-markdown/1882/1
There are a few reasons:
I think that it is acceptable to say that a parser is allowed to not perform any display block rendering, and instead present display blocks the same as code blocks if they do not support it. Most important is that there is an official choice for how display blocks should be notated so that parsers don't need to make an independent choice. Supporting JS->MathML should also be more lightweight than KaTeX/MathJax, which need to add styling.
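The fallback behaviour described here could be sketched roughly as follows. Everything in it is an assumption for illustration: the name `render_display_block`, the `renderers` registry, and the shape of the fallback HTML. It also does not model the PR's "TeX by default" rule for an empty info string.

```python
from html import escape
from typing import Callable, Dict

# A renderer takes the raw block source and returns HTML (hypothetical type).
Renderer = Callable[[str], str]


def render_display_block(info: str, body: str, renderers: Dict[str, Renderer]) -> str:
    """Render a display block, falling back to a plain code block.

    If the first word of the info string names a registered renderer,
    that renderer produces the output. Otherwise the block is presented
    exactly like a code block, so unsupported content stays readable.
    """
    words = info.split()
    name = words[0] if words else ""
    renderer = renderers.get(name)
    if renderer is not None:
        return renderer(body)
    # Fallback: same output as an ordinary fenced code block.
    class_attr = f' class="language-{escape(name, quote=True)}"' if name else ""
    return f"<pre><code{class_attr}>{escape(body)}</code></pre>"
```

A parser that registers no renderers at all would then treat every display block as a code block, which is the minimal conforming behaviour suggested above.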
Trivial was not the best word here, "well defined" is more accurate. I simply say that because there are already a few libraries to do the conversion:
I think that a renderer could probably treat this the same as inline html, but I don't know whether this should be required (more below)
This would follow the guidelines for codeblocks, the renderer can decide what to do with this information
Thank you for pointing this out, I will remove it. I think a general short summary of what I am proposing would be: "New syntax blocks |
GitLab has had this support for quite a while, and in fact extends it to PlantUML and Kroki (https://docs.gitlab.com/ee/user/markdown.html#diagrams-and-flowcharts). They have also had math support for quite a while using However the desire for a well-defined and supported implementation for the I'm biased in preferring to use
Perhaps there doesn’t need to be. I don’t think it breaks with CM when Pulldown turns
There is no info string for inline code either, that might be nice to have too. So perhaps a proper solution is needed for inline code/math, to allow tagging as a particular language?
I think my first paragraph in this comment answers that need.
There is a significant problem with trying to add this: single dollar support will break many existing markdown documents, because single dollars are quite common in (American English) natural language.
What do you think of this proposal? I compare the proposals here. |
Yep, I also most like opt D. I do prefer that |
Predefined English keywords are never okay in CM. |
a) I don't think that is said in the spec, b) this place is literally about words, the first which is used, the rest currently dropped |
(been away for a bit, just getting back)
I think the first example works better for math where The
This is something I was considering proposing as well, specifically adopting RST's
I don't think this is likely to be much of a problem because if openers need preceding whitespace and closers need postceding whitespace, phrases like It is also already used by Github, Gitlab, tex all the things, stackoverflow and others, so I assume it has turned out not to be an issue. |
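As a rough illustration of the whitespace rule mentioned here, the sketch below matches `$...$` only when the opener is preceded by whitespace (or the start of the text) and the closer is followed by whitespace (or the end of the text). It additionally assumes the content must not begin or end with a space. This is not Pandoc's or GitHub's actual rule; the pattern and the name `find_inline_math` are assumptions made for the example.

```python
import re

# Opener: "$" at start of text or after whitespace, followed by non-space.
# Closer: "$" preceded by non-space, at end of text or before whitespace.
INLINE_MATH = re.compile(r"(?<!\S)\$(?=\S)(.+?)(?<=\S)\$(?!\S)")


def find_inline_math(text: str) -> list:
    """Return the contents of every dollar-delimited span in `text`."""
    return [match.group(1) for match in INLINE_MATH.finditer(text)]


# Ordinary prices do not form a span, because the second "$" is
# preceded by a space and so cannot act as a closer.
print(find_inline_math("It costs $5 and $10 today"))
# A span with tight delimiters is picked up.
print(find_inline_math("Let $x + y$ be the sum"))
```

Taken literally, this rule also rejects a closer followed directly by punctuation, such as `$x$.`, so a practical version would probably need to relax the closing condition.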
I completely agree. And I think the spec laid out by @jgm in https://github.com/jgm/commonmark-hs/blob/master/commonmark-extensions/test/math.md does a great job. Honestly I'm not crazy about
That feels more "markdownish" to me. I guess because I'm a programmer it makes sense. It would even translate well into inline code, such as |
Single dollars are an issue, to some degree, depending on audience.
So I think single dollars will unavoidably
There is also the matter that people tend to use unmodified off-the-shelf parsers and bolt math rendering on around them, resulting in bugs e.g. So I don't have high hopes for
To put it another way: the reason I prefer literal-based syntaxes for math is better interop with existing parsers. Proclaiming |
Pandoc has pretty solid heuristics for For djot I just went with |
FWIW this describes the way I do it in |
I opened a PR to start discussion on info strings for inline code: #750 I think that a straightforward question that we could probably answer now is, do we want to support new syntax at all (likely something Since so many implementations support blocks with |
Perfect! :)
I have doubts. In CM we have another place where whitespace is important: emphasis/strong. That works quite badly in languages that don’t use (much) whitespace, such as Chinese. There are issues about that here (and I get more in my projects). I feel better about reusing the syntax for code, and adding onto it in the currently specifically unused space: the meta string after the first word.
I feel like this There’s also the thing where this breaks every tool that deals with markdown that exists. This meta space after the language name is explicitly ignored by CM, any markdown tool can start supporting it already. And if they don’t support it yet, the code will be displayed, likely syntax highlighted. Using the meta space also doesn’t break mermaid on GH. I’m not going to block |
Yes, but the spec doesn't specify any words, and the de facto standard that the language of the code block or the filename extension associated with said language is the first token of the info string doesn't introduce English into the spec/standard. The names “Rust”, “Javascript”, “Markdown” and their filename extensions are crosslingual names and identifiers.1
I think for the person reading the Markdown as content ("the reader") rather than as source code, which as you've mentioned is something Markdown supports as a priority, it doesn't matter that they don't know what the It only matters to the person writing the Markdown as source code for rendering ("the writer"). It is reasonable to expect the writer to learn this difference, just as they must learn the other rules of fenced code blocks that impact rendering and all the other rules of Markdown. The reader doesn't need to know any of those rules. Markdown's structure is designed to be self-evident for readers. None of the intricacies in the extremely long CommonMark spec matter to the reader, only to the writer, as long as the writer doesn't abuse those intricacies to produce content without self-evident structure, or use any of the writer conveniences that are reader-unfriendly (e.g. lazy continuation) unless they choose to dispense with Markdown's reader friendliness. Such a choice will naturally have the exact same property of self-evidence, because a writer is always also a reader. A writer can only willfully produce reader-unfriendly content with Markdown. I don't remember if the above distinction between reader and writer and the Markdown goals for each is articulated anywhere in the CommonMark spec or website, or on Gruber's website.
Yes, we absolutely cannot break things other than misused corner cases. Such changes really belong in a new language, e.g. djot.
From the spec:
If this behavior is de facto but not actual spec, I would say we should take that consensus and add it more explicitly here. On the other hand, if this entire behavior is indeed de facto spec instead of actual spec, then we don’t need to have much of this conversation. We can add another tip about using
It does say “word”. “First word”. So there is also an end of a word, a word break. So there can be two words.
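The "first word, then everything else" reading can be sketched as a small helper. The name `split_info_string` and the term "meta" for the remainder are assumptions for illustration; the spec itself only says the first word is typically used to specify the language.

```python
from typing import Tuple


def split_info_string(info: str) -> Tuple[str, str]:
    """Split an info string into (first word, remaining meta text).

    Leading and trailing whitespace is dropped, the first
    whitespace-delimited word is taken as the language, and whatever
    follows is returned untouched for tools that want to interpret it.
    """
    parts = info.strip().split(None, 1)
    if not parts:
        return "", ""
    first = parts[0]
    meta = parts[1] if len(parts) > 1 else ""
    return first, meta
```

A tool that wanted a render-versus-display flag could look for it in the second element, while tools that ignore the meta text would keep treating the block as ordinary code.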
That second case is why I think 1 punctuation character is less ideal.
I don’t think so, but I see it too. |
There are quite a few programming language names that end in punctuation: C--, C++, C#, F#, F*, J#, J++, M#, P′′, Q#, R++, Visual J++, X++, xBase++, Z++. None on that list end in Worthy of note: A not small number of language names have more than one word, so "The first word of the info string is typically used to specify the language" has limitations, or needs clarification. Not sure what people do for code blocks of those languages today. They might be so much in the minority that no one hears them. Maybe the name is turned into a single hyphenated token, which you'd need to do for the CSS class attribute anyway. As practical as that solution may be, it feels like a hack. I looked into how GitHub handles it. Based on a single test, it was hyphens. I'd say having an "info string" standard would be useful, and maybe it should be separate from CommonMark, as other formats support something similar if not identical, e.g. djot (cc @jgm). That holds even if said standard were as open-ended as it is today, but with additions for multi-word language names and optional explicit directives for render source vs eval semantics.
I think it goes to the fact that the community, as far as I know, have gotten stuck defining any extensions. Lots of good ideas talked about but nothing ever decided. I'm thankful that |
It's not too different than trying to explain that putting a
You're right: those who don't upgrade their parser to the latest CommonMark spec won't get support for the additional syntax. So if you try to use the newer syntax on an older implementation, it does seem to render the code block (at least here on GH), but does not syntax highlight it. Less than optimal, but not completely broken.
There was quite a bit of feedback on the admonition implementation thread where some folks were upset with using English words for the syntax. That would be one reason to favor using punctuation. Note that
yields
I had forgotten, but GitLab does support a certain syntax for this
The language string becomes Of course, like any renderer, if you don't know what the language is, the code block is still rendered. Not saying this is better than maybe I also wonder if it wouldn't be better to adopt jgm's attribute syntax and leverage that for providing these extra abilities, https://github.com/jgm/commonmark-hs/blob/master/commonmark-extensions/test/attributes.md |
IMHO, I like the idea of the "Generative code" both block and inlines. Nevertheless, I would prefer
as more people get used to the idea that code between dollar signs returns something that is not the code, but generated content. |
This proposed change defines "display blocks" and "display spans" that are meant to process their contents for rendering in some way, rather than being displayed as raw text.
Motivation
Math support is a discussion that seems to come up quite frequently; pandoc has an extension, StackExchange supports math expressions via MathJax, and both GitHub and GitLab support them too. GitHub also added support for Mermaid diagrams somewhat recently. This illustrates the need for a CommonMark specification to guide further implementations.
This PR defines "display blocks" that follow the same definition rules as code blocks but are intended to render their content into a display form, rather than a verbatim representation. By default these should process the data as TeX and output MathML, but the info string can be used to change the renderer to something like `asciidoc`, `mermaid`, `graphviz`, or `svg`.

MathML is supported on every major browser only as of recently[1], so providing a math implementation should be somewhat trivial. If a render engine does not support TeX or the requested renderer, it may fall back to presenting display blocks as code blocks.
The above examples are compatible with the mentioned Pandoc, GitHub, and GitLab flavored markdown, with the exception that the `` ```mermaid `` blocks are rendered rather than presented as plain text.

Previous discussion:
Footnotes

1. https://caniuse.com/mathml