Leave the content of <code> and related HTML elements untouched by CommonMark parser #711

Open
kaushalmodi opened this issue May 22, 2022 · 15 comments

Comments

@kaushalmodi

Hello,

Recently I discovered that CommonMark allows Markdown parsing within <code> blocks!

So if a user had something like <code>**bold**</code> in their Markdown content (which is analogous to Markdown `**bold**`), the CommonMark parser would parse the asterisks inside it as emphasis markup.

This can be reproduced at least with this CommonMark dingus

image

Can the spec be updated so that the content inside <code> (and also <kbd>, <samp> and <var>) is treated verbatim, as is done for the <pre> element?

  • <code> element

    displays its contents styled in a fashion intended to indicate that the text is a short fragment of computer code

    Now, if that short fragment happens to have Markdown markup characters, we don't want a markdown parser to render those as Markdown!

  • <kbd> element

    represents a span of inline text denoting textual user input from a keyboard, voice input, or any other text entry device

    The textual user input on the keyboard can contain asterisks, underscores and square-brackets too. We wouldn't want a Markdown parser to interpret those!

  • <samp> element

    used to enclose inline text which represents sample (or quoted) output from a computer program.

    What if the computer program is outputting Markdown text? We want this element to show exactly what the program's output was; we don't want a Markdown parser to corrupt the sample output that the user is trying to preserve in a <samp> element.

  • <var> element

    represents the name of a variable in a mathematical expression or a programming context

    Mathematical expressions easily contain asterisks. We don't want the Markdown parsers to touch these elements either!

/cc: @jmooring

@wooorm
Contributor

wooorm commented May 22, 2022

Hmm, if you want code, use markdown code: backticks.
I’ve seen folks depend on these “bugs” (imo features) so I don’t think it should be changed

@kaushalmodi
Author

kaushalmodi commented May 22, 2022

use markdown code: backticks.

I could have used the backticks, but I needed to handle cases where I can add my custom HTML annotations to the inline code blocks.

Here's a watered down example:

In Nim, <code class="inline-src language-nim" data-lang="nim">echo "hello"</code> will print
_hello_.

Another use case is syntax highlighting using classes in code tags. Here's a screenshot of what I mean:

image

(note the inline colored code in there)

Right now, I am doing this by bypassing a commonmark parser.

@kaushalmodi
Author

@wooorm

seen folks depend on these “bugs”

Can you point to a use case where people would want a Markdown parser to render stuff inside <code> blocks?

@wooorm
Contributor

wooorm commented May 22, 2022

To add links to code, for example.

@jgm
Member

jgm commented May 22, 2022

This is the way Markdown has traditionally done it, starting with Markdown.pl.

https://babelmark.github.io/?text=%3Ccode%3E%0A**a**%0A%3C%2Fcode%3E%0A

@kaushalmodi
Author

@wooorm I don't follow. The code element is for inline code. The current CommonMark behavior is inconsistent between block code (<pre> elements) and inline code (<code> elements). If the user meant to have Markdown links, they might as well just put those outside the HTML code elements.

@kaushalmodi
Author

kaushalmodi commented May 22, 2022

@jgm

This is the way Markdown has always done it:

I understand, but maybe this is our opportunity to fix that? I cannot find any way in Markdown to write inline code with HTML attributes like so:

In Nim, <code class="inline-src language-nim" data-lang="nim">echo "hello"</code> will print
_hello_.

@jgm
Member

jgm commented May 22, 2022

A strong degree of compatibility with existing implementations was a design goal.

I do see why this prevents you from doing what you want to do here. It's not really a problem in pandoc, for example, where you can just do

`echo "hello"`{.inline-src language-nim lang=nim}

or use the raw attribute

`<code class="inline-src language-nim" data-lang="nim">echo "hello"</code>`{=html}

But there does seem to be an expressive gap here in core commonmark.
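
One possible stopgap within core CommonMark, sketched in Python under assumptions: the spec states that entity and numeric character references cannot stand in for structural symbols, so `&#42;foo&#42;` renders as a literal `*foo*`. A preprocessing step could therefore emit the raw `<code>` tag and write the punctuation inside it as numeric character references. The helper name `verbatim_code` and its `attrs` parameter are hypothetical, not part of any Markdown library, and I have not checked how extensions such as smart-quote processing treat these references.

```python
import string


# Hypothetical helper (an assumption for illustration, not a library API):
# builds an inline <code> element whose text a CommonMark parser should not
# interpret as Markdown markup.
def verbatim_code(text: str, attrs: str = "") -> str:
    # Replace every ASCII punctuation character with a numeric character
    # reference, e.g. "*" becomes "&#42;". Per the CommonMark spec, such
    # references are decoded to literal characters and never act as markup.
    escaped = "".join(
        f"&#{ord(ch)};" if ch in string.punctuation else ch for ch in text
    )
    open_tag = f"<code {attrs}>" if attrs else "<code>"
    return f"{open_tag}{escaped}</code>"


print(verbatim_code("**bold**"))
# <code>&#42;&#42;bold&#42;&#42;</code>
```

The generated source is hard to read, so this only makes sense when the Markdown is produced by a tool rather than written by hand.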

@wooorm
Contributor

wooorm commented May 22, 2022

they might as well just put those outside the HTML code elements.

With the current state, you can mark certain parts of the code as important, or link them. With your proposal, you can’t.

@kaushalmodi
Author

@jgm Unfortunately I am not using pandoc. I am using the Go CommonMark parser called Goldmark through Hugo (static site generator).

@wooorm

With the current state, you can mark certain parts of the code as important, or link them. With your proposal, you can’t.

Note that this will affect only inline code. You cannot do those anyway in code blocks or <pre> blocks. If anything, this will bring consistency between inline code and code blocks.

Again, an example of this in the wild would be useful.

@wooorm
Contributor

wooorm commented May 22, 2022

Bringing complete consistency is impossible: HTML in markdown is a black box. It “sniffs” things that look like XML and switches to a different state based on it starting with <style or so. It’s not an actual complete parser. Some consistency could be added but as mentioned above the compatibility is important.

Some examples: <code>c + *y*</code>, to emphasise y, and <code>[myId](#some-href) := someProduction</code>
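
To make the "sniffing" concrete, here is a rough Python sketch of the first HTML block start condition as I read it in the CommonMark 0.30 spec: a line that begins (after up to three spaces of indentation) with `<pre`, `<script`, `<style` or `<textarea`, case-insensitively, followed by a space, a tab, `>` or the end of the line. The function name is made up for illustration, and this covers only that one start condition, not the whole HTML block logic.

```python
import re

# Start condition 1 for HTML blocks, per my reading of CommonMark 0.30.
# Only these tag names switch the parser into the "raw until the matching
# kind of end tag" state; <code> is not among them.
_VERBATIM_START = re.compile(
    r"^ {0,3}<(?:pre|script|style|textarea)(?:[ \t>]|$)",
    re.IGNORECASE,
)


def starts_verbatim_html_block(line: str) -> bool:
    """Return True if the line would open a type-1 (verbatim) HTML block."""
    return _VERBATIM_START.match(line) is not None


print(starts_verbatim_html_block("<pre>"))               # True
print(starts_verbatim_html_block("<code>**a**</code>"))  # False
```

Note that the check looks only at the start of the line and never builds a tree of elements, which is why a `<code>` in the middle of a paragraph gets no special treatment.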

@Crissov
Contributor

Crissov commented May 22, 2022

In HTML, <pre> is for preformatted content (i.e. special treatment of whitespace), but it can still contain other elements; <code> isn’t restricted like that at all. So why should it be in Markdown?

@mhansen

mhansen commented Nov 26, 2022

Hi there, we just noticed our Graphviz docs (hugo/goldmark-based) have smart quotes, breaking copy/paste (Graphviz docs issue), where we're using <code> blocks instead of backticks so that we can link sections inside the code block:

<code>[mode](/docs/attrs/mode/)="hier"</code>

Shows these smart quotes, unintentionally:

image

We seem to be stuck between two bad places: we rely on some Markdown processing inside our code tag (to generate the <a> tag), but don't want the smart-quotes processing. We're caught between disabling links inside our code sections, disabling the smart-quotes feature entirely (even outside of code blocks), or trying to use two code sections (but that causes excessive padding issues). None of these options are appetising! Could anyone suggest another workaround that lets us keep intra-code-block links?

@jgm
Member

jgm commented Nov 26, 2022

@mhansen you both want and don't want markdown processing inside the code span.
I'd suggest a different approach. A simple one would be

<code>[mode](/docs/attrs/mode/)</code>`="hier"`

Or even better

[`mode`](/docs/attrs/mode/)`="hier"`

@mhansen

mhansen commented Nov 26, 2022

Thanks for the workaround. We'll need to remove the left and right padding and the rounded corners from our code blocks so that side-by-side blocks don't show extra padding and rounded corners, like the bottom example here:

image

We can probably get away without the padding and rounded corners, though it won't look as nice. We might end up doing that:

image

FWIW, I'd be very happy to give up all Markdown parsing in <code> blocks; it's very easy to write out <code><a href="...">foo</a>=bar</code> but, I think, tougher to make two side-by-side code blocks look nice in CSS.
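
For illustration, writing the anchors out could even be scripted. The Python sketch below turns Markdown-style links inside a code fragment into explicit `<a>` tags and HTML-escapes everything else. The helper name `links_to_anchors` and the deliberately simple link pattern are assumptions for this example, and the result only stays untouched if the Markdown processor leaves the content of `<code>` alone, which is not the current CommonMark behavior.

```python
import html
import re

# Deliberately simple pattern for [text](url); an assumption for this sketch.
# It does not handle nested brackets, link titles, or URLs containing spaces.
_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def links_to_anchors(code_text: str) -> str:
    """Wrap code_text in <code>, converting [text](url) to <a href=...>."""
    parts = []
    pos = 0
    for match in _LINK.finditer(code_text):
        # Escape the literal text before the link.
        parts.append(html.escape(code_text[pos:match.start()]))
        text, url = match.group(1), match.group(2)
        parts.append(f'<a href="{html.escape(url)}">{html.escape(text)}</a>')
        pos = match.end()
    # Escape whatever follows the last link.
    parts.append(html.escape(code_text[pos:]))
    return "<code>" + "".join(parts) + "</code>"


print(links_to_anchors('[mode](/docs/attrs/mode/)="hier"'))
# <code><a href="/docs/attrs/mode/">mode</a>=&quot;hier&quot;</code>
```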
