Leave the content of `<code>` and related HTML elements untouched by CommonMark parser #711

kaushalmodi · 2022-05-22T18:33:49Z

Hello,

Recently I discovered that CommonMark allows Markdown parsing within <code> blocks!

So if a user had something like <code>**bold**</code> in their Markdown content (which is analogous to Markdown `` `**bold**` ``), the CommonMark parser would parse those asterisks in there.

This can be reproduced at least with this CommonMark dingus

Can the spec be updated so that the content inside <code> (and also <kbd>, <samp> and <var>) is also treated verbatim, as is done for the <pre> element?

<code> element

displays its contents styled in a fashion intended to indicate that the text is a short fragment of computer code

Now, if that short fragment happens to have Markdown markup characters, we don't want a Markdown parser to render those as Markdown!

<kbd> element

represents a span of inline text denoting textual user input from a keyboard, voice input, or any other text entry device

The textual user input on the keyboard can contain asterisks, underscores and square brackets too. We wouldn't want a Markdown parser to interpret those!

<samp> element

used to enclose inline text which represents sample (or quoted) output from a computer program.

What if the computer program is outputting Markdown text? We want this element to show exactly what the program's output was; we wouldn't want a Markdown parser to corrupt the sample output that the user is trying to preserve in a <samp> element.

<var> element

represents the name of a variable in a mathematical expression or a programming context

Mathematical expressions easily contain asterisks. We don't want the Markdown parsers to touch these elements either!

/cc: @jmooring

wooorm · 2022-05-22T18:46:12Z

Hmm, if you want code, use markdown code: backticks.
I’ve seen folks depend on these “bugs” (imo features) so I don’t think it should be changed

kaushalmodi · 2022-05-22T18:51:46Z

> use markdown code: backticks.

I could have used the backticks, but I needed to handle cases where I can add my custom HTML annotations to the inline code blocks.

Here's a watered down example:

In Nim, <code class="inline-src language-nim" data-lang="nim">echo "hello"</code> will print
_hello_.

Another use case is syntax highlighting using classes in code tags. Here's a screenshot of what I mean

(note the inline colored code in there)

Right now, I am doing this by bypassing the CommonMark parser.

kaushalmodi · 2022-05-22T18:55:07Z

@wooorm

> seen folks depend on these “bugs”

Can you point to a use case where people would want a Markdown parser to render stuff inside <code> blocks?

wooorm · 2022-05-22T19:09:09Z

To add links to code for example

jgm · 2022-05-22T19:13:04Z

This is the way Markdown has traditionally done it, starting with Markdown.pl.

https://babelmark.github.io/?text=%3Ccode%3E%0A**a**%0A%3C%2Fcode%3E%0A

kaushalmodi · 2022-05-22T19:14:27Z

@wooorm I don't follow. The code element is for inline code. The current CommonMark behavior is inconsistent between block code (<pre> elements) and inline code (<code> elements). If the user meant to have Markdown links, they might as well just put those outside the HTML code elements.

kaushalmodi · 2022-05-22T19:16:28Z

@jgm

> This is the way Markdown has always done it:

I understand, but maybe it's our opportunity to fix that? I cannot find any way in Markdown to write inline code with HTML attributes, like so:

In Nim, <code class="inline-src language-nim" data-lang="nim">echo "hello"</code> will print
_hello_.

jgm · 2022-05-22T19:18:50Z

A strong degree of compatibility with existing implementations was a design goal.

I do see why this prevents you from doing what you want to do here. It's not really a problem in pandoc, for example, where you can just do

`echo "hello"`{.inline-src .language-nim lang=nim}

or use the raw attribute

`<code class="inline-src language-nim" data-lang="nim">echo "hello"</code>`{=html}

But there does seem to be an expressive gap here in core commonmark.
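
One way to narrow that gap while staying within core CommonMark might be to generate the raw `<code>` element with its content written as numeric character references, which the spec treats as literal characters rather than as markup (for example, `&#42;foo&#42;` renders as `*foo*`). The sketch below is only an illustration: `inline_code_html` is a hypothetical helper, not part of any library, and whether extensions such as smart-quote processing also leave character references alone is an assumption that would need checking per implementation.

```python
import html
import string

# Every ASCII punctuation character is escaped. This is deliberately blunt:
# it covers *, _, [, ], `, <, &, quotes and the rest without a special list.
_PUNCTUATION = set(string.punctuation)


def inline_code_html(code, attrs=None):
    """Return a raw <code> element whose content Markdown should not parse.

    Hypothetical helper: each ASCII punctuation character in `code` becomes
    a numeric character reference such as &#42; for an asterisk.
    """
    body = "".join(
        f"&#{ord(ch)};" if ch in _PUNCTUATION else ch for ch in code
    )
    # Attribute values are HTML-escaped; attribute names are used as given.
    attr_text = "".join(
        f' {name}="{html.escape(value, quote=True)}"'
        for name, value in (attrs or {}).items()
    )
    return f"<code{attr_text}>{body}</code>"


print(inline_code_html(
    'echo "hello"',
    {"class": "inline-src language-nim", "data-lang": "nim"},
))
```

The generated Markdown is harder to read by hand, so this mainly suits cases where the Markdown is itself produced by a tool.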

wooorm · 2022-05-22T19:31:08Z

> they might as well just put those outside the HTML code elements.

With the current state, you can mark certain parts of the code as important, or link them. With your proposal, you can’t.

kaushalmodi · 2022-05-22T20:07:34Z

@jgm Unfortunately I am not using pandoc. I am using the Go CommonMark parser called Goldmark, through Hugo (a static site generator).

@wooorm

> With the current state, you can mark certain parts of the code as important, or link them. With your proposal, you can’t.

Note that this will affect only inline code. You cannot do those anyway in fenced code blocks or <pre> blocks. If anything, this will bring consistency between inline code and code blocks.

Again, an example of this in the wild would be useful.

wooorm · 2022-05-22T20:12:06Z

Bringing complete consistency is impossible: HTML in markdown is a black box. It “sniffs” things that look like XML and switches to a different state based on it starting with <style or so. It’s not an actual complete parser. Some consistency could be added but as mentioned above the compatibility is important.

Some examples: <code>c + *y*</code>, to emphasise y, and <code>[myId](#some-href) := someProduction</code>
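
To make the “sniffing” concrete, here is a rough sketch of the first HTML-block start condition as worded in CommonMark 0.30. It is not the code of any real parser; the function name is hypothetical, and the spec's other start conditions are left out. It only shows that the check is a prefix match on the line, and that `<code` is not among the names that switch to the verbatim state.

```python
import re

# Start condition 1 for HTML blocks (CommonMark 0.30): the line begins with
# <pre, <script, <style, or <textarea (case-insensitive), followed by a
# space, a tab, ">", or the end of the line. Up to three spaces of
# indentation are allowed before the tag.
_VERBATIM_START = re.compile(
    r"^ {0,3}<(?:pre|script|style|textarea)(?:[ \t>]|$)",
    re.IGNORECASE,
)


def starts_verbatim_html_block(line):
    """True if `line` would open a block whose content is kept verbatim."""
    return _VERBATIM_START.match(line.rstrip("\n")) is not None


for sample in ["<pre>", "<style", '<PRE class="x">', "<code>", "<pretty>"]:
    print(f"{sample!r}: {starts_verbatim_html_block(sample)}")
```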

Crissov · 2022-05-22T20:14:21Z

In HTML, <pre> is for preformatted content (i.e. special treatment of whitespace), but it can still contain other elements; <code> isn’t restricted like that at all. So why should it be in Markdown?

mhansen · 2022-11-26T05:26:13Z

Hi there, we just noticed our Graphviz docs (hugo/goldmark-based) have smart quotes, breaking copy/paste (Graphviz docs issue), where we're using <code> blocks instead of backticks so that we can link sections inside the code block:

<code>[mode](/docs/attrs/mode/)="hier"</code>

Shows these smart quotes, unintentionally:

We seem to be stuck between two bad places: we rely on some Markdown processing inside our code tag (to generate the <a> tag), but we don't want the smart-quotes processing. We're caught between disabling links inside our code sections, disabling the smart-quotes feature entirely (even outside of code blocks), or trying to use two code sections (but that causes excessive padding issues). None of these options are appetising! Could anyone suggest another workaround that lets us keep intra-code-block links?

jgm · 2022-11-26T05:34:57Z

@mhansen you both want and don't want markdown processing inside the code span.
I'd suggest a different approach. A simple one would be

<code>[mode](/docs/attrs/mode/)</code>`="hier"`

Or even better

[`mode`](/docs/attrs/mode/)`="hier"`

mhansen · 2022-11-26T05:47:34Z

Thanks for the workaround. We'll need to remove the left & right padding and the rounded corners from our code blocks so that two adjacent blocks don't show extra padding and rounded corners, like the bottom here:

We can probably get away without the padding and rounded corners, though it won't look as nice; we might end up doing that:

FWIW, I'd be very happy to give up all markdown parsing in <code> blocks; it's very easy to write out the <code><a href="...">foo</a>=bar</code> but, I think, tougher to make two side-by-side code blocks look nice in CSS.
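
Another option that would keep a single `<code>` element with its link might be to post-process the rendered HTML and straighten the quotes only inside `<code>`. The following is a minimal sketch under assumptions: `straighten_quotes_in_code` is a hypothetical function, it assumes `<code>` elements are not nested, and it assumes the curly quotes arrive either as literal characters or as the named entities `&ldquo;`, `&rdquo;`, `&lsquo;`, `&rsquo;`. How it would be hooked into a Hugo build is not covered.

```python
import re

# Curly quotes as literal characters and as named entities; which form
# appears depends on the renderer, so both are handled.
_DOUBLE = re.compile(r"“|”|&ldquo;|&rdquo;")
_SINGLE = re.compile(r"‘|’|&lsquo;|&rsquo;")
# A <code> element: opening tag, body, closing tag (assumed not nested).
_CODE = re.compile(r"(<code\b[^>]*>)(.*?)(</code>)", re.DOTALL)


def straighten_quotes_in_code(rendered_html):
    """Replace curly quotes with straight ones, but only inside <code>."""

    def fix(match):
        opening, body, closing = match.groups()
        body = _DOUBLE.sub("&quot;", body)
        body = _SINGLE.sub("&#39;", body)
        return opening + body + closing

    return _CODE.sub(fix, rendered_html)


page = (
    '<p>“Hi” <code><a href="/docs/attrs/mode/">mode</a>'
    "=&ldquo;hier&rdquo;</code></p>"
)
print(straighten_quotes_in_code(page))
```

Quotes outside `<code>` keep their typographic form, and the link inside the element is untouched because only curly quotes are rewritten.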

kaushalmodi mentioned this issue May 22, 2022

typographer extension should ignore code element content yuin/goldmark#308

Closed

kaushalmodi mentioned this issue May 22, 2022

straight quotes rendered as curved quotes in inline src blocks kaushalmodi/ox-hugo#640

Closed
