Character references in autolinks #727

xiaq · 2022-11-05T11:24:10Z

The spec doesn't specify whether character references are supported inside autolinks. The following Markdown:

<aa:&#65;>

is rendered as the following by cmark:

<p><a href="aa:A">aa:A</a></p>

but as the following by commonmark.js:

<p><a href="aa:&amp;#65;">aa:&amp;#65;</a></p>
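To make the difference concrete, here is a minimal Python sketch that reproduces both outputs. It is not how cmark or commonmark.js are implemented. `render_autolink` and its `decode_references` flag are hypothetical names, the pattern is a simplified reading of the URI autolink grammar in the 0.30 spec, and `html.unescape` follows HTML5 rules, which are looser than CommonMark's (for example, it accepts some references without a trailing semicolon). Percent-encoding of the `href` is left out.

```python
import html
import re

# Simplified URI autolink: a scheme of 2-32 characters, a colon, then any
# characters except ASCII control characters, space, "<" and ">".
AUTOLINK = re.compile(r"<([A-Za-z][A-Za-z0-9+.\-]{1,31}:[^\x00-\x20<>\x7f]*)>")


def render_autolink(source: str, decode_references: bool) -> str:
    """Render a URI autolink, optionally decoding character references first."""
    match = AUTOLINK.fullmatch(source)
    if match is None:
        raise ValueError("not a URI autolink")
    target = match.group(1)
    if decode_references:
        # Assumption: HTML5 decoding stands in for CommonMark's reference rules.
        target = html.unescape(target)
    escaped = html.escape(target, quote=True)
    return f'<p><a href="{escaped}">{escaped}</a></p>'


if __name__ == "__main__":
    print(render_autolink("<aa:&#65;>", decode_references=True))
    # <p><a href="aa:A">aa:A</a></p>
    print(render_autolink("<aa:&#65;>", decode_references=False))
    # <p><a href="aa:&amp;#65;">aa:&amp;#65;</a></p>
```

With decoding on, the result matches the cmark output above. With decoding off, it matches the commonmark.js output.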

xiaq · 2022-11-05T11:28:12Z

Ah, I filed an issue about exactly the same problem in commonmark/commonmark.js#263. So it seems that the intention is to support character references inside autolinks.

Maybe we can add an example to the spec with a character reference in an autolink?

wooorm · 2022-11-05T11:43:08Z

I’m pretty strongly in the camp that character references should not work in autolinks.
Except for this, they work in the same spaces where (backslash) character escapes work.
Character escapes are in the same (preliminaries) section in the spec, and they have an example: https://spec.commonmark.org/0.30/#example-20.

I don’t think there should be one edge case where backslashes don’t work but character references do?

jgm · 2022-11-06T00:07:26Z

I think the motivation was that autolinks can be URLs that you just copy from some other source, and these might contain character references.

wooorm · 2022-11-06T00:12:19Z

I’m not sure about that reasoning: they might as well be fine unicode, particularly when coming from an address bar. I could see problems with double decoding.
But, most important for me: it has to be consistent with character escapes.

wooorm · 2022-11-06T00:16:02Z

On motivation: do you mean cmark is more in line with your motivation? That the absence in cmjs was because it was forgotten? That no test for it in the spec was intended? What do you think about the test on character escapes but no test of character references?

jgm · 2022-11-06T00:26:05Z

Yes, in the linked issue, I said I thought that cmark was getting it right.
It could be worth adding a spec example for this.

jgm · 2022-11-06T00:27:11Z

I see why it would be nice if entities got resolved in exactly the places backslash escapes do -- but again, this is motivated by a desire to support URL copy-pasting.

wooorm · 2022-11-06T10:22:10Z

Consistency with character escapes is most important to me.
If the character escapes are allowed too I am open to it. I still see a lot of inconsistency for character references in Babelmark (so good to specify whatever the choice is).
Here’s a test case of several normal cases and edge cases:

a <https://example&period;com>

b <https:&sol;&sol;example.com>

c <https&colon;//example.com>

d <&#104;ttps://example.com>

e <some&period;user@example.com>

f <some.user@example&period;com>

Note that C and D are not allowed per CommonMark as the protocol (part before and including :) does not allow &, ;, #.
And that E and F are not allowed per CM because neither the part before @ (ASCII atext) nor after (domain) allow ;.
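The claim about which cases even qualify as autolinks can be checked with a small Python sketch. The two patterns are transcribed from the URI and email autolink definitions in the 0.30 spec. `classify` is a hypothetical helper, and the inputs are treated as literal text with no character references decoded. Case e is assumed to be the address with `&period;` in the part before the @.

```python
import re

# URI autolink body: scheme (2-32 chars), colon, then no control chars, space, "<", ">".
URI = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]{1,31}:[^\x00-\x20<>\x7f]*")

# Email autolink body, following the regex given in the CommonMark spec.
EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@"
    r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def classify(source: str) -> str:
    """Return 'uri', 'email', or 'none' for text of the form <...>."""
    if not (source.startswith("<") and source.endswith(">")):
        return "none"
    body = source[1:-1]
    if URI.fullmatch(body):
        return "uri"
    if EMAIL.fullmatch(body):
        return "email"
    return "none"


CASES = {
    "a": "<https://example&period;com>",
    "b": "<https:&sol;&sol;example.com>",
    "c": "<https&colon;//example.com>",
    "d": "<&#104;ttps://example.com>",
    "e": "<some&period;user@example.com>",  # assumed form of case e
    "f": "<some.user@example&period;com>",
}

if __name__ == "__main__":
    for label, source in CASES.items():
        print(label, classify(source))
    # a uri, b uri, c none, d none, e none, f none
```

Under these patterns only a and b are autolinks before any decoding, so they are the only cases where the question of decoding character references inside an autolink arises at all.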

xiaq · 2022-11-06T12:51:43Z

@jgm IMO there is an equally valid argument against character reference if we are talking about copy-pasting: one could also copy-paste from a place that doesn't interpret character references, like the browser's URL bar, or a displayed webpage (as opposed to the HTML source).

jgm · 2022-11-06T17:21:58Z

@xiaq - granted.

jgm · 2022-11-06T17:47:48Z

Granting that there are these two possible sources for copy/paste, I think my reasoning was that if a valid character reference occurs in a copied URL, it's by far likeliest that its source is raw HTML rather than the browser's URL bar or a displayed web page. How often does one want to display something like &amp; in a URL?

wooorm · 2022-12-26T07:52:57Z

I mostly care about consistency, so then I’d also ask: how often does one want to display something like \?, where ? is any ASCII punctuation. If it’s consistent: I’m fine with it.

But thinking some more about this, while the motivation of “allow copy/paste” is a good one, to get there I believe we should then also allow unicode letters/punctuation in email atext, and unicode letters + at likely & + \ in email domains?
