IDNA #53

annevk · 2015-07-30T14:53:01Z

This issue tracks faults in http://www.unicode.org/reports/tr46/ since Unicode doesn't really do that well. If you find an issue, use http://www.unicode.org/reporting.html to report it and then report back here.

SimonSapin · 2015-07-30T14:54:29Z

I’ve just submitted the following to http://www.unicode.org/reporting.html. I’ll update this when I get a response.

Subject: xn-- prefix never added in UTS # 46

In http://www.unicode.org/reports/tr46/tr46-15.html#ProcessingStepConvertValidate , the algorithm looks for an xn-- prefix and, when it is present, decodes the rest of the label per Punycode.

In http://www.unicode.org/reports/tr46/tr46-15.html#ToASCII however, the xn-- prefix is never added:

Convert each label with non-ASCII characters into Punycode [RFC3492]. This may record an error.

This should probably be replaced with something like:

For each label with non-ASCII characters, replace the label with “xn--” followed by the encoding of the label according to Punycode [RFC3492]. This may record an error.
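To illustrate the proposed wording, here is a minimal Python sketch. The function name `to_ascii_label` is hypothetical, and it covers only this single step (no mapping, normalization, or validation). It relies on Python's built-in `punycode` codec, which implements RFC 3492 and does not add the prefix itself.

```python
def to_ascii_label(label: str) -> str:
    """Encode a single label; ASCII-only labels pass through unchanged."""
    if label.isascii():
        return label
    # The "punycode" codec only performs the RFC 3492 encoding,
    # so the "xn--" prefix has to be prepended explicitly.
    return "xn--" + label.encode("punycode").decode("ascii")


print(to_ascii_label("bücher"))   # xn--bcher-kva
print(to_ascii_label("example"))  # example
```

Without the explicit prefix, the output for `bücher` would be `bcher-kva`, which the Convert/Validate step would not recognize as a Punycode label.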

Sebmaster · 2015-08-16T22:06:51Z

My report to Unicode from some time ago which seems to not be fixed yet:

The Format section (8.1) under Conformance Testing in UTS46 is confusing.

The description of toASCII and toUnicode there says to use the provided processing_option for toUnicode and to always use nontransitional for toASCII.
However, the implementation section for toUnicode (4.3) says to always call the processing step with nontransitional, while the toASCII parameter list does provide a processing_option.

It looks to me as if the descriptions for toASCII and toUnicode in the conformance testing section got mixed up. This also applies to the descriptions in the header of IdnaTest.txt.

The other thing is that there's only a single IdnaTest file, but there's no explanation of which algorithm it applies to. Is it for IDNA2008, IDNA2003 or UTS46? It seems to be categorized according to Unicode standard instead of IDNA reference, which makes this really confusing. Haven't reported that one yet though.

SimonSapin · 2015-08-17T12:48:30Z

@Sebmaster regarding the other thing, http://www.unicode.org/reports/tr46/#Conformance_Testing explains how "To test for conformance to UTS46" using IdnaTest.txt.

Sebmaster · 2015-08-17T16:09:16Z

@SimonSapin I'm not sure that's totally correct either since:

Bn for Bidi Rule #n from Section 2. The Bidi Rule, in Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA) [IDNA2008]
Cn for ContextJ tests in, Appendix A.n in The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) [IDNA2008]. Thus C1 = Appendix A.1. ZERO WIDTH NON-JOINER, and C2 = Appendix A.2. ZERO WIDTH JOINER. The CONTEXTO tests are optional for client software, and not tested here.

is not described in TR46 at all. It's imported from the IDNA2008 standard, which has no relevance in the TR46 spec... I think 😕

Sebmaster · 2015-08-18T23:35:21Z

Got a mail today from Unicode (regarding conformance test description):

This was discussed at the UTC meeting in July, and has been forwarded to the author of the UTS for consideration in a subsequent version.

So that's pretty sweet.

jcranmer · 2015-09-15T04:00:30Z

Oh yeah, I came back into this and recall that the IdnaTest.txt is really bad at telling you how to process it.

@Sebmaster:
The ToASCII column uses nontransitional processing (read IdnaTest.txt's commented header) and UseSTD3ASCIIRules=true (see §8 of the input). However, they definitely appear to have some extra rules not described in their algorithm (for example, ToUnicode should never produce an [A4_1] or [A4_2] error, since those are specific to the ToASCII regime and ToUnicode never calls ToASCII, yet you can clearly see for yourself that they do).

SimonSapin · 2015-11-01T04:47:49Z

I got a response to #53 (comment):

This has been added to the feedback document for next week's meeting.

SimonSapin · 2015-11-26T00:43:34Z

… and today:

I was directed by the UTC to let you know that this has been sent to the editor for review during the next update cycle.

valenting · 2016-02-08T19:20:15Z

As per servo/rust-url#160 I submitted feedback regarding Validation rule no. 2 - "2. The label must not contain a U+002D HYPHEN-MINUS character in both the third and fourth positions."
This isn't enforced by all UAs: YouTube uses domains such as https://r3---sn-2gb7ln7k.googlevideo.com/videoplayback?... and this domain breaks that rule.
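As a rough illustration of the rule quoted above, the following Python sketch checks each label of that host. The helper name `violates_hyphen_rule` is made up for this example, and it tests only this one criterion, not the rest of the UTS 46 validity checks.

```python
def violates_hyphen_rule(label: str) -> bool:
    """True if the label has "-" in both the third and fourth positions."""
    # Positions are 1-based in the rule, so they map to indices 2 and 3.
    return label[2:4] == "--"


host = "r3---sn-2gb7ln7k.googlevideo.com"
for label in host.split("."):
    print(label, violates_hyphen_rule(label))
```

Only the first label, `r3---sn-2gb7ln7k`, trips the check, which is enough for a strict implementation to reject the whole host.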

srl295 · 2016-05-12T18:24:34Z

@valenting Your feedback is tracked as part of PRI317 http://www.unicode.org/review/pri317/ (being discussed now).

By the way @SimonSapin I'd think the right way to track is via UTC agenda items http://www.unicode.org/L2/L-curdoc.htm

Sebmaster · 2016-11-20T04:18:49Z

It seems like Unicode has closed that ticket without removing the -- validity requirement 😞

Does anybody have the ability to look into the Unicode ... process to see what's going on there?

annevk · 2016-11-21T08:54:47Z

https://lists.w3.org/Archives/Public/www-archive/2016Nov/0001.html

domenic · 2017-01-06T14:39:00Z

The validation rule problem mentioned at #53 (comment) doesn't seem to have made it into https://docs.google.com/document/d/11PEww2N0PbXyPhbsCdW_PjD3BNgZMy5XHUv02SSXNqY/edit#heading=h.p7mmdt3ofe3 by my reading. What's the latest? It's item A, nevermind

annevk · 2017-02-13T09:38:41Z

Going forward, rather than tracking all UTS 46 feedback here, I suggest we just create new issues against this repository, so we can discuss each problem in isolation. I created an idna label that we can use to group them all.

TimothyGu · 2017-05-11T08:24:59Z

As an update to the original issue, it seems the proposed changes to UTS#46 have been incorporated into its latest draft: http://www.unicode.org/reports/tr46/proposed.html.

Since traditionally UTS#46 updates are synced with Unicode Standard updates, a new version of UTS#46 with the CheckHyphens hook should be published next month, when Unicode 10.0.0 is scheduled to be released as well.
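A minimal Python sketch of how such a hook could gate the hyphen checks, assuming (as I read the proposed draft) that CheckHyphens covers both the third-and-fourth-position rule and the rule against leading or trailing hyphens. The function and parameter names are hypothetical and not taken from any library.

```python
def label_hyphens_ok(label: str, check_hyphens: bool) -> bool:
    """Apply the hyphen-related validity checks only when the flag is set."""
    if not check_hyphens:
        # With the flag off, hyphen placement is never a reason to reject.
        return True
    if label[2:4] == "--":
        return False
    return not (label.startswith("-") or label.endswith("-"))


label = "r3---sn-2gb7ln7k"
print(label_hyphens_ok(label, check_hyphens=True))   # False
print(label_hyphens_ok(label, check_hyphens=False))  # True
```

A caller that needs to accept hosts like the googlevideo.com one would pass `check_hyphens=False` and leave the other checks untouched.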

annevk · 2017-05-11T09:42:18Z

Yeah, but it's unclear if that's what we want as I commented in the document @domenic pointed to and discussed in email (primarily www-archive) and also at #267. However, it's been very hard to get implementers to give feedback on exactly what they're willing to try out here and what makes sense.

domenic · 2017-05-11T16:32:48Z

I think we should go for a quick-fix first so that people who are trying to use spec-compliant libraries like Node.js's URL and jsdom/whatwg-url don't continue to suffer. We can use #267 to figure out a longer-term browser-compatible plan.

annevk · 2017-05-11T16:46:37Z

This is the generic IDNA issue. It's not for the hyphen case specifically.
Is there actual suffering? Demonstrated compatibility issues would be really valuable.

domenic · 2017-05-11T17:00:46Z

Sure, I meant we should go for a quick-fix for the hyphens.

Examples of suffering:

Bug in parsing URLs jsdom/whatwg-url#50 (notice it cascading to cause problems in Google Lighthouse)
WHATWG URL throw parse error for valid domains nodejs/node#12965 (causing Node.js to contemplate deviating from the spec in url: ignore IDN errors when domainname have two hyphens nodejs/node#12966)

annevk · 2017-05-24T10:52:52Z

FWIW, I changed my mind after seeing #309 (comment). I think what UTS46 revision 18 defines is reasonable and that's what we should go with.

annevk · 2017-05-24T10:57:50Z

@Sebmaster did your testing issue ever get addressed? If not, could you file a new issue on that? I'm happy to help investigate that as I've made some attempts myself as well now.

Fixes #53 and fixes #267 by no longer breaking on hyphens in the 3rd and 4th position of a domain label. Rejecting such labels is known to break YouTube: r3---sn-2gb7ln7k.googlevideo.com. This is done by setting the proposed CheckHyphens flag to false. Fixes #110 by clarifying that BIDI and CONTEXTJ checks are to be done by setting the proposed CheckBidi and CheckJoiners flags to true. Follow-up #313 is filed to remove the proposed bits once Unicode is updated.

Tests: web-platform-tests/wpt#5976.

Sebmaster · 2017-05-26T00:28:57Z

It's not addressed yet, but the latest draft contains a TODO for it.

domenic · 2017-06-17T22:06:01Z

I reported the following editorial issue:

I help maintain a JavaScript library for implementing UTS 46. In the process of revising our public API for the upcoming proposed revision (http://www.unicode.org/reports/tr46/proposed.html#ToASCII), we noticed how strange it is that all other inputs to ToASCII, besides the input string, are booleans, whereas processing_option is an enumeration with two values.

For editorial consistency, would it make sense to switch the processing option to a boolean flag, e.g. UseTransitionalProcessing?
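To make the inconsistency concrete, here is a hedged Python sketch of the two option shapes. The class and field names are hypothetical and only mirror the flags mentioned in this thread (CheckHyphens, CheckBidi, CheckJoiners, UseSTD3ASCIIRules, processing_option); this is not the API of any existing library.

```python
from dataclasses import dataclass
from enum import Enum


class ProcessingOption(Enum):
    TRANSITIONAL = "transitional"
    NONTRANSITIONAL = "nontransitional"


@dataclass(frozen=True)
class OptionsWithEnum:
    """Four boolean flags plus one two-valued enumeration."""
    check_hyphens: bool
    check_bidi: bool
    check_joiners: bool
    use_std3_ascii_rules: bool
    processing_option: ProcessingOption


@dataclass(frozen=True)
class OptionsAllBooleans:
    """The same information with the enumeration folded into a boolean."""
    check_hyphens: bool
    check_bidi: bool
    check_joiners: bool
    use_std3_ascii_rules: bool
    use_transitional_processing: bool


def to_all_booleans(o: OptionsWithEnum) -> OptionsAllBooleans:
    # A two-valued enumeration carries exactly one bit of information.
    return OptionsAllBooleans(
        check_hyphens=o.check_hyphens,
        check_bidi=o.check_bidi,
        check_joiners=o.check_joiners,
        use_std3_ascii_rules=o.use_std3_ascii_rules,
        use_transitional_processing=(
            o.processing_option is ProcessingOption.TRANSITIONAL
        ),
    )


# Illustrative values only; use_std3_ascii_rules is an arbitrary choice here.
example = OptionsWithEnum(False, True, True, False,
                          ProcessingOption.NONTRANSITIONAL)
print(to_all_booleans(example))
```

Since the conversion is lossless in both directions, the choice between the two shapes is purely editorial, which is the point of the report.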

annevk · 2017-06-22T17:43:20Z

@Sebmaster it seems http://www.unicode.org/reports/tr46/#Conformance_Testing was updated.

@domenic I asked for that change too. Note that if we want to track it here it should become its own issue. This is no longer a meta issue for all things IDNA as it got too unwieldy.

domenic · 2017-06-22T17:48:36Z

No need to really track it. I do think though having a public archive of feedback we've submitted is good, and my bad for my part in derailing the thread away from that. Maybe www-archive is OK?

annevk · 2017-06-22T17:52:30Z

Yeah or just a new issue for each piece of feedback. I don't think that would get crowded and if it does we can figure out a better approach.

SimonSapin mentioned this issue Jul 30, 2015

IDNA support servo/rust-url#119

Merged

SimonSapin mentioned this issue Jan 23, 2016

Consider ignoring UTS46 validity criteria V2 servo/rust-url#160

Closed

domenic mentioned this issue Apr 15, 2016

Bug in parsing URLs jsdom/whatwg-url#50

Closed

annevk mentioned this issue Jul 4, 2016

Figure out what to do with youtube IDNA issues #131

Closed

annevk added the topic: parser label Dec 20, 2016

bagder mentioned this issue Jan 30, 2017

IDNA2008 #223

Closed

annevk added the topic: idna label Feb 10, 2017

TimothyGu mentioned this issue May 11, 2017

WHATWG URL throw parse error for valid domains nodejs/node#12965

Closed

annevk added a commit that referenced this issue May 12, 2017

Clearly indicate a known issue with ToASCII

0ac537b

As reported at #53 (comment) this is causing issues in non-browser implementations.

annevk mentioned this issue May 12, 2017

Address several IDNA issues #309

Merged

annevk added a commit to web-platform-tests/wpt that referenced this issue May 18, 2017

URL: ToASCII

3a97ad9

Tests for whatwg/url#53 and friends.

annevk mentioned this issue May 18, 2017

URL: ToASCII web-platform-tests/wpt#5976

Merged

annevk added a commit to web-platform-tests/wpt that referenced this issue May 19, 2017

URL: ToASCII

192384e

Tests for whatwg/url#53 and friends.

domenic closed this as completed in dc9d831 Jun 1, 2017

domenic pushed a commit to web-platform-tests/wpt that referenced this issue Jun 1, 2017

URL: ToASCII

ded8ffe

Tests for whatwg/url#53 and friends, as fixed by whatwg/html#2627.

domenic mentioned this issue Jun 17, 2017

Update to spec version 10.0.0 + Revamp API jsdom/tr46#11

Merged

patrickhulce mentioned this issue Aug 18, 2017

ServiceWorker registration and 200 while offline false negatives on punycoded URL GoogleChrome/lighthouse#3040

Closed

TimothyGu mentioned this issue May 6, 2021

x/net/idna: Display returns invalid label for r4---sn-a5uuxaxjvh-gpm6.googlevideo.com. golang/go#27059

Open

moredure mentioned this issue Feb 1, 2022

idna: do not check ace prefix slashes in case of acePrefix not present golang/net#125

Closed
