Variation allowed in canonicalizing language tags--necessary? #111

Closed
littledan opened this issue Oct 18, 2016 · 15 comments
Labels
c: locale Component: locale identifiers s: help wanted Status: help wanted; needs proposal champion

Comments

@littledan
Member

In tc39/test262#774, it was noticed that CanonicalizeLanguageTag makes canonicalization optional:

The specifications for extensions to BCP 47 language tags, such as RFC 6067, may include canonicalization rules for the extension subtag sequences they define that go beyond the canonicalization rules of RFC 5646 section 4.5. Implementations are allowed, but not required, to apply these additional rules.

I don't see the value added for making canonicalization optional; seems like things would be better either requiring or prohibiting canonicalization. Thoughts? cc @jungshik
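
To make the observable difference concrete, here is a minimal sketch using `Intl.getCanonicalLocales`. The example tags are illustrative and not taken from the test262 discussion. The second result depends on whether the engine applies the RFC 6067 rule that orders `-u-` keywords by key, so the sketch does not assume one particular output.

```javascript
// Canonicalization per RFC 5646 section 4.5 (for example case normalization)
// is mandatory, so this result should be the same in every implementation.
const [base] = Intl.getCanonicalLocales("EN-us");

// RFC 6067 additionally orders -u- keywords by key. Under the wording quoted
// above an engine may or may not apply that rule, so either the input order
// or the key-sorted order can come back.
const [withExtension] = Intl.getCanonicalLocales("en-u-nu-latn-ca-gregory");

console.log(base);
console.log(withExtension);
```

Comparing `withExtension` across engines is one way to see whether the optional rules are applied.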

@caridy
Contributor

caridy commented Oct 18, 2016

@littledan this piece precedes my time as the editor, maybe @NorbertLindenberg knows better.

@littledan changed the title from "Useless variation allowed in canonicalizing language tags" to "Variation allowed in canonicalizing language tags--necessary?" on Oct 18, 2016
@littledan
Member Author

For anyone else following along, the relevant part seems to be (and this is followed in the second link by a bunch of people saying LGTM)

Issue 2, CanonicalizeLanguageTag (6.2.3), raised by Allen:

The spec used to say (before February 23): "Implementations are allowed, but not required, to also canonicalize each extension subtag sequence within the tag according to the canonicalization specified by the standard registering the extension, such as RFC 6067 section 2.1.1."

Allen points out that the result is visible to ECMAScript code, and that this is the sort of situation where TC39 prefers to mandate a consistent result across all implementations.

Counterarguments to requiring extension subtag sequence canonicalization:

  1. New extensions are being defined that implementations may not know about (and have no need to know about).
  2. For the extension that this API cares about, the -u- extension, a comparison of language tags as complete strings isn't very useful because different functionality cares about different extension keys - Collator about -co- and a few others, NumberFormat about -nu-, and DateTimeFormat about -ca-. ResolveLocale picks out the extension keys that are relevant for its caller.
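
The second counterargument can be sketched as follows. This is a simplified illustration, not the spec's ResolveLocale algorithm: `unicodeKeywords`, `relevantKeywords`, and the `RELEVANT_KEYS` table are hypothetical names, the table lists only the keys named above, and the parser assumes a well-formed tag without attributes or private-use (`-x-`) sequences.

```javascript
// Hypothetical helper: collect the keywords of the -u- extension of a tag.
function unicodeKeywords(tag) {
  const subtags = tag.toLowerCase().split("-");
  const start = subtags.indexOf("u");
  const keywords = new Map();
  if (start === -1) return keywords;
  let key = null;
  for (const subtag of subtags.slice(start + 1)) {
    if (subtag.length === 1) break; // the next singleton ends the -u- sequence
    if (subtag.length === 2) {
      key = subtag; // two-character subtags are keys
      keywords.set(key, []);
    } else if (key !== null) {
      keywords.get(key).push(subtag); // longer subtags belong to the current key
    }
  }
  // Join multi-part type values back together, e.g. ["islamic", "civil"].
  return new Map([...keywords].map(([k, parts]) => [k, parts.join("-")]));
}

// Illustrative subset of the keys each service looks at (from the list above).
const RELEVANT_KEYS = {
  Collator: ["co"],
  NumberFormat: ["nu"],
  DateTimeFormat: ["ca"],
};

// Hypothetical helper: keep only the keywords a given service cares about.
function relevantKeywords(tag, service) {
  const all = unicodeKeywords(tag);
  const picked = {};
  for (const key of RELEVANT_KEYS[service]) {
    if (all.has(key)) picked[key] = all.get(key);
  }
  return picked;
}

console.log(relevantKeywords("de-u-co-phonebk-nu-latn-ca-gregory", "Collator"));
```

Each service sees only its own keys, which is why comparing full tags as strings says little about whether two locales behave the same for a given service.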

Note that canonicalization according to BCP 47 is mandatory; only the additional rules created by extension specifications are currently optional.

My proposed resolution: Clarify that the quoted statement is only about canonicalization rules that go beyond those of BCP 47; don't change the behavior. The new wording in the February 23 draft is:
"The specifications for extensions to BCP 47 language tags, such as RFC 6067, may include canonicalization rules for the extension subtag sequences they define that go beyond the canonicalization rules of RFC 5646 section 4.5. Implementations are allowed, but not required, to apply these additional rules."

Seems like this all ties in pretty closely with another question: is it a good thing to make BCP 47 extensions be conceptually "open world", that implementations may add more, and unknown extensions are treated as irrelevant? I raised this question on #113; I'd prefer that the standard explain exactly which BCP 47 extensions must be supported by ECMA 402 and allow only those, with an API like resolvedOptions providing a clear signal of whether the flag was implemented or not.

In the case of canonicalizing the calendar, this isn't a place where some implementations support it and others are unaware so can't be expected to canonicalize; it is explicitly required. So I don't see a big reason to allow variation here.
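
As a rough illustration of what mandatory calendar canonicalization could look like, here is a minimal sketch. The `CALENDAR_ALIASES` table and `canonicalizeCalendar` are hypothetical, and the two alias entries are my assumption about the CLDR calendar aliases rather than a complete or authoritative list.

```javascript
// Assumed alias entries (not exhaustive): legacy name -> preferred name.
const CALENDAR_ALIASES = new Map([
  ["islamicc", "islamic-civil"],
  ["ethiopic-amete-alem", "ethioaa"],
]);

// Hypothetical helper: lowercase the calendar type and replace known aliases.
function canonicalizeCalendar(type) {
  const lower = type.toLowerCase();
  return CALENDAR_ALIASES.has(lower) ? CALENDAR_ALIASES.get(lower) : lower;
}

console.log(canonicalizeCalendar("islamicc"));
console.log(canonicalizeCalendar("Gregory"));
```

With a fixed table like this, every implementation would report the same calendar name for the same input.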

cc @allenwb who originally raised this issue.

@littledan
Member Author

I'm wondering how useful the ability to have this variation is to implementers. Thanks to @jungshik, we implement it in V8, but maybe Mozilla makes another choice; what choices do other browsers prefer and why? @jswalden @zbraniecki @thetalecrafter @bterlson

@zbraniecki
Member

I believe we'd prefer to remove the variation here and have it strictly defined.

@jungshik

I would also like to remove the variation and define the behavior strictly.

@caridy
Contributor

caridy commented Oct 21, 2016

@littledan @zbraniecki do you think we need an actual proposal? or just a PR where we can discuss the details?

@allenwb
Member

allenwb commented Oct 22, 2016

See this comment: #110 (comment).

Perhaps the sort of tightened requirements discussed in this thread should be part of the HTML spec rather than imposing them upon all platforms that might want to host ECMA-402.

@littledan
Member Author

@caridy IMO a PR should be sufficient; this seems analogous to several other cases where ECMA 262 pulled in a needs-consensus PR. In this case, since the spectrum of options is represented by shipping browsers, we have implementation experience already.

@annevk
Member

annevk commented Oct 26, 2016

It sounds like implementations are not going to expose hooks for hosts. I think then the specification should not pretend such hooks exist either and just define how implementations should work and disallow the variability. Such fiction only makes the requirements harder to follow and doesn't end up adding value. (If indeed at some point implementations are planning on having such variability, we can reconsider.)

@zbraniecki
Member

What's the next step here? Does anyone want to champion this?

@littledan
Member Author

@zbraniecki I think you'd be a natural champion, with the compatibility information that you've already collected. (If no one else comes along, I could take up writing a PR, but it might not be for two or three months that I really have a chance.)

@sffc sffc added s: help wanted Status: help wanted; needs proposal champion c: locale Component: locale identifiers labels Mar 19, 2019
@anba
Contributor

anba commented Mar 16, 2020

This issue has been fixed with the move to refer to UTS 35 for all language tag canonicalisation steps.

@sffc
Contributor

sffc commented Mar 16, 2020

Makes sense; @jswalden can you verify that this issue is fixed?

(For the other issues that @anba pointed out were finished, I served as the verifier, but on this issue I'm not the best one to verify.)

@littledan
Member Author

I agree that we've fixed this issue with the UTS 35 reference.
