Incorrect canonical time zone name for Asia/Kolkata #1076

Closed
justingrant opened this issue Oct 29, 2020 · 14 comments

@justingrant
Collaborator

Temporal.TimeZone.from('Asia/Kolkata')
// => Asia/Calcutta

According to https://en.wikipedia.org/wiki/List_of_tz_database_time_zones, Asia/Calcutta is an alias and Asia/Kolkata is canonical.

@justingrant
Collaborator Author

@ptomato believes this may be a bug in Node's CLDR data. If anyone has insight into the root cause of this problem and/or how to fix it, feel free to pick up this issue!

@justingrant
Collaborator Author

The issues below seem to explain the problem well, but I haven't read through them closely enough to know what the correct next steps are. tc39/ecma402#272 seems like the most comprehensive one.

tc39/ecma402#272 (comment)
https://unicode-org.atlassian.net/browse/CLDR-9892
https://bugs.chromium.org/p/chromium/issues/detail?id=580195

cc @sffc

@ptomato
Collaborator

ptomato commented Oct 29, 2020

new Intl.DateTimeFormat('en', {timeZone: 'Asia/Kolkata'}).resolvedOptions().timeZone gives "Asia/Kolkata" in Firefox 79, and "Asia/Calcutta" in Node 12.16.1.

@justingrant
Collaborator Author

This definitely seems like something that should be consistent across browsers. @sffc, is this something you could look into on the Chromium and/or 402 side? See 402 and chromium bugs linked above.

@sffc
Collaborator

sffc commented Oct 30, 2020

@ryzokuken is assigned the 402 issue. It looks like it requires a fix in CLDR. The CLDR issue is not currently assigned to a milestone, which means someone needs to go in and pick it up.

@jungshik

CLDR has kept old names like Asia/Calcutta on purpose. So this is more than a fix: it requires a CLDR policy change on time zone ID 'stability'.

@justingrant
Collaborator Author

Is there a pointer to where this stability policy is documented and/or was discussed? Would be interesting to understand the pros and cons on either side of the decision.

FWIW, I assume that there will be increasing pressure from affected countries to fix this, so changes are probably going to happen at some point. Main question is probably when, not if.

@yumaoka

yumaoka commented Oct 30, 2020

I'm not sure why it requires a CLDR fix. In ICU, you can instantiate a TimeZone object with either Asia/Kolkata or Asia/Calcutta. When you create a TimeZone object with zone ID "Asia/Kolkata", TimeZone#getID() returns "Asia/Kolkata". "Asia/Kolkata" is mapped to "Asia/Calcutta" when you call TimeZone#getCanonicalID(), which returns the CLDR canonical ID. The CLDR canonical ID is used for resolving time zone display names in CLDR data.

FWIW, I assume that there will be increasing pressure from affected countries to fix this, so changes are probably going to happen at some point. Main question is probably when, not if.

As long as you accept "Asia/Kolkata" and return "Asia/Kolkata" as its ID, I don't think it's an issue.

@Ms2ger
Collaborator

Ms2ger commented Oct 30, 2020

Whatever the outcome, this needs to be handled in 402.

@Ms2ger Ms2ger closed this as completed Oct 30, 2020
@justingrant
Collaborator Author

justingrant commented Oct 30, 2020

For future readers: the root issue about which IDs should be canonical is tc39/ecma402#272. This issue should be focused on what Temporal is going to do (if anything) about time zone canonicalization issues.

As long as you accept "Asia/Kolkata" and return "Asia/Kolkata" as its ID, I don't think it's an issue.

FWIW, this is not the current Temporal behavior. I'm going to re-open this issue so we can decide how Temporal should act re: timezone ID canonicalization. The root cause (and perhaps the final solution too?) may be in 402, but regardless of the solution it also might require changes in Temporal's API and/or implementation as noted below. So it makes sense to keep a tracking issue on the Temporal side.

If the current canonical names stay as-is in the source data, then it seems like we (meaning Temporal and/or 402) have two choices:

  1. Maintain the name that the user supplied. Don't canonicalize it, or perhaps partially canonicalize (e.g. fix capitalization) but don't canonicalize from one alias to another.
  2. Keep canonicalizing the name when constructing new TimeZone instances. This is the current Temporal (and I assume 402 too?) behavior.

A problem with (1) is that it interferes with using TimeZone.prototype.id or TimeZone.prototype.toString() to compare time zones for equality. tz1.id === tz2.id could return false for the same time zone. The same issue exists when doing string comparisons of ZonedDateTime instances.

Another issue with (1): some Temporal operations are allowed vs. forbidden depending on matching time zones. For example, ZonedDateTime.prototype.until will throw when provided two instances in different time zones when the expected return value contains days or larger units. So (1) would make it harder for callers to anticipate when they'll run into cross-timezone exceptions. I don't think we want users to have to wrap every difference() call in a try/catch, nor do we want them using exceptions to determine whether two time zones are the same or not.

On the other hand, a challenge with (2) is that different JS environments have solved the "stale canonical ID" problem in different ways. Firefox, for example, canonicalizes to Asia/Kolkata while Chrome uses Asia/Calcutta. So you can end up with the exact same data serialized using different strings depending on which browser is used. And even if CLDR policy is updated to canonicalize the correct IDs, the same problem will happen when the next large global city decides to change its name.

We could make ID canonicalization a caller-facing option in TimeZone.from, but that seems like it will add one more variable to an already complex problem. Perhaps a clearer API would be a TimeZone.prototype.canonicalize() method to support use cases which depend on canonical names, or some other way to get the canonical ID for a particular TimeZone instance. Given that canonical IDs can change over time and across browsers, we'd want to document this carefully to avoid developers thinking that the canonical name makes an implicit guarantee about stability that might not really exist.

At this point I'd lean toward the course of action below in Temporal (which might require some help from 402):

  • Stop canonicalizing time zone IDs to other Continent/City aliases, but do canonicalize capitalization, spaces, etc.
  • We may also want to retain canonicalization of the Etc zones, since those are unlikely to change due to political changes.
  • UTC vs. Etc/UTC is another interesting case that @gibson042 ran across. Not sure if it should be canonicalized, and if so what should be the canonical representation.
  • Add a TimeZone.prototype.equals method that would compare the canonical IDs. We'd probably want a symmetrical method on Calendar too. Users who want to compare the actual IDs can still use tz.id === other.id.
  • All tests for time zone equality in Temporal polyfill and spec (like that difference case noted above) should use the canonical values
  • Consider a canonicalize() method, a canonicalId property, or a getCanonicalId() method on the Temporal.TimeZone prototype. My weakly-held opinion is that getCanonicalId() would be the best option because its shape implies the weakest guarantee about deterministic behavior over time and across environments.
  • I'm not sure whether Temporal.now should return the canonical ID or the one that the OS provides, if they're ever different. Maybe the latter?
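
A minimal sketch of the shape proposed in the list above, to make the trade-off concrete. Everything here is hypothetical: `SketchTimeZone`, `exampleCanonicalize`, and the hard-coded alias table are illustration only and not part of Temporal or ECMA-402. A real implementation would take its alias data from tzdata or CLDR. The sketch keeps the ID exactly as supplied (it does not attempt the capitalization cleanup mentioned above) and only consults canonical IDs inside `getCanonicalId()` and `equals()`.

```javascript
// Hypothetical alias table, for illustration only. Keys are lower-cased IDs,
// values are the ID this sketch treats as canonical.
const EXAMPLE_ALIASES = new Map([
  ['asia/calcutta', 'Asia/Kolkata'],
  ['asia/kolkata', 'Asia/Kolkata'],
  ['asia/ulan_bator', 'Asia/Ulaanbaatar'],
  ['asia/ulaanbaatar', 'Asia/Ulaanbaatar'],
]);

// Hypothetical canonicalizer: case-insensitive lookup in the table above.
function exampleCanonicalize(id) {
  const canonical = EXAMPLE_ALIASES.get(String(id).toLowerCase());
  if (canonical === undefined) {
    throw new RangeError(`Unknown time zone: ${id}`);
  }
  return canonical;
}

// Hypothetical stand-in for a time zone object that does NOT rewrite
// one Continent/City alias into another when it is constructed.
class SketchTimeZone {
  constructor(id, canonicalize = exampleCanonicalize) {
    canonicalize(id); // validate eagerly; throws RangeError for unknown IDs
    this.id = id; // kept exactly as the caller supplied it
    this._canonicalize = canonicalize;
  }

  // A method rather than a property, to signal that the result may differ
  // over time and across environments.
  getCanonicalId() {
    return this._canonicalize(this.id);
  }

  // Equality is decided on canonical IDs, not on the supplied strings.
  equals(other) {
    return this.getCanonicalId() === other.getCanonicalId();
  }

  toString() {
    return this.id;
  }
}
```

With this shape, `new SketchTimeZone('Asia/Calcutta').id` stays `'Asia/Calcutta'`, a plain `tz1.id === tz2.id` check still distinguishes the two spellings, and `equals()` treats them as the same zone. Passing the canonicalizer in as a parameter is a design choice of the sketch, made so the alias data can be swapped without touching the class.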

@justingrant justingrant reopened this Oct 30, 2020
@gibson042
Collaborator

gibson042 commented Oct 31, 2020

ECMA 402 is an internationalization specification, while ECMA 262 Temporal is a date and time specification. The use cases may overlap, but they are not identical, and I don't think allowing divergence should be off the table. There are also some other things worth noting:

  1. ECMA 402 has a normative reference to the IANA Time Zone Database for time zone names, but Temporal does not (and in my opinion, should not—ECMA 262 ought to remain independent of tzdata). Temporal could make recommendations but ultimately leave time zone canonicalization implementation-defined, with the expected result that engines supporting ECMA 402 will reuse the same algorithms and data.
  2. tzdata defines Links (aliases) for time zones that have been the same since 1970 but differed before then. ECMA 402 CanonicalizeTimeZone requires Intl.DateTimeFormat to replace the alias with its target, but that would not be appropriate for Temporal (and I would argue the same regarding the explicit step in that operation to replace "Etc/UTC" and "Etc/GMT" with "UTC", although that's not a hill worth dying on). This item alone could put sufficient pressure on implementations to adopt tzdata contents fully rather than ignoring "backzone".
  3. tzdata has changed before, and will change again. Canonicalization stability may be a good goal for ECMA 402 (though that's debatable), but IMO is not a good goal for Temporal.

@ptomato
Collaborator

ptomato commented Nov 3, 2020

I don't agree with this proposal for the following reasons.

If the current canonical names stay as-is in the source data,

They only do this because, as I understand it, ICU makes a distinction between "ID" and "canonical ID". JS does not (and I don't believe it's necessary to do so, but if it were to change then I believe that ECMA-402 should lead here and not Temporal.)

On the other hand, a challenge with (2) is that different JS environments have solved the "stale canonical ID" problem in different ways. Firefox, for example, canonicalizes to Asia/Kolkata while Chrome uses Asia/Calcutta.

I don't agree with the interpretation that this is a decision to solve things differently across browsers. I am 99% sure this is just a bug in Chrome that has nothing to do with Temporal. Backwards-compatible aliases are defined in tzdata's backward file, and that file clearly shows that Asia/Kolkata is preferred to Asia/Calcutta. Chrome (and therefore Node) simply interprets this file inconsistently, canonicalizing to the old name for some zones and to the new name for others, while Firefox correctly canonicalizes to the new name. Here's an example of the buggy behaviour that I reproduced in Node 12.16 and Chromium 84, but not in Firefox 79:

> new Intl.DateTimeFormat('en', { timeZone: 'Asia/Calcutta' }).resolvedOptions().timeZone
'Asia/Calcutta'
> new Intl.DateTimeFormat('en', { timeZone: 'Asia/Kolkata' }).resolvedOptions().timeZone
'Asia/Calcutta'
> new Intl.DateTimeFormat('en', { timeZone: 'Asia/Ulan_Bator' }).resolvedOptions().timeZone
'Asia/Ulaanbaatar'
> new Intl.DateTimeFormat('en', { timeZone: 'Asia/Ulaanbaatar' }).resolvedOptions().timeZone
'Asia/Ulaanbaatar'
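
To repeat this check in another runtime, the session above can be wrapped in a small helper. This is only a sketch: `resolvedTimeZone` and `probeCanonicalization` are hypothetical names, and the only real API used is `Intl.DateTimeFormat.prototype.resolvedOptions()`. No expected output is shown, because the result depends on the engine and on the version of its time zone data.

```javascript
// Returns the time zone ID that this runtime's Intl.DateTimeFormat reports
// back for the given input ID.
function resolvedTimeZone(id) {
  return new Intl.DateTimeFormat('en', { timeZone: id }).resolvedOptions().timeZone;
}

// Hypothetical helper: maps each input ID to whatever the runtime resolves it
// to, or to null when the runtime rejects the ID with a RangeError.
function probeCanonicalization(ids) {
  const report = {};
  for (const id of ids) {
    try {
      report[id] = resolvedTimeZone(id);
    } catch (err) {
      if (!(err instanceof RangeError)) throw err;
      report[id] = null;
    }
  }
  return report;
}

// Prints a mapping of input ID to resolved ID for the zones discussed here.
console.log(
  probeCanonicalization([
    'Asia/Calcutta',
    'Asia/Kolkata',
    'Asia/Ulan_Bator',
    'Asia/Ulaanbaatar',
    'Etc/UTC',
  ])
);
```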

tzdata defines Links (aliases) for time zones that have been the same since 1970 but differed before then.

This is in backzone which is distinct from backward which I mentioned above. If 402 canonicalizes these to the modern time zones, then I think it would be inconsistent for Temporal to not canonicalize them. However, should someone want to have the pre-1970 data in Temporal, they can use the custom time zone tools that Temporal gives them, to read the backzone file and implement custom time zones in userland, and override Temporal.TimeZone.from() to return one of these when given a backzone name.

tl;dr I don't think any change is needed, I'm 99% sure the Calcutta name is a Chrome bug, and I think that backzones are not a use case for Temporal and should be done in userland. I believe this issue should be closed.

@sffc
Collaborator

sffc commented Nov 3, 2020

The closest prior art we have in terms of canonicalization is Intl.Locale. We unconditionally canonicalize it, with a reference to UTS 35:

https://tc39.es/ecma402/#sec-canonicalizeunicodelocaleid

For example:

new Intl.Locale("iw").toString()
// "he"

Therefore I think it is okay if Temporal.TimeZone canonicalizes on input. It would be best though if we can point to a specification on exactly how to perform the canonicalization, like we do in Intl.Locale.

@ptomato
Collaborator

ptomato commented Jan 14, 2021

Closing, unless someone can convince me that this is not just a bug in Chrome 😄

@ptomato ptomato closed this as completed Jan 14, 2021