Should ECMA-402 spec text for time zone canonicalization refer to CLDR or to IANA as authoritative? #825

Closed
justingrant opened this issue Aug 9, 2023 · 9 comments · Fixed by #877
Labels: c: datetime (Component: dates, times, timezones) · s: in progress (Status: the issue has an active proposal) · Small (Smaller change solvable in a Pull Request)


@justingrant
Contributor

justingrant commented Aug 9, 2023

CLDR in unicode-org/cldr#3105 will soon provide the ability to fetch modern canonical IDs for currently-problematic time zones like Asia/Calcutta and Europe/Kiev. ICU will also be adding an API to expose the IANA canonical ID. This will enable V8 and JSC to finally expose modern IANA canonical names like SpiderMonkey does, but without the separate IANA-based overrides that SpiderMonkey has had to maintain.
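
For readers who want to see the behavior being discussed, here is a minimal sketch that asks the engine which ID it reports for a given input. The helper name `reportedTimeZone` is made up for this illustration. The output depends on the engine and its version, so no expected values are shown.

```typescript
// Returns the time zone ID that the engine reports back for `id`.
// `reportedTimeZone` is a hypothetical helper name used only in this sketch.
function reportedTimeZone(id: string): string {
  return new Intl.DateTimeFormat("en", { timeZone: id }).resolvedOptions().timeZone;
}

// Print what this engine reports for a few IDs mentioned in this issue.
for (const id of ["Asia/Calcutta", "Asia/Kolkata", "Europe/Kiev"]) {
  console.log(`${id} -> ${reportedTimeZone(id)}`);
}
```

An engine that reports "Asia/Calcutta" for both of the first two inputs is treating the older name as the canonical one, which is the situation this issue is trying to address.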

Before this change, it didn't make sense to have normative spec text in 402 to define which IDs should be primary (the new 262 term for canonical time zone ID) vs. non-primary. But with this CLDR/ICU change, it's finally practical to specify normative rules for determining which IDs are primary (and, if not, which primary ID they resolve to).

I can draft a PR with this normative text for discussion, but first there's one main question to answer: should we specify the rules solely as using CLDR data, solely as using IANA data, or should we specify rules for which IANA IDs are canonical that just happen to match what CLDR is doing?

As an illustration, here are two possible directions that we could take this spec text. Don't worry about the particular text used (it's very rough and will change); the general approach of depending on CLDR vs. depending on IANA is where I'm most looking for feedback.

Which one is better? @sffc @gibson042 @anba @Constellation @FrankYFTang

Option A - Defer to CLDR

The Unicode Common Locale Data Repository (CLDR) defines which available named time zone identifiers are primary or non-primary, as well as which non-primary time zone identifiers resolve to which primary time zone identifiers. The following exceptions are applied:

  • For historical reasons, "UTC" is a primary time zone identifier while "Etc/UTC", "Etc/GMT", and "GMT" (and the identifiers that resolve to them) are non-primary time zone identifiers that resolve to "UTC".
  • CLDR identifiers that are not present in the IANA Time Zone Database ("Etc/Unknown", "Canada/East-Saskatchewan", and "US/Pacific-New") are not supported.
  • "Factory" is not supported.

The following spec text would replace the steps of https://tc39.es/proposal-temporal/#sup-availablenamedtimezoneidentifiers:

  1. Let result be a new empty List.
  2. For each <type> element that contains an alias attribute in timezone.xml in the Unicode Common Locale Data Repository (CLDR), do
    1. Let aliases be a new List, populated by splitting the space-delimited alias attribute.
    2. If an iana attribute is present, let primary be the String value of that attribute; otherwise, let primary be the first element in aliases.
    3. If primary is one of "Etc/UTC", "Etc/GMT", or "GMT", set primary to "UTC".
    4. If primary is neither "Etc/Unknown" nor "Factory", then
      1. Let record be the Time Zone Identifier Record { [[Identifier]]: primary, [[PrimaryIdentifier]]: primary }.
      2. Append record to result.
      3. For each element identifier in aliases, do
        1. If identifier is not primary and is neither "Canada/East-Saskatchewan" nor "US/Pacific-New", then
          1. Set record to the Time Zone Identifier Record { [[Identifier]]: identifier, [[PrimaryIdentifier]]: primary }.
          2. Append record to result.
  3. Sort result lexicographically by UTF-16 code unit of each element's [[Identifier]] field, in ascending order.
  4. Assert: result contains a Time Zone Identifier Record r such that r.[[Identifier]] is "UTC" and r.[[PrimaryIdentifier]] is "UTC".
  5. Return result.
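
The steps above can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not proposed spec text: `CldrTypeEntry` is a hypothetical in-memory model of the attributes of a `<type>` element (XML parsing is left out), each alias record uses the alias itself as its [[Identifier]], and the `seen` set is an addition of this sketch so that "UTC" is not appended twice when more than one entry collapses to it.

```typescript
// Hypothetical model of one <type> element's attributes in timezone.xml.
interface CldrTypeEntry { alias: string; iana?: string; }
interface TimeZoneIdentifierRecord { identifier: string; primaryIdentifier: string; }

const UTC_EQUIVALENTS = new Set(["Etc/UTC", "Etc/GMT", "GMT"]);
const NOT_IN_IANA = new Set(["Canada/East-Saskatchewan", "US/Pacific-New"]);

function availableNamedTimeZoneIdentifiers(entries: CldrTypeEntry[]): TimeZoneIdentifierRecord[] {
  const result: TimeZoneIdentifierRecord[] = [];
  const seen = new Set<string>(); // sketch-only guard against duplicate records
  const add = (identifier: string, primaryIdentifier: string): void => {
    if (seen.has(identifier)) return;
    seen.add(identifier);
    result.push({ identifier, primaryIdentifier });
  };
  for (const entry of entries) {
    // Step 2.1: split the space-delimited alias attribute.
    const aliases = entry.alias.split(" ").filter((s) => s !== "");
    if (aliases.length === 0) continue;
    // Step 2.2: prefer the iana attribute, else the first alias.
    let primary = entry.iana ?? aliases[0];
    // Step 2.3: collapse the UTC equivalents.
    if (UTC_EQUIVALENTS.has(primary)) primary = "UTC";
    // Step 2.4: skip unsupported identifiers.
    if (primary === "Etc/Unknown" || primary === "Factory") continue;
    add(primary, primary);
    for (const identifier of aliases) {
      if (identifier !== primary && !NOT_IN_IANA.has(identifier)) add(identifier, primary);
    }
  }
  // Step 3: comparing strings with < and > orders them by UTF-16 code unit.
  result.sort((a, b) => (a.identifier < b.identifier ? -1 : a.identifier > b.identifier ? 1 : 0));
  // Step 4: the list must contain UTC as its own primary identifier.
  if (!result.some((r) => r.identifier === "UTC" && r.primaryIdentifier === "UTC")) {
    throw new Error("missing UTC record");
  }
  return result;
}

// Illustrative sample entries only, not a copy of timezone.xml.
const sampleEntries: CldrTypeEntry[] = [
  { alias: "Asia/Calcutta Asia/Kolkata", iana: "Asia/Kolkata" },
  { alias: "Etc/UTC Etc/UCT UTC" },
  { alias: "Etc/GMT GMT" },
  { alias: "America/Los_Angeles US/Pacific US/Pacific-New" },
  { alias: "Etc/Unknown" },
];
const sampleRecords = availableNamedTimeZoneIdentifiers(sampleEntries);
```

With this sample input, "Asia/Calcutta" resolves to "Asia/Kolkata", the GMT and UTC spellings all resolve to "UTC", and "Etc/Unknown" and "US/Pacific-New" are absent from the result.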

Option B - Define using IANA only

Each Zone in the IANA Time Zone Database must be a primary time zone identifier and each Link name in the IANA Time Zone Database must be a non-primary time zone identifier that resolves to its corresponding Zone name, with the following exceptions:

  • For historical reasons, "UTC" must be a primary time zone identifier. "Etc/UTC", "Etc/GMT", and "GMT", as well as all Link names that resolve to any of them, must be non-primary time zone identifiers that resolve to "UTC".
  • Any Link name in the TZ column of zone.tab of the IANA Time Zone Database must be a primary time zone identifier.
  • Any Link name that represents a geographical area entirely contained within the territory of a single ISO 3166-1 country code must resolve to a primary identifier that also represents a geographical area entirely contained within the territory of the same ISO 3166-1 country code. For example, "Atlantic/Jan_Mayen" must resolve to "Arctic/Longyearbyen".
  • The following legacy POSIX identifiers must resolve to the primary time zone identifiers shown in the table below:

| Legacy POSIX Zone Name | Primary Time Zone Identifier |
| --- | --- |
| EST | Etc/GMT+5 |
| MST | Etc/GMT+7 |
| HST | Etc/GMT+10 |
| EST5EDT | America/New_York |
| CST6CDT | America/Chicago |
| MST7MDT | America/Denver |
| PST8PDT | America/Los_Angeles |
| WET | Europe/Lisbon |
| CET | Europe/Berlin |
| MET | Europe/Vienna |
| EET | Europe/Athens |
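
As a rough sketch of how the Option B rules might compose, the function below resolves an identifier against a small, hypothetical `TzdbData` model of the IANA Zone names, Link names, and zone.tab TZ column. The type, the function name, and the sample data are assumptions made for illustration. The ISO country-code rule is deliberately left out because it needs territory data that this model does not carry.

```typescript
// Hypothetical, simplified view of the IANA Time Zone Database.
interface TzdbData {
  zones: Set<string>;          // Zone names
  links: Map<string, string>;  // Link name -> Zone name
  zoneTab: Set<string>;        // names listed in the TZ column of zone.tab
}

// Legacy POSIX identifiers and their primary identifiers, from the table above.
const LEGACY_POSIX = new Map<string, string>([
  ["EST", "Etc/GMT+5"], ["MST", "Etc/GMT+7"], ["HST", "Etc/GMT+10"],
  ["EST5EDT", "America/New_York"], ["CST6CDT", "America/Chicago"],
  ["MST7MDT", "America/Denver"], ["PST8PDT", "America/Los_Angeles"],
  ["WET", "Europe/Lisbon"], ["CET", "Europe/Berlin"],
  ["MET", "Europe/Vienna"], ["EET", "Europe/Athens"],
]);
const UTC_NAMES = new Set(["Etc/UTC", "Etc/GMT", "GMT"]);

// Returns the primary identifier for `id`, or undefined if `id` is unknown.
// The ISO country-code rule is NOT implemented in this sketch.
function resolvePrimary(id: string, data: TzdbData): string | undefined {
  if (id === "UTC") return "UTC";
  const legacy = LEGACY_POSIX.get(id);
  if (legacy !== undefined) return legacy;
  const target = data.links.get(id);
  if (!data.zones.has(id) && target === undefined) return undefined;
  // "Etc/UTC", "Etc/GMT", "GMT", and Links that resolve to them become "UTC".
  if (UTC_NAMES.has(id) || (target !== undefined && UTC_NAMES.has(target))) return "UTC";
  // A Link listed in zone.tab stays primary; any other Link resolves to its Zone.
  if (target !== undefined && !data.zoneTab.has(id)) return target;
  return id;
}

// Illustrative excerpt only, not real database contents.
const sampleTzdb: TzdbData = {
  zones: new Set(["Etc/UTC", "Etc/GMT", "Asia/Kolkata", "Europe/Berlin"]),
  links: new Map([
    ["Etc/UCT", "Etc/UTC"], ["GMT", "Etc/GMT"],
    ["Asia/Calcutta", "Asia/Kolkata"], ["Europe/Oslo", "Europe/Berlin"],
  ]),
  zoneTab: new Set(["Asia/Kolkata", "Europe/Berlin", "Europe/Oslo"]),
};
```

With this sample data, `resolvePrimary("Asia/Calcutta", sampleTzdb)` yields "Asia/Kolkata", while "Europe/Oslo" stays primary because it appears in the zone.tab set even though it is modeled as a Link.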

Option C - Define IANA rules, but explain how to use timezone.xml

We could also merge both (A) and (B): define the IANA rules *and* explain how to use timezone.xml data to satisfy those rules. I'm not going to draft text for this yet because I'm unsure if anyone would want it, but I wanted to include this option here for discussion.

@gibson042
Contributor

https://tc39.es/ecma402/ is careful to recommend use of CLDR data without ever requiring it, and I don't want to change that. My preference would be to follow the existing recommendation pattern for time zone resolution.

@anba
Contributor

anba commented Aug 15, 2023

> • Legacy POSIX identifiers that refer to a single fixed UTC offset, such as "EST" or "WET", must be non-primary time zone identifiers that resolve to the corresponding Zone name, starting with "Etc/GMT", for the same UTC offset. For example, "EST" must resolve to "Etc/GMT+5".

As mentioned in the CLDR PR, WET, CET, MET, and EET are not fixed-offset time zones and therefore can't be mapped to some Etc/GMT±hh time zone. IOW only EST, HST, and MST can be mapped to Etc/GMT±hh time zones.
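
The fixed-offset distinction can be checked roughly from script. The sketch below is a heuristic, not a definition: it compares a zone's UTC offset on one January date and one July date of a given year, so it would miss transitions that fall outside those samples. The helper names `offsetMinutes` and `looksFixedOffset` are made up for this illustration, and the code assumes the engine accepts the legacy IDs passed to it.

```typescript
// UTC offset of `timeZone` at `instant`, in minutes (negative means west of UTC).
function offsetMinutes(timeZone: string, instant: Date): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour12: false,
    year: "numeric",
    month: "numeric",
    day: "numeric",
    hour: "numeric",
    minute: "numeric",
    second: "numeric",
  }).formatToParts(instant);
  const get = (type: string): number => {
    const part = parts.find((p) => p.type === type);
    if (part === undefined) throw new Error(`missing ${type}`);
    return Number(part.value);
  };
  // Some engines print midnight as hour 24 when hour12 is false; fold it to 0.
  const wallClockAsUtc = Date.UTC(
    get("year"), get("month") - 1, get("day"),
    get("hour") % 24, get("minute"), get("second"),
  );
  return Math.round((wallClockAsUtc - instant.getTime()) / 60000);
}

// Heuristic: same offset in mid-January and mid-July of `year`.
function looksFixedOffset(timeZone: string, year: number): boolean {
  const january = new Date(Date.UTC(year, 0, 15, 12));
  const july = new Date(Date.UTC(year, 6, 15, 12));
  return offsetMinutes(timeZone, january) === offsetMinutes(timeZone, july);
}

for (const id of ["EST", "CET", "Etc/GMT+5", "Europe/Berlin"]) {
  console.log(`${id}: fixed offset in 2023? ${looksFixedOffset(id, 2023)}`);
}
```

On an engine with current time zone data, "CET" behaves like "Europe/Berlin" here and changes offset between the two sample dates, which is why it cannot be mapped to an Etc/GMT±hh zone.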

Option A will lead to adding the following time zone IDs:

| Name | Kind |
| --- | --- |
| Etc/Unknown | Zone |
| Canada/East-Saskatchewan | Link to America/Regina |
| US/Pacific-New | Link to America/Los_Angeles |

I don't think these IDs should be added. Etc/Unknown was already discussed in the June meetings. Canada/East-Saskatchewan has been removed from IANA because it's a misnomer. US/Pacific-New has also been removed from IANA, because it was intended for a possible US time zone rule change which never happened.

@justingrant
Contributor Author

> > • Legacy POSIX identifiers that refer to a single fixed UTC offset, such as "EST" or "WET", must be non-primary time zone identifiers that resolve to the corresponding Zone name, starting with "Etc/GMT", for the same UTC offset. For example, "EST" must resolve to "Etc/GMT+5".
>
> As mentioned in the CLDR PR, WET, CET, MET, and EET are not fixed-offset time zones and therefore can't be mapped to some Etc/GMT±hh time zone. IOW only EST, HST, and MST can be mapped to Etc/GMT±hh time zones.

@anba Good points. I updated (B) in the OP accordingly. I also changed from prose to using a table to indicate how we'll handle all those legacy IDs. Better?

> Option A will lead to adding the following time zone IDs:
>
> | Name | Kind |
> | --- | --- |
> | Etc/Unknown | Zone |
> | Canada/East-Saskatchewan | Link to America/Regina |
> | US/Pacific-New | Link to America/Los_Angeles |
>
> I don't think these IDs should be added. Etc/Unknown was already discussed in the June meetings. Canada/East-Saskatchewan has been removed from IANA because it's a misnomer. US/Pacific-New has also been removed from IANA, because it was intended for a possible US time zone rule change which never happened.

I agree. I updated (A) to exclude them. I also turned (A) into more formal spec text.

> https://tc39.es/ecma402/ is careful to recommend use of CLDR data without ever requiring it, and I don't want to change that. My preference would be to follow the existing recommendation pattern for time zone resolution.

@gibson042 Could you explain more about why a recommendation is needed as opposed to normative text? The problem we want to solve is deviation between implementations, so it seems reasonable to limit wiggle room. Or am I missing something that makes this distinction not matter?

@gibson042
Contributor

Basically, ECMA-402 just isn't the right place for such constraints because of https://tc39.es/ecma402/#sec-api-overview

> The API can be added to an implementation of the ECMAScript 2024 Language Specification (ECMA-262 15th Edition, or successor) in whole or in part. This specification introduces new language values observable to ECMAScript code (such as the value of a [[FallbackSymbol]] internal slot and the set of values transitively reachable from %Intl% by property access), and also refines the definition of some functions specified in ECMA-262 (as described below). Neither category prohibits behaviour that is otherwise permitted for values and interfaces defined in ECMA-262, in order to support adoption of this specification by any implementation of ECMA-262.

ECMA-402 may add behavior that is not prohibited by ECMA-262, but may not prohibit behavior that is allowed by ECMA-262 except where ECMA-262 specifically delegates to it (as in toLocaleString methods). And CLDR is one source capable of providing localization-relevant information, but is not the only such source—an implementation that relies directly upon IANA data should not be considered non-conforming. Also note that ECMA-402 intentionally allows flexibility as a general rule (e.g., the output of a formatter is not considered stable across implementations or time), and deviations from that flexibility require strong motivation.

@justingrant
Contributor Author

TG2 meeting 2023-10-12: Option C above is preferred, with the caveat that we should make the spec text be a recommendation not a normative requirement.

I'll prepare a PR. The specific concerns raised in the comments above are narrow enough that TG2 thought we could hash them out in PR review.

@justingrant justingrant self-assigned this Oct 12, 2023
@sffc sffc added s: in progress Status: the issue has an active proposal Small Smaller change solvable in a Pull Request c: datetime Component: dates, times, timezones labels Dec 14, 2023
@sffc
Contributor

sffc commented Dec 14, 2023

@justingrant The conclusion from the October meeting was that you would "draft a PR with a recommendation based on Option C that we can discuss further."

@sffc sffc added this to the ES 2024 milestone Dec 14, 2023
@justingrant
Contributor Author

Yep, sorry I haven't done this yet, but will try to get to it in the next few months.

justingrant added a commit to justingrant/ecma402 that referenced this issue Mar 31, 2024
This PR resolves tc39#825 by adding spec text that defines how ECMA-402
implementations should decide which IANA time zone IDs should be
primary or non-primary.

This PR implements "Option C" in tc39#825 by deterministically defining
ECMAScript's exceptions from the IANA Time Zone Database's defaults,
and then pointing implementers at ICU as a convenient implementation
of those exceptions.

This PR also accommodates to web reality by aligning the 402 spec text
with the existing behavior of ICU4C which is used by most (all?) major
ECMAScript engines to implement time zone features.

This PR is stacked on top of an editorial PR to align ECMA-402's
time-zone-related spec text with ECMA-262.
justingrant added a commit to justingrant/ecma402 that referenced this issue Mar 31, 2024
This PR resolves tc39#825 by adding spec text that defines how ECMA-402
implementations should decide which IANA time zone IDs should be
primary vs. non-primary.

This PR implements "Option C" in tc39#825 by deterministically defining
ECMAScript's exceptions from the IANA Time Zone Database's defaults,
and then pointing implementers at ICU as a convenient implementation
of those exceptions.

This PR also accommodates to web reality by aligning the 402 spec text
with the existing behavior of ICU4C which is used by most (all?) major
ECMAScript engines to implement time zone features.

This PR is stacked on top of an editorial PR to align ECMA-402's
time-zone-related spec text with ECMA-262.
justingrant added a commit to justingrant/ecma402 that referenced this issue Mar 31, 2024
This PR resolves tc39#825 by adding spec text that defines how ECMA-402
implementations should decide which IANA time zone IDs should be
primary vs. non-primary.

This PR implements "Option C" in tc39#825 by deterministically defining
ECMAScript's exceptions from the IANA Time Zone Database's defaults,
and then pointing implementers at ICU as a convenient implementation
of those exceptions.

This PR also accommodates to web reality by aligning the 402 spec text
with the existing behavior of ICU.

This PR is stacked on top of tc39#876.
@justingrant
Contributor Author

> @justingrant The conclusion from the October meeting was that you would "draft a PR with a recommendation based on Option C that we can discuss further."

PR available! #877

Sorry I didn't get this done in time for the meeting last week, but maybe we could put it on the agenda for the next one?

justingrant added a commit to justingrant/ecma402 that referenced this issue Apr 3, 2024
This PR resolves tc39#825 by adding spec text that defines how ECMA-402
implementations should decide which IANA time zone IDs should be
primary vs. non-primary.

This PR implements "Option C" in tc39#825 by deterministically defining
ECMAScript's exceptions from the IANA Time Zone Database's defaults,
and then pointing implementers at ICU as a convenient implementation
of those exceptions.

This PR also accommodates to web reality by aligning the 402 spec text
with the existing behavior of ICU.

This PR is stacked on top of tc39#876.
justingrant added a commit to justingrant/ecma402 that referenced this issue May 22, 2024
This PR resolves tc39#825 by adding spec text that defines how ECMA-402
implementations should decide which IANA time zone IDs should be
primary vs. non-primary.

This PR implements "Option C" in tc39#825 by deterministically defining
ECMAScript's exceptions from the IANA Time Zone Database's defaults,
and then pointing implementers at ICU as a convenient implementation
of those exceptions.

This PR also accommodates to web reality by aligning the 402 spec text
with the existing behavior of ICU.

This PR is stacked on top of tc39#876.
justingrant added a commit to justingrant/ecma402 that referenced this issue May 29, 2024
This PR resolves tc39#825 by adding spec text that more clearly defines
how ECMA-402 implementations should decide which IANA time zone IDs
should be primary vs. non-primary.

This PR implements "Option C" in tc39#825 by deterministically defining
ECMAScript's exceptions from the IANA Time Zone Database's defaults,
and then pointing implementers at ICU as a convenient implementation
of those exceptions.

This PR also accommodates to web reality by aligning the 402 spec text
with the existing behavior of ICU and CLDR while providing deterministic
rules that can guide future changes in CLDR data.

Finally, this PR introduces a new StringSplitToList abstract operation
and uses it to simplify AvailableNamedTimeZoneIdentifiers and
IsWellFormedUnitIdentifier.
justingrant added a commit to justingrant/ecma402 that referenced this issue May 31, 2024
This PR resolves tc39#825 by adding spec text that more clearly defines
how ECMA-402 implementations should decide which IANA time zone IDs
should be primary vs. non-primary.

This PR implements "Option C" in tc39#825 by deterministically defining
ECMAScript's exceptions from the IANA Time Zone Database's defaults,
and then pointing implementers at ICU as a convenient implementation
of those exceptions.

This PR also accommodates to web reality by aligning the 402 spec text
with the existing behavior of ICU and CLDR while providing deterministic
rules that can guide future changes in CLDR data.

Finally, this PR introduces a new StringSplitToList abstract operation
and uses it to simplify AvailableNamedTimeZoneIdentifiers and
IsWellFormedUnitIdentifier.
@sffc sffc moved this to Previously Discussed in ECMA-402 Meeting Topics Aug 22, 2024