How to reduce cross-implementation incompatibility when new IDs are introduced? #6
Comments
After seeing other bug reports about this topic, I now think this idea is a good one. The period right after a new identifier is released will be a sensitive one, so delaying making a brand-new identifier canonical seems wise: when the change does happen, most of the Web will already know about the new identifier. Here are a few example bugs where this approach might help. Firefox 1796393: Javascript returns problematic timezone, breaking sites |
I don't think using a brand new timezone is reasonable (ignoring the emotional part). You never practically need to expose the raw Olson ID, and the ones likely to suffer are your users. I agree with IANA's position that an Olson ID is just an ID and it could be anything: a random string, a UUID, a number, and so on.
What is the use case of canonicalisation? When would a user of a timezone library need it?
Items you mentioned are unlikely to be updated in that time frame.
nit: it is up to OEMs, not carriers. Though Android 10+ phones (roughly speaking, it is more complicated) can get updates from Mainline. |
Hi @Yqwed, thanks so much for your thoughtful replies. A few follow-up notes and questions are below.
I agree that the use cases for canonicalization are dubious! The current ECMAScript spec uses canonicalization everywhere a time zone identifier is used. There's no way for ECMAScript to return a non-canonical ID back to the caller from any ECMAScript API, including Temporal Stage 3. A goal of this proposal is to change this to reduce the scope of user-observable canonicalization in ECMAScript. The basic idea is that identifiers should never be canonicalized before being returned back to callers via APIs like So if a caller provides One thing I'm not sure about is how we should handle cases where the time zone ID comes not from an ECMAScript caller, but from the OS itself. Assume that an OS's IANA ID is, for example, The one place where I think that we must expose canonical values to developers is But other than that one API, I want to consider removing user-observable canonicalization everywhere else. See https://docs.google.com/presentation/d/1oapwqvqAtauGV5gqpMqWlSfsFE4A38Ylh7aFiMOSIBI/edit#slide=id.g23a7465b127_0_5 for more discussion about this.
Could you explain more what you mean by this? It's IANA, not ECMAScript, that's adding new identifiers in response to geopolitical changes like Kiev being renamed to Kyiv. If IANA introduces a new identifier, we'd have to start using it at some point, right? The context is that there are many years' worth of complaints about outdated city names being exposed by ECMAScript. So continuing to expose outdated names and angering developers isn't good either. Sadly there's no perfect answer here, only tradeoffs.
This is really helpful info to understand. Thanks for sharing!
Makes sense. Thanks for clarifying. |
How could I use Temporal to get from |
My current thinking is that offering a "canonicalize this ID" API is a bad idea, because it would encourage users to take a dependency on a potentially moving target, and make code fragile if the canonical ID changes (like Kiev=>Kyiv did in 2022). There are essentially three choices:
I'm currently thinking that (3) is the least-bad choice. The model I have in mind is like a case-insensitive SQL database column, where "Jordan", "jordan", and "JORDAN" are considered the same when it comes to comparison but the table still stores whatever string the user provided. There are tradeoffs to this approach, notably that It's possible that there may be use cases where a "canonicalize this ID" API is desperately needed, but so far I haven't heard any. Also, such an API could always be added later if needed, so I think it's probably good to ship without it and see how developers adjust to a world where canonicalization matters less before assuming we need that API. |
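To make the case-insensitive-column analogy concrete, here is a minimal sketch of "store what the caller gave you, compare by equivalence". The alias table and the `sameTimeZone` helper are hypothetical names used only for illustration; a real engine would derive alias groups from its embedded time zone data, and this is not the API the proposal defines.

```javascript
// Illustrative alias groups only. A real implementation would derive these
// from the time zone data it embeds, not from a hand-written table.
const ALIAS_GROUPS = [
  ['Asia/Kolkata', 'Asia/Calcutta'],
  ['Europe/Kyiv', 'Europe/Kiev'],
];

// Map every lower-cased ID to the index of the alias group it belongs to.
const GROUP_OF = new Map();
ALIAS_GROUPS.forEach((group, index) => {
  for (const id of group) GROUP_OF.set(id.toLowerCase(), index);
});

// Hypothetical helper: true when both IDs name the same zone.
// Neither argument is rewritten; only the comparison ignores aliases and case.
function sameTimeZone(a, b) {
  const groupA = GROUP_OF.get(a.toLowerCase());
  const groupB = GROUP_OF.get(b.toLowerCase());
  if (groupA === undefined || groupB === undefined) {
    // IDs missing from the table only match themselves (ignoring case).
    return a.toLowerCase() === b.toLowerCase();
  }
  return groupA === groupB;
}

// The stored value stays exactly what the user provided.
const stored = 'Asia/Calcutta';
console.log(stored, sameTimeZone(stored, 'Asia/Kolkata'));
```

With this model, comparing the raw strings with `===` is the fragile operation, while the equivalence check keeps working if the canonical member of a group changes.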
I like 3 in that I then don't have to care about which is canonical; it would indeed make comparing the strings by |
Yes, exactly! It's already a bad practice, but most developers probably don't realize that it's bad yet. The changes we're proposing will just make it more explicit that
We considered a static method here, for exactly the reason you noted, but the existing convention in Temporal is that anything you do with a time zone (like getPossibleInstantsFor, getNextTransition, or getPreviousTransition) requires instantiating a TimeZone instance. So my assumption was that we'd follow the same convention with If that becomes a problem, a later proposal could always add a batch of static methods to TimeZone. Sound OK? |
The entire motivation of allowing strings, as i recall, was to avoid the performance penalty of creating an instance, so that seems like a very unfortunate thing to force. |
I think it's helpful to differentiate two cases:
The problem with (1) was that implementers couldn't optimize unless they knew that no observable time zone or calendar code had to be run while performing those operations. With object-only calendars and time zones, knowing "is this an unmodified built-in calendar and/or time zone?" was hard and/or brittle. Now, it's easy: if the slot has a string, then it's safe to optimize. For (2), I'd be hesitant to break Temporal's prototype-focused API for this relatively uncommon case. Especially since every other Temporal type's I guess we could also offer a |
Isn't it up to factory method? What I mean is that if timezone is X according to the OS and nit: there is no "IANA canonical" thing.
Unfortunately I don't have a good answer here. Currently we are planning to use What I've meant is that one needs to be careful with new IDs: you really don't need to expose Olson ID to users and the use of new ID might affect user experience as it will take time for all the server you interact with to update.
I see it boiling down to "happy developers" vs "happy end users".
I forgot to mention that carriers are responsible for NITZ signal. They can't ignore timezone changes. So we depend on carriers - it's not phones they need to update, but signals their cell towers send. |
There are three sources of time zone IDs that are output by ECMAScript:
The fundamental change in this proposal is in case (1): when a developer provides an Olson ID as input to an ECMAScript API. Currently, that ID is always canonicalized to ECMAScript's current canonical ID for that time zone. We're proposing to stop doing that. The proposed behavior is to retain the ID that the caller provided: if I pass Asia/Calcutta as input, then I get Asia/Calcutta as output. If I pass Asia/Kolkata as input, then I get Asia/Kolkata as output. The goal is to reduce the impact to programs when canonical IDs change for a time zone. Case (3) seems like we can't avoid returning the canonical ID back to the caller, because it would be bad to have a UI time zone picker with two identical entries with the same localized name, but one has a For case (2), we're unsure. Should (2) align with (1) and always return the ID that the OS gives us without changing it? Or should (2) align with (3) and always canonicalize what the OS provides before returning it to the caller? It sounds like your opinion is that aligning (1) and (2) is more important than aligning (2) and (3)? Is that correct?
Yep, agree. Trying to resolve this tradeoff is one of the main problems that this proposal is trying to solve. One way we're trying to resolve it is by reducing the importance of canonicalization to how ECMAScript programs behave. If ECMAScript doesn't change the input it gets from developers (and from the OS, per above?), then even if canonical IDs change then there's less of a chance that it will break existing code. At least that's the theory. What do you think about this plan?
Yeah, this is a hard problem without an obvious easy answer. The best idea I've had is similar to what you're doing in Android: wait for some period of time before showing that ID to users. Specifically, what I'm thinking is that ECMAScript would do this:
This is good info, thanks! Does the NITZ signal provide the Olson time zone name? Or just the UTC offset? The specification isn't clear and there are no examples in it. I found this document which suggests that it's just the offset, but I wasn't sure if that varied between carriers. If it's just an offset, then does Android automatically translate between the NITZ timezone and Olson ID using geo-location? If not, then how is the OS's Olson ID determined after NITZ tells the phone that the time zone has changed?
Yep, what I mean when I say "IANA canonical" is "The canonical ID according to the time zone database that the ECMAScript implementation embeds, which may be different from the IANA TZDB used by the OS." Note that some ECMAScript implementations like V8 don't use TZDB directly, but instead use CLDR data which roughly follows TZDB if it were built with So maybe a better way of saying what I was trying to ask would be this: "which ID do you think should be returned? The OS's ID or the canonical one according to ECMAScript's current time zone data?" |
I think timezone picker and enumeration of all the supported timezones are related, but different problems.
Ahhh, now I see what you mean. I thought that Before I answer the question: Do I understand correctly that
That sounds reasonable to me.
Unfortunately, they don't. NITZ provides offset and DST flag only. Then we try to find matching timezone for that country. Please see our documentation. Geo-location based timezone detection was recently introduced, and is also documented.
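As a rough illustration of why an offset plus a DST flag is not always enough, here is a small sketch of matching a NITZ-style signal against per-country candidates. The `CANDIDATES` table and `matchZones` function are hypothetical and are not taken from Android's data files or code; offsets are minutes east of UTC.

```javascript
// Hypothetical per-country candidate list, for illustration only.
// daylight is null for zones that do not observe DST.
const CANDIDATES = {
  au: [
    { id: 'Australia/Sydney', standard: 600, daylight: 660 },
    { id: 'Australia/Brisbane', standard: 600, daylight: null },
    { id: 'Australia/Perth', standard: 480, daylight: null },
  ],
};

// Return every candidate ID in the country whose offset matches the signal.
function matchZones(countryCode, offsetMinutes, isDst) {
  const zones = CANDIDATES[countryCode] ?? [];
  return zones
    .filter((zone) =>
      isDst ? zone.daylight === offsetMinutes : zone.standard === offsetMinutes
    )
    .map((zone) => zone.id);
}

// UTC+11 with the DST flag set matches a single candidate.
console.log(matchZones('au', 660, true));
// UTC+10 without DST matches two candidates, so extra information is needed.
console.log(matchZones('au', 600, false));
```

The second call returns more than one ID, which is the situation where something beyond the signal itself (country defaults, geolocation, user choice) has to break the tie.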
Be careful with backzone file [1]. TBH I thought ICU does not use it at all. Or maybe that's the way we build ICU dat files in Android.
It's tough. Will that ID be communicated back to the OS? Or is that browser's implementation detail? |
This data is well-structured. Very helpful to see. Thanks so much for sharing it. This file seems to have a lot of overlap in its intent and usage as CLDR's timezone.xml. Except the Android one has more useful metadata about how to choose the right alias. I wonder if it might make sense at some point to merge these into a single CLDR-maintained data set? Or to otherwise more closely align CLDR's data with what you're doing here so that there'd be more consistency among various CLDR-using software that Android devices communicate with? I really like how this file already has Europe/Kyiv added as an alias although it's not canonical yet. This is the same least-worst option I was considering for ECMAScript, where we'd quickly add new IDs to the "recognized" list, but wait some period of time (1-2 years?) before making them canonical if they're renames of an existing ID. It's encouraging that you adopted the same approach too. What do you do in Android about brand-new IDs that are not renames, like
Nope. Currently in V8, it returns the results of an ICU enumeration API:
In Safari, it returns the same thing but adds But regardless, only one ID per set of aliases are returned by
Correct. It will return only one of ICU and CLDR are currently investigating whether they can expose an API (likely a new one) that would enable returning
Thanks for the pointer. My assumption (may be wrong?) was that we needed to use backzone, otherwise not all IDs in zone.tab would have time zone rules in the build output of TZDB. And also that zones in backzone that do have valid data before 1970 (e.g. Europe/Copenhagen, Europe/Oslo, Atlantic/Reykjavik) would be replaced with other zones' data because TZDB's 2022 releases merged those zones into different Zones like Europe/Berlin or Africa/Abidjan. Maybe these were incorrect assumptions? In general I think we agree most closely to the philosophy of the global-tz fork maintained by the Joda Time maintainer, and my understanding was that
I know that ICU imports data from TZDB via some scripts they run when new TZDB releases come out. I'm not sure exactly how their import process works. I suspect you're correct here, although I am not sure. One of the work items in this proposal is to come up with normative guidelines for how ECMAScript engines should build TZDB in order to ensure more consistency across engines. Part of these guidelines will probably be recommending the build options of TZDB and/or a particular way of accessing CLDR, if not as a requirement then as an example of one way to obtain time zone data in the recommended way.
If "communicated back to the OS" means what I think it does, then no. The data flow for That said, I could also see cases where an ECMAScript program (like a React Native app) might call Also, another case I was thinking of was sysadmins who wanted to lock the OS on a particular older ID (e.g. Europe/Kiev) for backwards-compatibility or political reasons. My assumption is that those sysadmins would not like ECMAScript to ignore their overrides. I'm not sure about the priority of those cases. Both seem realistic but probably uncommon. Do you think that it's more important for |
There are Android-specific things like BTW, that file is the one we maintain manually, but actually used one is tzlookup.xml. It is very similar, based on countryzones.txt, but it also has historic timezones (see Argentina, for example). Here is the code which does that.
We add them to the list as soon as they are released. This situation is different from timezone renaming.
If a new timezone is introduced (a brand new one, not an alias) it usually means that existing timezones are different from it (in the future, currently, or in the past). In
I think set of supported IDs will be the same. Haven't checked though.
That's where your definition of
I haven't checked how exactly output of global-tz is different from the upstream. I don't have strong opinion here and we received no bugreports about weird API behaviour for pre-1970 dates, so...
I guess your situation is more complicated than ours: we can make sure that APIs (native (mktime/localtime) / ICU4(C|J)/java.util.TimeZone/java.time.*) are consistent as we control the data they use.
With such approach if a developer is unhappy about OS's timezone, they can provide their own overrides.
But in such case
I am really sorry if what I say is confusing or makes things appear more complicated than they are - I haven't developed timezone APIs so far, I am just responsible for already existing ones. |
@Yqwed could you explain more about what you mean by "provide their own overrides"? Do you mean overrides inside ECMAScript code by creating custom time zones? Overrides by providing their own TZDB data to the OS and/or to Node? Overrides meaning they can reassign the OS's ID themselves, or their sysadmin can do it? Something else? I want to make sure that I understand what you had in mind.
This is already true. So the only change I'm wondering about is whether it's OK for
Yeah. My take on the TZDB controversy is that there are two reasonable but irreconcilable positions:
I'm sympathetic to both sides of the debate. It seems like an appropriate minimal compromise to keep one zone per country even if most intra-country zones are merged. It's unfortunate that this kind of compromise seems to have been overtaken by the more extreme positions above. The only really strong opinion I have is that merging across country boundaries is really bad, because then a stored timestamp string (with a time zone name) representing a future date might become inaccurate in the future if one of those two countries decides to change their offset or DST policy. The same potential problem could happen with intra-country merges, but modern time zone policy is almost always done at the country level, so it's relatively rare that an intra-country region would, for example, stop using DST and then change it back later. Although in countries that do tend to set DST policy at the sub-country level--for example Canada, Argentina, and perhaps Brazil, Russia, and a few others--I'd be inclined to leave at least the existing intra-country zones at the state/province level. For example, I think deprecating America/Montreal might end up being a bad idea, but I think the mergers of zones representing small counties in Indiana seem OK.
Yeah, I don't think the pre-1970 data is really a big deal for computing. Very little software (other than astrology programs?) is written that cares about exact time of day over 50 years ago. AFAICT, per-country zone merges are a much bigger deal for most real software.
Yep, that's much easier! At the last ECMA TG2 meeting, @Constellation said the same thing about iOS, where native ICU always matches WebKit's ICU. So I guess this is only a problem on desktop platforms. Good to know.
No worries, your feedback is very helpful! I'm grateful for it. |
I mean that if existing (or planned) APIs allow specifying a timezone ID, so that developers won't struggle with "this thing implicitly uses OS's timezone ID", and there are APIs which do not canonicalise (sorry for using this term), then they can have a list of overrides (
IDs are geopolitics sensitive and I am not sure that specific UI implementation should drive this decision - it can be dropdown list, but it also can be a map or "Choose region -> choose from a smaller subset" like solution. I think if a developer wants to implement timezone picker of any sort, they should use zone1970.tab as first approximation and modify it to the market they are focused in.
It also has
I think Android's Firefox uses system APIs for timezones and Chrome uses built-in ICU. You might also see it beyond desktops :) |
Circling back on this discussion, this proposal will recommend that implementations delay 2 years after an existing ID is renamed in TZDB. Here's the relevant spec text:
I'll leave this issue open for a while to allow for additional user and/or implementer feedback on this proposed solution. For the other issues that @Yqwed raised in this thread, I agree that ECMAScript implementations should be more consistent, but after discussions with multiple implementers, I now believe that this consistency work will happen outside the scope of the ECMAScript spec. A good first step is for CLDR and ICU to expose the latest IANA canonical IDs. See https://unicode-org.atlassian.net/browse/CLDR-14453 for latest status of that. When that's available, I think it makes sense to lobby V8 and JSC to start using the canonical IDs that CLDR provides. That will be a good time to encourage implementations to be more consistent and to rip out whatever special cases they may have added. |
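The two-year delay described in this comment could be sketched roughly as follows. This is only an illustration of the idea, not the spec text: the `rename` record, the `primaryId` and `isRecognized` helpers, and the exact date are assumptions (the thread only says the Kiev=>Kyiv rename happened in 2022).

```javascript
const DELAY_YEARS = 2;

// Hypothetical rename record. The date is a placeholder within 2022,
// not the actual TZDB release date.
const rename = {
  oldId: 'Europe/Kiev',
  newId: 'Europe/Kyiv',
  renamedOn: new Date(Date.UTC(2022, 7, 1)),
};

// Both names are accepted as input as soon as the rename is known.
function isRecognized(rename, id) {
  return id === rename.oldId || id === rename.newId;
}

// The new name only becomes the primary (canonical) one after the delay.
function primaryId(rename, now) {
  const switchOn = new Date(rename.renamedOn);
  switchOn.setUTCFullYear(switchOn.getUTCFullYear() + DELAY_YEARS);
  return now >= switchOn ? rename.newId : rename.oldId;
}

console.log(primaryId(rename, new Date(Date.UTC(2023, 0, 1)))); // still the old ID
console.log(primaryId(rename, new Date(Date.UTC(2025, 0, 1)))); // now the new ID
```

During the delay window an environment accepts the new name from others but does not emit it by default, which gives slower-updating peers time to learn it.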
Consider such a selection element:
<Picker
  value={database.getTimezoneId()} // or undefined, if not existing yet
  defaultValue={Temporal.Now.timeZoneId()} // or new Intl.DateTimeFormat().resolvedOptions().timeZone?
  data={Intl.supportedValuesOf('timeZone')}
  onChange={database.putTimezoneId}
/>
I see conflicting requirements:
So yes, I think that canonicalisation is needed. One could already use
value={Intl.supportedValuesOf('timeZone').find(TimeZone.prototype.equals.bind(TimeZone.from(database.getTimezoneId())))}
but that's cumbersome. I'd really prefer to have a method/getter for this on the
value={TimeZone.from(database.getTimezoneId()).canonicalId} // or .canonical().id
Of course, changing canonical IDs is not good. Exposing Is that what the statement "exposing canonical identifiers has been a source of grief in every software platform" is about? I would nonetheless disagree that this kind of API should be deferred to a later proposal. People will need, and find, ugly workarounds to achieve canonicalisation.
I could not agree more. Merging timezones requires a prediction that the two time zones will never diverge. This should not be based only on the fact that they haven't diverged so far - predictions are difficult, especially so when concerning the future. This is where geopolitics come into play, not a favourite topic in technical discussions. But with isolationism on the rise, it's unfortunately not an unlikely scenario - e.g. when the European Union discussed their plans to abandon summer time, various members have signalled that they would adopt separate standard times not CET. Edit: Ah, it seems that is solved by the rule a) for "primary time zone identifier" that is proposed in tc39/ecma402#806. I hope this lands asap and does not wait for Temporal. |
Thanks for your thoughtful comment!
The core issue here is that Also, although it's less ergonomic than using
const ids = Intl.supportedValuesOf('timeZone');
const timeZoneFromDatabase = Temporal.TimeZone.from(database.getTimezoneId());
const value = timeZoneFromDatabase && ids.find(id => timeZoneFromDatabase.equals(id));
if (timeZoneFromDatabase && !value) {
// handle the "unknown time zone" case
}
<Picker
value={ value }
defaultValue={ timeZoneFromDatabase ?? Temporal.Now.timeZoneId() }
data={ ids }
onChange={ database.putTimezoneId }
/>
Yes, although I probably should have been clearer that the biggest problem is auto-canonicalization, where an input ID is canonicalized before storing it. So you can use Asia/Calcutta as input and get Asia/Kolkata back, or vice versa. This is really problematic because it means that the behavior of working code will change based on updates to the IANA Time Zone Database. It's especially bad for Java. Unlike in (before this proposal) ECMAScript where engines control which IANA Time Zone Database data is used, in Java users can update TZDB on their own. This means that if TZDB makes a controversial canonicalization decision (like resolving Simply exposing canonicalization is not as bad as if it's always done, but it can still cause problems where users think that
AFAIK, current ECMAScript engines only allow one ID per country, with the one exception of |
Oh I totally agree that
I think it is bad from a performance perspective if I have to iterate through the array of canonical identifiers. (I don't mind readability, code can always be written in different ways to suit different preferences). Sure, the array is not super large, but it is still inefficient, and I think it's a shame to have to do this when the runtime already has the mapping available internally. That is what I take issue with. Similarly, it would be great to have But back to ergonomics and making the correct usage easier to use than the broken one. Having a Some other ideas that would make
Btw, I just noticed custom timezones. To make |
While reviewing bug reports on existing ECMAScript canonicalization behavior, I ran across a problematic case: when a new TZDB identifier is introduced, inevitably some environments will get it before others, and the slower-to-update environments will throw an exception when provided with that new ID. @Yqwed in #5 also raised this case.
When environments are evergreen browsers, then the delay to get new IDs (and their accompanying Zone rules) is relatively short. But other environments could take months or years to update. For example:
How can we help make this problem less bad?
When the new ID is present because of a name change like Kiev=>Kyiv, then there is something we could recommend in the spec: that implementations first introduce the identifier without making it canonical (so that it will be recognized if others send it, but because non-canonical it won't be sent to others), wait a bit (maybe a few months?), and then make the new name canonical?
This would have a weird side effect though, because currently (at least once we fix the 13 outdated canonical identifiers in V8 and WebKit) non-canonical identifiers never become canonical; instead, the transitions only go one way. Is it OK to break that regularly? And even if yes, is it good to delay canonicalizing new renames?
On the other hand, when it's a brand-new identifier representing a new Zone, I'm not sure there's much the spec can do, other than perhaps adding additional metadata to time zones, e.g. an addedDate property to Temporal.TimeZone which could allow applications to decide to hide newly-introduced identifiers from dropdown time zone choosers in UIs. This would add a maintenance burden because this added-date data is not available in TZDB. And it'd complicate the Temporal API. So I'm on the fence about whether the value of such an additional metadata is worth it.
I have also been thinking about adding a Temporal.TimeZone.p.metadata getter which initially could return an object, e.g. { version: '2022g' } and could be extended with additional metadata later like an added date if there's enough customer demand. See #7 for discussion.
Another thing we could do would be to encourage implementations to dynamically update to the latest TZDB more quickly (e.g. dynamically on install instead of being bundled into the distribution) but this would vary behavior for the same version of a browser, so I'm not sure that this would be good.
Regardless, one thing we should do is to encourage (e.g. in MDN docs) ECMAScript developers to anticipate this "unknown but not invalid identifier" case and to defend against it, using clearer error messages when an identifier is not recognized.
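A minimal sketch of that defensive pattern, using only the existing `Intl.DateTimeFormat` constructor (which throws a `RangeError` for a time zone it does not recognize). The `checkTimeZoneId` name and the wording of the message are just illustrative choices.

```javascript
// Check whether this environment recognizes a time zone ID, and produce a
// message that distinguishes "unknown here" from "definitely invalid".
function checkTimeZoneId(id) {
  try {
    // Constructing a formatter validates the ID against the engine's data.
    new Intl.DateTimeFormat('en', { timeZone: id });
    return { ok: true, id };
  } catch (err) {
    if (err instanceof RangeError) {
      return {
        ok: false,
        id,
        message:
          `Time zone "${id}" is not recognized by this environment. ` +
          'It may be misspelled, or it may be newer than the time zone data available here.',
      };
    }
    throw err; // anything else is unexpected, so do not hide it
  }
}

const result = checkTimeZoneId('Mars/Olympus_Mons');
if (!result.ok) console.warn(result.message);
```

A caller can then decide what to do with an unrecognized ID (fall back to UTC, ask the user, or keep the stored ID and retry later) instead of surfacing a bare exception.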
Feedback encouraged on this issue!