Point to ISO standard for Date string syntax #198

Closed
Ms2ger opened this issue Oct 22, 2019 · 28 comments · Fixed by #404

@Ms2ger
Collaborator

Ms2ger commented Oct 22, 2019

While it's more doable to define this format right here for Date as opposed to Duration, should we point this to ISO 8601/RFC 3339 for the sake of consistency?

Originally posted by @ryzokuken in https://github.com/tc39/proposal-temporal/pull/194/files

@ryzokuken ryzokuken added this to the Stage 3 milestone Oct 28, 2019
@ryzokuken ryzokuken added the spec-text Specification text involved label Oct 28, 2019
@pipobscure
Collaborator

pipobscure commented Oct 29, 2019

We support ISO-8601 with the following stipulations:

  • Only the Calendar-Date format is supported for dates/date-times
  • We use the ECMA-Script Expanded Years
  • Only the simple Duration format is supported (other would require special agreement)
  • In durations only seconds may have fractional parts.
  • The timezone designator may be extended by [<IANA>] zones to properly designate timezones
  • Dates/Times/TimeZones may occur individually or in combination
  • We agree to nanosecond precision meaning seconds may have 0, 3, 6 or 9 decimal places

@littledan
Member

BTW I'd suggest not pointing to outdated, historical JavaScript standards, and instead pointing to the current one: Expanded years.

Why do we support these in Temporal?

@pipobscure
Collaborator

Because we really like dates to be possible beyond the year 9999 ?
Because the other option is to come up with a different standard; and we need something; and by the year 999999 (at the rate we are going) there won't be any humans to program computers, so it's a perfectly valid limit. 😄

@littledan
Member

OK, so Temporal will support even more than Date does in that linked section: The full six digit limit, not just the things in Date.parse range. Is that right? I guess we'll have to write a new definition or reword the existing one, if we want to go for this flexibility (as opposed to just sticking with the four digit limit).

@kaizhu256
Contributor

can you give javascript-scenarios where we need [utc-accurate] years beyond +/-9999? because I honestly cannot think of one.

unless there's notable non-gregorian calendars requiring 4+ digit years in Intl, I prefer limiting the scope to 4-digits, to avoid programming-bugs in the common-scenario of sorting ISOStrings in JavaScript (and wasm-sqlite3).
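
A small sketch of the sorting concern raised above. The sample strings are hypothetical; it only shows that a plain code-unit sort stops matching chronological order once signed six-digit years are mixed with four-digit years.

```javascript
// Hypothetical sample data, already listed in chronological order.
const chronological = [
  "-000001-01-01", // expanded year, before year 0
  "0999-12-31",    // four-digit year, left-padded with a zero
  "9999-12-31",    // last four-digit year
  "+010000-01-01", // expanded year, after 9999
];

// Array.prototype.sort without a comparator orders strings by UTF-16 code unit.
// "+" (0x2B) and "-" (0x2D) both sort before the digits "0".."9".
const lexicographic = [...chronological].sort();

console.log(lexicographic);
// The "+010000" entry moves to the front even though it is the latest date.

// With four-digit years only, the same sort stays chronological.
const fourDigitOnly = ["9999-12-31", "0999-12-31", "1000-01-01"].sort();
console.log(fourDigitOnly);
```

Sorting mixed-width strings correctly would need a comparator that parses the year rather than comparing raw text.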

@pipobscure
Collaborator

It doesn’t just go for dates > 9999-12-31 but also for dates earlier than 1000-01-01, because in that case the pattern changes as well.

+000999-12-31 would be the day before. So the simple case would be if you are dealing with history.

@pipobscure
Collaborator

@littledan given that the Date range is +-1/4 million years (roughly) I’d be fine sticking to that range. That way we maintain compatibility and given that both people of the past and future (beyond that range) will likely not have a calendar based on the supposed birth of a deity the usefulness is in doubt. Especially since going outside that range generally involves other changes as well.

@ljharb
Member

ljharb commented Oct 30, 2019

The Morlocks live in the year 802,701 :-p

@littledan
Member

I'd imagine that we would continue to support bigger values in the constructor and from methods, and that we're talking narrowly about the grammar here, right? With everything on this thread, I am increasingly convinced we should not support extended years. OTOH the rest of the stipulations SGTM, e.g., nanosecond precision and timezone syntax seem pretty mandatory.

@kaizhu256
Contributor

kaizhu256 commented Oct 31, 2019

The Morlocks live in the year 802,701 :-p

and the Earth is (approx) 4 billion years old, but who actually cares about [Temporal] utc-accuracy of these dates or anything past +/-9999? most scenarios for such timescales only care about precision to +/-1 year. u don't need overengineered Temporals for that -- basic math-arithmetic is usually good-enough and more cost-effective.

as for years with 3 or less digits, the correct way is to leftpad zeros when in ISOstring-form to coerce it to 4-digits for common-case string sorting/comparing.

@pipobscure
Collaborator

@littledan only if we accept that we can produce datetimes/absolutes that cannot be serialized or deserialized. (Which in my mind is a no-go)

@ljharb
Member

ljharb commented Oct 31, 2019

@kaizhu256 i think there’s a number of geological websites and astronomical websites that would love to be able to represent eons instead of just a handful of millennia.

Why would we want to artificially restrict ourselves?

@littledan
Member

@pipobscure Interesting; why is this a requirement? Don't we have the same issue with a six-digit limit?

@littledan
Member

A separate aspect is the variation in punctuation we allow. I believe @gibson042 investigated this in some depth for Date.parse, and now #229 permits a bit more variation. What do we want to permit in Temporal exactly? (I'd suggest we think this through before landing #229.)

@pipobscure
Collaborator

@littledan the space instead of T thing is an addition of RFC3339 which bases itself on ISO8601. So by just allowing that, we have much better compatibility. I figured that would be worth it, since for the most part RFC3339 just specifies an ISO8601 profile like ECMA does for JS. Both are frequent use-cases.

@littledan
Member

Cool, I'm not opposed to that particular change, but I just wanted to raise this because @gibson042 's presentation included several other syntactic variants and I don't know whether we want to include those.

@gibson042
Collaborator

If Temporal admits even one alternate spelling of a value with identical precision, then it should admit all standardized alternate spellings from ISO 8601. This includes arbitrary-case alphabetic designators and . or , as decimal sign, such that e.g. 1955-11-13T06:04:00.9Z is equivalent to 1955-11-13t06:04:00,9z (and also 1955-W45-7t06:04:00,9Z, if Temporal includes deserialization of week dates).
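
To make the equivalence concrete, here is a minimal sketch of a pre-parse normalization step. `normalizeDesignators` is a hypothetical helper, not part of any Temporal API, and it assumes the input carries no bracketed IANA zone name (upper-casing would corrupt one).

```javascript
// Hypothetical helper: folds the alternate spellings mentioned above into one
// canonical spelling before parsing.
// Assumption: the string contains no "[Area/City]" suffix and at most one
// decimal sign.
function normalizeDesignators(isoString) {
  return isoString
    .toUpperCase()       // t -> T, z -> Z, w -> W
    .replace(",", ".");  // comma decimal sign -> period (first occurrence only)
}

console.log(normalizeDesignators("1955-11-13t06:04:00,9z"));
console.log(normalizeDesignators("1955-W45-7t06:04:00,9Z"));
```

Both calls fold the designators to upper case and the decimal sign to a period; whether the week-date form is then accepted is a separate decision.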

@littledan
Member

@gibson042 Do you have any thoughts about the plan above to allow six digit years?

@gibson042
Collaborator

My priorities are something like this:

  1. Use the same range for all overlapping Temporal types (i.e., Absolute, Date, DateTime, and YearMonth).
  2. Don't exceed the bounds of the existing ECMAScript date-time string interchange format (which specifies six digits for expanded years).
  3. Align with the existing ECMAScript Date range of POSIX epoch ± 1e8 days, +275760-09-13T00:00Z to -271821-04-20T00:00Z (TODO: Define min/max values #24 (comment) ).

I hold the third priority only weakly and would be willing to let it go, but not without an explicit decision regarding the resulting edge cases such as Temporal.Absolute.from("+999999-12-31T18:00Z").inTimeZone("+10:00") (a date and time of day in year 1000000) and Temporal.DateTime.from("+999999-12-31T18:00").inTimeZone("-08:00") (an instant in year 1000000)—my preference for both is throwing a RangeError.
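
For reference, the existing Date range from the third priority can be checked with the legacy `Date` object alone. This sketch uses only `Date` and does not touch Temporal; the constant names are illustrative.

```javascript
// The legacy Date range is the POSIX epoch plus or minus 1e8 days.
const MS_PER_DAY = 86_400_000;
const MAX_MS = 1e8 * MS_PER_DAY; // 8.64e15 milliseconds

// Both limits serialize with six-digit expanded years.
const max = new Date(MAX_MS).toISOString();
const min = new Date(-MAX_MS).toISOString();
console.log(max); // "+275760-09-13T00:00:00.000Z"
console.log(min); // "-271821-04-20T00:00:00.000Z"

// One millisecond past the limit yields an invalid Date (time value NaN).
const beyond = new Date(MAX_MS + 1);
console.log(Number.isNaN(beyond.getTime())); // true
```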

@ptomato
Collaborator

ptomato commented Jan 20, 2020

I've been looking into this a bit, pursuant to #312.

We support ISO-8601 with the following stipulations:

  • Only the Calendar-Date format is supported for dates/date-times

What's the motivation not to support weekdates and ordinal dates (2020-W04-1 and 2020-020 respectively)? Weekdates could be argued to add complexity since there is a bit of calculation required around when week 1 starts, but ordinal dates would seem fairly trivial to support.

  • Only the simple Duration format is supported (other would require special agreement)

Agreed; was there any consensus on supporting P7W for "seven weeks" as it does seem to be part of the "simple" format?

  • In durations only seconds may have fractional parts.
  • The timezone designator may be extended by [<IANA>] zones to properly designate timezones
  • Dates/Times/TimeZones may occur individually or in combination

I think this rule introduces some complications, which we are talking about in #313 — allowing time zones by themselves would make Z and -08:00[America/Vancouver] legal ISO 8601 strings, which seems surprising for a number of people. The ISO 8601 grammar in RFC 3339 treats a time zone as an optional addition to a time, and so does the standard itself as far as I can tell from its description in Wikipedia. I think we should stick to the standard here and speak of date representations, time representations (which may contain a time zone), and combined date/time representations, not allowing time zones by themselves. We can allow lone time zones separately in TimeZone.from() but it seems to me that we should not call them legal ISO strings.

  • We agree to nanosecond precision meaning seconds may have 0, 3, 6 or 9 decimal places

I think this should read "0 through 9" so that 0.5 means 500 ms, it would be surprising if you were required to input 0.500. Would we make any further decimal places beyond 9 illegal or simply truncate them?

@ptomato
Collaborator

ptomato commented Jan 27, 2020

Meeting Jan. 27: We will not support weekdates, ordinal dates, or week durations at this time. We will also not support time zone parts in an ISO string without an accompanying time or datetime. For serializing, we'll output only 0, 3, 6, or 9 decimal places, but we'll accept 0 through 9 when parsing. For more than 9 we'll throw.

That makes the updated list of stipulations:

  • Only the Calendar-Date format is supported for dates/date-times
  • We use the ECMA-Script Expanded Years
  • Only the simple Duration format is supported (other would require special agreement) with the exception of week units
  • In durations only seconds may have fractional parts.
  • The timezone designator may be extended by [<IANA>] zones to properly designate timezones
  • Dates and Times may occur individually or in combination
  • We agree to nanosecond precision. Seconds may have 0, 3, 6 or 9 decimal places in serialized strings, and 0 through 9 decimal places are accepted when parsing.
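
The asymmetry in the last stipulation (accept 0 through 9 digits, emit only 0, 3, 6, or 9) can be sketched as follows. `parseFraction` and `formatFraction` are hypothetical names for illustration, not functions from the proposal or its polyfill.

```javascript
// Hypothetical helpers illustrating the parse/serialize rule above.

// Accepts the digits after the decimal sign (0 through 9 of them) and
// returns the value in nanoseconds. More than 9 digits throws.
function parseFraction(digits) {
  if (!/^\d{0,9}$/.test(digits)) {
    throw new RangeError(`invalid fractional seconds: ${digits}`);
  }
  return Number(digits.padEnd(9, "0"));
}

// Emits the shortest of 0, 3, 6, or 9 decimal places that loses no precision.
// Assumes 0 <= nanoseconds < 1e9.
function formatFraction(nanoseconds) {
  if (nanoseconds === 0) return "";
  const padded = String(nanoseconds).padStart(9, "0");
  if (nanoseconds % 1_000_000 === 0) return "." + padded.slice(0, 3);
  if (nanoseconds % 1_000 === 0) return "." + padded.slice(0, 6);
  return "." + padded;
}

console.log(formatFraction(parseFraction("5")));      // ".500"
console.log(formatFraction(parseFraction("000001"))); // ".000001"
```

So an input of `0.5` round-trips as `0.500`, which is the behavior the meeting notes describe.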

@kaizhu256
Contributor

kaizhu256 commented Jan 27, 2020

Seconds may have 0, 3, 6 or 9 decimal places in serialized strings

that makes sorting isostrings problematic as pointed out in issue #329. can the user specify a truncation/padding length when serializing?

i honestly see little value from microsecond/nanosecond precision (for all the trouble it creates). problems requiring that level of precision are generally out-of-scope of this proposal, and don't care about timespans >24h or calendar dates.

@ptomato
Collaborator

ptomato commented Feb 3, 2020

that makes sorting isostrings problematic as pointed out in issue #329. can the user specify a truncation/padding length when serializing?

I guess that's a question for #329...

@ptomato
Collaborator

ptomato commented Feb 3, 2020

The deeper I look into this, the more stipulations I find that we have to add... here is my current list.

  • Only the calendar date format is supported, not the weekdate or ordinal date format.
  • Two-digit years are disallowed.
  • Expanded Years are allowed.
    • Open question: range is 0±999999 years, or epoch±1e8 days?
  • Omitting the year in a date representation, when the month is present, is allowed
    • This is because ISO 8601 allowed --02-28 or --0228 for month-day expressions but it was removed from the 2004 version of the standard. I think we should allow this, and also allow 02-28 (but not 0228 because that's ambiguous with January 1, 228 CE), but only when expecting a Temporal.MonthDay.
  • Fractional parts may have 1 through 9 decimal places.
  • Only seconds are allowed to have a fractional part.
  • The time zone, if given as a UTC offset, may be suffixed by an IANA time zone name in square brackets.
  • A space may be used to separate the date and time in a combined date / time representation, but not in a duration.
    • as suggested in Richard's comment above
  • Alphabetic designators may be in lower or upper case.
    • as suggested in Richard's comment above
  • Period or comma may be used as the decimal separator.
    • as suggested in Richard's comment above
  • Anything else described by the standard as requiring mutual agreement between communicating parties, is disallowed.

By the way I'm also aware of the following differences between RFC 3339 and ISO 8601:

  • Mixtures of basic (no punctuation) and extended (with punctuation) expressions are not permitted in ISO 8601, but they are permitted by the grammar in RFC 3339.
    • Following RFC 3339 would be the most flexible way, but then we'd have to accept a lot of things that look confusing and not much like dates... e.g. 3446-0508T03:2815-0630 meaning 3:28:15 AM, May 8, 3446, in a time zone that's 6:30 before UTC
  • The fractional part of a second is required to be preceded by "00" in ISO 8601, but not in the grammar in RFC 3339.
    • I recommend we stick to ISO 8601 here because 17:45.10 looks either like a typo for 17:45:10, or as a fractional part of a minute.
  • A time zone offset of "-00:00" is allowed in RFC 3339, but not in ISO 8601.
    • I recommend we stick to RFC 3339 here since that would be the most flexible.

@gibson042
Collaborator

  • Mixtures of basic (no punctuation) and extended (with punctuation) expressions are not permitted in ISO 8601, but they are permitted by the grammar in RFC 3339.

Are you sure that ISO 8601 prohibits mixtures? The authors of RFC 3339 weren't ("ISO 8601 is not clear if mixtures of basic and extended format are permissible. This [attempt to create a formal grammar from ISO 8601] permits mixtures.").

Following RFC 3339 would be the most flexible way, but then we'd have to accept a lot of things that look confusing and not much like dates... e.g. 3446-0508T03:2815-0630 meaning 3:28:15 AM, May 8, 3446, in a time zone that's 6:30 before UTC

I don't think it's quite that bad. Per ISO 8601, the basic format for complete calendar date has no punctuation while the extended format has two mandatory dashes, the basic format for complete time of day has no separating punctuation while the extended format has two mandatory colons, and the basic format for complete UTC offset has no separating punctuation while the extended format has one mandatory colon. So the worst case for Temporal parsing is more like 34460508T03:28:15-0630 (which is admittedly still pretty bad).

The fractional part of a second is required to be preceded by "00" in ISO 8601, but not in the grammar in RFC 3339.

I don't know what you mean if not ISO 8601 "a decimal fraction of hour, minute or second may be included", which is already covered by your "Only seconds are allowed to have a fractional part" bullet point.

A time zone offset of "-00:00" is allowed in RFC 3339, but not in ISO 8601.

  • I recommend we stick to RFC 3339 here since that would be the most flexible.

Agreed.

@ptomato
Collaborator

ptomato commented Feb 6, 2020

Are you sure that ISO 8601 prohibits mixtures? The authors of RFC 3339 weren't

They later released an erratum clarifying that ISO 8601 does prohibit mixtures.

I don't think it's quite that bad. Per ISO 8601, the basic format for complete calendar date has no punctuation while the extended format has two mandatory dashes, the basic format for complete time of day has no separating punctuation while the extended format has two mandatory colons, and the basic format for complete UTC offset has no separating punctuation while the extended format has one mandatory colon. So the worst case for Temporal parsing is more like 34460508T03:28:15-0630 (which is admittedly still pretty bad).

What I meant was, the RFC 3339 grammar does permit each punctuation mark to be present or absent individually, so a mess like 3446-0508T03:2815-0630 could indeed be generated from that grammar.

I don't know what you mean if not ISO 8601 "a decimal fraction of hour, minute or second may be included", which is already covered by your "Only seconds are allowed to have a fractional part" bullet point.

Sorry, I'll try to put it in a different way; RFC 3339 would allow 17:45.22 for "quarter to 6 plus 220 milliseconds", whereas ISO 8601 would require 17:45:00.22. The former would be ambiguous except that RFC 3339 also doesn't allow fractional parts elsewhere than seconds. Certainly we should never emit 17:45.22 but should we accept it? My feeling is no, because it looks like it could be a typo for 17:45:22.

@gibson042
Collaborator

Are you sure that ISO 8601 prohibits mixtures? The authors of RFC 3339 weren't

They later released an erratum clarifying that ISO 8601 does prohibit mixtures.

👍

What I meant was, the RFC 3339 grammar does permit each punctuation mark to be present or absent individually, so a mess like 3446-0508T03:2815-0630 could indeed be generated from that grammar.

Right, but only in the Appendix A attempted formal grammar for ISO 8601. I think that should be considered too loose for Temporal, which should at least require that every date, time, and UTC offset are either completely basic or completely extended. It could also go further and reject strings that mix basic and extended format across date, time, and UTC offset, but I'm not sure if it should (I suspect that extended format date and time of day in combination with basic format offset, as in "2020-02-14T22:09-0500", are not that uncommon).
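
A rough sketch of that per-component rule, under simplifying assumptions: four-digit years only, no fractional seconds, and offsets always given with minutes. The pattern names and `isComponentwiseConsistent` are hypothetical and are not the grammar that was eventually specified.

```javascript
// Each component must be entirely extended (with punctuation) or entirely
// basic (without). Mixing across components is still allowed here.
const DATE = /^(?:\d{4}-\d{2}-\d{2}|\d{8})$/;
const TIME = /^(?:\d{2}:\d{2}(?::\d{2})?|\d{4}(?:\d{2})?)$/;
const OFFSET = /^[+-](?:\d{2}:\d{2}|\d{4})$/;

// Hypothetical check over already-split date, time, and offset parts.
function isComponentwiseConsistent(date, time, offset) {
  return DATE.test(date) && TIME.test(time) && OFFSET.test(offset);
}

// Extended date and time with a basic offset: accepted.
console.log(isComponentwiseConsistent("2020-02-14", "22:09", "-0500")); // true
// Basic date with extended time and basic offset: accepted.
console.log(isComponentwiseConsistent("34460508", "03:28:15", "-0630")); // true
// Punctuation present or absent individually within a component: rejected.
console.log(isComponentwiseConsistent("3446-0508", "03:2815", "-0630")); // false
```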

Sorry, I'll try to put it in a different way; RFC 3339 would allow 17:45.22 for "quarter to 6 plus 220 milliseconds", whereas ISO 8601 would require 17:45:00.22. The former would be ambiguous except that RFC 3339 also doesn't allow fractional parts elsewhere than seconds. Certainly we should never emit 17:45.22 but should we accept it? My feeling is no, because it looks like it could be a typo for 17:45:22.

I'm still having a hard time following you. The RFC 3339 Internet Date/Time Format requires seconds and permits fractions only after seconds, so "…T17:45.22" is not valid. The RFC 3339 ISO 8601 formal grammar permits fractions after hours, minutes, or seconds, but doesn't specify their semantics (so "…T17:45.22" is valid but presumably interpreted per ISO 8601). And ISO 8601 permits decimal fractions after hours, minutes, or seconds (so "…T17:45.22" is valid and interpreted as "quarter to 6 plus 0.22 minutes", equivalent to "17:45:13.2"). There's no ambiguity that I can see, only a decision about whether or not to accept the fractional hours or minutes that are permitted by ISO 8601 but not by RFC 3339. Personally, I would place them in the same "advanced usage" bucket as ordinal dates and week dates, and either accept them all with ISO 8601 semantics, or reject them all as too deviant from the time elements appearing in RFC 3339.
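
The arithmetic behind the ISO 8601 reading can be worked through in a few lines. `fractionalMinutesToTime` is a hypothetical helper for illustration; it rounds to whole milliseconds and always prints three decimal places, which is an assumption of this sketch rather than anything decided in the thread.

```javascript
// Hypothetical helper: interprets "hh:mm.fraction" with ISO 8601 semantics,
// where the fraction is a decimal fraction of a minute.
function fractionalMinutesToTime(hours, minutes, fractionDigits) {
  // Integer arithmetic avoids floating-point error: 22 * 60000 / 100 = 13200 ms.
  const scale = 10 ** fractionDigits.length;
  const totalMs = Math.round((Number(fractionDigits) * 60_000) / scale);
  const seconds = Math.floor(totalMs / 1000);
  const ms = totalMs % 1000;
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)}.${pad(ms, 3)}`;
}

// 0.22 minutes is 13.2 seconds, not 220 milliseconds.
console.log(fractionalMinutesToTime(17, 45, "22")); // "17:45:13.200"
```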

@ptomato
Collaborator

ptomato commented Feb 19, 2020

I think that should be considered too loose for Temporal, which should at least require that every date, time, and UTC offset are either completely basic or completely extended.

👍

Personally, I would place them in the same "advanced usage" bucket as ordinal dates and week dates, and either accept them all with ISO 8601 semantics, or reject them all as too deviant from the time elements appearing in RFC 3339.

OK, I get you now, I was reading RFC 3339 incorrectly. I was assuming that because RFC 3339 only permits seconds fractions, that implied the semantics of T17:45.22 would be "quarter to 6 plus 0 seconds and 220 ms", but on second reading I think you're right that no particular semantics are implied by the grammar. I think we should stick to our earlier determination that only seconds fractions are allowed and everything else is too advanced.

@ptomato ptomato self-assigned this Feb 25, 2020
ryzokuken pushed a commit that referenced this issue May 6, 2020
We specify our own grammar because what we have decided to accept is
different enough from both ISO 8601 and RFC 3339 that we cannot really
refer directly to either one. A list of where the differences are is
included in the grammar section of the spec.

Closes: #312
Closes: #198
Closes: #71
8 participants