Should all the parsing routines only validate the syntax? #1901
Comments
One specific reason I ask is that I am currently trying to refactor the parsing part of the implementation into a lower-level subcomponent. The fact that the parsing routines need to validate value ranges by calling IsValidISODate(), IsValidTime() and IsBuiltinCalendar() seems misaligned with our previous discussion, and it also makes it hard to separate them into lower-level routines that only PARSE.
Custom calendars and time zones are explicitly included in Temporal's scope. So we should plan for the case where:
Parsing implementations should be layered to support the cases above. On the other hand, ISO syntax (other than annotations in brackets) is carefully specified and is not open to customization in userland. So there's no inherent need to split ISO-date/ISO-time validation from low-level parsing, because there are no Temporal use cases that expose syntactically-valid but semantically-invalid ISO strings to callers. That said, one useful thing that implementations could do would be to emit error messages that highlight which part of an ISO string is syntactically valid but semantically invalid. A RangeError message of "Invalid ISO string" is much less useful than "Invalid time: 25:00" or "Invalid date: 2021-02-29". If the current split in the spec is helpful in providing these distinct error messages, then IMHO it should stay. I don't think we need to bend over backwards to, for example, tell the user that "2020-01-01T23:599" has an extra digit, but if it's easy to provide better error messages in common cases, then it seems worthwhile.
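To make the error-message idea concrete, here is a minimal sketch of a two-step parser: a regular expression checks only the shape of the string, and a separate range check reports which part is out of range. The function names, the simplified grammar (no offsets, fractions, or annotations), and the message wording are all assumptions for illustration; none of this is taken from the spec text or from any implementation.

```javascript
// Hypothetical, simplified grammar: YYYY-MM-DDTHH:MM[:SS] only.
const ISO_DATE_TIME = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})(?::(\d{2}))?$/;

// Number of days in a month of the proleptic Gregorian calendar.
function daysInMonth(year, month) {
  const leap = (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
  return [31, leap ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31][month - 1];
}

function parseISODateTime(text) {
  // Step 1: shape only. Anything that fails here gets the generic message.
  const m = ISO_DATE_TIME.exec(text);
  if (m === null) throw new RangeError(`Invalid ISO string: ${text}`);

  const [, y, mo, d, h, mi, s = '00'] = m;
  const [year, month, day, hour, minute, second] = [y, mo, d, h, mi, s].map(Number);

  // Step 2: range checks, each with a message naming the offending part.
  if (month < 1 || month > 12 || day < 1 || day > daysInMonth(year, month)) {
    throw new RangeError(`Invalid date: ${y}-${mo}-${d}`);
  }
  if (hour > 23 || minute > 59 || second > 59) {
    throw new RangeError(`Invalid time: ${h}:${mi}${m[6] ? `:${m[6]}` : ''}`);
  }
  return { year, month, day, hour, minute, second };
}
```

With this sketch, '2021-02-29T10:00' is rejected with "Invalid date: 2021-02-29" and '2020-01-01T25:00' with "Invalid time: 25:00", while '2020-01-01T23:599' fails the shape check and only gets the generic "Invalid ISO string" message.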
I think the latest on this was #1293 (comment), fixed in #1419. IIRC, the conclusion was that in v2 we could add a calendar or time zone registry to help with the parsing. |
@sffc I don't think this comment answers my question. That comment seems to be about types that do use the annotations. What about types that don't? FWIW, I can guess that there's a non-trivial perf advantage for implementations that never use calendars or time zones to be able to avoid initializing and allocating RAM for the part of their code that deals with calendars or time zones. Specifically, I suspect there's a lot of low-level code that will use Instant and Duration (for timestamps, profiling, timing, etc.) without ever dealing with calendars or time zones at all. Might be nice to avoid the lookups completely for those cases. |
The new syntax that @ryzokuken is working on defines ISO-8601 plus extensions, which are enclosed in |
But if the parsing routines should only validate "syntax", then they should not check the day range in ParseISODateTime either, right?
Discussed in the champions meeting of 2021-11-04. There are two questions here: a long-term one dealing with what comes out of the discussions with the IETF as valid syntax in between brackets. It looks like that might turn out to be "anything, so long as the brackets are balanced", but currently in the spec text there is a more restrictive grammar of what is allowed as an annotation. So the short-term question is the one that Frank asked.
The consensus for the short-term question is that we should not validate that the time zone or calendar is available in the system when parsing. So, for example, since Instant does not have a calendar, this operation should succeed: Temporal.Instant.from('2021-11-04T17:34Z[u-ca=aaaaaaaa]') (but the same string passed to
Regarding the IsValidISODate and IsValidTime calls which Frank asked about in the OP, ISO 8601 includes some context-dependent rules on what constitutes a syntactically valid string, which those operations enforce, since they cannot be enforced in the grammar. We did agree, though, to check and make sure that these context-dependent rules are reflected correctly in the parsing operations.
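The short-term consensus above (parse annotations syntactically, and check availability only where the value is actually used) can be sketched roughly as follows. This is an illustration under stated assumptions: the annotation grammar is heavily simplified, and `parseAnnotations`, `instantFrom`, `plainDateFrom`, and `KNOWN_CALENDARS` are hypothetical names standing in for the spec's operations, not copies of them.

```javascript
// Assumed, simplified annotation grammar: [key=value] or [timeZoneName].
const ANNOTATION = /\[(?:([a-z][a-z0-9-]*)=)?([^\[\]=]+)\]/g;

// Syntax-only step: splits off the annotations and never consults
// any registry of calendars or time zones.
function parseAnnotations(text) {
  const start = text.indexOf('[');
  const body = start === -1 ? text : text.slice(0, start);
  const tail = start === -1 ? '' : text.slice(start);
  const result = { body, timeZone: undefined, calendar: undefined };

  let consumed = 0;
  for (const m of tail.matchAll(ANNOTATION)) {
    if (m.index !== consumed) break; // gap between annotations: malformed
    consumed += m[0].length;
    const [, key, value] = m;
    if (key === undefined) {
      if (result.timeZone === undefined) result.timeZone = value;
    } else if (key === 'u-ca') {
      if (result.calendar === undefined) result.calendar = value;
    }
  }
  if (consumed !== tail.length) {
    throw new RangeError(`Invalid annotation syntax: ${tail}`);
  }
  return result;
}

// Hypothetical stand-in for an availability check such as IsBuiltinCalendar.
const KNOWN_CALENDARS = new Set(['iso8601', 'gregory']);

// A consumer with no calendar never looks the calendar name up.
function instantFrom(text) {
  return parseAnnotations(text).body;
}

// A hypothetical calendar-aware consumer validates availability itself,
// after parsing has already succeeded.
function plainDateFrom(text) {
  const { body, calendar = 'iso8601' } = parseAnnotations(text);
  if (!KNOWN_CALENDARS.has(calendar)) {
    throw new RangeError(`Unknown calendar: ${calendar}`);
  }
  return { body, calendar };
}
```

In this sketch `instantFrom('2021-11-04T17:34Z[u-ca=aaaaaaaa]')` returns the body unchanged, because nothing in that path asks whether the calendar exists, while the calendar-aware consumer rejects the same string after the syntax-only step has accepted it.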
My question is: should Temporal.PlainTime.from('2021-02-31T17:34Z') succeed since |
I need to check with ISO 8601, but if I remember correctly, a string with a date of 2021-02-31 would be a syntactically invalid string. |
I didn't see anything in ISO 8601 (at least the 2004 edition) explicitly about this. But there is a section 3.2.1 The Gregorian calendar which lists the number of days in each month, so there's at least a strong implication that invalid day numbers are disallowed.
What does RFC 3339 say about that? |
RFC 3339 is very clear about 2021-02-31 and 25:00 being syntactically invalid: https://datatracker.ietf.org/doc/html/rfc3339#section-5.6
ISO 8601:2019 is slightly less clear, but still does include text limiting e.g. calendar day of month "through ‘28’, ‘29’, ‘30’ or ‘31’ (depending on the month)" and calendar day of year "through ‘365’ (common year) or ‘366’ (leap year)" and clock second "to ‘58’, ‘59’ or ‘60’, identifying the last second of a clock minute (‘58’ with a negative leap second, ‘59’ without a leap second, ‘60’ with a leap second)", and explicitly stating that "‘24’ shall not be used to represent hour in accordance with this document". |
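For reference, the context-dependent limits quoted above can be written down as small predicates. This is only a sketch of the rules as quoted: the function names and the `allowLeapSecond` option are made up for illustration, and the code does not check whether a leap second (positive or negative) really occurred in a given minute.

```javascript
// Gregorian leap-year rule.
function isLeapYear(year) {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

// Calendar day of month: 1 through 28, 29, 30 or 31, depending on the month.
function isValidCalendarDate(year, month, day) {
  if (!Number.isInteger(month) || month < 1 || month > 12) return false;
  const lengths = [31, isLeapYear(year) ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
  return Number.isInteger(day) && day >= 1 && day <= lengths[month - 1];
}

// Calendar day of year: 1 through 365 (common year) or 366 (leap year).
function isValidOrdinalDate(year, dayOfYear) {
  const max = isLeapYear(year) ? 366 : 365;
  return Number.isInteger(dayOfYear) && dayOfYear >= 1 && dayOfYear <= max;
}

// Clock time: hour 24 is never accepted; second 60 is accepted only when
// the caller opts in (hypothetical option, not from any spec).
function isValidClockTime(hour, minute, second, { allowLeapSecond = false } = {}) {
  const maxSecond = allowLeapSecond ? 60 : 59;
  return (
    hour >= 0 && hour <= 23 &&
    minute >= 0 && minute <= 59 &&
    second >= 0 && second <= maxSecond
  );
}
```

For example, `isValidCalendarDate(2021, 2, 31)` and `isValidClockTime(25, 0, 0)` are both false here, matching the two cases that RFC 3339 rules out.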
@FrankYFTang I think this issue includes #1897 as well, am I correct about that? If so, we can close that other one. |
My original intent was to keep #1897 for ParseTemporalTimeZoneString, and to use this issue for the other parsing routines.
I've opened a PR to move the validation step for calendars (which would throw on a syntactically legal but not available calendar name) outside of ParseTemporalCalendarString. This turned out not to affect any observable behaviour. I've reviewed ISO 8601 again and I don't see anything else that would need to change. I agree with @gibson042's analysis above, ISO 8601 is not 100% clear about whether 2021-02-31 is syntactically invalid, but it seems to suggest so, and additionally RFC 3339 is 100% clear about that. From that I conclude that the IsValidISODate and IsValidTime calls should stay where they are. |
In the 2021-10-28 bi-weekly Temporal meeting, we discussed and concluded that ParseTemporalTimeZoneString() should not perform step 7-a, "If ! IsValidTimeZoneName(name) is false, throw a RangeError exception.", to validate whether the time zone name is supported. (See the discussion in "Time zone annotation is ignored in input ISO string" #3265.)
I have now realized that we also validate more than syntax in other parsing routines. So what about those places? Should they also validate only the grammar (syntax) and not whether the value is semantically valid?
Here are the validation steps in those routines:
Steps 16 and 17 in https://tc39.es/proposal-temporal/#sec-temporal-parseisodatetime
13.34 ParseISODateTime ( isoString )
In step 5 of ParseTemporalCalendarString
https://tc39.es/proposal-temporal/#sec-temporal-parsetemporalcalendarstring
@justingrant @sffc @ptomato