Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should all the parsing routines only validate the syntax? #1901

Closed
FrankYFTang opened this issue Nov 1, 2021 · 16 comments · Fixed by #1954
Closed

Should all the parsing routines only validate the syntax? #1901

FrankYFTang opened this issue Nov 1, 2021 · 16 comments · Fixed by #1954

Comments

@FrankYFTang
Copy link
Contributor

In the 2021-10-28 Bi-weekly Temporal meeting, we discuss and conclude that in ParseTemporalTimeZoneString() we should not call step 7-a If ! "IsValidTimeZoneName(name) is false, throw a RangeError exception." to validate the timezone name is supported or not. [see the discussion about ""Time zone annotation is ignored in input ISO string" #3265 ")

Now I just realize we also do some validate more than syntax in other parsing routine, so.. how about those places? Should those places also only validate the grammar syntanx but not the value is sematic valid or not?

Here is the validation steps in those routines:

Step 16 and 17 in https://tc39.es/proposal-temporal/#sec-temporal-parseisodatetime
13.34 ParseISODateTime ( isoString )

16. If ! IsValidISODate(year, month, day) is false, throw a RangeError exception.
17. If ! IsValidTime(hour, minute, second, millisecond, microsecond, nanosecond) is false, throw a RangeError exception.

In step 5 of ParseTemporalCalendarString
https://tc39.es/proposal-temporal/#sec-temporal-parsetemporalcalendarstring

5 If ! IsBuiltinCalendar(id) is false, then
a. Throw a RangeError exception.

@justingrant @sffc @ptomato

@FrankYFTang FrankYFTang changed the title Should all the parsing routine only validate the syntax? Should all the parsing routines only validate the syntax? Nov 1, 2021
@FrankYFTang
Copy link
Contributor Author

One specific reason I ask this is right now I try to refactor the parsing part of the implementation to a lower level subcomponent but the fact that parsing routines need to validate the range of value by calling IsValidISODate(), IsValidTime() and IsBuiltinCalendar() seems are misaligned with our previous discussion also make us hard to separate them into lower routines only PARSE.

@justingrant
Copy link
Collaborator

Custom calendars and time zones are explicitly included in Temporal's scope. So we should plan for the case where:

  • Types that don't care about calendar (PlainTime and Instant) can still be parsed from date/time strings which use a custom calendar annotation.
  • Types that don't care about time zone (all types except ZonedDateTime and Time Zone) can still be parsed from a date/time string that contains a custom time zone annotation.

Parsing implementations should be layered to support the cases above.

On the other hand, ISO syntax (other than annotations in brackets) is carefully specified and is not open to customization in userland. So there's no inherent need to split ISO-date/ISO-time validation from low-level parsing, because there are no Temporal use cases that expose syntactically-valid but semantically-invalid ISO strings to callers.

That said, one useful thing that implementations could do would be to emit error messages that highlight which part of an ISO string is syntactically valid but semantically invalid. A RangeError message of "Invalid ISO string" is much less useful than "Invalid time: 25:00" or "Invalid date: 2020-02-29". If the current split in the spec is helpful in providing these distinct error messages, then IMHO it should stay. I don't think we need to bend over backwards to, for example, tell the user that "2020-01-01T23:599" has an extra digit, but if it's easy to provide better error messages in common cases, then it seems worthwhile.

@justingrant
Copy link
Collaborator

Actually, @sffc @ptomato is my assertion above correct? Do we support custom calendar and TZ annotations when parsing types that don't use those annotations? Now I'm not 100% sure.

@sffc
Copy link
Collaborator

sffc commented Nov 1, 2021

I think the latest on this was #1293 (comment), fixed in #1419. IIRC, the conclusion was that in v2 we could add a calendar or time zone registry to help with the parsing.

@justingrant
Copy link
Collaborator

I think the latest on this was #1293 (comment), fixed in #1419. IIRC, the conclusion was that in v2 we could add a calendar or time zone registry to help with the parsing.

@sffc I don't think this comment answers my question. That comment seems to be about types that do use the annotations. What about types that don't?

FWIW, I can guess that there's a non-trivial perf advantage for implementations that never use calendars or time zones to be able to avoid initializing and allocating RAM for the part of their code that deals with calendars or time zones.

Specifically, I suspect there's a lot of low-level code that will use Instant and Duration (for timestamps, profiling, timing, etc.) without ever dealing with calendars or time zones at all. Might be nice to avoid the lookups completely for those cases.

@sffc
Copy link
Collaborator

sffc commented Nov 2, 2021

The new syntax that @ryzokuken is working on defines ISO-8601 plus extensions, which are enclosed in []. Part of my intent with that design is so that anything inside [] that isn't understood by a particular parser is simply ignored.

@FrankYFTang
Copy link
Contributor Author

But if parsing routine should only valid "syntax" then it should not check the day range in ParseISODateTime neither right?
But you do agree with me that ParseTemporalCalendarString should not call IsBuiltinCalendar to do a check the calendar is a build in or not, right?

@ptomato
Copy link
Collaborator

ptomato commented Nov 4, 2021

Discussed in the champions meeting of 2021-11-04.

There are two questions here: a long-term one dealing with what comes out of the discussions with the IETF as valid syntax in between brackets. It looks like that might turn out to be "anything, so long as the brackets are balanced", but currently in the spec text there is a more restrictive grammar of what is allowed as an annotation. So the short-term question is the one that Frank asked.

The consensus for the short-term question is that we should not validate that the time zone or calendar is available in the system when parsing. So, for example, since Instant does not have a calendar, this operation should succeed:

Temporal.Instant.from('2021-11-04T17:34Z[u-ca=aaaaaaaa]')

(but the same string passed to Temporal.PlainDate.from() would continue to fail, no change there)

Regarding the IsValidISODate and IsValidTime calls which Frank asked about in the OP, ISO 8601 includes some context-dependent rules on what constitutes a syntactically valid string, which those operations enforce, since they cannot be enforced in the grammar. Although we agreed to check and make sure that these context-dependent rules are reflected correctly in the parsing operations.

@FrankYFTang
Copy link
Contributor Author

My question is should Temporal.PlainTime.from('2021-02-31T17:34Z') success since
PlainTime will only read 17:34 but not info from "2021-02-31" anyway.

@ptomato
Copy link
Collaborator

ptomato commented Nov 5, 2021

I need to check with ISO 8601, but if I remember correctly, a string with a date of 2021-02-31 would be a syntactically invalid string.

@justingrant
Copy link
Collaborator

I need to check with ISO 8601, but if I remember correctly, a string with a date of 2021-02-31 would be a syntactically invalid string.

I didn't see anything in ISO 8601 explicitly (at least the 2004 edition) explicitly about this. But there is a section 3.2.1 The Gregorian calendar which lists the number of days in each month, so there's at least a strong implication that invalid day numbers are disallowed.

@sffc
Copy link
Collaborator

sffc commented Nov 7, 2021

What does RFC 3339 say about that?

@gibson042
Copy link
Collaborator

gibson042 commented Nov 7, 2021

RFC 3339 is very clear about 2021-02-31 and 25:00 being syntactically invalid: https://datatracker.ietf.org/doc/html/rfc3339#section-5.6

   date-month      = 2DIGIT  ; 01-12
   date-mday       = 2DIGIT  ; 01-28, 01-29, 01-30, 01-31 based on
                             ; month/year
   time-hour       = 2DIGIT  ; 00-23
   time-minute     = 2DIGIT  ; 00-59
   time-second     = 2DIGIT  ; 00-58, 00-59, 00-60 based on leap second
                             ; rules

ISO 8601:2019 is slightly less clear, but still does include text limiting e.g. calendar day of month "through ‘28’, ‘29’, ‘30’ or ‘31’ (depending on the month)" and calendar day of year "through ‘365’ (common year) or ‘366’ (leap year)" and clock second "to ‘58’, ‘59’ or ‘60’, identifying the last second of a clock minute (‘58’ with a negative leap second, ‘59’ without a leap second, ‘60’ with a leap second)", and explicitly stating that "‘24’ shall not be used to represent hour in accordance with this document".

@ptomato
Copy link
Collaborator

ptomato commented Nov 25, 2021

@FrankYFTang I think this issue includes #1897 as well, am I correct about that? If so, we can close that other one.

@FrankYFTang
Copy link
Contributor Author

@FrankYFTang I think this issue includes #1897 as well, am I correct about that? If so, we can close that other one.

My original intent was to keep #1897 for ParseTemporalTimeZoneString but this for other parsing routine.

ptomato added a commit that referenced this issue Dec 3, 2021
Similar to #1897, we want to move any validation that is not purely
syntactic out of ParseTemporal___String abstract operations. This
accomplishes that for Calendar. However, it affects nothing observable,
just reorganizes code.

Closes: #1901
@ptomato
Copy link
Collaborator

ptomato commented Dec 3, 2021

I've opened a PR to move the validation step for calendars (which would throw on a syntactically legal but not available calendar name) outside of ParseTemporalCalendarString. This turned out not to affect any observable behaviour.

I've reviewed ISO 8601 again and I don't see anything else that would need to change. I agree with @gibson042's analysis above, ISO 8601 is not 100% clear about whether 2021-02-31 is syntactically invalid, but it seems to suggest so, and additionally RFC 3339 is 100% clear about that. From that I conclude that the IsValidISODate and IsValidTime calls should stay where they are.

Ms2ger pushed a commit that referenced this issue Dec 6, 2021
Similar to #1897, we want to move any validation that is not purely
syntactic out of ParseTemporal___String abstract operations. This
accomplishes that for Calendar. However, it affects nothing observable,
just reorganizes code.

Closes: #1901
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants