Skip to content

Conversation

@takezoe
Copy link
Member

@takezoe takezoe commented Aug 28, 2024

Description

The current INTERVAL literal syntax allows below but it should be rejected:

INTERVAL '124' YEAR TO YEAR

Semantically invalid literals like below becomes an error at the analysis phase but the error message is not clear (See #23002). Anyway, it's better to reject at the syntax level.

INTERVAL '124' MONTH TO YEAR

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Aug 28, 2024
@hashhar hashhar assigned hashhar and unassigned hashhar Aug 28, 2024
@hashhar hashhar self-requested a review August 28, 2024 20:06
@hashhar
Copy link
Member

hashhar commented Aug 28, 2024

Thanks for the PR, I'm adding syntax-needs-review while I check what the SQL standard has to say about this (e.g. future incompatibilities we may run into).

@ebyhr
Copy link
Member

ebyhr commented Aug 29, 2024

@hashhar FYI, this syntax is documented in 10.1 <interval qualifier> of SQL:2016. Especially, Syntax Rules 1) and 2) are related to this change. 1) explains ordering of YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND. 2) explains MONTH & YEAR cases.

@takezoe
Copy link
Member Author

takezoe commented Aug 31, 2024

The second case is anyway rejected at the analysis phase currently, so the only first example (specifying the same unit both for FROM and TO) could be an incompatibility.

@takezoe takezoe force-pushed the strict-interval-literal-syntax branch from 9466171 to 2ee8640 Compare August 31, 2024 03:00
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took me so long.

The grammar changes seem in agreement with spec. The only caveat being that the spec specifies the restriction in "syntax rules" with language using "SHALL". I don't know what this implies. In general do we implement the checks under "syntax rules" in analyzer or in grammar itself? @martint to answer this.

I've yet to look at tests, will try to do by end of day.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First thing I notice - the grammar in ANSI SQL matches what we had earlier. The restrictions on what combination of from and to are valid is mentioned in the "Syntax Rules" section.

Comment on lines +723 to +728
: INTERVAL sign=(PLUS | MINUS)? string from=YEAR (TO to=MONTH)?
| INTERVAL sign=(PLUS | MINUS)? string from=MONTH
| INTERVAL sign=(PLUS | MINUS)? string from=DAY (TO to=(HOUR | MINUTE | SECOND))?
| INTERVAL sign=(PLUS | MINUS)? string from=HOUR (TO to=(MINUTE | SECOND))?
| INTERVAL sign=(PLUS | MINUS)? string from=MINUTE (TO to=SECOND)?
| INTERVAL sign=(PLUS | MINUS)? string from=SECOND
Copy link
Member

@hashhar hashhar Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rules basically say:

  • If TO is specified then fromSHALL NOT be MONTH
  • If TO is specified then from SHALL be more significant than to.
  • If from=YEAR then to=MONTH SHALL be the only valid option.

From the above the only valid combinations are:

from=YEAR
from=YEAR TO to=MONTH
from=MONTH
from=DAY
from=DAY TO to=(HOUR | MINUTE | SECOND)
from=HOUR
from=HOUR TO to=(MINUTE | SECOND)
from=MINUTE
from=MINUTE TO to=SECOND
from=SECOND

The changes seem to match this.

And significance is defined in the order YEAR > MONTH > DAY > HOUR > MINUTE > SECOND.

Comment on lines +737 to +742
| INTERVAL from=YEAR (TO to=MONTH)? #yearMonthIntervalDataType
| INTERVAL from=MONTH #yearMonthIntervalDataType
| INTERVAL from=DAY (TO to=(HOUR | MINUTE | SECOND))? #dayTimeIntervalDataType
| INTERVAL from=HOUR (TO to=(MINUTE | SECOND))? #dayTimeIntervalDataType
| INTERVAL from=MINUTE (TO to=MINUTE)? #dayTimeIntervalDataType
| INTERVAL from=SECOND #dayTimeIntervalDataType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the sign part missing here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also since you've changed the type names here I guess AstBuilder also needs to be updated.

Comment on lines +87 to +88
assertThatThrownBy(assertions.expression("INTERVAL '124' MONTH TO MONTH"):: evaluate)
.hasMessageContaining("line 1:33: mismatched input 'TO'.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this is better than the older behaviour?

@martint
Copy link
Member

martint commented Sep 4, 2024

In general do we implement the checks under "syntax rules" in analyzer or in grammar itself? @martint to answer this.

It depends on each case. Some things are not possible to constrain at the grammar level. For others, it's easier to produce more user-friendly error messages by enforcing the constraints during analysis.

This one is in a grey area. I would argue that at the grammar level we just want to support the general shape (to / from field or just a single field) and leave the validation of the specific combinations to the analysis phase. We have to enforce the validity during analysis anyway, since the AST allows for any combination of to/from fields. The format of the interval string also depends on the combination of from/to fields, and that has to be validated during analysis, too.

@hashhar
Copy link
Member

hashhar commented Sep 6, 2024

So it seems #23002 is the better direction to go for then.

@takezoe
Copy link
Member Author

takezoe commented Sep 7, 2024

As far as I have checked, some major SQL engines seem to check this at the syntax level. However, choice in Trino is ours. I don't have objection as long as invalid literals are rejected with user-friendly error message.

@takezoe
Copy link
Member Author

takezoe commented Sep 28, 2024

I will close this since we seem to prefer to treat this as a semantic error. Thank you for all comments here. I will continue working on #23002.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants