Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Apr 30, 2021

What changes were proposed in this pull request?

  1. Extend Spark SQL parser to support parsing of:
    • INTERVAL YEAR TO MONTH to YearMonthIntervalType
    • INTERVAL DAY TO SECOND to DayTimeIntervalType
  2. Assign new names to the ANSI interval types according to the SQL standard to be able to parse the names back by Spark SQL parser. Override the typeName() name of YearMonthIntervalType/DayTimeIntervalType.

Why are the changes needed?

To be able to use new ANSI interval types in SQL. The SQL standard requires the types to be defined according to the rules:

<interval type> ::= INTERVAL <interval qualifier>
<interval qualifier> ::= <start field> TO <end field> | <single datetime field>
<start field> ::= <non-second primary datetime field> [ <left paren> <interval leading field precision> <right paren> ]
<end field> ::= <non-second primary datetime field> | SECOND [ <left paren> <interval fractional seconds precision> <right paren> ]
<primary datetime field> ::= <non-second primary datetime field | SECOND
<non-second primary datetime field> ::= YEAR | MONTH | DAY | HOUR | MINUTE
<interval fractional seconds precision> ::= <unsigned integer>
<interval leading field precision> ::= <unsigned integer>

Currently, Spark SQL supports only YEAR TO MONTH and DAY TO SECOND as <interval qualifier>.

Does this PR introduce any user-facing change?

Should not since the types has not been released yet.

How was this patch tested?

By running the affected tests such as:

$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z interval.sql"
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z datetime.sql"
$ build/sbt "test:testOnly *ExpressionTypeCheckingSuite"
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z windowFrameCoercion.sql"
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z literals.sql"

@github-actions github-actions bot added the SQL label Apr 30, 2021
@SparkQA
Copy link

SparkQA commented Apr 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42632/

@SparkQA
Copy link

SparkQA commented Apr 30, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42632/

@github-actions github-actions bot added the DOCS label Apr 30, 2021
@SparkQA
Copy link

SparkQA commented Apr 30, 2021

Test build #138111 has finished for PR 32409 at commit a335f00.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42635/

@SparkQA
Copy link

SparkQA commented Apr 30, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42635/

@SparkQA
Copy link

SparkQA commented Apr 30, 2021

Test build #138114 has finished for PR 32409 at commit 0b4f6bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk MaxGekk changed the title [WIP][SPARK-35285][SQL] Parse ANSI interval types in SQL schema [SPARK-35285][SQL] Parse ANSI interval types in SQL schema May 1, 2021
@MaxGekk MaxGekk marked this pull request as ready for review May 1, 2021 07:08
@MaxGekk MaxGekk requested review from cloud-fan and yaooqinn May 1, 2021 07:09
@MaxGekk
Copy link
Member Author

MaxGekk commented May 1, 2021

@yaooqinn @AngersZhuuuu @beliefer @cloud-fan Please, review this PR.

@AngersZhuuuu
Copy link
Contributor

Since change the type name, should we add this to migration guide?

@AngersZhuuuu
Copy link
Contributor

And there are many test case use year-month interval such as SPARK-34721: add a year-month interval to a date.

Main code LGTM

@MaxGekk
Copy link
Member Author

MaxGekk commented May 1, 2021

@AngersZhuuuu The types have not been released yet. There are no versions to migrate from.

@MaxGekk
Copy link
Member Author

MaxGekk commented May 1, 2021

And there are many test case use year-month interval

This is the name of a sub-class of interval type. It is ok to use it in test titles, see PR's description #31810

@AngersZhuuuu
Copy link
Contributor

@AngersZhuuuu The types have not been released yet. There are no versions to migrate from.

Sounds good.

@AngersZhuuuu
Copy link
Contributor

And there are many test case use year-month interval

This is the name of a sub-class of interval type. It is ok to use it in test titles, see PR's description #31810

Got it, thanks for your explain. LGTM

@HyukjinKwon
Copy link
Member

Merged to master.

Literal(Period.ofDays(2))),
EmptyRow,
"sequence step must be a day year-month interval if start and end values are dates")
"sequence step must be a day interval year to month if start and end values are dates")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beliefer The error message confuses me slightly, especially the combination a day interval year to month. Could you open a PR to improve the error, please, something like "... sequence step must be an interval of day granularity ...".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

xuanyuanking pushed a commit to xuanyuanking/spark that referenced this pull request Sep 29, 2021
### What changes were proposed in this pull request?
1. Extend Spark SQL parser to support parsing of:
    - `INTERVAL YEAR TO MONTH` to `YearMonthIntervalType`
    - `INTERVAL DAY TO SECOND` to `DayTimeIntervalType`
2. Assign new names to the ANSI interval types according to the SQL standard to be able to parse the names back by Spark SQL parser. Override the `typeName()` name of `YearMonthIntervalType`/`DayTimeIntervalType`.

### Why are the changes needed?
To be able to use new ANSI interval types in SQL. The SQL standard requires the types to be defined according to the rules:
```
<interval type> ::= INTERVAL <interval qualifier>
<interval qualifier> ::= <start field> TO <end field> | <single datetime field>
<start field> ::= <non-second primary datetime field> [ <left paren> <interval leading field precision> <right paren> ]
<end field> ::= <non-second primary datetime field> | SECOND [ <left paren> <interval fractional seconds precision> <right paren> ]
<primary datetime field> ::= <non-second primary datetime field | SECOND
<non-second primary datetime field> ::= YEAR | MONTH | DAY | HOUR | MINUTE
<interval fractional seconds precision> ::= <unsigned integer>
<interval leading field precision> ::= <unsigned integer>
```
Currently, Spark SQL supports only `YEAR TO MONTH` and `DAY TO SECOND` as `<interval qualifier>`.

### Does this PR introduce _any_ user-facing change?
Should not since the types has not been released yet.

### How was this patch tested?
By running the affected tests such as:
```
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z interval.sql"
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z datetime.sql"
$ build/sbt "test:testOnly *ExpressionTypeCheckingSuite"
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z windowFrameCoercion.sql"
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z literals.sql"
```

Closes apache#32409 from MaxGekk/parse-ansi-interval-types.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants