[SPARK-29368][SQL][TEST] Port interval.sql #26055
MaxGekk wants to merge 11 commits into apache:master from
Conversation
Test build #111890 has started for PR 26055 at commit
@dongjoon-hyun @maropu @wangyum Could you review this PR, please?
Test build #111915 has finished for PR 26055 at commit
Sure. Thank you for doing this and pinging me, @MaxGekk.
-- SELECT INTERVAL '1.5 months' AS `One month 15 days`;
-- SELECT INTERVAL '10 years -11 month -12 days +13:14' AS `9 years...`;
-- [SPARK-29382] Support the `INTERVAL` type by Parquet datasource
Since this is not a Parquet-specific issue, I commented on the JIRA. We need to update the JIRA issue title and this comment.
ok. I just looked at the other *.sql files to see how we solved the issue of creating a table with an explicit datasource like this:
spark-sql> CREATE TABLE INTERVAL_TBL (f1 int);
19/10/09 11:07:19 WARN HiveMetaStore: Location: file:/user/hive/warehouse/interval_tbl specified for non-external table:interval_tbl
Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:file:/user/hive/warehouse/interval_tbl is not a directory or unable to create one);
So, we added USING parquet everywhere; using parquet seems to be a common agreement. Isn't the issue in the Parquet datasource, since it doesn't allow writing values of CalendarIntervalType?
> We need to update the JIRA issue title and this comment.
Could you give me any clue about what you would like to see in the title? Your comment "Since this is not a Parquet-specific issue" didn't give me much to go on, sorry.
We do not support CREATE TABLE t(f1 int) in the sql/core module:
-- !query 0
CREATE TABLE t(f1 int)
-- !query 0 schema
struct<>
-- !query 0 output
org.apache.spark.sql.AnalysisException
Hive support is required to CREATE Hive TABLE (AS SELECT);
So I added USING parquet everywhere. How about changing it to the following?
-- [SPARK-29382] Support writing `INTERVAL` type to datasource table
This is too generic, from my point of view. If we use parquet everywhere in the ported tests, interval.sql shouldn't be an exception. And the concrete problem here is that the Parquet datasource doesn't support writing values of the interval type.
> -- [SPARK-29382] Support writing `INTERVAL` type to datasource table
Does it mean that INTERVAL should be supported by all builtin datasources?
I will change the title of the JIRA ticket to unblock this PR.
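To summarize the limitation discussed in this thread, here is a minimal sketch (the table names are hypothetical and not part of the PR): an explicit datasource makes CREATE TABLE work in sql/core, but storing interval values in that table is what remains unsupported.

```sql
-- An explicit datasource avoids the Hive fallback that fails in the sql/core tests:
CREATE TABLE interval_tbl_check (f1 int) USING parquet;

-- Storing interval values is the remaining gap (the limitation tracked by SPARK-29382);
-- a table with an interval column is expected to be rejected, e.g.:
-- CREATE TABLE interval_tbl_check2 (i interval) USING parquet;
```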
-- test interval operators
-- SELECT '' AS ten, * FROM INTERVAL_TBL;
This is commented out because we couldn't create the table INTERVAL_TBL, not because the query itself is unsupported.
-- test inputting and outputting SQL standard interval literals
-- SET IntervalStyle TO sql_standard;
SELECT interval '0' AS zero,
It seems that our result is NULL in the output, which is different from PostgreSQL. If we have a JIRA issue for this, could you add it at line 249?
postgres=# select interval '0';
interval
----------
00:00:00
(1 row)
Probably, https://issues.apache.org/jira/browse/SPARK-29391 should address this issue.
Opened a separate ticket for this: https://issues.apache.org/jira/browse/SPARK-29407
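For reference, a minimal sketch of the difference discussed in this thread (Spark behavior as observed at the time of this PR):

```sql
-- PostgreSQL parses '0' as a zero-length interval (00:00:00), while Spark
-- produced NULL for the same literal at the time:
SELECT interval '0' AS zero;
```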
interval '1-2' year to month AS `year-month`,
interval '1 2:03:04' day to second AS `day-time`,
- interval '1-2' AS `negative year-month`,
- interval '1 2:03:04' AS `negative day-time`;
The above two expressions seem to produce NULL in the generated output, which is different from PostgreSQL. It seems we already have an issue for this. If you don't mind, could you add the JIRA reference for these here explicitly?
zero | year-month | day-time | negative year-month | negative day-time
------+------------+-----------+---------------------+-------------------
0 | 1-2 | 1 2:03:04 | -1-2 | -1 2:03:04
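A minimal sketch of the two expressions in question, for side-by-side comparison (Spark behavior as observed at the time of this PR, PostgreSQL results taken from the output above):

```sql
-- Without an explicit `year to month` / `day to second` qualifier, Spark produced
-- NULL for the negated literals, while PostgreSQL returns -1-2 and -1 2:03:04:
SELECT - interval '1-2'       AS `negative year-month`,
       - interval '1 2:03:04' AS `negative day-time`;
```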
In general, this is good for test coverage. I left minor comments, @MaxGekk.
@dongjoon-hyun @wangyum Could you take a look at this one more time? This PR can be merged without Jenkins, I hope.
retest this please
Test build #111942 has finished for PR 26055 at commit
What changes were proposed in this pull request?
This PR ports interval.sql from the PostgreSQL regression tests: https://raw.githubusercontent.com/postgres/postgres/REL_12_STABLE/src/test/regress/sql/interval.sql
The expected results can be found at: https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/expected/interval.out
When porting the test cases, I found the following PostgreSQL-specific features that do not exist in Spark SQL:
- `interval` prefix in casting to intervals
- writing `INTERVAL` type to datasource table
- `@` in interval strings
- `ago` in interval strings
- `INTERVAL` values comparable
- `*` and `\` operators for intervals
- `millenniums`, `centuries` or `decades` units

Why are the changes needed?
To improve the test coverage, see https://issues.apache.org/jira/browse/SPARK-27763
Does this PR introduce any user-facing change?
No
How was this patch tested?
By manually comparing Spark results with PostgreSQL