[SPARK-29368][SQL][TEST] Port interval.sql #26055
MaxGekk wants to merge 11 commits into apache:master from
Conversation
Test build #111890 has started for PR 26055 at commit
@dongjoon-hyun @maropu @wangyum Could you review this PR, please?
Test build #111915 has finished for PR 26055 at commit
Sure. Thank you for doing this and pinging me, @MaxGekk.
-- SELECT INTERVAL '1.5 months' AS `One month 15 days`;
-- SELECT INTERVAL '10 years -11 month -12 days +13:14' AS `9 years...`;
-- [SPARK-29382] Support the `INTERVAL` type by Parquet datasource
Since this is not a Parquet-specific issue, I commented on the JIRA. We need to update the JIRA issue title and this comment.
ok. I just looked at the other *.sql files to see how we solved the issue of creating a table with an explicit datasource like this:
spark-sql> CREATE TABLE INTERVAL_TBL (f1 int);
19/10/09 11:07:19 WARN HiveMetaStore: Location: file:/user/hive/warehouse/interval_tbl specified for non-external table:interval_tbl
Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:file:/user/hive/warehouse/interval_tbl is not a directory or unable to create one);
So, we added USING parquet everywhere; using parquet seems to be a common agreement. Isn't the issue in the Parquet datasource, since it doesn't allow writing values of CalendarIntervalType?
> We need to update the JIRA issue title and this comment.
Could you give me any clue about what you would like to see in the title? Your comment "Since this is not a Parquet-specific issue" didn't give me much to go on, sorry.
We do not support CREATE TABLE t(f1 int) in the sql/core module:
-- !query 0
CREATE TABLE t(f1 int)
-- !query 0 schema
struct<>
-- !query 0 output
org.apache.spark.sql.AnalysisException
Hive support is required to CREATE Hive TABLE (AS SELECT);
So I added USING parquet everywhere. How about changing it to the following?
-- [SPARK-29382] Support writing `INTERVAL` type to datasource table
This is too generic, from my point of view. If we use parquet everywhere in the ported tests, interval.sql shouldn't be an exception. And the concrete problem here is that the Parquet datasource doesn't support writing values of the interval type.
> -- [SPARK-29382] Support writing `INTERVAL` type to datasource table
Does it mean that INTERVAL should be supported by all builtin datasources?
I will change the title of the JIRA ticket to unblock this PR.
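To summarize the limitation discussed in this thread, here is a minimal sketch (the table names are hypothetical and not part of the PR): an explicit datasource makes CREATE TABLE work in sql/core, but storing interval values in that table is what remains unsupported.

```sql
-- An explicit datasource avoids the Hive fallback that fails in the sql/core tests:
CREATE TABLE interval_tbl_check (f1 int) USING parquet;

-- Storing interval values is the remaining gap (the limitation tracked by SPARK-29382);
-- a table with an interval column is expected to be rejected, e.g.:
-- CREATE TABLE interval_tbl_check2 (i interval) USING parquet;
```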
-- test interval operators
-- SELECT '' AS ten, * FROM INTERVAL_TBL;
This is commented out because we couldn't create the table INTERVAL_TBL, not because the query itself is unsupported.
-- test inputting and outputting SQL standard interval literals
-- SET IntervalStyle TO sql_standard;
SELECT interval '0' AS zero,
It seems that our result is NULL in the output, which is different from PostgreSQL. If we have a JIRA issue for this, could you add it at line 249?
postgres=# select interval '0';
interval
----------
00:00:00
(1 row)
Probably, https://issues.apache.org/jira/browse/SPARK-29391 should address this issue.
Opened a separate ticket for this: https://issues.apache.org/jira/browse/SPARK-29407
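For reference, a minimal sketch of the difference discussed in this thread (Spark behavior as observed at the time of this PR):

```sql
-- PostgreSQL parses '0' as a zero-length interval (00:00:00), while Spark
-- produced NULL for the same literal at the time:
SELECT interval '0' AS zero;
```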
interval '1-2' year to month AS `year-month`,
interval '1 2:03:04' day to second AS `day-time`,
- interval '1-2' AS `negative year-month`,
- interval '1 2:03:04' AS `negative day-time`;
The above two expressions seem to produce NULL in the generated output, which is different from PostgreSQL. It seems we already have an issue for this. If you don't mind, could you add the JIRA reference for these here explicitly?
zero | year-month | day-time | negative year-month | negative day-time
------+------------+-----------+---------------------+-------------------
0 | 1-2 | 1 2:03:04 | -1-2 | -1 2:03:04
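A minimal sketch of the two expressions in question, for side-by-side comparison (Spark behavior as observed at the time of this PR, PostgreSQL results taken from the output above):

```sql
-- Without an explicit `year to month` / `day to second` qualifier, Spark produced
-- NULL for the negated literals, while PostgreSQL returns -1-2 and -1 2:03:04:
SELECT - interval '1-2'       AS `negative year-month`,
       - interval '1 2:03:04' AS `negative day-time`;
```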
In general, this is good for test coverage. I left minor comments, @MaxGekk.
@dongjoon-hyun @wangyum Could you take a look at this one more time? This PR can be merged without Jenkins, I hope.
retest this please
Test build #111942 has finished for PR 26055 at commit
What changes were proposed in this pull request?
This PR ports interval.sql from the PostgreSQL regression tests: https://raw.githubusercontent.com/postgres/postgres/REL_12_STABLE/src/test/regress/sql/interval.sql
The expected results can be found at: https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/expected/interval.out
When porting the test cases, I found the following PostgreSQL-specific features that do not exist in Spark SQL:
- `interval` prefix in casting to intervals
- writing `INTERVAL` type to datasource table
- `@` in interval strings
- `ago` in interval strings
- `INTERVAL` values comparable
- `*` and `\` operators for intervals
- `millenniums`, `centuries` or `decades` units

Why are the changes needed?
To improve the test coverage, see https://issues.apache.org/jira/browse/SPARK-27763
Does this PR introduce any user-facing change?
No
How was this patch tested?
By manually comparing Spark results with PostgreSQL