[SPARK-33474][SQL] Support TypeConstructed partition spec value #30421

AngersZhuuuu · 2020-11-19T06:23:48Z

What changes were proposed in this pull request?

Hive supports type-constructed values (typed literals) as partition spec values; Spark should support them too.

Why are the changes needed?

Supporting type-constructed partition spec values keeps Spark's behavior consistent with Hive.

Does this PR introduce any user-facing change?

Yes. Users can now use type-constructed values as partition spec values, for example:

CREATE TABLE t1(name STRING) PARTITIONED BY (part DATE)
INSERT INTO t1 PARTITION(part = date'2019-01-02') VALUES('a')

CREATE TABLE t2(name STRING) PARTITIONED BY (part TIMESTAMP)
INSERT INTO t2 PARTITION(part = timestamp'2019-01-02 11:11:11') VALUES('a')

CREATE TABLE t4(name STRING) PARTITIONED BY (part BINARY)
INSERT INTO t4 PARTITION(part = X'537061726B2053514C') VALUES('a')
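
As a reading aid for the binary example above: the hex literal `X'537061726B2053514C'` is just a sequence of bytes. The short Python sketch below is not part of the PR and is only an illustration; it decodes those bytes under the assumption that they are meant to be read as UTF-8 text.

```python
# Hex digits copied from the BINARY partition literal in the t4 example.
hex_digits = "537061726B2053514C"
raw_bytes = bytes.fromhex(hex_digits)

# Assumption (for illustration only): the bytes encode UTF-8 text.
decoded = raw_bytes.decode("utf-8")
print(decoded)  # prints "Spark SQL"
```

So the t4 example inserts into the partition whose binary value spells "Spark SQL".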

How was this patch tested?

Added unit tests.

SparkQA · 2020-11-19T07:05:29Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35926/

SparkQA · 2020-11-19T07:19:28Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35928/

SparkQA · 2020-11-19T07:30:18Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35926/

SparkQA · 2020-11-19T07:50:58Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35928/

SparkQA · 2020-11-19T08:05:02Z

Test build #131324 has finished for PR 30421 at commit c9a97d0.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

AngersZhuuuu · 2020-11-19T08:08:28Z

retest this please

SparkQA · 2020-11-19T08:58:42Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35935/

SparkQA · 2020-11-19T09:30:58Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35935/

SparkQA · 2020-11-19T13:21:36Z

Test build #131330 has finished for PR 30421 at commit c9a97d0.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-11-19T15:05:53Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35956/

SparkQA · 2020-11-19T15:28:04Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35956/

SparkQA · 2020-11-19T19:51:18Z

Test build #131352 has finished for PR 30421 at commit bcdc7e5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AngersZhuuuu · 2020-11-23T01:23:44Z

gentle ping @cloud-fan @maropu

maropu · 2020-11-24T05:09:57Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala

+
+  test("SPARK-33474: Support TypeConstructed partition spec value") {
+    withTable("t") {
+      sql("CREATE TABLE t(name STRING) PARTITIONED BY (part DATE) STORED AS ORC")


Please add more tests for the other types and add tests in DDLParserSuite.

btw, why did you use stored as orc explicitly for this test?

> btw, why did you use stored as orc explicitly for this test?

Removed.

> Please add more tests for the other types and add tests in DDLParserSuite.

Both updated.

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

maropu · 2020-11-24T05:27:15Z

> Yes, user can use TypeConstruct value as partition spec value such as
> ``

Is the description above incomplete? Btw, could you put an example query that this PR intends to support in the PR description?

AngersZhuuuu · 2020-11-24T10:21:41Z

> Yes, user can use TypeConstruct value as partition spec value such as
> ``
>
> Is the description above incomplete? Btw, could you put an example query that this PR intends to support in the PR description?

Yeah, it seems I forgot to save it. Updated.

SparkQA · 2020-11-24T11:26:13Z

Test build #131646 has finished for PR 30421 at commit 6adefa7.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-11-24T12:08:48Z

Test build #131650 has finished for PR 30421 at commit 05f1962.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-03-01T17:47:52Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40180/

SparkQA · 2021-03-01T18:00:44Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40181/

SparkQA · 2021-03-01T18:23:00Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40180/

SparkQA · 2021-03-01T18:37:20Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40181/

SparkQA · 2021-03-01T18:46:05Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40183/

SparkQA · 2021-03-01T18:55:24Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40183/

SparkQA · 2021-03-01T21:30:21Z

Test build #135599 has finished for PR 30421 at commit 2ef0fd0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-03-01T21:48:12Z

Test build #135600 has finished for PR 30421 at commit 4023041.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-03-01T22:36:20Z

Test build #135602 has finished for PR 30421 at commit fe5095a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-03-02T07:01:19Z

Test build #135620 has finished for PR 30421 at commit 63a4fb4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

docs/sql-migration-guide.md

cloud-fan · 2021-03-02T09:34:33Z

sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala

+
+      sql(
+        s"""
+           | INSERT OVERWRITE t1 PARTITION(


It seems we can test this with a single insert: part4 to test a plain string, and part5, part6, and part7 to test type conversion from date/timestamp/binary.

SparkQA · 2021-03-02T10:36:50Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40234/

SparkQA · 2021-03-02T10:41:56Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40234/

sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala

SparkQA · 2021-03-02T14:48:42Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40244/

SparkQA · 2021-03-02T14:51:49Z

Test build #135654 has finished for PR 30421 at commit 4f61e60.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-03-02T15:16:21Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40244/

maropu · 2021-03-03T07:49:04Z

Thanks! Merged to master.

### What changes were proposed in this pull request?

Hive support type constructed value as partition spec value, spark should support too.

### Why are the changes needed?

Support TypeConstructed partition spec value keep same with hive

### Does this PR introduce _any_ user-facing change?

Yes, user can use TypeConstruct value as partition spec value such as

```
CREATE TABLE t1(name STRING) PARTITIONED BY (part DATE)
INSERT INTO t1 PARTITION(part = date'2019-01-02') VALUES('a')

CREATE TABLE t2(name STRING) PARTITIONED BY (part TIMESTAMP)
INSERT INTO t2 PARTITION(part = timestamp'2019-01-02 11:11:11') VALUES('a')

CREATE TABLE t4(name STRING) PARTITIONED BY (part BINARY)
INSERT INTO t4 PARTITION(part = X'537061726B2053514C') VALUES('a')
```

### How was this patch tested?

Added UT

Closes apache#30421 from AngersZhuuuu/SPARK-33474.

Lead-authored-by: angerszhu <[email protected]>
Co-authored-by: Angerszhuuuu <[email protected]>
Co-authored-by: AngersZhuuuu <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>

… numeric types

### What changes were proposed in this pull request?

Ensure that unquoted partition values for string-typed partition columns are not treated as numeric types. For example:

```
create table if not exists test_90(a string, b string) partitioned by (dt string);
desc formatted test_90;
insert into table test_90 partition (dt=05) values("1","2");
insert into table test_90 partition (dt='05') values("1","2");
drop table test_90;
```

In Spark 3.1 and earlier, this generates a single path: `hdfs://test5/user/hive/db1/test_90/dt=05`

```
spark-sql> select * from test_90;
1 2 05
1 2 05
Time taken: 1.316 seconds, Fetched 2 row(s)
spark-sql> show partitions test_90;
dt=05
Time taken: 0.201 seconds, Fetched 1 row(s)
spark-sql> select * from test_90 where dt='05';
1 2 05
1 2 05
Time taken: 0.212 seconds, Fetched 2 row(s)
```

After Spark 3.1, it generates two paths: `hdfs://test5/user/hive/db1/test_90/dt=05` and `hdfs://test5/user/hive/db1/test_90/dt=5`

```
spark-sql> select * from test_90;
1 2 05
1 2 5
Time taken: 2.119 seconds, Fetched 2 row(s)
spark-sql> show partitions test_90;
dt=05
dt=5
Time taken: 0.161 seconds, Fetched 2 row(s)
spark-sql> select * from test_90 where dt='05';
1 2 05
Time taken: 0.252 seconds, Fetched 1 row(s)
```

This causes inconsistent reads. After seeing [https://github.com/apache/spark/pull/30421](https://github.com/apache/spark/pull/30421), I think that if users do not know about this change and the migration guide does not mention it, data quality will be affected. I therefore added the parameter `spark.sql.legacy.keepPartitionSpecAsStringLiteral`, which restores the original behavior when set to `true`.

### Why are the changes needed?

If the partition column is of type `String` but the partition value is written without quotation marks, the value can still be treated as a `String` by enabling this configuration.

### Does this PR introduce _any_ user-facing change?

When `spark.sql.legacy.keepPartitionSpecAsStringLiteral` is enabled, `partition (dt=05)` and `partition (dt='05')` produce the same partition path.

### How was this patch tested?

New UTs.

Closes #39558 from smallzhongfeng/SPARK-41982.

Authored-by: smallzhongfeng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
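
The divergence between `dt=05` and `dt='05'` described in this commit message can be mimicked with a toy model. The Python sketch below is only an illustration under stated assumptions, not Spark's implementation: the function name `partition_dir` and its `keep_as_string_literal` argument are hypothetical stand-ins for the parser behavior and the `spark.sql.legacy.keepPartitionSpecAsStringLiteral` flag.

```python
def partition_dir(column: str, raw_value: str, keep_as_string_literal: bool) -> str:
    """Toy model of how a partition spec value becomes a directory name.

    raw_value is the text as written in the PARTITION clause, including
    the surrounding single quotes if any. Hypothetical helper, not Spark code.
    """
    quoted = len(raw_value) >= 2 and raw_value[0] == raw_value[-1] == "'"
    if quoted:
        # A quoted value is a string literal: keep its contents verbatim.
        value = raw_value[1:-1]
    elif keep_as_string_literal:
        # Legacy behavior: keep the unquoted text exactly as written.
        value = raw_value
    elif raw_value.isdigit():
        # Newer behavior: unquoted 05 is read as the integer 5, and
        # converting it back to a string drops the leading zero.
        value = str(int(raw_value))
    else:
        value = raw_value
    return f"{column}={value}"


for flag in (True, False):
    print(flag, partition_dir("dt", "05", flag), partition_dir("dt", "'05'", flag))
```

With the flag set to `True` both spellings map to `dt=05`; with `False` the unquoted form maps to `dt=5` while the quoted form stays `dt=05`, which matches the two paths reported above.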

AngersZhuuuu added 3 commits November 19, 2020 14:16

[SPARK-33474][SQL] Support TypeConstructed partition spec value

adb1842

Update AstBuilder.scala

ae59115

Update AstBuilder.scala

c9a97d0

github-actions bot added the SQL label Nov 19, 2020

Update SQLQuerySuite.scala

bcdc7e5

maropu reviewed Nov 24, 2020

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (outdated, resolved)

AngersZhuuuu added 6 commits November 24, 2020 17:50

FOLLOW COMMENT

d171377

Update SQLQuerySuite.scala

516c070

Update DDLParserSuite.scala

251c36b

Merge branch 'master' into SPARK-33474

1590e8a

Update DDLParserSuite.scala

6adefa7

Update SQLQuerySuite.scala

05f1962

follow comment

63a4fb4

cloud-fan reviewed Mar 2, 2021

docs/sql-migration-guide.md (outdated, resolved)

cloud-fan reviewed Mar 2, 2021

follow comment

4f61e60

cloud-fan reviewed Mar 2, 2021

sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala (outdated, resolved)

cloud-fan approved these changes Mar 2, 2021

Update SQLInsertTestSuite.scala

08c55f6

maropu approved these changes Mar 3, 2021

maropu closed this in 56edb81 Mar 3, 2021

smallzhongfeng mentioned this pull request Jan 13, 2023

[SPARK-41982][SQL] Partitions of type string should not be treated as numeric types #39558

Closed
