-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-33474][SQL] Support TypeConstructed partition spec value #30421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Kubernetes integration test starting |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Kubernetes integration test status success |
|
Test build #131324 has finished for PR 30421 at commit
|
|
retest this please |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #131330 has finished for PR 30421 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #131352 has finished for PR 30421 at commit
|
|
gentle ping @cloud-fan @maropu |
|
|
||
| test("SPARK-33474: Support TypeConstructed partition spec value") { | ||
| withTable("t") { | ||
| sql("CREATE TABLE t(name STRING) PARTITIONED BY (part DATE) STORED AS ORC") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add more tests for the other types and add tests in DDLParserSuite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, why did you use stored as orc explicitly for this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, why did you use
stored as orcexplicitly for this test?
removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add more tests for the other types and add tests in
DDLParserSuite.
Both updated
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
Is the description above incomplete? Btw, could you put an example query that this PR intends to support in the PR description? |
Yea... seem I forgot to save it, updated. |
|
Test build #131646 has finished for PR 30421 at commit
|
|
Test build #131650 has finished for PR 30421 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Kubernetes integration test status failure |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #135599 has finished for PR 30421 at commit
|
|
Test build #135600 has finished for PR 30421 at commit
|
|
Test build #135602 has finished for PR 30421 at commit
|
|
Test build #135620 has finished for PR 30421 at commit
|
|
|
||
| sql( | ||
| s""" | ||
| | INSERT OVERWRITE t1 PARTITION( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems we can test with only one insert: part4 to test plain string, part5, part6 and part7 to test type conversion from date/timestamp/binary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala
Outdated
Show resolved
Hide resolved
|
Kubernetes integration test starting |
|
Test build #135654 has finished for PR 30421 at commit
|
|
Kubernetes integration test status success |
|
Thanks! Merged to master. |
### What changes were proposed in this pull request?
Hive support type constructed value as partition spec value, spark should support too.
### Why are the changes needed?
Support TypeConstructed partition spec value keep same with hive
### Does this PR introduce _any_ user-facing change?
Yes, user can use TypeConstruct value as partition spec value such as
```
CREATE TABLE t1(name STRING) PARTITIONED BY (part DATE)
INSERT INTO t1 PARTITION(part = date'2019-01-02') VALUES('a')
CREATE TABLE t2(name STRING) PARTITIONED BY (part TIMESTAMP)
INSERT INTO t2 PARTITION(part = timestamp'2019-01-02 11:11:11') VALUES('a')
CREATE TABLE t4(name STRING) PARTITIONED BY (part BINARY)
INSERT INTO t4 PARTITION(part = X'537061726B2053514C') VALUES('a')
```
### How was this patch tested?
Added UT
Closes apache#30421 from AngersZhuuuu/SPARK-33474.
Lead-authored-by: angerszhu <[email protected]>
Co-authored-by: Angerszhuuuu <[email protected]>
Co-authored-by: AngersZhuuuu <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
… numeric types
### What changes were proposed in this pull request?
Ensure that partitions of type string without quotation marks are not recognized as numeric types.
For example:
```
create table if not exists test_90(a string, b string) partitioned by (dt string);
desc formatted test_90;
insert into table test_90 partition (dt=05) values("1","2");
insert into table test_90 partition (dt='05') values("1","2");
drop table test_90;
```
before spark3.1 and earlier, it will generate such a path: `hdfs://test5/user/hive/db1/test_90/dt=05`
```
spark-sql> select * from test_90;
1 2 05
1 2 05
Time taken: 1.316 seconds, Fetched 2 row(s)
spark-sql> show partitions test_90;
dt=05
Time taken: 0.201 seconds, Fetched 1 row(s)
spark-sql> select * from test_90 where dt='05';
1 2 05
1 2 05
Time taken: 0.212 seconds, Fetched 2 row(s)
```
after spark3.1, it will generate two path: `hdfs://test5/user/hive/db1/test_90/dt=05` and `hdfs://test5/user/hive/db1/test_90/dt=5`
```
spark-sql> select * from test_90;
1 2 05
1 2 5
Time taken: 2.119 seconds, Fetched 2 row(s)
spark-sql> show partitions test_90;
dt=05
dt=5
Time taken: 0.161 seconds, Fetched 2 row(s)
spark-sql> select * from test_90 where dt='05';
1 2 05
Time taken: 0.252 seconds, Fetched 1 row(s)
```
This will cause inconsistent read data. After seeing [https://github.com/apache/spark/pull/30421](https://github.com/apache/spark/pull/30421), I think if the user does not know about this change and the migration document does not mention it, I think it will affect the data quality, so I added the parameter `spark.sql.legacy.keepPartitionSpecAsStringLiteral`, which will maintain the original effect when the parameter set `true`.
### Why are the changes needed?
If the partition is of `String`, but the value of partition without quotation marks, it will still be treated as `String` through parameter configuration.
### Does this PR introduce _any_ user-facing change?
After the parameter `spark.sql.legacy.keepPartitionSpecAsStringLiteral` is enabled, the partition path generated by partition `partition (dt=05)` and partition `partition (dt='05')` is the same.
### How was this patch tested?
New uts.
Closes #39558 from smallzhongfeng/SPARK-41982.
Authored-by: smallzhongfeng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
Hive support type constructed value as partition spec value, spark should support too.
Why are the changes needed?
Support TypeConstructed partition spec value keep same with hive
Does this PR introduce any user-facing change?
Yes, user can use TypeConstruct value as partition spec value such as
How was this patch tested?
Added UT