Skip to content

Conversation

@AngersZhuuuu
Copy link
Contributor

@AngersZhuuuu AngersZhuuuu commented Nov 19, 2020

What changes were proposed in this pull request?

Hive support type constructed value as partition spec value, spark should support too.

Why are the changes needed?

Support TypeConstructed partition spec value keep same with hive

Does this PR introduce any user-facing change?

Yes, user can use TypeConstruct value as partition spec value such as

CREATE TABLE t1(name STRING) PARTITIONED BY (part DATE)
INSERT INTO t1 PARTITION(part = date'2019-01-02') VALUES('a')

CREATE TABLE t2(name STRING) PARTITIONED BY (part TIMESTAMP)
INSERT INTO t2 PARTITION(part = timestamp'2019-01-02 11:11:11') VALUES('a')

CREATE TABLE t4(name STRING) PARTITIONED BY (part BINARY)
INSERT INTO t4 PARTITION(part = X'537061726B2053514C') VALUES('a')

How was this patch tested?

Added UT

@github-actions github-actions bot added the SQL label Nov 19, 2020
@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35926/

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35928/

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35926/

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35928/

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Test build #131324 has finished for PR 30421 at commit c9a97d0.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35935/

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35935/

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Test build #131330 has finished for PR 30421 at commit c9a97d0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35956/

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35956/

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Test build #131352 has finished for PR 30421 at commit bcdc7e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Copy link
Contributor Author

gentle ping @cloud-fan @maropu


test("SPARK-33474: Support TypeConstructed partition spec value") {
withTable("t") {
sql("CREATE TABLE t(name STRING) PARTITIONED BY (part DATE) STORED AS ORC")
Copy link
Member

@maropu maropu Nov 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add more tests for the other types and add tests in DDLParserSuite.

Copy link
Member

@maropu maropu Nov 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, why did you use stored as orc explicitly for this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, why did you use stored as orc explicitly for this test?

removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add more tests for the other types and add tests in DDLParserSuite.

Both updated

@maropu
Copy link
Member

maropu commented Nov 24, 2020

Yes, user can use TypeConstruct value as partition spec value such as
``

Is the description above incomplete? Btw, could you put an example query that this PR intends to support in the PR description?

@AngersZhuuuu
Copy link
Contributor Author

Yes, user can use TypeConstruct value as partition spec value such as
``

Is the description above incomplete? Btw, could you put an example query that this PR intends to support in the PR description?

Yea... seem I forgot to save it, updated.

@SparkQA
Copy link

SparkQA commented Nov 24, 2020

Test build #131646 has finished for PR 30421 at commit 6adefa7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 24, 2020

Test build #131650 has finished for PR 30421 at commit 05f1962.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40180/

@SparkQA
Copy link

SparkQA commented Mar 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40181/

@SparkQA
Copy link

SparkQA commented Mar 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40180/

@SparkQA
Copy link

SparkQA commented Mar 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40181/

@SparkQA
Copy link

SparkQA commented Mar 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40183/

@SparkQA
Copy link

SparkQA commented Mar 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40183/

@SparkQA
Copy link

SparkQA commented Mar 1, 2021

Test build #135599 has finished for PR 30421 at commit 2ef0fd0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 1, 2021

Test build #135600 has finished for PR 30421 at commit 4023041.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 1, 2021

Test build #135602 has finished for PR 30421 at commit fe5095a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 2, 2021

Test build #135620 has finished for PR 30421 at commit 63a4fb4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


sql(
s"""
| INSERT OVERWRITE t1 PARTITION(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we can test with only one insert: part4 to test plain string, part5, part6 and part7 to test type conversion from date/timestamp/binary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@SparkQA
Copy link

SparkQA commented Mar 2, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40234/

@SparkQA
Copy link

SparkQA commented Mar 2, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40234/

@SparkQA
Copy link

SparkQA commented Mar 2, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40244/

@SparkQA
Copy link

SparkQA commented Mar 2, 2021

Test build #135654 has finished for PR 30421 at commit 4f61e60.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 2, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40244/

@maropu
Copy link
Member

maropu commented Mar 3, 2021

Thanks! Merged to master.

@maropu maropu closed this in 56edb81 Mar 3, 2021
xuanyuanking pushed a commit to xuanyuanking/spark that referenced this pull request Sep 29, 2021
### What changes were proposed in this pull request?
Hive support type constructed value as partition spec value, spark should support too.

### Why are the changes needed?
 Support TypeConstructed partition spec value keep same with hive

### Does this PR introduce _any_ user-facing change?
Yes, user can use TypeConstruct value as partition spec value such as
```
CREATE TABLE t1(name STRING) PARTITIONED BY (part DATE)
INSERT INTO t1 PARTITION(part = date'2019-01-02') VALUES('a')

CREATE TABLE t2(name STRING) PARTITIONED BY (part TIMESTAMP)
INSERT INTO t2 PARTITION(part = timestamp'2019-01-02 11:11:11') VALUES('a')

CREATE TABLE t4(name STRING) PARTITIONED BY (part BINARY)
INSERT INTO t4 PARTITION(part = X'537061726B2053514C') VALUES('a')
```

### How was this patch tested?
Added UT

Closes apache#30421 from AngersZhuuuu/SPARK-33474.

Lead-authored-by: angerszhu <[email protected]>
Co-authored-by: Angerszhuuuu <[email protected]>
Co-authored-by: AngersZhuuuu <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
cloud-fan referenced this pull request Jan 17, 2023
… numeric types

### What changes were proposed in this pull request?

Ensure that partitions of type string without quotation marks are not recognized as numeric types.
For example:
```
create table if not exists test_90(a string, b string) partitioned by (dt string);
desc formatted test_90;
insert into table test_90 partition (dt=05) values("1","2");
insert into table test_90 partition (dt='05') values("1","2");
drop table test_90;
```
before spark3.1 and earlier, it will generate such a path: `hdfs://test5/user/hive/db1/test_90/dt=05`

```
spark-sql> select * from test_90;
1       2       05
1       2       05
Time taken: 1.316 seconds, Fetched 2 row(s)

spark-sql> show partitions test_90;
dt=05
Time taken: 0.201 seconds, Fetched 1 row(s)

spark-sql> select * from test_90 where dt='05';
1       2       05
1       2       05
Time taken: 0.212 seconds, Fetched 2 row(s)
```

after spark3.1, it will generate two path:  `hdfs://test5/user/hive/db1/test_90/dt=05` and `hdfs://test5/user/hive/db1/test_90/dt=5`

```
spark-sql> select * from test_90;
1       2       05
1       2       5
Time taken: 2.119 seconds, Fetched 2 row(s)

spark-sql> show partitions test_90;
dt=05
dt=5
Time taken: 0.161 seconds, Fetched 2 row(s)

spark-sql> select * from test_90 where dt='05';
1       2       05
Time taken: 0.252 seconds, Fetched 1 row(s)
```

This will cause inconsistent read data. After seeing [https://github.com/apache/spark/pull/30421](https://github.com/apache/spark/pull/30421), I think if the user does not know about this change and the migration document does not mention it, I think it will affect the data quality, so I added the parameter `spark.sql.legacy.keepPartitionSpecAsStringLiteral`, which will maintain the original effect when the parameter set `true`.

### Why are the changes needed?

If the partition is of `String`, but the value of partition without quotation marks, it will still be treated as `String` through parameter configuration.

### Does this PR introduce _any_ user-facing change?

After the parameter `spark.sql.legacy.keepPartitionSpecAsStringLiteral` is enabled, the partition path generated by partition `partition (dt=05)` and partition `partition (dt='05')` is the same.

### How was this patch tested?

New uts.

Closes #39558 from smallzhongfeng/SPARK-41982.

Authored-by: smallzhongfeng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants