[SPARK-33591][SQL][3.0] Recognize `null` in partition spec values #31095

MaxGekk · 2021-01-08T15:50:54Z

What changes were proposed in this pull request?

Recognize null while parsing partition specs, and put null instead of "null" as partition values.
For V1 catalog: replace null by __HIVE_DEFAULT_PARTITION__.

Why are the changes needed?

Currently, null in partition specs is recognized as the "null" string which could lead to incorrect results, for example:

spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED BY (p1);
spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0;
spark-sql> SELECT isnull(p1) FROM tbl5;
false

Even we inserted a row to the partition with the null value, the resulted table doesn't contain null.

Does this PR introduce any user-facing change?

Yes. After the changes, the example above works as expected:

spark-sql> SELECT isnull(p1) FROM tbl5;
true

How was this patch tested?

By running the affected test suites:

$ build/sbt -Phive -Phive-thriftserver "test:testOnly *SQLQuerySuite"
$ build/sbt -Phive -Phive-thriftserver "test:testOnly *CatalogedDDLSuite"

Authored-by: Max Gekk [email protected]
Signed-off-by: Wenchen Fan [email protected]
(cherry picked from commit 157b72a)
Signed-off-by: Max Gekk [email protected]

1. Recognize `null` while parsing partition specs, and put `null` instead of `"null"` as partition values. 2. For V1 catalog: replace `null` by `__HIVE_DEFAULT_PARTITION__`. 3. For V2 catalogs: pass `null` AS IS, and let catalog implementations to decide how to handle `null`s as partition values in spec. Currently, `null` in partition specs is recognized as the `"null"` string which could lead to incorrect results, for example: ```sql spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED BY (p1); spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0; spark-sql> SELECT isnull(p1) FROM tbl5; false ``` Even we inserted a row to the partition with the `null` value, **the resulted table doesn't contain `null`**. Yes. After the changes, the example above works as expected: ```sql spark-sql> SELECT isnull(p1) FROM tbl5; true ``` 1. By running the affected test suites `SQLQuerySuite`, `AlterTablePartitionV2SQLSuite` and `v1/ShowPartitionsSuite`. 2. Compiling by Scala 2.13: ``` $ ./dev/change-scala-version.sh 2.13 $ ./build/sbt -Pscala-2.13 compile ``` Closes apache#30538 from MaxGekk/partition-spec-value-null. Authored-by: Max Gekk <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 157b72a) Signed-off-by: Max Gekk <[email protected]>

SparkQA · 2021-01-08T16:54:57Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38438/

SparkQA · 2021-01-08T17:15:22Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38438/

SparkQA · 2021-01-08T20:37:38Z

Test build #133849 has finished for PR 31095 at commit 4fef8be.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

### What changes were proposed in this pull request? 1. Recognize `null` while parsing partition specs, and put `null` instead of `"null"` as partition values. 2. For V1 catalog: replace `null` by `__HIVE_DEFAULT_PARTITION__`. ### Why are the changes needed? Currently, `null` in partition specs is recognized as the `"null"` string which could lead to incorrect results, for example: ```sql spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED BY (p1); spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0; spark-sql> SELECT isnull(p1) FROM tbl5; false ``` Even we inserted a row to the partition with the `null` value, **the resulted table doesn't contain `null`**. ### Does this PR introduce _any_ user-facing change? Yes. After the changes, the example above works as expected: ```sql spark-sql> SELECT isnull(p1) FROM tbl5; true ``` ### How was this patch tested? By running the affected test suites: ``` $ build/sbt -Phive -Phive-thriftserver "test:testOnly *SQLQuerySuite" $ build/sbt -Phive -Phive-thriftserver "test:testOnly *CatalogedDDLSuite" ``` Authored-by: Max Gekk <max.gekkgmail.com> Signed-off-by: Wenchen Fan <wenchendatabricks.com> (cherry picked from commit 157b72a) Signed-off-by: Max Gekk <max.gekkgmail.com> Closes #31095 from MaxGekk/partition-spec-value-null-3.0. Authored-by: Max Gekk <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>

cloud-fan · 2021-01-11T04:58:19Z

thanks, merging to 3.0!

MaxGekk mentioned this pull request Jan 8, 2021

[SPARK-33591][SQL] Recognize null in partition spec values #30538

Closed

cloud-fan closed this Jan 11, 2021

MaxGekk mentioned this pull request Jan 25, 2021

[SPARK-34203][SQL] Convert null partition values to __HIVE_DEFAULT_PARTITION__ in v1 In-Memory catalog #31322

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SPARK-33591][SQL][3.0] Recognize `null` in partition spec values #31095

[SPARK-33591][SQL][3.0] Recognize `null` in partition spec values #31095

Uh oh!

MaxGekk commented Jan 8, 2021

Uh oh!

SparkQA commented Jan 8, 2021

Uh oh!

SparkQA commented Jan 8, 2021

Uh oh!

SparkQA commented Jan 8, 2021

Uh oh!

cloud-fan commented Jan 11, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[SPARK-33591][SQL][3.0] Recognize null in partition spec values #31095

[SPARK-33591][SQL][3.0] Recognize null in partition spec values #31095

Uh oh!

Conversation

MaxGekk commented Jan 8, 2021

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

SparkQA commented Jan 8, 2021

Uh oh!

SparkQA commented Jan 8, 2021

Uh oh!

SparkQA commented Jan 8, 2021

Uh oh!

cloud-fan commented Jan 11, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[SPARK-33591][SQL][3.0] Recognize `null` in partition spec values #31095

[SPARK-33591][SQL][3.0] Recognize `null` in partition spec values #31095