[SPARK-33591][SQL][3.1] Recognize `null` in partition spec values #31094

MaxGekk · 2021-01-08T15:18:30Z

What changes were proposed in this pull request?

Recognize null while parsing partition specs, and store null instead of the string "null" as the partition value.
For V1 catalogs: replace null with __HIVE_DEFAULT_PARTITION__.
For V2 catalogs: pass null as is, and let catalog implementations decide how to handle null partition values in the spec.
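
The three rules above can be sketched in a few lines. This is only an illustrative model in Python, not the Spark implementation: the helper names `parse_spec_value`, `to_v1_spec` and `to_v2_spec` are hypothetical, and treating the unquoted `null` token case-insensitively is an assumption based on SQL keyword conventions.

```python
from typing import Dict, Optional

# Default partition name used for V1 catalogs, as named in the PR description.
HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def parse_spec_value(raw: str) -> Optional[str]:
    """Hypothetical parser for one partition value in a PARTITION (...) clause.

    An unquoted null token becomes None; a quoted literal such as 'null'
    stays an ordinary string with its quotes removed.
    """
    text = raw.strip()
    if text.lower() == "null":  # assumption: the token is case-insensitive
        return None
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1]
    return text


def to_v1_spec(spec: Dict[str, Optional[str]]) -> Dict[str, str]:
    """V1 rule: replace a null value with the Hive default partition name."""
    return {k: HIVE_DEFAULT_PARTITION if v is None else v for k, v in spec.items()}


def to_v2_spec(spec: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """V2 rule: pass null through unchanged and leave handling to the catalog."""
    return dict(spec)


spec = {"p1": parse_spec_value("null")}
print(to_v1_spec(spec))  # {'p1': '__HIVE_DEFAULT_PARTITION__'}
print(to_v2_spec(spec))  # {'p1': None}
```

Keeping the V1 substitution in a separate step from parsing mirrors the split described above: the parser only decides whether a value is null, and each catalog version decides what a null partition value means.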

Why are the changes needed?

Currently, null in a partition spec is parsed as the string "null", which can lead to incorrect results. For example:

spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED BY (p1);
spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0;
spark-sql> SELECT isnull(p1) FROM tbl5;
false

Even though we inserted a row into the partition with the null value, the resulting table doesn't contain null.
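
The root of the wrong result is that a string spelled "null" is not a null value. A minimal Python model of the two behaviours, assuming hypothetical helpers `old_parse`, `new_parse` and `isnull` (none of these are Spark functions):

```python
from typing import Optional


def old_parse(raw: str) -> Optional[str]:
    # Behaviour before the change, as described above: the token is kept as a string.
    return raw


def new_parse(raw: str) -> Optional[str]:
    # Behaviour after the change: an unquoted null token becomes a real null.
    return None if raw.lower() == "null" else raw


def isnull(value: Optional[str]) -> bool:
    # Stand-in for the SQL isnull() check used in the example.
    return value is None


print(isnull(old_parse("null")))  # False
print(isnull(new_parse("null")))  # True
```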

Does this PR introduce any user-facing change?

Yes. After the changes, the example above works as expected:

spark-sql> SELECT isnull(p1) FROM tbl5;
true

How was this patch tested?

By running the affected test suites SQLQuerySuite, AlterTablePartitionV2SQLSuite and v1/ShowPartitionsSuite.

Authored-by: Max Gekk [email protected]
Signed-off-by: Wenchen Fan [email protected]
(cherry picked from commit 157b72a)
Signed-off-by: Max Gekk [email protected]

1. Recognize `null` while parsing partition specs, and put `null` instead of `"null"` as partition values.
2. For V1 catalog: replace `null` by `__HIVE_DEFAULT_PARTITION__`.
3. For V2 catalogs: pass `null` as is, and let catalog implementations decide how to handle `null`s as partition values in the spec.

Currently, `null` in partition specs is recognized as the `"null"` string, which could lead to incorrect results. For example:

```sql
spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED BY (p1);
spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0;
spark-sql> SELECT isnull(p1) FROM tbl5;
false
```

Even though we inserted a row into the partition with the `null` value, **the resulting table doesn't contain `null`**.

Yes. After the changes, the example above works as expected:

```sql
spark-sql> SELECT isnull(p1) FROM tbl5;
true
```

1. By running the affected test suites `SQLQuerySuite`, `AlterTablePartitionV2SQLSuite` and `v1/ShowPartitionsSuite`.
2. By compiling with Scala 2.13:

```
$ ./dev/change-scala-version.sh 2.13
$ ./build/sbt -Pscala-2.13 compile
```

Closes apache#30538 from MaxGekk/partition-spec-value-null.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 157b72a)
Signed-off-by: Max Gekk <[email protected]>

SparkQA · 2021-01-08T16:08:58Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38437/

SparkQA · 2021-01-08T16:42:29Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38437/

SparkQA · 2021-01-08T19:55:01Z

Test build #133848 has finished for PR 31094 at commit 41df7ed.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2021-01-11T04:56:51Z

thanks, merging to 3.1!

### What changes were proposed in this pull request?

1. Recognize `null` while parsing partition specs, and put `null` instead of `"null"` as partition values.
2. For V1 catalog: replace `null` by `__HIVE_DEFAULT_PARTITION__`.
3. For V2 catalogs: pass `null` as is, and let catalog implementations decide how to handle `null`s as partition values in the spec.

### Why are the changes needed?

Currently, `null` in partition specs is recognized as the `"null"` string, which could lead to incorrect results. For example:

```sql
spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED BY (p1);
spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0;
spark-sql> SELECT isnull(p1) FROM tbl5;
false
```

Even though we inserted a row into the partition with the `null` value, **the resulting table doesn't contain `null`**.

### Does this PR introduce _any_ user-facing change?

Yes. After the changes, the example above works as expected:

```sql
spark-sql> SELECT isnull(p1) FROM tbl5;
true
```

### How was this patch tested?

By running the affected test suites `SQLQuerySuite`, `AlterTablePartitionV2SQLSuite` and `v1/ShowPartitionsSuite`.

Authored-by: Max Gekk <max.gekkgmail.com>
Signed-off-by: Wenchen Fan <wenchendatabricks.com>
(cherry picked from commit 157b72a)
Signed-off-by: Max Gekk <max.gekkgmail.com>

Closes #31094 from MaxGekk/partition-spec-value-null-3.1.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

MaxGekk mentioned this pull request Jan 8, 2021

[SPARK-33591][SQL] Recognize null in partition spec values #30538

Closed

github-actions bot added the SQL label Jan 8, 2021

cloud-fan approved these changes Jan 11, 2021

cloud-fan closed this Jan 11, 2021
