Skip to content

Conversation

@gengliangwang
Copy link
Member

What changes were proposed in this pull request?

This is a follow up for #30538.
It adds a legacy conf spark.sql.legacy.parseNullPartitionSpecAsStringLiteral in case users wants the legacy behavior.
It also adds document for the behavior change.

Why are the changes needed?

In case users want the legacy behavior, they can set spark.sql.legacy.parseNullPartitionSpecAsStringLiteral as true.

Does this PR introduce any user-facing change?

Yes, adding a legacy configuration to restore the old behavior.

How was this patch tested?

Unit test.

@gengliangwang
Copy link
Member Author

cc @cloud-fan @HyukjinKwon @MaxGekk

}
}

test("SPARK-33591: null as string partition literal value 'null' after setting legacy conf") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this test to org.apache.spark.sql.execution.command.ShowPartitionsSuiteBase. It should pass for DS v2 too.

Comment on lines 477 to 478
val processNullLiteral =
!conf.getConf(SQLConf.LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move the config read out of map(). Don't need to re-read it every partition part.

@SparkQA
Copy link

SparkQA commented Feb 1, 2021

Test build #134741 has finished for PR 31421 at commit b062f8b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member Author

@MaxGekk Thanks for the suggestions. I have addressed all of them.

@SparkQA
Copy link

SparkQA commented Feb 2, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39342/

@SparkQA
Copy link

SparkQA commented Feb 2, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39342/

@HyukjinKwon
Copy link
Member

Merged to master.

@gengliangwang would you mind changing the version and making backporting PRs please?


- In Spark 3.2, the auto-generated `Cast` (such as those added by type coercion rules) will be stripped when generating column alias names. E.g., `sql("SELECT floor(1)").columns` will be `FLOOR(1)` instead of `FLOOR(CAST(1 AS DOUBLE))`.

- In Spark 3.2, a null partition value is parsed as it is. In Spark 3.1 or earlier, it is parsed as a string literal of its text representation, e.g., string "null". To restore the legacy behavior, you can set `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` as true.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be more specific

In Spark 3.2, `PARTITION(col=null)` is always parsed as a null literal in the partition spec. In Spark 3.1 or earlier,
it is parsed as a string literal ..., if the partition column is string type. To restore the legacy behavior, ...

*/
protected def visitStringConstant(ctx: ConstantContext): String = withOrigin(ctx) {
protected def visitStringConstant(
ctx: ConstantContext,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 4 spaces indentation

*/
override def visitPartitionSpec(
ctx: PartitionSpecContext): Map[String, Option[String]] = withOrigin(ctx) {
val processNullLiteral =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this name is weird, how about legacyNullAsString?

val LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL =
buildConf("spark.sql.legacy.parseNullPartitionSpecAsStringLiteral")
.internal()
.doc("If it is set to true, a null partition value is parsed as a string literal of its " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, make it more specific.

.doc("If it is set to true, a null partition value is parsed as a string literal of its " +
"text representation, e.g., string 'null'. Otherwise, null partition values are parsed " +
"as they are.")
.version("3.2.0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the version should be 3.1.1, as we are going to backport this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be 3.0.2 actually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, then how should we place it in the migration guide? Repeat it 3 times in 3.0.2, 3.1.1 and 3.2.0 sections?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have documented only in 3.0.2 so far in such cases ..

sql(s"CREATE TABLE $t (col1 INT, p1 STRING) $defaultUsing PARTITIONED BY (p1)")
sql(s"INSERT INTO TABLE $t PARTITION (p1 = null) SELECT 0")
runShowPartitionsSql(s"SHOW PARTITIONS $t", Row("p1=null") :: Nil)
runShowPartitionsSql(s"SHOW PARTITIONS $t PARTITION (p1 = null)", Row("p1=null") :: Nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this show the null string behavior? shall we just test a simple table scan operation?

@gengliangwang
Copy link
Member Author

@cloud-fan Thanks for the comments. I have addressed them in #31434

@SparkQA
Copy link

SparkQA commented Feb 2, 2021

Test build #134756 has finished for PR 31421 at commit 971ccbd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan pushed a commit that referenced this pull request Feb 2, 2021
…l.legacy.parseNullPartitionSpecAsStringLiteral`

### What changes were proposed in this pull request?

Correct the version of SQL configuration `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` from 3.2.0 to 3.0.2.
Also, revise the documentation and test case.

### Why are the changes needed?

The release version in #31421 was wrong.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests

Closes #31434 from gengliangwang/reviseVersion.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
gengliangwang added a commit to gengliangwang/spark that referenced this pull request Feb 2, 2021
…artition spec values

This is a follow up for apache#30538.
It adds a legacy conf `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` in case users wants the legacy behavior.
It also adds document for the behavior change.

In case users want the legacy behavior, they can set `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` as true.

Yes, adding a legacy configuration to restore the old behavior.

Unit test.

Closes apache#31421 from gengliangwang/legacyNullStringConstant.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
gengliangwang added a commit to gengliangwang/spark that referenced this pull request Feb 2, 2021
…l.legacy.parseNullPartitionSpecAsStringLiteral`

Correct the version of SQL configuration `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` from 3.2.0 to 3.0.2.
Also, revise the documentation and test case.

The release version in apache#31421 was wrong.

No

Unit tests

Closes apache#31434 from gengliangwang/reviseVersion.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
gengliangwang added a commit to gengliangwang/spark that referenced this pull request Feb 2, 2021
…artition spec values

This is a follow up for apache#30538.
It adds a legacy conf `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` in case users wants the legacy behavior.
It also adds document for the behavior change.

In case users want the legacy behavior, they can set `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` as true.

Yes, adding a legacy configuration to restore the old behavior.

Unit test.

Closes apache#31421 from gengliangwang/legacyNullStringConstant.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
gengliangwang added a commit to gengliangwang/spark that referenced this pull request Feb 2, 2021
…l.legacy.parseNullPartitionSpecAsStringLiteral`

Correct the version of SQL configuration `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` from 3.2.0 to 3.0.2.
Also, revise the documentation and test case.

The release version in apache#31421 was wrong.

No

Unit tests

Closes apache#31434 from gengliangwang/reviseVersion.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Feb 2, 2021
…ull partition spec values

### What changes were proposed in this pull request?

This PR is to backport #31421 and #31421 to branch 3.0
This is a follow up for #30538.
It adds a legacy conf `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` in case users wants the legacy behavior.
It also adds document for the behavior change.

### Why are the changes needed?

In case users want the legacy behavior, they can set `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` as true.

### Does this PR introduce _any_ user-facing change?

Yes, adding a legacy configuration to restore the old behavior.

### How was this patch tested?

Unit test.

Closes #31441 from gengliangwang/backportLegacyConf3.0.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Feb 3, 2021
…ull partition spec values

### What changes were proposed in this pull request?

This PR is to backport #31421 and #31434 to branch 3.1
This is a follow up for #30538.
It adds a legacy conf `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` in case users wants the legacy behavior.
It also adds document for the behavior change.

### Why are the changes needed?

In case users want the legacy behavior, they can set `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` as true.

### Does this PR introduce _any_ user-facing change?

Yes, adding a legacy configuration to restore the old behavior.

### How was this patch tested?

Unit test.

Closes #31439 from gengliangwang/backportLegacyConf3.1.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
skestle pushed a commit to skestle/spark that referenced this pull request Feb 3, 2021
…artition spec values

### What changes were proposed in this pull request?

This is a follow up for apache#30538.
It adds a legacy conf `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` in case users wants the legacy behavior.
It also adds document for the behavior change.

### Why are the changes needed?

In case users want the legacy behavior, they can set `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` as true.

### Does this PR introduce _any_ user-facing change?

Yes, adding a legacy configuration to restore the old behavior.

### How was this patch tested?

Unit test.

Closes apache#31421 from gengliangwang/legacyNullStringConstant.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
skestle pushed a commit to skestle/spark that referenced this pull request Feb 3, 2021
…l.legacy.parseNullPartitionSpecAsStringLiteral`

### What changes were proposed in this pull request?

Correct the version of SQL configuration `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` from 3.2.0 to 3.0.2.
Also, revise the documentation and test case.

### Why are the changes needed?

The release version in apache#31421 was wrong.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests

Closes apache#31434 from gengliangwang/reviseVersion.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants