-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values #31421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values #31421
Conversation
| } | ||
| } | ||
|
|
||
| test("SPARK-33591: null as string partition literal value 'null' after setting legacy conf") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move this test to org.apache.spark.sql.execution.command.ShowPartitionsSuiteBase. It should pass for DS v2 too.
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala
Outdated
Show resolved
Hide resolved
| val processNullLiteral = | ||
| !conf.getConf(SQLConf.LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move the config read out of map(). Don't need to re-read it every partition part.
|
Test build #134741 has finished for PR 31421 at commit
|
…/v1/ShowPartitionsSuite.scala Co-authored-by: Maxim Gekk <[email protected]>
|
@MaxGekk Thanks for the suggestions. I have addressed all of them. |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Merged to master. @gengliangwang would you mind changing the version and making backporting PRs please? |
|
|
||
| - In Spark 3.2, the auto-generated `Cast` (such as those added by type coercion rules) will be stripped when generating column alias names. E.g., `sql("SELECT floor(1)").columns` will be `FLOOR(1)` instead of `FLOOR(CAST(1 AS DOUBLE))`. | ||
|
|
||
| - In Spark 3.2, a null partition value is parsed as it is. In Spark 3.1 or earlier, it is parsed as a string literal of its text representation, e.g., string "null". To restore the legacy behavior, you can set `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` as true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's be more specific
In Spark 3.2, `PARTITION(col=null)` is always parsed as a null literal in the partition spec. In Spark 3.1 or earlier,
it is parsed as a string literal ..., if the partition column is string type. To restore the legacy behavior, ...
| */ | ||
| protected def visitStringConstant(ctx: ConstantContext): String = withOrigin(ctx) { | ||
| protected def visitStringConstant( | ||
| ctx: ConstantContext, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 4 spaces indentation
| */ | ||
| override def visitPartitionSpec( | ||
| ctx: PartitionSpecContext): Map[String, Option[String]] = withOrigin(ctx) { | ||
| val processNullLiteral = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this name is weird, how about legacyNullAsString?
| val LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL = | ||
| buildConf("spark.sql.legacy.parseNullPartitionSpecAsStringLiteral") | ||
| .internal() | ||
| .doc("If it is set to true, a null partition value is parsed as a string literal of its " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, make it more specific.
| .doc("If it is set to true, a null partition value is parsed as a string literal of its " + | ||
| "text representation, e.g., string 'null'. Otherwise, null partition values are parsed " + | ||
| "as they are.") | ||
| .version("3.2.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the version should be 3.1.1, as we are going to backport this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be 3.0.2 actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, then how should we place it in the migration guide? Repeat it 3 times in 3.0.2, 3.1.1 and 3.2.0 sections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have documented only in 3.0.2 so far in such cases ..
| sql(s"CREATE TABLE $t (col1 INT, p1 STRING) $defaultUsing PARTITIONED BY (p1)") | ||
| sql(s"INSERT INTO TABLE $t PARTITION (p1 = null) SELECT 0") | ||
| runShowPartitionsSql(s"SHOW PARTITIONS $t", Row("p1=null") :: Nil) | ||
| runShowPartitionsSql(s"SHOW PARTITIONS $t PARTITION (p1 = null)", Row("p1=null") :: Nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this show the null string behavior? shall we just test a simple table scan operation?
|
@cloud-fan Thanks for the comments. I have addressed them in #31434 |
|
Test build #134756 has finished for PR 31421 at commit
|
…l.legacy.parseNullPartitionSpecAsStringLiteral` ### What changes were proposed in this pull request? Correct the version of SQL configuration `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` from 3.2.0 to 3.0.2. Also, revise the documentation and test case. ### Why are the changes needed? The release version in #31421 was wrong. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests Closes #31434 from gengliangwang/reviseVersion. Authored-by: Gengliang Wang <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…artition spec values This is a follow up for apache#30538. It adds a legacy conf `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` in case users wants the legacy behavior. It also adds document for the behavior change. In case users want the legacy behavior, they can set `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` as true. Yes, adding a legacy configuration to restore the old behavior. Unit test. Closes apache#31421 from gengliangwang/legacyNullStringConstant. Authored-by: Gengliang Wang <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
…l.legacy.parseNullPartitionSpecAsStringLiteral` Correct the version of SQL configuration `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` from 3.2.0 to 3.0.2. Also, revise the documentation and test case. The release version in apache#31421 was wrong. No Unit tests Closes apache#31434 from gengliangwang/reviseVersion. Authored-by: Gengliang Wang <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…artition spec values This is a follow up for apache#30538. It adds a legacy conf `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` in case users wants the legacy behavior. It also adds document for the behavior change. In case users want the legacy behavior, they can set `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` as true. Yes, adding a legacy configuration to restore the old behavior. Unit test. Closes apache#31421 from gengliangwang/legacyNullStringConstant. Authored-by: Gengliang Wang <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
…l.legacy.parseNullPartitionSpecAsStringLiteral` Correct the version of SQL configuration `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` from 3.2.0 to 3.0.2. Also, revise the documentation and test case. The release version in apache#31421 was wrong. No Unit tests Closes apache#31434 from gengliangwang/reviseVersion. Authored-by: Gengliang Wang <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…ull partition spec values ### What changes were proposed in this pull request? This PR is to backport #31421 and #31421 to branch 3.0 This is a follow up for #30538. It adds a legacy conf `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` in case users wants the legacy behavior. It also adds document for the behavior change. ### Why are the changes needed? In case users want the legacy behavior, they can set `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` as true. ### Does this PR introduce _any_ user-facing change? Yes, adding a legacy configuration to restore the old behavior. ### How was this patch tested? Unit test. Closes #31441 from gengliangwang/backportLegacyConf3.0. Authored-by: Gengliang Wang <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
…ull partition spec values ### What changes were proposed in this pull request? This PR is to backport #31421 and #31434 to branch 3.1 This is a follow up for #30538. It adds a legacy conf `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` in case users wants the legacy behavior. It also adds document for the behavior change. ### Why are the changes needed? In case users want the legacy behavior, they can set `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` as true. ### Does this PR introduce _any_ user-facing change? Yes, adding a legacy configuration to restore the old behavior. ### How was this patch tested? Unit test. Closes #31439 from gengliangwang/backportLegacyConf3.1. Authored-by: Gengliang Wang <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
…artition spec values ### What changes were proposed in this pull request? This is a follow up for apache#30538. It adds a legacy conf `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` in case users wants the legacy behavior. It also adds document for the behavior change. ### Why are the changes needed? In case users want the legacy behavior, they can set `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` as true. ### Does this PR introduce _any_ user-facing change? Yes, adding a legacy configuration to restore the old behavior. ### How was this patch tested? Unit test. Closes apache#31421 from gengliangwang/legacyNullStringConstant. Authored-by: Gengliang Wang <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
…l.legacy.parseNullPartitionSpecAsStringLiteral` ### What changes were proposed in this pull request? Correct the version of SQL configuration `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` from 3.2.0 to 3.0.2. Also, revise the documentation and test case. ### Why are the changes needed? The release version in apache#31421 was wrong. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests Closes apache#31434 from gengliangwang/reviseVersion. Authored-by: Gengliang Wang <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This is a follow up for #30538.
It adds a legacy conf
spark.sql.legacy.parseNullPartitionSpecAsStringLiteralin case users wants the legacy behavior.It also adds document for the behavior change.
Why are the changes needed?
In case users want the legacy behavior, they can set
spark.sql.legacy.parseNullPartitionSpecAsStringLiteralas true.Does this PR introduce any user-facing change?
Yes, adding a legacy configuration to restore the old behavior.
How was this patch tested?
Unit test.