[SPARK-41982][SQL] Partitions of type string should not be treated as numeric types #39558
|
cc @AngersZhuuuu @cloud-fan @maropu @wangyum @dongjoon-hyun @HyukjinKwon Hope to get your reply, thanks :-) |
|
Can one of the admins verify this patch? |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala
|
Could you help review this again? @cloud-fan |
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala
|
All comments have been addressed, PTAL, thanks @cloud-fan |
|
thanks, merging to master! |
Thanks for your review! @cloud-fan |
|
@smallzhongfeng Could you help add it to the migration guide? |
|
Sure, but the previous discussion concluded that there is no need to add it; see #39558 (comment). I will add it if necessary. @gatorsmile |
What changes were proposed in this pull request?
Ensure that string-typed partition values written without quotation marks are not treated as numeric types.
For example:
In Spark 3.1 and earlier, it generates a single path:
hdfs://test5/user/hive/db1/test_90/dt=05
After Spark 3.1, it generates two paths:
hdfs://test5/user/hive/db1/test_90/dt=05
and
hdfs://test5/user/hive/db1/test_90/dt=5
This causes inconsistent data on read. After seeing #30421, I think that if users are unaware of this change and the migration guide does not mention it, data quality will be affected, so I added the parameter spark.sql.legacy.keepPartitionSpecAsStringLiteral, which restores the original behavior when set to true.
Why are the changes needed?
If the partition column is of type String but the partition value is written without quotation marks, the value will still be treated as a String when the parameter is enabled.
Does this PR introduce any user-facing change?
After the parameter spark.sql.legacy.keepPartitionSpecAsStringLiteral is enabled, the partition paths generated by partition (dt=05) and partition (dt='05') are the same.
How was this patch tested?
New unit tests.
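The path difference described above can be illustrated with a small stand-alone model. This is only a sketch in plain Python, not Spark's parser: the function names `partition_value` and `partition_path` are hypothetical, and the boolean argument merely stands in for spark.sql.legacy.keepPartitionSpecAsStringLiteral. It assumes that an unquoted all-digit token is read as an integer literal when the flag is off, which is what drops the leading zero.

```python
def partition_value(raw: str, keep_as_string_literal: bool) -> str:
    """Return the directory value for one partition spec token (toy model)."""
    # A quoted token is always a string literal: strip the quotes, keep the text.
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in ("'", '"'):
        return raw[1:-1]
    # Unquoted token with the legacy flag on: keep the text exactly as written.
    if keep_as_string_literal:
        return raw
    # Unquoted all-digit token with the flag off: assumed to be read as an
    # integer literal, so leading zeros are lost ("05" becomes "5").
    if raw.isascii() and raw.isdigit():
        return str(int(raw))
    return raw


def partition_path(location: str, column: str, raw: str,
                   keep_as_string_literal: bool) -> str:
    """Build the partition directory path for a single-column partition spec."""
    return f"{location}/{column}={partition_value(raw, keep_as_string_literal)}"


if __name__ == "__main__":
    base = "hdfs://test5/user/hive/db1/test_90"
    for flag in (False, True):
        for token in ("05", "'05'"):
            print(flag, token, partition_path(base, "dt", token, flag))
```

With the flag off, the two spellings land in different directories (dt=5 and dt=05); with the flag on, both resolve to dt=05, matching the behavior the description states for the legacy setting.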