-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-18566][SQL] remove OverwriteOptions #15995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #69083 has finished for PR 15995 at commit
|
|
Iirc there was an issue where |
|
@ericl you are right, I pushed a new commit to do |
|
Test build #69220 has finished for PR 15995 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be failing in the tests. Is it because the preprocessing rule no longer gets a chance to run now that the static resolution is combined with this rule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to submit another PR to tweak the extended analyzer rules execution order, currently these rules like PreProcessCreateTable, PreWriteCheck, DataSourceAnalysis, etc. may have some dependencies and may worth to put them in different batches.
BTW I have retargeted this JIRA ticket to 2.2
|
Test build #69579 has finished for PR 15995 at commit
|
|
Test build #69623 has finished for PR 15995 at commit
|
| } else { | ||
| Map.empty | ||
| } | ||
| val staticPartitions = parts.filter(_._2.nonEmpty).map { case (k, v) => k -> v.get } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The column names in partition spec are already normalized in PreprocessTableInsertion rule, we don't need to consider case sensitivity here. And the if-else is not needed, because:
staticPartitionsis used to getmatchingPartitionsin this line, and thematchingPartitionsis used to decided which partitions need to be added to metastore. Previously ifoverwriteis false, we will get all partitions asmatchingPartitions, and issue a lot of unnecessaryADD PARTITIONcalls. After removing theif-else, it's fixed.- After we pass
staticPartitionstoInsertIntoHadoopFsRelationCommand, it will be used only withOverWritemode, so theif-elseis unnecessary.
|
Test build #69986 has finished for PR 15995 at commit
|
| projectList | ||
| } | ||
|
|
||
| private def hasBeenPreprocessed( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment about who preprocesses this?
| partSpec: Map[String, Option[String]], | ||
| query: LogicalPlan): Boolean = { | ||
| val partColNames = partSchema.map(_.name).toSet | ||
| query.resolved && partSpec.keys.forall(partColNames.contains) && { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to check that the keys are all valid columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, is the issue to avoid this running before PreprocessTableInsertion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup
| query.resolved && partSpec.keys.forall(partColNames.contains) && { | ||
| val staticPartCols = partSpec.filter(_._2.isDefined).keySet | ||
| val expectedColumns = tableOutput.filterNot(a => staticPartCols.contains(a.name)) | ||
| expectedColumns.toStructType.sameType(query.schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar question, when is this false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is to follow the previous condition: https://github.com/apache/spark/pull/15995/files#diff-d99813bd5bbc18277e4090475e4944cfL166
This can be caused if users issue an invalid command, e.g. INSERT INTO src SELECT 1,2 while table src has 3 columns.
|
Test build #70060 has finished for PR 15995 at commit
|
|
Cool, this looks good then |
|
thanks for the review, merging to master! |
## What changes were proposed in this pull request? `OverwriteOptions` was introduced in apache#15705, to carry the information of static partitions. However, after further refactor, this information becomes duplicated and we can remove `OverwriteOptions`. ## How was this patch tested? N/A Author: Wenchen Fan <[email protected]> Closes apache#15995 from cloud-fan/overwrite.
## What changes were proposed in this pull request? `OverwriteOptions` was introduced in apache#15705, to carry the information of static partitions. However, after further refactor, this information becomes duplicated and we can remove `OverwriteOptions`. ## How was this patch tested? N/A Author: Wenchen Fan <[email protected]> Closes apache#15995 from cloud-fan/overwrite.
What changes were proposed in this pull request?
OverwriteOptionswas introduced in #15705, to carry the information of static partitions. However, after further refactor, this information becomes duplicated and we can removeOverwriteOptions.How was this patch tested?
N/A