@cloud-fan
Contributor

What changes were proposed in this pull request?

OverwriteOptions was introduced in #15705 to carry the static partition information. After further refactoring, this information is duplicated, so we can remove OverwriteOptions.

How was this patch tested?

N/A

@cloud-fan
Contributor Author

cc @yhuai @ericl

@SparkQA

SparkQA commented Nov 23, 2016

Test build #69083 has finished for PR 15995 at commit f52b364.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ericl
Contributor

ericl commented Nov 28, 2016

IIRC, there was an issue where convertStaticPartitions erased the partitioning information during analysis. Was this fixed by the refactoring?

@cloud-fan
Contributor Author

@ericl you are right. I pushed a new commit that runs convertStaticPartitions right before we convert InsertIntoTable to InsertIntoHadoopFsRelation, so the partitioning information won't be erased.

@SparkQA

SparkQA commented Nov 28, 2016

Test build #69220 has finished for PR 15995 at commit 354a860.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@ericl ericl Nov 29, 2016

This seems to be failing in the tests. Is it because the preprocessing rule no longer gets a chance to run now that the static resolution is combined with this rule?

Contributor Author

I'm going to submit another PR to tweak the execution order of the extended analyzer rules. Currently, rules like PreProcessCreateTable, PreWriteCheck, DataSourceAnalysis, etc. may depend on each other, and it may be worth putting them in different batches.

BTW I have retargeted this JIRA ticket to 2.2
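
To illustrate why batching matters, here is a toy sketch in plain Scala, not Catalyst code. It assumes a simplified executor that runs each batch to a fixed point before starting the next one; `Rule`, `Batch`, `execute`, and the `normalize`/`convert` rules are all hypothetical names invented for this example.

```scala
object RuleBatchSketch {
  // A toy "plan" is just a set of marker strings; a rule rewrites it.
  type Plan = Set[String]
  final case class Rule(name: String, rewrite: Plan => Plan)
  final case class Batch(name: String, rules: Seq[Rule])

  // Run each batch until it stops changing the plan (or hits the iteration
  // cap), then hand the result to the next batch.
  def execute(batches: Seq[Batch], plan: Plan, maxIterations: Int = 10): Plan =
    batches.foldLeft(plan) { (current, batch) =>
      var p = current
      var changed = true
      var i = 0
      while (changed && i < maxIterations) {
        val next = batch.rules.foldLeft(p)((acc, r) => r.rewrite(acc))
        changed = next != p
        p = next
        i += 1
      }
      p
    }

  // Hypothetical rules: `convert` only fires once `normalize` has run.
  val normalize: Rule = Rule("normalize", p => p + "normalized")
  val convert: Rule =
    Rule("convert", p => if (p("normalized")) p + "converted" else p)
}
```

In this sketch, putting `normalize` in an earlier batch than `convert` guarantees the dependency is satisfied, while the reverse batch order leaves the plan unconverted.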

@SparkQA

SparkQA commented Dec 2, 2016

Test build #69579 has finished for PR 15995 at commit 5f9da54.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 3, 2016

Test build #69623 has finished for PR 15995 at commit b5f4394.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

} else {
  Map.empty
}
val staticPartitions = parts.filter(_._2.nonEmpty).map { case (k, v) => k -> v.get }
Contributor Author

@cloud-fan cloud-fan Dec 11, 2016

The column names in the partition spec are already normalized by the PreprocessTableInsertion rule, so we don't need to consider case sensitivity here. The if-else is also not needed, because:

  1. staticPartitions is used to get matchingPartitions in this line, and matchingPartitions is used to decide which partitions need to be added to the metastore. Previously, if overwrite was false, we would get all partitions as matchingPartitions and issue a lot of unnecessary ADD PARTITION calls. Removing the if-else fixes this.
  2. After we pass staticPartitions to InsertIntoHadoopFsRelationCommand, it is used only in Overwrite mode, so the if-else is unnecessary.
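
As a rough illustration of the expression under review, the sketch below extracts the static partitions from a partition spec using plain Scala collections rather than Spark classes. The object name `StaticPartitionSketch`, the `collect` variant, and the sample column names are assumptions made for this example only.

```scala
object StaticPartitionSketch {
  // A partition spec maps each partition column to an optional value:
  // Some(v) is a static partition, None is a dynamic one.
  type PartitionSpec = Map[String, Option[String]]

  // Keep only the statically specified partitions, as in the reviewed line.
  def staticPartitions(parts: PartitionSpec): Map[String, String] =
    parts.filter(_._2.nonEmpty).map { case (k, v) => k -> v.get }

  // Equivalent formulation that avoids calling .get on an Option.
  def staticPartitionsCollect(parts: PartitionSpec): Map[String, String] =
    parts.collect { case (k, Some(v)) => k -> v }

  def main(args: Array[String]): Unit = {
    // Hypothetical spec, e.g. PARTITION (year = '2016', month)
    val spec: PartitionSpec = Map("year" -> Some("2016"), "month" -> None)
    println(staticPartitions(spec))
  }
}
```

With no if-else around it, a spec containing only dynamic partitions simply yields an empty map.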

@SparkQA

SparkQA commented Dec 11, 2016

Test build #69986 has finished for PR 15995 at commit 323a97c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

projectList
}

private def hasBeenPreprocessed(
Contributor

Add a comment about who preprocesses this?

    partSpec: Map[String, Option[String]],
    query: LogicalPlan): Boolean = {
  val partColNames = partSchema.map(_.name).toSet
  query.resolved && partSpec.keys.forall(partColNames.contains) && {
Contributor

Is it necessary to check that the keys are all valid columns?

Contributor

Ah, is the point to avoid running this before PreprocessTableInsertion?

Contributor Author

yup

query.resolved && partSpec.keys.forall(partColNames.contains) && {
  val staticPartCols = partSpec.filter(_._2.isDefined).keySet
  val expectedColumns = tableOutput.filterNot(a => staticPartCols.contains(a.name))
  expectedColumns.toStructType.sameType(query.schema)
Contributor

Similar question: when is this false?

Contributor Author

this is to follow the previous condition: https://github.com/apache/spark/pull/15995/files#diff-d99813bd5bbc18277e4090475e4944cfL166

This can be false if users issue an invalid command, e.g. INSERT INTO src SELECT 1,2 while table src has 3 columns.
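
A minimal sketch of this check, using simplified stand-in types instead of Spark's `LogicalPlan` and `StructType`, might look like the following. `Column`, `queryResolved`, and `querySchema` are hypothetical substitutes for the real classes and for `query.resolved` / `query.schema`, and comparing data types position by position is only an approximation of what `sameType` does.

```scala
object PreprocessCheckSketch {
  // Simplified stand-in for a column: a name plus a type name.
  final case class Column(name: String, dataType: String)

  def hasBeenPreprocessed(
      tableOutput: Seq[Column],
      partColNames: Set[String],
      partSpec: Map[String, Option[String]],
      queryResolved: Boolean,
      querySchema: Seq[Column]): Boolean = {
    // Every key in the partition spec must be a known partition column.
    queryResolved && partSpec.keys.forall(partColNames.contains) && {
      // Static partition columns are not supplied by the query.
      val staticPartCols = partSpec.filter(_._2.isDefined).keySet
      val expectedColumns =
        tableOutput.filterNot(c => staticPartCols.contains(c.name))
      // Simplification: compare data types only, in order.
      expectedColumns.map(_.dataType) == querySchema.map(_.dataType)
    }
  }
}
```

Under these assumptions, a three-column table with a two-column query (the `INSERT INTO src SELECT 1,2` case) makes the check return false, so the conversion does not proceed on a malformed insert.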

@SparkQA

SparkQA commented Dec 13, 2016

Test build #70060 has finished for PR 15995 at commit ed548e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ericl
Contributor

ericl commented Dec 14, 2016

Cool, this looks good then

@cloud-fan
Contributor Author

thanks for the review, merging to master!

@asfgit asfgit closed this in 3e307b4 Dec 14, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 15, 2016
## What changes were proposed in this pull request?

`OverwriteOptions` was introduced in apache#15705, to carry the information of static partitions. However, after further refactor, this information becomes duplicated and we can remove `OverwriteOptions`.

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes apache#15995 from cloud-fan/overwrite.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017