Skip to content

Conversation

@ericl
Copy link
Contributor

@ericl ericl commented Nov 30, 2016

What changes were proposed in this pull request?

Two bugs are addressed here

  1. INSERT OVERWRITE TABLE sometime crashed when catalog partition management was enabled. This was because when dropping partitions after an overwrite operation, the Hive client will attempt to delete the partition files. If the entire partition directory was dropped, this would fail. The PR fixes this by adding a flag to control whether the Hive client should attempt to delete files.
  2. The static partition spec for OVERWRITE TABLE was not correctly resolved to the case-sensitive original partition names. This resulted in the entire table being overwritten if you did not correctly capitalize your partition names.

cc @yhuai @cloud-fan

How was this patch tested?

Unit tests. Surprisingly, the existing overwrite table tests did not catch these edge cases.

@SparkQA
Copy link

SparkQA commented Dec 1, 2016

Test build #69432 has finished for PR 16088 at commit 4b6a4ee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// partial partition spec.
partSpecs.foreach { p =>
if (existingParts.contains(p) && shouldRemovePartitionLocation) {
if (existingParts.contains(p) && shouldRemovePartitionLocation && !retainData) {
Copy link
Contributor

@cloud-fan cloud-fan Dec 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we put the !retainData in the definition of shouldRemovePartitionLocation? which looks more logical

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ignoreIfNotExists: Boolean,
purge: Boolean): Unit = {
purge: Boolean,
retainData: Boolean): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we provide a default value? looks like most of the time we want it to be false

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seemed a little safer to make it mandatory to avoid some component forgetting to propagate it.

ignoreIfNotExists: Boolean,
purge: Boolean): Unit = withClient {
purge: Boolean,
deleteFiles: Boolean): Unit = withClient {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change the parameter name here? it's called retainData right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@SparkQA
Copy link

SparkQA commented Dec 2, 2016

Test build #69504 has finished for PR 16088 at commit f4356f0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

AlterTableDropPartitionCommand(
l.catalogTable.get.identifier, deletedPartitions.toSeq,
ifExists = true, purge = true).run(t.sparkSession)
ifExists = true, purge = true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an unrelated question, when dropping partitions here, do we have to set purge = true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it doesn't matter since we don't delete files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then can we set it to false? DROP PARTITION ... PURGE is not supported in hive 0.13, setting it to false can make the partition related functions still work in older hive versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@cloud-fan
Copy link
Contributor

LGTM pending jenkins

@SparkQA
Copy link

SparkQA commented Dec 2, 2016

Test build #69532 has finished for PR 16088 at commit b79cef4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Dec 2, 2016

Test build #69537 has finished for PR 16088 at commit a9f9710.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 2, 2016

Test build #69543 has finished for PR 16088 at commit a9f9710.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Dec 2, 2016
…rce tables

## What changes were proposed in this pull request?

Two bugs are addressed here
1. INSERT OVERWRITE TABLE sometime crashed when catalog partition management was enabled. This was because when dropping partitions after an overwrite operation, the Hive client will attempt to delete the partition files. If the entire partition directory was dropped, this would fail. The PR fixes this by adding a flag to control whether the Hive client should attempt to delete files.
2. The static partition spec for OVERWRITE TABLE was not correctly resolved to the case-sensitive original partition names. This resulted in the entire table being overwritten if you did not correctly capitalize your partition names.

cc yhuai cloud-fan

## How was this patch tested?

Unit tests. Surprisingly, the existing overwrite table tests did not catch these edge cases.

Author: Eric Liang <[email protected]>

Closes #16088 from ericl/spark-18659.

(cherry picked from commit 7935c84)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan
Copy link
Contributor

thanks, merging to master/2.1!

@asfgit asfgit closed this in 7935c84 Dec 2, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 15, 2016
…rce tables

## What changes were proposed in this pull request?

Two bugs are addressed here
1. INSERT OVERWRITE TABLE sometime crashed when catalog partition management was enabled. This was because when dropping partitions after an overwrite operation, the Hive client will attempt to delete the partition files. If the entire partition directory was dropped, this would fail. The PR fixes this by adding a flag to control whether the Hive client should attempt to delete files.
2. The static partition spec for OVERWRITE TABLE was not correctly resolved to the case-sensitive original partition names. This resulted in the entire table being overwritten if you did not correctly capitalize your partition names.

cc yhuai cloud-fan

## How was this patch tested?

Unit tests. Surprisingly, the existing overwrite table tests did not catch these edge cases.

Author: Eric Liang <[email protected]>

Closes apache#16088 from ericl/spark-18659.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…rce tables

## What changes were proposed in this pull request?

Two bugs are addressed here
1. INSERT OVERWRITE TABLE sometime crashed when catalog partition management was enabled. This was because when dropping partitions after an overwrite operation, the Hive client will attempt to delete the partition files. If the entire partition directory was dropped, this would fail. The PR fixes this by adding a flag to control whether the Hive client should attempt to delete files.
2. The static partition spec for OVERWRITE TABLE was not correctly resolved to the case-sensitive original partition names. This resulted in the entire table being overwritten if you did not correctly capitalize your partition names.

cc yhuai cloud-fan

## How was this patch tested?

Unit tests. Surprisingly, the existing overwrite table tests did not catch these edge cases.

Author: Eric Liang <[email protected]>

Closes apache#16088 from ericl/spark-18659.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants