Skip to content

Comments

[SPARK-28054][SQL] Fix error when insert Hive partitioned table dynamically where partition name is upper case#24886

Closed
viirya wants to merge 1 commit intoapache:masterfrom
viirya:SPARK-28054
Closed

[SPARK-28054][SQL] Fix error when insert Hive partitioned table dynamically where partition name is upper case#24886
viirya wants to merge 1 commit intoapache:masterfrom
viirya:SPARK-28054

Conversation

@viirya
Copy link
Member

@viirya viirya commented Jun 16, 2019

What changes were proposed in this pull request?

When we use upper case partition name in Hive table, like:

CREATE TABLE src (KEY STRING, VALUE STRING) PARTITIONED BY (DS STRING)

Then, insert into table query doesn't work

INSERT INTO TABLE src PARTITION(ds) SELECT 'k' key, 'v' value, '1' ds
// or
INSERT INTO TABLE src PARTITION(DS) SELECT 'k' KEY, 'v' VALUE, '1' DS
[info]   org.apache.spark.sql.AnalysisException:
org.apache.hadoop.hive.ql.metadata.Table.ValidationFailureSemanticException: Partition spec {ds=, DS=1} contains non-partition columns;  

As Hive metastore is not case preserving and keeps partition columns with lower cased names, we lowercase column names in partition spec before passing to Hive client. But we write upper case column names in partition paths.

However, when calling loadDynamicPartitions to do insert into table for dynamic partition, Hive calculates full path spec for partition paths. So it calculates a partition spec like {ds=, DS=1} in above case and fails partition column validation. This patch is proposed to fix the issue by lowercasing the column names in written partition paths for Hive partitioned table.

This fix touchs saveAsHiveFile method, which is used in InsertIntoHiveDirCommand and InsertIntoHiveTable commands. Among them, only InsertIntoHiveTable passes partitionAttributes parameter. So I think this change only affects InsertIntoHiveTable command.

How was this patch tested?

Added test.

@IceMimosa
Copy link

LGTM

@SparkQA
Copy link

SparkQA commented Jun 16, 2019

Test build #106559 has finished for PR 24886 at commit 1aec7f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Jun 18, 2019

cc @cloud-fan

@HyukjinKwon
Copy link
Member

Looks correct to me.

@HyukjinKwon
Copy link
Member

Merged to master.

@viirya
Copy link
Member Author

viirya commented Jun 24, 2019

Thanks @HyukjinKwon @cloud-fan


test("SPARK-28054: Unable to insert partitioned table when partition name is upper case") {
withTable("spark_28054_test") {
sql("set hive.exec.dynamic.partition.mode=nonstrict")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we set it back?

Use withSQLConf ?

Also do we need to set the case sensitivity conf?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This set follows other tests in same suite. Using withSQLConf is good, yes.

The case sensitivity conf has no effect on this, I think it is fine.

// we also need to lowercase the column names in written partition paths.
// scalastyle:off caselocale
val hiveCompatiblePartitionColumns = partitionAttributes.map { attr =>
attr.withName(attr.name.toLowerCase)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops..will fix in a follow-up. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants