Skip to content

[SPARK-25418][SQL] The metadata of DataSource table should not include Hive-generated storage properties.#22410

Closed
ueshin wants to merge 1 commit intoapache:masterfrom
ueshin:issues/SPARK-25418/hive_metadata
Closed

[SPARK-25418][SQL] The metadata of DataSource table should not include Hive-generated storage properties.#22410
ueshin wants to merge 1 commit intoapache:masterfrom
ueshin:issues/SPARK-25418/hive_metadata

Conversation

@ueshin
Copy link
Copy Markdown
Member

@ueshin ueshin commented Sep 13, 2018

What changes were proposed in this pull request?

When Hive support enabled, Hive catalog puts extra storage properties into table metadata even for DataSource tables, but we should not have them.

How was this patch tested?

Modified a test.

@ueshin
Copy link
Copy Markdown
Member Author

ueshin commented Sep 13, 2018

cc @gatorsmile

@SparkQA
Copy link
Copy Markdown

SparkQA commented Sep 13, 2018

Test build #96023 has finished for PR 22410 at commit 2484ca6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Copy Markdown
Contributor

retest this please

@SparkQA
Copy link
Copy Markdown

SparkQA commented Sep 13, 2018

Test build #96026 has finished for PR 22410 at commit 2484ca6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


val CREATED_SPARK_VERSION = SPARK_SQL_PREFIX + "create.version"

val HIVE_GENERATED_STORAGE_PROPERTIES = Set(SERIALIZATION_FORMAT)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ueshin . The title means Hive-generated storage properties, but this PR excludes only this one. Could you add more? Othewise, can we make this as a SQLConf in order to be configurable?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the hive-generated storage property I think we should exclude for now is only this one, but we might have some more in the future, so I'd say "properties" and we will add them to this set in the case. WDYT?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add more in the future. Basically, these properties are useless to Spark data source tables.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it, @ueshin and @gatorsmile .

@gatorsmile
Copy link
Copy Markdown
Member

LGTM

Thanks! Merged to master.

@asfgit asfgit closed this in a81ef9e Sep 14, 2018
locationUri = tableLocation.map(CatalogUtils.stringToURI(_)))
}
val storageWithoutHiveGeneratedProperties = storageWithLocation.copy(
properties = storageWithLocation.properties.filterKeys(!HIVE_GENERATED_STORAGE_PROPERTIES(_)))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we do this in HiveClientImpl? IIRC we filter out some table props there.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In HiveClientImpl, we don't know the table is Hive table or DataSource table yet. Can we remove the props even for Hive tables?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants