[SPARK-25418][SQL] The metadata of DataSource table should not include Hive-generated storage properties. #22410
cc @gatorsmile
Test build #96023 has finished for PR 22410 at commit
retest this please
Test build #96026 has finished for PR 22410 at commit
val CREATED_SPARK_VERSION = SPARK_SQL_PREFIX + "create.version"

val HIVE_GENERATED_STORAGE_PROPERTIES = Set(SERIALIZATION_FORMAT)
@ueshin, the title says Hive-generated storage properties, but this PR excludes only this one. Could you add more? Otherwise, can we make this a SQLConf so that it is configurable?
Actually, the only Hive-generated storage property I think we should exclude for now is this one, but we might find more in the future, so I'd keep "properties" in the title and add them to this set when that happens. WDYT?
We can add more in the future. Basically, these properties are useless to Spark data source tables.
LGTM. Thanks! Merged to master.
locationUri = tableLocation.map(CatalogUtils.stringToURI(_)))
}
val storageWithoutHiveGeneratedProperties = storageWithLocation.copy(
  properties = storageWithLocation.properties.filterKeys(!HIVE_GENERATED_STORAGE_PROPERTIES(_)))
Shall we do this in HiveClientImpl? IIRC we filter out some table props there.
In HiveClientImpl, we don't yet know whether the table is a Hive table or a DataSource table. Can we remove the props even for Hive tables?
What changes were proposed in this pull request?
When Hive support is enabled, the Hive catalog puts extra storage properties into the table metadata even for DataSource tables, but we should not have them.
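For illustration only, the filtering idea can be sketched outside Spark roughly as follows. `StorageFormat` is a hypothetical stand-in for Spark's catalog storage type, `StorageProps` and `withoutHiveGenerated` are names invented for this sketch, and the key `"serialization.format"` is assumed to be the value behind `SERIALIZATION_FORMAT`. It is not the code in this patch.

```scala
// Hypothetical stand-in for the catalog storage type; only the fields used here.
final case class StorageFormat(
    locationUri: Option[String],
    properties: Map[String, String])

object StorageProps {
  // Assumption: the string value behind SERIALIZATION_FORMAT.
  val SerializationFormat = "serialization.format"

  // Keys that Hive adds on its own and that a DataSource table does not need.
  val HiveGeneratedStorageProperties: Set[String] = Set(SerializationFormat)

  // Return a copy of the storage with the Hive-generated keys dropped.
  // `filter` is used instead of `filterKeys` so the result is a strict Map
  // on both Scala 2.12 and 2.13.
  def withoutHiveGenerated(storage: StorageFormat): StorageFormat =
    storage.copy(properties = storage.properties.filter {
      case (key, _) => !HiveGeneratedStorageProperties(key)
    })
}
```

Supporting another Hive-generated key later would, under this sketch, only mean adding it to `HiveGeneratedStorageProperties`.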
How was this patch tested?
Modified a test.