-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand #28647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
One point. cc @cloud-fan |
|
Test build #123155 has finished for PR 28647 at commit
|
| assert(source.properties("a") == "apple") | ||
| sql("CREATE TABLE t LIKE s STORED AS parquet TBLPROPERTIES('f'='foo', 'b'='bar')") | ||
| val table = catalog.getTableMetadata(TableIdentifier("t")) | ||
| assert(table.properties.get("a") === None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we intentionally don't keep the table properties from the original table. @maropu @dongjoon-hyun @viirya are you OK with the behavior change proposed by this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the description of CreateTableLikeCommand:
The CatalogTable attributes copied from the source table are storage(inputFormat, outputFormat, serde, compressed, properties), schema, provider, partitionColumnNames, bucketSpec by default.
So we don't say table properties are copied from source table too. Not sure about why we didn't copy table properties.
I feel it is OK as seems copying original table properties should not be harmful. And we already copy storage properties.
But we should update the doc together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, I have the same opinion with @viirya; I couldn't find any strong reason not to copy the properites. Rather, is this a kind of bugs? Anyway, yea, we should clearly descibe the behaivour in our doc...
|
In If use this behavior, we should change the test code. |
|
let's update the doc of |
|
I see the enum properties key in Shall we copy the tblproperties without these? |
|
Could you update the SQL doc for the behaivour, too? https://github.com/apache/spark/blame/master/docs/sql-ref-syntax-ddl-create-table-like.md |
| CatalogTableType.EXTERNAL | ||
| } | ||
|
|
||
| val newProperties = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plz leave some comments about why the metastore props are remvoed here? #28647 (comment)
| object DDLUtils { | ||
| val HIVE_PROVIDER = "hive" | ||
|
|
||
| val METASTORE_GENERATED_PROPERTIES: Set[String] = Set( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to check; does the latest hive metastore have the same set of these props here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I searched somewhere, but I think there is no determinate way to check this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse HiveClientImpl.HiveStatisticsProperties? They are the hive specific properties we explicitly ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HiveClientImpl.HiveStatisticsProperties has been used at HiveClientImpl.getTable(), we needn't remove them again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel comfortable maintaining this list here. Is it possible to remove Hive specific properties in HiveClientImpl? So that the hive related stuff remains in hive related classes.
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
Show resolved
Hide resolved
|
Test build #123255 has finished for PR 28647 at commit
|
|
Test build #123256 has finished for PR 28647 at commit
|
|
Test build #123257 has finished for PR 28647 at commit
|
| val newProperties = sourceTableDesc.tableType match { | ||
| case VIEW => | ||
| // For view, we just use new properties | ||
| properties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep view behavior as before. Hive also does not copy view properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit format;
val newProperties = sourceTableDesc.tableType match {
case MANAGED | EXTERNAL =>
// Hive only retain the useful properties through serde class annotation.
// For better compatible with Hive, we remove the metastore properties.
sourceTableDesc.properties -- DDLUtils.METASTORE_GENERATED_PROPERTIES ++ properties
case VIEW =>
// For view, we just use new properties
properties
}
|
Test build #123272 has finished for PR 28647 at commit
|
| "maxFileSize", | ||
| "minFileSize" | ||
| ) | ||
| assert(targetTable.properties.filterKeys(!metastoreGeneratedProperties.contains(_)).isEmpty, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this check. And we can't check all meta properties here. For example transient_lastDdlTime, hive will add this properties when table created. So the properties is always exists.
Add some meta properties test at new UT.
|
Test build #123291 has finished for PR 28647 at commit
|
| test("SPARK-31828: Retain table properties at CreateTableLikeCommand") { | ||
| val catalog = spark.sessionState.catalog | ||
| withTable("t1", "t2", "t3") { | ||
| sql(s"CREATE TABLE t1(c1 int) TBLPROPERTIES('k1'='v1', 'k2'='v2', 'totalNumberFiles'='meta')") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case, could you test all the prpos in METASTORE_GENERATED_PROPERTIES? Probably, its beetter to split it into two test units like add a new test unit test("SPARK-31828: Filters out Hive metastore properties in CreateTableLikeCommand") {.
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
|
I believe this is useful, can we revisit PR ? cc @maropu @cloud-fan @viirya @dilipbiswal |
|
This is reopened to the author (@ulysses-you )'s request. |
|
@ulysses-you . Please rebase your PR regularly to the master to avoid |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #130531 has finished for PR 28647 at commit
|
|
thank you @dongjoon-hyun |
|
Kubernetes integration test starting |
|
Test build #130535 has finished for PR 28647 at commit
|
|
Kubernetes integration test status success |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #130760 has finished for PR 28647 at commit
|
|
cc @maropu @cloud-fan @viirya do you have time to review this thanks ! |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #131201 has finished for PR 28647 at commit
|
|
Test build #131644 has finished for PR 28647 at commit
|
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
At CreateTableLikeCommand, we use the new tblproperties with merge source tblproperties.
Why are the changes needed?
We should retain the useful tblproperties, e.g.
parquet.compression. And hive also retain the tblproperties.Does this PR introduce any user-facing change?
No.
How was this patch tested?
Add UT.