[SPARK-17284] [SQL] Remove Statistics-related Table Properties from SHOW CREATE TABLE #14855
    case (key, _) => key == "EXTERNAL" && metadata.tableType == EXTERNAL
    // Skips all the stats info (See the JIRA: HIVE-13792)
    case (key, _) =>
      key == "numFiles" || key == "numRows" || key == "totalSize" || key == "numPartitions" ||
Is it possible that the list of hidden properties will grow in the future? If so, could we avoid chaining them with || here? A separate list, like the excludedTableProperties below, seems better. Then we can simply check whether the key is in the list.
Sure, let me change it. Thanks!
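A minimal sketch of the list-based check suggested above, not the code that was eventually committed. The names `ShowCreateTableProps`, `excludedStatsProperties`, and `filterStats` are hypothetical, and the set contains only the four keys visible in the truncated diff line; the real list may be longer.

```scala
object ShowCreateTableProps {
  // Hypothetical exclusion set: only the keys visible in the diff snippet.
  // The original condition continues past the visible line, so this may be incomplete.
  val excludedStatsProperties: Set[String] =
    Set("numFiles", "numRows", "totalSize", "numPartitions")

  /** Drops every property whose key appears in the exclusion set. */
  def filterStats(props: Map[String, String]): Map[String, String] =
    props.filterNot { case (key, _) => excludedStatsProperties.contains(key) }
}
```

With a set, hiding a new property becomes a one-line change to the set rather than another `||` clause in the filter.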
Test build #64560 has finished for PR 14855 at commit
cc @hvanhovell @cloud-fan @liancheng @yhuai Thanks!
    val filteredProps = metadata.properties.filterNot {
      // Skips "EXTERNAL" property for external tables
      case (key, _) => key == "EXTERNAL" && metadata.tableType == EXTERNAL
      // Skips all the stats info (See the JIRA: HIVE-13792)
Shouldn't we fix this in the HiveExternalCatalog and just drop those properties there?
Not sure whether I got your point. Are you referring to getTableOption in HiveExternalCatalog?
Initially, I had the same design in mind as you. Later, I realized we still use and display these properties in other DDL statements, for example, DESCRIBE EXTENDED TABLE.
Let me know if my understanding is wrong. Thanks!
I mean in general: we should not leak Hive-specific implementation details (i.e. hacks) into sql/core.
At first I thought that we might be able to hide/filter them, but you have a point. The good news is that this becomes a non-issue once we have statistics as a part of the CatalogTable. Then these properties can become the problem of a yet-to-be-written translation.
Yeah, I agree with this general rule.
When we support translations, we need to be very careful about including this statistics info in the SHOW CREATE TABLE DDL. Hive does not include it in SHOW CREATE TABLE, as shown in their JIRA: https://issues.apache.org/jira/browse/HIVE-13792. If we allow users to provide the statistics info when creating tables, we might need to mark it as inaccurate, like Hive does now?
BTW, should we merge this in 2.1 before we support the translation? So far, Spark 2.0 has the bug. Let me know what I should do next.
Thanks!
Where do we need the Hive table statistics properties (does …
Another example is spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala Lines 238 to 246 in f7c9ff5
How about hiding them in …
sgtm, do we need to retrieve the statistics properties in …
Will try it. My current solution is to first retrieve the existing table properties and fill in these Hive-specific properties (if not set by the callers) before issuing the command to Hive. Will do it tonight. Thanks!
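A rough sketch of the "retrieve, then fill if not set by the callers" idea described above, on plain maps. It is an illustration under stated assumptions, not the implementation in this PR or its follow-up: `StatsPropsMerge`, `hiveStatsKeys`, and `fillMissingStats` are hypothetical names, and the key set is limited to the four keys visible in the diff.

```scala
object StatsPropsMerge {
  // Hypothetical key set: only the stats keys visible in the diff snippet.
  val hiveStatsKeys: Set[String] =
    Set("numFiles", "numRows", "totalSize", "numPartitions")

  /**
   * Returns the caller's properties plus any stats properties that already
   * exist on the stored table and that the caller did not set explicitly.
   */
  def fillMissingStats(
      existing: Map[String, String],
      fromCaller: Map[String, String]): Map[String, String] = {
    val preserved = existing.filter { case (key, _) =>
      hiveStatsKeys.contains(key) && !fromCaller.contains(key)
    }
    fromCaller ++ preserved
  }
}
```

Values supplied by the caller always win; only stats keys missing from the caller's map are carried over from the stored table.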
This PR is now part of another PR, #14971. Closing it now.
What changes were proposed in this pull request?
The statistics-related table properties should be skipped by SHOW CREATE TABLE, since they could be incorrect in the newly created table. See the Hive JIRA: https://issues.apache.org/jira/browse/HIVE-13792
The output of SHOW CREATE TABLE t1 is
After the fix, the output becomes
How was this patch tested?
Updated the existing test cases.