[SPARK-17284] [SQL] Remove Statistics-related Table Properties from SHOW CREATE TABLE #14855
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -791,11 +791,22 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableComman | |
| } | ||
| } | ||
|
|
||
| // These table properties should not be included in the output statement of SHOW CREATE TABLE | ||
| val excludedTableProperties = Set( | ||
| // The following are hive-generated statistics fields | ||
| "COLUMN_STATS_ACCURATE", | ||
| "numFiles", | ||
| "numPartitions", | ||
| "numRows", | ||
| "rawDataSize", | ||
| "totalSize" | ||
| ) | ||
|
|
||
| private def showHiveTableProperties(metadata: CatalogTable, builder: StringBuilder): Unit = { | ||
| if (metadata.properties.nonEmpty) { | ||
| val filteredProps = metadata.properties.filterNot { | ||
| // Skips "EXTERNAL" property for external tables | ||
| case (key, _) => key == "EXTERNAL" && metadata.tableType == EXTERNAL | ||
| // Skips all the stats info (See the JIRA: HIVE-13792) | ||
|
Contributor
Shouldn't we fix this in the
Member
Author
Not sure whether I got your point. Are you saying Initially, I had the same design as you. Later, I realized we still use/display them in the other DDL statements, for example, Let me know if my understanding is wrong. Thanks!
Contributor
I mean in general: we should not leak Hive-specific implementation details (i.e. hacks) into sql/core. At first I thought that we might be able to hide/filter them, but you have a point. The good news is that this becomes a non-issue when we have statistics as a part of the CatalogTable. Then these properties can become the problem of a yet-to-be-written translation.
Member
Author
Yeah, I agree with this general rule. When we support translations, we need to be very careful about including the statistics info in the SHOW CREATE TABLE DDL. Hive does not include it in SHOW CREATE TABLE, as shown in their JIRA: https://issues.apache.org/jira/browse/HIVE-13792. If we allow users to provide the statistics info when creating tables, we might need to mark it as inaccurate, like what Hive does now? BTW, should we merge this in 2.1 before we support the translation? So far, Spark 2.0 has the bug. Let me know what I should do next. Thanks! |
||
| case (key, _) => excludedTableProperties.contains(key) | ||
| } | ||
|
|
||
| val props = filteredProps.map { case (key, value) => | ||
|
|
||
Shouldn't we set each of these property names as a constant so that we can use them in the translation layer?
This PR is for fixing a bug. We might need to backport it to 2.0. When we implement the translation layer, we can do that, just like what we did for the property names of the Data Source Table schema.
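For readers following the thread, here is a minimal, hedged sketch of the filtering idea under discussion, with the statistics keys held as named constants as the reviewer suggests. It is not the code from this PR: `TableMeta`, `TableType`, `HiveStatsProperties`, and `ShowCreateTableSketch` are hypothetical stand-ins so that the example runs without Spark on the classpath. The rendered diff above has lost its `+`/`-` markers, so it is unclear whether both `case (key, _)` clauses are meant to coexist. If they did, the first clause would match every entry and the second would never run, so the sketch combines both conditions in a single clause.

```scala
// Hypothetical stand-ins for Spark's catalog types (assumption: simplified,
// not the real CatalogTable / CatalogTableType).
sealed trait TableType
case object Managed extends TableType
case object External extends TableType

final case class TableMeta(tableType: TableType, properties: Map[String, String])

// Hive-generated statistics keys from the diff, named once so that other
// code (for example a future translation layer) could reuse them.
object HiveStatsProperties {
  val ColumnStatsAccurate = "COLUMN_STATS_ACCURATE"
  val NumFiles = "numFiles"
  val NumPartitions = "numPartitions"
  val NumRows = "numRows"
  val RawDataSize = "rawDataSize"
  val TotalSize = "totalSize"

  val all: Set[String] =
    Set(ColumnStatsAccurate, NumFiles, NumPartitions, NumRows, RawDataSize, TotalSize)
}

object ShowCreateTableSketch {
  // Returns the properties that would still be printed by SHOW CREATE TABLE.
  def filteredProperties(meta: TableMeta): Map[String, String] =
    meta.properties.filterNot { case (key, _) =>
      // Skip the "EXTERNAL" marker for external tables.
      val isExternalMarker = key == "EXTERNAL" && meta.tableType == External
      // Skip Hive-generated statistics (see HIVE-13792).
      isExternalMarker || HiveStatsProperties.all.contains(key)
    }
}
```

Keeping the keys in one object means the exclusion set and any later consumer would share a single definition, which is the trade-off raised in the review: more structure now versus a smaller, easier-to-backport bug fix.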