@gatorsmile
Member

What changes were proposed in this pull request?

For non-partitioned tables, Hive-generated statistics are stored in the table properties. For partitioned tables, however, they are stored in the partition properties, so Spark is currently unable to use the Hive-generated statistics for partitioned tables. This PR reads them from the partition properties and totals them across all the partitions.

Hive might not have gathered statistics for every partition. When the collection is partial, we do not use the Hive-generated statistics.
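To illustrate the all-or-nothing rule, here is a minimal sketch, not the code in this patch. `PartitionStatsSketch`, `PartitionProps`, and `totalTableSize` are hypothetical names, each partition is modeled as a plain `Map` of its properties, and the sketch assumes that a missing, unparsable, or non-positive value means the statistic was not collected for that partition.

```scala
object PartitionStatsSketch {
  // Hypothetical stand-in for the properties Hive stores on one partition.
  type PartitionProps = Map[String, String]

  /**
   * Sum `statType` (for example "totalSize") over all partitions.
   * Returns None when there are no partitions or when any partition
   * lacks a usable value, i.e. when collection is only partial.
   */
  def totalTableSize(
      partitions: Seq[PartitionProps],
      statType: String): Option[Long] = {
    // Per partition: Some(size) if the property parses to a positive Long.
    val sizes: Seq[Option[Long]] = partitions.map { props =>
      props.get(statType)
        .flatMap(s => scala.util.Try(s.toLong).toOption)
        .filter(_ > 0L)
    }
    if (partitions.nonEmpty && sizes.forall(_.isDefined)) Some(sizes.flatten.sum)
    else None
  }
}
```

With two partitions that both carry `totalSize`, the sketch returns their sum. If either partition lacks the property, it returns `None`, and the caller would fall back to whatever default size estimate it already uses.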

How was this patch tested?

Added test cases.

@SparkQA

SparkQA commented Sep 20, 2016

Test build #65634 has finished for PR 15158 at commit 061e60b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

cc @hvanhovell @cloud-fan

// For partitioned tables, get the size of all the partitions.
// Note: the statistics might not be gathered for all the partitions.
// For partial collection, we will not utilize the Hive-generated statistics.
private def getTotalTableSize(statType: String): Option[Long] = {
Contributor

What if a query only reads some of the partitions? It looks like the table statistics should depend on partition pruning.

Member Author

:) We are facing the same issue with both data source tables and Hive tables. Fixing it requires another PR that changes how we use the statistics of leaf nodes. The effect of partition filtering on the statistics should be accounted for by the Filter nodes. Is my understanding right?

@SparkQA

SparkQA commented Mar 3, 2017

Test build #73853 has finished for PR 15158 at commit 061e60b.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@gatorsmile closed this Jun 16, 2017

@cloud-fan
Contributor

hmm, isn't it still valid?

3 participants