Only collect stats for primitive types in ThriftHiveMetastore #17268

Merged
rschlussel merged 1 commit into prestodb:master from otakart:fix_analyze on Feb 9, 2022

Conversation

@otakart otakart commented Feb 8, 2022

Fixes #16693

Test plan

  1. Verified in the web UI that ANALYZE TABLE on Hive tables reads raw data only in proportion to the columns with primitive types
  2. Verified that ANALYZE TABLE works for both partitioned and unpartitioned Hive tables with complex columns
  3. Added a unit test for a partitioned table with complex columns
== RELEASE NOTES ==

Hive Changes
* Fix ANALYZE TABLE for partitioned Hive tables with complex columns (array, map, struct)
* Improve performance of ANALYZE TABLE on Hive tables with complex columns
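
The behaviour described in these release notes can be pictured as a filter applied to the table's columns before any statistics are requested. The following is only a minimal sketch of that idea, not the code from this PR: `PrimitiveStatsFilter`, `Column`, and `Category` are hypothetical names invented for illustration, and the actual change works with Presto's own metastore and type classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for illustration only; not a Presto class.
final class PrimitiveStatsFilter
{
    // Simplified type categories: one primitive bucket plus the complex
    // kinds named in the release notes (array, map, struct).
    enum Category { PRIMITIVE, ARRAY, MAP, STRUCT }

    // Hypothetical column descriptor: a name and its type category.
    static final class Column
    {
        final String name;
        final Category category;

        Column(String name, Category category)
        {
            this.name = name;
            this.category = category;
        }
    }

    // Returns the names of the columns that would have statistics collected.
    // Complex columns are skipped, so their data need not be read.
    static List<String> columnsToAnalyze(List<Column> columns)
    {
        return columns.stream()
                .filter(column -> column.category == Category.PRIMITIVE)
                .map(column -> column.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<Column> columns = Arrays.asList(
                new Column("id", Category.PRIMITIVE),
                new Column("tags", Category.ARRAY),
                new Column("attrs", Category.MAP),
                new Column("name", Category.PRIMITIVE));
        // Only the primitive columns "id" and "name" remain after filtering.
        System.out.println(columnsToAnalyze(columns));
    }
}
```

Under this assumption, an ANALYZE on a table with array, map, or struct columns would request statistics only for the remaining primitive columns, which is consistent with the reduced raw-data reads reported in the test plan.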

@linux-foundation-easycla
CLA Not Signed

@rschlussel rschlussel left a comment

Thank you for working on this! Can you add a unit test to TestHiveTableStatistics for a partitioned table with non-primitive types?

I would inline the function here instead of adding it to Util since it's not used anywhere else.

@rschlussel

Also, please make sure to sign the CLA

@rschlussel rschlussel commented Feb 8, 2022

One more thing:
Make sure your commit message follows our guidelines here: https://github.com/prestodb/presto/wiki/Review-and-Commit-guidelines#commit-formatting-and-pull-requests. You could change your commit title to something like:

Only collect stats for primitive types in ThriftHiveMetastore

@otakart otakart changed the title from "Custom getSupportedColumnStatistics for ThriftHiveMetastore ignorin…" to "Only collect stats for primitive types in ThriftHiveMetastore" on Feb 9, 2022
@otakart otakart commented Feb 9, 2022

Thanks @rschlussel for reviewing this.

I have force-pushed a new version of the commit that addresses all the comments.

@rschlussel rschlussel merged commit a25592e into prestodb:master Feb 9, 2022
@rschlussel

Thank you!

Development

Successfully merging this pull request may close these issues.

ANALYZE reads unused nested data
