Improve estimation of row count from partition samples #11333
Merged
sopel39 merged 1 commit into trinodb:master on Mar 8, 2022
TPC benchmark results for partitioned sf1000 orc
sopel39 reviewed on Mar 7, 2022
...rino-hive/src/main/java/io/trino/plugin/hive/statistics/MetastoreHiveStatisticsProvider.java (outdated, resolved)
...rino-hive/src/main/java/io/trino/plugin/hive/statistics/MetastoreHiveStatisticsProvider.java (outdated, resolved)
6757cfe to db90cc1
db90cc1 to 4bc9ead
sopel39 reviewed on Mar 7, 2022
...rino-hive/src/main/java/io/trino/plugin/hive/statistics/MetastoreHiveStatisticsProvider.java (outdated, resolved)
...rino-hive/src/main/java/io/trino/plugin/hive/statistics/MetastoreHiveStatisticsProvider.java (outdated, resolved)
...-hive/src/test/java/io/trino/plugin/hive/statistics/TestMetastoreHiveStatisticsProvider.java (outdated, resolved)
...-hive/src/test/java/io/trino/plugin/hive/statistics/TestMetastoreHiveStatisticsProvider.java (outdated, resolved)
4bc9ead to 0b02f0f
skrzypo987 reviewed on Mar 7, 2022
skrzypo987 left a comment:
Not an expert here, but seems legit.
lukasz-stec approved these changes on Mar 7, 2022
...rino-hive/src/main/java/io/trino/plugin/hive/statistics/MetastoreHiveStatisticsProvider.java (outdated, resolved)
Reduce the possibility of estimation errors in averageRowsPerPartition and rowCount due to a couple of outliers by excluding the min and max rowCount values from the calculation of the average rows per partition.
0b02f0f to d3ea6a9
lukasz-stec reviewed on Mar 8, 2022
...rino-hive/src/main/java/io/trino/plugin/hive/statistics/MetastoreHiveStatisticsProvider.java (resolved)
sopel39 approved these changes on Mar 8, 2022
lgtm % mind automation
Test failure due to #11368
Closed
This was referenced Mar 21, 2022
Description
Reduce the possibility of estimation errors in averageRowsPerPartition
and rowCount due to a couple of outliers by excluding the
min and max rowCount values from the calculation of
the average rows per partition.
improvement
hive connector statistics
improves estimates for partitioned hive tables
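A minimal sketch of the idea described above, for illustration only. It is not the code from MetastoreHiveStatisticsProvider: the class name, method names, and the choice to skip trimming when fewer than three samples are available are all assumptions made for this example.

```java
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical helper; names and edge-case handling are assumptions, not the Trino implementation.
final class TrimmedRowCountEstimate
{
    private TrimmedRowCountEstimate() {}

    // Average rows per partition over the sampled partitions,
    // excluding one minimum and one maximum row count so that
    // a single very small or very large partition does not skew the average.
    static OptionalDouble averageRowsPerPartition(List<Long> sampledRowCounts)
    {
        if (sampledRowCounts.isEmpty()) {
            return OptionalDouble.empty();
        }
        // Assumption: with fewer than three samples there is nothing sensible to trim.
        if (sampledRowCounts.size() < 3) {
            return sampledRowCounts.stream().mapToLong(Long::longValue).average();
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long sum = 0;
        for (long rowCount : sampledRowCounts) {
            min = Math.min(min, rowCount);
            max = Math.max(max, rowCount);
            sum += rowCount;
        }
        // Drop exactly one min and one max value from the total.
        return OptionalDouble.of((double) (sum - min - max) / (sampledRowCounts.size() - 2));
    }

    // Extrapolate the table row count from the trimmed per-partition average.
    static OptionalDouble estimateRowCount(List<Long> sampledRowCounts, int totalPartitions)
    {
        OptionalDouble average = averageRowsPerPartition(sampledRowCounts);
        if (average.isEmpty()) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(average.getAsDouble() * totalPartitions);
    }
}
```

For sampled row counts of 10, 900, 1000, 1100 and 1000000, the plain mean is 200602 rows per partition, while the trimmed average in this sketch is 1000, so the extrapolated rowCount is no longer dominated by the single large partition.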
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) No release notes entries required.
( ) Release notes entries required with the following suggested text: