HIVE-28581: Support Partition Pruning stats optimizations for Iceberg tables #5498
public DummyPartition(Table tbl, String name,
    Map<String, String> partSpec) {
  setTable(tbl);
public DummyPartition(Table tbl, String name, Map<String, String> partSpec) throws HiveException {
Can we reuse this object or better create another abstraction, like [Virtual/Hidden]Partition?
cc @kasakrisz
Is the functionality of DummyPartition exploited? I saw that when we create DummyPartition instances we return a Partition type reference and never access any DummyPartition defined method. If this is not the case then DummyPartition is fine. The name is misleading though since these objects represent real partitions, aren't they?
We use the overridden getValues and getSpec. And yes, they represent real partitions (just not stored in HMS).
Seems that the current class hierarchy doesn't represent our needs:
- The Partition class represents a partition stored in HMS.
- DummyPartition extends Partition, so it looks like a special HMS-stored partition, but this is not the case.
Maybe defining a Partition interface or abstract class with a minimal contract (getName, getValues) would be better. Two classes could then implement/extend it: one for HMS-stored partitions and another for non-HMS partitions.
It seems to be a bigger refactor because the current Partition class is widely used.
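To make the suggestion above concrete, here is a rough sketch of what such a minimal contract could look like. It is only an illustration: PartitionInfo, MetastorePartition and VirtualPartition are hypothetical names that do not exist in Hive, and the real Partition class carries far more state than shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal contract shared by all partition kinds (not a Hive API).
interface PartitionInfo {
  String getName();
  List<String> getValues();
}

// Hypothetical stand-in for a partition whose metadata is stored in HMS.
final class MetastorePartition implements PartitionInfo {
  private final String name;
  private final List<String> values;

  MetastorePartition(String name, List<String> values) {
    this.name = name;
    this.values = new ArrayList<>(values); // defensive copy
  }

  @Override public String getName() { return name; }
  @Override public List<String> getValues() { return values; }
}

// Hypothetical stand-in for a partition known only to the table format,
// for example an Iceberg partition that has no HMS entry.
final class VirtualPartition implements PartitionInfo {
  private final String name;
  private final Map<String, String> spec;

  VirtualPartition(String name, Map<String, String> spec) {
    this.name = name;
    this.spec = new LinkedHashMap<>(spec); // preserves partition column order
  }

  @Override public String getName() { return name; }

  // Values are derived from the spec, in partition column order.
  @Override public List<String> getValues() { return new ArrayList<>(spec.values()); }

  public Map<String, String> getSpec() { return spec; }
}
```

Callers that only need the name and values would depend on the interface, so they would not have to care whether a partition is backed by HMS.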
DummyPartition is used to represent non-HMS partitions. Before Iceberg, it had just two properties: table and name. But I agree the name is not self-explanatory.
ayushtkn
left a comment
LGTM



What changes were proposed in this pull request?
Adds support for partition-pruning stats optimizations for Iceberg tables.
Why are the changes needed?
Performance
Does this PR introduce any user-facing change?
No
Is the change a dependency upgrade?
No
How was this patch tested?
mvn test -Dtest=TestIcebergCliDriver -Dqfile=iceberg_stats_with_ppr.q -Drat.skip=true