Member

@deniskuzZ deniskuzZ commented Oct 9, 2024

What changes were proposed in this pull request?

Adds support for partition-pruning stats optimizations on Iceberg tables.

  1. Adds functionality to compute, store and fetch basic partition stats per snapshot;
  2. Enhances query plan with basic stats, aggregated for pruned partitions;
  3. Enables PartitionConditionRemover (PCR) optimization;
  4. Improves StatsOptimizer so that it can answer queries like count(*) with a partition predicate directly from stats.
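
A minimal sketch of the idea behind items 2 and 4, not the code in this PR: per-partition basic stats are summed over only the partitions that survive pruning, and count(*) is read from that sum. Every name here (`PrunedStatsSketch`, `BasicStats`, `aggregate`, `countStar`) is hypothetical, and the sketch assumes the stored row counts are exact for the snapshot being queried.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: these types are illustrative and are not Hive or Iceberg classes.
final class PrunedStatsSketch {

  /** Basic stats kept per partition for one snapshot (fields chosen for illustration). */
  static final class BasicStats {
    final long rowCount;
    final long totalSizeBytes;

    BasicStats(long rowCount, long totalSizeBytes) {
      this.rowCount = rowCount;
      this.totalSizeBytes = totalSizeBytes;
    }

    BasicStats plus(BasicStats other) {
      return new BasicStats(rowCount + other.rowCount, totalSizeBytes + other.totalSizeBytes);
    }
  }

  /** Sums the stats of only those partitions that survive pruning. */
  static BasicStats aggregate(Map<String, BasicStats> statsByPartition,
                              Predicate<String> survivesPruning) {
    BasicStats total = new BasicStats(0L, 0L);
    for (Map.Entry<String, BasicStats> entry : statsByPartition.entrySet()) {
      if (survivesPruning.test(entry.getKey())) {
        total = total.plus(entry.getValue());
      }
    }
    return total;
  }

  /** Answers count(*) from stats alone; valid only if the stored row counts are exact. */
  static long countStar(Map<String, BasicStats> statsByPartition,
                        Predicate<String> survivesPruning) {
    return aggregate(statsByPartition, survivesPruning).rowCount;
  }
}
```

In this sketch the predicate stands in for the partition filter of the query, so no data files need to be read when the filter touches only partition columns.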

Why are the changes needed?

Performance

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

mvn test -Dtest=TestIcebergCliDriver -Dqfile=iceberg_stats_with_ppr.q -Drat.skip=true

public DummyPartition(Table tbl, String name,
Map<String, String> partSpec) {
setTable(tbl);
public DummyPartition(Table tbl, String name, Map<String, String> partSpec) throws HiveException {
Member Author

Can we reuse this object or better create another abstraction, like [Virtual/Hidden]Partition?
cc @kasakrisz

Contributor

Is the functionality of DummyPartition exploited? I saw that when we create DummyPartition instances we return a Partition type reference and never access any DummyPartition defined method. If this is not the case then DummyPartition is fine. The name is misleading though since these objects represent real partitions, aren't they?

Member Author

We use the overridden getValues and getSpec. And yes, they represent real partitions (just not ones stored in HMS).

Contributor

Seems that the current class hierarchy doesn't represent our needs:

  • Partition class represents a partition stored on HMS.
  • DummyPartition extends Partition so it seems like a special HMS stored partition but this is not the case.

Maybe defining a Partition interface or abstract class with a minimal contract (getName, getValues) would be better. Two classes could then implement/extend it: one for HMS-stored partitions and another for non-HMS ones.

It seems to be a bigger refactor because the current Partition class is widely used.
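
A minimal sketch of that split, with hypothetical names throughout (`PartitionRef`, `MetastorePartitionRef`, `VirtualPartitionRef`); it only illustrates the shape of the contract and is not Hive's actual class hierarchy. The name format built in `getName` is likewise an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not Hive's actual class hierarchy.
interface PartitionRef {
  String getName();
  List<String> getValues();
}

/** Stand-in for a partition whose metadata is stored in HMS. */
final class MetastorePartitionRef implements PartitionRef {
  private final String name;
  private final List<String> values;

  MetastorePartitionRef(String name, List<String> values) {
    this.name = name;
    this.values = new ArrayList<>(values); // defensive copy
  }

  @Override public String getName() { return name; }
  @Override public List<String> getValues() { return new ArrayList<>(values); }
}

/** Stand-in for a real partition that is not stored in HMS, described only by its spec. */
final class VirtualPartitionRef implements PartitionRef {
  private final Map<String, String> spec;

  VirtualPartitionRef(Map<String, String> spec) {
    this.spec = new LinkedHashMap<>(spec); // preserves partition-column order
  }

  /** Builds a key=value/key=value name from the spec (format assumed for this example). */
  @Override public String getName() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : spec.entrySet()) {
      if (sb.length() > 0) {
        sb.append('/');
      }
      sb.append(e.getKey()).append('=').append(e.getValue());
    }
    return sb.toString();
  }

  @Override public List<String> getValues() { return new ArrayList<>(spec.values()); }

  Map<String, String> getSpec() { return new LinkedHashMap<>(spec); }
}
```

Callers that only need a name and values would depend on the interface, so neither implementation has to pretend to be the other.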

Member Author

@deniskuzZ deniskuzZ Oct 22, 2024

DummyPartition is used to represent non-HMS partitions. Before Iceberg, it had just two properties: table and name. But I agree the name is not self-explanatory.

@deniskuzZ deniskuzZ changed the title from "Support Iceberg partition stats with PPR" to "Support Partition Prunning stats optimization for Iceberg tables" Oct 9, 2024
@deniskuzZ deniskuzZ force-pushed the iceberg_part_stats_with_ppr branch from f2e7ab6 to 4b0f6e9 October 11, 2024 08:48
@deniskuzZ deniskuzZ force-pushed the iceberg_part_stats_with_ppr branch from 4b0f6e9 to 68f1a7a October 11, 2024 09:01
Member

@ayushtkn ayushtkn left a comment

LGTM

@deniskuzZ deniskuzZ merged commit dee6546 into apache:master Jan 31, 2025
4 checks passed
@deniskuzZ deniskuzZ changed the title from "HIVE-28581: Support Partition Prunning stats optimizations for Iceberg tables" to "HIVE-28581: Support Partition Pruning stats optimizations for Iceberg tables" Jan 31, 2025
@deniskuzZ deniskuzZ deleted the iceberg_part_stats_with_ppr branch January 31, 2025 15:26
8 participants