Skip to content

Conversation

@zhangbutao
Copy link
Contributor

@zhangbutao zhangbutao commented Apr 25, 2024

What changes were proposed in this pull request?

At present, in case of iceberg.hive.keep.stats=true & hive.compute.query.using.stats=true, HS2 will do a fetch task to get iceberg table's numRows property from HMS to optimize count query.
If iceberg.hive.keep.stats=false, HS2 will always launch tez task to compute table's row count when filing a count query.

However, as we know, iceberg table's metadata has some stats information, we can also just start a fetch task to retrieve the row count from iceberg's snapshot summary when iceberg.hive.keep.stats=false or no stats stored in hms. This can avoid launching tez task to compute the table's row count.

BTW, timetravel or branch/tag has different stats from current snapshot, so we need to get the specified snapshotid based on the different iceberg version. Otherwise, we will get the wrong stats when querying the time travel/branch/tag.

Why are the changes needed?

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

Qtest

filterExpr: (a = 22) (type: boolean)
Snapshot ref: branch_test1
Statistics: Num rows: 3 Data size: 291 Basic stats: COMPLETE Column stats: COMPLETE
Statistics: Num rows: 5 Data size: 485 Basic stats: COMPLETE Column stats: COMPLETE
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR, we always get row count of branch/tag/timetravel by the current snapshot summary, which is not right.

Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1, pending tests
thanks @zhangbutao for addressing the review comment quickly!

@sonarqubecloud
Copy link

@deniskuzZ deniskuzZ merged commit b431e11 into apache:master Oct 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants