-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: Add CountNull Aggregation Support #13981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@huaxingao PTAL |
|
Could you help review this pr when you have time? @huaxingao Thanks ~ |
api/src/test/java/org/apache/iceberg/expressions/TestAggregateBinding.java
Outdated
Show resolved
Hide resolved
|
should we also add to SparkAggregates? https://grep.app/search?f.path=spark%2F&f.repo=apache%2Ficeberg&q=Operation.COUNT_STAR |
I believe CountNull is not supported by Spark because it's not standard SQL, which is why I didn't add it at first; However, it's very useful for data-file statistics, so I think it's reasonable to add the support. |
|
i wonder if we need to update the docs too
we have since added CountStar, CountNonNull, and now CountNull |
|
i wonder if this is consider a "spec change" or at least clarification. from https://iceberg.apache.org/spec/#data-file-fields, |
|
In the original java doc |
fc798b1 to
a272efb
Compare
huaxingao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Thanks @jackylee-ch for the PR! Thanks @kevinjqliu for the review! |
|
Thanks for your preview. @huaxingao @kevinjqliu |
We’re using
Aggregateto generate per-column statistics from data files—metrics like value counts, min/max, and null counts—but discovered that null counts (CountNull) aren’t currently supported. This PR adds null count support for data files. It also enables any other consumer to easily collect null counts moving forward.