-
Notifications
You must be signed in to change notification settings - Fork 2.9k
API, Core: Update inclusive metrics evaluator for extract and transforms #12311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API, Core: Update inclusive metrics evaluator for extract and transforms #12311
Conversation
|
|
||
| @ParameterizedTest | ||
| @FieldSource("MISSING_STATS_EXPRESSIONS") | ||
| public void testMissingStats(Expression expr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the bounds are in metadata, they are trusted. Bounds should be missing in cases where there are variant values with incompatible types.
| return current; | ||
| } | ||
|
|
||
| public static ByteBuffer serializeBounds(Map<String, VariantValue> bounds) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This serialization is temporary and will be replaced with a well-defined variant serialization.
0c80b18 to
09fcf7b
Compare
12b7f7c to
b1b068d
Compare
b1b068d to
8faa5f1
Compare
|
This needed changes to |
8faa5f1 to
bcdea90
Compare
|
This now relies on #12374 to move the serialized variant classes into API so that |
37a0c04 to
d1a260c
Compare
|
Thanks for reviewing, @danielcweeks! |
This updates
InclusiveMetricsEvaluatorthat uses column stats to skip data files during scan planning.The evaluator was implementing the older
BoundExpressionVisitorinterface that only supportedBoundReferenceand not otherBoundTerminstances likeBoundTransform. After #12304,BoundExtractalso needs to be supported. This PR includes #12304 and will be rebased when it is merged.Filtering works for transformed values when the transform is order preserving. If it is not order preserving (like
bucket) the bounds cannot be used.Most of the changes to
BoundExpressionVisitorare to produce the lower and upper bounds values that are tested:BoundReference, deserialize the bound from the correctlowerBoundsorupperBoundsmap (moved toparseLowerBoundandparseLowerBoundBoundTransform, deserialize the bound and transform it if the transform is order-preserving (intransformLowerBoundandtransformUpperBound)Extract, deserialize the bound as a map from field name toVariantValue, then convert the value to the internal representation (inextractLowerBoundandextractUpperBound)This adds new test suites for the new
BoundTermcases that are supported:TestInclusiveMetricsEvaluatorWithExtracttests variant casesTestInclusiveMetricsEvaluatorWithTransformstests transform cases