-
Notifications
You must be signed in to change notification settings - Fork 3k
API: Detect whether required fields nested within optionals can produce nulls #14270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cb947c6 to
df5466b
Compare
|
@pvary @huaxingao @stevenzwu could you review this one please since you reviewed #13804 already? |
…ce nulls This partially reverts some changes around the `Accessor` API that were introduced by apache#13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated
df5466b to
ccb4cbd
Compare
| assertThat(pred.term().type()).isEqualTo(Types.fromPrimitiveString(typeName)); | ||
| } | ||
|
|
||
| private static Stream<Arguments> nullCasesWithNestedStructs() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code has been moved from TestBoundReference and slightly adjusted for better readability
| } | ||
|
|
||
| @Test | ||
| public void testIsNullInNestedStruct() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some additional evaluator tests that reproduce the original problem described in #13804
| private Expression bindUnaryOperation(BoundTerm<T> boundTerm) { | ||
| private Expression bindUnaryOperation(StructType struct, BoundTerm<T> boundTerm) { | ||
| boolean allFieldsAreRequired = | ||
| TypeUtil.findParents(struct.asSchema(), boundTerm.ref().fieldId()).stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How often do we evaluate these expressions? For every manifest entry?
For a wide schema, what is the additional cost of calculating the parents every time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we would evaluate these whenever we have an unbound predicate that we want to bind. I've updated this so that it's only evaluated when having a IS_NULL or NOT_NULL and when !boundTerm.producesNull() evaluates to true
783db2f to
fdba428
Compare
huaxingao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nice work!
stevenzwu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome work, @nastra. thanks for improving this before our next release.
api/src/main/java/org/apache/iceberg/expressions/UnboundPredicate.java
Outdated
Show resolved
Hide resolved
fdba428 to
ea2ce47
Compare
ea2ce47 to
c588bb1
Compare
|
Thanks @nastra for the PR! Thanks @stevenzwu @pvary for the review! |
…ce nulls (apache#14270) * API: Detect whether required fields nested within optionals can produce nulls This partially reverts some changes around the `Accessor` API that were introduced by apache#13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated * only check parent fields on IS_NULL/NOT_NULL (cherry picked from commit a473b1c)
…ce nulls (#14270) (#14512) * API: Detect whether required fields nested within optionals can produce nulls This partially reverts some changes around the `Accessor` API that were introduced by #13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated * only check parent fields on IS_NULL/NOT_NULL (cherry picked from commit a473b1c) Co-authored-by: Eduard Tudenhoefner <[email protected]>
This partially reverts some changes around the
AccessorAPI that were introduced by #13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated