
@nastra
Contributor

@nastra nastra commented Oct 7, 2025

This partially reverts some of the changes to the `Accessor` API that were introduced by #13804. Instead, it uses a Schema visitor to detect whether any parent field of a nested required field is optional. This information is then used when evaluating IS_NULL / NOT_NULL predicates.
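The following is a minimal sketch of the parent check described above. It uses a simplified model, not Iceberg's real types. The `Field` record, `findParents`, and `allParentsRequired` are hypothetical stand-ins, loosely modeled on the `TypeUtil.findParents` call used in this PR, and a schema is assumed to be a tree of fields identified by integer IDs. The sketch only illustrates the idea that a required field is guaranteed non-null only if every enclosing struct is required too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class ParentCheck {
  // Hypothetical, simplified schema node (not Iceberg's Types.NestedField):
  // an ID, a name, an optional flag, and child fields for struct types.
  record Field(int id, String name, boolean optional, List<Field> children) {
    static Field leaf(int id, String name, boolean optional) {
      return new Field(id, name, optional, List.of());
    }
  }

  // Hypothetical helper: returns the enclosing fields (outermost first) of the
  // field with the given ID, or Optional.empty() if the ID does not occur.
  static Optional<List<Field>> findParents(List<Field> fields, int fieldId) {
    for (Field field : fields) {
      if (field.id() == fieldId) {
        return Optional.of(new ArrayList<>()); // match at this level: no more parents
      }
      Optional<List<Field>> nested = findParents(field.children(), fieldId);
      if (nested.isPresent()) {
        nested.get().add(0, field); // prepend so the outermost parent comes first
        return nested;
      }
    }
    return Optional.empty();
  }

  // A required field can still read as null if any enclosing struct is optional.
  static boolean allParentsRequired(List<Field> schema, int fieldId) {
    return findParents(schema, fieldId)
        .orElseThrow(() -> new IllegalArgumentException("Unknown field id: " + fieldId))
        .stream()
        .noneMatch(Field::optional);
  }

  // optional struct s1 (id 1) { required int req (id 2) }
  // required struct s2 (id 3) { required int req (id 4) }
  static List<Field> exampleSchema() {
    return List.of(
        new Field(1, "s1", true, List.of(Field.leaf(2, "req", false))),
        new Field(3, "s2", false, List.of(Field.leaf(4, "req", false))));
  }

  public static void main(String[] args) {
    List<Field> schema = exampleSchema();
    System.out.println(allParentsRequired(schema, 2)); // false: parent s1 is optional
    System.out.println(allParentsRequired(schema, 4)); // true: parent s2 is required
  }
}
```

This sketch walks the whole tree on every call, which is the cost concern raised later in the review. It makes no attempt to cache the parent chain.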

@github-actions github-actions bot added the API label Oct 7, 2025
@nastra nastra force-pushed the nested-fields-with-nulls branch from cb947c6 to df5466b Compare October 7, 2025 09:50
@nastra
Contributor Author

nastra commented Oct 7, 2025

@pvary @huaxingao @stevenzwu could you please review this one, since you already reviewed #13804?

…ce nulls

This partially reverts some changes around the `Accessor` API that were introduced by apache#13804 and uses a Schema visitor
to detect whether any of the parent fields of a nested required field are optional.
This info is then used when IS_NULL / NOT_NULL is evaluated
@nastra nastra force-pushed the nested-fields-with-nulls branch from df5466b to ccb4cbd Compare October 7, 2025 10:56
assertThat(pred.term().type()).isEqualTo(Types.fromPrimitiveString(typeName));
}

private static Stream<Arguments> nullCasesWithNestedStructs() {
Contributor Author

This code has been moved from `TestBoundReference` and slightly adjusted for readability.

}

@Test
public void testIsNullInNestedStruct() {
Contributor Author

Added evaluator tests that reproduce the original problem described in #13804.
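The following self-contained sketch shows the kind of case these tests target: a required field nested inside an optional struct. It is not Iceberg code. Rows are modeled as nested `Map`s, and `get` / `isNull` are hypothetical stand-ins for Iceberg's `StructLike` and `Accessor`. When the enclosing struct is null, the required field also reads as null, so IS_NULL on that field cannot be folded to always-false.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NestedNullSketch {
  // Hypothetical accessor: walks a path of struct field names through nested maps.
  // If any enclosing struct is null, the nested value is reported as null as well.
  static Object get(Map<String, Object> row, List<String> path) {
    Object current = row;
    for (String name : path) {
      if (current == null) {
        return null; // an optional parent struct is null, so the leaf is null too
      }
      @SuppressWarnings("unchecked")
      Map<String, Object> struct = (Map<String, Object>) current;
      current = struct.get(name);
    }
    return current;
  }

  // Runtime evaluation of an IS_NULL predicate on the value at the given path.
  static boolean isNull(Map<String, Object> row, List<String> path) {
    return get(row, path) == null;
  }

  public static void main(String[] args) {
    // Hypothetical schema: optional struct s { required int id }
    List<String> path = List.of("s", "id");

    Map<String, Object> withStruct = new HashMap<>();
    withStruct.put("s", Map.of("id", 1));

    Map<String, Object> nullStruct = new HashMap<>();
    nullStruct.put("s", null); // allowed because s is optional

    System.out.println(isNull(withStruct, path)); // false
    System.out.println(isNull(nullStruct, path)); // true, although "id" is required
  }
}
```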

@nastra nastra requested a review from rdblue October 7, 2025 10:59
private Expression bindUnaryOperation(BoundTerm<T> boundTerm) {
private Expression bindUnaryOperation(StructType struct, BoundTerm<T> boundTerm) {
boolean allFieldsAreRequired =
TypeUtil.findParents(struct.asSchema(), boundTerm.ref().fieldId()).stream()
Contributor

How often do we evaluate these expressions? For every manifest entry?
For a wide schema, what is the additional cost of calculating the parents every time?

Contributor Author

These are evaluated whenever an unbound predicate is bound. I've updated this so that the parents are only computed for an IS_NULL or NOT_NULL predicate, and only when `!boundTerm.producesNull()` evaluates to true.
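The following hedged sketch illustrates the narrowed check described in this reply. It is not Iceberg's `bindUnaryOperation`. `Op`, `Result`, and `bindUnary` are hypothetical simplifications, and the `BooleanSupplier` stands in for the parent lookup. The sketch shows that the lookup is skipped unless the predicate is IS_NULL / NOT_NULL on a term that does not produce nulls by itself. It also shows that folding to a constant only happens when every parent is required.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

class UnaryBindSketch {
  // Hypothetical subset of unary operations; NaN handling is out of scope here.
  enum Op { IS_NULL, NOT_NULL, IS_NAN }

  // Hypothetical outcome of binding a unary predicate.
  enum Result { ALWAYS_TRUE, ALWAYS_FALSE, KEEP_PREDICATE }

  static Result bindUnary(Op op, boolean termProducesNull, BooleanSupplier allParentsRequired) {
    switch (op) {
      case IS_NULL:
      case NOT_NULL:
        // Only a term that cannot produce null on its own is a folding candidate;
        // the (potentially costly) parent lookup runs only in that case.
        if (!termProducesNull && allParentsRequired.getAsBoolean()) {
          return op == Op.IS_NULL ? Result.ALWAYS_FALSE : Result.ALWAYS_TRUE;
        }
        // The field or one of its enclosing structs is optional, so the value may
        // be null at runtime and the predicate must be kept for evaluation.
        return Result.KEEP_PREDICATE;
      default:
        // Other unary predicates never trigger the parent lookup in this sketch.
        return Result.KEEP_PREDICATE;
    }
  }

  public static void main(String[] args) {
    AtomicInteger lookups = new AtomicInteger();
    BooleanSupplier requiredParents = () -> { lookups.incrementAndGet(); return true; };
    BooleanSupplier optionalParent = () -> { lookups.incrementAndGet(); return false; };

    System.out.println(bindUnary(Op.IS_NULL, false, requiredParents)); // ALWAYS_FALSE
    System.out.println(bindUnary(Op.IS_NULL, false, optionalParent));  // KEEP_PREDICATE
    System.out.println(bindUnary(Op.NOT_NULL, true, optionalParent));  // KEEP_PREDICATE, no lookup
    System.out.println(bindUnary(Op.IS_NAN, false, requiredParents));  // KEEP_PREDICATE, no lookup
    System.out.println(lookups.get()); // 2
  }
}
```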

@nastra nastra requested review from huaxingao and stevenzwu October 8, 2025 13:39
@nastra nastra force-pushed the nested-fields-with-nulls branch from 783db2f to fdba428 Compare October 8, 2025 13:42
Contributor

@huaxingao huaxingao left a comment

LGTM. Nice work!

Contributor

@stevenzwu stevenzwu left a comment

Awesome work, @nastra. Thanks for improving this before our next release.

@nastra nastra force-pushed the nested-fields-with-nulls branch from fdba428 to ea2ce47 Compare October 9, 2025 06:34
@nastra nastra force-pushed the nested-fields-with-nulls branch from ea2ce47 to c588bb1 Compare October 9, 2025 06:39
@huaxingao huaxingao merged commit a473b1c into apache:main Oct 10, 2025
43 checks passed
@huaxingao
Contributor

Thanks @nastra for the PR! Thanks @stevenzwu @pvary for the review!

@huaxingao huaxingao added this to the Iceberg 1.10.1 milestone Oct 10, 2025
@nastra nastra deleted the nested-fields-with-nulls branch October 15, 2025 15:46
huaxingao pushed a commit to huaxingao/iceberg that referenced this pull request Nov 6, 2025
…ce nulls (apache#14270)

* API: Detect whether required fields nested within optionals can produce nulls

This partially reverts some changes around the `Accessor` API that were introduced by apache#13804 and uses a Schema visitor
to detect whether any of the parent fields of a nested required field are optional.
This info is then used when IS_NULL / NOT_NULL is evaluated

* only check parent fields on IS_NULL/NOT_NULL

(cherry picked from commit a473b1c)
huaxingao added a commit that referenced this pull request Nov 6, 2025
…ce nulls (#14270) (#14512)

* API: Detect whether required fields nested within optionals can produce nulls

This partially reverts some changes around the `Accessor` API that were introduced by #13804 and uses a Schema visitor
to detect whether any of the parent fields of a nested required field are optional.
This info is then used when IS_NULL / NOT_NULL is evaluated

* only check parent fields on IS_NULL/NOT_NULL

(cherry picked from commit a473b1c)

Co-authored-by: Eduard Tudenhoefner <[email protected]>

4 participants