Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: accept empty arrays in struct field lookup #997

Merged
merged 3 commits into from
Aug 6, 2024

Conversation

grobgl
Copy link
Contributor

@grobgl grobgl commented Aug 5, 2024

Fixes #992.

Empty pyarrow arrays are considered falsy, which caused a ResolveError for required fields during scan operations.

Fixes apache#992.

Empty `pyarrow` arrays are considered falsy, which caused a `ResolveError` for required fields during scan operations.
Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch @grobgl

@Fokko Fokko added this to the PyIceberg 0.7.1 release milestone Aug 5, 2024
@ndrluis
Copy link
Collaborator

ndrluis commented Aug 5, 2024

@grobgl Thank you for the fix. Could you add a integration test to guarantee the expected behavior?

This covers the issue reported in apache#992 where empty scan queries yielded a `ResolveError`. Specifically, this occurred under the following conditions:
- a table with an ordered, non-nullable string column
- a scan filtering for a non-existing value _within_ the range of the values in that particular column
@ndrluis
Copy link
Collaborator

ndrluis commented Aug 6, 2024

Thank you, @grobgl. Could you please double-check your test implementation? I removed the fix and the test didn't break.

@grobgl
Copy link
Contributor Author

grobgl commented Aug 6, 2024

Thank you, @grobgl. Could you please double-check your test implementation? I removed the fix and the test didn't break.

Removing the fix does break the test for me. Possibly, this depends on your pyarrow version (I'm on 17.0.0).

@ndrluis
Copy link
Collaborator

ndrluis commented Aug 6, 2024

That's strange because I'm running your branch on my machine, so we shouldn't be seeing different behavior.

Copy link
Collaborator

@ndrluis ndrluis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grobgl, I apologize for the confusion. I rebuilt the environment from scratch and everything is working correctly now. Thank you for your contribution!

@sungwy
Copy link
Collaborator

sungwy commented Aug 6, 2024

Hi @Fokko and @ndrluis thank you for your reviews, and @grobgl for reporting this issue and getting the fix ready so quickly!

I've ran the CI, but it looks like it's failing the lint check.

@grobgl : could I ask you to run make lint on the branch and push the linted code?

@sungwy sungwy merged commit 5704388 into apache:main Aug 6, 2024
7 checks passed
sungwy pushed a commit that referenced this pull request Aug 9, 2024
* Fix: accept empty arrays in struct field lookup

Fixes #992.

Empty `pyarrow` arrays are considered falsy, which caused a `ResolveError` for required fields during scan operations.

* Integration test: empty scan on non-nullable ordered string column

This covers the issue reported in #992 where empty scan queries yielded a `ResolveError`. Specifically, this occurred under the following conditions:
- a table with an ordered, non-nullable string column
- a scan filtering for a non-existing value _within_ the range of the values in that particular column

* Lint (add missing newline)
sungwy pushed a commit that referenced this pull request Aug 9, 2024
* Fix: accept empty arrays in struct field lookup

Fixes #992.

Empty `pyarrow` arrays are considered falsy, which caused a `ResolveError` for required fields during scan operations.

* Integration test: empty scan on non-nullable ordered string column

This covers the issue reported in #992 where empty scan queries yielded a `ResolveError`. Specifically, this occurred under the following conditions:
- a table with an ordered, non-nullable string column
- a scan filtering for a non-existing value _within_ the range of the values in that particular column

* Lint (add missing newline)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

table.scan queries failing sometimes when result is empty
4 participants